2016 DJI Dev Challenge Failure Analysis and Lessons Learned

After the competition, Alex figured out what had gone wrong. On August 26th, 2016, just two days before the competition, a commit had gone in to refactor some flight parameters and simplify things.

Seems like a simple enough change, right? Move away from ROS’ dynamic reconfigure infrastructure and simply use parameters written in launch files. Dynamic reconfigure had been important during field trials, where we were tuning flight parameters empirically and needed a way to change them quickly on the fly. Let’s go through the changes introduced by this commit.

In the first change, the parameters are removed from the parameter generation file and put into the launch file. Some parameters are removed outright and some values are changed, but nothing too out of the ordinary. The yaw parameters were removed (but reintroduced farther down) and flight_level_low was reduced from 3.75 m to 3.5 m.

Next, there were changes in the moving landing node itself. The callback is simplified to drop the parameters that are no longer in the .cfg file, and lines are added to fetch them directly from the parameter server. Can you spot the mistake? To fully grasp it, we need a bit of background on dynamic reconfigure. As Jack O’Quin explains:

> When you register your [dynamic reconfigure] callback, it gets invoked immediately with the values currently defined in the parameter server and a level of 0xffffffff. Those initial values will include anything set in your launch file. You don’t need to read the parameters yourself using getParam().

If you look at the configure callback of MovingLanding_node.cpp, you can see that most lines are simple assignments, except on line 91, where the landing speed also has a minus sign in front of it. Before this commit, the callback was invoked immediately at startup and set a negative landing velocity. After it, the value was fetched directly from the parameter server without the negation, so the velocity stayed positive.
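To make the sign flip concrete, here is a minimal sketch that needs no ROS installation. It is not the code from MovingLanding_node.cpp: the `ParamServer` map, the `LandingState` struct, both function names, and the `landing_speed` key are hypothetical stand-ins chosen for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the ROS parameter server: name -> value.
using ParamServer = std::map<std::string, double>;

// Hypothetical subset of the landing node's state.
struct LandingState {
    double landing_velocity = 0.0;  // m/s, negative means "descend"
};

// Mimics the old path: the dynamic reconfigure callback negates the
// configured speed, so a positive value becomes a descent velocity.
void configureCallback(const ParamServer& params, LandingState& state) {
    state.landing_velocity = -params.at("landing_speed");
}

// Mimics the refactored path: the value is read directly and the
// negation that lived in the callback is lost.
void loadParamsDirectly(const ParamServer& params, LandingState& state) {
    state.landing_velocity = params.at("landing_speed");
}
```

With the same positive `landing_speed` in the map, the first function yields a negative velocity and the second a positive one of equal magnitude, which is the difference between descending onto the tag and climbing away from it.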

Here is the best explanation we can give for what happened at the DJI challenge. The quad approached the truck and, as soon as it saw the landing tag, started climbing because of the positive velocity. Once it got high enough that the AprilTag detection failed, it descended back to its flight_level_low altitude to search for the tag. Rinse and repeat: this was the oscillation we saw instead of the landing.

The “air break dancing” we saw, where the quad didn’t seem to come back to the truck, remains to be explained. My best theory is that somewhere along the way, DJI’s data transparent transmission, which we used to send IMU and GPS data, failed or became intermittent while the quad was far away (lowering the quad’s flight altitude seemed to help with this). As the connection cut off and came back while the truck drove around the loop, some buffered GPS positions were sent to the quad. Since both current and older positions were arriving, the quad would constantly switch between flying to the current position and flying to an earlier position somewhere on the track. If this is the correct explanation, then proper data timestamping and clock synchronization would have taken care of it.
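If that theory holds, a timestamp check on the receiving side is one way buffered positions could have been rejected. The sketch below is only an illustration of that idea, not code from our stack: `StampedPosition`, `LatestPositionFilter`, and the maximum-age threshold are hypothetical, and it assumes the truck and quad clocks are synchronized.

```cpp
#include <cstdint>

// Hypothetical GPS fix from the truck, stamped at the source.
// Assumes the sender's clock is synchronized with the receiver's.
struct StampedPosition {
    std::uint64_t stamp_ms;  // time of measurement, in milliseconds
    double latitude;
    double longitude;
};

// Keeps only the newest fix and rejects stale or out-of-order ones.
class LatestPositionFilter {
public:
    explicit LatestPositionFilter(std::uint64_t max_age_ms)
        : max_age_ms_(max_age_ms) {}

    // Returns true if msg becomes the new target position.
    bool accept(const StampedPosition& msg, std::uint64_t now_ms) {
        // Reject anything not strictly newer than what we already hold.
        if (has_latest_ && msg.stamp_ms <= latest_.stamp_ms) {
            return false;
        }
        // Reject fixes that sat in a buffer for too long.
        if (now_ms > msg.stamp_ms && now_ms - msg.stamp_ms > max_age_ms_) {
            return false;
        }
        latest_ = msg;
        has_latest_ = true;
        return true;
    }

    bool hasLatest() const { return has_latest_; }
    const StampedPosition& latest() const { return latest_; }

private:
    std::uint64_t max_age_ms_;
    bool has_latest_ = false;
    StampedPosition latest_{0, 0.0, 0.0};
};
```

With a filter like this, a burst of old fixes delivered after a reconnect would be dropped instead of being treated as fresh targets, so the quad would keep chasing only the most recent position.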

### 3. Get some sleep

As the weeks went on, I could definitely feel diminishing returns setting in. I was spending more hours at the office while accomplishing much less. I was overworked, but I didn’t want to admit it, and things felt too urgent for me to put in fewer hours. Giving a consistent 12 hours a day, every day, for weeks is a sure way of destroying your productivity, especially when the work requires a certain amount of creativity mixed with complex logical thinking. Programming while tired is a sure way of introducing bugs into your code.

### 4. Make sure to set the expectations at the start

When starting a project, make sure everyone involved states what they are ready to give and how much they care about the project. It’s fine to say that you care little and only have 1 hour a week to give. At least then the team can assign you tasks that fit in 1 hour per week.

### 5. Enforce code freeze procedures

It might pain some people to have their code frozen when they know they could add one more feature or clean up one more thing. But the only code that is sure to work is the code that has been tested, not the code that was modified after a test. As the team manager, I should have forced a code freeze immediately after the successful test, before leaving Montreal. Even if your code ends up missing a feature, and no matter how much the team argues that they can do better, testing new code in production (competition) is never a good idea.

In the weeks leading up to the competition, we did have a short discussion about code freezing, but the answer I got was simply “never, there’s too many things to do”. Lesson learned: skipping the code freeze is simply not an option when the stakes are high.