RL-based Guidance and Control: our autonomous airship, controlled by our deep residual reinforcement learning based control method.
In this context, we seek to answer the following question: how can autonomous robots perceive, understand and interact with their environment, given i) very little prior information about the environment, ii) little knowledge about their own system dynamics, and iii) dynamic environments with changing conditions, e.g., changing visibility or occupancy? Secondly, under these conditions, how can these robots optimize energy consumption, computational requirements and communication bandwidth? Can these questions be answered through learning-based methods, specifically reinforcement learning and learning in simulation and from synthetic data?
For many complex perception problems, such as human or animal MoCap, it is hard to formulate a reliable objective function that captures the perception uncertainty. Such functions are not only difficult to derive but can also be highly non-convex. We therefore take a deep RL (DRL) approach to the perception-driven formation control problem. The key idea is to model it as a sequential decision-making problem and to learn, for each robot/agent in the team, a policy that maximizes the joint MoCap accuracy, which is provided as a reward only during training. For human MoCap, we learned navigation policies for multiple aerial robots purely in simulation and deployed them on real robots. Here, the observations were joint detections in the 2D images acquired by the robots. Ongoing work involves end-to-end DRL methods that remove the need for joint detections, and a DRL approach for animal MoCap. Figure: An illustration of an aerial MoCap system where aerial robot agents learn formation control policies based on MoCap performance rewards.
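As a minimal sketch of how MoCap accuracy could be turned into a training-time reward, the function below returns the negative mean per-joint position error (MPJPE) between the team's estimated 3D joints and the ground-truth joints. The choice of MPJPE and the name `mocap_reward` are assumptions made for illustration; the reward used in our work may be defined differently. Because it needs ground-truth joint positions, a reward like this is only available in simulation, which fits with using it during training only.

```python
import math
from typing import Sequence, Tuple

Joint3D = Tuple[float, float, float]  # (x, y, z) position of one body joint [m]


def mocap_reward(estimated: Sequence[Joint3D], ground_truth: Sequence[Joint3D]) -> float:
    """Hypothetical training-time reward: the negative mean per-joint position
    error (MPJPE) between the team's fused 3D joint estimates and the ground truth.

    Higher is better; a perfect estimate yields 0. Ground truth is only known in
    simulation, so this reward is available during training but not at deployment.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("estimated and ground_truth must have the same, non-zero length")
    # Euclidean error of each joint, averaged over all joints.
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    return -sum(errors) / len(errors)
```

For example, `mocap_reward([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 1)])` returns `-0.5`, since the two joints are off by 0 m and 1 m respectively.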
In this context, we develop various RL and DRL-based methods for applications ranging from airship control and autonomous landing of multi-rotors on moving platforms to autonomous soaring of gliders and collision avoidance in large teams of robots. For airship control, we have developed a DRL method that combines a classical PID controller with learning and leverages the PID controller's stability properties. For autonomous landing, our RL approach learns an optimal policy for the robot's 2D horizontal motion, with a special focus on deriving the hyperparameters in an interpretable way. For autonomous soaring, we develop DRL-based approaches that maximize the exploitation of updrafts while balancing it against a simultaneous waypoint navigation task. For collision avoidance, we explore attention networks that scale to very large numbers of robots and make the RL policy interpretable. Figure: Reinforcement learning based autonomous landing by our aerial robot on a moving platform.
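The sketch below illustrates one common way to combine a PID controller with a learned policy, residual RL: the PID output forms the base command and the policy adds a bounded correction on top. The class `PID`, the function `residual_action`, the clipping bound `residual_limit` and the stand-in policy are hypothetical illustrations, not the code of our airship controller. Bounding the residual is one simple way to keep the combined controller close to the PID's behavior and thus benefit from its stability properties; our method may achieve this differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PID:
    """Textbook PID controller acting on a scalar tracking error (hypothetical)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def __call__(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative term on the first call, when there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def residual_action(
    pid: PID,
    policy: Callable[[Any], float],
    error: float,
    observation: Any,
    dt: float,
    residual_limit: float,
) -> float:
    """Residual-RL command: PID base action plus a clipped learned correction."""
    base = pid(error, dt)
    # Bound the learned residual so the policy can only nudge the PID command.
    residual = max(-residual_limit, min(residual_limit, policy(observation)))
    return base + residual


def greedy_policy(observation: Any) -> float:
    """Stand-in for a trained network that always requests a large correction."""
    return 10.0
```

With `PID(kp=2.0, ki=0.0, kd=0.0)`, an error of `1.5` and `residual_limit=0.5`, the PID contributes `3.0` and the stand-in policy's request of `10.0` is clipped to `0.5`, so `residual_action` returns `3.5`.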
Glider pilots use updrafts to achieve long flight durations and cover large distances in cross-country soaring. Flying without propulsion requires a constant trade-off between two conflicting courses of action: on the one hand, the primary goal, for example traveling from A to B, should be pursued; on the other hand, updrafts have to be found and exploited. In an ongoing project involving autonomous glider UAVs, we are developing methods to automatically localize and characterize thermals. Tactical decision-making in motion planning, which is characterized by a stochastic environment as well as long-term correlations between the actions taken and the goals achieved, is addressed using a reinforcement learning method. A cooperative approach with multiple gliders makes the task even more challenging. Flight tests have demonstrated the performance and feasibility of applying these methods in the context of flight control. Figure: Our autonomous glider (in simulation) finding an energy-efficient trajectory.
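To make the trade-off concrete, here is a hypothetical per-step reward that adds the glider's progress toward the waypoint to its gain in energy height, i.e. altitude plus kinetic energy per unit weight, h + v^2/(2g). The names `GliderState`, `specific_energy` and `soaring_reward` and the weights `w_progress` and `w_energy` are assumptions for illustration; our method may shape the reward differently.

```python
import math
from dataclasses import dataclass
from typing import Tuple

G = 9.81  # gravitational acceleration [m/s^2]


@dataclass
class GliderState:
    x: float          # east position [m]
    y: float          # north position [m]
    altitude: float   # [m]
    airspeed: float   # [m/s]


def specific_energy(s: GliderState) -> float:
    """Energy height [m]: potential plus kinetic energy per unit weight, h + v^2 / (2g)."""
    return s.altitude + s.airspeed ** 2 / (2 * G)


def soaring_reward(
    prev: GliderState,
    curr: GliderState,
    waypoint: Tuple[float, float],
    w_progress: float = 1.0,
    w_energy: float = 1.0,
) -> float:
    """Hypothetical per-step reward trading off navigation against energy harvesting."""
    # Progress: how much closer (in the horizontal plane) the glider got to the waypoint.
    progress = math.dist((prev.x, prev.y), waypoint) - math.dist((curr.x, curr.y), waypoint)
    # Energy gain: positive while climbing in an updraft, negative while gliding.
    energy_gain = specific_energy(curr) - specific_energy(prev)
    return w_progress * progress + w_energy * energy_gain
```

Circling in a thermal earns reward mainly through energy gain with little progress, while gliding straight toward the waypoint earns progress at the cost of energy, so the two weights set the balance. For example, a step that brings the glider 10 m closer to the waypoint while climbing 10 m at constant airspeed yields a reward of about 20 with unit weights.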
Through the development of various synthetic data generation procedures, we have overcome the challenges of collecting real-world ground truth data of dynamic environments and challenging scenarios. In our aerial human MoCap project, we used Unreal Engine and RenderPeople assets to create a dataset for human pose and shape (HPS) estimation using the AirSim toolkit. In doing so, we were able to address several challenges, including HPS estimation from aerial views, obtaining multi-view ground truth information, and generating diverse poses. Separately, our work on GRADE, which is related to our active SLAM project, was developed using Isaac Sim and the NVIDIA Omniverse suite. With a novel method that automatically generates data of dynamic environments with minimal effort, we overcame limitations of state-of-the-art robotics simulators such as Gazebo, namely the lack of visual realism and the limited controllability of the simulation environment. As a result, we were able to demonstrate that current state-of-the-art dynamic SLAM methods are over-optimized for the currently available datasets and do not generalize well. Furthermore, the generated data of both humans and zebras is realistic enough that detectors trained only on synthetic data perform as well as or better than pre-trained models. We have released the code and data for all of these works. Figure: A scene from our synthetic data generation work.