Robot Decision-making and Control

Perception-driven Formation control: A team of three of our autonomous aerial robots tracks a person while maintaining a formation that minimizes the joint uncertainty in the cooperatively estimated position of the person.

In this area of research we investigate methods to answer the following two questions.

  • How can we infer meaning from the estimated state, or directly from the raw sensor measurements, and make high-level decisions based on it? Examples include inferring the behavior of the tracked subjects (animals or humans), predicting their future motion and trajectory, and inferring the risk of collision with the subjects or the static environment.
  • Based on the perceived state of the environment and/or the inferred situation, how should the robots be controlled to accomplish the given tasks? Tasks could include, for example, monitoring or following a group of subjects, mapping a desired location, or transporting objects between locations.

Perception-driven Formation control

To achieve a required degree of quality in perception, usually quantified as estimation uncertainty, a team of robots/agents needs to coordinate its actions and share independently obtained information among teammates. To this end, we develop classical methods (model predictive control) and AI-based methods (reinforcement learning) that explicitly account for perceptual gain in their objective function or rewards, respectively. At the same time, the formation control methods must satisfy environmental and task-related constraints such as collision avoidance and angular configuration maintenance. For this purpose, we have introduced a novel force function-based approach that keeps the formation objective convex. In other methods, we have focused on heterogeneous robot teams to accomplish this task.

Figure: Formation control: A team of three airships tracking a human while maintaining a perception-driven formation in simulation and in a real scenario.
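To make the link between formation geometry and estimation uncertainty concrete, the following is a minimal sketch and not the method used in our work. It assumes, purely for illustration, that each robot provides an independent 2-D Gaussian position measurement of the target that is less certain along its line of sight (depth) than across it, and that the measurements are fused in information form. The function names and the noise values `sigma_depth` and `sigma_lateral` are hypothetical.

```python
import math

def sensor_information(bearing, sigma_depth=1.0, sigma_lateral=0.2):
    """Information matrix (inverse covariance) of one robot's measurement.

    `bearing` is the direction of the robot's line of sight in radians.
    Assumed model: standard deviation `sigma_depth` along the line of sight
    and `sigma_lateral` perpendicular to it (illustrative values only).
    Returns (a, b, d) for the symmetric 2x2 matrix [[a, b], [b, d]].
    """
    c, s = math.cos(bearing), math.sin(bearing)
    w_depth = 1.0 / sigma_depth ** 2
    w_lateral = 1.0 / sigma_lateral ** 2
    a = w_depth * c * c + w_lateral * s * s
    b = (w_depth - w_lateral) * c * s
    d = w_depth * s * s + w_lateral * c * c
    return a, b, d

def joint_uncertainty(bearings, sigma_depth=1.0, sigma_lateral=0.2):
    """Trace of the fused covariance for a set of viewing directions.

    Independent Gaussian measurements are fused by summing their
    information matrices; the fused covariance is the inverse of that sum.
    """
    a = b = d = 0.0
    for bearing in bearings:
        ia, ib, id_ = sensor_information(bearing, sigma_depth, sigma_lateral)
        a, b, d = a + ia, b + ib, d + id_
    det = a * d - b * b
    # For a 2x2 matrix M = [[a, b], [b, d]], trace(inv(M)) = (a + d) / det(M).
    return (a + d) / det

# Two hypothetical three-robot formations around the same target.
clustered = [math.radians(x) for x in (0.0, 10.0, 20.0)]
spread = [math.radians(x) for x in (0.0, 120.0, 240.0)]

print("clustered formation:", joint_uncertainty(clustered))
print("evenly spread formation:", joint_uncertainty(spread))
```

Under these assumptions the evenly spread formation yields the lower joint uncertainty, which is the kind of perceptual gain a formation controller can place in its objective or reward. Collision avoidance and the other constraints mentioned above are deliberately left out of this sketch.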

Animal Behavior Inference

Traditionally, the quantification of animal behavior in the wild has relied on direct observations and opportunistic sampling by researchers and volunteer groups. Observers spend hours in the field, carefully documenting behavioral events and collecting data on individual animals. While this approach has provided valuable information, it is labor-intensive, time-consuming, and often limited in capturing the full complexity and subtlety of the animals' behavior, especially over large spatiotemporal scales and for large animal groups.


Modern machine learning techniques for automatically detecting behavior in videos of animals have been shown to generalize well across different behaviors and environments. Nonetheless, they can be computationally expensive and require access to high-end GPUs, which limits their use to offline processing after the data has been collected. Moreover, they depend strongly on large, reliable, and diverse annotated datasets for training. Ideally, the annotated dataset should cover a diverse range of behaviors, times of day, environments, and animal group sizes. Exploiting the full potential of machine learning to help ecologists and conservationists study animal behavior in the wild therefore requires an easy-to-use automated annotation tool that can assist in generating large amounts of high-quality annotated data while reducing the reliance on time-consuming manual effort.

In our ongoing work in this context, we propose solutions to both problems: i) a novel network model to infer animal behavior, and ii) a novel methodology for quickly generating large amounts of labelled video data (every frame) of animals to reliably train that network. Our method's workflow is bootstrapped with only a small number of manually labelled video frames. Then, building upon an open-source tool, Smarter-LabelMe (also developed by us recently), we develop a method that leverages deep convolutional visual detection and tracking in combination with our behavior inference model to quickly produce large amounts of reliable training data. We demonstrate the effectiveness of our method on aerial videos of plains zebras and Grévy's zebras (Equus quagga and Equus grevyi; see Figure for example results). We plan to open-source the code of our method and to release large, accurately annotated video datasets of zebra behavior produced with it.

Figure: Animal Behavior Inference: Results from our ongoing work on animal behavior inference. Images contain Grévy's zebras recorded by us at the Mpala Research Centre in Kenya (top row) and Przewalski's horses in the Hungarian Steppe recorded by us during one of our field trips there.
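One common way to organize a bootstrapped annotation loop of this kind is to accept confident model predictions automatically and route the rest to a human annotator. The sketch below illustrates only that triage step. It is not the Smarter-LabelMe API or our actual pipeline; the `FramePrediction` record, the `triage` function, and the confidence threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class FramePrediction:
    """One behavior prediction for one tracked animal in one video frame.

    Hypothetical record layout, used only for this illustration.
    """
    frame: int
    track_id: int
    behavior: str
    confidence: float

def triage(predictions, threshold=0.9):
    """Split predictions into auto-accepted labels and a manual review queue.

    Predictions with confidence at or above `threshold` are kept as labels;
    the rest are queued for a human annotator. The threshold is an assumed
    tuning parameter, not a value taken from our work.
    """
    accepted, review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            review.append(p)
    return accepted, review

# Made-up predictions for two tracked animals over two frames.
predictions = [
    FramePrediction(frame=0, track_id=1, behavior="grazing", confidence=0.97),
    FramePrediction(frame=0, track_id=2, behavior="walking", confidence=0.62),
    FramePrediction(frame=1, track_id=1, behavior="grazing", confidence=0.95),
    FramePrediction(frame=1, track_id=2, behavior="standing", confidence=0.91),
]

accepted, review = triage(predictions, threshold=0.9)
print(len(accepted), "labels accepted automatically;", len(review), "sent to manual review")
```

In such a loop, the corrected review items would be added to the training set and the model retrained, so that fewer frames need manual attention in each round.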

Animal-Robot Interaction

While one of the downstream goals of animal MoCap is to estimate behavior automatically from the images, it is also important to study and quantify the effects of the presence of drones in the vicinity of the animals. Thus, we are studying the effects of UAV noise (sound) on animal behavior. It is important to establish the distances, altitudes, and aerial robot configurations that affect the natural behavior of these animals, the very same behavior we wish to estimate using aerial robots. To this end, we have performed extensive flight experiments around various species in their natural habitat and manually logged changes in their behavior. Using these logs together with behavior automatically inferred from the recorded videos, our ongoing work aims to determine and quantify which species are most affected by drone noise, and how.

Figure: One of our Parrot Anafi drones flying close to Przewalski's horses in the Hungarian Steppe.