Animal Behavior and Motion Capture

Motion capture (MoCap) is the estimation of the pose and shape of a subject. Pose is the skeletal configuration (the positions of the major joints in 3D space), and shape can be represented as a mesh of vertices on the subject's body. At FRPG, our core interest in this context is to study and develop vision-based methods for animal and human MoCap in outdoor, unknown and unstructured environments, for example, for wild animals in their natural habitat.

Animal MoCap: Animal MoCap can help us understand (and quantitatively estimate) animal behavior. Behaviors include primitive ones such as walking, running and foraging, and complex ones such as interaction among peers and with the environment. By studying how behavior varies over days, seasons and environmental conditions, we can likely map behavioral changes in a species to changes in its environment. This, in turn, can allow policy makers and ecologists to develop scientific, evidence-based roadmaps for biodiversity preservation. To perform animal MoCap, we are developing multi-view methods that fuse information from aerial images of animals, obtained simultaneously from cameras mounted on multiple aerial vehicles that track and follow the animals.
Animal Behavior: While one of the downstream goals of animal MoCap is to estimate behaviors, such as walking, running and drinking, automatically from images, we are first studying the effect of UAV noise (sound) on animal behavior. It is important to establish the distances, altitudes and aerial robot configurations that affect the natural behavior of the animals, which is the very behavior we wish to estimate using aerial robots. To this end, we perform extensive flight experiments around various species in their natural habitat and manually log changes in their behavior.

Animal Datasets: To develop vision-based methods for animal MoCap, we take a deep learning approach. Animals need to be detected, classified and identified in camera images. Training networks for these tasks requires a large annotated dataset of real animals. To this end, we collect data at various locations, including the Wilhelma zoo in Stuttgart, Hortobagy National Park in Hungary and Mpala conservancy in Kenya.
Human MoCap: We have developed various methods for human MoCap from aerial images acquired simultaneously from multiple aerial robots. The methods range from optimization-based to end-to-end learning-based. Typically, we employ 2D joint detectors that provide measurements of joint positions in the images. A body model, learned from a large number of human body scans, is used as a prior to predict these measurements, assuming arbitrary camera extrinsics. Thereafter, the body model parameters and the camera extrinsics are jointly optimized to explain the 2D measurements in the least-squares sense. The end-to-end method learns to directly predict the body model parameters, using as input only the images from one aerial robot and compact measurements communicated to it by its teammates. We also develop methods for real-time execution of these MoCap methods on the aerial robot's on-board computer.
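
The least-squares step described above can be illustrated with a minimal Python sketch. It is not the group's implementation: it assumes a simple pinhole camera with a hypothetical focal length, uses a rotation about a single axis in place of full camera extrinsics, and replaces the learned body model with a fixed list of 3D joint positions. All function and variable names are illustrative. The sketch only evaluates the cost that an optimizer would minimize over the body and camera parameters.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z axis (a simplified stand-in for a full camera rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(point, R, t, focal=500.0):
    """Pinhole projection of a 3D world point into a camera with extrinsics (R, t).

    The focal length is an assumed value chosen only for illustration.
    """
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    return (focal * x / z, focal * y / z)

def reprojection_cost(joints_3d, cameras, detections):
    """Sum of squared 2D residuals between predicted and detected joints, over all cameras."""
    cost = 0.0
    for (R, t), dets in zip(cameras, detections):
        for joint, (u, v) in zip(joints_3d, dets):
            pu, pv = project(joint, R, t)
            cost += (pu - u) ** 2 + (pv - v) ** 2
    return cost

# Hypothetical 3D joints; in the real pipeline these would come from the body model.
joints = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.2, 1.0, 0.1)]

# Two cameras with known extrinsics, used here to synthesize noise-free 2D detections.
true_cams = [(rot_z(0.0), (0.0, 0.0, 5.0)), (rot_z(0.3), (0.5, 0.0, 6.0))]
detections = [[project(j, R, t) for j in joints] for R, t in true_cams]

# A wrong guess for the first camera's rotation yields a larger cost.
perturbed_cams = [(rot_z(0.05), (0.0, 0.0, 5.0)), true_cams[1]]

print("cost at true extrinsics:     ", reprojection_cost(joints, true_cams, detections))
print("cost at perturbed extrinsics:", reprojection_cost(joints, perturbed_cams, detections))
```

In a joint optimization, both the joint positions (through the body model parameters) and the camera extrinsics would be free variables, and a nonlinear least-squares solver would adjust them to reduce this cost.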