Design of Real-Time Multiple Object Visual Detection and Tracking Systems

This thesis focuses on real-time visual object tracking, which, given a video sequence, assigns every object of interest a unique identity and a trajectory across video frames. This is a fundamental task in many video analytics applications, such as traffic monitoring and video surveillance in general. Many real-life applications require these systems to be integrated into different architectures, especially distributed systems and low-power embedded devices. Two of the most noteworthy challenges in developing a distributed system for traffic monitoring are real-time operation with hundreds of vehicles and total occlusions, which hinder the tracking of the vehicles. Our system addresses these two challenges with three modules: detection, tracking, and data association. First, vehicles are identified by a deep learning-based detector. Second, tracking is performed with a combination of a Discriminative Correlation Filter and a Kalman Filter. This combination allows us to estimate the tracking error, which makes tracking more robust and reliable. Finally, data association with the Hungarian algorithm combines the information from the previous steps. To show the applicability of the system in real-life scenarios, we present two traffic applications: anomaly detection in traffic monitoring and roundabout input/output analysis. The system has been evaluated on real-life video datasets containing over 2,000 vehicles.

Developing real-time multiple object tracking systems on low-power edge devices such as IoT nodes, without compromising accuracy, is a challenge because of the limited computing capacity of these devices. For this purpose, we have designed a system that extends a joint tracking and detection architecture with a module composed of appearance-based and motion-based trackers. These trackers maintain the identity of the objects of interest for longer periods while reducing the load on the detector.
Our system is mapped onto an embedded GPU platform, which significantly reduces power consumption with respect to a server GPU. The system achieves a Multiple Object Tracking Accuracy (MOTA) of 51.1% on the MOT16 dataset. This, in conjunction with a real-time processing speed of 25.2 FPS for up to 45 simultaneous objects and a low power consumption of 15 W, makes our system an ideal solution for a wide range of video analytics applications.
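For readers unfamiliar with the MOTA figure quoted above, the following minimal sketch shows how the metric is commonly computed under the standard CLEAR MOT definition: one minus the ratio of all errors (missed objects, false positives, and identity switches) to the total number of ground-truth objects, accumulated over all frames. The `FrameCounts` container, the `mota` function, and the example counts are hypothetical illustrations, not code or results from the thesis.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameCounts:
    """Per-frame error counts (hypothetical container for illustration)."""
    ground_truth: int      # annotated objects present in the frame
    false_negatives: int   # ground-truth objects the tracker missed
    false_positives: int   # tracker outputs with no matching object
    id_switches: int       # objects whose assigned identity changed


def mota(frames: Iterable[FrameCounts]) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    frames = list(frames)
    total_gt = sum(f.ground_truth for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    total_errors = sum(
        f.false_negatives + f.false_positives + f.id_switches for f in frames
    )
    return 1.0 - total_errors / total_gt


# Made-up counts for three frames with 10 annotated objects each.
frames = [
    FrameCounts(ground_truth=10, false_negatives=2, false_positives=1, id_switches=0),
    FrameCounts(ground_truth=10, false_negatives=3, false_positives=2, id_switches=1),
    FrameCounts(ground_truth=10, false_negatives=1, false_positives=1, id_switches=0),
]
score = mota(frames)
print(f"MOTA = {score:.1%}")  # prints "MOTA = 63.3%"
```

Because the errors are summed before dividing, MOTA can be negative when the tracker makes more errors than there are ground-truth objects, so it is not a percentage of correctly tracked objects in the strict sense.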

keywords: multiple object tracking, visual tracking, edge computing, deep learning, convolutional neural networks (CNNs), data association, computer vision, traffic monitoring, object detection