by Dylan Sam, '21
Several technologies come into play when programming image recognition into a car. The first is an image sensor known as an event camera, which does not capture full images but instead records only changes in the intensity of individual pixels. Pixels that exhibit these dynamic intensity changes are called "events" and are highlighted in the dataset. Event cameras have recently been shown to work hand-in-hand with convolutional neural networks, which excel at extracting specific image features, to understand and predict motion in images and video. Furthermore, these cameras act as "natural motion detectors and automatically filter out any temporally-redundant information", so they serve as an incredibly useful method of data collection. The scientists looked to combine these two technologies to illustrate their ability to respond accurately to dangerous driving scenarios.
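To make the idea concrete, here is a minimal sketch of how events might be generated from two grayscale frames. This is an illustration, not the camera's actual hardware pipeline: the logarithmic-change threshold and the `(row, col, polarity)` event format are simplifying assumptions, though they mirror how event cameras are commonly described.

```python
import numpy as np

def frames_to_events(prev, curr, threshold=0.2):
    """Emit simplified events where the log-intensity change exceeds a threshold.

    prev, curr: 2-D grayscale frames (floats > 0).
    Returns a list of (row, col, polarity) tuples: +1 brighter, -1 darker.
    Pixels whose intensity barely changes produce no events at all,
    which is the sense in which the sensor filters redundant information.
    """
    diff = np.log(curr) - np.log(prev)
    rows, cols = np.where(np.abs(diff) > threshold)
    return [(r, c, 1 if diff[r, c] > 0 else -1) for r, c in zip(rows, cols)]

# Example: only one pixel brightens sharply, so only one event is emitted.
prev = np.full((4, 4), 0.5)
curr = prev.copy()
curr[1, 2] = 1.0
events = frames_to_events(prev, curr)
# events == [(1, 2, 1)]
```

Note that a static scene produces no events whatsoever, which is exactly why the data stream is so sparse compared to full video frames.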
The scientists used the DAVIS Driving Dataset 2017 (DDD17) to train their neural networks. The dataset "contains approximately 12 hours of annotated driving recordings (for a total of 432 GB) collected by a car under different and challenging weather, road and illumination conditions". Because the driving recordings were annotated, the neural network could learn which steering actions are considered "good" and which are "bad", and update its predictions accordingly during training. To pass the data into the neural networks, the researchers converted the events, pixel by pixel, into event frames (representations the model can understand) and then fed these frames into the network, which mapped them to steering angles. The network thus learns to convert a given image into the proper steering angle away from any hazards.
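The event-frame conversion described above can be sketched as accumulating events into a fixed-size image the network can consume. The two-channel layout (one channel per polarity) is an assumption for illustration; the paper's exact frame representation may differ.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate a batch of events into a 2-channel event frame.

    events: iterable of (row, col, polarity) tuples.
    Channel 0 counts positive (brightening) events per pixel,
    channel 1 counts negative (darkening) events per pixel.
    The resulting array has the dense, image-like shape that a
    convolutional network expects as input.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    for r, c, polarity in events:
        channel = 0 if polarity > 0 else 1
        frame[channel, r, c] += 1.0
    return frame

# Two brightening events at one pixel, one darkening event at another.
events = [(0, 1, 1), (0, 1, 1), (2, 3, -1)]
frame = events_to_frame(events, 4, 4)
# frame[0, 0, 1] == 2.0 and frame[1, 2, 3] == 1.0
```

A regression head on top of a convolutional network would then map each such frame to a single predicted steering angle.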
To evaluate their results, the researchers calculated the root-mean-squared error (RMSE) and the explained variance (EVA). The RMSE measures the average prediction error, i.e., how many degrees the model's prediction differed from the labelled answer. The EVA measures how much of the variation in the labelled steering angles the model's predictions account for. The goal is an RMSE close to 0 and an EVA close to 1. The model performed best on daytime images, with an RMSE of 2.99 degrees and an EVA of 0.742. When trained on the full dataset, the model had an RMSE of 4.10 degrees and an EVA of 0.826. Thus, on average, the model was only 4.10 degrees away from the labelled "correct" steering angle. The researchers successfully showed that the combination of event cameras and deep learning models can produce a useful steering algorithm for self-driving cars.
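For readers who want to see exactly what these two metrics compute, here is a short sketch using standard definitions of RMSE and explained variance; the toy steering angles below are made up for illustration, not taken from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error: average prediction error in degrees."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def explained_variance(y_true, y_pred):
    """Fraction of the variance in the labels that the predictions explain.

    1.0 means the residuals carry no variance at all (perfect trend);
    0.0 means the model explains none of the variation in the labels.
    """
    return float(1.0 - np.var(y_true - y_pred) / np.var(y_true))

# Hypothetical steering angles (degrees) for four frames.
y_true = np.array([0.0, 10.0, -5.0, 20.0])
y_pred = np.array([1.0, 9.0, -4.0, 18.0])
print(rmse(y_true, y_pred))                # ≈ 1.32 degrees
print(explained_variance(y_true, y_pred))  # ≈ 0.98
```

Note that the two metrics answer different questions, which is why the daytime subset can have the lower RMSE while the full dataset has the higher EVA.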
The combination of the natural motion-detecting properties of event cameras and the feature-extraction abilities of neural networks provides a powerful means of steering self-driving cars. This group of researchers has developed an accurate framework, illustrating that the idea can be further refined and implemented in industry. Growing capabilities in image recognition and deep learning may lead to much safer and more efficient roads.