We propose an image-based collision risk prediction model and a training strategy that enables training on simulated video data while generalizing successfully to real data. In doing so, we address the data scarcity problem of collecting and labeling real (near-)collisions, which are exceptionally rare events. Generalization from the simulated to the real domain is taken into account by design: we decouple the learning strategy and use task-specific, domain-resilient intermediate representations. Specifically, we use optical flow and vehicle bounding boxes, since they are intrinsically related to the task of collision risk prediction and their simulated-to-real domain gap is significantly smaller than that of raw camera video, i.e., they are more domain resilient. To demonstrate our approach, we present RiskNet, a novel neural network for image-based collision risk prediction, which classifies individual frames of a front-facing camera video sequence as safe or unsafe. Additionally, we present two novel datasets: the simulated Prescan dataset (which we intend to make publicly available) for training and the YouTube Driving Incidents Database (YDID) for real-world testing. Trained solely on simulated data and tested on the real-world YDID, RiskNet performs comparably to a human driver, both in accuracy (91.8% vs. 93.6%) and F1-score (0.92 vs. 0.94).
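To make the pipeline concrete, below is a minimal PyTorch sketch of a frame-level classifier that consumes the two domain-resilient intermediate representations named above (dense optical flow and vehicle bounding boxes) rather than raw camera frames. The class name `RiskNetSketch`, the layer sizes, and the `max_boxes` padding scheme are illustrative assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn


class RiskNetSketch(nn.Module):
    """Illustrative frame-level collision risk classifier.

    Encodes a dense optical-flow field and a fixed-size list of
    vehicle bounding boxes, then fuses both into safe/unsafe logits.
    Layer sizes are placeholders, not the published architecture.
    """

    def __init__(self, max_boxes: int = 8):
        super().__init__()
        self.max_boxes = max_boxes
        # Small CNN over the 2-channel (u, v) optical-flow field.
        self.flow_encoder = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 32)
        )
        # MLP over zero-padded boxes, each given as (x1, y1, x2, y2).
        self.box_encoder = nn.Sequential(
            nn.Linear(max_boxes * 4, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # Fused head: binary safe/unsafe logits per frame.
        self.head = nn.Linear(32 + 32, 2)

    def forward(self, flow: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # flow:  (B, 2, H, W) dense optical flow between consecutive frames
        # boxes: (B, max_boxes, 4) normalized coordinates, zero-padded
        f = self.flow_encoder(flow)
        b = self.box_encoder(boxes.flatten(start_dim=1))
        return self.head(torch.cat([f, b], dim=1))


if __name__ == "__main__":
    model = RiskNetSketch()
    flow = torch.randn(4, 2, 128, 256)   # flow from a simulated or real clip
    boxes = torch.rand(4, 8, 4)          # detected vehicle boxes, zero-padded
    logits = model(flow, boxes)          # (4, 2): safe vs. unsafe logits
    print(logits.shape)
```

Because both inputs can be produced by off-the-shelf flow estimators and vehicle detectors in either domain, a classifier of this shape can, in principle, be trained entirely on simulated clips and evaluated on real footage, which is the decoupling the training strategy relies on.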
@inproceedings{schoonbeek2022learning,
  title        = {Learning to predict collision risk from simulated video data},
  author       = {Schoonbeek, Tim J and Piva, Fabrizio J and Abdolhay, Hamid R and Dubbelman, Gijs},
  booktitle    = {2022 IEEE Intelligent Vehicles Symposium (IV)},
  pages        = {943--951},
  year         = {2022},
  organization = {IEEE}
}