Slobodan Djukanovic presents FCNN-based acoustic vehicle counting and speed estimation

On 2020-08-05 11:00 at G205, Karlovo náměstí 13, Praha 2

We address vehicle counting (VC) and vehicle speed estimation (VSE) based on the
sound that vehicles produce while passing by the microphone. Sound offers
several advantages over vision: microphones are less expensive, consume less
energy and require less storage space than cameras; they are not affected by
visual occlusions or lighting conditions; and they are easier to install and
maintain, with low wear and tear.

In our previous research (S. Djukanović, J. Matas and T. Virtanen, „Robust
Audio-Based Vehicle Counting in Low-To-Moderate Traffic Flow“, accepted for
presentation at the 2020 IEEE Intelligent Vehicles Symposium (IV), October 20-23,
2020, Las Vegas, USA), we formulated VC as a regression problem, i.e., VC was
estimated using the predicted distance between a vehicle and the microphone.
Since minima of the proposed distance function correspond to vehicles passing by
the microphone, VC was carried out via local minima detection in the predicted
distance. The distance was predicted using support vector regression (SVR)
with standard audio features combined with a high-frequency power feature,
which was shown to improve the regression accuracy in noisy environments.
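
To make the counting step concrete, the following is a minimal sketch of local
minima detection on a predicted distance signal. It is not the implementation
from the cited paper: the function name `count_vehicles`, the `threshold`
parameter, and the clipped synthetic distance used in the example are all
assumptions chosen for illustration.

```python
def count_vehicles(distance, threshold):
    """Count local minima of `distance` that lie below `threshold`.

    `distance` is a sequence of predicted distance values, one per frame.
    Flat-bottomed minima (plateaus) are counted once.
    """
    count = 0
    n = len(distance)
    i = 1
    while i < n - 1:
        if distance[i] < threshold and distance[i] < distance[i - 1]:
            # Walk across a possible plateau of equal values.
            j = i
            while j < n - 1 and distance[j + 1] == distance[i]:
                j += 1
            # A minimum requires the signal to rise again afterwards.
            if j < n - 1 and distance[j + 1] > distance[i]:
                count += 1
            i = j + 1
        else:
            i += 1
    return count


# Hypothetical predicted distance: two pass-bys (frames 20 and 60),
# clipped at 15 when no vehicle is near the microphone.
distance = [min(abs(t - 20), abs(t - 60), 15) for t in range(80)]
print(count_vehicles(distance, threshold=5))
```

Only minima below the threshold are counted, so shallow dips caused by noise
far from the microphone would be ignored under this assumption.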

In our new research, we significantly improve the VC accuracy by carrying out a
supervised coarse-fine distance regression. Both coarse and fine regressions are
performed using fully connected neural networks (FCNNs), with a newly introduced
high-frequency log-mel spectrogram as the input feature. The method is trained and
tested on a traffic-monitoring dataset containing 422 short, 20-second
one-channel sound files with a total of 1421 vehicles passing by the microphone.
The relative VC error at a traffic location not used in training is around 1%
within a very wide range of detection threshold values, significantly wider than
that of the original approach. In addition, using the FCNN-based regression
instead of the SVR-based one provides significant computational savings in
training and testing phases.
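
For readers unfamiliar with the metric, one plausible way to compute a relative
VC error over a set of sound files is sketched below. The exact definition used
in this research is not stated in the abstract, so the formula (absolute
difference of total counts divided by the true total) and the name
`relative_vc_error` are assumptions.

```python
def relative_vc_error(true_counts, predicted_counts):
    """Relative counting error over a set of files (assumed definition).

    Both arguments are sequences with one vehicle count per sound file.
    """
    total_true = sum(true_counts)
    if total_true == 0:
        raise ValueError("relative error is undefined for zero true vehicles")
    return abs(sum(predicted_counts) - total_true) / total_true


# Hypothetical counts for three files; these are not results from the dataset.
print(relative_vc_error([3, 4, 3], [3, 5, 3]))
```

Under this definition, over- and under-counts in different files can cancel, so
a per-file error would be a stricter alternative.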

In this talk, we will also present the first results of our VSE approach based on
the distance regression. We define the distance function so that it depends on
the speed of vehicles, thus allowing VSE directly from the predicted distance.
We propose two VSE methods: one estimates the speed directly from the slope of
the distance, and the other, based on machine learning, performs speed
regression using samples of the predicted distance as features. The
methods are trained and tested on a dataset collected for this research, with
207 sound files of seven different vehicles passing by the microphone at a
constant speed (ranging from 30 to 102 km/h) preset by the cruise control system.
In the initial results, both methods achieve similar accuracy: the root mean
square error of speed estimation is around 13 km/h for each.
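
The slope-based idea and the error metric can be illustrated with a short
sketch. It assumes, purely for illustration, that the distance on one side of
the pass-by is given in metres and grows linearly with time at a rate equal to
the vehicle speed; the actual distance function used in this research is not
specified here. The names `slope_speed_kmh` and `rmse` are hypothetical.

```python
import math


def slope_speed_kmh(times_s, distance_m):
    """Estimate speed from the least-squares slope of distance versus time.

    Assumes distance in metres, time in seconds, and samples taken from one
    side of the pass-by only (all before or all after the minimum).
    """
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_d = sum(distance_m) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times_s, distance_m))
    # The slope is in m/s; multiply by 3.6 to convert to km/h.
    return abs(sxy / sxx) * 3.6


def rmse(true_values, estimates):
    """Root mean square error between true and estimated values."""
    squared = [(e - t) ** 2 for t, e in zip(true_values, estimates)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical pass-by at 20 m/s, sampled every 0.1 s after the minimum.
times = [0.1 * k for k in range(10)]
distance = [20.0 * t for t in times]
print(slope_speed_kmh(times, distance))
```

The machine learning based method would instead feed samples of the predicted
distance to a regressor; that variant is not sketched here because its
configuration is not given in the abstract.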