Visual scene understanding in unconstrained everyday scenarios has been one of the main goals of Computer Vision since the field's beginnings. It is also a crucial requirement for many upcoming applications in mobile robotics and smart vehicles.
While the general goal still poses considerable challenges, significant progress has been made in the development of mobile vision systems that can perform robust object detection and tracking in busy inner-city scenes. Given this progress, it is appropriate to ask: where do we stand? Which scenarios can we already address with today's technology, and where do the remaining challenges lie? Most importantly, what next steps should we focus on in order to make further progress? To address these questions, I will show recent research results and point out some promising future directions.