What are datasets? A dataset is typically a big bunch of data, for instance, a database of written letters, digits, images of human faces, or stock market data, that scientists can use to test their algorithms. If two research groups wish to find out whose algorithm performs better at recognizing traffic signs, they run their techniques on one of these datasets and test their methods on equal footing. For instance, CamVid, which stands for Cambridge-driving Labeled Video Database, offers several hundred images depicting a variety of driving scenarios. It is meant to be used to test classification techniques: the input is an image, and the question is, for each of the pixels, which class it belongs to. Classes include roads, vegetation, vehicles, pedestrians, buildings, trees and more.
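To make the per-pixel question concrete, here is a minimal sketch of how a labeled image can be represented and how a classifier's output might be scored against it. The class list, the tiny 3x4 "image" and the predicted values are all made up for illustration; they are not taken from CamVid.

```python
# Hypothetical class list for illustration; the real CamVid label set is larger.
CLASSES = ["road", "vegetation", "vehicle", "pedestrian", "building"]

def pixel_accuracy(predicted, ground_truth):
    """Fraction of pixels whose predicted class index matches the label."""
    total = 0
    correct = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for pred, true in zip(pred_row, true_row):
            total += 1
            correct += (pred == true)
    return correct / total

# A tiny 3x4 "image": each entry is the class index of one pixel.
ground_truth = [
    [4, 4, 1, 1],
    [0, 0, 2, 1],
    [0, 0, 0, 0],
]
# Output of some hypothetical classifier; it gets two pixels wrong.
predicted = [
    [4, 4, 1, 1],
    [0, 2, 2, 1],
    [0, 0, 0, 3],
]

accuracy = pixel_accuracy(predicted, ground_truth)
print(f"pixel accuracy: {accuracy:.3f}")
```

Two groups comparing their methods "on equal footing" would run this kind of scoring on the same labeled images.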
These regions are labeled with all the different colors that you see on these images. To have a usable dataset, we have to label tens of thousands of these images, and as you may imagine, creating such labeled images requires a ton of human labor. The first guy has to accurately trace the edges of each of the individual objects seen on every image, and there should be a second guy to cross-check and make sure everything is in order.
That’s quite a chore. And we haven’t even talked about all the other problems that arise from processing footage created with handheld cameras, so this takes quite a bit of time and effort with stabilization and calibration as well. So how do we create huge and accurate datasets without investing a remarkable amount of human labor?
Well, hear out this incredible idea. What if we recorded a video of ourselves wandering about in an open-world computer game, and annotated those images? This way, we enjoy several advantages:
1. Since we have recorded continuous videos, after annotating the very first image, we will have information from the next frames, therefore if we do it well, we can propagate a labeling from one image to the next one. That’s a huge time saver.
2. In a computer game, one can stage and record animations of important but rare situations that would otherwise be extremely difficult to film. Adding rain or day and night cycles to a set of images is also trivial because we can simply ask the game engine to do this for us.
3. Not only that, but the algorithm also has some knowledge about the rendering process itself. This means that it looks at how the game communicates with the software drivers and the video card, tracks when the geometry and textures for a given type of car are being loaded or discarded, and uses this information to further help the label propagation process.
4. We don’t have any of the problems that stem from using handheld cameras. Noise, blurriness, problems with the lens, and so on are all non-issues.
With the earlier CamVid dataset, annotating one image takes around 60 minutes, while with this dataset it takes about 7 seconds.
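The label propagation idea from the first and third advantages can be illustrated with a toy sketch. This is not the authors' implementation: it simply assumes that every pixel comes tagged with an identifier for the rendering resources that produced it, and the (mesh, texture) names below are hypothetical. Once a human has labeled a resource, every pixel that shares that resource in any frame gets the same label for free.

```python
def propagate_labels(frames, annotations):
    """Label every pixel whose resource ID has already been annotated.

    frames: list of 2D grids of resource IDs (one ID per pixel).
    annotations: dict mapping resource ID -> class name.
    Pixels with an unknown resource ID are left as None for a human to label.
    """
    return [
        [[annotations.get(resource_id) for resource_id in row] for row in frame]
        for frame in frames
    ]

# Hypothetical resource IDs: (mesh, texture) pairs seen by the renderer.
ROAD = ("mesh_road", "tex_asphalt")
CAR = ("mesh_sedan", "tex_paint")
TREE = ("mesh_oak", "tex_bark")
BUS = ("mesh_bus", "tex_bus")

frame_0 = [[TREE, CAR], [ROAD, ROAD]]
frame_1 = [[CAR, TREE], [ROAD, ROAD]]   # the car moved; its resource ID did not
frame_2 = [[TREE, TREE], [ROAD, BUS]]   # a new object nobody has labeled yet

# A human annotates each resource once, on the first frame only.
annotations = {ROAD: "road", CAR: "vehicle", TREE: "vegetation"}

labeled = propagate_labels([frame_0, frame_1, frame_2], annotations)
for index, frame in enumerate(labeled):
    print(index, frame)
```

In this sketch only the bus pixel in the last frame still needs human attention, which is the intuition behind the large time savings.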
Thus, the authors have published almost 25000 high-quality images and their annotations to aid computer vision and machine learning research in the future. That’s a lot of images, but of course, the ultimate question arises: how do we know if these are really high-quality training samples? They were only taken from a computer game after all! Well, the results show that using this dataset, we can achieve an equivalent quality of learning compared to the CamVid dataset by using one-third as many images. Excellent piece of work, absolutely loving the idea of using video game footage as a surrogate for real-world data.
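To get a feel for what the per-image annotation times quoted above mean at this scale, here is a quick back-of-the-envelope calculation. It assumes exactly 25000 images and takes the "around 60 minutes" and "7 seconds" figures at face value, so the results are rough estimates only.

```python
NUM_IMAGES = 25_000                   # "almost 25000" images, rounded up
CAMVID_SECONDS_PER_IMAGE = 60 * 60    # around 60 minutes per image
GAME_SECONDS_PER_IMAGE = 7            # around 7 seconds per image

def total_hours(num_images, seconds_per_image):
    """Total annotation time in hours for a given per-image cost."""
    return num_images * seconds_per_image / 3600

camvid_hours = total_hours(NUM_IMAGES, CAMVID_SECONDS_PER_IMAGE)
game_hours = total_hours(NUM_IMAGES, GAME_SECONDS_PER_IMAGE)
speedup = CAMVID_SECONDS_PER_IMAGE / GAME_SECONDS_PER_IMAGE

print(f"CamVid-style annotation: {camvid_hours:.0f} hours")
print(f"Game-based annotation:   {game_hours:.1f} hours")
print(f"Speedup per image:       {speedup:.0f}x")
```

Under these assumptions, the difference is roughly 25000 hours of work versus about two days.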
Are you a gamer? Then you could help us with Open Source Self-Driving Car (OSSDC.org) development! OSSDC can use your help to generate datasets from lots of driving/simulator games. Follow @gtarobotics for more details.
See the scene around minute 17; that would be tricky to do in an SDC.