Images Of Simulated Cities Help Artificial Intelligence To Understand Real Streetscapes

Recent advances in artificial intelligence and deep learning have revolutionized many industries, and might soon help recreate your neighborhood as well. Given images of a landscape, deep-learning models can help urban landscapers visualize plans for redevelopment, improving scenery or preventing costly mistakes. To accomplish this, however, a model must be able to correctly identify and categorize each element in a given image. This step, called instance segmentation, remains challenging for machines owing to a lack of suitable training data. Although it is relatively easy to collect images of a city, generating the "ground truth," that is, the labels that tell the model whether its segmentation is correct, involves painstakingly segmenting each image, often by hand.
Now, to address this problem, researchers at Osaka University have developed a way to train these data-hungry models using computer simulation. First, a realistic 3D city model is used to generate the segmentation ground truth. Then, an image-to-image model generates photorealistic images from the ground truth images. Their article, “Development of a synthetic dataset generation method for deep learning of real urban landscapes using a 3D model of a non-existing realistic city,” was published in Advanced Engineering Informatics. The result is a dataset of realistic images similar to those of an actual city, complete with precisely generated ground-truth labels that do not require manual segmentation.
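To make the idea concrete, the sketch below shows how such a synthetic dataset could be consumed for training: each generated street-view image is paired with the segmentation label it was rendered from, so no manual annotation is needed. This is a minimal PyTorch illustration; the directory layout and file names are assumptions for this example, not the paper's released format.

```python
# Minimal sketch: loading (synthetic photo, exact segmentation label) pairs.
# The "photos"/"labels" folder layout is an assumption for illustration only.
from pathlib import Path
from PIL import Image
import torch
from torch.utils.data import Dataset
from torchvision import transforms
from torchvision.transforms import functional as TF

class SyntheticCityDataset(Dataset):
    """Pairs GAN-synthesized street views with their pixel-exact labels."""
    def __init__(self, root: str):
        self.photo_paths = sorted(Path(root, "photos").glob("*.png"))  # assumed layout
        self.label_paths = sorted(Path(root, "labels").glob("*.png"))  # assumed layout
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.photo_paths)

    def __getitem__(self, idx):
        photo = self.to_tensor(Image.open(self.photo_paths[idx]).convert("RGB"))
        # Labels come straight from the 3D city model, stored here as
        # single-channel class-ID images.
        label = TF.pil_to_tensor(Image.open(self.label_paths[idx])).long().squeeze(0)
        return photo, label
```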
“Synthetic data have been used in deep learning before,” says lead author Takuya Kikuchi. “But most landscape systems rely on 3D models of existing cities, which remain hard to build. We also simulate the city structure, but we do it in a way that still generates effective training data for models in the real world.”
After the 3D model of a realistic city is generated procedurally, segmentation images of the city are created with a game engine. Finally, a generative adversarial network, which is a neural network that uses game theory to learn how to generate realistic-looking images, is trained to convert images of shapes into images with realistic city textures. This image-to-image model creates the corresponding street-view images.
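The following is a minimal, self-contained PyTorch sketch of that image-to-image idea: a conditional GAN is trained so that a generator turns segmentation maps into photo-like images while a discriminator judges (segmentation, photo) pairs. The tiny architectures, losses, and hyperparameters here are illustrative assumptions in the style of pix2pix, not the authors' actual model.

```python
# Sketch of one training step of a conditional GAN that maps segmentation
# images to photorealistic-looking street views (simplified assumption).
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Maps a 3-channel segmentation image to a 3-channel 'photo'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class TinyDiscriminator(nn.Module):
    """Patch-style critic that scores (segmentation, photo) pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
        )
    def forward(self, seg, img):
        return self.net(torch.cat([seg, img], dim=1))

gen, disc = TinyGenerator(), TinyDiscriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
adv_loss, l1_loss = nn.BCEWithLogitsLoss(), nn.L1Loss()

# Dummy batch standing in for (segmentation map, real street-view photo) pairs.
seg = torch.rand(4, 3, 64, 64)
real = torch.rand(4, 3, 64, 64)

# Discriminator step: tell real pairs apart from generated pairs.
fake = gen(seg).detach()
d_real, d_fake = disc(seg, real), disc(seg, fake)
loss_d = adv_loss(d_real, torch.ones_like(d_real)) + \
         adv_loss(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool the discriminator while staying close to the target photo.
fake = gen(seg)
d_fake = disc(seg, fake)
loss_g = adv_loss(d_fake, torch.ones_like(d_fake)) + 100.0 * l1_loss(fake, real)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In this setup the segmentation image acts as the condition, so the generator learns a mapping from label layouts to plausible textures rather than inventing scenes from noise alone.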