Priyansh Sharma
5 min read · Jan 26, 2023


How does Google Maps actually build a 3D street view environment?

Most of us have used the Google Maps app at some point while traveling or navigating. To me, the famous Street View feature of Google Maps is one of its most exciting parts.

A few years back, Google started an initiative to build a virtual environment through which people could visualize and explore the world from anywhere, at any time.
To make this a reality, the proposed idea was to combine multiple 2D images of a place (a 360° view) into a 3D object/environment. All of Street View's content comes from two sources: Google and contributors.

Street View-Google Maps

Even an ordinary person can contribute to the program by submitting multiple images of a place. By stitching together billions of such panoramic images, users can view a full 3D virtual environment of almost any place on the globe.

Sitting at home, dreaming ☁️️️☁️☁️…, I can virtually visit and enjoy the beautiful Eiffel Tower in Paris (no time, no money, no travel needed) or anywhere else.

You can visit it too: https://www.google.com/streetview/Eiffel_Tower

Eiffel Tower, Paris

That's quite exciting, isn't it?
But let's now shift our focus to how so many complex 2D images are actually combined in a sequential order to form a coherent 3D environment. What is the underlying concept, and how does the algorithm work at a basic level to build a street view?

To understand this, let's first break it down into a simpler problem and see how a simple 3D object like a cylinder, cone, or sphere can be reconstructed from 2D static images of it.

Let's consider a solid cylinder, of which we have taken three 2D photos: the front view, the side view, and the top view.

cylinder views

Using these three view photos, we want to reconstruct our solid cylinder.

In the process of 3D reconstruction, we will follow these steps:
- Contour detection
- Shape Detection of the contour
- Pixel Ratio detection
- Combining these multiple views using Rotation and Translation Algorithms

Contour detection is simply finding the boundary of the given shape.
For the cylinder above, the front view has a rectangle as its boundary, so detecting that rectangle using the intensity difference between the object and the background is what contour detection does here; a small sketch of this step follows the figure below.

rectangle boundary for front view of cylinder
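
To make this concrete, here is a minimal contour-detection sketch in Python using OpenCV (an assumed dependency, with a hypothetical input file name; not necessarily the project's exact code): threshold the view by intensity and keep the largest outer contour as the boundary.

```python
# Minimal contour-detection sketch (OpenCV assumed, file name hypothetical);
# not necessarily the project's exact code.
import cv2

image = cv2.imread("front_view.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Separate the object from the background using the intensity difference
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Keep the largest outer contour as the boundary of the shape in this view
boundary = max(contours, key=cv2.contourArea)
print("Boundary has", len(boundary), "points")
```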

Now that we have the contour (boundary) for a particular view, the next step is to detect the shape of that contour.

At the building-block stage, the basic shapes are :- circle, triangle, square, rectangle, pentagon, and hexagon.
These can be detected easily from the number of sides and the side lengths, as sketched below.
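
A rough sketch of such shape detection, again assuming OpenCV and using illustrative thresholds (not the project's exact code): approximate the contour with a polygon and classify it by the vertex count.

```python
# Hedged sketch (thresholds are illustrative): approximate the contour with a
# polygon via cv2.approxPolyDP and classify the shape by counting vertices.
import cv2

def detect_shape(contour):
    perimeter = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.04 * perimeter, True)
    sides = len(approx)
    if sides == 3:
        return "Triangle"
    if sides == 4:
        # Distinguish a square from a rectangle by the side-length ratio
        _, _, w, h = cv2.boundingRect(approx)
        return "Square" if abs(w - h) < 0.05 * max(w, h) else "Rectangle"
    if sides == 5:
        return "Pentagon"
    if sides == 6:
        return "Hexagon"
    return "Circle"   # many small sides approximate a circle
```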

Once both of the above steps are done for all the views, we need to find the pixel ratio in which they should be represented in the actual 3D model.

Why do we actually need to compute the pixel ratio for each 2D view image?
The answer is that the 2D static images of the different views (front/side/top) may not be in an exact 1:1:1 ratio; the ratio may be x:y:z instead. So we can't assume all pixel ratios are the same, or we would end up with a distorted 3D model. A rough sketch of how such a ratio could be measured is shown below.
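
As a hedged illustration (helper name assumed, not the project's exact code), the ratio can be estimated from the pixel bounding boxes of the already-detected view contours: the front view gives width (x) and height (z), and the side view gives depth (y).

```python
# Illustrative sketch: derive the x:y:z ratio from the pixel bounding boxes
# of the detected view contours (not the project's exact code).
import cv2

def pixel_ratio(front_contour, side_contour):
    """Return (x, y, z) relative to the width measured in the front view."""
    _, _, front_w, front_h = cv2.boundingRect(front_contour)  # width (x), height (z)
    _, _, side_w, _ = cv2.boundingRect(side_contour)          # depth (y)
    return 1.0, side_w / front_w, front_h / front_w

# e.g. a tall cylinder might give (1.0, 1.0, 2.0) instead of 1:1:1
```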

Now, in the last stage of the algorithm, we use rotation and translation to combine these views into a 3D SCAD file.

Rotation Algorithm :- Tells us how the actual 3D model is rotated about the vertical axis.

Rotation

Translation Algorithm :- For complex objects, the axes of the multiple child 3D objects don't lie on the same axis, so we need the translation algorithm to shift each child object onto its translated axis before the children are combined into the 3D model. A sketch of how both operations can be written out as SCAD syntax follows the figure below.

Translation
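
As a rough sketch of how these two operations might be expressed when assembling the SCAD output (function names and the example cylinder are assumptions, not taken from the project), rotation and translation can simply wrap a child's SCAD string:

```python
# Illustrative sketch (function names and the example cylinder are assumptions,
# not the project's code): rotation and translation wrap a child's SCAD string.
def rotate(node, angles):
    """Rotate a child node by (x, y, z) degrees about the model's axes."""
    return f"rotate([{angles[0]}, {angles[1]}, {angles[2]}]) {{ {node} }}"

def translate(node, offset):
    """Shift a child node so its local axis lines up with the parent's axis."""
    return f"translate([{offset[0]}, {offset[1]}, {offset[2]}]) {{ {node} }}"

# Example: orient and position the reconstructed cylinder
cylinder = "cylinder(h=40, r=10);"
model = translate(rotate(cylinder, (90, 0, 0)), (0, 0, -20))
print(model)
# translate([0, 0, -20]) { rotate([90, 0, 0]) { cylinder(h=40, r=10); } }
```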

In the end, we get a 3D model (a SCAD file) which can be used in SCAD software or represented as a point cloud, as sketched after the figures below.

scad file
point cloud representation
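
For illustration only (the radius and height values here are hypothetical, not taken from the project), a model reconstructed as a cylinder of known size can be sampled into a point cloud like this:

```python
# Illustrative sketch (radius/height values are hypothetical): sample points on
# the curved surface of a reconstructed cylinder to get a point cloud.
import numpy as np

def cylinder_point_cloud(r=10.0, h=40.0, n=2000):
    theta = np.random.uniform(0, 2 * np.pi, n)    # angle around the axis
    z = np.random.uniform(0, h, n)                # height along the axis
    x, y = r * np.cos(theta), r * np.sin(theta)   # points on the curved surface
    return np.column_stack((x, y, z))

points = cylinder_point_cloud()
print(points.shape)   # (2000, 3); can be exported for any point-cloud viewer
```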

But in real-world scenarios, actual objects are not as simple as a solid cylinder; they are complex objects formed by combining these simple 3D objects, so we can apply a parent-child concept to form the 3D model.

A tree is created for each parent-child relationship. Here, each shape-node class object is a node in the binary tree, and each shape node is a child of an empty-node class object. The empty-node class only holds the string of SCAD syntax, including the operation between its two children; a small sketch of this idea appears below the figure.

Complex Objects
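
A hedged sketch of such a tree (class names are assumptions, not copied from the project): shape nodes are leaves holding primitive SCAD syntax, and empty nodes hold the operation that combines their two children.

```python
# Hedged sketch of the tree described above (class names are assumptions):
# shape nodes are leaves holding primitive SCAD syntax; empty nodes hold the
# operation that combines their two children.
class ShapeNode:
    def __init__(self, scad):
        self.scad = scad                 # e.g. "cylinder(h=40, r=10);"

    def to_scad(self):
        return self.scad

class EmptyNode:
    def __init__(self, operation, left, right):
        self.operation = operation       # e.g. "union", "difference"
        self.left, self.right = left, right

    def to_scad(self):
        return (f"{self.operation}() {{ "
                f"{self.left.to_scad()} {self.right.to_scad()} }}")

# A complex object built from two simple children
body = ShapeNode("cylinder(h=40, r=10);")
cap = ShapeNode("translate([0, 0, 40]) sphere(r=10);")
print(EmptyNode("union", body, cap).to_scad())
# union() { cylinder(h=40, r=10); translate([0, 0, 40]) sphere(r=10); }
```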

So this is a project where I tried to imitate the idea behind Google Maps Street View at a basic building-block level, taking in only three views and constructing a 3D SCAD file as output.

You can find more details (code, explanation, implementation video) about my project here: https://github.com/Priyansh-15/3-D-Image-Build-from-Multiple-Images

Google Maps Street View link: https://www.google.com/streetview/
