Dense Disparity Map Explained

Mirella Melo
Mar 17, 2024

In certain applications, full scene-depth information is essential, such as in the detailed reconstruction of environments for realistic simulations, in elevation maps for urban planning, in agriculture for optimizing resource use or detailed monitoring of crops, and in virtual reality environments to provide total immersion, among others.

We can use a stereo camera to achieve a three-dimensional reconstruction of a scene. To do so, we must obtain a so-called dense disparity map, which provides a disparity value, and therefore a depth estimate, for each pixel of the image. As summarized in Table 1, this calculation can be performed using local, global, or semi-global methods.

Local methods evaluate the similarity between pixels using information from the pixel itself or from its surroundings. They are usually faster but more susceptible to noisy and inaccurate results. Global methods, on the other hand, assign disparity values considering information from the entire image. Their results are more accurate, but they rely on several iterations, which increases their computational complexity and running time, ranging from seconds to minutes [1]. The two approaches can be combined to form semi-global methods, which include both a local and a global step. The table below summarizes the main characteristics mentioned above.

Method	Description	Benefits	Disadvantages
Local	They evaluate the similarity between pixels using point information or information from the pixel's surroundings.	Fast and efficient in terms of processing.	More susceptible to noise and inaccuracies, especially in areas of uniform texture or edges.
Global	They consider information from the entire image to assign disparity values.	More accurate results are achieved by considering the scene as a whole.	High computational complexity and longer processing time.
Semi-Global	They combine aspects of local and global methods.	Good balance between accuracy and efficiency.	Intermediate complexity. It may not be as fast as local methods.

Table 1: Advantages and disadvantages of each method. Source: Author.
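As a side note on how a disparity map relates to depth: for a rectified stereo pair, depth is inversely proportional to disparity, Z = f · B / d, where f is the focal length in pixels and B is the baseline between the two cameras. The sketch below assumes this standard relation and uses Python with NumPy; the function name `disparity_to_depth` and the numeric values are hypothetical and only serve as an illustration.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_m):
    """Convert disparities (pixels) to depths (meters) with Z = f * B / d."""
    disp = np.asarray(disp, dtype=np.float64)
    # Pixels with zero disparity have no finite depth; mark them as infinitely far.
    depth = np.full(disp.shape, np.inf)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth

# Hypothetical camera: 800 px focal length, 12.5 cm baseline.
depth = disparity_to_depth([[64.0, 32.0, 0.0]], focal_px=800.0, baseline_m=0.125)
print(depth)
```

Halving the disparity doubles the estimated depth, which is why small disparity errors matter most for distant points.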

Stereo matching steps

To better understand the stereo-matching process, we will use the taxonomy proposed by Scharstein and Szeliski [2], which classifies the steps of any stereo-matching algorithm, whether local, global, or semi-global. It is important to emphasize that the steps are not a mandatory part of all proposed techniques. They are:

Calculation of correspondence costs;
Aggregation of costs;
Calculation of disparity;
Disparity improvement.

For a better understanding, a technique for obtaining disparity maps, the Sum of Absolute Differences (SAD), will be used to illustrate each step.

1. Calculation of Correspondence Costs

Cost indicates how similar or different two pixels are. Its values come from the so-called cost function; for a dissimilarity function such as the one used here, the lower the value, the greater the similarity. The SAD technique uses pixel intensity as its similarity criterion (represented in grayscale in Figure 1). Equation 1 gives the cost function known as the absolute difference (AD), where 𝐼 represents the intensity value of the pixel in question. This equation, in conjunction with Figure 1, shows that a given pixel 𝑝(𝑖, 𝑗) in the base image 𝐵(𝑖, 𝑗) is compared with 𝑁𝑑 pixels in the corresponding image 𝑀(𝑖, 𝑗), where 𝑁𝑑 is the disparity search range, which in this case is equal to five. Repeating this process for every pixel of 𝐵(𝑖, 𝑗) generates a three-dimensional cost matrix 𝐶(𝑖, 𝑗, 𝑑) whose dimensions are the image height, the image width, and the disparity range 𝑁𝑑.

Figure 1: Pair of left and right images; The three-dimensional cost matrix is filled from the adopted cost function and the comparison of 𝑁𝑑 pixels. Source: author [3]

$$C(i, j, d) = \left| I_B(i, j) - I_M(i, j - d) \right|, \quad d = 0, 1, \dots, N_d - 1 \tag{1}$$
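The construction of the cost matrix described above can be sketched in Python with NumPy. This is a minimal illustration rather than the author's implementation: the function name `ad_cost_volume`, the zero-based disparity index, and the infinite cost assigned to columns that have no counterpart in the corresponding image are all assumptions of this example.

```python
import numpy as np

def ad_cost_volume(base, match, num_disp):
    """Absolute-difference cost: C(i, j, d) = |I_B(i, j) - I_M(i, j - d)|."""
    base = base.astype(np.float32)
    match = match.astype(np.float32)
    h, w = base.shape
    # Assumption: where j - d falls outside the image, the cost is set to infinity.
    cost = np.full((h, w, num_disp), np.inf, dtype=np.float32)
    for d in range(num_disp):
        # Compare base column j with corresponding-image column j - d.
        cost[:, d:, d] = np.abs(base[:, d:] - match[:, :w - d])
    return cost

# Toy pair: the corresponding image is the base image shifted 2 pixels to the left.
base = np.tile(np.arange(8, dtype=np.float32) ** 2, (4, 1))
match = np.roll(base, -2, axis=1)
cost = ad_cost_volume(base, match, num_disp=5)
print(cost.shape)
```

In this toy pair the true disparity is 2 everywhere, so the layer `cost[:, 2:, 2]` contains only zeros.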

2. Cost Aggregation

After obtaining the initial cost matrix 𝐶(𝑖, 𝑗, 𝑑), the aggregation step defines how these costs are interpreted and transformed into new data. Figure 2 illustrates how the SAD technique performs this process, showing the calculation for a single iteration. Aggregation is done by summing the costs within fixed-size windows (exemplified by the squares with a red border), repeating the process in each layer 𝑑 of the matrix 𝐶. Repeating this for every cost value in 𝐶(𝑖, 𝑗, 𝑑) forms a new matrix, the cost aggregation matrix 𝐴(𝑖, 𝑗, 𝑑).

Figure 2: Illustration of how the aggregation matrix is formed considering the SAD technique. Source: author [3].
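The window-based aggregation described above can be sketched as follows, in Python with NumPy. The function name `aggregate_sad`, the odd window size, and the edge padding used at the image borders are assumptions of this example; the article does not specify how borders are handled.

```python
import numpy as np

def aggregate_sad(cost, window=3):
    """Sum costs over a window x window neighborhood in every disparity layer."""
    r = window // 2  # assumes an odd window size
    h, w, _ = cost.shape
    # Assumption: replicate border values so edge pixels also get a full window.
    padded = np.pad(cost, ((r, r), (r, r), (0, 0)), mode="edge")
    agg = np.zeros_like(cost)
    for di in range(window):
        for dj in range(window):
            # Add the cost layer shifted by (di - r, dj - r).
            agg += padded[di:di + h, dj:dj + w, :]
    return agg

# Toy cost matrix: all ones, with one high-cost entry.
cost = np.ones((4, 6, 5), dtype=np.float32)
cost[2, 3, 1] = 10.0
agg = aggregate_sad(cost, window=3)
print(agg[2, 3, 1], agg[0, 0, 0])
```

Each entry of `agg` is the sum of nine entries of `cost` taken from the same disparity layer, so a single outlier is spread over its neighbors instead of dominating one pixel.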

It is important to highlight that global algorithms usually skip this step, replacing it with a global cost minimization procedure that takes the entire disparity map into account.

3. Disparity Calculation

In local strategies, the disparity calculation step consists of finding the disparity associated with the minimum value (or maximum, in the case of a similarity function) of the vector 𝐴(𝑖, 𝑗, 𝑘) for 𝑘 = 0, 1, ..., 𝑁𝑑 − 1. Only the disparity value 𝑘 is varied, as represented in Equation 2, where 𝐷(𝑖, 𝑗) denotes the resulting disparity map.

$$D(i, j) = \arg\min_{k} A(i, j, k), \quad k = 0, 1, \dots, N_d - 1 \tag{2}$$

Figure 3 provides a visual representation of this step. Note that the cost matrix involved depends on the technique adopted and may be the initial* or the aggregated* matrix. For the SAD technique, the illustration refers to the cost aggregation matrix 𝐴(𝑖, 𝑗, 𝑑). As illustrated in Figure 3, the values of 𝐴 along the disparity axis are compared for each pixel, and the 𝑑 with the lowest aggregated cost (the one indicating the highest similarity) is selected as the disparity value for that pixel (𝑖, 𝑗). This selection is repeated across the entire aggregation matrix to generate a 2D grayscale disparity map. In this map, pixels with higher disparity values are rendered darker, visually representing the depth of each point in the scene.

*The initial cost matrix is computed by evaluating the matching cost for each potential disparity at each pixel. In contrast, the aggregated cost matrix results from combining these initial costs over a neighborhood or path to enforce smoothness and improve the accuracy of disparity estimation.

Figure 3: Illustration of how the disparity map is formed from the initial or aggregated cost matrix and a similarity function. Source: author [3].
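The per-pixel selection of the lowest-cost disparity can be sketched in a few lines of Python with NumPy. The function name `select_disparity` and the zero-based disparity index are assumptions of this example.

```python
import numpy as np

def select_disparity(agg):
    """For each pixel (i, j), return the index k that minimizes A(i, j, k)."""
    return np.argmin(agg, axis=2)

# Toy aggregation matrix with shape (1, 2, 3): one row, two pixels, three disparities.
agg = np.array([[[5.0, 1.0, 7.0],
                 [0.5, 2.0, 3.0]]])
disparity = select_disparity(agg)
print(disparity)
```

For a similarity function, where higher values mean better matches, `np.argmax` would take the place of `np.argmin`.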

For global strategies, several iterations are performed to minimize a global cost function that takes the entire image into account. This function has a data energy term and a smoothness term, as shown in Equation 3, where 𝜆 is an input parameter that balances the influence of the two terms and defines the degree of smoothing of the map: the higher its value, the smoother the result. The minimization is iterated until the lowest value of the global energy function is found.

$$E(d) = E_{data}(d) + \lambda \, E_{smoothness}(d) \tag{3}$$

The data energy term 𝐸data(𝑑), presented in Equation 4, measures how well the assigned disparities agree with the image data. The lower its value, the better. 𝐼𝐵 denotes the set of all pixels in the base image, and 𝐶(𝑝,𝑝') is the cost function that relates a pixel 𝑝(𝑖, 𝑗) of the base image to the pixel 𝑝'(𝑖, 𝑗 − 𝑑), with 𝑑 = 0, 1, ..., 𝑁𝑑 − 1, of the corresponding image. This cost could be computed, for example, with the AD function, which takes the absolute difference in intensity values.

$$E_{data}(d) = \sum_{p \in I_B} C(p, p') \tag{4}$$

The smoothness energy term assumes that surfaces do not present abrupt discontinuities, so 𝐸smoothness(𝑑) penalizes abrupt disparity changes between neighboring pixels, designated by 𝑝 and 𝑞 in Equation 5; 𝑁 denotes the set of all pairs of neighboring pixels in the base image, and 𝑑𝑝 and 𝑑𝑞 are the disparities assigned to 𝑝 and 𝑞. The function 𝑠() assigns a penalty if the disparity of 𝑝 differs (by a lot or by a little) from that of 𝑞, and each penalty increases the smoothness energy.

$$E_{smoothness}(d) = \sum_{(p, q) \in N} s(d_p, d_q) \tag{5}$$
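To make the interplay of the two terms concrete, the sketch below evaluates a global energy of the form data term plus 𝜆 times smoothness term for a given disparity map, in Python with NumPy. Several choices here are assumptions and not taken from the article: the function name `global_energy`, a 4-connected neighborhood for 𝑁, and the penalty 𝑠 taken as the absolute disparity difference between neighbors. It only evaluates the energy; it does not perform the minimization.

```python
import numpy as np

def global_energy(cost, disp, lam):
    """Evaluate E = E_data + lam * E_smoothness for an integer disparity map."""
    h, w, _ = cost.shape
    rows, cols = np.indices((h, w))
    # Data term: sum of the matching costs at the chosen disparities.
    e_data = cost[rows, cols, disp].sum()
    # Smoothness term (assumed penalty): |d_p - d_q| over vertical and horizontal neighbors.
    e_smooth = np.abs(np.diff(disp, axis=0)).sum() + np.abs(np.diff(disp, axis=1)).sum()
    return float(e_data + lam * e_smooth)

# Toy 2x2 image with two candidate disparities; pixel (0, 0) prefers d = 1, the rest d = 0.
cost = np.zeros((2, 2, 2))
cost[:, :, 1] = 1.0
cost[0, 0, 0] = 1.0
cost[0, 0, 1] = 0.0

noisy = np.array([[1, 0], [0, 0]])  # per-pixel minimum of the cost
flat = np.zeros((2, 2), dtype=int)  # perfectly smooth map

for lam in (0.0, 1.0):
    print(lam, global_energy(cost, noisy, lam), global_energy(cost, flat, lam))
```

With `lam = 0.0` the per-pixel minimum has the lower energy, while with `lam = 1.0` the smooth map wins, which mirrors the trade-off controlled by 𝜆.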

The influence of the data and smoothness energies can be seen in Figure 4, where the disparity maps result from different settings of 𝜆 in Equation 3. Note that setting 𝜆 = 0, that is, disregarding the smoothness term, reduces the global method to a purely local one.

Figure 4: Result of the disparity map for increasing values of the term 𝜆 in Equation 3. Source: [4].

4. Improving Disparity

The disparity refinement step is adopted as a way to improve the preliminary result. Most refinement methods follow the scheme of detecting possible flaws and filling them, followed by a filtering step [5]. There is a wide range of works that contemplate and explore this stage, as exemplified in [6].
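The detect, fill, and filter scheme can be sketched as follows, in Python with NumPy. This is only one simple possibility, not the method of [5] or [6]: it assumes that flawed pixels have already been flagged in a boolean mask (here, hypothetically, pixels with value 0), fills each of them with the nearest valid value to its left in the same row, and then applies a 3×3 median filter. The function names are hypothetical.

```python
import numpy as np

def fill_invalid(disp, invalid):
    """Fill flagged pixels with the nearest valid disparity to their left."""
    filled = disp.astype(np.float32).copy()
    h, w = filled.shape
    for i in range(h):
        last = None
        for j in range(w):
            if invalid[i, j]:
                # Pixels with no valid neighbor to the left are left unchanged.
                if last is not None:
                    filled[i, j] = last
            else:
                last = filled[i, j]
    return filled

def median3x3(img):
    """3x3 median filter with replicated borders."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    shifted = [padded[di:di + h, dj:dj + w] for di in range(3) for dj in range(3)]
    return np.median(np.stack(shifted, axis=0), axis=0)

# Toy map: constant disparity 4 with one hole (value 0) and one isolated spike.
disp = np.full((4, 5), 4.0, dtype=np.float32)
disp[1, 1] = 0.0   # hole, assumed to be flagged as invalid
disp[2, 3] = 9.0   # isolated outlier that was not flagged
filled = fill_invalid(disp, invalid=(disp == 0))
refined = median3x3(filled)
print(refined)
```

In this toy case the fill step closes the hole and the median filter removes the isolated spike, leaving a constant map.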

Challenges

Failures are caused by situations that make it difficult to associate pixels correctly. Challenges include:

Radiometric differences: differences in lighting between frames can cause issues. This stems from subtle variations between cameras of the same model, affecting everything from the lens to the light-sensitive sensors. The work [7] provides a detailed assessment of how this factor can impact the results.
Low-texture regions: areas with little texture create ambiguity in correspondence.
Repeated structures: the repetition of a pattern within the scene can lead to incorrect matches due to similar appearances in multiple locations.
Translucent or transparent objects: these materials can confuse depth estimation because light passes through them, so a single pixel may mix contributions from surfaces at different depths.
Occlusion regions: parts of the scene that are visible in one image but not in the other can lead to incomplete data, complicating the depth mapping process.
Abrupt and deep discontinuities.

Figure 5 shows the disparity map generated by the local SAD technique with a 3×3 window size. The noisy results caused by low texture, abrupt discontinuities, and pattern repetition are clearly visible.

Figure 5: Left image followed by the disparity map resulting from the SAD technique with a 3×3 window size. Difficulties encountered, such as low texture, are highlighted on the map. Source: [4].

References

[1] Nalpantidis, Lazaros, Georgios Christou Sirakoulis, and Antonios Gasteratos. "Review of stereo vision algorithms: from software to hardware." International Journal of Optomechatronics 2.4 (2008): 435-462.

[2] Scharstein, Daniel, and Richard Szeliski. "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms." International journal of computer vision 47 (2002): 7-42.

[3] MELO, Mirella Santos Pessoa de. Navigable region mapping from a SLAM system and image segmentation. MS thesis. Federal University of Pernambuco, 2021.

[4] Bleyer, Michael, and Christian Breiteneder. "Stereo matching—State-of-the-art and research challenges." Advanced topics in computer vision (2013): 143-179.

[5] Yan, Tingman, et al. "Segment-based disparity refinement with occlusion handling for stereo matching." IEEE Transactions on Image Processing 28.8 (2019): 3885-3897.

[6] Hamzah, Rostam Affendi, and Haidi Ibrahim. "Literature survey on stereo vision disparity map algorithms." Journal of Sensors 2016.1 (2016): 8742920.

[7] Hirschmuller, Heiko, and Daniel Scharstein. "Evaluation of stereo matching costs on images with radiometric differences." IEEE transactions on pattern analysis and machine intelligence 31.9 (2008): 1582-1599.