Evaluation of grid maps produced by a SLAM algorithm

When evaluating the performance of a SLAM algorithm, quantifying the produced map quality is one of the most important criteria. Often, the produced map is compared with (1) a ground-truth map (which can be easily obtained in simulation) or (2) with another existing map that is considered accurate (in case of real world experiment where the ground-truth is not always available ).

Basically, grid maps are images, so image similarity measurement metrics can be used in this case. In this post, we consider three different metrics: Mean Square Error (MSE), K-nearest based normalized error (NE) and Structure Similarity Index (SSIM)

Metrics

Let $K$ being the produced grid map and $I$ being the reference grid map (ground truth map or another high accuracy map).

Mean Square Error (MSE)

Calculating Mean square error between two grid maps is very simple, basically it measures the sum of all cell intensity square errors between $K(i,j)$ and $I(i,j)$:
$$MSE=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}[I(i,j)-K(i,j)]^2$$

This metric required the two images having the same size, $MSE=0$ indicates perfect similarity.
One major problem of this metric is that large distances between cell intensities do not necessarily mean the contents of the maps are dramatically different. Therefore, MSE is not very accurate in evaluation of grid maps. It is presented here for reference purpose only.

K-nearest base normalized error (NE)

This metric is based on the sum of the distance from each occupied cell of the reference map $I$ to the nearest occupied cell in the resulting map $K$ determined using the K-nearest neighbors algorithm (k=1):

$$NE=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\sqrt{(k-i)^2+(l-j)^2}[I(i,j)=0]}{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}[I(i,j)=0]}$$

$NE \geq0$, where $0$ indicates perfect similarity. $(k,l)$ is the position of the the nearest neighbor of the ground-truth occupied cell $I_{i,j}$ on the resulting map $K$: $K_{k,l}=knn_{i,j}^1(I,K)$. The metric provides a normalized error of cells distance between the ground-truth and the produced map.
One downside of this metric is that it only considers the occupied cells of the ground-truth map. This may cause false-positive measure when the occupied cells on the resulting map outnumbers the ground-truth map. Theoretically, a resulting map with all cells being occupied can be considered as zero-error (NE=0) by the metric and therefore yields a perfect match. In real world situation, a robot may visit a place many times and overlap many versions of that place on the resulting map with some error margins. Thus, these error margins can cause the false-positive measure using the metric.
Moreover, the metric is applied globally and only estimates the (global) perceived error of the map which is not always accurate to measure the similarity of local structures on the map.

Structure Similarity Index (SSIM)

The SSIM metric provides a measure of (local) structures similarity that compare local patterns of cell intensities.

$$SSIM(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x+\sigma_y+c_2)}$$

As shown in the equation, the metric compares two maps using a $N\times N$ window (or subsample/pattern) located at the $(x,y)$ position on the maps. The structure similarity index is calculated based on the mean of cell intensities in the $x$ and $y$ direction of the window, the variant of intensities in the $x$ and $y$ direction of the window, along with the covariance. The SSIM value can vary between -1 and 1, where 1 indicates perfect similarity. Unlike NE that estimate the perceived error, the SSIM attempts to model the perceived change in the structure information and therefore, is more accurate in measuring map structure similarity.

So we use $NE$ to measure the global perceived errors and $SSIM$ to measure local structure similarity of the resulting maps.

Evaluation process

A python script implementing the three metrics is available here.
For the evaluation, the map $K$ need be to aligned with the map $I$ first. The aligned map need to have similar size to the reference map. This can be done automatically (using open cv) or manually, after the alignment, the evaluation result can be obtained by executing the following command:

./metric.py path/to/ground_truth_map  path/to/aligned_map

The result will be something like this:

sum of error:  380634.0
normalize error (NE): 5.9073470528

Mean square error (MSE): 3760.15724384

Structure similarity index: (SSIM) 0.838009186185

Reference

Xuan Sang LE, Luc Fabresse, Noury Bouraqadi, & Guillaume Lozenguez. (2018). Evaluation of Out-of-the-box ROS 2D SLAMs for Autonomous Exploration of Unknown Indoor Environments. In The 11th International Conference on Intelligent Robotics and Applications. Newcastle, Australia, August 9-11, 2018

Simple (naive) document clustering using tf-idf and k-mean (Score: 0.14)
Setting up a software stack for autonomous mono robot exploration and mapping using ROS (Score: 0.11)
Adding noise to odometry data published by the Gazebo simulator (Score: 0.08)
Data visualization: global view of blog posts relationship (Score: 0.07)
AntOSDK tutorial: Developing a simple text editor (Score: 0.05)

Comments

The comment editor supports Markdown syntax. Your email is necessary to notify you of further updates on the discussion. It will be hidden from the public.