Project 4A

Shoot and Digitize Pictures

The pictures to merge are:

kitchen1 / kitchen2
Kitchen (left) / Kitchen (right)
sofa1 / sofa2
Living room (left) / Living room (right)
desk1 / desk2
Desk with tarot cards (left) / Desk with tarot cards (right)

Pictures to rectify are:

computer
Laptop (16:9)
TV
Television (16:10)

Recover Homographies

In this part, we want to find the matrix $H$ such that

$$H\begin{bmatrix}x\\y\\1\end{bmatrix}=\begin{bmatrix}a&b&c\\d&e&f\\g&h&1\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix}=\begin{bmatrix}wx'\\wy'\\w\end{bmatrix},$$

where $(x, y)$ is the point coordinate in the source image and $(x', y')$ is the point coordinate in the target image.

Expanding the equation, we have:

$$\begin{cases}ax+by+c=wx'\\dx+ey+f=wy'\\gx+hy+1=w\end{cases}$$

then substitute $w = gx+hy+1$ and rearrange:

$$\begin{cases}ax+by+c=(gx+hy+1)x'\\dx+ey+f=(gx+hy+1)y'\end{cases}
\quad\Longrightarrow\quad
\begin{cases}ax+by+c-gxx'-hyx'=x'\\dx+ey+f-gxy'-hyy'=y'\end{cases}$$

Now we can estimate the matrix $H$ by least squares: each corresponding point pair contributes two such equations, so at least 4 pairs are needed to solve for the 8 unknowns.
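The least-squares setup above can be sketched as follows (a minimal NumPy sketch; the function name `compute_homography` is illustrative, not necessarily the project's):

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate H (with H[2, 2] = 1) from n >= 4 point pairs via least squares.

    src, dst: (n, 2) arrays of (x, y) coordinates in the source and target image.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        # ax + by + c - g*x*x' - h*y*x' = x'
        A[2 * i] = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        b[2 * i] = xp
        # dx + ey + f - g*x*y' - h*y*y' = y'
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i + 1] = yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly 4 pairs this solves the system exactly; with more, it minimizes the squared error over all equations.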

Warp the Images

After computing the matrix H, I used inverse warping with bilinear interpolation to project each image onto the target projection plane.

The coordinates after inverse warping must be translated to fit in a new canvas. Therefore, I find the minimal x and y values, translate the whole image by those amounts, and determine the new canvas size from (xmax, ymax).
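The inverse warp with the canvas translation can be sketched like this (a NumPy sketch under the assumption that images are arrays; the `offset` argument implements the translation described above):

```python
import numpy as np

def warp_image(img, H, out_shape, offset=(0, 0)):
    """Inverse-warp img by homography H onto a canvas of out_shape = (H, W).

    offset (ox, oy): translation added to canvas coordinates so that
    negative warped coordinates fit on the canvas.
    """
    Hh, Ww = out_shape
    ox, oy = offset
    # target pixel grid, shifted back by the canvas offset
    xs, ys = np.meshgrid(np.arange(Ww), np.arange(Hh))
    pts = np.stack([xs.ravel() + ox, ys.ravel() + oy, np.ones(xs.size)])
    # map target coordinates back to source coordinates with H^-1
    src = np.linalg.inv(H) @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]
    h, w = img.shape[:2]
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    dx, dy = sx - x0, sy - y0
    # bilinear interpolation of the four neighbouring pixels
    val = (img[y0, x0].T * (1 - dx) * (1 - dy)
           + img[y0, x0 + 1].T * dx * (1 - dy)
           + img[y0 + 1, x0].T * (1 - dx) * dy
           + img[y0 + 1, x0 + 1].T * dx * dy).T
    # pixels that map outside the source image stay black
    inside = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    val[~inside] = 0
    return val.reshape(Hh, Ww, *img.shape[2:])
```

The transposes let the same code handle grayscale (H, W) and color (H, W, 3) arrays.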

We are able to rectify the images now:

TV / TV_rec
Television / Rectified television
computer / computer_rec
Laptop / Rectified laptop

Blend Images into a Mosaic

Finally, we can build a panorama by warping corresponding points to the same coordinates.

I first tried naive averaging in the intersecting part of two images:

full_kitchen_mean
full_sofa_mean
full_desk_mean

Naive averaging did not give satisfactory results: visible seam lines appear in the blended images where the two exposures meet.

Therefore, I used weighted averaging as my second approach. The weights in the intersecting region are determined by a linear function: the leftmost column of the overlap has a weight of 0 and the rightmost column has a weight of 1. The weighting masks for the second image are shown below:

lmask / rmask
Left mask / Right mask

The subtle noise on the right mask results from the way I implemented it: I construct the mask with mask = np.any(image, axis=2), so pure-black pixels in the image produce black pixels in the mask. However, this has no negative effect on the blending, since those pixels are black and need no blending anyway.
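The linear-ramp blend can be sketched as follows (assuming both warped images already live on the same canvas, zero outside their footprints; using a single column-wise ramp across the overlap is a simplification of the masks shown above):

```python
import numpy as np

def blend_linear(left, right):
    """Blend two aligned (H, W, C) canvases with a linear ramp over the overlap.

    Pixels outside each image's warped footprint are assumed to be zero (black).
    """
    lmask = np.any(left, axis=2)
    rmask = np.any(right, axis=2)
    overlap = lmask & rmask
    alpha = np.zeros(left.shape[:2])      # weight of the *right* image
    alpha[rmask & ~lmask] = 1.0           # right-only region: take right image
    cols = np.where(overlap.any(axis=0))[0]
    if cols.size:
        # ramp from 0 at the left edge of the overlap to 1 at its right edge
        ramp = np.linspace(0.0, 1.0, cols.size)
        alpha[:, cols] = np.where(overlap[:, cols], ramp, alpha[:, cols])
    return (1 - alpha)[..., None] * left + alpha[..., None] * right
```

Left-only pixels keep weight 0 (left image), right-only pixels weight 1, and the overlap fades smoothly between them, which removes the hard seam of naive averaging.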

The results are:

full_kitchen
full_sofa
full_desk

The quality of these panoramas is much better than that of the naively averaged ones.

Project 4B

In part B, the objective is to create an automatic stitching algorithm. It consists of several steps:

  1. Detect corner features using Harris corner detector

  2. Reduce feature points with Adaptive Non-Maximal Suppression (ANMS)

  3. Extract feature descriptors from each feature point

  4. Match feature descriptors

  5. Use RANSAC to compute a homography

Harris Corners

I first applied the Harris corner detector to find feature points that look like corners. This results in a very large number of feature points per image.

auto_sofa_harris_l / auto_sofa_harris_r
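The detector can be sketched in pure NumPy (an illustrative sketch, not the exact implementation used in the project; the corner constant k = 0.05, the smoothing sigma, and the thresholds are assumptions):

```python
import numpy as np

def harris_response(im, sigma=1.0, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2 for a grayscale image."""
    Iy, Ix = np.gradient(im.astype(float))      # image gradients

    def smooth(a):
        # separable Gaussian smoothing of the structure-tensor entries
        r = int(3 * sigma)
        x = np.arange(-r, r + 1)
        g = np.exp(-x**2 / (2 * sigma**2))
        g /= g.sum()
        a = np.apply_along_axis(lambda v: np.convolve(v, g, mode='same'), 0, a)
        return np.apply_along_axis(lambda v: np.convolve(v, g, mode='same'), 1, a)

    Sxx, Syy, Sxy = smooth(Ix * Ix), smooth(Iy * Iy), smooth(Ix * Iy)
    det = Sxx * Syy - Sxy**2
    trace = Sxx + Syy
    return det - k * trace**2

def harris_corners(im, thresh_rel=0.1):
    """Return an (N, 2) array of (row, col) positions of local response maxima."""
    R = harris_response(im)
    # 3x3 local maxima above a relative threshold
    pad = np.pad(R, 1, mode='constant', constant_values=-np.inf)
    neigh = np.stack([pad[i:i + R.shape[0], j:j + R.shape[1]]
                      for i in range(3) for j in range(3)])
    is_max = (R >= neigh.max(axis=0)) & (R > thresh_rel * R.max())
    return np.argwhere(is_max)
```

Corners score high because both eigenvalues of the structure tensor are large there; edges give a negative response and flat regions give a near-zero one.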

Adaptive Non-Maximal Suppression

Since there are too many feature points from the previous step, I used Adaptive Non-Maximal Suppression (ANMS) to reduce their number while keeping the "useful" points that are spread uniformly across the image.

I computed the pair-wise L2 distance between feature points, and updated the suppression radius of each point using the formula given in the paper:

$$r_i = \min_j \lVert x_i - x_j \rVert, \quad \text{s.t. } f(x_i) < c_{\text{robust}} \, f(x_j), \; x_j \in \mathcal{I}$$

Then, I set the minimum suppression radius to the 200th-largest suppression radius, keeping the 200 most "useful" feature points in each image.

auto_sofa_anms_l / auto_sofa_anms_r
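The ANMS step can be sketched as follows (a vectorized NumPy sketch; the defaults n_keep = 200 and c_robust = 0.9 are the values commonly used with this formula, assumed here):

```python
import numpy as np

def anms(points, scores, n_keep=200, c_robust=0.9):
    """Adaptive Non-Maximal Suppression.

    points: (N, 2) corner coordinates; scores: (N,) corner responses f(x).
    Returns the indices of the n_keep points with the largest suppression
    radii, plus all radii.
    """
    pts = np.asarray(points, dtype=float)
    f = np.asarray(scores, dtype=float)
    # pair-wise squared L2 distances
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    # x_j suppresses x_i only if f(x_i) < c_robust * f(x_j)
    stronger = f[:, None] < c_robust * f[None, :]
    d2 = np.where(stronger, d2, np.inf)
    # r_i = distance to the nearest sufficiently stronger point
    radii = np.sqrt(d2.min(axis=1))
    order = np.argsort(-radii)        # largest suppression radius first
    return order[:n_keep], radii
```

The strongest point is never suppressed (its radius is infinite), and weak points crowded next to stronger ones get small radii, so keeping the largest radii yields points that are both strong and well spread out.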

Generate Feature Descriptors

The feature descriptor of each feature point is a 64-dimensional vector. For each feature point, I take the surrounding axis-aligned 40×40-pixel patch and downsample it (with Gaussian blur) to 8×8. Each patch is then normalized so that its mean is 0 and its standard deviation is 1, and finally flattened into a 64-dimensional feature descriptor.

It looks like:

descriptor
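One way to sketch the descriptor extraction (here simple 5×5 block averaging stands in for the Gaussian blur before subsampling; `extract_descriptor` is an illustrative name):

```python
import numpy as np

def extract_descriptor(im, r, c):
    """64-D descriptor: the 40x40 patch around (r, c), anti-aliased and
    subsampled to 8x8, then bias/gain normalized.

    Assumes the feature point is at least 20 px away from the image border.
    """
    patch = im[r - 20:r + 20, c - 20:c + 20].astype(float)
    # average 5x5 blocks: a crude low-pass filter before the 8x8 subsampling
    small = patch.reshape(8, 5, 8, 5).mean(axis=(1, 3))
    vec = small.ravel()
    # normalize to zero mean, unit standard deviation (bias/gain invariance)
    return (vec - vec.mean()) / (vec.std() + 1e-8)
```

The normalization makes the descriptor invariant to brightness and contrast changes between the two photographs.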

Match Feature Descriptors

I computed pair-wise L2 distances between feature descriptors, and only retained the descriptor pairs $(i, j)$ such that

$$d_{i,j} < D_{\text{mean}} - 1.5 \times D_{\text{std}},$$

where $d_{i,j}$ is the L2 distance between the $i$th feature descriptor of the first image and the $j$th feature descriptor of the second image, and $D_{\text{mean}}, D_{\text{std}}$ are the mean and standard deviation of the L2 distances over all descriptor pairs. This filtered out roughly 93% of the noisy pairs.

Furthermore, I used Lowe's 1-NN/2-NN ratio test to filter out more noisy pairs, keeping only the pairs with $e_{1\text{NN}}/e_{2\text{NN}} < 0.25$. The result is shown below:

auto_sofa_lowe_l / auto_sofa_lowe_r
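Both filters can be sketched together (the threshold constant 1.5 and the ratio 0.25 come from the text above; combining them in one function is my own arrangement):

```python
import numpy as np

def match_descriptors(D1, D2, std_k=1.5, lowe_ratio=0.25):
    """Match rows of D1 (first image) to rows of D2 (second image).

    A pair (i, j) is kept only if d_ij < mean(d) - std_k * std(d) and the
    1-NN/2-NN error ratio for descriptor i is below lowe_ratio.
    """
    # pair-wise L2 distances between all descriptor pairs
    d = np.linalg.norm(D1[:, None, :] - D2[None, :, :], axis=2)
    cutoff = d.mean() - std_k * d.std()
    matches = []
    for i in range(d.shape[0]):
        order = np.argsort(d[i])
        nn1, nn2 = d[i, order[0]], d[i, order[1]]
        if nn1 < cutoff and nn1 / (nn2 + 1e-12) < lowe_ratio:
            matches.append((i, int(order[0])))
    return matches
```

The ratio test exploits the fact that a correct match is much closer than the second-best candidate, whereas an ambiguous descriptor has two similarly close neighbors.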

RANSAC

Finally, I implemented RANSAC to compute a robust homography. As described in the lecture, I repeatedly sampled 4 random pairs, computed a homography from them, counted the inliers, and used the largest inlier set to compute the final homography.
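A compact sketch of that loop (the iteration count, the inlier threshold `eps`, and the seeding are illustrative assumptions, not the project's exact settings):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography (h33 = 1) from >= 4 point pairs."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(src, dst, n_iters=1000, eps=2.0, seed=0):
    """4-point RANSAC: keep the largest inlier set, then refit H on it."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    rng = np.random.default_rng(seed)
    src_h = np.hstack([src, np.ones((n, 1))])   # homogeneous source points
    best = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, 4, replace=False)   # minimal 4-point sample
        H = fit_homography(src[idx], dst[idx])
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]       # back to inhomogeneous coords
        inliers = np.linalg.norm(proj - dst, axis=1) < eps
        if inliers.sum() > best.sum():          # keep the largest consensus set
            best = inliers
    return fit_homography(src[best], dst[best]), best
```

Because a single gross outlier can arbitrarily corrupt a least-squares fit, refitting only on the consensus set is what makes the final homography robust.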

After RANSAC, we are able to mosaic the images fully automatically.

Below are the final results:

auto_sofa_ransac / full_sofa
Auto / Manual
auto_desk_ransac / full_desk
Auto / Manual
auto_kitchen_ransac / full_kitchen
Auto / Manual

What have you learned?

I found that it is very interesting to use a random algorithm (RANSAC) to generate robust results. The power of randomness is amazing.