CS 280A HW1 Report

Problem Formulation

Given three unaligned color channels (RGB) of an image, find a good alignment for them and generate the aligned image.

Problem Example

Given an image like this:

monastery

Our goal is to place them together and generate a full-color image:

monastery_aligned

In contrast, placing them together without any optimization yields:

monastery_unaligned

Method

In this section, we will introduce each procedure of the final algorithm, including:

- Exhaustive search
- Image pyramid
- Match score metrics

Finally, we will give the full picture of the solution at the end of this section.

Exhaustive Search

Given two unaligned channels, we search over a pre-defined displacement window (in our implementation, [-18, 18] pixels in each direction), scoring each displacement with a match score metric, which we'll discuss later, and choosing the displacement with the highest score.

Note that this procedure scores the channels $(18+18+1)^2 = 1369$ times, which may be time-consuming for larger images, especially when computing the match score is non-trivial. Another shortcoming of this procedure is that the pre-defined window may not be sufficient for large images. One possible solution is to enlarge the window, but that would significantly increase the computational cost. Therefore, we use an "image pyramid" to accelerate the process.
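As a minimal sketch of this exhaustive search (assuming NumPy and a wrap-around shift via `np.roll`; the function and parameter names are illustrative, not our actual code):

```python
import numpy as np

def search_align(c1, c2, score, center=(0, 0), radius=18):
    """Exhaustively search displacements of c2 against c1 inside a window.

    The window spans [center - radius, center + radius] in each direction;
    radius=18 matches the [-18, 18] window in the text. `score` is any
    match score function where higher means more similar.
    """
    best_score, best_disp = -np.inf, center
    for dx in range(center[0] - radius, center[0] + radius + 1):
        for dy in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(c2, (dx, dy), axis=(0, 1))  # displace c2
            s = score(c1, shifted)
            if s > best_score:
                best_score, best_disp = s, (dx, dy)
    return best_disp
```

For example, with a negated sum-of-squared-differences score, this recovers a known synthetic shift of a test image.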

Image Pyramid

The image pyramid rescales the original image to multiple sizes and searches exhaustively from the smallest image up to the original. Once we find a displacement on a small image, we can apply it to the next larger image and narrow down the search window there. The process repeats until we find the displacement on the original image.

In our implementation, we rescale images by a factor of 2, and the smallest rescaled image is no larger than 100000 pixels. Formally, for an $m \times n$ image $I_{m \times n}$, the image pyramid $P$ is defined as:

$$P = \{ I_{2^{-k}m \times 2^{-k}n} \mid k = 0, 1, 2, \dots, K \}, \quad \text{where } 2^{-K}m \times 2^{-K}n = 2^{-2K}mn < 100000 \ \text{ and } \ 2^{-K+1}m \times 2^{-K+1}n = 2^{-2K+2}mn \geq 100000$$

For notational convenience, let $I_k$ denote $I_{2^{-k}m \times 2^{-k}n}$. If we find a displacement $(x_k, y_k)$ on $I_k$, then the displacement on $I_{k-1}$ should be near $(2x_k, 2y_k)$; that is, $(x_{k-1}, y_{k-1}) \approx (2x_k, 2y_k)$. Therefore, we can conduct an exhaustive search on $I_{k-1}$ with a small search window centered at $(2x_k, 2y_k)$.
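The pyramid construction under the 100000-pixel threshold can be sketched as follows (the 2x2 block average is a simplifying stand-in for a proper anti-aliased resize; names are illustrative):

```python
import numpy as np

def build_pyramid(img, max_pixels=100_000):
    """Build the pyramid P: halve the image until the smallest level
    has fewer than `max_pixels` pixels (100000 in our implementation)."""
    levels = [img]
    while levels[-1].size >= max_pixels:
        cur = levels[-1]
        h, w = cur.shape[0] // 2 * 2, cur.shape[1] // 2 * 2  # crop to even
        cur = cur[:h, :w]
        # simple 2x2 block average as a stand-in for a proper resize
        levels.append((cur[0::2, 0::2] + cur[1::2, 0::2]
                       + cur[0::2, 1::2] + cur[1::2, 1::2]) / 4.0)
    return levels  # levels[0] = I_0 (original), levels[-1] = I_K
```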

Match Score Metrics

The next question is: how do we define a "good alignment" between two channels? Since the channels were derived from the same scene, we assume that a good alignment makes the two channels "similar". In this section, we'll introduce two pixel-based metrics (L2 distance and dot product) and one edge-based metric (dot product on Canny edge maps).

L2-Distance (Euclidean Distance)

A straightforward way to measure the similarity of two channels $C_1$ and $C_2$ is to compute the Euclidean distance between them: we take the squared difference between each pair of corresponding pixels and sum them up. The match score $S_{L2}$ is defined as:

$$S_{L2}(C_1, C_2) = -\sum_{(x, y)} \left( C_1(x, y) - C_2(x, y) \right)^2$$

Note that we negate the sum so that a higher score represents a higher similarity.
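In NumPy, this score is essentially a one-liner (a sketch, not our actual code):

```python
import numpy as np

def l2_score(c1, c2):
    """Negated sum of squared per-pixel differences; higher = more similar."""
    return -np.sum((c1 - c2) ** 2)
```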

However, we didn't use the Euclidean distance in our final implementation, since it doesn't give favorable results.

Dot Product

Another method is to compute the dot product $v_1 \cdot v_2$, where $v_i$ is the flattened vector of $C_i$. The match score $S_{dot}$ is defined as:

$$S_{dot}(C_1, C_2) = v_1 \cdot v_2, \quad \text{where } v_i \text{ is the flattened vector of } C_i$$
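A NumPy sketch of this score (illustrative, not our actual code):

```python
import numpy as np

def dot_score(c1, c2):
    """Dot product of the flattened channels; rewards alignments where
    bright pixels overlap."""
    return float(c1.ravel() @ c2.ravel())
```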

Canny Edge Detection

The above pixel-based metrics assume that in a good alignment, each pair of corresponding pixels shares the same pattern: Euclidean distance assumes the values of corresponding pixels are close, while the dot product rewards alignments in which many corresponding pixel pairs both have high values. However, real-world images don't always follow this pattern. For instance, a red pixel has a high value in the red channel but low values in the other channels, so the above methods are likely to fail on an image with many red pixels.

To solve this problem, we need a feature that is consistent across channels. We assume that object edges can be observed in all channels. Canny edge detection is a popular edge detection algorithm: it starts with a Gaussian filter to remove noise, finds the intensity gradients of the image, applies thresholds to determine potential edges, and finally detects edges by hysteresis thresholding.

Applying Canny edge detection to a channel yields a binary matrix with values in {0, 1}, where 1 indicates an edge. We then use the dot product approach above to score the similarity between the two edge maps. $S_{edge}$ is defined as:

$$S_{edge}(C_1, C_2) = S_{dot}(f_{\text{Canny}}(C_1), f_{\text{Canny}}(C_2))$$
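As a dependency-free sketch of this metric, the snippet below substitutes a plain gradient-magnitude threshold for full Canny edge detection (no Gaussian smoothing, no hysteresis), so it only approximates $f_{\text{Canny}}$; the threshold value is an arbitrary choice:

```python
import numpy as np

def simple_edges(c, thresh=0.2):
    """Binary edge map from the gradient magnitude: a simplified stand-in
    for Canny edge detection (no smoothing, no hysteresis)."""
    gy, gx = np.gradient(c.astype(float))
    return (np.hypot(gx, gy) > thresh).astype(float)

def edge_score(c1, c2):
    """S_edge: dot product of the two binary edge maps."""
    return float(simple_edges(c1).ravel() @ simple_edges(c2).ravel())
```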

Method Pseudocode

Our method's pseudocode is presented as follows:

The aligning process starts with align_channels(), which calls pyramid_align() twice to align the three channels. pyramid_align() is a recursive function: its base case calls search_align() with a large search window, while its recursive case rescales the two channels to a smaller size, calls itself to obtain the displacement on the smaller channels, and then calls search_align() with a small search window centered at the doubled displacement. search_align() conducts an exhaustive search over the two given channels: it first finds the edges of both channels with the Canny edge detection algorithm, then exhaustively computes the dot product score between the two edge maps over the search window.
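The described flow can be sketched in Python as follows (a reconstruction from the description; the function internals, the generic `score` argument, and the use of blue as the reference channel are assumptions, not our actual code):

```python
import numpy as np

def search_align(c1, c2, score, center=(0, 0), radius=18):
    """Exhaustive search for the best displacement of c2 in the window."""
    best, best_disp = -np.inf, center
    for dx in range(center[0] - radius, center[0] + radius + 1):
        for dy in range(center[1] - radius, center[1] + radius + 1):
            s = score(c1, np.roll(c2, (dx, dy), axis=(0, 1)))
            if s > best:
                best, best_disp = s, (dx, dy)
    return best_disp

def downscale2(img):
    """Halve each dimension by 2x2 block averaging (a simple resize stand-in)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_align(c1, c2, score, max_pixels=100_000):
    """Recursive coarse-to-fine alignment."""
    if c1.size < max_pixels:                         # base case: large window
        return search_align(c1, c2, score, radius=18)
    x, y = pyramid_align(downscale2(c1), downscale2(c2), score, max_pixels)
    return search_align(c1, c2, score, center=(2 * x, 2 * y), radius=2)

def align_channels(r, g, b, score):
    """Align red and green to blue and stack into an RGB image."""
    rx, ry = pyramid_align(b, r, score)
    gx, gy = pyramid_align(b, g, score)
    return np.dstack([np.roll(r, (rx, ry), axis=(0, 1)),
                      np.roll(g, (gx, gy), axis=(0, 1)), b])
```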

Bells & Whistles

Automatic Cropping

We conduct automatic cropping after the aligned image is generated. Since aligned images contain borders, we use the Canny edge detection algorithm to detect them, and find their exact coordinates by approaching diagonally: starting from the top-left point (0,0), we iteratively advance to (1,1), (2,2), (3,3), and so on until we reach a border edge. A similar procedure is applied from the bottom-right point. We then crop the image according to the found top-left and bottom-right points.
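A sketch of the diagonal search, assuming a precomputed binary edge map (e.g. from Canny) as input; the name and return convention are illustrative:

```python
import numpy as np

def diagonal_crop(edge_map):
    """Walk the diagonal from (0, 0) inward until hitting a border edge,
    and likewise from the bottom-right corner; return the two crop points.

    `edge_map` is assumed binary (1 = edge pixel). This sketch stops at
    the first diagonal pixel that lies on an edge.
    """
    h, w = edge_map.shape
    top = 0
    while top < min(h, w) and not edge_map[top, top]:
        top += 1
    bot = 0
    while bot < min(h, w) and not edge_map[h - 1 - bot, w - 1 - bot]:
        bot += 1
    return (top, top), (h - 1 - bot, w - 1 - bot)
```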

Automatic White Balance

We implement the "grey world" method mentioned in the lecture slides (page 36): we compute the mean value of each channel and linearly scale the channel to force its mean to 0.5. Note that this must be done after automatic cropping, since the uncropped images contain redundant white (or black) borders.
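A sketch of the grey-world scaling (the final clip to [0, 1] is an added safeguard, not part of the report's description):

```python
import numpy as np

def grey_world(img):
    """Linearly scale each channel of an HxWx3 float image so that the
    channel mean becomes 0.5 (the 'grey world' assumption)."""
    out = np.empty_like(img, dtype=float)
    for c in range(img.shape[2]):
        mean = img[..., c].mean()
        # guard against an all-zero channel; otherwise scale by 0.5/mean
        out[..., c] = img[..., c] * (0.5 / mean) if mean > 0 else img[..., c]
    return np.clip(out, 0.0, 1.0)  # safeguard: keep values in [0, 1]
```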

Result

For each image pair below, the left image is without automatic white balance and the right image is with automatic white balance.

cathedral_combined

church_combined

emir_combined

harvesters_combined

icon_combined

lady_combined

melons_combined

monastery_combined

onion_church_combined

sculpture_combined

self_portrait_combined

three_generations_combined

tobolsk_combined

train_combined

Offset Table

| Offset | Red channel | Green channel |
| --- | --- | --- |
| emir.tif | (107, 40) | (49, 24) |
| church.tif | (58, -4) | (25, 3) |
| three_generations.tif | (114, 12) | (56, 12) |
| melons.tif | (176, 14) | (80, 10) |
| onion_church.tif | (107, 35) | (52, 24) |
| train.tif | (85, 29) | (41, 0) |
| icon.tif | (89, 23) | (39, 16) |
| self_portrait.tif | (175, 37) | (77, 29) |
| harvesters.tif | (123, 13) | (60, 17) |
| sculpture.tif | (140, -27) | (33, -11) |
| lady.tif | (120, 13) | (56, 10) |
| cathedral.jpg | (12, 3) | (5, 2) |
| monastery.jpg | (3, 2) | (-3, 2) |
| tobolsk.jpg | (6, 3) | (3, 3) |

Other Images

extra1_combined

extra2_combined