
Project 4

[Auto]Stitching Photo Mosaics - Project Spec

  1. Project 4A
    1. Shoot the Pictures
    2. Recover Homographies
    3. Warp the Images
    4. Image Rectification
    5. Blend the images into a mosaic
  2. Project 4B

Project 4A

Image Warping and Mosaicing - Project Spec

Shoot the Pictures

To do this project, I needed to take pictures to perform image rectification and mosaicing.

For the image rectification, I chose to take a picture of a Computer Science Mentors CS 88 worksheet and a piece of art in Soda Hall 380 (both taken at an angle so they can later be rectified). I then rescaled the images by a factor of 0.3 to make processing the images faster in later parts (fewer pixels to operate on).

Worksheet Art
worksheet art

For image mosaicing, I needed to take 3 sets of photos of the same scenery from the same center of projection (i.e. the camera only rotates about its optical center between shots). I chose to take pictures of a hike on the Berkeley Fire Trails, a path on campus between the Valley Life Sciences Building and Haviland Hall, and a view of Doe Library and Memorial Glade going down the North Gate path. I also rescaled these images by a factor of 0.3.

Location Left Image Right Image
Fire Trails fire trails left fire trails right
Campus Path campus path left campus path right
Doe Library doe library left doe library right

Recover Homographies

A homography is a mapping between any two projective planes that share the same center of projection. (See lecture slides from Fall 2024.) We can use homographies to warp images and perform rectification and mosaicing.

To compute a homography from a source point \((s_{x_i}, s_{y_i}, 1)\) to a destination point \((wd_{x_i}, wd_{y_i}, w)\), you need to compute the eight unknown values in the \(3 \times 3\) homography matrix \(H\) below. Note that both the source and destination points are in homogeneous coordinates.

\[\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} s_{x_i} \\ s_{y_i} \\ 1 \end{bmatrix} = \begin{bmatrix} wd_{x_i} \\ wd_{y_i} \\ w \end{bmatrix}\]

Assuming you know \(H\), you can apply it to every point \(i\) in the source image.

To find \(a\) through \(h\), we solve the following system of linear equations, since we know \(s_x, s_y, d_x, d_y\) for a subset of the points \(i\): the correspondence points, which are manually marked using the correspondence tool from Project 3.

\[\begin{bmatrix} s_{x_1} & s_{y_1} & 1 & 0 & 0 & 0 & -s_{x_1} * d_{x_1} & -s_{y_1} * d_{x_1} \\ 0 & 0 & 0 & s_{x_1} & s_{y_1} & 1 & -s_{x_1} * d_{y_1} & -s_{y_1} * d_{y_1} \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ s_{x_n} & s_{y_n} & 1 & 0 & 0 & 0 & -s_{x_n} * d_{x_n} & -s_{y_n} * d_{x_n} \\ 0 & 0 & 0 & s_{x_n} & s_{y_n} & 1 & -s_{x_n} * d_{y_n} & -s_{y_n} * d_{y_n} \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} d_{x_1} \\ d_{y_1} \\ \dots \\ d_{x_n} \\ d_{y_n} \end{bmatrix}\]

Note: \(n\) is the total number of correspondence point pairs between the source and destination images.

As you can see, the system is overdetermined if \(n > 4\): each point pair contributes two equations, but there are only eight unknowns. Because of this, we use least squares to find a “best fit” solution.
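Solving this system is a direct least-squares problem. Here is a minimal sketch in NumPy that sets up the \(2n \times 8\) matrix from the equation above (the function name compute_homography is my own; the writeup does not specify one):

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Solve for the 3x3 homography mapping src_pts to dst_pts.

    src_pts, dst_pts: (n, 2) arrays of corresponding (x, y) points, n >= 4.
    Builds the 2n x 8 system from the text and solves it with least squares.
    """
    n = src_pts.shape[0]
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((sx, sy), (dx, dy)) in enumerate(zip(src_pts, dst_pts)):
        # One row for the x equation, one for the y equation of pair i.
        A[2 * i]     = [sx, sy, 1, 0, 0, 0, -sx * dx, -sy * dx]
        A[2 * i + 1] = [0, 0, 0, sx, sy, 1, -sx * dy, -sy * dy]
        b[2 * i] = dx
        b[2 * i + 1] = dy
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Append the fixed bottom-right 1 and reshape into H.
    return np.append(h, 1).reshape(3, 3)
```

With exactly 4 point pairs this reduces to solving the system exactly; with more pairs, `np.linalg.lstsq` minimizes the squared residual.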

Warp the Images

Now that we have a way to compute \(H\), we can perform warping by writing a function warp_image(img, H). Here is an overview of the warping algorithm:

  1. Compute \(H^{-1}\)
  2. Determine the size of the warped image
    1. Get the \((x, y, 1)\) coordinates of the corners of the source image
    2. Warp the source corners to get the destination corners by doing H @ src_corners, where src_corners is a \(3 \times 4\) matrix (each column is a homogeneous coordinate representing a corner)
    3. Normalize the destination corners (i.e. divide \((wx, wy)\) by \(w\))
    4. Get the min and max \(x\) and \(y\) coordinates to figure out the size of the warped image
  3. Determine all of the \((x, y, 1)\) coordinates inside the warped image. Call this \(3 \times n\) matrix dest_pts (each column is a homogeneous coordinate).
  4. Perform an inverse warp (like in Project 3)
    1. Do H_inverse @ dest_pts
    2. Normalize the matrix product like in step 2.3
    3. Use scipy.ndimage.map_coordinates to interpolate color values
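The steps above can be sketched as follows. This is a minimal version assuming float image arrays; returning the top-left offset of the warped image is an addition (not part of the algorithm as stated) that is convenient for placing the result later:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(img, H):
    """Inverse-warp img (an h x w x C float array) by homography H.

    Returns the warped image and the (x, y) offset of its top-left
    corner in the destination coordinate frame.
    """
    h, w = img.shape[:2]
    # Step 2.1-2.2: warp the four source corners to bound the output.
    corners = np.array([[0, w - 1, 0, w - 1],
                        [0, 0, h - 1, h - 1],
                        [1, 1, 1, 1]], dtype=float)
    warped = H @ corners
    warped = warped[:2] / warped[2]                  # step 2.3: divide by w
    x_min, y_min = np.floor(warped.min(axis=1)).astype(int)
    x_max, y_max = np.ceil(warped.max(axis=1)).astype(int) + 1

    # Step 3: every (x, y, 1) coordinate inside the warped image.
    xs, ys = np.meshgrid(np.arange(x_min, x_max), np.arange(y_min, y_max))
    dest_pts = np.vstack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Steps 1 and 4.1-4.2: inverse warp back into the source and normalize.
    src = np.linalg.inv(H) @ dest_pts
    src = src[:2] / src[2]
    out_shape = (y_max - y_min, x_max - x_min)
    out = np.zeros(out_shape + (img.shape[2],))
    for c in range(img.shape[2]):
        # Step 4.3: map_coordinates expects (row, col), i.e. (y, x) order.
        out[..., c] = map_coordinates(img[..., c], [src[1], src[0]],
                                      order=1, cval=0.0).reshape(out_shape)
    return out, (x_min, y_min)
```

Pixels that map outside the source image are filled with black (`cval=0.0`), which is where the “unnecessary black pixels” cropped later come from.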

Image Rectification

To rectify the worksheet and art images, I marked their corners and then hardcoded their corresponding points based on the assumption that the worksheet is 8.5 x 11 inches and that the art has a 2:3 ratio (width:height):

Source Points Destination Points
worksheet source points worksheet destination points
art source points art destination points

I then computed \(H_{\text{worksheet}}\) and \(H_{\text{art}}\), and performed the warping algorithm described in the previous section to rectify the images. I also cropped the resulting warped image to remove unnecessary black pixels created by performing the projective transformation.

Worksheet Rectified Art Rectified
worksheet rectified art rectified

Note that the rectified worksheet top is not perfectly straight despite the hardcoded rectangular destination points. This is because in the source image, the paper is not completely flat on the table due to the dog-eared corners.

Blend the images into a mosaic

To create an image mosaic (i.e. stitching together each pair of images of the Berkeley scenery), I can use the same homography-based warping. Specifically, the approach is to:

  1. Determine correspondence points manually using the correspondence tool from Project 3
  2. Warp image 1 to image 2
  3. Zero pad warped image 1 and original image 2 so that their dimensions match
  4. Blend warped image 1 with original image 2

I experimented with various blending methods. First, I tried a naive blending by taking the average of the padded images. This led to noticeable edges between warped image 1 and image 2:

fire trails blended using average blend
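The padding step and the naive average blend can be sketched as follows. The helper names pad_to_common_frame and average_blend are my own, and I assume the warping step reports the warped image's top-left (x, y) offset in image 2's coordinate frame:

```python
import numpy as np

def pad_to_common_frame(im1_warped, offset, im2):
    """Zero-pad warped image 1 and image 2 onto one shared canvas.

    offset is the (x, y) position of im1_warped's top-left corner in
    image 2's coordinate frame (known from the warping step).
    """
    x0, y0 = offset
    h1, w1 = im1_warped.shape[:2]
    h2, w2 = im2.shape[:2]
    # Canvas bounds covering both images.
    x_min, y_min = min(x0, 0), min(y0, 0)
    x_max, y_max = max(x0 + w1, w2), max(y0 + h1, h2)
    H, W = y_max - y_min, x_max - x_min
    p1 = np.zeros((H, W, 3))
    p2 = np.zeros((H, W, 3))
    p1[y0 - y_min:y0 - y_min + h1, x0 - x_min:x0 - x_min + w1] = im1_warped
    p2[-y_min:-y_min + h2, -x_min:-x_min + w2] = im2
    return p1, p2

def average_blend(p1, p2):
    """Naive blend: average where both images have content, else take either."""
    m1 = (p1.sum(axis=2) > 0)[..., None]
    m2 = (p2.sum(axis=2) > 0)[..., None]
    denom = np.maximum(m1.astype(float) + m2.astype(float), 1)
    return (p1 * m1 + p2 * m2) / denom
```

Averaging only over the overlap region avoids darkening the non-overlapping parts, but it still produces the visible seams shown above.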

Next, I tried blending using a Laplacian stack with a “half half” alpha mask like in Project 2, where all the pixels on the left half of the mask are 1 and all the pixels on the right half are 0:

half half alpha mask

This was a significant improvement over the naive blending method, but there is a noticeable artifact at the top. You can also see a faint vertical line at the midpoint of the mosaic:

fire trails blended using half half alpha mask

The best result was achieved with a mask built from the distance transform of each image, computed with cv2.distanceTransform. I then find the locations where the left image's distance transform is greater than the right image's (I called this where_greater). The final mask is made by np.dstack-ing where_greater across the 3 color channels (RGB). I also cropped the blended images to remove any unnecessary black pixels.
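A sketch of this masking step. I use scipy.ndimage.distance_transform_edt here, which computes the same Euclidean distance transform as cv2.distanceTransform with DIST_L2 (distance from each nonzero pixel to the nearest zero pixel); the helper names are my own:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_mask(p1, p2):
    """Binary seam mask from distance transforms of two padded images.

    p1, p2 are zero-padded images on a shared canvas. Each pixel of the
    mask is True where p1's distance transform exceeds p2's, i.e. where
    the pixel is deeper inside image 1 than inside image 2.
    """
    d1 = distance_transform_edt(p1.sum(axis=2) > 0)
    d2 = distance_transform_edt(p2.sum(axis=2) > 0)
    where_greater = d1 > d2
    # Stack the boolean mask across the 3 color channels, as in the text.
    return np.dstack([where_greater] * 3)

def blend_with_mask(p1, p2, mask):
    """Pick pixels from p1 where its distance transform wins, else p2."""
    return np.where(mask, p1, p2)
```

Because each pixel is taken from whichever image it lies deeper inside, the seam falls along the locus equidistant from both image borders, which is what removes the straight vertical line left by the half-half mask.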

Image Left Image Distance Transform Right Image Distance Transform Distance Transform 1 > Distance Transform 2
Fire Trails left fire trails image distance transform right fire trails image distance transform fire trails distance transform 1 greater than distance transform 2 visualization
Campus Path left campus path image distance transform right campus path image distance transform campus path distance transform 1 greater than distance transform 2 visualization
Doe Library left doe library image distance transform right doe library image distance transform doe library distance transform 1 greater than distance transform 2 visualization

Here are the final blended results:

Fire Trails Campus Path Doe Library
blended fire trails blended campus path blended doe library

This entire mosaicing process can be further generalized to make a mosaic of multiple images from the same scenery to form a panorama. Instead of warping one image to another, we can warp all images to a center image. This will be left for the next part of the project!

Project 4B

In progress!