Project 2 · CS231M 2015

Released on Sunday, April 26, 2015. Due by Friday, May 8, 2015, 11:59 PM.

Instructions

Starter Code

Setup

  1. Import the starter code into Eclipse.
  2. Right-click on the imported project, select Properties, and under Android, verify that the OpenCV library path is correct. If not, replace it with the correct reference.
  3. Set the OPENCV_PATH environment variable in Eclipse's preferences (under C / C++ > Build > Environment).
  4. Verify everything builds successfully.

Submission

  1. Your write up must be a PDF file.
    Copy it to your project folder and name it report.pdf.
  2. Execute create-submission.py in the project folder. This should create a zip archive named [sunet-id]-project-2.zip.
  3. The zip archive contains everything that needs to be submitted.
    Email it to cs231m+p2@gmail.com

Honor Code

Your submission must be your own work. No external libraries, besides the ones already referenced by the starter code, may be used. We expect all students to adhere to the Stanford Honor Code.

Overview

Introduction

Artsy is an augmented reality app for paintings. Here's a preview of what it can do:

There are two broad tasks demonstrated in the preview: classification and tracking. Artsy automatically classifies the painting using a combination of a convolutional neural network and support vector machines. Next, it tracks the painting as the user moves around. All the processing is done on the device in real time. The figure below shows the various components you'll be working on for this assignment; the shaded boxes denote the components that will be treated as black boxes.

Operation Instructions

Here's how Artsy works:

  • You aim your camera at a painting (making sure you're close enough that the painting fills the screen; more on this later).
  • Tap the screen to begin.
  • Artsy will first classify the painting.
  • Next, it will augment the painting with a bounding box and its title.
  • You can now move around, keeping the camera aimed at the painting. Artsy will track the painting, and keep the augmentation localized over it.
  • Tap the screen to restart the classification and tracking.

Paintings

We'll be using a small set of 5 paintings for this assignment. It includes:

You'll need a color copy of a few of these paintings (in particular, of Convergence and Concetto Spaziale).

1. Image Classification — 10 Pts

As discussed in class, convolutional neural networks (CNNs) have recently dominated the field of image classification. Currently, most CNNs are trained using powerful GPUs over multiple days. Luckily, a single forward pass for classification is tractable on mobile platforms. However, it is not uncommon for these networks to take up over a gigabyte of memory, which becomes an issue on memory-constrained mobile platforms. To work around this, our classifier uses a variant of the network originally published by Krizhevsky, Sutskever, and Hinton, modified to work within our memory limitations. This reduction in memory usage comes at the cost of a slight drop in accuracy.

The CNN we will use was originally trained on the ImageNet dataset. However, we're interested in classifying 5 paintings. As it turns out, the features learned by a CNN are quite powerful and are often applicable for tasks beyond classifying their original training set (this paper on CNN features goes over some interesting results). Therefore, we will use the CNN as a feature extractor. The actual classification will be done using one-vs-all SVM classifiers. We have provided you with 5 pre-trained SVM classifiers, corresponding to each painting.

1.1 Implementation

In Classifier.cpp, follow the instructions and implement the classify function.
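The snippet below is a minimal sketch of what the classification step could look like, assuming the CNN feature extraction is exposed through a helper (here called extract_cnn_features) and the pre-trained one-vs-all SVMs are available as raw weight vectors and biases (svm_weights_, svm_biases_). These names are placeholders; follow the interfaces actually defined in the starter code.

```cpp
// Hypothetical sketch of one-vs-all linear SVM scoring on CNN features.
// extract_cnn_features, svm_weights_, and svm_biases_ are placeholder
// names; use the members and helpers provided by the starter code.
#include <limits>
#include <vector>
#include <opencv2/core/core.hpp>

int Classifier::classify(const cv::Mat& frame) {
    // Single forward pass of the CNN: frame -> 1 x D feature vector (CV_32F).
    cv::Mat features = extract_cnn_features(frame);

    // Score the features against each painting's linear SVM
    // (score_i = w_i . f + b_i) and keep the best-scoring class.
    int best_label = -1;
    float best_score = -std::numeric_limits<float>::max();
    for (size_t i = 0; i < svm_weights_.size(); ++i) {
        float score = static_cast<float>(features.dot(svm_weights_[i])) + svm_biases_[i];
        if (score > best_score) {
            best_score = score;
            best_label = static_cast<int>(i);
        }
    }
    return best_label;
}
```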

2. Tracking — 70 Pts + 10 Extra Credit

Preface

The tracking subsystem operates in one of the following states:

  1. Initialization. The tracker is provided an initial camera frame. It uses this frame to extract an initial set of reference features. We make the simplifying assumption that the initial frame consists entirely of the painting, with no other objects visible. Therefore, any features extracted in the image are guaranteed to belong to our "target".
  2. Tracking. The KLT tracker uses the optical flow algorithm discussed in class to estimate the new position of each tracked feature.
  3. Re-localization. If the KLT tracker fails, then we switch to the ORB tracker to re-detect the painting. Until re-localization succeeds, we consider ourselves as being "Lost".

The initialization occurs when the user taps the screen. The flowchart below describes how we transition between the remaining states.

2.1 Optical Flow Tracking — 40 Pts

In KLTTracker.cpp, implement the initialize and track functions.

In PlaneTracker.cpp, implement the estimate_homography and track functions.

In Augmentor.cpp, implement the render_bounds function.
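As a starting point, here is a minimal sketch of the core of one KLT track step built on OpenCV's pyramidal Lucas-Kanade implementation. The member names (prev_gray_, prev_points_) are placeholders, and the surrounding bookkeeping (failure detection, updating the stored frame) is up to you.

```cpp
// Sketch of one KLT tracking step. prev_gray_ / prev_points_ are
// placeholder member names for the previously stored frame and features.
#include <vector>
#include <opencv2/video/tracking.hpp>

std::vector<cv::Point2f> next_points;
std::vector<unsigned char> status;
std::vector<float> error;

cv::calcOpticalFlowPyrLK(prev_gray_, curr_gray,      // previous / current grayscale frames
                         prev_points_, next_points,  // feature locations in each frame
                         status, error,
                         cv::Size(21, 21),           // search window size
                         3);                         // number of pyramid levels

// Keep only the features that were successfully tracked; the surviving
// pairs are the correspondences handed to the homography estimation.
std::vector<cv::Point2f> prev_kept, curr_kept;
for (size_t i = 0; i < status.size(); ++i) {
    if (status[i]) {
        prev_kept.push_back(prev_points_[i]);
        curr_kept.push_back(next_points[i]);
    }
}
```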

You might find it useful to develop this using the provided test videos (in particular, mona-lisa.avi).

Hint — The correspondences estimated by the KLT tracker should be between the previous frame and the current frame. However, you want to estimate a homography between the initial frame and the current frame. Consider the fact that transformations can be chained.
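For example, if H_init_to_prev_ maps points in the initial frame to the previous frame (a hypothetical member name), the chaining could look like the sketch below; the RANSAC threshold of 3 pixels is just a reasonable default.

```cpp
// Sketch of homography estimation and chaining. prev_kept / curr_kept are
// the surviving KLT correspondences from the previous sketch, and
// H_init_to_prev_ is a placeholder for the accumulated transform.
#include <opencv2/calib3d/calib3d.hpp>

// Robust homography from the previous frame to the current frame.
cv::Mat H_prev_to_curr = cv::findHomography(prev_kept, curr_kept, CV_RANSAC, 3.0);

// Chaining: a point p in the initial frame maps to
// H_prev_to_curr * (H_init_to_prev_ * p) in the current frame.
cv::Mat H_init_to_curr = H_prev_to_curr * H_init_to_prev_;
H_init_to_prev_ = H_init_to_curr;  // carry forward for the next frame
```

The same accumulated homography is what render_bounds needs: map the four corners of the initial frame through it (for example with cv::perspectiveTransform) to draw the bounding box over the painting.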

2.2 Relocalization — 20 Pts

In ORBTracker.cpp, implement the initialize and track functions.

In PlaneTracker.cpp, update the track function to include the relocalization logic, as shown in the flowchart above. When re-initializing the KLT tracker, make sure you pass in the RANSAC inliers (as determined during the homography re-estimation following the ORB matching) as the initial points.
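A minimal sketch of the ORB matching and inlier extraction is shown below, assuming the OpenCV 2.4 C++ API referenced by the starter code; ref_keypoints_ and ref_descriptors_ are placeholder names for the features extracted from the initial frame.

```cpp
// Sketch of ORB-based relocalization: match ORB descriptors between the
// reference (initial) frame and the current frame, re-estimate the
// homography with RANSAC, and keep only the inliers as the new KLT
// starting points. ref_keypoints_ / ref_descriptors_ are placeholders.
#include <vector>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/calib3d/calib3d.hpp>

cv::ORB orb;
std::vector<cv::KeyPoint> curr_keypoints;
cv::Mat curr_descriptors;
orb(curr_gray, cv::Mat(), curr_keypoints, curr_descriptors);

// ORB descriptors are binary, so match with Hamming distance.
cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
std::vector<cv::DMatch> matches;
matcher.match(ref_descriptors_, curr_descriptors, matches);

std::vector<cv::Point2f> ref_pts, curr_pts;
for (size_t i = 0; i < matches.size(); ++i) {
    ref_pts.push_back(ref_keypoints_[matches[i].queryIdx].pt);
    curr_pts.push_back(curr_keypoints[matches[i].trainIdx].pt);
}

// Re-estimate the homography; the mask marks the RANSAC inliers that
// should seed the re-initialized KLT tracker.
std::vector<unsigned char> inlier_mask;
cv::Mat H = cv::findHomography(ref_pts, curr_pts, CV_RANSAC, 3.0, inlier_mask);

std::vector<cv::Point2f> inlier_points;
for (size_t i = 0; i < inlier_mask.size(); ++i) {
    if (inlier_mask[i]) inlier_points.push_back(curr_pts[i]);
}
```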

Test your relocalization system using the mona-lisa-blur.avi test video. You should be able to successfully recover after the motion blur disruption.

2.3 Analysis — 10 Pts

  1. Our tracking system prefers some paintings over others (everyone's a critic). Compare your tracking results for Jackson Pollock's Convergence vs Lucio Fontana's Concetto Spaziale. Give an explanation for your observations.
  2. You might notice that the KLT tracker drifts over time. What are some possible explanations for this drift?
  3. Try varying the window size parameter in the calcOpticalFlowPyrLK function call. How does it affect the tracking? Provide an explanation for your observations.
  4. Repeat the previous question, varying the number of pyramid levels this time.
  5. How robust is the ORB-based relocalization system? What are some scenarios where the relocalization fails (and why)?
  6. Could we use our current system for tracking and augmenting sculptures instead of paintings? Why or why not?

2.4 Extra Credit — 10 Pts

Implement a relocalization system capable of handling mona-lisa-blur-extra-credit.avi. In particular, your tracking results should be reasonably good after the second blur towards the end.

3. Corner Detection — 20 Pts

Preface

In class, we discussed the Good Features to Track paper by Shi and Tomasi, and the accompanying corner detection algorithm (which is similar to the Harris Corner algorithm). In this section, we will replace OpenCV's goodFeaturesToTrack function with one that calls our own version of the Harris corner detector.

The version we covered in class computes the eigenvalues of the second moment matrix:

$$ A = \sum_{x, y} w(x, y) \begin{bmatrix} I_x I_x & I_xI_y \\ I_x I_y & I_y I_y \end{bmatrix} $$

However, computing the eigenvalues $\lambda_1$ and $\lambda_2$ can be expensive. As a result, most practical implementations use the following scoring function instead:

\begin{align} S &= \lambda_1 \lambda_2 - \kappa \cdot (\lambda_1 + \lambda_2)^2 \\ &= \text{det}(A) - \kappa \cdot \text{trace}^2(A) \end{align}

where $\kappa$ is an empirical constant, typically between 0.04 and 0.06.

The score can now be utilized as follows:

  • A large positive $S$ implies we have a corner.
  • $S < 0$ implies we have an edge.
  • A small value of $|S|$ implies we have a flat, texture-less region.

3.1 Implementation — 20 Pts

In KLTTracker.cpp, implement the harris_corner_detector function.

Set use_my_harris_detector = true in the initialize function to test your implementation.
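To make the formulas concrete, here is a rough sketch of how the per-pixel Harris response could be computed with OpenCV primitives. The function name and signature are illustrative only, and you still need to threshold the response and apply non-maximum suppression to turn it into discrete corner locations.

```cpp
// Illustrative sketch of a per-pixel Harris response; not the starter
// code's actual harris_corner_detector signature.
#include <opencv2/imgproc/imgproc.hpp>

cv::Mat harris_response(const cv::Mat& gray, int window = 5, double kappa = 0.04) {
    // Image gradients I_x and I_y.
    cv::Mat Ix, Iy;
    cv::Sobel(gray, Ix, CV_32F, 1, 0, 3);
    cv::Sobel(gray, Iy, CV_32F, 0, 1, 3);

    // Entries of the second moment matrix A, summed over the window w(x, y).
    cv::Mat Ixx, Iyy, Ixy;
    cv::boxFilter(Ix.mul(Ix), Ixx, CV_32F, cv::Size(window, window));
    cv::boxFilter(Iy.mul(Iy), Iyy, CV_32F, cv::Size(window, window));
    cv::boxFilter(Ix.mul(Iy), Ixy, CV_32F, cv::Size(window, window));

    // S = det(A) - kappa * trace(A)^2, computed per pixel.
    cv::Mat det = Ixx.mul(Iyy) - Ixy.mul(Ixy);
    cv::Mat trace = Ixx + Iyy;
    return det - kappa * trace.mul(trace);
}
```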

References

  1. Lecture Slides on Optical Flow and Tracking
  2. Lecture Slides on Neural Networks and Decision Trees for Machine Vision
  3. Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm — Jean-Yves Bouguet.
  4. Good Features to Track — Jianbo Shi, Carlo Tomasi.
  5. ImageNet Classification with Deep Convolutional Neural Networks — Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton.
  6. CNN Features off-the-shelf: an Astounding Baseline for Recognition — Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson.
  7. Multiple View Geometry — Richard Hartley, Andrew Zisserman.
    Sections 4.7 and 4.8: Robust Homography Estimation using RANSAC.
  8. A Combined Corner and Edge Detector — Chris Harris, Mike Stephens.