It is now possible to change the visual content of video sequences automatically. One commercial application of this technology is the automatic replacement of billboards: a billboard advertising a certain brand is detected and seamlessly removed or replaced by another. The reasons may be legal (e.g. tobacco or alcohol advertising is allowed in the country where a sporting event takes place but not where it is retransmitted), prior commercial agreements, or better targeting of the audience. An example of the last reason is shown in the following image, in which the Gaz de France logo (top) is replaced by an EMB logo (bottom). This example was taken from "Real-Time Billboard Substitution in a Video Stream," Proceedings of the 10th Tyrrhenian International Workshop on Digital Communications, G. Medioni et al.
The main goal of the final assignment is to detect a logo in a video sequence and smoothly replace it with another. This is very similar to what you did in the repainting lab. The most challenging part is detecting the object and computing the necessary homography mapping automatically. Ideally you should detect every occurrence of the selected logo and replace it unobtrusively with another one. An uninformed observer should not be able to tell that the sequence has been altered.
This final assignment is more complex than the previous ones. We recommend that you start working on the problem straight away and use the Wednesday lab hours to resolve any questions that come up.
seq007.zip (top) shows an outdoor scene where the object of interest (a beer sign) was extracted from a database. Frame 595 from the indoor sequence newspaper.zip (bottom) shows an object of interest (a newspaper headline) cropped from a different frame.
[Example images: Indoor | Outdoor]
Note that the two beer logos are not exactly the same: they differ in size and orientation. Yet a human has no problem identifying them as the same object. A wide area of computer vision deals with robustly detecting and matching objects and with measuring their similarity. As you will have to detect the same object over a large number of images, we suggest that you use a SIFT detector. The scale-invariant feature transform (SIFT) is an algorithm for extracting distinctive features from images. An implementation for non-commercial use, together with the articles describing its theoretical background, is available at the author's website (http://www.cs.ubc.ca/~lowe/keypoints/). It has a simple interface with Matlab. Here is an example of the common features found in the grayscale versions of the image pairs shown above.
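To make the matching step concrete, here is a minimal sketch of nearest-neighbour descriptor matching with a distance ratio test, written in plain Python. It is not the interface of the SIFT implementation linked above: the function name match_descriptors, the ratio value of 0.8 and the three-dimensional toy descriptors are assumptions chosen for illustration (real SIFT descriptors have 128 dimensions).

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i] is matched to desc_b[j].

    Hypothetical helper: a match is kept only if the nearest neighbour
    is clearly closer than the second nearest (distance ratio test).
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from descriptor i to every descriptor in desc_b.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches

# Toy descriptors standing in for the logo and for one video frame.
logo = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0)]
frame = [(0.0, 0.9, 0.1), (0.9, 0.1, 0.0), (0.0, 0.0, 1.0)]
matches = match_descriptors(logo, frame)
print(matches)
```

The third logo descriptor is almost equally close to two frame descriptors, so the ratio test rejects it. Ambiguous matches of this kind are a common source of outliers, which is why the test is usually applied before any geometric verification.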
You can see from the examples above that many, but not all, of the matches are correct. Some corresponding pairs are clearly mismatches (outliers). Part of the task is to robustly select which matches are consistent with a predefined geometric model (inliers) and which are not (outliers). One of the most widely used tools for this is the RANSAC algorithm. RANSAC was explained in the ransac.pdf lecture and homographies in the homography.pdf lecture.
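The inlier/outlier selection can be sketched as follows. This is a simplified illustration in plain Python, not the course's reference solution: all function names, the tolerance of 2 pixels, the 500 iterations and the synthetic data are assumptions. The homography is computed from four correspondences by fixing its last entry to 1 and solving the resulting 8x8 linear system. A complete solution would typically re-estimate the homography from all inliers afterwards.

```python
import math
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-9:
            return None  # degenerate sample, e.g. three collinear points
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def homography_from_4(src, dst):
    """Homography as 9 row-major values (last one fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return None if h is None else h + [1.0]

def project(H, pt):
    """Map a point through H; returns None for points sent to infinity."""
    x, y = pt
    w = H[6] * x + H[7] * y + H[8]
    if abs(w) < 1e-12:
        return None
    return ((H[0] * x + H[1] * y + H[2]) / w,
            (H[3] * x + H[4] * y + H[5]) / w)

def reproj_error(H, p, q):
    """Distance between the projection of p and its claimed match q."""
    m = project(H, p)
    return float("inf") if m is None else math.hypot(m[0] - q[0], m[1] - q[1])

def ransac_homography(src, dst, n_iter=500, tol=2.0, seed=0):
    """Keep the 4-point homography that agrees with the most matches."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(n_iter):
        idx = rng.sample(range(len(src)), 4)
        H = homography_from_4([src[i] for i in idx], [dst[i] for i in idx])
        if H is None:
            continue
        inliers = [i for i in range(len(src))
                   if reproj_error(H, src[i], dst[i]) < tol]
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers

# Synthetic data: 9 correct matches generated by a known homography,
# followed by 3 deliberately wrong matches (indices 9 to 11).
H_TRUE = [1.2, 0.1, 30.0, -0.05, 1.1, 40.0, 0.0005, 0.0002, 1.0]
src = [(float(x), float(y)) for x in (0, 50, 100) for y in (0, 30, 60)]
dst = [project(H_TRUE, p) for p in src]
src += [(20.0, 10.0), (80.0, 45.0), (60.0, 20.0)]
dst += [(300.0, 5.0), (10.0, 200.0), (250.0, 250.0)]

H, inliers = ransac_homography(src, dst)
print(sorted(inliers))
```

Once a homography has been found, the four corners of the replacement logo can be passed through project to find where they land in the frame, which is the mapping needed for the replacement step.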
You may use functions, or code in general, that are available and free for education/research purposes. The responsibility is, however, still yours. You should fully understand how the code works and be able to explain why you have used it. No excuses for malfunctioning external code will be accepted. Besides, we cannot promise any support in debugging external code.
For the matching part of the task you may consider using some of the support code from the advanced digital image processing course.
publish in order to automate the generation of the output.