Monday, March 3, 2008

Project 3

by Luis Daniel Ballesteros

Goal:
The goal of this project was to create a vision program that can differentiate me from other people or things. I chose to broaden the scope of the program to have it differentiate one thing from another, as opposed to just saying me or not me. Ideally by the time the program is done, it will be able to determine if a thing it sees is something it has seen before or something new. If it has been seen before, it should be identified, and if it is new, it should be added to a database to be recognized in the future.

Algorithm:
To find faces or other objects of interest in the picture, I used the approach discussed in class based on knowing what is background and what is foreground. Before each sample, a calibration picture must be taken; the program compares the sample against it to find what is new in the scene.
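A minimal sketch of this calibration step (the helper name and threshold value are my own, not from the project code), representing an image as nested lists of (r, g, b) tuples:

```python
def foreground_mask(background, sample, threshold=30):
    """Return a 2D mask: True where the sample differs from the background
    by more than the threshold (Euclidean distance in RGB space)."""
    mask = []
    for bg_row, s_row in zip(background, sample):
        mask.append([
            sum((a - b) ** 2 for a, b in zip(bg_px, s_px)) ** 0.5 > threshold
            for bg_px, s_px in zip(bg_row, s_row)
        ])
    return mask

# Tiny example: a 2x2 black background, with one pixel changed in the sample.
bg = [[(0, 0, 0), (0, 0, 0)],
      [(0, 0, 0), (0, 0, 0)]]
sm = [[(0, 0, 0), (200, 180, 160)],
      [(0, 0, 0), (0, 0, 0)]]
print(foreground_mask(bg, sm))  # [[False, True], [False, False]]
```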
After finding the point of action, or object in the picture, the program approximates its size, making a box by finding the leftmost, topmost, rightmost, and bottommost pixels in the object of action, and remembers these points for future comparisons. Since each picture in the database is saved with this information, the program can jump directly to the active site of each picture.
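The bounding-box step above can be sketched like this (assuming the foreground mask from the calibration step; the function name is mine):

```python
def bounding_box(mask):
    """Return (left, top, right, bottom) of the active pixels in a
    True/False mask, or None if nothing is active."""
    coords = [(x, y) for y, row in enumerate(mask)
                     for x, on in enumerate(row) if on]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

mask = [[False, False, False, False],
        [False, True,  True,  False],
        [False, False, True,  False]]
print(bounding_box(mask))  # (1, 1, 2, 2)
```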
Since the active spot in one picture might be bigger than in another, when making a comparison to a picture in the database, the program finds the ratio between the sizes of the two active sites and compares proportionately. For example, if the sample is 2 times as tall and 3 times as wide as the database picture it's comparing against, it compares a 2x3 rectangle in the sample to a single pixel in the database. For simplicity, we take the average color of that 2x3 rectangle for the comparison.
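That block-averaging could look something like this sketch (assuming integer size ratios for simplicity; `downscale` is a name of my choosing):

```python
def downscale(region, target_h, target_w):
    """Shrink an active region by averaging blocks of pixels, so regions
    of different sizes can then be compared pixel-for-pixel."""
    h, w = len(region), len(region[0])
    bh, bw = h // target_h, w // target_w   # block size per target pixel
    out = []
    for ty in range(target_h):
        row = []
        for tx in range(target_w):
            block = [region[ty * bh + dy][tx * bw + dx]
                     for dy in range(bh) for dx in range(bw)]
            n = len(block)
            # average each RGB channel over the block
            row.append(tuple(sum(px[i] for px in block) // n for i in range(3)))
        out.append(row)
    return out

# A 2x2 sample reduced to a single pixel: the average of its four pixels.
sample = [[(10, 0, 0), (30, 0, 0)],
          [(10, 0, 0), (30, 0, 0)]]
print(downscale(sample, 1, 1))  # [[(20, 0, 0)]]
```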
When comparing pixels, the program treats the RGB components of a pixel as a 3-dimensional vector and subtracts one vector from another to find the color difference. We then take the length of this difference vector to turn the difference into a number that can be compared. When comparing whole sections, I keep a running sum of these per-pixel difference values for the whole picture. This scheme is imperfect: it is possible to trick the algorithm with certain perfectly symmetrical combinations. The alternative, keeping a several-hundred-dimension vector of per-pixel differences for each image comparison and then taking the length of the difference of those vectors, is not so fun.
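The per-pixel distance and the whole-picture sum described above amount to the following (function names are mine):

```python
import math

def pixel_distance(p, q):
    """Length of the difference of two RGB vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def image_difference(img_a, img_b):
    """Sum of per-pixel distances over two same-sized images."""
    return sum(pixel_distance(p, q)
               for row_a, row_b in zip(img_a, img_b)
               for p, q in zip(row_a, row_b))

print(pixel_distance((255, 0, 0), (0, 0, 0)))  # 255.0
```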
For finding a match, I do a simple threshold test: the sample picture must be within a certain difference of a database picture to be considered a match. If no database picture is close enough, the sample is not recognized and is added to the database. Pretty simple scheme here.
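The threshold test boils down to something like this (the threshold value here is arbitrary, for illustration only):

```python
def find_match(diffs, threshold):
    """Given the difference score against each database entry, return the
    index of the best match under the threshold, or None if the sample
    is unrecognized (and should be added to the database)."""
    if not diffs:
        return None
    best = min(range(len(diffs)), key=lambda i: diffs[i])
    return best if diffs[best] <= threshold else None

print(find_match([900.0, 120.5, 640.0], threshold=200.0))  # 1
print(find_match([900.0, 640.0], threshold=200.0))         # None
```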
The end result is a program which takes a picture to analyze as an argument and uses the color distribution to compare it to pictures it knows, finds the differences and determines if it is a close enough match to any of them. Although this is pretty much the algorithm discussed in class, it seemed like the best way to do it.

Results:
First of all, active sections of the picture are surprisingly well recognized. Further, pictures of me are recognized, but certain conditions must be met. Since the end result of the algorithm is essentially a color histogram, a significant change in lighting completely breaks it. This is why my first tests, performed in my dark room where the overall color variety was low, failed far more often. Once I moved to a well-lit area, the program worked much better, since shadows and light angles made much less of a difference.
I first tested this against people who differed from me significantly in color. The algorithm was pretty good at differentiating me from my Indian friend and from my friend with blonde hair. It also worked fairly well against a friend whose hair color was similar to mine but who had a beard.
Overall, although the program is not perfect and is sensitive to lighting conditions, it did a satisfactory job.

Obstacles:
Math! I had a weird time finding the best ways to enumerate differences between pictures, and I am still not satisfied with what I decided on. This is probably a large field of research though, so I don't feel too bad.
Also, I first started writing this program to analyze .jpg images, but I found them far too complicated to decompress and extract data from. I settled on 24-bit color bitmap files, since the data is easy to analyze and manipulate: it is simply a header followed by three bytes per pixel (stored in blue, green, red order, with each row padded to a multiple of 4 bytes and the rows stored bottom-up).
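A sketch of reading such a file with only the standard library (assuming an uncompressed 24-bit BMP with positive height; the demo builds a 1x1 file by hand just to have something to read):

```python
import struct

def read_bmp24(path):
    """Read an uncompressed 24-bit BMP into rows of (r, g, b) tuples."""
    with open(path, "rb") as f:
        data = f.read()
    offset = struct.unpack_from("<I", data, 10)[0]    # start of pixel data
    width, height = struct.unpack_from("<ii", data, 18)
    row_size = (width * 3 + 3) & ~3                   # rows padded to 4 bytes
    rows = []
    for y in range(height):                           # stored bottom-up
        base = offset + y * row_size
        rows.append([(data[base + 3*x + 2],           # BGR on disk -> RGB
                      data[base + 3*x + 1],
                      data[base + 3*x]) for x in range(width)])
    rows.reverse()
    return rows

# Demo: write a 1x1 red-pixel BMP by hand, then read it back.
header = struct.pack("<2sIHHI", b"BM", 58, 0, 0, 54)                   # file header
info = struct.pack("<IiiHHIIiiII", 40, 1, 1, 1, 24, 0, 4, 0, 0, 0, 0)  # info header
with open("tiny.bmp", "wb") as f:
    f.write(header + info + bytes([0, 0, 255, 0]))                     # BGR + pad
print(read_bmp24("tiny.bmp"))  # [[(255, 0, 0)]]
```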
I refrained from saving the analyzed active site as a new, smaller picture, since I could not correctly create the bitmap header needed for the new file and decided it wasn't worth the time to debug. Instead I just save the information needed to find the active site within the original picture. Note that although this is a rectangular region, after the calibration step non-active pixels are set to 0, so they contribute nothing to the difference value.

Todo:
I think a better difference-enumeration algorithm would have been great. Also, a difference algorithm that is sensitive to both location AND color, as opposed to mine, which is sensitive mainly to color with only some consideration of location, would improve my program at least tenfold.
I also apologize for using the algorithm presented in class, I don't know much about this sort of image manipulation and found some of the suggestions very helpful. So then, another way to improve the program is to make the algorithm more original.

Samples:
Here are some better samples of what my program generates for different faces; you can see it is imperfect, but overall well behaved.
A further comparison of both processed images then correctly concludes that they are not the same person.
Some data is lost, however, since my friend's face reflects light a lot, and part of it ends up with the same color configuration as the background...