Thursday, April 22, 2010

Idea for images.google.com

A link under each image for "similar images." This isn't some AI comparison feature. It's simply a way to find more-or-less identical images that may just be in other resolutions or image formats.

The way to do this would be to scale each picture down to a small grid; I hear 4x4 is a good size. Then just take a hash of the 16 color values. (The color gamut should also be reduced, but I'm not sure by how much.) Everything with the same hash is treated as the same picture.
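Here's a rough sketch of that in Python, just to make the idea concrete. Everything specific in it is my own assumption: the image is a plain list of rows of (r, g, b) tuples that is at least 4 pixels on each side, the downscaling is a simple block average, the gamut is cut to 4 levels per channel, and MD5 is the hash. The names downscale, quantize and image_hash are made up for the example.

```python
import hashlib

def downscale(pixels, grid=4):
    """Average an image (rows of (r, g, b) tuples) into grid*grid mean colors.

    Assumes the image is at least `grid` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Integer mean of each channel over the block.
            cells.append(tuple(sum(p[c] for p in block) // len(block)
                               for c in range(3)))
    return cells

def quantize(cells, levels=4):
    """Reduce each 0..255 channel to a small number of buckets (assumed: 4)."""
    step = 256 // levels
    return [tuple(v // step for v in color) for color in cells]

def image_hash(pixels, grid=4, levels=4):
    """Hash the quantized grid colors; equal hashes mean 'same picture'."""
    cells = quantize(downscale(pixels, grid), levels)
    flat = bytes(v for color in cells for v in color)
    return hashlib.md5(flat).hexdigest()

# The same picture at two resolutions: an 8x8 gradient and a 16x16 copy
# made by doubling every pixel.
small = [[(x * 32, y * 32, 128) for x in range(8)] for y in range(8)]
big = [[small[y // 2][x // 2] for x in range(16)] for y in range(16)]
print(image_hash(small) == image_hash(big))
```

With real photos a resized copy won't always land in exactly the same color buckets, which is why the amount of gamut reduction matters so much.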

It would be nice if relative comparisons could be made too, i.e. ordering pictures by similarity. How many pixels (and how many colors) you scale the image down to determines how precise a match has to be to count, so Google could choose a level that they think will satisfy the average user. Or they could even create multiple hashes for each image at different scales, then order image results by which hashes match.
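The multiple-hashes version could look something like the sketch below. Again, this is only my guess at one way to do it: the three scales in SCALES are arbitrary, images are lists of rows of (r, g, b) tuples at least 8 pixels on each side, and the "similarity" is simply the count of scales at which two images hash the same. All the function names are made up.

```python
import hashlib

def grid_hash(pixels, grid, levels):
    """Hash an image averaged down to grid x grid cells, `levels` per channel."""
    height, width = len(pixels), len(pixels[0])
    step = 256 // levels
    out = bytearray()
    for gy in range(grid):
        for gx in range(grid):
            block = [pixels[y][x]
                     for y in range(gy * height // grid, (gy + 1) * height // grid)
                     for x in range(gx * width // grid, (gx + 1) * width // grid)]
            for c in range(3):
                mean = sum(p[c] for p in block) // len(block)
                out.append(mean // step)
    return hashlib.md5(bytes(out)).hexdigest()

# Coarse to fine: (grid size, color levels per channel). Arbitrary choices.
SCALES = [(2, 2), (4, 4), (8, 8)]

def multi_hash(pixels, scales=SCALES):
    """One hash per scale, coarsest first."""
    return [grid_hash(pixels, grid, levels) for grid, levels in scales]

def match_level(hashes_a, hashes_b):
    """Number of scales at which the two images hash identically."""
    return sum(a == b for a, b in zip(hashes_a, hashes_b))

def rank_by_similarity(query, candidates):
    """Order candidate names (dict of name -> pixels) by matching hash count."""
    query_hashes = multi_hash(query)
    scored = [(match_level(query_hashes, multi_hash(pixels)), name)
              for name, pixels in candidates.items()]
    return [name for score, name in sorted(scored, key=lambda s: -s[0])]
```

In a real index the hashes would be computed once per image and stored, so a search is just a lookup per scale rather than rehashing every candidate like this toy version does.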

Even more convenient would be some kind of hash where the linear difference between any two hashes is also the level of difference between the two pictures. I'm not quite sure if that's logically possible, but then, the above idea can actually be thought of that way.

To actually compare image distances, a good way could be to take the color distance between every pair of corresponding pixels in the two grids, square each distance, sum them, then take the square root of that. But that's way too intensive a comparison for a Google search. (It also ignores object borders changing location, but that could be solved by a 2-dimensional Levenshtein distance, which is even more insanely computation-intensive. I wonder how you could integrate that with a color distance formula. Maybe it would work to take an n-dimensional Levenshtein distance ( http://www.itl.nist.gov/iad/mig/publications/storage_paper/lrec06_v0_7.pdf ) where L*a*b are three dimensions and x, y location are two more.)
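The square-sum-root distance is short enough to write out. This sketch assumes each image has already been reduced to a flat list of (r, g, b) grid colors of the same length, and it uses plain Euclidean distance in RGB as the color distance, which is a stand-in (a L*a*b distance would probably match perception better). The function names are made up.

```python
import math

def color_distance(c1, c2):
    """Euclidean distance between two colors (RGB used as a stand-in)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def grid_distance(grid_a, grid_b):
    """Square each per-cell color distance, sum them, take the square root."""
    if len(grid_a) != len(grid_b):
        raise ValueError("grids must have the same number of cells")
    return math.sqrt(sum(color_distance(a, b) ** 2
                         for a, b in zip(grid_a, grid_b)))

# Two 4x4 grids (16 cells) that differ in a single cell.
grid_a = [(100, 100, 100)] * 16
grid_b = [(103, 104, 100)] + [(100, 100, 100)] * 15
print(grid_distance(grid_a, grid_a))
print(grid_distance(grid_a, grid_b))
```

One comparison on 16 cells is cheap; the expensive part is that, unlike a hash lookup, you'd have to run it against every candidate image.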