Yes, I'm aware it's late in the year 2008, I'm aware this stuff isn't as fresh as WPF 3D or Ruby Processing.
As I've posted earlier, I've accrued some treasured junk. Now that I have all this junk, what am I to do? Well, um…I didn't really know either.
So I started messing around.
The first thing I did was to determine the average color for a single image. I'm not sure exactly where I'm going, but I figure, hey, if you want to get a rough "picture" of what an image looks like, it's not a bad idea to look at the average color value. And we're using the RGB breakdown for color, meaning white is #FFFFFF (255,255,255), black is #000000 (0,0,0), and everything else falls in between.
Note that in my case, performance is not a big deal; I'm doing all these calculations one pixel at a time which, as you might imagine, is suboptimal. Mostly a straightforward operation:
// Needs: using System.Drawing;
public static Color Average(Image image)
{
    // Copy into a Bitmap so individual pixels can be read.
    using (Bitmap bitmap = new Bitmap(image))
    {
        int red, green, blue;
        long redRunningSum = 0, greenRunningSum = 0, blueRunningSum = 0;
        // Cast before multiplying so a very large image cannot overflow int.
        long numPixels = (long)bitmap.Width * bitmap.Height;

        foreach (Color pixelColor in ImageHelper.GetPixelsFor(bitmap))
        {
            redRunningSum += pixelColor.R;
            blueRunningSum += pixelColor.B;
            greenRunningSum += pixelColor.G;
        }

        // Integer division truncates each channel's mean into the 0..255 range.
        red = (int)(redRunningSum / numPixels);
        green = (int)(greenRunningSum / numPixels);
        blue = (int)(blueRunningSum / numPixels);

        return Color.FromArgb(red, green, blue);
    }
}
Ok, so why do we care—it's a function, right? Well, okay, yes—but here's a PowerShell function you may also find interesting:
function Average-Images ($filenames)
{
    [void][reflection.assembly]::Loadfile("C:\a\sandbox\ImgTest\bin\Debug\ImgTest.dll")
    $i = 1
    $total = $filenames.count
    $results = @()
    foreach ($filename in $filenames)
    {
        write-host "$i - $($i*100/$total)%- $($filename)"
        $i++
        $img = [System.Drawing.Image]::FromFile($filename)
        $o = new-object PSObject
        $avg = [ImgTest.ImageHelper]::Average($img)
        add-member -inp $o -membertype "NoteProperty" -name "Filename" -value $filename
        add-member -inp $o -membertype "NoteProperty" -name "Image" -value $img
        add-member -inp $o -membertype "NoteProperty" -name "Red" -value $avg.R
        add-member -inp $o -membertype "NoteProperty" -name "Green" -value $avg.G
        add-member -inp $o -membertype "NoteProperty" -name "Blue" -value $avg.B
        $results += $o
    }
    $results
}
So. This is getting interesting. What the "Average-Images" function above does is create a custom object with some useful properties: we've got the original filename, we've got a still-breathing reference to the System.Drawing.Image object, and we're storing the "average pixel's" red, green, blue values as individual properties. The resulting objects look something like this:
Maybe it's still not interesting for you. That's fine, 'cause this party's* just getting started!
*despite what I've just written, this is not a party
I have one more piece of "infrastructure" to explain, before we can get cooking: I've created a PowerShell function called "Make-Html," which creates a permanent HTML file listing all the images I want to see, in the order I want to see them. As an added bonus, the function immediately launches the newly-created file in my browser. Here's the code:
$startDir = "C:\a\ps1\scrape\"

function Make-Html ($fullfilenames, $resultingFilename)
{
    $files = $fullFilenames | % { $_.split("\")[-1] }
    $tags = $files | % { "<div style=""float:left;""><img src=""$_""/></div>" }
    $html = @"
<html>
<head><title>$($resultingFilename)</title></head>
<body>
$($tags)
</body>
</html>
"@

    $html > "$($startDir)$($resultingFilename).html"
    ii "$($startDir)$($resultingFilename).html"
}
Ok, I know, we're still not doing anything.
Okay, as I say to everyone, the real power of PowerShell is its object piping. PowerShell pipes objects, not text; this is something best seen, not heard, and hopefully we'll see a little something today. The objects we'll be slinging through the pipeline today are, as mentioned above, custom objects that have a Filename, an Image, and the RGB values representing the image's average (mean?) color.
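To make the "objects, not text" point concrete, here's a minimal sketch. The filenames and color values are made up, the Image property is left out, and I'm assuming PowerShell 3.0 or later for the [pscustomobject] shorthand; treat it as an illustration, not the data from this post.

```powershell
# Hypothetical stand-ins for the objects Average-Images returns.
$sample = @(
    [pscustomobject]@{ Filename = 'sunset.jpg'; Red = 200; Green = 90;  Blue = 60  }
    [pscustomobject]@{ Filename = 'forest.jpg'; Red = 40;  Green = 120; Blue = 50  }
    [pscustomobject]@{ Filename = 'ocean.jpg';  Red = 30;  Green = 80;  Blue = 170 }
)

# Every stage receives whole objects, so a later stage can still read
# any property (here Filename) after filtering and sorting on another one.
$names = $sample |
    Where-Object { $_.Green -gt 85 } |
    Sort-Object Green -Descending |
    ForEach-Object { $_.Filename }

$names
```

No stage has to parse text to find the green value; each one just asks the object for it.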
So, let's count how many items we have:
Awesome. Let's count how many items we have that are more red than any other color:
Hmm, that was unexpected, 359 red-dominant images out of 503, that's proportionally huge. I'll point out that I did some extra fanciness to get this count to evaluate on one line, but usually (i.e. when I'm not posting to my blog) I'll work my way in parts, not all at once. So the same thing, split out, would be:
That's more realistic.
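For reference, a sketch of what the one-line count and its split-out equivalent might look like. The sample objects are hypothetical (made-up names and values), and this is one plausible way to write the filter, not necessarily the exact command I ran.

```powershell
# Hypothetical sample data shaped like the Average-Images output.
$sample = @(
    [pscustomobject]@{ Filename = 'sunset.jpg'; Red = 200; Green = 90;  Blue = 60  }
    [pscustomobject]@{ Filename = 'forest.jpg'; Red = 40;  Green = 120; Blue = 50  }
    [pscustomobject]@{ Filename = 'ocean.jpg';  Red = 30;  Green = 80;  Blue = 170 }
    [pscustomobject]@{ Filename = 'brick.jpg';  Red = 150; Green = 70;  Blue = 60  }
)

# All at once: filter and count in a single expression.
# The @( ) wrapper keeps .Count meaningful even when only one item matches.
$oneLine = @($sample | Where-Object { $_.Red -gt $_.Green -and $_.Red -gt $_.Blue }).Count

# In parts: keep the intermediate result so it can be poked at before counting.
$redDominant = @($sample | Where-Object { $_.Red -gt $_.Green -and $_.Red -gt $_.Blue })
$count = $redDominant.Count
```

Working in parts leaves $redDominant lying around, which is handy when the next question is about those same images.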
Okay, one more thing before we go. Finding out most of my pictures are red-dominant has me wondering: what about the other two? Let's work with the objects a little* to massage the answer out of them:
*a lot; ugly function that pulls out the dominant color not shown
Weird.
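Since the dominant-color function isn't shown, here's a minimal sketch of how such a helper could work. Get-DominantColor is a hypothetical name, the sample data is made up, and the tie-breaking rule (first channel in Red, Green, Blue order wins) is my assumption.

```powershell
# Hypothetical helper: returns the name of the largest channel.
function Get-DominantColor ($item) {
    # Ties go to whichever channel comes first in Red, Green, Blue order.
    $name = 'Red'
    $max = $item.Red
    if ($item.Green -gt $max) { $name = 'Green'; $max = $item.Green }
    if ($item.Blue -gt $max)  { $name = 'Blue' }
    $name
}

# Hypothetical sample data shaped like the Average-Images output.
$sample = @(
    [pscustomobject]@{ Filename = 'sunset.jpg'; Red = 200; Green = 90;  Blue = 60  }
    [pscustomobject]@{ Filename = 'forest.jpg'; Red = 40;  Green = 120; Blue = 50  }
    [pscustomobject]@{ Filename = 'ocean.jpg';  Red = 30;  Green = 80;  Blue = 170 }
    [pscustomobject]@{ Filename = 'brick.jpg';  Red = 150; Green = 70;  Blue = 60  }
)

# Group by the computed dominant color and keep just the name and the tally.
$counts = $sample | Group-Object { Get-DominantColor $_ } | Select-Object Name, Count
$counts
```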
This is the pattern: we'll ask a burning question, we'll form this question as a PowerShell pipeline, and we'll see the results.
Question: can we see the images in order of "redness"?
Pipeline:
$a | sort red | % { $_.filename }
Results:
Least red:
Most red:
Summary: okay, that makes sense. We used a naive algorithm that simply counted the red value, meaning that a pure black image or a pure blue image would have the "least redness" and a pure white image would have as much "redness" as a pure red image. Hmm, we can fix this. Onwards!
Question: Okay, so we're looking for redness. Let's call this proportional redness. Hmm, here we go:
$relativelyRed = $a | select filename, @{Name="redness"; expression={$_.red / ($_.red+$_.green+$_.blue) }}
$relativelyRed | sort redness | % { $_.filename }
Summary: now that's more like it. Our earlier naive results were instructive, but this is more what I was looking for.
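A quick worked example of the proportional measure, pulled into a function so it can be tried on single colors. Get-Redness is a hypothetical name, and treating pure black as zero redness is my assumption; the pipeline above has no such guard, and its denominator is 0 for a pure black average.

```powershell
# Hypothetical helper: red's share of the three channel values.
function Get-Redness ($red, $green, $blue) {
    $total = $red + $green + $blue
    # Assumption: pure black has no defined share, so call it 0.
    if ($total -eq 0) { return 0 }
    $red / $total
}

$white   = Get-Redness 255 255 255   # red is one third of the total
$pureRed = Get-Redness 255 0 0       # red is the whole total
$black   = Get-Redness 0 0 0         # guarded case
```

White now scores a third while pure red scores 1, which is exactly the separation the naive sort couldn't make.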
Question: okay, so let's stop messing with redness. Instead, let's find out what images have the most variance between the colors. We're less interested in the white-gray-gray-gray-black spectrum, and are looking for more colorful images. Let's do this:
$variance = $a | select filename, @{Name="Variance"; Expression={
    $avg = ($_.red+$_.green+$_.blue)/3
    $var = [math]::Abs($_.red-$avg) + [math]::abs($_.green-$avg) + [math]::abs($_.blue-$avg)
    $var
} }
make-html -fullfilenames ($variance | sort variance | % { $_.filename }) -resultingFilename "variance"
Most balanced:
Most variance:
Summary: most interesting, besides a grouping of the "grayish" and "black and white" images all together, is the smattering of images that have color, but are so perfectly balanced they're nestled right in there with the pure black-and-white images. Neat.
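Why do colorful images land among the grays? The measure only ever sees the average color, so opposite colors can cancel out. Here's a sketch using a hypothetical Get-ChannelSpread function (strictly speaking it's a sum of absolute deviations, not a statistical variance) and a made-up image that is half pure red and half cyan.

```powershell
# Hypothetical helper: same arithmetic as the "Variance" expression above.
function Get-ChannelSpread ($red, $green, $blue) {
    $avg = ($red + $green + $blue) / 3
    [math]::Abs($red - $avg) + [math]::Abs($green - $avg) + [math]::Abs($blue - $avg)
}

$gray    = Get-ChannelSpread 128 128 128   # all channels equal the mean
$pureRed = Get-ChannelSpread 255 0 0       # mean is 85: 170 + 85 + 85

# Half (255,0,0) and half (0,255,255) averages to the same value in every
# channel; Floor mimics the truncating integer division in the C# code.
$half  = [math]::Floor(255 / 2)
$mixed = Get-ChannelSpread $half $half $half
```

The mixed image is about as colorful as it gets pixel by pixel, yet it scores the same as flat gray.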
This post is already too long. There's not too much else to say, besides a) stuff is awesome, and b) with the aid of either PowerShell functions or .NET library calls, you can do some complex things. If you only remember one thing from this post, try and pick up the impression I'm trying to leave. This is how I see PowerShell: it's an experimental playground where I morph a thought, an idea, slowly into something workable, and in each step along the way, I'm getting feedback and refining, and in the end, I've satisfied my curiosity. Maybe it's something as useless as basic image analysis using System.Drawing.
Incidentally, if you want to see how the professionals do this kind of thing, check out Multicolr - a color search engine indexing 10 million Flickr pictures, which makes the stuff I did above kind of pitiful looking :) When I checked last, the Multicolr site was slow, otherwise it's neat; check it out.
Disclaimer The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
© Copyright 2010, Peter Seale