Tuesday, October 28, 2008 1:52:14 AM UTC #

Yes, I'm aware it's late in the year 2008, I'm aware this stuff isn't as fresh as WPF 3D or Ruby Processing.

As I've posted earlier, I've accrued some treasured junk. Now that I have all this junk, what am I to do? Well, um…I didn't really know either.

So I started messing around.

Messing around with System.Drawing: first, infrastructure

The first thing I did was to determine the average color for a single image. I'm not sure exactly where I'm going, but I figure, hey, if you want to get a rough "picture" of what an image looks like, it's not a bad idea to look at the average color value. And we're using the RGB breakdown for color, meaning white is #FFFFFF (256,256,256), black is #000000 (0,0,0), and everything else falls in between.

Note that in my case, performance is not a big deal; I'm doing all these calculations one pixel at a time which, as you might image, is suboptimal. Mostly a straightforward operation:

public static Color Average(Image image)
{
    using (Bitmap bitmap = new Bitmap(image))
    {
        int red, green, blue;
        long redRunningSum = 0, greenRunningSum = 0, blueRunningSum = 0;
        long numPixels = bitmap.Width * bitmap.Height;

        foreach (Color pixelColor in ImageHelper.GetPixelsFor(bitmap))
        {
            redRunningSum += pixelColor.R;
            blueRunningSum += pixelColor.B;
            greenRunningSum += pixelColor.G;
        }

        red = (int)(redRunningSum / numPixels);
        green = (int)(greenRunningSum / numPixels);
        blue = (int)(blueRunningSum / numPixels);

        return Color.FromArgb(red, green, blue);
    }
}

Ok, so why do we care—it's a function, right? Well, okay, yes—but here's a PowerShell function you may also find interesting:

function Average-Images ($filenames)
{
    [void][reflection.assembly]::Loadfile("C:\a\sandbox\ImgTest\bin\Debug\ImgTest.dll")
    $i = 1
    $total = $filenames.count
    $results = @()
    foreach ($filename in $filenames)
    {
        write-host "$i - $($i*100/$total)%- $($filename)"
        $i++
        $img = [System.Drawing.Image]::FromFile($filename)
        $o = new-object PSObject
        $avg = [ImgTest.ImageHelper]::Average($img)
        add-member -inp $o -membertype "NoteProperty" -name "Filename" -value $filename
        add-member -inp $o -membertype "NoteProperty" -name "Image" -value $img
        add-member -inp $o -membertype "NoteProperty" -name "Red" -value $avg.R
        add-member -inp $o -membertype "NoteProperty" -name "Green" -value $avg.G
        add-member -inp $o -membertype "NoteProperty" -name "Blue" -value $avg.B
        $results += $o
    }
    $results
}

So. This is getting interesting. What the "Average-Images" function above does is create a custom object with some useful properties: we've got the original filename, we've got a still-breathing reference to the System.Drawing.Image object, and we're storing the "average pixel's" red, green, blue values as individual properties. The resulting objects look something like this:
image

Maybe it's still not interesting for you. That's fine, 'cause this party's* just getting started!
*despite what I've just written, this is not a party

I have one more piece of "infrastructure" to explain, before we can get cooking: I've created a PowerShell function called "Make-Html," which creates a permanent HTML file listing all the images I want to see, in the order I want to see them. As an added bonus, the function immediately launches the newly-created file in my browser. Here's the code:

$startDir = "C:\a\ps1\scrape\"
function Make-Html ($fullfilenames, $resultingFilename)
{
    $files = $fullFilenames | % { $_.split("\")[-1] }
    $tags = $files | % { "<div style=""float:left;""><img src=""$_""/></div>" }
    $html = @"
<html><head><title>$($resultingFilename)</title>
</head><body>
$($tags)
</body></html>
"@

    $html > "$($startDir)$($resultingFilename).html"
    ii "$($startDir)$($resultingFilename).html"
}

Ok, I know, we're still not doing anything.

Let's warm up

Okay, as I say to everyone, the real power of PowerShell is its object piping. PowerShell pipes objects, not text; this is something best seen, not heard, and hopefully we'll see a little something today. The objects we'll be slinging through the pipeline today are, as mentioned above, custom objects that have a Filename, an Image, and the RGB values representing the image's average (mean?) color.

So, let's count how many items we have:
image

Awesome. Let's count how many items we have that are more red than any other color:
image

Hmm, that was unexpected, 359 red-dominant images out of 503, that's proportionally huge. I'll point out that I did some extra fanciness to get this count to evaluate on one line, but usually (i.e. when I'm not posting to my blog) I'll work my way in parts, not all at once. So the same thing, split out, would be:
image

That's more realistic.

Okay, one more thing before we go. Finding out most of my pictures are red-dominant has me wondering: what about the other two? Let's work with the objects a little* to massage the answer out of them:
*a lot; ugly function that pulls out the dominant color not shown
image 

Weird.

Skipping ahead to the end

This is the pattern: we'll ask a burning question, we'll form this question as a PowerShell pipeline, and we'll see the results.

Question: can we see the images in order of "redness"?

Pipeline:

$a | sort red | % { $_.filename }

Results:

Least red:

20080707024010
200549766 

Most red:

20080524165059
439193020

Summary: okay, that makes sense. We used a naive algorithm that simply counted the red value, meaning that a pure black image or a pure blue image would have the "least redness" and a pure white image would have as much "redness" as a pure red image. Hmm, we can fix this. Onwards!



Question: Okay, so we're looking for redness. Let's call this proportional redness. Hmm, here we go:

Pipeline:

$relativelyRed = $a | select filename, @{Name="redness"; expression={$_.red / ($_.red+$_.green+$_.blue) }}
$relativelyRed | sort redness | % { $_.filename }

Results:

Least red:

883437048
20080327175323

Most red:

899605173
20080418093945
20080524164842

Summary: now that's more like it. Our earlier naive results were instructive, but this is more what I was looking for.



Question: okay, so let's stop messing with redness. Instead, let's find out what images have the most variance between the colors. We're less interested in the white-gray-gray-gray-black spectrum, and are looking for more colorful images. Let's do this:

Pipeline:

$variance = $a | select filename, @{Name="Variance"; Expression={$avg = ($_.red+$_.green+$_.blue)/3; $var = [math]::Abs($_.red-$avg) + [math]::abs($_.green-$avg)+ [math]::abs($_.blue-$avg); $var} }
make-html -fullfilenames ($variance | sort variance | % { $_.filename }) -resultingFilename "variance"

Results:

Most balanced:

783914459
281592264
741444854

Most variance:

20080831161327
20080701085401
787193910
307907780

Summary: most interesting, besides a grouping of the "grayish" and "black and white" images all together, is the smattering of images that have color, but are so perfectly balanced they're nestled right in there with the pure black-and-white images. Neat.

Final bits

This post is already too long. There's not too much else to say, besides a) stuff is awesome, and b) with the aid of either PowerShell functions or .NET library calls, you can do some complex things. If you only remember one thing from this post, try and pick up the impression I'm trying to leave. This is how I see PowerShell: it's an experimental playground where I morph a thought, an idea, slowly into something workable, and in each step along the way, I'm getting feedback and refining, and in the end, I've satisfied my curiousity. Maybe it's something as useless as basic image analysis using System.Drawing.

Incidentally, if you want to see how the professionals do this kind of thing, check out Multicolr - an color search engine indexing 10 million Flickr pictures, which makes the stuff I did above kind of pitiful looking :) When I checked last, the Multicolr site was slow, otherwise it's neat; check it out.

Categories: Awesomeness | PowerShell
Technorati:  | 
Tuesday, October 28, 2008 1:52:14 AM UTC  #     |  Comments [1]  |  Trackback
Wednesday, January 21, 2009 4:10:38 PM UTC
Interesting post Peter.

As I was reading this I was thinking about pornography and how we can better manage it and keep it out of the hands of children. You might have heard about the flesh filter. Those formulas toward the end of the post seemed earily familiar. How hard would it be to devise a formula to sort/identify your images based on a set of palettes. Maybe you could devise your own "flesh filter". It would need to identify specific RGB values and then it would need to do some type of surface area profiling to count "how much" flesh tones in contiguous blocks are in an image. Just a thought.
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, b, blockquote@cite, em, i, strike, strong, sub, sup, u) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

Enter the code shown (prevents robots):

Live Comment Preview
Syndication

Search
Posts on this page
Categories
Sites I visit regularly
About

Powered by: newtelligence dasBlog 2.2.8279.16125

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010, Peter Seale

Send mail to the author(s) E-mail



Sign In