Loving Data and Everything Else: Summation Trick for Object Recognition

In this article I want to explain a feature of the blob detection I built.

I've been working on my autonomous boat project and what I needed was a way to detect where the boat was and where it was going. I chose to put bright stickers (green and blue) on the boat. Here's a nice picture of me testing out the detection algorithm.

[Image: the program recognizes a green and a blue spot on the paper]

Binary images

The beginning of finding these objects is simple. The video images from the webcam have three color channels (red, green and blue). So the first step in this process is to look at each pixel and decide whether it is particularly green (I'll talk about green, but you can replace it with blue and you get the same story). If it is, we assign a 1 to that pixel, and otherwise a 0, so we get a binary image.
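A minimal Python sketch of this thresholding step. The post doesn't say how "particularly green" is decided, so the rule below (the green channel must beat both red and blue by a hypothetical `margin`) is my assumption, as is representing a frame as rows of (red, green, blue) tuples instead of a Processing image.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def green_mask(image: List[List[Pixel]], margin: int = 40) -> List[List[int]]:
    """Return a binary image: 1 where a pixel counts as 'green', 0 elsewhere.

    Assumed rule (not from the original post): a pixel is 'green' when its
    green channel exceeds both red and blue by at least `margin`.
    """
    return [
        [1 if g - max(r, b) >= margin else 0 for (r, g, b) in row]
        for row in image
    ]


if __name__ == "__main__":
    frame = [
        [(200, 40, 30), (30, 180, 40), (35, 190, 50)],
        [(20, 20, 200), (40, 170, 60), (120, 130, 125)],
    ]
    print(green_mask(frame))  # [[0, 1, 1], [0, 1, 0]]
```

The grayish pixel (120, 130, 125) is slightly green but falls below the margin, which is the kind of decision the threshold is there to make.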

There is a good chance that some pixels are green but don't belong to the sticker I'm looking for. What we do know is that the middle of the sticker is probably surrounded by a lot of green pixels.

Think of it this way: if I add up all the pixels in my binary (1's and 0's) image around some pixel, I'll know whether I'm looking at a green area or not. If the sum is small, it was just a stray green pixel; if the sum is really high, I've found a really green area. I'll find the pixel with the most green neighbors and decide that that pixel is the middle of the sticker (or close enough).
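The direct approach, sketched in Python on a mask of 0s and 1s stored as nested lists. The helper names `neighborhood_sum` and `greenest_block` are hypothetical, and for simplicity each window is anchored at its top-left pixel, with its approximate middle reported as the result.

```python
from typing import List, Tuple


def neighborhood_sum(mask: List[List[int]], row: int, col: int, k: int) -> int:
    """Add up the k-by-k block of mask whose top-left pixel is (row, col),
    touching every pixel: k*k numbers per position."""
    return sum(mask[r][c] for r in range(row, row + k) for c in range(col, col + k))


def greenest_block(mask: List[List[int]], k: int) -> Tuple[int, int, int]:
    """Slide a k-by-k window over the whole mask and return
    (best_sum, middle_row, middle_col) for the window with the most 1's."""
    height, width = len(mask), len(mask[0])
    best = (-1, 0, 0)
    for row in range(height - k + 1):
        for col in range(width - k + 1):
            total = neighborhood_sum(mask, row, col, k)
            if total > best[0]:
                # approximate middle; for even k there is no exact center pixel
                best = (total, row + k // 2, col + k // 2)
    return best


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 1],  # the lone 1 on the right is a stray green pixel
        [0, 0, 0, 0, 0],
    ]
    print(greenest_block(mask, 2))  # (4, 2, 2)
```

Every window costs k*k additions, and that per-position cost is what makes this approach heavy on full-size frames.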

Sounds easy, but it felt like computing all those sums could be too heavy for real-time video. To estimate how many calculations that is, say we have an 800 by 600 video image (small) and we look at an area of 10 by 10 pixels around every pixel (small again). That means roughly 800*600*10*10 = 48 million pixel additions per frame!

I don't know if Processing can handle that at 30 frames per second, but I didn't want to try when I already knew a way to speed it up by a factor of 25. Not possible?

Less is more

I'll explain the method I used in one dimension first before I explain the 2D image method.

Imagine you have a vector of numbers, e.g. v = [2, 5, 7, 1, 3, 2, 8, 6, 7, 2, ...], and you want the sum of 10 consecutive numbers in this vector, for every position.

To keep it readable, I'll refer to the numbers by their positions: 1, 2, 3, etc.

So the sums I am looking for are

s = [1+2+3+4+5+6+7+8+9+10, 2+3+4+5+6+7+8+9+10+11, ...].

Instead of computing these directly, we take an intermediate step. We construct the following running-sum vector w:

w = [1, 1+2, 1+2+3, 1+2+3+4, ..., 1+2+...+10, 1+2+...+11, ...]

As you can see, the first sum we are looking for is already in position ten of this new vector. In position eleven we find 1+2+3+4+5+6+7+8+9+10+11, which is almost the second sum we are looking for. We just have to subtract 1, which is found in position one of our new vector.

Last example: in position twelve we find 1+2+3+4+5+6+7+8+9+10+11+12, where the 1+2 is too much. That is exactly the second element of w.

As you hopefully understand now, you only have to look up two numbers in the intermediate vector and subtract them to find the sum of ten numbers. How much does that save in total?

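Building the intermediate vector means adding two numbers for every element, so 2*n numbers in total.

In Octave syntax, with v the input vector:

n = numel(v);
w = zeros(1, n);
w(1) = v(1);
for i = 2:n
  w(i) = w(i-1) + v(i);   % w(i) = v(1) + ... + v(i)
endfor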

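After that, the summed vector s takes another two numbers per position (one subtraction), another 2*n in total. The last full window of ten starts at position n-9:

s = zeros(1, n-9);
s(1) = w(10);
for i = 2:n-9
  s(i) = w(i+9) - w(i-1);   % s(i) = v(i) + ... + v(i+9)
endfor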

So that brings the total to 4*n numbers that need to be added or subtracted. Done the normal way, it would have been 10*n: ten numbers for every position.

The more numbers each sum covers, the more you save, because the trick stays at 4*n however wide the window is.
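The same two steps in Python, as a sketch rather than the post's actual code. Python lists are 0-based, so this version puts an extra 0 in front of the running sums, which removes the special case for the first window. The direct version is included only to check the result, and the last two numbers in the example vector are arbitrary values added for illustration.

```python
from itertools import accumulate
from typing import List


def window_sums(v: List[int], k: int) -> List[int]:
    """Sum of every run of k consecutive numbers in v, via a running-sum vector.

    w[i] holds v[0] + ... + v[i-1]; the leading 0 means the first window
    needs no special case (unlike s(1) = w(10) in the 1-based version).
    """
    w = [0] + list(accumulate(v))  # one addition per element
    return [w[i + k] - w[i] for i in range(len(v) - k + 1)]  # one subtraction each


def window_sums_direct(v: List[int], k: int) -> List[int]:
    """The direct approach: k numbers added for every position."""
    return [sum(v[i:i + k]) for i in range(len(v) - k + 1)]


if __name__ == "__main__":
    v = [2, 5, 7, 1, 3, 2, 8, 6, 7, 2, 4, 9]  # last two values are arbitrary
    print(window_sums(v, 10))  # [43, 45, 49]
    assert window_sums(v, 10) == window_sums_direct(v, 10)
```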

More of that

But what about 2D? We want the sum over a whole area around a pixel. Again we take an intermediate step to reduce the number of additions. I usually think of an image as a big matrix M, so imagine one where every position has a number as its name:

M = [  1,   2,   3, ..., 100
     101, 102, 103, ..., 200
     201, 202, 203, ..., 300
     ... ]

In this example the rows are 100 elements long, but that doesn't matter. We'll call the new intermediate matrix N, and its first row will look very familiar by now:

[1, 1+2, 1+2+3, 1+2+3+4, ....]

The second row is filled as follows: [1+101, 1+2+101+102, 1+2+3+101+102+103, ...]. We again add up the elements of the second row of M one by one, but we also add the element that lies directly above in N. If we continue like this, the very last element of N holds the sum of all the elements of M.
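A Python sketch of how N is filled, following the rule above: a running sum along each row of M, plus the element directly above in N. This structure is commonly called a summed-area table or integral image. The function name and the 0-based indexing are my choices, not the post's.

```python
from typing import List


def summed_area_table(m: List[List[int]]) -> List[List[int]]:
    """Build N from M so that N[i][j] is the sum of all M[r][c] with r <= i
    and c <= j (0-based indices)."""
    n = [[0] * len(m[0]) for _ in m]
    for i, row in enumerate(m):
        running = 0  # running sum along row i of M
        for j, value in enumerate(row):
            running += value
            above = n[i - 1][j] if i > 0 else 0  # element directly above in N
            n[i][j] = running + above
    return n


if __name__ == "__main__":
    corner = [[1, 2, 3], [101, 102, 103], [201, 202, 203]]
    for row in summed_area_table(corner):
        print(row)
    # [1, 3, 6]
    # [102, 206, 312]
    # [303, 609, 918]
```

Using only the top-left 3 by 3 corner of M is enough here, because each element of N depends only on the elements of M above and to the left of it.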

One example should show how easy it is to find the sum of an area now. Say we want the sum of the elements in squares 102, 103, 202 and 203. This small example doesn't save any work, but it is easier to understand.

Let's look at the third element in the third row of N. It contains the four numbers we are looking for, but also a few too many: 1, 2, 3, 101 and 201. The first three are removed by subtracting the square in blue (the third element of the first row of N, 1+2+3). 101 and 201 are removed by subtracting the square in purple (the first element of the third row of N, 1+101+201). You see that we subtract 1 twice, but we can correct this by adding the red square (the first element of N, which is just 1).
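Putting numbers on the example above, writing N(i,j) for the element in row i, column j of N (1-based):

```latex
\begin{aligned}
N(3,3) &= 1+2+3+101+102+103+201+202+203 = 918\\
N(1,3) &= 1+2+3 = 6 \quad\text{(blue)}\\
N(3,1) &= 1+101+201 = 303 \quad\text{(purple)}\\
N(1,1) &= 1 \quad\text{(red)}\\
N(3,3) - N(1,3) - N(3,1) + N(1,1) &= 918 - 6 - 303 + 1 = 610\\
&= 102+103+202+203
\end{aligned}
```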

OK, in this example we combine four numbers from N when we could just as well have added the four numbers directly.

But if you wanted the sum of a ten by ten area, the calculation would still involve only four numbers instead of a hundred (tadaa, a factor of 25!!).

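For a position i,j in an image and an area of n by n pixels, writing N(i,j) for the element in row i, column j of N, taking (i,j) as the bottom-right pixel of the area, and treating N as 0 in row 0 and column 0, the sum becomes:

$$S(i,j) = N(i,j) - N(i-n,j) - N(i,j-n) + N(i-n,j-n)$$

For i = j = 3 and n = 2 this is exactly the example above.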
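A Python sketch of the whole idea in 2D: build N once per frame, then get every n by n area sum from four lookups and keep the largest. Assumptions: N gets an extra leading row and column of zeros so the lookups never fall outside it, (i, j) is the 1-based bottom-right pixel of the area, and the function names are hypothetical. This is not the Processing code from the boat project.

```python
from itertools import accumulate
from typing import List, Tuple


def integral_image(mask: List[List[int]]) -> List[List[int]]:
    """Build N with an extra leading row and column of zeros, so that
    table[i][j] is the sum of rows 1..i and columns 1..j of mask (1-based)."""
    width = len(mask[0])
    table = [[0] * (width + 1)]
    for row in mask:
        running = [0] + list(accumulate(row))  # running sum along this row
        # add the row of N directly above
        table.append([r + above for r, above in zip(running, table[-1])])
    return table


def area_sum(table: List[List[int]], i: int, j: int, size: int) -> int:
    """Sum of the size-by-size area whose bottom-right pixel is (i, j), 1-based.
    Always four lookups, however large size is."""
    return (table[i][j] - table[i - size][j]
            - table[i][j - size] + table[i - size][j - size])


def greenest_area(mask: List[List[int]], size: int) -> Tuple[int, int, int]:
    """Return (best_sum, i, j) for the size-by-size area with the most 1's."""
    table = integral_image(mask)
    best = (-1, 0, 0)
    for i in range(size, len(mask) + 1):
        for j in range(size, len(mask[0]) + 1):
            total = area_sum(table, i, j, size)
            if total > best[0]:
                best = (total, i, j)
    return best


if __name__ == "__main__":
    # The worked example: top-left corner of M, area {102, 103, 202, 203}
    corner = integral_image([[1, 2, 3], [101, 102, 103], [201, 202, 203]])
    print(area_sum(corner, 3, 3, 2))  # 610

    mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 0],
    ]
    print(greenest_area(mask, 2))  # (4, 3, 3)
```

To get the middle of the sticker, shift the winning (i, j) back by about size // 2 in each direction. The work per position no longer depends on `size`, which is where the factor of 25 for a 10 by 10 area comes from.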

For me it's an old trick, but maybe someone finds it useful. I still find it satisfying when I find an opportunity to use it.

Monday, June 23, 2014
