Aphi 186 Image processing blog ni Josh: August 2013

Thursday, August 29, 2013

Musical Blobs. A12 Playing Notes by Image Processing

Who would have thought that image processing is not just for images but also for music? Well, humans can read notes, process them in their brains, and sing them or play them on an instrument, so why can't a computer do the same?

First, a music sheet is required. The song choice has something to do with my dream of wanting to learn to play the piano. I've already downloaded a few music sheets before and this song, 'Fireflies' by Owl City came to mind. I love this song and I hope you like it too :)
http://www.youtube.com/watch?v=psuRGfAaju4

Actually, the activity requires a simple music sheet. Being confident with my musical and 186 skills, I wanted to challenge myself. I also didn't want to play the usual nursery rhymes that my batch mates would probably have chosen such as the song about Mary's lamb or a twinkling star. No, I wanted to play a song about fireflies. Haha. Just for variety.

The main part of the activity was to determine the tones denoted by the notes in the song, which is where the image processing comes in. The process can be outlined in three steps, each shown by one of the images in the figure below.

I simply loaded the image of the music sheet into Scilab and cropped the part that I would be using, which is the first line of the song. For convenience, I cropped out the G-clef already but unfortunately couldn't remove the 'arranged by'. Thank you, Mr. Luke Erickson. I converted the loaded image to binary using im2bw with a threshold of 0.8. After some trial and error, I found that a threshold of 0.8 kept most of the required information without adding extra artifacts. The result is the first line in the figure above.

Then came the second step, which I actually forgot at first: inverting the image. I forgot that the morphological operators work on a white object on a black background. This was easily done by taking the negation of the boolean matrix. In Scilab, the syntax is simply ~M. The result is the second line of the figure above.

Finally, the third line in the figure is the result of applying OpenImage with a circular structuring element. Again, the amazing method of Trial And Error (TAE) was used to find the best size for the circle. The choice of a circle was pretty much common sense: we want to retain the circular elements in the image, since these denote which note is to be played.

With the help of SearchBlobs, each note was identified by finding its location in space, specifically the y pixel where it sits. Since each staff line and space denotes a single tone, and since the lines are horizontal, the y pixel can be converted directly to a tone. This is where the musical side of the activity begins.
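The y-pixel-to-tone conversion can be sketched roughly as below. This is a minimal Python sketch, not the Scilab code used for the activity. The function name and the staff geometry (`bottom_line_y`, `line_spacing`) are hypothetical, and it assumes a treble (G-clef) staff while ignoring the key signature and accidentals.

```python
# Letter names in order; on a staff, moving up by half the line spacing
# (line -> space -> line) moves up by one letter.
LETTERS = ["C", "D", "E", "F", "G", "A", "B"]

def y_to_note(y, bottom_line_y, line_spacing):
    """Map a note head's y pixel to a note name on a treble staff.

    bottom_line_y: y pixel of the lowest staff line (E4 on a treble staff).
    line_spacing:  pixel distance between two adjacent staff lines.
    Image y grows downward, so a smaller y means a higher pitch.
    """
    half_spacing = line_spacing / 2.0
    steps = round((bottom_line_y - y) / half_spacing)
    index = 4 * 7 + 2 + steps          # E4 is letter index 2 in octave 4
    octave, letter = divmod(index, 7)
    return f"{LETTERS[letter]}{octave}"

# Hypothetical geometry: bottom line at y = 100, lines 10 px apart.
notes = [y_to_note(y, bottom_line_y=100, line_spacing=10) for y in (100, 90, 80)]
# notes == ['E4', 'G4', 'B4'], the bottom three staff lines
```

In practice the blob centroid will not sit exactly on a line or space, which is why the sketch rounds to the nearest half spacing.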

Note	Frequency (Hz)	Wavelength (cm)
G♯₂/A♭₂	103.83	332
A₂	110.00	314
A♯₂/B♭₂	116.54	296
B₂	123.47	279
C₃	130.81	264
C♯₃/D♭₃	138.59	249
D₃	146.83	235
D♯₃/E♭₃	155.56	222
E₃	164.81	209
F₃	174.61	198
F♯₃/G♭₃	185.00	186
G₃	196.00	176
G♯₃/A♭₃	207.65	166
A₃	220.00	157
A♯₃/B♭₃	233.08	148
B₃	246.94	140
C₄	261.63	132
C♯₄/D♭₄	277.18	124
D₄	293.66	117
D♯₄/E♭₄	311.13	111
E₄	329.63	105
F₄	349.23	98.8
F♯₄/G♭₄	369.99	93.2
G₄	392.00	88.0
G♯₄/A♭₄	415.30	83.1
A₄	440.00	78.4
A♯₄/B♭₄	466.16	74.0
B₄	493.88	69.9
C₅	523.25	65.9
C♯₅/D♭₅	554.37	62.2
D₅	587.33	58.7
D♯₅/E♭₅	622.25	55.4
E₅	659.26	52.3
F₅	698.46	49.4
F♯₅/G♭₅	739.99	46.6
G₅	783.99	44.0
G♯₅/A♭₅	830.61	41.5
A₅	880.00	39.2
A♯₅/B♭₅	932.33	37.0

http://www.phy.mtu.edu/~suits/notefreqs.html

Although humans can easily understand notes, computers are not as musical. We have an idea of what a middle C sounds like and can reproduce the whole scale after hearing just one reference note, or by playing it on an instrument. Some people even have perfect pitch and need no reference at all. The computer, on the other hand, speaks a more Physics language: frequency. Each note, assuming we're not tone deaf, corresponds to a certain frequency of sound. So the computer determines the y-position of each note while the human determines the note using his music skills. Then the human uses a conversion table, specifically the one above, to give the computer a basis for transforming note position to frequency. After all the notes are identified and converted to Hz, each frequency is fed into a sine wave and the result is played using the sound() function.
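As an illustration of the note-to-sine-wave step, here is a minimal Python sketch (the activity itself was done in Scilab). The sample rate, the note duration, and the three-note melody are arbitrary assumptions for the example; the frequencies are copied from the table above.

```python
import math

SAMPLE_RATE = 8192  # samples per second; an arbitrary choice for this sketch

# A few entries from the note/frequency table above (Hz).
NOTE_FREQ = {"C4": 261.63, "E4": 329.63, "G4": 392.00, "A4": 440.00}

def tone(freq_hz, duration_s, rate=SAMPLE_RATE):
    """Return samples of sin(2*pi*f*t) for the given duration."""
    n = int(duration_s * rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / rate) for i in range(n)]

def melody(note_names, note_s=0.25, rate=SAMPLE_RATE):
    """Concatenate one sine burst per note, all with the same duration."""
    samples = []
    for name in note_names:
        samples.extend(tone(NOTE_FREQ[name], note_s, rate))
    return samples

song = melody(["C4", "E4", "G4"])  # made-up melody, not the actual piece
```

The resulting list plays the notes back to back; handing it to an audio routine is what sound() does on the Scilab side.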

I guess it would also be necessary to discuss some of the cheats I used to make this activity simpler. As you may have noticed, the transformation from the original image to the circles-only image was not perfect. Here is the image again so you won't have to scroll so far.

I got all the notes in the G-clef. However, there is one extra circle in the second measure that comes from the bar in one of the notes. Also, there is an extraneous 'note' that comes from the person who arranged this piece, the u from Mr. Luke. These were easily removed by disregarding these fake notes.

Then, we have a bigger problem. The F-clef contains 3/4 notes which are not shaded. These were not included in the final image because the structuring element used was a solid circle. This was remedied by simply adding their corresponding frequencies by hand to the array containing all the frequencies.

The F-clef also adds another problem. I have yet to play more than one note at the same time. Piano teachers would kill me. I cheated by simply playing the notes individually. The result still sounds good since there are rests in the G-clef when there are notes in the F-clef. The downside, however, is that there is no background sound of the chord. An idea is to add two sine waves with different frequencies together. Will it give the same effect? I'm not sure since I haven't tried it yet. I think I can do it if I find I have more time. This would be an interesting extension of this activity.

Finally, I was lucky to find a piece that contains only notes of the same duration, 1/8. Actually, there are 3/4 notes but I just converted them to 1/8 as well. Haha. Having notes of different durations would require additional image processing to identify not only the tone but also the duration. But so far, I would like to believe that I have put in effort beyond what was expected by using a not so easy piece, to some degree. Sound quality was slightly improved by adding short pauses between notes, simply by inserting a few zeros into the final matrix containing the sine waves. I would like to believe that I deserve a 10 for this activity.
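The two ideas in this section, summing sine waves for a chord and padding zeros between notes, could look something like the sketch below. It is written in Python rather than Scilab and was not part of the activity; the sample rate, durations, and note choices are assumptions, and averaging the waves (rather than plainly adding them) is my own choice to keep the samples within [-1, 1].

```python
import math

RATE = 8192  # samples per second; arbitrary for this sketch

def tone(freq_hz, duration_s, rate=RATE):
    n = int(duration_s * rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / rate) for i in range(n)]

def chord(freqs, duration_s, rate=RATE):
    """Average several sine waves sample by sample (sum divided by count)."""
    waves = [tone(f, duration_s, rate) for f in freqs]
    return [sum(column) / len(waves) for column in zip(*waves)]

def with_pauses(segments, pause_s=0.02, rate=RATE):
    """Join segments, appending a short run of zeros (silence) after each."""
    silence = [0.0] * int(pause_s * rate)
    out = []
    for segment in segments:
        out.extend(segment)
        out.extend(silence)
    return out

# C4 and G4 sounded together, then E4 alone (frequencies from the table).
mixed = with_pauses([chord([261.63, 392.00], 0.25), tone(329.63, 0.25)])
```

Whether the summed waves actually sound like a piano chord is a separate question; pure sines have no overtones, so the result will be thinner than the real instrument.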

Recent update.
Tor was very smart to suggest to me to upload the resulting sound.
It was easily done by using the 'savewave' function.
Enjoy :)

https://dl.dropboxusercontent.com/u/19434317/5th%20year/186/a12%20notes/sound.wav
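For readers without Scilab, saving samples to a WAV file can be sketched with Python's standard `wave` module. This is a stand-in for savewave, not the call used for the file above; the helper name `write_wav`, the 16-bit mono format, and the sample rate are assumptions.

```python
import io
import math
import struct
import wave

def write_wav(target, samples, rate=8192):
    """Write floats in [-1, 1] as 16-bit mono PCM to a path or file object."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(frames)

# One second of A4 (440 Hz), written to an in-memory buffer for the example.
samples = [math.sin(2.0 * math.pi * 440.0 * i / 8192) for i in range(8192)]
buffer = io.BytesIO()
write_wav(buffer, samples)
```

Passing a filename such as "sound.wav" instead of the buffer writes the file to disk.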

Monday, August 19, 2013

Cure for Cancer. A11, Application of Binary Analysis

Ok, this won't really cure cancer by itself, but this is an exercise that could potentially be useful in differentiating abnormal cells.

First, we study an image of a scan containing similarly sized circles of paper randomly distributed on a sheet of paper shown below. This symbolizes the normal cells.

We check the histogram to find the proper threshold for transforming it to a binary image.

From this, we find that most of the image has a value less than around 210. This would represent the darker background, while the values greater than 210 would represent the lighter circles. 210/255 is around 0.82, which is what we use as a starting point for our binary threshold. I tried out threshold values from 0.81 to 0.87 in steps of 0.02, shown below.
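The arithmetic behind the starting threshold, plus the thresholding itself, in a small Python sketch. The function names are hypothetical and `binarize` only mimics what im2bw does conceptually (pixels brighter than the threshold become white); it is not the Scilab implementation.

```python
def gray_level_to_threshold(level, max_level=255):
    """Convert an 8-bit gray level into a normalized threshold in [0, 1]."""
    return level / max_level

def binarize(gray, threshold):
    """gray: rows of values in [0, 1]. True where a pixel exceeds the threshold."""
    return [[value > threshold for value in row] for row in gray]

start = gray_level_to_threshold(210)                         # about 0.82
candidates = [round(0.81 + 0.02 * k, 2) for k in range(4)]   # 0.81 to 0.87

# Tiny made-up image: dark background (0.5) with two bright pixels (0.9).
tiny = [[0.5, 0.9], [0.9, 0.5]]
mask = binarize(tiny, start)
```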

We find that there is a field of artifacts on the right side of each image, which shrinks as we increase the threshold. However, as the threshold is increased, some parts of the circles are also eaten away. A simple use of the morphological operation openimage takes away the artifacts on the right. I tried two types of structuring elements and several sizes for both. This took a lot of time since I was trying to remove all the artifacts while keeping most of the circles complete. In the end, I decided to use 0.84 as my threshold, shown in the leftmost image. The image in the center is the result of using a square structuring element of size one, while the last uses the circle structuring element. I also kind of cheated a bit by simply cropping off the top of the image to remove the white line.

It appears that using a threshold of 0.84 and openimage with a circle structuring element of size one removes all the artifacts but keeps most of the circles complete. Next, searchblobs from IPD was used to identify the blobs. It assigns each group of connected white pixels an integer label, starting at one. Using this, 33 blobs were identified. However, a closer inspection shows that some of these blobs have zero pixels. I find this weird and address the problem by setting limits on the size of the real blobs. Indeed, there is a big spread in the sizes of the blobs identified initially, as shown in the histogram below.

This histogram gives us an idea of the relative abundance of blob sizes. We find that the majority is in the 500 range. The big spread is due to imperfections in the circles found. The small sizes are pixels that are probably unattached to the main circle. The larger blobs are connected or overlapping circles. We zoom in to the part of the histogram where most of the blobs are found.

Finally, I settle for a range from 300 to 600 shown in the histogram below.

In this range, the mean is 492.2 while the stdev is 56.
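The blob-size measurement can be sketched in plain Python as below. This is not how searchblobs is implemented; it is a generic flood-fill labeling under my own assumptions: 8-connectivity, and the sample standard deviation (the post does not say which convention produced the 56).

```python
from collections import deque
from statistics import mean, stdev

def blob_areas(mask):
    """Return the pixel count of every 8-connected foreground region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            areas.append(area)
    return areas

def size_stats(areas, low, high):
    """Mean and sample standard deviation of the areas within [low, high]."""
    kept = [a for a in areas if low <= a <= high]
    return mean(kept), stdev(kept)

# Made-up mask with three blobs of 4, 2 and 1 pixels.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
areas = blob_areas(mask)
m, d = size_stats(areas, low=2, high=4)   # drops the 1-pixel speck
```

On the real image the same idea applies with low = 300 and high = 600, the range settled on above.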
Next, I take a break before applying the information gained here to isolate cancer cells represented by abnormally bigger cells in the next image.

After a snack break, I apply the same settings to isolate each blob. I use im2bw with the same threshold of 0.84 and get the image below. Beside it is the result of applying the same cleaning method, namely openimage with the same circle structuring element of size 1. Note that, just like in the previous image, all the artifacts on the right side are cleaned out.

Using filterbysize as suggested by Floyd, I find the normal sized cells shown below. The lower bound used was the mean-stdev and the upper bound mean+stdev.

We find that the result only contains the perfect circles. The incomplete circles and the circles that are clumped together are not included. This calls for better cleaning of the images. Anyway, let's focus on the goal of identifying the abnormally big circles. Lucky for us, the abnormally big circles in the given image do not overlap with other circles. We simply change the limits of our filterbysize to find them. Now I try setting the lower bound to mean+stdev, while excluding circles that are clumped together by setting the upper bound to 2×mean-stdev.

The result was not perfect but it greatly reduced the number of circles present. It also only reveals 3/5 of the larger circles and still contained a few other 'normal' circles. Now, I try again finding the best limits of the size and find a result shown below.

Here we find 4/5 larger cells and only get one extra normal cell. I just hope the final grade isn't proportional to the number of right minus wrong circles. I used the limits m+d and 2m-d. The extra twin normal cells were included because the area they occupy is within the same range as that of the larger cells.
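To make the size limits concrete, here is a small Python sketch that plugs in the mean (492.2) and stdev (56) from the calibration image. The `classify` helper and the example areas are hypothetical; only the bounds come from the post.

```python
MEAN_AREA = 492.2  # mean blob area from the first image (pixels)
STD_AREA = 56.0    # standard deviation of the blob area (pixels)

def classify(area, m=MEAN_AREA, d=STD_AREA):
    """Sort a blob area into the two size windows used with filterbysize."""
    if m - d <= area <= m + d:
        return "normal"
    if m + d < area <= 2 * m - d:
        return "large"
    return "rejected"

normal_range = (MEAN_AREA - STD_AREA, MEAN_AREA + STD_AREA)     # about 436.2 to 548.2
large_range = (MEAN_AREA + STD_AREA, 2 * MEAN_AREA - STD_AREA)  # about 548.2 to 928.4

# Made-up areas: a speck, a normal cell, a big cell, two heavily
# overlapping normal cells, and two barely touching normal cells.
labels = [classify(a) for a in (100, 500, 700, 900, 1000)]
```

The 900-pixel case shows the weakness noted above: two normal cells that overlap enough have a combined area under 2m-d, so area alone cannot tell them apart from one large cell.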

Again, I spent a lot of time on cleaning the image but still wasn't able to separate overlapping circles. I believe that this caused the poor results. I give myself an 8 for effort.

Sunday, August 11, 2013

Magic wand. A9, Color segmentation

I'm not sure what magic wand you're thinking of but what I'm talking about here is the tool in Photoshop. That's the first thing I think about when I read about this activity.

This time, we use a simple image recycled from an activity in 187. This is a very nice image with two solid colors standing out namely the red spectrometer on top of my green apple notebook.

this kind of recycling does not save the environment

The parametric process basically samples a certain region of interest, which should more or less encompass the colors of the object of interest. We use a Gaussian distribution to determine the probability that each pixel has a color belonging to the region of interest. For the red spectrometer, we sample a part of it, shown below together with the resulting image when the Gaussian distribution is applied.

shiny red spectrometer

severely color blind

The resulting image is white for those that are similar in color to the region of interest. The lines on the right side are part of the orange wires from the original image. This means that it also contains some red values that are similar to our region of interest. We also notice that a small part on the right side of the spectrometer is black. This is because it has a slightly different color due to lighting probably which is beyond the reach of the distribution. However, the overall result is a good reproduction of the spectrometer.
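A minimal Python sketch of the parametric idea follows. The post does not show its Scilab code, so the details here are assumptions: colors are first converted to normalized chromaticity coordinates (r, g), independent Gaussians are fitted to the ROI's r and g values, and the two densities are multiplied. The small floor on sigma is my own guard against a perfectly uniform ROI.

```python
import math

def chromaticity(pixel):
    """Convert (R, G, B) to normalized (r, g) so that brightness drops out."""
    red, green, blue = pixel
    total = red + green + blue
    if total == 0:
        return 0.0, 0.0
    return red / total, green / total

def fit_gaussian(values):
    """Mean and standard deviation, with a floor to avoid dividing by zero."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, max(math.sqrt(var), 1e-3)

def gaussian(x, mu, sigma):
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / norm

def parametric_segment(image, roi_pixels):
    """Return p(r) * p(g) per pixel (density values, not scaled to [0, 1])."""
    rs, gs = zip(*(chromaticity(p) for p in roi_pixels))
    mu_r, sd_r = fit_gaussian(rs)
    mu_g, sd_g = fit_gaussian(gs)
    return [[gaussian(r, mu_r, sd_r) * gaussian(g, mu_g, sd_g)
             for r, g in map(chromaticity, row)] for row in image]

# Made-up data: a reddish ROI, then one reddish and one greenish test pixel.
roi = [(200, 20, 20), (210, 30, 25), (190, 25, 30)]
image = [[(205, 25, 25), (20, 200, 30)]]
prob = parametric_segment(image, roi)
```

Pixels whose chromaticity falls far from the ROI mean, like the differently lit patch on the spectrometer, land in the tails of the Gaussians and come out dark.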

We apply the same process but this time take a good portion of the notebook as a region of interest.

animo LaSalle!

prolly what you'll see if you had x-ray vision

Personally, I think this is another good result wherein the green notebook is identified almost completely. However, it also contains a few losses such as in the right side, again prolly because of a difference in lighting. Take note that the 'Green Apple' text is also black since it is already a different shade of green not included in our region of interest.

We also applied non-parametric segmentation as a comparison. This involves mapping the histogram of the region of interest and backprojecting the image with this histogram. I think all of us would agree that the process is much more tedious than the previous method since it involves mapping in a 2d histogram, which requires a loop within a loop and backprojecting which also requires the same process. It is a headache to code and for the computer as well since loops within loops take a lot of computer resources. Let's just see if the results justify the effort.
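The histogram backprojection described above can be sketched in Python as follows. As before, this is not the Scilab code from the activity: the use of normalized chromaticity (r, g), the 32-bin default, and the function names are assumptions made for the example.

```python
def chromaticity(pixel):
    """Convert (R, G, B) to normalized (r, g)."""
    red, green, blue = pixel
    total = red + green + blue
    if total == 0:
        return 0.0, 0.0
    return red / total, green / total

def bin_index(value, bins):
    """Map a value in [0, 1] to a bin number in 0..bins-1."""
    return min(int(value * bins), bins - 1)

def roi_histogram(roi_pixels, bins=32):
    """Normalized 2D histogram of the ROI in (r, g) space."""
    hist = [[0.0] * bins for _ in range(bins)]
    for pixel in roi_pixels:
        r, g = chromaticity(pixel)
        hist[bin_index(r, bins)][bin_index(g, bins)] += 1.0
    total = float(len(roi_pixels))
    return [[count / total for count in row] for row in hist]

def backproject(image, hist):
    """Replace each pixel by the histogram value of its (r, g) bin."""
    bins = len(hist)
    out = []
    for row in image:
        out_row = []
        for pixel in row:
            r, g = chromaticity(pixel)
            out_row.append(hist[bin_index(r, bins)][bin_index(g, bins)])
        out.append(out_row)
    return out

# Made-up data: a uniform red ROI; a darker red pixel still matches
# because its chromaticity is the same, while a green pixel does not.
roi = [(200, 20, 20)] * 4
image = [[(200, 20, 20), (100, 10, 10), (20, 200, 30)]]
result = backproject(image, roi_histogram(roi))
```

The bin count is the tuning knob here: fewer, larger bins lump more colors together, while more, smaller bins demand a closer match to the ROI.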

This time, we see that it has also identified the spectrometer but also beyond that. It has also identified the red inter-reflection on the shiny notebook. The result shows that the non-parametric method has a greater tolerance for colors that are already much different from the region of interest. However, this is not the case for the green notebook.

dirty notebook

This time, it is able to identify less than the entire notebook. There are spots on the left side of the notebook that are black. The only reason I can think of to explain this result is that the 2d histogram contains bins that are too large. It is unable to differentiate closely colored objects to the point that it identifies similarly colored objects as the same. For both coding and running speed and resulting image, the parametric method wins hands down.

For me, this is one of the more interesting topics. As I've discussed in a previous post, we never really think about what goes on behind Photoshop. I wouldn't be surprised if Photoshop actually uses the parametric method in its magic wand tool. Before, it was truly magical for the computer to identify similarly colored pixels but now the secret, the Physics, is out and the magic wand tool just became a little less magical.

There are a few things that can be done to extend this work, such as using a smaller bin size in the 2d histogram and trying it out on more images and colors. However, I think I did what had to be done, enough for me to give myself a 9. I would like to thank Anjali for helping me understand this activity.

Plastic Surgery. A10, morphological operations

I find it odd that it's the first time I heard of these morphological operations, namely erosion and dilation. I guess what we know is the very basic, which is resizing: making things smaller or bigger. In essence, erosion decreases while dilation increases the image or shape. However, it goes beyond just increasing or decreasing the size; it also reshapes the image to look more like a certain structuring element. It actually feels very much like correlation, which results in an image that is in between two images. The difference here is that there is a change in size as well.
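For a concrete picture of the two operations, here is a minimal pure-Python sketch of binary erosion and dilation. It is not the Scilab routine used in the activity; the origin placement (top-left of the structuring element by default) and treating everything outside the frame as background are assumptions of this sketch.

```python
def _offsets(selem, origin):
    """List the (dy, dx) positions of the 1s in selem relative to its origin."""
    oy, ox = origin
    return [(y - oy, x - ox)
            for y, row in enumerate(selem)
            for x, value in enumerate(row) if value]

def dilate(image, selem, origin=(0, 0)):
    """Binary dilation: stamp the structuring element on every white pixel."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                for dy, dx in _offsets(selem, origin):
                    if 0 <= r + dy < rows and 0 <= c + dx < cols:
                        out[r + dy][c + dx] = 1
    return out

def erode(image, selem, origin=(0, 0)):
    """Binary erosion: keep a pixel only if the element fits inside the shape."""
    rows, cols = len(image), len(image[0])
    offsets = _offsets(selem, origin)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            fits = all(0 <= r + dy < rows and 0 <= c + dx < cols
                       and image[r + dy][c + dx] for dy, dx in offsets)
            out[r][c] = 1 if fits else 0
    return out

# A 3x3 white square centered in a 7x7 black frame, and a 2x2 square element.
square = [[1 if 2 <= r <= 4 and 2 <= c <= 4 else 0 for c in range(7)]
          for r in range(7)]
selem = [[1, 1], [1, 1]]
eroded = erode(square, selem)     # shrinks to a 2x2 square
dilated = dilate(square, selem)   # grows to a 4x4 square
```

Swapping in a different structuring element changes not just how much the shape shrinks or grows but also the direction in which it does so.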

Before we go into the code, we were asked to draw our expected results. While we're at it, I also demonstrated my drawing and coloring skills or lack of it. I'm tempted to ask you not to judge. The reactions of my friends would determine who my true friends are :P
The images below are scans of my guesses for the given shapes and structuring elements. The columns show the result for each structuring element, which are, from left to right, a 2x2 square, a 2x1 rectangle, a 1x2 rectangle, a 2-element diagonal, and a 3x3 cross. The rows denote whether the shape undergoes erosion or dilation.

fyi, i did pass preschool

and that's why they invented ms paint

fyi, my eyesight is still 20-20

these days, it's really hard to find a straight one

Aside from my fine motor skills, I also apologize that some of my guesses are wrong, as exhibited by the results in Scilab shown below. The first set shows the effects of erosion. The first column holds the original shapes which are, again, a square, a 'triangle', a square frame, and a cross. The other columns are the results of erosion with the structuring elements shown in the first row.

scilab skills > fine motor skills

And finally, below is dilation with the same format.

Comparing my expected results with the actual ones, we find that my guesses aren't so bad. I'm tempted to leave the comparison to you so you'd have to put additional effort to find the differences. Haha. Well, I'll give you some. For the square, I got most of the erosion right except the diagonal. Similarly, the only difference is the cross for the dilation. The dilation with the diagonal is also not spot on but close enough. The idea is there anyway. I don't really want to talk about the erosion of the triangle but I did get some right for dilation. There are slight differences for the square frame. Again, at least the idea is there. I have an idea of the general shape of the result. I didn't, however, get the dimensions right. The dilation of the cross looks pretty good except that I mixed up the results for the square and cross structuring elements. Again, for erosion, I know where to trim the image but not by how much, so I didn't get the cross and diagonal.

Just a note especially to the kids. (as if the future generations would use my blog as a basis for their own. well, you never know. well if you are, you're doing it wrong. visit some of my more hardworking batchmates' blogs! hahaha) I initially had a problem especially with the solid square. Apparently, the original image needed black spaces. I originally made a solid white 5x5 square. I got quite frustrated since nothing was happening no matter what I dilate or erode it with.
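The "nothing happens" problem is easy to reproduce. In this small Python sketch (again an illustration, not the Scilab code), a solid white 5x5 image has no room to grow under dilation, while the same square padded with a black border does. The 3x3 cross element and the `pad` helper are choices made for the example.

```python
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # 3x3 cross as (dy, dx) offsets

def dilate(image, offsets=CROSS):
    """Binary dilation, clipped to the image frame."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                for dy, dx in offsets:
                    if 0 <= r + dy < rows and 0 <= c + dx < cols:
                        out[r + dy][c + dx] = 1
    return out

def pad(image, width):
    """Surround the image with a black border of the given width."""
    cols = len(image[0]) + 2 * width
    top = [[0] * cols for _ in range(width)]
    body = [[0] * width + list(row) + [0] * width for row in image]
    bottom = [[0] * cols for _ in range(width)]
    return top + body + bottom

solid = [[1] * 5 for _ in range(5)]
unchanged = dilate(solid)        # identical to the input: nowhere left to grow
padded = pad(solid, 2)           # 9x9 frame with the square in the middle
grown = dilate(padded)           # the square gains one row or column on each side
```

Erosion of the unpadded square can look just as inert if the routine ignores pixels outside the frame, though that depends on the border convention of the library in use.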

I have to say, this activity is a bit different from the others. For one, we were asked to draw. It's not the most interesting topic but I still like it because it is relatively shorter and easier to understand than the others. I give myself a 9 since I believe I made a solid effort for a straightforward activity. I'm also proud to say that I did this activity all by myself! Although, I did get a little help from Eric: I asked him for graphing paper. Ironically, I only got one piece from him and made mistakes on it. Good thing I had some more around the house. Thanks Eric! :D