We've seen computers do all sorts of things. Just in our journey through AP 186, we've coded a few interesting things ourselves: we improved the contrast of a very dark image and saved a wedding picture, removed the lines from a picture of a crater on the moon, and even taught the computer to read and play musical notes on a staff. However, this final activity before the project could well be the most interesting, at least in concept. We will be teaching the computer to learn.
I have to admit that I haven't completely grasped the whole concept of neural networks, but here is a summary of how I understand it. We give the code sets of characteristics, each belonging to one of a few objects, and we tell the code which object each set belongs to. For example, say you have a bunch of fruits. We put in the characteristics of a banana and tell the computer that this is a banana. This is the learning phase. A few more lines then process this input: the code assigns weights to each input so that it can determine the proper output. Finally, we have the testing phase, wherein we input characteristics again and hope that the code outputs the proper fruit.
We were given the neural network code for an AND logic gate. The learning inputs are:
[0 0] [0 1] [1 0] and [1 1]
and the expected outputs used for learning are:
[0 0 0 1]
That is, for the corresponding learning inputs, the expected output should be 1 only for the fourth learning input, where both entries are 1.
Using the learning inputs as the testing inputs, we get
0.0054392 0.0304290 0.0291350 0.9501488
which could be approximated to [0 0 0 1]
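I won't paste the entire given code, but as I understand it, its core looks roughly like the sketch below. I'm assuming the Scilab ANN toolbox functions ann_FF_init, ann_FF_Std_online and ann_FF_run here, and the parameter values are my own guesses rather than the exact ones in the given code.

// sketch only: a 2-input, 2-hidden, 1-output network trained on the AND data
x = [0 0 1 1; 0 1 0 1];                        // learning inputs, one pattern per column
t = [0 0 0 1];                                 // expected outputs
arch = [2 2 1];                                // network architecture
lp = [2.5, 0];                                 // learning rate (I recall the 2nd entry is an error threshold)
T = 400;                                       // training cycles
W = ann_FF_init(arch);                         // random initial weights
W = ann_FF_Std_online(x, t, arch, W, lp, T);   // learning phase
ann_FF_run(x, arch, W)                         // testing phase: feed the inputs back in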
Little by little, we can play with this code just to see how well this computer learned how to learn.
Let's try mixing up the input.
[1 1] [0 0] [0 1] [1 1]
The computer shows that this is no problem, giving us a result of
0.9501488 0.0054392 0.0291350 0.9501488
or [1 0 0 1]
as expected.
Commercial break.
I just noticed that this is probably my least graphic post, having no figures at all. Don't worry though. I'll be using neural networks for my final project, which is very graphic...
Now, we try to make something a bit more difficult.
How about an XOR logic gate? It outputs 1 only if exactly one digit is 1. You got that?
If not, learn it at the same time as the neural network code does, by example.
0 0 = 0
0 1 = 1
1 0 = 1
1 1 = 0
The result is 1 if and only if exactly one digit is 1.
Get it now? I hope so, because the computer has already finished learning by example, and it gives us this output
0.0473429 0.4593762 0.9103764 0.4669348
which is right, right?
wrong.
well, sort of.
The expected output is already given above.
However, the computer didn't get it completely right.
I relate this to people guessing.
It's as if the computer is only about 50% sure about the second and last inputs.
A short chat with Chester gave me valuable insight. He suggested changing the learning rate and/or the number of training cycles. As he explained it, the learning rate controls how strongly the weights are adjusted at each update, so it sets how sensitively the network learns.
Indeed changing the learning rate from 2.5 to 5 gives us
0.0304071 0.9671456 0.9653771 0.0425667
which were the results we were expecting.
Right? Right.
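For the record, the only things that change from the AND sketch above are the target vector and the learning parameters (again, the exact numbers are my assumptions):

t = [0 1 1 0];                                 // XOR targets
lp = [5, 0];                                   // learning rate raised from 2.5 to 5
W = ann_FF_init([2 2 1]);
W = ann_FF_Std_online(x, t, [2 2 1], W, lp, 400);
ann_FF_run(x, [2 2 1], W)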
Moving on to using neural networks for the pattern recognition activity. This is where they come in handy. The input characteristics I use now are the R, G and B values of the bills. Then, of course, I also input 0 for 1000 peso bills and 1 for 500 peso bills as the expected output. We can easily work out the outputs of basic logic gates by hand, but now I challenge you to identify a bill just from its RGB values, which the neural network manages to do to some extent.
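Very roughly, and with everything from the feature matrix to the architecture being my own guesses rather than the actual code, the setup could look like this:

// features: one column per bill, rows are its mean R, G and B values scaled to 0-1
// (r1000, g1000, ... are hypothetical 1x7 vectors of channel means per denomination)
feat  = [r1000 r500; g1000 g500; b1000 b500];  // 3 x 14 matrix
label = [0 0 0 0 0 0 0 1 1 1 1 1 1 1];         // 0 = 1000 peso, 1 = 500 peso
arch  = [3 3 1];                               // three inputs now instead of two
W = ann_FF_init(arch);
W = ann_FF_Std_online(feat, label, arch, W, [10, 0], 400);
ann_FF_run(test_feat, arch, W)                 // test_feat built the same way from the test bills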
At a learning rate of 10 and 400 learning cycles, the output is
0.0000802
0.0233898
0.0057143
0.5679243
0.2478101
0.0000542
0.0067900
0.9166035
0.9996348
0.9857239
0.9996027
0.9979951
0.9994319
0.7868495
which is a good approximation of the expected 0 0 0 0 0 0 0 1 1 1 1 1 1 1
since the first 7 test inputs were 1000 peso bills and the rest were 500 peso bills.
There were a few discrepancies though, such as the fourth test input, where the computer is essentially 50-50 about the classification. There were also a couple of somewhat unsure answers, such as the fifth and the last input, although their values are still clearly closer to the right answer.
In summary,
1/14 was 50-50
2/14 were 70-80% correct
while the rest, 11/14 were spot on.
Errors could come from the fact that some of the test values are near the values of the other class. That is, some 1000 peso bills have RGB values similar to those of the 500 peso bills.
I give myself a 9.
Friday, September 20, 2013
For rich or richer. A14 Pattern recognition
For this experiment, I will attempt to classify something that most UP students only see during enrollment: 500 and 1000 peso bills. As a UP student in the middle of September, I don't have such bills, so I just used the first few good results from Google. Being lazy, I grabbed all the types available, front and back, new and old. I cropped the images to contain only the bill and no background, and I did not alter them in any other way, specifically not the color or brightness. I then used half of the pictures, around 7 of the 14 I found for each denomination, for learning, and the other half for testing.
From the images reserved for learning, I extract the red, green and blue components and find the mean value of each. Intuitively, the blue 1000 peso bills should have a higher mean value for blue, while the yellow 500 peso bills should have more green. I plot these mean values for each training bill in 3D as solid circles, whose colors conveniently match the bills: blue for 1000 and yellow for 500. The resulting plot shows the 1000 peso bills roughly grouped together and the 500 peso bills forming another cluster. The test bills were plotted the same way as hollow circles and, as expected, they fall inside their corresponding clusters.

Next, the mean of the mean R, G and B values of the 1000 and 500 peso bills was calculated and plotted as well, represented by stars of the same color as the bills; each star sits at the center of its cluster. The test bills were then classified using the distance formula from these two stars: if a test bill is closer to the yellow star, it is a 500 peso bill, otherwise it is a 1000 peso bill. Applied to this set of bills, the method yields an accuracy of 100%. The code is quite lengthy but actually just a bit repetitive. You can try it out if you don't believe me, although you won't have the pictures of the bills; you can contact me if you want a copy. A few snapshots of the final 3D plot from different angles are shown below. You can simply run the code and play around with the 3D plot yourself. I found myself spending quite some time doing that, because everything is better in 3D.
The part of this experiment where I learned the most was 3D plotting. I spent quite some time figuring out the 3D plot and how to make it presentable. Just to clarify, the x axis is the red value, the y axis is green, and the z axis is blue. The degree to which the data are scattered can be attributed to the fact that different kinds of bills were used; again, I didn't care whether it was a picture of the front or back of an old or a new bill. This could also be considered a strength of the code, since it is not limited to identifying one specific type of bill. The downside is that there is a part of the plot where the two clusters almost touch, which I think can be attributed to the fact that some of the bills, especially the front of the old bills, contain a lot of white. The code can be tested further by adding another bill to classify; I recommend 50 peso bills, which are red, so they shouldn't be confused with the yellow 500. However, I blame the lack of internet at the moment for why I can't download pictures of 50 peso bills and add them to the code. In the end, I give myself a 9 because I believe I did what was required satisfactorily and was even able to give a good visual representation of the classification. I am also happy that I managed to incorporate some of the basic color-value normalization techniques I learned in AP 187.
bn = 'C:\Users\Shua\Desktop\p1000\p1000'
mr1 = []
mg1 = []
mb1 = []
mr5 = []
mg5 = []
mb5 = []
cmr1 = []
cmg1 = []
cmb1 = []
cmr5 = []
cmg5 = []
cmb5 = []
//learn blue 1000
for i = 1:7
a = imread([bn + string(i) + '.jpg'])
a = double(a)
mr1 = [mr1 mean(a(:,:,1))]
mg1 = [mg1 mean(a(:,:,2))]
mb1 = [mb1 mean(a(:,:,3))]
end
mmr1 = mean(mr1)
mmg1 = mean(mg1)
mmb1 = mean(mb1)
//normalize each mean by the total (R+G+B) to reduce the effect of overall brightness
tot1 = mr1 + mg1 + mb1
mtot1 = mean(tot1)
param3d(mr1./tot1,mg1./tot1,mb1./tot1)
p = get('hdl')
p.line_mode = 'off'
p.mark_mode = 'on'
p.mark_size = 1
p.mark_foreground = 2
param3d(mmr1./mtot1,mmg1./mtot1,mmb1./mtot1)
p = get('hdl')
p.mark_style = 14
p.mark_size = 4
p.mark_foreground = 2
//learn yellow 500
bn = 'C:\Users\Shua\Desktop\p500\p500'
for i = 1:7
a = imread([bn + string(i) + '.jpg'])
a = double(a)
mr5 = [mr5 mean(a(:,:,1))]
mg5 = [mg5 mean(a(:,:,2))]
mb5 = [mb5 mean(a(:,:,3))]
end
mmr5 = mean(mr5)
mmg5 = mean(mg5)
mmb5 = mean(mb5)
tot5 = mr5 + mg5 + mb5
mtot5 = mean(tot5)
param3d(mr5./tot5,mg5./tot5,mb5./tot5)
p = get('hdl')
p.line_mode = 'off'
p.mark_mode = 'on'
p.mark_size = 1
p.mark_foreground = 7
param3d(mmr5./mtot5,mmg5./mtot5,mmb5./mtot5)
p = get('hdl')
p.mark_style = 14
p.mark_size = 4
p.mark_foreground = 7
//check 1000
bn = 'C:\Users\Shua\Desktop\p1000\c1'
for i = 1:7
a = imread([bn + string(i) + '.jpg'])
a = double(a)
cmr1 = [cmr1 mean(a(:,:,1))]
cmg1 = [cmg1 mean(a(:,:,2))]
cmb1 = [cmb1 mean(a(:,:,3))]
end
ctot1 = cmr1 + cmg1 + cmb1
param3d(cmr1./ctot1,cmg1./ctot1,cmb1./ctot1)
p = get('hdl')
p.line_mode = 'off'
p.mark_style = 9
p.mark_size = 1
p.mark_foreground = 2
//distance of each 1000-peso test bill to the two cluster centers
dr121 = cmr1./ctot1-mmr1./mtot1
dg121 = cmg1./ctot1-mmg1./mtot1
db121 = cmb1./ctot1-mmb1./mtot1
d121 = sqrt(dr121.^2 + dg121.^2 + db121.^2)   //distance to the 1000-peso center
dr125 = cmr1./ctot1-mmr5./mtot5
dg125 = cmg1./ctot1-mmg5./mtot5
db125 = cmb1./ctot1-mmb5./mtot5
d125 = sqrt(dr125.^2 + dg125.^2 + db125.^2)   //distance to the 500-peso center
//check 500
bn = 'C:\Users\Shua\Desktop\p500\c'
for i = 1:7
a = imread([bn + string(i) + '.jpg'])
a = double(a)
cmr5 = [cmr5 mean(a(:,:,1))]
cmg5 = [cmg5 mean(a(:,:,2))]
cmb5 = [cmb5 mean(a(:,:,3))]
end
ctot5 = cmr5 + cmg5 + cmb5
param3d(cmr5./ctot5,cmg5./ctot5,cmb5./ctot5)
p = get('hdl')
p.line_mode = 'off'
p.mark_style = 9
p.mark_size = 1
p.mark_foreground = 7
//distance of each 500-peso test bill to the two cluster centers
dr521 = cmr5./ctot5-mmr1./mtot1
dg521 = cmg5./ctot5-mmg1./mtot1
db521 = cmb5./ctot5-mmb1./mtot1
d521 = sqrt(dr521.^2 + dg521.^2 + db521.^2)   //distance to the 1000-peso center
dr525 = cmr5./ctot5-mmr5./mtot5
dg525 = cmg5./ctot5-mmg5./mtot5
db525 = cmb5./ctot5-mmb5./mtot5
d525 = sqrt(dr525.^2 + dg525.^2 + db525.^2)   //distance to the 500-peso center
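One thing the code above stops short of is printing the actual verdict. A possible decision step, using the distance vectors already computed, would be:

// classify the 1000 peso test bills: the nearer star wins
for i = 1:length(d121)
    if d121(i) < d125(i) then
        disp('1000-peso test image ' + string(i) + ': classified as 1000')
    else
        disp('1000-peso test image ' + string(i) + ': classified as 500')
    end
end
// the same comparison with d521 and d525 classifies the 500 peso test bills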
Thursday, September 5, 2013
Comprehending Compression. A13, Image compression
Pictures are everywhere. They have become so common that many of us in this generation probably take at least one picture a day. In fact, this blog entry will probably contain at least 5 images, which is the average number of pictures in this blog. Just imagine all the billions of photos uploaded to Facebook and the amount of disk space required to store them. We would probably need new technologies and super fast internet to handle all the information in each picture, or we can just use file compression.
The concept of compression comes from the idea that you can represent the same file or picture with less stored information. In this activity we use PCA, or principal component analysis. From my understanding, the most common way to think about image compression is through the Fourier transform, which states that any function is a sum of sines and cosines. The same is true for pictures, which are represented by a matrix of values. We know from earlier activities that the Fourier transform of a pair of Dirac deltas is a sinusoid, and superimposing sine waves of various frequencies along the horizontal and vertical axes would allow you to reconstruct any image. However, reconstructing an image exactly this way would most likely require storing a large amount of data as well. What is interesting is that just a few terms already give a good reconstruction of the original; the information for the remaining sine waves can be discarded to save space. The result may not be exactly the same as the original but, depending on the amount of information kept, it comes pretty close.
We have an image of my uncle's cat for the same reason my uncle brought him to our house one evening. None.
What are you looking at?
Thanks to the miracle of image compression and broadband internet, we won't have to wait so long for it to upload and for you to download his serious face :|
Next, to simplify matters, we take the grayscale of the image. This will also be the basis of comparison for our file size, which is 147 KB.
don't try anything funny. I'm watching you.
Next, we cut the image into small 10x10 sub-images. Intuitively, a 10x10 sub-image is easier to reconstruct than the whole 600x800 image, which would probably need an enormous number of eigenfunctions to reconstruct exactly, and that would be counterproductive. We use cumsum on lambda to find the required threshold, and we find that the 99% threshold is reached at the 30th lambda, so we use the first 30 eigenfunctions. I am thankful to Ma'am for providing the code. A few modifications, with some help from Nestor, give a working code with this result.
I said don't try anything funny!
If we panic, we might think that we did something wrong and try to rewrite the code from scratch. Don't worry: a quick check of the matrix of the reconstructed image shows weird values, and a quick elementary normalization of those values gives us this. The original grayscale image is shown below the reconstruction as a quick comparison.
spot the difference
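Since the working code came from Ma'am, I'll only put a rough sketch of the pipeline here. It assumes Scilab's pca() with the variance ratios in the second column of lambda, and a 600x800 grayscale image already loaded as doubles in gray; every variable name is mine.

// cut the image into 10x10 blocks, each flattened into one 100-element row
blocks = [];
for r = 1:10:600
    for c = 1:10:800
        b = gray(r:r+9, c:c+9);
        blocks = [blocks; b(:)'];
    end
end
[lambda, facpr, comprinc] = pca(blocks);
k = find(cumsum(lambda(:,2)) >= 0.99, 1);      // 99% threshold, around the 30th lambda
recon = comprinc(:,1:k) * facpr(:,1:k)';       // rebuild every block from k eigenimages
// stitch the blocks back together
out = zeros(600, 800);
n = 1;
for r = 1:10:600
    for c = 1:10:800
        out(r:r+9, c:c+9) = matrix(recon(n,:), 10, 10);
        n = n + 1;
    end
end
// pca() centers and scales the data, hence the weird values: rescale to [0, 1]
out = (out - min(out)) ./ (max(out) - min(out));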
Visually, it is impossible to tell the difference between the two, but there are some significant differences, trust me. First of all is the file size: we have 106/147 KB, which is 72% of the original. Not bad, especially since they appear exactly the same. And to prove that they aren't exactly the same, here is the difference between the values of the reconstruction and the original, in image form.
ktnxbye
I believe I was able to do the activity sufficiently and give myself a 9.
Thursday, August 29, 2013
Musical Blobs. A12 Playing Notes by Image Processing
Who would have thought that image processing is not just for images but also for music? Well, humans can read notes, process them in their brains, and sing the note or play it on an instrument, so why can't a computer do so as well?
First, a music sheet is required. The song choice has something to do with my dream of wanting to learn to play the piano. I've already downloaded a few music sheets before and this song, 'Fireflies' by Owl City came to mind. I love this song and I hope you like it too :)
http://www.youtube.com/watch?v=psuRGfAaju4
Actually, the activity requires a simple music sheet. Being confident with my musical and 186 skills, I wanted to challenge myself. I also didn't want to play the usual nursery rhymes that my batch mates would probably have chosen such as the song about Mary's lamb or a twinkling star. No, I wanted to play a song about fireflies. Haha. Just for variety.
The main part of the activity, and the part that involves image processing, was to determine the tones denoted by the notes in the song. The process can be outlined in 3 steps, described by the images in the figure below.
I simply loaded the image of the music sheet into Scilab and cropped the part I would be using, which is the first line of the song. For convenience, I cropped out the G clef, but unfortunately couldn't remove the 'arranged by' text. Thank you, Mr. Luke Erickson. I converted the loaded image to binary using im2bw with a threshold of 0.8; a few trial-and-error experiments showed that 0.8 kept most of the required information without adding extra artifacts. The result is the first line in the figure above. Then I actually forgot the second step, which was to invert the image: the morphological operators work on a white object against a black background. This was easily done by taking the complement of the boolean matrix, which in Scilab is simply ~M. The result is the second line of the figure above. Finally, the third line in the figure is the result of applying OpenImage with a circular structuring element. Again, the amazing method of Trial And Error (TAE) was used to find the best size for the circle. The choice of a circle was common sense: we want to retain the circular elements in the image, since these mark which note is to be played. With the help of SearchBlobs, each note was located, specifically the y pixel where it sits. Since each staff line and space denotes a single tone, and since the lines are horizontal, the y pixel can be converted directly to a tone. This is where the musical side of the activity begins.
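Here is a rough sketch of that chain, assuming the IPD-style names I remember (CreateStructureElement, OpenImage, SearchBlobs); the filename, threshold and radius below are placeholders for my trial-and-error values.

sheet = imread('fireflies_line1.jpg');         // hypothetical filename of the cropped first line
bw = ~im2bw(sheet, 0.8);                       // binarize, then invert so the notes are white
se = CreateStructureElement('circle', 2);      // radius also found by trial and error
opened = OpenImage(bw, se);                    // keep only the round note heads
blobs = SearchBlobs(opened);                   // label each remaining blob
ypos = [];
for i = 1:max(blobs)
    [r, c] = find(blobs == i);
    ypos = [ypos mean(r)];                     // vertical centroid = which line or space
end
// in practice the blobs also have to be sorted left to right by their x centroid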
http://www.phy.mtu.edu/~suits/notefreqs.html
Note | Frequency (Hz) | Wavelength (cm)
G#2/Ab2 | 103.83 | 332.
A2 | 110.00 | 314. |
A#2/Bb2 | 116.54 | 296. |
B2 | 123.47 | 279. |
C3 | 130.81 | 264. |
C#3/Db3 | 138.59 | 249. |
D3 | 146.83 | 235. |
D#3/Eb3 | 155.56 | 222. |
E3 | 164.81 | 209. |
F3 | 174.61 | 198. |
F#3/Gb3 | 185.00 | 186. |
G3 | 196.00 | 176. |
G#3/Ab3 | 207.65 | 166. |
A3 | 220.00 | 157. |
A#3/Bb3 | 233.08 | 148. |
B3 | 246.94 | 140. |
C4 | 261.63 | 132. |
C#4/Db4 | 277.18 | 124. |
D4 | 293.66 | 117. |
D#4/Eb4 | 311.13 | 111. |
E4 | 329.63 | 105. |
F4 | 349.23 | 98.8 |
F#4/Gb4 | 369.99 | 93.2 |
G4 | 392.00 | 88.0 |
G#4/Ab4 | 415.30 | 83.1 |
A4 | 440.00 | 78.4 |
A#4/Bb4 | 466.16 | 74.0 |
B4 | 493.88 | 69.9 |
C5 | 523.25 | 65.9 |
C#5/Db5 | 554.37 | 62.2 |
D5 | 587.33 | 58.7 |
D#5/Eb5 | 622.25 | 55.4 |
E5 | 659.26 | 52.3 |
F5 | 698.46 | 49.4 |
F#5/Gb5 | 739.99 | 46.6 |
G5 | 783.99 | 44.0 |
G#5/Ab5 | 830.61 | 41.5 |
A5 | 880.00 | 39.2 |
A#5/Bb5 | 932.33 | 37.0 |
Although humans can easily understand notes, computers are not as musical. We have an idea of what a middle C sounds like and can easily reproduce the whole scale after hearing a single reference note or playing it on an instrument; some people can even find the pitch without any reference. The computer, on the other hand, speaks a more physical language: frequency. Each note, assuming we're not tone deaf, corresponds to a certain frequency of sound. So the computer determines the y position of each note, the human identifies which note that is using his music skills, and the conversion table above then gives the computer a basis for transforming note position into frequency. After all the notes are identified and converted to Hz, each one is synthesized as a sine wave and played using the sound() function.
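As a sketch with made-up variable names: note_y holds the note centroids in left-to-right order, while ytable and ftable hold the staff y positions and the frequencies taken from the table above.

fs = 22050;                                    // sampling rate
t = 0:1/fs:0.125;                              // duration of one 1/8 note, arbitrary tempo
song = [];
for i = 1:length(note_y)
    [dummy, k] = min(abs(ytable - note_y(i)));          // nearest tabulated staff position
    song = [song sin(2*%pi*ftable(k)*t) zeros(1, 500)]; // the note plus a short pause
end
sound(song, fs);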
I guess it would also be necessary to discuss some of the cheats I used to make this activity simpler. As you may have noticed, the transformation from the original image to the circles-only image was not perfect. Here is the image again so you won't have to scroll so far.
I got all the notes in the G clef. However, there is one extra circle in the second measure that comes from the bar of one of the notes, and there is an extraneous 'note' that comes from the name of the person who arranged this piece, the u from Mr. Luke. These fake notes were easily removed by simply disregarding them.
Then we have a bigger problem. The F clef contains 3/4 notes, which are not shaded, so they were not included in the final image because the structuring element used was a solid circle. This was remedied by simply adding their corresponding frequencies to the array containing all the frequencies. The F clef also adds another problem: I have yet to play more than one note at the same time. Piano teachers would kill me. I cheated by simply playing the notes individually. The result still sounds good since there are rests in the G clef where there are notes in the F clef. The downside, however, is that instead of a chord in the background, there is none. One idea is to add together two sine waves of different frequencies. Will it give the same effect? I'm not sure since I haven't tried it yet, but I think I can do it if I find more time, and it would be an interesting extension of this activity. Finally, I was lucky to find a piece that contains mostly notes of the same duration, 1/8. Actually, there are a few 3/4 notes, but I just converted them to 1/8 as well. Haha. Having notes of different durations would require additional image processing to identify not only the tone but also the duration. So far, I would like to believe that I put in effort beyond what was expected by using a not-so-easy piece, to some degree. Sound quality was slightly improved by adding short pauses between notes, simply by inserting a few zeros into the final matrix containing the sine waves. I would like to believe that I deserve a 10 for this activity.
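I haven't tried it, but superposing two notes should just be an element-wise sum of their sine waves, rescaled so the result stays within [-1, 1], something like:

fs = 22050;
t = 0:1/fs:0.5;
c4 = sin(2*%pi*261.63*t);                      // C4
e4 = sin(2*%pi*329.63*t);                      // E4
chord = (c4 + e4) / 2;                         // superposition of the two notes
sound(chord, fs);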
Recent update.
Tor was smart to suggest that I upload the resulting sound.
It was easily done by using the 'savewave' function.
Enjoy :)
https://dl.dropboxusercontent.com/u/19434317/5th%20year/186/a12%20notes/sound.wav
Monday, August 19, 2013
Cure for Cancer. A11, Application of Binary Analysis
Ok, this won't really cure cancer by itself, but this is an exercise that could potentially be useful in differentiating abnormal cells.
First, we study a scanned image containing similarly sized circles of paper randomly distributed on a sheet, shown below. These represent the normal cells.
We check the histogram to find the proper threshold for transforming it to a binary image.
From this, we find that most of the image has values below around 210. These represent the darker background, while values greater than 210 represent the lighter circles. 210/255 is around 0.82, which is what we use as a starting point for our binary threshold. I tried out threshold values from 0.81 to 0.87 in steps of 0.02, shown below.
We find that there is a field of artifacts on the right side of each image, which shrinks as we increase the threshold. However, as the threshold is increased, some parts of the circles are also eaten away. A simple use of the morphological operation OpenImage takes away the artifacts on the right. I tried two types of structuring elements and multiple sizes for both. This took a lot of time, since I was trying to remove all the artifacts while keeping most of the circles complete. In the end, I decided to use 0.84 as my threshold, shown in the leftmost image. The image in the center is the result of using a square structuring element of size one, while the last uses the circle structuring element. I also kind of cheated a bit by simply cropping off the top of the image to remove the white line.
It appears that using a threshold of 0.84 and OpenImage with a circle structuring element of size one removes all the artifacts but keeps most of the circles complete. Next, SearchBlobs from IPD was used to identify the blobs. It assigns each connected group of white pixels an integer label starting at one, and with it 33 blobs were identified. However, a closer inspection shows that some of these blobs have zero pixels. I find this weird and address the problem by setting limits on the size of the real blobs. Indeed, there is a big spread in the sizes of the blobs identified initially, as shown in the histogram below.
This histogram gives us an idea of the relative abundance of blob sizes. We find that the majority are in the 500-pixel range. The big spread is due to imperfections in the circles found: the small sizes are pixels that are probably detached from a main circle, while the larger blobs are connected or overlapping circles. We zoom in to the part of the histogram where most of the blobs are found.
Finally, I settle for a range from 300 to 600 shown in the histogram below.
In this range, the mean is 492.2 pixels while the standard deviation is 56.
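The size bookkeeping itself is short. As a sketch (SearchBlobs from IPD assumed, variable names mine, with cleaned being the opened binary image):

blobs = SearchBlobs(cleaned);                  // one integer label per connected blob
sizes = [];
for i = 1:max(blobs)
    sizes = [sizes sum(blobs == i)];           // area in pixels of blob i
end
ok = sizes(sizes >= 300 & sizes <= 600);       // keep only believable single circles
m = mean(ok)                                   // about 492.2 in my run
d = stdev(ok)                                  // about 56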
Next, I take a break before applying the information gained here to isolate cancer cells represented by abnormally bigger cells in the next image.
After a merienda (snack) break, I apply the same settings to isolate each blob. I use im2bw with the same threshold of 0.84 and get the image below. Beside it is the result of applying the same cleaning method, namely OpenImage with the same circle structuring element of size 1. Note that it similarly cleans out all the artifacts on the right side of the image.
Using FilterBySize, as suggested by Floyd, I find the normal-sized cells shown below. The lower bound used was mean - stdev and the upper bound mean + stdev.
We find that the result only contains the perfect circles. The incomplete circles and the circles that are clumped together are not included, which calls for better cleaning of the images. Anyway, let's focus on the goal of identifying the abnormally big circles. Lucky for us, the abnormally big circles in the given image do not overlap with other circles, so we simply change the limits of FilterBySize to find them. I try setting the lower bound to mean + stdev while avoiding clumped circles by setting the upper bound to 2*mean - stdev.
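If I remember FilterBySize's arguments right (the labeled blob image, then the lower and upper size bounds), the call itself is a one-liner; treat the signature as an assumption:

blobs2 = SearchBlobs(cleaned2);                // labeled blobs of the 'cancer' image
big = FilterBySize(blobs2, m + d, 2*m - d);    // keep only the abnormally large blobs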
The result was not perfect, but it greatly reduced the number of circles present. It revealed only 3 of the 5 larger circles and still contained a few 'normal' circles. So I tried again, looking for the best size limits, and found the result shown below.
Here we find 4 of the 5 larger cells and only one extra normal cell. I just hope the final grade isn't proportional to right minus wrong circles. I used the limits m+d and 2m-d. The extra pair of touching normal cells was included because the area they cover together falls within the same range as the larger cells.
Again, I spent a lot of time on cleaning the image but still wasn't able to separate overlapping circles. I believe that this caused the poor results. I give myself an 8 for effort.
Sunday, August 11, 2013
Magic wand. A9, Color segmentation
I'm not sure what magic wand you're thinking of but what I'm talking about here is the tool in Photoshop. That's the first thing I think about when I read about this activity.
This time, we use a simple image recycled from an activity in AP 187. It is a very nice image with two solid colors standing out, namely the red spectrometer on top of my green apple notebook.
this kind of recycling does not save the environment
The parametric process basically samples a region of interest that should more or less encompass the colors of the object of interest. We then use a Gaussian distribution to determine the probability that each pixel is colored like the region of interest. For the red spectrometer, we sample a part of it, shown below, along with the resulting image when the Gaussian distribution is applied.
shiny red spectrometer
severely color blind
The resulting image is white for pixels that are similar in color to the region of interest. The lines on the right side are part of the orange wires from the original image, which means they also contain some red values similar to our region of interest. We also notice that a small part on the right side of the spectrometer is black. This is because it has a slightly different color, probably due to lighting, which is beyond the reach of the distribution. The overall result, however, is a good reproduction of the spectrometer.
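A rough sketch of the parametric method, assuming the usual normalized chromaticity (r, g) coordinates; img is the full image and roi the cropped patch, both as doubles, and all the names are mine.

// normalized chromaticity of every pixel in the image
I = img(:,:,1) + img(:,:,2) + img(:,:,3);
I(I == 0) = 1e6;                               // avoid division by zero
r = img(:,:,1) ./ I;   g = img(:,:,2) ./ I;
// Gaussian parameters taken from the region of interest
Ir = roi(:,:,1) + roi(:,:,2) + roi(:,:,3);
rr = roi(:,:,1) ./ Ir;  gr = roi(:,:,2) ./ Ir;
mr = mean(rr);  sr = stdev(rr);
mg = mean(gr);  sg = stdev(gr);
// joint probability that a pixel has the ROI's color
pr = exp(-(r - mr).^2 / (2*sr^2)) / (sr * sqrt(2*%pi));
pg = exp(-(g - mg).^2 / (2*sg^2)) / (sg * sqrt(2*%pi));
p = pr .* pg;                                  // bright = likely part of the object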
We apply the same process but this time take a good portion of the notebook as a region of interest.
animo LaSalle!
prolly what you'll see if you had x-ray vision
Personally, I think this is another good result, with the green notebook identified almost completely. It does contain a few losses, such as on the right side, again probably because of a difference in lighting. Take note that the 'Green Apple' text is also black, since it is a different shade of green not included in our region of interest.
We also applied non-parametric segmentation as a comparison. This involves taking the 2D histogram of the region of interest and backprojecting it onto the image. I think all of us would agree that this process is much more tedious than the previous method, since building the 2D histogram requires a loop within a loop, and the backprojection requires the same. It is a headache to code, and for the computer as well, since nested loops eat a lot of computing resources. Let's just see if the results justify the effort.
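And a sketch of the non-parametric version, with the bin count being my own choice; rr and gr are the ROI chromaticities and r, g the whole-image ones from the previous sketch.

BINS = 32;
H = zeros(BINS, BINS);
ri = round(rr * (BINS-1)) + 1;                 // quantize the ROI pixels into bins
gi = round(gr * (BINS-1)) + 1;
for i = 1:length(ri)
    H(ri(i), gi(i)) = H(ri(i), gi(i)) + 1;     // build the 2D r-g histogram
end
H = H / sum(H);
// backprojection: each image pixel takes the histogram value of its own bin
[nr, nc] = size(r);
seg = zeros(nr, nc);
for y = 1:nr
    for x = 1:nc
        seg(y, x) = H(round(r(y,x)*(BINS-1)) + 1, round(g(y,x)*(BINS-1)) + 1);
    end
end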
This time, we see that it has identified the spectrometer but also a bit more: it has picked up the red inter-reflection on the shiny notebook as well. The result suggests that the non-parametric method has a greater tolerance for colors that are already quite different from the region of interest. However, this is not the case for the green notebook.
dirty notebook
This time, it identifies less than the entire notebook. There are spots on the left side of the notebook that are black. The only reason I can think of to explain this is that the 2D histogram bins are too large: it cannot differentiate closely colored objects, to the point that it lumps similarly colored objects together. For coding effort, running speed, and the resulting image, the parametric method wins hands down.
For me, this is one of the more interesting topics. As I've discussed in a previous post, we never really think about what goes on behind Photoshop. I wouldn't be surprised if Photoshop actually uses the parametric method in its magic wand tool. Before, it seemed truly magical for the computer to identify similarly colored pixels, but now the secret, the physics, is out, and the magic wand tool just became a little less magical.
There are a few things that could be done to extend this work, such as using a smaller bin size in the 2D histogram and trying it out on more images and colors. However, I think I did what had to be done, enough for me to give myself a 9. I would like to thank Anjali for helping me understand this activity.
Plastic Surgery. A10, morphological operations
I find it odd that this is the first time I've heard of these morphological operations, namely erosion and dilation. I guess what we know is only the very basic kind of resizing, shrinking or enlarging (paliitin o palakihin). In essence, erosion decreases while dilation increases the image or shape. However, it goes beyond just changing the size; it also reshapes the image toward a certain structuring element. It actually feels very much like correlation, which results in an image that is in between two images; the difference here is that the size changes as well.
Before we go into the code, we were asked to draw our expected results. While we're at it, I also demonstrated my drawing and coloring skills, or lack thereof. I'm tempted to ask you not to judge. The reactions of my friends will determine who my true friends are :P
The images below are scans of my guesses for the given shapes and structuring elements. The columns show the results for each structuring element, which are, from left to right: a 2x2 square, a 2x1 rectangle, a 1x2 rectangle, a 2-element diagonal, and a 3x3 cross. The rows denote whether the shape undergoes erosion or dilation.
fyi, i did pass preschool
and that's why they invented ms paint
fyi, my eyesight is still 20-20
these days, it's really hard to find one that's straight
Aside from my fine motor skills, I also apologize that some of my guesses are wrong, as shown by the Scilab results below. The first set shows the effects of erosion. The first column contains the original shapes, which are, again, a square, a 'triangle', a square frame and a cross. The other columns are the results of erosion with the structuring elements in the first row.
scilab skills > fine motor skills
And finally, below is dilation with the same format.
Comparing my expected results with the actual ones, we find that my guesses aren't so bad. I'm tempted to leave the comparison to you so you'd have to put in extra effort to spot the differences. Haha. Well, I'll give you some. For the square, I got most of the erosions right except the diagonal; similarly, the only difference for dilation is the cross. The dilation with the diagonal is also not spot on, but close enough; the idea is there anyway. I don't really want to talk about the erosion of the triangle, but I did get some of its dilations right. There are slight differences for the square frame; again, at least the idea is there, since I had the general shape of the result but not the exact dimensions. The dilation of the cross looks pretty good, except that I mixed up the results for the square and cross structuring elements. For erosion, I knew where to trim the image but not by how much, so I didn't get the cross and the diagonal.
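To make the definitions concrete without relying on a toolbox, here is a tiny sketch of erosion and dilation with a 2x2 square structuring element (origin at its top-left corner, and ignoring the reflection subtlety of proper dilation); shape is a 0/1 matrix with a black border.

function out = erode2x2(shape)
    [nr, nc] = size(shape);
    out = zeros(nr, nc);
    for y = 1:nr-1
        for x = 1:nc-1
            // the pixel survives only if the whole 2x2 neighborhood is on
            out(y, x) = min(shape(y:y+1, x:x+1));
        end
    end
endfunction

function out = dilate2x2(shape)
    [nr, nc] = size(shape);
    out = zeros(nr, nc);
    for y = 1:nr-1
        for x = 1:nc-1
            // the pixel turns on if any pixel in the 2x2 neighborhood is on
            out(y, x) = max(shape(y:y+1, x:x+1));
        end
    end
endfunction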
Just a note, especially to the kids (as if future generations would use my blog as a basis for their own; well, you never know, and if you are, you're doing it wrong, go visit some of my more hardworking batchmates' blogs! hahaha): I initially had a problem, especially with the solid square. Apparently, the original image needs black pixels around the object. I originally made a solid all-white 5x5 square and got quite frustrated since nothing happened no matter what I dilated or eroded it with.
I have to say, this activity is a bit different from the others. For one, we were asked to draw. It's not the most interesting topic, but I still like it because it is relatively shorter and easier to understand than the others. I give myself a 9 since I believe I made a solid effort for a straightforward activity. I'm also proud to say that I did this activity all by myself! Although I did get a little help from Eric: I asked him for graphing paper. Ironically, I only got one piece from him and made mistakes on it. Good thing I had some more around the house. Thanks Eric! :D