Thursday, August 29, 2013

Musical Blobs. A12 Playing Notes by Image Processing

Who would have thought that image processing is not just for images but also for music? Well, humans can read notes, process them in their brains, and then sing them or play them on an instrument, so why can't a computer do the same?

First, a music sheet is required. The song choice has something to do with my dream of learning to play the piano. I had already downloaded a few music sheets before, and this song, 'Fireflies' by Owl City, came to mind. I love this song and I hope you like it too :)
http://www.youtube.com/watch?v=psuRGfAaju4

Actually, the activity requires a simple music sheet. Being confident with my musical and 186 skills, I wanted to challenge myself. I also didn't want to play the usual nursery rhymes that my batch mates would probably have chosen such as the song about Mary's lamb or a twinkling star. No, I wanted to play a song about fireflies. Haha. Just for variety.

The main part of the activity was to determine the tones denoted by the notes in the song, which is where the image processing comes in. The process can be outlined in 3 steps, which are illustrated by the images in the figure below.

I simply loaded the image of the music sheet into Scilab and cropped the part that I would be using, which is the first line of the song. For convenience, I cropped out the G-clef already but unfortunately couldn't remove the 'arranged by'. Thank you, Mr. Luke Erickson. I converted the loaded image to binary using im2bw with a threshold of 0.8. After some trial and error, I found that a threshold of 0.8 kept most of the required information without adding extra artifacts. The result is the first line in the figure above.

Then, I actually forgot the second step, which was to invert the image. I forgot that the morphological operators work on a white object on a black background. This was easily done by taking the inverse of the boolean matrix. In Scilab, the syntax is simply ~M. The result is the second line of the figure above.

Finally, the third line in the figure is the result of applying OpenImage with a circular structuring element. Again, the amazing method of Trial And Error (TAE) was used to find the best size for the circle. The choice of a circle was pretty much common sense: we want to retain the circular elements in the image, since these denote which note is to be played.

With the help of SearchBlob, each note was identified by finding its location, specifically its y pixel coordinate. Since each staff line and space denotes a single tone and the lines are horizontal, the y coordinate can be converted directly to a tone. This is where the musical side of the activity begins.
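The three steps and the blob search can be sketched roughly as follows. This is plain Python on a made-up toy image, not the Scilab code used for the activity: the helper names (`binarize`, `invert`, `disk`, `erode`, `dilate`, `opening`, `blob_rows`) are hypothetical stand-ins for im2bw, ~M, OpenImage and SearchBlob, and the radius of 1 is chosen only to suit the toy image.

```python
def binarize(gray, threshold):
    """True where a pixel is brighter than the threshold (the role im2bw plays)."""
    return [[px > threshold for px in row] for row in gray]

def invert(mask):
    """Swap black and white so that the notes become the white objects."""
    return [[not px for px in row] for row in mask]

def disk(radius):
    """Offsets (dy, dx) of a filled circle, used as the structuring element."""
    r = range(-radius, radius + 1)
    return [(dy, dx) for dy in r for dx in r if dy * dy + dx * dx <= radius * radius]

def erode(mask, offsets):
    """Keep a pixel only if the whole structuring element fits inside the object."""
    h, w = len(mask), len(mask[0])
    def is_set(y, x):
        return 0 <= y < h and 0 <= x < w and mask[y][x]
    return [[all(is_set(y + dy, x + dx) for dy, dx in offsets) for x in range(w)]
            for y in range(h)]

def dilate(mask, offsets):
    """Stamp the structuring element onto every object pixel."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy, dx in offsets:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = True
    return out

def opening(mask, offsets):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(mask, offsets), offsets)

def blob_rows(mask):
    """Mean row (y) of each 4-connected blob, ordered from left to right."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if mask[y0][x0] and not seen[y0][x0]:
                stack, pixels = [(y0, x0)], []
                seen[y0][x0] = True
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                mean_x = sum(p[1] for p in pixels) / len(pixels)
                mean_y = sum(p[0] for p in pixels) / len(pixels)
                blobs.append((mean_x, mean_y))
    return [y for _, y in sorted(blobs)]

# Toy "sheet": white page, one thin staff line, two filled 3x3 note heads.
gray = [[1.0] * 13 for _ in range(11)]
for x in range(13):
    gray[5][x] = 0.0
for cy, cx in ((2, 3), (8, 9)):
    for y in range(cy - 1, cy + 2):
        for x in range(cx - 1, cx + 2):
            gray[y][x] = 0.0

binary = binarize(gray, 0.8)        # step 1: threshold at 0.8
inverted = invert(binary)           # step 2: notes become white on black
opened = opening(inverted, disk(1)) # step 3: the thin line goes, the heads stay
rows = blob_rows(opened)
print(rows)
```

In the toy image the one-pixel staff line is too thin for the circle to fit, so the opening removes it, while each note head survives as a small blob whose mean row gives its height on the staff.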

Note       Frequency (Hz)   Wavelength (cm)
G#2/Ab2    103.83           332
A2         110.00           314
A#2/Bb2    116.54           296
B2         123.47           279
C3         130.81           264
C#3/Db3    138.59           249
D3         146.83           235
D#3/Eb3    155.56           222
E3         164.81           209
F3         174.61           198
F#3/Gb3    185.00           186
G3         196.00           176
G#3/Ab3    207.65           166
A3         220.00           157
A#3/Bb3    233.08           148
B3         246.94           140
C4         261.63           132
C#4/Db4    277.18           124
D4         293.66           117
D#4/Eb4    311.13           111
E4         329.63           105
F4         349.23           98.8
F#4/Gb4    369.99           93.2
G4         392.00           88.0
G#4/Ab4    415.30           83.1
A4         440.00           78.4
A#4/Bb4    466.16           74.0
B4         493.88           69.9
C5         523.25           65.9
C#5/Db5    554.37           62.2
D5         587.33           58.7
D#5/Eb5    622.25           55.4
E5         659.26           52.3
F5         698.46           49.4
F#5/Gb5    739.99           46.6
G5         783.99           44.0
G#5/Ab5    830.61           41.5
A5         880.00           39.2
A#5/Bb5    932.33           37.0
http://www.phy.mtu.edu/~suits/notefreqs.html
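The frequencies in the table follow 12-tone equal temperament with A4 at 440 Hz, so they can also be computed instead of looked up. Below is a minimal Python sketch of that formula. The names `NOTE_OFFSETS`, `note_frequency` and `wavelength_cm` are made up for illustration, and the speed of sound of 345 m/s is an assumption that happens to reproduce the wavelength column.

```python
# Semitone distance of each note name from A within the same octave.
NOTE_OFFSETS = {"C": -9, "C#": -8, "D": -7, "D#": -6, "E": -5, "F": -4,
                "F#": -3, "G": -2, "G#": -1, "A": 0, "A#": 1, "B": 2}

def note_frequency(name, octave, a4=440.0):
    """Frequency in Hz: every semitone multiplies the frequency by 2**(1/12)."""
    semitones = NOTE_OFFSETS[name] + 12 * (octave - 4)
    return a4 * 2 ** (semitones / 12)

def wavelength_cm(freq, speed_of_sound=345.0):
    """Wavelength in cm, assuming a speed of sound given in m/s."""
    return 100.0 * speed_of_sound / freq

for name, octave in (("G#", 2), ("C", 4), ("A", 4), ("A#", 5)):
    f = note_frequency(name, octave)
    print(name + str(octave), round(f, 2), round(wavelength_cm(f), 1))
```

Flats are not listed separately because in equal temperament they share a frequency with the neighboring sharp (Ab2 is the same pitch as G#2, and so on).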
Although humans can easily understand notes, computers are not as musical. We have an idea of what a middle C sounds like and can easily reproduce the whole scale after hearing only one reference note or playing it on an instrument. Some people even have perfect pitch and need no reference at all. The computer, on the other hand, speaks a more Physics language: frequency. Each note, assuming we're not tone deaf, corresponds to a certain frequency of sound. So the computer determines the y-position of each note, while the human uses his music skills to decide which note that position stands for. The human then uses a conversion table, specifically the one above, to give the computer a basis for transforming note position into frequency. Once all the notes are identified and converted to Hz, a sine wave is generated at each frequency and the result is played using the sound() function.
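The conversion from y-position to note to frequency to sine wave can be sketched like this. It is a minimal Python illustration, not the Scilab code used for the activity: the staff geometry (`bottom_line_y`, `step`), the 8000 Hz sample rate and the names `TREBLE_STAFF`, `y_to_note` and `sine_tone` are all assumptions, and the key signature and accidentals are ignored.

```python
import math

# Treble staff positions from the bottom line upward (line, space, line, ...),
# with frequencies in Hz taken from the table above.
TREBLE_STAFF = [("E4", 329.63), ("F4", 349.23), ("G4", 392.00),
                ("A4", 440.00), ("B4", 493.88), ("C5", 523.25),
                ("D5", 587.33), ("E5", 659.26), ("F5", 698.46)]

def y_to_note(y, bottom_line_y, step):
    """Map a blob's row to a staff position.

    step is the pixel distance between a line and the adjacent space.
    Rows outside the staff are clamped, so ledger-line notes are not handled.
    """
    index = round((bottom_line_y - y) / step)
    index = max(0, min(index, len(TREBLE_STAFF) - 1))
    return TREBLE_STAFF[index]

def sine_tone(freq, duration, rate=8000):
    """Samples of a sine wave at freq Hz lasting duration seconds."""
    n = round(duration * rate)
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

# Hypothetical blob rows: image rows grow downward, so a smaller y is a higher note.
ys = [100, 90, 80]
notes = [y_to_note(y, bottom_line_y=100, step=5) for y in ys]
song = []
for name, freq in notes:
    song += sine_tone(freq, 0.25)
print([name for name, _ in notes], len(song))
```

The lookup list plays the role of the human with the conversion table: the computer only supplies the rows, and the list says which tone each row stands for.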

I guess it would also be necessary to discuss some of the cheats I used to make this activity simpler. As you may have noticed, the transformation from the original image to the circles-only image was not perfect. Here is the image again so you won't have to scroll so far.
I got all the notes in the G-clef. However, there is one extra circle in the second measure that comes from the bar in one of the notes. There is also an extraneous 'note' that comes from the person who arranged this piece: the u from Mr. Luke. I simply disregarded these fake notes.
Then, we have a bigger problem. The F-clef contains 3/4 notes which are not shaded. These were not included in the final image because the structuring element used was a solid circle. I remedied this by simply adding their corresponding frequencies by hand to the array containing all the frequencies.

The F-clef also adds another problem. I have yet to play more than one note at the same time. Piano teachers would kill me. I cheated by simply playing the notes individually. The result still sounds good since there are rests in the G-clef when there are notes in the F-clef. The downside, however, is that there is no background sound of the chord. One idea is to add two sine waves with different frequencies together. Will it give the same effect? I'm not sure since I haven't tried it yet. I think I can do it if I find I have more time. This would be an interesting extension of this activity.

Finally, I was lucky to find a piece that contains only notes of the same duration, 1/8. Actually, there are 3/4 notes but I just converted them to 1/8 as well. Haha. Having notes of different durations would require additional image processing to identify not only the tone but also the duration. Sound quality was slightly improved by putting short pauses between the notes, simply by adding a few zeros after each one in the final matrix containing the sine waves.

So far, I would like to believe that I have put in effort beyond what was expected by using a not so easy piece, to some degree. I would like to believe that I deserve a 10 for this activity.
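The two sound ideas in this part, summing sine waves for a chord and padding each note with zeros, can be sketched in a few lines. This is an untested-on-the-real-piece Python illustration, not the Scilab code from the activity: `sine_tone`, `chord` and `with_pause` are hypothetical names and the 8000 Hz sample rate is an assumption. Sound waves superpose, so adding the samples of several sines gives all of the tones at once; dividing by the number of tones keeps the sum within the range of a single sine.

```python
import math

def sine_tone(freq, duration, rate=8000):
    """Samples of a sine wave at freq Hz lasting duration seconds."""
    n = round(duration * rate)
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

def chord(freqs, duration, rate=8000):
    """Add one sine per frequency sample by sample, scaled to stay within [-1, 1]."""
    tones = [sine_tone(f, duration, rate) for f in freqs]
    return [sum(values) / len(freqs) for values in zip(*tones)]

def with_pause(samples, pause, rate=8000):
    """Append a short silence (zeros) so that consecutive notes do not run together."""
    return samples + [0.0] * round(pause * rate)

# C4, E4 and G4 from the table, played together and followed by a 20 ms pause.
c_major = with_pause(chord([261.63, 329.63, 392.00], 0.25), 0.02)
print(len(c_major), max(abs(s) for s in c_major))
```

A melody note and a left-hand chord could be combined the same way, by passing all of their frequencies to `chord` for the time they overlap.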

Recent update.
Tor was very smart to suggest that I upload the resulting sound.
It was easily done by using the 'savewave' function.
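For anyone without Scilab, saving the samples can be sketched with Python's standard `wave` module. This is only an illustration of the idea, not what savewave does internally: the function name `write_wav`, the output file name, the 16-bit mono format and the 8000 Hz sample rate are all assumptions.

```python
import math
import os
import struct
import tempfile
import wave

def write_wav(path, samples, rate=8000):
    """Save samples in [-1, 1] as a mono 16-bit PCM WAV file."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(frames)

# A quarter of a second of A4 (440 Hz), written to the system temp folder.
rate = 8000
tone = [math.sin(2 * math.pi * 440.0 * t / rate) for t in range(rate // 4)]
out_path = os.path.join(tempfile.gettempdir(), "sound_sketch.wav")
write_wav(out_path, tone, rate)
print(out_path)
```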
Enjoy :)

https://dl.dropboxusercontent.com/u/19434317/5th%20year/186/a12%20notes/sound.wav
