Monday, August 19, 2013

Cure for Cancer. A11, Application of Binary Analysis

Ok, this won't really cure cancer by itself, but this is an exercise that could potentially be useful in differentiating abnormal cells.

First, we study a scanned image of similarly sized paper circles randomly scattered on a sheet of paper, shown below. These circles represent the normal cells.

We check the histogram to find the proper threshold for transforming it to a binary image.
From this, we find that most of the image has a value below about 210. These values represent the darker background, while the values greater than 210 represent the lighter circles. 210/255 is around 0.82, which is what we use as a starting point for our binary threshold. I tried out threshold values from 0.81 to 0.87 in steps of 0.02, shown below.
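To make the thresholding step concrete, here is a minimal sketch in plain Python. It is not the code I ran (that was im2bw); the helper name to_binary, the tiny test image, and the choice of "strictly brighter than the cutoff counts as white" are all assumptions for illustration.

```python
# Hypothetical helper, not the toolbox's im2bw: the "strictly greater than"
# rule below is an assumption about how the cutoff is applied.
def to_binary(gray, threshold):
    """Return a 0/1 image: 1 where the pixel is brighter than threshold * 255."""
    cutoff = threshold * 255
    return [[1 if px > cutoff else 0 for px in row] for row in gray]

# Gray level read off the histogram, converted to a fraction of 255.
start = 210 / 255
print(round(start, 2))  # 0.82

# Made-up 4x4 image: dark background (~180), a bright patch (~230),
# and one stray pixel (215) that sits just above the cutoff.
gray = [
    [180, 182, 179, 181],
    [180, 230, 232, 181],
    [183, 231, 229, 180],
    [181, 180, 182, 215],
]
binary = to_binary(gray, 0.84)
for row in binary:
    print(row)
```

With a threshold of 0.84 the cutoff is 214.2, so the stray pixel at 215 survives as a white speck. That is the kind of artifact the cleaning step has to deal with.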


There is a field of artifacts on the right side of each image, which shrinks as we increase the threshold. However, as the threshold is increased, some parts of the circles are also eaten away. A simple use of the morphological operation openimage takes away the artifacts on the right. I tried two types of structuring elements and several sizes for each. This took a lot of time since I was trying to remove all the artifacts while keeping most of the circles complete. In the end, I decided to use 0.84 as my threshold, shown in the leftmost image. The image in the center is the result of using a square structuring element of size one, while the last uses a circle structuring element. I also kind of cheated a bit by simply cropping off the top of the image to remove the white line.
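For anyone wondering what an opening actually does, it is an erosion followed by a dilation with the same structuring element. Below is a rough sketch in plain Python, not the openimage call itself. The function names, the 3x3 square element, and the test image are my own assumptions; I am not claiming that a 3x3 square is what "size one" means in the toolbox.

```python
def erode(img, offsets):
    """Keep a pixel only if every offset neighbour is white (outside counts as black)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy, dx in offsets))
    return out

def dilate(img, offsets):
    """Stamp the structuring element onto every white pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy, dx in offsets:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = 1
    return out

def open_image(img, offsets):
    """Morphological opening: erosion, then dilation, with a symmetric element."""
    return dilate(erode(img, offsets), offsets)

# Assumed structuring element: a 3x3 square centred on the pixel.
SQUARE_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

# Made-up image: a 4x4 white block plus one isolated speck at row 2, column 6.
img = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
opened = open_image(img, SQUARE_3X3)
for row in opened:
    print(row)
```

The speck is smaller than the structuring element, so the erosion wipes it out and the dilation has nothing to grow back. The block survives the erosion as a 2x2 core and is restored to its full 4x4 size.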

It appears that using a threshold of 0.84 and openimage with a circle structuring element of size one removes all the artifacts but keeps most of the circles complete. Next, searchblobs from IPD was used to identify the blobs. It assigns each connected group of white pixels its own integer label, starting at one. Using this, 33 blobs were identified. However, a closer inspection shows that some of these blobs have zero pixels. I find this weird and address the problem by setting limits on the size of the real blobs. Indeed, the blobs identified initially vary widely in size, as shown in the histogram below.
This histogram gives us an idea of the relative abundance of blob sizes. We find that the majority are around 500 pixels. The wide spread is due to imperfections in the circles found. The small sizes are probably pixels detached from the main circle. The larger blobs are connected or overlapping circles. We zoom in to the part of the histogram where most of the blobs are found.
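The blob labelling idea can be sketched with a simple flood fill. This is only an illustration of connected-component labelling in plain Python, not how searchblobs is implemented. The name label_blobs, the use of 8-connectivity, and the test image are assumptions.

```python
from collections import deque

def label_blobs(binary):
    """Label connected white regions 1, 2, 3, ... and count the pixels in each.

    Assumes 8-connectivity (diagonal neighbours count as connected).
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    areas = {}
    next_label = 1
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                # New blob found: flood-fill it breadth-first.
                labels[y][x] = next_label
                queue = deque([(y, x)])
                count = 0
                while queue:
                    cy, cx = queue.popleft()
                    count += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = next_label
                                queue.append((ny, nx))
                areas[next_label] = count
                next_label += 1
    return labels, areas

# Made-up binary image with three separate blobs.
binary = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
labels, areas = label_blobs(binary)
print(areas)  # {1: 4, 2: 2, 3: 1}
```

The areas dictionary is what feeds the size histogram: one pixel count per label.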
Finally, I settle for a range from 300 to 600 shown in the histogram below.
In this range, the mean is 492.2 while the stdev is 56.
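As a sketch of how the mean and stdev can be computed once the range is fixed, here is the idea in plain Python. The list of areas is made up (it is not my data), and I am assuming the sample standard deviation; the population version would give a slightly smaller number.

```python
from statistics import mean, stdev

# Made-up blob areas in pixels: specks, normal circles, and clumps.
areas = [3, 12, 450, 480, 500, 520, 550, 980, 1490]

# Keep only the blobs inside the chosen range before computing statistics.
LOW, HIGH = 300, 600
in_range = [a for a in areas if LOW <= a <= HIGH]

m = mean(in_range)
s = stdev(in_range)  # sample standard deviation (assumption)
print(in_range)
print(m, round(s, 1))
```

Restricting the range first matters: the specks and clumps would otherwise drag the mean around and inflate the stdev.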
Next, I take a break before applying the information gained here to isolate the cancer cells, represented by abnormally large cells, in the next image.

After a merienda break, I apply the same settings to isolate each blob. I use im2bw with the same threshold of 0.84 and get the image below. Beside it is the result of applying the same cleaning method, namely openimage with the same circle structuring element of size 1. As with the previous image, this cleans out all the artifacts on the right side.



Using filterbysize as suggested by Floyd, I find the normal sized cells shown below. The lower bound used was mean-stdev and the upper bound was mean+stdev.
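The size filter itself is simple enough to sketch in plain Python. This is not the filterbysize function from the toolbox. The helper filter_by_size, the inclusive bounds, and the blob areas are assumptions; only the mean of 492.2 and stdev of 56 come from the first image.

```python
def filter_by_size(areas, lower, upper):
    """Return the labels whose pixel count lies in [lower, upper] (inclusive)."""
    return [label for label, area in areas.items() if lower <= area <= upper]

# Statistics measured on the first image.
mean_area, stdev_area = 492.2, 56.0

# Made-up label -> area table for illustration.
areas = {1: 495, 2: 310, 3: 530, 4: 905, 5: 1020, 6: 12}

normal = filter_by_size(areas, mean_area - stdev_area, mean_area + stdev_area)
print(normal)  # [1, 3]
```

With these numbers the accepted range is roughly 436 to 548 pixels, so a chipped circle (310) and a clump (905 or 1020) both fall outside it.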

We find that the result only contains the perfect circles. The incomplete circles and the circles that are clumped together are not included. This calls for better cleaning of the images. Anyway, let's focus on the goal of identifying the abnormally big circles. Lucky for us, the abnormally big circles in the given image do not overlap with other circles. We simply change the limits of our filterbysize to find them. Now I try setting the lower bound to mean+stdev and, to avoid accepting circles that are clumped together, the upper bound to 2*mean-stdev.

The result was not perfect, but it greatly reduced the number of circles present. It revealed only 3 of the 5 larger circles and still contained a few other 'normal' circles. Now, I try again to find the best size limits and get the result shown below.

Here we find 4 of the 5 larger cells and only get one extra normal cell. I just hope the final grade isn't proportional to the number of right minus wrong circles. I used the limits m+d and 2m-d, where m is the mean and d is the stdev. The extra twin normal cells were included because the area they occupy falls within the same range as the larger cells.
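To see why a pair of normal cells can sneak in, it helps to work out the limits with the measured mean (492.2) and stdev (56). The sketch below is plain Python; the function name is_flagged and the four example areas are made up for illustration.

```python
# Statistics measured on the normal cells.
m, d = 492.2, 56.0

lower = m + d        # anything above this is bigger than a typical normal cell
upper = 2 * m - d    # anything above this is assumed to be a clump of cells
print(round(lower, 1), round(upper, 1))  # 548.2 928.4

def is_flagged(area):
    """True if a blob's area falls in the 'abnormally large' window."""
    return lower <= area <= upper

# Made-up areas in pixels.
examples = {
    "single normal cell": 500,
    "enlarged cell": 800,
    "two normal cells barely touching": 984,
    "two normal cells overlapping heavily": 884,
}
flags = {name: is_flagged(area) for name, area in examples.items()}
for name, flagged in flags.items():
    print(name, flagged)
```

Two cells that just touch add up to about 2m and land above the upper limit, so they are rejected. But when two cells overlap, the shared region is counted once, the combined area drops below 2m-d, and the pair looks just like one big cell to a filter that only sees area.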

Again, I spent a lot of time on cleaning the image but still wasn't able to separate overlapping circles. I believe that this caused the poor results. I give myself an 8 for effort.
