Every so often, someone asks me if there are any good programs available for counting the number of objects (bacteria, cells, or fluorescent proteins) in a photo. This program is my attempt at a simple algorithm for doing that with plants.
In the past, when people asked about object counting, I haven't known how to answer because I'm not very familiar with AI approaches, and even if I did create an AI-based object counter, I would want to be able to tweak it if it performed poorly.
One non-AI approach that I've contemplated for some time is to treat a pixel as part of an object if it is a certain color, and then to group pixels that lie within a certain distance of each other into objects. After some years of having this in the back of my mind, a friend mentioned that she was having trouble getting AI to recognize the boundaries of a plate, and I finally decided to implement the algorithm to see whether it works.
You can test the algorithm for yourself by installing snakemake on your computer and then cloning this repository. The script can be executed with
snakemake -s object_counter.smk --cores 2 --use-conda
I have included two "test" plates with this repository; the input images are in the input_images folder, and the object_counter.yml file can be edited to point to different input images or to tweak settings. I've also included output files for these inputs, which can be found in the results folder.
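The real key names and defaults are best checked in object_counter.yml itself; purely as an illustration, a config for this kind of workflow tends to look something like this (every key and value below is hypothetical):

```yaml
# Hypothetical sketch of an object_counter.yml-style config;
# see the file in the repository for the actual keys and defaults.
input_images:
  - input_images/plate1.jpg   # placeholder filename
  - input_images/plate2.jpg   # placeholder filename
green_skew_threshold: 30      # how strongly green must dominate red/blue
grouping_distance: 10         # max pixel gap when grouping pixels into one object
min_object_size: 50           # discard candidate objects smaller than this
output_dir: results
```

Whatever the real keys are called, the color threshold, grouping distance, and minimum object size are the kinds of settings that need retuning between plates (see the constraints listed further down).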
Out of curiosity, I've also included a standalone Python script that is hardcoded to analyze bacterial colonies of approximately one specific color. After cloning the repository and installing the Pillow Python module, this script can be executed by changing into the scripts folder and running
python3 rgb_counter.py
The output will be sent to the 'cropped_objects' subfolder, which I've also included with this repository.
The approach works by first finding all pixels of a specific color. In my case, I'm looking for pixels where the "green" component is skewed above the "red" and "blue" components by some threshold. I set all pixels that don't reach this threshold to black (RGB 0,0,0), and the output image can be used to check whether the thresholding is capturing all of the objects of interest, and only those objects.
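For illustration, a minimal Pillow sketch of this thresholding step might look like the following; the skew test, threshold value, and file paths are placeholders rather than the exact code in this repository:

```python
from PIL import Image

# Placeholder threshold: how far the green channel must exceed
# the red and blue channels for a pixel to count as an object pixel.
GREEN_SKEW = 30

def threshold_image(path, out_path):
    img = Image.open(path).convert("RGB")
    pixels = img.load()
    kept = []  # (x, y) coordinates of pixels that pass the threshold
    for y in range(img.height):
        for x in range(img.width):
            r, g, b = pixels[x, y]
            if g - r >= GREEN_SKEW and g - b >= GREEN_SKEW:
                kept.append((x, y))
            else:
                pixels[x, y] = (0, 0, 0)  # blank out non-matching pixels
    # Inspect this output to check that only the objects of interest survive.
    img.save(out_path)
    return kept

# Hypothetical usage:
# kept = threshold_image("input_images/plate1.jpg", "results/thresholded.png")
```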
Next, I feed these pixels into a function that checks if each pixel is close to a "box" of pixels. If the pixel is within a set "distance" of a box, it gets added to the box and the boundaries of the box are expanded. If the pixel is not near any existing boxes, a new box is created. Boxes that overlap are merged, and the resulting boxes are potential objects. Any objects that attain a minimum threshold size are considered "valid" objects, and I count the number of objects retrieved.
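A minimal sketch of this grouping step, assuming the thresholded pixel list from the snippet above; the distance and minimum-size settings are placeholders, and the actual script in this repository differs in its details:

```python
DIST = 10        # placeholder: how close a pixel must be to a box to join it
MIN_PIXELS = 50  # placeholder: boxes with fewer pixels are discarded as noise

def group_pixels(kept):
    boxes = []  # each box: [min_x, min_y, max_x, max_y, pixel_count]
    for x, y in kept:
        for box in boxes:
            if box[0] - DIST <= x <= box[2] + DIST and box[1] - DIST <= y <= box[3] + DIST:
                # Pixel is near this box: absorb it and expand the boundaries.
                box[0], box[1] = min(box[0], x), min(box[1], y)
                box[2], box[3] = max(box[2], x), max(box[3], y)
                box[4] += 1
                break  # stop once the pixel has found a home
        else:
            boxes.append([x, y, x, y, 1])  # no nearby box: start a new one

    # Merge boxes whose rectangles overlap, repeating until nothing changes.
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]:
                    boxes[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]), a[4] + b[4]]
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break

    return [b for b in boxes if b[4] >= MIN_PIXELS]  # keep only boxes big enough to be objects

# Hypothetical usage:
# objects = group_pixels(kept); print(len(objects), "objects detected")
```

One design note on the sketch: tracking each object only as a bounding rectangle keeps the "is this pixel near the box?" test down to a few comparisons, which is a big part of why the approach stays fast.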
The algorithm completes relatively quickly (within a couple of seconds per image) for a few reasons:
- After initially processing the image, only pixels that match the color are retained, so the number of pixels that need further parsing is relatively small.
- Each pixel is only processed once, and a pixel's distance from the existing "boxes" is only assessed if the pixel falls outside of them. As boxes of pixels grow, the number of new pixels that fall outside an existing box decreases.
There is probably room for considerable improvement: I currently re-sort the boxes of pixels relative to each other every time I analyze a new row of the image, a pixel keeps being checked against other groups even after it has matched one (a break statement would stop that), and I could save further time by not outputting an image for every detected object.
The algorithm performs remarkably well on well-spaced plant matter, and with the right settings, it accurately counts the number of living plants and even correctly identifies multiple leaves as belonging to the same plant. However, it completely fails on my bacterial colony photo, where the background color is not consistent and the colonies touch each other. Here is a more complete list of constraints to be aware of:
- The objects of interest need to have a well-defined color relative to the rest of the photo.
- The objects need to be well-spaced without touching.
- In order to automate the process and make this script useful, the object size, photography setup, and object spacing need to be relatively consistent; for each of these plates, I needed to tweak the settings considerably. Green objects on a white plate (where the blue and red components are relatively high) have considerably less "green" skew than green objects on a black plate. Bacterial colonies contain far fewer pixels than plants, and the number of pixels between colonies is also smaller. Within the bacterial plate, the top third is significantly more reflective than the bottom two-thirds, so all of the pixel intensities shift in a way that is easy to accommodate mentally but difficult to program.
This was a ton of fun. I was very surprised that the program works as well as it does on plants, precisely cutting out the boundaries of each plant and counting them accurately and quickly. I'm sure I'm not producing anything novel here, and machine learning approaches are certainly more adaptable to varying lighting conditions, colors, and spacings, but it's still really cool to see that simple algorithmic solutions to basic counting problems can sometimes be easier to generate than machine learning approaches that need well-defined boundaries and training datasets. In the future, I'll probably try some machine learning and AI approaches to see how they stack up. I hope this code is useful to other people working on counting problems and computer vision - please feel free to use this software as long as you reference my original GitHub repo.