How do you develop an apple-size-detection system? We investigated many different approaches and concluded that we need three things: a camera as a sensor, computer vision and machine learning.
What is computer vision and machine learning?
Take a look at your hand. Your eyes most likely see 5 fingers. Computer vision means that, just by using your phone camera (or any camera), a computer system can give you the exact same information, without you having to count. In order for computer vision to decipher that there are fingers in the photo, the system needs to be trained and taught what fingers look like. This is, broadly speaking, called machine learning.
Machine learning/intelligent brain image from Toptal
These are the necessary steps to creating an apple-size-detector with computer vision and machine learning:
- Capture an image of the apples
- Feed the image into the system
- Convert the raw image to greyscale
- Find the edge map
- Detect apple contours
- Fit circles or ellipses over the apple contours
- Repeat until enough data has been collected to verify the results
But first things first. What’s the best way to create the image and process it with computer vision?
Pure Computer Vision
We started with pure (traditional) computer vision, meaning that we used only one camera as a sensor for the system, and NO machine learning. From the beginning we tested different input images (different varieties, different angles, different image qualities etc.). The outputs proved that single apples were detected, but the system struggled when conditions were not ideal. For instance, as soon as leaves or branches were depicted (especially in different light settings) the size detection became inaccurate.
Pure vision circle fitting: first output
Pure vision circle fitting: second output, with leaves covering some apples
Next we tried stereo vision as another option. Stereo vision means that this time we used dual cameras to generate a rough 3D surface reconstruction (a so-called point cloud) of the top layer of apple bins. The point cloud / 3D surface can then be analysed geometrically by the system to find the width of apples appearing as blobs on the surface. The downside: our growers would need a second camera. However, this may not be a concern in the future, as all new phones are now being released with dual cameras.
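The geometry behind stereo sizing can be sketched in a few lines: the disparity between the two camera views gives the depth of an apple via triangulation, and that depth converts the apple's pixel width into a physical width. All parameter values below are made-up illustrations, not our calibration data.

```python
def apple_width_from_stereo(focal_px, baseline_m, disparity_px, width_px):
    """Estimate an apple's physical width from a stereo camera pair.

    focal_px     - camera focal length in pixels (assumed calibrated)
    baseline_m   - distance between the two cameras in metres
    disparity_px - horizontal shift of the apple between the two views
    width_px     - apple width measured in the image, in pixels
    """
    # Stereo triangulation: closer objects shift more between the views
    depth_m = focal_px * baseline_m / disparity_px
    # Pinhole back-projection: pixel width scaled by depth over focal length
    return width_px * depth_m / focal_px
```

For example, with a 1000 px focal length, a 10 cm baseline and a 50 px disparity, the apple sits 2 m away, and a 40 px wide blob corresponds to an 8 cm apple.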
After finding that stereo vision was the best choice, we simply had to start teaching the machine all the different ways an apple in a bin can look: from all possible angles, under different lighting conditions and at different image qualities. This means we needed to collect a very large number of images (data-sets) to teach the system how to locate and measure individual apples within an input apple bin image.
Find out how we solved this in our next post!