Objective
This lesson will show you how to program the PiBot to detect and classify different objects that it can see with its camera, and how to display those objects on a webpage along with the video from the camera. It will also show you how to modify the webpage from Lesson 4 so that you can drive the PiBot around and see the different objects it detects.
Parts Required
The parts below are required to complete this lesson. Note that all parts except the heatsinks and objects to be detected are included in the OSOYOO kit that can be purchased on Amazon.
- Assembled PiBot from Lesson 4
- Raspberry Pi Heatsinks
- Objects from the list below that the PiBot can detect
- People
- Cars
- Airplanes
- TV or Monitor
- Chairs
- Laptop
- Dog
- Cat
- Backpack
- Tie
- Skateboard
- Cup
- Fork
- Knife
- Spoon
- Cell Phone
- Clock
Hardware Assembly
- Remove the cover from the lens of the CSI camera before turning the PiBot on
- Turn the PiBot on and place it where it can drive around
- Place objects around the area in front of the PiBot so that it can see them, then drive it around
Software
Select the appropriate link below for instructions on setting up the software on the PiBot and for an explanation of how it works.
Algorithm Explanation
This lesson uses a type of machine learning algorithm called a neural network to process images and detect certain objects within them. A machine learning algorithm is made up of a set of rules that the computer follows to learn the characteristics of the data from examples, rather than being told those characteristics directly. The specific algorithm used in this lesson has learned which characteristics make up each of the objects listed above when that object appears in an image. The set of rules that this algorithm uses is called a neural network.
Neural networks are loosely inspired by the way that our brains process information and draw conclusions. A neural network is composed of several layers of neurons, and each neuron is connected to the neurons in the layers directly above and below it. Each neuron takes in data from the layer above it and processes it using a set of numbers called weights: it multiplies each input by its weight, adds the results together, and passes the sum through a simple function called an activation function. It then passes the result of this processing to the layer below it. The top layer gets its input directly from the data being processed (in this lesson, the pixels of the camera image), and the bottom layer outputs its results back to the program that is running the network. During training, the results from the bottom layer are graded against the correct answers, and the weights are adjusted to achieve a better score. After training is complete, the weights of every neuron are saved into a model that another program, such as the one in this lesson, can load to detect objects in an image. These final weights represent the neural network's idea of which features are most important for determining the correct answer, without an expert needing to work out those features by hand.
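To make the layer-by-layer description of a neural network more concrete, here is a minimal pure-Python sketch of a tiny network. It is not the network used in this lesson; a real object detection model has far more layers and neurons. The layer sizes, weights, the sigmoid activation function, the extra bias number added by each neuron, and the toy "logical AND" training data are all hypothetical choices made for illustration. `network_forward` passes data from the top layer down to the bottom layer, and `train_neuron` shows the grade-and-update training loop on a single neuron using gradient descent.

```python
import math
from typing import List, Tuple

# A layer is (weights, biases): one list of input weights and one bias per neuron.
Layer = Tuple[List[List[float]], List[float]]


def sigmoid(x: float) -> float:
    """Activation function: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Each neuron multiplies every input from the layer above by its weight,
    adds the products and its bias, then applies the activation function."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs: List[float], layers: List[Layer]) -> List[float]:
    """Pass data from the top (input) layer down to the bottom (output) layer."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


def train_neuron(samples, epochs: int = 2000, lr: float = 0.5) -> Tuple[List[float], float]:
    """Train one sigmoid neuron with gradient descent: grade each answer against
    the correct one and nudge the weights to reduce the error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            prediction = layer_forward(inputs, [weights], [bias])[0]
            # The "grade": how far off the answer was. For a sigmoid neuron this
            # is the gradient of the cross-entropy loss with respect to its sum.
            error = prediction - target
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias


# Hypothetical 2-input network: a hidden layer of 2 neurons, then 1 output neuron.
EXAMPLE_LAYERS: List[Layer] = [
    ([[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

# Hypothetical toy training data: output 1 only when both inputs are 1 (logical AND).
AND_SAMPLES = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]

if __name__ == "__main__":
    print("forward pass:", network_forward([1.0, 0.0], EXAMPLE_LAYERS))
    weights, bias = train_neuron(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        prediction = layer_forward(inputs, [weights], [bias])[0]
        print(inputs, "->", round(prediction, 3), "target", target)
```

After training, the neuron's output is close to 1 only for the input `[1.0, 1.0]`, even though nobody wrote a rule saying "both inputs must be 1"; the rule is stored entirely in the learned weights and bias.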
Machine learning neural networks never provide a 100% certain output. They always provide their results as probabilities, such as “I'm 55% certain that there is a cat in this location of the image.” Even though their output is not guaranteed to be correct, they can still be very accurate and much more efficient than algorithms that guarantee a certain answer. They can also produce very good results from input data for which a 100% certain answer would be impossible to get. Because they are so efficient, they can provide results in real time, as they do in this lesson, instead of taking several seconds or minutes to determine what objects are in an image.
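The short sketch below illustrates how a network's raw scores can become probabilities and how a program might act on them. It uses the softmax function, a common (but here assumed) way for the bottom layer of a classifier to turn its scores into probabilities that add up to 1, and then filters a list of detections by a confidence threshold. The detection format `(label, confidence, box)`, the example scores, and the 0.5 threshold are hypothetical; the program in this lesson may represent and filter its detections differently.

```python
import math
from typing import List, Tuple

# Hypothetical detection format: (label, confidence, (x, y, width, height)).
Detection = Tuple[str, float, Tuple[int, int, int, int]]


def softmax(scores: List[float]) -> List[float]:
    """Convert raw output scores into probabilities that add up to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in math.exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence meets the (assumed) threshold."""
    return [d for d in detections if d[1] >= threshold]


if __name__ == "__main__":
    labels = ["cat", "dog", "cup"]
    probabilities = softmax([2.0, 1.0, 0.1])  # hypothetical raw scores
    for label, p in zip(labels, probabilities):
        print(f"I'm {p:.0%} certain this is a {label}")

    detections = [
        ("cat", 0.55, (40, 60, 120, 90)),
        ("dog", 0.30, (200, 80, 100, 100)),
        ("cup", 0.82, (10, 150, 40, 50)),
    ]
    print(keep_confident(detections))
```

With a threshold of 0.5, the 55% cat and the 82% cup are kept and the 30% dog is discarded. Raising the threshold hides more uncertain guesses at the cost of missing some objects that really are there.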