Imagine if computers could create 3D models from a single image. That’s exactly what humans do every day, all the time. How do we do it? What processes do we use to understand the world?
Feynman explains the physical process behind it:
How can we extract the useful information from the sea of information out there?
1. Identify which parts of the image contain the image of the person.
2. Separate the figure of the person from the background.
3. Create a shape model of the person.
With the shape of the person known, we can estimate the pose that the person is likely in.
1 – Identify parts of the image containing the object.
To identify the possible parts of the image that contain the object I will use feature recognition.
A feature that is widely used nowadays is HOG (histogram of oriented gradients). A histogram of oriented gradients records how often each gradient angle occurs within a certain part of the figure (a feature).
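To make the idea concrete, here is a minimal sketch of such a histogram for one small grayscale patch. It is a simplification under my own assumptions: the function name `orientation_histogram`, the 9 bins, and the magnitude-weighted voting are illustrative choices, and a full HOG descriptor also groups cells into blocks and normalizes them.

```python
import math

def orientation_histogram(patch, bins=9):
    """Histogram of unsigned gradient angles (0 to 180 degrees) for a 2D patch.

    `patch` is a list of rows of grayscale intensities. Each interior pixel
    votes for the bin of its gradient angle, weighted by gradient magnitude.
    (Hypothetical helper for illustration, not part of any library.)
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # horizontal central difference
            gy = patch[y + 1][x] - patch[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region, no orientation to record
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    total = sum(hist)
    # Normalize so the bins sum to 1 and patches of different contrast compare fairly.
    return [h / total for h in hist] if total else hist

# A patch with a vertical edge: intensity changes only from left to right.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
print(orientation_histogram(vertical_edge))
```

With a vertical edge every gradient points horizontally, so all the weight falls into the first bin (angles near 0 degrees).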
Given many images of a certain object, there are enough samples to estimate how often each gradient angle appears in that object.
When a new image is shown, we must determine whether the object is present. For each set of pixels, we compute the gradient angles and compare their histogram with the saved histogram of gradients. If the similarity passes a certain threshold, that part of the image is considered to contain the object of interest.
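The learn-then-compare procedure just described can be sketched as follows. This is only an illustration under my own assumptions: the helper names, the use of histogram intersection as the similarity measure, and the 0.8 threshold are all hypothetical, and the histograms are assumed to be already computed and normalized. Felzenszwalb's detector is more elaborate than this: it scores HOG features with learned part-based models rather than comparing against an averaged histogram.

```python
def mean_histogram(histograms):
    """Average several equal-length histograms into one saved template."""
    n = len(histograms)
    return [sum(column) / n for column in zip(*histograms)]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def matching_windows(window_histograms, template, threshold=0.8):
    """Return the window positions whose histogram is similar enough to the template.

    `window_histograms` maps a window position (x, y) to its normalized histogram.
    The threshold value is an arbitrary choice for this sketch.
    """
    return [pos for pos, hist in window_histograms.items()
            if intersection(hist, template) >= threshold]

# Template learned from two example histograms of the object (made-up numbers).
template = mean_histogram([[0.8, 0.2, 0.0], [0.6, 0.4, 0.0]])

# Histograms of two windows in a new image (made-up numbers).
windows = {(0, 0): [0.7, 0.3, 0.0], (8, 0): [0.0, 0.1, 0.9]}
print(matching_windows(windows, template))
```

Only the window at (0, 0) resembles the template closely enough to pass the threshold; the other window has most of its weight in a bin the template never uses.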
To identify the parts of the image that may contain the object of interest, I used the software and algorithm of Pedro Felzenszwalb. His software can be found at the following website:
The figure above shows a possible result of running Pedro's code to identify a bicycle.