_WeirdCat_ said:
Am i missing something?
If you are asking about tools that you can plug in and get an answer, there are tools for that.
But I think you are asking about programming one yourself.
SHORT VERSION: There are plenty of great books on the topic, including completely free amazing online textbooks like this and this.
LONG VERSION: The question is one of those where, if you don't know what to ask, you probably aren't ready for the answer. Most people are not ready for the math involved, and most never end up studying it.
The entire field is based on statistics and probability. Topics like image processing require knowing how pictures are encoded, which usually gets into signal processing and data compression; you can use image processing libraries to simplify some steps, but you'll still need to do math on the results. Neural networks are usually an optional topic in the fourth or fifth year of university studies, or covered in graduate-level specialty courses.
_WeirdCat_ said:
So for training sake let it be 32x32 bool array defining a pixel where there is something or not.
When you want to recognize a pattern in noisy data for which you have examples, often either a multi-layer perceptron or a radial basis function network is used to learn the pattern from the data.
You have to figure out what input you plan on giving the machine, then provide lots and lots of example data sets to learn from. Usually you need the training set to be at least 10 times the number of features.
If you went with the easy route of feeding in the full 32x32 array, that's 1024 features, so you'd need about 10K examples with a good mix showing where something is and where it is not.
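As a toy illustration of learning from labeled examples, here is a single perceptron trained on a hypothetical 2D problem (deliberately much smaller than a 32x32 grid; a real network for that would be multi-layer and far larger, and all the names and data here are made up):

```python
# Toy sketch: a single perceptron learning from labeled examples.
# A miniature of the idea only; real recognizers use many more features,
# many more examples, and multi-layer networks.

def train_perceptron(examples, epochs=50, lr=0.1):
    """examples: list of (features, label) pairs with label 0 or 1."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if activation > 0 else 0
            err = y - pred                      # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Linearly separable toy data: label 1 roughly when x + y > 1.
data = [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((0.2, 0.3), 0), ((0.9, 0.8), 1)]
w, b = train_perceptron(data)
print([predict(w, b, x) for x, _ in data])  # [0, 1, 0, 1]
```

On separable data like this, the perceptron learning rule converges to weights that classify every training example correctly; noisy real-world data is exactly why the heavier statistical machinery exists.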
You described a process of running a kernel over the array to identify some feature and extract it. A "valid" k×k pass over an N×N array yields (N − k + 1)×(N − k + 1) outputs, so a 4x4 kernel like you described would reduce 32x32 to 29x29, needing roughly 8K examples in your training data.
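A minimal sketch of that kernel pass over a boolean grid. Note that a valid convolution gives N − k + 1 outputs per axis, so 4x4 over 32x32 yields 29x29; the averaging kernel here is just a placeholder:

```python
# Sketch: sliding a k x k kernel over an N x N grid ("valid" convolution).
# Output size is (N - k + 1) per axis: a 4x4 kernel over 32x32 gives 29x29.

def convolve_valid(grid, kernel):
    n, k = len(grid), len(kernel)
    out_size = n - k + 1
    out = [[0.0] * out_size for _ in range(out_size)]
    for r in range(out_size):
        for c in range(out_size):
            out[r][c] = sum(
                kernel[i][j] * grid[r + i][c + j]
                for i in range(k) for j in range(k)
            )
    return out

# 32x32 boolean occupancy grid (all empty here) and a 4x4 averaging kernel.
grid = [[0] * 32 for _ in range(32)]
kernel = [[1 / 16] * 4 for _ in range(4)]
features = convolve_valid(grid, kernel)
print(len(features), len(features[0]))  # 29 29 -> 841 features, so ~8.4K examples
```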
That might be exactly what you are looking for, or you might do better by using different learning algorithms, or by choosing better inputs by processing your data in other ways.
I did a bunch of this in a series of graduate level courses. It is extremely math intensive to understand and build, but once you've developed the system actually using a recognizer is quite easy.
_WeirdCat_ said:
Similar question? How do they do face recognition? I mean by finding eye, nose, mouth positions?
I did a bit of this in one class, detecting a hand that was pointing with a finger on a contrasting background. We talked about faces and facial recognition, but it gets really complex, really quickly.
For faces specifically, the code must do a bunch of image processing to identify and extract useful information. This falls under image registration: recording where the different elements sit on an image.
On a face, you might start by checking whether there are any skin tones at all. Often that is done by converting from the RGB to the HSV color model, where the check is much easier: skin tones fall in a similar hue range regardless of a person's race, and in bright or dim lighting. Sometimes a second pass is done in YCbCr space; if you combine both masks, they can be tuned for >99% accuracy. (This was the first step in looking for a hand: seeing if there are any potentially-skin pixels at all, and building a mask to record where they are.)
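A rough sketch of that first masking pass, using Python's standard colorsys module. The hue/saturation/value thresholds below are illustrative guesses, not tuned values; real systems tune them (and often add the YCbCr pass) against labeled data:

```python
import colorsys

# Rough skin-tone mask in HSV space. The thresholds are illustrative
# guesses; real detectors tune them (and add a YCbCr pass) on real data.

def is_skin(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Skin hues cluster near red/orange regardless of overall brightness.
    return h <= 0.14 and 0.15 <= s <= 0.8 and v >= 0.35

pixels = [(224, 172, 105),   # light skin tone
          (141, 85, 36),     # dark skin tone
          (50, 50, 200),     # blue: not skin
          (200, 200, 200)]   # grey: not skin
mask = [is_skin(*p) for p in pixels]
print(mask)  # [True, True, False, False]
```

Note how both skin samples land in the same narrow hue band even though their brightness differs a lot; that is the property that makes HSV the easier space to work in.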
Then, once you think you're looking at flesh tones, you might run a kernel across the region to search for near-white surrounding near-dark; those might be eyes. Then look for dark spots near each other; those might be nostrils. Look for patterns that indicate a mouth, open or closed. Then feed those in as inputs to decide whether you might be looking at a face. You'll also need more processing to make it rotation invariant: an upside-down face is still a face. (With a hand you can identify a roughly square shape for the fist with a rectangle attached for a finger, looking for roughly 4x4 and 4x1 proportions, plus possibly an arm as a 3-wide rectangle of arbitrary length running out, potentially to the edge of the image. No need to recognize finer details like eyes, noses, or mouths.)
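A toy version of one such kernel pass: scanning a small grayscale grid for pixels much darker than their surround, the kind of cue that might flag an eye or nostril candidate. The window size and contrast threshold are made up for illustration:

```python
# Toy "dark spot surrounded by light" scan on a grayscale grid (0..255).
# The 3x3 window and the contrast threshold are illustrative, not tuned.

def dark_spots(gray, threshold=80):
    hits = []
    rows, cols = len(gray), len(gray[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = gray[r][c]
            ring = [gray[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)]
            if sum(ring) / 8 - center > threshold:
                hits.append((r, c))
    return hits

# Bright 6x6 patch with one dark pixel at row 2, column 3.
img = [[200] * 6 for _ in range(6)]
img[2][3] = 20
print(dark_spots(img))  # [(2, 3)]
```

A real pass would run at several scales and feed the candidate positions onward, rather than trusting a single fixed-size window.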
If you want to recognize a specific face among many other faces, you'll need to register a collection of data points beyond just the eye, nose, and mouth positions: cheeks, forehead, ears, and you'll also need registration data about the orientation of the head.
After you've got all the registration data, you can compare it to a database of known faces. You can then use statistical methods (including neural networks) to identify faces that are “close enough” to the sample. Naturally “close enough” requires tuning, and suffers from the curse of dimensionality; the more features you have, the more examples and processing are required to find matches.
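A minimal sketch of that "close enough" comparison, assuming the registration step has already boiled each face down to a flat feature vector. The vectors, names, and threshold here are all invented for illustration:

```python
import math

# Sketch: match a probe face against a database of registered feature
# vectors using Euclidean distance. Vectors, names, and the "close enough"
# threshold are invented; real systems tune the threshold on labeled pairs,
# and longer vectors need far more data (curse of dimensionality).

known_faces = {
    "alice": [0.31, 0.62, 0.18, 0.77],  # e.g. normalized landmark spacings
    "bob":   [0.55, 0.40, 0.29, 0.61],
}

def best_match(probe, database, threshold=0.15):
    name, dist = min(
        ((n, math.dist(probe, v)) for n, v in database.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= threshold else None

print(best_match([0.30, 0.60, 0.20, 0.75], known_faces))  # alice
print(best_match([0.90, 0.10, 0.90, 0.10], known_faces))  # None: nobody close
```

Swapping the distance check for a trained classifier is the "statistical methods (including neural networks)" step; the tuning problem, deciding where the threshold sits, stays the same either way.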