Character recognition - why NNs?
Imagine you get the following:
  12345678
1    X
2   XX
3  X X
4    X
5    X
6    X
7    X
8    X
Clearly a 1, but imagine we test it against an M and a 1 that look like this:
  12345678   |   12345678
1 X      X   | 1   X
2 XX    XX   | 2  XX
3 X X  X X   | 3 X X
4 X  XX  X   | 4   X
5 X      X   | 5   X
6 X      X   | 6   X
7 X      X   | 7   X
8 X      X   | 8   X
Amazingly enough, every single pixel in the original 1 is also in the M, but only one pixel from the original 1 is in the new 1 (6,2). What if we had a 100% match, but the colors were inverted on one? We would get the biggest difference possible.
A neural network and other kinds of learning agents can learn to recognize these kinds of variations as natural.
Quote: Original post by daniel_i_l
To recognize an input character:
2) check this 8x8 bitmap against all the saved char bitmaps and give a score for all the pixels that are the same in both maps.

That is almost a description of how an NN makes its decision anyway... but NNs are trainable for all the possible issues, whereas your method depends on a pre-made, subjective decision on what constitutes every given character. Not terribly adaptive or flexible.
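As a minimal sketch of that scoring step (assuming 8x8 bitmaps stored as lists of 0/1 rows; the names and data layout here are illustrative, not from the original post), it might look like this in Python:

# Count the pixel positions where the input bitmap and a stored
# template agree, then pick the template with the highest score.
def match_score(bitmap_a, bitmap_b):
    return sum(
        1
        for row_a, row_b in zip(bitmap_a, bitmap_b)
        for pa, pb in zip(row_a, row_b)
        if pa == pb
    )

def recognize(input_bitmap, templates):
    # templates: dict mapping each character to its saved 8x8 bitmap
    return max(templates, key=lambda ch: match_score(input_bitmap, templates[ch]))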
Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play
"Reducing the world to mathematical equations!"
The problem is that certain pixels in that 8x8 grid have more significance than others in determining which letter it is. So you would have to assign different weights to each pixel in your scoring function (which is basically what NNs do) to make it more versatile.
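A hedged sketch of that weighted variant (the weight values themselves would have to come from training or hand-tuning; everything here is illustrative):

# Same agreement count as before, but each pixel position carries its
# own weight, so diagnostic pixels influence the score more strongly.
def weighted_score(bitmap, template, weights):
    score = 0.0
    for r in range(8):
        for c in range(8):
            if bitmap[r][c] == template[r][c]:
                score += weights[r][c]  # important pixels count for more
    return score

This per-pixel weighting is exactly the knob a trained single-layer network tunes, rather than leaving every pixel equally important.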
Also, 8x8 is a very constrained input space; a more realistic problem would be a 50x50 grid, with a lot more potential variability in the data to be processed. Similarly, instead of the basic ASCII subset, consider the Kanji character set (roughly 3,000 common symbols?).
Pre-analysis of data like that (Kanji) may also be done to identify strokes/sub-elements of the letter (as vectors/segments), which are then fed into the NN as data instead of individual pixels.
Your method is also worse than some ANNs containing hidden layers and non-linear activation functions with properly set weights. An ANN with hidden layers and non-linear activation functions can theoretically approximate any function, provided there are enough neurons, and there are algorithms that would probably find a pretty good network topology and set of weights.
Your method is also better than an infinite number of ANNs which represent functions that do not solve your problem. If we just generated a random topology and weights, I doubt that the resulting network would be better than your solution (although there is a very slim chance it could be!).
I think when you are talking about neural networks, you mean techniques that use neural networks, such as the various training algorithms. Backpropagation on a multilayer network with non-linear activation functions could possibly find a function closer to the true function you are trying to approximate, since the method you proposed will only work if the true function is linearly separable (because of its equivalence to a neural network with no hidden layers).
An example of a function that is not linearly separable is the exclusive-or (XOR) function. Say the characters being recognized are only 2 pixels: one character looks like 00 or 11, and the other looks like 10 or 01. The method you proposed will never be able to classify all four of these examples correctly. For real characters the situation is a lot more complicated, but this example still demonstrates that there are some things a more general neural network (or many other function approximators) could recognize that your method cannot.
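A quick way to see this numerically (a brute-force sketch, not from the post) is to search for a linear threshold unit that gets all four XOR cases right and observe that none exists:

import itertools

# The four XOR examples: 2-pixel inputs and their class labels.
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def classifies_xor(w1, w2, b):
    # A linear threshold unit: predict class 1 when w1*x1 + w2*x2 + b > 0.
    return all(
        (w1 * x1 + w2 * x2 + b > 0) == bool(target)
        for (x1, x2), target in XOR_CASES
    )

values = [i / 4 for i in range(-20, 21)]  # weights and bias from -5.0 to 5.0
found = any(classifies_xor(w1, w2, b)
            for w1, w2, b in itertools.product(values, repeat=3))
print("linear solution found:", found)  # always False

(The exhaustive search is just an illustration; a short algebraic argument shows that no weights at any scale can work.)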
For alignment and other problems:
- the 8x8 or 5x7, etc. bounding box should fit the entire character (the 1 in the example should be stretched out to fill the 8x8 box)
- the matching is usually done with a simple sum of differences (see the sketch after this list)
- the original height/width/line-height ratios should be used to differentiate between tall and short letters that result in the same stretched image
- grayscale scaling/matching with squared differences can be used instead of black-and-white tests (this gives better results with scanned documents)
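A small sketch of that matching step, assuming grayscale images already stretched to the same fixed box, with intensities in [0, 1] (names are illustrative):

# Sum of squared per-pixel differences between two equal-sized grids;
# lower means a closer match.
def sum_squared_difference(image, template):
    return sum(
        (p - q) ** 2
        for img_row, tpl_row in zip(image, template)
        for p, q in zip(img_row, tpl_row)
    )

def best_match(image, templates):
    # templates: dict mapping each character to its stored grayscale grid
    return min(templates, key=lambda ch: sum_squared_difference(image, templates[ch]))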
The usage of neural networks for this kind of job can be considered, but most commercial programs simply use brute force and do pattern matching. A really complex neural network can be built from the neural networks of all the tests done in the pattern-matching solution: for an 8x8 sample and a 256-character library, a network with 64 inputs and 256 outputs can be constructed in 2 layers with 8*8*256 connections. The weights should be based on the training set; for a perfectly trained neural network they should match the per-character grayscale average of the training set. In this case (as usual) hand-wiring the network or writing out the code explicitly results in a better system.
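One way to read that construction (a sketch under assumptions: a single weight layer from 64 inputs to 256 outputs, with the training data layout invented for illustration):

import numpy as np

def build_weights(training_images, training_labels, num_chars=256):
    # Hand-wired weights: row c is the per-character grayscale average
    # (flattened 8x8 -> 64) of all training samples labeled c.
    weights = np.zeros((num_chars, 64))
    for c in range(num_chars):
        samples = training_images[training_labels == c]
        if len(samples):
            weights[c] = samples.reshape(len(samples), 64).mean(axis=0)
    return weights

def classify(image, weights):
    # Output activations: dot each character's weight vector with the input.
    activations = weights @ image.reshape(64)
    return int(np.argmax(activations))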
In my machine learning class, we used a dataset of digits written as zip codes on envelopes for our ANN character-recognition project. It was amazing how much the characters differed from one writer to another.
That project was a great way to test our ANN code, since it is very simple to map the inputs and outputs: one input for each pixel (with the value being the intensity), and 10 outputs representing the 'strength' of the input being that digit. It was easy to do and we got some pretty good results. It was even able to correctly classify digits that people would have trouble with (because of how sloppy they were). According to my professor, the best-performing supervised machine learning model on that dataset at the time was a support vector machine.
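The input/output mapping described is simple enough to sketch; the hidden-layer size and the 16x16 input here are illustrative, since the post doesn't give the exact dimensions:

import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_hidden, n_digits = 16 * 16, 32, 10

# Untrained weights just to show the shapes; training (e.g. backprop) is omitted.
W1 = rng.normal(0, 0.1, (n_hidden, n_pixels))  # input -> hidden
W2 = rng.normal(0, 0.1, (n_digits, n_hidden))  # hidden -> output

def forward(pixels):
    # pixels: flat array of intensities; returns 10 scores, one per digit,
    # read as the 'strength' of the input being that digit.
    hidden = np.tanh(W1 @ pixels)
    return W2 @ hidden

scores = forward(rng.random(n_pixels))
print("predicted digit:", int(np.argmax(scores)))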