
Character recognition - why NNs?

Started by February 16, 2007 07:57 AM
16 comments, last by GameDev.net 17 years, 8 months ago
Almost every program I've seen that does character recognition uses NNs. But why are NNs needed? Why can't you do something like the following:

1) Save an 8x8 bitmap of each character.

To recognize an input character:
1) Scale it down to 8x8 pixels.
2) Check this 8x8 bitmap against all the saved character bitmaps and give a score for all the pixels that are the same in both maps.
3) Choose the character that scored the highest.

To make it more flexible, instead of storing a bitmap based on a single example of each character, you could store the average of many examples as a greyscale bitmap. Then, instead of checking whether two pixels are identical (and giving a score of 1 or 0 per pixel), you could score each pixel by how similar the two values are: 1 - |p1 - p2|.

Why is this method worse than an NN? Thanks.
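For concreteness, here is a minimal sketch of the scheme just described (Python with NumPy, assuming greyscale pixel values in [0, 1]; the `templates` dict and the crude downscaler are illustrative placeholders, not any particular library):

```python
# Sketch only: `templates` (char -> 8x8 greyscale array) and the downscaler
# are illustrative placeholders, not a real API.
import numpy as np

def similarity_score(candidate, template):
    """Per-pixel similarity 1 - |p1 - p2|, summed over the 8x8 grid."""
    return float(np.sum(1.0 - np.abs(candidate - template)))

def downscale_to_8x8(image):
    """Crude box-filter downscale; assumes the input is already cropped
    to the character's bounding box."""
    h, w = image.shape
    out = np.zeros((8, 8))
    for r in range(8):
        for c in range(8):
            block = image[r * h // 8:(r + 1) * h // 8,
                          c * w // 8:(c + 1) * w // 8]
            out[r, c] = block.mean() if block.size else 0.0
    return out

def recognize(char_image, templates):
    """Scale the input down to 8x8, score it against every stored template,
    and return the character with the highest total score."""
    candidate = downscale_to_8x8(char_image)
    return max(templates, key=lambda ch: similarity_score(candidate, templates[ch]))
```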
"We've all heard that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare. Now, thanks to the internet, we know this is not true." -- Professor Robert Silensky
What if all the pixels are shifted one pixel to the left of where you expect? Then you would get close to a zero match and a big difference. What if there is a black background, or some lines in the background? That would completely throw off the score.

Imagine you get the following:
 12345678
1      X
2     XX
3    X X
4      X
5      X
6      X
7      X
8      X

Clearly a 1, but imagine we test it against an M and a 1 that look like this:
 012345678   |    012345678
1X     X     |   1     X
2XX   XX     |   2    XX
3X X X X     |   3   X X
4X  X  X     |   4     X
5X     X     |   5     X
6X     X     |   6     X
7X     X     |   7     X
8X     X     |   8     X

Amazingly enough, every single pixel in the original 1 is also in the M, but only one pixel from the original 1 is in the new 1 (6,2). And what if we had a 100% match, but the colors were inverted on one? We would get the biggest difference possible.
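To put numbers on that, here is a quick check using the grids above and the simple "count the agreeing pixels" score from the original question ('X' = ink, '.' = background):

```python
# Grids copied from the post above; agreement counts both matching ink
# pixels and matching background pixels.
import numpy as np

def grid(rows):
    return np.array([[1 if ch == 'X' else 0 for ch in row] for row in rows])

INPUT_ONE = grid(["......X.", ".....XX.", "....X.X.", "......X.",
                  "......X.", "......X.", "......X.", "......X."])
TEMPLATE_ONE = grid([".....X..", "....XX..", "...X.X..", ".....X..",
                     ".....X..", ".....X..", ".....X..", ".....X.."])
TEMPLATE_M = grid(["X.....X.", "XX...XX.", "X.X.X.X.", "X..X..X.",
                   "X.....X.", "X.....X.", "X.....X.", "X.....X."])

def agreement(a, b):
    """Number of pixels that are identical in both grids (max 64)."""
    return int(np.sum(a == b))

print("input 1 vs template M:", agreement(INPUT_ONE, TEMPLATE_M))    # 53
print("input 1 vs template 1:", agreement(INPUT_ONE, TEMPLATE_ONE))  # 46
# The one-pixel-shifted '1' loses to the 'M' under naive per-pixel matching.
```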

A neural network and other kinds of learning agents can learn to recognize these kinds of errors as natural.
Quote: Original post by daniel_i_l
To recognize an input character:
2) check this 8X8 bitmap against all the saved char bitmaps and give a score for all the pixels that are the same in both maps.
That is almost a description of how an NN makes its decision anyway... but NNs are trainable for all the possible issues, whereas your method depends on a pre-made, subjective decision about what constitutes each given character. Not terribly adaptive or flexible.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"



The problem is that certain pixels in that 8x8 grid have more significance than others in determining which letter it is. So, to be more versatile, you would have to assign a different weight to each pixel in your scoring function (which is basically what NNs do).
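One crude way to pick such weights (a sketch of the idea only, not what an NN actually learns): give more weight to pixels whose value varies a lot across the stored templates, since pixels that look the same for every character carry little information.

```python
# Sketch: per-pixel weights from the variance across stored templates.
# Assumes `templates` maps character -> 8x8 greyscale array in [0, 1]
# and that the templates are not all identical.
import numpy as np

def pixel_weights(templates):
    stack = np.stack(list(templates.values()))   # (num_chars, 8, 8)
    weights = stack.var(axis=0)                  # per-pixel variance
    return weights / weights.sum()               # normalize

def weighted_score(candidate, template, weights):
    return float(np.sum(weights * (1.0 - np.abs(candidate - template))))
```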

Also, 8x8 is a very constrained input space; a more realistic problem would be a 50x50 grid with a lot more potential variability in the data to be processed. Similarly, instead of the basic ASCII subset, consider a Kanji character set (maybe 3,000 common symbols?).

Pre-analysis of data like that (Kanji) may also be done to identify sweeps/sub-elements of the character (as vectors/segments), which are fed into the NN as data instead of individual pixels.
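As a stand-in for that kind of pre-analysis, here is one very simple feature extraction, row and column projection profiles; real systems use much richer stroke/segment features, so treat this only as an illustration:

```python
# Sketch: summarize a character with row/column projection profiles
# (amount of ink per row and per column) instead of raw pixels.
import numpy as np

def projection_features(image):
    """image: 2D greyscale array with ink as larger values.
    Returns a 1D feature vector: normalized row sums then column sums."""
    row_profile = image.sum(axis=1)
    col_profile = image.sum(axis=0)
    features = np.concatenate([row_profile, col_profile])
    total = features.sum()
    return features / total if total > 0 else features
```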
--------------------------------------------
Ratings are Opinion, not Fact
The algorithm you gave is equivalent to the function produced by a specific neural network: one with no hidden layers, linear activation functions, and weights (and biases) read directly off the training bitmaps. Therefore your method is equivalent to an artificial neural network.
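For the binary-template case this equivalence can be checked directly: 1 - |t - x| expands to (1 - t) + (2t - 1)·x, so the whole score is a bias plus a weighted sum of the input pixels, which is exactly what a single linear output unit computes. A quick numerical check (NumPy; the random data is only there for the comparison):

```python
import numpy as np

rng = np.random.default_rng(0)
template = rng.integers(0, 2, size=64).astype(float)  # binary 8x8 template, flattened
x = rng.random(64)                                    # greyscale input in [0, 1]

direct = np.sum(1.0 - np.abs(template - x))           # the score as originally described

weights = 2.0 * template - 1.0                        # +1 where the template has ink, -1 elsewhere
bias = np.sum(1.0 - template)                         # one unit of score per background pixel
linear = bias + weights @ x                           # single linear "neuron"

print(np.isclose(direct, linear))                     # True
```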

Your method is also worse than some ANNs containing hidden layers and non-linear activation functions with properly set weights. An ANN with a hidden layer and non-linear activation functions can theoretically approximate any continuous function arbitrarily well, provided there are enough neurons. There are algorithms that would probably find a pretty good network topology and set of weights.

Your method is also better than an infinite number of ANNs which represent functions that do not solve your problem. If we just generated a random topology and weights, I doubt that the resulting network would be better than your solution (although there is a very slim chance it could be!).

I think when you talk about neural networks, you really mean the techniques used with them, such as the various training algorithms. Backpropagation on a multilayer network with non-linear activation functions could find a function closer to the true one you are trying to approximate, since the method you proposed will only work if the classes are linearly separable (because of its equivalence to a network with no hidden layers).

An example of a function that is not linearly separable is the exclusive or (XOR) function. Let's say that the characters being recognized are only 2 pixels. One of the characters looks like 00 or 11, and the other character looks like 10 or 01. The method you proposed will never be able to classify all four of these examples correctly. For real characters the situation is a lot more complicated, but this example still demonstrates that there are things a more general neural network (or many other function approximators) can recognize that your method cannot.
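Here is that two-pixel example made concrete. With the averaging scheme from the original post, both "characters" end up with the identical template (0.5, 0.5), so the scores always tie and nothing can be classified; a tiny 2-2-1 network with a step non-linearity separates them. The network weights below are hand-picked for illustration, not learned:

```python
import numpy as np

examples = {(0, 0): "A", (1, 1): "A", (0, 1): "B", (1, 0): "B"}

# Averaged templates, as in the original proposal:
template_A = np.mean([[0, 0], [1, 1]], axis=0)   # -> [0.5, 0.5]
template_B = np.mean([[0, 1], [1, 0]], axis=0)   # -> [0.5, 0.5]  (identical!)

def template_score(x, t):
    return np.sum(1.0 - np.abs(np.asarray(x, dtype=float) - t))

# Hand-wired 2-2-1 network that computes XOR (hidden layer = OR-like and AND-like units).
def step(z):
    return (z > 0).astype(float)

def xor_net(x):
    x = np.asarray(x, dtype=float)
    hidden = step(np.array([x[0] + x[1] - 0.5,     # OR-like unit
                            x[0] + x[1] - 1.5]))   # AND-like unit
    return step(np.array([hidden[0] - hidden[1] - 0.5]))[0]  # "OR and not AND"

for x, label in examples.items():
    tie = template_score(x, template_A) == template_score(x, template_B)
    predicted = "B" if xor_net(x) == 1.0 else "A"
    print(x, "true:", label, "| template scores tie:", tie, "| net says:", predicted)
```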
Most commercial OCR software doesn't use neural networks, just simple pattern matching. The algorithm you described is essentially what they do, except that instead of a neural network they use a simple pattern table and compare each entry against each letter.

For alignment and other problems (a quick sketch pulling these together follows the list):
- the 8x8 (or 5x7, etc.) bounding box should fit the entire character (the 1 in the example should be stretched out to fill the 8x8 box)
- the matching is usually done with a simple sum of differences
- the original height/width/line-height ratios should be kept to differentiate between tall and short letters that produce the same stretched image
- greyscale scaling/matching with squared differences can be used instead of black-and-white tests (this gives better results with scanned documents)
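Here is the sketch mentioned above (greyscale in [0, 1] with ink = 1; the helper names are made up for illustration, the input is assumed to contain some ink, and the aspect-ratio penalty weighting is arbitrary):

```python
import numpy as np

def crop_to_bounding_box(image, threshold=0.5):
    rows = np.where(image.max(axis=1) > threshold)[0]
    cols = np.where(image.max(axis=0) > threshold)[0]
    return image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

def stretch_to_8x8(image):
    h, w = image.shape
    r_idx = (np.arange(8) * h) // 8        # nearest-neighbour style sampling
    c_idx = (np.arange(8) * w) // 8
    return image[np.ix_(r_idx, c_idx)]

def classify(image, templates):
    """templates: dict char -> (8x8 greyscale template, typical aspect ratio)."""
    box = crop_to_bounding_box(image)
    aspect = box.shape[0] / box.shape[1]
    sample = stretch_to_8x8(box)
    def cost(ch):
        tmpl, tmpl_aspect = templates[ch]
        ssd = np.sum((sample - tmpl) ** 2)          # sum of squared differences
        return ssd + abs(aspect - tmpl_aspect)      # crude aspect-ratio penalty
    return min(templates, key=cost)
```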

The use of neural networks for this kind of job can be considered, but most commercial programs simply use brute force and do pattern matching. A fairly complex neural network can be built out of all the tests done in the pattern-matching solution: for an 8x8 sample and a 256-character library, a network with 64 inputs and 256 outputs can be constructed in two layers, with 8*8*256 weights. The weights should be based on the training set; for a perfectly trained network they would match the per-character greyscale averages of the training set. In this case (as usual) hand-wiring the network, or simply writing the code out explicitly, gives a better system.
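One plausible reading of that hand-wired network, assuming it means one linear output unit per character with weights derived from the greyscale averages (an interpretation for illustration, not the poster's exact construction):

```python
import numpy as np

def build_output_layer(average_templates):
    """average_templates: dict char -> 8x8 greyscale average from the training set.
    Returns (characters, weight matrix W of shape (num_chars, 64), bias vector b).
    Scores = W @ x + b reproduce the per-character sum of 1 - |t - x| exactly
    when the averages are binary, and approximately when they are near-binary."""
    chars = sorted(average_templates)
    T = np.stack([average_templates[c].ravel() for c in chars])  # (num_chars, 64)
    W = 2.0 * T - 1.0
    b = np.sum(1.0 - T, axis=1)
    return chars, W, b

def recognize(x, chars, W, b):
    scores = W @ x.ravel() + b          # 256 linear output units
    return chars[int(np.argmax(scores))]
```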
One big difference is whether you are expecting typical fonts - in which case you can use pattern matching with pre-defined patterns that match common fonts... or if you are trying to do handwriting recognition - in which case you have to train an NN over time to say "this is a 2, not a 7".

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

As you suggested, you don't have to use NNs to do character recognition (though NNs are generally fast). If you're interested in alternative ways of matching characters, take a look at OpenCV (an open-source computer vision library).
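For example, here is a rough sketch of doing the matching with OpenCV's template matching instead of an NN (cv2.resize and cv2.matchTemplate are real OpenCV calls; the `templates` dict is a placeholder). Normalized cross-correlation is less sensitive to overall brightness and contrast than raw pixel differences:

```python
import cv2
import numpy as np

def match_character(char_img, templates, size=(32, 32)):
    """char_img: greyscale crop of one character (e.g. from cv2.imread(..., cv2.IMREAD_GRAYSCALE)).
    templates: dict mapping character -> greyscale template image."""
    sample = cv2.resize(char_img, size)
    best_char, best_score = None, -np.inf
    for ch, tmpl in templates.items():
        tmpl = cv2.resize(tmpl, size)
        # Same-size images -> matchTemplate returns a single correlation value.
        score = cv2.matchTemplate(sample, tmpl, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best_char, best_score = ch, float(score)
    return best_char, best_score
```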
I wonder how Windows XP Tablet PC Edition recognizes my handwriting... I didn't have to train it at all, and my handwriting is chicken scratch.
There is a huge difference between typed fonts and handwritten data.

In my machine learning class, the dataset we used for our ANN character-recognition project was digits written as ZIP codes on envelopes. It was amazing how much the characters differed from one example to the next.

That project was a great way to test our ANN code, since it is very simple to map the inputs and outputs: one input per pixel (with the value being the intensity), and 10 outputs representing the 'strength' of the input being each digit. It was easy to do and we got some pretty good results. The network was even able to correctly classify digits that people would have trouble with (because of how sloppily they were written). According to my professor, the best-performing supervised machine learning model on that dataset at the time was a support vector machine.
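For anyone who wants to try something like that project today, scikit-learn ships a small 8x8 handwritten-digit dataset and an MLP with exactly the "one input per pixel, ten outputs" shape described above (the hyperparameters below are just reasonable defaults, and the exact accuracy will vary):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                                  # 8x8 images, flattened to 64 features
X_train, X_test, y_train, y_test = train_test_split(
    digits.data / 16.0, digits.target, test_size=0.25, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)                               # backprop under the hood
print("test accuracy:", clf.score(X_test, y_test))
```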

This topic is closed to new replies.
