text representation for recognition

Started by January 28, 2006 11:53 AM
5 comments, last by Alrecenk 18 years, 9 months ago
I am creating a program to learn to recognize text. I pretty much have everything in my AI classes working, but I'm not sure how to feed the text into the network. I heard inputting a massive array of booleans was a bad idea, but I don't know of any other way to do it... Thanks in advance.
Just think of the 2D matrix that contains the pattern as a 1D vector, formed either from row-major or column-major assignment from the matrix. Then use the 1D vector as the input vector (so you'll have as many input nodes as components in the vector). Make sense?
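
As a minimal sketch of that flattening (Python is an arbitrary choice here, and the 3x3 glyph and function names are made up for illustration):

```python
def flatten_row_major(grid):
    """Concatenate the rows of a 2D grid into one flat list."""
    return [cell for row in grid for cell in row]


def flatten_column_major(grid):
    """Concatenate the columns of a 2D grid into one flat list."""
    return [row[c] for c in range(len(grid[0])) for row in grid]


# Hypothetical 3x3 binary glyph (a vertical bar): 1 = ink, 0 = background.
glyph = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

row_vector = flatten_row_major(glyph)     # [0, 1, 0, 0, 1, 0, 0, 1, 0]
col_vector = flatten_column_major(glyph)  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Either ordering works as long as the same one is used for every training and test pattern, since each position in the vector is wired to a fixed input node.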

Cheers,

Timkin

BTW: are you preprocessing the character images before you feed them into the network, or just putting in raw images?
So, you are saying I should just input all the booleans as a massive array of inputs? Right now I process the images to be boolean and always a set size. I haven't actually tried inputting anything yet because I was looking for some other way to process the characters... Guess I'll see how it works that way then.
Well I feel kinda dumb. Don't know why I was thinking that wouldn't work. I had to fiddle with some of my settings to get good results, but it seems to be working fine. I scale the images down to 10x10 arrays of integers (representing how many pixels were in each area of the image) and then apply a touch of blur (seems to help with generalizing). It's in an applet so anyone interested can check it out here: http://www.alrecenk.cjb.net:81/java/show/ai/math/test8.html
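
For anyone wanting to try the same preprocessing, here is a rough sketch of the idea, not the applet's actual code. It assumes a square 0/1 image whose side is a multiple of 10, and it uses a 3x3 box blur as a stand-in, since the post does not say which blur was used. All names are hypothetical.

```python
def downsample_counts(image, out_size=10):
    """Count ink pixels in each cell of an out_size x out_size grid.

    Assumes `image` is a square list of lists of 0/1 values whose side
    length is a multiple of out_size.
    """
    cell = len(image) // out_size
    grid = [[0] * out_size for _ in range(out_size)]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            grid[y // cell][x // cell] += pixel
    return grid


def box_blur(grid):
    """Replace each cell with the mean of its in-bounds 3x3 neighbourhood."""
    n_rows, n_cols = len(grid), len(grid[0])
    blurred = [[0.0] * n_cols for _ in range(n_rows)]
    for y in range(n_rows):
        for x in range(n_cols):
            neighbours = [
                grid[j][i]
                for j in range(max(0, y - 1), min(n_rows, y + 2))
                for i in range(max(0, x - 1), min(n_cols, x + 2))
            ]
            blurred[y][x] = sum(neighbours) / len(neighbours)
    return blurred


# Hypothetical 20x20 glyph: a filled 4x4 square in the middle of the image.
image = [
    [1 if 8 <= y < 12 and 8 <= x < 12 else 0 for x in range(20)]
    for y in range(20)
]
counts = downsample_counts(image)  # 10x10 grid, each cell covers 2x2 pixels
features = box_blur(counts)        # smoothed grid, ready to be flattened
```

The blur spreads each stroke into neighbouring cells, so a glyph drawn one cell off still overlaps the trained pattern, which is one plausible reason it helps with generalizing.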
Good to hear it's working. You might want to consider performing a principal component analysis on the training data and then transforming each input vector into the principal basis space. OCR tasks usually show better results on test data with this kind of preprocessing.
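
A bare-bones sketch of that preprocessing, using only the Python standard library. It finds the leading eigenvectors of the sample covariance matrix by power iteration with deflation, which is just one simple way to do it (a linear algebra library would normally be used instead). The function names and the toy data are assumptions for illustration.

```python
import math
import random


def fit_pca(data, n_components, iterations=200, seed=0):
    """Return (mean, components) estimated from equal-length training vectors."""
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, mean)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Power iteration: repeatedly multiply by cov and renormalise.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds a new one.
        cov = [
            [cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
            for i in range(d)
        ]
    return mean, components


def transform(vector, mean, components):
    """Project one input vector onto the principal components."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


# Toy training set (made up): 2D points that vary only along the diagonal.
training = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, components = fit_pca(training, n_components=1)
reduced = transform([3.0, 3.0], mean, components)
```

The mean and components are fitted on the training data only, and the same `transform` is then applied to every training and test vector before it goes into the classifier. The sign of each component is arbitrary, so projected values may come out negated between runs with different seeds.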

Have you tested the quality of your OCR network by adding noise to the test inputs and noting the classification error rate as a function of added noise variance? This is a good way to analyse your results and helps to quantify your network's performance.
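
A sketch of such a test harness, under some assumptions: the noise is zero-mean Gaussian, `classify` is any function mapping an input vector to a label, and a nearest-template classifier with two made-up 3x3 glyphs stands in for the trained network.

```python
import math
import random


def nearest_template(vector, templates):
    """Stand-in classifier: return the label of the closest template."""
    return min(
        templates,
        key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(vector, templates[label])
        ),
    )


def error_rate_vs_noise(classify, test_set, variances, trials=50, seed=0):
    """Map each noise variance to the fraction of misclassified noisy inputs."""
    rng = random.Random(seed)
    results = {}
    for variance in variances:
        sigma = math.sqrt(variance)
        errors = 0
        total = 0
        for vector, label in test_set:
            for _ in range(trials):
                noisy = [x + rng.gauss(0.0, sigma) for x in vector]
                errors += classify(noisy) != label
                total += 1
        results[variance] = errors / total
    return results


# Hypothetical flattened 3x3 glyphs used as both templates and test inputs.
templates = {
    "bar": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "dash": [0, 0, 0, 1, 1, 1, 0, 0, 0],
}
test_set = [(vector, label) for label, vector in templates.items()]
rates = error_rate_vs_noise(
    lambda v: nearest_template(v, templates),
    test_set,
    variances=[0.0, 0.25, 1.0, 4.0],
)
for variance, rate in rates.items():
    print(f"variance={variance}: error rate={rate:.2f}")
```

Plotting the resulting error rates against variance gives a curve that can be compared across preprocessing choices or classifier settings.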

Cheers,

Timkin
>>You might want to consider performing a principal component analysis on
>>the training data and then transforming each input vector into the principal
>>basis space.

VERY nice idea. Finally some usage for those esoteric PCA classes :P

Btw, where do you guys get the training data for these OCR problems?

-- Mikko
Principal component analysis seems interesting. I'll look into that when I have more time. I'm actually not using a network. I've been experimenting with other types of "function approximators". I'm hoping to create something more efficient than an ANN, but it is hard to say how efficient my systems are right now because all my test programs are so simple that they solve almost instantaneously. My AI can be trained to work with 18 symbols in about an eighth of a second. To answer uutee's question: in the applet I posted the symbols are drawn, processed and trained into the network at runtime.

