
Inputs to neural nets, and 'remembering'

Started by August 24, 2004 02:14 AM
20 comments, last by Vlion 20 years, 2 months ago
Hello all,

I've been reading a little on neural nets recently, and while the general idea is ok (perhaps I could even write a simple net), one thing has particularly struck me so far: the inputs to the net. The tutorial at ai-junkie.com has an example for recognising a number/character from an 8x8 matrix of pixels. Each pixel can be on or off, and each pixel is mapped to one input to the net. Then each outcome has one output, i.e. 26 for the letters and 10 for the numerals, and you look to see which is the highest.

But is it really the case that you have one input per pixel in this fashion? If you wanted to look at a full-screen screenshot, that's of the order of 1Mpixels needing an input each. In fact, since it's RGB, you'd presumably need one for each channel of each pixel? Firstly this seems really expensive, and secondly it's very rigid and inflexible - what if you suddenly have an 800x599 image when before you only had 800x600?

My other query is about a neural net remembering a state. I'm looking to use nets for a racing game, possibly. I read with interest the recent article about Colin McRae Rally 2 and got thinking about the inputs there. You could have the car's orientation, speed, spin rate, and information about the driving line at the point nearest the car, 5m further on and 30m further on, for example. But the data supplied for training the net - was each frame of the game used as one data point, with the state of the controls and these inputs used to train the network? Firstly that seems a huge set of training data (60 points per second), but how good is it? Wouldn't you like the neural net to 'keep a thought in mind' and remember the last second, rather than just take every game update in isolation to act upon? Do current neural nets have the concept of memory, or maintaining a state?
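For concreteness, here is a minimal sketch of the setup described above, in Python with numpy. The layer sizes follow the 8x8 example (64 inputs, 36 outputs); the hidden size and the random, untrained weights are purely illustrative.

import numpy as np

# One input per pixel of an 8x8 binary glyph, one output per class
# (26 letters + 10 digits = 36), as in the ai-junkie example.
N_IN, N_HIDDEN, N_OUT = 64, 32, 36

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_IN))   # input -> hidden weights
W2 = rng.normal(0.0, 0.1, (N_OUT, N_HIDDEN))  # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify(pixels):
    # pixels: flat array of 64 on/off values, one per pixel
    hidden = sigmoid(W1 @ pixels)
    output = sigmoid(W2 @ hidden)
    return int(np.argmax(output))  # index of the strongest output

glyph = rng.integers(0, 2, N_IN).astype(float)  # stand-in 8x8 pattern
print(classify(glyph))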
Quote: Original post by d000hg
But is it really the case that you have one input per pixel in this fashion? If you wanted to look at a full-screen screenshot, that's of the order of 1Mpixels needing an input each. In fact, since it's RGB, you'd presumably need one for each channel of each pixel? Firstly this seems really expensive, and secondly it's very rigid and inflexible - what if you suddenly have an 800x599 image when before you only had 800x600?
Yes, yes, yes, and yes. The moral of the story: The application for neural nets everyone learns in school, image recognition, is silly and impractical. (As are most neural net applications.)
Quote: Wouldn't you like the neural net to 'keep a thought in mind' and remember the last second rather than just take every game update in isolation to act upon? Do current neural nets have the concept of memory, or maintaining a state?
Surprisingly few AI algorithms that deal with unknown future circumstances have any concept of a "plan". Take chess, for instance: a chess algorithm may think out a strategy four moves ahead, but it doesn't commit to the last three in any way. Then, when its next turn comes, it starts from scratch and either comes up with the same strategy or picks a new one. The same thing applies here: no long-term planning is necessary, just an algorithm that isn't prone to vacillating helplessly between two options (and few AI algorithms are).
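A minimal sketch of that idea in Python: the agent re-decides from scratch every update, with no stored plan. The score function is a hypothetical stand-in, and the small "stickiness" bonus is just one simple way to keep it from vacillating between two near-equal options.

STICKINESS = 0.05  # bonus for keeping the current choice

def choose_action(options, score, previous=None):
    # Re-decide from scratch every update; nothing is carried over
    # except which option we happened to pick last time. The small
    # bonus keeps the agent from flip-flopping between two options
    # that score almost identically.
    best, best_value = None, float("-inf")
    for option in options:
        value = score(option)
        if option == previous:
            value += STICKINESS
        if value > best_value:
            best, best_value = option, value
    return best

# e.g. choose_action(["brake", "steer_left"], my_score_fn, previous="brake")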
So are you saying nets are bad for image recognition or just that that's not how it works? If the latter, how is it to be improved?
What about pattern recognition in general, say voice recognition - how would you do that (with nets or otherwise)? Or what about a situation where the number of inputs changes?
Most real-world image recognition solutions (OCR, shape recognition, face detection, face recognition, etc.), whether they employ neural networks or not, pre-process the raw raster image data. The idea is to provide the model with a data representation which is invariant to things like translation, rotation, etc., and which minimizes the effects of noise. In the case of character recognition, for instance, some commonly-used features include horizontal and vertical projection profiles, shape summaries (aspect ratio, etc.) and signatures (a small code sketch of such features appears below).

Here are some examples:

http://www.ele.uri.edu/~hansenj/projects/ele585/OCR/OCR.pdf
http://research.cs.tamu.edu/prism/lectures/pr/pr_l1.pdf
http://cs.nju.edu.cn/people/zhouzh/zhouzh.files/publication/aim02.pdf
http://citeseer.ist.psu.edu/cache/papers/cs/4066/ftp:zSzzSzftp.ifi.uio.nozSzpubzSztrierzSzfeature.pdf/trier95feature.pdf

While I have seen direct input of binary pixels to a neural network provide reasonably accurate results, it is hardly optimal, partly for the reasons you mention.
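As a rough illustration of the pre-processing described above, here is a minimal sketch in Python with numpy. The particular feature set (resampled projection profiles, aspect ratio, ink density) and the 8-bin resampling length are my own illustrative choices, not taken from the papers linked above.

import numpy as np

def glyph_features(glyph):
    # glyph: 2-D array of 0/1 pixels for one isolated character.
    # Returns a fixed-length feature vector regardless of glyph size.
    rows, cols = glyph.shape

    # Projection profiles: ink per row and per column, resampled to a
    # fixed number of bins so any glyph size yields the same features.
    h_profile = glyph.sum(axis=1) / cols
    v_profile = glyph.sum(axis=0) / rows
    h_fixed = np.interp(np.linspace(0, 1, 8), np.linspace(0, 1, rows), h_profile)
    v_fixed = np.interp(np.linspace(0, 1, 8), np.linspace(0, 1, cols), v_profile)

    aspect = rows / cols    # shape summary
    density = glyph.mean()  # fraction of "on" pixels
    return np.concatenate([h_fixed, v_fixed, [aspect, density]])

# An 18-element vector, whether the glyph raster is 8x8 or 80x60:
demo = np.random.default_rng(1).integers(0, 2, (20, 12))
print(glyph_features(demo).shape)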

-Will Dwinnell
http://will.dwinnell.com



Quote: Original post by Sneftel
The application for neural nets everyone learns in school, image recognition, is silly and impractical. (As are most neural net applications.)



Can you explain why you think so? The literature records the solution of many real problems by neural networks.


-Predictor
http://will.dwinnell.com


[Edited by - Predictor on September 5, 2004 2:56:41 PM]
Quote: Original post by Predictor
Can you explain why you think so? The literature records the solution of many real problems by neural networks.


I believe he did, by answering 'yes' to this:

Quote: Original post by d000hg
But is it really the case that you have one input per pixel in this fashion? If you wanted to look at a full-screen screenshot, that's of the order of 1Mpixels needing an input each. In fact, since it's RGB, you'd presumably need one for each channel of each pixel? Firstly this seems really expensive, and secondly it's very rigid and inflexible - what if you suddenly have an 800x599 image when before you only had 800x600?


Just because 'a solution is possible' doesn't mean it's a good or efficient one. I mean, it's a solution to cut a deep line in masonry with a slow drip of water, but personally, I'd rather use a giant masonry saw. Yes, you can solve a lot of problems with a neural net, but you can also solve many of those problems with a straight-up algorithmic approach that doesn't require training.

-me
Quote: Original post by d000hg
But is it really the case that you have one input per pixel in this fashion? If you wanted to look at a full-screen screenshot, that's of the order of 1Mpixels needing an input each. In fact, since it's RGB, you'd presumably need one for each channel of each pixel? Firstly this seems really expensive, and secondly it's very rigid and inflexible - what if you suddenly have an 800x599 image when before you only had 800x600?


Quote: Original post by Sneftel
Yes, yes, yes, and yes. The moral of the story: The application for neural nets everyone learns in school, image recognition, is silly and impractical. (As are most neural net applications.)


I will have to disagree by saying "no, no, no and, uh... no".

In character recognition, regardless of the actual mechanism used for classification (neural network, discriminant, etc.), glyphs are typically isolated and pre-processed before recognition occurs. In most applications, pre-processed characters ultimately occupy a relatively small raster. Varying character size is sometimes handled through scaling to a common resolution, but very often derivative features (such as invariant moments) are extracted and used as model inputs.

The same principle applies when recognizing things other than scanned characters. If anything, even more pre-processing is required by applications such as face detection, in which it is common to scan the image, classifying windows which are much smaller than the whole image.
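A minimal sketch of that window-scanning idea, in Python with numpy; the classifier passed in is a hypothetical stand-in for whatever model (neural network or otherwise) was trained on fixed-size patches.

import numpy as np

def scan_image(image, classify_window, win=24, step=8):
    # Slide a fixed-size window over an arbitrarily sized image and
    # collect the positions the classifier flags. The model only ever
    # sees win x win inputs, so the image resolution (800x600,
    # 800x599, ...) never changes the network's input count.
    hits = []
    height, width = image.shape
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            if classify_window(image[y:y + win, x:x + win]):
                hits.append((x, y))
    return hits

# demo with a dummy "classifier" that flags bright patches:
img = np.random.default_rng(2).random((60, 80))
print(scan_image(img, lambda patch: patch.mean() > 0.55))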

It is true that the most common variety of neural network, the multilayer perceptron, deals with a fixed number of inputs, but that is true of most other machine learning schemes (ignoring the well-studied techniques for accommodating missing values). Given, though, that most image processing work involving classification or recognition is much more efficiently performed on pre-processed data, I know of no reason that neural networks would, a priori, be a particularly unsuitable choice.

The bottom line is: There is absolutely no requirement that a neural network employ a "one pixel to one input" data representation.
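To make that bottom line concrete, here is a minimal sketch in Python with numpy of one such representation: the first two Hu invariant moments, which reduce an image of any size to a fixed-length vector. The choice of these particular features is illustrative only.

import numpy as np

def hu_first_two(image):
    # First two Hu invariant moments of a grayscale or binary image.
    # They are unchanged by translation and (approximately, on a
    # discrete raster) by uniform scaling, so any image size reduces
    # to the same fixed-length input vector.
    y, x = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    m00 = image.sum()
    xbar = (x * image).sum() / m00
    ybar = (y * image).sum() / m00

    def mu(p, q):   # central moment
        return (((x - xbar) ** p) * ((y - ybar) ** q) * image).sum()

    def eta(p, q):  # normalized central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2.0)

    hu1 = eta(2, 0) + eta(0, 2)
    hu2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4.0 * eta(1, 1) ** 2
    return np.array([hu1, hu2])

img = np.random.default_rng(3).random((20, 30))
print(hu_first_two(img))                            # small image
print(hu_first_two(np.kron(img, np.ones((4, 4)))))  # 4x upscale: similar values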

-Predictor
http://will.dwinnell.com


[Edited by - Predictor on September 5, 2004 2:04:25 PM]
Hummm... About "remembering": it's done with feedback. If the outputs of the ANN are used as inputs on the next update, you can say the ANN "remembers" the last thing it did. :o)
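A minimal sketch of that feedback idea in Python with numpy, along the lines of a Jordan-style recurrent network (the previous outputs are appended to the next inputs); sizes and weights are illustrative and untrained.

import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 8, 6, 2  # illustrative sizes, untrained weights

W1 = rng.normal(0.0, 0.1, (N_HID, N_IN + N_OUT))  # inputs + fed-back outputs
W2 = rng.normal(0.0, 0.1, (N_OUT, N_HID))

def step(inputs, prev_output):
    # The previous outputs are appended to the new inputs, so the
    # net's last decision influences its next one - a crude memory.
    hidden = np.tanh(W1 @ np.concatenate([inputs, prev_output]))
    return np.tanh(W2 @ hidden)

out = np.zeros(N_OUT)
for frame in range(3):  # e.g. successive game updates
    out = step(rng.normal(size=N_IN), out)
    print(out)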

[Edited by - S41NT on August 30, 2004 4:30:31 PM]
Quote: Original post by S41NT
Hummm... About "remembering": it's done with feedback. If the outputs of the ANN are used as inputs on the next update, you can say the ANN "remembers" the last thing it did. :o)
Interesting. Are there any models that describe how the human brain stores data, particularly simplified models that could actually be implemented in a simulation?
Quote: Original post by d000hg
Are there any models that describe how the human brain stores data, particularly simplified models that could actually be implemented in a simulation?



Quite a bit of research has been done in this area, but all models to date are simplified. Everyone's favorite, the multilayer perceptron, is pretty far down toward the "artificial" end of the spectrum. Toward the "realistic" end of the spectrum are (among others):

http://www.genesis-sim.org/GENESIS/

and

http://sulcus.berkeley.edu/FreemanWWW/manuscripts/IC8/87.html


-Predictor
http://will.dwinnell.com


[Edited by - Predictor on September 3, 2004 7:20:08 AM]

