Inputs to neural nets, and 'remembering'

d000hg · 2004-09-12T00:50:40

Hello all, I've been reading a little on neural nets recently, and while the general idea is OK (perhaps I could even write a simple net), one thing has particularly struck me so far: the inputs to the net. The tutorial at ai-junkie.com has an example of recognising a number/character from an 8x8 matrix of pixels. Each pixel can be on or off, and each pixel is mapped to one input to the net. Each possible outcome then has its own output, i.e. 26 for the letters and 10 for the numerals, and you look to see which output is the highest.

But is it really the case that you have one input per pixel in this fashion? If you wanted to look at a full-screen screenshot, that's of the order of 1 Mpixel, each needing an input. In fact, since it's RGB you'd presumably need one for each channel of each pixel? Firstly this seems really expensive, and secondly it's very rigid and inflexible: what if you suddenly have an 800x599 image when before you only had 800x600?

My other query is about a neural net remembering a state. I'm looking to use nets for a racing game, possibly. I read with interest the recent article about Colin McRae Rally 2 and got thinking about the inputs there. You could have the car's orientation, speed, spin rate and information about the driving line at the point nearest the car, 5m further on and 30m further on, for example. But what about the data supplied for training the net: was each frame of the game used as one data point, i.e. were the state of the controls and the inputs mentioned above used to train the network? Firstly that seems a huge set of training data (60 points per second), but how good is that? Wouldn't you like the neural net to 'keep a thought in mind' and remember the last second rather than just take every game update in isolation to act upon? Do current neural nets have the concept of memory, or maintaining a state?
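
On the memory question, one simple option is to feed the net a short history of frames rather than a single frame (recurrent networks, which carry state internally, are another). The sketch below is only an illustration of the first idea, assuming each game update produces a fixed-length list of numbers. The class name FrameHistory and the three example features are hypothetical and are not taken from the Colin McRae Rally 2 article.

```python
from collections import deque


class FrameHistory:
    """Keeps the last `depth` per-frame feature vectors as one flat input vector."""

    def __init__(self, features_per_frame, depth):
        self.features_per_frame = features_per_frame
        self.depth = depth
        # Pre-fill with zero frames so the input size is constant from the first update.
        self.frames = deque(
            ([0.0] * features_per_frame for _ in range(depth)), maxlen=depth
        )

    def push(self, features):
        if len(features) != self.features_per_frame:
            raise ValueError("unexpected number of features")
        # With maxlen set, appending drops the oldest frame automatically.
        self.frames.append(list(features))

    def as_inputs(self):
        # Oldest frame first, newest frame last.
        return [value for frame in self.frames for value in frame]


# Hypothetical per-frame features: speed, heading, spin rate.
history = FrameHistory(features_per_frame=3, depth=4)
for frame in range(6):
    history.push([10.0 + frame, 0.1 * frame, 0.0])

inputs = history.as_inputs()
print(len(inputs))  # 12 inputs: 4 remembered frames x 3 features
```

The net itself stays an ordinary feed-forward net; the "memory" lives entirely in how its inputs are assembled. Pushing only every few frames would stretch the same number of inputs over a longer time span.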

fup

463

August 31, 2004 08:43 AM

"Yes, yes, yes, and yes. The moral of the story: The application for neural nets everyone learns in school, image recognition, is silly and impractical. (As are most neural net applications.)"

That's a naive and misleading comment, Sneftel. ANNs have much practical use and are often the tool of choice to implement image/motion recognition applications.

Here's a list of some of the other stuff ANNs are being used for:

ftp://ftp.sas.com/pub/neural/FAQ7.html#A_applications

Just to add to Predictor's comments... I know of several research projects that are using "roving eyes" for image recognition. The input to the ANN is usually a window (sometimes circular, sometimes rectangular) of pixels, which the outputs of the ANN control by moving and "focusing" (zoom in - out). Roving eyes can scan images of *any* size.
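
As a rough illustration of why a roving eye copes with images of any size, here is a minimal sketch of the sampling step only. It assumes a greyscale image stored as a list of rows; the function name, the zoom convention and the nearest-pixel sampling are choices made for this example, not details of the projects mentioned above. The network that would steer the window is not shown.

```python
import math


def roving_eye_window(image, center_row, center_col, zoom, size=8):
    """Sample a size x size window centred on (center_row, center_col).

    `zoom` is the number of image pixels covered by one window pixel
    (1.0 = full detail, larger = zoomed out). Samples that fall outside
    the image read as 0.0, so the result always has size * size values.
    """
    height, width = len(image), len(image[0])
    half = (size - 1) / 2.0
    window = []
    for r in range(size):
        for c in range(size):
            # Nearest-pixel sampling; floor(x + 0.5) rounds halves upwards.
            src_r = math.floor(center_row + (r - half) * zoom + 0.5)
            src_c = math.floor(center_col + (c - half) * zoom + 0.5)
            inside = 0 <= src_r < height and 0 <= src_c < width
            window.append(float(image[src_r][src_c]) if inside else 0.0)
    return window


# Two synthetic images of different sizes give the same number of inputs.
big = [[r * 100 + c for c in range(30)] for r in range(20)]
small = [[r * 100 + c for c in range(5)] for r in range(5)]
view_big = roving_eye_window(big, 10, 15, zoom=1.0, size=4)
view_small = roving_eye_window(small, 0, 0, zoom=2.0, size=4)
print(len(view_big), len(view_small))  # 16 16
```

The number of network inputs depends only on `size`, never on the image dimensions, which is the property that lets the same net scan a small image or a large one.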

Here's a link to one that scans a Go board.

http://nn.cs.utexas.edu/pub-view.php?RECORD_KEY(Pubs)=PubID&PubID(Pubs)=145

My Website: ai-junkie.com | My Books: 'Programming Game AI by Example' & 'AI Techniques for Game Programming'

RPGeezus

216

August 31, 2004 09:31 AM

Quote: Original post by fup
ANNs <SNIP> are often the tool of choice to implement image/motion recognition applications.

With regard to the latter (the tool of choice), I don't think ANNs are as common as one might think. I've run across many such projects and comparatively few of them are using ANNs.

I'm not saying ANNs cannot be used for this purpose, just that other techniques are more popular.

Most of the vision-based motion estimation applications use some measure of variance to extract features and then look for correlations between data sets. There are tons of different methods people are using for this.

A lot of image recognition problems focus mostly on segmentation and registration algorithms (of which there are many), and also statistically matching features in the segmented images.

Will

[EDITED: For clarity, as pointed out by predictor]

[Edited by - RPGeezus on August 31, 2004 11:31:47 AM]

------------------http://www.nentari.com

Predictor

198

August 31, 2004 10:23 AM

Quote: Original post by fup
ANNs have much practical use and are often the tool of choice to implement image/motion recognition applications.

Quote: Original post by RPGeezus
I don't think this is true, Fup. I've run across many such projects and very few of them are using ANN's.

Most of the motion related applications use some measure of variance to extract features and then look for correlations between data sets.

A lot of image recognition problems focus mostly on segmentation algorithms (of which there are many) and then statistically matching features in the segmented image.

Two claims are made above by fup: "ANNs have much practical use..." and "...are often the tool of choice to implement image/motion recognition applications." I conclude that your comments are meant to address the latter and not the former. I do not know whether neural networks are "often the tool of choice" for image recognition applications, but it is plain that they have met with success in many such applications.

-Predictor
http://will.dwinnell.com

[Edited by - Predictor on September 5, 2004 2:23:07 PM]

fup

463

August 31, 2004 11:18 AM

I didn't intend to imply that ANNs were the only choice (or most commonly used), only that they are a tool, one of several, which are used successfully for tackling such problems.

My Website: ai-junkie.com | My Books: 'Programming Game AI by Example' & 'AI Techniques for Game Programming'

d000hg

Author

1,208

September 01, 2004 06:29 AM

I'm thinking now that for a real-world device, you could directly map inputs to each pixel anyway. If your robot has 2 digital cameras with a certain resolution CCD, then unless you change them to a different resolution this wouldn't be a problem. And if you kept the aspect ratio about the same, arbitrarily resizing the images to whatever you wanted would be alright anyway. Maybe it would be good to look at both the high-res version and a 30x20 version too?
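
The resizing idea can be sketched as follows. This is a minimal example, assuming a greyscale image held as a list of rows and plain block averaging; the function name `downsample` is made up for the illustration, and a real project would more likely use an image library's resize routine.

```python
def downsample(image, out_rows, out_cols):
    """Average blocks of pixels down to a flat list of out_rows * out_cols values."""
    height, width = len(image), len(image[0])
    result = []
    for i in range(out_rows):
        # Integer block boundaries; max() guarantees at least one source row.
        r0 = i * height // out_rows
        r1 = max(r0 + 1, (i + 1) * height // out_rows)
        for j in range(out_cols):
            c0 = j * width // out_cols
            c1 = max(c0 + 1, (j + 1) * width // out_cols)
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            result.append(sum(block) / len(block))
    return result


def make_image(height, width):
    # Synthetic greyscale gradient standing in for a camera frame.
    return [[(r + c) % 256 for c in range(width)] for r in range(height)]


# Slightly different source sizes still produce the same number of inputs.
inputs_a = downsample(make_image(60, 80), out_rows=20, out_cols=30)
inputs_b = downsample(make_image(59, 80), out_rows=20, out_cols=30)
print(len(inputs_a), len(inputs_b))  # 600 600
```

Because the output grid is fixed, the "800x599 instead of 800x600" problem from the first post goes away: the net always sees 30x20 = 600 inputs.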

RPGeezus

216

September 01, 2004 01:43 PM

Processing a low res version of an image first isn't such a bad idea.

You might be better off leaving the image processing to some known working algorithms, and then pass the results to your ANN for some decision making purposes.

For example, if you wanted to find a face in an image you might look for the eyes using a known algorithm, grab the area around what you think are eyes, and pass this to the ANN to determine if there is actually a face present.

Will

------------------http://www.nentari.com

Predictor

198

September 03, 2004 06:49 AM

Quote: Original post by d000hg
I'm thinking now that for a real-world device, you could directly map inputs to each pixel anyway.

Can you explain why you believe that this would be easy or even worthwhile? It's the construction of the recognizer that's going to be the hard part (whether it's a neural network or not), and I don't see how digesting such a high-res input would help. For a successful application, I think one will need to employ some sort of data-reducing pre-processing anyway. Now, if objects smaller than the entire scene were of interest, it would likely make more sense to scan across the image, repeatedly attempting to recognize targets within a window.
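
A minimal sketch of that scanning approach, assuming a greyscale image stored as a list of rows. The `recognizer` argument is a stand-in for whatever trained classifier is used (neural network or otherwise); the mean-brightness function in the demo is purely hypothetical and exists only so the example runs.

```python
def scan_windows(image, window, stride):
    """Yield (top, left, pixels) for each window x window patch, row by row."""
    height, width = len(image), len(image[0])
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            pixels = [image[r][c]
                      for r in range(top, top + window)
                      for c in range(left, left + window)]
            yield top, left, pixels


def find_targets(image, recognizer, window=8, stride=4, threshold=0.5):
    """Return the top-left corners of patches the recognizer scores above threshold."""
    return [(top, left)
            for top, left, pixels in scan_windows(image, window, stride)
            if recognizer(pixels) > threshold]


def mean_brightness(pixels):
    # Placeholder "recognizer": a real one would be a trained model.
    return sum(pixels) / len(pixels)


# 16x16 dark image with one bright 8x8 square at rows 4-11, columns 8-15.
image = [[1.0 if 4 <= r < 12 and 8 <= c < 16 else 0.0 for c in range(16)]
         for r in range(16)]
hits = find_targets(image, mean_brightness, window=8, stride=4, threshold=0.9)
print(hits)  # [(4, 8)]
```

The recognizer only ever sees `window * window` inputs, so its size is independent of the scene resolution; the cost moves into the number of windows evaluated.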

-Will Dwinnell
http://will.dwinnell.com

[Edited by - Predictor on September 5, 2004 2:49:50 PM]

d000hg

Author

1,208

September 03, 2004 06:51 AM

Well in theory if you have thousands of photographs of cows, you use that to train the network. The large number of inputs should still work I think?

But I thought a low resolution version could give general lighting information, and you could also use standard edge detection and so on to get other information?

Predictor

198

September 03, 2004 07:36 AM

Quote: Original post by d000hg
Well in theory if you have thousands of photographs of cows, you use that to train the network. The large number of inputs should still work I think?

I'm not saying that this can't be tried, but (depending on the resolution you're talking about) I don't think it will help (it may actually hurt), and it will be more computationally expensive.
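
To put a rough number on "more computationally expensive", here is a back-of-the-envelope count of weights for a fully connected net with one hidden layer. The hidden-layer size of 100 and the single output are assumptions picked only for illustration; nobody in the thread proposed a specific architecture.

```python
def fully_connected_weights(inputs, hidden, outputs):
    """Weights (including one bias per unit) in a one-hidden-layer, fully connected net."""
    return (inputs + 1) * hidden + (hidden + 1) * outputs


HIDDEN_UNITS = 100  # assumption, chosen only for illustration

raw_rgb_inputs = 800 * 600 * 3  # one input per colour channel per pixel
low_res_inputs = 30 * 20        # greyscale thumbnail

raw_weights = fully_connected_weights(raw_rgb_inputs, HIDDEN_UNITS, 1)
low_res_weights = fully_connected_weights(low_res_inputs, HIDDEN_UNITS, 1)

print(raw_weights)      # 144000201
print(low_res_weights)  # 60201
```

Under these assumptions the raw 800x600 RGB input needs about 144 million weights against roughly 60 thousand for the 30x20 version, and every one of those weights has to be fitted from the training photographs.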

Quote: Original post by d000hg
But I thought a low resolution version could give general lighting information, and you could also use standard edge detection and so on to get other information?

This is where I'd concentrate my efforts. Learning every possible configuration of 800x600 raw pixels (a resolution you made mention of) which means "cow" versus every possible configuration of 800x600 raw pixels which means "not cow" will be a lot of work. I suggest investigating standard image processing methods and using them to cut this job down to size.
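
As one example of such a standard method, the sketch below computes a Sobel edge-strength map in plain Python. It assumes a greyscale image stored as a list of rows and leaves border pixels at zero; Sobel is just one common edge detector, chosen here for illustration rather than recommended by anyone in the thread.

```python
import math


def sobel_edges(image):
    """Return the Sobel gradient magnitude; border pixels are left at 0.0."""
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            edges[r][c] = math.hypot(gx, gy)
    return edges


# Dark left half, bright right half: the only edge is the vertical boundary.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edges = sobel_edges(image)
print(edges[2])  # [0.0, 0.0, 4.0, 4.0, 0.0, 0.0]
```

An edge map like this could then be reduced to a coarse grid and fed to the net alongside a low-resolution brightness image, which is far fewer inputs than the raw pixels.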

-Predictor

http://will.dwinnell.com

d000hg

Author

1,208

September 07, 2004 07:01 AM

Going off-topic, wouldn't it be cool for AI in games if the computer player's only inputs were an image of the screen as a human would see it? I'm not suggesting it as a useful idea but it would be a cool project, especially if it included two sound channels too!
