
Inputs to neural nets, and 'remembering'

Started by August 24, 2004 02:14 AM
20 comments, last by Vlion 20 years, 2 months ago
"Yes, yes, yes, and yes. The moral of the story: The application for neural nets everyone learns in school, image recognition, is silly and impractical. (As are most neural net applications.)"

That's a naive and misleading comment, Sneftel. ANNs have many practical uses and are often the tool of choice for implementing image/motion recognition applications.

Here's a list of some of the other stuff ANNs are being used for:

ftp://ftp.sas.com/pub/neural/FAQ7.html#A_applications

Just to add to Predictor's comments... I know of several research projects that are using "roving eyes" for image recognition. The input to the ANN is usually a window (sometimes circular, sometimes rectangular) of pixels, which the outputs of the ANN control by moving and "focusing" (zoom in - out). Roving eyes can scan images of *any* size.
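To make the roving-eye idea concrete, here is a minimal sketch in Python. It is not taken from any of the projects mentioned; the function names (`extract_window`, `flatten`), the square window and the nearest-neighbour sampling are all assumptions for illustration. The point it shows is that the network always sees the same number of inputs, however large the image is.

```python
def extract_window(image, cx, cy, half_size, out_size):
    """Sample an out_size x out_size patch centred on (cx, cy).

    image is a list of rows of grey values. half_size sets the zoom:
    a larger value covers more of the image with the same number of samples.
    Pixels outside the image are read as 0.
    """
    height, width = len(image), len(image[0])
    patch = []
    for j in range(out_size):
        row = []
        for i in range(out_size):
            # Map output coordinates onto the source window (nearest neighbour).
            x = cx - half_size + (2 * half_size * i) // max(out_size - 1, 1)
            y = cy - half_size + (2 * half_size * j) // max(out_size - 1, 1)
            inside = 0 <= x < width and 0 <= y < height
            row.append(image[y][x] if inside else 0)
        patch.append(row)
    return patch


def flatten(patch):
    """Turn the patch into the flat input vector a network would receive."""
    return [value for row in patch for value in row]


# A 6x8 test image whose pixel value encodes its position (10 * y + x).
image = [[10 * y + x for x in range(8)] for y in range(6)]
inputs_near = flatten(extract_window(image, cx=4, cy=3, half_size=1, out_size=3))
inputs_far = flatten(extract_window(image, cx=4, cy=3, half_size=3, out_size=3))
print(len(inputs_near), len(inputs_far))  # same input count at both zoom levels
```

In a real roving eye, the network's outputs would choose the next `cx`, `cy` and `half_size`; that control loop is left out here.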

Here's a link to one that scans a Go board.

http://nn.cs.utexas.edu/pub-view.php?RECORD_KEY(Pubs)=PubID&PubID(Pubs)=145
Quote: Original post by fup
ANNs <SNIP> are often the tool of choice to implement image/motion recognition applications.


With regard to the latter (the tool of choice), I don't think ANNs are as common as one might think. I've run across many such projects and comparatively few of them use ANNs.

I'm not saying ANNs cannot be used for this purpose, just that other techniques are more popular.

Most vision-based motion estimation applications use some measure of variance to extract features and then look for correlations between data sets. There are tons of different methods people use for this.

A lot of image recognition problems focus mostly on segmentation and registration algorithms (of which there are many), and also statistically matching features in the segmented images.
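As a rough illustration of the "variance, then correlation" approach, here is a small Python sketch. It is only one of many possible variants, and the helper names (`variance`, `correlation`, `best_match`) are made up for this example. Variance flags patches with enough texture to be useful features, and Pearson correlation matches a patch against candidates while ignoring overall brightness.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: high in textured regions, zero in flat ones."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def correlation(a, b):
    """Pearson correlation of two equal-length patches, in [-1, 1].

    Returns 0.0 if either patch is flat, since the ratio is undefined there.
    """
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm else 0.0


def best_match(patch, candidates):
    """Index of the candidate patch that correlates best with `patch`."""
    scores = [correlation(patch, c) for c in candidates]
    return scores.index(max(scores))


flat = [5, 5, 5, 5]
textured = [1, 9, 2, 8]
brighter_copy = [v + 10 for v in textured]   # same pattern, different lighting
reversed_copy = textured[::-1]

print(variance(flat), variance(textured))
print(best_match(textured, [flat, reversed_copy, brighter_copy]))
```

The brighter copy wins the match because correlation subtracts each patch's mean before comparing, which is part of why this kind of measure is popular for tracking features between frames.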


Will

[EDITED: For clarity, as pointed out by predictor]

[Edited by - RPGeezus on August 31, 2004 11:31:47 AM]
------------------http://www.nentari.com
Quote: Original post by fup
ANNs have much practical use and are often the tool of choice to implement image/motion recognition applications.


Quote: Original post by RPGeezus
I don't think this is true, Fup. I've run across many such projects and very few of them are using ANNs.

Most of the motion related applications use some measure of variance to extract features and then look for correlations between data sets.

A lot of image recognition problems focus mostly on segmentation algorithms (of which there are many) and then statistically matching features in the segmented image.


Two claims are made above by fup: "ANNs have much practical use..." and "...are often the tool of choice to implement image/motion recognition applications." I conclude that your comments are meant to address the latter and not the former. I do not know whether neural networks are "often the tool of choice" for image recognition applications, but it is plain that they have met with success in many such applications.

-Predictor
http://will.dwinnell.com


[Edited by - Predictor on September 5, 2004 2:23:07 PM]
I didn't intend to imply that ANNs were the only choice (or most commonly used), only that they are a tool, one of several, which are used successfully for tackling such problems.
I'm thinking now that for a real-world device, you could directly map inputs to each pixel anyway. If your robot has 2 digital cameras with CCDs of a certain resolution, then unless you change them to a different resolution this wouldn't be a problem. And if you kept the aspect ratio about the same, arbitrarily resizing the images to whatever you wanted would be alright anyway. Maybe it would be good to look at both the high-res version and a 30x20 version too?
Processing a low res version of an image first isn't such a bad idea.
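For what it's worth, producing the low-res version is cheap. Below is a minimal Python sketch that shrinks a grey image by averaging blocks of pixels; the function name `downsample` and the list-of-rows image format are assumptions for this example, and a real project would more likely call an image library's resize routine.

```python
def downsample(image, out_width, out_height):
    """Shrink a grey image (list of rows) by averaging rectangular blocks."""
    in_height, in_width = len(image), len(image[0])
    result = []
    for oy in range(out_height):
        # Source rows covered by this output row (always at least one row).
        y0 = oy * in_height // out_height
        y1 = max((oy + 1) * in_height // out_height, y0 + 1)
        row = []
        for ox in range(out_width):
            # Source columns covered by this output pixel.
            x0 = ox * in_width // out_width
            x1 = max((ox + 1) * in_width // out_width, x0 + 1)
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# A 4x4 image made of four uniform 2x2 quadrants.
image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
small = downsample(image, 2, 2)
print(small)
```

Each output pixel is the mean of the block it replaces, so the small image keeps the overall lighting while dropping fine detail.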

You might be better off leaving the image processing to some known working algorithms, and then passing the results to your ANN for decision making.

For example, if you wanted to find a face in an image you might look for the eyes using a known algorithm, grab the area around what you think are eyes, and pass this to the ANN to determine if there is actually a face present.

Will
------------------http://www.nentari.com
Quote: Original post by d000hg
I'm thinking now that for a real-world device, you could directly map inputs to each pixel anyway.


Can you explain why you believe that this would be easy or even worthwhile? It's the construction of the recognizer that's going to be the hard part (whether it's a neural network or not), and I don't see how digesting such a high-res input would help. For a successful application, I think one will need to employ some sort of data-reducing pre-processing anyway. Now, if objects smaller than the entire scene were of interest, it would likely make more sense to scan across the image, repeatedly attempting to recognize targets within a window.
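The window scan described above might look something like this in Python. This is a sketch under assumptions: `scan` and `sliding_windows` are names invented here, and `bright_patch_score` is a trivial stand-in for whatever recognizer (neural network or otherwise) you actually build.

```python
def sliding_windows(width, height, window, step):
    """Yield the top-left corner of each window position inside the image."""
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            yield x, y


def scan(image, window, step, recognizer, threshold=0.5):
    """Run `recognizer` on every window and collect the positions it accepts."""
    height, width = len(image), len(image[0])
    hits = []
    for x, y in sliding_windows(width, height, window, step):
        patch = [row[x:x + window] for row in image[y:y + window]]
        score = recognizer(patch)
        if score >= threshold:
            hits.append((x, y, score))
    return hits


def bright_patch_score(patch):
    """Stand-in recognizer (hypothetical): fraction of pixels that are lit."""
    values = [v for row in patch for v in row]
    return sum(1 for v in values if v > 0) / len(values)


# 6x6 image with a lit 2x2 target whose top-left corner is at (2, 2).
image = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        image[y][x] = 255

hits = scan(image, window=2, step=1, recognizer=bright_patch_score, threshold=1.0)
print(hits)
```

The recognizer only ever sees `window * window` values, so its input size is fixed by the window rather than by the camera resolution. The cost moves to the number of window positions, which the step size controls.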

-Will Dwinnell
http://will.dwinnell.com


[Edited by - Predictor on September 5, 2004 2:49:50 PM]
Well in theory if you have thousands of photographs of cows, you use that to train the network. The large number of inputs should still work I think?

But I thought a low resolution version could give general lighting information, and you could also use standard edge detection and so on to get other information?
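As an example of the kind of standard edge detection meant here, this is a small Python sketch of a Sobel-style filter. The function name `sobel_magnitude` and the use of |gx| + |gy| instead of the exact Euclidean magnitude are choices made for this illustration only.

```python
def sobel_magnitude(image):
    """Approximate gradient magnitude |gx| + |gy| for interior pixels.

    image is a list of rows of grey values; border pixels are left at 0.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            edges[y][x] = abs(gx) + abs(gy)
    return edges


# Dark left half, bright right half: a single vertical edge.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(image)
print(edges[2])
```

The output is non-zero only next to the brightness step, which is the sort of compact information an ANN could take as input instead of raw pixels.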
Quote: Original post by d000hg
Well in theory if you have thousands of photographs of cows, you use that to train the network. The large number of inputs should still work I think?


I'm not saying that this can't be tried, but (depending on the resolution you're talking about) I don't think it will help (it may actually hurt), and it will be more computationally expensive.

Quote: Original post by d000hg
But I thought a low resolution version could give general lighting information, and you could also use standard edge detection and so on to get other information?


This is where I'd concentrate my efforts. Learning every possible configuration of 800x600 raw pixels (a resolution you made mention of) which means "cow" versus every possible configuration of 800x600 raw pixels which means "not cow" will be a lot of work. I suggest investigating standard image processing methods and using them to cut this job down to size.
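To put rough numbers on "a lot of work", here is a back-of-the-envelope count in Python. It assumes one input per grey pixel, a fully connected network with a single hidden layer, and a hidden layer of 20 units picked purely for illustration; `weight_count` is a helper made up for this example.

```python
def weight_count(n_inputs, n_hidden, n_outputs=1):
    """Weights (including biases) in a fully connected net with one hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


raw_inputs = 800 * 600      # one input per grey pixel
small_inputs = 30 * 20      # the low-res size suggested earlier in the thread
hidden = 20                 # assumed hidden layer size, for illustration only

print(raw_inputs, weight_count(raw_inputs, hidden))
print(small_inputs, weight_count(small_inputs, hidden))
```

Under these assumptions the raw image needs 480,000 inputs and about 9.6 million weights, against 600 inputs and about 12,000 weights for the 30x20 version. A few thousand training photographs is very little data to fit millions of weights.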

-Predictor


http://will.dwinnell.com
Going off-topic, wouldn't it be cool for AI in games if the computer player's only input were an image of the screen as a human would see it? I'm not suggesting it as a useful idea, but it would be a cool project, especially if it included two sound channels too!
