
Neural Network Trouble

Started by July 24, 2009 05:15 PM
12 comments, last by Gil Grissom 15 years, 3 months ago
Hey, I'd like to try my hand at image recognition, but before I can do that I'd like to get a firm grasp on neural networks. Right now I'm having some trouble transferring the theory I've learned into practice: I understand the main concepts of a neural network, but it's getting those concepts to work in a real situation that would actually DO something that I'm stuck on. I'm aware that there are many different types of ANNs that people have come up with to succeed at different tasks; are there any specific ones that are tailored towards pattern (nb: image) recognition? Also, does anyone have any good resources on neural networks? A lot of the information I've seen on the web so far is extremely old and likely outdated for a field like this. Cheers.
As far as programming goes, I've only ever implemented a simple form of backpropagation, so I'm certainly not very learned in the field. However, you might try checking out an existing software package before rolling your own; it may help with finding the best types of networks to use. The only one I know of is emergent, but I'm sure there are others.
Quote: Original post by Side Winder
Right now I'm having some trouble transferring the theory I've learned into practice: I understand the main concepts of a neural network, but it's getting those concepts to work in a real situation that would actually DO something that I'm stuck on.
Mmm, that's the hard part: feature selection. You want to turn your problem domain into a table of input and output numbers, with enough detail that the ANN can learn an appropriate pattern from them, without giving it so much detail that it ends up over-fitting. You could feed every pixel in; you could split the image up into sections and feed information about the average colour of each section; you've got lots of options.
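To make the grid-averaging idea concrete, here's one way it might look in NumPy (the 8x8 grid and RGB layout are just illustrative assumptions, not anything prescribed above):

```python
import numpy as np

def grid_colour_features(image, grid=(8, 8)):
    """Split an image into grid cells and return the average colour
    of each cell as a flat feature vector.

    image: array of shape (H, W, 3)
    grid:  (rows, cols) of the cell grid
    """
    h, w, c = image.shape
    rows, cols = grid
    feats = np.empty((rows, cols, c))
    for i in range(rows):
        for j in range(cols):
            cell = image[i * h // rows:(i + 1) * h // rows,
                         j * w // cols:(j + 1) * w // cols]
            feats[i, j] = cell.mean(axis=(0, 1))  # average colour of this cell
    return feats.ravel()  # rows * cols * 3 inputs instead of H * W * 3

# A 64x64 RGB image becomes a 192-dimensional feature vector:
img = np.random.rand(64, 64, 3)
print(grid_colour_features(img).shape)  # (192,)
```

The point is the dimensionality reduction: 192 inputs is far easier for a network to learn from than 12,288 raw pixel values, at the cost of throwing away spatial detail.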

Quote: I'm aware that there are many different types of ANNs that people have come up with to succeed in different tasks; are there any specific ones that are tailored towards pattern (nb: image) recognition?
You might want to look into self-organising maps, but an image can be turned into a 1D array of numbers and classified just like any other kind of data. When picking a network type, the important thing is less the type of input data you want to use, and more the learning behaviours and recurrence relations you want to support.
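For a feel of what a self-organising map does, here's a toy sketch in NumPy; the grid size, epoch count, and learning-rate/neighbourhood schedules are arbitrary choices for illustration, not anything canonical:

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a tiny self-organising map.

    data: (n_samples, n_features) -- e.g. images flattened to 1D arrays
    Returns the (rows, cols, n_features) array of node weights.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the neighbourhood function
    coords = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols),
                                   indexing="ij"))
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5  # shrinking neighbourhood
        for x in data:
            # Best-matching unit: the node whose weights are closest to x
            d = ((weights - x) ** 2).sum(axis=2)
            bmu = np.unravel_index(d.argmin(), d.shape)
            # Pull the BMU and its grid neighbours towards x
            dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            h = np.exp(-dist2 / (2 * sigma ** 2))[:, :, None]
            weights += lr * h * (x - weights)
    return weights

# Two well-separated clusters should end up at different map nodes:
data = np.vstack([np.zeros((20, 4)), np.ones((20, 4))])
w = train_som(data)
```

After training, similar inputs activate nearby nodes on the grid, which is what makes SOMs useful for visualising clusters in image data.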

Richard "Superpig" Fine - saving pigs from untimely fates - Microsoft DirectX MVP 2006/2007/2008/2009
"Shaders are not meant to do everything. Of course you can try to use it for everything, but it's like playing football using cabbage." - MickeyMouse

I think that, in fact, neural networks are not the technique of choice for image recognition. The main reasons are that NNs are very difficult to train (especially for problems with many variables, as is the case with images) and that they are very prone to overfitting. The only successful recognition algorithm I can recall that uses NNs is Yann LeCun's convolutional networks, but then again, those are so specialized that they don't have much to do with general NNs.

If you are interested in image recognition, I'd recommend reading about OpenCV. (There are probably many other sources; OpenCV is one that's relatively easy to read.)

Another suggestion is to think of a specific problem you'd like to solve (e.g. face recognition, or organizing photographs, or something else) and then people would be able to recommend more specifically how to approach that problem.
Yeah, facial recognition was what I had in mind.

Gil Grissom, if not ANNs then what?

Thanks superpig, I'm reading about self-organising maps right now.

I'd rather not use any external packages for this; I'd quite like to learn the core of what's going on rather than using something that has already been programmed. I don't know if anyone else is the same, but instead of using the already-finished C++ STL and taking it all for granted, I went through and made my own classes; sure, they're not as efficient or as complete, but I got a good grasp of what's going on in the underlying code.
Although there has been some work on recognizers which work directly on image data, I suggest that you are better off developing a good set of features to be extracted from the raw image data, and feeding that to a more ordinary learning system. That learner could be a neural network, or it could be a lot of other things: naive Bayes, linear or quadratic discriminant, logistic regression, tree or rule induction, etc.
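As a sketch of the "features plus ordinary learner" approach: here is a minimal Gaussian naive Bayes classifier over a feature table. The feature data below is made up purely to demonstrate the fit/predict cycle:

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: each feature is modelled as an
    independent normal distribution per class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.prior = np.array([(y == c).mean() for c in self.classes])
        return self

    def predict(self, X):
        # Log-likelihood of each sample under each class, plus the log prior
        ll = -0.5 * (np.log(2 * np.pi * self.var)[None]
                     + (X[:, None] - self.mu[None]) ** 2 / self.var[None]).sum(axis=2)
        return self.classes[(ll + np.log(self.prior)).argmax(axis=1)]

# Made-up feature table: two classes well separated in a 3D feature space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(4, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
model = GaussianNB().fit(X, y)
print((model.predict(X) == y).mean())  # ~1.0 on this easy data
```

Swapping in a different learner only changes the class above; the feature-extraction step stays the same, which is Will's point.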


-Will Dwinnell
Data Mining in MATLAB
Quote: Original post by Gil Grissom
...NNs are very difficult to train (especially for problems with many variables, as in the case of images) and that they are very prone to overfitting.


Can you give an example of such behavior? How many variables is "many"? Why would you feed raw pixel data to any machine learning system?


Quote: Original post by Side Winder
Yeah, facial recognition was what I had in mind.

I recommend looking at eigenfaces (http://en.wikipedia.org/wiki/Eigenfaces) and the Viola-Jones face detector (http://en.wikipedia.org/wiki/Robust_real-time_object_detection) for an introduction.
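The eigenfaces idea in a nutshell, sketched with NumPy: run PCA on a stack of flattened face images and keep the top components. The "faces" here are random arrays, just to show the shapes involved:

```python
import numpy as np

def eigenfaces(faces, k):
    """PCA on a stack of face images.

    faces: (n_faces, H*W) -- each row is a flattened greyscale image
    Returns (mean_face, components) where components is (k, H*W);
    each row, reshaped back to H x W, is one 'eigenface'.
    """
    mean = faces.mean(axis=0)
    centred = faces - mean
    # SVD of the centred data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return mean, Vt[:k]

def project(face, mean, components):
    """Represent a face by its k coordinates in eigenface space."""
    return components @ (face - mean)

# 30 random 16x16 "faces", each reduced to 8 coefficients:
faces = np.random.default_rng(1).random((30, 256))
mean, comps = eigenfaces(faces, k=8)
codes = np.array([project(f, mean, comps) for f in faces])
print(codes.shape)  # (30, 8)
```

Recognition then compares the code of a new face against the stored codes, e.g. by nearest neighbour, so the matching happens in 8 dimensions rather than 256.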

Quote: Original post by Side Winder
Gil Grissom, if not ANNs then what?

It depends on the problem, but there are many other learning techniques, e.g. boosting (AdaBoost) or SVMs. Graphical models (probabilistic models) have become popular recently; you can try looking up Latent Dirichlet Allocation (LDA), for example.
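To give a flavour of boosting: here's a tiny AdaBoost over decision stumps in NumPy. The toy data (label = sign of the first feature) is made up for the demo, and the exhaustive stump search is the naive version, not an optimised one:

```python
import numpy as np

def adaboost_stumps(X, y, rounds=10):
    """Tiny AdaBoost with decision stumps. Labels y must be +/-1.
    Returns a list of (feature, threshold, polarity, alpha)."""
    n, d = X.shape
    w = np.full(n, 1 / n)  # sample weights, reweighted every round
    learners = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump minimising the weighted error
        for f in range(d):
            for t in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, pred)
        err, f, t, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this learner
        w *= np.exp(-alpha * y * pred)         # upweight the mistakes
        w /= w.sum()
        learners.append((f, t, pol, alpha))
    return learners

def predict(learners, X):
    score = np.zeros(len(X))
    for f, t, pol, alpha in learners:
        score += alpha * np.where(pol * (X[:, f] - t) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)

# Toy problem: the label is the sign of the first feature
rng = np.random.default_rng(2)
X = rng.normal(0, 1, (200, 5))
y = np.where(X[:, 0] >= 0, 1, -1)
model = adaboost_stumps(X, y)
print((predict(model, X) == y).mean())  # 1.0 -- a stump on feature 0 suffices
```

The weighting step is the whole trick: each round focuses the next weak learner on the examples the ensemble currently gets wrong.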

Quote: Original post by Side Winder
I'd rather not use any external packages for this

I recommended looking at OpenCV mainly because it's aimed at someone who is not necessarily a professional, and also because it has code examples, so that even if you don't fully "get" something, you can look at the code.
Quote: Original post by Predictor
Can you give an example of such behavior?

Here's one. Suppose your object has 1000 features, and suppose each feature can be one of two "variants" (e.g. a person's eyes may be green or brown, their hair may be long or short, etc.). So the target function you need to learn is (A1 or A2) and (B1 or B2) and ....

In this case, you can train the network in one of two ways. First, you can use the 1000 disjunctions as inputs to the network (e.g. feature one is (A1 or A2), feature two is (B1 or B2), etc. -- 1000 features total). In that case the network learns well and its performance is perfect (see http://i27.tinypic.com/219swg9.jpg; higher curves mean better performance). Of course, that's not a very realistic training scenario, since you're providing part of the answer to the network.

A more realistic training scenario is to provide the 2000 individual features as inputs to the network. E.g. input 1 is A1, input 2 is A2, input 3 is B1, etc. -- 2000 inputs total. But in this case, the NN fails to learn anything useful and the performance is very poor (see the second curve in the link above).
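For concreteness, here's one way to generate both encodings of the target Gil describes, a conjunction of two-way disjunctions (a sketch; the sizes are reduced from 1000 pairs so it runs quickly, and the random booleans are an assumption about how the examples are drawn):

```python
import numpy as np

def make_dataset(n_samples, n_pairs, rng):
    """Target concept: (A1 or A2) and (B1 or B2) and ...

    Returns both encodings:
      X_raw:  (n_samples, 2*n_pairs) -- the individual boolean features
      X_disj: (n_samples, n_pairs)   -- the disjunctions, precomputed
      y:      1 iff every pair's disjunction is true
    """
    X_raw = rng.integers(0, 2, (n_samples, 2 * n_pairs))
    pairs = X_raw.reshape(n_samples, n_pairs, 2)
    X_disj = pairs.any(axis=2).astype(int)  # (A1 or A2), (B1 or B2), ...
    y = X_disj.all(axis=1).astype(int)      # AND of all the disjunctions
    return X_raw, X_disj, y

rng = np.random.default_rng(0)
X_raw, X_disj, y = make_dataset(1000, 10, rng)
# With X_disj, the target is just the AND of the inputs; with X_raw,
# the network must discover the pairing structure on its own.
# Note: with uniform random bits, P(y = 1) = (3/4)^n_pairs, so the
# positive class gets rare as n_pairs grows.
```

Training the same network on `X_disj` versus `X_raw` is exactly the comparison between the "easy" and "realistic" scenarios above.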

So that's an example of a network that has all the data it needs to solve the problem, and an architecture suitable for solving it, yet standard training methods (I tried back-prop and a few extensions, like Levenberg-Marquardt) still fail to learn it.

Quote: Original post by Predictor
How many variables is "many"?

I'd say, as a rule of thumb, a few tens of variables is many -- definitely 100 is many. That, of course, depends on the problem. If the problem is trivial -- e.g. just the AND of all inputs -- then it may work with 1000 features (as in the plot above). If the problem is very difficult, then even a few tens of features won't work.

Quote: Original post by Predictor
Why would you feed raw pixel data to any machine learning system?

Philosophically, because that's the data I have. Neural networks, in theory, are universal learners, so whatever other "features" I can derive from the raw pixel data, they could too. Of course, in practice that almost never works.

As a more practical answer, in some cases (like with the MNIST data or the "80 million tiny images" data), I just don't know what kind of features to use (the features that work for "normal" images wouldn't work with this data), and the number of pixels is not that huge (< 1000 pixels). So it's tempting to feed the pixels to something and see what comes out. And if you do it with a reasonable method (like LDA), it actually works -- for the MNIST data it learns the strokes the characters consist of. But NNs don't work here either.

[Edited by - Gil Grissom on July 26, 2009 2:03:38 AM]
Gil,

The examples you give seem to require ignoring common practices in empirical modeling, such as feature selection. While one might get better results using something other than neural networks when ignoring this step, I'm not clear on why one would do so. Further, I don't see how any of this indicates over-fitting.

Regarding your original claim: "The main reasons are that NNs are very difficult to train (especially for problems with many variables, as in the case of images) and that they are very prone to overfitting.":

Though they have not outperformed alternatives in every situation, I have found that neural networks sometimes excel, haven't been slow to train, and have not exhibited overfitting.


-Will Dwinnell
Data Mining in MATLAB

This topic is closed to new replies.
