Neural Networks for recognizing numbers
Hi,
I have finally developed a neural network that works ok but there are still some issues and it needs tweaking. I am trying to recognize handwritten numbers 0 to 9. I am passing it images of numbers. I resize the numbers by finding the top, left, bottom, and right edges and then stretching them to meet a 15 x 15 pixel bmp. The bmp is then fed into the network (20*15 = 300 input nodes, 500 hidden nodes, and 10 output). It works ok but I kind of just pulled the 500 hidden nodes out of nowhere.
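The crop-and-stretch step described above could look roughly like the following. This is only a sketch under assumptions: the image is taken to be a list of rows of 0/1 values with 1 meaning ink, nearest-neighbour sampling is used for the stretch, and the function name `crop_and_stretch` and its parameters are made up for illustration.

```python
def crop_and_stretch(image, out_w=15, out_h=15):
    """Crop a 0/1 image to the bounding box of its ink and stretch it to out_h x out_w.

    `image` is a list of equal-length rows; 1 is assumed to mean an "on" (ink) pixel.
    """
    # Rows and columns that contain at least one ink pixel.
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    if not rows:
        # Blank image: nothing to crop, return an empty grid.
        return [[0] * out_w for _ in range(out_h)]

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    box_h = bottom - top + 1
    box_w = right - left + 1

    # Nearest-neighbour sampling: each output pixel copies one source pixel.
    return [
        [image[top + (y * box_h) // out_h][left + (x * box_w) // out_w]
         for x in range(out_w)]
        for y in range(out_h)
    ]
```

One thing this makes visible: a thin stroke such as a "1" gets stretched to fill the whole grid, because the aspect ratio of the bounding box is not preserved.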
How many hidden nodes should I have for this network?
What other tweaks should I make? The network has difficulty recognizing 9's (it sometimes confuses them with 4's) and 8's.
I made the same post in the general forum because I think this one is more for game AI. Here's the link http://www.gamedev.net/community/forums/topic.asp?topic_id=367189
There is no formula to determine the 'right' number of nodes. To me, 500 hidden-layer neurons seems like an awful lot; I'd rather pick a number in the range 5-20. In your case the network will likely not learn anything general (abstracting), but rather remember a lot of what it has seen (this is called memorization, or overfitting).
The problem, though, is far more likely in your input feature set. Currently, you are passing just the (normalized) bitmaps, but this is not generally known as a very good representation. For one, all spatial information about the image is lost. Also, pixel state (on/off) depends highly on handwriting, slant, etc.
To adapt your input feature set you should examine some actual input data and find axes that have a high variance. You could use a principal components analysis to find a good input representation. You might also consider the horizontal and vertical projections of the image (count the on-pixels per row and per column), or use a combination of both.
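As a rough illustration of the projection idea, here is a minimal sketch. It assumes the bitmap is a list of rows of 0/1 values (1 = on-pixel); the name `projection_features` is just for this example.

```python
def projection_features(bitmap):
    """Return per-row on-pixel counts followed by per-column on-pixel counts."""
    # Horizontal projection: number of on-pixels in each row.
    row_counts = [sum(row) for row in bitmap]
    # Vertical projection: number of on-pixels in each column.
    col_counts = [sum(col) for col in zip(*bitmap)]
    return row_counts + col_counts


# A small plus sign: the middle row and middle column carry the most ink.
plus = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
features = projection_features(plus)  # [1, 3, 1, 1, 3, 1]
```

For a 15 x 15 bitmap this gives 30 inputs instead of 225, at the cost of throwing away which pixels in a row or column were on.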
Good luck. Illco.
Interesting. Thanks for the response. It definitely gave me some new things to think about.
Ok, I will try cutting down on the number of hidden nodes. I thought I needed that many hidden nodes because of the large number of input nodes (300).
I don't quite get how the spatial information is lost. I still have x and y coordinates for every number. I check whether it is on or off (black or white) and then use that.
What is a principal components analysis?
Horizontal and vertical projections sound like a good thing to try.
Um, so, I redesigned my NN so that it computes the vertical and horizontal projections. I've used these as my input nodes. So my inputs are now numbers that could be 0, or 10, etc., whereas before the inputs were just 0 or 1. Is it a problem that I am using inputs that are not binary?
My NN is not functioning properly now and I was wondering if this could be why? I am a NN beginner and don't really know the details too much.
Hi Wease. I cannot look into your specific situation, so it is hard for me to judge what is going right and where the problems are. However, here are some ideas.
Quote:
I don't quite get how the spatial information is lost. I still have x and y coordinates for every number. I check whether it is on or off (black or white) and then use that.
Exactly. So the NN just receives a bunch of on/off states but not where they are, if I understand correctly. The NN thus cannot deduce that pixel i, which is on, is next to pixel j, which is off. This is what I meant by losing spatial information.
Quote:
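What is a principal components analysis?
It is a way of analyzing your data that finds a linear transformation of the inputs onto new axes (the principal components), ordered by how much of the data's variance each one captures. Keeping only the first few components gives a smaller input representation that still retains most of the variation, which often makes it easier for the NN to separate the classes. Note that PCA does not use the class labels, so high variance does not guarantee good class separation. Use Google to find some references.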
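To make the idea a bit more concrete, here is a minimal pure-Python sketch that finds only the first principal component using power iteration on the covariance matrix. It is an illustration, not a full PCA: the function names are hypothetical, it needs at least two samples, and in practice you would more likely use a linear algebra library and keep several components. Standard PCA works on the overall variance and does not look at the digit labels.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (mean, direction) where direction is the unit vector of largest variance.

    `samples` is a list of equal-length feature vectors (at least two of them).
    """
    n = len(samples)
    dim = len(samples[0])

    # Center the data: PCA looks at variation around the mean.
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in samples]

    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]

    # Power iteration: repeatedly multiplying by the covariance matrix pulls the
    # vector toward the dominant eigenvector. The start vector is arbitrary; a
    # start exactly orthogonal to that eigenvector would fail to converge.
    v = [1.0 + i for i in range(dim)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance at all along the current vector
        v = [x / norm for x in w]
    return mean, v


def project(sample, mean, direction):
    """Coordinate of one sample along the principal direction."""
    return sum((x - m) * d for x, m, d in zip(sample, mean, direction))
```

Each sample's projected value could then be used as a single NN input; with several components you would get one input per component.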
Quote:
Is it a problem that I am using inputs that are not binary?
It depends on what kind of NN you have. I assumed you have a multilayer perceptron. It is also common to work with floating-point inputs in the range [0,1]. You can always scale your input to match the desired NN input requirements.
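For projection counts, one simple way to scale into [0,1] is to divide each count by the largest value it could possibly take. This is a sketch under that assumption (a row count can be at most the image width, a column count at most the image height); the function name is made up for the example.

```python
def scale_projections(row_counts, col_counts):
    """Scale row/column on-pixel counts into the range [0, 1]."""
    width = len(col_counts)   # a row can hold at most `width` on-pixels
    height = len(row_counts)  # a column can hold at most `height` on-pixels
    scaled_rows = [count / width for count in row_counts]
    scaled_cols = [count / height for count in col_counts]
    return scaled_rows + scaled_cols


# Counts from a 3 x 3 plus sign: rows [1, 3, 1], columns [1, 3, 1].
inputs = scale_projections([1, 3, 1], [1, 3, 1])
```

A full row or column maps to 1.0 and an empty one to 0.0, so the inputs stay comparable to the old binary pixel inputs.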
Hope this helps some. You should also read up on NN theory and browse through practical examples -- it really helps. Good luck.
Illco
Multilayer perceptrons work just fine with non-binary inputs. That is not the problem.
If you want to recognize handwritten numbers, I wouldn't use a 15x15 bitmap, which comes to 225 input units; rather, use something like an 8x8 bitmap or maybe smaller. Use far fewer hidden units. I usually use one third of the sum of input and output nodes for the number of nodes in the hidden layer. You will rarely need more than one hidden layer.
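Spelled out as arithmetic, that rule of thumb looks like this. It is only the heuristic mentioned above, not a guarantee of a good network size, and the function name is hypothetical.

```python
def hidden_units_rule_of_thumb(n_inputs, n_outputs):
    """One third of (inputs + outputs), rounded, with a floor of one unit."""
    return max(1, round((n_inputs + n_outputs) / 3))


# 8x8 bitmap, 10 digit classes: (64 + 10) / 3 is about 24.7
small = hidden_units_rule_of_thumb(8 * 8, 10)    # 25
# 15x15 bitmap, 10 digit classes: (225 + 10) / 3 is about 78.3
large = hidden_units_rule_of_thumb(15 * 15, 10)  # 78
```

Either way the result is far below 500 hidden nodes.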
I would suggest using inputs in the range 0-1 if your outputs are in the range 0-1.
Dan Fekete
The Shadowhand Co.
If everybody has its own matrix, then mine is an identity