
Arbitrary values in Neural Networks

Started by June 20, 2002 10:44 AM
27 comments, last by Cedric 22 years, 5 months ago
Hey! I just read fup's tutorial on Neural Networks, and one passage strikes me:
quote: http://www.btinternet.com/~fup/nnt5.html Now we have defined our inputs and our outputs, what about the hidden layer/s? How do we decide how many layers we should have and how many neurons we should have in each layer? Well, this is a matter of guesswork and something you will develop a 'feel' for. There is no known rule of thumb although plenty of researchers have tried to come up with one.
Why isn't a genetic algorithm used to determine this kind of arbitrary stuff? In the tutorial, he makes plenty of arbitrary 'choices'. I understand that it's a really basic tutorial, but fup makes it sound like making arbitrary choices is how it's supposed to be done. It seems to me that if all the methods were implemented, a genetic algorithm could find the best methods and the best values for each particular situation... Cédric
"Why isn''t a genetic algorithm used to determine this kind of arbitrary stuff?"

Answer: Because it is not the only way to do it.

In fact, genetic algorithms are a valid way to find those values, but they are never considered the best way to do anything. Genetic algorithms let you find a very good set of values for many kinds of problems, but they don't guarantee that you will get the BEST values. So they are not the ultimate way to do things.

In a presentation on neural networks, I would not present genetic algorithms as THE way to do it. But are genetic algorithms a nice way to do it? Certainly. You can see genetic algorithms as an automated way of doing the guesswork the author is talking about.

I must admit I myself don't like the use of "you will develop a 'feel' for". A short list of possible ways to do it (including GAs) would be better IMO.

So... if you want to use GAs to find those values, go ahead; they are a good way to do this job. Just keep in mind that although you may get excellent results, they are not guaranteed to be the best possible. Depending on what you are using NNs for, that may or may not be an issue.

Hoping this answers your questions.

It pretty much answers my question. Could you provide that 'short list of possible ways to do it', please?

And also, about GAs, you say that "they are never considered the best way to do anything".

Why?

Thanks,

Cédric
Although many people _think_ that things like the number of layers and the number of neurons in each layer are arbitrary, they really aren't. (If fup is reading this, don't take offence; I know you are just making the tutorial so that it is understandable : )

I will not comment on genetic algorithms for designing neural nets (since there is fup's tutorial and you seem to have at least a basic grasp of the concepts there).

1. How many layers should we have?
There are really only three ways to go on this one.

We know that a neural network with only one layer can only handle linearly separable problems. This of course isn't very useful, so you probably don't want to go with a one layer network.

A two layer network can _approximate_ any desired (continuous) function to any desired degree of accuracy by increasing the number of hidden neurons (in the limit, as that number approaches infinity). This means that for almost every situation a 2 layer network will suffice (provided you have enough input and hidden neurons). Chances are you will almost always use this one.

A three layer network is as far as you should need to go (for your standard feedforward nets anyway). Any more layers than this is simply overkill. Kolmogorov's theorem (1957), while not directly related to neural networks, does show some interesting facts. Primarily, every continuous function of several variables (over a closed & bounded input domain) can be represented as the superposition of a small number of functions of one variable. What this says about neural networks is that any continuous function can be represented _exactly_ with a three layer net.
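Just so the terminology is concrete (as I note at the bottom of this post, I count layers of weights), here is a quick Python sketch of a 2 layer net's forward pass. The layer sizes and the sigmoid activation are arbitrary picks for illustration, nothing more:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "2 layer" in my counting = two layers of weights: input -> hidden -> output.
n_in, n_hidden, n_out = 3, 4, 1           # sizes picked arbitrarily
W1 = np.random.randn(n_hidden, n_in)      # input-to-hidden weights
W2 = np.random.randn(n_out, n_hidden)     # hidden-to-output weights

def forward(x):
    h = sigmoid(W1 @ x)     # hidden layer activations
    return sigmoid(W2 @ h)  # network output

print(forward(np.array([0.5, -1.0, 2.0])))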

2. How many neurons should each layer have?

For a single layer network we don't have any hidden layers -> so none (he he he)

For a two layer network this is a little more difficult. Since I mostly use 2 layer nets I have my own little rule of thumb: the number of hidden neurons equals two thirds of the sum of the input and output neurons. This works quite well for most problems and (so I have heard) is used by one or more companies as the standard beginning design of their nets.
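In code, that rule of thumb is just the following (a tiny sketch; how you round is up to you):

import math

# Rule of thumb: hidden units = two thirds of (inputs + outputs).
def starting_hidden_units(n_in, n_out):
    return max(1, math.ceil(2 * (n_in + n_out) / 3))

print(starting_hidden_units(10, 2))  # 2/3 of 12 -> 8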

Of course, no rule of thumb takes into account the number of training cases, the noise, or the complexity of the function. Whatever you do, never trust anyone who says that you will never need more than twice the number of hidden neurons as you have input neurons. That is a _myth_.

For many cases it is not possible to calculate the _best_ number of hidden units without some training and estimating the generalization error of each. If you have too many hidden neurons you will normally get a low training error but still have a high generalization error (overfitting). If you have too few hidden neurons you will normally get a high training error and a high generalization error (underfitting and/or high statistical bias). So try out a few numbers and see how each trains (or use this information to quantify the fitness of the number for a genetic algorithm).
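Here is a minimal Python sketch of that try-a-few-numbers loop. The validation-error function is a made-up stand-in with a U shape (underfitting on the left, overfitting on the right); in practice you would train a net for each size and measure its error on a held-out validation set:

# fake_validation_error() is a stand-in: a made-up U-shaped curve where
# too few hidden units underfit and too many overfit. Replace it with
# "train a net of this size, then measure error on a validation set".
def fake_validation_error(n_hidden):
    return (n_hidden - 7) ** 2 + 1.0

candidates = [2, 4, 6, 8, 12, 16, 24]
best = min(candidates, key=fake_validation_error)
print("best hidden size among candidates:", best)  # -> 6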

For a three layer network the advice for the two layer network carries over. It is often harder to tell when you are overfitting or underfitting with more than one hidden layer, so it can be useful to start with a decent 'universal approximator' design and work down. Any continuous mapping f(x) from d input variables x_i to an output variable f can be represented exactly by a neural network having d(2d+1) units in the first hidden layer and (2d+1) units in the second hidden layer.

My boss is now peering at me through my door, so I suppose I should wrap this up. I will say, however, that after finding a good number of neurons, applying a pruning algorithm can be helpful. While I am not going to go into how they work, you can look them up for yourself if you are interested. (There are also construction algorithms, but they are quite complex and you already seem to want to use genetic algorithms, so...)
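(OK, one quick taste before I go: the simplest flavour just zeroes out the weights closest to zero and retrains the survivors. A toy sketch, not any particular published algorithm:)

import numpy as np

# Toy magnitude-based pruning: zero out the weights nearest zero; you
# would then retrain the surviving weights afterwards.
def prune_smallest(W, fraction=0.2):
    threshold = np.quantile(np.abs(W), fraction)
    return np.where(np.abs(W) < threshold, 0.0, W)

W = np.random.randn(4, 3)   # a random weight matrix to demonstrate on
print(prune_smallest(W))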

I hope this has helped. I am at work right now so this is all off the top of my head -> if I made any mistakes please let me (and the rest of us) know.

- mongrelprogrammer

(Note: When I refer to a layer I am talking about a layer of weights. A neural network with 1 input, 1 hidden, and 1 output layer would be a 2 layer network (at least the way I am writing))
- I hate these user ratings. Please rate me down. (Seriously) -
Thanks for your answer. Lots of interesting stuff, but I should probably comment on your introductory statement:
quote: Original post by mongrelprogrammer
Although many people _think_ that things like the number of layers and the number of neurons in each layer are arbitrary, they really aren't. (If fup is reading this, don't take offence; I know you are just making the tutorial so that it is understandable : )

Fup didn't say that the number of layers and neurons is arbitrary. I did. I still think that it's arbitrary. After all, if you have to _try_ multiple values to find the best one, then this seems arbitrary to me, and a GA could do the tests for you. Of course, designing a GA to choose between a double-layered or triple-layered network sounds like overkill, according to what you said...

Cédric
quote: Original post by mongrelprogrammer
Although many people _think_ that things like the number of layers and the number of neurons in each layer are arbitrary, they really aren't. (If fup is reading this, don't take offence; I know you are just making the tutorial so that it is understandable : )


quote: Original post by cedricl
Fup didn't say that the number of layers and neurons is arbitrary. I did.


Point taken; though fup does say "this is a matter of guesswork and something you will develop a 'feel' for." I see a _lot_ of people who think neural nets are black boxes and just stuff a bunch of neurons in and think it will work, or even just guess (incorrectly) at the number of neurons needed (though I don't believe either of you thought that). I was just trying to make the distinction between guessing and finding : )

Let me just say I wasn't saying anything bad about fup (or you for that matter). I was just trying to show that there aren't any completely arbitrary values.

quote: Original post by cedricl
I still think that it's arbitrary. After all, if you have to _try_ multiple values to find the best one, then this seems arbitrary to me, and a GA could do the tests for you.


Don't tell that to one of my old professors : ) As a research assistant I was forced to find the optimal* number of hidden neurons through statistical means (it hurt my poor little brain) because he was working on a construction algorithm and wanted to check it. For the most part it is possible to mathematically find a proper number of hidden neurons. It just isn't practical : )

quote: Original post by cedricl
Of course, designing a GA to choose between a double-layered or triple-layered network sounds like overkill, according to what you said...


You are quite right. If you can't get a neural network to train with only 2 layers you should (most of the time) rethink what you want it to learn and how you preprocess & postprocess the data. Besides, in most cases, if you applied a genetic algorithm to this choice it would say to make it a 3 layer net, since it would train faster (unless you trained for a long time and tested generality).

- mongrelprogrammer

*By optimal I mean a decent _range_ of hidden neurons.
- I hate these user ratings. Please rate me down. (Seriously) -
Howdy.

Interesting discussion. I'm not an expert in the field, but I would agree that an optimal structure for your network probably exists for a particular problem, and that structure is likely different for a different problem. Finding it is a nasty business, I'm sure.

As for the GA approach, there is a great paper in Industrial & Engineering Chemistry Research (1999) 38:4330-4336. This group used a hybrid GA/evolutionary programming approach in which the number of layers, the number of inputs, and a single output were fixed, but the number of nodes in any particular hidden layer was optimized by the GA. The weights of the system were optimized by EP at the same time. While a lot of improvements could be made, I'm sure, the results are very encouraging. Also note that they never needed more than 2 hidden layers, and I believe they tried up to 5.

It's a reasonably good paper and easy to read. It might help stimulate some ideas...

-Kirk
Hello everyone, I'm back from my biking holiday. Still in one piece! (although nothing else is ;0))

Let me try to clarify this question about hidden units. Unfortunately there isn't space (or time) to cover everything in detail here, so I'll give you a quick overview.

To date, there is *no* way to determine the number of hidden neurons except by trial and error (I include GAs and similar search techniques here). See the neural network 'bible' (Neural Networks for Pattern Recognition - Bishop) for confirmation. Also, Warren Sarle has a decent discussion about this in the comp.ai.neural-nets FAQ.

Each problem you tackle with a NN will usually require a different architecture. If the networks are static then the optimal topology is discovered by repeated experimentation by hand. Usually you start off with too many units and reduce them, taking care that underfitting is avoided. If you have too many units your network will overfit and lose the ability to generalize. Too few, and it won't learn. The choice is not arbitrary. Additionally, as MP mentioned above, there are ways of pruning the connections.

Another way of ending up with (hopefully) the optimal architecture is to use techniques like GAs to traverse the search space of topologies. There are numerous ways of doing this. One of the simplest is to define a binary matrix of connections between neurons: a '1' represents a connection, a '0' represents no connection. The matrix is then concatenated into a string and a GA is used to evolve the structure. Each epoch the population of networks is tested to see which are the better performers, and offspring are produced as usual. This is just one of a myriad of techniques that have been developed for determining the optimal architecture.
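To make that concrete, here is a minimal Python sketch of the encoding and the evolutionary loop. The fitness function is a stand-in (it just rewards sparse matrices); in a real run you would decode each matrix into a network, train it, and score its performance:

import random

N = 6  # neurons; a genome is the flattened N x N connection matrix (1 = edge)

def random_genome():
    return [random.randint(0, 1) for _ in range(N * N)]

def crossover(a, b):
    cut = random.randrange(1, len(a))   # one-point crossover
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.02):
    return [bit ^ (random.random() < rate) for bit in genome]

def fitness(genome):
    # Stand-in: rewards sparse structures. Really you would decode the
    # matrix into a network, train it, and score its performance.
    return -sum(genome)

population = [random_genome() for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                    # keep the better performers
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(10)]
    population = parents + children

best = max(population, key=fitness)
print("connections in best matrix:", sum(best))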

Before I go, let me just address this point:

quote: In fact, genetic algorithms are a valid way to find those values, but they are never considered the best way to do anything


This is untrue. Sometimes GAs are the best way of finding a solution because a heuristic does not yet exist. There are many examples of this.
I used GAs in my tutorial because I believe it is much easier for my readers to grasp the principle behind ANNs that way than through a discussion of gradient descent methods like backprop. The other bonus is that the networks do not require a training set.

I apologize in advance for any errors in these paragraphs, as I've typed everything at a bionic pace and haven't had time to check.

bye!

Stimulate
quote: Original post by cedricl
And also, about GAs, you say that "they are never considered the best way to do anything".


quote: Original post by fup
This is untrue. Sometimes GAs are the best way of finding a solution because a heuristic does not yet exist. There are many examples of this.


Guess my point requires some more explanation...

What I meant by "they are never considered the best way to do anything" is that a GA is never mathematically guaranteed to give the best possible result; there is no mathematical proof that a GA will actually converge to the optimum. (If I'm wrong about this, please point me in the right direction.)

So, there is no guarantee that the result you'll get is the best one in the space of solutions. However, since exploring and testing the whole set of possibilities is not always feasible, GAs are often the best solution we have available. (I would myself tend to use GAs for finding the values in NNs.)

This also means that, since there is no guarantee of the best result, the search for a perfect (or better) way to do it is still open.

Hope this clears up the confusion I may have caused with my post.

Thanks,

Ithyl


It does clarify your point, but if the GA has sufficient randomness, there is always a slight chance that it will try an entirely different solution and possibly stumble upon the best one in the search space. So in this sense, given enough time, it will try all the possible combinations and find the best one eventually.

Of course, it's better to try all the combinations one by one than to wait for the GA to try them all, but I tend to believe that a GA will find the best solution.

Cédric

