
ANN - Why multiple layers?

Started by Litheon May 28, 2009 01:50 AM
4 comments, last by Predictor 15 years, 5 months ago
Hi, I'm trying to design a classification network in MATLAB. I'm reading the "Neural Network Toolbox™ 6 User’s Guide", and on p. 190 they compare some training algorithms. But the examples there use networks like 3-10-10-1 (i.e., two hidden layers of 10 neurons each), 9-5-5-2, etc. What is the advantage of this?

The number of layers determines the type of function an ANN can mimic. Conceptually, training an ANN is akin to fitting a function to a given set of inputs and outputs. This (implicit) function may be linear, continuous, or discontinuous, which can respectively be approximated by ANNs with 0, 1, and 2 hidden layers. I'm not sure this rule has been proven in full generality (the continuous case is essentially the universal approximation theorem), but I found it a good guideline to use when you have some idea of the nature of the relationship between inputs and outputs.

The significance of the number of neurons in the hidden layers is a bit more fuzzy, or at least it was when I studied ANNs. Back then the rule of thumb was to decrease or increase the number of neurons in the hidden layer to combat overfitting or underfitting, respectively. By using fewer hidden neurons than input neurons, you're effectively reducing their individual influence, which helps in combating noise and thus overfitting. Using more hidden neurons has the opposite effect, giving the inputs more influence and helping to prevent underfitting and getting stuck in local optima.

However, these approaches each have their pitfalls: either falsely removing too much 'noise' or allowing insignificant inputs too much influence on the output. So it's a tricky optimization question for which I don't recall any definite guidelines. Most ANN implementations I've seen start with the same number of neurons in the hidden layer(s) as in the input layer for the first few training sessions, and then vary the number of hidden neurons depending on how well the resulting ANN fits the theoretical/empirical situation.

Hope this helps & that my info isn't too outdated :)
Rim van Wersch [ MDXInfo ] [ XNAInfo ] [ YouTube ] - Do yourself a favor and bookmark this excellent free online D3D/shader book!
Quote: Original post by Litheon
I'm trying to design a classification network in MATLAB.

I'm reading the "Neural Network Toolbox™ 6 User’s Guide", and on p. 190 they compare some training algorithms.

But the examples there use networks like 3-10-10-1 (i.e., two hidden layers of 10 neurons each), 9-5-5-2, etc.

What is the advantage of this?



First, it is important to understand the nature of the various layers. What is important here is the use of non-linear layers (layers of nodes which calculate weighted sums followed by nonlinear transfer functions). The input "layer" typically is an abstraction of the independent variables and performs no actual information processing. The output layer may or may not be nonlinear. It is common to use a linear output layer to simply scale the output range of the neural network (most often 0.0 to 1.0 or -1.0 to +1.0) to a more meaningful range.
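
To make that concrete, here is a minimal sketch using the Toolbox's newff/train/sim calls, with made-up data (argument conventions vary a bit between Toolbox versions, so treat this as illustrative rather than definitive):

% Minimal sketch with made-up data: the tansig hidden layer does the
% nonlinear work, and the purelin output layer just linearly rescales
% its result to the target's range.
P = rand(2, 100);                             % 2 inputs, 100 samples
T = 5 * P(1,:) .* P(2,:) + 3;                 % targets well outside [-1, 1]
net = newff(P, T, 10, {'tansig','purelin'});  % 10 hidden nodes
net = train(net, P, T);
Y = sim(net, P);                              % outputs span T's range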

That leaves the nonlinear hidden layers. The nodes in these layers are usually simple sigmoid functions, which means that (simplifying somewhat) they divide the input space using lines (or planes, hyperplanes, etc.). The number of hidden layers affects the nature of the shapes of the decision regions which can be represented by the neural network. For more explanation of this, see the comp.ai.neural-nets FAQ, Section 3:

comp.ai.neural-nets FAQ, Part 3 of 7: Generalization

Scroll down to the section titled "How many hidden layers should I use?".
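
In plain MATLAB terms (hypothetical weights, no toolbox required), a single sigmoid node is just a squashed weighted sum, and its decision boundary is wherever that sum crosses zero:

% One tansig node: a weighted sum passed through a sigmoid. Its decision
% boundary in 2-D input space is the line w(1)*x1 + w(2)*x2 + b = 0;
% with more inputs it becomes a plane or hyperplane.
w = [1.5 -2.0];            % hypothetical weights
b = 0.5;                   % hypothetical bias
x = [0.3; 0.7];            % one 2-D input
y = tanh(w * x + b);       % tansig(n) is equivalent to tanh(n)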


-Will Dwinnell
Data Mining in MATLAB

The XOR problem is a classic example of something that can't be done without a hidden layer. AND, OR, and NOT are all linearly separable, so a single layer of weights handles them quite easily, but XOR is not: no single line can separate its true cases from its false ones. This is a very basic example of a trivial function that requires a third layer, and it is easy to imagine real-world scenarios that are effectively composed of XOR operations in some form.
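
To illustrate, here is a hand-wired 2-2-1 network (a standard textbook construction, not something trained by the toolbox): the hidden nodes compute OR and NAND, and the output node ANDs them together, which is exactly XOR.

% Hand-wired 2-2-1 network computing XOR with threshold units.
% Hidden node 1 fires on OR, hidden node 2 on NAND; output ANDs them.
step   = @(n) double(n > 0);                          % threshold activation
hidden = @(x) step([1 1; -1 -1] * x + [-0.5; 1.5]);   % OR and NAND
xornet = @(x) step([1 1] * hidden(x) - 1.5);          % AND of the two
inputs = [0 0 1 1; 0 1 0 1];
for k = 1:4
    fprintf('%d XOR %d = %d\n', inputs(1,k), inputs(2,k), xornet(inputs(:,k)));
end
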
The best-case scenario is to evolve the network topology itself with something like NEAT:

http://en.wikipedia.org/wiki/NeuroEvolution_of_Augmenting_Topologies

Then you get a network that is the right size to solve the problem.

Not too sure, but is there such a thing as a non-ad-hoc method of estimating the number of nodes/layers required to solve a problem? For anything sufficiently complex it seems like that would be extremely hard...
Quote: Original post by EJH
Not too sure, but is there such a thing as a non-ad-hoc method of estimating the number of nodes/layers required to solve a problem? For anything sufficiently complex it seems like that would be extremely hard...


Assuming a typical feedforward neural network in which:

1. the target variable has been scaled to a useful range (so no linear output layer is needed),
2. there is one nonlinear hidden layer and one nonlinear output layer, and
3. reasonably good weight initialization and training algorithms are used,

the problem of model architecture is reduced to varying the number of hidden-layer nodes. Simply begin with a small hidden layer and try successively larger ones, checking the validation error. Select the one with the least validation error.
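
A rough sketch of that search in MATLAB, assuming inputs P and targets T are already loaded and scaled to [-1, 1], and assuming the Toolbox's newff/train/sim calls (property and argument names may differ slightly between versions):

% Try successively larger hidden layers and keep the net with the
% least error on a held-out validation split. Both layers are tansig,
% per the assumptions above.
[nIn, nSamp] = size(P);
idx  = randperm(nSamp);
nTr  = round(0.8 * nSamp);
trn  = idx(1:nTr);                     % training indices
val  = idx(nTr+1:end);                 % validation indices
bestErr = Inf;
for h = 2:2:20                         % candidate hidden layer sizes
    net = newff(P(:,trn), T(:,trn), h, {'tansig','tansig'});
    net.divideFcn = '';                % disable automatic data division
    net = train(net, P(:,trn), T(:,trn));
    e   = T(:,val) - sim(net, P(:,val));
    err = mean(e(:).^2);               % validation MSE
    if err < bestErr
        bestErr = err;  bestNet = net;  bestH = h;
    end
end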

There is no guesswork in this; the creativity and expertise go into designing the sampling, the input and output transformations, and so on.


-Will Dwinnell
Data Mining in MATLAB

