
A Case For Extended Neural Networks?

Started by September 19, 2007 07:58 AM
10 comments, last by Timkin 17 years, 4 months ago
I'm new to AI. I've been working through some tutorials on GAs and NNs, and I'm trying to understand one particular thing. In AI Junkie's NN tutorial he demos NNs with a simple scavenger hunt: the actors are minesweepers, the inputs are the current direction (a 2D vector) and the direction to the nearest mine (a 2D vector), and the outputs are the thrusts for the minesweeper's left and right treads. This is simple enough; there's one hidden layer with six nodes. Now here's my beef: if you asked me to write a behavior instead of creating an NN, I would do something like this:

// the x value of the return is the left track's thrust
// the y value is the right track's thrust
// inputs are UNIT vectors
vec2 behavior(vec2 mydir, vec2 nearest){
    vec2 myleft = vec2(-mydir.y, mydir.x); // perpendicular to mydir, pointing to my left
    vec2 thrust;
    thrust.x = dot(mydir,nearest) - dot(myleft,nearest);
    thrust.y = dot(mydir,nearest) + dot(myleft,nearest);

    return thrust;
}


So the problem I have with this is that there's no way an NN can replicate this behavior EXACTLY: NNs scale the inputs and then add them together, but the behavior I've written absolutely requires that the inputs be multiplied together. So my question boils down to this: why don't we consider more types of neurons in NNs? For example, a product node:

number of inputs: n
Inputs:           inputs[]
Exponents:        exps[]
Scale Factor:     f       // may be negative, cannot be zero

output = f;
for (i=0;i<n;++i){
    output *= pow(inputs[i], exps[i]);
}
If these nodes existed, it would be feasible for an NN to compute dot products, cross products and other exciting things too...
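For instance (just a sketch of what I'm imagining, reusing the vec2/pow style from above, not anything from the tutorial), a 2D dot product would only need two product nodes feeding one ordinary summation node:

// sketch: two product nodes (scale 1, exponents 1) plus a plain sum node give dot(a, b)
float dotViaProductNodes(vec2 a, vec2 b){
    float p1 = 1.0f * pow(a.x, 1.0f) * pow(b.x, 1.0f); // product node 1: a.x * b.x
    float p2 = 1.0f * pow(a.y, 1.0f) * pow(b.y, 1.0f); // product node 2: a.y * b.y
    return 1.0f * p1 + 1.0f * p2;                      // summation node, no squashing
}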
Geordi
George D. Filiotis
I don't want to speak for fup - he may actually chime in on this one himself.

My guess is that the interesting thing about his approach of using GAs to develop the NNs is the emergence of an appropriate behavior. The emphasis is less on the absolute correctness of the behavior and more on the ability of a GA-NN system to develop a behavior that is competitive with a more closed-form solution such as the one you presented.

-Kirk

Welcome to the world of NN's

Keep in mind that "Neural Network" is a general term and does not describe specific design choices such as activation methodology.

In any event, the response curves you have implemented by hand can most definitely be done in a NN.
Quote:
Original post by Symphonic
So the problem I have with this is that there's no way an NN can replicate this behavior EXACTLY: NNs scale the inputs and then add them together, but the behavior I've written absolutely requires that the inputs be multiplied together.

So my question boils down to this: why don't we consider more types of neurons in NNs?

...

If these nodes existed, it would be feasible for an NN to compute dot products, cross products and other exciting things too...


Second part first.

ANNs are able to compute mathematical functions, with a caveat.

Generally speaking, an ANN attempts to generate a multidimensional decision surface. When you provide inputs to the ANN, it will give you the distance to (or the value of) the nearest facet of that decision surface.

For feedback training (sounds like you are describing a backprop or feedforward network) you provide a whole lot of sample points and the network contorts toward a decision surface that matches your samples.
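As a rough illustration of what that feedback step does for one sigmoid node (not anyone's actual code here, just the standard delta-rule update), a single training step nudges the weights so the node's output moves toward the sample's target:

#include <cmath>

// one sigmoid node, squared error: a single gradient ("contorting") step
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

void trainStep(float w[], float& bias, const float in[], int n, float target, float rate)
{
    float sum = bias;
    for (int i = 0; i < n; ++i) sum += w[i] * in[i];
    float out   = sigmoid(sum);
    float delta = (target - out) * out * (1.0f - out);  // error times sigmoid derivative
    for (int i = 0; i < n; ++i) w[i] += rate * delta * in[i];
    bias += rate * delta;
}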


For your inputs it is common to give useful variations of your raw source data. Consider a spatial or geometric recognizer. You wouldn't just feed it raw points, and you probably wouldn't feed it only the normalized points either. You would probably include normalized distances and the angles between them. You might provide sin() and cos() inputs, the angle squared, per-axis distances squared, the sum of squared axis distances, or other processed values that add a useful dimension to the decision surface.
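Something along these lines (the particular features here are only an illustration, not a recipe):

#include <cmath>

// hypothetical pre-processing for a geometric recognizer: derived quantities
// go into the network instead of (or alongside) the raw point coordinates
void buildInputs(float ax, float ay, float bx, float by, float inputs[6])
{
    float dx = bx - ax, dy = by - ay;
    float dist  = std::sqrt(dx * dx + dy * dy);
    float angle = std::atan2(dy, dx);
    inputs[0] = dist;              // distance between the points
    inputs[1] = dist * dist;       // distance squared
    inputs[2] = std::sin(angle);   // angle as sin/cos so it wraps cleanly
    inputs[3] = std::cos(angle);
    inputs[4] = dx * dx;           // per-axis distances squared
    inputs[5] = dy * dy;
}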

Note that you don't have to provide the additional information; a simple backprop network could figure those relationships out itself if it has enough internal nodes. Providing the pre-processed information adds input dimensions in an attempt to decrease internal network complexity.

The problem is that for every additional input you provide, you are adding a dimension to the decision surface. The curse of dimensionality means you need exponentially more training samples to cover the hypervolume as it expands. On the other hand, if you don't add dimensions you may need a more complex decision surface, which requires more internal nodes.


So in order to accurately cover a more complex math function with that kind of ANN, you need to provide enough sample points to adequately describe the decision surface, and enough internal nodes to let the decision surface accurately reflect the formula. Simple math functions (the classic XOR) are trivially solved with a single internal node and four training samples. If you are trying to represent a complicated decision surface or a wide range of inputs, you will need many samples that cover all the relevant details of your problem space.
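To make the XOR claim concrete: a network that small does exist if you also allow direct input-to-output connections alongside the single hidden node. The weights below are picked by hand just to show the shape of the solution; a trained network would find something equivalent.

// step-activation network: one hidden node (an AND detector) plus direct
// input-to-output connections; hand-picked weights, not trained
int step(float x) { return x > 0.0f ? 1 : 0; }

int xorNet(int a, int b)
{
    int h = step(a + b - 1.5f);            // hidden node fires only when both inputs are on
    return step(a + b - 2.0f * h - 0.5f);  // on for exactly one active input, off otherwise
}
// xorNet(0,0)==0, xorNet(1,0)==1, xorNet(0,1)==1, xorNet(1,1)==0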


Now to your other questions about why we don't consider other types of nodes.

The short answer is that we do.

You are probably using a backprop network. That type of network works best with simple addition. The operation is fast, and the effects are well studied. If you choose to go to the research papers you can find different node uses and their adjusted training formulas, but they are generally not worth the additional work.

There are hundreds of other types of ANNs out there, mostly variations on a dozen or so simple themes like backprop, RBFs, self-organizing maps, Hopfield and Boltzmann networks, etc. Some problems are better suited to different networks, and most allow for variations in their learning properties, including more complex math formulas.
Frob, thanks for your illuminating reply. The network type in the example is feedforward; I'm not sure what backprop is or how it relates to this.

You talk about problem space, but as I see it, there is the input space and the output space (at least for the NNs discussed in that tutorial).

In the example given, the input is four reals (two 2D vectors) and the output is two reals. Since the operations of the NN are ONLY addition and scaling, it is equivalent to a series of affine transformations mapping R4 (inputs) to R6 (hidden layer) and then to R2 (outputs). I see how it can learn to do something like a dot product (as you said, approximating the decision surface), but it can't DO a dot product; that was my point.

However, equally significant is your statement that these are fast systems; I do have a pow() call in my example, which is a good example of something that isn't fast.

Clearly by your reply I have plenty of reading to do, and I haven't posited anything new in this thread :) Part of the learning process!
Geordi
George D. Filiotis
Quote:
Original post by Symphonic
You talk about problem space, but as I see it, there is the input space and the output space (at least for the NNs discussed in that tutorial).


Sorry, I'll explain the terms a bit more.

The problem space is the scope of the problem. The decision surface is the thing being modeled by your ANN.


Consider, for example, the operations AND, OR, NOT, and XOR. Each has one output and one or two inputs. Three of the operations (AND, OR, NOT) can be solved with a single line dividing the problem space. The XOR operation requires at least two lines dividing the problem space.

The decision surface of the AND, OR, and NOT operations just needs to handle a single division. The hypersurface on a 2D plane is a line, which is all you need for the operation. The decision surface for XOR is more complex, as it appears as two lines on a 2D plane.

For the first three, the decision surface is that single dividing line, so the network needs zero internal nodes. To handle the more complex decision surface of XOR, you need at least one internal node. More complex problems require a more complex decision surface, and therefore more internal nodes; more internal nodes mean more work for the learning algorithm.
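To show the zero-internal-node case concretely (hand-picked weights again, only a sketch): a single threshold unit already draws the one dividing line that AND and OR need.

// a single threshold unit, no hidden layer: one weighted sum, one dividing line
int step(float x) { return x > 0.0f ? 1 : 0; }

int andNet(int a, int b) { return step(a + b - 1.5f); }  // line a + b = 1.5
int orNet (int a, int b) { return step(a + b - 0.5f); }  // line a + b = 0.5
// no single line of this form separates XOR, hence the extra internal node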
I don't understand why frob is describing backpropagation and decision surfaces when Symphonic is using a genetic algorithm to generate a neural network with continuous outputs.

The common sigmoid functions found in neural networks are convenient for backpropagation. With genetic algorithms, they can still be a useful way to scale the input to a fixed range in a non-linear way, but there's nothing preventing you from using whatever operators you want in a network that is being optimized by a genetic algorithm. Such a network will resemble a real biological neural network even less, but that isn't important. Feedforward networks with sigmoid functions barely resemble a biological neural network to begin with.

You can use the genetic algorithm to optimize parameters that aren't even part of a network, like constants in a function. You can combine a neural network with other things and have the genetic algorithm optimize all of them. Genetic algorithms are a very general way to search for parameter values.
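A bare-bones version of that idea might look like the following (my own sketch; a real GA would add a population, crossover, and a fitness function specific to your problem rather than the toy one here):

#include <cstdlib>
#include <vector>

// toy fitness: how close three parameters get to some target values;
// in practice this would score a network, a formula, or both at once
float fitness(const std::vector<float>& p)
{
    const float target[] = { 1.0f, -2.0f, 0.5f };
    float err = 0.0f;
    for (size_t i = 0; i < p.size() && i < 3; ++i)
        err += (p[i] - target[i]) * (p[i] - target[i]);
    return -err;  // higher is better
}

// minimal mutate-and-keep-the-better search over a parameter vector
std::vector<float> evolve(std::vector<float> best, int generations, float mutationSize)
{
    float bestScore = fitness(best);
    for (int g = 0; g < generations; ++g) {
        std::vector<float> child = best;
        for (size_t i = 0; i < child.size(); ++i)
            child[i] += mutationSize * (2.0f * std::rand() / RAND_MAX - 1.0f);  // random nudge
        float score = fitness(child);
        if (score >= bestScore) { best = child; bestScore = score; }
    }
    return best;
}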

The neural network with sigmoid functions is useful because with enough hidden units and layers it has the potential to approximate any function. On the other hand, some functions are much easier to represent with a NN and some are very difficult. Using preprocessed inputs can help. There's nothing stopping you from providing the dot products as inputs to the neural network.
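For the minesweeper example that could be as simple as this (my own sketch, reusing the vec2/dot notation from the first post):

// feed the network the two quantities the hand-written behavior actually uses,
// instead of the four raw vector components
void buildNetInputs(vec2 mydir, vec2 nearest, float inputs[2])
{
    vec2 myleft = vec2(-mydir.y, mydir.x);  // left perpendicular of the facing direction
    inputs[0] = dot(mydir,  nearest);       // how far ahead the mine is
    inputs[1] = dot(myleft, nearest);       // how far to the left the mine is
}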

Genetic algorithms can also evolve network topologies, mathematical formulas, or entire functions. This is a step more complicated than merely optimizing some parameter values.
The real point is that ..

..the fact that the hand-coded routine uses a dot product is actually irrelevant.

It just happens to be using dot products to produce a specific set of output curvatures, but these curvatures (ANY curvatures) don't strictly require a dot product to be reproduced.

There is always more than one way to skin a cat in mathematics..

Using dot products for this problem is a limitation imposed on humans who like to construct functions out of concepts that they are already familiar with.
Quote:
Original post by Rockoon1
The real point is that ..

..the fact that the hand-coded routine uses a dot product is actually irrelevant.

It just happens to be using dot products to produce a specific set of output curvatures, but these curvatures (ANY curvatures) don't strictly require a dot product to be reproduced.

There is always more than one way to skin a cat in mathematics..

Using dot products for this problem is a limitation imposed on humans who like to construct functions out of concepts that they are already familiar with.


Yeah, define your own inner product and work in any L2 space you like!

...

I strongly recommend taking courses in both numerical methods and operational research before consuming anything related to neural networks and GAs.
Also, one of the reasons to use AI techniques is that it is often hard (or impossible) to prove that your pet solution is anywhere near optimal. Most problems do not deserve rigorous treatment, and all we really want is a damn good solution.

For instance, I highly doubt that the dot product solution is near-optimal in practice. Is it worth the effort to try to show that I am wrong?

Some problems are so hard to attack analytically that people have all but given up hope.

An example of this is the various sorting network problems, where genetic algorithms have proven superior to all humans. Some of these networks are designed to require the fewest operations, some are designed to have the shortest dependency chains, and still others are designed for high manufacturing fault tolerance while keeping the other measures small. The GA is better than humans at all of them.

So powerful is the Genetic Algorithm within simulations that care must be taken not to expose the GA to simulation flaws that it could take advantage of, because it usually will.

