Quote: Original post by Symphonic
So the problem I have with this, is that there's no way an NN can replicate this behavior EXACTLY. Because NNs scale the inputs and then add them together, but this behavior I've written absolutely requires that the inputs are multiplied together.
So my question boils down to this, why don't we consider more types of neurons in NNs?
...
If these nodes existed, it would be feasible for an NN to compute dot products, cross products and other exciting things too...
Second part first.
ANNs are able to approximate mathematical functions, with a caveat.
Generally speaking, an ANN attempts to generate a multidimensional decision surface. When you provide inputs to the ANN, it gives you the distance to, or the value of, the nearest facet of that decision surface.
For feedback training (it sounds like you are describing a feedforward network trained with backprop) you provide a whole lot of sample points, and the network contorts itself toward a decision surface that matches your samples.
For your inputs it is common to give useful variations of your raw source data. Consider a spatial or geometric recognizer. You wouldn't just feed it raw points, and you probably wouldn't feed it only the normalized points either. You would likely include normalized distances and angles between them, and you might provide sin() and cos() inputs, angle squared, axis distance squared, sum of axis distances squared, or other processed values that add a useful dimension to the decision surface.
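To make that concrete, here is a minimal sketch in Python of that kind of pre-processing for a geometric recognizer. The helper name and the particular derived features are my own illustration of the idea, not a prescription; which features actually help depends entirely on your problem.

```python
import math

def preprocess(points):
    """Turn raw (x, y) points into a richer feature vector for an ANN.

    Illustrative only: normalizes the points, then adds derived values
    (distances, distances squared, sin/cos of segment angles) as extra
    input dimensions.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    norm = [((x - min(xs)) / span, (y - min(ys)) / span) for x, y in points]

    features = []
    for (x0, y0), (x1, y1) in zip(norm, norm[1:]):
        dx, dy = x1 - x0, y1 - y0
        dist = math.hypot(dx, dy)
        angle = math.atan2(dy, dx)
        features.extend([
            dist,             # normalized distance between consecutive points
            dist * dist,      # distance squared
            math.sin(angle),  # sin() of the segment angle
            math.cos(angle),  # cos() of the segment angle
        ])
    return features
```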
Note that you don't have to provide the additional information; a simple backprop network can figure those relationships out on its own if it has enough internal nodes. Providing the pre-processed information adds input dimensions in an attempt to reduce internal network complexity.
The problem is that every additional input you provide adds a dimension to the decision surface. The curse of dimensionality means you need exponentially more training samples as that hypervolume expands. On the other hand, if you don't add dimensions you may need a more complex decision surface, which requires more internal nodes.
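To put rough numbers on that: if you only wanted ten representative values along each input axis, the number of points needed to tile the space grows as 10 to the power of the number of inputs.

```python
# Rough illustration of the curse of dimensionality: covering each input
# axis with just 10 sample values needs 10**d grid points for d inputs.
for d in (2, 4, 6, 8):
    print(f"{d} inputs -> {10 ** d:,} grid points")
# 2 inputs -> 100 grid points
# 4 inputs -> 10,000 grid points
# 6 inputs -> 1,000,000 grid points
# 8 inputs -> 100,000,000 grid points
```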
So in order to accurately cover a more complex math function with that kind of ANN, you need to provide enough sample points to adequately describe the decision surface. You also need enough internal nodes to let the decision surface accurately reflect the formula. Simple functions (the classic XOR) are trivially solved with one or two internal nodes and four training points. If you are trying to represent a complicated decision surface or a wide range of inputs, you will need many training samples that cover all the relevant details of your problem space.
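Here is a small sketch of exactly that XOR case: a feedforward network trained by backprop on the four training points. I use two hidden nodes and numpy for brevity (a single hidden node also works if you add direct input-to-output connections); the learning rate and epoch count are arbitrary choices of mine.

```python
import numpy as np

# The four XOR training points and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=1.0, size=(2, 2))  # input -> hidden weights
b1 = np.zeros((1, 2))
W2 = rng.normal(scale=1.0, size=(2, 1))  # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # Forward pass: every node takes a weighted sum of its inputs,
    # then squashes it through a sigmoid.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: squared-error gradient pushed back through the net.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

# Ideally close to [[0], [1], [1], [0]]; an unlucky initialization can
# get stuck, so reseed and rerun if it doesn't converge.
print(np.round(out, 2))
```

The trained weights carve out exactly the decision surface described above: each hidden node defines a hyperplane, and the output node combines them.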
Now to your other question about why we don't consider other types of nodes.
The short answer is that we do.
You are probably using a backprop network. That type of network works best with simple weighted sums: the operation is fast, and its effects are well studied. If you dig into the research papers you can find other node types and their adjusted training formulas, but they are generally not worth the additional work.
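To tie this back to your multiplication question: the literature does describe nodes that multiply instead of sum, often called product units (or sigma-pi units, when sums of products are used). Here is a rough sketch of the difference, with function names of my own choosing; the adjusted training rule for the product unit is the part that gets messy in practice.

```python
import math

def summation_unit(inputs, weights, bias=0.0):
    """The standard backprop node: weighted sum, then a squashing function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

def product_unit(inputs, exponents):
    """A product unit: multiplies its inputs, each raised to a learned exponent.

    With exponents of 1.0 this is exactly the input multiplication you
    describe; the catch is that the gradient/update rule (and its behavior
    on negative or zero inputs) is uglier than plain backprop.
    """
    z = 1.0
    for x, p in zip(inputs, exponents):
        z *= x ** p
    return z

# A summation unit can only approximate x*y; a product unit computes it exactly.
print(product_unit([3.0, 4.0], [1.0, 1.0]))  # 12.0
```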
There are hundreds of other types of ANNs out there, mostly variations on a dozen or so simple themes: backprop, RBFs, self-organizing maps, Hopfield and Boltzmann networks, and so on. Some problems are better suited to particular networks, and most allow for variations in their learning properties, including more complex node formulas.