Neural network: Representing weights in code
Hi,
I'm wondering how to represent the weights of the connections in a multilayer neural network. I've read that a matrix is usually used for this, but I can't find anything specific. For a single-layer network it's quite easy of course - an IxO matrix - where I = number of inputs and O = number of outputs.
What I'm thinking is that since there can only be a direct connection from layer N to layer N+1, having an MxM matrix (where M is the total number of nodes) is a waste of space, especially if we've got many layers, since only a few of the cells in the matrix are actually used. I can of course improve this by using an (M-<number of outputs>)x(M-<number of inputs>) matrix, since there will never be a connection FROM an output node to any other node, and there will never be a connection TO an input node from any other node.
Also I've considered using multiple matrices (one for each pair of layers that are connected to one another), but since all the literature I've read (so far) seems to use only a single matrix, I'm guessing this isn't a widely used method? Is the waste of space simply ignored in practical applications of neural networks?
So what is the preferred way of representing the weights in a multilayer neural network?
Thanks in advance! :)
Best regards,
Hallgeir
The natural way to arrange weights is to use one matrix per interface between consecutive layers. The weights don't just form a matrix in the sense that there happen to be NxM of them: if you multiply the matrix between two layers by the vector of outputs of the first layer, the result is a vector with the linear combinations for the neurons of the second layer. Now you just need to pass those values through the transfer function (typically a sigmoid) and you are done.
For "bias", append a fake neuron to each layer that has constant output 1.
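A minimal sketch of this scheme in plain Python (no libraries; layer sizes and weight values below are arbitrary, just for illustration): one weight matrix per pair of consecutive layers, and the bias handled by appending a constant-1 entry to each layer's output vector.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, inputs):
    """weights: list of matrices, one per layer interface.
    weights[k][j][i] is the weight from node i of layer k
    (including the fake bias entry) to node j of layer k+1."""
    activations = inputs
    for w in weights:
        extended = activations + [1.0]  # fake bias neuron with output 1
        activations = [sigmoid(sum(row[i] * extended[i]
                                   for i in range(len(extended))))
                       for row in w]
    return activations

# 2 inputs -> 2 hidden -> 1 output
w_hidden = [[0.5, -0.4, 0.1],   # 2 inputs + bias
            [0.3,  0.8, -0.2]]
w_out = [[1.0, -1.0, 0.05]]     # 2 hidden + bias
out = forward([w_hidden, w_out], [1.0, 0.0])
```

Each matrix here is exactly (inputs+1)x(outputs) for its layer pair, so no space is wasted on impossible connections.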
I use ANNs in a game for PC/Xbox. The ANNs in the game require completely arbitrary topology networks, so instead of a matrix I use more of a tree-like system, with a Node class and Connection class. The node class has "activationFunction" and "activationLevel". The connection class has "from", "to", and "weight". All nodes and connections are stored in a flat array. The node array is stored in the order inputs, then outputs, then inner nodes, so you don't need to do any special handling for input or output nodes during activation.
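A rough sketch of that graph-style representation (class and field names follow the post; the activation loop is my own assumption, and for simplicity it evaluates non-input nodes in storage order, which assumes each connection's source comes earlier in the array - a truly arbitrary topology would need an explicit evaluation order):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class Node:
    def __init__(self, activation_function):
        self.activation_function = activation_function
        self.activation_level = 0.0

class Connection:
    def __init__(self, from_idx, to_idx, weight):
        self.from_idx = from_idx  # index into the flat node array
        self.to_idx = to_idx
        self.weight = weight

def activate(nodes, connections, inputs, n_inputs):
    # nodes[0:n_inputs] are the input nodes; load their activations directly.
    for i, v in enumerate(inputs):
        nodes[i].activation_level = v
    # Group incoming connections by target node.
    incoming = {}
    for c in connections:
        incoming.setdefault(c.to_idx, []).append(c)
    # Evaluate every non-input node: weighted sum, then transfer function.
    for i in range(n_inputs, len(nodes)):
        s = sum(nodes[c.from_idx].activation_level * c.weight
                for c in incoming.get(i, []))
        nodes[i].activation_level = nodes[i].activation_function(s)

# Tiny example: 1 input -> 1 output, weight 2.0
nodes = [Node(None), Node(sigmoid)]
connections = [Connection(0, 1, 2.0)]
activate(nodes, connections, [1.0], n_inputs=1)
```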
The last time I used NNs, I did it EJH's way. A matrix approach seems like an optimisation rather than a natural representation.
Quote: Original post by Kylotan
The last time I used NNs, I did it EJH's way. A matrix approach seems like an optimisation rather than a natural representation.
If you need to support arbitrary topologies, you certainly should look at it more like a graph. But with a multilayer organization (which I think is fairly typical), I have an easier time understanding conceptually what the ANN is doing: a series of linear transforms, each followed by smoothly capping the results (the sigmoid).
But there are always several ways to look at these things. I guess it depends on your background.
There are really two basic representations for a neural network (or any graph):
1. Weighted adjacency matrix.
2. Weighted incidence matrix.
Either of these can be represented as either a dense matrix (array with zeros for empty places) or a sparse matrix (like what EJH did).
A sparse-matrix representation of an adjacency matrix is also called an adjacency list.
To summarize:
            Dense  Sparse
Incidence     x      x
Adjacency     x      x

To figure out what's best for your application, you only need to do the math: figure out how much each structure costs you in memory (remember to count pointers; they can take up a majority of your space). The adjacency list article contains some analysis for the unweighted case (when adjacency matrices only need 1 bit per entry).
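Back-of-envelope math for the layered case, under assumed sizes (4-byte float weights, 4-byte indices; the layer widths are made up for illustration):

```python
# Compare memory cost of three representations of a layered net.
layers = [10, 20, 20, 5]   # hypothetical layer widths
M = sum(layers)            # total node count: 55

# 1. One dense M x M adjacency matrix over all nodes.
dense_full = M * M * 4

# 2. One dense matrix per pair of consecutive layers (+1 for the bias).
per_layer = sum((a + 1) * b * 4 for a, b in zip(layers, layers[1:]))

# 3. Adjacency list: (from, to) indices plus a weight per connection.
n_edges = sum(a * b for a, b in zip(layers, layers[1:]))
adjacency_list = n_edges * (4 + 4 + 4)

print(dense_full, per_layer, adjacency_list)  # bytes for each scheme
```

For a fully connected layered net, the per-layer matrices win: the adjacency list pays index overhead on every connection, and the full M x M matrix stores mostly zeros. The list only starts paying off when connectivity is genuinely sparse or irregular.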
Cheers folks, all of this has been of great help!
Now I'll just try to put this into practice. ;) Shouldn't be a big problem!
Best regards,
Hallgeir
Quote: Original post by alvaro
I'm curious, EJH: What do you use the ANN for?
It's a procedural content generation system where the ANNs represent and control particle-system weapons. The ANNs are mutated based on player weapon usage data, so a (theoretically) infinite variety of particle weapons evolves based on the preferences of the players.