
Perceptron learning question

Started by October 04, 2006 08:50 AM
8 comments, last by Buzz1982 18 years, 1 month ago
Hi,

I was studying the perceptron and its learning rule, which is given by

w(new) = w(old) + m (t - y) x

I understood the problem of linear separability: a simple perceptron will only be able to separate problems that are linearly separable. The example I was studying on a website shows some outputs lying around the origin on a two-dimensional plane and a line separating the two different classes of outputs. This line passes through the origin.

My question is: what if the cluster of outputs is not around the origin but somewhere else on the plane? How does adjusting the weights using the perceptron learning rule help us then? The only thing I can think of is to somehow translate those outputs to get them around the origin, but I don't know how to do that.

The link where I was studying is http://www.willamette.edu/~gorr/classes/cs449/Classification/perceptron.html

Please guide me.

Thanks
Tariq
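For reference, the rule I'm describing looks roughly like this in code (just a sketch of my understanding, assuming a step activation; the variable names are my own):

```python
import numpy as np

def step(a):
    # Threshold activation: output 1 if the weighted sum is non-negative, else 0.
    return 1 if a >= 0 else 0

def perceptron_update(w, x, t, m=0.1):
    """One application of w(new) = w(old) + m * (t - y) * x."""
    y = step(np.dot(w, x))        # current output of the perceptron
    return w + m * (t - y) * x    # move the weights toward the target t

# A single training pattern as an example
w = np.array([0.2, -0.4])         # initial weights
x = np.array([1.0, 0.5])          # input pattern
t = 0                             # desired output (the current output is 1, so the weights change)
w = perceptron_update(w, x, t)
print(w)                          # [ 0.1  -0.45]
```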
I believe you need to normalize the data, i.e. subtract out the mean to center it around the origin.
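Something along these lines, assuming the patterns sit in the rows of a NumPy array (just a sketch):

```python
import numpy as np

X = np.array([[3.0, 4.0],
              [5.0, 6.0],
              [4.0, 5.5]])        # training inputs, one pattern per row

X_centered = X - X.mean(axis=0)   # subtract the per-feature mean
print(X_centered)                 # the cluster is now centered on the origin
```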
In addition to your selected inputs, you should include one additional input that is always active. This unit emits a +1 at all times, and adjusting its weight allows you to move the intercept away from the origin.

This is the same as the standard equation for a line y = mx + b. If you remove the b, your line can only pass through the origin. This extra unit I mentioned, typically called a bias unit, is equivalent to b.
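A rough sketch of what that looks like in practice (my own example, assuming a step activation and the update rule from the first post; the bias is just an extra input fixed at +1, and its weight plays the role of b):

```python
import numpy as np

def step(a):
    return 1 if a >= 0 else 0

def train_perceptron(X, T, m=0.1, epochs=100):
    # Append a constant +1 input to every pattern; its weight is the intercept b.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, t in zip(Xb, T):
            y = step(np.dot(w, x))
            w = w + m * (t - y) * x   # same rule; the bias weight is learned like any other
    return w

# Two clusters that are nowhere near the origin
X = np.array([[2.0, 2.0], [2.5, 3.0], [5.0, 5.0], [5.5, 6.0]])
T = np.array([0, 0, 1, 1])
w = train_perceptron(X, T)
print(w)   # w[0], w[1] give the orientation of the line, w[2] is the learned intercept
```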

-Kirk
Yes, you are right, but I am not directly working with the output data. It is the dynamics of the perceptron network that does the classification; the plane is just used to describe or visualize what the neural network does internally. The solution should be in the form of the perceptron learning rule, i.e. there must be something in the learning rule that allows either the output data to be translated around the origin, or the line separating the different outputs to be moved away from the origin and placed in the right position.

thanks
Tariq
Thanks kirkd

but what about the cases where there are more layers? Do I have to apply this bias to the intermediate layers also, or will applying the bias to just the output neuron work?

thanks
You probably already know this, but the impossibility of solving non-linearly-separable problems using a two-layer perceptron is discussed in depth in Minsky and Papert's 1969 book "Perceptrons" (which you should be able to find in your school's library). They prove conclusively that a perceptron can solve any problem which is linearly separable (whether about the origin or not), and no problems that are not linearly separable.

Adding another layer (and so making it an Artificial Neural Network, not a Perceptron) makes it possible to solve any function, whether linearly separable or not. However, non-linearly separable problems make it much easier for the ANN to get caught in a local minimum when being trained. Some work on this can be found in Gori and Tesi's 1992 paper, "On the problem of local minima in backpropagation."
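For example (my own illustration, with hand-picked weights rather than trained ones), a single hidden layer of two step units is already enough to compute XOR, which no single-layer perceptron can represent:

```python
import numpy as np

def step(a):
    return (a >= 0).astype(int)

def xor_net(x):
    # Hidden layer: two step units computing OR(x1, x2) and NAND(x1, x2).
    # The weights are chosen by hand just to show the function is representable;
    # a real network would learn them.
    W_hidden = np.array([[ 1.0,  1.0],     # OR unit
                         [-1.0, -1.0]])    # NAND unit
    b_hidden = np.array([-0.5, 1.5])
    h = step(W_hidden @ x + b_hidden)

    # Output layer: AND of the two hidden units gives XOR.
    w_out = np.array([1.0, 1.0])
    b_out = -1.5
    return step(np.array([w_out @ h + b_out]))[0]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x, dtype=float)))   # prints 0, 1, 1, 0
```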

Quote:
Thanks kirkd

but what about the cases where there are more layers? Do I have to apply this bias to the intermediate layers also, or will applying the bias to just the output neuron work?

thanks

The algorithm for network learning in a multi-layered network is called back-propagation, or "backprop". Google for it, as there are dozens of tutorials. A good one can be found at www.cs.ucc.ie/~dgb/courses/ai/notes/notes9.pdf.
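To give a rough idea of what backprop does, here is a minimal sketch for a single-hidden-layer network with sigmoid units, trained on XOR (my own toy example, not code from that tutorial; the layer size and learning rate are arbitrary choices):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# XOR data, one pattern per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
n_hidden = 4
W1 = rng.normal(0.0, 0.5, size=(3, n_hidden))      # input (+bias) -> hidden
W2 = rng.normal(0.0, 0.5, size=(n_hidden + 1, 1))  # hidden (+bias) -> output
lr = 0.5

for epoch in range(20000):
    # Forward pass
    Xb = np.hstack([X, np.ones((4, 1))])           # append the bias input
    H = sigmoid(Xb @ W1)
    Hb = np.hstack([H, np.ones((4, 1))])           # append the hidden-layer bias
    Y = sigmoid(Hb @ W2)

    # Backward pass: propagate the output error back through the layers
    dY = (Y - T) * Y * (1 - Y)                     # delta at the output
    dH = (dY @ W2[:-1].T) * H * (1 - H)            # delta at the hidden layer

    W2 -= lr * Hb.T @ dY
    W1 -= lr * Xb.T @ dH

print(np.round(Y, 2))   # should end up close to [[0], [1], [1], [0]]
```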

Edit:
By the way, the third diagram at the bottom of the webpage you showed is incorrect, or at least misleading. You don't need a four-layer ANN to solve the last diagram; it can be done with a three-layer ANN, so long as there are enough units in the hidden layer. The general theory that a 3-layer ANN can solve all functions is as yet unproven, but it is generally assumed, as no one has yet come up with a counterexample. Of course, whether or not it is possible to *train* a 3-layer ANN to solve that problem, I have no idea.
Quote:
but what about the cases where there are more layers? Do I have to apply this bias to the intermediate layers also, or will applying the bias to just the output neuron work?


One bias per layer is the rule: one in parallel with the inputs, connected to each hidden-layer neuron, and one in parallel with each hidden layer, connected to the next layer.

Including it at each layer gives each layer the ability to translate away from the origin.
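In code this just amounts to appending a +1 to the activations entering every layer (a forward-pass-only sketch, assuming sigmoid units; the shapes are arbitrary):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weights):
    """Forward pass through any number of layers.

    Each weight matrix carries one extra row for its layer's bias,
    so a +1 is appended to the activations entering every layer.
    """
    a = np.asarray(x, dtype=float)
    for W in weights:
        a = sigmoid(np.append(a, 1.0) @ W)   # the +1 plays the role of the bias unit
    return a
```

For a network with two inputs, four hidden units, and one output, you would pass weight matrices of shapes (3, 4) and (5, 1); the extra row in each is that layer's bias weights.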

Quote:
By the way, the third diagram at the bottom of the webpage you showed is incorrect, or at least misleading. You don't need a four-layer ANN to solve the last diagram; it can be done with a three-layer ANN, so long as there are enough units in the hidden layer. The general theory that a 3-layer ANN can solve all functions is as yet unproven, but it is generally assumed, as no one has yet come up with a counterexample. Of course, whether or not it is possible to *train* a 3-layer ANN to solve that problem, I have no idea.


Asbestos - are you sure about that? I've seen it depicted both ways in different texts but haven't been able to find a definitive answer. I would expect that one hidden layer gives you the ability to generate any arbitrary polygon or polynomial, while a second hidden layer allows inclusion/exclusion of isolated polygons/polynomials. Is it possible to get disjoint polygons with only a single hidden layer, regardless of the number of nodes in that layer? I'll check Bishop when I get home...


-Kirk
Quote: Original post by kirkd
Asbestos - are you sure about that? I've seen it depicted both ways in different texts but haven't been able to find a definitive answer. I would expect that one hidden layer gives you the ability to generate any arbitrary polygon or polynomial, while a second hidden layer allows inclusion/exclusion of isolated polygons/polynomials. Is it possible to get disjoint polygons with only a single hidden layer, regardless of the number of nodes in that layer? I'll check Bishop when I get home...


-Kirk


Whoops, some research shows you're absolutely right, but which functions are which seems to be a thorny problem. G.J. Gibson and C.F.N. Cowan's "On the decision regions of multilayer perceptrons" appears to be the main work on the subject, but since I can't access it I looked at Kenyon and Paugam-Moisy's set of slides, "Multilayer neural networks and polyhedral dichotomies." They show the three classic problems that can't be solved by a 3-layer ANN. They do say, though, that
Quote:
It is straightforward that all polyhedral dichotomies which have at least one linearly separable function in their Boolean family can be realized by a one-hidden-layer network. However the converse is far from true. A counter-example was produced in [6]: adding extra hyperplanes (i.e. extra units on the 1st hidden layer) can eliminate the need for a second hidden layer

i.e., some dichotomies that appear not to be solvable by a 3-layer ANN can be solved by adding more units.

Anyway, thanks for making me double-check something I thought I had understood...
Asbestos,

Thanks for the response. Much more than making you double check, I'm glad to have an answer to the question that is more satisfying than I've been able to find before.

You're definitely right that the number of hidden layers and the number of units in each hidden layer is a VERY thorny problem. And that's for the simplest topology! Add in more complex topologies (recurrent, lateral, layer-skipping connections) and we've got some serious issues to work out. A complete briar patch.

Ah, if only I were a grad student again...

-Kirk
Thanks a lot for your replies and for providing references to relevant papers and books. I am unable to find the paper "On the decision regions of multilayer perceptrons" on the internet. I would appreciate it if you could send me a link or point me to a place where I can download this paper. Your discussion was also very informative and helped me a lot.

Thanks again.

