Accelerated backprop learning: tanh(x) activation use
I've seen some suggestions that using tanh(x) = 2*a/(1+exp(-b*x)) - a, with a = 1.716 and b = 0.667, would provide accelerated learning for a backprop-trained ANN:
Guyon, I. P. (1991). Application of neural networks to character recognition. International Journal of Pattern Recognition and Artificial Intelligence, 5, 353-382.
It is similar to the sigmoidal activation function, but the sigmoid's outputs are in the range 0...1, whereas this tanh-like function produces outputs in the range -1.716...1.716 for a neuron.
Will it affect performance, for better or worse, if I use 0.1 and 0.9 as desired values for the output neurons with a sigmoidal activation function?
And are the tanh-like ones better with -0.9 and 0.9 as desired outputs?
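For concreteness, here is a minimal sketch (Python/NumPy, mine rather than anything from the article) of the activation described above and the derivative backprop would need; the constants a = 1.716 and b = 0.667 are the ones quoted in the post.

```python
import numpy as np

# Sketch of the scaled, tanh-like activation described in the post.
# Parameter values a = 1.716, b = 0.667 are taken from the post, not verified
# against the article itself.
A = 1.716
B = 0.667

def f(x):
    """f(x) = 2a / (1 + exp(-b*x)) - a, an S-shaped curve from -a to +a."""
    return 2.0 * A / (1.0 + np.exp(-B * x)) - A

def f_prime(x):
    """Derivative needed for backprop: d/dx [2a/(1+exp(-b*x)) - a]."""
    e = np.exp(-B * x)
    return 2.0 * A * B * e / (1.0 + e) ** 2

if __name__ == "__main__":
    xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    print(f(xs))        # approaches -1.716 and +1.716 at the extremes, 0 at x = 0
    print(f_prime(xs))  # largest slope at x = 0, vanishing in the flat tails
```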
Quote: Original post by yyyy
tanh(x) = 2*a/(1+exp(-b*x)) - a, with a = 1.716 and b = 0.667, would provide accelerated learning for a backprop-trained ANN
Wait, are you saying that this is the tanh function, or that this version of the function provides good results? Tanh is not equal to what you wrote:
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).
Plotting your function, it looks like a scaled version of the standard sigmoid, shifted so that it crosses zero at the origin the way tanh does.
Math isn't my strong point, though, so maybe there's some clever way to get from one equation to the other, but I'm fairly certain that they are not, in fact, equal.
Quote:
It is similar to the sigmoidal activation function, but the sigmoid's outputs are in the range 0...1, whereas this tanh-like function produces outputs in the range -1.716...1.716 for a neuron.
Why are you going for values between ±1.716? The two standard functions, sigmoid and step, usually output 0 to 1, and tanh outputs -1 to 1. It seems a little unorthodox to use a function that outputs those values, but I guess it shouldn't affect the final result.
Quote:
Will it affect performance, for better or worse, if I use 0.1 and 0.9 as desired values for the output neurons with a sigmoidal activation function?
And are the tanh-like ones better with -0.9 and 0.9 as desired outputs?
I'm really not sure what you're asking.
Usually, one sets the desired outputs to be 0 and 1, and then terminates backprop when the error is less than 1%, or something. The sigmoid function can get arbitrarily close to 1, so there's no point in setting the desired value to be 0.9 if what you actually mean is "close to 1". Obviously the function can't actually hit 1.0, so you need to terminate early. How early is up to you.
If your desired outputs are -1 and 1, by all means use tanh with -1 and 1 as your desired outputs and, again, terminate when the error is acceptable. There isn't much point in using -0.9 and 0.9. However, if your tanh function is the one you wrote above, it's going to be much more difficult to get them to output around -1 and 1. This is because the function starts flattening around 1.7, as you say, and so the graph is still quite steep around -1 and 1. If you want outputs of -1 and 1, just use the real tanh function.
Sorry if I don't understand what it is you're asking. Obviously I haven't read the article, so if something makes more sense from the article, let me know. It sounds like the author was trying to solve a specific problem, however, and found that the formula you wrote was good for that one problem. This doesn't mean that it would be good for all problems.
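As an illustration of that "terminate when the error is acceptable" rule, here is a minimal sketch; train_epoch() and predict() are hypothetical helpers standing in for whatever backprop code you already have.

```python
import numpy as np

# Hypothetical sketch of the stopping rule described above.
# train_epoch() and predict() are placeholders, not real library functions.

TOLERANCE = 0.01   # stop when the mean error drops below 1%, as described above

def train_until_converged(net, inputs, targets, max_epochs=10000):
    error = float("inf")
    for epoch in range(max_epochs):
        train_epoch(net, inputs, targets)                    # one backprop pass (yours)
        error = np.mean(np.abs(predict(net, inputs) - targets))
        if error < TOLERANCE:                                # close enough to the 0/1 targets
            return epoch, error
    return max_epochs, error
```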
tanh(x) = -1 + 2/(1+exp(-2*x)), so it should be a=1 and b=2.
So using tanh(x) or the more common 1/(1+exp(-x)) doesn't change anything for hidden layers, since the difference between the two will be absorbed by the coefficient of the next layer. For the output layer, the only difference is the range, and the transformation is linear (well, affine). Using 0.1 and 0.9 with 1/(1+exp(-x)) is the same as using -0.8 and 0.8 with tanh(x).
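For what it's worth, here is a quick numerical check of both points (my own sketch; the helper names are not from the article):

```python
import numpy as np

# Numerical check of the two claims above:
#  1) 2a/(1+exp(-b*x)) - a reduces to tanh(x) when a = 1 and b = 2.
#  2) Targets 0.1/0.9 under 1/(1+exp(-x)) correspond to -0.8/0.8 under tanh,
#     since tanh(x) = 2*sigmoid(2x) - 1 is just an affine rescaling of the output.

def general_form(x, a, b):
    return 2.0 * a / (1.0 + np.exp(-b * x)) - a

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-5.0, 5.0, 11)

# 1) a=1, b=2 recovers tanh exactly (up to floating-point error).
print(np.allclose(general_form(xs, a=1.0, b=2.0), np.tanh(xs)))   # True

# 2) The affine relation between the two output ranges.
print(np.allclose(np.tanh(xs), 2.0 * sigmoid(2.0 * xs) - 1.0))    # True
for t in (0.1, 0.9):
    print(t, "->", 2.0 * t - 1.0)   # 0.1 -> -0.8, 0.9 -> 0.8
```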
I cannot get the article; it is not available for free. I came across this statement in a good book.
The function is like tanh(x), so it is actually f(x) = (2*a)/(1+exp(-b*x)) - a.
It is similar to tanh(x) in that tanh produces an S-shaped, step-like curve from -1 to 1,
while this f(x) is step-like from -1.716 to 1.716.
Has anyone read the article, or anything else about accelerated training? This scaled activation is one of the things suggested to make it faster; the other I have read about is using a high momentum, like 0.95.
Then what is the reason that with the sigmoid we use 0.1 and 0.9 as desired outputs, and with tanh -0.9 and 0.9? Maybe with such an f(x), using -0.9 and 0.9 would provide faster learning, but I presume that is the same as using roughly -0.5 and 0.5 with tanh.
Or maybe the larger output range of f(x) would provide faster learning, using -1.6 and 1.6 as desired outputs.
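Following the affine argument in the earlier reply, one way to translate the 0.1/0.9 sigmoid convention onto this wider-range f(x) would be to scale the targets by a. This is just my own illustration, not something from the article:

```python
# Since sigmoid targets t in [0, 1] map to 2t - 1 in [-1, 1] for tanh, an
# activation ranging over (-a, +a) would get targets a*(2t - 1) by the same
# reasoning. Illustration only; a = 1.716 is the value quoted in the thread.

A = 1.716

for t in (0.1, 0.9):
    print(t, "->", round(A * (2.0 * t - 1.0), 3))
# 0.1 -> -1.373, 0.9 -> 1.373  (the analogue of -0.8/0.8 for plain tanh)
```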