Accelerated backprop learning: tanh(x) activation use
I've seen some suggestions that using tanh(x) = 2*a/(1+exp(-b*x)) - a, with a = 1.716 and b = 0.667, would provide accelerated learning for a backprop-trained ANN:
Guyon, I. P. (1991). Application of neural networks to character recognition. International Journal of Pattern Recognition and Artificial Intelligence, 5, 353-382.
It is similar to the sigmoidal activation function, but the sigmoid's outputs are in the range 0...1, whereas this tanh-like function produces outputs in the range -1.716...1.716 for a neuron.
Will it affect performance, for better or worse, if I use 0.1 and 0.9 as desired values for the output neurons with a sigmoidal activation function?
And are the tanh-like ones better with -0.9 and 0.9 as desired outputs?
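For concreteness, here is a minimal sketch (Python/NumPy, mine rather than anything from the article) of the activation described above and the derivative backprop would need; the constants a = 1.716 and b = 0.667 are the ones quoted in the post.

```python
import numpy as np

# Sketch of the scaled, tanh-like activation described in the post.
# Parameter values a = 1.716, b = 0.667 are taken from the post, not verified
# against the article itself.
A = 1.716
B = 0.667

def f(x):
    """f(x) = 2a / (1 + exp(-b*x)) - a, an S-shaped curve from -a to +a."""
    return 2.0 * A / (1.0 + np.exp(-B * x)) - A

def f_prime(x):
    """Derivative needed for backprop: d/dx [2a/(1+exp(-b*x)) - a]."""
    e = np.exp(-B * x)
    return 2.0 * A * B * e / (1.0 + e) ** 2

if __name__ == "__main__":
    xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    print(f(xs))        # approaches -1.716 and +1.716 at the extremes, 0 at x = 0
    print(f_prime(xs))  # largest slope at x = 0, vanishing in the flat tails
```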
Quote: Original post by yyyy
tanh(x) = 2*a/(1+exp(-b*x)) - a, with a = 1.716 and b = 0.667, would provide accelerated learning for a backprop-trained ANN
Wait, are you saying that this is the tanh function, or that this version of the function provides good results? Tanh is not equal to what you wrote:
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).
Plotting your function, it looks like a scaled version of the standard sigmoid, shifted so that it crosses zero at the origin the way tanh does.
Math isn't my strong point, though, so maybe there's some clever way to get from one equation to the other, but I'm fairly certain that they are not, in fact, equal.
Quote:
It is similar to the sigmoidal activation function, but the sigmoid's outputs are in the range 0...1, whereas this tanh-like function produces outputs in the range -1.716...1.716 for a neuron.
Why are you going for values between ±1.716? The two standard functions, sigmoid and step, usually output 0 to 1, and tanh outputs -1 to 1. It seems a little unorthodox to use a function that outputs those values, but I guess it shouldn't affect the final result.
Quote:
Will it affect performance, for better or worse, if I use 0.1 and 0.9 as desired values for the output neurons with a sigmoidal activation function?
And are the tanh-like ones better with -0.9 and 0.9 as desired outputs?
I'm really not sure what you're asking.
Usually, one sets the desired outputs to be 0 and 1, and then terminates backprop when the error is less than 1%, or something. The sigmoid function can get arbitrarily close to 1, so there's no point in setting the desired value to be 0.9 if what you actually mean is "close to 1". Obviously the function can't actually hit 1.0, so you need to terminate early. How early is up to you.
If your desired outputs are -1 and 1, by all means use tanh with -1 and 1 as your desired outputs and, again, terminate when the error is acceptable. There isn't much point in using -0.9 and 0.9. However, if your tanh function is the one you wrote above, it's going to be much more difficult to get them to output around -1 and 1. This is because the function starts flattening around 1.7, as you say, and so the graph is still quite steep around -1 and 1. If you want outputs of -1 and 1, just use the real tanh function.
Sorry if I don't understand what it is you're asking. Obviously I haven't read the article, so if something makes more sense from the article, let me know. It sounds like the author was trying to solve a specific problem, however, and found that the formula you wrote was good for that one problem. This doesn't mean that it would be good for all problems.
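As an illustration of that "terminate when the error is acceptable" rule, here is a minimal sketch; train_epoch() and predict() are hypothetical helpers standing in for whatever backprop code you already have.

```python
import numpy as np

# Hypothetical sketch of the stopping rule described above.
# train_epoch() and predict() are placeholders, not real library functions.

TOLERANCE = 0.01   # stop when the mean error drops below 1%, as described above

def train_until_converged(net, inputs, targets, max_epochs=10000):
    error = float("inf")
    for epoch in range(max_epochs):
        train_epoch(net, inputs, targets)                    # one backprop pass (yours)
        error = np.mean(np.abs(predict(net, inputs) - targets))
        if error < TOLERANCE:                                # close enough to the 0/1 targets
            return epoch, error
    return max_epochs, error
```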
tanh(x) = -1 + 2/(1+exp(-2*x)), so it should be a=1 and b=2.
So using tanh(x) or the more common 1/(1+exp(-x)) doesn't change anything for hidden layers, since the difference between the two will be absorbed by the coefficient of the next layer. For the output layer, the only difference is the range, and the transformation is linear (well, affine). Using 0.1 and 0.9 with 1/(1+exp(-x)) is the same as using -0.8 and 0.8 with tanh(x).
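For what it's worth, here is a quick numerical check of both points (my own sketch; the helper names are not from the article):

```python
import numpy as np

# Numerical check of the two claims above:
#  1) 2a/(1+exp(-b*x)) - a reduces to tanh(x) when a = 1 and b = 2.
#  2) Targets 0.1/0.9 under 1/(1+exp(-x)) correspond to -0.8/0.8 under tanh,
#     since tanh(x) = 2*sigmoid(2x) - 1 is just an affine rescaling of the output.

def general_form(x, a, b):
    return 2.0 * a / (1.0 + np.exp(-b * x)) - a

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-5.0, 5.0, 11)

# 1) a=1, b=2 recovers tanh exactly (up to floating-point error).
print(np.allclose(general_form(xs, a=1.0, b=2.0), np.tanh(xs)))   # True

# 2) The affine relation between the two output ranges.
print(np.allclose(np.tanh(xs), 2.0 * sigmoid(2.0 * xs) - 1.0))    # True
for t in (0.1, 0.9):
    print(t, "->", 2.0 * t - 1.0)   # 0.1 -> -0.8, 0.9 -> 0.8
```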
I cannot get the article; it is not available for free. I came across this statement in a good book.
The function is like tanh(x), so it is actually f(x) = (2*a)/(1+exp(-b*x)) - a.
It is similar to tanh(x) in that tanh produces an S-shaped, step-like curve from -1 to 1,
while this f(x) is step-like from -1.716 to 1.716.
Has anyone read the article, or anything else about accelerated training? This scaled activation is one of the things suggested to make it faster; the other I have read about is using a high momentum, like 0.95.
Then what is the reason that with the sigmoid we use 0.1 and 0.9 as desired outputs, and with tanh -0.9 and 0.9? Maybe with such an f(x), using -0.9 and 0.9 would provide faster learning, but I presume that is the same as using roughly -0.5 and 0.5 with tanh.
Or maybe the larger output range of f(x) would provide faster learning, using -1.6 and 1.6 as desired outputs.
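Following the affine argument in the earlier reply, one way to translate the 0.1/0.9 sigmoid convention onto this wider-range f(x) would be to scale the targets by a. This is just my own illustration, not something from the article:

```python
# Since sigmoid targets t in [0, 1] map to 2t - 1 in [-1, 1] for tanh, an
# activation ranging over (-a, +a) would get targets a*(2t - 1) by the same
# reasoning. Illustration only; a = 1.716 is the value quoted in the thread.

A = 1.716

for t in (0.1, 0.9):
    print(t, "->", round(A * (2.0 * t - 1.0), 3))
# 0.1 -> -1.373, 0.9 -> 1.373  (the analogue of -0.8/0.8 for plain tanh)
```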