BCM Neural Net
I am trying to implement the Bienenstock, Cooper, and Munro (BCM) learning rule.
(http://www.scholarpedia.org/article/BCM_rule)
It is basically a modified version of the Hebbian rule that allows connections to weaken as well as strengthen. It does this via a sliding threshold: postsynaptic activity below the threshold causes the connection to weaken, while activity above it causes the connection to strengthen.
My main issue is the lack of examples for this learning rule. I am not sure if my implementation is correct, even though the math is much easier to read compared to the formulas for Runge-Kutta integration. Perhaps someone can give me some vindication :).
Say I have the following input table:
Input 1: Input = 1, Weight = .5, Output = .5
Input 2: Input = .2, Weight = .5, Output = .1
The output of each input is the postsynaptic activity, and is determined by the formula:
y(i) = w(i) * x(i)
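For concreteness, here is that per-input output computation as a minimal Python sketch, using the two rows from the table above (the variable names are mine):

```python
# The two rows from the table: each output is just weight * input.
inputs  = [1.0, 0.2]
weights = [0.5, 0.5]
outputs = [w * x for w, x in zip(weights, inputs)]
print(outputs)  # [0.5, 0.1] -- matches the table
```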
To determine the modification threshold, I average the new postsynaptic activity with the previous one. If we just started there is no previous one, so in that case I average it with itself. I then raise that average to a power; I've simply been using 2:
Theta_M = ((y(i) + y(i-1)) / 2)^2
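As a quick numeric check of that threshold, using the first example output (y = .5, which on the first step gets averaged with itself):

```python
# First-step threshold for Input 1: no previous output, so y is
# averaged with itself before squaring.
y_now, y_prev, power = 0.5, 0.5, 2
theta = ((y_now + y_prev) / 2) ** power
print(theta)  # 0.25
```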
The change in weight (Delta W) is determined for each input by the formula
dw(i) = y(i) * (y(i) - Theta_M) * I(i)
where y(i) is the postsynaptic activity of input i and Theta_M is the modification threshold. I do not use the weight decay. I add the delta weight to the weight for that connection:
New_Weight = Old_Weight + Delta_Weight
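Putting it all together, this is roughly what one step of my implementation looks like as a Python sketch, where I'm taking I(i) to be the input x(i) (the function and argument names are mine):

```python
# One update step for a single connection, following the procedure above.
def bcm_step(weight, x, y_prev=None, power=2):
    y = weight * x                        # postsynaptic activity
    if y_prev is None:                    # first step: no previous output yet
        y_prev = y
    theta = ((y + y_prev) / 2) ** power   # sliding modification threshold
    dw = y * (y - theta) * x              # delta weight (I(i) taken as x(i), no decay)
    return weight + dw, y                 # new weight, plus y for the next step

# First update for Input 1: y = .5, theta = .25, dw = .5 * (.5 - .25) * 1 = .125
w, y = bcm_step(0.5, 1.0)
print(w, y)  # 0.625 0.5
```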
Is this the correct implementation of the formulas? Or am I missing something here? I've been trying to do a proof of concept in Excel, but I cannot seem to get it to work properly. Since I cannot seem to find any decent examples, I figured I'd ask the educated minds here :).
The appearance of "y(i)" suggests that you are forgetting most or all of the summation over the input and weight index i; you have only one output y and many inputs x(i).
Maybe you are confusing this i with an index for successive outputs over time, as suggested by the appearance of "y(i-1)".
Your formula for theta is very wrong, for two reasons. The exponential decay is cheap but very different from averaging all past outputs y, as you should. More importantly, you take the square of your combination of y's, obtaining a threshold that is dimensionally wrong and out of range.
Your "I(i)" is completely undefined. Scaling the change is a sound refinement of the formula on Scholarpedia (omitting the weight decay isn't), but the factor shouldn't vary over time or be different for different inputs.
Finally, why don't you use nonlinear activation functions?
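In rough Python, something along these lines would be closer to the rule on Scholarpedia (the Intrator-Cooper form, where theta is the average of y^2 over past outputs); the learning rate, decay constant, and sigmoid are arbitrary choices of mine:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bcm_update(w, x, y_sq_history, lr=0.01, decay=0.001):
    y = sigmoid(np.dot(w, x))                   # ONE output: weighted sum over ALL inputs
    y_sq_history.append(y ** 2)
    theta = np.mean(y_sq_history)               # average of y^2 over all outputs so far
    dw = lr * y * (y - theta) * x - decay * w   # BCM change, with weight decay kept
    return w + dw, theta

# Example with the weights and inputs from your table:
w = np.array([0.5, 0.5])
x = np.array([1.0, 0.2])
history = []
for _ in range(10):
    w, theta = bcm_update(w, x, history)
print(w, theta)
```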