
Neural net trial and error learning

Started by September 01, 2005 04:37 PM
2 comments, last by Alrecenk 19 years, 2 months ago
I have a neural net system where all nodes connect to all other nodes. Is there any simple way I can rate a move without creating new nets? Right now the only learning method I'm using is evolution (survival of the fittest), and I would really like to have a function that says whether the last move was good or bad. I'm also unsure how to create a function that makes a given input more likely to produce a given output (how to tell the net what good moves are). My system does not use booleans; all of the nodes have double values and all of the weights are doubles below 1. I don't really need help with the coding; I just need an idea of how to do trial-and-error learning and how to make given inputs more likely to produce certain outputs. Edit: so far my idea has been to trace the path through the net using the highest connections or highest values, but since my net keeps data from previous moves, high values may have nothing to do with the current or last move. Another idea was to randomize or mutate the net constantly, with the rate of mutation relative to how well or badly the AI was doing (a bad AI mutates a lot, a good AI mutates a little). However, I still haven't the foggiest idea of how to do direct learning. Maybe some way of finding other AIs that had those traits and breeding them in? Here is the example program using my AI. [Edited by - Alrecenk on September 4, 2005 7:37:11 PM]
I have that exact same problem myself... I know how to code everything; I've got inputs and outputs in my (very simple) neural net. I know after which system I should give it a reward, but I don't know how to make it update the weights according to the reward. Any ideas, anyone?
Backpropagation. It's gradient descent for layered neural networks. The backpropagation algorithm uses simple formulas to calculate which direction each weight should be moved to decrease the difference between the output and the desired output. The desired output for each input the network is trained on must be known in advance.
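
As a rough illustration of that update, here is a minimal Java sketch of backpropagation for a network with one hidden layer and a single sigmoid output. The class name `BackpropSketch`, the layer sizes, the learning rate, and the logical-OR training table are all assumptions picked for the example; none of it comes from the original poster's net, which is fully connected rather than layered.

```java
import java.util.Random;

// Hypothetical example class: one hidden layer, one sigmoid output.
class BackpropSketch {
    final int nIn, nHid;
    final double[][] w1; // w1[i][h]: input i -> hidden h (row nIn holds the bias)
    final double[] w2;   // w2[h]: hidden h -> output (entry nHid holds the bias)

    BackpropSketch(int nIn, int nHid, long seed) {
        this.nIn = nIn;
        this.nHid = nHid;
        Random rng = new Random(seed);
        w1 = new double[nIn + 1][nHid];
        w2 = new double[nHid + 1];
        // Small random starting weights in [-0.5, 0.5).
        for (double[] row : w1)
            for (int h = 0; h < nHid; h++) row[h] = rng.nextDouble() - 0.5;
        for (int h = 0; h <= nHid; h++) w2[h] = rng.nextDouble() - 0.5;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    double[] hidden(double[] in) {
        double[] hid = new double[nHid];
        for (int h = 0; h < nHid; h++) {
            double sum = w1[nIn][h]; // bias
            for (int i = 0; i < nIn; i++) sum += in[i] * w1[i][h];
            hid[h] = sigmoid(sum);
        }
        return hid;
    }

    double output(double[] hid) {
        double sum = w2[nHid]; // bias
        for (int h = 0; h < nHid; h++) sum += hid[h] * w2[h];
        return sigmoid(sum);
    }

    double predict(double[] in) { return output(hidden(in)); }

    // One gradient-descent step on the error 0.5 * (out - target)^2.
    void train(double[] in, double target, double rate) {
        double[] hid = hidden(in);
        double out = output(hid);
        // Error signal at the output node (derivative through the sigmoid).
        double deltaOut = (out - target) * out * (1 - out);
        for (int h = 0; h < nHid; h++) {
            // Propagate the error back through w2[h] before changing it.
            double deltaHid = deltaOut * w2[h] * hid[h] * (1 - hid[h]);
            w2[h] -= rate * deltaOut * hid[h];
            for (int i = 0; i < nIn; i++) w1[i][h] -= rate * deltaHid * in[i];
            w1[nIn][h] -= rate * deltaHid;
        }
        w2[nHid] -= rate * deltaOut;
    }

    // Sum of squared errors over a training set.
    double totalError(double[][] ins, double[] targets) {
        double e = 0;
        for (int s = 0; s < ins.length; s++) {
            double d = predict(ins[s]) - targets[s];
            e += d * d;
        }
        return e;
    }

    public static void main(String[] args) {
        double[][] ins = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] targets = {0, 1, 1, 1}; // logical OR, chosen only because it is easy to check
        BackpropSketch net = new BackpropSketch(2, 3, 42L);
        System.out.println("error before: " + net.totalError(ins, targets));
        for (int epoch = 0; epoch < 5000; epoch++)
            for (int s = 0; s < ins.length; s++) net.train(ins[s], targets[s], 0.5);
        System.out.println("error after:  " + net.totalError(ins, targets));
    }
}
```

The point to notice is that every weight change is computed from a known target, which is why this only applies when the desired output for each training input is available.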

If the outputs for each input are not known and you're using genetic algorithms or simulated annealing based on a score given to the network at the end of a simulation, that's reinforcement learning. There are other algorithms for reinforcement learning besides completely random genetic algorithms, and there are more advanced algorithms that change the topology of the network in addition to changing the weights. If you want to encourage certain behaviors with genetic-algorithm-style reinforcement learning, you can give the network points when it does the action you want. But sometimes you don't want to tell the network how to do something; you just want to tell it what to do and let it determine how to do it.
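
To make the score-based approach concrete, here is a minimal Java sketch of the simplest variant I can think of: mutate a copy of the weights, score it, and keep the copy only if the score improved. This is a plain hill climb rather than a full population-based genetic algorithm, and `MutationSearchSketch`, `mutate`, `search`, and the `score` callback are hypothetical names. It assumes the weight matrix has been flattened into one array and that a higher score is better.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical example class: score-driven weight search by random mutation.
class MutationSearchSketch {
    // Returns a copy of the weights with Gaussian noise of size sigma added.
    static double[] mutate(double[] weights, double sigma, Random rng) {
        double[] child = weights.clone();
        for (int i = 0; i < child.length; i++)
            child[i] += sigma * rng.nextGaussian();
        return child;
    }

    // Repeatedly mutates the best weights so far; a mutant replaces them
    // only when its score is strictly higher.
    static double[] search(double[] start, ToDoubleFunction<double[]> score,
                           int steps, double sigma, Random rng) {
        double[] best = start.clone();
        double bestScore = score.applyAsDouble(best);
        for (int s = 0; s < steps; s++) {
            double[] child = mutate(best, sigma, rng);
            double childScore = score.applyAsDouble(child);
            if (childScore > bestScore) {
                best = child;
                bestScore = childScore;
            }
        }
        return best;
    }
}
```

In a game, `score` would run the simulation with the candidate weights and return the points earned, including any bonus points for behaviors you want to encourage.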

If all the training input/output pairs are known in advance, there are other methods of finding weights. I think there are ways of doing an approximate least squares regression to try to find the weights.
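
A minimal sketch of the least squares idea, assuming the simplest possible case: a single output node that is just a weighted sum of its inputs, with no squashing function. In that case the best weights solve the normal equations (X^T X) w = X^T y directly. `LeastSquaresSketch` and `fit` are hypothetical names, and this does not carry over unchanged to a net with nonlinear nodes.

```java
// Hypothetical example class: least-squares weights for one linear output node.
class LeastSquaresSketch {
    // x[s] is the input vector of sample s (append a constant 1 for a bias),
    // y[s] is the desired output. Assumes X^T X is invertible.
    static double[] fit(double[][] x, double[] y) {
        int n = x[0].length;
        double[][] a = new double[n][n + 1]; // augmented matrix [X^T X | X^T y]
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++)
                for (int s = 0; s < x.length; s++) a[r][c] += x[s][r] * x[s][c];
            for (int s = 0; s < x.length; s++) a[r][n] += x[s][r] * y[s];
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution.
        double[] w = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double sum = a[r][n];
            for (int c = r + 1; c < n; c++) sum -= a[r][c] * w[c];
            w[r] = sum / a[r][r];
        }
        return w;
    }
}
```

Unlike the iterative methods, this computes the weights in one pass over the training pairs, but only because the node is linear.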

I've come up with and coded 4 learning methods since my last post edit. I haven't tested any of them yet though so I don't know that they work, but I'm going to post them anyways.

// weight[a][b] is the weight from a to b
// node[] is the current energy at the nodes
// n[] is a temporary array used for passing energy

// wires all nodes to "a", adding to each weight
// relative to the source node's current energy
public void wireto(int a, double r) {
    for (int k = 0; k < n.length; k++)
        weight[k][a] += node[k] * r;
}

// wires nodes to "a", adding to each weight
// relative to the source node's current energy,
// but only if that node is above t
public void boolwireto(int a, double t, double r) {
    for (int k = 0; k < n.length; k++)
        if (node[k] > t)
            weight[k][a] += node[k] * r;
}

// multiplies the weights that were used by r
// and then subtracts l
// after it's decided a move needs to be rated
// you must clear the network and
// repass the initial input before rating
// as it loops through (a passes) it will rate the moves
public void rate(int a, double r, double l) {
    while (a-- > 0) {
        for (int k = 0; k < n.length; k++)
            n[k] = 0;

        for (int k = 0; k < n.length; k++)
            if (node[k] > 0)
                for (int j = 0; j < n.length; j++) {
                    weight[k][j] *= r;
                    weight[k][j] -= l;
                    n[j] += node[k] * weight[k][j];
                }

        for (int k = 0; k < n.length; k++)
            node[k] = n[k];
    }
}

// considers nodes on if they are above t
// adds r to connections between "on" nodes
// goes both ways
public void boolrate(double t, double r) {
    for (int k = 0; k < n.length; k++) {
        for (int j = 0; j < n.length; j++) {
            if (node[k] > t && node[j] > t) {
                weight[k][j] += r;
            }
        }
    }
}



I got the idea for the first two methods when reading about Hebbian learning. They look at the nodes that "fired" and then connect them to a node I tell them should have fired. The second two methods are rating methods that try to figure out what caused the last move and then change the weights accordingly. None of these have been optimized or tested, but it's food for thought.

I'm not sure that backpropagation would work since my neural net doesn't have layers.

