TD Learning and ANNs
I have recently come across Gerald Tesauro's article on temporal-difference reinforcement learning combined with artificial neural networks, as used in the TD-Gammon program: Temporal Difference Learning and TD-Gammon. Since the article is a little dated by now (1995), has anyone heard of newer research or projects using TD with ANNs to create self-teaching neural nets? I think the idea has real potential for real-time applications as processor speeds and memory sizes continue to increase. Learning continually from experience (in small time steps), rather than having to stop and evaluate performance before learning can occur (as with GAs), would be especially useful in real-time environments that are not episodic (no clear-cut starting and stopping points). Has anyone looked into other uses of this TD/ANN technique for other games or applications? See also: Reinforcement Learning: An Introduction
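To make the idea concrete, here is a minimal sketch of semi-gradient TD(0) with a small one-hidden-layer network as the value-function approximator. This is not Tesauro's actual TD(lambda)/TD-Gammon code; the network size, state encoding, and reward in the toy loop are placeholders you would replace with a real game environment.

import numpy as np

# Minimal sketch: TD(0) with a one-hidden-layer network as V(s).
# Not TD-Gammon itself; sizes and the toy environment are made up.

rng = np.random.default_rng(0)

N_INPUT, N_HIDDEN = 8, 16
ALPHA, GAMMA = 0.01, 1.0          # learning rate, discount factor

W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_INPUT))
W2 = rng.normal(0.0, 0.1, (1, N_HIDDEN))

def value(x):
    """Forward pass: estimated value of state x, squashed to (0, 1)."""
    h = np.tanh(W1 @ x)
    v = 1.0 / (1.0 + np.exp(-(W2 @ h)))   # sigmoid output
    return v.item(), h

def td_update(x, r, x_next, terminal):
    """One semi-gradient TD(0) step: move V(x) toward r + GAMMA * V(x_next)."""
    global W1, W2
    v, h = value(x)
    v_next = 0.0 if terminal else value(x_next)[0]
    delta = r + GAMMA * v_next - v              # TD error

    # gradient of V(x) w.r.t. the weights (chain rule through sigmoid/tanh)
    dv = v * (1.0 - v)
    grad_W2 = dv * h[np.newaxis, :]
    dh = (W2.T * dv).ravel() * (1.0 - h ** 2)
    grad_W1 = np.outer(dh, x)

    # nudge the weights in the direction that reduces the TD error
    W2 += ALPHA * delta * grad_W2
    W1 += ALPHA * delta * grad_W1
    return delta

# toy usage: random transitions, reward only at the end, just to show the call pattern
x = rng.random(N_INPUT)
for t in range(100):
    x_next = rng.random(N_INPUT)
    terminal = (t == 99)
    r = 1.0 if terminal else 0.0
    td_update(x, r, x_next, terminal)
    x = x_next

The point is that each transition produces a small weight update immediately, so learning happens continuously as the agent plays rather than only at episode boundaries.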
I haven't read this article yet, so I don't know if this is helpful, but Artificial Intelligence ran an article in 1999 claiming that temporal coherence is "a significant algorithmic improvement ... produces more stable final values ... temporal coherence produces faster learning than earlier methods, and TD learning can produce values that are superior to standard values".
It's called Temporal Difference Learning for Heuristic Search and Game-playing, by D.F. Beal and M.C. Smith.
I couldn't find an online version of the article, but I'll keep my eyes peeled for that issue. Thanks for the tip. Anyone else know of any ongoing projects or research tapping into TD learning methods?
Quote:
Original post by hatch22
Has anyone looked into other uses of this TDL-ANN technique for other games or applications?
A machine learning class I took a few semesters ago had a project where reinforcement learning was used to train a Mancala player. My solution involved Sarsa with a neural network for the function approximator (the basic update is sketched below), but I failed miserably. The professor's solution used a similar method, but he did it right and had a working agent. It was pretty good... as I recall, good human players could beat it, but not consistently, and none of the artificial players ever beat it.
I've been trying to fix mine ever since, and it still doesn't work. So I think the professor cheated. [wink]
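In case it helps anyone poking at the same thing, here's roughly the update I was trying to implement. This is a simplified linear sketch of Sarsa, not my actual neural-network code, and the Mancala feature/action sizes here are made up:

import numpy as np

# Sketch of on-policy Sarsa with a function approximator.
# Linear Q(s, a) for brevity; sizes below are a hypothetical Mancala encoding.

rng = np.random.default_rng(1)

N_FEATURES, N_ACTIONS = 12, 6            # e.g. 6 pits per side, 6 legal moves
ALPHA, GAMMA, EPSILON = 0.05, 1.0, 0.1

W = np.zeros((N_ACTIONS, N_FEATURES))    # one weight vector per action

def q(s, a):
    return W[a] @ s

def choose_action(s):
    """Epsilon-greedy policy over the current Q estimates."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([q(s, a) for a in range(N_ACTIONS)]))

def sarsa_update(s, a, r, s_next, a_next, terminal):
    """Move Q(s, a) toward r + GAMMA * Q(s', a'), the on-policy TD target."""
    target = r if terminal else r + GAMMA * q(s_next, a_next)
    delta = target - q(s, a)
    W[a] += ALPHA * delta * s            # gradient of a linear Q is just the features

With a neural network in place of the linear W, the TD error delta is computed the same way; you just backpropagate delta through the network instead of adding delta * s to one weight vector.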
CM