Hello,
I am trying to simulate a robot that I want to build (in Unity, hah). To make it walk, I'm using Q-learning. It actually works to some extent, but the walking performance could be a lot better. The main problem is an extremely crude approximation of the true utility function: a linear weighted sum of all the sensor input variables. The weights (parameters) are updated through Q-learning, which I prefer over e.g. genetic algorithms because I intend to run the same algorithm on the robot's microcontroller, where something that adapts quickly in real time is necessary.
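For concreteness, the update I'm doing is essentially the textbook gradient-descent rule for a linear Q approximator. Here's a minimal sketch in C (names, sizes, and constants are illustrative, not my actual code):

```c
#include <stddef.h>

#define N_FEATURES 15   /* one weight per sensor input, as in my setup */

/* Q(s,a) approximated as a weighted sum of the feature vector x(s,a). */
static double q_value(const double w[N_FEATURES], const double x[N_FEATURES])
{
    double q = 0.0;
    for (size_t i = 0; i < N_FEATURES; ++i)
        q += w[i] * x[i];
    return q;
}

/* One Q-learning step:
   w <- w + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) * x(s,a) */
static void q_update(double w[N_FEATURES],
                     const double x[N_FEATURES], /* features of (s,a) just taken */
                     double reward,
                     double max_q_next,          /* max over actions of Q(s',a') */
                     double alpha, double gamma)
{
    double td_error = reward + gamma * max_q_next - q_value(w, x);
    for (size_t i = 0; i < N_FEATURES; ++i)
        w[i] += alpha * td_error * x[i];
}
```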
Now, my AI book mentions two ways I might proceed with this:
1. I could identify and compute more features of the perceived state and supply them as additional input variables. This would let me continue with the crude linear approach, but it's problematic because it requires me to hand-pick good features. Furthermore, I already have more inputs than I would like: for the robot to manage to walk at all, I currently need something like 15 sensors (accelerometers, pulse encoders, etc.). I can probably trim this down later, but every extra input makes for a more expensive and less efficient hardware implementation. Much better to solve in software whatever can be solved in software. (The first sketch after this list shows the kind of derived features I mean.)
2. I could discretize the continuous variables into a number of discrete ranges, each with its own parameter, to obtain a much better utility approximation. The book mentions the cart-pole balancing problem, and indeed I looked up the BOXES experiment (Michie and Chambers). Understanding that paper will take some time, though, since its terminology differs from modern usage, and it's also fairly old. (The second sketch below shows the kind of discretization I mean.) Does anyone have a hint about newer implementations or papers that perform better than BOXES or are easier to understand?
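To make option 1 concrete, here's the kind of derived-feature computation I understand the book to mean: quantities computed from the raw sensors in software, so the linear approximator can capture some non-linear structure without extra hardware. The specific features here are hypothetical guesses on my part:

```c
#include <math.h>

/* A few raw sensor readings (made-up names and units). */
typedef struct {
    double tilt;        /* body tilt from accelerometer, rad */
    double tilt_rate;   /* change in tilt per step, rad/s */
    double speed;       /* forward speed from pulse encoders, m/s */
} RawSensors;

/* Derived features fed to the linear approximator alongside the raw ones. */
static void compute_features(const RawSensors *s, double x[6])
{
    x[0] = s->tilt;
    x[1] = s->tilt_rate;
    x[2] = s->speed;
    x[3] = s->tilt * s->tilt;       /* quadratic term: large tilt matters more */
    x[4] = s->tilt * s->tilt_rate;  /* cross term: falling vs. recovering */
    x[5] = fabs(s->speed);          /* magnitude regardless of direction */
}
```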
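And for option 2, here's roughly the BOXES-style discretization I picture: bin each continuous variable, and let the joint bin index select its own table entry, so the approximation becomes piecewise constant instead of one global linear sum. Bin counts and ranges are made up:

```c
#include <stddef.h>

/* Map one continuous sensor reading into one of n_bins boxes. */
static size_t bin_index(double value, double lo, double hi, size_t n_bins)
{
    if (value <= lo) return 0;
    if (value >= hi) return n_bins - 1;
    return (size_t)((value - lo) / (hi - lo) * (double)n_bins);
}

/* Example: two sensors -> one joint box index into a table of Q-values,
   one entry per (box, action) pair, updated by the same TD rule as above. */
#define TILT_BINS  6
#define SPEED_BINS 4
#define N_ACTIONS  3

static double q_table[TILT_BINS * SPEED_BINS][N_ACTIONS];

static size_t box_of(double tilt, double speed)
{
    size_t ti = bin_index(tilt,  -0.5, 0.5, TILT_BINS);   /* rad, illustrative */
    size_t si = bin_index(speed, -1.0, 1.0, SPEED_BINS);  /* m/s, illustrative */
    return ti * SPEED_BINS + si;
}
```

The obvious catch is that the table grows exponentially with the number of binned sensors, which is part of why I'm hoping for pointers to something newer than BOXES.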