
Markov decision processes vs normal searches

Started by July 30, 2002 06:57 PM
4 comments, last by Kylotan 22 years, 4 months ago
After reading about them in other threads, I just scanned some simple articles on Markov decision processes, and as far as I can see, it seems to be a method of taking uncertainty into account when doing some sort of planning. But surely you can do this with a normal search method where you just generate all the successor states? You could just modify the cost function by the probability of that particular state change occurring, no? Especially since the Markov assumption (or whatever it is) means that the new state only takes the previous state into account, so I would have thought you could simply factor the probability into the operator that generates successor states (something like the sketch below).

What interests me is that, when you strip it down, it looks very similar to algorithms such as A*, BFS, etc., but there is no mention of this similarity. I am sure I am missing something important, so if anyone can shed some light on this without directing me to a 1000-page tome full of equations, I'd be very grateful.

[ MSVC Fixes | STL | SDL | Game AI | Sockets | C++ Faq Lite | Boost | Asking Questions | Organising code files ]
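To make that concrete, here is roughly the sort of thing I have in mind, as a rough Python sketch. It is only one reading of "modify the cost by the probability" (here unlikely transitions just look more expensive), and the states, costs, and probabilities are all made up:

```python
import heapq

def successors(state):
    """Return (next_state, action_cost, probability) triples.
    Placeholder transition model for a made-up 3-state problem."""
    model = {
        'A': [('B', 1.0, 0.8), ('C', 1.0, 0.2)],
        'B': [('C', 1.0, 1.0)],
        'C': [],
    }
    return model.get(state, [])

def expected_cost_search(start, goal):
    # Ordinary best-first / uniform-cost search, except each step's cost
    # is divided by its probability, so unlikely transitions look dearer.
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        for nxt, step_cost, prob in successors(state):
            new_cost = cost + step_cost / prob
            if new_cost < best.get(nxt, float('inf')):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None

print(expected_cost_search('A', 'C'))   # e.g. (2.25, ['A', 'B', 'C'])
```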
Heh yeah I came to the same conclusion: the main difference is that Markov Decision Process sounds much cooler

Either that or Markov chains make up the problem space, while solutions can vary.

Not sure really. I'm sure Timkin will explain.


Argh.... I hate emacs... it's got me using CTRL-w to cut text!!!

Okay, take two at this post...


A state space or plan space search solves an implicit MDP but returns only the optimal path or set of actions, whether optimality is measured in terms of minimum cost or maximum utility (or minimum expected cost / maximum expected utility for stochastic problems). In the case of plan space search, the decision problem is obvious. In state space search, the decision problem is implicit in the mapping from states to actions. In non-monotonic domains (where there is not a 1-1 mapping from states to actions), state space search is not feasible.

While an MDP is just a decision problem, the solution of an MDP is actually a universal plan: a policy. This policy provides the optimal action to take in every possible state of the domain, so that no matter where the agent ends up after an action (because action outcomes are stochastic), the agent knows which action it should attempt. Since every state is covered, the agent will eventually achieve the goal, even if it cannot guarantee taking a particular path through the state space.
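If it helps to see it in code, here is a minimal value iteration sketch for a tiny made-up MDP. The point is just that what you get back is a table giving the best action in every state, not a single path:

```python
# Transition model: T[state][action] -> list of (probability, next_state, reward).
# The states, probabilities and rewards are invented purely for illustration.
T = {
    's0': {'left':  [(1.0, 's0', 0.0)],
           'right': [(0.8, 's1', 0.0), (0.2, 's0', 0.0)]},
    's1': {'left':  [(1.0, 's0', 0.0)],
           'right': [(0.9, 'goal', 1.0), (0.1, 's1', 0.0)]},
    'goal': {},   # terminal state: no actions
}
GAMMA = 0.9

def value_iteration(T, gamma=GAMMA, eps=1e-6):
    V = {s: 0.0 for s in T}
    while True:
        delta = 0.0
        for s, actions in T.items():
            if not actions:          # skip terminal states
                continue
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract the policy: the best action in *every* non-terminal state.
    policy = {
        s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                          for p, s2, r in actions[a]))
        for s, actions in T.items() if actions
    }
    return V, policy

V, policy = value_iteration(T)
print(policy)   # e.g. {'s0': 'right', 's1': 'right'}
```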

Does this highlight the difference for you between solving a search problem and solving an MDP?

In continuous or unbounded domains, the MDP is usually solved over a finite horizon of states that the agent is likely to visit while trying to achieve the goal.
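Very roughly, something like this: only back up values over the states reachable within some horizon of where the agent is, for a fixed number of sweeps, rather than solving the whole (possibly unbounded) problem. The chain model and horizon below are invented purely for illustration:

```python
GAMMA = 0.9

def model(s):
    """action -> [(prob, next_state, reward)] for an unbounded chain of states."""
    return {
        'stay':    [(1.0, s, 0.0)],
        'forward': [(0.9, s + 1, 1.0 if s + 1 == 5 else 0.0),
                    (0.1, s, 0.0)],
    }

def finite_horizon_values(start, horizon):
    # Envelope: the states reachable within `horizon` forward steps.
    envelope = set(range(start, start + horizon + 1))
    V = {s: 0.0 for s in envelope}
    for _ in range(horizon):                 # a fixed number of backups,
        for s in envelope:                   # not "iterate to convergence"
            V[s] = max(
                sum(p * (r + GAMMA * V.get(s2, 0.0)) for p, s2, r in outcomes)
                for outcomes in model(s).values()
            )
    return V

print(finite_horizon_values(start=0, horizon=5))
```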

As for POMDPs, things get even uglier. Since the agent has incomplete information regarding its state and that of its environment, there is a lot more uncertainty in a) where the agent is at any time; and b) where the agent is going to be at any future time. Finding an optimal policy in such domains is quite difficult.
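To give a flavour of why: the agent can no longer act on a known state, so it has to maintain a belief (a probability distribution over states) and choose actions based on that. A toy belief-update sketch, with all the numbers invented:

```python
STATES = ['left_room', 'right_room']

# P(next_state | state, action) and P(see_door | state) -- made-up numbers.
TRANSITION = {
    ('left_room',  'move'): {'left_room': 0.2, 'right_room': 0.8},
    ('right_room', 'move'): {'left_room': 0.8, 'right_room': 0.2},
}
OBSERVATION = {
    'left_room': 0.9,
    'right_room': 0.3,
}

def update_belief(belief, action, saw_door):
    # Predict: push the belief through the (stochastic) action model.
    predicted = {s: 0.0 for s in STATES}
    for s, p in belief.items():
        for s2, t in TRANSITION[(s, action)].items():
            predicted[s2] += p * t
    # Correct: weight each state by how likely the observation is there.
    weighted = {
        s: p * (OBSERVATION[s] if saw_door else 1.0 - OBSERVATION[s])
        for s, p in predicted.items()
    }
    total = sum(weighted.values())
    return {s: p / total for s, p in weighted.items()}

belief = {'left_room': 0.5, 'right_room': 0.5}
belief = update_belief(belief, 'move', saw_door=True)
print(belief)   # the agent is only ever "probably" somewhere
```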

I hope this helps clear things up a little.

Cheers,

Timkin
Ah that's much better, thanks. The Dean paper is tough reading (for me) but thanks for the link.


I must admit that it doesn't really help much yet, but that is nothing to do with the quality of your explanation; it's more down to the fact that my AI understanding is not very advanced. I'll try to read around the subject if I get a chance, and your post has given me a few more concrete things to look into, so thanks.

[ MSVC Fixes | STL | SDL | Game AI | Sockets | C++ Faq Lite | Boost | Asking Questions | Organising code files ]
If you have any particular questions about any of the papers you read, feel free to ask (either here or send me an email). I'm always happy to help people expand their 'bag of tricks', particularly in AI.

Cheers,

Timkin

