
Does anyone else get this?

Started by September 15, 2004 09:14 AM
6 comments, last by hatch22 20 years, 2 months ago
This isn't really an AI-related thing as such, but I'm programming little objects that can survive on their own without me giving them any direct commands or built-in strategies. Basically they see what they need and have a bunch of functions they don't understand, and they trial-and-error their way through them to try to survive. Sort of like self-learning. Later I'm also going to add better data analysis so they learn more realistically, and I might build some of that analysis in so they start off a bit intelligent.

Apart from that, this is pretty complicated for me. I can do parts of it, and I've got an idea of a whole lot of pieces that will work together to make it work, but thinking about it off the top of my head I can't even begin to see how to do it. Still, since I have a vague idea of the approach, I should be able to get there. Do you get that feeling when you're trying something hard and complex, where you can only do it incrementally, without really knowing how each component will work until you get there, and slowly but surely it all comes together? I find I can do some really neat stuff in programming, just not all in one hit. I guess you'd have to be one of those crazily smart people to program things in one go.

edit: Also, do you ever find that when you aren't even thinking about it, the solution just dawns on you and you have to write it down quickly before you forget?

(btw, I know it's the wrong forum, but I just want to know what people here think.)
Yes, I understand what you are saying. I'm writing a poker bot that will play poker by itself; not only play poker, but play it well enough to win. I'm a beginning programmer and have been working on it for over a month now, and so far I've had numerous revisions and have started completely over a couple of times. I start new programs and then fold them into the big one if they work. I probably won't be of much help to you, though, since what you want to do is way over my head.
The concept of "itterative design/programming" also applies to learning. If you know the general direction you are heading, learn/try/design/test/analyze in small, easily digestable chunks.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

Bottom-up programming.

Don't try to get it all at once; just do what you can, make it robust, and build on top of it.
If a plant cannot live according to its nature, it dies; so a man.
Two recommendations:

1. Try to solve the problem yourself first. This will give you some insight into good and bad strategies, and the different ways to approach the problem.

2. Code up the stupidest, most brute-force approach you can think of first. Maybe it can only do one thing out of 15 choices; then add in more and more options. You'll almost certainly want to throw away the code you come up with, but again you'll get insight into the problem, and you can identify which parts of your game universe matter more or less to the survival of your critters.
I think that's a really interesting idea. It sounds kind of like a genetic algorithm to me... it might be interesting to try to implement that...

You'd have to have *some* basis for it - even animals and humans have base strategies - instincts.

So you'd probably want to put in some base events, like hunger and the need to reproduce... basically every basic need - back to biology!

I'm sure you could evolve some really interesting behaviors out of this though, but remember, life evolves through interacting with its environment, so you'd not only have to create a lifeform, but an environment for it if you wanted to model real genetic evolution.

But now I'm rambling.


As for me and moments of good coding; yeah, it happens. Sometimes it lasts for a day, sometimes for 5 minutes. Just makes life more interesting [wink].

Mushu - trying to help those he doesn't know, with things he doesn't know.
Why won't he just go away? A question the universe may never have an answer to...
Quote: Original post by Mushu

You'd have to have *some* basis for it - even animals and humans have base strategies - instincts.

So you'd probably want to put in some base events, like hunger, and need to reproduce... basically every basic need - back to biology!


Well, right now I've got a bunch of attributes, e.g. power, x, y, etc.

And then I've got target power, target x, target y.

So it tries to reduce the difference between each attribute and its target attribute.
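Just to picture that, here is a tiny, purely hypothetical sketch in Python of the "close the gap between attribute and target" idea. The attribute names, actions, and numbers are all made up; the critter doesn't know what an action does, it just keeps the change when it reduces the total difference.

```python
import random

class Critter:
    def __init__(self):
        self.attrs   = {"power": 5.0, "x": 0.0, "y": 0.0}
        self.targets = {"power": 10.0, "x": 3.0, "y": -2.0}

    def error(self, attrs):
        # Total distance between the current attributes and the targets.
        return sum(abs(attrs[k] - self.targets[k]) for k in attrs)

    def step(self):
        # Trial and error: try a random action, keep it only if the gap shrank.
        actions = {
            "eat":        ("power", +1.0),
            "move_right": ("x",     +1.0),
            "move_left":  ("x",     -1.0),
            "move_up":    ("y",     +1.0),
            "move_down":  ("y",     -1.0),
        }
        attr, delta = actions[random.choice(list(actions))]
        trial = dict(self.attrs)
        trial[attr] += delta
        if self.error(trial) < self.error(self.attrs):
            self.attrs = trial

critter = Critter()
for _ in range(200):
    critter.step()
print(critter.attrs)   # should end up close to the targets
```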
It sounds as if what you are trying to develop is similar to Reinforcement Learning. Reinforcement Learning works (in simplified terms) by rewarding the AI Agent (one of your objects) with a numerical reward when something good happens to it (e.g. it gets something it needs). Negative rewards can be used when something bad happens to it. You can think of rewards as degrees of pleasure or pain. The basic system works like this:

Initialization:
Every Action an Agent can perform is given some initial Value. In your program, you can think of a Value as the probability of survival. This Value indicates how beneficial the Agent thinks the Action is in the long run given its current perception of the environment (State). In many Reinforcement Learning systems all Actions are given the same Value to start with, but this is not a requirement. Actions are also given immediate Rewards for doing something that increases the chance for success (in your program's case, object survival). Most Rewards for most Actions will be zero or very small because they will not show much immediate benefit. For example, an action like Eat would only be given a reward if Food was perceived and Health was needed (the State included See Food and Need Health). Now that everything is set up, the following cycle occurs:

1. The Agent perceives its Environment. The Agent's current knowledge of itself and its Environment define its State.

2. The Agent does an Action. The Action is chosen in one of two ways:

a) The Agent examines the Values of States that various Actions have been known to lead to and picks the Action that leads to the State with the best Value (not necessarily the best Reward).

or

b) The Agent picks a trial guess that is often completely random but does not have to be (it's up to you how a trial Action is picked).

3. The Environment is affected by the Agent's Action.

4. The Environment affects the Agent, either by something good happening, something bad happening, or nothing happening during the time it takes for the Action to occur.

5. If something good happened, the Agent receives a Reward proportional to how much the Good Thing encourages success (survival). If something bad occurs, the Agent gets a negative Reward to show how the Action discourages success. If nothing happened, a Reward of zero is used.

6. The Agent updates the Value of the State it entered as a result of the Action taken. The updated Value is based on the Reward received for the Action. If the Reward was beneficial, the probability of survival will increase by some amount.

Note: The Agent has not yet perceived this newly entered State (we will do that when we start the cycle over), but the result of an Action is always a State, even if the State is no different from the last (e.g. the Agent moves but doesn't notice any changes in its environment: no enemies, no danger, no food, or anything else of significance). Therefore the Agent can still assign a Value to an unknown State because it will be known shortly (when we get around to perceiving it).

7. The Agent updates the Value of the previous State by some fraction of the difference between the Value of the last State and the Value of the current State. This is called the Temporal-Difference Method or TD Method.

8. Go back to step 1 and repeat until satisfied with the learned behavior, decreasing how often exploration (a trial Action) occurs if it seems that the Agent is behaving inconsistently after a long learning period.
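To make that cycle concrete, here is a minimal sketch in Python of a tabular agent running steps 1-8 in a toy one-dimensional world. The world, the actions, the reward values, and the learning constants are all invented for illustration; the point is only the Value table and the temporal-difference update of step 7. (I've also added a small discount factor, which the steps above don't mention, so that States closer to the goal keep higher Values.)

```python
import random

# Toy world: the Agent lives on cells 0..4 and "food" sits at cell 4.
ACTIONS = [-1, +1]      # move left or move right
ALPHA   = 0.1           # fraction used in the TD update (step 7)
GAMMA   = 0.9           # discount factor (an extra assumption, see above)
EPSILON = 0.2           # how often a trial (random) Action is taken (step 2b)

values = {s: 0.0 for s in range(5)}     # Initialization: a Value for every State

def clamp(s):
    return max(0, min(4, s))

def reward(state):
    return 1.0 if state == 4 else 0.0   # something good happens only at the food

state = 0
for _ in range(10_000):
    # Step 2: usually pick the Action leading to the best-valued State (2a),
    # sometimes take a random trial Action instead (2b).
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: values[clamp(state + a)])

    # Steps 3-5: the Environment changes and hands back a Reward.
    next_state = clamp(state + action)
    r = reward(next_state)

    # Steps 6-7: nudge the old State's Value a fraction of the way toward
    # (Reward + discounted Value of the new State) - the TD update.
    values[state] += ALPHA * (r + GAMMA * values[next_state] - values[state])

    # Step 1 again: the new State becomes the current one; "respawn" after
    # eating, but keep the learned Value table.
    state = 0 if r > 0 else next_state

print(values)   # Values should rise as the cells get closer to the food
```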


The beauty of this system is that it is capable of self-learning and planning, which is something a Genetic Algorithm (GA) does not do. Reinforcement Learning (RL) learns by experience, while GAs only improve by "killing off" the "genetically dumb" agents and replacing them with better breeds in which the probability of making a good choice in a given State is higher. Both will adapt to a situation, but RL can actually plan ahead based on previous experience without having to die. Of course, "dying" does not have to make the RL program forget what it learned either; a respawned or newly born agent can pick up where the dead one left off if you want. Planning works because the Agent is willing to put up with a lack of reward, or even with pain, for a while as long as the odds of receiving a big reward later are high. This means that instead of always picking the choice that seems immediately best, as a GA would, it may learn to choose something that pays off better down the road.

When the program starts, the Agents should explore more possible actions so they don't get stuck using a strategy that is only somewhat effective early on. Then they should explore less frequently as time goes on and they hone their judgement toward making consistently good decisions and strategies. However, if they spend too much time exploring at the beginning they may not gain enough experience with the particular good strategies they have stumbled upon. Balancing the exploration of new methods against tried-and-true methods is the tricky part of RL.
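One common way to handle that balance (just an example of one approach, not something from the description above) is to start with a high trial-Action rate and shrink it toward a small floor over time:

```python
# Hypothetical schedule: lots of trial Actions early on, fewer later.
epsilon = 1.0
for episode in range(1000):
    # ... run one learning episode, taking random Actions with probability epsilon ...
    epsilon = max(0.05, epsilon * 0.995)   # decay, but never stop exploring entirely
```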

For a good example of this system at work, check out:
http://www.fe.dis.titech.ac.jp/~gen/robot/robodemo.html

This site demonstrates a robot arm learning to pull itself forward. When you run the Applet, give it some time to figure out how to move. Wait at least ten minutes before you really expect to see semi-intelligent behavior. After another ten to twenty minutes it should be doing quite well.

For more information on Reinforcement Learning, check out:
http://reinforcementlearning.ai-depot.com/Main.html

http://www.research.ibm.com/massive/tdl.html


There is also an online textbook here:
http://www.cs.ualberta.ca/~sutton/book/the-book.html
It gets quite technical and is a bit math-intensive, though.

[Edited by - hatch22 on September 20, 2004 7:13:54 PM]

