Need some advice/pointers (Neural Networks related)
Hi AI forum people,
I'm working on a little multi-user project that needs to do some kind of AI-learning.
Basically - I have a set of initial factors. From these factors, I want the system to generate some "suggestion" and present it to the user. The user would then provide some feedback on the suggestion, so the AI can learn what is a good suggestion.
The first problem I'm having is how to "do" the feedback. First of all - there is no "correct" suggestion - some are better than others and some are just silly - and different users will give the same suggestion conflicting feedback. Also - I expect users would provide lots of "no, generate another" input before they provide a "yes" input.
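One way to handle conflicting feedback, sketched below in Python under assumptions of my own rather than anything stated in the thread, is to count each "yes" as +1 and each "no, generate another" as -1, then score a suggestion by a smoothed average so that one or two votes don't dominate. The class name `FeedbackStore`, the `prior_weight` smoothing and the optional `weight` argument (which could give extra credit to a solution a user typed in themselves) are all hypothetical.

```python
from collections import defaultdict


class FeedbackStore:
    """Aggregates possibly conflicting user feedback per suggestion.

    Hypothetical scheme: 'yes' counts +weight, 'no, generate another'
    counts -weight. Scores are shrunk towards prior_mean so that a
    suggestion rated only once or twice is not over- or under-rated.
    """

    def __init__(self, prior_mean=0.0, prior_weight=2.0):
        self.totals = defaultdict(float)   # signed sum of feedback
        self.weights = defaultdict(float)  # total weight of feedback received
        self.prior_mean = prior_mean
        self.prior_weight = prior_weight

    def record(self, suggestion, accepted, weight=1.0):
        # weight > 1 could be used for solutions entered by the user directly
        self.totals[suggestion] += weight if accepted else -weight
        self.weights[suggestion] += weight

    def score(self, suggestion):
        total = self.totals.get(suggestion, 0.0)
        n = self.weights.get(suggestion, 0.0)
        return (total + self.prior_mean * self.prior_weight) / (n + self.prior_weight)


if __name__ == "__main__":
    store = FeedbackStore()
    for verdict in (True, True, True, False):
        store.record(("wizard", "robe"), verdict)
    print(store.score(("wizard", "robe")))
```

With three "yes" and one "no" for the same wizard-in-a-robe suggestion, the defaults give (3 - 1) / (4 + 2), about 0.33, while a suggestion nobody has rated scores 0.0. Treating "no, generate another" as a judgement on the suggestion (rather than "not this one right now") is itself an assumption.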
I'm also interested in the possibility of having users input their own solution and then having the system learn from that as well.
The second problem is how to make the system "understand" the difference between suggestions that are inadequate for the given initial factors and suggestions that are internally "bad" (either not quite right or completely nonsensical).
Allow me to give an example - this isn't quite what I'm doing - but the idea is similar: Say we're generating RPG characters - initial factors might include our stats (strength, intelligence, etc) and what beasties exist in the area.
So - suggesting a wizard with low intelligence is a poor match for the given factors. But if we have high intelligence a wizard is a good match for the factors. Suggesting a wizard in a robe is probably good for "little beasties", but a wizard in a cloak of protection is better for "big beasties".
However - a wizard in full plate mail - even though it might be appropriate for the factors (intelligent character, big beasties) - doesn't make sense. Nor does a fighter wearing plate mail and splint mail (or two sets of plate mail at once), or a naked cleric.
But the system doesn't have any knowledge of what is permissible except what the user gives it. If the user says that armoured wizards or naked clerics or characters in two sets of armour are ok - then that's what the system will learn.
I'm thinking that a Neural Network is an appropriate way to implement this, and reinforcement learning seems to be the right direction. But I'm really in need of a lot more advice (and perhaps links) on how to create such a system.
The other tricky bit is that the current target platform is PHP/MySQL. Any thoughts about this?
A lot of people here are more knowledgeable than me in the field of Neural Networks. I think that reinforcement learning is what you are looking for.
Maybe you would like to look into Support Vector Machines (SVMs), which seem to be the current state-of-the-art algorithm for supervised classification. They allow the program to learn the criteria that separate examples into categories.
PHP is a full-fledged programming language, so you should be able to implement any algorithm in it, but you probably won't find many existing libraries that you could reuse.
Are the factors over which solutions will be postulated continuous or discrete variables?
I ask this, because what you have here is a blind optimisation problem: that is, an optimisation where the objective function is unknown and only accessible through function evaluations (in this case, asking for feedback from an 'expert'). Furthermore, while you could conceivably write down the constraints on the optimisation (define the input and output domains), you'd also like to find this out through evaluation (which is really going to slow things down).
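To make the "blind optimisation" framing concrete, here is a minimal Python sketch of first-improvement hill climbing in which the objective is reachable only through evaluation calls. `toy_expert` is a made-up stand-in for asking a real user, and the slot and item names are hypothetical; the `evals` counter shows why each expert query is the scarce resource.

```python
import random


def hill_climb(slots, evaluate, start, max_evals=50, seed=0):
    """First-improvement hill climbing over discrete slots.

    The objective is never inspected directly: every score comes from
    evaluate(candidate), which stands in for a query to the 'expert'.
    Returns (best candidate, its score, number of evaluations used).
    """
    rng = random.Random(seed)
    current = dict(start)
    current_score = evaluate(current)
    evals = 1
    improved = True
    while improved and evals < max_evals:
        improved = False
        # Neighbours differ from the current candidate in exactly one slot.
        neighbours = [
            {**current, slot: item}
            for slot in sorted(slots)
            for item in slots[slot]
            if item != current[slot]
        ]
        rng.shuffle(neighbours)
        for candidate in neighbours:
            if evals >= max_evals:
                break
            score = evaluate(candidate)
            evals += 1
            if score > current_score:
                current, current_score = candidate, score
                improved = True
                break
    return current, current_score, evals


# Hypothetical slots and a toy "expert" used only for demonstration.
SLOTS = {"class": ["fighter", "wizard"], "armour": ["robe", "plate", "cloak"]}


def toy_expert(candidate):
    score = 0
    if candidate["class"] == "wizard":
        score += 2
    if candidate["armour"] == "cloak":
        score += 1
    if candidate["class"] == "wizard" and candidate["armour"] == "plate":
        score -= 5  # nonsensical combination
    return score


if __name__ == "__main__":
    print(hill_climb(SLOTS, toy_expert, {"class": "fighter", "armour": "plate"}))
```

Starting from a fighter in plate, the toy expert leads the search to a wizard in a cloak (score 3) in at most ten evaluations. A real expert would be much slower and noisier, which is the motivation for the component split described next.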
If I were tackling this problem myself, I'd be looking at a system with at least two components: the first attempts to define the input domain by working out which factors are correlated with others in 'sensible' hypotheses and more importantly, which factors are correlated in nonsensical answers (suggesting that it's this combination that is bad); the second component uses the current iteration of the first to generate candidate solutions, which it then tries to learn an objective function for. Breaking your problem down into components should save you time online.
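A rough Python sketch of that two-component structure, under my own assumptions: one function stands in for the learned model of which combinations are sensible (the input domain), another for the learned objective (suitability given the factors), and candidates are filtered by the first before being ranked by the second. `is_plausible` and `suitability` are hypothetical placeholders for whatever models actually get learned, and exhaustive enumeration with `itertools.product` is only practical for small spaces.

```python
from itertools import product


def propose(slots, is_plausible, suitability, factors, k=3):
    """Return up to k candidates: filtered by component 1, ranked by component 2."""
    # Enumerate every combination of one item per slot (small spaces only).
    names = list(slots)
    candidates = [dict(zip(names, combo)) for combo in product(*(slots[n] for n in names))]
    # Component 1: drop combinations the validity model considers nonsensical.
    plausible = [c for c in candidates if is_plausible(c)]
    # Component 2: rank the remainder by the learned objective for these factors.
    plausible.sort(key=lambda c: suitability(factors, c), reverse=True)
    return plausible[:k]


# Hypothetical placeholders for the two learned models.
def is_plausible(candidate):
    return not (candidate["class"] == "wizard" and candidate["armour"] == "plate")


def suitability(factors, candidate):
    base = factors["intelligence"] if candidate["class"] == "wizard" else factors["strength"]
    return base + (1 if candidate["armour"] == "plate" else 0)


if __name__ == "__main__":
    slots = {"class": ["fighter", "wizard"], "armour": ["robe", "plate"]}
    print(propose(slots, is_plausible, suitability, {"strength": 3, "intelligence": 8}, k=2))
```

The design point is that the expert is never shown the wizard in plate once the first component has learned to reject it, so feedback is spent on ranking sensible candidates.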
As for tools to implement this... that really depends on the algorithm one comes up with for solving each of the sub-problems... and the answer to my original question.
Cheers,
Timkin
[Edited by - Timkin on April 18, 2006 7:30:36 PM]
For the input factors - I have both discrete and continuous inputs. Using the RPG-character-generation example again - there might be a discrete "setting/location" input - which could be the stock-standard "middle-ages", but might also be "Wild West" or "Feudal Japan" or "Outer-space" (or many other places). But then there are stats like "strength" and "intelligence" that are continuous.
Really the output is kind of like putting together a modular system. Some items go together - some don't (to varying degrees) - certain combinations are required to match the given input (while individual items might not do it on their own).
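"Certain combinations are required even though individual items don't do it on their own" is the classic case of interaction effects. A hedged Python sketch of one option (my assumption, not something proposed in the thread): describe a suggestion by an indicator feature for each item plus one for each unordered pair of items, so that even a linear scorer can penalise "wizard + plate" without penalising either item alone. A hidden layer in a neural network could in principle learn such interactions instead; the feature names and weights below are invented.

```python
from itertools import combinations


def features(items):
    """Indicator features for each item plus each unordered pair of items."""
    unique = sorted(set(items))
    feats = {f"has:{item}" for item in unique}
    feats |= {f"pair:{a}+{b}" for a, b in combinations(unique, 2)}
    return feats


def linear_score(weights, items):
    """Sum the weights of the active features; unknown features count as 0."""
    return sum(weights.get(f, 0.0) for f in features(items))


# Invented weights: each item is fine on its own, the combination is not.
WEIGHTS = {"has:wizard": 2.0, "has:plate": 1.0, "has:robe": 0.5, "pair:plate+wizard": -5.0}

if __name__ == "__main__":
    print(linear_score(WEIGHTS, ["wizard", "plate"]))  # -2.0
    print(linear_score(WEIGHTS, ["wizard", "robe"]))   # 2.5
```

The number of pair features grows quadratically with the item vocabulary, so in practice you would only keep pairs that actually occur in rated suggestions.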
I'm afraid I've not found any literature, so far, on how to get a neural network to do this kind of "put something together" task.
I'm not quite sure what you mean by breaking the tasks up. I suppose I could get creative with the user input - somehow getting separate feedback on the suitability of a given output for the given input and on whether the output is internally sensible. Is this what you mean?
Quote: Original post by Yvanhoe
Maybe you would like to look into Support Vector Machines (SVMs), which seem to be the current state-of-the-art algorithm for supervised classification.
There is also a support vector machines algorithm that supports badly labeled examples in the training set.
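The usual mechanism for tolerating badly labelled examples is the soft-margin SVM, which pays a bounded hinge-loss penalty for points it cannot fit instead of bending the boundary around them. Below is a minimal Python sketch of a linear soft-margin SVM trained with Pegasos-style stochastic sub-gradient descent. The toy data (including one deliberately mislabelled point), `lam` and the epoch count are assumptions for illustration, and folding the bias into the weights as a constant feature is a simplification.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Soft-margin linear SVM via Pegasos-style stochastic sub-gradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    X: list of feature sequences; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (it is regularised too, which is a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink towards zero (gradient of the regulariser) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and, if the point is inside the margin, step towards fixing it.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [(2, 2), (3, 2), (2, 3), (3, 3), (2.5, 2.5),
         (-2, -2), (-3, -2), (-2, -3), (-3, -3), (-2.5, -2.5),
         (2.2, 2.2)]  # last point is deliberately mislabelled
    y = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, p) for p in [(3, 3), (-3, -3), (2.2, 2.2)]])
```

On this toy data the learned boundary should still put the mislabelled point at (2.2, 2.2) on the positive side, because accommodating it would cost more on the clean points. An SVM needs labelled examples, so user feedback would have to be turned into labels first.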
Quote: Original post by Andrew Russell
For the input factors - I have both discrete and continuous inputs. Using the RPG-character-generation example again - there might be discrete "setting/location" input - which could be the stock-standard "middle-ages", but might also be "Wild West" or "Feudal Japan" or "Outer-space" (or many other places). But then there's stats like "strength" and "intelligence", that are continuous.
The problem is not so much that they are discrete, but that they are non-ordered.
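A common remedy for non-ordered inputs (hedged: this assumes the inputs are fed to a network or SVM as a numeric vector) is one-hot encoding, with continuous stats rescaled to [0, 1]. The setting list and the 3 to 18 stat range below are invented for the example.

```python
SETTINGS = ["middle-ages", "wild-west", "feudal-japan", "outer-space"]  # hypothetical list


def encode(setting, strength, intelligence, stat_min=3.0, stat_max=18.0):
    """One-hot encode the non-ordered setting and rescale continuous stats to [0, 1]."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting}")
    one_hot = [1.0 if setting == s else 0.0 for s in SETTINGS]

    def scale(value):
        return (value - stat_min) / (stat_max - stat_min)

    return one_hot + [scale(strength), scale(intelligence)]


if __name__ == "__main__":
    print(encode("feudal-japan", 3, 18))  # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
```

Encoding the settings as 0, 1, 2, 3 would wrongly tell the model that "Wild West" lies between "middle-ages" and "Feudal Japan"; one-hot encoding implies no order.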
Have you considered expert systems and learning trees?
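For the expert-system suggestion, a minimal Python sketch: constraints are stored as rules whose conditions are sets of (slot, item) facts, and a rule fires when all its conditions appear in the candidate. The rule contents and the `add_rule` helper (which could turn a user's "this combination is nonsense" feedback into a new rule) are hypothetical; this covers only the expert-system half, not tree learning.

```python
# Each rule: (conditions, conclusion). A rule fires when all of its
# (slot, item) conditions are present in the candidate.
RULES = [
    (frozenset({("class", "wizard"), ("armour", "plate")}), "invalid: wizard in plate"),
    (frozenset({("class", "cleric"), ("armour", "none")}), "invalid: naked cleric"),
]


def check(candidate, rules):
    """Return the conclusions of every rule whose conditions the candidate meets."""
    facts = set(candidate.items())
    return [conclusion for conditions, conclusion in rules if conditions <= facts]


def add_rule(rules, candidate, slots, conclusion):
    """Turn user feedback ('these slots together are nonsense') into a new rule."""
    rules.append((frozenset((s, candidate[s]) for s in slots), conclusion))


if __name__ == "__main__":
    print(check({"class": "wizard", "armour": "plate"}, RULES))  # ['invalid: wizard in plate']
    print(check({"class": "wizard", "armour": "robe"}, RULES))   # []
```

Because rules only come from user feedback, an armoured wizard stays acceptable until some user flags it, which matches the "only what the user gives it" requirement.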
[Edited by - Steadtler on April 19, 2006 9:08:04 AM]
Quote: Original post by Andrew Russell
I'm not quite sure what you mean by breaking the tasks up.
Think of the problem of finding an optimal solution. First, you need to find solutions that are valid given the problem constraints and second, you need to find from within the valid solutions, the one that maximises your objective function. Most optimisations (even when conducted in a supervised learning scenario) assume that all hypotheses are valid when proposed. Yours are not.
Hence, you need to do something to try and minimise the number of invalid solutions proposed, lest you waste a lot of time generating invalid solutions and not actually searching the space of valid solutions.
Think of it like a poorly encoded chromosome. Many individuals will express the genotype incorrectly and presumably never live, while the few that do will represent a sparse sampling of the valid population. Such sparse samples will make optimisation of the chromosome difficult.
Through trial and error, you could deduce which attributes in your input space are highly correlated in invalid candidate solutions. When generating candidates for presenting to the 'expert', show preference for those that have a low correlation in invalid solutions. This will cover both situations of valid combinations and combinations that have not been considered much. Allow the expert to grade the solution on, for example, a scale from 0 to 5, with 0 being invalid and 1 to 5 being its suitability.
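The correlation-tracking idea can be sketched in Python by counting how often each pair of attributes appears in candidates graded 0 (invalid) versus how often it appears at all, on the 0 to 5 scale suggested above. The smoothing prior of 0.5 is my assumption: it gives never-seen pairs a middling risk, so they are neither ruled out nor preferred over pairs known to be fine. `PairStats` and the item names are hypothetical, and scoring by the worst pair would miss constraints that only involve three or more attributes.

```python
from collections import Counter
from itertools import combinations


class PairStats:
    """Tracks how often attribute pairs co-occur in candidates graded invalid (0)."""

    def __init__(self, prior=0.5, prior_weight=1.0):
        self.seen = Counter()
        self.invalid = Counter()
        self.prior = prior
        self.prior_weight = prior_weight

    @staticmethod
    def pairs(items):
        return list(combinations(sorted(set(items)), 2))

    def update(self, items, grade):
        # grade 0 = invalid; 1..5 = valid, with the number rating suitability
        for pair in self.pairs(items):
            self.seen[pair] += 1
            if grade == 0:
                self.invalid[pair] += 1

    def invalid_rate(self, pair):
        # Smoothed fraction; a never-seen pair gets the prior.
        return (self.invalid[pair] + self.prior * self.prior_weight) / (
            self.seen[pair] + self.prior_weight)

    def risk(self, items):
        """Worst invalid rate over all pairs in the candidate."""
        return max((self.invalid_rate(p) for p in self.pairs(items)), default=0.0)


if __name__ == "__main__":
    stats = PairStats()
    stats.update(["wizard", "plate"], 0)
    stats.update(["wizard", "plate"], 0)
    stats.update(["wizard", "robe"], 4)
    stats.update(["fighter", "plate"], 5)
    candidates = [["wizard", "plate"], ["wizard", "robe"], ["fighter", "cloak"]]
    print(min(candidates, key=stats.risk))  # ['wizard', 'robe']
```

Here the wizard in plate has risk 2.5 / 3, the untried fighter in a cloak sits at the prior 0.5, and the wizard in a robe (rated valid once) is lowest at 0.25.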
Having given this problem some more thought last night, I also think you might benefit from an information theoretic approach to generating candidate solutions. When you receive feedback about a current solution, you want to maximise the Value Of Information in the next candidate you generate, which will presumably be based on an alteration of the current candidate. Which attributes to change can be determined by a VOI algorithm. You can view this part of your problem as like playing 20 questions (or animal/vegetable/mineral).
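As a simpler stand-in for a full Value Of Information calculation (which would also weigh the payoff of each decision), the sketch below uses expected information gain: keep a probability over hypotheses about what the user wants, and ask about, or vary, the attribute whose answer is expected to reduce uncertainty the most, as in 20 questions. The hypothesis representation, the noise-free answers and the function names are all assumptions for illustration.

```python
import math
from collections import defaultdict


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_info_gain(beliefs, slot):
    """Expected reduction in uncertainty (bits) from learning the value of `slot`.

    beliefs maps a hypothesis (a tuple of (slot, item) pairs describing what the
    user might want) to its probability; probabilities should sum to 1. With
    noise-free answers the gain equals the entropy of the answer distribution.
    """
    answer_mass = defaultdict(float)
    for hypothesis, p in beliefs.items():
        answer_mass[dict(hypothesis)[slot]] += p
    return entropy(answer_mass.values())


def best_question(beliefs, slots):
    return max(slots, key=lambda s: expected_info_gain(beliefs, s))


def update(beliefs, slot, answer):
    """Keep hypotheses consistent with the answer and renormalise."""
    kept = {h: p for h, p in beliefs.items() if dict(h)[slot] == answer}
    total = sum(kept.values())
    return {h: p / total for h, p in kept.items()}


if __name__ == "__main__":
    beliefs = {
        (("class", "wizard"), ("armour", "robe")): 0.25,
        (("class", "wizard"), ("armour", "cloak")): 0.25,
        (("class", "fighter"), ("armour", "plate")): 0.25,
        (("class", "fighter"), ("armour", "chain")): 0.25,
    }
    print(best_question(beliefs, ["class", "armour"]))  # armour
    print(update(beliefs, "class", "wizard"))
```

With four equally likely hypotheses, learning the armour is worth 2 bits (it pins down the answer) while learning the class is worth only 1 bit, so the armour is the more informative thing to vary next.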
I'll give it some more thought today and see if I can come up with anything else that may be useful to you.
Cheers,
Timkin