
Autotesting AI programs

Started by September 03, 2006 07:42 PM
7 comments, last by Timkin 18 years, 2 months ago
Probably most people involved in programming have heard about auto-testing their programs. Auto-testing means writing, for each routine in your program, another routine that tests the first routine's correctness. For example, suppose you have implemented a function pow(x, y) that raises x to the power y. Then you would have a testing routine that could look like this:

    void pow_autotest() {
        if ((pow(1, 10) == 1) && (pow(2, 5) == 32))
            std::cout << "OK" << std::endl;
        else
            std::cout << "error!" << std::endl;
    }

Basically, the idea is that you test the function on some inputs for which you know the correct answer, and you hope that if it behaves correctly on those inputs, it probably also behaves correctly on all other inputs.

My question is: how do you do this for a very complex routine? It was easy to pick inputs to pow for which you know the answer. But suppose you implement back-propagation for a neural network, so you have a routine that receives a network and a set of training examples and returns the trained network. How do you choose a test case for which you know the correct output weights? I could probably derive the correct answer analytically in trivial cases (such as one neuron with one synapse), but that is a boundary case, much like 1^10 for pow. A better test would be a two-layer network with two neurons per layer, but I would have no idea what the correct final weights are, and calculating them by hand could also be difficult. Imagine that you train on 50 examples and therefore compute 50 weight updates, or you train on just two examples but it takes 50 iterations to converge. Or you want to test a routine for rendering a 3D object, and so on. Does anyone have experience with autotesting complex algorithms?
The answer to this is that you need to be able to write down a metric for 'correctness'. That's easy to do for functions like x^y, but harder for arbitrary mappings of the form y = f(x) if you don't know what f() is (and only have input-output pairs (x, y)).

If you have a training set, then one way to test the 'correctness' of the mapping over that set is to partition it into a training subset and a testing subset. An 80/20 split is usual. Train on the 80% and test on the other 20%, note the performance using your chosen index (say, RMSE over the test set), and then repartition the data and try again. If you get consistent performance and approximately the same set of weights each time, then you can be fairly confident that the mapping you have is a good one.
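As a rough sketch of that split-and-score loop (the Example struct and the train/predict callbacks below are hypothetical stand-ins for whatever your own network code provides):

    #include <algorithm>
    #include <cmath>
    #include <functional>
    #include <random>
    #include <vector>

    struct Example { std::vector<double> input; double target; };

    // Train on 80% of the data, then report RMSE on the held-out 20%.
    double rmseOnHoldout(std::vector<Example> data,
                         const std::function<void(const std::vector<Example>&)>& train,
                         const std::function<double(const std::vector<double>&)>& predict,
                         std::mt19937& rng)
    {
        std::shuffle(data.begin(), data.end(), rng);            // repartition on each call
        const std::size_t cut = data.size() * 8 / 10;           // 80/20 split
        const std::vector<Example> trainSet(data.begin(), data.begin() + cut);
        const std::vector<Example> testSet(data.begin() + cut, data.end());

        train(trainSet);

        double sumSq = 0.0;
        for (const Example& e : testSet) {
            const double err = predict(e.input) - e.target;
            sumSq += err * err;
        }
        return std::sqrt(sumSq / testSet.size());               // RMSE over the test subset
    }

Call it a few times with reshuffled data; if the scores and the learned weights stay roughly the same each time, the mapping is stable in the sense described above.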

So, how does this relate to problems like testing a 3D graphics algorithm? Well, again, you need a validation metric. You'd need to know what the scene was supposed to look like (perhaps by using a scene generated by another algorithm that you assume is correct). Then, use the same type of validation approach: test your routine and compare its results to the 'correct' algorithm's output. Do this over many cases and compute the average error. This gives you a measure of how different the two algorithms are over that set of cases.
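For image output, one way to phrase such a metric is the mean per-pixel difference against a reference render; the Image struct below is just an illustrative stand-in for your framebuffer:

    #include <cassert>
    #include <cmath>
    #include <vector>

    // Illustrative stand-in for a framebuffer: row-major greyscale pixels in [0, 1].
    struct Image { int width, height; std::vector<double> pixels; };

    // Mean absolute per-pixel difference between your renderer's output and a
    // reference image produced by an algorithm you already trust.
    double meanPixelError(const Image& ours, const Image& reference)
    {
        assert(ours.width == reference.width && ours.height == reference.height);
        double sum = 0.0;
        for (std::size_t i = 0; i < ours.pixels.size(); ++i)
            sum += std::fabs(ours.pixels[i] - reference.pixels[i]);
        return sum / ours.pixels.size();   // average this over many scenes to score the renderer
    }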

Cheers,

Timkin
Actually, this is commonly called "unit testing"; this is the first time I've heard it called autotesting...

Basically, when you're planning to unit test your software, you break it up into smaller parts and then test each small part separately, for simplicity. If your functions are like one huge black box and you plan on pushing inputs through it and then testing 1024 cases, there's something wrong.

As for your backpropagation example: you should create a test case with known neurons/layers/inputs/desired outputs, and then validate that, given that input, the network produces the desired output. If you're confident your implementation is "right", you can use its own data to build your test case. This will detect whether you later break your implementation by changing a function that's used internally and introducing a bug. I would recommend that you calculate the result by hand, though.
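A sketch of that kind of regression ("golden value") test might look like this; the expected weights are made-up placeholder numbers that you would capture from a run you trust, or, better, compute by hand:

    #include <cmath>
    #include <vector>

    // Compare trained weights against golden values captured from a trusted run
    // (or a hand calculation). Returns true when every weight matches to 'tol'.
    bool matchesGoldenWeights(const std::vector<double>& actual,
                              const std::vector<double>& expected,
                              double tol = 1e-9)
    {
        if (actual.size() != expected.size()) return false;
        for (std::size_t i = 0; i < actual.size(); ++i)
            if (std::fabs(actual[i] - expected[i]) > tol) return false;
        return true;
    }

    // Usage sketch (trainTinyNetwork and the weight values are hypothetical):
    //   std::vector<double> trained = trainTinyNetwork();
    //   std::cout << (matchesGoldenWeights(trained, {0.42, -0.13, 0.78}) ? "OK" : "error!")
    //             << std::endl;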

Testing of complex methods is often done by creating graphical output and watching its behavior. If it looks strange, you look inside the code. If the behavior turns out to be correct, you call yourself an idiot and get back to work.

Auto-testing often refers to creating two instances of your program, or replacing the player with an AI, and forcing them to play against each other. You can then observe the result when launching it and take note of all the misbehavior and crashes you see.
Quote: Original post by yunnat
... But suppose you implement back-propagation for a neural network, so you have a routine that receives a network and a set of training examples and returns the trained network. How do you choose a test case for which you know the correct output weights?


As you've probably gathered by now, this isn't how you test to see whether a network is correctly trained.

Let me give you a different example. I'm teaching eighth-grade math. I'm trying to teach Johnny the basics of algebra, so I give him a bunch of different questions. Now I want to see if he's learnt the rules I was teaching him. How do I do it?

What I don't do is to open up his head and see if his neurons look correct. This is because

a) It's a hard, messy operation
b) I don't know what I'm looking for

... and of course it would kill him.

If you already knew what the neuron weights should be, there would be no point in using backprop or any other algorithm to get the weights (point b). And, as you say, checking each neuron weight in a 10-20-1 network is ridiculous (point a).

Rather, I just give him a test with 100 new problems that he hasn't seen before. If he gets those questions right, or a statistically significant sample of them right, then I can say with a certain degree of statistical certainty that Johnny understands algebra. I can't ever say it with 100% certainty, but I can get pretty darn close.

So, if you're training backprop, you must have some way of generating input-output pairs. If you have 100 of them, simply reserve, say, 30 of them for testing, and train on the other 70. Of course, if you only have 100 pairs in total, you should ditch the ANN and just use a look-up table.

Thank you all for your inputs.

I'd like to clarify a few things.

The name 'autotests' may come from the XP book. 'Auto' is meant to imply that the tests are completely automatic. I have a script, which may well be called RunAllTests.pl; I run it with a single click of the mouse, and it gives me just one bit of information: either OK or Failed. It's essential for tests to require no human interaction, otherwise they will be too time-consuming to run.
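In C++ terms (rather than Perl), a minimal sketch of such an aggregate runner, assuming each test is a function that returns true on success, could be as small as this:

    #include <iostream>
    #include <vector>

    using TestFn = bool (*)();

    // Run every registered test and report a single OK/Failed verdict, in the
    // spirit of a one-click RunAllTests script.
    bool runAllTests(const std::vector<TestFn>& tests)
    {
        bool allOk = true;
        for (TestFn test : tests)
            allOk = test() && allOk;        // keep going so every failure is exercised
        std::cout << (allOk ? "OK" : "Failed") << std::endl;
        return allOk;
    }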

So, I agree that looking at the behavior of the program and noticing any strange things is useful, especially right after you've just finished implementing a major new piece of functionality. But this is unsuitable as a strategy for auto-testing, because it requires a human observer.

In addition, I would like to distinguish between testing an algorithm and testing the implementation of an algorithm.

To test, e.g., the back-propagation algorithm, you would train it on some examples, test it on other examples, and report that it performs 90% correct, compared to 60% correct for some alternative algorithm.

To test the _implementation_ of back-propagation, performance data is not useful. The implementation may give 10% performance and still be correct, because back-prop just happens to work poorly on your problem. It may also give 90% performance and still be buggy, but happen to perform well on your data set. So, what you need to do is to compare your implementation of back-prop to its specification (e.g. from a textbook). How would you do this?
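One way to make that comparison concrete is to check a single, hand-computable step: pick one neuron and one training example, work out the textbook weight update on paper, and assert that your code produces the same number. A sketch, where the delta rule below is written out directly from the standard equations for a lone sigmoid neuron with squared error, and trainOneStep is a hypothetical hook into your own code:

    #include <cassert>
    #include <cmath>

    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // Expected weight after one textbook update step for a single sigmoid neuron:
    //   out   = sigmoid(w * input)
    //   delta = (target - out) * out * (1 - out)
    //   w'    = w + eta * delta * input
    double expectedUpdatedWeight(double w, double input, double target, double eta)
    {
        const double out   = sigmoid(w * input);
        const double delta = (target - out) * out * (1.0 - out);
        return w + eta * delta * input;
    }

    // Usage sketch:
    //   double fromMyCode = myNetwork.trainOneStep(/* same w, input, target, eta */);
    //   assert(std::fabs(fromMyCode - expectedUpdatedWeight(0.5, 1.0, 1.0, 0.1)) < 1e-12);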
Quote: Original post by yunnat
So, what you need to do is to compare your implementation of back-prop to its specification (e.g. from a textbook). How would you do this?


Well, I guess no one answered the question in that way because it's just a basic programming question, not an AI question.

When you perform autotesting on a program you have written, all you are doing is testing each individual function, and seeing whether it behaves as expected. Well, ditto for a neural network.

The backprop algorithm is just a bunch of mathematical equations strung together. Each part of the equations will have its own method or function in your program. So all you have to do is to check each equation. Say you're using a sigmoid function as your activation function: when you input x, does it return sigmoid(x)? Use autotesting.
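For example, a per-function check in the same spirit as the pow_autotest above, with values chosen so the expected results are known:

    #include <cmath>
    #include <iostream>

    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    void sigmoid_autotest()
    {
        // sigmoid(0) is exactly 0.5, sigmoid(2) is about 0.880797, and
        // sigmoid(-x) + sigmoid(x) should always equal 1 (symmetry).
        const bool ok = std::fabs(sigmoid(0.0) - 0.5) < 1e-12
                     && std::fabs(sigmoid(2.0) - 0.880797) < 1e-6
                     && std::fabs(sigmoid(-2.0) + sigmoid(2.0) - 1.0) < 1e-12;
        std::cout << (ok ? "OK" : "error!") << std::endl;
    }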

I'm not quite sure what you're asking. As you say, autotesting (or unit testing) is for individual functions. If you know what each function is supposed to do, you should know how to create a unit test for it. But then you say that you can't test the entire, complicated neural network as a whole. Well, autotesting doesn't test entire programs, it just tests the parts.

If you still want to test the entire network, see whether a 2-2-1 network can solve the XOR problem. If it does, that's it, you're done. You can't check all the individual weights, though, because there is no single right answer: backprop may have turned one hidden neuron into the AND detector, or the other one; you won't know in advance.
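An end-to-end check along those lines might look roughly like this (the evaluate callback stands in for however your own trained network is run on an input pair):

    #include <cmath>
    #include <functional>
    #include <vector>

    struct Sample { double a, b, expected; };

    // After training a 2-2-1 network on XOR, every pattern should land on the
    // right side of 0.5; we don't inspect the weights themselves at all.
    bool passesXor(const std::function<double(double, double)>& evaluate)
    {
        const std::vector<Sample> xorTable = {
            {0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0}
        };
        for (const Sample& s : xorTable)
            if (std::fabs(evaluate(s.a, s.b) - s.expected) > 0.4)   // generous tolerance
                return false;
        return true;
    }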
The main problem with what you're calling auto-testing in game programming is this:

1) you write a ton of methods and code and such. Things get outdated and refactored all the time. This would mean you end up re-writing test code all the time.

2) who's writing the functions that test your test code?

3) I've never seen test-driven development used in game code; honestly, the game _is_ the test. It's pretty easy to see when things are going wrong, i.e. the test code would largely be redundant. It's more useful for database programming and such, where there's no immediate indication that things are amiss.

Test-driven development was _ok_ for web code, but when I was in that world we ended up abandoning it, because we spent so much time debugging our test code and keeping it up to date that it was a waste of time. I'm clearly biased, so whatever. I just hate the system.

-me
I'd have to agree with what Asbestos said...
Quote: Original post by Asbestos
When you perform autotesting on a program you have written, all you are doing is testing each individual function, and seeing whether it behaves as expected. Well, ditto for a neural network.

If what you want is implementation testing (and not algorithmic validation), then this is an issue of implementation quality and validation. How do you validate any program? You run it over its range of inputs and inspect the range of outputs. For a sufficient variety of input cases, you compare the outputs to the theoretical results and score the implementation; you may deem the implementation to have failed if it gets even one answer wrong. Presumably your theoretical cases come from a pre-solved problem (either an accepted 'gold standard' implementation, or a hand-calculated one).

If it's a neural network implementation we're talking about, then implementation validation is just a matter of validating every mathematical operation implemented, and the integrity of every data structure used under all of the transformations applied to that data structure.

Of course, most people implement neural networks by actually implementing a graph object, when what they should be doing is solving a set of linear algebraic equations (which are far easier to validate)!
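As a rough illustration of that point: a feed-forward layer written as plain linear algebra is a single, easily tested function rather than a web of node and edge objects (the sketch below assumes a sigmoid activation):

    #include <cmath>
    #include <vector>

    // One feed-forward layer as linear algebra: out = sigmoid(W * in + b).
    // Validating this reduces to validating an ordinary matrix-vector product.
    std::vector<double> layerForward(const std::vector<std::vector<double>>& W,
                                     const std::vector<double>& b,
                                     const std::vector<double>& in)
    {
        std::vector<double> out(W.size());
        for (std::size_t i = 0; i < W.size(); ++i) {
            double net = b[i];
            for (std::size_t j = 0; j < in.size(); ++j)
                net += W[i][j] * in[j];
            out[i] = 1.0 / (1.0 + std::exp(-net));   // sigmoid activation
        }
        return out;
    }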

Cheers,

Timkin

