
Estimating imprecise probabilities

Started by January 09, 2004 08:46 AM
10 comments, last by GameCat 21 years, 1 month ago
I'm looking for information on how to estimate a probability when you have limited information. More specifically, I have a number of observations, spread over time, of whether a certain event E occurs or not. Later observations are in general more relevant than older ones, and the observations are not independent: if E occurs at one point in time, it is highly likely that it keeps occurring for a while until it stops occurring. In addition, there can be large stretches of time with no observations at all.

I know basic probability theory and the machine learning algorithms for this kind of thing, but Laplace estimation seems a bit simple, and assuming the process is a Bernoulli process feels like going too far. Maybe I need to model the probability with a continuous distribution, e.g. a beta distribution? I've also toyed with the idea of just weighting the data based on how old it is (e.g. an exponential drop-off in accuracy with age), but I need a better idea of what established methods there are, to determine which heuristics are acceptable. If anyone knows any introductory texts on this, or pointers to people who have done research on it, that would be great.

In case it's relevant: I need the estimated probability of E occurring to calculate the expected value of perfect information about whether E will occur or not.
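To make the weighting idea concrete, this is roughly what I have in mind (just a sketch; the decay constant and the smoothing pseudo-counts are placeholder choices, nothing established):

# Sketch of the exponential-weighting idea: each observation of E
# (1 = occurred, 0 = didn't occur, None = no data that tick) counts
# with a weight that decays with age. decay=0.9 and the +1/+2
# Laplace-style smoothing are placeholder choices.

def weighted_estimate(observations, decay=0.9):
    # observations[0] is the oldest tick, observations[-1] the newest
    num = den = 0.0
    for age, obs in enumerate(reversed(observations)):
        if obs is None:          # missing tick: contributes nothing
            continue
        w = decay ** age         # newer data gets a weight closer to 1
        num += w * obs
        den += w
    # smoothing so an empty history gives P(E) = 0.5
    return (num + 1.0) / (den + 2.0)

history = [0, 0, None, None, 1, 1, 1, 1, 0]   # made-up history
print(weighted_estimate(history))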
Bayesian networks are the primary tool for probabilistic reasoning with limited information. If you go to research.microsoft.com, they have an excellent tool for constructing Bayesian networks and learning about them.

http://research.microsoft.com/adapt/MSBNx/

This is perfect for learning about Bayesian nets, but not much direct help for your own projects.

Hope this helps.

-Lucas
Thanks for the link. I'm aware of Bayesian networks/decision networks, but I don't really see how they help me, since I'm observing a single event and trying to estimate its probability. The actual decision making is fairly simple; it's estimating the probability correctly that I'm worried about.
Could you walk us through a little example? Is time discrete? Potentially, how many events might you be considering at one time?

-Predictor
http://will.dwinnell.com



Yes, time is discrete. In general I might have data like the following, where 1 means E occurred, 0 means it didn't occur, and . (dot) means I have no data for that time slice.

.....00....111100..

I don't want to get too specific with examples, since I really am interested in the more general problem. If I have the sequence 11111 for the latest five ticks, I can be pretty sure that P(E) is high, but what if I have 11110? And what if I have very little information at all: do I use a uniform distribution for P(E), or some a priori value determined in advance? I also toyed with the idea of using a beta distribution for the probability and just decreasing the "number of tests" parameter whenever I get a simulation tick with no data. That still doesn't account for the fact that later data is in general more relevant, though.
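To be concrete about the beta idea, something like the following is what I mean (the per-tick discount factor is a number I pulled out of thin air):

# Sketch of the beta idea: track pseudo-counts (a, b) for E occurring /
# not occurring; the estimate of P(E) is the posterior mean a / (a + b).
# On a tick with no data, shrink both counts back toward the prior,
# which widens the distribution again. discount=0.95 is arbitrary.

def beta_update(a, b, obs, discount=0.95, prior_a=1.0, prior_b=1.0):
    if obs is None:
        # no data: decay the accumulated evidence toward the prior
        a = prior_a + discount * (a - prior_a)
        b = prior_b + discount * (b - prior_b)
    elif obs == 1:
        a += 1.0
    else:
        b += 1.0
    return a, b

a, b = 1.0, 1.0                       # Beta(1,1): uniform prior
for obs in [0, 0, None, None, 1, 1, 1, 1, 0]:
    a, b = beta_update(a, b, obs)
print("P(E) estimate:", a / (a + b))

I suppose applying the discount on every tick, not just on the empty ones, would also make older data count for less.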

What about modeling the probability in question from the past N events (with a regression, a neural network, whatever)? N, of course, would be dictated by analysis and your data budget.

It would be necessary to understand the mechanism that causes the "missing" values in order to deal with them properly, but as a simple start, you might code non-occurrence of the event as -1, "missing" as 0, and occurrence as 1.
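A minimal sketch of the idea, using a raw frequency table over the last N coded values as a stand-in for a real regression or network (N = 3, the coding above, and the smoothing constants are all arbitrary choices):

# Sketch: code the series as -1 (didn't occur), 0 (missing), 1 (occurred),
# then tabulate how often each pattern of the last n coded values was
# followed by an occurrence. A regression or net would generalise where
# this table just falls back on the prior; n=3 is arbitrary.

def train(coded, n=3):
    counts = {}                          # pattern -> (hits, total)
    for i in range(n, len(coded)):
        if coded[i] == 0:                # can't score a missing outcome
            continue
        key = tuple(coded[i - n:i])
        hits, total = counts.get(key, (0, 0))
        counts[key] = (hits + (coded[i] == 1), total + 1)
    return counts

def predict(counts, recent):
    hits, total = counts.get(tuple(recent), (0, 0))
    return (hits + 1.0) / (total + 2.0)  # Laplace smoothing

# the sequence .....00....111100.. coded as described above
coded = [0]*5 + [-1, -1] + [0]*4 + [1, 1, 1, 1, -1, -1] + [0]*2
counts = train(coded)
print(predict(counts, coded[-3:]))       # P(E occurs next tick)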

-Predictor
http://will.dwinnell.com

Are the 0s and 1s random? Basically, how do they relate to each other? Or are you trying to extract a pattern, then use the pattern for prediction?

Perhaps my basic question is: why does 1111 mean a higher chance of a 1? Unless there is some statistical dependence between the 1/0/. values that you are modelling, assuming the next value is a 1 seems like the reverse gambler's fallacy.

More context would be helpful.

[edited by - BrianL on January 10, 2004 11:32:44 AM]
Let's sort out a few details of the problem...

Are you certain that the problem is non-Markovian? That is, does the state at time t, s_t, actually depend on the sequence of previous states s_t-1, s_t-2, ..., s_t-n, or is it merely that we believe it does because the sequence has a positive correlation (which could in fact be contributed entirely by p(s_t | s_t-1))? Only knowledge of the problem domain can answer this... unless you want to consider both cases?!

If you have no a priori information about the transition distribution of the sequence, then you'll have to learn it from data. Predictor's suggestion is one way, although it doesn't deal particularly well with missing information. Expectation Maximisation on a static Bayesian network that represents the rolled-out non-Markovian process might be better. If you want to take into account the diminishing importance of old observations, this can be built into the arc weights between the current state and the early time states. If it turns out that the problem is Markovian, you can learn your probabilities for a finite-length rolled-out Dynamic Bayesian Network. To take into account the diminishing importance of early states, utilise the work of Jitnah & Nicholson.
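If the process does turn out to be first-order Markovian, the parameter learning itself is cheap. A crude sketch (transitions that span a gap are simply dropped here, where EM would handle them properly; the initial counts are Laplace pseudo-counts, a placeholder prior):

# Crude sketch of learning first-order transition probabilities
# p(s_t | s_t-1) from a 0/1/missing sequence. Transitions spanning a
# gap are simply dropped; EM would handle them properly.

def learn_transitions(seq):              # entries are 1, 0, or None
    counts = {(0, 0): 1, (0, 1): 1,      # Laplace pseudo-counts
              (1, 0): 1, (1, 1): 1}
    for prev, cur in zip(seq, seq[1:]):
        if prev is None or cur is None:  # spans a gap: skip it
            continue
        counts[(prev, cur)] += 1
    def p(cur, prev):                    # p(s_t = cur | s_t-1 = prev)
        total = counts[(prev, 0)] + counts[(prev, 1)]
        return counts[(prev, cur)] / float(total)
    return p

seq = [None]*5 + [0, 0] + [None]*4 + [1, 1, 1, 1, 0, 0] + [None]*2
p = learn_transitions(seq)
print(p(1, 1), p(1, 0))                  # persistence vs. onset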

My final suggestion is to implement Dual Estimation (estimating both the state of a process and the parameters of the model governing it). It can be implemented with coupled Recursive Filters or recurrent ANNs. Check out the work of Eric Wan as a starting point.
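To give the flavour of the filtering side, here is the state-estimation half only, with the model parameters fixed by hand rather than learned, so it is not dual estimation proper. It's a two-state recursive Bayes filter over "E is currently occurring"; the persistence and observation-noise numbers are invented for illustration:

# Two-state recursive Bayes filter over the hidden state "E is
# currently occurring". This is only the state-estimation half of
# dual estimation: stay=0.9 (persistence) and noise=0.05 (observation
# error) are fixed, invented numbers that dual estimation would learn.

def filter_step(belief, obs, stay=0.9, noise=0.05):
    # predict: the state tends to persist from one tick to the next
    predicted = belief * stay + (1.0 - belief) * (1.0 - stay)
    if obs is None:                      # no observation: predict only
        return predicted
    # update: weight the prediction by how likely the observation is
    like1 = (1.0 - noise) if obs == 1 else noise
    like0 = noise if obs == 1 else (1.0 - noise)
    post = like1 * predicted
    return post / (post + like0 * (1.0 - predicted))

belief = 0.5                             # P(E) before any data
for obs in [None]*5 + [0, 0] + [None]*4 + [1, 1, 1, 1, 0, 0] + [None]*2:
    belief = filter_step(belief, obs)
print("P(E now):", belief)

Each step is constant time, so the cost doesn't grow with the length of the history.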

I look forward to reading more in this thread!

Cheers,

Timkin

[edited by - Timkin on January 11, 2004 7:29:28 PM]
Great! This is exactly the kind of information I was looking for. In general the previous state is the most important; in many cases it probably overshadows the earlier history of E. But I really don't know for sure. In fact, the whole point is that I don't know a whole lot.

I'll check out the pointers you provided, but there's one additional caveat: the method used has to be very computationally efficient, since the (expected) utility I'm trying to maximize is roughly inversely proportional to computation time. So taking a really long while to make a great decision is kind of pointless...

BrianL, the sequence certainly isn't random, and it is reasonable to assume little change between adjacent time slices. Other than that, though, we don't know a whole lot about the model that governs the process.

All right, I'll dive into my books and return when I have more questions.
quote:
Original post by GameCat
In general the previous state is the most important; in many cases it probably overshadows the earlier history of E. But I really don't know for sure. In fact, the whole point is that I don't know a whole lot. ...

All right, I'll dive into my books and return when I have more questions.



Do you have any data you could share?

-Predictor
http://will.dwinnell.com



This topic is closed to new replies.
