Let me summarize my viewpoints (the first three are interconnected):
1) The frequentist school of thought trumps the Bayesian school of thought.
2) As a corollary, always seek a p-value. Disregard any Bayesian analysis with uninformative priors. (Of course, Bayes' theorem holds; it's just that the prior is often underdetermined.)
3) Never speak of the probability of something that is not somehow randomized. If there is randomization, you must know what is being randomized and how. For a valuable illustration of this, see the Bertrand paradox (perhaps my favorite example of the treachery of probability when the problem is not carefully defined).
4) All probability is ultimately quantum mechanical in nature. (I do not know if anyone else subscribes to this philosophy, but it makes sense to me. A thorough explanation of my belief would be too lengthy for this forum post.)
So you're fighting battles and looking for monster drop rates. Suppose you fight 100 battles against a particular monster and it drops the item you want 48 times. You would like to know P(p=0.4) (or something), where p is the probability that a monster drops an item, given that you got 48 out of 100 drops. In standard notation, you're really looking for P(p=0.4 | 48 out of 100 drops). Hence the allure of the Bayesian analysis. Seems simple enough, right?
No can do. That drop rate was not randomized-- it was chosen by a game designer-- and it therefore fails criterion number 3 listed above. Any discussion of the probability that p=0.4 is meaningless because it either is or it isn't according to the programming of the game.
We can, however, analyze the problem with p-values. Let the null hypothesis be H0: p = 0.4 and the alternative hypothesis Ha: p ≠ 0.4. Supposing the null hypothesis to be true, we would like to know the probability of obtaining a result at least as far from the expected mean of 40 as the observed 48, i.e. a deviation of 8 or more in either direction; therefore, it is a two-tailed test. The p-value could almost certainly be obtained by applying the central limit theorem and approximating the distribution as a normal, but I'm just not feeling up to it right now, so I'll instead use the binomial distribution directly.
We want P(X ≥ 48 or X ≤ 32 | p = 0.4), where X is the number of successful drops and p is the probability of a successful drop. Now it's simply a matter of finding b(48; 100, 0.4), b(49; 100, 0.4), b(50; 100, 0.4)... as well as b(32; 100, 0.4), b(31; 100, 0.4), b(30; 100, 0.4)... and so on, where b is the binomial probability mass function, and adding them all up. Doing so, I obtain a p-value of about 0.12, which is generally not low enough to reject the null hypothesis at a reasonable significance level.
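If you want to check the arithmetic, here is a quick sketch of that computation in Python (it assumes scipy is available; the numbers are just the 48-out-of-100 example from above), including the normal approximation I waved away:

```python
# Sketch of the p-value computation above (assumes scipy; numbers are the
# 48-out-of-100 example with null hypothesis p = 0.4).
from scipy.stats import binom, norm

n, p0, observed = 100, 0.4, 48
mean = n * p0                      # 40 expected drops
dev = observed - mean              # 8 drops above the mean

# Exact two-tailed p-value: P(X >= 48) + P(X <= 32) under the null hypothesis
p_exact = binom.sf(mean + dev - 1, n, p0) + binom.cdf(mean - dev, n, p0)

# Normal approximation via the central limit theorem (with continuity correction)
z = (dev - 0.5) / (n * p0 * (1 - p0)) ** 0.5
p_normal = 2 * norm.sf(z)

print(p_exact, p_normal)           # both come out around 0.12-0.13
```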
Take note of what happened here. I suggested that you fought 100 battles (a lot!) and even gave you a success rate close to 50% (which would be ideal) and yet you still couldn't reject the null hypothesis that the actual probability of a drop is 0.4. That sucks. What it means is that you have to fight a lot of battles-- maybe 1,000 or 10,000-- before you can start pinning down the probability with any reasonable certainty. I might have started this exercise with the null hypothesis that p=0.5 and although it would have given a larger p-value, it would by no means be conclusive over other theories.
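To put a rough number on "a lot of battles," the sketch below (same assumptions as the one above) repeats the two-tailed test for the same 48% observed rate at a few sample sizes. At 100 battles the p-value is about 0.12, but if that rate held up over 500 or 1,000 battles, the p-value would drop far below any reasonable significance level.

```python
# Rough illustration of why sample size matters (assumes scipy; carries over the
# same 48% observed rate and the null hypothesis p = 0.4 from the example above).
from scipy.stats import binom

p0 = 0.4
for n in (100, 500, 1000, 10000):
    observed = round(0.48 * n)             # same 48% hit rate at each sample size
    mean = n * p0
    dev = observed - mean
    p_value = binom.sf(mean + dev - 1, n, p0) + binom.cdf(mean - dev, n, p0)
    print(n, p_value)                      # ~0.12 at n=100, effectively 0 by n=1000
```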
What should you do? You have a few options:
1) Just assume that p = (number of successes)/(number of trials) and be done with it.
2) Make a Lua script that fights battles for you and run the game at a high speed overnight to dramatically increase your sample size.
3) Attempt to find and/or deconstruct the RNG so that you can pinpoint exactly what the drop rate is. This could be an hour's work or several months' work, depending on the game.
4) Attempt to characterize the drop rates. It's likely that the drop rates are not completely arbitrary (if they are, you are screwed!), but instead something "nice" like n/10, n/256, or the reciprocal of an integer. If you can characterize how the designers chose the drop rates based on your limited number of samples (beware of errors arising from small sample sizes), you may be able to pare down your null and alternative hypotheses to give you more definitive p-values (see the sketch at the end of this post).
5) Similar to option 2, find a few dozen close friends to run the game and report their drop rates, then take the aggregate statistics.
Option 1 is useless for low drop rates. Options 2 and 3 depend on the game and system being emulated. Option 4 is nice, but would likely rely on option 2 being used in conjunction with it. I consider option 5 a last resort, but I think TASVideos would be happy to help if it seemed worthwhile. If I were running the game, I'd first try option 3, then option 2 and option 4, finally begging for option 5 before giving up and settling on option 1.
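Finally, here is what option 4 might look like in practice. This is only a sketch under the assumption that the designer picked a "nice" rate of the form n/10, n/256, or 1/n; the candidate list is a guess, and with only 100 battles many candidates will survive the screening.

```python
# Hedged sketch of option 4: screen a guessed list of "nice" candidate drop rates
# against the observed counts and keep those that a two-tailed test does not reject.
# Assumes scipy; the 48-out-of-100 numbers are the example from above.
from scipy.stats import binom

n, observed = 100, 48
candidates = sorted({k / 256 for k in range(1, 256)} |
                    {k / 10 for k in range(1, 10)} |
                    {1 / k for k in range(2, 21)})

plausible = []
for p0 in candidates:
    mean = n * p0
    dev = abs(observed - mean)
    # Two-tailed p-value: outcomes at least as far from the mean as the observation
    p_value = min(1.0, binom.sf(mean + dev - 1, n, p0) + binom.cdf(mean - dev, n, p0))
    if p_value > 0.05:                     # not rejected at the 5% significance level
        plausible.append((round(p0, 4), round(p_value, 3)))

print(plausible)    # with only 100 battles, a depressingly long list survives
```

The point is not the exact list but the workflow: narrow the hypotheses to a handful of plausible "nice" rates first, then spend your battles (or your Lua script's battles) confirming or rejecting those specific candidates.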