LT Tracking Project Preliminary Results


Comments

  • mgallop
    mgallop Posts: 120
    edited March 2016
    A quick follow-up to Dave's excellent point. If you want to test that the distribution of 4s is off, you can't just check that each count is within the expected range and put some stars next to it if p < .05, since even if the stated rates are true we should expect about 1 out of every 20 4s to show a "significant deviation", and there are 26 4s. My hunch is that the right way to do this would be a chi-squared test, where we see if the overall null ".033 for the 5s, .035 for the 4s" is rejected by the data.
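    A minimal sketch of what that chi-squared test could look like in R (the counts below are simulated under the null purely to show the mechanics; they are not the thread's actual data):

    probs <- c(rep(1/30, 3), rep(.035, 26))           # null: 1/30 per current 5, .035 per 4
    probs <- probs / sum(probs)                       # chisq.test wants probabilities that sum to 1
    set.seed(99)
    observed <- as.vector(rmultinom(1, 450, probs))   # stand-in counts for 450 pulls
    chisq.test(x = observed, p = probs)               # goodness-of-fit test of the stated rates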

    Alternatively, I was bored and wrote a Monte Carlo simulation in R; it takes two lines of code:

    set.seed(1234)
    # 100,000 simulated openings of 450 tokens each: 3 five-star slots at 1/30 apiece,
    # 26 four-star slots at .035 apiece (rmultinom rescales the probabilities to sum to 1)
    tokens = rmultinom(100000, 450, c(rep(1/30, 3), rep(.035, 26)))

    The first three rows give the number of draws of each 5 (lazy simplification: assume only current LTs), and the next 26 rows give the draws of each 4. Each column is one draw of 450 legendary tokens.
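    Assuming I've set that up correctly, a quick sanity check on the layout and the implied per-character expectations:

    dim(tokens)                          # 29 rows (3 five-stars, then 26 four-stars) x 100000 draws
    probs <- c(rep(1/30, 3), rep(.035, 26))
    round(450 * probs / sum(probs), 1)   # expected pulls per character out of 450 (roughly 15 each)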

    I'm going to just focus on the 4 draws here.


    Let's look at the expected distribution of the minimum and maximum drawn 4:

    mins = apply(tokens[4:29,], 2, min)   # rarest 4 in each simulated draw of 450
    maxs = apply(tokens[4:29,], 2, max)   # most common 4 in each simulated draw of 450


    quantile(mins, c(.025, .975))
    > 2.5% 97.5%
    > 5 11
    quantile(maxs, c(.025, .975))
    > 2.5% 97.5%
    > 20 28

    So, the draws of 4s you recorded are actually more even than we would expect if we assumed the distributional facts are correct. You would expect to draw even fewer of your rarest 4 (1-3) and way more of your most common 4.


    Note: I'm really jet-lagged and this may have mistakes in it. When I'm more awake, I may see if I can check how well the data holds up under a chi-squared test.


    Edit: Yep, I was applying over the wrong dimension. The spoilered conclusions were wrong and bad! So, the real results are that 4 Star-Lords is slightly low for your rarest character (you should get between 5 and 11), and your most common 4s were in line with expectations...
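    (In case it helps anyone replicating this: the mix-up was presumably just the MARGIN argument to apply(); MARGIN = 2, per simulated draw, is what the corrected numbers above come from.)

    per_character_min <- apply(tokens[4:29, ], 1, min)   # MARGIN = 1: min across the 100000 draws, per character (the wrong one here)
    per_draw_min      <- apply(tokens[4:29, ], 2, min)   # MARGIN = 2: min across the 26 characters, per draw (what the quantiles use)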
  • PeterGibbons316
    Based on your statement that 2 contributors provided 3/4 of the data you gleaned, this does not provide the best statistical analysis of a random distribution. A better sample would be, say, 50 people each reporting what they pulled for the exact same number of pulls, say 50-100.

    With 2 people providing data on 327 (75% of 436) pulls, your statistical analysis is off because it doesn't reflect the whole. As has been stated, the 10% rate of 5* is NOT per person, but an average across the whole. Unfortunately, if the two major contributors were facing less than a 10% rate, your data is skewed by the weight of their contribution.

    In a case like this, more accurate data is gleaned from having a larger sample size in the form of contributors, with each person contributing the same amount of data. Ex. If I were taking a survey, I would not be able to get good results if I gave 20 people a survey with 3 questions, and a different 5 people a survey with 20 questions. To get the best data, I'd have to give all 25 people the same survey with the exact same questions on each survey.
    If the person/account/device/etc. isn't a factor then it doesn't matter where the pulls come from.

    Your survey example is not applicable because the questions on the survey are different. In this case each pull is exactly the same (or so we assume).

    In fact, having much larger samples from the same person actually helps to identify whether or not there are other factors at play linked to an individual player. I would argue that 200 pulls from one person would be better than 10 pulls from 20 people (although I'd rather have 200 from all 20 if I could get it.)
  • simonsez
    simonsez Posts: 4,663 Chairperson of the Boards
    PeterGibbons316 wrote:
    Based on your statement that 2 contributors provided 3/4 of the data you gleaned, this does not provide the best statistical analysis of a random distribution.
    Actually, if what I believe is true (that individual players' pulls cluster, but in aggregate look pretty normal), then this was accidentally a good way to observe it, without having to run statistical tests that control for each individual submitting data.
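    For what it's worth, the kind of test that controls for individual contributors would look something like the sketch below, assuming we had per-contributor totals (the numbers here are made up purely for illustration):

    pulls_per_player <- c(180, 147, 60, 55, 50)   # hypothetical pull totals per contributor
    fives_per_player <- c(14, 21, 6, 5, 4)        # hypothetical 5* counts per contributor
    # Chi-squared test of whether the 5* rate differs across contributors more than chance allows
    prop.test(x = fives_per_player, n = pulls_per_player)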
  • HaywireII
    HaywireII Posts: 568 Critical Contributor
    I got a chuckle out of your Starlord finding. I have five 5* covers (three Phoenix, one OML and one Surfer) and no Starlord yet. He's the last 4* (aside from Devil Dino) that I haven't pulled.
  • Jaedenkaal
    Jaedenkaal Posts: 3,357 Chairperson of the Boards
    I've only pulled one starlord cover ever (red, as it happens) in a year and a half of playing (out of a standard token too, if I remember correctly).

    I would agree that you do not have enough data to accurately model the probabilities of drawing each individual cover.

    You might have enough to model the chances of drawing a 4* cover of a particular color (regardless of character), since you would expect that distribution to match the known distribution of colors among all 4* powers. Not sure how useful that would be, although presumably it's no different from any other probability distribution you are attempting to measure.

    EDIT: Another thought: The ratio of covers pulled depends entirely on the number of possibilities available. We can't include all historical LT token drawing data since there are several 4* characters that were not present in previous LT pools. Any data we use can only be compared against other data sets from the same LT history 'segment'.
  • sc0ville
    sc0ville Posts: 115 Tile Toppler
    Great thread, I'm enjoying reading the back and forth.

    One correction:

    "On the 5* front, OML Yellow and Red, Surfer Purple, Phoenix Green, and Goblin Purple "
  • jffdougan
    jffdougan Posts: 733 Critical Contributor
    I've only opened two LTs this season (I'd self-define as pretty casual), and PM'd both results. Star-Lord purple and Thing green, both from Latest tokens. One token was the result of a PVE vault draw; the other came from raising Gamora to level 167.

    Edit: Might a Google Form work better as a way to keep track of such things? We could then keep a running total and see whether the stats even out a bit.
  • mgallop wrote:
    A quick follow-up to Dave's excellent point. If you want to test that the distribution of 4s is off, you can't just check that each count is within the expected range and put some stars next to it if p < .05, since even if the stated rates are true we should expect about 1 out of every 20 4s to show a "significant deviation", and there are 26 4s. My hunch is that the right way to do this would be a chi-squared test, where we see if the overall null ".033 for the 5s, .035 for the 4s" is rejected by the data.

    This is definitely an excellent point too; picking and choosing like I've done doesn't seem proper. But I would point out that while we might expect one out of 20 to be close to α, in this case we have 5 out of 27 at that point. Not as interesting as I thought, but maybe still a little interesting?

    In any case, on some of these dimensions we are close enough to the line that user error might easily account for statistical artifacts (see: "X-23 black").
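    As a rough back-of-the-envelope (treating the 27 comparisons as independent tests at α = .05, which they strictly aren't, so mgallop's Monte Carlo below is the better check):

    # Chance that 5 or more of 27 independent tests flag at the .05 level by luck alone
    1 - pbinom(4, size = 27, prob = 0.05)   # roughly 0.01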
  • mgallop
    mgallop Posts: 120
    Malenkov wrote:
    This is definitely an excellent point too; picking and choosing like I've done doesn't seem proper. But I would point out that while we might expect one out of 20 to be close to α, in this case we have 5 out of 27 at that point. Not as interesting as I thought, but maybe still a little interesting?

    In any case, on some of these dimensions we are close enough to the line that user error might easily account for statistical artifacts (see: "X-23 black").

    Quickly checked my MC, and yeah, you get at least 20 copies of at least 5 characters in about 2.8% of draws, which is just under the conventional .05 threshold for statistical significance.