WordCruncher: Significance Calculator

Comparing Corpora with Statistical Significance

You have data showing the number of times certain words occur in different corpora. Here is a very small sample of the data:

	Corpus 1	Corpus 2
occurrences of king	1,959	136,885
Total words	1,151,029	1,001,610,938

Adjusting for the size of each corpus, you know that king occurs more frequently in corpus 1 than in corpus 2. How can you know if this is a statistically significant difference between the proportions?
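To make "adjusting for the size of each corpus" concrete, here is a minimal Python sketch that converts the raw counts from the table into occurrences per million words. The helper name per_million is hypothetical and is not part of WordCruncher.

```python
def per_million(occurrences, total_words):
    """Relative frequency, expressed as occurrences per million words."""
    return occurrences / total_words * 1_000_000

# Counts for "king" taken from the sample table above.
freq_1 = per_million(1_959, 1_151_029)          # corpus 1
freq_2 = per_million(136_885, 1_001_610_938)    # corpus 2

print(f"Corpus 1: {freq_1:.1f} per million words")
print(f"Corpus 2: {freq_2:.1f} per million words")
```

King appears roughly 1,702 times per million words in corpus 1 and roughly 137 times per million words in corpus 2, even though the raw count is far larger in corpus 2.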

Confidence Interval for Proportions

Luckily, you have all of the information that you need in order to calculate a two sample confidence interval for proportions.

A confidence interval is a range of values, like [.021, .032]. With a certain level of confidence (95% confidence for this calculator), we can say that this range contains the true difference between two population proportions. If the confidence interval does not contain the value 0, the difference between the two proportions is statistically significant at that confidence level.

Example calculation for king

The calculator produced a confidence interval of [0.001490, 0.001641]. We can be 95% confident that the true difference between the population proportions falls between 0.001490 and 0.001641. Because this interval does not contain 0, there is a statistically significant difference between the proportions of king in these corpora.
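As a rough check, the interval can be reproduced by hand from the table values, assuming the usual two-sample interval for proportions with a critical value of 1.96 for 95% confidence. The intermediate values are rounded.

\[\hat{p}_1 = \frac{1959}{1151029} \approx 0.0017020, \qquad \hat{p}_2 = \frac{136885}{1001610938} \approx 0.0001367\]

\[\hat{p}_1 - \hat{p}_2 \approx 0.0015653\]

\[\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \approx 0.00003842\]

\[CI \approx 0.0015653 \pm 1.96 \times 0.00003842 \approx 0.0015653 \pm 0.0000753 = [0.001490, 0.001641]\]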

Try the calculator out for yourself and see the results!

Calculator

Two Sample Confidence Interval for Proportions

Occurrences in sample 1

Sample Size 1

Occurrences in sample 2

Sample Size 2

Calculate

\[CI = (\hat{p}_1 - \hat{p}_2) \pm 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]
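The formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator's actual source: the function name two_proportion_ci and its parameter names are hypothetical, and 1.96 is the critical value for 95% confidence.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Confidence interval for the difference of two proportions.

    x1, x2: occurrences in sample 1 and sample 2
    n1, n2: sizes of sample 1 and sample 2
    z:      critical value (1.96 corresponds to 95% confidence)
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Standard error of the difference between the two sample proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Counts for "king" from the example above.
low, high = two_proportion_ci(1_959, 1_151_029, 136_885, 1_001_610_938)
print(f"[{low:.6f}, {high:.6f}]")  # [0.001490, 0.001641]

# The difference is significant when the interval does not contain 0.
significant = not (low <= 0 <= high)
print("Significant difference:", significant)  # Significant difference: True
```

Swapping the two samples only flips the sign of the interval, so the conclusion about significance stays the same.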