Email questions, requests, and suggestions to wordcruncher@byu.edu.
BYU faculty and staff services: We can come to your office, demonstrate and install WordCruncher, answer questions, discuss your projects, and help you and your students as needed.
You have data showing the number of times certain words occur in different corpora. Here is a very small sample of the data:
| | Corpus 1 | Corpus 2 |
|---|---|---|
| Occurrences of king | 1,959 | 136,885 |
| Total words | 1,151,029 | 1,001,610,938 |
Adjusting for the size of each corpus, you know that king occurs more frequently in corpus 1 than in corpus 2. How can you know whether this is a significant difference between the proportions?
Luckily, you have all of the information that you need in order to calculate a two-sample confidence interval for proportions.
A confidence interval is a range of values, like [.021, .032]. With a certain level of confidence (95% confidence for this calculator), we can say that this range likely contains the true difference between two population proportions. If the confidence interval range does not span the value 0, there is likely a significant difference between the two proportions.
The calculator produced a confidence interval of [0.001490, 0.001641]. We can be 95% confident that the true difference between the two proportions falls between 0.001490 and 0.001641. Because this interval does not include 0, it is very likely that there is a statistically significant difference between the proportions of king in these corpora.
\[CI = (\hat{p}_1 - \hat{p}_2) \pm 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]
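To check the formula by hand, the calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculator's actual implementation: the function name `two_proportion_ci` and its parameters are hypothetical, and each sample proportion is assumed to be the word's occurrences divided by the corpus's total words.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Hypothetical helper: confidence interval for p1 - p2.

    x1, x2 are occurrence counts; n1, n2 are total words in each corpus.
    z = 1.96 corresponds to the 95% confidence level used above.
    """
    p1 = x1 / n1  # sample proportion in corpus 1
    p2 = x2 / n2  # sample proportion in corpus 2
    # Standard error of the difference between the two proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    margin = z * se
    return diff - margin, diff + margin

# Counts for "king" from the sample table
lower, upper = two_proportion_ci(1959, 1_151_029, 136_885, 1_001_610_938)
print(f"[{lower:.6f}, {upper:.6f}]")  # [0.001490, 0.001641]

# The difference is likely significant when the interval excludes 0
significant = not (lower <= 0 <= upper)
print(significant)  # True
```

With the sample counts for king, the sketch gives the same interval as the calculator, and the interval lies entirely above 0.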