Email questions, requests, and suggestions to wordcruncher@byu.edu.
BYU faculty and staff services: We can come to your office, demonstrate and install WordCruncher, answer questions, discuss your projects, and help you and your students as needed.
Vocabulary Dispersion Report
What if we measured vocabulary rankings not by the frequency of a word, but by the number of texts in which it occurs? The term “acclivity” is ranked #343 in frequency (occurring 101 times) in Brandon Sanderson’s novels Skyward and Starsight, but does not even rank in the top 60,000 words in the Corpus of Contemporary American English (occurring only 9 times). Although it occurs many times in these two books, it isn’t a common word you’ll encounter elsewhere.
Linguists use a measurement called “dispersion” to describe how widely a word is spread across the texts in a corpus. If a word has an even dispersion, it occurs in many of the texts. If a word has an uneven dispersion, it occurs in only a few texts. One common use of dispersion is in frequency dictionaries, which sort words by both frequency and dispersion. Here is an example from Routledge’s A Frequency Dictionary of German:
This dictionary shows the normalized frequency (Freq) and a dispersion metric (1.0 being the most evenly dispersed and 0.0 being the least). The methods for calculating dispersion vary, but for the purposes of this article, I’ll use R%: the percentage of texts that contain a given word, regardless of how many times the word occurs in each text.
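To make R% concrete, here is a minimal Python sketch of the calculation as described above. The function name `range_percent`, the toy texts, and the lowercase whitespace tokenization are my own assumptions for illustration; they are not how WordCruncher computes its report internally.

```python
def range_percent(texts, word):
    """Return the percentage of texts that contain `word` at least once.

    `texts` is a list of token lists, one per text. How often the word
    occurs inside a text is ignored; only presence or absence counts.
    """
    word = word.lower()
    containing = sum(
        1 for tokens in texts if word in {t.lower() for t in tokens}
    )
    return 100.0 * containing / len(texts)


# Hypothetical toy corpus of four short "texts".
toy_texts = [
    "the king saw the king".split(),
    "her eyes met his".split(),
    "the king closed his eyes".split(),
    "eyes on the road".split(),
]

print(range_percent(toy_texts, "king"))  # 50.0 (2 of 4 texts)
print(range_percent(toy_texts, "eyes"))  # 75.0 (3 of 4 texts)
```

In this toy corpus both words occur three times, so they tie on frequency, yet “eyes” has the higher R% because it appears in more of the texts.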
In the TED Talk corpus, the word king has a frequency of 2,770 and a dispersion of 50.6. It is ranked #67 by frequency but only #178 by dispersion, because its occurrences are concentrated in fewer sections than its frequency rank suggests. The word eyes, on the other hand, has a frequency of 651 and is ranked #209 by frequency but #177 by dispersion. Although eyes is less frequent than king, we can trust that it is found in more sections of the corpus.
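The gap between a word’s frequency rank and its dispersion rank can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `rank_by_frequency_and_dispersion` and the toy sections are hypothetical, ties are broken alphabetically, and none of this reflects how WordCruncher builds its report.

```python
from collections import Counter


def rank_by_frequency_and_dispersion(sections):
    """Return {word: (frequency_rank, dispersion_rank)} with 1-based ranks.

    `sections` is a list of token lists. Dispersion here is the number of
    sections containing the word; ties are broken alphabetically
    (an assumption made for this sketch).
    """
    freq = Counter()
    section_count = Counter()
    for tokens in sections:
        lowered = [t.lower() for t in tokens]
        freq.update(lowered)                # every occurrence counts
        section_count.update(set(lowered))  # each section counts once

    def ranked(counter):
        ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        return {w: i for i, (w, _) in enumerate(ordered, start=1)}

    freq_rank = ranked(freq)
    disp_rank = ranked(section_count)
    return {w: (freq_rank[w], disp_rank[w]) for w in freq}


# Hypothetical toy corpus: "king" is frequent but confined to one section.
toy_sections = [
    "king king king king".split(),
    "eyes open".split(),
    "eyes closed".split(),
    "open eyes".split(),
]

ranks = rank_by_frequency_and_dispersion(toy_sections)
print(ranks["king"])  # (1, 4): most frequent, least dispersed
print(ranks["eyes"])  # (2, 1): less frequent, most dispersed
```

The same pattern shows up here as with king and eyes above: a word can lead the frequency list while sitting near the bottom of the dispersion list.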
That doesn't mean that frequency is inherently bad or that dispersion is better. Both are valuable pieces of information, and each tells a different story about the text. When I look at word lists, I prefer to see both frequency and dispersion to help me understand more about the words within my text. To see the vocabulary dispersion report, go to Analyze > Book Reports > Vocabulary Dispersion Report.