Email questions, requests, and suggestions to wordcruncher@byu.edu.
BYU faculty and staff services: We can come to your office, demonstrate and install WordCruncher, answer questions, discuss your projects, and help you and your students as needed.
Type-To-Token Ratio
When someone swears a lot, people tend to assume that person’s vocabulary is limited. A linguist might describe this as a low type-to-token ratio (TTR). Types are the distinct words in a text (or speech), and tokens are all of its words, counting every repetition. The TTR is the number of types divided by the number of tokens, so the higher the ratio, the more varied the vocabulary.
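If you’d like to see the definition in action outside WordCruncher, here’s a minimal Python sketch. It assumes a simple rule for what counts as a word: lowercase runs of the letters a–z, with apostrophes allowed inside words such as “don’t.” WordCruncher’s own word-splitting rules may differ, so its counts won’t necessarily match. The function names are hypothetical, chosen for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: a "word" is a run of ASCII letters, optionally joined by
    # internal apostrophes, compared case-insensitively. This is NOT
    # necessarily how WordCruncher splits words.
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def type_token_ratio(text: str) -> tuple[int, int, float]:
    """Return (types, tokens, TTR) for a piece of text."""
    tokens = tokenize(text)
    types = set(tokens)  # each distinct word counted once
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(types), len(tokens), ratio


types, tokens, ttr = type_token_ratio("The cat sat on the mat, and the dog sat too.")
print(types, tokens, round(ttr, 3))
```

In that sentence, “the” appears three times and “sat” twice, so 11 tokens collapse to 8 types.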
For example, Brandon Sanderson’s The Way of Kings has a TTR of 0.04 because Sanderson uses about 15,000 unique words over a 380,000-word book.
J.K. Rowling’s Harry Potter series has a TTR of about 0.023 because Rowling uses about 25,500 unique words over a series of just over a million words.
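The arithmetic behind these figures is just a division. This small sketch (the function name is ours, not WordCruncher’s) reproduces The Way of Kings figure from the rounded counts above; because those counts are approximate, the result only matches to two decimal places.

```python
def ttr_from_counts(types: int, tokens: int) -> float:
    """Type-to-token ratio: distinct words divided by total words."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return types / tokens


# Approximate counts quoted above for The Way of Kings
way_of_kings = ttr_from_counts(15_000, 380_000)
print(round(way_of_kings, 3))  # 0.039, which rounds to 0.04
```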
Isn’t it fascinating to be able to assign a number to how rich an author’s vocabulary is? In this article, you’ll learn how to calculate the type-to-token ratio of a text in WordCruncher.
Once you’ve opened a book, you can immediately go to Analyze > Book Reports > Phrase Compare Report.
Make sure the book’s name is selected in the Book 1 menu in the upper-left corner. (You don’t need to select anything in the Book 2 menu.) This report is generally used to list and compare the phrases in books, but it also provides TTR information.
Click the Compare button and wait for all the phrases to load. Then click the sigma (Σ) button. The book’s TTR appears in the middle of the window that pops up, and right below it you’ll see the book’s total number of types and tokens.
And that’s all there is to it! You now have the TTR of a whole book.
If you think about TTR long enough, you’ll realize that it isn’t the best measure for comparing books of different lengths. Consider how often words like “the” and “of” repeat over a million-word text compared with a 100-word poem. The longer a text runs, the more of its tokens are repeats of words it has already used, so the poem will almost certainly have a higher TTR simply because it’s shorter, not necessarily because its vocabulary is richer.
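You can watch this length effect happen with a quick experiment. The sketch below is a simulation, not real book data: it builds a hypothetical 100,000-word “text” by drawing words from a made-up 50,000-word vocabulary with Zipf-like frequencies (a handful of words appear very often, as “the” and “of” do in English), then measures the TTR of longer and longer stretches of that same text. Every new token adds to the total, while fewer and fewer of them are words not seen before, so the ratio should drop as the sample grows.

```python
import random


def ttr(tokens: list[str]) -> float:
    """Type-to-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


# Hypothetical, synthetic "text": words drawn from a 50,000-word vocabulary
# where the word of rank r has weight 1/r (Zipf-like), so a few words
# dominate. This only mimics the shape of real word frequencies.
random.seed(42)
vocabulary = [f"w{r}" for r in range(1, 50_001)]
weights = [1 / r for r in range(1, 50_001)]
text = random.choices(vocabulary, weights=weights, k=100_000)

# Measure TTR on longer and longer prefixes of the same text.
sizes = [100, 1_000, 10_000, 100_000]
ratios = {n: ttr(text[:n]) for n in sizes}
for n, value in ratios.items():
    print(f"{n:>7,} tokens: TTR = {value:.3f}")
```

The exact numbers depend on the random seed and the assumed vocabulary, so treat them as illustrative; the steady downward trend is the point.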
Other measures exist that account for differences in text length, but that’s a story for next month! Do we still think TTR is a valuable metric? Absolutely! It’s still widely used despite this weakness. Why don’t you try exploring which books have the highest TTR?