WordCruncher Monthly

Book Releases and Development Update

TED German and Portuguese

Interested in learning more advanced vocabulary in a language? Try reading German or Portuguese TED Talks alongside the English versions. We’ve finished aligning the sentences of the English transcripts with their German and Portuguese translations. You can now synchronize any combination of the available languages (English, Spanish, French, Portuguese, and German) by downloading two or more versions and opening them at the same time. Some sentences won’t align perfectly, especially in talks from the early 2000s; as translations have become more consistent over time, alignment has become more accurate, too. Check out these two language corpora today.

The Women's 3 Periodical Collection

In the last couple of months, we’ve released three periodicals: The Young Woman’s Journal, Woman’s Exponent, and The Relief Society Magazine. Each one alone is a treasure trove for research, but we’ve also combined all three into The Women's 3 Periodical Collection. Why would you want all three together? Although their publication runs overlap somewhat, together they form a chronological history of periodicals written and/or edited by the women of The Church of Jesus Christ of Latter-day Saints. Researchers such as Amy Easton-Flake and John Hilton III (pending publication) have recently written articles related to these three periodicals, and they and others will likely use this collection for further research.

Development Update

We’re close to releasing a new WordCruncher update! It’s been several months since our last one, and most of the changes will be internal: our code library has been completely reworked so that we can use the same code to update our Mac version. We’re aiming to release this update by the beginning of September, and a new toolkit will follow shortly after.

The Vocabulary Dispersion report is also getting a nice update. After consulting with one of the corpus linguists at BYU, we realized that it lacks many of the core statistics used to measure how a word is dispersed throughout a text. Dispersion is an important aspect of a word's usage: if a word occurs frequently but only in one section of a text, it is unevenly dispersed, even if its raw frequency matches that of a word spread evenly across the whole text.
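To make the idea of uneven dispersion concrete, here is a minimal sketch of the raw data most dispersion statistics start from: a word's count in each of several equal-sized parts of a text. The helper name `part_counts` and the toy 16-token text are hypothetical illustrations, not part of WordCruncher; the report may divide texts into sections differently (for example, by book or chapter).

```python
def part_counts(tokens: list[str], word: str, n_parts: int) -> list[int]:
    """Count occurrences of `word` in each of `n_parts` near-equal slices of `tokens`.

    Hypothetical helper for illustration; not WordCruncher's implementation.
    """
    size = len(tokens)
    counts = [0] * n_parts
    for i, tok in enumerate(tokens):
        if tok == word:
            # Map token index i to its part; integer arithmetic keeps parts near-equal.
            counts[i * n_parts // size] += 1
    return counts


# Toy text of 16 tokens split into 4 parts of 4 tokens each.
tokens = "x y y y x y a b x a b c x a b c".split()
print(part_counts(tokens, "x", 4))  # [1, 1, 1, 1]
print(part_counts(tokens, "y", 4))  # [3, 1, 0, 0]
```

Both "x" and "y" occur four times, so a plain frequency list ranks them equally, yet "x" appears in every part while "y" is concentrated in the first one. Dispersion statistics are designed to capture exactly that difference.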

Once the Vocabulary Dispersion report is released with version 104, we'll update its guide page to explain the statistics in more depth. These columns will be added: Normalized Frequency, Average Reduced Frequency, Log Frequency, Standard Deviation (population), Juilland's D, Coefficient of Variation, CV Percent, Deviation of Proportions, Number of Ranges Containing the Word, and Percent of Ranges.
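Until the guide page is updated, the sketch below shows how these columns are commonly defined in the corpus-linguistics literature (Juilland's D, Gries's Deviation of Proportions, and Savický and Hlaváčová's Average Reduced Frequency). It is a hedged illustration, not WordCruncher's code: the function name `dispersion_stats` is hypothetical, and several choices are assumptions, namely normalizing per million tokens, using a base-10 log for Log Frequency, and computing Juilland's D from raw counts over equal-sized parts. Version 104 may use different conventions.

```python
import math
from statistics import mean, pstdev


def dispersion_stats(counts, part_sizes, positions, total_tokens):
    """Dispersion statistics for one word (hypothetical sketch, not WordCruncher's code).

    counts       -- occurrences of the word in each part of the text
    part_sizes   -- number of tokens in each part (same order as counts)
    positions    -- 0-based token positions of every occurrence of the word
    total_tokens -- corpus size N
    """
    f = sum(counts)
    n = len(counts)
    if f == 0 or n < 2:
        raise ValueError("need at least one occurrence and at least two parts")

    mu = mean(counts)
    sd_pop = pstdev(counts)                  # population standard deviation
    cv = sd_pop / mu                         # coefficient of variation
    juilland_d = 1 - cv / math.sqrt(n - 1)   # assumes equal-sized parts

    # Deviation of Proportions (Gries): half the summed gap between each part's
    # share of the word's occurrences and its share of the corpus.
    dp = 0.5 * sum(abs(c / f - s / total_tokens) for c, s in zip(counts, part_sizes))

    # Average Reduced Frequency: gaps between consecutive occurrences, treating
    # the text as circular, each capped at the mean gap v = N / f.
    v = total_tokens / f
    p = sorted(positions)
    gaps = [p[0] + total_tokens - p[-1]] + [b - a for a, b in zip(p, p[1:])]
    arf = sum(min(d, v) for d in gaps) / v

    ranges = sum(1 for c in counts if c > 0)  # parts containing the word
    return {
        "normalized_freq_per_million": f / total_tokens * 1_000_000,  # assumed unit
        "arf": arf,
        "log_freq": math.log10(f),            # assumed base 10
        "sd_population": sd_pop,
        "juilland_d": juilland_d,
        "cv": cv,
        "cv_percent": cv * 100,
        "dp": dp,
        "ranges": ranges,
        "percent_ranges": ranges / n * 100,
    }


# A word occurring 4 times in a 16-token text split into 4 parts, mostly in part 1.
y = dispersion_stats([3, 1, 0, 0], [4, 4, 4, 4], [1, 2, 3, 5], 16)
print(round(y["juilland_d"], 4), y["dp"], y["arf"], y["percent_ranges"])  # 0.2929 0.5 2.0 50.0
```

The measures point in opposite directions: Juilland's D ranges from 0 (clumped) to 1 (perfectly even), whereas DP ranges from 0 (perfectly even) toward 1 (clumped). An ARF close to the raw frequency indicates even spread; the clumped word above has a frequency of 4 but an ARF of only 2.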

A sample Vocabulary Dispersion report sorted by frequency

See Other Articles from August 2021