Email questions, requests, and suggestions to wordcruncher@byu.edu.
BYU faculty and staff services: We can come to your office, demonstrate and install WordCruncher, answer questions, discuss your projects, and help you and your students as needed.
New Books
With the summer semester at BYU come extra hours for our student programmers and editors, so we’re on track to release almost 20 new books over the summer! Some books have already been released, while others will be released in the coming months. Below are short descriptions of a few books that have been released or will be released soon.
TED Talk Corpus – English and Spanish: TED Talks have become a favorite of the WordCruncher team. This corpus is a collection of all TED Talks to date, and each sentence in a translation is aligned with its English counterpart. That way, you can open two (or even three) versions of the same transcript and read them side by side. Each word is also tagged with its part of speech and lemma, so anyone interested in studying the language of TED Talks can do so.
Only English and Spanish translations are available right now, but we have plans to add several more languages in the coming months. Look out for French, Portuguese, and German translations.
Interested in learning how we aligned the TED Talks by sentence? Watch our book highlight video here!
Quran – Multilingual: We’ve had Christian and Jewish texts in WordCruncher, and many people have asked when they can start researching the Quran. We’ve added not just one version of the Quran, but 119 versions in 45 different languages!
General Conference and The Scriptures (Part of Speech Version): A common request is to make searches more specific. With part-of-speech tags, you no longer have to search for a word like cross and get every possible meaning. In the sentence, “Don’t be cross (adjective) with me or else I’ll steal your precious cross (noun), so you’ll learn to never cross (verb) me,” one word appears as three different parts of speech! To help narrow searches, we have added part-of-speech tags to the English Scriptures. Soon we will also add them to the General Conference addresses.
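For readers curious how a part-of-speech search narrows results, here is a minimal Python sketch. It is not how WordCruncher works internally; the tag names, the hand-tagged token list, and the `search` function are all assumptions made up for illustration.

```python
from typing import List, Optional, Tuple

# The example sentence, tagged by hand. The tag labels (ADJ, NOUN, VERB, OTHER)
# are illustrative assumptions, not WordCruncher's actual tag set.
TAGGED: List[Tuple[str, str]] = [
    ("Don't", "OTHER"), ("be", "VERB"), ("cross", "ADJ"), ("with", "OTHER"),
    ("me", "OTHER"), ("or", "OTHER"), ("else", "OTHER"), ("I'll", "OTHER"),
    ("steal", "VERB"), ("your", "OTHER"), ("precious", "ADJ"), ("cross", "NOUN"),
    ("so", "OTHER"), ("you'll", "OTHER"), ("learn", "VERB"), ("to", "OTHER"),
    ("never", "OTHER"), ("cross", "VERB"), ("me", "OTHER"),
]

def search(tokens: List[Tuple[str, str]], word: str,
           pos: Optional[str] = None) -> List[int]:
    """Return the positions of `word`, optionally limited to one part of speech."""
    return [
        i for i, (text, tag) in enumerate(tokens)
        if text.lower() == word.lower() and (pos is None or tag == pos)
    ]

all_hits = search(TAGGED, "cross")           # every occurrence of the word
noun_hits = search(TAGGED, "cross", "NOUN")  # only the noun
verb_hits = search(TAGGED, "cross", "VERB")  # only the verb
```

A plain search for cross finds all three occurrences, while adding a part of speech narrows the results to the single one you want.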
Young Woman’s Journal: This is a collection of 40 volumes of The Young Woman’s Journal. BYU Digital Collections has made the volumes available as scanned images, but there aren’t any tools available for a deep dive into this wonderful resource. We used Tesseract, an open-source optical character recognition (OCR) engine, to make the text searchable. While it’s not a perfect transcription, we believe it’s the best one out there. If a passage in the transcription doesn’t make sense, you can compare the text to the original image of the page.
Project Gutenberg: If you’ve never heard of Project Gutenberg, then now is the time to get excited. Project Gutenberg is a web archive that contains tens of thousands of books that are in the public domain. This is where people can go to read Jane Austen, the Sherlock Holmes stories, or many other books that are over a century old. Well, why not add all of Project Gutenberg into a single WordCruncher book? Of course, this is a very big book, so we’ll also be releasing smaller bookshelves that contain specific content like fiction and psychology books.