Email questions, requests, and suggestions to wordcruncher@byu.edu.
BYU faculty and staff services: We can come to your office, demonstrate and install WordCruncher, answer questions, discuss your projects, and help you and your students as needed.
Adding Metadata to Your Corpus
We are entering an age when there is so much information about a text that you can literally analyze text in a million different ways. Researchers here at BYU are interested in looking at changes in language over time, differences in word usage between genders, and much more. To do this, we need more than just text files—we need metadata.
Metadata are pieces of data that can be attributed to text. For example, we could categorize a text as a newspaper, a magazine, or a spoken interview. This text genre categorization is a piece of metadata. You can embed metadata to your text with WordCruncher attributes and then filter your searches based on the embedded text. Imagine you have a corpus of Tweets. If you want to limit your searches to Tweets with more than 10 retweets, you will need to embed metadata into every tweet so that WordCruncher knows what to search for.
But what’s the easiest way to embed attributes? To get started, you can take these three easy steps:
1. Organize your data into a table
The WordCruncher Indexer is currently only designed to convert file formats like Word XML and TXT files. That’s because it was originally designed for books. However, it’s much easier to embed attributes onto text in a tabular format such as a CSV.
You can take a look here for more details on how your CSV file should look, but here’s a brief explanation:
By the way, if your data has any characters outside of a-z like ñ or é, you should use Google Sheets to export your CSV. That’s because Microsoft Excel doesn’t support characters used outside of ASCII, so your text will look corrupted outside of Excel.
2. Converting your CSV
The end goal is to get your CSV into a WordCruncher ETBU file. You can do this by dragging and dropping your file onto the CSV to ETAX converter page.
Once you convert your CSV into an ETAX file, use the WordCruncher Indexer to convert the ETAX file into an ETBU. Then you can add it to WordCruncher!
3. Filter searches with attribute bounds
With your CSV file now in WordCruncher, you can now do searches based on the metadata you’ve added.
Add your file to your bookshelf and open it. In a search window, type in your search term.
Go to Bounds > Reference or Attribute Bounds
.
Select the attributes you would like to filter your search with and click “Insert Item(s) to Bound.”