ETAX Documentation

This is the documentation for the XML of WordCruncher's ETAX file. You can use this to learn about different properties available for WordCruncher. If you're using the Indexer to convert file formats like TXT, RTF, and XML 2003, then an ETAX file will be generated. All the tag elements used in WordCruncher are listed and described below.

The contents of the <etax> element consist of an optional, single <sifx> element followed by a list of paragraph elements (any combination of elements or <ptbl> elements) or <include/> elements. Any text contained in this element is ignored unless otherwise specified by a child element.

This is the root element for an ETAX file. The following attributes are used only one at a time for clarity, but multiple attributes can be added to the <ETAX> tag. The sample below contains an example of every attribute, but not every attribute needs to be used.

Sample <etax>

<?xml version="1.0" encoding="utf-8"?>

...

</etax>

Universal ID for Book

Format:

Sample:

Adds an ID to the book, which is useful when a book is being added to the official WordCruncher Bookstore. It is recommended but not necessary for personal use.
guid is a universally unique identifier. You can copy one by clicking on this button. The Indexer program also can generate one.

External SIFX Filename

Format:

Sample:

The SIFX tag contains all of the styling and document information about an individual book. See the Style Information section for more information. The <SIFX> tag can be added as the first tag within the <etax>, but it can also be an external file.
sifxPath is the relative file path for the SIFX.
For example, if your ETAX file book.etax is in the same directory as your SIFX file book.sifx, then your <etax> tag should look like this: <etax sifx="book.sifx">

External ETTX Filename

Format:

Sample:

The ETTX tag contains information necessary to synchronize two or more books together. See the Synchronization section for more information. The <ETTX> tag can be added after the <sifx> tag, but it can also be an external file.
ettxPath is the relative file path for the ETTX.
For example, if your ETAX file book.etax is in the same directory as your ETTX file book.ettx, then your <etax> tag should look like this: <etax ettx="book.ettx">

External EPOSX Filename

Format:

Sample:

The EPOSX tag contains information necessary to add part of speech search functionality. See the Part of Speech section for more information. The <EPOSX> tag can be added after the <sifx> tag, but it can also be an external file.
eposxPath is the relative file path for the EPOSX.
For example, if your ETAX file book.etax is in the same directory as your EPOSX file book.eposx, then your <etax> tag should look like this: <etax eposx="book.eposx">

External EMTX Filename

Format:

Sample:

The EMTX tag contains information necessary to add a morphology search table to the search window. See the Morphology section for more information. The <EMTX> tag can be added after the <sifx> tag, but it can also be an external file.
emtxPath is the relative file path for the EMTX.
For example, if your ETAX file book.etax is in the same directory as your EMTX file book.emtx, then your <etax> tag should look like this: <etax emtx="book.emtx>

Add ISBN Number to Book

Format:

Sample:

Adds the ISBN Number to a WordCruncher Book. (Not implemented)
isbnNum is the ISBN number for a book.

Set Expiration Date for Book

Format:

Sample:

Adds an expiration date for when the book should expire. The book will be usable through the given date.
date should be formatted as yyyy-mm-dd

Update Book Edition

Format:

Sample:

Tells the bookstore that this is a new edition of the book. This is only used when upgrading reference levels. See English Scriptures as an example.
editionNum should be an integer.

This optional element should appear immediately after the <etax> start element and before any other element in the file. It contains additional information about the book.

Name	Values	Description
title	Possible additions being considered.	Use as default when adding book to library.
author
publisher
copyright
printDate
revision

This optional element must be a child of the <bookInfo> element. The text of this element will be stored in the library when the book is added to the library. This element has no attributes.

Document Styling

This <sifx> element is used to define styles like the document width and default font sizes, paragraph styles, and text styles. It is also used to define the levels in a table of contents and metadata through attributes and tagwords.

This element is optional and comes as the first element within the <etax> root element. This element has no attributes and any text contained in this element is ignored.

The Document Style Element makes global changes to a WordCruncher book. This includes default colors, margins, tabs, and more. It must be an empty element. The following attributes are used only one at a time for clarity, but multiple attributes can be added to the <DS/> tag. The sample below contains an example of every attribute, but not every attribute needs to be used.

Sample <DS/>

Default Text Color

Format:

Sample:

Changes the color of text not specified in any Text Styles.
foregroundColor is a string of an RGB color for the font's color.
backgroundColor is a string of an RGB color for the font's background color.

Search Result Hit Color

Format:

Sample:

Changes the color of the current search result hit in the Search Results window. Defaults to white;dkblue.
foregroundColor is a string of an RGB color for the font's color.
backgroundColor is a string of an RGB color for the font's background color.

Nearby Search Result Hit Color

Format:

Sample:

Changes the color of neighboring Search Results hits that are nearby the current search hit result. Defaults to white;dkgray.
foregroundColor is a string of an RGB color for the font's color.
backgroundColor is a string of an RGB color for the font's background color.

Default Hyperlink Color

Format:

Sample:

Changes the color of hyperlinks not specified in any Hyperlink Styles.
foregroundColor is a string of an RGB color for the font's color.
backgroundColor is a string of an RGB color for the font's background color.

Default Reader Bar Color

Format:

Sample:

Changes the color of the reader bar, which is used to highlight the line of text where the cursor is located.
foregroundColor is a string of an RGB color for the font's color.
backgroundColor is a string of an RGB color for the font's background color.

Default Reference Color

Format:

Sample:

Changes the color of reference level text not specified in the tSt attribute of the Levels. The text of reference levels are not shown by default. They are turned on in Book Options.
foregroundColor is a string of an RGB color for the font's color.
backgroundColor is a string of an RGB color for the font's background color.

Default Font Size

Format:

Sample:

Changes the default font size for text without a font size specified in any Text Styles .
unitMeasurement refers to one of the screen independent units of measurement: points (p), twips (t), inches (i), or centimeters (c).
measurementSizeDesktop is an integer for the font size used for desktop monitors (Windows).
measurementSizeMobile is an integer for the font size used for mobile devices (iOS).

Line Width of Document

Format:

Sample:

Changes the maximum width of the document. If the window is smaller than the specified size, text will wrap according to the wrap attribute of Paragraph Styles .
unitMeasurement refers to one of the screen independent units of measurement: points (p), twips (t), inches (i), or centimeters (c).
measurementSizeDesktop is an integer for the font size used for desktop monitors (Windows).
measurementSizeMobile is an integer for the font size used for mobile devices (iOS).

Change Document Left Margin

Format:

Sample:

Changes the margin width on the left side of the document.
unitMeasurement refers to one of the screen independent units of measurement: points (p), twips (t), inches (i), or centimeters (c).
measurementSizeDesktop is an integer for the font size used for desktop monitors (Windows).
measurementSizeMobile is an integer for the font size used for mobile devices (iOS).
Click here to open a conversion tool between the four measurements.

Change Document Right Margin

Format:

Sample:

Changes the margin width on the right side of the document.
unitMeasurement refers to one of the screen independent units of measurement: points (p), twips (t), inches (i), or centimeters (c).
measurementSizeDesktop is an integer for the font size used for desktop monitors (Windows).
measurementSizeMobile is an integer for the font size used for mobile devices (iOS).
Click here to open a conversion tool between the four measurements.

Layout Direction

Format:

Sample:

Determines the layout direction.
layoutDirection can either be ltr (left-to-right) or rtl (right-to-left).
Books that are predominately right-to-left text should be set to rtl. Individual paragraphs can still be set to ltr if the text has both left-to-right and right-to-left text.

Interface Layout Direction

Format:

Sample:

Determines the layout direction of the citation above the Book View window.
layoutDirection can either be ltr (left-to-right) or rtl (right-to-left).

Concordance

Marks the document as a concordance.
Concordance can either be yes or no.

Dictionary

Format:

Sample:

If this attribute exists, the document is marked as a dictionary. This is the lexicon file that the dictionary uses. Cannot be used with searchlex or thesaurus.
option can either be yes or no.

Index Style Select

Format:

Sample:

This selects styles that are not indexed by default.
style can be any combination of the following values: bold, italic, script, underline, strikeout, revised, overbar, underbar, caps, effect, and hidden.

Dictionary Entry Level

The level type code to use as the entry level for the dictionary. Ignored if dict is not output.

Lowest Level Type

This is the level type to be used in “lowest level” outputs (copy).

Lowest Level Search

This is the level type to be used in “lowest level” searches.

Search Lexicon Entry Level

The level type code to use as the entry level for the search lexicon. Ignored if searchlex is not output.

Thesaurus Entry Level

The level type code to use as the entry level for the thesaurus. Ignored if thesaurus is not output.

Tag Window Default Paragraph Style

The default paragraph style to be used in the Tag Window.

Ruby Text Display

Format:

Sample:

Where to display the ruby text in relation to the base text.
location can be top, bottom, or off.

Ruby Text Index

Format:

Sample:

Whether or not to index the ruby text.
option can either be yes or no.

Ruby Text Justification

Format:

Sample:

Justification of the ruby text.
option can be center, left, or right.

Ruby Text Lexicon

Name of the lexicon this ruby text should be included in. If omitted, the ruby text is included in the same lexicon of the base text.

Ruby Text Position

Format:

Sample:

Adjustment of the position of the ruby text.
value is a measurement value .
Click here to open a conversion tool between the four measurements.

Ruby Text Size

Format:

Sample:

Size of the ruby text based on a percentage of the height of the base characters.
percentage can be any integer between 40 and 80 (default is 60).

Ruby Text Style

Text style to us for the ruby text. The size specified in the text style is ignored.

Search Lexicon

Format:

Sample:

If this attribute exists, the document is marked as a search lexicon. Cannot be used with dict or thesaurus.
If a book is marked as a thesaurus, an attribute type with the code ‘T’ and name ‘Type’ is predefined. Currently there are no values defined for this attribute type.
option can either be yes or no.

Thesaurus

Format:

Sample:

If this attribute exists, the document is marked as a thesaurus. Cannot be used with dict or srchlex.
If a book is marked as a thesaurus, an attribute type with the code ‘T’ and name ‘Type’ is predefined. Currently defined values for this attribute type are ‘Syn’ and ‘Ant’ for synonym and antonym respectively. This attribute type can be used to define the type of sibling word.
option can either be yes or no.

Zoom

Format:

Sample:

This is the general zoom percentage of the document.
percentage can be any integer between 50 and 300 (default is 100).

This is the paragraph style element. It must be an empty element. The first paragraph style listed will be considered the default paragraph style.

Sample <PS/>

Style Name

Format:

Sample:

The name of the paragraph style (required).
string is the name of the style you wish to use.
Note that the first <PS/> tag will be considered the default style. If a tag does not have an st attribute, it will use the first defined style.

Paragraph Wrapping Style

Format:

Sample:

Changes the style of how paragraphs will wrap at the end of the document width.
wrappingStyle can be one of two wrapping styles:

win will wrap text to the next line at the end of the document width (default value).
line will NOT wrap text. A horizontal bar will appear to allow users to scroll through the text. (Used for books that need to maintain the original line structure).

Paragraph Direction

Format:

Sample:

Determines the paragraph direction.
layoutDirection can either be ltr (left-to-right) or rtl (right-to-left).

Paragraph Justification

Format:

Sample:

Determines the paragraph justification.
justifySetting can be left, center, right, or full.

Text Style

Format:

Sample:

The name of a text style associated with this paragraph style. If no text style is active, this style will be used.
string is the name of the text style you wish to use.

Space Before

Format:

Sample:

Extra space added before the paragraph.
value is a measurement value .
Click here to open a conversion tool between the four measurements.

Space After

Format:

Sample:

Extra space added after the paragraph.
value is a measurement value .
Click here to open a conversion tool between the four measurements.

Line Height

Format:

Sample:

Fixed height of the line.
value is a measurement value .
Click here to open a conversion tool between the four measurements.

Line Spacing

Format:

Sample:

Line spacing multiplier.
value can be any real number between 0.25 and 32.0 (default is 1.0).

First Line Indent

Format:

Sample:

First line indent.
value is a measurement value .
Click here to open a conversion tool between the four measurements.

Left Indent

Format:

Sample:

Left indent.
value is a measurement value .
Click here to open a conversion tool between the four measurements.

Right Indent

Format:

Sample:

Right indent.
value is a measurement value .
Click here to open a conversion tool between the four measurements.

Background Color of Text

Format:

Sample:

Set the background color for the paragraph. Defaults to white. clrBk is a string of an RGB color for the font's color.

Define Tab Measurements

Format:

Sample:

Set the measurements of tabs for a paragraph. To add tabs in your paragraph, refer to the <tab/> section under Paragraph Format Elements.
values for one tab consists of m:Tln[,n] where:
• m can be any of the following, and denotes the unit of measurement that should be used for n:
- t = twips
- p = points
- i = inches
- c = centimeters
• T can be:
- L = left
- R = right
- C = center
- D = decimal
• l is optional to set the style of tab. If omitted, you'll get whitespace like any normal person would use. l can be:
- d = dash
- p = dot
- u = underline
- w = dot blank
- x = underline dot
- y = dash blank
• n is the measurement size for the tab. For example, a 12-point left-justified tab would be p:L12.
- [,n] is an optional measurement size for iOS devices. The square brackets must be omitted, and are shown here only to demonstrate its optionality. For example, you can set a 12-point left-justified tab for the Windows program and a 6-point tab for the iOS app with p:L12,6.
NOTE: to modify multiple tabs within a paragraph, you can add additional ;Tln[,n] codes after the initial set. For example, a 12-point left-justified tab followed by a 24-point center-justified tab could be set with p:L12;C24.

Default Tabs

Format:

Sample:

Default tab sets for the paragraph. These sets occur after the last explicitly defined tab set.
value is a single tab measurement value.

Left Border

Format:

Sample:

The border to use for the left of the paragraph.
values can be one or multiple semicolon-separated values, of the format: m:n[,n];p[,p];l;clr, where:
- • m can be any of the following, and denotes the unit of measurement that should be used for n and p:
  - t = twips
  - p = points
  - i = inches
  - c = centimeters
- • n is the measurement size for the width of the border
  - [,n] is an optional measurement size for iOS devices. The square brackets must be omitted, and are shown here only to demonstrate its optionality.
- • p is the measurement size for padding
  - [,p] is an optional measurement size for iOS devices. The square brackets must be omitted, and are shown here only to demonstrate its optionality.
- • l can be:
  - single
  - double
  - dot
  - dash
  - wave
- • clr is a color value for the border color.
- NOTE: all semicolon separated sections are optional, however the semicolons are not (e.g. if you wish to omit specifications for the 'l' and 'clr' options, your value might look something like this: "t:20;36;;").

Right Border

Format:

Sample:

The border to use for the right of the paragraph.
For more information on the values this attribute takes, see the Left Border section.

Top Border

Format:

Sample:

The border to use for the top of the paragraph.
For more information on the values this attribute takes, see the Left Border section.

Bottom Border

Format:

Sample:

The border to use for the bottom of the paragraph.
For more information on the values this attribute takes, see the Left Border section.

This is the Text Style element. This element customizes the text at a character level. These are typically applied to paragraphs by adding the tSt attribute to the <PS/> tag. They can also define tagwords, which are used for applying metadata to a word or phrase.

Sample <TS/>

Style Name

Format:

Sample:

The name of the text style (required).
string is the name of the style you wish to use.

Text Height

Format:

Sample:

Height of the font used for displaying text.
Click here to open a conversion tool between the four measurements.

Text Width

Format:

Sample:

Width of the font used for displaying text (Not recommended).
Click here to open a conversion tool between the four measurements.

Lexicon Name

Format:

Sample:

The name of the lexicon associated with this text style. Lexicons (or word lists) are often used to separate text from headings, footnotes, etc.
string is the name of the lexicon you wish to use.

Font Face

Format:

Sample:

Regular font face.
string is the name of the font face you wish to use.

Small Font Face

Format:

Sample:

Font face used for a small format ETBU.
string is the name of the font face you wish to use.

Alternate Font Face

Format:

Sample:

Alternate font face to use if fFace does not exist.
string is the name of the font face you wish to use.

Font Family

Format:

Sample:

Font family (Not recommended).
option can be decorative, default, modern, roman, script, or swiss.

Font Pitch

Format:

Sample:

Font pitch (Not recommended).
option can be default, fixed, proportional, or variable.

Font Quality

Format:

Sample:

Output quality of the font for the text.
option can be antialiased, cleartype, default, draft, non-antialiased, or proof.

Tag Type

Format:

Sample:

Name of a tag type style to apply to this text style.
code is a single character tag type code.

Text Color

Format:

Sample:

Color of the text.
value is the color values for the foreground and background, formatted as: [foreground;background].

Underline Color

Format:

Sample:

Color of the underline.
value is a single color value.

Strikeout Color

Format:

Sample:

Color of the strikeout.
value is a single color value.

Overbar Color

Format:

Sample:

Color of the overbar
value is a single color value.

Underbar Color

Format:

Sample:

Color of the underbar
value is a single color value.

Character Properties

Format:

Sample:

Character style of the text.
option can be one or many of each of the following options, but only one from each category:
- • bold
- • italic
- • superscript | subscript
- • hidden
- • revised
- • noindex
- • underline | dash-underline | dot-underline | double-underline | wave-underline
- • emboss | engrave | outline | shadow
- • smcaps | allcaps | smcaps-up
- • subword
- • tag
- • strikeout | dash-strikeout | dot-strikeout | double-strikeout | wave-strikeout
- • overbar | dash-overbar | dot-overbar | double-overbar | wave-overbar
- • underbar | dash-underbar | dot-underbar | double-underbar | wave-underbar

This is the lexicon element. This element is used to define separate categories that words in the document can be stored in. At least one <LEX> element must be defined. These are empty elements.

Sample <LEX/>

Style Name

Format:

Sample:

The name of the lexicon (required).
string is the name of the lexicon you wish to use.

Language

Format:

Sample:

The logical language of all the words in the lexicon.
lang is the standard language name or standard language abbreviation for the lexicon.

Decimal Separator

Format:

Sample:

The character that should be used as the decimal separator for the lexicon. If specified, then grp must also be specified.

Group Separator

Format:

Sample:

The character that should be used as the numeric group separator for the lexicon. If specified, then dec must also be specified.

Ignore Characters

Format:

Sample:

List of ignore characters. These are characters that will not be included in the text of a word.
string is the characters you wish to ignore.

Break Characters

Format:

Sample:

List of break characters. These are characters that will automatically break a word.
string is the characters you wish to use as break characters.

No-break Characters

Format:

Sample:

List of no-break characters. These are characters that will not automatically break a word.
string is the characters you wish to use as no-break characters.

Text Style

Format:

Sample:

Default style for the lexicon.
string is the name of the text style you wish to use.

Word Break Mode

Format:

Sample:

Word breaking mode for this lexicon.
option can be hidden-nobrk, style-brk, or script-nobrk.

This is the hyperlink style element. These define general styles used for hyperlinks. It must be an empty element.

Sample <HLS/>

Style Name

Format:

Sample:

The name of the text style (required).
string is the name of the style you wish to use.

Hyperlink Type

Format:

Sample:

The hyperlink type.
option can either be icon or phrase.

Text Height

Format:

Sample:

Height of the font used for displaying text.
Click here to open a conversion tool between the four measurements.

Text Width

Format:

Sample:

Width of the font used for displaying text (Not recommended).
Click here to open a conversion tool between the four measurements.

Font Face

Format:

Sample:

Regular font face.
string is the name of the font face you wish to use.

Small Font Face

Format:

Sample:

Font face used for a small format ETBU.
string is the name of the font face you wish to use.

Print Font Face

Format:

Sample:

Font face used for printing.
string is the name of the font face you wish to use.

Font Family

Format:

Sample:

Font family (Not recommended).
option can be decorative, default, modern, roman, script, or swiss.

Font Pitch

Format:

Sample:

Font pitch (Not recommended).
option can be default, fixed, proportional, or variable.

Font Quality

Format:

Sample:

Output quality of the font for the text.
option can be antialiased, cleartype, default, draft, non-antialiased, or proof.

Text Color

Format:

Sample:

Color of the text.
value is the color values for the foreground and background, formatted as: [foreground;background].

Underline Color

Format:

Sample:

Color of the underline.
value is a single color value.

Strikeout Color

Format:

Sample:

Color of the strikeout.
value is a single color value.

Overbar Color

Format:

Sample:

Color of the overbar
value is a single color value.

Underbar Color

Format:

Sample:

Color of the underbar
value is a single color value.

Character Properties

Format:

Sample:

Character style of the text.
option can be one or many of each of the following options, but only one from each category:
- • bold
- • italic
- • superscript | subscript
- • hidden
- • revised
- • noindex
- • underline | dash-underline | dot-underline | double-underline | wave-underline
- • emboss | engrave | outline | shadow
- • smcaps | allcaps | smcaps-up
- • subword
- • tag
- • strikeout | dash-strikeout | dot-strikeout | double-strikeout | wave-strikeout
- • overbar | dash-overbar | dot-overbar | double-overbar | wave-overbar
- • underbar | dash-underbar | dot-underbar | double-underbar | wave-underbar

Emphasis

Format:

Sample:

Forces emphasis of the target
option can either be yes or no.

Library

Format:

Sample:

Forces the execution of the hyperlink to look in the Library first for a matching target.
option can either be yes or no.

This is the level type element. It must be an empty element.

Sample <LVL/>

Level Type Code

Format:

Sample:

The single character code to be used for the level type (required).
code is a single character.

Level Name

Format:

Sample:

The name of the level type (required).
string is the name of the level type you wish to use.

Plural Name

Format:

Sample:

The plural name of the level type (Recommended).
string is the plural name of the level type you wish to use.

Separator

Format:

Sample:

String to use in the citation line just before displaying this type of level.
string is the character(s) you wish to use as a separator.

Text Style

Format:

Sample:

Text style to use Regular font face.
string is the name of the text style you wish to use.

Hide

Format:

Sample:

This attribute is used to keep reference codes of this type from appearing in the display of the text. It will not hide the reference code from citations or output operations.
option can either be yes or no.

Hide Output

Format:

Sample:

This attribute is used to omit reference codes of this type from citations in copy/paste/print operations. It will not hide them in the text.
option can either be yes or no.

Internal

Format:

Sample:

This is the same as hidden, except that this code and all it’s children are also not shown in TOC or Citations.
option can either be yes or no.

Tag Type

Format:

Sample:

Name of a tag type style to apply to this level type. The name and abbreviation of every level of this type will be indexed as the given tag type. If this is omitted, then the tag type defined in tSt will be used.

Lexicon Style

Format:

Sample:

The name of the lexicon associated with this level type. Used only if the name will be indexed. If this is omitted, then the lexicon defined in tSt will be used if it is defined, otherwise the indexer will use the currently defined lexicon.
string is the name of the lexicon style you wish to use.

This is the attribute element. These are used as attributes for reference levels. It must be an empty element.

Sample <ATTR/>

Attribute Type Code

Format:

Sample:

The single character code used for the attribute type (required).
code is a single character.

Level Name

Format:

Sample:

The name of the level type (required).
string is the name of the level you wish to use.

Plural Level Name

Format:

Sample:

The plural name of the level type (Recommended).
string is the plural name of the level you wish to use.

Tag Type

Format:

Sample:

Name of a tag type style to apply to this attribute type. The name of every attribute of this type will be indexed as the given tag type.
character is a single character.

Lexicon Style

Format:

Sample:

The name of the lexicon associated with this attribute type. Used only if the name will be indexed. If this is omitted, then the lexicon defined in the tSt attribute of the Level Type for the given reference level will be used. If that is not defined, the indexer will use the currently defined lexicon.
string is the name of the lexicon style you wish to use.

This is the tag type element. It must be an empty element. You can specify a maximum of 14 different tag types.

Sample <TAG/>

Level Type Code

Format:

Sample:

The single character code to be used for the level type (required).
code is a single character.

Level Name

Format:

Sample:

The name of the tag (required).

Plural Level Name

Format:

Sample:

The plural name of the level type (Recommended).
string is the plural name of the level you wish to use.

Expand Tags

Format:

Sample:

If yes, showing this tag type will automatically cause the expansion of any generic tag sections.
option can either be yes or no.

This is the reference tree element. Up to eight reference hierarchies can be defined in a document. This element allows the author to define a name for each of these “trees”. It must be an empty element.

Sample <TREE/>

Index

Format:

Sample:

The index of the reference tree (required).
number is an integer between 1 and 8.

Reference Tree Name

Format:

Sample:

The name to associate with the reference tree (required).
string is the name you wish to use for the tree.

Abbreviations

Format:

Sample:

If yes, then abbreviations will be shown in citations.
option can either be yes or no.

Show Citations

Format:

Sample:

If yes, then references from this tree will be shown in citations. If all trees are unspecified or set to no, then only the default tree will show.
option can either be yes or no.

Default Reference Tree

Format:

Sample:

Specifies the default reference tree. Only one tree can be marked as the default tree. If no trees are marked as the default, the first populated tree is used.
option can either be yes or no.

This is the phrase group element. This defines global properties for any phrase group that is used in the document. It must be an empty element.

Sample <GRP/>

Index

Format:

Sample:

The index of the phrase group (required).
index is an integer between 0 and 32000.

Lexicon Style

Format:

Sample:

The lexicon name to use for the phrase group (required).
string is the name of the lexicon you wish to use.

The <DRM/> tag is called the copyright management tag because it allows publishers to set specific restrictions on a text. These restrictions prevent users from copying too much of a copyrighted book while still allowing them to copy modest snippets of text for research. Books must have at least 1,000 words in them to make the <DRM/> tag work.

Sample <DRM/>

Restrict Hidden Text from Output

Format:

Sample:

Prohibits hidden text from being output in any copy/print/export operation.
By default, copying a portion of text that is marked hidden will show up visibly in programs like Word.
boolean can be yes or no

KWIC Style Reference List

Format:

Sample:

Changes the Search Results view into KWIC (Keywords in context) list rather than the default context-based view.
The maxWordSpan attribute is required when enabling the KWIC Style Reference List.
This feature is planned as a future development. It is not implemented yet.
boolean can be yes or no

Define Search Results Word Span

Format:

Sample:

Defines the maximum number of words before and after any hit word/phrase that can be output from the Search Results or similar reports.

If KWICStyleReferenceList is enabled, this also defines the maximum number of words that can be shown before and after the hit word/phrase.

Defaults to 10.
If integer is zero, then output and display revert to normal.
Max value is set to 1,000 but realistically should not exceed 100.

Require Word Gap between Search Results

Format:

Sample:

Prevents users from copying adjacent parts of the text, like when searching for all words in a text with *.
For situations where maxWordSpan applies, this is the maximum number of words that must occur between adjacent entries. If the gap is not satisified by the next hit, then context around adjacent entries will be hidden until the gap is satisfied.
The hit word/phrase will always be output.
integer is a value between 1 and 4,000.
If the value is zero, then overlap of adjacent entries is allowed (default behavior).

Define Total Words Allowed in Single Output

Format:

Sample:

Defines the maximum number of words that can be output from the Book View in any one copy/print/export.
A warning will also be given each time cumulative output counts exceed this threshold.
If integer is zero, no limit will be established on individual output operations and warnings will be based on percentage of maxOutputWords.

Limit Total Words Allowed in All Outputs

Format:

Sample:

Sets a limit on the total number of words that any one user is allowed to copy. For a normal sized book, a typical value may be around 10% of the book.
If the value is defined and is zero, all output is disabled. If this value is not defined, per user totals are not recorded. However, warnings may still be given if the copy threshold is violated.

This is the font embedding element. It must be an empty element. Use this element to embed a font that is used within the document in the ETBU file. The font will be extracted and installed when the document is opened in the viewer. This installation of the font is not permanent. It will be removed when the book is closed. Consequently, if a user copies text that uses a font not on their computer, programs like Word will look for an alternative font to use.

Font Family

Format:

Sample:

The family name of the font. For example: “Lato” or “Quivira” or “Roboto”. Do not specify a specific style such as bold or italics. The indexer will scan the system and embed all style variants of the requested font family (required).
string is the name of the font family you wish to use.

These elements provide information needed to translate citations from a previous version of the file to the current version. For instance, if in a previous ETB version (ver. 5) a particular citation was: “/Introduction” and this was changed to “/Title/Introduction” in the new version (ver. 7), this element will provide enough information to make this translation. This information is used primarily during the upgrade of note files that were attached to previous versions of the document in order to translate old citations to the new ones so that the new position of the notes can be located. It is NOT used in the indexing of the document.

Each <UPGRADE> element includes one or more empty <REF/> elements. Each <REF/> element defines one citation translation. Multiple (up to 4) <UPGRADE> elements can be included, one for each previous version. However, at this point there is only one previous version that has been released.

Attributes <UPGRADE><UPGRADE/>

*Name*	*Values*	*Description*
ver	5, 7	The previous file format version. This is usually 5 (non-Unicode) or 7 (Unicode).
file	File title. String[63]	This is the previous file title (i.e. the File name without the extension. (required).
codepage	Code page integer identifier. 1250 – Central European (Windows) 1251 – Cyrillic (Windows) 1252 – Western European (Windows) 1253 – Greek (Windows) 1254 – Turkish (Windows) 1255 – Hebrew (Windows) 1256 – Arabic (Windows) 1257 – Baltic (Windows) Others…	This is the codepage that will be used to translate the old citation to Unicode. (required). (May only be needed for 5)
edition	Integer	Edition of the file. Note: You also have to add the edition attribute to <etax>.

Attributes <REF/>

*Name*	*Values*	*Description*
old	Citation (without offset)	This is the citation in the previous version that is in need of translation. It can be a partial citation. (required).
new	Citation (without offset)	This is the new citation. When a citation is in need of translation, if an old citation is found, then it will be replaced with the new citation. A citation is considered a match if the old citation completely matches the citation being translated (up to the number of levels defined). If the citation in question has levels beyond the match, they are concatenated onto the end of the translated citation. All entries will be checked and the longest match will be used for translation. (required).
children	yes \| no	If yes, then the citation will only be translated if the citation in question has child levels beyond the match.
codepage	Code page integer identifier. 1250 – Central European (Windows) 1251 – Cyrillic (Windows) 1252 – Western European (Windows) 1253 – Greek (Windows) 1254 – Turkish (Windows) 1255 – Hebrew (Windows) 1256 – Arabic (Windows) 1257 – Baltic (Windows) Others…	This is the codepage that will be used to translate the old citation to Unicode. This will override the codepage specified in the <UPGRADE> element.

Attributes <EMBED/>

*Name*	*Values*	*Description*
fFamily	Font Family Name Example: <EMBED fFamily=”Quivira”/>	Embed a custom font that is installed on your computer.

This is the index options element. It must be an empty element.

Attributes

*Name*	*Values*	*Description*
wrdbrk	Any combination of the following values: hidden-nobrk \| style-brk \| script-nobrk	Word breaking options. This is the default for the whole document and can be overridden by the same attribute in the <LEX> element.
stopwrds	*stop* \| go	If a stopword file exists, this tells whether the words are stopwords or gowords.
comp	Any combination of the following values: off \| on \| text \| index	Compression options. Default is on.

Paragraph Elements

The body of an ETAX document consists of a list of paragraph elements. There are currently two types of paragraphs: normal and table.

This is the normal paragraph element. This paragraph element can have any attributes that a <PS/> element can have except the name attribute. These attributes become overrides to the currently active paragraph style. The element can also include the following attribute:

Paragraph Style

Format:

Sample:

The name of the paragraph style to use. If omitted the default paragraph style is assumed (i.e. the first paragraph style listed in the <sifx> element).
string is the name of the style you wish to use.

This is the table paragraph element. It is considered an alternate type of paragraph and marks the start of a table. The <ptbl> element can have any of the attributes that the element can have plus the following:

Column Width

Format:

Sample tables with three columns:

The width of each column in the table. It is recommended the number of columns in a table be limited to no more than 100.
measurements is a list of measurement values separated by semicolons. Each measurement value can be substituted with either an asterisk (*) or a percentage to represent an automatically calculated column width or a width based on a percentage of the line width.
NOTE: In a Word XML file, if any cell in a column has a different cell margin than the default for the table, an asterisk (*) well be used as the width for the column

Minimum Column Width

Format:

Sample:

The minimum width of any column in the table
value is a measurement value .

Table Type

Format:

Sample:

Visual style of the table
option can either be flat or 3d.

Cell Vertical Alignment

Format:

Sample:

Vertical alignment of the text in each cell.
option can be top, center, or bottom.

Horizontal Padding

Format:

Sample:

Horizontal internal cell padding.
value is a measurement value .

Vertical Padding

Format:

Sample:

Vertical internal cell padding.
value is a measurement value .

Spacing

Format:

Sample:

Spacing in-between adjacent cells
value is a measurement value .

Border

Format:

Sample:

Border width.
value is a measurement value .

Inner Border

Format:

Sample:

Inner border width.
value is a measurement value .

Border Color

Format:

Sample:

Color of the borders.
value is the color value for the border.

The Table Row Element <trow></trow>

This is the table row element. It defines each row of a table. This element contains a list of <tcell> elements. Any text contained within a <trow> element is ignored unless it is within a child element. This element can have any of the same attributes that a <tcell> element can have, except for spanCol and spanRow. These attributes will apply to each table cell on the row unless specifically overridden by the <tcell> element.

The Table Cell Element <tcell></tcell>

This is the table cell element. It defines each cell contained in a table row. This element contains a list of paragraph elements (either or <ptbl> elements). Any text contained within a <tcell> element is ignored unless it is within a child element. This element can contain the following attributes:

Cell Vertical Alignment

Format:

Sample:

Vertical alignment of the text in each cell
option can be top, center, or bottom.

Horizontal Padding

Format:

Sample:

Horizontal internal cell padding.
value is a measurement value .

Vertical Padding

Format:

Sample:

Vertical internal cell padding.
value is a measurement value .

Column Span

Format:

Sample:

Number of columns for this cell to span. This number cannot exceed the number of columns left in the row. If a cell spans multiple columns, the spanned cells are NOT omitted.
number is an integer between 2 and 32.

Row Span

Format:

Sample:

Number of rows for this cell to span. If a cell spans multiple rows, the spanned cells on the following rows are NOT omitted.
number is an integer between 2 and 32.

Background Color

Format:

Sample:

Background color of the cell. This color overrides the background color of a table.
value is the color value for the border.

Left Border

Format:

Sample:

The border to use for the left of the cell (see "Left Border" in <PS/>).
value is the border you wish to use.

Right Border

Format:

Sample:

The border to use for the right of the cell (see "Left Border" in <PS/>).
value is the border you wish to use.

Top Border

Format:

Sample:

The border to use for the top of the cell (see "Left Border" in <PS/>).
value is the border you wish to use.

Bottom Border

Format:

Sample:

The border to use for the bottom of the cell (see "Left Border" in <PS/>).
value is the border you wish to use.

Inside a paragraph definition () all printable characters will be included in the text of a paragraph. Any character with a Unicode value less than a SPACE (i.e. tabs, line feeds, carriage returns, etc.) will be ignored. This allows for some formatting of the paragraph text in the XML document. Several empty elements are used instead of these characters:

How to Insert a Tab Character

Format:

Leroy<tab/>Jenkins<tab/>24<tab/>$95,000

<tab/> adds a tab character.
The measurements for tab spacing can be set in two ways:
1. Adding a tabs attribute directly on the tag. For example: ...
2. Adding a st (style name) attribute to the tag and defining the tabs attribute in the assigned <PS/> tag.

Attributes

*Element*	*Description*
<br/>	Inserts a hard return. This does NOT break the logical paragraph.
<w/>	Inserts a hard word break.
<sp/>	Inserts a hard non-breaking space.
<xsp/>	Inserts a hard breaking space.
<zs/>	Inserts a zero width space.
<l/>	Inserts a left indent. This is a tab that also sets the left indent property for the rest of the paragraph. NOT RECOMMENDED. Use the paragraph style indents instead.
<r/>	Inserts a right indent. The same as the left indent, except extends from the right side of the paragraph. NOT RECOMMENDED. Use the paragraph style indents instead.
<d/>	Inserts a double indent. This is equivalent to inserting both a left and a right indent simultaneously. NOT RECOMMENDED. Use the paragraph style indents instead.
<lm/>	Inserts a Unicode LTR (Left to Right) mark. U+200E
<lo/>	Inserts a Unicode LTR override mark. U+202D
<le/>	Inserts a Unicode LTR embedding mark. U+202A
<rm/>	Inserts a Unicode RTL (Right to Left) mark. U+200F
<ro/>	Inserts a Unicode RTL override mark. U+202E
<re/>	Inserts a Unicode RTL embedding mark. U+202B
<pdf/>	Inserts a Unicode PDF (Pop Directional Format) mark. This is used to terminate any of the above directional override or embedding modes (<lo/><le/><ro/><re/>). U+202C

This element (<T>) is used to set the current text style. This is NOT an empty element. The text style will remain in effect for all text that is contained within the element and will override any text style specified in the paragraph element or the currently active paragraph style. If no text style element is output, or if text occurs outside this element, the text style specified in the paragraph or paragraph style will be used.

Text Style

Format:

Sample:

<T st="normal">Sample text</T>

The name of a <TS> record in the sifx (required).
string is the name of the text style you wish to use.

Every attribute of a text style can be overridden individually by several Text Style Override Elements. These elements are NOT empty elements. Their attributes remain in effect for all text that is contained within the element. These elements are listed below:

Attributes

*Element*	*Description*
<b>	Turns bolding on/off.
<i>	Turns italics on/off
<s>	Turns superscript or subscript on/off
<u>	Turns underline on/off.
<o>	Turns strikeout on/off.
<ob>	Turns overbar on/off.
<ub>	Turns underbar on/off.
<c>	Turns all caps or small caps on/off.
<e>	Turns on a special effect like embossing, engraving, outline, or shadow.
<x>	Turns indexing on/off.
<rev>	Turns on/off revised text.
<h>	Turns on/off hidden text.
<f>	Changes the font.
<lex>	Changes the currently active lexicon.
<sz>	Changes the size of the text.
<tt>	Changes the current tag type.
<t>	Turns the general tagtype flag on/off.
<sw>	Turns the subword flag on/off. All text in this element will be indexed as a subword.
<cf>	Changes the foreground color.
<cb>	Changes the background color.
<cu>	Changes the underline color.
<co>	Changes the strikeout color.
<cob>	Changes the overbar color.
<cub>	Changes the underbar color.
<ch>	Forces hard characters. Characters with this style cannot be delimiters. See “Hard Characters Element” below.

Many of the above elements use similar attributes. We will explain each below in groups.

Attributes <x><rev><h><t><sw>

*Name*	*Values*	*Description*
val	on \| off	This overrides the corresponding text style property and turns the property either on or off. By default, (or if this attributes is omitted) the property is turned on.

Attributes <s>

*Name*	*Values*	*Description*
val	*super* \| sub \| off	This overrides the script text style property and either turns off any scripting, or turns on superscript or subscript. By default, (or if this attributes is omitted) superscript is turned on.

Attributes <o><ob><ub>

*Name*	*Values*	*Description*
val	single \| dash \| dot \| double \| wave \| off	This overrides the corresponding text style property and sets the line style accordingly. By default, (or if this attributes is omitted) the single line style is used.

Attributes <c>

*Name*	*Values*	*Description*
val	*small* \| all \| up \| off	This overrides the caps text style property and sets the style accordingly. By default, (or if this attributes is omitted) small caps is used. The up style will show the text as small caps, but index the text as all caps. all makes the word in caps for the text and WordWheel.

Attributes <e>

*Name*	*Values*	*Description*
val	outline \| emboss \| engrave \| shadow \| off	This overrides the effect text style property and sets the style accordingly. (required)

Attributes <f>

*Name*	*Values*	*Description*
fFace	String	Regular font face
fFaceSm	String	Font face used for a small format ETBU.
fFacePrint	String	Font face used for printing.
fFamily	decorative \| default \| modern \| roman \| script \| swiss	Font family. (Not recommended).
fPitch	default \| fixed \| proportional \| variable	Font pitch. (Not recommended).
fQuality	antialiased \| cleartype \| default \| draft \| non-antialised \| proof	Output quality of the font for the text.

Attributes <lex>

*Name*	*Values*	*Description*
st	The name of a <LEX> record in the sifx.	This changes the currently active lexicon. Any words in this element will be indexed into that lexicon.

Attributes <sz>

*Name*	*Values*	*Description*
val	Either a measurement value, or a percentage, or one of the following: s \| v \| t \| L \| V \| H These represent small, very small, tiny, large, very large, and huge sizes relative to the current base font size.	This is a typical measurement value or a percentage. If this is a percentage then the size of the text is calculated dynamically by a percentage of the current window size. This is useful for writing title pages.

Attributes <tt>

*Name*	*Values*	*Description*
st	The code defined in a <TAG> record in the sifx. Single Character	This changes the currently active tag type.

Attributes <cf><cb><co><cu><cob><cub>

*Name*	*Values*	*Description*
val	A color value	This overrides the corresponding color property.

Reference level elements <R/> are used to define a hierarchical structure to the file. This structure is used as an address in hyperlinks to properly position when a hyperlink is taken. Up to eight different hierarchy “trees” can be defined in a single document. These “trees” can be overlapping. For instance, you may want a section/sub-section hierarchy, or a book/chapter/verse hierarchy, or a page/paragraph hierarchy all in the same document. There is no way in XML to define these multiple overlapping hierarchies. Therefore, we have implemented these elements as empty elements. When a reader encounters one of these elements they must record the level and tree and keep this reference active until the next element for the same tree is found.

If you want to index the values of a level, please see the <LVL> element in the <sifx> section.

Attributes <R/>

*Name*	*Values*	*Description*
ref	A reference definition in the following format: l,d[,t]:name Where: l= Code for a <LVL> record. Where: d= Number of the tree depth. Where: t= The tree number (1-8). Where: name= The name of the level.	This is the definition of this reference code. For instance, if we are defining a reference level for the title page of a document in the first reference tree we could define it as such: ref=”S,1:Title Page” The first paragraph in the title page could be: ref=”P,2:1” This assumes that <LVL> records were defined in the sifx that have the codes ‘S’ and ‘P’. Note that the level number for the title page is 1, while the level number of the paragraph is 2. This means that the paragraph is below (or part of) the title page in the hierarchy. Notice also that the tree number has been omitted. The default tree is always the first tree. Any text after the colon is the name of the level. This name may be omitted, however, if it is, then no text will be displayed in the table of contents for this reference level (required).
abrv	String	An abbreviation for the name of the level.
hide	yes \| no	This attribute is used to keep reference codes of this type from appearing in the display of the text. It will not hide the reference code from citations or output operations.
hideOutput	yes \| no	This attribute is used to omit reference codes of this type from citations in copy/paste/print operations. It will not hide them in the text.
internal	yes \| no	This is the same as hide, except that this code and all it’s children are also not shown in TOC or Citations.
attr	A reference attribute definition in the following format: a:name[;a:name]* Where: a= Code for a <ATTR> record. Where: name= The name of the attribute.	Each reference level can be given attributes defined by the author. These attributes consist of a type (defined in the sifx) and a name. For instance, you may want to give a “Topic” to a section: attr=”T:Budget” You may give more than one attribute: attr=”T:Budget;S:President” This assumes that <ATTR> records were defined in the sifx that have the codes ‘T’ and ‘S’ for possibly “Topic” and “Speaker” respectively. These categories are completely author defined and can be used to help limit or bound searches.

The <Re/> element is used to terminate the last level for a given tree. By default, once another reference element is encountered for a given level and tree, and other elements currently active for that tree which have a level greater than or equal to than the new element are automatically terminated. You may use this element to terminate a level manually.

Attributes <Re/>

*Name*	*Values*	*Description*
tree	A tree number (1-8)	This is the tree that should terminate the last reference level. Be default (if this attribute is omitted) the first tree is used.

There are several different types of hyperlinks:

Cross reference hyperlinks. These are used to jump from one section of text to another. The destination may also be in a completely different document.
Image hyperlinks. These are used to display a picture or image in a separate window.
Shell hyperlinks. These are used to spawn a separate application, start an email, or open a web browser.
DDE hyperlinks. These are used to control a second application.

The type of hyperlink can be determined from a code in the st attribute. Other attributes are used to define other properties of the hyperlink. Depending on the style of the hyperlink (either phrase or icon as defined in the <HLS> record in the sifx) the hyperlink is either an empty element (icon) or not (phrase). If the hyperlink is not an empty element, clicking on any text in the hyperlink element will execute the hyperlink.

Attributes <H/>

*Name*	*Values*	*Description*
st	The style of the hyperlink in the following format: t:name Where: t= X \| I \| S \| D Where: name= The name of an <HLS> record.	The type codes are defined as follows: X= A cross-reference hyperlink I= An image hyperlink S= A shell hyperlink (calls the ShellExecute function in Windows) D= A DDE hyperlink (required).
file	A path to a file, usually relative to the location of the current document. The following macros may be used to specify additional paths: %TEXT% - The current document path. %PROGRAM% - The WordCruncher program path.	This is used to specify an external file in: A cross-reference hyperlink (Optional) An image file in an image hyperlink (required) A file, web page, e-mail address, etc. for a shell hyperlink (required) It is not used for DDE hyperlinks – (st=”X\|I\|S”).
fileAux	One or more paths to files using the same rules as the file attribute. Files are specified using the following format: file[;file]*	Auxiliary files used to further define which book from a library should be used as the target. If the desired target is a Book Set, then this attribute can contain the other files in the set. The software will attempt to match each auxiliary file to other files in the book set. This is only used in cross-reference hyperlinks (Optional) - (st=”X”).
cit	A forward slash (/) delimited string of reference names optionally prefaced by a tree number and optionally terminated by a word offset or a reference gap number: [t:]/name[/name]*[ (:word[,subword [,tagword]]) \| (#gap)]	This is the destination reference hierarchy path to position to. Examples might be: cit=”/Section 1/Sub-Section 5” cit=”2:/Page 1/Paragraph 3” cit=”/Section 1/Sub-Section 5:3” This is only used for cross-reference hyperlinks - (st=”X”).
citRng	One or more citation ranges delimited by semicolons: cit[-cit][;cit[-cit]]* Where cit is the same as in the cit attribute, without the optional tree number. Also, the cit can be a relative citation based on the citation in the cit attribute.	This is used to emphasize a range of words or references when a hyperlink is taken. Each citation can be a full citation as defined in the cit attribute (without the optional tree number). Or it can be a relative citation based on the citation given in the cit attribute. If it is a relative citation, the cit should NOT be prefaced by a forward slash, and can optionally have one or more “..” levels which will remove one child level from the base citation. For instance: cit=”/Section/1” citRng=”1-2” will emphasize /Section/1 through /Section/2 cit=”/Section/3/7” citRng=”6-../4/9” will emphasize /Section/3/6 through /Section/4/9. This is only used for cross-reference hyperlinks - (st=”X”).
idx	Number	Some image formats can contain multiple images. This is the index of the image in the file. The default is the first image. This is only used with image hyperlinks - (st=”I”).
page	Number	Some images formats can contain multi-page images. This is used to specify a particular page. This is only used with image hyperlinks - (st=”I”).
rect	This is a rectangle measurement value: m:(x,y,w,h) Where m=t\|p\|i\|c t->twips p->points i->inches c->centimeters Where x,y,w,h= Real numbers. These correspond to the left, top, width, height dimensions of the source image.	This is used to crop the output of an image. It is only used with image hyperlinks - (st=”I”).
op	String	This is an OLE verb such as “open” or “print”. It is only used with shell hyperlinks- (st="S").
cmd	String	This is a user defined command string. It is used with shell hyperlinks and DDE hyperlinks - (st=”S\|D”). If path represents a data file, this should be omitted. The system will find the appropriate executable to show the file if it has been properly registered.
path	A path. The same macros that were used for the file attribute can be used here.	This is a path to use. It is only used with shell hyperlinks - (st=”S”).

Inline images are usually empty elements. If no file attribute is given for an inline image, then the text inside the element must consist of a base64 encoded image. No other text is allowed inside an inline image hyperlink.

Attributes

*Name*	*Values*	*Description*
file	A path to a file, usually relative to the location of the current document. The following macros may be used to specify additional paths: %TEXT% - The current document path. %PROGRAM% - The WordCruncher program path.	This is used to specify an external image file in an inline image (Optional). If this is omitted for an inline image then the text of the element must be a base64 encoded image.
idx	Number	Some image formats can contain multiple images. This is the index of the image in the file. The default is the first image.
page	Number	Some images formats can contain multi-page images. This is used to specify a particular page.
dim	The display dimensions of the image in a paired measurement value format: m:(x,y)[,(x,y)] Where m=t\|p\|i\|c t->twips p->points i->inches c->centimeters Where x,y= Real numbers. The second optional numbers are for small format ETBU.	This is the logical output dimensions of the image. (required) If one of the values is zero, the value will be automatically calculated from the image aspec ratio. If both values are zero, the image size will be calculated from the stored resolution of the image.
desc	String	Image description used in the ETGU. Omit if no image data!
rect	This is a rectangle measurement value: m:(x,y,w,h) Where m=t\|p\|i\|c t->twips p->points i->inches c->centimeters Where x,y,w,h= Real numbers. These correspond to the left, top, width, height dimensions of the source image.	This is used to crop the output of an image.

These elements are used to place small comments or notes above or below the main text in the document. It is commonly used to place furigana text in Japanese documents. The ruby text is positioned relative to any text that is between the begin element (<rt/>) and the end element (<rte/>). Both of these must be empty elements.

Attributes

*Name*	*Values*	*Description*
val	String	This is the ruby text to use. (required).
idx	*yes* \| no	If the ruby text is included within one logical word, the base characters can be replaced with the ruby text character and the new word indexed as well.
disp	*top* \| bottom \| off	The location to place the text.
just	*center* \| left \| right	The justification of the ruby text in relation to the base characters.
sz	Percentage	Size of the ruby text based on a percentage of the base font. A comma separated second value may be included for use in small format ETBUs.
pos	Measurement value	Ruby text will be automatically placed above or below the text. If additional space is desired, this value can be used to raise or lower the ruby text.
st	String[63]	A text style upon which to base the ruby text.
lex	String[63]	A lexicon to place any indexed words.

This element can be used to change the default behavior of the word parse in the WordCruncher Indexer program. The word parser does a very good job at finding the appropriate boundary between words. However, there may be times that the Indexer selects a word boundary that is not optimal for a particular situation. Any text inside a <ch> element (including whitespace) will be part of the current word and will not delimit the word. Please note that a word may still be terminated by the end of a paragraph, change in lexicon, or any other markup (<tab/> for instance) that would otherwise place a physical break in the word. Likewise, the word may not be automatically terminated at the end of the element if it is not immediately followed by a delimiter.

This element is used to create phrasal groups of words which are displayed on the word wheel. Since formatting elements can be used within these groups, and the groups cannot be split up, these are implemented as empty elements. When the reader encounters a group beginning element <g/> this group must remain active until the corresponding group ending element <ge/> is found. The exclusion elements (<gx/> and <gxe/>) work similarly, except these are used to exclude words from the middle of a phrase.

Attributes

*Name*	*Values*	*Description*
idx	Number between 1 and 32767	This is the index of the group. This is used to match up <g/>-<ge/> and <gx/>-<gxe/> elements. If this is omitted, the index defaults to zero. This is useful if nested groupings are desired.
lex	String	A lexicon name in which to place the indexed phrase. This attribute is not used for the <ge/>, <gx/> and <gxe/> elements. If this is omitted, the lexicon will default to either the lexicon specified in the <GRP> element in the sifx, or to the active lexicon when the element was first encountered.
pos	Part of Speech Code	The part of speech used for the whole phrase group, if applicable.

This element is used to include another file. If the included file has its own <sifx> element, it will be ignored. Consequently, indexer warnings/errors arise when the included file has SIFX elements that the main ETAX file does not have defined in its SIFX element. This element cannot be a child element of a or <ptbl> element.

Sample Include Tag

Format:

Sample:

Embeds an exteral file into the ETAX. You can embed ETAX, RTF, TXT, or XML 2003 files.
pathName is a string of the included file's file path in relationship to the main ETAX file.

Part of Speech Tagging

WordCruncher has the ability to attach a part of speech code to each word and include it with the word information in the index (WordWheel). These part of speech codes are built with predefined tags (see below for a chart of tags) which can be organized in a hierarchical manner. For instance, the part of speech code for a singular, common noun can be written with the three tags: “n.comm.sing”. While WordCruncher provides a rich set of predefined part of speech tags, there are occasions where they may be insufficient to fully describe a part of speech code. In these cases, some user defined tags can be added.

The Indexer does not currently tag texts for part of speech. It is recommended that you use an external part of speech tagger like Stanford NLP or TreeTagger to tag your texts. Other resources like NLTK, Stanza, or Spacy with Python are also useful for tagging texts. These part of speech taggers produce abbreviated markup codes like NN1 for singular nouns and NN2 for plural nouns. WordCruncher can also accept these markup codes for parts of speech, but an XML file called the EPOSX is needed to define how these markups are translated to WordCruncher part of speech codes.

Below are a few example EPOSX files for download. They can be used or modified to fit your needs.

C5.eposx

TreeTagger-DE.eposx

stanza-en.eposx

It is encouraged that you use one of the EPOSX files from our GitHub to create your custom EPOSX. However, here is the basic template needed for the file.

</userTags>

</markups>

</eposx>

Attribute	Example Values	Description
title	C5 C8 Stanza-English	Give a name to the type of part of speech, preferably in reference to the name of the tagger used.
wordSeparator	_ (Default) /	The character that is used to mark a word with the part of speech. If an underscore is the word separator, then your text should look like “word_NN0”.
ambiguitySeparator	- (Default) ?	This character is used within part of speech codes. Taggers may provide two parts of speech if there is ambiguity. If a hyphen is used for the ambiguity separator, then your text should look like “swimming_NN0-VBG”.

WordCruncher books are prepared with an underscore as the word separator and a hyphen as the ambiguity separator, as shown below:

WordCruncher has predefined tags that must be the first tag listed in each part of speech code. We call these “Primary” tags. Primary tags can also be used as “Secondary” tags if needed. For example, “n.adv” can be used to mark an adverbial noun. Below are listed the Primary tags. Use the “Tag Code” when defining a part of speech code. The “Tag Name” is given merely for convenience and readability.

Name	Code	Abbreviation
adjective	adj	j
adverb	adv	r
alphabet	alph	z
article	art	a
circumposition	circ	cp
classifier	clf	cl
clitic	clitic	ct
conjunction	conj	c
determiner	det	d
existential	exist	ex
interjection	interj	!
noun	n	n
null	null	nl
numeral	num	#
other	oth	o
postposition	postp	pp
preposition	prep	p
pronoun	pron	pn
particle	ptcl	pt
punctuation	punct	*
suffix	sufix	+
unclassified	uncl	u
verb	v	v

Secondary tags are categorized in this table by the grammatical category that are usually associated with them. However, they can be used as any category based on your part of speech schema. It is recommended that only one tag per category be used when defining a part of speech code. User tags can be added to this list of secondary tags if further defintion is required.

Name	Code	Abbreviation
Verb
lexical	lex	lex
auxiliary	aux	aux
modal	mod	mod
semiauxiliary	semiaux	semiaux
Noun
common	comm	c
proper	prop	pr
Common Noun
unit	unit	unit
direction	dir	dir
temporal	temp	temp
Noun Action
subject	sbj	sbj
object	obj	obj
Adjective and Adverb
comparative	comp	comp
superlative	superl	sup
evaluative	eval	eval
positive	pos	pos
negative	neg	neg
attributive	attr	attr
predicative	pred	pred
degree	deg	deg
Numeral
cardinal	card	card
ordinal	ord	ord
fraction	frac	frac
Conjunction
coordinating	coord	coord
subordinating	subord	sub
correlative	corr	corr
Pronoun and Determiner
definite	def	def
indefinite	indef	indef
demonstrative	dem	dem
exclamative	excl	excl
interrogative	interrog	interrog
personal	pers	pers
reflexive	refl	refl
irreflexive	irrefl	irrefl
relative	rel	rel
substitutive	substit	subst
independent	indep	ind
Tense
conditional	cond	cond
future	fut	fut
past	past	past
present	pres	pres
Aspect
imperfect	imperf	impf
perfect	perf	pf
progressive	prog	prog
aorist	aor	aor
pluperfect	pluperf	plpf
State
absolute	absol	absol
construct	constr	constr
Voice
active	act	act
passive	pass	pass
middle	mid	mid
causative	caus	caus
Mood
indicative	indic	indic
imperative	imp	impv
subjunctive	subj	subj
optative	opt	opt
infinitive	inf	inf
finite	fin	fin
gerund	ger	ger
participle	ptcp	pple
jussive	juss	juss
cohortative	hort	cohrt
Person
firstperson	1	1
secondperson	2	2
thirdpersion	3	3
Case
nominative	nom	n
genitive	gen	g
possessive	poss	poss
dative	dat	d
accusative	acc	a
locative	loc	loc
vocative	voc	voc
instrumental	instr	instr
absolutive	abs	abs
ergative	erg	erg
Gender
masculine	masc	m
feminine	fem	f
neuter	neut	nt
universal	univ	univ
Animacy
animate	anim	anim
inanimate	inanim	inan
human	hum	hum
nonhuman	nonhum	nonhum
Number
singular	sing	s
plural	pl	p
dual	dual	du
Honorifics
formal	forml	forml
informal	inform	inform
polite	pol	pol
royal	roy	roy
Other
abbreviation	abbr	abbr
contraction	contr	contr
foreign	foreign	forn
headline	head	head
title	title	title
marker	mark	mark
pronominal	pronom	pronom
truncated	trunc	trunc

In the <userTags> section of an EPOSX, a user can add up to 64 tag elements that are not specified in secondary tags list. Codes must have no spaces, word separator character, or ambiguity separator character.

Individual tags need to be in an empty <tag/> element. There are two attributes for this tag: code and name.

TABLE

Below are some examples of tags that you can make.

</userTags>

Part of Speech Markups

The <markups> element is the XML element where all markups defined by external taggers are given WordCruncher part of speech code equivalents using the <markup/> element. This element has two required attributes: text and code.

TABLE

Example Text

With the EPOSX prepared correctly, the text in the ETAX needs to follow the character separators. An example paragraph should look like this:

The_AT quick_JJ brown_JJ fox_NN1 jumped_VVD over_II the_AT lazy_JJ dog_NN1.

Alternatively, a user can use the default WordCruncher markup for tagging text. If done this way, you won't need to add anything else to the <EPOSX>.

The_art quick_adj brown_adj fox_n.sing jumped_v.past over_prep the_art lazy_adj dog_n.sing.

Available Features: Search features and the Neighborhood Report are available for part of speech books. Other reports, such as the Phrase Compare Report, will be updated later to use POS functionality.

Punctuation: Normal punctuation that is not defined in as a word separator or ambiguity separator will not be used for parts of speech. That means that it’s okay to have punctuation right after a part of speech tag without adding a zero space <zs/> tag before punctuation.
For example, if a sentence ends with “dog_NN1.”, then the period at the end will not be indexed as part of the POS tag.

If the word separator or ambiguity separator are part of words, then a <ch> tag should be wrapped around these characters. For example, my_function_NN0 will cause a problem because the underscore initiates the part of speech. Since function_NN0 is probably not the desired POS, this should be modified to my<ch>_</ch>function_NN0.

Attributes, Tag Words, Reference Levels: Any text that is in an attribute, tag word, or reference level will not be indexed for POS tags.

The Translation Table Element <ettx></ettx>

The ETTX tag is used for synchronizing two versions or translations of the same book. Books that use the ETTX tag in the WordCruncher Bookstore include The Scriptures, Quran, and TED Talks. An ETTX tag is also necessary when making footnotes for a book unless they are combined in a BookSet.

You should be intimately familiar with the interaction between the <LVL/> tags, <R/> tags, and <TREE/> tags before trying to synchronize books.

When using an ETTX tag between books, you must remember to:

Use the same LVL tags between both books within the SIFX. If both books have chapters, then it would make sense to have <LVL code="c" name="Chapter" plural="Chapters"/> in each SIFX.
Keep the table of contents similar for each book. The more similar your table of contents, the simpler the ETTX tag will be. If the tables of contents are the exact same for each book, then your ETTX will look like this:

Basic ETTX Example

Format/Sample:

<ettx>

</ettx>

The id is a UUID. Generate a UUID:
The name is the same as the name value from your TREE tag.
The tree is the index number of the TREE for which you’re synchronizing. Refer to the number in your TREE tag. This usually will be 1, unless you are using a secondary TREE solely for the purpose of synchronization.

If your table of contents are not the same, you will need to map the differences between the two of them. This is done by adding MAP tags between each tag that is different.

Identify the 1-letter code from the matching <R/> tag.

Create a LEVEL tag that has the attribute type. The value of attribute should be the 1-letter code from the <R/> tag.

Within the LEVEL tag, create a MAP tag that has two attributes: sync and src. Sync refers to the primary book’s reference text, and src refers to the secondary book’s reference text.

Add a MAP tag for each reference level that has differing text.

Table of Contents from The Scriptures

1. Identify the 1-letter code

The <R/> tags for each book looks like this:

Both of these <R/> tags start with V. This is the 1-letter code we’ll need.

2. Create a LEVEL tag

The LEVEL tag has one mandatory attribute: type. Add the 1-letter code as the value of attribute.

3. Create a MAP tag within the LEVEL tag

To make a map between the Holy Bible and Santa Biblia, the MAP tag will look like this:

It should be a child of the LEVEL tag, so place it within the <level type="v"> tag:

</level>

4. Add a MAP tag for each reference level

Repeat step 3 for each reference level in the table of contents, assuming each one has differing text.

…

</level>

Note that a separate LEVEL tag should be made for each 1-letter code that needs to be mapped. For example, the books of The Scriptures, (e.g. Genesis, Exodus), have the code B (for Book). That LEVEL tag looks like this:

</level>

If the translation table is updated after publishing the book to the Bookstore, specific attributes must be changed on the ETTX tag before a new version of the book is added in the Bookstore.

This section needs to be finished!

The Morphology Table Element <emtx></emtx>

This element is identical to the contents of the Morphology Table File described in the Example Files section. If the emtx attribute of the <etax> element is omitted, this element is used.

Morphology Table File (EMTX)

This file defines the optional morphology tables used to search for word attributes (or word tags). All values of name and code attributes are free form based upon the content of the target book. <title> elements are merely organizational. Below is an abbreviated example file:

<?xml version="1.0" encoding="utf-16" standalone="yes"?>

<emtx>

</title>

...

</item>

</title>

...

</item>

</title>

...

</item>

...

</title>

</title>

</tree>

</emtx>