ETAX Documentation

This is the documentation for the XML of WordCruncher's ETAX file. You can use this to learn about different properties available for WordCruncher. If you're using the Indexer to convert file formats like TXT, RTF, and XML 2003, then an ETAX file will be generated. All the tag elements used in WordCruncher are listed and described below.

The contents of the <etax> element consist of an optional, single <sifx> element followed by a list of paragraph elements (any combination of <p> elements or <ptbl> elements) or <include/> elements. Any text contained in this element is ignored unless otherwise specified by a child element.

This is the root element for an ETAX file. The attributes below are described one at a time for clarity, but multiple attributes can be added to the <etax> tag. The sample below combines several attributes; not every attribute needs to be used.

Sample <etax>

<?xml version="1.0" encoding="utf-8"?>

<etax id="{e5956d68-c65d-4726-b4a8-c79bf4de3b4b}" sifx="book.sifx" ettx="book.ettx" eposx="book.eposx">

...

</etax>

Universal ID for Book

<etax id="guid">

  • Adds an ID to the book, which is useful when a book is being added to the official WordCruncher Bookstore. It is recommended but not necessary for personal use.
  • guid is a universally unique identifier. The Indexer program can also generate one.

External SIFX Filename

<etax sifx="sifxPath">

  • The SIFX tag contains all of the styling and document information about an individual book. See the Style Information section for more information. The <sifx> tag can be added as the first tag within the <etax> tag, but it can also be an external file.
  • sifxPath is the relative file path for the SIFX.
  • For example, if your ETAX file book.etax is in the same directory as your SIFX file book.sifx, then your <etax> tag should look like this: <etax sifx="book.sifx">

External ETTX Filename

<etax ettx="ettxPath">

  • The ETTX tag contains information necessary to synchronize two or more books together. See the Synchronization section for more information. The <ETTX> tag can be added after the <sifx> tag, but it can also be an external file.
  • ettxPath is the relative file path for the ETTX.
  • For example, if your ETAX file book.etax is in the same directory as your ETTX file book.ettx, then your <etax> tag should look like this: <etax ettx="book.ettx">

External EPOSX Filename

<etax eposx="eposxPath">

  • The EPOSX tag contains information necessary to add part of speech search functionality. See the Part of Speech section for more information. The <EPOSX> tag can be added after the <sifx> tag, but it can also be an external file.
  • eposxPath is the relative file path for the EPOSX.
  • For example, if your ETAX file book.etax is in the same directory as your EPOSX file book.eposx, then your <etax> tag should look like this: <etax eposx="book.eposx">

External EMTX Filename

<etax emtx="emtxPath">

  • The EMTX tag contains information necessary to add a morphology search table to the search window. See the Morphology section for more information. The <EMTX> tag can be added after the <sifx> tag, but it can also be an external file.
  • emtxPath is the relative file path for the EMTX.
  • For example, if your ETAX file book.etax is in the same directory as your EMTX file book.emtx, then your <etax> tag should look like this: <etax emtx="book.emtx">

Add ISBN Number to Book

<etax isbn="isbnNum">

  • Adds the ISBN Number to a WordCruncher Book. (Not implemented)
  • isbnNum is the ISBN number for a book.

Set Expiration Date for Book

<etax exp="date">

  • Sets an expiration date for the book. The book will be usable through the given date.
  • date should be formatted as yyyy-mm-dd

Update Book Edition

<etax edition="editionNum">

  • Tells the bookstore that this is a new edition of the book. This is only used when upgrading reference levels. See English Scriptures as an example.
  • editionNum should be an integer.

The optional <bookInfo> element should appear immediately after the <etax> start element and before any other element in the file. It contains additional information about the book.

Name Values Description
title Possible additions being considered. Use as default when adding book to library.
author
publisher
copyright
printDate
revision

This optional element must be a child of the <bookInfo> element. The text of this element will be stored in the library when the book is added to the library. This element has no attributes.

Document Styling

This <sifx> element is used to define styles like the document width and default font sizes, paragraph styles, and text styles. It is also used to define the levels in a table of contents and metadata through attributes and tagwords.

This element is optional and comes as the first element within the <etax> root element. This element has no attributes and any text contained in this element is ignored.
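
The overall layout might look like the sketch below. The style names (main, bodyText, body) and the order of the children inside <sifx> are assumptions made for illustration; only the position of <sifx> as the first child of <etax> comes from the description above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<etax>
  <!-- Style information comes first -->
  <sifx>
    <DS clrTxt="black;white" tHeight="p:12"/>
    <LEX st="main"/>
    <TS st="bodyText" lexSt="main"/>
    <PS st="body" tSt="bodyText"/>
  </sifx>
  <!-- Paragraph elements follow the style information -->
  <p st="body">Sample paragraph text.</p>
</etax>
```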

The Document Style element, <DS/>, makes global changes to a WordCruncher book. This includes default colors, margins, tabs, and more. It must be an empty element. The attributes below are described one at a time for clarity, but multiple attributes can be added to the <DS/> tag. The sample below combines several attributes; not every attribute needs to be used.

Sample <DS/>

<DS clrTxt="black;white" clrHit="white;172,17,79" clrHilite="black;214,214,214" clrHlink="5,57,113;white" clrReader="black;166,137,220" clrRef="12,100,78;white" tHeight="p:12" lnWidth="i:12" mrgL="p:12" mrgR="p:12" dir="ltr" dirCit="ltr"/>

Default Text Color

<DS clrTxt="foregroundColor;backgroundColor"/>

<DS clrTxt="black;white"/>

  • Changes the color of text not specified in any Text Styles.
  • foregroundColor is the text color, given as a color name (such as black) or an RGB value (such as 172,17,79).
  • backgroundColor is the background color behind the text, given in the same way.

Search Result Hit Color

<DS clrHit="foregroundColor;backgroundColor"/>

<DS clrHit="white;dkblue"/>

  • Changes the color of the current search result hit in the Search Results window. Defaults to white;dkblue.
  • foregroundColor is the text color, given as a color name (such as white) or an RGB value (such as 172,17,79).
  • backgroundColor is the background color behind the text, given in the same way.

Nearby Search Result Hit Color

<DS clrHilite="foregroundColor;backgroundColor"/>

<DS clrHilite="white;dkgray"/>

  • Changes the color of search result hits that are near the current hit. Defaults to white;dkgray.
  • foregroundColor is the text color, given as a color name (such as white) or an RGB value (such as 214,214,214).
  • backgroundColor is the background color behind the text, given in the same way.

Default Reader Bar Color

<DS clrReader="foregroundColor;backgroundColor"/>

<DS clrReader="white;117,28,198"/>

  • Changes the color of the reader bar, which is used to highlight the line of text where the cursor is located.
  • foregroundColor is the text color, given as a color name (such as white) or an RGB value (such as 117,28,198).
  • backgroundColor is the background color behind the text, given in the same way.

Default Reference Color

<DS clrRef="foregroundColor;backgroundColor"/>

<DS clrRef="21,70,36;white"/>

  • Changes the color of reference level text not specified in the tSt attribute of the Levels. The text of reference levels is not shown by default; it is turned on in Book Options.
  • foregroundColor is the text color, given as a color name (such as white) or an RGB value (such as 21,70,36).
  • backgroundColor is the background color behind the text, given in the same way.

Default Font Size

<DS tHeight="unitMeasurement:measurementSizeDesktop,[measurementSizeMobile]"/>

<DS tHeight="p:12,11"/>

  • Changes the default font size for text without a font size specified in any Text Styles.
  • unitMeasurement refers to one of the screen independent units of measurement: points (p), twips (t), inches (i), or centimeters (c).
  • measurementSizeDesktop is an integer for the font size used for desktop monitors (Windows).
  • measurementSizeMobile is an integer for the font size used for mobile devices (iOS).

Line Width of Document

<DS lnWidth="unitMeasurement:measurementSizeDesktop,[measurementSizeMobile]"/>

<DS lnWidth="i:12,8"/>

  • Changes the maximum width of the document. If the window is narrower than the specified width, text will wrap according to the wrap attribute of Paragraph Styles.
  • unitMeasurement refers to one of the screen-independent units of measurement: points (p), twips (t), inches (i), or centimeters (c).
  • measurementSizeDesktop is an integer for the line width used for desktop monitors (Windows).
  • measurementSizeMobile is an integer for the line width used for mobile devices (iOS).
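
The three lines below sketch the same maximum width written in different units, assuming the usual typographic conversions (72 points per inch, 20 twips per point) apply here. Only one of them would be used in a real file.

```xml
<!-- 6 inches -->
<DS lnWidth="i:6"/>
<!-- 6 x 72 = 432 points -->
<DS lnWidth="p:432"/>
<!-- 432 x 20 = 8640 twips -->
<DS lnWidth="t:8640"/>
```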

Change Document Left Margin

<DS mrgL="unitMeasurement:measurementSizeDesktop,[measurementSizeMobile]"/>

<DS mrgL="i:1"/>

  • Changes the margin width on the left side of the document.
  • unitMeasurement refers to one of the screen-independent units of measurement: points (p), twips (t), inches (i), or centimeters (c).
  • measurementSizeDesktop is an integer for the margin width used for desktop monitors (Windows).
  • measurementSizeMobile is an integer for the margin width used for mobile devices (iOS).

Change Document Right Margin

<DS mrgR="unitMeasurement:measurementSizeDesktop,[measurementSizeMobile]"/>

<DS mrgR="i:1"/>

  • Changes the margin width on the right side of the document.
  • unitMeasurement refers to one of the screen-independent units of measurement: points (p), twips (t), inches (i), or centimeters (c).
  • measurementSizeDesktop is an integer for the margin width used for desktop monitors (Windows).
  • measurementSizeMobile is an integer for the margin width used for mobile devices (iOS).

Layout Direction

<DS dir="layoutDirection"/>

<DS dir="rtl"/>

  • Determines the layout direction.
  • layoutDirection can either be ltr (left-to-right) or rtl (right-to-left).
  • Books that are predominately right-to-left text should be set to rtl. Individual paragraphs can still be set to ltr if the text has both left-to-right and right-to-left text.

Interface Layout Direction

<DS dirCit="layoutDirection"/>

<DS dirCit="rtl"/>

  • Determines the layout direction of the citation above the Book View window.
  • layoutDirection can either be ltr (left-to-right) or rtl (right-to-left).
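
For a mostly right-to-left book, the direction settings might be combined as in this sketch. The style names body and ltrQuote are hypothetical, and it is an assumption that a paragraph style without a dir attribute follows the document direction.

```xml
<!-- Book and citation line laid out right-to-left -->
<DS dir="rtl" dirCit="rtl"/>
<!-- Default paragraph style; direction left unspecified -->
<PS st="body"/>
<!-- Hypothetical style for left-to-right passages -->
<PS st="ltrQuote" dir="ltr"/>
```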

Concordance

  • Marks the document as a concordance.
  • Concordance can either be yes or no.

Dictionary

<DS dictionary="option"/>

<DS dictionary="yes"/>

  • If this attribute exists, the document is marked as a dictionary. This is the lexicon file that the dictionary uses. Cannot be used with searchlex or thesaurus.
  • option can either be yes or no.

Index Style Select

<DS idxOff="style"/>

<DS idxOff="bold"/>

  • This selects styles that are not indexed by default.
  • style can be any combination of the following values: bold, italic, script, underline, strikeout, revised, overbar, underbar, caps, effect, and hidden.

Dictionary Entry Level

  • The level type code to use as the entry level for the dictionary. Ignored if dict is not output.

Lowest Level Type

  • This is the level type to be used in “lowest level” outputs (copy).

Lowest Level Search

  • This is the level type to be used in “lowest level” searches.

Search Lexicon Entry Level

  • The level type code to use as the entry level for the search lexicon. Ignored if searchlex is not output.

Thesaurus Entry Level

  • The level type code to use as the entry level for the thesaurus. Ignored if thesaurus is not output.

Tag Window Default Paragraph Style

  • The default paragraph style to be used in the Tag Window.

Ruby Text Display

<DS rtDisp="location"/>

<DS rtDisp="top"/>

  • Where to display the ruby text in relation to the base text.
  • location can be top, bottom, or off.

Ruby Text Index

<DS rtIdx="option"/>

<DS rtIdx="yes"/>

  • Whether or not to index the ruby text.
  • option can either be yes or no.

Ruby Text Justification

<DS rtJust="option"/>

<DS rtJust="center"/>

  • Justification of the ruby text.
  • option can be center, left, or right.

Ruby Text Lexicon

  • Name of the lexicon this ruby text should be included in. If omitted, the ruby text is included in the same lexicon of the base text.

Ruby Text Position

<DS rtPos="value"/>

<DS rtPos="CODE"/>

  • Adjustment of the position of the ruby text.
  • value is a measurement value.

Ruby Text Size

<DS rtSize="percentage"/>

<DS rtSize="80"/>

  • Size of the ruby text based on a percentage of the height of the base characters.
  • percentage can be any integer between 40 and 80 (default is 60).
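
A possible combination of the ruby text attributes described so far is sketched below; the particular values are chosen only for illustration.

```xml
<!-- Ruby text above the base text, centered, not indexed, at the default 60% size -->
<DS rtDisp="top" rtJust="center" rtIdx="no" rtSize="60"/>
```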

Ruby Text Style

  • Text style to use for the ruby text. The size specified in the text style is ignored.

Search Lexicon

<DS srchlex="option"/>

<DS srchlex="yes"/>

  • If this attribute exists, the document is marked as a search lexicon. Cannot be used with dict or thesaurus.
  • If a book is marked as a search lexicon, an attribute type with the code ‘T’ and name ‘Type’ is predefined. Currently there are no values defined for this attribute type.
  • option can either be yes or no.

Thesaurus

<DS thesaurus="option"/>

<DS thesaurus="yes"/>

  • If this attribute exists, the document is marked as a thesaurus. Cannot be used with dict or srchlex.
  • If a book is marked as a thesaurus, an attribute type with the code ‘T’ and name ‘Type’ is predefined. Currently defined values for this attribute type are ‘Syn’ and ‘Ant’ for synonym and antonym respectively. This attribute type can be used to define the type of sibling word.
  • option can either be yes or no.

Zoom

<DS zoom="percentage"/>

<DS zoom="CODE"/>

  • This is the general zoom percentage of the document.
  • percentage can be any integer between 50 and 300 (default is 100).
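
As an illustration, a value inside the documented range might be written as follows; 150 is an arbitrary choice.

```xml
<!-- Open the document at 150% zoom -->
<DS zoom="150"/>
```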

This is the paragraph style element. It must be an empty element. The first paragraph style listed will be considered the default paragraph style.

Style Name

<PS st="string"/>

  • The name of the paragraph style (required).
  • string is the name of the style you wish to use.
  • Note that the first <PS/> tag will be considered the default style. If a <p> tag does not have an st attribute, it will use the first defined style.

Paragraph Wrapping Style

<PS wrap="wrappingStyle"/>

  • Changes the style of how paragraphs will wrap at the end of the document width.
  • wrappingStyle can be one of two wrapping styles:
    1. win will wrap text to the next line at the end of the document width (default value).
    2. line will NOT wrap text. A horizontal bar will appear to allow users to scroll through the text. (Used for books that need to maintain the original line structure).

Paragraph Direction

<PS dir="layoutDirection"/>

<PS dir="ltr"/>

  • Determines the paragraph direction.
  • layoutDirection can either be ltr (left-to-right) or rtl (right-to-left).

Paragraph Justification

<PS just="justifySetting"/>

<PS just="center"/>

  • Determines the paragraph justification.
  • justifySetting can be left, center, right, or full.

Text Style

<PS tSt="string"/>

  • The name of a text style associated with this paragraph style. If no text style is active, this style will be used.
  • string is the name of the text style you wish to use.

Space Before

<PS spB="value"/>

  • Extra space added before the paragraph.
  • value is a measurement value.

Space After

<PS spA="value"/>

  • Extra space added after the paragraph.
  • value is a measurement value.

Line Height

<PS lnHeight="value"/>

  • Fixed height of the line.
  • value is a measurement value.

Line Spacing

<PS lnSp="value"/>

  • Line spacing multiplier.
  • value can be any real number between 0.25 and 32.0 (default is 1.0).
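
The spacing attributes above might be combined as in this sketch. The style name verse is hypothetical, and the p:6 form assumes that a measurement value is written as unit:size, as in the <DS/> attributes.

```xml
<!-- Hypothetical "verse" style: unwrapped lines, 6 points of space before and after, 1.5x line spacing -->
<PS st="verse" wrap="line" just="left" spB="p:6" spA="p:6" lnSp="1.5"/>
```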

First Line Indent

<PS indF="value"/>

  • First line indent.
  • value is a measurement value.

Left Indent

<PS indL="value"/>

  • Left indent.
  • value is a measurement value.

Right Indent

<PS indR="value"/>

  • Right indent.
  • value is a measurement value.

Background Color of Text

<PS clrBk="color"/>

  • Sets the background color for the paragraph. Defaults to white.
  • color is a string of an RGB color for the paragraph's background.

Define Tab Measurements

<PS tabs="values"/>

<PS tabs="i:L2;R3"/>

  • Set the measurements of tabs for a paragraph. To add tabs in your paragraph, refer to the <tab/> section under Paragraph Format Elements.
  • Each tab in values has the form m:Tln[,n] where:
  • m can be t = twips, p = points, i = inches, or c = centimeters.
  • T can be L = left, R = right, C = center, or D = decimal.
  • l is optional and sets the fill style of the tab. If omitted, the tab is filled with whitespace. l can be d = dash, p = dot, u = underline, w = dot blank, x = underline dot, or y = dash blank.
  • n is the measurement size for m. For example, a 12-point left-justified tab would be p:L12.
  • [,n] is an optional measurement size for m for iOS devices. The square brackets must be omitted, and are shown here only to demonstrate its optional status. For example, you can set a 12-point left-justified tab for the Windows program and a 6-point tab for the iOS app with p:L12,6.
  • To modify multiple tabs within a paragraph, you can add additional ;Tln[,n] codes after the initial set. For example, a 12-point left-justified tab followed by a 24-point center-justified tab could be set with p:L12;C24.
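
Reading the pattern above literally, tab definitions might look like the sketch below. The style names are hypothetical, and Rp6 is composed from the T, l, and n parts as described rather than taken from a tested book.

```xml
<!-- Left tab at 1 inch, then a right tab at 6 inches filled with dots (T = R, l = p) -->
<PS st="tocEntry" tabs="i:L1;Rp6"/>
<!-- Left tab at 12 points on Windows and 6 points on iOS, then a center tab at 24 points -->
<PS st="twoColumn" tabs="p:L12,6;C24"/>
```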

Default Tabs

<PS tabsDef="value"/>

  • Default tab sets for the paragraph. These sets occur after the last explicitly defined tab set.
  • value is a single tab measurement value.

Left Border

<PS bdrL="values"/>

<PS bdrL="t:20;36;double;red"/>

  • The border to use for the left of the paragraph.
  • values can be one or multiple semicolon separated values, of the format: m:n[,n];p[,p];l;clr where:
    • m can be t = twips, p = points, i = inches, or c = centimeters.
    • n is the measurement size for m.
    • [,n] is an optional measurement size for m for iOS devices. The square brackets must be omitted, and are shown here only to demonstrate its optional status.
    • p is the measurement size for padding.
    • [,p] is an optional measurement size for padding for iOS devices. The square brackets must be omitted, and are shown here only to demonstrate its optional status.
    • l can be single, double, dot, dash, or wave.
    • clr is a color value for the border color.
    • NOTE: all semicolon separated sections are optional, however the semicolons are not.
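
Two border values following the m:n[,n];p[,p];l;clr pattern are sketched below. The style names are hypothetical, and the second line shows one reading of the note about keeping the semicolons when sections are left out.

```xml
<!-- 2-point single red border with 4 points of padding -->
<PS st="callout" bdrL="p:2;4;single;red"/>
<!-- Padding and color omitted; the three semicolons stay in place -->
<PS st="rule" bdrL="p:1;;dash;"/>
```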

Right Border

<PS bdrR="values"/>

  • The border to use for the right of the paragraph.
  • For more information on the values this attribute takes, see the Left Border section.

Top Border

<PS bdrT="values"/>

  • The border to use for the top of the paragraph.
  • For more information on the values this attribute takes, see the Left Border section.

Bottom Border

<PS bdrB="values"/>

  • The border to use for the bottom of the paragraph.
  • For more information on the values this attribute takes, see the Left Border section.

This is the Text Style element. This element customizes the text at a character level. These are typically applied to paragraphs by adding the tSt attribute to the <PS/> tag. They can also define tagwords, which are used for applying metadata to a word or phrase.

Style Name

<TS st="string"/>

  • The name of the text style (required).
  • string is the name of the style you wish to use.

Text Height

<TS tHeight="value"/>

  • Height of the font used for displaying text.

Text Width

<TS tWidth="value"/>

  • Width of the font used for displaying text (Not recommended).

Lexicon Name

<TS lexSt="string"/>

  • The name of the lexicon associated with this text style. Lexicons (or word lists) are often used to separate text from headings, footnotes, etc.
  • string is the name of the lexicon you wish to use.

Font Face

<TS fFace="string"/>

  • Regular font face.
  • string is the name of the font face you wish to use.

Small Font Face

<TS fFaceSm="string"/>

  • Font face used for a small format ETBU.
  • string is the name of the font face you wish to use.

Alternate Font Face

<TS fFaceAlt="string"/>

  • Alternate font face to use if fFace does not exist.
  • string is the name of the font face you wish to use.

Font Family

<TS fFamily="option"/>

  • Font family (Not recommended).
  • option can be decorative, default, modern, roman, script, or swiss.

Font Pitch

<TS fPitch="option"/>

  • Font pitch (Not recommended).
  • option can be default, fixed, proportional, or variable.

Font Quality

<TS fQuality="option"/>

  • Output quality of the font for the text.
  • option can be antialiased, cleartype, default, draft, non-antialiased, or proof.

Tag Type

<TS tagtype="code"/>

  • Name of a tag type style to apply to this text style.
  • code is a single character tag type code.

Text Color

<TS clrTxt="value"/>

  • Color of the text.
  • value is the color values for the foreground and background, formatted as: [foreground;background].

Underline Color

<TS clrUnderline="value"/>

  • Color of the underline.
  • value is a single color value.

Strikeout Color

<TS clrStrikeout="value"/>

  • Color of the strikeout.
  • value is a single color value.

Overbar Color

<TS clrOverbar="value"/>

  • Color of the overbar.
  • value is a single color value.

Underbar Color

<TS clrUnderbar="value"/>

  • Color of the underbar.
  • value is a single color value.

Character Properties

<TS chrProp="option"/>

  • Character style of the text.
  • option can be one or more of the following values, with at most one value from each line:
    • bold
    • italic
    • superscript | subscript
    • hidden
    • revised
    • noindex
    • underline | dash-underline | dot-underline | double-underline | wave-underline
    • emboss | engrave | outline | shadow
    • smcaps | allcaps | smcaps-up
    • subword
    • tag
    • strikeout | dash-strikeout | dot-strikeout | double-strikeout | wave-strikeout
    • overbar | dash-overbar | dot-overbar | double-overbar | wave-overbar
    • underbar | dash-underbar | dot-underbar | double-underbar | wave-underbar
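
A text style and a paragraph style that refers to it might be paired as in this sketch. The style names and font choices are hypothetical, and only a single chrProp value is shown because the separator for multiple values is not described here.

```xml
<!-- Hypothetical heading text style: bold, with a fallback font face -->
<TS st="headingText" fFace="Georgia" fFaceAlt="Times New Roman" clrTxt="black;white" chrProp="bold"/>
<!-- Paragraph style that uses the text style through tSt -->
<PS st="heading" tSt="headingText" just="center"/>
```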

This is the lexicon element. This element is used to define separate categories that words in the document can be stored in. At least one <LEX> element must be defined. These are empty elements.

Style Name

<LEX st="string"/>

  • The name of the lexicon (required).
  • string is the name of the lexicon you wish to use.

Language

<LEX id="lang"/>

  • The logical language of all the words in the lexicon.
  • lang is the standard language name or standard language abbreviation for the lexicon.

Decimal Separator

<LEX dec="character"/>

  • The character that should be used as the decimal separator for the lexicon. If specified, then grp must also be specified.

Group Separator

<LEX grp="character"/>

  • The character that should be used as the numeric group separator for the lexicon. If specified, then dec must also be specified.

Ignore Characters

<LEX chrIgn="string"/>

  • List of ignore characters. These are characters that will not be included in the text of a word.
  • string is the characters you wish to ignore.

Break Characters

<LEX chrBrk="string"/>

  • List of break characters. These are characters that will automatically break a word.
  • string is the characters you wish to use as break characters.

No-break Characters

<LEX chrNobrk="string"/>

  • List of no-break characters. These are characters that will not automatically break a word.
  • string is the characters you wish to use as no-break characters.

Text Style

<LEX tSt="string"/>

  • Default style for the lexicon.
  • string is the name of the text style you wish to use.

Word Break Mode

<LEX wrdbrk="option"/>

  • Word breaking mode for this lexicon.
  • option can be hidden-nobrk, style-brk, or script-nobrk.
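
The sketch below shows two lexicons and a text style that sends its words to the second one. The names main, notes, and footnoteText are hypothetical.

```xml
<!-- Main lexicon: "." as decimal separator and "," as group separator (dec and grp are given together) -->
<LEX st="main" dec="." grp=","/>
<!-- Separate lexicon for footnote words -->
<LEX st="notes"/>
<!-- Text style that stores its words in the "notes" lexicon -->
<TS st="footnoteText" lexSt="notes"/>
```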

This is the hyperlink style element. These define general styles used for hyperlinks. It must be an empty element.

Style Name

<HLS st="string"/>

  • The name of the hyperlink style (required).
  • string is the name of the style you wish to use.

Hyperlink Type

<HLS type="option"/>

  • The hyperlink type.
  • option can either be icon or phrase.

Text Height

<HLS tHeight="value"/>

  • Height of the font used for displaying text.

Text Width

<HLS tWidth="value"/>

  • Width of the font used for displaying text (Not recommended).

Font Face

<HLS fFace="string"/>

  • Regular font face.
  • string is the name of the font face you wish to use.

Small Font Face

<HLS fFaceSm="string"/>

  • Font face used for a small format ETBU.
  • string is the name of the font face you wish to use.

Print Font Face

<HLS fFacePrint="string"/>

  • Font face used for printing.
  • string is the name of the font face you wish to use.

Font Family

<HLS fFamily="option"/>

  • Font family (Not recommended).
  • option can be decorative, default, modern, roman, script, or swiss.

Font Pitch

<HLS fPitch="option"/>

  • Font pitch (Not recommended).
  • option can be default, fixed, proportional, or variable.

Font Quality

<HLS fQuality="option"/>

  • Output quality of the font for the text.
  • option can be antialiased, cleartype, default, draft, non-antialiased, or proof.

Text Color

<HLS clrTxt="value"/>

  • Color of the text.
  • value is the color values for the foreground and background, formatted as: [foreground;background].

Underline Color

<HLS clrUnderline="value"/>

  • Color of the underline.
  • value is a single color value.

Strikeout Color

<HLS clrStrikeout="value"/>

  • Color of the strikeout.
  • value is a single color value.

Overbar Color

<HLS clrOverbar="value"/>

  • Color of the overbar.
  • value is a single color value.

Underbar Color

<HLS clrUnderbar="value"/>

  • Color of the underbar.
  • value is a single color value.

Character Properties

<HLS chrProp="option"/>

  • Character style of the text.
  • option can be one or more of the following values, with at most one value from each line:
    • bold
    • italic
    • superscript | subscript
    • hidden
    • revised
    • noindex
    • underline | dash-underline | dot-underline | double-underline | wave-underline
    • emboss | engrave | outline | shadow
    • smcaps | allcaps | smcaps-up
    • subword
    • tag
    • strikeout | dash-strikeout | dot-strikeout | double-strikeout | wave-strikeout
    • overbar | dash-overbar | dot-overbar | double-overbar | wave-overbar
    • underbar | dash-underbar | dot-underbar | double-underbar | wave-underbar

Emphasis

<HLS em="option"/>

  • Forces emphasis of the target.
  • option can either be yes or no.

Library

<HLS lib="option"/>

  • Forces the execution of the hyperlink to look in the Library first for a matching target.
  • option can either be yes or no.
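
A hyperlink style built from the attributes above might look like this sketch. The style name crossRef is hypothetical, and the color is borrowed from the clrHlink value in the <DS/> sample.

```xml
<!-- Hypothetical phrase-style hyperlink: underlined, colored, checks the Library first -->
<HLS st="crossRef" type="phrase" clrTxt="5,57,113;white" chrProp="underline" lib="yes"/>
```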

This is the level type element. It must be an empty element.

Level Type Code

<LVL code="code"/>

  • The single character code to be used for the level type (required).
  • code is a single character.

Level Name

<LVL name="string"/>

  • The name of the level type (required).
  • string is the name of the level type you wish to use.

Plural Name

<LVL plural="string"/>

  • The plural name of the level type (Recommended).
  • string is the plural name of the level type you wish to use.

Separator

<LVL sep="string"/>

  • String to use in the citation line just before displaying this type of level.
  • string is the character(s) you wish to use as a separator.

Text Style

<LVL tSt="string"/>

  • Text style to use for this level type.
  • string is the name of the text style you wish to use.

Hide

<LVL hide="option"/>

  • This attribute is used to keep reference codes of this type from appearing in the display of the text. It will not hide the reference code from citations or output operations.
  • option can either be yes or no.

Hide Output

<LVL hideOutput="option"/>

  • This attribute is used to omit reference codes of this type from citations in copy/paste/print operations. It will not hide them in the text.
  • option can either be yes or no.

Internal

<LVL internal="option"/>

  • This is the same as hide, except that this code and all its children are also not shown in the TOC or Citations.
  • option can either be yes or no.

Tag Type

<LVL tagtype="character"/>

  • Name of a tag type style to apply to this level type. The name and abbreviation of every level of this type will be indexed as the given tag type. If this is omitted, then the tag type defined in tSt will be used.

Lexicon Style

<LVL lexSt="string"/>

  • The name of the lexicon associated with this level type. Used only if the name will be indexed. If this is omitted, then the lexicon defined in tSt will be used if it is defined, otherwise the indexer will use the currently defined lexicon.
  • string is the name of the lexicon style you wish to use.
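
A three-level reference scheme might be declared as in the sketch below. The codes, names, and separators are illustrative choices, not values required by WordCruncher.

```xml
<!-- Citations would read something like "Genesis 1:1" with these separators -->
<LVL code="B" name="Book" plural="Books"/>
<LVL code="C" name="Chapter" plural="Chapters" sep=" "/>
<LVL code="V" name="Verse" plural="Verses" sep=":"/>
```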

This is the attribute element. These are used as attributes for reference levels. It must be an empty element.

Attribute Type Code

<ATTR code="code"/>

  • The single character code used for the attribute type (required).
  • code is a single character.

Attribute Name

<ATTR name="string"/>

  • The name of the attribute type (required).
  • string is the name of the attribute type you wish to use.

Plural Attribute Name

<ATTR plural="string"/>

  • The plural name of the attribute type (Recommended).
  • string is the plural name of the attribute type you wish to use.

Tag Type

<ATTR tagtype="character"/>

  • Name of a tag type style to apply to this attribute type. The name of every attribute of this type will be indexed as the given tag type.
  • character is a single character.

Lexicon Style

<ATTR lexSt="string"/>

  • The name of the lexicon associated with this attribute type. Used only if the name will be indexed. If this is omitted, then the lexicon defined in the tSt attribute of the Level Type for the given reference level will be used. If that is not defined, the indexer will use the currently defined lexicon.
  • string is the name of the lexicon style you wish to use.
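
An attribute type declaration might look like the sketch below; the code S and the name Speaker are hypothetical.

```xml
<!-- Hypothetical attribute type for marking who is speaking -->
<ATTR code="S" name="Speaker" plural="Speakers"/>
```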

This is the tag type element. It must be an empty element. You can specify a maximum of 14 different tag types.

Tag Type Code

<TAG code="code"/>

  • The single character code to be used for the tag type (required).
  • code is a single character.

Tag Type Name

<TAG name="string"/>

  • The name of the tag type (required).
  • string is the name of the tag type you wish to use.

Plural Tag Type Name

<TAG plural="string"/>

  • The plural name of the tag type (Recommended).
  • string is the plural name of the tag type you wish to use.

Expand Tags

<TAG expand="option"/>

  • If yes, showing this tag type will automatically cause the expansion of any generic tag sections.
  • option can either be yes or no.
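
A tag type declaration might look like the sketch below; the code N and the name Note are hypothetical.

```xml
<!-- Hypothetical tag type whose display also expands generic tag sections -->
<TAG code="N" name="Note" plural="Notes" expand="yes"/>
```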

This is the reference tree element. Up to eight reference hierarchies can be defined in a document. This element allows the author to define a name for each of these “trees”. It must be an empty element.

Index

<TREE idx="number"/>

  • The index of the reference tree (required).
  • number is an integer between 1 and 8.

Reference Tree Name

<TREE name="string"/>

  • The name to associate with the reference tree (required).
  • string is the name you wish to use for the tree.

Abbreviations

<TREE abrv="option"/>

  • If yes, then abbreviations will be shown in citations.
  • option can either be yes or no.

Show Citations

<TREE show="option"/>

  • If yes, then references from this tree will be shown in citations. If all trees are unspecified or set to no, then only the default tree will show.
  • option can either be yes or no.

Default Reference Tree

<TREE default="option"/>

  • Specifies the default reference tree. Only one tree can be marked as the default tree. If no trees are marked as the default, the first populated tree is used.
  • option can either be yes or no.
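As a sketch of how these attributes combine, the following defines two reference trees. The tree names are hypothetical; only the attribute names come from the descriptions above.

```xml
<!-- Tree 1 is the default tree and appears in citations, using abbreviations. -->
<TREE idx="1" name="Sections" abrv="yes" show="yes" default="yes"/>
<!-- Tree 2 is defined but left out of citations. -->
<TREE idx="2" name="Pages" show="no"/>
```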

This is the phrase group element. This defines global properties for any phrase group that is used in the document. It must be an empty element.

Index

<GRP idx="index"/>

  • The index of the phrase group (required).
  • index is an integer between 0 and 32000.

Lexicon Style

<GRP lexSt="string"/>

  • The lexicon name to use for the phrase group (required).
  • string is the name of the lexicon you wish to use.

The <DRM/> tag is called the copyright management tag because it allows publishers to set specific restrictions on a text. These restrictions prevent users from copying too much of a copyrighted book while still allowing them to copy modest snippets of text for research. A book must contain at least 1,000 words for the <DRM/> tag to take effect.

Sample <DRM/>

<DRM restrictHiddenText="yes" KWICStyleReferenceList="yes" maxWordSpan="10" minWordGap="5" outputThreshold="500" maxOutputWords="3000" />

Restrict Hidden Text from Output

<DRM restrictHiddenText="boolean" />

<DRM restrictHiddenText="yes" />

  • Prohibits hidden text from being output in any copy/print/export operation.
  • By default, hidden text that is copied appears as visible text in programs like Word.
  • boolean can be yes or no.

KWIC Style Reference List

<DRM KWICStyleReferenceList="boolean" />

<DRM KWICStyleReferenceList="yes" maxWordSpan="10" />

  • Changes the Search Results view into a KWIC (keyword in context) list rather than the default context-based view.
  • The maxWordSpan attribute is required when enabling the KWIC Style Reference List.
  • This feature is planned as a future development. It is not implemented yet.
  • boolean can be yes or no.

Define Search Results Word Span

<DRM maxWordSpan="integer" />

<DRM maxWordSpan="10" />

  • Defines the maximum number of words before and after any hit word/phrase that can be output from the Search Results or similar reports.
  • If KWICStyleReferenceList is enabled, this also defines the maximum number of words that can be shown before and after the hit word/phrase.
  • Defaults to 10.
  • If integer is zero, then output and display revert to normal.
  • The maximum value is 1,000, but in practice it should not exceed 100.

Require Word Gap between Search Results

<DRM minWordGap="integer" />

<DRM minWordGap="20" />

  • Prevents users from copying adjacent parts of the text, like when searching for all words in a text with *.
  • For situations where maxWordSpan applies, this is the minimum number of words that must occur between adjacent entries. If the gap is not satisfied by the next hit, then context around adjacent entries will be hidden until the gap is satisfied.
  • The hit word/phrase will always be output.
  • integer is a value between 1 and 4,000.
  • If the value is zero, then overlap of adjacent entries is allowed (default behavior).

Define Total Words Allowed in Single Output

<DRM outputThreshold="integer" />

<DRM outputThreshold="500" />

  • Defines the maximum number of words that can be output from the Book View in any one copy/print/export.
  • A warning will also be given each time cumulative output counts exceed this threshold.
  • If integer is zero, no limit will be established on individual output operations and warnings will be based on percentage of maxOutputWords.

Limit Total Words Allowed in All Outputs

<DRM maxOutputWords="integer" />

<DRM maxOutputWords="3000" />

  • Sets a limit on the total number of words that any one user is allowed to copy. For a normal-sized book, a typical value may be around 10% of the book.
  • If the value is defined and is zero, all output is disabled. If this value is not defined, per-user totals are not recorded. However, warnings may still be given if the output threshold (outputThreshold) is exceeded.
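As a rough illustration of the 10% guideline, assume a hypothetical book of 30,000 words. Ten percent of 30,000 is 3,000, so the limits might be set as follows. The word count and the 500-word threshold are assumptions chosen for the example, not recommended values.

```xml
<!-- Hypothetical 30,000-word book: 30,000 x 0.10 = 3,000 words in total per user. -->
<!-- Any single copy/print/export from the Book View is capped at 500 words. -->
<DRM outputThreshold="500" maxOutputWords="3000" />
```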

This is the font embedding element. It must be an empty element. Use this element to embed a font that is used within the document in the ETBU file. The font will be extracted and installed when the document is opened in the viewer. This installation of the font is not permanent. It will be removed when the book is closed. Consequently, if a user copies text that uses a font not on their computer, programs like Word will look for an alternative font to use.

Font Family

<EMBED fFamily="string"/>

  • The family name of the font. For example: “Lato” or “Quivira” or “Roboto”. Do not specify a specific style such as bold or italics. The indexer will scan the system and embed all style variants of the requested font family (required).
  • string is the name of the font family you wish to use.

These elements provide information needed to translate citations from a previous version of the file to the current version. For instance, if in a previous ETB version (ver. 5) a particular citation was: “/Introduction” and this was changed to “/Title/Introduction” in the new version (ver. 7), this element will provide enough information to make this translation. This information is used primarily during the upgrade of note files that were attached to previous versions of the document in order to translate old citations to the new ones so that the new position of the notes can be located. It is NOT used in the indexing of the document.

Each <UPGRADE> element includes one or more empty <REF/> elements. Each <REF/> element defines one citation translation. Multiple (up to 4) <UPGRADE> elements can be included, one for each previous version. However, at this point there is only one previous version that has been released.

Attributes <UPGRADE></UPGRADE>

Name

Values

Description

ver

5, 7

The previous file format version. This is usually 5 (non-Unicode) or 7 (Unicode).

file

File title. String[63]

This is the previous file title (i.e. the file name without the extension). (required).

codepage

Code page integer identifier.

1250 – Central European (Windows)

1251 – Cyrillic (Windows)

1252 – Western European (Windows)

1253 – Greek (Windows)

1254 – Turkish (Windows)

1255 – Hebrew (Windows)

1256 – Arabic (Windows)

1257 – Baltic (Windows)

Others…

This is the codepage that will be used to translate the old citation to Unicode. (required). (May only be needed for version 5.)

edition

Integer

Edition of the file. Note: You also have to add the edition attribute to <etax>.

Attributes <REF/>

Name

Values

Description

old

Citation (without offset)

This is the citation in the previous version that is in need of translation. It can be a partial citation. (required).

new

Citation (without offset)

This is the new citation. When a citation is in need of translation, if an old citation is found, then it will be replaced with the new citation. A citation is considered a match if the old citation completely matches the citation being translated (up to the number of levels defined). If the citation in question has levels beyond the match, they are concatenated onto the end of the translated citation. All entries will be checked and the longest match will be used for translation. (required).

children

yes | no

If yes, then the citation will only be translated if the citation in question has child levels beyond the match.

codepage

Code page integer identifier.

1250 – Central European (Windows)

1251 – Cyrillic (Windows)

1252 – Western European (Windows)

1253 – Greek (Windows)

1254 – Turkish (Windows)

1255 – Hebrew (Windows)

1256 – Arabic (Windows)

1257 – Baltic (Windows)

Others…

This is the codepage that will be used to translate the old citation to Unicode. This will override the codepage specified in the <UPGRADE> element.
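Using the “/Introduction” example above, a translation from a version 5 file might be sketched as follows. The file title book and the code page 1252 are assumptions made for the illustration.

```xml
<UPGRADE ver="5" file="book" codepage="1252">
  <!-- Old citations that begin with /Introduction are rewritten to /Title/Introduction. -->
  <REF old="/Introduction" new="/Title/Introduction"/>
</UPGRADE>
```

With this entry, a hypothetical old citation such as /Introduction/Paragraph 3 would match on its first level, and the remaining level would be appended to the translation, giving /Title/Introduction/Paragraph 3.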

Attributes <EMBED/>

Name

Values

Description

fFamily

Font Family Name

Example:

<EMBED fFamily="Quivira"/>

Embed a custom font that is installed on your computer.

This is the index options element. It must be an empty element.

Attributes

Name

Values

Description

wrdbrk

Any combination of the following values:

hidden-nobrk | style-brk | script-nobrk

Word breaking options. This is the default for the whole document and can be overridden by the same attribute in the <LEX> element.

stopwrds

stop | go

If a stopword file exists, this tells whether the words are stopwords or gowords.

comp

Any combination of the following values:

off | on | text | index

Compression options. Default is on.

Paragraph Elements

The body of an ETAX document consists of a list of paragraph elements. There are currently two types of paragraphs: normal and table.

This is the normal paragraph element. This paragraph element can have any attributes that a <PS/> element can have except the name attribute. These attributes become overrides to the currently active paragraph style. The <p> element can also include the following attribute:

Paragraph Style

<p st="string"/>

  • The name of the paragraph style to use. If omitted the default paragraph style is assumed (i.e. the first paragraph style listed in the <sifx> element).
  • string is the name of the style you wish to use.

This is the table paragraph element. It is considered an alternate type of paragraph and marks the start of a table. The <ptbl> element can have any of the attributes that the <p> element can have plus the following:

Column Width

<ptbl col="measurements"/>

Sample tables with three columns:

<ptbl col="*;*;*"/>

<ptbl col="50%;20%;30%"/>

<ptbl col="p:12,10;20%;*"/>

  • The width of each column in the table. It is recommended the number of columns in a table be limited to no more than 100.
  • measurements is a list of measurement values separated by semicolons. Each measurement value can be replaced with an asterisk (*), for an automatically calculated column width, or with a percentage, for a width based on a percentage of the line width.
  • NOTE: In a Word XML file, if any cell in a column has a different cell margin than the default for the table, an asterisk (*) will be used as the width for the column.

Minimum Column Width

<ptbl colMin="value"/>

  • The minimum width of any column in the table.
  • value is a measurement value.

Table Type

<ptbl tblType="option"/>

  • Visual style of the table.
  • option can either be flat or 3d.

Cell Vertical Alignment

<ptbl valign="option"/>

  • Vertical alignment of the text in each cell.
  • option can be top, center, or bottom.

Horizontal Padding

<ptbl hpad="value"/>

  • Horizontal internal cell padding.
  • value is a measurement value.

Vertical Padding

<ptbl vpad="value"/>

  • Vertical internal cell padding.
  • value is a measurement value.

Spacing

<ptbl spc="value"/>

  • Spacing between adjacent cells.
  • value is a measurement value.

Border

<ptbl bdr="value"/>

  • Border width.
  • value is a measurement value.

Inner Border

<ptbl inbdr="value"/>

  • Inner border width.
  • value is a measurement value.

Border Color

<ptbl clrBdr="value"/>

  • Color of the borders.
  • value is the color value for the border.

The Table Row Element <trow></trow>

This is the table row element. It defines each row of a table. This element contains a list of <tcell> elements. Any text contained within a <trow> element is ignored unless it is within a child <p> element. This element can have any of the same attributes that a <tcell> element can have, except for spanCol and spanRow. These attributes will apply to each table cell on the row unless specifically overridden by the <tcell> element.

The Table Cell Element <tcell></tcell>

This is the table cell element. It defines each cell contained in a table row. This element contains a list of paragraph elements (either <p> or <ptbl> elements). Any text contained within a <tcell> element is ignored unless it is within a child <p> element. This element can contain the following attributes:

Cell Vertical Alignment

<tcell valign="option"/>

  • Vertical alignment of the text in the cell.
  • option can be top, center, or bottom.

Horizontal Padding

<tcell hpad="value"/>

  • Horizontal internal cell padding.
  • value is a measurement value.

Vertical Padding

<tcell vpad="value"/>

  • Vertical internal cell padding.
  • value is a measurement value.

Column Span

<tcell spanCol="number"/>

  • Number of columns for this cell to span. This number cannot exceed the number of columns left in the row. If a cell spans multiple columns, the spanned cells are NOT omitted.
  • number is an integer between 2 and 32.

Row Span

<tcell spanRow="number"/>

  • Number of rows for this cell to span. If a cell spans multiple rows, the spanned cells on the following rows are NOT omitted.
  • number is an integer between 2 and 32.

Background Color

<tcell clrBk="value"/>

  • Background color of the cell. This color overrides the background color of a table.
  • value is the color value for the background.

Left Border

<tcell bdrL="value"/>

  • The border to use for the left of the cell (see <PS/>).
  • value is the border you wish to use.

Right Border

<tcell bdrR="value"/>

  • The border to use for the right of the cell (see <PS/>).
  • value is the border you wish to use.

Top Border

<tcell bdrT="value"/>

  • The border to use for the top of the cell (see <PS/>).
  • value is the border you wish to use.

Bottom Border

<tcell bdrB="value"/>

  • The border to use for the bottom of the cell (see <PS/>).
  • value is the border you wish to use.
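Putting the table elements together, a small table might look like the sketch below. It assumes that <trow> elements are nested directly inside the table paragraph element, which the descriptions above imply but do not state outright. The cell text and attribute values are illustrative only.

```xml
<ptbl col="50%;20%;30%" valign="top">
  <!-- An attribute on the row applies to each of its cells unless a cell overrides it. -->
  <trow valign="center">
    <tcell><p>Name</p></tcell>
    <tcell><p>Age</p></tcell>
    <tcell valign="bottom"><p>Salary</p></tcell>
  </trow>
  <trow>
    <!-- This cell spans two columns. Following the note that spanned cells are -->
    <!-- not omitted, an empty cell is still written for the spanned column. -->
    <tcell spanCol="2"><p>Leroy Jenkins</p></tcell>
    <tcell/>
    <tcell><p>$95,000</p></tcell>
  </trow>
</ptbl>
```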

Inside a paragraph definition (<p>) all printable characters will be included in the text of a paragraph. Any character with a Unicode value less than a SPACE (i.e. tabs, line feeds, carriage returns, etc.) will be ignored. This allows for some formatting of the paragraph text in the XML document. Several empty elements are used instead of these characters:

Insert Tab Character

<p>Leroy<tab/>Jenkins<tab/>24<tab/>$95,000</p>

  • <tab/> adds a tab character.
  • The measurements for tab spacing can be set in two ways:
    1. Adding a tabs attribute directly on the <p> tag. For example: <p tabs="p:L12;L24;R36;R48">...</p>
    2. Adding a st (style name) attribute to the <p> tag and defining the tabs attribute in the assigned <PS/> tag.
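The second approach can be sketched as follows. The style name Columns is hypothetical, and the <PS/> element is assumed to sit in the <sifx> section with the other style definitions.

```xml
<!-- In the sifx: a paragraph style that carries the tab stops. -->
<PS name="Columns" tabs="p:L12;L24;R36;R48"/>

<!-- In the body: the paragraph picks up the tab stops through its style. -->
<p st="Columns">Leroy<tab/>Jenkins<tab/>24<tab/>$95,000</p>
```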

Elements

  • <br/>: Inserts a hard return. This does NOT break the logical paragraph.
  • <w/>: Inserts a hard word break.
  • <sp/>: Inserts a hard non-breaking space.
  • <xsp/>: Inserts a hard breaking space.
  • <zs/>: Inserts a zero width space.
  • <l/>: Inserts a left indent. This is a tab that also sets the left indent property for the rest of the paragraph. NOT RECOMMENDED. Use the paragraph style indents instead.
  • <r/>: Inserts a right indent. The same as the left indent, except that it extends from the right side of the paragraph. NOT RECOMMENDED. Use the paragraph style indents instead.
  • <d/>: Inserts a double indent. This is equivalent to inserting both a left and a right indent simultaneously. NOT RECOMMENDED. Use the paragraph style indents instead.
  • <lm/>: Inserts a Unicode LTR (Left to Right) mark (U+200E).
  • <lo/>: Inserts a Unicode LTR override mark (U+202D).
  • <le/>: Inserts a Unicode LTR embedding mark (U+202A).
  • <rm/>: Inserts a Unicode RTL (Right to Left) mark (U+200F).
  • <ro/>: Inserts a Unicode RTL override mark (U+202E).
  • <re/>: Inserts a Unicode RTL embedding mark (U+202B).
  • <pdf/>: Inserts a Unicode PDF (Pop Directional Format) mark (U+202C). This is used to terminate any of the above directional override or embedding modes (<lo/>, <le/>, <ro/>, <re/>).
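A short sketch of a few of these elements in use. The paragraph text is invented for the example.

```xml
<!-- A hard return inside one logical paragraph, and a non-breaking space. -->
<p>First line of the address<br/>Second line, Suite<sp/>200</p>

<!-- A right-to-left embedding, closed with a Pop Directional Format mark. -->
<p>The Hebrew word <re/>שלום<pdf/> means peace.</p>
```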

This element (<T>) is used to set the current text style. This is NOT an empty element. The text style will remain in effect for all text that is contained within the element and will override any text style specified in the paragraph element or the currently active paragraph style. If no text style element is output, or if text occurs outside this element, the text style specified in the paragraph or paragraph style will be used.

Text Style

<T st="string">...</T>

  • The name of a <TS> record in the sifx (required).
  • string is the name of the text style you wish to use.

Every attribute of a text style can be overridden individually by several Text Style Override Elements. These elements are NOT empty elements. Their attributes remain in effect for all text that is contained within the element. These elements are listed below:

Elements

  • <b>: Turns bolding on/off.
  • <i>: Turns italics on/off.
  • <s>: Turns superscript or subscript on/off.
  • <u>: Turns underline on/off.
  • <o>: Turns strikeout on/off.
  • <ob>: Turns overbar on/off.
  • <ub>: Turns underbar on/off.
  • <c>: Turns all caps or small caps on/off.
  • <e>: Turns on a special effect like embossing, engraving, outline, or shadow.
  • <x>: Turns indexing on/off.
  • <rev>: Turns revised text on/off.
  • <h>: Turns hidden text on/off.
  • <f>: Changes the font.
  • <lex>: Changes the currently active lexicon.
  • <sz>: Changes the size of the text.
  • <tt>: Changes the current tag type.
  • <t>: Turns the general tagtype flag on/off.
  • <sw>: Turns the subword flag on/off. All text in this element will be indexed as a subword.
  • <cf>: Changes the foreground color.
  • <cb>: Changes the background color.
  • <cu>: Changes the underline color.
  • <co>: Changes the strikeout color.
  • <cob>: Changes the overbar color.
  • <cub>: Changes the underbar color.
  • <ch>: Forces hard characters. Characters with this style cannot be delimiters. See “Hard Characters Element” below.

Many of the above elements use similar attributes. We will explain each below in groups.

Attributes <b><i><x><rev><h><t><sw>

Name

Values

Description

val

on | off

This overrides the corresponding text style property and turns the property either on or off. By default (or if this attribute is omitted), the property is turned on.

Attributes <s>

Name

Values

Description

val

super | sub | off

This overrides the script text style property and either turns off any scripting, or turns on superscript or subscript. By default (or if this attribute is omitted), superscript is turned on.

Attributes <u><o><ob><ub>

Name

Values

Description

val

single | dash | dot | double | wave | off

This overrides the corresponding text style property and sets the line style accordingly. By default (or if this attribute is omitted), the single line style is used.

Attributes <c>

Name

Values

Description

val

small | all | up | off

This overrides the caps text style property and sets the style accordingly. By default (or if this attribute is omitted), small caps is used. The up style will show the text as small caps, but index the text as all caps.

The all style shows the word in all caps in both the text and the WordWheel.

Attributes <e>

Name

Values

Description

val

outline | emboss | engrave | shadow | off

This overrides the effect text style property and sets the style accordingly. (required)

Attributes <f>

Name

Values

Description

fFace

String

Regular font face

fFaceSm

String

Font face used for a small format ETBU.

fFacePrint

String

Font face used for printing.

fFamily

decorative | default | modern | roman | script | swiss

Font family. (Not recommended).

fPitch

default | fixed | proportional | variable

Font pitch. (Not recommended).

fQuality

antialiased | cleartype | default | draft | non-antialised | proof

Output quality of the font for the text.

Attributes <lex>

Name

Values

Description

st

The name of a <LEX> record in the sifx.

This changes the currently active lexicon. Any words in this element will be indexed into that lexicon.

Attributes <sz>

Name

Values

Description

val

Either a measurement value, or a percentage, or one of the following:

s | v | t | L | V | H

These represent small, very small, tiny, large, very large, and huge sizes relative to the current base font size.

This is a typical measurement value or a percentage. If this is a percentage then the size of the text is calculated dynamically by a percentage of the current window size. This is useful for writing title pages.

Attributes <tt>

Name

Values

Description

st

The code defined in a <TAG> record in the sifx. Single Character

This changes the currently active tag type.

Attributes <cf><cb><co><cu><cob><cub>

Name

Values

Description

val

A color value

This overrides the corresponding color property.
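The override elements can be combined inside a paragraph or a <T> element. The following sketch uses only the val values listed above. The text style name Quote is hypothetical and would need a matching <TS> record in the sifx.

```xml
<!-- A named text style with a subscript override inside it. -->
<p><T st="Quote">Water is written H<s val="sub">2</s>O.</T></p>
<!-- Bold is on by default when val is omitted, and is switched off for one phrase. -->
<p><b>Bold text with <b val="off">one plain phrase</b> inside.</b></p>
<!-- A line style, a caps style, and a relative size. -->
<p><u val="double">Double underline</u>, <c val="all">all caps</c>, and <sz val="L">large text</sz>.</p>
```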

Reference level elements <R/> are used to define a hierarchical structure for the file. This structure is used as an address in hyperlinks so that the viewer can position to the correct location when a hyperlink is taken. Up to eight different hierarchy “trees” can be defined in a single document, and these “trees” can overlap. For instance, you may want a section/sub-section hierarchy, a book/chapter/verse hierarchy, and a page/paragraph hierarchy all in the same document. XML has no way to express multiple overlapping hierarchies through element nesting, so these elements are implemented as empty elements. When a reader encounters one of these elements, it must record the level and tree and keep this reference active until the next element for the same tree is found.

If you want to index the values of a level, please see the <LVL> element in the <sifx> section.

Attributes <R/>

Name

Values

Description

ref

A reference definition in the following format:

l,d[,t]:name

Where: l= Code for a <LVL> record.

Where: d= Number of the tree depth.

Where: t= The tree number (1-8).

Where: name= The name of the level.

This is the definition of this reference code. For instance, if we are defining a reference level for the title page of a document in the first reference tree we could define it as such:

ref="S,1:Title Page"

The first paragraph in the title page could be:

ref="P,2:1"

This assumes that <LVL> records were defined in the sifx that have the codes ‘S’ and ‘P’. Note that the level number for the title page is 1, while the level number of the paragraph is 2. This means that the paragraph is below (or part of) the title page in the hierarchy. Notice also that the tree number has been omitted. The default tree is always the first tree. Any text after the colon is the name of the level. This name may be omitted, however, if it is, then no text will be displayed in the table of contents for this reference level (required).

abrv

String

An abbreviation for the name of the level.

hide

yes | no

This attribute is used to keep reference codes of this type from appearing in the display of the text. It will not hide the reference code from citations or output operations.

hideOutput

yes | no

This attribute is used to omit reference codes of this type from citations in copy/paste/print operations. It will not hide them in the text.

internal

yes | no

This is the same as hide, except that this code and all its children are also not shown in the TOC or in citations.

attr

A reference attribute definition in the following format:

a:name[;a:name]*

Where: a= Code for a <ATTR> record.

Where: name= The name of the attribute.

Each reference level can be given attributes defined by the author. These attributes consist of a type (defined in the sifx) and a name. For instance, you may want to give a “Topic” to a section:

attr="T:Budget"

You may give more than one attribute:

attr="T:Budget;S:President"

This assumes that <ATTR> records were defined in the sifx that have the codes ‘T’ and ‘S’ for possibly “Topic” and “Speaker” respectively. These categories are completely author defined and can be used to help limit or bound searches.
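As a sketch, the title page example above could appear in the body as follows. It assumes that <LVL> records with the codes S and P and an <ATTR> record with the code T exist in the sifx, and that the <R/> elements are placed inside the paragraphs they mark. The placement and the abbreviation TP are assumptions for the illustration.

```xml
<!-- Level 1 of the default (first) tree: the title page, with a topic attribute. -->
<p><R ref="S,1:Title Page" abrv="TP" attr="T:Budget"/>Annual Report</p>
<!-- Level 2 of the same tree: paragraph 1, with its reference code hidden in the text display. -->
<p><R ref="P,2:1" hide="yes"/>Prepared by the finance office.</p>
```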

The <Re/> element is used to terminate the last level for a given tree. By default, once another reference element is encountered for a given level and tree, any other elements currently active for that tree that have a level greater than or equal to that of the new element are automatically terminated. You may use this element to terminate a level manually.

Attributes <Re/>

Name

Values

Description

tree

A tree number (1-8)

This is the tree that should terminate the last reference level. By default (if this attribute is omitted) the first tree is used.
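A minimal sketch of manual termination, assuming a hypothetical second tree for pages whose <LVL> code is G:

```xml
<!-- Start page 12 at depth 1 of tree 2. -->
<p><R ref="G,1,2:12"/>Text that belongs to page 12.</p>
<!-- Manually end the last active level of tree 2. -->
<p><Re tree="2"/>Text that follows the end of the page reference.</p>
```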

There are several different types of hyperlinks:

  1. Cross reference hyperlinks. These are used to jump from one section of text to another. The destination may also be in a completely different document.
  2. Image hyperlinks. These are used to display a picture or image in a separate window.
  3. Shell hyperlinks. These are used to spawn a separate application, start an email, or open a web browser.
  4. DDE hyperlinks. These are used to control a second application.

The type of hyperlink can be determined from a code in the st attribute. Other attributes are used to define other properties of the hyperlink. Depending on the style of the hyperlink (either phrase or icon as defined in the <HLS> record in the sifx) the hyperlink is either an empty element (icon) or not (phrase). If the hyperlink is not an empty element, clicking on any text in the hyperlink element will execute the hyperlink.

Attributes <H/>

Name

Values

Description

st

The style of the hyperlink in the following format:

t:name

Where: t= X | I | S | D

Where: name= The name of an <HLS> record.

The type codes are defined as follows:

X= A cross-reference hyperlink

I= An image hyperlink

S= A shell hyperlink (calls the ShellExecute function in Windows)

D= A DDE hyperlink

(required).

file

A path to a file, usually relative to the location of the current document. The following macros may be used to specify additional paths:

%TEXT% - The current document path.

%PROGRAM% - The WordCruncher program path.

This is used to specify an external file in:

  • A cross-reference hyperlink (Optional)
  • An image file in an image hyperlink (required)
  • A file, web page, e-mail address, etc. for a shell hyperlink (required)

It is not used for DDE hyperlinks - (st="X|I|S").

fileAux

One or more paths to files using the same rules as the file attribute. Files are specified using the following format:

file[;file]*

Auxiliary files used to further define which book from a library should be used as the target. If the desired target is a Book Set, then this attribute can contain the other files in the set. The software will attempt to match each auxiliary file to other files in the book set.

This is only used in cross-reference hyperlinks (Optional) - (st="X").

cit

A forward slash (/) delimited string of reference names optionally prefaced by a tree number and optionally terminated by a word offset or a reference gap number:

[t:]/name[/name]*[ (:word[,subword [,tagword]]) | (#gap)]

This is the destination reference hierarchy path to position to. Examples might be:

cit="/Section 1/Sub-Section 5"

cit="2:/Page 1/Paragraph 3"

cit="/Section 1/Sub-Section 5:3"

This is only used for cross-reference hyperlinks - (st="X").

citRng

One or more citation ranges delimited by semicolons:

cit[-cit][;cit[-cit]]*

Where cit is the same as in the cit attribute, without the optional tree number. Also, the cit can be a relative citation based on the citation in the cit attribute.

This is used to emphasize a range of words or references when a hyperlink is taken. Each citation can be a full citation as defined in the cit attribute (without the optional tree number). Or it can be a relative citation based on the citation given in the cit attribute. If it is a relative citation, the cit should NOT be prefaced by a forward slash, and can optionally have one or more “..” levels which will remove one child level from the base citation. For instance:

cit="/Section/1" citRng="1-2"

will emphasize /Section/1 through /Section/2.

cit="/Section/3/7" citRng="6-../4/9"

will emphasize /Section/3/6 through /Section/4/9.

This is only used for cross-reference hyperlinks - (st="X").

idx

Number

Some image formats can contain multiple images. This is the index of the image in the file. The default is the first image.

This is only used with image hyperlinks - (st="I").

page

Number

Some image formats can contain multi-page images. This is used to specify a particular page.

This is only used with image hyperlinks - (st="I").

rect

This is a rectangle measurement value:

m:(x,y,w,h)

Where m=t|p|i|c

t->twips

p->points

i->inches

c->centimeters

Where x,y,w,h= Real numbers.

These correspond to the left, top, width, height dimensions of the source image.

This is used to crop the output of an image.

It is only used with image hyperlinks - (st="I").

op

String

This is an OLE verb such as “open” or “print”.

It is only used with shell hyperlinks - (st="S").

cmd

String

This is a user defined command string.

It is used with shell hyperlinks and DDE hyperlinks - (st="S|D").

If path represents a data file, this should be omitted. The system will find the appropriate executable to show the file if it has been properly registered.

path

A path. The same macros that were used for the file attribute can be used here.

This is a path to use.

It is only used with shell hyperlinks - (st="S").
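The sketch below shows one phrase-style hyperlink for each of three types. The <HLS> style names (Link, Web, Picture), the file names, and the link text are hypothetical, and the element is written as <H> following the attribute table heading.

```xml
<!-- Cross-reference: positions to a citation in the current document. -->
<p>See <H st="X:Link" cit="/Section 1/Sub-Section 5">Sub-Section 5</H>.</p>
<!-- Shell: opens a web page using the "open" verb. -->
<p>Visit <H st="S:Web" file="https://www.example.com" op="open">our website</H>.</p>
<!-- Image: shows a particular page of a multi-page image file. -->
<p>Open <H st="I:Picture" file="scans.tif" page="2">the scanned page</H>.</p>
```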

Inline images are usually empty elements. If no file attribute is given for an inline image, then the text inside the element must consist of a base64 encoded image. No other text is allowed inside an inline image hyperlink.

Attributes

Name

Values

Description

file

A path to a file, usually relative to the location of the current document. The following macros may be used to specify additional paths:

%TEXT% - The current document path.

%PROGRAM% - The WordCruncher program path.

This is used to specify an external image file in an inline image (Optional). If this is omitted for an inline image then the text of the element must be a base64 encoded image.

idx

Number

Some image formats can contain multiple images. This is the index of the image in the file. The default is the first image.

page

Number

Some image formats can contain multi-page images. This is used to specify a particular page.

dim

The display dimensions of the image in a paired measurement value format:

m:(x,y)[,(x,y)]

Where m=t|p|i|c

t->twips p->points

i->inches c->centimeters

Where x,y= Real numbers.

The second optional numbers are for small format ETBU.

This is the logical output dimensions of the image. (required)

If one of the values is zero, the value will be automatically calculated from the image aspect ratio. If both values are zero, the image size will be calculated from the stored resolution of the image.

desc

String

Image description used in the ETGU.

Omit if no image data!

rect

This is a rectangle measurement value:

m:(x,y,w,h)

Where m=t|p|i|c

t->twips p->points

i->inches c->centimeters

Where x,y,w,h= Real numbers.

These correspond to the left, top, width, height dimensions of the source image.

This is used to crop the output of an image.

These elements are used to place small comments or notes above or below the main text in the document. They are commonly used to place furigana text in Japanese documents. The ruby text is positioned relative to any text that is between the begin element (<rt/>) and the end element (<rte/>). Both of these must be empty elements.

Attributes

Name

Values

Description

val

String

This is the ruby text to use. (required).

idx

yes | no

If the ruby text is included within one logical word, the base characters can be replaced with the ruby text character and the new word indexed as well.

disp

top | bottom | off

The location to place the text.

just

center | left | right

The justification of the ruby text in relation to the base characters.

sz

Percentage

Size of the ruby text based on a percentage of the base font. A comma separated second value may be included for use in small format ETBUs.

pos

Measurement value

Ruby text will be automatically placed above or below the text. If additional space is desired, this value can be used to raise or lower the ruby text.

st

String[63]

A text style upon which to base the ruby text.

lex

String[63]

A lexicon in which to place any indexed words.
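As an illustration, a furigana annotation might be marked up as in the sketch below. The Japanese sample text and the chosen attribute values are assumptions made for this example; only the element and attribute names come from the table above.

```xml
<!-- Sketch only: the sample text and attribute values are illustrative. -->
<!-- <rt/> opens the annotated span and carries the ruby text in val; -->
<!-- <rte/> closes it. Both are empty elements. -->
<p><rt val="かんじ" disp="top" just="center"/>漢字<rte/>を読む。</p>
```

Here the ruby text かんじ is intended to appear centered above the base characters 漢字, while the rest of the sentence is left unannotated.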

This element can be used to change the default behavior of the word parser in the WordCruncher Indexer program. The word parser does a very good job of finding the appropriate boundary between words. However, there may be times when the Indexer selects a word boundary that is not optimal for a particular situation. Any text inside a <ch> element (including whitespace) will be part of the current word and will not delimit the word. Please note that a word may still be terminated by the end of a paragraph, a change in lexicon, or any other markup (<tab/> for instance) that would otherwise place a physical break in the word. Likewise, the word may not be automatically terminated at the end of the element if it is not immediately followed by a delimiter.
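A minimal sketch of the whitespace case follows. The sentence is an assumption chosen for illustration; the point is only that the space inside <ch> is treated as part of the word.

```xml
<!-- Sketch only: without <ch>, the space would end the word after "New". -->
<p>She moved to New<ch> </ch>York last year.</p>
```

With this markup, "New York" should be indexed as a single word rather than as "New" and "York".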

This element is used to create phrasal groups of words, which are displayed on the word wheel. Since formatting elements can be used within these groups, and the groups cannot be split up, these are implemented as empty elements. When the reader encounters a group beginning element <g/>, this group must remain active until the corresponding group ending element <ge/> is found. The exclusion elements (<gx/> and <gxe/>) work similarly, except that they are used to exclude words from the middle of a phrase.

Attributes

Name

Values

Description

idx

Number between 1 and 32767

This is the index of the group. It is used to match up <g/>-<ge/> and <gx/>-<gxe/> elements, which is useful if nested groupings are desired. If this is omitted, the index defaults to zero.

lex

String

A lexicon name in which to place the indexed phrase. This attribute is not used for the <ge/>, <gx/> and <gxe/> elements. If this is omitted, the lexicon will default to either the lexicon specified in the <GRP> element in the sifx, or to the active lexicon when the element was first encountered.

pos

Part of Speech Code

The part of speech used for the whole phrase group, if applicable.
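The two paragraphs below sketch how these elements might be used. The sentences and idx values are illustrative, and the sketch assumes that idx is repeated on the ending and exclusion elements to pair them with the group they belong to.

```xml
<!-- Sketch only: sentences and idx values are illustrative. -->

<!-- A simple phrase group: "By and large" is indexed as one phrase. -->
<p><g idx="1"/>By and large<ge idx="1"/>, the plan worked.</p>

<!-- A group with an exclusion: "the radio" is left out of the phrase, -->
<!-- so the indexed phrase is intended to be "turn off". -->
<p>Please <g idx="1"/>turn <gx idx="1"/>the radio <gxe idx="1"/>off<ge idx="1"/> now.</p>
```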

This element is used to include another file. If the included file has its own <sifx> element, it will be ignored. Consequently, indexer warnings/errors arise when the included file has SIFX elements that the main ETAX file does not have defined in its SIFX element. This element cannot be a child element of a <p> or <ptbl> element.

Sample Include Tag

<include file="pathName"/>

<include file="include/file1.etax"/>

  • Embeds an external file into the ETAX. You can embed ETAX, RTF, TXT, or XML 2003 files.
  • pathName is a string giving the included file's path relative to the main ETAX file.
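As a sketch, a main ETAX file that pulls its content from two included files might look like this. All file names are hypothetical; the external SIFX reference follows the pattern shown for the <etax> element.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: all file names are hypothetical. -->
<etax sifx="book.sifx">
  <!-- Each include is a direct child of <etax>, never of <p> or <ptbl>. -->
  <!-- Paths are relative to this main ETAX file. -->
  <include file="include/chapter1.etax"/>
  <include file="include/chapter2.etax"/>
</etax>
```

Because any <sifx> element inside an included file is ignored, every style used by the included files needs to be defined in book.sifx.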

Part of Speech Tagging

WordCruncher has the ability to attach a part of speech code to each word and include it with the word information in the index (WordWheel). These part of speech codes are built with predefined tags (see below for a chart of tags) which can be organized in a hierarchical manner. For instance, the part of speech code for a singular, common noun can be written with the three tags: “n.comm.sing”. While WordCruncher provides a rich set of predefined part of speech tags, there are occasions where they may be insufficient to fully describe a part of speech code. In these cases, some user defined tags can be added.

The Indexer does not currently tag texts for part of speech. We recommend using an external part of speech tagger such as Stanford NLP or TreeTagger to tag your texts. Python libraries such as NLTK, Stanza, or spaCy are also useful for tagging texts. These part of speech taggers produce abbreviated markup codes like NN1 for singular nouns and NN2 for plural nouns. WordCruncher can also accept these markup codes for parts of speech, but an XML file called the EPOSX is needed to define how these markups are translated to WordCruncher part of speech codes.

Below are a few example EPOSX files for download. They can be used or modified to fit your needs.

C5.eposx

TreeTagger-DE.eposx

stanza-en.eposx

We encourage you to start from one of the EPOSX files on our GitHub when creating your custom EPOSX. However, here is the basic template needed for the file.

<eposx title="C5" wordSeparator="_" ambiguitySeparator="-">

<userTags>

<tag code="wh" name="Wh- Words" />

</userTags>

<markups>

<markup text="AJC" code="adj.comp" />

</markups>

</eposx>

Attribute

Example Values

Description

title

C5, C8, Stanza-English

Give a name to the type of part of speech, preferably in reference to the name of the tagger used.

wordSeparator

_ (default), /

The character that is used to mark a word with the part of speech. If an underscore is the word separator, then your text should look like “word_NN0”.

ambiguitySeparator

- (default), ?

This character is used within part of speech codes. Taggers may provide two parts of speech if there is ambiguity. If a hyphen is used for the ambiguity separator, then your text should look like “swimming_NN0-VBG”.

WordCruncher books are prepared with an underscore as the word separator and a hyphen as the ambiguity separator, as shown below:

<eposx title="C5" wordSeparator="_" ambiguitySeparator="-">

<eposx title="C8" wordSeparator="_" ambiguitySeparator="-">
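If your tagger output uses other separators, the same attributes can be changed. The sketch below assumes a slash as the word separator and a question mark as the ambiguity separator (both appear as example values above). The title value and the two markup mappings are illustrative assumptions.

```xml
<!-- Sketch only: title and mappings are illustrative assumptions. -->
<eposx title="Custom" wordSeparator="/" ambiguitySeparator="?">
  <userTags>
    <tag code="wh" name="Wh- Words" />
  </userTags>
  <markups>
    <markup text="NN0" code="n" />
    <markup text="VBG" code="v" />
  </markups>
</eposx>
```

With this header, tagged text would be written as “swimming/NN0?VBG” instead of “swimming_NN0-VBG”.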

WordCruncher has predefined tags that must be the first tag listed in each part of speech code. We call these “Primary” tags. Primary tags can also be used as “Secondary” tags if needed. For example, “n.adv” can be used to mark an adverbial noun. The Primary tags are listed below. Use the value in the “Code” column when defining a part of speech code. The “Name” column is given merely for convenience and readability.

Name Code Abbreviation
adjective adj j
adverb adv r
alphabet alph z
article art a
circumposition circ cp
classifier clf cl
clitic clitic ct
conjunction conj c
determiner det d
existential exist ex
interjection interj !
noun n n
null null nl
numeral num #
other oth o
postposition postp pp
preposition prep p
pronoun pron pn
particle ptcl pt
punctuation punct *
suffix sufix +
unclassified uncl u
verb v v

Secondary tags are grouped in this table by the grammatical category usually associated with them. However, they can be used with any category, depending on your part of speech schema. It is recommended that only one tag per category be used when defining a part of speech code. User tags can be added to this list of secondary tags if further definition is required.

Name Code Abbreviation
Verb
lexical lex lex
auxiliary aux aux
modal mod mod
semiauxiliary semiaux semiaux
Noun
common comm c
proper prop pr
Common Noun
unit unit unit
direction dir dir
temporal temp temp
Noun Action
subject sbj sbj
object obj obj
Adjective and Adverb
comparative comp comp
superlative superl sup
evaluative eval eval
positive pos pos
negative neg neg
attributive attr attr
predicative pred pred
degree deg deg
Numeral
cardinal card card
ordinal ord ord
fraction frac frac
Conjunction
coordinating coord coord
subordinating subord sub
correlative corr corr
Pronoun and Determiner
definite def def
indefinite indef indef
demonstrative dem dem
exclamative excl excl
interrogative interrog interrog
personal pers pers
reflexive refl refl
irreflexive irrefl irrefl
relative rel rel
substitutive substit subst
independent indep ind
Tense
conditional cond cond
future fut fut
past past past
present pres pres
Aspect
imperfect imperf impf
perfect perf pf
progressive prog prog
aorist aor aor
pluperfect pluperf plpf
State
absolute absol absol
construct constr constr
Voice
active act act
passive pass pass
middle mid mid
causative caus caus
Mood
indicative indic indic
imperative imp impv
subjunctive subj subj
optative opt opt
infinitive inf inf
finite fin fin
gerund ger ger
participle ptcp pple
jussive juss juss
cohortative hort cohrt
Person
firstperson 1 1
secondperson 2 2
thirdperson 3 3
Case
nominative nom n
genitive gen g
possessive poss poss
dative dat d
accusative acc a
locative loc loc
vocative voc voc
instrumental instr instr
absolutive abs abs
ergative erg erg
Gender
masculine masc m
feminine fem f
neuter neut nt
universal univ univ
Animacy
animate anim anim
inanimate inanim inan
human hum hum
nonhuman nonhum nonhum
Number
singular sing s
plural pl p
dual dual du
Honorifics
formal forml forml
informal inform inform
polite pol pol
royal roy roy
Other
abbreviation abbr abbr
contraction contr contr
foreign foreign forn
headline head head
title title title
marker mark mark
pronominal pronom pronom
truncated trunc trunc

In the <userTags> section of an EPOSX, a user can add up to 64 tag elements that are not specified in the secondary tags list. Codes must not contain spaces, the word separator character, or the ambiguity separator character.

Individual tags need to be in an empty <tag/> element. There are two attributes for this tag: code and name.

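Attribute

Description

code

The tag code used when building part of speech codes. It must not contain spaces, the word separator character, or the ambiguity separator character.

name

A readable name for the tag.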

Below are some examples of tags that you can make.

<userTags>

<tag code="wh" name="Wh-words" />

<tag code="that" name="That" />

<tag code="of" name="Of" />

<tag code="be" name="Be verbs" />

<tag code="do" name="Do verbs" />

<tag code="have" name="Have verbs" />

</userTags>

Part of Speech Markups

The <markups> element is the XML element where all markups defined by external taggers are given WordCruncher part of speech code equivalents using the <markup/> element. This element has two required attributes: text and code.

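Attribute

Description

text

The markup produced by the external tagger, for example AJC.

code

The WordCruncher part of speech code that the markup is translated to, for example adj.comp.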
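A slightly fuller sketch of an EPOSX for C5-style markups might look like this. The particular mappings are assumptions chosen for illustration from the primary and secondary tag tables above, not an official C5 mapping. The wh user tag is the one from the template and is used here as a secondary tag.

```xml
<!-- Sketch only: the mappings are illustrative, not an official C5 table. -->
<eposx title="C5" wordSeparator="_" ambiguitySeparator="-">
  <userTags>
    <!-- A user tag for wh- words, usable as a secondary tag below. -->
    <tag code="wh" name="Wh- Words" />
  </userTags>
  <markups>
    <!-- Adjectives -->
    <markup text="AJC" code="adj.comp" />
    <markup text="AJS" code="adj.superl" />
    <!-- Nouns -->
    <markup text="NN1" code="n.comm.sing" />
    <markup text="NN2" code="n.comm.pl" />
    <markup text="NP0" code="n.prop" />
    <!-- Verbs -->
    <markup text="VVD" code="v.lex.past" />
    <!-- Wh- words, using the user tag defined above -->
    <markup text="AVQ" code="adv.wh" />
    <markup text="PNQ" code="pron.wh" />
  </markups>
</eposx>
```

Each code starts with a primary tag (adj, n, v, adv, pron) and then adds at most one secondary tag per category, following the recommendation above.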

Example Text

With the EPOSX prepared correctly, the text in the ETAX needs to follow the character separators. An example paragraph should look like this:

<p>The_AT quick_JJ brown_JJ fox_NN1 jumped_VVD over_II the_AT lazy_JJ dog_NN1.</p>

Alternatively, a user can use the default WordCruncher markup for tagging text. If done this way, you won't need to add anything else to the <EPOSX>.

<p>The_art quick_adj brown_adj fox_n.sing jumped_v.past over_prep the_art lazy_adj dog_n.sing.</p>

Available Features: Search features and the Neighborhood Report are available for part of speech books. Other reports, such as the Phrase Compare Report, will be updated later to use POS functionality.

Punctuation: Normal punctuation that is not defined as a word separator or ambiguity separator will not be used for parts of speech. That means it is okay to have punctuation right after a part of speech tag without adding a zero space <zs/> tag before the punctuation.
For example, if a sentence ends with “dog_NN1.”, then the period at the end will not be indexed as part of the POS tag.

If the word separator or ambiguity separator is part of a word, then a <ch> tag should be wrapped around that character. For example, my_function_NN0 will cause a problem because the first underscore initiates the part of speech. Since function_NN0 is probably not the desired POS, this should be changed to my<ch>_</ch>function_NN0.
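Putting these rules together, a tagged paragraph might look like the sketch below. The sentence is made up for illustration and the markups follow the C5 style, so it assumes an EPOSX that defines AT0, NN1, VBZ, AJ0, and VVG.

```xml
<!-- Sketch only: assumes an EPOSX that defines the C5 markups used here. -->
<!-- The underscore inside "user_data" is wrapped in <ch> so it is not -->
<!-- read as a word separator; "AJ0-VVG" is an ambiguous tag pair; the -->
<!-- final period needs no <zs/> before it. -->
<p>The_AT0 file_NN1 user<ch>_</ch>data_NN1 is_VBZ missing_AJ0-VVG.</p>
```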

Attributes, Tag Words, Reference Levels: Any text that is in an attribute, tag word, or reference level will not be indexed for POS tags.

The Translation Table Element <ettx></ettx>

The ETTX tag is used for synchronizing two versions or translations of the same book. Books that use the ETTX tag in the WordCruncher Bookstore include The Scriptures, Quran, and TED Talks. An ETTX tag is also necessary when making footnotes for a book unless they are combined in a BookSet.

You should be intimately familiar with the interaction between the <LVL/> tags, <R/> tags, and <TREE/> tags before trying to synchronize books.

When using an ETTX tag between books, you must remember to:

  1. Use the same LVL tags between both books within the SIFX. If both books have chapters, then it would make sense to have <LVL code="c" name="Chapter" plural="Chapters"/> in each SIFX.
  2. Keep the table of contents similar for each book. The more similar your table of contents, the simpler the ETTX tag will be. If the tables of contents are the exact same for each book, then your ETTX will look like this:

Basic ETTX Example

<ettx>

<table id="{f0a35742-45c3-4f96-966b-8f42f87011ef}" name="Tree Name" tree="1"/>

</ettx>

  • The id is a UUID.
  • The name is the same as the name value from your TREE tag.
  • The tree is the index number of the TREE for which you’re synchronizing. Refer to the number in your TREE tag. This usually will be 1, unless you are using a secondary TREE solely for the purpose of synchronization.

If your tables of contents are not the same, you will need to map the differences between them. This is done by adding a MAP tag for each reference that is different.

  • Identify the 1-letter code from the matching <R/> tag.
  • Create a LEVEL tag that has the attribute type. The value of the attribute should be the 1-letter code from the <R/> tag.
  • Within the LEVEL tag, create a MAP tag that has two attributes: sync and src. Sync refers to the primary book’s reference text, and src refers to the secondary book’s reference text.
  • Add a MAP tag for each reference level that has differing text.
  • Table of Contents from The Scriptures

    1. Identify the 1-letter code

    The <R/> tags for the two books look like this:

    <R ref="V,1:Holy Bible"/>

    <R ref="V,1:Santa Biblia"/>

    Both of these <R/> tags start with V. This is the 1-letter code we’ll need.

    2. Create a LEVEL tag

    The LEVEL tag has one mandatory attribute: type. Add the 1-letter code as the value of the attribute.

    <level type="V"> … </level>

    3. Create a MAP tag within the LEVEL tag

    To make a map between the Holy Bible and Santa Biblia, the MAP tag will look like this:

    <map sync="Holy Bible" src="Santa Biblia"/>

    It should be a child of the LEVEL tag, so place it within the <level type="V"> tag:

    <level type="V">

    <map sync="Holy Bible" src="Santa Biblia"/>

    </level>

    4. Add a MAP tag for each reference level

    Repeat step 3 for each reference level in the table of contents, assuming each one has differing text.

    <level type="V">

    <map sync="Holy Bible" src="Santa Biblia"/>

    <map sync="Old Testament" src="Antiguo Testamento"/>

    <map sync="New Testament" src="Nuevo Testamento"/>

    </level>

    Note that a separate LEVEL tag should be made for each 1-letter code that needs to be mapped. For example, the books of The Scriptures, (e.g. Genesis, Exodus), have the code B (for Book). That LEVEL tag looks like this:

    <level type="B">

    <map sync="Genesis" src="Génesis"/>

    <map sync="Exodus" src="Éxodo"/>

    </level>

    If the translation table is updated after publishing the book to the Bookstore, specific attributes must be changed on the ETTX tag before a new version of the book is added in the Bookstore.

    This section needs to be finished!

    The Morphology Table Element <emtx></emtx>

    This element is identical to the contents of the Morphology Table File described in the Example Files section. If the emtx attribute of the <etax> element is omitted, this element is used.

    Morphology Table File (EMTX)

    This file defines the optional morphology tables used to search for word attributes (or word tags). All values of name and code attributes are free form based upon the content of the target book. <title> elements are merely organizational. Below is an abbreviated example file:

    <?xml version="1.0" encoding="utf-16" standalone="yes"?>

    <emtx>

    <tree name="Hebrew Scripture Word Tags" lex="English (Word Tags)" id="XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX">

    <title name="Language">

    <item name="Hebrew (Heb)" code="Heb" />

    <item name="Aramaic (Aram)" code="Aram" />

    </title>

    <title name="Grammatical Tags">

    <item name="PARTICLE" code="PARTICLE">

    <item name="article (art)" code="art" />

    <item name="conjunction (conj)" code="conj" />

    <item name="adverb (adv)" code="adv" />

    ...

    </item>

    <item name="PRONOUN" code="PRONOUN">

    <item name="interrogative (interr)" code="interr" />

    <item name="independent (indep)" code="indep">

    <title name="Person">

    <item name="1st Person (1)" code="1" />

    <item name="2nd Person (2)" code="2" />

    <item name="3rd Person (3)" code="3" />

    </title>

    ...

    </item>

    </item>

    <item name="NOUN" code="NOUN">

    <item name="proper" code="proper" />

    <item name="common (comm)" code="comm">

    <title name="Gender">

    <item name="masculine (masc)" code="masc" />

    <item name="feminine (fem)" code="fem" />

    <item name="both" code="both" />

    </title>

    ...

    </item>

    </item>

    ...

    </title>

    <title name="Other Tags">

    <item name="Word Separator" code="/" />

    <item name="Paragraph" code="¶" />

    <item name="Qere without Ketiv" code="Qere-without-Ketiv" />

    </title>

    </tree>

    </emtx>
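Because the example above is abbreviated with "..." placeholders, it is not well-formed as written. The sketch below is a smaller but complete file built from the same elements. The tree name is an assumption, the id is left as the same placeholder, and the XML declaration is copied from the example above.

```xml
<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<!-- Sketch only: tree name is illustrative; id is a placeholder. -->
<emtx>
  <tree name="Sample Word Tags" lex="English (Word Tags)" id="XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX">
    <!-- <title> elements only organize the items beneath them. -->
    <title name="Grammatical Tags">
      <item name="NOUN" code="NOUN">
        <item name="proper" code="proper" />
        <!-- An item may contain further titles and items. -->
        <item name="common (comm)" code="comm">
          <title name="Gender">
            <item name="masculine (masc)" code="masc" />
            <item name="feminine (fem)" code="fem" />
          </title>
        </item>
      </item>
    </title>
  </tree>
</emtx>
```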