TBX Conventions

The TBX (TermBase eXchange) format is the leading standard for open interchange of terminological data. TBX is well-defined XML and supports blind interchange—the importer does not need to contact the data's originator to interpret TBX data. WorldServer offers both import and export of TBX files.

TBX Metadata. The TBX equivalent of a WorldServer term or entry attribute is the data category. WorldServer converts all TBX data categories (transac, admin, descrip, note, and so on) into attributes, where possible. Data category types and values are mapped to the names and values of WorldServer attributes. WorldServer matches the TBX data category names (for example, definition, subjectField) directly to the internal API names of the terminology database's attributes. Where matching attributes do not exist in WorldServer, you must either create them on the fly or specify that WorldServer ignore them.

TBX allows data categories to apply at many levels; however, WorldServer supports data categories only at the term and entry levels. Data categories at other levels are dropped on import. The levels whose data categories are dropped include:
  • Language level: Metadata can be attached to a langSet, which can contain multiple terms. The language level sits between the term entry and term levels that WorldServer supports.
  • Term note level: A termNoteGrp contains a term note, which is information associated with a term. However, termNoteGrp can also contain a full auxInfo set of metadata. This is something like attributes on an attribute and has no WorldServer equivalent.
  • Term component list level: TBX can break up a term into its components, typically its words. This list of term components can have metadata associated with it. WorldServer has no equivalent to this term component list.
  • Term component level: Each term component in turn can have metadata attached to it. This is a sub-term level of granularity that has nothing equivalent in WorldServer.
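
The level filtering described above can be sketched in Python. This is only an illustration, not WorldServer's importer: the function name collect_supported, the sample entry, and the short list of data category elements are assumptions, and only the simple tig term structure is handled (not ntig).

```python
import xml.etree.ElementTree as ET

# Elements that carry data categories (an assumed subset for this sketch).
DATA_CATEGORY_TAGS = {"descrip", "admin", "note", "termNote"}

# Hypothetical sample entry with metadata at entry, language, and term level.
SAMPLE = """
<termEntry id="e1">
  <descrip type="subjectField">finance</descrip>
  <langSet xml:lang="en">
    <descrip type="definition">language-level definition</descrip>
    <tig>
      <term>bank</term>
      <termNote type="partOfSpeech">noun</termNote>
    </tig>
  </langSet>
</termEntry>
"""

def _categories(parent):
    """Collect data categories that are direct children of parent."""
    return {
        child.get("type", child.tag): (child.text or "").strip()
        for child in parent
        if child.tag in DATA_CATEGORY_TAGS
    }

def collect_supported(term_entry):
    """Return (entry_attrs, {term: attrs}); language-level metadata is skipped."""
    entry_attrs = _categories(term_entry)
    term_attrs = {}
    for lang_set in term_entry.findall("langSet"):
        # Direct data-category children of langSet are language level,
        # so they are deliberately not collected here.
        for tig in lang_set.findall("tig"):
            term = (tig.findtext("term") or "").strip()
            term_attrs[term] = _categories(tig)
    return entry_attrs, term_attrs

entry_attrs, term_attrs = collect_supported(ET.fromstring(SAMPLE.strip()))
```

In this sample, the subjectField on the entry and the partOfSpeech on the term survive, while the definition attached to the langSet is dropped.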

Data category values are either plain text or picklist types. The picklist data type maps to a WorldServer selector attribute type. Other data types map to text fields by default. TBX also specifies links and embedded binary data; however, WorldServer does not support these.

WorldServer derives its system attributes from TBX transaction data as follows:
  • A transac type of origination determines creation information, whereas the modification type determines modification information.
  • A transacNote type of responsibility identifies the user involved.
  • A date element identifies the date and time of the term's or entry's creation or modification.
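
As a rough illustration of the three rules above, the following Python sketch reads one transaction group. It assumes the layout commonly seen in TBX samples, where a transacGrp holds a transac element whose text is origination or modification, a transacNote of type responsibility, and a date. The function name read_transaction and the result keys (created_by, modified_on, and so on) are hypothetical and are not WorldServer attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the user name and date are made up for illustration.
SAMPLE = """
<transacGrp>
  <transac type="transactionType">origination</transac>
  <transacNote type="responsibility">jsmith</transacNote>
  <date>2009-03-14T09:30:00</date>
</transacGrp>
"""

# Maps the transaction kind to a prefix for the (hypothetical) result keys.
PREFIXES = {"origination": "created", "modification": "modified"}

def read_transaction(group):
    """Turn one transacGrp element into creation or modification info."""
    kind = (group.findtext("transac") or "").strip()
    prefix = PREFIXES.get(kind)
    if prefix is None:
        return {}  # other transaction kinds are ignored in this sketch

    user = None
    for note in group.findall("transacNote"):
        if note.get("type") == "responsibility":
            user = (note.text or "").strip()

    date = (group.findtext("date") or "").strip()
    return {f"{prefix}_by": user, f"{prefix}_on": date}

info = read_transaction(ET.fromstring(SAMPLE.strip()))
```

The date is kept as a raw string here; how it is parsed is a separate concern.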

The TBX specification does not explicitly identify a date format. Most, but not all, TBX samples use dates of the form YYYY-MM-DDTHH:MM:ss. WorldServer parses the date for such transactional data in this format or a similar format without the hyphens. If WorldServer cannot parse the date, it substitutes the current date and time.
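
The parsing behavior described above might look like the following Python sketch. The function name parse_tbx_date is hypothetical, and the exact shape of the "similar format without the hyphens" is an assumption (taken here as YYYYMMDDTHH:MM:ss); the real set of accepted formats may differ.

```python
from datetime import datetime

# First the common TBX sample format, then the assumed variant without hyphens.
FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y%m%dT%H:%M:%S")

def parse_tbx_date(text, now=None):
    """Parse a TBX transaction date, falling back to the current date and time.

    The optional `now` argument exists only to make the fallback testable.
    """
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next known format
    return now if now is not None else datetime.now()

created = parse_tbx_date("2009-03-14T09:30:00")
```

A value such as "14 March 2009" matches neither format, so the fallback value is returned instead of raising an error.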

WorldServer has many attribute types for which there is no TBX equivalent. These are output as plain text:
  • User attributes are imported and exported as user names.
  • Multi-value (list, multi-select) attributes are imported or exported as comma-separated values. If a value contains an embedded comma, that comma is escaped URL-style as %23.
  • Date attributes use the YYYY-MM-DDTHH:MM:ss format.
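
The multi-value convention can be sketched in Python as a pair of helpers. The function names are hypothetical. The escape sequence is taken verbatim from the text above and kept as a constant, because standard percent-encoding of a comma would be %2C rather than %23.

```python
# Escape sequence as stated in the text above; standard URL encoding
# of "," would be "%2C", so this is kept configurable.
COMMA_ESCAPE = "%23"

def encode_multi_value(values):
    """Join values with commas, escaping commas embedded in a value."""
    return ",".join(value.replace(",", COMMA_ESCAPE) for value in values)

def decode_multi_value(text):
    """Split on commas, then restore the escaped commas in each value."""
    return [part.replace(COMMA_ESCAPE, ",") for part in text.split(",")]

exported = encode_multi_value(["red", "blue, dark"])
restored = decode_multi_value(exported)
```

One limitation of this sketch is that a value which literally contains the escape sequence cannot be distinguished from an escaped comma after a round trip.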

TBX Languages. TBX uses ISO language codes (en, de, fr, and so on) to specify languages, so there is no ambiguity when using language names. However, because it is possible to specify language prefixes without country codes, a language specification can match multiple languages. For example, en can match en_EN, en_US or en_UK. WorldServer offers an appropriate choice of matching languages (en_EN, en_US, or en_UK for lang="en") if there is any country ambiguity.
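
A minimal Python sketch of this matching follows. The function name matching_locales and the list of configured locales are assumptions for illustration; how WorldServer actually presents the choice is not shown.

```python
def matching_locales(lang, available):
    """Return the configured locales that a TBX language code could refer to."""
    code = lang.lower().replace("-", "_")
    if "_" in code:
        # A country code was given, so only an exact match qualifies.
        return [loc for loc in available if loc.lower() == code]
    # Language prefix only: every locale sharing the prefix is a candidate.
    return [loc for loc in available if loc.lower().split("_")[0] == code]

# Hypothetical set of locales configured in the target system.
available = ["en_US", "en_UK", "de_DE", "fr_FR"]
candidates = matching_locales("en", available)
```

When more than one candidate comes back, as for en here, the user has to choose; a single candidate, as for de, can be used directly.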

XCS File. XCS (eXtensible Constraint Specification) is the TBX way of specifying constraints on metadata. It lists languages, data category names and permissible picklist values for a termbase file. In WorldServer terms, XCS lists the languages, attribute names and selector attribute values of a term database. TBX specifies a default XCS file with data category information based on a selection from ISO 12620, which in turn attempts to define a standard list of data category names and definitions. Having a substantial standard list to choose from gives terminology users a common vocabulary, greatly improving interoperability and blind interchange. WorldServer takes advantage of this common vocabulary where possible.