Importing a Simple CSV File

If you have terminology data in a simple format CSV file, you can import this data into a WorldServer term database. You may need to align the data in your spreadsheet with conventions that WorldServer understands. Any style and usage notes embedded in the translated text should be deleted or moved to a notes column, where they can be imported into an attribute.

The Delimited File: Simple Format import behaves as follows:
  • Each row with content is imported as a term entry in the term database.
  • Each column corresponds to either a WorldServer locale or a term entry attribute, depending on the choices you make before importing.
  • Duplicate detection: WorldServer compares only the terms and attributes for which the glossary being imported has a corresponding column. If these all match an existing term entry, the row is treated as a duplicate, even if the existing entry contains additional terms that are not in the import data.
  • A cell that consists only of non-word characters (for example, punctuation) is treated as if it were empty.
  • Empty cells are ignored.
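
The row-handling rules above can be approximated with a short Python sketch, which may help when checking a spreadsheet before import. This is an illustration only, not WorldServer's implementation: the function names are hypothetical, "non-word character" is assumed to mean anything not matched by the regular expression \w, and the duplicate check assumes that only non-empty imported cells take part in the comparison.

```python
import re

# Assumption: a "word character" is whatever the regex class \w matches.
WORD_CHAR = re.compile(r"\w")


def clean_cell(cell):
    """Return the stripped cell text, or None if it has no word characters."""
    text = cell.strip()
    return text if WORD_CHAR.search(text) else None


def row_to_entry(header, row):
    """Map column names to usable cells; a row without content yields {}."""
    entry = {}
    for name, cell in zip(header, row):
        value = clean_cell(cell)
        if value is not None:
            entry[name] = value
    return entry


def is_duplicate(imported, existing):
    """True if every imported column matches the existing entry.

    Columns that exist only in the existing entry are not compared, so an
    entry with a superset of the imported terms still counts as a match.
    """
    return bool(imported) and all(
        existing.get(column) == value for column, value in imported.items()
    )
```

For example, a row whose "Notes" cell contains only "---" produces an entry without a "Notes" value, and a row that produces an empty dictionary would not be imported at all.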
To import a simple format CSV file:
  1. From the term management screen at Tools > Term Databases > Term Database: <TD>, click Import TD....
  2. Select the Delimited File: Simple Format option, the CSV file to import, the encoding, term and field delimiters, and whether to use each term's existing status or to assign a selected status to every term.
    • Import file – The file being imported must adhere to the delimiters you specify (see below).
      Note: In releases prior to WorldServer 9.0, WorldServer required simple format files to be comma-delimited. You can now import simple format files that use field delimiters other than commas.
    • Encoding – This is the encoding that will be applied to the imported terms.
    • Delimiter –
      • Term Delimiter – A term delimiter is a single character that separates two or more terms that occupy a single field in the Delimited File. If two or more terms occupy the same field then it is assumed that they belong to the same language. The term delimiter is optional. By default, there is none and there is only one term per field. The supported term delimiters are none (the default), newline, &, @, %, #, *, |,, !, $, and ^.
      • Field Delimiter – A field delimiter character is a single character that serves as a marker to separate column fields. The supported field delimiters are ,, tab, space, |, ;, and :.
      The term and field delimiters may be the same or different characters. If they are the same character, WorldServer assumes that the program that wrote the delimited file enclosed any field containing that character in double quotes.
      For example, if the | character is used as both the term delimiter and the field delimiter, then the delimited file should look like this:
      English (United States)|French (France)|Context
      
      Hello|"Bonjour|Allo"|A simple greeting
      Goodbye|"Au revoir|A bien tot"|Parting words

      If a term delimiter is contained in term text, use a backslash (\) as an escape character.

      Being able to choose the field delimiter is useful if you need to create a file with Unicode content from Excel. Excel cannot write Unicode content to a CSV file, but it can write Unicode content to tab-delimited files, which you can import by selecting tab as the field delimiter.

    • Import status – Choose whether to import each term's existing status or assign a selected status to every term. For details, see the topic on "Importing Term Statuses."
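
If you generate the import file with a script rather than a spreadsheet, the quoting and escaping conventions described for the term and field delimiters can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, each cell is passed as a list of terms, and it is assumed that WorldServer accepts the standard double-quote style produced by Python's csv module.

```python
import csv
import io


def join_terms(terms, term_delim):
    """Join the terms of one field, backslash-escaping literal delimiters."""
    escaped = [term.replace(term_delim, "\\" + term_delim) for term in terms]
    return term_delim.join(escaped)


def build_simple_format(header, entries, field_delim="|", term_delim="|"):
    """Return delimited text; each cell in an entry is a list of terms."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL wraps a field in double quotes only when it contains
    # the field delimiter, a double quote, or a line break.
    writer = csv.writer(
        buffer,
        delimiter=field_delim,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(header)
    for entry in entries:
        writer.writerow([join_terms(cell, term_delim) for cell in entry])
    return buffer.getvalue()


if __name__ == "__main__":
    header = ["English (United States)", "French (France)", "Context"]
    entries = [
        [["Hello"], ["Bonjour", "Allo"], ["A simple greeting"]],
        [["Goodbye"], ["Au revoir", "A bien tot"], ["Parting words"]],
    ]
    print(build_simple_format(header, entries), end="")
```

Run as a script, this prints the same header and data rows as the sample file shown for the | delimiter. Fields that hold several terms are quoted automatically because they contain the field delimiter.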
  3. Click Next >>. WorldServer displays a partial preview of the spreadsheet's first rows.
  4. In the Import Delimited File: Simple Format into Term Database <TD> (Step 2 of 3) page, select the row that contains the column headers of the import grid. Click Next >>.
    WorldServer reads the column headers and displays the header matching page (Import Delimited File: Simple Format into Term Database <TD> (Step 3 of 3)).
  5. In the Import Delimited File: Simple Format into Term Database <TD> (Step 3 of 3) page, match the column headers to your terminology database's languages and term and term entry attributes.
    In this page, the cells of the header row of the delimited file have been converted into rows with three columns: Column (containing the languages and attributes), Type, and Setting. WorldServer tries to guess a column's language or attribute based on the column name.

    You can match the Column rows to the corresponding languages, term entries, and term attributes in your term database. For language rows, you simply select Term for Type, then the language in the Setting column drop-down.

    For entry attribute rows, you select Entry Attribute for Type, then the attribute in the Setting column drop-down. You can also create a new entry attribute by selecting (create new) then clicking the Next button, which causes an entry attribute editor to display.

    For term attribute rows, you select Term Attribute for Type. Language and Term Attribute drop-down menus then appear in the Setting column, where you select the attribute and its language. You can also create a new term attribute by selecting (create new) then clicking the Next button, which causes a term attribute editor to display.

    Here is an example of this result:
    Figure 1. Term Attribute Causing Language and Term Attribute Drop-downs to Display
    All unrecognized columns are initially of type Ignore, meaning WorldServer ignores them on import.
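
To predict how your column headers are likely to be treated, you can model the matching with a small Python sketch. This is purely illustrative: WorldServer's actual guessing logic is not described here, so the exact-name, case-insensitive comparison and the function name guess_column are assumptions. The only behavior taken from this topic is that unrecognized columns start as Ignore.

```python
def guess_column(name, locales, entry_attributes):
    """Return a (Type, Setting) pair for one column header.

    Hypothetical heuristic: a header is recognized only if it equals a
    locale name or an entry attribute name, ignoring case and outer spaces.
    """
    key = name.strip().casefold()
    for locale in locales:
        if key == locale.casefold():
            return ("Term", locale)
    for attribute in entry_attributes:
        if key == attribute.casefold():
            return ("Entry Attribute", attribute)
    # Unrecognized columns are initially of type Ignore.
    return ("Ignore", None)
```

Naming the columns in your file after the locales and attributes in the term database reduces the amount of manual matching needed on this page.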

    The Entry Attribute and the Term Attribute values in the Setting column default to (create new). Continuing the example from the previous step, if you do not have an existing attribute to use for the "Notes" field (which was picked up from the file being imported), you can create one by clicking Next >>. When you do so, the following dialog displays:

    Figure 2. Creating a Term Attribute
    Not all attribute types are offered: the Multi Selector and Attachment attribute types, for example, do not make sense in the context of the glossary grid format.
    The following attribute types are available:
    • Boolean – Recognized values are true and false.
    • Comment – Produces a box in which to enter comments.
    • Date – Requires the format dd.MM.yyyy - HH:mm:ss.
    • HTML – Causes HTML you enter to render.
    • Integer – The cell content is interpreted as an integer.
    • Selector – The cell content must match one of the valid selector values configured for that attribute.
    • Text area, text field – The cell content is interpreted as plain text.
    • URL – Causes URLs you enter to be active.
    • User – The cell content is interpreted as a user name. It must be a valid user.
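
Before importing, you may want to check that attribute cells fit the types listed above. The following Python sketch shows one way to do that. It is not WorldServer code: the function name is hypothetical, Boolean values are assumed to be case-sensitive, and the Date pattern dd.MM.yyyy - HH:mm:ss is assumed to correspond to the strptime format shown in the code.

```python
from datetime import datetime

# Assumed strptime equivalent of the pattern dd.MM.yyyy - HH:mm:ss.
DATE_FORMAT = "%d.%m.%Y - %H:%M:%S"


def parse_attribute_cell(attr_type, cell, selector_values=()):
    """Convert one cell for the given attribute type or raise ValueError."""
    text = cell.strip()
    if attr_type == "Boolean":
        if text not in ("true", "false"):
            raise ValueError(f"not a recognized Boolean value: {text!r}")
        return text == "true"
    if attr_type == "Date":
        return datetime.strptime(text, DATE_FORMAT)
    if attr_type == "Integer":
        return int(text)
    if attr_type == "Selector":
        if text not in selector_values:
            raise ValueError(f"not a configured selector value: {text!r}")
        return text
    # Comment, HTML, text, URL, and User cells are passed through as text.
    # Checking that a User cell names a valid user requires the server.
    return text
```

A cell that fails the check raises ValueError, which makes it straightforward to list problem rows in the spreadsheet before starting the import.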

    Because, in this example, we have not selected a language for the "Notes" term attribute, when we click Import, we get a message telling us to do so:

    Figure 3. Error due to not selecting a language for a term attribute
  6. Click Import. WorldServer imports the delimited file.
The results page displays the number of term entries imported as well as any errors that may have been encountered.