About the HTML File Type

HTML documents contain a great deal of formatting, navigation, and tagging information that you do not need to see or edit when translating them in WorldServer. WorldServer is configured to recognize which content should generally be left out and which content should be presented to you for translation in the Editor. However, these general settings might exclude important information or, conversely, include information that you do not want to translate.

To make sure that WorldServer extracts only the content that should be translated from an HTML document and makes only that content editable, customize the general HTML settings that WorldServer uses to separate translatable from non-translatable content.

You can configure some aspects of SDL file types in Management > Linguistic Tool Setup > Filter Configurations.

WorldServer applies the HTML file type settings to all files with the following extensions: *.htm, *.html, *.xhtml, *.jsp, *.asp, *.aspx, *.ascx, *.inc, *.php, *.hhc, *.hhk.
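
As a rough illustration of the extension list above, the following Python sketch checks whether a file name would fall under the HTML file type. It is not WorldServer code: the function name `uses_html_file_type` and the case-insensitive comparison are assumptions made for the example, and only the extensions themselves come from this topic.

```python
from pathlib import PurePath

# Extensions listed in this topic for the HTML file type (lowercase, with leading dot).
HTML_FILE_TYPE_EXTENSIONS = {
    ".htm", ".html", ".xhtml", ".jsp", ".asp", ".aspx",
    ".ascx", ".inc", ".php", ".hhc", ".hhk",
}


def uses_html_file_type(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions.

    Hypothetical helper for illustration only.
    """
    # Assumption: matching is case-insensitive; this topic does not say either way.
    return PurePath(filename).suffix.lower() in HTML_FILE_TYPE_EXTENSIONS
```

For example, `uses_html_file_type("help/index.hhc")` would be true under these assumptions, while a file such as `notes.txt` would not match.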

Note: Depending on the filter you are configuring, some of the features may not be available to you.

The customization settings are available on the following pages, each dealing with a different aspect of processing HTML documents:
  1. HTML Detection: detects the documents to which WorldServer should apply the HTML file type settings.
  2. HTML Parser: checks each HTML element in the document against the conditions you specify to determine what content should be extracted for translation.
  3. HTML Writer: controls how WorldServer saves the target HTML file.
  4. HTML Entities: controls how character entities found inside HTML elements are displayed.
  5. HTML Whitespace: handles the extra whitespace characters found inside HTML elements.
  6. HTML Preview: handles adding style sheets for previewing HTML documents.
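
To make the idea of separating translatable from non-translatable content more concrete, here is a minimal Python sketch built on the standard library's `html.parser` module. It does not reproduce WorldServer's parser or its settings: the `SKIPPED_ELEMENTS` rule set, the class name, and the `extract_segments` helper are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical rule set: elements whose text is never offered for translation.
SKIPPED_ELEMENTS = {"script", "style"}


class TranslatableTextExtractor(HTMLParser):
    """Collect text segments, skipping text inside SKIPPED_ELEMENTS."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in ordinary text.
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_ELEMENTS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_ELEMENTS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Drop surrounding whitespace and ignore whitespace-only runs.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.segments.append(text)


def extract_segments(html: str) -> list[str]:
    """Return the text segments that this sketch treats as translatable."""
    parser = TranslatableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Under these assumptions, feeding the sketch a page with a heading, a paragraph, and a script block yields only the heading and paragraph text. Tags and script content are left out, which loosely mirrors what a translator sees in the Editor.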