Filter configuration and segmentation

You can configure the settings for WorldServer file formats in Management > Linguistic Tool Setup > File Types. You can have multiple file type configurations for each format.

Each file type is designed to recognize what is translatable text and what is formatting in an asset, and to handle the formatting, subject to certain configuration options. The details in this topic can help you obtain the segmentation you need for translation, and the proper format after saving.
Note: The Binary File Type is not configurable.

File type configurations customize two of the main jobs performed by file types: segmenting and recomposing.

Decomposing the Asset: Segmentation

When an asset is segmented, the file type uses two basic considerations to determine when to create a new segment: markup and "delimiters." By default, Browser Workbench shows only the source segments that are marked for translation.

Figure 1. Segments for Translation

You can enable "Show Markup" in the workbench to display the markup segments, which are hidden by default. "Show Markup" is not available for all file types: where it is unavailable, the markup is too verbose to provide useful information.

Figure 2. Segments with Show Markup Enabled

WorldServer processes markup first, segmenting the asset accordingly, and only then processes the delimiters.
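
The ordering described above (markup first, then delimiters) can be illustrated with a minimal Python sketch. This is not WorldServer's implementation. The MARKUP and DELIMITER patterns, the segment function, and the sample input are all hypothetical and chosen only to show the two passes; real file types use their own configurable rules.

```python
import re

# Hypothetical patterns for illustration only; real file types define their own rules.
MARKUP = re.compile(r"(<[^>]+>)")          # anything that looks like a tag
DELIMITER = re.compile(r"(?<=[.!?])\s+")   # whitespace after sentence-ending punctuation


def segment(asset: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, where kind is 'markup' or 'text'."""
    segments = []
    # Pass 1: split on markup; the capturing group keeps the tags in the result.
    for chunk in MARKUP.split(asset):
        if not chunk.strip():
            continue
        if MARKUP.fullmatch(chunk):
            segments.append(("markup", chunk))
            continue
        # Pass 2: split the text between markup on delimiters.
        for sentence in DELIMITER.split(chunk.strip()):
            if sentence:
                segments.append(("text", sentence))
    return segments


def visible(segments: list[tuple[str, str]], show_markup: bool = False) -> list[str]:
    """Mimic the default view: markup segments are hidden unless requested."""
    return [text for kind, text in segments if show_markup or kind == "text"]


sample = "<p>Hello world. How are you?</p><p>Bye.</p>"
print(visible(segment(sample)))
print(visible(segment(sample), show_markup=True))
```

Because the delimiter pass runs only on the text left between markup segments, a delimiter can never merge text across a markup boundary in this sketch.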

Decomposing the Asset: Formatting Encodings

File types offer control over how formatting encodings such as entities are handled.

For entities, a configuration option (in the XML File Type family and the HTML 4 File Type) lets you "register" entities. A registered entity is always presented as a character (for example, "<"). To have it presented as an entity (for example, "&lt;"), do not register it. You can also control how these entities are handled when you save the asset. See the "XML Entity Conversion Settings" help topic for more information on handling entities.
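
The effect of registering an entity can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code: the present function and the registered mapping are hypothetical names, and the sketch only models presentation (registered entities become characters, unregistered ones stay as entities), not the conversion applied on save.

```python
import re

# Matches named entity references such as &lt; or &amp;
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def present(text: str, registered: dict[str, str]) -> str:
    """Show registered entities as characters; leave all others as entities.

    `registered` is a hypothetical mapping of entity name -> character.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Unregistered entities fall back to their original "&name;" form.
        return registered.get(name, match.group(0))

    return ENTITY.sub(replace, text)


# Only "lt" is registered here, so "&amp;" stays in entity form.
print(present("a &lt; b &amp; c", {"lt": "<"}))
```

With only "lt" registered, the sample prints the "<" character in place of "&lt;" while "&amp;" is left untouched.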

Recomposing the Asset: Saved Targets

WorldServer also offers control over how segments are handled when it recomposes the target segments into a formatted asset after you save the asset in Browser Workbench. The following are just some of the options handled by file types: