You can configure the settings for WorldServer file formats in
. You can have multiple file type configurations for each format. File type configurations customize two of the main jobs performed by file types: segmenting and recomposing.
When an asset is segmented, the file type has two basic considerations for determining when to create a new segment: markup and "delimiters." By default, Browser Workbench shows only the source segments that are marked for translation.
WorldServer processes markup first, segmenting the asset, before the delimiters are processed.
To determine what to present in translation segments, file types take into account things like markup, formatting attributes, and metadata.
In some cases, as for HTML, you can add user-defined elements and attributes, and specify the conditions in which they can be translated.
After an asset is segmented, if it is associated with a translation memory (TM), it is leveraged against that TM, and a match score is assigned to each segment. By default, all text segments are displayed when you open the asset in a workbench. However, you can narrow which segments are presented by selecting a view category such as "All except ICE and 100%", "All non-translated", "All with comments", or "All pending review". These "views" are sometimes referred to as filters because they filter out data. However, they should not be confused with the file types that are applied during segmentation.
After the file type processes markup, it further segments the asset, looking for "sentence breaking" delimiters.
When you open a text file, which has minimal markup, the segments are essentially sentences, delimited by periods, question marks, or exclamation marks. When the file type reaches one of these delimiters, it ends the segment. The Text File Type also lets you specify structure and inline patterns to use.
If the file contains markup in addition to text, the segmentation process first segments based on the markup, then it makes another pass based on the "sentence breaking" delimiters. For example, the HTML 4 File Type extracts everything in a paragraph (<p>) element first, then breaks up sentences in the paragraph if it contains more than one sentence.
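The two-pass behavior described above can be illustrated with a minimal Python sketch. This is not WorldServer code: the class `ParagraphExtractor`, the function `segment_html`, and the `SENTENCE_END` rule are hypothetical names chosen for the illustration, and the delimiter rule is a simplifying assumption (a period, question mark, or exclamation mark followed by whitespace).

```python
import re
from html.parser import HTMLParser

# Assumed delimiter rule: a sentence ends at ".", "?", or "!" followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.?!])\s+")


class ParagraphExtractor(HTMLParser):
    """First pass: collect the text content of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None


def segment_html(html):
    """Second pass: split each extracted paragraph at sentence delimiters."""
    extractor = ParagraphExtractor()
    extractor.feed(html)
    extractor.close()
    segments = []
    for paragraph in extractor.paragraphs:
        segments.extend(s for s in SENTENCE_END.split(paragraph) if s)
    return segments


print(segment_html("<p>Open the file. Is it saved? Yes!</p><p>Done.</p>"))
```

In this sketch the first paragraph yields three segments and the second yields one, which mirrors the order described in this topic: markup boundaries first, sentence delimiters second.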
File types offer control over how formatting encodings such as entities are handled.
In the entities example, a configuration option (in the XML File Type family and HTML 4 File Type) lets you "register" entities. If you register an entity, it will always be presented as a character (for example, "<"). To have it presented as an entity (for example, "&lt;"), you should not register it. You can also control how these entities are handled when you save the asset. See the "XML Entity Conversion Settings" help topic for more information on handling entities.
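As a rough illustration of the effect of registering an entity, the following Python sketch presents registered entities as characters and leaves unregistered ones as written. The `REGISTERED` table, the `ENTITY` pattern, and the `present` function are hypothetical; they only model the behavior described here and do not reflect how WorldServer stores or applies its configuration.

```python
import re

# Hypothetical registry: entities listed here are presented as characters.
REGISTERED = {"lt": "<", "amp": "&"}

# Matches named entities such as &lt; or &copy;
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def present(text, registered=REGISTERED):
    """Show registered entities as characters; keep unregistered ones as entities."""

    def swap(match):
        name = match.group(1)
        # Fall back to the original entity text when the name is not registered.
        return registered.get(name, match.group(0))

    return ENTITY.sub(swap, text)


print(present("a &lt; b &copy; 2020"))
```

Here "&lt;" is registered and appears as a character, while "&copy;" is not registered and stays in entity form.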
WorldServer also offers control over how the segments should be handled when it recomposes the target segments into a formatted asset after you save the asset in Browser Workbench. The following are just some of the options handled by file types: