Edit/Add/Copy Rule page

Use the parser rule settings available on the Add Rule, Edit Rule and Copy Rule pages to define the properties of the HTML parser rules. These settings help WorldServer separate translatable from non-translatable text more reliably and display the content extracted from HTML documents correctly.

Click Add..., Edit... or Copy... on the Parser page to open the Add Rule, Edit Rule or Copy Rule page, where you can configure the properties of the HTML rules.
About Rules, Attributes and Conditions
Each element in HTML documents can have specific attributes. Attributes give additional information about the HTML element. For example, the <a> element indicates a hyperlink. An "href" attribute applied to the <a> element adds a web address to the hyperlink. Similarly, a "title" attribute adds a ToolTip which is visible when you hover the mouse over the hyperlink. The content between <a> (the opening tag) and </a> (the closing tag) defines the title of the hyperlink.
The image below shows an example of an <a> element and its components:
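As a minimal sketch of the same breakdown, the following Python snippet uses the standard library's html.parser module to pull apart an <a> element into its attributes and its enclosed content. The sample markup and the AnchorInspector class are hypothetical illustrations and are not part of WorldServer.

```python
from html.parser import HTMLParser


class AnchorInspector(HTMLParser):
    """Collects the attributes and enclosed text of each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []        # one dict per <a> element found
        self._current = None   # the <a> element being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"attributes": dict(attrs), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


# Hypothetical sample markup, not taken from WorldServer.
sample = '<a href="https://www.example.com" title="Visit our site">Example site</a>'

inspector = AnchorInspector()
inspector.feed(sample)
inspector.close()

link = inspector.links[0]
print(link["attributes"]["href"])   # the web address of the hyperlink
print(link["attributes"]["title"])  # the ToolTip
print(link["text"])                 # the content between <a> and </a>
```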
Usually, you would want WorldServer to extract and allow you to translate the title of the hyperlink and its ToolTip. In contrast, you would not normally want WorldServer to allow you to edit the address of a hyperlink because altering this address may break the link. These conditions are specified as default settings for the <a> element on the Parser Rules page. However, there may be situations when you would want to translate the address of the hyperlink because the link in your translation should point to the localized version of the website which has a different address. Also, sometimes you may not want to translate the ToolTip of the hyperlinks, for example when the ToolTip shows a numeric value.
For such situations, when you do not want WorldServer to apply the default settings or when you want to teach WorldServer to deal with elements not mentioned on the parser rules list, customize the existing parser rules.
You modify a rule by editing its attributes and conditions. The next time that you open an HTML document, WorldServer will no longer apply the default settings for the HTML element with the customized parser rule.
Rule section Description
Name The name of the element for which you are modifying the parser rule. For example, the rule which affects the <a> elements of HTML documents is named a.
Note: The Parser rules are case insensitive. This means that WorldServer will consider 'TITLE' and 'title' to be the same.
Conditions The conditions which define the extraction settings. Specify the conditions under which WorldServer should extract the content inside the selected element. For example, you might modify the a rule so that WorldServer extracts the content from an a element only if the a element is placed inside a text paragraph written in English. To do this, create a condition that checks the language of the paragraphs and the location of the a element inside the structure of the HTML documents.
  1. Select the a rule from the Parser rules list and click Edit...
  2. On the Edit Rule page, click Edit next to the Conditions box.
  3. Select the <a> tag from the Element Context box and click Add Element...
  4. Type p in the Element name box and click OK to close the Select Element page. WorldServer will now only look for <a> elements located under a <p> element.
  5. Click Add Attribute... to add an attribute condition for the <p> element.
  6. Type language="en" in the Attribute field and type true in the has value field.
  7. Click OK to add the language attribute to the paragraph element.
  8. Click OK again to close the Element Conditions page. WorldServer will now extract any content inside <a> elements only if they are located inside a paragraph written in English.
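The effect of the condition built in the steps above can be sketched in Python. This is only an illustration of the logic (an element condition on <p> plus an attribute condition on language="en"), written with the standard html.parser module. The class name and sample markup are assumptions, and the sketch does not reflect how WorldServer evaluates conditions internally.

```python
from html.parser import HTMLParser


class ConditionalLinkExtractor(HTMLParser):
    """Extracts <a> text only when the link sits inside <p language="en">."""

    def __init__(self):
        super().__init__()
        self._stack = []       # open elements as (tag, attributes) pairs
        self._buffer = None    # text of the <a> being extracted, if any
        self.extracted = []

    def _condition_met(self):
        # Element condition: an open ancestor is a <p> element.
        # Attribute condition: that <p> has language="en".
        return any(
            tag == "p" and attrs.get("language") == "en"
            for tag, attrs in self._stack
        )

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self._condition_met():
            self._buffer = ""
        self._stack.append((tag, dict(attrs)))

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer += data

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.extracted.append(self._buffer)
            self._buffer = None
        # Pop back to the matching open element, tolerating sloppy markup.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


# Hypothetical sample markup.
sample = (
    '<p language="en">Read the <a href="/guide">user guide</a>.</p>'
    '<p language="de">Siehe <a href="/handbuch">Handbuch</a>.</p>'
    '<div><a href="/home">Home</a></div>'
)

extractor = ConditionalLinkExtractor()
extractor.feed(sample)
extractor.close()
print(extractor.extracted)
```

Only the link inside the English paragraph is collected. The links in the German paragraph and in the <div> do not satisfy the condition and are skipped.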
Attributes section
Option Description
Attributes The localization setting which determines whether the attributes of an element become editable after extraction. Specify which of the attributes that can define the selected HTML element should be extracted as editable text in WorldServer and which attributes should be extracted as non-editable text. For example, for situations where you do not want to translate the ToolTip of a hyperlink, change the Translate property of the title attribute inside the a rule:
  1. Select the a rule from the Parser rules list and click Edit...
  2. Select the title attribute from the Attributes list and click Edit. This displays the Edit Attribute window for the title attribute.
  3. Make sure that the Translate attribute checkbox is cleared and click OK.
  4. Click OK on the Edit Rule page to save your changes.

The next time that you open an HTML document, WorldServer will extract the ToolTips from hyperlinks but will not allow you to edit them.

Note: Element attributes for which you do not specify a localization attribute appear as non-translatable in the Editor.
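The attribute settings can be pictured as a lookup table. The following Python sketch assumes a hypothetical rule table (ATTRIBUTE_RULES) and helper (is_editable). Neither is a WorldServer API; they only model the behavior described above, including the default for attributes that have no localization setting.

```python
# Hypothetical rule table: element name -> {attribute name: translatable?}.
# Here the Translate attribute checkbox of title is cleared for the a rule.
ATTRIBUTE_RULES = {
    "a": {"title": False, "href": False},
    "img": {"alt": True},
}


def is_editable(element, attribute, rules=ATTRIBUTE_RULES):
    """Return True if the attribute would be editable after extraction.

    Attributes with no localization setting default to non-translatable.
    Names are compared case-insensitively, as parser rules are.
    """
    element_rules = rules.get(element.lower(), {})
    return element_rules.get(attribute.lower(), False)


print(is_editable("a", "title"))   # title is set to non-translatable
print(is_editable("IMG", "ALT"))   # rule names are case insensitive
print(is_editable("a", "class"))   # no setting: non-translatable
```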
Properties section
Option Description
Translate

The localization setting which determines whether the content of the selected element becomes editable after extraction. Specify whether WorldServer should allow you to translate the content extracted from the selected element in the Editor.

You can set the Translate property to one of the following options:
  • Always translatable—You can edit the content extracted from the HTML element.
  • Translatable (but not in protected content)—You can edit the content extracted from the HTML element unless the HTML element has a protected content value inherited from its parent.
  • Not Translatable—WorldServer extracts and displays the content of the HTML element but does not allow you to edit it.
Note:
  • WorldServer does not display non-translatable HTML elements if you assign them a Structure tag type.
  • WorldServer extracts as translatable any content it finds inside HTML elements which are not subject to a parser rule.
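The three Translate options and the second note can be summarized as a small decision function. The sketch below is an assumption-laden model, not WorldServer code: the Translate enum and is_content_editable are hypothetical names, and parent_protected stands for "the element inherits a protected content value from its parent".

```python
from enum import Enum


class Translate(Enum):
    ALWAYS = "Always translatable"
    UNLESS_PROTECTED = "Translatable (but not in protected content)"
    NEVER = "Not Translatable"


def is_content_editable(setting, parent_protected=False):
    """Decide whether extracted content can be edited in the Editor.

    setting is the Translate value of the element's rule, or None when no
    parser rule covers the element.
    """
    if setting is None:
        return True                    # no rule: extracted as translatable
    if setting is Translate.ALWAYS:
        return True
    if setting is Translate.UNLESS_PROTECTED:
        return not parent_protected    # blocked only by inherited protection
    return False                       # Not Translatable: shown, not editable


print(is_content_editable(Translate.UNLESS_PROTECTED, parent_protected=True))
print(is_content_editable(None))
```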
Whitespace

The setting which defines how WorldServer deals with any extra whitespace characters it finds in the translatable content extracted from the selected HTML element.

Specify if you want WorldServer to keep or remove extra whitespace. To edit the settings for the whitespace in non-translatable content and in element attributes, use the Whitespace in tags option on the global Whitespace page.

Set the Whitespace property to one of the following:
  • Inherit from parent—The content extracted from the HTML element uses the same whitespace setting as its parent.
  • Always preserve—WorldServer always keeps whitespace as it is.
  • Normalize unless xml:space='preserve'—WorldServer replaces whitespace with a single space unless the element includes xml:space='preserve'.
  • Always normalize—WorldServer always replaces whitespace with a single space and ignores any xml:space='preserve' attribute.
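A rough Python model of the four whitespace options follows. It assumes that normalizing means collapsing each run of whitespace into one space without trimming, which the description above suggests but does not spell out. The Whitespace enum and apply_whitespace function are hypothetical names used for illustration only.

```python
import re
from enum import Enum


class Whitespace(Enum):
    INHERIT = "Inherit from parent"
    PRESERVE = "Always preserve"
    NORMALIZE_UNLESS_PRESERVE = "Normalize unless xml:space='preserve'"
    NORMALIZE = "Always normalize"


def apply_whitespace(text, setting, parent_setting=Whitespace.PRESERVE,
                     xml_space=None):
    """Return text as it would look under the given Whitespace setting.

    parent_setting must already be resolved (it may not be INHERIT).
    xml_space is the value of the element's xml:space attribute, if any.
    """
    if setting is Whitespace.INHERIT:
        setting = parent_setting
    if setting is Whitespace.INHERIT:
        raise ValueError("parent_setting must be a resolved setting")
    if setting is Whitespace.PRESERVE:
        return text
    if setting is Whitespace.NORMALIZE_UNLESS_PRESERVE and xml_space == "preserve":
        return text
    # Assumption: each run of whitespace collapses to one space.
    return re.sub(r"\s+", " ", text)


raw = "Line one  \n   line two"
print(repr(apply_whitespace(raw, Whitespace.PRESERVE)))
print(repr(apply_whitespace(raw, Whitespace.NORMALIZE)))
print(repr(apply_whitespace(raw, Whitespace.NORMALIZE_UNLESS_PRESERVE,
                            xml_space="preserve")))
```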
Tag Type The settings which control how the HTML elements are displayed in the Editor. HTML elements are extracted and shown in the Editor as tags. The translatable content inside the elements is displayed as editable text.
Tags can be displayed as:
  • Inline: Inline tags show formatting information, and the translatable content extracted from the HTML element is available for editing.
  • Structure: Structure tags usually contain information about the structure of the HTML document. Only translatable attributes inside structure elements are displayed in the Editor.
Note: For more information on Inline tags and Structure tags, see the WorldServer–Studio integration documentation.
Segmentation Hint (applicable to inline tags)

Segmentation hints help WorldServer segment the HTML document more accurately when converting it to a translatable format. A segmentation hint determines whether WorldServer positions the element within a segment, positions it outside the segment, or forces a segmentation break.

Set the Segmentation Hint to one of the following:
  • Include with text—The tag is displayed with the HTML content when it has preceding or following text. For example, a tag that specifies a footnote marker needs to stay attached to a word in the same sentence, so the tag should be included as part of the text.
  • Include—The tag will be displayed in the segment, even if it has no associated text.
  • May exclude, Undefined—The Editor determines whether the tag is part of the text.
  • Exclude—WorldServer will, where possible, use the tag or tag pair to segment the text. For example, if <p>...</p> or <br> tags are marked Exclude, then if an HTML document includes embedded HTML code, the HTML tags <p>...</p> and <br> will be used to segment the document. This segmentation is additional to the segmentation that is already applied to the embedding HTML code.
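To make the Exclude hint more concrete, the following Python sketch splits a fragment wherever a tag marked Exclude occurs and leaves other inline tags inside the resulting pieces. It is a deliberate simplification based on a regular expression: real segmentation also applies sentence-level rules, and the EXCLUDED_TAGS set and split_on_excluded function are hypothetical.

```python
import re

# Hypothetical set of tags whose Segmentation Hint is Exclude.
EXCLUDED_TAGS = {"p", "br"}


def split_on_excluded(html, excluded=EXCLUDED_TAGS):
    """Split html at every opening or closing tag named in excluded."""
    names = "|".join(sorted(re.escape(name) for name in excluded))
    # \b keeps <p> from matching <pre>, and <br> from matching <b>.
    pattern = re.compile(rf"</?(?:{names})\b[^>]*>", re.IGNORECASE)
    return [part.strip() for part in pattern.split(html) if part.strip()]


sample = "<p>First line<br>Second <b>bold</b> line</p><p>Third</p>"
for segment in split_on_excluded(sample):
    print(segment)
```

The <b> tag is not in the excluded set, so it stays inside its segment, which corresponds to an inline tag that is included with the surrounding text.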
Formatting

The settings which define how the content extracted by the parser looks in the Editor. Click Edit and select one of the following options for each of the available styles:

  • Inherit—Applies the style that is specified for the parent, if there is such a setting.
  • Activate—Applies that style to the text.
  • Deactivate—Does not apply that style to the text.

The Sample box shows a preview of how the text extracted by the rule looks in the WorldServer Editor.
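The interplay of Inherit, Activate and Deactivate can be sketched as a walk from the outermost element to the innermost one. The StyleSetting enum, the resolve_styles function and the style names below are illustrative assumptions, not WorldServer identifiers.

```python
from enum import Enum


class StyleSetting(Enum):
    INHERIT = "Inherit"
    ACTIVATE = "Activate"
    DEACTIVATE = "Deactivate"


def resolve_styles(chain):
    """Return the set of styles in effect for the innermost element.

    chain lists the formatting settings of each element, from the
    outermost parent to the element itself, as {style name: StyleSetting}.
    """
    active = {}
    for settings in chain:
        for style, setting in settings.items():
            if setting is StyleSetting.ACTIVATE:
                active[style] = True
            elif setting is StyleSetting.DEACTIVATE:
                active[style] = False
            # INHERIT keeps whatever the parent specified, if anything.
    return {style for style, on in active.items() if on}


parent = {"bold": StyleSetting.ACTIVATE, "italic": StyleSetting.ACTIVATE}
child = {
    "bold": StyleSetting.INHERIT,        # stays on, taken from the parent
    "italic": StyleSetting.DEACTIVATE,   # switched off for this element
    "underline": StyleSetting.INHERIT,   # parent has no setting: stays off
}
print(resolve_styles([parent, child]))
```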

Structure Information Properties section
(applicable to structure elements)
Structure information enables you to add context information to structure elements. You can then view this information in the Document Structure column and in the Document Structure tree available while in Editor view.
Click Add or Edit to define the following settings for a structure element:
Type of element Setting Description
Standard Offers a list of standard HTML structure elements with predefined context information. Choose Custom if you want to create your own element and customize its context information.
Custom For custom elements you can specify the following properties:
Purpose
Document Explorer Select what information is displayed in the Document Structure tree in the Editor view. You can choose to display only the name of the element, the entire content of the element or no information at all.
Name Specify a name for the element. By default, WorldServer also uses this name for the Code and Identifier fields but you can edit them if you want to use different names instead.
Description Specify a description for your element. The description is displayed in the Additional Information column of the Document Structure Information dialog box.
Color Specify the background color for displaying the element in the Document Structure column and in the Document Structure Information dialog box.
Formatting Specify the font, size, color and style for displaying the content of the element in the Editor view. You can choose to inherit the formatting from the element's parent or to activate/deactivate a certain style.