Spellex SDK for Java™ - Technical Support

Product: Spellex Spelling Checker Engine Java SDK, version 5.8 and later

Problem: When I use HTMLStringWordParser, any misspelled words after a "&" or "<" character in the text are not detected.

Discussion: This behavior is by design. HTMLStringWordParser expects the string to contain correctly formatted HTML, in which "&" and "<" are special characters (known as meta-characters). The "&" character marks the beginning of an HTML character entity, such as "&copy;" for the copyright symbol, and "<" marks the beginning of a markup tag, such as "<b>" for boldface. When HTMLStringWordParser encounters either character, it expects an entity or a tag to follow, so it skips text until it reaches the corresponding terminator: ";" for a character entity and ">" for a tag. A stray "&" therefore acts like a switch that hides all text up to the next ";" (likewise, a stray "<" hides all text up to the next ">"). If no terminator appears, the rest of the string is skipped. This behavior is documented in the JavaDoc for HTMLStringWordParser.
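
To make the skipping behavior concrete, here is a small stand-alone Java sketch that imitates it. It is not Spellex code and does not call the SDK: the class LegacyHtmlSkipDemo and its visibleWords method are hypothetical, words are simplified to runs of letters and apostrophes, and the skipping rule is modeled only from the description above (versions before 5.10).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT the Spellex API. It mimics the
// described pre-5.10 behavior, where "&" skips text up to the next ";" and
// "<" skips text up to the next ">".
class LegacyHtmlSkipDemo {

    // Returns the words a spelling checker would still see after skipping.
    // Simplification: a word is a run of letters and apostrophes.
    static List<String> visibleWords(String html) {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c == '&' || c == '<') {
                flush(word, words);
                char terminator = (c == '&') ? ';' : '>';
                int end = html.indexOf(terminator, i + 1);
                // No terminator found: everything to the end is skipped.
                i = (end < 0) ? html.length() : end + 1;
            } else if (Character.isLetter(c) || c == '\'') {
                word.append(c);
                i++;
            } else {
                flush(word, words); // any other character ends the current word
                i++;
            }
        }
        flush(word, words);
        return words;
    }

    private static void flush(StringBuilder word, List<String> words) {
        if (word.length() > 0) {
            words.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // The bare "&" hides "Jonse" and "recieved" from the checker.
        System.out.println(visibleWords("Smith & Jonse recieved the order")); // prints [Smith]
    }
}
```

With a correctly terminated entity such as "&amp;", only the entity itself is skipped, so "Fish &amp; chips" yields both "Fish" and "chips".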

Beginning with version 5.10, when HTMLStringWordParser encounters a "&" or "<" character, it skips text only until the corresponding terminator or the first whitespace character, whichever comes first.
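
The revised rule can be modeled the same way. The sketch below is again a hypothetical illustration rather than Spellex source code; SkipRule510Demo and its resumeIndex method are invented names, and resumeIndex returns the position at which ordinary word parsing would resume after a "&" or "<".

```java
// Hypothetical illustration only, not Spellex source code. Models the rule
// described for version 5.10 and later: after "&" or "<", text is skipped
// until the matching terminator (";" or ">") or the first whitespace
// character, whichever comes first.
class SkipRule510Demo {

    // Precondition: html.charAt(start) is '&' or '<'.
    static int resumeIndex(String html, int start) {
        char terminator = (html.charAt(start) == '&') ? ';' : '>';
        for (int i = start + 1; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == terminator) {
                return i + 1;      // resume just after the terminator
            }
            if (Character.isWhitespace(c)) {
                return i;          // resume at the whitespace, which separates words
            }
        }
        return html.length();     // reached the end: nothing left to check
    }

    public static void main(String[] args) {
        String text = "Smith & Jonse recieved the order";
        int amp = text.indexOf('&');
        // Parsing resumes at the space after "&", so "Jonse" and "recieved"
        // are checked again.
        System.out.println(text.substring(resumeIndex(text, amp)));
    }
}
```

Under this rule, a "&" surrounded by spaces no longer hides the text that follows it. A "&" or "<" attached directly to a word (for example "&Jonse") would still cause that word to be skipped up to the next whitespace, so escaping the text remains the more robust approach.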

Solution: Either use StringWordParser instead of HTMLStringWordParser, or ensure the text contains correctly formatted HTML. In HTML, a literal "&" character must be written as "&amp;", and a literal "<" character as "&lt;".
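
If the text is plain text that you want to check with HTMLStringWordParser (for example, because it will later be embedded in an HTML page), the two meta-characters can be escaped first. This is a minimal sketch: HtmlEscapeForSpellCheck and escapeMetaCharacters are hypothetical names, not part of the Spellex SDK, and the SDK call itself is not shown.

```java
// Hypothetical helper, not part of the Spellex SDK. Escapes the two HTML
// meta-characters so a plain-text string becomes valid HTML text content.
class HtmlEscapeForSpellCheck {

    // "&" must be replaced first; otherwise the "&" introduced by "&lt;"
    // would itself be escaped again, producing "&amp;lt;".
    static String escapeMetaCharacters(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        String plain = "Smith & Jonse <draft>";
        // The escaped string is what would be handed to HTMLStringWordParser.
        System.out.println(escapeMetaCharacters(plain)); // prints Smith &amp; Jonse &lt;draft>
    }
}
```

Apply this only to plain text: running it on a string that already contains HTML markup would turn tags such as "<b>" into literal text. For text that contains no HTML at all, StringWordParser avoids the issue without any escaping.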

Spellex Corporation © 2008. All rights reserved