Spellex SDK for Java™ - Technical Support

Product: Spellex Spell Checking Engine Java SDK

This article describes some techniques for using Spellex Spell Checking Engine to detect the presence of specific words in text, such as profanity.

Usually, the Spellex spelling engine is used to detect words which are not present in a dictionary or lexicon of words. Words not in the dictionary are deemed to be misspelled and are reported so they can be corrected. Detecting the presence of specific words is, in certain respects, the reverse of this.

Each word in a dictionary used by the Spellex spelling engine has an action code associated with it. When a word in the text being checked matches a word in a dictionary, the Spellex engine examines the action code associated with the word and performs the indicated action. The most common action tells the Spellex engine to ignore or skip over the word, usually because the word is correctly spelled so no further action is associated with it.

Other action codes supported by the Spellex engine cause the word to be automatically or conditionally replaced with another word. These actions are typically used to "auto correct" certain frequently misspelled words, such as replacing "recieve" with "receive." Automatic or conditional replacements are not made by the Spellex engine directly. Instead, the Spellex engine reports to the calling application (i.e., your application) that the word should be replaced with another word. Normally, the calling application then makes the replacement by calling certain methods in the Spellex API, perhaps after confirming the replacement with the user. The key thing to note here is that certain words can be assigned an action code which causes the Spellex engine to report to your application when those words are encountered. This is exactly what is needed to detect the presence of those words.
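The mechanism described above can be illustrated with a small toy model. This is only a sketch of the idea and does not use the Spellex API: the ToyLexicon class, its Action enum, and its check method are hypothetical names invented for illustration, and matching here is a simple exact, case-sensitive lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical and are NOT part of the Spellex API.
class ToyLexicon {
    enum Action { IGNORE, AUTO_CHANGE, CONDITIONAL_CHANGE }

    // One lexicon entry: an action code plus an optional replacement word.
    static final class Entry {
        final Action action;
        final String replacement;

        Entry(Action action, String replacement) {
            this.action = action;
            this.replacement = replacement;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void addWord(String word, Action action, String replacement) {
        entries.put(word, new Entry(action, replacement));
    }

    // Returns one report line for every word the caller needs to know about.
    List<String> check(String text) {
        List<String> reports = new ArrayList<>();
        for (String word : text.split("[^A-Za-z']+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty token when text starts with a separator
            }
            Entry e = entries.get(word);
            if (e == null) {
                reports.add(word + ": not in lexicon (treated as misspelled)");
            } else if (e.action != Action.IGNORE) {
                // Any action other than IGNORE is reported back to the caller.
                reports.add(word + ": " + e.action + " -> " + e.replacement);
            }
        }
        return reports;
    }
}
```

In this model, a word added with IGNORE produces no report, while a word added with AUTO_CHANGE or CONDITIONAL_CHANGE is reported together with its replacement string, which is the hook the rest of this article relies on.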

Entries in text lexicons (e.g., FileTextLexicon or StreamTextLexicon) contain a word, an action, and a replacement word. Suppose you want to detect the presence of the words "dog," "cat," or "pig" in the text (we'll pretend these words are profanity). You could create a new FileTextLexicon and add three entries to it, one for each word you want to detect. Actions which can be associated with words in lexicons are listed in the Spellex programmer's guide under "Action codes." We'll use CONDITIONAL_CHANGE_ACTION as the action. The replacement word will be an encoded string. The encoded string will tell our application that the word is profanity. To keep things simple, we'll just use "XXX" as the replacement word. The three entries can be created by calling FileTextLexicon's addWord method:

lex.addWord("dog", lex.CONDITIONAL_CHANGE_ACTION, "XXX");
lex.addWord("cat", lex.CONDITIONAL_CHANGE_ACTION, "XXX");
lex.addWord("pig", lex.CONDITIONAL_CHANGE_ACTION, "XXX");

With this lexicon open, the SpellingSession.check method will return CONDITIONAL_CHANGE_WORD_RSLT whenever "dog," "cat," or "pig" is encountered in the text. Your application can examine the check method's otherWord parameter to determine whether the word is profanity: if otherWord contains "XXX", a match has been found.

// Compare string contents with equals(); in Java, == compares object references.
if (speller.check(wordParser, otherWord) == speller.CONDITIONAL_CHANGE_WORD_RSLT)
{
    if ("XXX".equals(otherWord.toString()))
    {
        System.out.println("Tsk, tsk tsk!");
    }
}

The replacement word can be any string, so additional information can be encoded in it. For example, you might want to follow "XXX" with a digit indicating the "offensiveness," with "1" meaning "mildly inappropriate" and "9" being reserved for words that would make Tony Soprano blush.
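As a minimal sketch of how an application might decode such a string, the helper below treats "XXX" as the marker and an optional trailing digit from 1 to 9 as the severity. The ProfanityCode class and its methods are hypothetical application-side code, not part of the Spellex API, and the exact encoding is only the one assumed in this article.

```java
// Hypothetical application-side helper; not part of the Spellex API.
class ProfanityCode {
    static final String MARKER = "XXX";

    // True if the replacement string reported by the engine starts with the marker.
    static boolean isProfanity(String replacement) {
        return replacement != null && replacement.startsWith(MARKER);
    }

    // Returns the severity digit (1-9) that follows the marker,
    // or 0 if the digit is absent or the string is malformed.
    static int severity(String replacement) {
        if (!isProfanity(replacement) || replacement.length() != MARKER.length() + 1) {
            return 0;
        }
        char c = replacement.charAt(MARKER.length());
        return (c >= '1' && c <= '9') ? c - '0' : 0;
    }
}
```

The application would pass the string form of otherWord to these methods after check reports a conditional change, for example to block text only when the severity exceeds a chosen threshold.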

The same approach can be used in other circumstances where you want to detect the presence of certain words: categorizing e-mail into folders, filtering spam, detecting part numbers, and so on.

 


Spellex Corporation © 2008. All rights reserved