Provides a language-independent way to break Unicode
 text into meaningful semantic units (e.g. words).
[scriptable, uuid(9f620be4-e535-11d6-b254-00039310a47a)]
interface nsISemanticUnitScanner : nsISupports

Methods

 next()
 Gets the begin and end offsets of the next unit in the current text.

 @param text the text to be scanned
 @param length the number of characters in the text to be processed
 @param pos the current position
 @param isLastBuffer whether this buffer is the last one
 @param begin the begin offset of the next unit
 @param end the end offset of the next unit
 @return true if there are more units in the current text
boolean next(in wstring text, in long length, in long pos, in boolean isLastBuffer, out long begin, out long end)
 start()

 Starts up the semantic unit scanner with an optional
 character set, which acts as a hint to optimize the heuristics
 used to determine the language(s) of the processed text.

 @param characterSet the character set the text was originally
                     encoded in (can be NULL)
void start(in string characterSet)
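
The calling pattern implied by start() and next() can be sketched as below. This is a minimal illustration in Python, not Mozilla's implementation: MockSemanticUnitScanner is a hypothetical stand-in that treats runs of alphanumeric characters as units, the two out parameters are modelled as extra return values, and the assumption that the caller resumes scanning at the returned end offset is ours rather than something the interface description states.

```python
class MockSemanticUnitScanner:
    """Hypothetical stand-in for nsISemanticUnitScanner (not Mozilla's code)."""

    def start(self, character_set=None):
        # The interface describes the character set as an optional hint;
        # this mock only records it and does not use it.
        self.character_set = character_set

    def next(self, text, length, pos, is_last_buffer):
        # Returns (has_more, begin, end), standing in for the boolean
        # result plus the two 'out long' parameters.
        # Skip characters that cannot start a unit in this simplified model.
        while pos < length and not text[pos].isalnum():
            pos += 1
        if pos >= length:
            return False, pos, pos
        end = pos
        while end < length and text[end].isalnum():
            end += 1
        if end == length and not is_last_buffer:
            # The unit may continue in a later buffer, so report nothing yet.
            return False, pos, pos
        return True, pos, end


def scan_units(scanner, text, character_set=None):
    """Collect every unit in a single, final buffer."""
    scanner.start(character_set)
    units = []
    pos = 0
    while True:
        has_more, begin, end = scanner.next(text, len(text), pos, True)
        if not has_more:
            break
        units.append(text[begin:end])
        pos = end  # assumed: resume scanning where the last unit ended
    return units
```

With this mock, scan_units(MockSemanticUnitScanner(), "Hello, wide world!") collects the three words and drops the punctuation. A real scanner would apply language-aware heuristics instead of the alphanumeric test used here.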