Gather and preprocess words and their statistics.
Public Member Functions
BuildList (void)
    Initialise the list.
T * insertWord (char *word)
    Insert a word, preprocessing it first.
T * insertWordNoReplace (char *word)
    Preprocess a word without inserting any new words.
T * insertWordNoPrepare (char *word)
    Insert a word without preprocessing it.
void buildWordList (ObjectList< T > *list)
    Return the set of words in the supplied ObjectList.
virtual bool prepareWordBuffer (char *word)=0
    Preprocess a word (pure virtual; implemented by subclasses).
int uniqueWords (void)
    Return the number of unique words currently stored.
T ** generateWordList (void)
    Build a list suitable for the buildWordList method.
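The member functions above can be illustrated with a small sketch. It is not the library's code: `BuildListSketch`, `WordStat` and `CaseFoldingList` are hypothetical names, `std::unordered_map` with `std::string` keys stands in for `HashTable<char *, T *>`, `generateWordList` returns a `std::vector` rather than a `T **`, a `false` return from `prepareWordBuffer` is assumed to mean "discard the word", and `insertWordNoReplace` is assumed to update words already stored while refusing to add new ones.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical statistics record standing in for the template parameter T.
struct WordStat {
    std::string text;
    int count = 0;
};

// Sketch of the BuildList interface under stated assumptions: a
// std::unordered_map replaces the library's HashTable, and a false return
// from prepareWordBuffer is taken to mean "discard this word".
template <typename T>
class BuildListSketch {
public:
    virtual ~BuildListSketch() = default;

    // Preprocess the word, then insert it or update its statistics.
    T *insertWord(const char *word) {
        buffer_ = word;
        if (!prepareWordBuffer(buffer_)) return nullptr;
        return lookupOrInsert(buffer_, true);
    }

    // Assumed meaning: preprocess, update known words, never add new ones.
    T *insertWordNoReplace(const char *word) {
        buffer_ = word;
        if (!prepareWordBuffer(buffer_)) return nullptr;
        return lookupOrInsert(buffer_, false);
    }

    // Insert the word exactly as given, skipping preprocessing.
    T *insertWordNoPrepare(const char *word) {
        return lookupOrInsert(word, true);
    }

    int uniqueWords() const { return static_cast<int>(words_.size()); }

    // Pointers to every stored item, in unspecified (hash) order.
    std::vector<T *> generateWordList() const {
        std::vector<T *> out;
        out.reserve(words_.size());
        for (const auto &entry : words_) out.push_back(entry.second.get());
        return out;
    }

    // Subclasses rewrite the buffer in place (stopping, stemming, ...).
    virtual bool prepareWordBuffer(std::string &buffer) = 0;

private:
    // Finds the key and bumps its count, or creates a new item if allowed.
    T *lookupOrInsert(const std::string &key, bool allowInsert) {
        auto it = words_.find(key);
        if (it != words_.end()) {
            ++it->second->count;
            return it->second.get();
        }
        if (!allowInsert) return nullptr;
        auto item = std::unique_ptr<T>(new T{key, 1});
        T *raw = item.get();
        words_.emplace(key, std::move(item));
        return raw;
    }

    std::string buffer_;                                         // like _wordBuffer
    std::unordered_map<std::string, std::unique_ptr<T>> words_;  // like _wordHash
};

// Hypothetical subclass whose only preprocessing step is case folding.
class CaseFoldingList : public BuildListSketch<WordStat> {
public:
    bool prepareWordBuffer(std::string &buffer) override {
        for (char &c : buffer)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return !buffer.empty();
    }
};
```

Making `prepareWordBuffer` pure virtual lets each subclass choose its own preprocessing while the base class owns storage and counting. With this sketch, `insertWord("Index")` and `insertWord("INDEX")` return the same item, whereas `insertWordNoPrepare("Index")` stores a separate, unfolded entry.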
Protected Attributes
T * _wordItem
    A word item.
int _wordCount
    The current number of words stored.
char * _wordBuffer
    Buffer used for word preprocessing.
HashTable< char *, T * > * _wordHash
    Pointer to the hash table that maps words to their items.
HASHREC< char *, T * > * _wordPointer
    A pointer to a record in the hash table.
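The `_wordHash` and `_wordPointer` attributes suggest a chained hash table whose records can be walked one at a time. The sketch below shows that idea with a minimal separate-chaining table; `HashRec`, `ChainedTable` and `forEach` are hypothetical, and the real `HashTable`/`HASHREC` layout may differ. It uses `std::string` keys because `std::hash<char *>` would hash the pointer's address rather than the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical chained hash record, loosely modelled on the HASHREC type
// named above; the real layout is not documented here.
template <typename K, typename V>
struct HashRec {
    K key;
    V value;
    HashRec *next;  // next record in the same bucket
};

// Minimal separate-chaining table (no resizing) illustrating how a record
// pointer such as _wordPointer can walk every stored entry.
template <typename K, typename V>
class ChainedTable {
public:
    explicit ChainedTable(std::size_t buckets)
        : buckets_(buckets ? buckets : 1, nullptr) {}

    ~ChainedTable() {
        for (HashRec<K, V> *head : buckets_) {
            while (head) {
                HashRec<K, V> *next = head->next;
                delete head;
                head = next;
            }
        }
    }
    ChainedTable(const ChainedTable &) = delete;
    ChainedTable &operator=(const ChainedTable &) = delete;

    // Returns the record holding key, or nullptr if it is absent.
    HashRec<K, V> *find(const K &key) const {
        for (HashRec<K, V> *rec = buckets_[bucketOf(key)]; rec; rec = rec->next)
            if (rec->key == key) return rec;
        return nullptr;
    }

    // Prepends a new record to its bucket's chain; callers check find() first
    // because duplicates are not detected here.
    HashRec<K, V> *insert(const K &key, const V &value) {
        std::size_t b = bucketOf(key);
        buckets_[b] = new HashRec<K, V>{key, value, buckets_[b]};
        ++size_;
        return buckets_[b];
    }

    std::size_t size() const { return size_; }

    // Visits every record bucket by bucket, following each chain with a
    // record pointer the way a cursor like _wordPointer might.
    template <typename F>
    void forEach(F visit) const {
        for (HashRec<K, V> *head : buckets_)
            for (HashRec<K, V> *rec = head; rec; rec = rec->next)
                visit(*rec);
    }

private:
    std::size_t bucketOf(const K &key) const {
        return std::hash<K>{}(key) % buckets_.size();
    }

    std::vector<HashRec<K, V> *> buckets_;
    std::size_t size_ = 0;
};
```

The table never resizes, so chains grow as words are added; a production table would typically rehash once the load factor passes some threshold.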
This class accumulates the words encountered during the text-scanning portion of the indexing stage. Inserted words may be preprocessed (stopping, stemming, case folding) before being placed into a hash table for later lookup. Once scanning is complete, the words and their statistics can be extracted.
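The preprocessing steps named above can be sketched as a pipeline of case folding, stopping and stemming, followed by frequency counting. The stop list and `naiveStem` are illustrative assumptions only: `naiveStem` is a crude suffix stripper, not the Porter stemmer or whatever stemmer this library actually uses, and `preprocess` and `scanText` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Tiny illustrative stop list; real indexers use far longer ones.
const std::set<std::string> kStopWords = {"a", "an", "and", "of", "the"};

// Deliberately naive suffix stripper standing in for a real stemmer; it
// removes at most one suffix and always leaves a stem of 3+ characters.
std::string naiveStem(std::string word) {
    static const char *const kSuffixes[] = {"ing", "ed", "es", "s"};
    for (const char *suffix : kSuffixes) {
        const std::string s(suffix);
        if (word.size() >= s.size() + 3 &&
            word.compare(word.size() - s.size(), s.size(), s) == 0) {
            word.erase(word.size() - s.size());
            break;
        }
    }
    return word;
}

// Case folding, then stopping, then stemming; false means "drop the word".
bool preprocess(std::string &word) {
    std::transform(word.begin(), word.end(), word.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    if (word.empty() || kStopWords.count(word) > 0) return false;
    word = naiveStem(word);
    return true;
}

// Scans whitespace-separated text and accumulates term frequencies, the
// kind of per-word statistic extracted once scanning is complete.
std::map<std::string, int> scanText(const std::string &text) {
    std::map<std::string, int> freq;
    std::istringstream in(text);
    std::string token;
    while (in >> token)
        if (preprocess(token)) ++freq[token];
    return freq;
}
```

Stopping is applied before stemming so the stop list is matched against folded surface forms. Tokens are split on whitespace only, so punctuation would need separate handling.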