Making modifications to textIR

This section describes how to modify parts of textIR to adapt it to your needs.

Changing the text parser

The lexical scanner and the parser are defined in the files trec_minScan.l and trec_parse.y, respectively. These files are processed by flex and bison and are used to parse TREC documents marked up with SGML tags. To parse documents in a different format, modify these files accordingly.
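To make the target structure concrete, the following C++ sketch extracts the document number and body text from TREC-style SGML using plain string searches. It is not the flex scanner from trec_minScan.l. The names TrecDoc, trim, tagContent and parseTrec are hypothetical, and the sketch assumes well-formed, non-nested <DOC>, <DOCNO> and <TEXT> tags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch only: this is NOT the flex scanner in trec_minScan.l.
// It shows the tag structure such a scanner has to recognise, assuming
// well-formed, non-nested <DOC>, <DOCNO> and <TEXT> tags.

struct TrecDoc {
    std::string docno;  // contents of <DOCNO>
    std::string text;   // contents of <TEXT>
};

// Removes leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Returns the trimmed content of the first <tag>...</tag> in block,
// or an empty string if the tag pair is missing.
std::string tagContent(const std::string& block, const std::string& tag) {
    assert(!tag.empty());
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::size_t b = block.find(open);
    if (b == std::string::npos) return "";
    b += open.size();
    const std::size_t e = block.find(close, b);
    if (e == std::string::npos) return "";
    return trim(block.substr(b, e - b));
}

// Splits the input into <DOC>...</DOC> blocks and extracts DOCNO and TEXT.
std::vector<TrecDoc> parseTrec(const std::string& input) {
    const std::string docOpen = "<DOC>";
    const std::string docClose = "</DOC>";
    std::vector<TrecDoc> docs;
    std::size_t pos = 0;
    while (true) {
        const std::size_t b = input.find(docOpen, pos);
        if (b == std::string::npos) break;
        const std::size_t e = input.find(docClose, b);
        if (e == std::string::npos) break;  // unterminated document: stop
        const std::string block = input.substr(b, e - b);
        docs.push_back({tagContent(block, "DOCNO"), tagContent(block, "TEXT")});
        pos = e + docClose.size();
    }
    return docs;
}
```

Searching for the full string "<DOC>", including the closing bracket, keeps it from matching the "<DOCNO>" tag. A flex/bison version would express the same structure as token rules and grammar productions instead of string searches.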

To read tabular content, textIR uses the files csv_scan.l and csv_parse.y to generate dindexCsv. If your data uses a different tabular format, edit these scanner and parser files.
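As an illustration of what a tabular scanner has to handle, here is a C++ sketch that splits a single CSV record into fields using RFC 4180-style quoting. The function splitCsvLine is hypothetical and is not taken from csv_scan.l. It assumes one record per line, with no newlines embedded in quoted fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of textIR. Splits one CSV record into
// fields: a field may be wrapped in double quotes, and a doubled quote ("")
// inside a quoted field stands for one literal quote character.
std::vector<std::string> splitCsvLine(const std::string& line, char sep = ',') {
    assert(sep != '"');  // the quote character cannot also be the separator
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';  // escaped quote inside a quoted field
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                field += c;  // separators inside quotes are literal text
            }
        } else if (c == '"') {
            inQuotes = true;  // opening quote
        } else if (c == sep) {
            fields.push_back(field);  // end of the current field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field (may be empty)
    return fields;
}
```

For example, splitCsvLine("a,\"b,c\"") yields the two fields a and b,c. The sep parameter shows the kind of small change a different tabular format, such as tab-separated values, might need.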

Building your own thesaurus

textIR comes with a library for storing and compressing a thesaurus once its contents have been computed. To store your own generated values in a thesaurus, use the provided template src/Thesaurus-template.cc and link it against libdthesaurus.a.

The template contains two functions that need to be filled in. The first is the pre-storage section in the constructor FloatThesaurus::FloatThesaurus, where you insert the code that computes the values to be stored. For the PLSA thesaurus, for example, the PLSA topics are computed here. Any objects created in this section should be declared as members in the class definition in Thesaurus-template.h, so that the data remains available in the second section.

The second section is the method bool FloatThesaurus::calculateRow(int currentRealWord), which defines what is stored for a given row. It returns true (1) if the row is to be stored and false (0) if the row is to be left out of the thesaurus.

Every place in Thesaurus-template.cc and Thesaurus-template.h that needs to be filled in is marked by a line beginning with //FILL.
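The following self-contained C++ sketch mirrors the two-part structure of the template without linking against libdthesaurus.a. The class SketchThesaurus, its members, the row format and the cosine-similarity measure are hypothetical stand-ins chosen for illustration. They are not the real FloatThesaurus interface, nor what LSAFloatThesaurus.cc or PLSAFloatThesaurus.cc compute. The constructor plays the role of the pre-storage section, and calculateRow decides whether a row is kept. The precomputed vectors are class members, so they persist from the constructor into calculateRow.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two sections of Thesaurus-template.cc.
// Not the textIR API: names, members and the similarity measure are assumptions.
class SketchThesaurus {
public:
    // Section 1 (cf. the FloatThesaurus constructor): precompute everything
    // calculateRow() needs. Each term's document-count vector is
    // L2-normalised once. Assumes every row of termDoc has the same length.
    SketchThesaurus(const std::vector<std::vector<float>>& termDoc, float threshold)
        : threshold_(threshold) {
        for (const auto& counts : termDoc) {
            double norm = 0.0;
            for (float c : counts) norm += static_cast<double>(c) * c;
            norm = std::sqrt(norm);
            std::vector<float> unit(counts.size(), 0.0f);
            if (norm > 0.0)
                for (std::size_t d = 0; d < counts.size(); ++d)
                    unit[d] = static_cast<float>(counts[d] / norm);
            unitVectors_.push_back(unit);
        }
    }

    // Section 2 (cf. FloatThesaurus::calculateRow): fill row_ with
    // (word, cosine similarity) pairs at or above the threshold.
    // Returns true if the row should be stored, false to skip it.
    bool calculateRow(int currentRealWord) {
        assert(currentRealWord >= 0 &&
               static_cast<std::size_t>(currentRealWord) < unitVectors_.size());
        row_.clear();
        const std::vector<float>& a = unitVectors_[currentRealWord];
        for (std::size_t w = 0; w < unitVectors_.size(); ++w) {
            if (static_cast<int>(w) == currentRealWord) continue;  // skip self
            float sim = 0.0f;
            for (std::size_t d = 0; d < a.size(); ++d) sim += a[d] * unitVectors_[w][d];
            if (sim >= threshold_) row_.emplace_back(static_cast<int>(w), sim);
        }
        return !row_.empty();
    }

    const std::vector<std::pair<int, float>>& row() const { return row_; }

private:
    // Data computed in section 1 is kept as members (declared in the header
    // in the real template) so that section 2 can reach it.
    std::vector<std::vector<float>> unitVectors_;
    float threshold_;
    std::vector<std::pair<int, float>> row_;
};
```

A driver would call calculateRow for each word in turn and pass only the rows that return true to the storage library. Returning false for empty rows keeps words with no related terms out of the stored thesaurus.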

For examples of how to fill in the empty sections, see the files LSAFloatThesaurus.cc, PLSAFloatThesaurus.cc and LSAFloatIndex.cc in the project's src directory.

To build the application, link the compiled object file Thesaurus-template.o against the textIR library libdthesaurus.a, found in the src directory.


Generated on Tue Nov 10 14:12:07 2009 for textIR by  doxygen 1.6.1