[1] | Kotagiri Ramamohanarao and Laurence A. F. Park. Spectral-based document retrieval. In Advances in Computer Science - ASIAN 2004, volume 3321 of Lecture Notes in Computer Science, pages 407--417. Springer Berlin / Heidelberg, December 2004. |
[2] |
Laurence A. F. Park and Kotagiri Ramamohanarao.
Hybrid pre-query term expansion using latent semantic analysis.
In Rajeev Rastogi, Katharina Morik, Max Bramer, and Xindong Wu,
editors, The Fourth IEEE International Conference on Data Mining, pages
178--185, Los Alamitos, California, November 2004. IEEE Computer Society.
Latent semantic retrieval methods (unlike vector space methods) take the document and query vectors and map them into a topic space to cluster related terms and documents. This produces a more precise retrieval but also a long query time. We present a new method of document retrieval which allows us to process the latent semantic information into a hybrid latent semantic-vector space query mapping. This mapping automatically expands the user's query based on the latent semantic information in the document set. This expanded query is processed using a fast vector space method. Since we have the latent semantic data in a mapping, we are able to store and retrieve vector information in the same fast manner that the vector space method offers. Multiple mappings are combined to produce a hybrid latent semantic retrieval method which provides precision results 5% greater than the vector space method and fast query times. |
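The abstract of [2] describes expanding the user's query through a precomputed mapping and then scoring with a fast vector space method. The following is a minimal Python sketch of that general idea only, not the authors' implementation. The names `expand_query`, `score`, `TERM_MAP`, and `DOCS`, the `top_k` cut-off, and all weights are hypothetical and chosen for illustration; in the paper the mapping is derived from latent semantic analysis of the document set, which is not reproduced here.

```python
from collections import defaultdict


def expand_query(query_terms, term_map, top_k=2):
    """Add the top_k most strongly related terms for each query term."""
    weights = defaultdict(float)
    for term in query_terms:
        weights[term] += 1.0  # original query terms keep full weight
        related = sorted(term_map.get(term, {}).items(), key=lambda kv: -kv[1])
        for other, w in related[:top_k]:
            weights[other] += w
    return dict(weights)


def score(doc_vector, query_weights):
    """Sparse dot product between a document vector and the weighted query."""
    return sum(w * doc_vector.get(t, 0.0) for t, w in query_weights.items())


# Hypothetical term-to-term mapping (assumed precomputed offline).
TERM_MAP = {"car": {"automobile": 0.8, "vehicle": 0.6, "road": 0.2}}

# Hypothetical toy document vectors.
DOCS = {
    "d1": {"automobile": 1.0, "engine": 1.0},
    "d2": {"banana": 1.0},
}

expanded = expand_query(["car"], TERM_MAP)
ranking = sorted(DOCS, key=lambda d: score(DOCS[d], expanded), reverse=True)
```

In this toy setup, document `d1` never contains the literal term "car", so an unexpanded query gives it a score of zero; the expanded query matches it through "automobile". Because the expansion is a dictionary lookup performed before scoring, the query-time cost stays close to that of a plain vector space search.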
[3] |
Laurence A. F. Park, Kotagiri Ramamohanarao, and Marimuthu Palaniswami.
Fourier domain scoring: A novel document ranking method.
IEEE Transactions on Knowledge and Data Engineering,
16(5):529--539, May 2004.
Current document retrieval methods use a vector space similarity measure to give scores of relevance to documents when related to a specific query. The central problem with these methods is that they neglect any spatial information within the documents in question. We present a new method called Fourier Domain Scoring (FDS) which takes advantage of this spatial information, via the Fourier transform, to give a more accurate ordering of relevance to a document set. We show that FDS gives an improvement in precision over the vector space similarity measures for the common case of Web like queries, and it gives similar results to the vector space measures for longer queries. |
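The abstract of [3] says that FDS uses the Fourier transform to exploit the positions of query terms within a document. The sketch below illustrates one way such a score could be built, under assumptions that go beyond the abstract: the document is split into a fixed number of position bins, each query term gets a per-bin count signal, and each frequency component contributes its total magnitude weighted by how closely the terms' phases agree. The function names, the bin count, and the exact combination rule are hypothetical and should not be read as the paper's formulation.

```python
import cmath
import math


def term_signal(tokens, term, bins=8):
    """Count occurrences of `term` in each of `bins` equal-width position bins."""
    signal = [0.0] * bins
    for pos, tok in enumerate(tokens):
        if tok == term:
            signal[pos * bins // len(tokens)] += 1.0
    return signal


def dft(signal):
    """Plain O(n^2) discrete Fourier transform using only the standard library."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def spectral_score(tokens, query_terms, bins=8):
    """Sum over frequency components of total magnitude times phase agreement."""
    spectra = [dft(term_signal(tokens, term, bins)) for term in query_terms]
    total = 0.0
    for k in range(1, bins // 2 + 1):  # skip component 0, which only counts terms
        comps = [s[k] for s in spectra]
        magnitude = sum(abs(c) for c in comps)
        if magnitude == 0.0:
            continue
        # Phase agreement lies in [0, 1]; it is 1 when all terms share a phase.
        unit = [c / abs(c) for c in comps if abs(c) > 0]
        agreement = abs(sum(unit)) / len(comps)
        total += magnitude * agreement
    return total


# Two toy documents with identical term counts but different term positions.
near = "a b x x x x x x x x x x x x x x".split()
far = "a x x x x x x x b x x x x x x x".split()
near_score = spectral_score(near, ["a", "b"])
far_score = spectral_score(far, ["a", "b"])
```

A pure term-counting measure cannot separate `near` from `far`, since both contain each query term exactly once. In this sketch the document whose query terms fall in the same bin has matching phases at every frequency and so receives the higher score.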
[4] | Laurence A. F. Park and Kotagiri Ramamohanarao. Preliminary work on pre-query term expansion using latent semantic analysis. In The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining Workshop, pages 1--10. Springer, Sydney, Australia, May 2004. |
[5] |
Laurence A. F. Park.
Spectral Based Information Retrieval.
PhD thesis, The University of Melbourne, Australia, 2004.
The information found on the Internet is growing at such a rapid rate that soon methods of searching through text using term frequencies will not be enough. At the moment, many Web search engines are showing signs of imprecision because they are based on these term counting methods which do not examine the relationships between the document terms. These methods begin to fail as the number of indexed documents increases past an allowable limit. Natural language processing has been performed in the past and we have found that it is only useful within its own domain. For example, if we use a natural language system to extract documents from a sporting database, we will find that the same tool will not be very effective for medical articles. Spatial methods have been developed to tackle the problem of the ever growing World Wide Web. Many have failed but a few have risen to the level of the frequency based methods mentioned above. Due to the extra document analysis performed, the spatial methods are slower than the frequency based methods and require more storage. This thesis presents a novel method of information retrieval entitled “Spectral Information Retrieval”. This method achieves the speed of the vector space methods with the benefits of the proximity methods to provide an overall high quality information retrieval system. Rather than using the spatial locality information (used in proximity searches), a spectral information retrieval method utilises the query terms' spectral locality information (found with the aid of either the Fourier transform, Cosine transform, Gaussian transform or Wavelet transform). By combining the query term spectra, we are able to make fast proximity calculations and also make use of the many varieties of vector space method weighting schemes. This method provides superior results to existing text based information retrieval systems.
It is shown that spectral information retrieval methods provide high precision results at query times comparable to the widely used vector space methods, when using an index of comparable size to a vector space method index. This was possible using compression techniques such as spectral component cropping and quantisation, and speed up techniques such as early termination. When querying with a small set of terms, we saw that the spectral document retrieval methods using a certain vector space method weighting scheme always improved the precision of the vector space method by a significant margin (at least 10%). It is also shown that the spectral information retrieval system can be further enhanced when working in conjunction with a relevance feedback system. |