[1] |
Laurence A. F. Park and Kotagiri Ramamohanarao.
Query expansion for the language modelling framework using the naive
bayes assumption.
In Takashi Washio, Einoshin Suzuki, Kai Ming Ting, and Akihiro
Inokuchi, editors, The Twelfth Pacific-Asia Conference on Knowledge
Discovery and Data Mining, number 5012 in LNCS, pages 681--688. Springer,
May 2008.
Language modelling is a new form of information retrieval that is rapidly becoming the preferred choice over probabilistic and vector space models, due to the intuitiveness of the model formulation and its effectiveness. The language model assumes that all terms are independent, therefore the majority of the documents returned to the user will be those that contain the query terms. By making this assumption, related documents that do not contain the query terms will never be found, unless the related terms are introduced into the query using a query expansion technique. Unfortunately, recent attempts at performing a query expansion using a language model have not been in line with the language model, being complex and not intuitive to the user. In this article, we introduce a simple method of query expansion using the naive Bayes assumption that is in line with the language model, since it is derived from the language model. We show how to derive the query expansion term relationships using probabilistic latent semantic analysis (PLSA). Through experimentation, we show that by using PLSA query expansion within the language model framework, we can provide a significant increase in precision. |
[2] |
Laurence A. F. Park and Kotagiri Ramamohanarao.
The effect of weighted term frequencies on probabilistic latent
semantic term relationships.
In Amihood Amir, Andrew Turpin, and Alistair Moffat, editors,
The 5th String Processing and Information Retrieval Symposium, volume
5280/2009, pages 63--74, 2008.
A latent semantic thesaurus allows us to use the term relationships generated by probabilistic latent semantic analysis (PLSA) in the form of a query expansion. It has many benefits over a latent semantic index; one of them being that the weights used to calculate the thesaurus term relationships can be different to the weights used during document retrieval. This article contains an investigation of the effect of term weighting on the probabilistic latent semantic term relationships. The effect of the term weighting is examined through the precision obtained from queries using the PLSA term relationships. Through experimentation, we found that all but one of the document sets used produced more effective term relationships when using weighted document-term frequencies, bringing us to the conclusion that it is more likely that term relationships will be more effective when using weighted terms with PLSA. A comparison to the BM25 pseudo-relevance feedback retrieval system showed that the PLSA weighted thesaurus method was able to produce an average 9% increase in average reciprocal rank. |
[3] |
Yong Zhen Guo, Kotagiri Ramamohanarao, and Laurence A. F. Park.
Web page prediction based on conditional random fields.
In Proceedings of the 18th European Conference on Artificial
Intelligence, pages 251--255, 2008.
Web page prefetching is used to reduce the access latency of the Internet. However, if most prefetched Web pages are not visited by the users in their subsequent accesses, the limited network bandwidth and server resources will be used inefficiently, which may even worsen the access delay problem. Therefore, enhancing the Web page prediction accuracy is a main problem of Web page prefetching. Conditional Random Fields (CRFs), which are popular sequential learning models, have already been successfully used for many Natural Language Processing (NLP) tasks such as POS tagging, named entity recognition (NER) and segmentation. In this paper, we propose the use of CRFs in the field of Web page prediction. We treat the accessing sessions of previous Web users as observation sequences and label each element of these observation sequences to get the corresponding label sequences. Based on these observation and label sequences, we use CRFs to train a prediction model and predict the probable subsequent Web pages for the current users. Our experimental results show that CRFs can produce higher Web page prediction accuracy when compared with other popular techniques like plain Markov Chains and Hidden Markov Models (HMMs). |
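To make the prediction task concrete, the following is a minimal sketch of the plain first-order Markov chain baseline that the abstract compares against, not the CRF model proposed in the paper. The function names, the session format (a list of page identifiers per user session) and the toy data are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Count page-to-page transitions over all recorded sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Pair every page with the page visited immediately after it.
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, page, k=1):
    """Return up to k most frequent successors of `page` (empty if unseen)."""
    successors = counts.get(page)
    if successors is None:
        return []
    return [p for p, _ in successors.most_common(k)]

# Hypothetical sessions, invented for this example.
sessions = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"], ["b", "c"]]
model = train_markov(sessions)
```

A prefetcher built on this baseline would fetch the pages returned by `predict_next(model, current_page, k)` while the user reads the current page.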
[4] |
Yong Zhen Guo, Kotagiri Ramamohanarao, and Laurence A. F. Park.
Error correcting output coding-based conditional random fields for
web page prediction.
In Proceedings of the 2008 IEEE/WIC/ACM International Conference
on Web Intelligence, 2008.
Web page prefetching has been used efficiently to reduce the access latency problem of the Internet; its success mainly relies on the accuracy of Web page prediction. As powerful sequential learning models, Conditional Random Fields (CRFs) have been used successfully to improve the Web page prediction accuracy when the total number of unique Web pages is small. However, because the training complexity of CRFs is quadratic in the number of labels, when applied to a website with a large number of unique pages, the training of CRFs may become very slow and even intractable. In this paper, we decrease the training time and computational resource requirements of CRF training by integrating the error correcting output coding (ECOC) method. Moreover, since the performance of ECOC-based methods crucially depends on the ECOC code matrix in use, we employ a coding method, Search Coding, to design a code matrix of good quality. |
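The general ECOC idea can be sketched as follows: each page label is assigned a binary codeword, one binary classifier is trained per bit, and a predicted bit string is decoded to the label with the nearest codeword in Hamming distance. This is a generic illustration, not the paper's implementation. The code matrix below is hand-picked for the example rather than produced by Search Coding, the page names are hypothetical, and the bit predictions stand in for the outputs of the binary CRFs.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(code_matrix, bits):
    """Return the label whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda label: hamming(code_matrix[label], bits))

# Hypothetical code matrix: 4 page labels, 5 bits each, minimum pairwise
# Hamming distance 3, so any single wrong bit can still be corrected.
code_matrix = {
    "home":     "00000",
    "products": "11100",
    "contact":  "00111",
    "checkout": "11011",
}

# Suppose the five binary classifiers output 11000: one bit of the
# "products" codeword has been flipped by a classifier error.
predicted_page = decode(code_matrix, "11000")
```

The training saving comes from each classifier distinguishing two labels instead of all unique pages, while the redundancy in the codewords absorbs some individual classifier errors.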
[5] |
Yuye Zhang, Laurence A. F. Park, and Alistair Moffat.
Parameter sensitivity in rank-biased precision.
In The Proceedings of the Thirteenth Australasian Document
Computing Symposium, 2008.
Rank-Biased Precision (RBP) is a retrieval evaluation metric that assigns an effectiveness score to a ranking by computing a geometrically weighted sum of document relevance values, with the monotonically decreasing weights in the geometric distribution determined via a persistence parameter p. Despite exhibiting various advantageous traits over well-known existing measures such as Average Precision, RBP has the drawback of requiring the designer of any experiment to choose a value for p. Here we present a method that allows retrieval systems evaluated using RBP with different p values to be compared. The proposed approach involves calculating two critical bounding relevance vectors for the original RBP score, and using those vectors to calculate the range of possible RBP scores for any other value of p. Those bounds may then be sufficient to allow the outright superiority of one system over the other to be established. In addition, the process can be modified to handle any RBP residuals associated with either of the two systems. We believe the adoption of the comparison process described in this paper will greatly aid the uptake of RBP in evaluation experiments. |
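For reference, a minimal sketch of the standard RBP definition, RBP = (1 - p) * sum over ranks i of r_i * p^(i-1), together with the residual p^d left by the documents beyond evaluation depth d. It does not implement the bounding-vector comparison method described in the abstract. The function name `rbp` and the example ranking are assumptions made for illustration.

```python
from typing import Sequence, Tuple

def rbp(relevance: Sequence[float], p: float) -> Tuple[float, float]:
    """Return (base RBP score, residual) for a ranking judged to a fixed depth.

    relevance[i] is the relevance (in [0, 1]) of the document at rank i + 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # The weights p**0, p**1, ... decrease geometrically with rank.
    score = (1.0 - p) * sum(r * p ** i for i, r in enumerate(relevance))
    # Unjudged documents past the evaluation depth could add at most this much.
    residual = p ** len(relevance)
    return score, residual

# Hypothetical ranking: relevant documents at ranks 1 and 3, judged to depth 3.
score, residual = rbp([1, 0, 1], p=0.5)
```

The true RBP score of the full ranking lies between `score` and `score + residual`, which is why residuals matter when two systems are compared.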