[1] Laurence A. F. Park, Kotagiri Ramamohanarao, Christopher A. Leckie, and James C. Bezdek. Adapting spectral co-clustering to documents and words using latent semantic analysis. In AI 2009: Advances in Artificial Intelligence, volume 5866 of Lecture Notes in Computer Science, pages 301--311. Springer Berlin / Heidelberg, December 2009. [ bib | DOI | .pdf ]
Spectral co-clustering is a generic method of computing co-clusters of relational data, such as sets of documents and their terms. Latent semantic analysis (LSA) is a method of document and term smoothing that can assist in the information retrieval process. In this article we examine the process behind spectral clustering for documents and terms, and compare it to LSA. We show that both spectral co-clustering and LSA follow the same process, using different normalisation schemes and metrics. By combining the properties of the two co-clustering methods, we obtain an improved co-clustering method for document-term relational data that provides an increase in cluster quality of 33.0%.
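The bipartite spectral approach the abstract refers to can be sketched as follows. This is a minimal illustration of the standard method (degree-normalise the document-term matrix, take its second singular vectors, and read the co-clusters off their signs), with a toy matrix of my own; it is not the paper's combined method.

```python
# Minimal sketch of two-way bipartite spectral co-clustering.
# The matrix A and all names here are illustrative, not from the paper.
import numpy as np

def spectral_cocluster_2way(A):
    """Bipartition the rows (documents) and columns (terms) of a
    non-negative relational matrix A using the second singular
    vectors of the degree-normalised matrix D1^{-1/2} A D2^{-1/2}."""
    d1 = A.sum(axis=1)                    # document "degrees"
    d2 = A.sum(axis=0)                    # term "degrees"
    An = A / np.sqrt(np.outer(d1, d2))    # normalised relational matrix
    U, s, Vt = np.linalg.svd(An)
    # The leading singular pair is trivial; the second pair encodes the
    # bipartition. Rescale back through the degree normalisation.
    u2 = U[:, 1] / np.sqrt(d1)
    v2 = Vt[1, :] / np.sqrt(d2)
    return u2 >= 0, v2 >= 0               # boolean co-cluster labels

# Two obvious document-term blocks with a little cross-block noise.
A = np.array([[5, 4, 0, 0],
              [4, 5, 1, 0],
              [0, 1, 5, 4],
              [0, 0, 4, 5]], dtype=float)
doc_labels, term_labels = spectral_cocluster_2way(A)
```

For more than two clusters, several singular vectors are stacked and clustered with k-means; the two-way case above avoids that extra step so the spectral part stays visible.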
[2] Uyen T. V. Nguyen, Laurence A. F. Park, Liang Wang, and Kotagiri Ramamohanarao. A novel path-based clustering algorithm using multi-dimensional scaling. In AI 2009: Advances in Artificial Intelligence, volume 5866 of Lecture Notes in Computer Science, pages 280--290. Springer Berlin / Heidelberg, December 2009. [ bib | DOI | .pdf ]
[3] Laurence A. F. Park and Kotagiri Ramamohanarao. Kernel latent semantic analysis using an information retrieval based kernel. In David Cheung, Il-Yeol Song, Wesley Chu, Xiaohua Hu, Jimmy Lin, Jiexun Li, and Zhiyong Peng, editors, Proceedings of the 18th ACM conference on Information and knowledge management, pages 1721--1724. The Association for Computing Machinery, November 2009. [ bib | DOI | .pdf ]
[4] Yong Zhen Guo, Kotagiri Ramamohanarao, and Laurence A. F. Park. Web access latency reduction using CRF-based predictive caching. In Web Information Systems and Mining, volume 5854 of Lecture Notes in Computer Science, pages 31--44. Springer Berlin / Heidelberg, November 2009. [ bib | DOI | .pdf ]
[5] Laurence A. F. Park and Kotagiri Ramamohanarao. The sensitivity of latent Dirichlet allocation for information retrieval. In Wray Buntine, Marko Grobelnik, Dunja Mladenic, and John Shawe-Taylor, editors, Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD Proceedings, Part II, Lecture Notes in Artificial Intelligence, pages 176--188. Springer, Bled, Slovenia, September 2009. [ bib | DOI | .pdf ]
[6] William Webber and Laurence A. F. Park. Score adjustment for correction of pooling bias. In Mark Sanderson, ChengXiang Zhai, Justin Zobel, James Allan, and Javed Aslam, editors, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 444--451, New York, USA, July 2009. ACM Press. [ bib | DOI | .pdf ]
[7] Sri Ravana, Laurence A. F. Park, and Alistair Moffat. System scoring using partial prior information. In Mark Sanderson, ChengXiang Zhai, Justin Zobel, James Allan, and Javed Aslam, editors, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 788--789, New York, USA, July 2009. ACM Press. [ bib | DOI | .pdf ]
[8] Justin Zobel, Alistair Moffat, and Laurence A. F. Park. Against recall: Is it persistence, cardinality, density, coverage, or totality? In ACM SIGIR Forum, volume 43, pages 3--15. ACM Press, June 2009. [ bib | .pdf ]
[9] Yong Zhen Guo, Kotagiri Ramamohanarao, and Laurence A. F. Park. Grouped ECOC conditional random fields for prediction of web user behavior. In Advances in Knowledge Discovery and Data Mining, volume 5476 of Lecture Notes in Computer Science, pages 757--763. Springer Berlin / Heidelberg, April 2009. [ bib | DOI | .pdf ]
[10] Laurence A. F. Park, James C. Bezdek, and Christopher A. Leckie. Visualisation of clusters in very large rectangular dissimilarity data. In G. Sen Gupta and S. C. Mukhopadhyay, editors, Proceedings of the Fourth International Conference on Autonomous Robots and Agents, pages 251--256, February 2009. [ bib | DOI | .pdf ]
A matrix D, of pairwise dissimilarities between m row objects and n column objects, can be clustered: amongst row objects or column objects; amongst the union of row and column objects; and amongst the union of row and column objects containing at least one object of each type (co-clusters). The coVAT algorithm, which builds images for visual assessment of clustering tendency for these problems, is limited to mn ≈ O(10^4 × 10^4). We develop a scalable version of coVAT that approximates coVAT images when D is very large. Two examples are given to illustrate and evaluate the new method.
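The square-matrix VAT reordering that coVAT generalises can be sketched briefly. The idea: reorder the objects of a dissimilarity matrix with a Prim-like nearest-neighbour pass so that clusters appear as dark diagonal blocks in the reordered image. The matrix below is a toy example of my own, not data from the paper.

```python
# Minimal sketch of the VAT ordering for a square dissimilarity matrix.
import numpy as np

def vat_order(D):
    """Return a VAT object ordering for a square dissimilarity matrix D."""
    n = D.shape[0]
    # Start from one end of the largest dissimilarity in D.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [i]
    remaining = set(range(n)) - {i}
    while remaining:
        rem = list(remaining)
        sub = D[np.ix_(order, rem)]
        # Prim's step: the unvisited object nearest to the visited set.
        nxt = rem[int(np.argmin(sub.min(axis=0)))]
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Two tight clusters, {0,1} and {2,3}, far apart from each other.
D = np.array([[0.0, 0.1, 0.9, 0.8],
              [0.1, 0.0, 0.8, 0.9],
              [0.9, 0.8, 0.0, 0.1],
              [0.8, 0.9, 0.1, 0.0]])
order = vat_order(D)
Dv = D[np.ix_(order, order)]   # reordered image: two dark diagonal blocks
```

coVAT extends this to rectangular D by estimating the missing row-row and column-column dissimilarities; the paper's contribution is making that construction scale past the mn ≈ O(10^4 × 10^4) limit.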
[11] Laurence A. F. Park and Kotagiri Ramamohanarao. An analysis of latent semantic term self-correlation. ACM Transactions on Information Systems, 27(2):1--35, 2009. [ bib | DOI | .pdf ]
Latent semantic analysis (LSA) is a generalised vector space method (GVSM) that uses dimension reduction to generate term correlations for use during the information retrieval process. We hypothesised that even though the dimension reduction establishes correlations between terms, the reduction causes a degradation in the correlation of a term to itself (self-correlation). In this article, we prove that there is a direct relationship between the size of the LSA dimension reduction and the LSA self-correlation. We also show that by altering the LSA term self-correlations we obtain a significant increase in precision during the information retrieval process.
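The self-correlation effect the abstract describes is easy to see numerically. In this toy illustration (my own construction, not the paper's experiment), term correlations are taken as C_k = U_k U_k^T from a truncated SVD of a term-document matrix: at full rank every term's self-correlation is exactly 1, while rank reduction can only keep it the same or shrink it.

```python
# Toy illustration: LSA dimension reduction shrinks term self-correlation.
import numpy as np

A = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 2., 0., 1.],
              [0., 0., 1., 2.]])   # 4 terms x 4 documents (illustrative)

U, s, Vt = np.linalg.svd(A)
k = 2                              # LSA rank reduction
C_full = U @ U.T                   # full rank: identity, diagonal all ones
C_k = U[:, :k] @ U[:, :k].T        # rank-k LSA term correlation matrix

full_self = np.diag(C_full)        # all 1.0
lsa_self = np.diag(C_k)            # <= 1, and strictly less for some terms
```

Since trace(C_k) = k < number of terms, at least one term's self-correlation must drop below 1, which is the degradation the paper analyses and then corrects.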
[12] Laurence A. F. Park and Kotagiri Ramamohanarao. Efficient storage and retrieval of probabilistic latent semantic information for information retrieval. The International Journal on Very Large Data Bases, 18(1):141--156, January 2009. [ bib | DOI | .pdf ]
Probabilistic latent semantic analysis (PLSA) is a method for computing term and document relationships from a document set. The probabilistic latent semantic index (PLSI) has been used to store PLSA information, but unfortunately the PLSI uses excessive storage space relative to a simple term frequency index, which causes lengthy query times. To overcome the storage and speed problems of PLSI, we introduce the probabilistic latent semantic thesaurus (PLST), an efficient and effective method of storing the PLSA information. We show that through methods such as document thresholding and term pruning, we are able to maintain the high precision results found using PLSA while using a very small fraction (0.15%) of the storage space of PLSI.
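The term-thesaurus idea can be sketched as follows. This is a minimal construction of my own under simple assumptions, not the paper's exact scheme: relate terms to each other through the shared PLSA topics via P(t'|t) = Σ_z P(t'|z) P(z|t), then prune small entries so the stored thesaurus is a small fraction of the dense term-term matrix.

```python
# Minimal sketch: a pruned term-term thesaurus from PLSA-style probabilities.
# The probability tables below are made up for illustration.
import numpy as np

P_t_given_z = np.array([[0.60, 0.05],
                        [0.30, 0.05],
                        [0.05, 0.50],
                        [0.05, 0.40]])   # P(t|z): 4 terms x 2 topics
P_z = np.array([0.5, 0.5])               # topic prior P(z)

# Bayes: P(z|t) = P(t|z) P(z) / P(t).
P_t = P_t_given_z @ P_z
P_z_given_t = (P_t_given_z * P_z) / P_t[:, None]

# Term-term relation: T[i, j] = P(t_j | t_i) = sum_z P(t_j|z) P(z|t_i).
T = P_z_given_t @ P_t_given_z.T

# Prune entries below a threshold to keep only strong term relationships.
tau = 0.1
thesaurus = {i: {j: float(T[i, j]) for j in range(T.shape[1]) if T[i, j] >= tau}
             for i in range(T.shape[0])}
```

Each row of T is a probability distribution over related terms, and pruning trades a little of that mass for a much smaller structure, which is the storage/precision trade-off the abstract reports.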

This file was generated by bibtex2html 1.99.