zbMATH — the first resource for mathematics

Low-rank orthogonal decompositions for information retrieval applications. (English) Zbl 0905.68060
Authors’ abstract: Current methods to index and retrieve documents from databases usually depend on a lexical match between query terms and keywords extracted from documents in database. These methods can produce incomplete or irrelevant results due to the use of synonyms and polysemus words. The association of terms with documents (or implicit semantic structure) can be derived using large sparse term-by document matrices. In fact, both terms and documents can be matched with user queries using representations in \(k\)-space (where \(100\leq k\leq 200\)) derived from \(k\) of the largest approximate singular vectors of these terms-by-document matrices. This completely automated approach called latent semantic indexing or LSI, uses subspaces spanned by the approximate singular vectors to encode important associative relationships between terms and documents in \(k\)-spaces. Using LSI, two or more documents may be close to each other in \(k\)-space (and hence meaning) yet share no common terms. The focus of this work is to demonstrate the computational advantages of exploiting low-rank orthogonal decomposition such as the ULV (or URV) as opposed to the truncated singular value decomposition (SVD) for the construction of initial and updated rank-\(k\) subspaces arising from LSI applications.
Reviewer: J.Zítko (Praha)

68P20 Information storage and retrieval of data
65F20 Numerical solutions to overdetermined systems, pseudoinverses
Full Text: DOI
[1] and . The computational complexity of alternative updating approaches for an SVD-encoded indexing scheme. In Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, Philadelphia, 1995. SIAM. · Zbl 0960.68594
[2] Berry, International Journal of Supercomputer Applications 6 pp 13– (1992)
[3] Berry, SIAM Review 37 pp 573– (1995)
[4] Bischof, SIAM Journal on Scientific and Statistical Computing 12 pp 1332– (1991)
[5] Bunch, Numerische Mathematik 31 pp 111– (1978)
[6] Chan, Linear Algebra and Its Applications 88 pp 67– (1987)
[7] Chan, SIAM Journal on Scientific and Statistical Computing 11 pp 519– (1990)
[8] Chan, SIAM Journal on Scientific and Statistical Computing 13 pp 727– (1992)
[9] Deerwester, Journal of the American Society for Information Science 41 pp 391– (1990)
[10] and . Using latent semantic analysis to improve access to textual information. In Proceedings of Computer Human Interaction ’88, pages 281-285, 1988.
[11] Dumais, Behavior Research Methods, Instruments, & Computers 23 pp 229– (1991) · doi:10.3758/BF03203370
[12] and . Low-rank revealing two-sided orthogonal decompositions. Technical Report 94-09, Department of Mathematics, California State University, San Marcos, CA, October 1994.
[13] Fierro, Numerische Mathematik 70 pp 453– (1995)
[14] Foltz, Communications of the ACM 35 pp 51– (1992)
[15] Foster, Linear Algebra and Its Applications 74 pp 47– (1986)
[16] , , , , and . Information retrieval using a singular value decomposition model of latent semantic structure. In Proceedings of SIGIR, pages 465-480, 1988.
[17] and . Matrix Computations. Johns-Hopkins, Baltimore, Second edition, 1989.
[18] and . Solving Least Squares Problems. Prentice Hall, N.J., First edition, 1974. · Zbl 0860.65028
[19] Information management tools for updating an SVD-encoded indexing scheme. Master’s thesis, The University of Knoxville, Tennessee, Knoxville, TN, 1994.
[20] On an algorithm for refining a rank-revealing URV decomposition and a perturbation theorem for singular values. Technical Report CS-TR 2626, University of Maryland, Department of Computer Science, College Park, MD, 1991.
[21] Stewart, ZEEE Transactions on Signal Processing 40 pp 1535– (1992)
[22] Cross-language information retrieval using latent semantic indexing. Master’s thesis, The University of Knoxville, Tennessee, Knoxville, TN, 1994.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.