This repo contains the R code to replicate the Latent Semantic Indexing example from the article "Using Linear Algebra for Intelligent Information Retrieval" by Berry, Dumais, and O'Brien.
The data, in machine-readable form, is in books.txt. The code itself is in analysis.R.
The authors construct a document-term matrix (DTM). I reproduce it using the tm package for R, plus some light hand-editing to account for parser differences.
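As a rough illustration of this step, here is a minimal sketch of building a term-by-document matrix with tm. The three titles are made-up placeholders, not the contents of books.txt, and the control options (lowercasing, punctuation and stopword removal) are assumptions rather than the settings used in analysis.R.

```r
library(tm)

# Hypothetical toy titles; the real data lives in books.txt.
titles <- c(
  "Linear algebra for differential equations",
  "Numerical methods for differential equations",
  "Introduction to linear algebra"
)

# Wrap the character vector in a corpus, one document per title.
corpus <- VCorpus(VectorSource(titles))

# Terms in rows, documents in columns. The control list is an assumption.
tdm <- TermDocumentMatrix(
  corpus,
  control = list(tolower = TRUE, removePunctuation = TRUE, stopwords = TRUE)
)

# Convert the sparse representation to an ordinary matrix of counts.
m <- as.matrix(tdm)
print(m)
```

Any rows that differ from the paper's term list because of tokenizer or stopword differences would then need to be adjusted by hand.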
I then perform a singular value decomposition (SVD) of the DTM.
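The decomposition can be sketched in base R as follows. The small matrix A is a made-up stand-in for the real DTM, and both the choice of k = 2 and the scaling of coordinates by the singular values are assumptions for illustration, not necessarily what analysis.R does.

```r
# Hypothetical 4-term by 3-document count matrix (not the paper's data).
A <- matrix(
  c(1, 0, 1,
    1, 1, 0,
    0, 1, 1,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("algebra", "equations", "methods", "theory"),
                  c("B1", "B2", "B3"))
)

# A = U diag(d) V^T, with singular values d in decreasing order.
s <- svd(A)

# Keep the k largest singular values (k = 2 gives a two-dimensional picture).
k <- 2
d_k <- diag(s$d[1:k], nrow = k)

# Coordinates of terms and documents in the k-dimensional space,
# scaled by the singular values (an assumed convention).
term_coords <- s$u[, 1:k] %*% d_k
doc_coords  <- s$v[, 1:k] %*% d_k
rownames(term_coords) <- rownames(A)
rownames(doc_coords)  <- colnames(A)

# Rank-k approximation of A.
A_k <- s$u[, 1:k] %*% d_k %*% t(s$v[, 1:k])
```

Plotting the rows of term_coords and doc_coords on the same axes gives a picture in the spirit of the paper's two-dimensional figure.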
Figure 4 from the paper shows the locations of the documents and terms.