A first approach to text mining from PDF files. Text mining, defined as the process of deriving quality information from text, is a way to obtain and tie together information from several sources; in this case, PDF files.
- R
- Libraries:
  - pdftools
  - tidytext
  - wordcloud
The script reads the PDF file stored as "sampledoc.pdf" and plots two graphs. The first is a bar graph of the words that appear in the text more than 35 times. The second is a word cloud of the 15 most common words, with each word sized relative to its frequency.
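
The steps described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper name `count_words`, the removal of stop words, the use of base R `barplot`, and the `min.freq = 1` setting are all assumptions made for the example. Only the file name, the 35-occurrence threshold, and the 15-word limit come from the description.

```r
# Minimal sketch; count_words() is a hypothetical helper, not the original code.
library(pdftools)   # pdf_text(): one character string per page
library(tidytext)   # unnest_tokens() and the stop_words data set
library(wordcloud)  # wordcloud()

# Count word frequencies in a character vector (e.g. one element per page).
count_words <- function(pages) {
  df <- data.frame(text = pages, stringsAsFactors = FALSE)
  tokens <- unnest_tokens(df, word, text)  # one lowercased word per row
  # Assumption: common English stop words ("the", "of", ...) are dropped.
  words <- tokens$word[!tokens$word %in% stop_words$word]
  freq <- as.data.frame(table(word = words), stringsAsFactors = FALSE)
  names(freq) <- c("word", "n")
  freq[order(-freq$n), ]                   # most frequent words first
}

pdf_path <- "sampledoc.pdf"
if (file.exists(pdf_path)) {
  freq <- count_words(pdf_text(pdf_path))

  # Bar graph: words that appear more than 35 times.
  frequent <- freq[freq$n > 35, ]
  if (nrow(frequent) > 0) {
    barplot(frequent$n, names.arg = frequent$word, las = 2,
            ylab = "Occurrences")
  }

  # Word cloud: the 15 most common words, sized by frequency.
  wordcloud(words = freq$word, freq = freq$n, max.words = 15, min.freq = 1)
}
```

Keeping the counting step in its own function makes it possible to try it on a short character vector before running it on a full PDF.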