text_mining's Introduction

First approach to text mining in PDF files with R

Introduction

This project is a first approach to text mining from PDF files. Text mining is the process of deriving quality information from text, and it can be used to obtain and tie together information from several sources. In this case, the sources are PDF files.

Technologies

R
Libraries:
- pdftools
- tidytext
- wordcloud
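
As a minimal setup sketch, the three libraries listed above can be installed from CRAN in an R session. The repository does not state how the author installed them, so this is only one reasonable way to do it. On Linux, pdftools typically also needs the poppler system library to be present before it will build.

```r
# One-time setup: install the libraries named in the Technologies list.
# Assumption: a CRAN mirror is reachable from the current R session.
install.packages(c("pdftools", "tidytext", "wordcloud"))
```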

Launch

The script reads the PDF file stored as "sampledoc.pdf" and plots two graphs. The first is a bar graph showing the words that appear in the text more than 35 times. The second is a word cloud of the 15 most common words, with each word sized relative to its frequency.
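
The steps described above could be sketched roughly as follows. This is not the repository's actual script, only an illustration under stated assumptions: the helper name count_words is hypothetical, dplyr is used for counting (it is installed as a tidytext dependency but is not in the Technologies list), English stop words are removed before counting, and the bar graph is drawn with base R's barplot.

```r
library(pdftools)   # pdf_text()
library(tidytext)   # unnest_tokens(), stop_words
library(dplyr)      # count(), anti_join(), filter(), %>%  (assumption: not listed in the README)
library(wordcloud)  # wordcloud()

# Hypothetical helper: turn a character vector of page texts into a
# word-frequency table sorted from most to least frequent.
count_words <- function(pages) {
  data.frame(text = pages, stringsAsFactors = FALSE) %>%
    unnest_tokens(word, text) %>%           # one lowercase word per row
    anti_join(stop_words, by = "word") %>%  # assumption: drop common English stop words
    count(word, sort = TRUE)                # columns: word, n
}

pdf_path <- "sampledoc.pdf"
if (file.exists(pdf_path)) {
  # pdf_text() returns one character string per page
  word_counts <- count_words(pdf_text(pdf_path))

  # Bar graph: words that appear more than 35 times
  frequent <- filter(word_counts, n > 35)
  if (nrow(frequent) > 0) {
    barplot(frequent$n, names.arg = frequent$word, las = 2,
            ylab = "Occurrences")
  }

  # Word cloud: the 15 most common words, sized by frequency
  top15 <- head(word_counts, 15)
  wordcloud(words = top15$word, freq = top15$n, min.freq = 1)
}
```

Keeping the counting step in its own function makes it possible to try it on a plain character vector before pointing it at a PDF. The thresholds (35 occurrences, 15 words) come from the description above and can be adjusted for shorter documents.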

Plot

alexgomb / text_mining