ramongsilva / indexing-of-extracted-information


This repository contains files and information about step 3 of Kaphta Architecture: Indexing of Extracted Information, using the R language.

R 100.00%
machine-learning inverted-index indexing-algorithms information-retrieval text-mining text-classification ensemble

Indexing of Extracted Information

This repository contains the files and information for step 3 of the Kaphta Architecture: Indexing of Extracted Information. In this stage, PubMed abstracts annotated in the previous step (Information Extraction) are indexed using the R language. Two kinds of indexation are performed: individual and cross. The individual indexations cover polyphenol, cancer, and gene entities; the cross indexations cover polyphenol-cancer and polyphenol-gene entity associations. The files and results of this stage are listed below.
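As a rough illustration of the two kinds of indexation, the sketch below builds an inverted index (entity to abstract IDs) in base R and derives a cross index from two individual indexes. The sample data, the PMIDs, and the helper names `build_inverted_index` and `cross_index` are hypothetical and are not taken from the repository's scripts. The cross index here is approximated by co-occurrence in the same abstract, whereas the repository works from the associations recognized in the previous step, so the actual implementation may differ.

```r
# Hypothetical NER output: one row per (abstract, entity) mention.
mentions <- data.frame(
  pmid   = c("101", "101", "102", "102", "103", "103"),
  type   = c("polyphenol", "cancer", "polyphenol", "gene", "polyphenol", "cancer"),
  entity = c("resveratrol", "breast cancer", "quercetin", "TP53",
             "resveratrol", "colon cancer"),
  stringsAsFactors = FALSE
)

# Individual indexation: map each entity of one type to the sorted,
# unique IDs of the abstracts that mention it (its posting list).
build_inverted_index <- function(mentions, entity_type) {
  rows <- mentions[mentions$type == entity_type, ]
  lapply(split(rows$pmid, rows$entity), function(ids) sort(unique(ids)))
}

# Cross indexation (assumption: abstract-level co-occurrence): for each
# pair of entities, keep the abstracts present in both posting lists.
cross_index <- function(index_a, index_b) {
  out <- list()
  for (a in names(index_a)) {
    for (b in names(index_b)) {
      shared <- intersect(index_a[[a]], index_b[[b]])
      if (length(shared) > 0) {
        out[[paste(a, b, sep = " | ")]] <- shared
      }
    }
  }
  out
}

polyphenol_index <- build_inverted_index(mentions, "polyphenol")
cancer_index     <- build_inverted_index(mentions, "cancer")
polyphenol_cancer_index <- cross_index(polyphenol_index, cancer_index)
```

Looking up `polyphenol_index[["resveratrol"]]` returns the abstracts that mention that polyphenol, and `polyphenol_cancer_index` keeps only the polyphenol-cancer pairs that share at least one abstract.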

For more information about this and other steps of the Kaphta Architecture, see the corresponding sections of the Kaphta Web Tool, available at https://portal.ifsuldeminas.edu.br/kaphtawebtool/.

Individual and Cross indexations

  • indexing-information-extracted-gh.R: R script that performs the individual and cross indexation of information extracted from PubMed abstracts about the anticancer activity of polyphenols, using an inverted index.
  • functions.R: script with auxiliary functions. Save it in the same folder as indexing-information-extracted-gh.R, which requires it to run.
  • db_total_project.db: SQLite database required by all R scripts of the Kaphta Architecture steps. It contains tables with the entity dictionary, the full textual corpus of PubMed abstracts, and the PubMed abstracts classified as positive in the text classification step. Save it in the same folder as indexing-information-extracted-gh.R, which requires it to run.
  • entities-recognized: folder with the files produced by the NER (named entity recognition) task of the previous stage (Information Extraction). They contain the named entities (polyphenols, cancers, and genes) recognized in PubMed abstracts. Save this folder, with its files, in the same folder as indexing-information-extracted-gh.R, which requires it for the indexation task.
  • Rule_associations_recognized.rar: compressed archive produced by the AR task of the previous stage (Information Extraction). It contains the PubMed abstract sentences in which at least one rule from the rules dictionary was recognized. Save it in the same folder as indexing-information-extracted-gh.R, which requires it for the indexation tasks.
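Because every input above has to sit next to the main script, a quick pre-flight check can save a failed run. The following is a minimal sketch in base R, not part of the repository: the helper name `missing_inputs` is hypothetical, and it only checks that the listed files and folder exist, not that their contents are valid or that the archive has been extracted.

```r
# Inputs the README says must be in the same folder as
# indexing-information-extracted-gh.R.
required_inputs <- c(
  "functions.R",
  "db_total_project.db",
  "entities-recognized",
  "Rule_associations_recognized.rar"
)

# Return the names of the required inputs that are absent from `script_dir`.
# file.exists() returns TRUE for directories as well as regular files.
missing_inputs <- function(script_dir, required = required_inputs) {
  required[!file.exists(file.path(script_dir, required))]
}

# Check the current working directory and report anything missing.
missing <- missing_inputs(".")
if (length(missing) > 0) {
  message("Missing inputs: ", paste(missing, collapse = ", "))
}
```

Run it from the folder that holds the script, or pass that folder's path to `missing_inputs()`, before sourcing indexing-information-extracted-gh.R.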

Results

The files in the results folder contain the results of the individual and cross indexation of PubMed abstracts.

Individual indexation

Cross indexation
