fabiod20 / big-data-analytics-and-business-intelligence

Final project of "Big Data Analytics and Business Intelligence" course.

Jupyter Notebook 100.00%
big-data business-intelligence natural-language-processing pyspark spark-nlp named-entity-recognition neo4j neo4j-bloom powerbi

Cardiological Examinations Graph

This repository contains the final project of the Big Data Analytics and Business Intelligence course (AY 20/21) at the University of Naples Federico II.

Assignment

A dataset containing medical information about a set of patients is provided. Each patient's record includes their examinations, along with the corresponding anamnesis and diagnosis, written in Italian. The aim of the project is to build a Named Entity Recognition (NER) system capable of extracting diseases and symptoms from free text. Labeled data are provided to train the model. Once the NER system is developed, a graph database must be implemented that integrates the patients' information (examinations, diseases, symptoms, etc.).
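To make the extraction step concrete, here is a minimal plain-Python sketch of how token-level NER predictions in the common BIO scheme can be grouped into disease and symptom mentions. It is an illustration only, not the project's code: the tag names `B-DISEASE`, `I-DISEASE`, `B-SYMPTOM` and `I-SYMPTOM`, the function name `bio_to_entities`, and the example sentence are all assumptions. Spark NLP offers its own `NerConverter` annotator for this kind of chunking.

```python
from typing import List, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_text, entity_type) pairs.

    Tag names such as "B-DISEASE" are hypothetical; adapt them to the
    label set actually used in the training data.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A "B-" tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An "I-" tag continues the current entity of the same type.
            current_tokens.append(token)
        else:
            # "O" tags and orphan "I-" tags end the current entity
            # and are themselves discarded.
            flush()
    flush()
    return entities


# Hypothetical Italian sentence with hand-written tags.
tokens = ["Paziente", "con", "dolore", "toracico", "e", "fibrillazione", "atriale"]
tags = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "B-DISEASE", "I-DISEASE"]
print(bio_to_entities(tokens, tags))
```

The resulting pairs are the kind of record that can later be attached to an examination in the graph database.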

Project

The project, developed by a team of three, is structured as follows:

  • The preprocessing directory contains the data cleaning and data preparation steps, both performed with PySpark in a distributed environment provided by Databricks.
  • The ner-system directory contains the training and inference code of the NER system, implemented with Spark NLP by John Snow Labs. The system is based on a version of BERT pre-trained on an Italian dataset and was fine-tuned for 10 epochs on GPUs in Google Colab. However, due to the scarcity of labeled data, the model reached limited performance.
  • The graph-based-database directory contains the code used to populate a graph database built with Neo4j. The database stores all the relevant patient information, connecting each patient with their examinations, diseases, symptoms, drugs and doctors.
  • The data-visualization directory contains:
    • Three dashboards developed in Microsoft Power BI, which show respectively general information about doctors, the clinical situation of a given patient, and correlation among diseases, symptoms and drugs.
    • A perspective implemented in Neo4j Bloom, which allows the user to explore the graph without knowing the Cypher query language.
  • The documentation directory contains exhaustive documentation of the project, written in Italian.
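
As an illustration of the graph population step, the sketch below builds parameterized Cypher `MERGE` statements that link a patient to an examination and the examination to extracted diseases and symptoms. It is a minimal sketch under stated assumptions, not the repository's code: the node labels (`Patient`, `Examination`, `Disease`, `Symptom`), the relationship types (`UNDERWENT`, `DIAGNOSED`, `REPORTED`), the property names and the function `build_examination_queries` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from NER entity type to (node label, relationship type).
# Cypher cannot take labels or relationship types as query parameters, so they
# are interpolated from this fixed whitelist rather than from user input.
LABEL_FOR: Dict[str, Tuple[str, str]] = {
    "DISEASE": ("Disease", "DIAGNOSED"),
    "SYMPTOM": ("Symptom", "REPORTED"),
}


def build_examination_queries(
    patient_id: str,
    exam_id: str,
    entities: List[Tuple[str, str]],
) -> List[Tuple[str, Dict[str, str]]]:
    """Return (cypher, parameters) pairs for one examination.

    MERGE creates a node or relationship only if it does not already
    exist, so running the same statements twice does not duplicate data.
    """
    queries: List[Tuple[str, Dict[str, str]]] = [
        (
            "MERGE (p:Patient {id: $patient_id}) "
            "MERGE (e:Examination {id: $exam_id}) "
            "MERGE (p)-[:UNDERWENT]->(e)",
            {"patient_id": patient_id, "exam_id": exam_id},
        )
    ]
    for text, entity_type in entities:
        if entity_type not in LABEL_FOR:
            continue  # Skip entity types this sketch does not model.
        label, relationship = LABEL_FOR[entity_type]
        queries.append(
            (
                "MATCH (e:Examination {id: $exam_id}) "
                f"MERGE (n:{label} {{name: $name}}) "
                f"MERGE (e)-[:{relationship}]->(n)",
                # Lower-casing is a simple normalization so that the same
                # disease or symptom maps to a single node.
                {"exam_id": exam_id, "name": text.lower()},
            )
        )
    return queries


# Hypothetical identifiers and entities.
for cypher, params in build_examination_queries(
    "P001", "E042", [("Dolore toracico", "SYMPTOM"), ("Fibrillazione atriale", "DISEASE")]
):
    print(cypher, params)
```

Each pair could then be passed to a session of the official Neo4j Python driver, for example `session.run(cypher, params)`. Drugs and doctors would follow the same pattern with their own labels.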
