
This project forked from ibm/visualize-unstructured-data-with-watson


Visualize unstructured data using Watson NLU

Home Page: https://developer.ibm.com/patterns/visualize-unstructured-text/

License: Apache License 2.0

JavaScript 2.68% Java 17.24% CoffeeScript 39.77% CSS 29.39% HTML 10.92%


Visualize Unstructured Data Using Watson Natural Language Understanding

In this code pattern, we will create a web app for visualizing unstructured data using Watson Natural Language Understanding, Apache Tika, and D3.js. After a user uploads a local file of their choosing, the application uses Apache Tika to extract text from the unstructured data file. The text is then passed through Watson Natural Language Understanding, where entities and concepts are extracted. Finally, the application uses the D3.js library as a visualization tool to display the results to the user.

The main benefit of using the Watson Natural Language Understanding service is its powerful analytics engine, which provides cognitive enrichments and insights into your data. The key enrichments extracted include:

  • Entities: people, companies, organizations, cities, and more.
  • Keywords: important topics typically used to index or search the data.
  • Concepts: identified general concepts that aren't necessarily referenced in the data.
  • Sentiment: the overall positive or negative sentiment of the data.
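
The four enrichments above correspond to feature names in a Natural Language Understanding `analyze` request. The following is a minimal sketch of building such a request with the JDK's `java.net.http` classes (Java 11 or later), independent of how the app itself calls the service. The class and method names are hypothetical, the version date is an assumption that should be checked against the service documentation, and the hand-rolled JSON escaping stands in for a real JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of this repo.
class NluRequestSketch {
    // Assumed API version date; confirm the current value in the service docs.
    static final String VERSION = "2022-04-07";

    /** Escapes characters that must not appear raw inside a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a request body asking for all four enrichments with default options. */
    static String buildBody(String text) {
        return "{\"text\":\"" + jsonEscape(text) + "\","
             + "\"features\":{\"entities\":{},\"keywords\":{},\"concepts\":{},\"sentiment\":{}}}";
    }

    /** Builds (but does not send) the POST request, using basic auth with an IAM API key. */
    static HttpRequest buildRequest(String serviceUrl, String apiKey, String text) {
        String basic = Base64.getEncoder()
            .encodeToString(("apikey:" + apiKey).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(serviceUrl + "/v1/analyze?version=" + VERSION))
            .header("Content-Type", "application/json")
            .header("Authorization", "Basic " + basic)
            .POST(HttpRequest.BodyPublishers.ofString(buildBody(text)))
            .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the JSON response parsed with a library of your choice.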

The enrichments are displayed using D3.js, a JavaScript library that provides powerful visualization techniques that help bring data to life. In this app, we use it to display each of the enrichments in an interactive bubble cloud, with each element's size and location determined by its relative significance.
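
To illustrate how a bubble's size can follow significance, here is a small sketch that maps a relevance score to a radius on a square-root scale. It assumes each enrichment carries a relevance score between 0 and 1; the class name and the radius bounds are hypothetical, and the app's actual D3.js sizing logic may differ.

```java
// Hypothetical helper, not part of this repo.
class BubbleScaleSketch {
    /**
     * Maps a relevance score in [0, 1] to a radius in [minRadius, maxRadius]
     * using a square-root scale. With minRadius = 0 the bubble's area is
     * exactly proportional to relevance; out-of-range scores are clamped.
     */
    static double radiusFor(double relevance, double minRadius, double maxRadius) {
        double clamped = Math.max(0.0, Math.min(1.0, relevance));
        return minRadius + (maxRadius - minRadius) * Math.sqrt(clamped);
    }
}
```

A square-root scale is a common choice for bubble charts because viewers tend to compare areas rather than radii; scaling the radius linearly would exaggerate the most relevant items.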

When the reader has completed this code pattern, they will understand how to:

  • Create and use an instance of Watson Natural Language Understanding
  • Leverage Apache Tika to extract text from unstructured files
  • Use D3.js for displaying the visuals

architecture

Flow

  1. User configures credentials for the Watson NLU service and starts the app.
  2. User selects a data file to process and load.
  3. Text is extracted from the data file using Apache Tika.
  4. Extracted text is passed to Watson NLU for enrichment.
  5. Enriched data is visualized in the UI using the D3.js library.
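
Steps 3 and 4 of the flow above can be sketched as two pluggable stages. This is an illustration only, assuming Java 11 or later: the class, the `Enrichment` type, and the stage signatures are hypothetical, and in the real app the stages would be backed by Apache Tika and Watson NLU respectively.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical outline of the flow, not the app's actual classes.
class PipelineSketch {
    /** One enriched item for the UI, such as an entity or concept with its relevance. */
    static final class Enrichment {
        final String label;
        final double relevance;

        Enrichment(String label, double relevance) {
            this.label = label;
            this.relevance = relevance;
        }
    }

    /**
     * Runs text extraction (step 3) and enrichment (step 4) in sequence.
     * Returns an empty list when no text could be extracted, so the
     * enrichment stage is never called with blank input.
     */
    static List<Enrichment> process(byte[] fileBytes,
                                    Function<byte[], String> extractText,
                                    Function<String, List<Enrichment>> enrich) {
        String text = extractText.apply(fileBytes);
        if (text == null || text.isBlank()) {
            return List.of();
        }
        return enrich.apply(text);
    }
}
```

Keeping the stages as plain functions makes it easy to substitute stubs for both services when testing the flow.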

Watch the Video

This video is from a webinar produced for the "Building With Watson" series.

video

Steps

  1. Clone the repo
  2. Create Watson services with IBM Cloud
  3. Configure credentials
  4. Run the application

1. Clone the repo

Clone the visualize-unstructured-data-with-watson repo locally. In a terminal, run:

git clone https://github.com/IBM/visualize-unstructured-data-with-watson

2. Create Watson services with IBM Cloud

Create the following service:

  • Watson Natural Language Understanding

3. Configure credentials

The credentials for IBM Cloud services can be found in the Services menu in IBM Cloud by selecting the Service Credentials option for each service.

Use those values to update the config.properties file located in the src/main/resources directory. Replace the default values with the appropriate credentials (either an API key or a username/password pair). Note that quotes are not required.

# Watson Natural Language Understanding
NATURAL_LANGUAGE_UNDERSTANDING_URL=<add_nlu_url>
## Un-comment and use either username+password or IAM apikey.
NATURAL_LANGUAGE_UNDERSTANDING_IAM_APIKEY=<add_nlu_iam_apikey>
#NATURAL_LANGUAGE_UNDERSTANDING_USERNAME=<add_nlu_username>
#NATURAL_LANGUAGE_UNDERSTANDING_PASSWORD=<add_nlu_password>
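
As an illustration of how a file like this can be read, the sketch below loads the properties with `java.util.Properties` and picks the IAM API key when one is set, falling back to username and password otherwise. The class and method names are hypothetical, and treating untouched `<add_...>` placeholders as missing is an assumption; the app's own loading code may behave differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical loader, not the app's actual configuration class.
class NluConfigSketch {
    final String url;
    final String apiKey;   // null when username/password is used
    final String username; // null when an API key is used
    final String password; // null when an API key is used

    NluConfigSketch(String url, String apiKey, String username, String password) {
        this.url = url;
        this.apiKey = apiKey;
        this.username = username;
        this.password = password;
    }

    /** Reads the properties and prefers the IAM API key over username/password. */
    static NluConfigSketch load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader); // lines starting with '#' are skipped as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = required(props, "NATURAL_LANGUAGE_UNDERSTANDING_URL");
        String apiKey = value(props, "NATURAL_LANGUAGE_UNDERSTANDING_IAM_APIKEY");
        if (apiKey != null) {
            return new NluConfigSketch(url, apiKey, null, null);
        }
        return new NluConfigSketch(url, null,
            required(props, "NATURAL_LANGUAGE_UNDERSTANDING_USERNAME"),
            required(props, "NATURAL_LANGUAGE_UNDERSTANDING_PASSWORD"));
    }

    /** Returns the trimmed value, or null if it is absent, empty, or still a placeholder. */
    static String value(Properties props, String key) {
        String v = props.getProperty(key);
        if (v == null) return null;
        v = v.trim();
        boolean placeholder = v.startsWith("<") && v.endsWith(">");
        return (v.isEmpty() || placeholder) ? null : v;
    }

    static String required(Properties props, String key) {
        String v = value(props, key);
        if (v == null) throw new IllegalStateException("Missing value for " + key);
        return v;
    }
}
```

Failing fast on a missing or placeholder value gives a clearer error at startup than an authentication failure on the first request.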

4. Run the application

Pre-requisite

Maven >= 3.5 is used to build, test, and run the app. Check your Maven version using the following command:

mvn -v

To download and install Maven, see the Apache Maven website.

Note: If you would prefer not to download Maven, you can substitute the mvn portion of any Maven command with either ./mvnw (on Linux or Mac), or mvnw.cmd (on Windows). This will run a pre-installed local version of Maven that is included in this repo.

Build and Run the app

  1. Install and package the Java app by running the following Maven command (remember, you can substitute mvn with mvnw if you do not have Maven installed):
mvn clean install
  2. Start the app by running:
java -jar target/nlu-visual-1.0.jar
  3. Browse to http://localhost:8080 to see the app.

  4. To start the visualization process, select and upload a data file from your local file system. Note that while Apache Tika supports over a thousand different file types, this app has only been tested using a small set of standard document formats. For your convenience, we have included a few sample poems in the data subdirectory of this repo.

Sample output

From the home page, you will be prompted to choose a file from your local system:

home_page

Select a file and press the Upload button. In this example, the file "The Raven.pdf" was selected from the data folder:

concepts_tab

If you click on the Sentiments tab, you will see:

sentiments_tab

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ

Contributors

rhagarty, scottdangelo, markstur, dolph, imgbot[bot]
