nikhilranjan7 / forte

This project forked from asyml/forte

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

License: Apache License 2.0

Bring good software engineering to your ML solutions, starting from Data!

Forte is a data-centric framework designed to engineer complex ML workflows. Forte allows practitioners to build ML components in a composable and modular way. Behind the scenes, it introduces DataPack, a standardized data structure for unstructured data, distilling good software engineering practices such as reusability, extensibility, and flexibility into ML solutions.

DataPacks are standard data packages in an ML workflow that can represent the source data (e.g., text, audio, images) and additional markups (e.g., entity mentions, bounding boxes). They are powered by a customizable data schema named "Ontology", allowing domain experts to inject their knowledge into ML engineering processes easily.
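
To make the idea of "source data plus markups" concrete, here is a minimal plain-Python sketch. It is not Forte's DataPack implementation: ToyPack, Span, and their methods are hypothetical names invented for this illustration, and a label string stands in for an ontology type.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for illustration only; these are NOT Forte classes.
@dataclass
class Span:
    begin: int   # start offset into the source text
    end: int     # end offset (exclusive)
    label: str   # stands in for an ontology type such as an entity mention


@dataclass
class ToyPack:
    text: str                                        # the source data
    spans: List[Span] = field(default_factory=list)  # the markups

    def add(self, begin: int, end: int, label: str) -> Span:
        span = Span(begin, end, label)
        self.spans.append(span)
        return span

    def get(self, label: str) -> List[str]:
        # Return the covered text of every markup with the given label.
        return [self.text[s.begin:s.end] for s in self.spans if s.label == label]


pack = ToyPack("Forte was developed at CMU.")
pack.add(0, 5, "EntityMention")
pack.add(23, 26, "EntityMention")
print(pack.get("EntityMention"))  # prints ['Forte', 'CMU']
```

The point of the sketch is only that the text and its annotations travel together in one object, so every component reads and writes the same structure.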

Installation

To install the released version from PyPI:

pip install forte

To install from source:

git clone https://github.com/asyml/forte.git
cd forte
pip install .

To install Forte adapters for existing libraries:

Install from PyPI:

# To install other tools. Check here https://github.com/asyml/forte-wrappers#libraries-and-tools-supported for available tools.
pip install forte.spacy

Install from source:

git clone https://github.com/asyml/forte-wrappers.git
cd forte-wrappers
# Change spacy to other tools. Check here https://github.com/asyml/forte-wrappers#libraries-and-tools-supported for available tools.
pip install src/spacy
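
After installing, a quick way to confirm that the packages are importable is a small check like the one below. This is a sketch using only the Python standard library; the helper name is_installed is made up for this example, and the module names forte and fortex.spacy are taken from the import statements used later in this guide.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package
        # (e.g. `fortex` for `fortex.spacy`) is itself missing.
        return False


for name in ("forte", "fortex.spacy"):
    print(f"{name}: {'found' if is_installed(name) else 'missing'}")
```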

Some components or modules in Forte may require extra dependencies.

Quick Start Guide

Writing NLP pipelines with Forte is easy. The following example creates a simple pipeline that extracts the sentences, tokens, and part-of-speech (POS) tags from a piece of text.

Before we start, make sure the SpaCy wrapper is installed.

pip install forte.spacy

Let's start by writing a simple processor that assigns POS tags to tokens using the good old NLTK library.

import nltk
from forte.processors.base import PackProcessor
from forte.data.data_pack import DataPack
from ft.onto.base_ontology import Token

class NLTKPOSTagger(PackProcessor):
    r"""A wrapper of NLTK pos tagger."""

    def initialize(self, resources, configs):
        super().initialize(resources, configs)
        # download the NLTK average perceptron tagger
        nltk.download("averaged_perceptron_tagger")

    def _process(self, input_pack: DataPack):
        # get a list of token data entries from `input_pack`
        # using the `DataPack.get()` method
        token_texts = [token.text for token in input_pack.get(Token)]

        # use nltk pos tagging module to tag token texts
        taggings = nltk.pos_tag(token_texts)

        # assign nltk taggings to token attributes
        for token, tag in zip(input_pack.get(Token), taggings):
            token.pos = tag[1]

If we break it down, we will notice there are two main functions. In the initialize function, we download and prepare the model. Then, in the _process function, we process the DataPack object: we take the tokens from it and use the NLTK tagger to create POS tags. The results are stored as the pos attribute of the tokens.
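
The tag[1] indexing in _process relies on the shape of what nltk.pos_tag returns: a list of (word, tag) tuples. The sketch below shows that assignment step in isolation. The tuples are hard-coded (with tags copied from the sample output in this guide) so the snippet runs without NLTK, and ToyToken is a hypothetical stand-in, not the Token class from ft.onto.base_ontology.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a token entry; NOT Forte's Token class.
@dataclass
class ToyToken:
    text: str
    pos: Optional[str] = None


tokens = [ToyToken("Forte"), ToyToken("is"), ToyToken("a")]

# Shaped like the return value of nltk.pos_tag: a list of (word, tag) tuples.
# Hard-coded here so that the example does not need NLTK installed.
taggings = [("Forte", "NNP"), ("is", "VBZ"), ("a", "DT")]

# Pair each token with its tagging and keep only the tag (index 1).
for token, tag in zip(tokens, taggings):
    token.pos = tag[1]

print([(t.text, t.pos) for t in tokens])
# prints [('Forte', 'NNP'), ('is', 'VBZ'), ('a', 'DT')]
```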

Before we go into the details of the implementation, let's try it in a full pipeline.

from forte import Pipeline

from forte.data.readers import StringReader
from fortex.spacy import SpacyProcessor

pipeline: Pipeline = Pipeline[DataPack]()
pipeline.set_reader(StringReader())
pipeline.add(SpacyProcessor(), {"processors": ["sentence", "tokenize"]})
pipeline.add(NLTKPOSTagger())

Here we have successfully created a pipeline with a few components:

  • a StringReader that reads data from a string,
  • a SpacyProcessor that calls SpaCy to split the sentences and tokenize them,
  • and finally the brand new NLTKPOSTagger we just implemented.

Let's see it run in action!

input_string = "Forte is a data-centric ML framework."
for pack in pipeline.initialize().process_dataset(input_string):
    for sentence in pack.get("ft.onto.base_ontology.Sentence"):
        print("The sentence is: ", sentence.text)
        print("The POS tags of the tokens are:")
        for token in pack.get(Token, sentence):
            print(f" {token.text}[{token.pos}]", end=" ")
        print()

It gives us output as follows:

Forte[NNP]  is[VBZ]  a[DT]  data[NN]  -[:]  centric[JJ]  ML[NNP]  framework[NN]  .[.]

We have successfully created a simple pipeline. In a nutshell, the DataPacks are the standard packages "flowing" through the pipeline. They are created by the reader and then passed along the pipeline.

Each processor, such as our NLTKPOSTagger, interfaces directly with DataPacks and does not need to worry about the other parts of the pipeline, making the engineering process more modular. In this example pipeline, SpacyProcessor creates the Sentence and Token entries, and then we implemented the NLTKPOSTagger to add Part-of-Speech tags to the tokens.
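
The flow described above (a reader creates the pack, then each processor reads from and writes to it) can be sketched in a few lines of plain Python. This is a toy model of the pattern, not Forte's Pipeline implementation: the dictionary pack, toy_reader, whitespace_tokenizer, toy_tagger, and run are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List

# A plain dict stands in for the pack; hypothetical, NOT Forte's DataPack.
Pack = Dict[str, object]


def toy_reader(text: str) -> Pack:
    # The reader creates the pack from raw input.
    return {"text": text, "tokens": [], "pos": []}


def whitespace_tokenizer(pack: Pack) -> None:
    # First processor: adds tokens to the pack.
    pack["tokens"] = str(pack["text"]).split()


def toy_tagger(pack: Pack) -> None:
    # Second processor: only relies on the tokens already in the pack,
    # not on which component produced them.
    pack["pos"] = ["NUM" if t.isdigit() else "WORD" for t in pack["tokens"]]


def run(text: str, processors: List[Callable[[Pack], None]]) -> Pack:
    pack = toy_reader(text)
    for process in processors:
        process(pack)  # each step mutates the shared pack in turn
    return pack


result = run("Forte 2 go", [whitespace_tokenizer, toy_tagger])
print(list(zip(result["tokens"], result["pos"])))
# prints [('Forte', 'WORD'), ('2', 'NUM'), ('go', 'WORD')]
```

Because toy_tagger depends only on the pack's contents, the tokenizer could be swapped for another one without touching the tagger, which is the modularity the paragraph above describes.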

To learn more about the details, check out our documentation! The classes used in this guide can also be found in this repository or the Forte Wrappers repository.

And There's More

The data-centric abstraction of Forte opens the gate to many other opportunities. Not only does Forte allow engineers to develop reusable components easily, it also provides a simple way to develop composable ML modules.

To learn more about these, you can visit:

  • Examples
  • Documentation
  • We are currently working on some interesting tutorials; stay tuned for a full set of documentation on how to do NLP with Forte!

Contributing

Forte was originally developed at CMU and is actively contributed to by Petuum in collaboration with other institutes. This project is part of the CASL Open Source family.

If you are interested in making enhancements to Forte, please first go over our Code of Conduct and Contribution Guideline.

License

Apache License 2.0
