
ASKEM-TA1-DockerVM

Place to create Docker recipes that make our pipelines easy to use

Usage

The current directory must contain the subdirectories inputs and outputs. inputs holds the plain text files to be processed by the TA1 reading pipelines. Edit docker-compose.yml to add your OpenAI API key.

Run docker compose up. After both pipelines have finished, the outputs directory will contain files with the following prefixes:

  • extractions_ Arizona output artifacts
  • mit_ MIT output artifacts
  • canonical_ merged outputs using canonical data format
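The prefixes above can be used to sort the output artifacts programmatically. A minimal sketch (the prefix names come from this README; the helper itself is illustrative):

```python
from collections import defaultdict

# Output-file prefixes produced by the pipelines (per this README)
PREFIXES = {
    "extractions_": "arizona",
    "mit_": "mit",
    "canonical_": "canonical",
}

def sort_outputs(filenames):
    """Group output filenames by the pipeline that produced them."""
    groups = defaultdict(list)
    for name in filenames:
        for prefix, source in PREFIXES.items():
            if name.startswith(prefix):
                groups[source].append(name)
                break
    return dict(groups)

print(sort_outputs(["extractions_doc1.json", "mit_doc1.json", "canonical_doc1.json"]))
```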

SKEMA Service Components

Text Reading

The client code for both SKEMA and MIT text reading pipelines is available in end-to-end-rest/notebooks/text_reading_pipeline.ipynb

This notebook contains examples of how to:

  • Annotate PDFs
  • Annotate plain text files
  • Call the embedding-based MIRA grounding

Additionally, the variable extraction endpoints accept an optional AMR file that will be linked to the extracted variables at the end of extraction. See the PDF annotation example for reference.

AMR alignment

The notebook end-to-end-rest/notebooks/text_reading/metal.ipynb contains an example of how to call the AMR linking endpoint if you have a file with variable extractions and a pre-existing AMR.

Eqn2AMR

{Img,LaTeX}2pMML

There are two endpoints available for this part of the workflow, which are demonstrated in the equations.ipynb notebook located in the end-to-end-rest/notebooks directory.

  1. get("/latex/mml")

This endpoint handles a GET request and expects a LaTeX string representing an equation as input. It then returns the corresponding presentation MathML for that equation.
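Such a GET request can be sketched as follows. The service address and the `tex` query-parameter name are assumptions; check the equations.ipynb notebook for the parameter the service actually expects.

```python
from urllib.parse import urlencode

SKEMA_URL = "http://skema-py:8000"  # hypothetical service address

latex = r"\frac{dx}{dt} = \alpha x - \beta x y"

# "tex" as the query-parameter name is an assumption -- see
# equations.ipynb for the parameter name the service actually expects.
url = f"{SKEMA_URL}/latex/mml?{urlencode({'tex': latex})}"
print(url)

# Sending the request (e.g. requests.get(url).text) returns the
# presentation MathML for the equation.
```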

  2. post("/image/mml")

This endpoint handles a POST request and expects a PNG image of an equation as input. It then processes the image and returns the corresponding presentation MathML for that equation.
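A sketch of building (without sending) that POST request. The multipart field name "data" is an assumption; see equations.ipynb for the field name the service actually expects.

```python
import requests

SKEMA_URL = "http://skema-py:8000"  # hypothetical service address

# Placeholder bytes standing in for a real PNG of an equation.
png_bytes = b"\x89PNG\r\n\x1a\n"

# Build (but do not send) the POST request. The multipart field name
# "data" is an assumption -- check equations.ipynb for the real one.
req = requests.Request(
    "POST",
    f"{SKEMA_URL}/image/mml",
    files={"data": ("equation.png", png_bytes, "image/png")},
).prepare()
print(req.method, req.url)

# Sending it would return the presentation MathML as text:
# mml = requests.post(f"{SKEMA_URL}/image/mml", files=...).text
```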

Please refer to the equations.ipynb notebook for a detailed demonstration of how to use these endpoints.

pMML2AMR

There are several endpoints that can be used for this aspect of the workflow. They are demonstrated in the eqn2amr.ipynb notebook in the end-to-end-rest/notebooks directory.

  1. post("/workflows/consolidated/equations-to-amr")

    A POST request that takes a vector of MathML or LaTeX strings and returns an AMR of the selected variety: Petrinet, RegNet, gAMR, MET, or Decapode.

    An example input for a RegNet model is below:

    ```json
    {
        "mathml": [
            "<math><mrow><mfrac><mrow><mi>d</mi><mi>x</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mi>alpha</mi><mi>x</mi><mo>-</mo><mi>beta</mi><mi>x</mi><mi>y</mi></mrow></math>",
            "<math><mrow><mfrac><mrow><mi>d</mi><mi>y</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mi>delta</mi><mi>x</mi><mi>y</mi><mo>-</mo><mi>gamma</mi><mi>y</mi></mrow></math>"
        ],
        "model": "regnet"
    }
    ```

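The same payload can be assembled programmatically before posting. A minimal sketch (only the "regnet" model string appears in this README; the accepted strings for the other varieties are demonstrated in eqn2amr.ipynb):

```python
import json

# The two MathML strings from the RegNet example above.
mathml = [
    "<math><mrow><mfrac><mrow><mi>d</mi><mi>x</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mi>alpha</mi><mi>x</mi><mo>-</mo><mi>beta</mi><mi>x</mi><mi>y</mi></mrow></math>",
    "<math><mrow><mfrac><mrow><mi>d</mi><mi>y</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mi>delta</mi><mi>x</mi><mi>y</mi><mo>-</mo><mi>gamma</mi><mi>y</mi></mrow></math>",
]
payload = {"mathml": mathml, "model": "regnet"}
body = json.dumps(payload)
print(body[:80])

# POSTing this payload (e.g. requests.post(url, json=payload)) to
# /workflows/consolidated/equations-to-amr returns the RegNet AMR.
```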
Code2AMR

code2FN

The Code2FN service takes code as input (in several different forms), runs the program analysis pipeline to parse the files into CAST, translates the CAST into a Function Network (FN), and returns Gromet Function Network Module Collection (GrometFNModuleCollection) JSON.

The service currently accepts Python and Fortran (family) source code. The language type is determined by the filename extensions:

  • Python: .py
  • Fortran: .f, .F, .for, .f90, .F90, .f95, .F95
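A small helper mirroring that extension-based dispatch rule (a sketch; the service's actual detection logic may differ):

```python
from pathlib import Path
from typing import Optional

# Extension table from the list above; suffixes are matched exactly,
# including case (.f90 vs .F90).
FORTRAN_EXTS = {".f", ".F", ".for", ".f90", ".F90", ".f95", ".F95"}

def detect_language(filename: str) -> Optional[str]:
    """Return the language Code2FN would infer from the filename, or None."""
    suffix = Path(filename).suffix
    if suffix == ".py":
        return "Python"
    if suffix in FORTRAN_EXTS:
        return "Fortran"
    return None

print(detect_language("CHIME_SIR.py"))
```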

The service can accept the following types of code forms:

  • A JSON serialized code system (fn-given-filepaths)
  • A .zip archive containing a directory tree of source code files (fn-given-filepaths-zip)

The two endpoints, as well as the expected structure of the JSON serialized code system, are demonstrated in the two notebooks end-to-end-rest/notebooks/code2fn/fn-given-filepaths.ipynb and end-to-end-rest/notebooks/code2fn/fn-given-filepaths-zip.ipynb.

FN2AMR

This is demonstrated in the code2amr.ipynb notebook in the end-to-end-rest/notebooks directory.

This part of the workflow currently has two endpoints, one for code snippets and one for code archives (.zip files); however, only PetriNet AMR extraction is fully supported right now. The code-snippet workflow is accessed through the following endpoint:

post("/workflows/consolidated/code-snippets-to-amrs")

The endpoint to take in a code archive is the following:

post("/workflows/code/llm-assisted-codebase-to-pn-amr")
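A sketch of assembling a request body for the code-snippets endpoint. The field names ("files", "blobs") are assumptions; consult code2amr.ipynb for the exact schema the service expects.

```python
import json

# Hypothetical request body for the code-snippets endpoint -- the field
# names ("files", "blobs") are assumptions; see code2amr.ipynb for the
# exact schema.
snippet = "def step(s, i, beta):\n    return s - beta * s * i\n"
payload = {"files": ["snippet.py"], "blobs": [snippet]}
body = json.dumps(payload)
print(body[:60])

# requests.post(f"{url}/workflows/consolidated/code-snippets-to-amrs",
#               json=payload) would return the PetriNet AMR.
```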

AMR Refinement

This is demonstrated in the MORAE_demo.ipynb notebook in the morae-demo/ directory.

TODO (Enrique): explain how to run it here.


askem-ta1-dockervm's Issues

[code2fn] Add `end-to-end-rest` example for processing source code

In coordination with @vincentraymond-ua, add code2FN notebooks to the code2fn-rest example.


[code2fn] Enrich `end-to-end-rest` example gromet output with code comment metadata


Depends on ml4ai/skema#239

Text input endpoint

This endpoint will receive one or more text files in its request's body and pass them along to the reading pipelines.
After both results are ready, it will run the canonical format unifier.
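The control flow described above can be sketched with stubbed-out pipelines: both readers process the same text concurrently, and the unifier runs only once both results are ready. All function names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the two reading pipelines -- illustrative only.
def arizona_pipeline(text):
    return {"source": "arizona", "text": text}

def mit_pipeline(text):
    return {"source": "mit", "text": text}

def unify(az_result, mit_result):
    """Merge both results into the canonical format (stubbed)."""
    return {"canonical": [az_result, mit_result]}

def process(text):
    # Run both pipelines concurrently; unify only after both finish.
    with ThreadPoolExecutor(max_workers=2) as pool:
        az = pool.submit(arizona_pipeline, text)
        mit = pool.submit(mit_pipeline, text)
        return unify(az.result(), mit.result())

print(process("some paper text")["canonical"][0]["source"])
```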

[code2fn] example of gromet -> AMR in `end-to-end-rest`

In coordination with @Free-Quarks , enhance or supplement end-to-end-rest example with AMR output.

We have a snippet we expect to work (to be added to code2amr.ipynb), but MORAE doesn't yet support our typical gromet:

from IPython.display import display, HTML, Image
from pathlib import Path
import requests
import json
import os

pp = lambda x: print(json.dumps(x, indent=2))

SKEMA_PA_SERVICE = os.environ.get("SKEMA_PA_ADDRESS", "http://skema-py:8000")
SKEMA_RS_SERVICE = os.environ.get("SKEMA_RS_ADDRESS", "http://skema-rs:8080")


filename = "CHIME_SIR.py"
with open(Path("/data") / "skema" / "code" / filename, "r") as infile:
    code = infile.read()

# display file contents
display(HTML(f"<code>{code}</code>"))

# API call and response

response = requests.post(f"{SKEMA_RS_SERVICE}/extract-comments", json={"language" : "Python", "code" : code})

# The original snippet referenced an undefined BASE_URL; assuming the
# /models endpoint is served by the skema-rs service:
BASE_URL = SKEMA_RS_SERVICE
r = requests.put(f"{BASE_URL}/models/PN", json=response.json())
r.json()

# NOTE: the put request cleans up after itself (from @Free-Quarks)
# requests.delete(f"{BASE_URL}/models/{MODEL_ID}").text


Suggestion of descriptive text for Code2FN in e2e jupyter notebook for OpenHouse demo

<text>

The Code2FN service takes code as input (in several different forms), runs the program analysis pipeline to parse the files into CAST, translates the CAST into a Function Network (FN), and returns Gromet Function Network Module Collection (GrometFNModuleCollection) JSON.

The service currently accepts Python and Fortran (family) source code. The language type is determined by the filename extensions:

  • Python: .py
  • Fortran: .f, for, f95

The service can accept the following four types of code forms:

  • string containing code
  • single file
  • multi-file - array of text-blobs and corresponding filenames
  • zip archive containing a directory tree of source code files

</text>

@vincentraymond-ua :

  • Can you verify that my list of supported Fortran extensions is accurate and complete?
  • Does my description of the input types look correct?

[code2fn] Example notebook in `end-to-end-rest` for code -> fn

Provide a jupyter notebook for the code2fn-rest example illustrating how to call program analysis endpoints to generate function networks from code.


Update Notebook Examples

There are a few notebooks with examples of using our endpoints. However, I think they are probably outdated at this point. We should make sure they get updated before the program ends so they remain valid documentation. This issue is to remind us of and track that work.

  • eq2amr endpoints updated
  • #53
  • image2mml endpoints updated
  • isa endpoints updated
  • text-reading endpoints updated

We should also make sure the README is up to date with respect to these endpoints.

  • eq2amr endpoints updated
  • code2amr endpoints updated
  • image2mml endpoints updated
  • isa endpoints updated
  • text-reading endpoints updated


Create proxy api and docker image

Create a REST API using FastAPI to serve as the entry point that will orchestrate our annotation workflows and route requests to the appropriate web service (MIT or AZ).

[documentation] consolidate examples to use a single docker-compose file

To reduce confusion, we will move away from the subdirectory-based examples to a unified approach that uses a single docker-compose file.


[code2fn] Enrich `code2fn-rest` with example of zip archive -> fn

Enrich the jupyter notebook associated with code2fn-rest with an example illustrating how to process a zip archive of code.

