ml4ai / automates
AutoMATES: Automated Model Assembly from Text, Equations, and Software
Home Page: https://ml4ai.github.io/automates
License: Other
Could someone who knows how we use nltk see what is involved in bumping it up to at least 3.6.4?
CVE-2021-3828 (high severity)
This was found while writing the if_statement unit test in #275. In the C file for that test, notice that variable a is accessed in the second if block. In the GrFN, we would expect an arrow from a at the top to the interface node in the second conditional block. Furthermore, the GrFN which is generated cannot be executed, and results in the following error:
Executing GrFN...
Traceback (most recent call last):
File "/home/ryan/projects/automates/scripts/program_analysis/run_gcc_to_grfn.py", line 213, in <module>
run_gcc_pipeline()
File "/home/ryan/projects/automates/scripts/program_analysis/run_gcc_to_grfn.py", line 205, in run_gcc_pipeline
result = grfn(inputs)
File "/home/ryan/projects/automates/automates/model_assembly/networks.py", line 1197, in __call__
self.root_subgraph(self, subgraph_to_hyper_edges, node_to_subgraph, set())
File "/home/ryan/projects/automates/automates/model_assembly/networks.py", line 621, in __call__
sugraph_execution_result = subgraph(
File "/home/ryan/projects/automates/automates/model_assembly/networks.py", line 646, in __call__
to_execute()
File "/home/ryan/projects/automates/automates/model_assembly/networks.py", line 390, in __call__
variable = self.outputs[i]
IndexError: list index out of range
Excerpt of relevant code:
if (a > b) {
x = b;
b = a;
}
if (x == 3) {
a = x;
b = a;
x = 10;
}
Here are the CAST and GrFN PDFs created from the C file:
if_statement--CAST.pdf
if_statement--GrFN.pdf
I am having difficulty installing the package and noticed the number of pinned dependencies in setup.py is fairly substantial (see below). I'm trying to install on an M1 Mac and get stuck on the scipy dep not having a wheel for that version and my arch. Would it be possible to unpin some of these deps? Thanks!
"antlr4-python3-runtime==4.8",
"dill==0.3.4",
"Flask==1.1.1",
"flask_codemirror==1.1",
"flask_wtf==0.14.3",
"future==0.18.2",
"matplotlib==3.3.4",
"networkx==2.5",
"nltk==3.6.6",
"notebook==6.4.12",
"numpy==1.21",
"pandas==1.2.2",
"plotly==4.5.4",
"pygraphviz==1.7",
"pytest==6.2.2",
"pytest-cov==2.11.1",
"python-igraph==0.9.1",
"Pygments==2.7.4",
"SALib==1.3.12",
"seaborn==0.10.0",
"scikit_learn==0.24.1",
"SPARQLWrapper==1.8.5",
"sympy==1.5.1",
"tqdm==4.29.0",
"WTForms==2.2.1",
"flask-codemirror",
"scipy==1.6.0",
"ruamel.yaml",
"pdfminer.six",
"pdf2image",
"webcolors",
"lxml",
"Pillow",
"ftfy",
"fastparquet"
],
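For what it's worth, here is a sketch of what relaxed specifiers might look like for the deps that most commonly block newer platforms (the exact bounds below are illustrative assumptions, not tested):

# Illustrative only: lower bounds instead of exact pins, so pip can pick
# a build that ships wheels for the current platform (e.g., arm64).
"scipy>=1.6",
"numpy>=1.21",
"matplotlib>=3.3",
"networkx>=2.5,<3",

Unpinning scipy in particular would let pip resolve to a release that has an M1/arm64 wheel.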
Cases currently not handled by untangleConj:
Possible solutions:
Given a Python program like the following
y = 10
if y < 5:
x = 1
else:
x = 3
print(x)
This program won't correctly translate to GrFN due to a bug having to do with the variable x. The variable x doesn't appear before the conditional, and this introduces an issue during GrFN generation when creating the appropriate GrFN variables.
There are currently two proposed fixes:
The current proposal is to implement the second as a solution, though perhaps in the future the first solution should also be implemented.
When we parse arXiv PDFs with Science Parse, the first page pretty consistently picks up the sideways arXiv watermark in the extracted text:
The federated learning learns a shared global model by the aggregation of local models
on client devices. But in the original paper of federated learning [18] only uses a simple
average on client models, taking the number of samples in each client device as the
weight of averaging. In the mobile keyboard applications, the language preference may
vary from different individuals. There is a relation among the client
ar X
iv :1
81 2.
07 10
8v 1
[ cs
.C L
] 1
7 D
ec 2
01 8
language models, and their contributions to the central server are quite different.
Since we're primarily using arXiv papers, I think we should remove this, probably with a DocumentFilter. (The interleaved fragments above are the sideways watermark text arXiv:1812.07108v1 [cs.CL] 17 Dec 2018.)
@bsharpataz, just another example to support the idea of not using grobid-quant for intervals:
{
"runtime": 59,
"measurements": [
{
"type": "listc",
"quantities": [
{
"rawValue": "0.9",
"parsedValue": {
"numeric": 0.9,
"structure": {
"type": "NUMBER",
"formatted": "0.9"
},
"parsed": "0.9"
},
"offsetStart": 105,
"offsetEnd": 108,
"quantified": {
"rawName": "< Kcbmax",
"normalizedName": "kcbmax",
"offsetStart": 109,
"offsetEnd": 117
}
}
],
"quantified": {
"rawName": "< Kcbmax",
"normalizedName": "kcbmax",
"offsetStart": 109,
"offsetEnd": 117
}
},
{
"type": "interval",
"quantityLeast": {
"rawValue": "1.15",
"parsedValue": {
"numeric": 1.15,
"structure": {
"type": "NUMBER",
"formatted": "1.15"
},
"parsed": "1.15"
},
"offsetStart": 177,
"offsetEnd": 181
},
"quantityMost": {
"rawValue": "1.15",
"parsedValue": {
"numeric": 1.15,
"structure": {
"type": "NUMBER",
"formatted": "1.15"
},
"parsed": "1.15"
},
"offsetStart": 120,
"offsetEnd": 124
}
}
]
}
where b is a positive number
keepLongestVariable should take care of that.

[Not high priority] In Chrome, the web form only scales vertically and displays results in a small window.
With an RMSE of 22.8%, drastic discrepancies were found in the comparison of Ref-ET ETo and ETpm from DSSAT-CSM version 4.5 for Arizona conditions (fig. 1a).
Processing sentence : With an RMSE of 22.8%, drastic discrepancies were found in the comparison of Ref-ET ETo and ETpm from DSSAT-CSM version 4.5 for Arizona conditions (fig. 1a).
DOC : org.clulab.processors.corenlp.CoreNLPDocument@6992ebed
[error] application -
! @7am4d9lji - Internal server error, for (GET) [/parseSentence?sent=With+an+RMSE+of+22.8%25%2C+drastic+discrepancies+were+found+in+the+comparison+of+Ref-ET+ETo+and+ETpm+from+DSSAT-CSM+version+4.5+for+Arizona+conditions+(fig.+1a).&showEverything=true] ->
play.api.http.HttpErrorHandlerExceptions$$anon$1: Execution exception[[NoSuchElementException: key not found: type]]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:255)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:182)
at play.core.server.AkkaHttpServer$$anonfun$$nestedInanonfun$executeHandler$1$1.applyOrElse(AkkaHttpServer.scala:251)
at play.core.server.AkkaHttpServer$$anonfun$$nestedInanonfun$executeHandler$1$1.applyOrElse(AkkaHttpServer.scala:250)
at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:414)
at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at play.api.libs.streams.Execution$trampoline$.execute(Execution.scala:70)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
at scala.concurrent.impl.Promise$KeptPromise$Kept.onComplete(Promise.scala:368)
Caused by: java.util.NoSuchElementException: key not found: type
at scala.collection.MapLike.default(MapLike.scala:232)
at scala.collection.MapLike.default$(MapLike.scala:231)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike.apply(MapLike.scala:141)
at scala.collection.MapLike.apply$(MapLike.scala:140)
at scala.collection.AbstractMap.apply(Map.scala:59)
at ujson.Value$Selector$StringSelector.apply(Value.scala:97)
at ujson.Value.apply(Value.scala:62)
at ujson.Value.apply$(Value.scala:62)
at ujson.Obj.apply(Value.scala:190)
Adding this will require keeping track of operation order in at least base CAST. We (Tito, Ryan, Janalee, Clay) have previously discussed doing this by adding an order number to each CAST node; when a break or continue is indicated, the order number of the parent can be used to resolve where control should return.
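A minimal sketch of the order-number idea (hypothetical class and field names, not the actual CAST definitions):

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CASTNode:
    kind: str                           # e.g., "loop", "assignment", "break"
    order: int                          # position within the parent's body
    parent: Optional["CASTNode"] = None
    body: List["CASTNode"] = field(default_factory=list)

def add_child(parent: CASTNode, kind: str) -> CASTNode:
    node = CASTNode(kind=kind, order=len(parent.body), parent=parent)
    parent.body.append(node)
    return node

def resume_point(break_node: CASTNode) -> Optional[CASTNode]:
    # Walk up to the enclosing loop, then use the loop's order number in
    # its parent to find the node where control resumes after the break.
    node = break_node
    while node.parent is not None and node.kind != "loop":
        node = node.parent
    enclosing = node.parent
    if enclosing is None:
        return None
    nxt = node.order + 1
    return enclosing.body[nxt] if nxt < len(enclosing.body) else None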
Now that we are switching over to GrFN3 we need to revamp the grfn2cag translation pipeline. @aswinchester this would be a great task for you to get integrated into the core of the AutoMATES repository. Please reach out to @dpdicken and @titomeister for assistance as you accomplish this task. They will be more than happy to pair-program with you and help you along the way!
All work should be done on the grfn2gromet branch or a child branch from that branch. DO NOT attempt to do this on master, because the setup tests and objects only exist on the grfn2gromet branch.

The relevant files are:
automates/model_assembly/networks.py: this file contains the class definitions for the GroundedFunctionNetwork and CausalAnalysisGraph classes. This file should be the one location where implementations need to be made.
scripts/model_assembly/py2grfn.py: this is a script that will run our whole pipeline to translate Python source code into a GrFN/CAG. Read the flag definitions carefully. You will need to supply the correct flags to actually run CAG generation and to test whether the CAG is equivalent to one saved/loaded from JSON.
tests/data/program_analysis/language_tests/python/<idiom-name>/<test-name>/<test-name>.py: this is a collection of Python code examples that can be used to see if we can generate CAGs for certain programming idioms that we may see in scientific model source code.
tests/model_assembly/test_grfn2cag.py: this file contains the tests that check whether the methods defined as part of this issue are working. Use this file to determine whether your implementations are successful after you can successfully run the py2grfn.py script.
tests/conftest.py: a helper file for the tests mentioned above that defines pytest fixtures used during the testing process.

The methods to implement are in the CausalAnalysisGraph and CAGContainer classes, marked with TODO: @Alex ...:
CausalAnalysisGraph.from_GrFN(): this method creates a CausalAnalysisGraph from a GroundedFunctionNetwork.
CausalAnalysisGraph.from_json_file(): this method creates a CausalAnalysisGraph from a CAG.json file.
CAGContainer.from_func_node(): this method creates a CAGContainer from a GrFN BaseConFuncNode. Containers in a CAG play the role of determining variable node subgraph membership based upon the functions found in the GrFN parent. Ask @cl4yton if you need guidance on what a CAGContainer should look like.
CAGContainer.from_dict(): this method creates a CAGContainer from a dictionary of data. This dict is derived from a re-loaded CAG.json.
Also add a to_igraph_gml() method in the CausalAnalysisGraph class that allows the CAG to be converted to an igraph object that can be used with the identifiability algorithm. An example of this for the old CAG class is in the same file (commented out, below the new class definition).
For the from_json_file and from_dict methods, take a look at the already implemented to_json and to_dict methods, because those reveal what information is expected to be present in the CAG.json.

This was found while writing the if_else_statement unit test in #275. In the C file for that test, notice that variable a is assigned in the else branch, but it does not appear as the output of an assignment node in the GrFN. Furthermore, since a was assigned, it should come out of the interface block, but it does not. The CAST correctly has the assignment of a in the else branch of the first conditional.
Excerpt of relevant code:
if (a < b) {
x = b;
b = a;
}
// else should be taken
else {
x = a;
a = b;
}
Here are the CAST and GrFN PDFs created from the C file:
if_else_statement--CAST.pdf
if_else_statement--GrFN.pdf
This issue is much simpler. We should prepare sets of JSONs for GE for the models they have been working with to assist them in making the transition from using GrFN2 + ExprTree JSONs to using GrFN3 JSON.
Note: all references in this issue are for the grfn2gromet branch. I'm not sure whether the references are correct on master.
model_name--GrFN.json
model_name--GrFN3.json
model_name--expr-trees.json
The code for transforming lambda expressions into expr-trees exists both as a standalone script and as a function in automates/model_assembly/networks.py. The same core library is used, and the calls are made for each expression function in the GrFN, whether we are talking about GrFN 2.0 or 3.0.
scripts/model_assembly/expression_walker.py
automates/model_assembly/networks.py::BaseFuncNode::expression_to_expr_nodes()
If necessary, it should be possible to use the second function to generate the information needed to form the expr-tree JSON during GrFN3 translation. I'm not sure if that will make things easier, but I wanted you both to know about that as an option.
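For a quick illustration of the expression-to-tree step using only the standard library (this shows the generic Python ast mechanism, not our expr-tree JSON format):

import ast

# Parse a lambda expression string into a Python AST; walking a tree like
# this is the core of turning GrFN lambda expressions into expr-trees.
tree = ast.parse("lambda x, y: x + y * 2", mode="eval")
print(ast.dump(tree.body))  # Lambda(args=..., body=BinOp(left=Name('x'), ...))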
Please add a version number so we know which schema version was used to generate any file. Thanks!
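For example, a top-level field along these lines, with the rest of the GrFN content alongside it (the field name is a suggestion, not part of the current schema):

{
  "schema_version": "3.0"
}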
In the GCC/GIMPLE-to-AST pipeline, we see that compound conditionals (e.g., (cond1 && cond2)) get split into sequentially handled conditions. These need to be recognized as compound; otherwise the CAST that gets generated will look like two sequential if-statements.
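An illustrative sketch of the shape of the problem (Python stand-ins for the C/GIMPLE forms):

def body():
    print("taken")

cond1, cond2 = True, True

# What the source expresses: a single compound condition.
if cond1 and cond2:
    body()

# What the sequential GIMPLE-style lowering looks like to the CAST builder:
# two nested tests that need to be recognized and re-fused into one
# compound conditional.
if cond1:
    if cond2:
        body()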
The collection of tasks defined below should be all that is needed to prepare the tests that will determine whether the grfn2gromet branch is ready to merge into master. All tasks defined should be carried out on the grfn2gromet branch or a branch from that branch.

The relevant files are:
tests/data/program_analysis/language_tests/python/<idiom-name>/<test-name>/<test-name>.py
tests/conftest.py
./pytest.ini
scripts/model_assembly/py2grfn.py

The tasks are:
Update tests/program_analysis/test_python2cag.py, tests/program_analysis/test_cag2air.py, tests/model_assembly/test_air2grfn.py, and tests/model_assembly/test_grfn2cag.py for all of the Python code examples, according to the three example stubs shown.
Update tests/model_assembly/test_grfn_execution.py to utilize all of the Python code examples.
Mark currently failing tests with pytest.mark.skip so that we have a record of them but they will not be caught as failing for this PR.

This will list outstanding text reading grammar issues:
@pratikbhd and @hlim1 have finished getting the new container types loop, if-block, and select-block ready to be used by the GrFN extraction module. We now need to refactor networks.py::process_container so that processing is done based on the type of the container.
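A minimal sketch of the type-based dispatch (hypothetical handler names and container attribute, not the actual networks.py code):

def process_container(container):
    # Dispatch on the container type instead of one monolithic body.
    handlers = {
        "loop": process_loop_container,
        "if-block": process_if_container,
        "select-block": process_select_container,
    }
    handler = handlers.get(container.type)
    if handler is None:
        raise ValueError(f"Unknown container type: {container.type}")
    return handler(container)

def process_loop_container(container): ...
def process_if_container(container): ...
def process_select_container(container): ...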
Check if there are any (non-standard) python dependencies that have to be installed prior to running the code in the repo.
https://github.com/ml4ai/automates/wiki/Setup
@pauldhein, I think we could be the main Python users, so this is probably on us. I will check the TR-specific Python scripts. Let me know if there is anything you'd like me to add from your side of things.
Also, let me know if you had to go through any other extra steps to set up TR and alignment, and I can add that to the wiki page.
In the GCC2CAST pipeline, any declared variables that aren't assigned a value, like
int x;
currently have their default values set to -1. We would like to change that so that it is an explicit "undefined" value of sorts. This way it is clearer that it is an actual declaration but not an assignment.
This is most likely going to be done at the CAST level by leveraging the LiteralValue node.
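A small sketch of what that could look like (simplified stand-in classes, not the actual CAST definitions):

from dataclasses import dataclass

@dataclass
class LiteralValue:
    value_type: str      # e.g., "undefined" for a declaration without a value
    value: object = None

@dataclass
class Assignment:
    left: str
    right: object

# `int x;` becomes an assignment whose right-hand side is an explicit
# undefined LiteralValue instead of the sentinel -1.
decl = Assignment(left="x", right=LiteralValue(value_type="undefined"))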
We have done some limited handling of side effects on the GCC side. E.g., there should be some handling (within the annCAST passes) of identifying globals being assigned and capturing this in the annCAST (needs review).
But we do not currently have a general approach to handling side effects / wiring.
For example, consider the use of side-effecting functions, or the Python assignment-expression walrus operator:
z = -5
b = -10
a = -5
print(a)
def temp_a(b):
global a
a = max(2, b)
return a
if ( z - 20 < a or not(z < temp_a(b)) ) and ...:
print("if_body", z, a)
else:
print("else_body", z, a)
With the Walrus Operator:
z = -5
b = -10
a = -5
print(a)
if z - 20 < a or not(z < (a := max(2, b))):
print("if_body", z, a)
else:
print("else_body", z, a)
These cases need to be handled.
It seems that the general solution requires passing side-effect-updated variables as part of expression return values, so it will likely require use of packing/unpacking.
(GroMEt FNs aim to be monadic, so we need to massage side-effecting into a monadic framework.)
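A rough sketch of the packing/unpacking idea applied to the first example (a hand-written desugaring that drops the trailing "and ..." and, for simplicity, ignores the short-circuit evaluation of "or"):

def temp_a(a, b):
    # Instead of mutating a global, return the expression value and the
    # updated variable packed together.
    a = max(2, b)
    return a, a  # (expression value, updated a)

z, b, a = -5, -10, -5
print(a)

tmp, a = temp_a(a, b)  # unpack: the call's value and the rewired `a`
if z - 20 < a or not (z < tmp):
    print("if_body", z, a)
else:
    print("else_body", z, a)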
To alleviate confusion in the CAST generation pipeline, we propose renaming the attribute 'val' to 'name' in order to avoid clashing with the new 'default_val' attribute. We will make this change sometime after Milestone 5 and the demo, in order to prevent any bug ripple effects in CAST generation.
d-1 and h-1 are extracted as variables in the example below, but they shouldn't be.
ETsz = standardized reference crop evapotranspiration for short (ETos) or tall (ETrs) surfaces (mm d-1 for daily time steps or mm h-1 for hourly time steps),
[Not high priority] In Chrome, the web app returns an error when submitting an empty form. Perhaps just have this be a "no-op" -- i.e., do nothing and just provide a new empty form entry.
With the webapp, we aren't displaying Intervals well:
Found Entities:
List(ValueAndUnit, Value, Measurement, Entity) => 10 - 20 cm
------------------------------
Rule => GrobidEntityFinder
Type => RelationMention
------------------------------
value (Value, Measurement, Entity) => 10
unit (Unit, Measurement, Entity) => cm
------------------------------
List(Interval, Measurement, Entity) => 10 - 20 cm
------------------------------
Rule => GrobidEntityFinder
Type => RelationMention
------------------------------
most (ValueAndUnit, Value, Measurement, Entity) => 20 cm
least (ValueAndUnit, Value, Measurement, Entity) => 10 - 20 cm
------------------------------
List(Value, Measurement, Entity) => 10
------------------------------
Rule => GrobidEntityFinder
Type => TextBoundMention
------------------------------
Value, Measurement, Entity => 10
------------------------------
List(Value, Measurement, Entity) => 20
------------------------------
Rule => GrobidEntityFinder
Type => TextBoundMention
------------------------------
Value, Measurement, Entity => 20
------------------------------
List(Unit, Measurement, Entity) => cm
------------------------------
Rule => GrobidEntityFinder
Type => TextBoundMention
------------------------------
Unit, Measurement, Entity => cm
------------------------------
Currently, generation of the lambdas for Mini-PET causes the following error to occur:
This seems like a simple bug that should be solved by adding the proper import line so that controltype and switchtype are visible in this lambdas file.
NOTE: This issue pertains to the dssat_pet branch in Delphi, but we are documenting it here for task-tracking purposes.
@marcovzla I created a very rough and undocumented script to run SymPy's LaTeX parsing pipeline. It is located at scripts/equation_reading/tex2py.py.
What we need to do now is to modify this script to turn it into a callable library routine that takes a tokenized LaTeX equation string as input and outputs a string representation of the equivalent Python code.
Once we have that, I can use Python's ast module to turn the mathematical expression code into a parse tree that I will align to a lambda expression extracted from source code.
At a very high level, here is what the TeX2Py algorithm tries to accomplish:
(0) Given a string of tokenized LaTeX (call it T)
(1) Split T on = into a left-hand side (LHS) and a right-hand side (RHS)
(1a) Set aside the LHS, we will return that as-is for now
(1b) if there are multiple =
(1bi) count every expression other than the LHS as an RHS
(2) Remove common LaTeX formatting tokens (e.g. ~, \left, \mathrm, etc)
(3) Create a LaTeX variable to simple variable map (call it V)
(3a) simple variables will be a single letter
(3b) LaTeX variables can use _{}, _, ^, ^{}, _{}^{}, ^{}_{} in their definition
(3c) Convert to pythonic form by replacing _{} with _ and ^{} with __
(3d) Create map of pythonic vars to single-letter vars
(4) Replace all variables in RHS with one-letter vars in V
(5) Perform the translation to python with sympy.parsing.latex.parse_latex
(6) Replace one letter vars with the pythonic vars from V
(7) Return the results
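Below is a minimal sketch of that algorithm (assumed behavior, not the actual scripts/equation_reading/tex2py.py; note the fresh single letters may collide with letters already present in the RHS):

import re
from sympy import pycode
from sympy.parsing.latex import parse_latex  # needs antlr4-python3-runtime

def tex2py(tex: str) -> str:
    # (1) Split on '=': keep the LHS as-is, use the last piece as the RHS.
    parts = tex.split("=")
    lhs, rhs = parts[0], parts[-1]
    # (2) Remove common LaTeX formatting tokens.
    for tok in (r"\left", r"\right", r"\mathrm", "~"):
        rhs = rhs.replace(tok, "")
    # (3)-(4) Map LaTeX variables like K_{cb} to fresh single letters.
    letters = iter("abcdefghijklmnopqrstuvwxyz")
    var_map = {}
    for var in sorted(set(re.findall(r"[A-Za-z]+_\{[^}]*\}", rhs)), key=len, reverse=True):
        letter = next(letters)
        var_map[letter] = var.replace("_{", "_").replace("}", "")  # pythonic form
        rhs = rhs.replace(var, letter)
    # (5) Translate to a SymPy expression, then print it as Python code.
    code = pycode(parse_latex(rhs))
    # (6)-(7) Restore the pythonic variable names and return.
    for letter, pyvar in var_map.items():
        code = re.sub(rf"\b{letter}\b", pyvar, code)
    return f"{lhs.strip()} = {code}"

For example, tex2py(r"E_{T} = \frac{K_{cb}}{2}") should produce a Python assignment equivalent to E_T = K_cb/2.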
@marcovzla if you have any questions about the above algorithm or any ambiguity associated with this task please @ me in this issue.
Copy mention may not be updating the new token interval here. (We may be able to combine this with the previous several lines of unused code that construct new mentions instead of copying; those lines are deleted in the Alice_Functions2 branch.)
Also, make sure discontinuous char offset attachments are handled properly: if we use indices to get to them, we need to add something to the sequence even if there is no attachment.
The output marked with red in the webapp is not correct.
The output in the terminal while running the webapp is correct, e.g. this:
Processing sentence : Under full irrigation, Kcbmax with the ETo-Kcb method had little influence on maize and cotton yield for 0.9 < Kcbmax < 1.15, but simulated yield decreased rapidly for Kcbmax > 1.15 (fig. 6a).
DOC : org.clulab.processors.corenlp.CoreNLPDocument@12c55fc2
Done extracting the mentions ...
They are : 1.15, but simulated yield decreased rapidly for Kcbmax > 1.15, > 1.15 (fig, 1.15, 0.9, 1.15, Kcbmax, full irrigation, simulated yield, 0.9 < Kcbmax < 1.15, little influence, > 1.15, ETo-Kcb method, cotton yield, Kcbmax, maize, fig, fig, 6a
Sentence returned from processPlaySentence : Under full irrigation , Kcbmax with the ETo-Kcb method had little influence on maize and cotton yield for 0.9 < Kcbmax < 1.15 , but simulated yield decreased rapidly for Kcbmax > 1.15 ( fig .
Found mentions (in mkJson):
List(Concept, Entity) => 6a
------------------------------
Rule => simple-np
Type => TextBoundMention
------------------------------
Concept, Entity => 6a
------------------------------
List(Concept, Entity) => full irrigation
------------------------------
Rule => simple-np
Type => TextBoundMention
------------------------------
Concept, Entity => full irrigation
------------------------------
List(Concept, Entity) => Kcbmax
------------------------------
Rule => simple-np
Type => TextBoundMention
------------------------------
Concept, Entity => Kcbmax
------------------------------
I'm trying to fix this myself, but I haven't succeeded so far.
This sentence:
The height of the chair was 100-150 cm and it was between 2700.5 and 3000 mm in length.
stack trace:
Exception in thread "main" java.lang.RuntimeException: unsupported measurement type 'listc'
at org.clulab.aske.automates.quantities.GrobidQuantitiesClient.mkMeasurement(GrobidQuantitiesClient.scala:30)
at org.clulab.aske.automates.quantities.GrobidQuantitiesClient.$anonfun$getMeasurements$1(GrobidQuantitiesClient.scala:24)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike.map(TraversableLike.scala:234)
at scala.collection.TraversableLike.map$(TraversableLike.scala:227)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.clulab.aske.automates.quantities.GrobidQuantitiesClient.getMeasurements(GrobidQuantitiesClient.scala:24)
at org.clulab.aske.automates.entities.GrobidEntityFinder.extract(GrobidEntityFinder.scala:19)
at org.clulab.aske.automates.entities.TestStuff$.main(GrobidEntityFinder.scala:156)
at org.clulab.aske.automates.entities.TestStuff.main(GrobidEntityFinder.scala)
Process finished with exit code 1
Anything else we're not handling? (assuredly)
Do we need to modify it to handle other mention types? Or not expand?
with text: For this research Tx = 10 mm d−1, Lsc = −1100 J kg−1 and Lpwp = −2000 J kg−1.
(variable test t9b)
[info] java.lang.ClassCastException: org.clulab.odin.RelationMention cannot be cast to org.clulab.odin.TextBoundMention
[info] at org.clulab.aske.automates.actions.ExpansionHandler.expand(ExpansionHandler.scala:136)
[info] at org.clulab.aske.automates.actions.ExpansionHandler.expandIfNotAvoid(ExpansionHandler.scala:95)
[info] at org.clulab.aske.automates.actions.ExpansionHandler.$anonfun$expandArgs$4(ExpansionHandler.scala:67)
[info] at org.clulab.aske.automates.actions.ExpansionHandler.$anonfun$expandArgs$4$adapted(ExpansionHandler.scala:66)
[info] at scala.collection.Iterator.foreach(Iterator.scala:929)
[info] at scala.collection.Iterator.foreach$(Iterator.scala:929)
[info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
[info] at scala.collection.IterableLike.foreach(IterableLike.scala:71)
[info] at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
[info] at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
PassThroughPreprocessor still does some cleaning (e.g., it doesn't allow sequences that are less than 60% letters, excluding spaces), so the webapp fails on text like this: then: dI (0) > 0.
In PR #263, we combine COSMOS blocks to make sure paragraphs are not split up (that happens at the end of a column in two-column papers and at the end of pages). When we combine blocks, the location of extracted mentions becomes less specific: instead of saying Mention 1 comes from p. 1, block 1, we are saying Mention 1 comes from p. 1, blocks 1-2, and the mention can be located in block 1, in block 2, or split between the two. Keeping track of the length of each block in characters, together with the character offset of the extraction within the combined block content, can help narrow it down.
Note: currently, COSMOS combines some paragraphs into longer blocks. This needs to be discussed with UW.
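A small sketch of the narrowing idea (assumed inputs: the texts of the combined blocks and a mention's character offset into their concatenation):

from bisect import bisect_right
from itertools import accumulate

def locate_block(block_texts, char_offset):
    # Cumulative character lengths give the block boundaries; a binary
    # search over them finds which block the offset falls in.
    boundaries = list(accumulate(len(b) for b in block_texts))
    return bisect_right(boundaries, char_offset)

blocks = ["First block text. ", "Second block text."]
print(locate_block(blocks, 20))  # -> 1: the offset falls in the second block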
Either check the entities field of the Doc, or ask the State whether mentions with the Greek label overlap.
The endpoint described here https://github.com/ml4ai/automates/wiki/Text-Reading#mention-extraction- results in some mentions with null paths, which can't be handled by the /align endpoint (JNull error). If the input is supposed to be the processors document, the wiki will need to be updated with an extra step; but it is probably more convenient for users to not have extra steps, so just make sure the endpoint takes a Science Parse doc and produces mentions of the right format. (Also, check whether pdf_to_mentions produces mentions that can be handled by /align.)
At this point the GroMEt generation has grown extensively, and with that, some of the visitors could use refactoring.
Some visitors were initially written in a less-than-general manner, but as new language features get added the visitors become harder to maintain. I think a rewrite of some pieces would help alleviate the issue, and make the implementations more general.
The current pieces that could use a rewrite, in order of priority:
AnnCAST Call Node: Adding primitives and attributes into the mix has made this visitor difficult to understand and maintain.
AnnCAST Assignment Node: This visitor in particular is dealt with on a case-by-case basis. While it doesn't need a serious rewrite, it could definitely use some looking over, as it has been added to a lot.
AnnCAST Attribute Node: This visitor is relatively new and doesn't need a rewrite, but it needs more expansion in order to fit better when used by other visitors.
Encountering a Math Error while running mathjax_mml_converter.py (https://github.com/ml4ai/automates/blob/gauravs_automates/scripts/equation_reading/mathjax/arxiv_eqn_extraction/mathjax_mml_converter.py):
"Math Processing Error: Maximum call stack size exceeded"
It puts the server on hold: the server stops responding without getting killed. Is there a way to make the MathJax node process kill itself? I have tried providing a timeout in requests.post(), but nothing worked.
Some related post that I have found that might be helpful:
https://phabricator.wikimedia.org/T120959
https://moodle.org/mod/forum/discuss.php?d=318488
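One possible workaround sketch (this assumes the conversion can be run as a one-shot node script, which may not match the current server setup): invoke the conversion in a subprocess with a hard timeout, so a hung MathJax call gets killed instead of wedging the server.

import subprocess
from typing import Optional

def convert_equation(eqn: str, timeout_s: int = 30) -> Optional[str]:
    try:
        # Hypothetical one-shot invocation of the MathJax conversion script.
        result = subprocess.run(
            ["node", "mathjax_converter.js", eqn],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process when the timeout expires,
        # so a stuck conversion cannot hold the pipeline indefinitely.
        return None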
The equation extraction documentation linked from this file appears to be broken:
PPE ~ use ~ of ~ an ~ item ~ on ~ a ~ particular ~ day = \frac{(\# ~ of ~ patients ~ that ~ day)(\# ~ of ~ daily ~ contacts ~ per ~ patient)}{\# ~ of ~ patient ~ contacts ~ before ~ discarding ~ item}
I'd like to use JSON serialization to get the mentions into a JSON file so that we can ingest it into Solr.
Is this code the right place to build on? https://github.com/clulab/processors/tree/master/main/src/main/scala/org/clulab/serialization/json
This issue addresses two requests for the interface between the TR endpoints and the model analysis pipeline. The two requests are:
- split /align into new methods where each method focuses on creating a single link type
Along with merging the changes necessary to resolve this issue to master, we should also document the choices made in the AutoMATES GitHub wiki. That will require a new section that we will call Model Assembly... @pauldhein will work on that.
All changes implemented for this issue should be done in PR #121
This section is here just to disambiguate some terms. Previously, the PA and MA teams had been referring to the output from the PA pipeline as GrFN JSON, and previously (as well as currently) we had been sending a path to that JSON file to the /align endpoint. @BeckySharp and/or @maxaalexeeva correct me if I am wrong, but I believe the only fields needed by the /align endpoint from the old GrFN JSON are variables and source_comments. I propose that I deliver these fields to you in a JSON file without any of the other old or new GrFN components.
Does that work for you both? If this works, then you won't have to worry about any further changes to the PA/MA GrFN JSON (or its new form using the AIR JSON).
Currently we have the following endpoint defined for aligning all sources:
/**
* Align mentions from text, code, comment. Expected fields in the json obj passed in:
* 'mentions' : file path to Odin serialized mentions
* 'equations': path to the decoded equations
* 'grfn' : path to the grfn file, already expected to have comments and vars
* @return decorated grfn with link elems and link hypotheses
*/
def align: Action[AnyContent] = Action { request =>
...
}
I would like to change this so that we have separate endpoints for each of the dashed links shown in our overall link diagram:
Let's review these links:
tokenized equation <--> assignment statement: The TR team doesn't need to worry about this one.
ontology concept <--> text definition: I will address this link in a separate GitHub issue.
source code variable <--> source comment variable: what are we doing for this link right now? Is it just a string edit distance? Would it be easier for us to make this link on the Python side of AutoMATES?
equation variable <--> text variable: This seems like a good candidate for a new endpoint. Perhaps we can name it /alignEquationsAndText?
variable docstring <--> text definition: Can we have this new endpoint be /alignDocstringsAndText?
For each of the new endpoints, I'd like to pass in to Scala a single path to a single JSON file that holds an object that has all the fields necessary for that endpoint. Does that sound reasonable? Perhaps we can plan what fields are needed for which endpoints in this issue?
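To make the proposal concrete, here is a sketch of what the input JSON for, say, /alignEquationsAndText might hold (the field names below are placeholders to be agreed on, not an existing format):

{
  "mentions": "path/to/serialized_text_mentions.json",
  "equations": "path/to/decoded_equations.txt"
}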
Hello there! First of all, congratulations on the project, I found it very interesting!
However, I could not test/reproduce it because it requires access to Google Drive files such as "ASKE-AutoMATES/Data/equation_decoding/arxiv2018-downsample-aug-model_step_80000.pt". Could you please add the GDrive path to the documentation?
Many thanks
Murilo