gatenlp / gate-core

The GATE Embedded core API and GATE Developer application

License: GNU Lesser General Public License v3.0

Shell 0.12% Batchfile 0.01% HTML 1.11% Java 98.72% Rich Text Format 0.05%

gate-core's Issues

Deadlock when loading pipeline

Java 9.0.4 and GATE 8.5-alpha1, build 5448:

While loading a pipeline the GUI locked up. The console output shows

Deadlocked Thread:
------------------
"LoadResourceFromFileAction" prio=1 Id=34 BLOCKED on gate.util.GateClassLoader@231f98ef owned by "AWT-EventQueue-0" Id=14
	at java.base@9.0.4/java.lang.ClassLoader.loadClass(ClassLoader.java:543)
	-  blocked on gate.util.GateClassLoader@231f98ef
	at gate.util.GateClassLoader.loadClass(GateClassLoader.java:208)
	at gate.util.GateClassLoader.loadClass(GateClassLoader.java:225)
	at gate.util.GateClassLoader.loadClass(GateClassLoader.java:148)
	at java.base@9.0.4/java.lang.Class.forName0(Native Method)
	at java.base@9.0.4/java.lang.Class.forName(Class.java:375)
	at java.desktop@9.0.4/com.sun.beans.finder.ClassFinder.findClass(ClassFinder.java:103)
	at java.desktop@9.0.4/com.sun.beans.finder.InstanceFinder.instantiate(InstanceFinder.java:94)
	...


	java.base@9.0.4/java.lang.ClassLoader.loadClass(ClassLoader.java:543)
	gate.util.GateClassLoader.loadClass(GateClassLoader.java:208)
	gate.util.GateClassLoader.loadClass(GateClassLoader.java:225)
	gate.util.GateClassLoader.loadClass(GateClassLoader.java:148)
	java.base@9.0.4/java.lang.Class.forName0(Native Method)
	java.base@9.0.4/java.lang.Class.forName(Class.java:375)
	java.desktop@9.0.4/com.sun.beans.finder.ClassFinder.findClass(ClassFinder.java:103)
	java.desktop@9.0.4/com.sun.beans.finder.InstanceFinder.instantiate(InstanceFinder.java:94)
	java.desktop@9.0.4/com.sun.beans.finder.InstanceFinder.find(InstanceFinder.java:66)
	java.desktop@9.0.4/java.beans.Introspector.findExplicitBeanInfo(Introspector.java:484)
	java.desktop@9.0.4/java.beans.Introspector.<init>(Introspector.java:434)
	java.desktop@9.0.4/java.beans.Introspector.getBeanInfo(Introspector.java:205)
	gate.creole.CreoleAnnotationHandler.processParameters(CreoleAnnotationHandler.java:455)
	gate.creole.CreoleAnnotationHandler.processCreoleResourceAnnotations(CreoleAnnotationHandler.java:305)
	gate.creole.CreoleAnnotationHandler.processAnnotationsForResource(CreoleAnnotationHandler.java:275)
	gate.creole.CreoleAnnotationHandler.processAnnotations(CreoleAnnotationHandler.java:245)
	gate.creole.CreoleAnnotationHandler.processAnnotations(CreoleAnnotationHandler.java:248)
	gate.creole.CreoleAnnotationHandler.processAnnotations(CreoleAnnotationHandler.java:231)
	gate.creole.CreoleRegisterImpl.processFullCreoleXmlTree(CreoleRegisterImpl.java:295)
	gate.creole.CreoleRegisterImpl.parseDirectory(CreoleRegisterImpl.java:279)
	gate.creole.CreoleRegisterImpl.registerPlugin(CreoleRegisterImpl.java:194)
	gate.util.persistence.PersistenceManager.loadObjectFromUrl(PersistenceManager.java:1294)
	gate.util.persistence.PersistenceManager.loadObjectFromFile(PersistenceManager.java:1213)
	gate.gui.MainFrame$LoadResourceFromFileAction$1.run(MainFrame.java:3685)
	java.base@9.0.4/java.lang.Thread.run(Thread.java:844)
Deadlocked Thread:
------------------
"AWT-EventQueue-0" prio=6 Id=14 BLOCKED on gate.util.GateClassLoader@667e834 owned by "LoadResourceFromFileAction" Id=34
	at java.base@9.0.4/java.lang.ClassLoader.loadClass(ClassLoader.java:543)
	-  blocked on gate.util.GateClassLoader@667e834
	at gate.util.GateClassLoader.loadClass(GateClassLoader.java:208)
	at gate.util.GateClassLoader.loadClass(GateClassLoader.java:248)
	at gate.util.GateClassLoader.loadClass(GateClassLoader.java:148)
	at java.base@9.0.4/java.lang.Class.forName0(Native Method)
	at java.base@9.0.4/java.lang.Class.forName(Class.java:375)
	at gate.gui.MainFrame.getIcon(MainFrame.java:323)
	at gate.gui.MainFrame.getIcon(MainFrame.java:305)
	...


	java.base@9.0.4/java.lang.ClassLoader.loadClass(ClassLoader.java:543)
	gate.util.GateClassLoader.loadClass(GateClassLoader.java:208)
	gate.util.GateClassLoader.loadClass(GateClassLoader.java:248)
	gate.util.GateClassLoader.loadClass(GateClassLoader.java:148)
	java.base@9.0.4/java.lang.Class.forName0(Native Method)
	java.base@9.0.4/java.lang.Class.forName(Class.java:375)
	gate.gui.MainFrame.getIcon(MainFrame.java:323)
	gate.gui.MainFrame.getIcon(MainFrame.java:305)
	gate.groovy.GroovySupport.getActions(GroovySupport.java:113)
	gate.gui.MainFrame$ToolsMenu$1.run(MainFrame.java:4510)
	java.desktop@9.0.4/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:313)
	java.desktop@9.0.4/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:764)
	java.desktop@9.0.4/java.awt.EventQueue.access$500(EventQueue.java:97)
	java.desktop@9.0.4/java.awt.EventQueue$3.run(EventQueue.java:717)
	java.desktop@9.0.4/java.awt.EventQueue$3.run(EventQueue.java:711)
	java.base@9.0.4/java.security.AccessController.doPrivileged(Native Method)
	java.base@9.0.4/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:89)
	java.desktop@9.0.4/java.awt.EventQueue.dispatchEvent(EventQueue.java:734)
	java.desktop@9.0.4/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:199)
	java.desktop@9.0.4/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
	java.desktop@9.0.4/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
	java.desktop@9.0.4/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
	java.desktop@9.0.4/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
	java.desktop@9.0.4/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)
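
In the output above, each thread is blocked on a GateClassLoader monitor that the other thread owns. A report of this shape (thread name, the monitor it waits on, the owning thread) can be obtained from the standard `java.lang.management.ThreadMXBean`. The following is only a minimal sketch of that technique, not GATE's actual reporting code; the names `DeadlockReport` and `describeDeadlocks` are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class DeadlockReport {
  /** Returns one line per thread deadlocked on an object monitor; empty if there are none. */
  static List<String> describeDeadlocks() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
    List<String> lines = new ArrayList<>();
    if (ids == null) {
      return lines;
    }
    for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
      if (info == null) {
        continue; // the thread ended between the two calls
      }
      lines.add('"' + info.getThreadName() + "\" BLOCKED on " + info.getLockName()
          + " owned by \"" + info.getLockOwnerName() + '"');
    }
    return lines;
  }
}
```

Calling `DeadlockReport.describeDeadlocks()` from a watchdog thread would return an empty list as long as no monitor deadlock exists.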

Weird window sizing bug after changing dual monitor setup.

GATE starts with a default window size that has the window height set to a ridiculously low value and thus does not show any content, only the menu and tool button bars (width is ok).
Also there are intermittent/non-reproducible situations where the right-click menu now suddenly goes over two columns, showing e.g. "Applications->run" in the second column.

This may be related to a new dual monitor configuration on a Linux Ubuntu 17.10 computer, with monitor 1 being 1680x1050 in normal orientation and monitor 2 being 1920x1080, left-rotated.

As already known for a long time, the splash screen is also in a weird place, but now the location is even weirder (happens to be the bottom right of the left monitor). This is because the coordinates for the splash screen are apparently calculated by summing over the dimensions of both monitors.

Move spring support out of gate-core

In our continuing push to separate gate-core into independent pieces that can be more easily updated I'm planning on moving the spring support out into a separate project. Anyone with any objections should shout now :)

NPE in SerialCorpusImpl.java:742

When running the tests, specifically testExplicitMimeType(gate.corpora.TestDocument), the following exception can occur:

	at gate.corpora.SerialCorpusImpl.findDocument(SerialCorpusImpl.java:742)
	at gate.corpora.SerialCorpusImpl.indexOf(SerialCorpusImpl.java:898)
	at gate.corpora.SerialCorpusImpl.resourceUnloaded(SerialCorpusImpl.java:479)
	at gate.creole.CreoleRegisterImpl.fireResourceUnloaded(CreoleRegisterImpl.java:1087)
	at gate.creole.CreoleRegisterImpl.resourceUnloaded(CreoleRegisterImpl.java:1140)
	at gate.CreoleProxy.fireResourceUnloaded(Factory.java:915)
	at gate.Factory.deleteResource(Factory.java:463)
	at gate.corpora.TestDocument.testExplicitMimeType(TestDocument.java:306)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at junit.framework.TestCase.runTest(TestCase.java:176)
	at junit.framework.TestCase.runBare(TestCase.java:141)
	at junit.framework.TestResult$1.protect(TestResult.java:122)
	at junit.framework.TestResult.runProtected(TestResult.java:142)
	at junit.framework.TestResult.run(TestResult.java:125)
	at junit.framework.TestCase.run(TestCase.java:129)
	at junit.framework.TestSuite.runTest(TestSuite.java:252)
	at junit.framework.TestSuite.run(TestSuite.java:247)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:86)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

To allow the test to complete successfully a temporary NPE guard was added, but this should get investigated!

Loading from an 8.4 xgapp file should not throw an NPE

I get this exception:

java.lang.NullPointerException
gate.persist.PersistenceException: java.lang.NullPointerException
	at gate.util.persistence.PersistenceManager.loadObjectFromUrl(PersistenceManager.java:1333)
	at gate.util.persistence.PersistenceManager.loadObjectFromFile(PersistenceManager.java:1213)
	at gate.gui.MainFrame$LoadResourceFromFileAction$1.run(MainFrame.java:3697)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at gate.util.persistence.PersistenceManager$URLHolder.unpackPersistentRepresentation(PersistenceManager.java:489)
	at gate.util.persistence.PersistenceManager$URLHolder.createObject(PersistenceManager.java:461)
	at gate.util.persistence.PersistenceManager.getTransientRepresentation(PersistenceManager.java:793)
	at gate.util.persistence.CollectionPersistence.createObject(CollectionPersistence.java:82)
	at gate.util.persistence.PersistenceManager.getTransientRepresentation(PersistenceManager.java:793)
	at gate.util.persistence.PersistenceManager.getTransientRepresentation(PersistenceManager.java:774)
	at gate.util.persistence.PersistenceManager.loadObjectFromUrl(PersistenceManager.java:1286)
	... 3 more
  Caused by:
java.lang.NullPointerException
	at gate.util.persistence.PersistenceManager$URLHolder.unpackPersistentRepresentation(PersistenceManager.java:489)
	at gate.util.persistence.PersistenceManager$URLHolder.createObject(PersistenceManager.java:461)
	at gate.util.persistence.PersistenceManager.getTransientRepresentation(PersistenceManager.java:793)
	at gate.util.persistence.CollectionPersistence.createObject(CollectionPersistence.java:82)
	at gate.util.persistence.PersistenceManager.getTransientRepresentation(PersistenceManager.java:793)
	at gate.util.persistence.PersistenceManager.getTransientRepresentation(PersistenceManager.java:774)
	at gate.util.persistence.PersistenceManager.loadObjectFromUrl(PersistenceManager.java:1286)
	at gate.util.persistence.PersistenceManager.loadObjectFromFile(PersistenceManager.java:1213)
	at gate.gui.MainFrame$LoadResourceFromFileAction$1.run(MainFrame.java:3697)
	at java.lang.Thread.run(Thread.java:745)

should the default annotation set have a name?

Currently a call to getName() on the default set returns null. Unfortunately it also returns null in any case where an annotation set isn't tied directly to a document (i.e. if you ask an annotation set for a subset). This means from code it is impossible to tell these sets apart. It also causes problems if the name of the default set is stored in a FeatureMap as this causes an issue when saving as JSON.

The obvious solution would be to give the default set a name. As currently a call to getAnnotations(String) passing either null or the empty string returns the default set, then setting the name of the default set to the empty string seems to make the most sense.
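
The proposal can be illustrated with a small self-contained sketch. `NamedSetsSketch` and `SetSketch` are hypothetical stand-ins, not GATE classes; only the lookup rule (null or the empty string selects the default set) and the proposed empty-string name come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

class NamedSetsSketch {
  /** Hypothetical stand-in for an annotation set; only the name matters here. */
  static final class SetSketch {
    private final String name;
    SetSketch(String name) { this.name = name; }
    String getName() { return name; }
  }

  // Proposed behaviour: the default set is named "" instead of null.
  private final SetSketch defaultSet = new SetSketch("");
  private final Map<String, SetSketch> namedSets = new HashMap<>();

  /** Null or "" selects the default set, any other name a named set. */
  SetSketch getAnnotations(String name) {
    if (name == null || name.isEmpty()) {
      return defaultSet;
    }
    return namedSets.computeIfAbsent(name, SetSketch::new);
  }

  /** A derived subset is not tied to the document, so its name stays null. */
  SetSketch subsetOf(SetSketch source) {
    return new SetSketch(null);
  }
}
```

With this arrangement a null name would identify only detached subsets, so code could tell them apart from the default set.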

Improve save as Inline XML and possibly factor into a plugin. Also update documentation!

The default behaviour of this right now is a bit odd: since the rootElement parameter is optional and empty by default, unless the original document was one with a root XML-like element (e.g. HTML), the document that gets created is non-valid XML.
At the least there should be a tooltip warning about this and proper documentation.
Another possibility would be to add a parameter (add rootElement unless already present) to add a root element always, and use some default element name there if none is entered for the rootElement parameter ("Document" springs to mind).

should FeatureMap allow null keys?

Related to gate-core#18 should we update FeatureMap to prohibit null keys? Personally I can't see any good reason to use null as a key in a FeatureMap, especially if we start returning the empty string as the name of the default set

replacing an annotation can leave annotation set in an inconsistent state

It's possible to end up with two annotations with the same ID in an annotation set and while the GUI will show you them both only the second to be added gets saved.

Under normal circumstances this is unlikely to happen as new annotations are not created with an ID (one is assigned by the annotation set on creation and that should be safe). The problem arises if you add an existing annotation or specify an ID; something that you probably shouldn't do but that the annotation set transfer (AST) does do.

The problem boils down to the fact that when adding the annotation, an existing annotation with the same ID is not removed from the by offset or by type indexes. Usually with the AST this isn't a problem as the same type is used, but if the type is changed then the old annotation ends up being left in the indexes. An example being; copy annotations of type Topic into a temporary set from the default set, do something to them and then copy them back into the default set renaming them to CandidateTarget. In this example, the GUI then shows both Topic and CandidateTarget annotations some of which have the same ID. When you save the document to XML though only the CandidateTarget annotations are persisted as it's the by ID index that's used and the Topic annotations are only present in the by type (used by the GUI) and by offset indexes.

Fixing this probably just involves removing the old annotation from the by type and by offset indexes when an annotation with the same ID is being replaced, but I haven't done the fix yet as I want time to think about a few other edge cases and how best to update the indexes without being horribly inefficient.
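
A minimal sketch of the idea, assuming a simplified set with three indexes. `AnnotationIndexSketch` and `Ann` are hypothetical and far simpler than the real annotation set implementation; the point is only that replacing an ID also evicts the old annotation from the other indexes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AnnotationIndexSketch {
  /** Hypothetical annotation: an ID, a type and a start offset. */
  static final class Ann {
    final int id;
    final String type;
    final long start;
    Ann(int id, String type, long start) {
      this.id = id;
      this.type = type;
      this.start = start;
    }
  }

  private final Map<Integer, Ann> byId = new HashMap<>();
  private final Map<String, Set<Ann>> byType = new HashMap<>();
  private final Map<Long, Set<Ann>> byStart = new HashMap<>();

  void add(Ann a) {
    Ann old = byId.put(a.id, a);
    if (old != null) {
      // Same ID already present: drop the old annotation from the other indexes too.
      removeFrom(byType, old.type, old);
      removeFrom(byStart, old.start, old);
    }
    byType.computeIfAbsent(a.type, k -> new HashSet<>()).add(a);
    byStart.computeIfAbsent(a.start, k -> new HashSet<>()).add(a);
  }

  private static <K> void removeFrom(Map<K, Set<Ann>> index, K key, Ann a) {
    Set<Ann> bucket = index.get(key);
    if (bucket != null) {
      bucket.remove(a);
      if (bucket.isEmpty()) {
        index.remove(key);
      }
    }
  }

  int countOfType(String type) {
    Set<Ann> bucket = byType.get(type);
    return bucket == null ? 0 : bucket.size();
  }

  int size() {
    return byId.size();
  }
}
```

In the Topic/CandidateTarget example, adding a CandidateTarget annotation that reuses a Topic annotation's ID would then leave no Topic entry behind.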

Runtime-parameter value field does not re-adjust properly to GUI size (SF #199)

Bug created on SF on 2015-06-06

Still unsolved with GATE 8.4.1 and GATE 8.5 on Ubuntu 17.10 as of 2018-03-10.

Original bug:

Seen this on Linux (Mate desktop), Gate 8.2.

  • show the runtime parameters of the Reset PR: The buttons for editing the list-valued values are shown on the right of the parameter table rows
  • make the gui smaller: the buttons disappear though there is no space used up to their left
  • make the gui larger than it was originally: the buttons get moved to the right too
  • reduce the size of the GUI to the original size: the buttons do not move back to where they originally were and are now stuck even more to the right.

The bottom line is that the buttons never seem to move to the left, or that the value fields never adjust to become smaller, only larger.
I think this is also a problem even when there are no buttons to the right: the value fields increase their size but do not decrease their size, even if empty.

ResourceReference chooser only supports Maven based plugins

Currently (and I can understand why) the new ResourceReference chooser only supports viewing resources inside plugins which are instances of Plugin.Maven. While I think that currently covers all cases, long term we should probably change this as there is nothing stopping other types of plugins offering resources. I guess we should show any plugin that returns true for hasResources, although to support this we might need to add a new method to Plugin that can enumerate the resources without copying them.

java.lang.ClassNotFoundException: gate.creole.ontology.BooleanDT

Hi,
I am using the gate-core 8.4.1 jar. The classes in my ontology have datatype properties, and I need to read them with
OInstanceobj.getDatatypePropertyValues(DataProperty);
This gives me the following error:
java.lang.ClassNotFoundException: gate.creole.ontology.BooleanDT
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1907)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1750)
at gate.creole.ontology.impl.sesame.SesameManager.toGateDataType(SesameManager.java:829)
at gate.creole.ontology.impl.sesame.SesameManager.convertSesameLiteral2Literal(SesameManager.java:763)
at gate.creole.ontology.impl.sesame.UtilTupleQueryIterator.nextFirst(UtilTupleQueryIterator.java:185)
at gate.creole.ontology.impl.sesame.OntologyServiceImplSesame.getDatatypePropertyValues(OntologyServiceImplSesame.java:2016)
at gate.creole.ontology.impl.OInstanceImpl.getDatatypePropertyValues(OInstanceImpl.java:460)
at output.Output(file:/home/synerzip/GATE_Developer_8.4.1/plugins/ANNIE/japes/output.jape:167)
at gate.jape.RightHandSide.transduce(RightHandSide.java:344)
Note: the same JAPE grammar works fine from the GATE GUI but gives this error from GATE Embedded.

Please help.

Remove the "without Defaults" option for loading ready made ANNIE application

From @johann-petrak on June 25, 2017 14:2

This only is offered for the toolbox-button not the right click resources menu, nor the file menu. It is horrid, non-generic (won't work for other ready made applications) and must be killed! Optionally replaced with some better and generic approach for how to modify init parms when loading a pipeline, maybe something more related to the configuration/parametrisation used in ModularPipelines? These things should eventually get unified somehow anyway.

Copied from original issue: GateNLP/gateplugin-ANNIE#1

Should we nag discreetly about new GATE versions?

It is probably not difficult to make GATE check whether it is at the latest version -- the release process could e.g. include a step that updates a file somewhere on the internet (e.g. another file in the gate-meta repository) which GATE could check at certain points (maybe where it needs to retrieve info from the internet anyway).
GATE could then more or less discreetly suggest to update and also provide a link for getting the newest release.
Not having this version info on gate-meta but on a server controlled by us would maybe allow us to get some very basic usage info (just the number of pings and the version of the pinging GATE, maybe the OS and java version, but no IP or similar).
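
The comparison step of such a check could look like the sketch below. It assumes purely numeric dotted versions such as 8.4.1 or 8.5; suffixes like -alpha1 are not handled, and fetching the published version from a server is left out entirely. `VersionCheckSketch` and its method names are made up for illustration.

```java
class VersionCheckSketch {
  /** Compares dotted numeric versions such as "8.4.1" and "8.5"; missing parts count as 0. */
  static int compareVersions(String a, String b) {
    String[] pa = a.split("\\.");
    String[] pb = b.split("\\.");
    for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
      int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
      int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }

  /** True when the running version is older than the latest published one. */
  static boolean updateAvailable(String running, String latestPublished) {
    return compareVersions(running, latestPublished) < 0;
  }
}
```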

All folders in the ResourceReference chooser are shown disabled

Turns out the check for a trailing slash as part of the code for disabling icons when showing plugin resources doesn't work. At least on Ubuntu running under Java 9 none of the folder names end in a trailing slash so all folders are shown disabled which is very confusing.

Somehow deal with loading multiple versions of plugins

It is now much easier than before to load multiple versions of a plugin into GATE. This can easily go unnoticed, and even when it is noticed it is hard to find out which version is actually active and gets used for e.g. creating a resource.

Ultimately, loaded plugins (and thus the workings of the plugin manager) should be local to each application but even there, especially with nested plugins, it would be possible to have different versions of the same plugin loaded. We need to come up with a good plan for how this should ideally be handled and then implement it.
Would it be possible at all to have two versions of a plugin loaded and then choose which version to use for instantiating a resource?

should all names be valid Java identifiers

Currently we allow the name of any annotation set, annotation name, document feature, or annotation feature to be any random object including null (see gate-core#19). This can cause issues when trying to access these values in JAPE (quoting in JAPE can get messy) and also allows no consistent way of referring to a value through a path-like object. One possible option that has been discussed (at least between myself and @johann-petrak) is to restrict these names and to only allow String objects containing valid Java identifiers.
It's likely that this would break some existing code, but would allow us to implement some new ideas in a more straightforward fashion, as such this would be a change made in a future major version of GATE, and so this issue is as much for open discussion of the idea as it is to track progress on making the change.
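
For discussion, a check of this kind can be written with the standard `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart` methods. This is only a sketch: `NameCheckSketch.isValidName` is a hypothetical helper, it does not exclude Java keywords, and it rejects the empty string, which would matter if the empty string became the name of the default annotation set.

```java
class NameCheckSketch {
  /** True only for non-empty Strings that are syntactically valid Java identifiers. */
  static boolean isValidName(Object name) {
    if (!(name instanceof String)) {
      return false; // also rejects null
    }
    String s = (String) name;
    if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
      return false;
    }
    for (int i = 1; i < s.length(); i++) {
      if (!Character.isJavaIdentifierPart(s.charAt(i))) {
        return false;
      }
    }
    return true;
  }
}
```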

GATE Developer installer does not create .desktop files on Linux

When running the gate-developer-8.5.1-installer.jar the following was output to the terminal:

$ sudo java -jar gate-developer-8.5.1-installer.jar 
[sudo] password for pellegrinoda: 
Command line arguments: 
Cannot find named resource: 'packsLang.xml' AND 'packsLang.xml_eng'
====================
Installation started
Framework: 5.1.3-SNAPSHOT-ab376 (IzPack)
Platform: linux,version=4.17.1-1.el7.elrepo.x86_64,arch=x64,symbolicName=null,javaVersion=1.8.0_171
Installation finished
WARNING: using deprecated Desktop Entry key Encoding with value UTF-8
WARNING: using deprecated Desktop Entry key Encoding with value UTF-8
WARNING: Shortcut 'GATE 8.5.1 User Guide' has URL but type ('Application') is not 'Link'
WARNING: using deprecated Desktop Entry key Encoding with value UTF-8
WARNING: Could not copy  to /root/.local/share/pixmaps (Source '' does not exist)
[ Writing the uninstaller data ... ]

During the install, I had selected the option to install a desktop shortcut for all users. However, post-installation I am unable to find any gate ".desktop" files and there is no shortcut in GNOME. I am assuming the installer was making reference to GNOME .desktop files used for its launchers.

For reference, the Desktop Entry Specification can be found at https://developer.gnome.org/desktop-entry-spec/.

As a work-around I placed the following into a ~/.local/share/applications/gate.desktop file:

[Desktop Entry]
Name=GATE Developer
Exec=/opt/GATE_Developer_8.5.1/bin/gate.sh
Icon=/opt/GATE_Developer_8.5.1/icons/gate-icon.png
Type=Application
Terminal=false

In addition, the gate-icon.png has "8.1" drawn in the lower-right corner, which does not match the version of "8.5.1."

GATE is unusable under Java 9

GATE relies on XStream for reading and writing application files (i.e. xgapp files). Unfortunately XStream doesn't currently support Java 9 which means we can't load/save applications, which makes it pretty much impossible to use GATE under Java 9. See x-stream/xstream#74 for more details.

icon\r file is treated as a document/file on Mac

From @domrout on June 28, 2017 8:53

This file is created automatically on Mac for folders that have a custom icon in finder, without the user knowing about it. For datastore creation, this triggers a folder not empty error. For corpus creation this triggers a file with no data in it. Suggest finding a way to ignore it.
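
One way to ignore the file is a `java.io.FileFilter` applied wherever a directory is listed. The sketch below assumes the file name is exactly "Icon" followed by a carriage return (compared case-insensitively); `FinderIconFilter` and its members are hypothetical names, not existing GATE code.

```java
import java.io.File;
import java.io.FileFilter;

class FinderIconFilter {
  /** Accepts everything except the Finder custom-icon marker file. */
  static final FileFilter IGNORE_FINDER_ICON = f -> !isFinderIcon(f.getName());

  /** Assumption: the marker file is named "Icon" plus a trailing carriage return. */
  static boolean isFinderIcon(String fileName) {
    return "Icon\r".equalsIgnoreCase(fileName);
  }

  /** True when the directory holds nothing except, possibly, the marker file. */
  static boolean isEffectivelyEmpty(File dir) {
    File[] visible = dir.listFiles(IGNORE_FINDER_ICON);
    return visible != null && visible.length == 0;
  }
}
```

A datastore-creation check could call `isEffectivelyEmpty` instead of testing for a completely empty folder, and corpus population could pass the same filter when listing files.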

Copied from original issue: GateNLP/gate-top#2

Indicate document name in exception output

When an exception occurs in a PR, it would be useful if the error output in the Messages pane indicated the document being processed.

This could be done by catching ExecutionException in SerialAnalyserController and ConditionalSerialAnalyserController and printing the document there. Then either re-throw the exception so that it gets caught somewhere higher up which currently leads to termination of the corpus processing, or just print the stack trace and continue processing of the corpus.
In some cases, the latter might be preferable: it is frustrating to have a corpus left just nearly fully processed because one document did not work with a PR (e.g. not finding a sentence annotation might throw an exception in the minipar plugin). On the other hand, some serious condition might then cause hundreds or thousands of exceptions to get logged. A compromise might be to accept only a maximum of a dozen such errors before terminating processing of the corpus.
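
The compromise can be sketched independently of the controller classes. In the sketch below, `CorpusRunSketch`, `DocProcessor` and `runAll` are hypothetical names and documents are represented only by their names; it shows the control flow (log the document, continue, abort after a limit), not how the real controllers would be changed.

```java
import java.util.ArrayList;
import java.util.List;

class CorpusRunSketch {
  /** Hypothetical stand-in for running a pipeline over one document. */
  interface DocProcessor {
    void process(String docName) throws Exception;
  }

  /** Processes all documents, naming each failing one; aborts once maxErrors have failed. */
  static List<String> runAll(List<String> docNames, DocProcessor pr, int maxErrors) {
    List<String> failed = new ArrayList<>();
    for (String doc : docNames) {
      try {
        pr.process(doc);
      } catch (Exception e) {
        failed.add(doc);
        System.err.println("Error while processing document '" + doc + "': " + e);
        if (failed.size() >= maxErrors) {
          throw new IllegalStateException(
              "Giving up after " + failed.size() + " failed documents, last: " + doc, e);
        }
      }
    }
    return failed;
  }
}
```

The returned list names the documents that failed, so a caller can report them once the rest of the corpus has been processed.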

Plugin manager window does not need tabs any more

The Plugin Manager window still shows the tab "Installed Plugins": this is both misleading (since the list now shows the available standard plugins) and unnecessary, since there are no other tabs any more.

Documents are not processed when they contain inner tables or lots of space

Hi, I am using GATE Developer. I have documents (resumes) that contain inner tables, and some that have a huge amount of space between pieces of text. When I try to process this type of resume, processing fails, and sometimes it also throws an out-of-memory exception. I do have some JAPE rules. Is it because of the rules? Thanks

Move handling of -tmp parameters from script to Main

Currently the -tmp parameter is essentially a workaround for the quirky way config and session files are handled, and for how the default directory shown in the file picker depends (or not) on the current directory when GATE is started (always starting from the user home is annoying).

This should be handled properly from directly inside GATE, but for that the way config and session file settings depend on each other needs to be untangled.

Importing/Exporting of Corpus as a Zip File

On many occasions corpora are distributed as a zip file. It would be nice if we could read these directly, and if we could export a corpus directly as a zip file. This should be fairly easy with a CorpusExporter and another populate style tool.
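
The reading side can be sketched with the standard `java.util.zip` classes. The sketch assumes every entry is UTF-8 text and simply returns entry names mapped to content; turning each entry into a GATE document and wiring this into a populate-style tool is not shown. `ZipCorpusSketch.readEntries` is a hypothetical helper.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipCorpusSketch {
  /** Reads every non-directory entry as UTF-8 text, keyed by entry name, in archive order. */
  static Map<String, String> readEntries(InputStream zipped) throws IOException {
    Map<String, String> docs = new LinkedHashMap<>();
    try (ZipInputStream zin = new ZipInputStream(zipped)) {
      byte[] buf = new byte[8192];
      ZipEntry entry;
      while ((entry = zin.getNextEntry()) != null) {
        if (entry.isDirectory()) {
          continue;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = zin.read(buf)) != -1) {
          out.write(buf, 0, n);
        }
        docs.put(entry.getName(), new String(out.toByteArray(), StandardCharsets.UTF_8));
      }
    }
    return docs;
  }
}
```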

Make "Save As...." Interruptible

When saving a corpus (i.e. saving all the documents in the corpus as XML) in the GUI, a dialog window is shown that has a disabled "Stop" button. For very large corpora, saving can take a very long time and should be interruptible.
This should be easy to add for DocumentExporter instances as each document is handled independently via the GUI. Might be harder if the exporter is actually a CorpusExporter as the whole corpus is handed off to the exporter in one go.

Initialization/Duplication of some LanguageAnalysers is not thread-safe

We use GATE extensively to do NLP document processing. Our software instantiates the processing pipeline from the file system and then duplicates it to use it in multithreading environment.

Some time ago we switched the duplication calls from sequential to parallel (using ExecutorService), and we started to experience some infrequent random errors.
After some investigation we found that some GATE internal and external resources are using unsynchronized primitive ints as internal counters (e.g. indexes of FSM states).

This results in eventual errors in concurrent initialization or duplication.
The incomplete list of affected classes:
gate/creole/tokeniser/SimpleTokeniser.java: public static int maxTypeId;
gate/creole/tokeniser/FSMState.java: static int index;
gate/creole/tokeniser/DFSMState.java: static int index;
gate/creole/gazetteer/FSMState.java: private static int index;
gate/jape/SinglePhaseTransducer.java.caching: public static int avesize = 0;
gate/jape/SinglePhaseTransducer.java.caching: public static int maxsize = 0;
gate/jape/SinglePhaseTransducer.java.caching: public static int avecount = 0;
gate/fsm/State.java: protected static int index = 0;
gate/fsm/Transition.java: private static int index = 0;
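
For counters that only hand out unique indexes, one common remedy is `java.util.concurrent.atomic.AtomicInteger`. The sketch below is an assumption about how such a counter is used (a `static int index` incremented in a constructor); `StateIdSketch` is hypothetical, and code that resets or reads the counters in other ways would need a closer look.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StateIdSketch {
  // Replaces the unsafe pattern `static int index;` with `myIndex = index++;`,
  // where two threads can read the same value before either writes it back.
  private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

  /** Unique per instance, even when instances are created concurrently. */
  final int myIndex = NEXT_INDEX.getAndIncrement();
}
```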

DoubleDT class issues with gate core jar 8.4.1

Hi, I am using gate 8.4.1. I am using maven dependency to download gate and it downloads gate-core-8.4.1.jar and a couple of other jars. But these do not have Data type classes including DoubleDT. I see from your installation folder that these classes are there in gate.jar and not gate-core.jar. But there is no info on how to get gate.jar using maven dependency. Could you please help me on this?

Starting GATE after setting stored config to LAF GTK+ fails

When setting the LaF to GTK+ in GATE, then exiting GATE, then starting it again, the following exception happens and the GATE GUI never appears:
`
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
at gate.swing.ResourceReferenceChooser$Renderer.updateUI(ResourceReferenceChooser.java:711)
at javax.swing.JLabel.<init>(JLabel.java:164)
at javax.swing.JLabel.<init>(JLabel.java:235)
at javax.swing.tree.DefaultTreeCellRenderer.<init>(DefaultTreeCellRenderer.java:169)
at gate.swing.ResourceReferenceChooser$Renderer.<init>(ResourceReferenceChooser.java:680)
at gate.swing.ResourceReferenceChooser.<init>(ResourceReferenceChooser.java:173)
at gate.gui.MainFrame.<init>(MainFrame.java:538)
at gate.gui.MainFrame.getInstance(MainFrame.java:366)
at gate.Main$2.run(Main.java:186)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
at java.awt.EventQueue.access$500(EventQueue.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:709)
at java.awt.EventQueue$3.run(EventQueue.java:703)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)

`
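
The trace shows updateUI being reached from the JLabel constructor while the Renderer is still being constructed. One plausible reading, which is an assumption and not confirmed here, is that updateUI touches a Renderer field that has not been assigned yet. The sketch below reproduces that ordering with plain classes (`BaseLabel` plays the role of JLabel; all names are hypothetical) and shows a null guard.

```java
class UpdateUiOrderSketch {
  /** Plays the role of JLabel: its constructor calls an overridable method. */
  static class BaseLabel {
    BaseLabel() {
      updateUI();
    }

    void updateUI() {
      // Look-and-feel setup would happen here.
    }
  }

  static class Renderer extends BaseLabel {
    // Field initialisers run only after the BaseLabel constructor has returned.
    private final StringBuilder helper = new StringBuilder("ready");

    @Override
    void updateUI() {
      super.updateUI();
      if (helper == null) {
        return; // called from the superclass constructor: nothing to update yet
      }
      helper.append('!');
    }

    String state() {
      return helper.toString();
    }
  }
}
```

Without the null check, `new UpdateUiOrderSketch.Renderer()` would fail with a NullPointerException inside the constructor chain, much like the trace above.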

Document exporter UI default file name logic flawed

The logic used in DocumentExportMenu.getSelectedFile to come up with a suggested file name for the export is flawed in a number of ways. In particular, it is badly broken on Windows, and mildly broken on other platforms when a directory name contains spaces or other non-alphanumeric characters.

  • First, it calls .getPath() on a URI or URL and then later assumes it can safely construct a File from such a path - this usually works on Mac/Linux but is guaranteed to break on Windows
  • Second, it replaces all characters other than [/a-zA-Z0-9._-] anywhere in the path with underscore - this includes the colon of a drive letter on Windows, and any spaces or other non-alnums in any of the parent directory names

The upshot of this is that if you have a document with a sourceUrl like file:/C:/Users/ian/My%20Documents/interesting%20file.txt it defaults to trying to save it as C:\C_\Users\ian\My_Documents\interesting_file.xml (which is definitely not going to be valid).

Even on Mac where there's only one filesystem root and forward slash is the separator, it's still common to use spaces in file and directory names so if you were to load a file from /Users/ian/GATE Documents/doc 1.txt the "save as inline XML" would offer /Users/ian/GATE_Documents/doc_1.xml as the default target - less bad but still broken.

I would suggest (a) using proper URI manipulation and converting back to File using the new File(URI) constructor, to properly handle Windows conventions, and (b) only applying the "cleanup" rules on the actual file name (the last component) and not on the ancestor directories.
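
A minimal sketch of suggestions (a) and (b), not the actual DocumentExportMenu code. The class name ExportNameSketch and the helper suggestTarget are hypothetical, and the sketch assumes the document's source URL is a file: URI:

```java
import java.io.File;
import java.net.URI;

final class ExportNameSketch {

  // Hypothetical helper: suggests an export target next to the source file.
  static File suggestTarget(URI sourceUri, String newExtension) {
    // new File(URI) decodes escapes such as %20 and applies the platform's
    // path conventions; it throws IllegalArgumentException for non-file URIs.
    File source = new File(sourceUri);

    // Strip the old extension from the last path component only.
    String name = source.getName();
    int dot = name.lastIndexOf('.');
    String base = dot > 0 ? name.substring(0, dot) : name;

    // Apply the cleanup rule to the file name; parent directories stay untouched.
    String cleaned = base.replaceAll("[^a-zA-Z0-9._-]", "_");

    return new File(source.getParentFile(), cleaned + newExtension);
  }

  public static void main(String[] args) {
    URI uri = URI.create("file:/Users/ian/GATE%20Documents/doc%201.txt");
    System.out.println(suggestTarget(uri, ".xml"));
  }
}
```

On a Unix-like system this keeps the "GATE Documents" directory as it is and only turns the file name into doc_1.xml. A real fix would also need to decide what to offer when the source URL is not a file: URL at all.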

Error handling when loading a maven plugin with invalid creole.xml

When a mavenised plugin is created with a creole.xml file that contains a
<JAR scan="yes">I.do.not.exist.jar</JAR> line, then the plugin appears to load fine but then
does not work (no resources are shown). The messages on the message tab are all in black and the last message is

"CREOLE plugin loaded: ..." 

but among the other messages also appears

 "java.io.FileNotFoundException: JAR entry I.do.not.exist.jar not found in .... Plugin not available!" 

This is not shown in red and does not throw an exception, apparently.

Any error when loading the creole.xml file should show a user-friendly message and throw an exception. Since users may make all kinds of small mistakes when setting this up for the new mavenized build of plugins, we have to expect everything that can go wrong to occur at some point.

Can an undo button be implemented

Hamburg have pointed out that their annotators regularly make mistakes and have no way to undo changes to the text. Would it be possible to implement an undo button?

Allow the upgrade tool, when running from a file, to specify non-existing coordinates

Sometimes it would be very useful to upgrade a bunch of pipelines to correspond to a version that is yet to be released on Maven. For this purpose it would be useful if one could specify a version that does not exist in the upgrade tsv file.
The tool may print a warning in such a case to avoid doing this by accident, or may require some additional parameter or column in the tsv to make this request explicit.

Error messages should go to stderr

It appears that by default log4j sends all messages to stdout, even if you log them as error messages via log.error("message"). While this means they still end up on the message pane in GATE, they don't end up in red and so are easy to miss.

GUI becomes unresponsive after populating corpus in datastore

If you populate a corpus that's in a datastore the GUI goes mad as it adds and removes documents quickly. While this looks odd it's not a huge problem and shows you something is happening. Unfortunately once the populate has finished the resource tree is often left in a mess and quickly becomes unresponsive. An indication of this is a document icon in the tree with no name. Clicking in the tree then results in NPEs. You can also sometimes get an array out of bounds exception during the population (I assume where we have overlapping adds and removes happening via the listeners).

As an example load the Format Twitter plugin and then populate from the attached tweets, you'll probably end up with a completely borked GUI.

trump-tweets-2018-06-12-000.gz

Busy indicator in the GUI when fetching JARs for loading a pipeline

When a pipeline is loaded that needs to fetch a number of plugins which are not in the cache yet, there is no indication in the GUI of what is happening. Even the busy icon seems to appear only at a later stage.

It may be less confusing for users to get some message ("Downloading required plugin xx", "Downloading required plugins..") in the bottom message line, or some other indicator for this and maybe also make the usual busy indicator (the turning cogs) appear right from the start?

URL for a directory may not get persisted correctly in an xgapp file.

Had a case where a URL that refers to a directory, set from the filepicker for a runtime parameter of a PR, did not have the trailing slash it should have. So when the URL gets read and set for the runtime parameter it does not have the slash and therefore does not indicate a directory properly.
Need to investigate how that can happen and if the PersistenceManager could be swallowing the slash when persisting the URL.
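
To illustrate why the missing slash matters, here is a small sketch using only java.net.URI. It is not GATE code: the class DirectoryUrlSketch, the helper asDirectory, and the example paths are hypothetical, and the helper assumes the URI has no query or fragment:

```java
import java.net.URI;

final class DirectoryUrlSketch {

  // Hypothetical helper: make sure a URI that is known to name a directory
  // ends with a slash (assumes no query or fragment part).
  static URI asDirectory(URI uri) {
    String s = uri.toString();
    return s.endsWith("/") ? uri : URI.create(s + "/");
  }

  public static void main(String[] args) {
    URI withoutSlash = URI.create("file:/data/models");

    // Without the slash, "models" is treated as a file name and replaced.
    System.out.println(withoutSlash.resolve("gazetteer.def"));
    // prints file:/data/gazetteer.def

    // With the slash, the relative name resolves inside the directory.
    System.out.println(asDirectory(withoutSlash).resolve("gazetteer.def"));
    // prints file:/data/models/gazetteer.def
  }
}
```

Any code that resolves relative names against the persisted URL would therefore look in the parent directory if the slash is lost.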

Make it easy and obvious to bring up the documentation for a plugin in the plugin manager

The plugin manager has access to the pom <url></url> field for the project and it would be good if there were a button or some other obvious way to display that page, which should probably be the main documentation page for that plugin or a page that contains a link to the documentation page.

A simple way to achieve this may be to show the actual URL as a clickable link between the description and the list of PRs.

If at all possible, it would also be cool to allow navigating to the help pages defined for the resources shown in the list, e.g. from a right-click menu or by double clicking.
If the information from the resource annotations is fully available to the resource manager from the meta-info pom, then it would probably also be nice to immediately show the description of the resource in the GUI when a resource is clicked in the list together with a hint about how to show the full help web page.

move to using SVG for all icons

With the increase in HiDPI monitors we probably need to think about replacing all the legacy icons with versions generated from SVG files. This probably requires us to package up the svg2java Ant task as a Maven plugin to make it easy for people to do this in plugins as well as in core.

DocumentStaxUtils ignores CDATA within TextWithNodes

As reported on the gate-users mailing list, the GATE XML format parser silently discards any CDATA sections that fall within the TextWithNodes part of a GATE XML document.

When saving as GATE XML the serialiser uses CDATA to represent any segment of text within the TextWithNodes or any feature name or value that contains more than a few less-than signs, as this is more compact and human-readable than escaping each one individually as &lt;. The parser handles CDATA correctly when reading feature names and values but not the TextWithNodes.

This problem is not generally apparent as the serialiser only uses CDATA when there are lots of less-thans within a single span of text - if you were to save an ANNIE-processed document with a big run of <<<<<<<<<<, each symbol would be a separate Token and thus there would be empty Node elements between each pair and no single run would have more than one <. However if the same document were saved as GATE XML with minimal annotations (e.g. just human-annotated entities and no Tokens) then it would hit this bug.
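
As a rough illustration of the kind of handling the text content needs, the sketch below collects text with the standard StAX API and treats CDATA events the same way as ordinary character events. It is not the DocumentStaxUtils code: the class CdataTextSketch and the method readText are hypothetical, and the input is a simplified stand-in for a TextWithNodes element:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CdataTextSketch {

  // Hypothetical helper: concatenates all text content of the given XML.
  static String readText(String xml) {
    StringBuilder text = new StringBuilder();
    try {
      XMLStreamReader reader =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      try {
        while (reader.hasNext()) {
          switch (reader.next()) {
            // A parser may report a CDATA section as CDATA or as CHARACTERS,
            // so both (plus ignorable whitespace) are appended here.
            case XMLStreamConstants.CHARACTERS:
            case XMLStreamConstants.CDATA:
            case XMLStreamConstants.SPACE:
              text.append(reader.getText());
              break;
            default:
              break; // elements such as <Node/> carry no text
          }
        }
      } finally {
        reader.close();
      }
    } catch (XMLStreamException e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
    return text.toString();
  }

  public static void main(String[] args) {
    String xml = "<TextWithNodes>a <![CDATA[<<<<]]> b</TextWithNodes>";
    System.out.println(readText(xml));
  }
}
```

A loop that only appends on CHARACTERS events would silently lose the run of less-than signs whenever the parser reports the section as a separate CDATA event, which matches the symptom described above.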
