perseids-project / perseids_docs

Project Documentation For Perseids

License: GNU General Public License v3.0

perseids_docs's Introduction

Overview

Perseids is an online platform for collaborative editing, annotation and publication of digital texts and annotations. It is not a single application but an integrated environment that loosely couples heterogeneous open source tools and services from a variety of sources through RESTful APIs, and it supports various standard formats for encoding texts and annotations. Supported standards include, but aren’t limited to, the Text Encoding Initiative (TEI) and Open Annotation (OA). The platform also supports cross-project collaboration, where content might come from a separate project, be edited through Perseids and then returned to its source repository.

The main user entrypoint to the platform is at https://sosol.perseids.org/sosol.

Administrative entrypoints are described at admin.md

UI Design Guidelines are at UIDesignGuidelines.pdf

Integration details can be found in the integrations folder

API info can be found in api.md

Puppet Repository

The private puppet repository contains all production and development deployment environment manifests. A public version of this repository, without deployment secrets and encrypted files, can be found at https://github.com/perseids-project/puppet-public.

Related Resources and Publications

Future Directions ideas

Presentations, Publications, Classroom Resources and Blog Posts can be found on The Perseids Blog.

Almas, B., (2017). Perseids: Experimenting with Infrastructure for Creating and Sharing Research Data in the Digital Humanities. Data Science Journal. 16, p.19. DOI: http://doi.org/10.5334/dsj-2017-019

Mailing Lists

[email protected]

Funders

This project has received support from the Andrew W. Mellon Foundation, Tufts University, the National Endowment for the Humanities [grant HD-51548-12] and the Institute of Museum and Library Services.

perseids_docs's Issues

Move running code elsewhere (+ fix for scala warning)

@srdee, we might consider moving your script to its own repo rather than leaving it inside this documentation repo.

About the warning you receive when you run it: the Scala interpreter is just picky about the definition of main. If you change https://github.com/PerseusDL/perseids_docs/blob/master/alphAlignDir2perseidsAlignFile/alphDir2persFile.scala#L11 to def main(args:Array[String]) { (i.e. get rid of the equal sign) it will run just fine.

Support use of gold-standard during review of submitted annotation

Need the ability to automatically compare submitted annotations against one or more selected gold standards
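
A comparison of this kind could look roughly like the Python sketch below. The data layout (a mapping from word id to a head and relation pair) and the function name compare_to_gold are assumptions made for illustration; they are not part of Perseids.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: word id -> (head id, relation label).
Annotation = Dict[str, Tuple[str, str]]


def compare_to_gold(submitted: Annotation, gold: Annotation) -> Tuple[float, List[str]]:
    """Return the share of gold words annotated identically, plus the ids that differ."""
    differing = [
        word_id
        for word_id, expected in gold.items()
        if submitted.get(word_id) != expected
    ]
    if not gold:
        # Nothing to compare against, so treat the submission as fully matching.
        return 1.0, differing
    score = (len(gold) - len(differing)) / len(gold)
    return score, differing


# Illustrative data only.
gold = {"1": ("2", "SBJ"), "2": ("0", "PRED"), "3": ("2", "OBJ")}
submitted = {"1": ("2", "SBJ"), "2": ("0", "PRED"), "3": ("1", "ATR")}
score, differing = compare_to_gold(submitted, gold)
print(round(score, 2), differing)
```

With several gold standards, one option would be to run the comparison against each and report the best score, but that policy is a design decision still to be made.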

work on the logic of the syntactic menu in Alpheios so that it is more context-sensitive

support runtime validation of an annotation against one or more gold-standards

For all annotation editors, it would be nice to have an option to validate user input against one or more gold standards at runtime.

define requirements/guidelines for agent identifiers and versioning

Need to decide on the requirements/guidelines for identifiers for software agents participating in the infrastructure.

We want to be able to record provenance information for any tool or service participating in a workflow on the platform.

Modeling this with the Perseids-Pelagios integration in OA annotations, Pelagios reports:

oa:serializedBy http://pelagios.org/recogito#version1.0.

But ideally should be
http://pelagios.org/recogito#version1.0 a prov:SoftwareAgent.
http://annotationuri oa:serializedBy http://pelagios.org/recogito#version1.0.

(this is what I'm transforming it to)

I like the approach Rainer took to the pelagios agent URI, using the # to identify the version.

Need also to decide preferred mechanisms for a service/tool to identify itself... i.e. in output, via version query, ...
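
As a rough illustration of the pattern described above, the Python sketch below builds a versioned agent URI with the version in the fragment and emits the two statements as N-Triples-style lines. The function names and the annotation URI are placeholders invented for this example; only the Pelagios agent URI and the oa: and prov: terms come from the discussion.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV_SOFTWARE_AGENT = "http://www.w3.org/ns/prov#SoftwareAgent"
OA_SERIALIZED_BY = "http://www.w3.org/ns/oa#serializedBy"


def agent_uri(base: str, version: str) -> str:
    """Build an agent URI that carries the version in the fragment."""
    return f"{base}#version{version}"


def provenance_triples(annotation_uri: str, agent: str) -> list:
    """Declare the agent as a prov:SoftwareAgent and link the annotation to it."""
    return [
        f"<{agent}> <{RDF_TYPE}> <{PROV_SOFTWARE_AGENT}> .",
        f"<{annotation_uri}> <{OA_SERIALIZED_BY}> <{agent}> .",
    ]


recogito = agent_uri("http://pelagios.org/recogito", "1.0")
# The annotation URI below is a placeholder, not a real resource.
for line in provenance_triples("http://example.org/annotation/1", recogito):
    print(line)
```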

Design annotation template creation workflow

Thinking about the workflow for preparing a text for annotation, where does responsibility lie for:

handling hyphenated text (e.g. see Seneca, De Brevitate Vitae: lots of words hyphenated across line breaks)
parsing of things like note/speaker tags
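
To make the hyphenation case concrete, here is a minimal Python sketch that rejoins words split by a trailing hyphen at a line break. It works on plain text lines, which is an assumption; the real texts are XML, where the line breaks would be markup rather than newlines, and the function name is invented for this example.

```python
def join_hyphenated_lines(lines):
    """Rejoin words that were split by a trailing hyphen at the end of a line."""
    joined = []
    carry = ""  # word fragment waiting for its continuation on the next line
    for line in lines:
        text = carry + line.strip()
        carry = ""
        if text.endswith("-"):
            # Hold back the partial word and prepend it to the next line.
            head, _, fragment = text.rpartition(" ")
            carry = fragment[:-1]
            text = head
        if text:
            joined.append(text)
    if carry:
        # The last line ended in a hyphen with nothing following: keep it as it was.
        joined.append(carry + "-")
    return joined


# Illustrative input only.
sample = ["maior pars mor-", "talium de naturae", "malignitate con-", "queritur"]
print(join_hyphenated_lines(sample))
```

Note that this moves the rejoined word onto the following line, so the original line boundaries change, and it cannot tell a line-break hyphen from a genuine one. Whether either matters depends on where in the workflow the step runs.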

add support for renumbering sentences and words in alignment files

Need to implement a preprocessing stylesheet that automatically corrects the numbering of sentences and words in alignment files, for parity with the treebank functionality.
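
The renumbering logic itself is simple. The Python sketch below shows it on a simplified document, as a stand-in for what a stylesheet would do. The element names (sentence, w), the id attribute and the "sentence-word" id pattern are assumptions; the actual alignment schema may differ, and any references between words would need updating too.

```python
import xml.etree.ElementTree as ET


def renumber(root):
    """Number sentences 1..n and words 1..m within each sentence, in document order."""
    for s_index, sentence in enumerate(root.iter("sentence"), start=1):
        sentence.set("id", str(s_index))
        for w_index, word in enumerate(sentence.iter("w"), start=1):
            # Assumed id pattern: <sentence number>-<word number>.
            word.set("id", f"{s_index}-{w_index}")
    return root


# Illustrative input with gaps in the numbering.
doc = ET.fromstring(
    '<doc>'
    '<sentence id="4"><w id="4-2"/><w id="4-7"/></sentence>'
    '<sentence id="9"><w id="9-1"/></sentence>'
    '</doc>'
)
renumber(doc)
print(ET.tostring(doc, encoding="unicode"))
```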

make additional sources available as annotation bodies (via the annotsrc inventory) in Perseids

Need to load high priority sources for:

Athenaeus
Bodin

(Waiting for lists from Monica and Yannis)

Load Bodin English, Latin and French texts into Perseids

Bodin XML files need to be loaded into the Perseids canonical git repository and added to the P5 inventory. (files are currently being managed in https://github.com/TuftsUniversity/perseids_bodin)

Ability to support multiple layers and types of annotation at one time

From the fall 2013 treebanking workshop:

We would like the ability to support complementary layers of annotation at one time. For example, while doing morpho-syntactic annotation, have the ability to annotate named entities, semantic meaning, etc.

Add ability to add Shibboleth login to an OpenId authenticated account

We would like to enable groups of students to work in a shared account but log in with their own credentials. We can't get a group UTLN for this but could maybe create a Gmail account and then allow the students to attach their Shib login to it.

Better/more options for handling ellipses

We need better ways to deal with ellipses while treebanking:

there may be an Alpheios Editor bug? https://elist.tufts.edu/wws/arc/perseids-dev/2014-01/msg00007.html
we want the ability to supply the elided form while annotating
we want the ability to use a URI to point at the implied reference

SoSOL treebank support - integrate services for template creation (Latin)

basic workflow for creation of annotation template:

Inputs:

CTS URN of passage for annotation
name of CTS inventory
name of annotation format (optional - defaults to aldt)
create CTS GetPassage proxy url (to retrieve passage)
execute segment/tokenize request to LLT service, supplying GetPassage proxy url
XSLT application to transform tokenized output to treebank file template for requested format
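
The first steps of this workflow can be sketched in Python as follows. Both endpoint URLs, the inventory name, the example URN and the tokenizer's uri parameter are placeholders invented for illustration. Only the request, inv and urn parameters of the CTS GetPassage request and the aldt default are taken from the CTS protocol and the list above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CTS_BASE = "https://example.org/cts"            # hypothetical CTS proxy endpoint
TOKENIZER_BASE = "https://example.org/segtok"   # hypothetical segment/tokenize endpoint


def get_passage_url(urn, inventory):
    """Build the CTS GetPassage URL used to retrieve the passage."""
    query = urlencode({"request": "GetPassage", "inv": inventory, "urn": urn})
    return f"{CTS_BASE}?{query}"


def template_request(urn, inventory, annotation_format="aldt"):
    """Collect what the template step needs: passage URL, tokenizer URL, format."""
    passage_url = get_passage_url(urn, inventory)
    # The tokenizer is handed the passage URL; the parameter name is an assumption.
    tokenizer_url = f"{TOKENIZER_BASE}?{urlencode({'uri': passage_url})}"
    return {
        "passage_url": passage_url,
        "tokenizer_url": tokenizer_url,
        "format": annotation_format,
    }


# Placeholder URN and inventory name, for illustration only.
req = template_request("urn:cts:latinLit:phi0000.phi000.example-lat1:1.1", "example-inventory")
print(req["passage_url"])
print(req["tokenizer_url"])
```

The XSLT step that turns the tokenized output into a treebank template is left out here, since it depends on the requested annotation format.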

Limitations for initial implementation (but targeted for future work):

Texts available for templates must be available in one of the CTS inventories pre-configured for use by SoSOL (and loaded in the Alpheios CTS Service on perseids.org - won't retrieve directly from Perseus)
annotation formats must be pre-configured and loaded in Perseids Alpheios editor environment
Only available for short passages as synchronous requests for the moment. Asynchronous implementation using BSP Cache and Notification services will be implemented later.
No morphology pre-populated yet - will be done via integration with Morphology Service along with asynchronous service support

Ability to take advantage of linked data in annotation interface

We would like the ability to take advantage of linked data in the annotation interface(s) to show things such as:

short definition
translation
morphemes
grammatical references

etc.

Note that this includes linking in other annotations as well as reference data.

Perseids Review Workflow: expose links to previously reviewed publications

Request from Giuseppe:
Once in Perseids I reject an annotation, is that correct that I cannot visualize the text anymore (not only the sentences annotated and rejected but the entire text also)? If so, it would be great to add a functionality in the future that allows me to see the sentences, because when I correct them with the student, it is helpful to have access to the text. Thanks.

Note: everyone's publications are in fact accessible read-only at any time but we have hidden the links to them in the interface (normally in the news feed) due to privacy concerns with comments on "rejected" student work.

Dynamically select annotation targets

We would like the services and interface to support automatic selection of annotation targets based upon various criteria

Load Seneca De Brevitate Vitae treebanking templates in to Perseids

support for character-based alignment (e.g. align only endings)

Request from @srdee and Maryam - add the ability to support character-based alignment (for example, to align only word endings)

treebank validation - report error if words reference themselves as a head

Verify that no word references itself as its head, since this messes up the visualization.
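
A minimal Python sketch of such a check is shown below. It assumes an un-namespaced document with sentence and word elements carrying id and head attributes; the function name is invented for this example.

```python
import xml.etree.ElementTree as ET


def self_headed_words(treebank_xml):
    """Return (sentence id, word id) pairs for words whose head is their own id."""
    root = ET.fromstring(treebank_xml)
    errors = []
    for sentence in root.iter("sentence"):
        for word in sentence.iter("word"):
            head = word.get("head")
            if head is not None and head == word.get("id"):
                errors.append((sentence.get("id"), word.get("id")))
    return errors


# Illustrative input: word 2 in sentence 1 points at itself.
sample = (
    '<treebank>'
    '<sentence id="1">'
    '<word id="1" form="arma" head="2"/>'
    '<word id="2" form="cano" head="2"/>'
    '</sentence>'
    '</treebank>'
)
print(self_headed_words(sample))
```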

In Alpheios, empty nodes should not be annotated in the value of an attribute but as a separate element (i.e., <empty>) which can also be annotated

Figure out what to do about namespaces and treebank and alignment documents

The TreebankCiteIdentifier code validates against the Perseus treebank schema but doesn't require that the documents be namespaced and in fact the api would currently break if the namespace was used.

For the AlignmentCiteIdentifiers, the opposite is true, i.e. the api would break if the namespace is NOT used.

Need to get consistent about this -- I think the namespace should be enforced. But note also that I'm validating the treebank documents against a provisional 1.6 version of the treebank schema which is currently located on the nlp.perseus.tufts.edu server. Really this should be somewhere else, and we need to make the 1.6 version of the schema official and update the version referenced in the treebank documents, etc.

Essentially a mess that needs to be cleaned up.

SoSOL add tests on ownership guards

Need to add a bunch of unit tests for the controllers to be sure everything is protected properly by ownership guards

be sure to include the dmm_api

Support rtl text in Alignment Editor

The support for rtl text is currently broken in the Alignment Editor

Add ability to annotate and correct anaphora

Feedback from Fall 2013 treebanking workshop: we would like the ability to annotate and correct anaphora while treebanking

Cleanup template creation process

Need to look at a few issues with the epidoc template creation process:

CTS URNs aren't being embedded in translations
System events recorded under http://papyri.info/editor
translation div wrapped in edition

Improve support for nested annotation tag sets

The Alpheios Treebank Editor currently supports a certain degree of nesting of tag sets, but many improvements are needed to make this truly workable.

This item still needs to be fleshed out with details from various email discussions and documents.

SoSOL - merge OACIdentifier and CTSOACIdentifier?

I'm not sure the original intent of the OACIdentifier class being a base class for different types of OAC annotations has really held up -- we now support a mixture of CTS and non-CTS bodies and the logic is a little mixed up between the base and derived class. This needs to be straightened out.

Alpheios editors: XML Schemas

The alignment editor and the treebank editor use different naming schemes for words (<word id="1"> versus <w id="1-1">). Can we unify them for easier referencing against each other?

SoSOL OAC annotation support - retain target/body text markup for display

Stripping the markup from the text selection interface makes it hard to use, especially for large passages like we will have for Bodin (chapter size).

We want to replace the application of extract_text.xsl to the GetPassage output with the use of tokenization services. This should be implemented only if we have a tokenization service for the language of the text in question, falling back to the current behavior if not.

alpheios treebank editor save notice stays up too long

should go away when you navigate to a new sentence

bodin text branch name causes git errors

not exactly sure where the source of the problem is but the filename branch for the Bodin text causes an invalid ref error for git.

SoSOL Treebanking interface - ui elements for max sentences and jumping to a specific sentence

Eventually we want to support full searching and browsing of the treebank files but for the short term at least:

make the max # of sentences displayed at one time settable in the UI
add a UI element for jumping to a specific #d sentence
make the size returned for the api info request the # of sentences in the file rather than a hard-coded #

enable Athenaeus texts for transcription/annotation in Perseids

Need to convert the Athenaeus texts to EpiDoc and put them in the P5 inventory.

SoSOL treebank support - display validation messages in Alpheios Editor

Currently if the treebank file fails to validate upon a save request to the dmm_api the validation error isn't forwarded to the UI tool (or if it is, it isn't being captured and displayed). Validation errors should be shown to the user in the editing interface.

add the possibility to treebank via Alpheios a manually modified XML file without losing that annotation

integrate parser functionality at runtime while annotating morphosyntax

Would like the ability to automatically validate/invalidate (based upon logic) a user's selection when annotating morphosyntax

Alpheios should show the reference of sentences

Add support for cross-sentence linking

The ability to tag so as to link sentences back and forth with one another will allow for analyses of rhetorical figures (ellipsis and anaphora being the first), and eventually to stylistic and literary analysis.

Related to #41 and #40

Create view of FLC Paleography collection images for Medieval Latin students

See http://sites.tufts.edu/perseids/projects/epifacs/medieval-latin-january-2013/

Shibboleth - Implement support for encrypted Attributes

implement support for encrypted attributes in AuthResponse

Leipzig doesn't support the unencrypted AttributeQuery profile

Load Athenaeus Templates from treebanking into Perseids

Load Athenaeus Templates from treebanking into Perseids -- start with Book 10

OA Annotation Interface Enhancements

Want to use the OA annotation interface for the Bodin project to support creation of annotations that link a section of text in the source text to a section of text in a translation. To support a decent workflow for this, we probably need to add the following enhancements to the interface:

multiple annotation bodies (see sosol/sosol#18)
the ability to save as a new annotation (to support a workflow that doesn't require the user to reload and retokenize the passage text)
use of texts from the target publication as the body of the annotation. For this we need to use a URI to the current publication as the base URI for the annotation and then update it upon finalization. We should probably also use the base URI for the current publication as the base URI for the target and update it upon finalization.
editing a text passage should invalidate or at least warn user to check annotations targeting that passage
add edit bar to top of display so that users can switch back to editing XML
fix the @ vs # in the CTS URNs
make sure highlighting works when editing an existing annotation

Ability to annotate morphosyntax using multiple tag sets at once

Feedback from Fall 2013 treebanking workshop: would be nice to have the ability to annotate treebanks using multiple tag sets at once.

Ability to apply frequency data to morphology service output

We would like the ability to apply frequency data to disambiguate morphology service output. One use case for this is in the scenario where we populate treebanking templates, but it could equally apply to runtime use of morphological output in annotation interfaces, etc.
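
One simple form of this is to rank the candidate analyses for a word by how often each lemma and part-of-speech pair occurs. The Python sketch below assumes analyses arrive as dictionaries with lemma and pos keys and that frequencies are available as a lookup table. Both shapes, the function name and the counts are invented for illustration.

```python
def rank_analyses(analyses, frequencies):
    """Order candidate analyses from most to least frequent (unknown ones last)."""
    return sorted(
        analyses,
        key=lambda a: frequencies.get((a["lemma"], a["pos"]), 0),
        reverse=True,
    )


# Illustrative data: the form "amor" as a noun or as a passive verb form of "amo".
analyses = [
    {"lemma": "amo", "pos": "verb"},
    {"lemma": "amor", "pos": "noun"},
]
frequencies = {("amor", "noun"): 120, ("amo", "verb"): 15}  # made-up counts

ranked = rank_analyses(analyses, frequencies)
print(ranked[0])
```

For template population the top-ranked analysis could be pre-filled, while an annotation interface might show the whole ranked list and let the user choose.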

add unit tests for ownership on individual annotations

Add some tests to ensure that creator uri protections on individual annotations in the OACIdentifier class are still functional

Ability to apply filter criteria when reviewing annotations of various types

From the Fall 2013 Treebanking workshop: we would like an intelligent versioning/review process whereby the changes can be filtered by the type of change
e.g. the process would identify places where we changed tokens (and / or word ids) while ignoring changes to things like pos and head, and vice-versa. This will help not only in reviewing but also in ultimate re-syncing of the annotations with the underlying source text
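
A filter of this kind could start by sorting each change into a category, as in the Python sketch below. The field names, the two categories and the position-by-position pairing of words are assumptions made for illustration. A real implementation would need a proper alignment step to cope with inserted or deleted tokens.

```python
TOKEN_FIELDS = {"id", "form"}
ANNOTATION_FIELDS = {"postag", "head", "relation"}


def classify_changes(old_words, new_words):
    """Split differences between two versions into token and annotation changes."""
    changes = {"token": [], "annotation": []}
    for old, new in zip(old_words, new_words):
        changed = {
            field
            for field in TOKEN_FIELDS | ANNOTATION_FIELDS
            if old.get(field) != new.get(field)
        }
        if changed & TOKEN_FIELDS:
            changes["token"].append((old.get("id"), sorted(changed & TOKEN_FIELDS)))
        if changed & ANNOTATION_FIELDS:
            changes["annotation"].append((old.get("id"), sorted(changed & ANNOTATION_FIELDS)))
    return changes


# Illustrative data: word 1 gets a new head, word 2 is retokenized.
old = [
    {"id": "1", "form": "arma", "postag": "n", "head": "2"},
    {"id": "2", "form": "virumque", "postag": "n", "head": "0"},
]
new = [
    {"id": "1", "form": "arma", "postag": "n", "head": "3"},
    {"id": "2", "form": "virum", "postag": "n", "head": "0"},
]
changes = classify_changes(old, new)
print(changes)
```

A reviewer interested only in retokenization would then look at changes["token"] and ignore the rest, and vice versa.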

add vocabulary to SoSOL OAC annotation interface

Use SAWS ontology terms for text reuse/alignment.

Pending receipt of list from Monica Berti.

will be used by Digital Athenaeus and Bodin projects.

SoSOL - enable passage selection interface for CTS EpiDoc files - text and translations


Need to make this contingent on ability to retrieve valid citations from the text being edited.

Eliminate text length limits on alignment entry form

The Alpheios Alignment editor sentence entry form currently limits the amount of text that can be aligned to the number of characters allowed in a URI (because it issues a GET to the back-end to create the annotation).

This limitation should not be carried over to the Perseids environment and should be fixed in the Alpheios standalone tool
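
One common way around a URI length limit is to send the text in the body of a POST request instead of in the query string. The Python sketch below only builds such a request and does not send it. The endpoint and the form field names are placeholders, not the actual Alpheios back-end interface.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_alignment_request(endpoint, source_text, target_text):
    """Put both texts in a form-encoded POST body so the URL stays short."""
    body = urlencode(
        {"source_text": source_text, "target_text": target_text}  # hypothetical field names
    ).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder endpoint; the texts are far longer than a URI would normally allow.
request = build_alignment_request("https://example.org/align", "a" * 10000, "b" * 10000)
print(request.get_method(), len(request.full_url), len(request.data))
```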

when creating a new treebank annotation from a template, check on existing publication is broken

create_from_linked_urn for treebankcitationidentifier isn't matching on existing annotation for the athenaeus 10 files because they don't use the cts urn in the document_id. This should be something that the system checks and corrects when first creating a publication for a treebank file.