perseids_docs's Introduction

Overview

Perseids is an online platform for collaborative editing, annotation and publication of digital texts and annotations. It is not a single application but an integrated environment: a loose coupling of heterogeneous open source tools and services, connected through RESTful APIs and supporting various standard formats for encoding texts and annotations. Supported standards include, but aren’t limited to, the Text Encoding Initiative (TEI) and Open Annotation (OA). The platform also supports cross-project collaboration, where content might come from a separate project, be edited through Perseids and then be returned to its source repository.

The main user entrypoint to the platform is at https://sosol.perseids.org/sosol.

Administrative entrypoints are described at admin.md

UI Design Guidelines are at UIDesignGuidelines.pdf

Integration details can be found in the integrations folder

API info can be found in api.md

Puppet Repository

The private puppet repository contains all production and development deployment environment manifests. A public version of this repository, without deployment secrets and encrypted files, can be found at https://github.com/perseids-project/puppet-public.

Related Resources and Publications

Future Directions ideas

Presentations, Publications, Classroom Resources and Blog Posts can be found on The Perseids Blog.

Almas, B., (2017). Perseids: Experimenting with Infrastructure for Creating and Sharing Research Data in the Digital Humanities. Data Science Journal. 16, p.19. DOI: http://doi.org/10.5334/dsj-2017-019

Mailing Lists

[email protected]

Funders

This project has received support from the Andrew W. Mellon Foundation, Tufts University, the National Endowment for the Humanities [grant HD-51548-12] and the Institute of Museum and Library Services.

perseids_docs's People

Contributors

balmas, caesarfeta, lcerrato, lfdm, marie-clairebeaulieu, ponteineptique, srdee

perseids_docs's Issues

define requirements/guidelines for agent identifiers and versioning

Need to decide on the requirements/guidelines for identifiers for software agents participating in the infrastructure.

We want to be able to record provenance information for any tool or service participating in a workflow on the platform.

Modeling this with the Perseids-Pelagios integration in OA annotations, Pelagios reports:

oa:serializedBy http://pelagios.org/recogito#version1.0.

But ideally it should be:
http://pelagios.org/recogito#version1.0 a prov:SoftwareAgent.
http://annotationuri oa:serializedBy http://pelagios.org/recogito#version1.0.

(this is what I'm transforming it to)

I like the approach Rainer took to the pelagios agent URI, using the # to identify the version.

We also need to decide on the preferred mechanisms for a service or tool to identify itself, e.g. in its output or via a version query.
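A minimal sketch of the identifier convention discussed above, assuming the version is carried in the URI fragment as in the Pelagios example. The helper names and the annotation URI are hypothetical; only the Pelagios agent URI and the `oa:serializedBy` / `prov:SoftwareAgent` terms come from the issue text.

```python
from urllib.parse import urldefrag

OA = "http://www.w3.org/ns/oa#"
PROV = "http://www.w3.org/ns/prov#"


def agent_uri(tool_uri: str, version: str) -> str:
    """Build an agent URI that carries the tool version in the fragment."""
    base, _ = urldefrag(tool_uri)  # drop any fragment already present
    return f"{base}#version{version}"


def provenance_turtle(annotation_uri: str, agent: str) -> str:
    """Return Turtle that types the agent and links the annotation to it."""
    return (
        f"@prefix oa: <{OA}> .\n"
        f"@prefix prov: <{PROV}> .\n\n"
        f"<{agent}> a prov:SoftwareAgent .\n"
        f"<{annotation_uri}> oa:serializedBy <{agent}> .\n"
    )


if __name__ == "__main__":
    agent = agent_uri("http://pelagios.org/recogito", "1.0")
    # The annotation URI below is a placeholder, not a real Perseids URI.
    print(provenance_turtle("http://example.org/annotations/1", agent))
```

Keeping the version in the fragment means the same base URI identifies the tool across releases, while each release still gets its own agent identifier.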

Design annotation template creation workflow

Thinking about the workflow for preparing a text for annotation: where does responsibility lie for:

  • handling hyphenated text (e.g. see Seneca, De Brevitate Vitae: lots of words hyphenated at line breaks)
  • parsing of things like note/speaker tags
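Wherever that responsibility ends up, the hyphenation case could be handled by a small pre-processing step. The sketch below is one possible approach, not an existing Perseids component: it joins a word fragment ending in a hyphen at the end of a line with the fragment that opens the next line. It assumes every line-final hyphen marks a split word, so genuine hyphenated compounds that happen to fall at a line break would be joined too.

```python
import re

# A word fragment ending in "-" at the end of a line, followed by its
# continuation at the start of the next line (optional indentation allowed).
_LINE_END_HYPHEN = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def join_hyphenated(text: str) -> str:
    """Rejoin words that were split with a hyphen across a line break."""
    return _LINE_END_HYPHEN.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "vita, si uti scias, lon-\nga est"
    print(join_hyphenated(sample))
```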

SoSOL treebank support - integrate services for template creation (Latin)

basic workflow for creation of annotation template:

Inputs:

  1. CTS URN of passage for annotation
  2. name of CTS inventory
  3. name of annotation format (optional - defaults to aldt)

Steps:

  1. create CTS GetPassage proxy url (to retrieve passage)
  2. execute segment/tokenize request to LLT service, supplying GetPassage proxy url
  3. apply XSLT to transform tokenized output to treebank file template for requested format
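The URL-building part of this workflow (the GetPassage proxy URL and the tokenization request that receives it) might look roughly like the sketch below. Both endpoint URLs and the tokenizer's `uri` parameter name are placeholders, not the real Perseids or LLT service addresses. The `request`, `urn` and `inv` query parameters are assumed to follow the usual CTS GetPassage request form. The XSLT step is not shown.

```python
from urllib.parse import urlencode

# Hypothetical endpoints, for illustration only.
CTS_ENDPOINT = "http://example.org/cts/api"
TOKENIZER_ENDPOINT = "http://example.org/llt/segtok"


def get_passage_url(urn: str, inventory: str) -> str:
    """Build a CTS GetPassage URL for the passage and inventory."""
    query = urlencode({"request": "GetPassage", "inv": inventory, "urn": urn})
    return f"{CTS_ENDPOINT}?{query}"


def tokenize_request_url(passage_url: str) -> str:
    """Build the tokenizer request, passing the passage URL as a parameter.

    The parameter name "uri" is an assumption, not the documented LLT API.
    """
    return f"{TOKENIZER_ENDPOINT}?{urlencode({'uri': passage_url})}"


def template_request(urn: str, inventory: str, annotation_format: str = "aldt") -> dict:
    """Collect the inputs and derived URLs needed to create a template."""
    passage_url = get_passage_url(urn, inventory)
    return {
        "format": annotation_format,  # optional input, defaults to aldt
        "passage_url": passage_url,
        "tokenize_url": tokenize_request_url(passage_url),
    }


if __name__ == "__main__":
    # Example URN and inventory name, chosen only to show the shape of the data.
    for key, value in template_request("urn:cts:latinLit:phi0959.phi006:1.1-1.4", "example-inventory").items():
        print(f"{key}: {value}")
```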

Limitations for initial implementation (but targeted for future work):

  1. Texts available for templates must be available in one of the CTS inventories pre-configured for use by SoSOL (and loaded in the Alpheios CTS Service on perseids.org - won't retrieve directly from Perseus)
  2. annotation formats must be pre-configured and loaded in Perseids Alpheios editor environment
  3. Only available for short passages as synchronous requests for the moment. Asynchronous implementation using BSP Cache and Notification services will be implemented later.
  4. No morphology pre-populated yet - will be done via integration with Morphology Service along with asynchronous service support

Ability to take advantage of linked data in annotation interface

We would like the ability to take advantage of linked data in the annotation interface(s) to show things such as:

short definition
translation
morphemes
grammatical references

etc.

Note that this includes linking in other annotations as well as reference data.

Perseids Review Workflow: expose links to previously reviewed publications

Request from Giuseppe:
Once in Perseids I reject an annotation, is that correct that I cannot visualize the text anymore (not only the sentences annotated and rejected but the entire text also)? If so, it would be great to add a functionality in the future that allows me to see the sentences, because when I correct them with the student, it is helpful to have access to the text. Thanks.

Note: everyone's publications are in fact accessible read-only at any time but we have hidden the links to them in the interface (normally in the news feed) due to privacy concerns with comments on "rejected" student work.

Figure out what to do about namespaces and treebank and alignment documents

The TreebankCiteIdentifier code validates against the Perseus treebank schema but doesn't require that the documents be namespaced and in fact the api would currently break if the namespace was used.

For the AlignmentCiteIdentifiers, the opposite is true, i.e. the api would break if the namespace is NOT used.

Need to get consistent about this -- I think the namespace should be enforced. But note also that I'm validating the treebank documents against a provisional 1.6 version of the treebank schema which is currently located on the nlp.perseus.tufts.edu server. Really this should be somewhere else, and we need to make the 1.6 version of the schema official and update the version referenced in the treebank documents, etc.

Essentially a mess that needs to be cleaned up.
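If the namespace is to be enforced consistently, one option is a single check applied to both treebank and alignment documents before they reach the API. The sketch below only illustrates that idea: the function names are hypothetical, and the namespace URI in the usage example is a placeholder rather than the actual treebank or alignment namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def root_namespace(xml_text: str) -> Optional[str]:
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def require_namespace(xml_text: str, expected: str) -> None:
    """Raise ValueError unless the root element is in the expected namespace."""
    found = root_namespace(xml_text)
    if found != expected:
        raise ValueError(f"expected root namespace {expected!r}, found {found!r}")


if __name__ == "__main__":
    # Placeholder namespace, for illustration only.
    ns = "http://example.org/ns/treebank"
    require_namespace(f'<treebank xmlns="{ns}"/>', ns)
    print(root_namespace("<treebank/>"))
```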

SoSOL add tests on ownership guards

Need to add a bunch of unit tests for the controllers to be sure everything is protected properly by ownership guards

be sure to include the dmm_api

Improve support for nested annotation tag sets

The Alpheios Treebank Editor currently supports a certain degree of nesting of tag sets, but many improvements are needed to make this truly workable.

This item still needs to be fleshed out with details from various email discussions and documents.

SoSOL - merge OACIdentifier and CTSOACIdentifier?

I'm not sure the original intent of the OACIdentifier class being a base class for different types of OAC annotations has really held up. We now support a mixture of CTS and non-CTS bodies, and the logic is a little mixed up between the base and derived class. This needs to be straightened out.

Alpheios editors: XML Schemas

The alignment editor and the treebank editor use different naming schemes for words (<word id="1"> versus <w id="1-1">). Can we unify them for easier referencing against each other?

SoSOL OAC annotation support - retain target/body text markup for display

Stripping the markup from the text selection interface makes it hard to use, especially for large passages like we will have for Bodin (chapter size).

We want to replace the application of extract_text.xsl to the GetPassage output with the use of tokenization services. This should be implemented only if we have a tokenization service for the language of the text in question, falling back to the current behavior if not.
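The fallback rule could be expressed as a lookup keyed by language. This is a sketch under stated assumptions: the tokenizer registry is hypothetical, and `strip_markup` is only a crude regex stand-in for what extract_text.xsl does today.

```python
import re
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def strip_markup(passage_xml: str) -> str:
    """Crude stand-in for the current extract_text.xsl behavior."""
    return re.sub(r"<[^>]+>", "", passage_xml)


def prepare_for_selection(passage_xml: str, lang: str,
                          tokenizers: Dict[str, Tokenizer]) -> dict:
    """Use a tokenizer when one exists for the language, else strip markup."""
    tokenizer = tokenizers.get(lang)
    if tokenizer is None:
        # No tokenization service for this language: keep current behavior.
        return {"mode": "plain", "content": strip_markup(passage_xml)}
    return {"mode": "tokens", "content": tokenizer(passage_xml)}


if __name__ == "__main__":
    # Hypothetical registry; a real tokenizer would call a remote service.
    registry = {"lat": lambda xml: strip_markup(xml).split()}
    print(prepare_for_selection("<p>arma <hi>virumque</hi></p>", "lat", registry))
    print(prepare_for_selection("<p>arma <hi>virumque</hi></p>", "fra", registry))
```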

Add support for cross-sentence linking

The ability to tag so as to link sentences back and forth with one another will allow for analyses of rhetorical figures (ellipsis and anaphora being the first), and eventually to stylistic and literary analysis.

Related to #41 and #40

OA Annotation Interface Enhancements

Want to use the OA annotation interface for the Bodin project to support creation of annotations that link a section of text in the source text to a section of text in a translation. To support a decent workflow for this, we probably need to add the following enhancements to the interface:

  • multiple annotation bodies (see sosol/sosol#18)
  • the ability to save as a new annotation (to support a workflow that doesn't require the user to reload and retokenize the passage text)
  • use of texts from the target publication as the body of the annotation
    • for this we need to use a URI to the current publication as the base uri for the annotation and then update it upon finalization.
  • we should probably also use the base URI for the current publication as the base uri for the target too and update it upon finalization.
  • editing a text passage should invalidate or at least warn user to check annotations targeting that passage
  • add edit bar to top of display so that users can switch back to editing XML
  • fix the @ vs # in the CTS URNs
  • make sure highlighting works when editing an existing annotation

Ability to apply frequency data to morphology service output

We would like the ability to apply frequency data to disambiguate morphology service output. One use case for this is in the scenario where we populate treebanking templates, but it could equally apply to runtime use of morphological output in annotation interfaces, etc.
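One simple way to do this is to rank the candidate analyses for a form by how often each has been observed, and take the top one. The sketch below assumes analyses are (lemma, part of speech) pairs and that frequencies are available as a plain mapping. Both the data shapes and the counts in the example are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Analysis = Tuple[str, str]  # (lemma, part of speech): an assumed shape


def rank_analyses(analyses: List[Analysis],
                  frequencies: Dict[Analysis, int]) -> List[Analysis]:
    """Order analyses from most to least frequent; unseen analyses count as 0.

    The sort is stable, so ties keep the order the morphology service returned.
    """
    return sorted(analyses, key=lambda a: frequencies.get(a, 0), reverse=True)


def best_analysis(analyses: List[Analysis],
                  frequencies: Dict[Analysis, int]) -> Optional[Analysis]:
    """Return the most frequent analysis, or None if there are no candidates."""
    ranked = rank_analyses(analyses, frequencies)
    return ranked[0] if ranked else None


if __name__ == "__main__":
    # Candidate analyses for the Latin form "canis"; the counts are invented.
    candidates = [("cano", "verb"), ("canis", "noun")]
    counts = {("canis", "noun"): 12, ("cano", "verb"): 3}
    print(best_analysis(candidates, counts))
```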

Ability to apply filter criteria when reviewing annotations of various types

From the Fall 2013 Treebanking workshop: we would like an intelligent versioning/review process whereby the changes can be filtered by the type of change. For example, the process would identify places where we changed tokens (and/or word ids) while ignoring changes to things like pos and head, and vice versa. This will help not only in reviewing but also in the eventual re-syncing of the annotations with the underlying source text.
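A minimal sketch of such filtering, assuming each version of a sentence is a list of word records (dictionaries) compared position by position. The field names and the two field groups are assumptions for illustration. Inserted or deleted words are not handled, because `zip` stops at the shorter list.

```python
from typing import Dict, List, Set

TOKEN_FIELDS = {"id", "form"}                           # changes to the text itself
ANNOTATION_FIELDS = {"postag", "head", "relation", "lemma"}  # changes to the analysis


def diff_words(old: List[Dict[str, str]], new: List[Dict[str, str]]) -> List[dict]:
    """List every field whose value differs between two versions of a sentence."""
    changes = []
    for index, (before, after) in enumerate(zip(old, new)):
        for field in sorted(before.keys() | after.keys()):
            if before.get(field) != after.get(field):
                changes.append({"index": index, "field": field,
                                "old": before.get(field), "new": after.get(field)})
    return changes


def filter_changes(changes: List[dict], fields: Set[str]) -> List[dict]:
    """Keep only the changes that touch one of the given fields."""
    return [change for change in changes if change["field"] in fields]


if __name__ == "__main__":
    v1 = [{"id": "1", "form": "arma", "postag": "n", "head": "2"}]
    v2 = [{"id": "1", "form": "Arma", "postag": "n", "head": "3"}]
    all_changes = diff_words(v1, v2)
    print(filter_changes(all_changes, TOKEN_FIELDS))
    print(filter_changes(all_changes, ANNOTATION_FIELDS))
```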

Eliminate text length limits on alignment entry form

The Alpheios Alignment editor sentence entry form currently limits the amount of text that can be aligned to the number of characters allowed in a URI (because it issues a GET to the back-end to create the annotation).

This limitation should not be carried over to the Perseids environment and should be fixed in the Alpheios standalone tool.
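The usual way around a URI length limit is to send the text in the body of a POST request instead of in the query string. The sketch below only builds such a request and does not send it. The endpoint URL and the `l1text` / `l2text` parameter names are hypothetical, not the actual Alpheios back-end API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only.
ALIGNMENT_ENDPOINT = "http://example.org/alignment/create"


def build_alignment_request(l1_text: str, l2_text: str) -> Request:
    """Build a POST request carrying both texts in the body, not the URI."""
    body = urlencode({"l1text": l1_text, "l2text": l2_text}).encode("utf-8")
    return Request(
        ALIGNMENT_ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
    )


if __name__ == "__main__":
    long_text = "arma virumque cano " * 1000  # far longer than a typical URI limit
    request = build_alignment_request(long_text, long_text)
    print(request.get_method(), request.full_url, len(request.data))
```

Because the texts travel in the request body, the URI stays the same length regardless of how much text is being aligned.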
