
virtuos / edusharing-opencast-importer


This importer harvests episodes (lecture recordings) from Opencast instances and pushes them as external references to an edu-sharing instance.

License: GNU General Public License v3.0

JavaScript 100.00%
opencast edu-sharing open-educational-resources oer hacktoberfest twillo

edusharing-opencast-importer's People

Contributors

ffeyen, jduehring, mirjan-hoffmann


edusharing-opencast-importer's Issues

Incomplete import for some records

In twillo there are some records that were not imported completely. Some information, such as the title, is missing, and for one of them the link to the material itself does not work.

The problem occurs on both twillo Test and twillo Prod.

If this is a problem with the data that cannot be fixed in the importer, we should at least be able to skip these records.
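Incomplete records could be skipped before they are sent to edu-sharing, as in the minimal sketch below. It assumes episode objects carry `id`, `title` and `url` fields. These names are hypothetical, and the importer's actual episode structure may differ. A record counts as incomplete when any required field is missing or blank. The skipped list can then be logged so the source data can be fixed in Opencast.

```javascript
// Hypothetical list of fields every record must have before it is imported.
const REQUIRED_FIELDS = ["title", "url"];

// True if the value is a non-empty string after trimming.
function isFilled(value) {
  return typeof value === "string" && value.trim().length > 0;
}

// Splits episodes into complete ones (to import) and skipped ones (to log).
function partitionCompleteEpisodes(episodes, requiredFields = REQUIRED_FIELDS) {
  const complete = [];
  const skipped = [];
  for (const episode of episodes) {
    const missing = requiredFields.filter((field) => !isFilled(episode[field]));
    if (missing.length === 0) {
      complete.push(episode);
    } else {
      skipped.push({ id: episode.id, missing });
    }
  }
  return { complete, skipped };
}
```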

Add subjects

Subjects are already available in the episode data, although some records obviously contain a description in the subject field. Nevertheless, quite a few records do use the field for subjects, so it could be used in edu-sharing.

A mapping to the edu-sharing vocabulary is necessary. In twillo and ZOERR, the scheme https://w3id.org/kim/hochschulfaechersystematik/scheme is used in the field ccm:taxonid.
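One possible shape for this mapping is a configurable lookup table that drops everything it does not recognize, such as descriptions misused as subjects. The sketch below makes several assumptions. Concept URIs are formed by appending a concept ID to the scheme base. The mapping is a `Map` from normalized subject strings to concept IDs. The IDs in the usage data are placeholders, not real Hochschulfächersystematik codes.

```javascript
// Base URI of the Hochschulfächersystematik vocabulary (assumed concept URI prefix).
const HFS_SCHEME = "https://w3id.org/kim/hochschulfaechersystematik/";

// Normalizes a free-text subject: trim, lowercase, collapse whitespace.
function normalizeSubject(subject) {
  return subject.trim().toLowerCase().replace(/\s+/g, " ");
}

// Maps Opencast subject strings to vocabulary URIs for ccm:taxonid.
// `mapping` is a configurable Map { normalized subject -> concept id };
// unmapped subjects are dropped and duplicates are removed.
function mapSubjectsToTaxonIds(subjects, mapping) {
  const ids = new Set();
  for (const subject of subjects || []) {
    if (typeof subject !== "string") continue;
    const conceptId = mapping.get(normalizeSubject(subject));
    if (conceptId) ids.add(HFS_SCHEME + conceptId);
  }
  return [...ids];
}
```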

Publish as npm package

Easier to install and use

Notes:

  • How to run the app periodically? (add service file for systemd systems?)
  • Where to put the config files?
  • Add a GitHub Actions workflow to publish on merges into the master branch

Update only changed episodes in Opencast (since the last harvest)

→ Reduce traffic / load

  • New Opencast API endpoint to get retracted (deleted) episodes: /search/getDeletedRecords
  • Or include a deleted=true option in /search/episode.json
  • Extend /search/episode.json to be able to request events published since a specific date
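Until Opencast offers such an endpoint, a client-side fallback could at least avoid re-sending unchanged episodes to edu-sharing. It does not reduce the Opencast traffic itself. The sketch below assumes each episode has a hypothetical ISO 8601 `modified` field and that the timestamp of the last successful harvest is stored locally. Episodes without a parseable date are kept, to err on the side of updating.

```javascript
// Returns the episodes modified after the last successful harvest.
// `modified` is a hypothetical ISO 8601 field on each episode.
function selectChangedEpisodes(episodes, lastHarvestIso) {
  const lastHarvest = Date.parse(lastHarvestIso);
  // First run or unreadable timestamp: process everything.
  if (!lastHarvestIso || Number.isNaN(lastHarvest)) return episodes.slice();
  return episodes.filter((episode) => {
    const modified = Date.parse(episode.modified);
    // Keep episodes with an unknown date as well as those changed since the last run.
    return Number.isNaN(modified) || modified > lastHarvest;
  });
}
```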

Add episode blacklisting function

If an episode has unwanted content, it should not be imported.
Approach/Idea:

  1. Add an array blacklistedIds to the ocInstance objects in /config/config.oc-instances.js
  2. Add a blacklist filter in /src/services/filter.js

  Example ocInstance object with blacklistedIds:
  {
    orgName: 'Opencast',
    orgUrl: 'https://opencast.org',
    protocol: 'https',
    domain: 'develop.opencast.org',
    blacklistedIds: ['id1', 'id2', 'id3']
  }
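The filter in /src/services/filter.js could be as small as the sketch below. It assumes each harvested episode exposes its Opencast ID as `episode.id`, which is a hypothetical property name. `filterBlacklistedEpisodes` is likewise a hypothetical name, not an existing function of the importer. Instances without `blacklistedIds` are passed through unchanged.

```javascript
// Removes episodes whose id is listed in the instance's blacklistedIds array.
// Instances without the property keep all their episodes.
function filterBlacklistedEpisodes(episodes, ocInstance) {
  const blacklist = new Set(ocInstance.blacklistedIds || []);
  return episodes.filter((episode) => !blacklist.has(episode.id));
}
```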

Metadata update fails for records with special characters in the title

... for titles like

  • heinrich-ii.
  • otto-ii.-und-otto-iii.
  • otto-iii.

This results in the following error message in edu-sharing:

2022-12-14 10:33:44,039 ERROR [restservices.shared.ErrorResponse.handleLog] org.alfresco.repo.node.integrity.IntegrityException: 111433651 Found 2 integrity violations:
Invalid property value: 
   Node: workspace://SpacesStore/67fef633-4669-40cd-ad29-73716e45ae31
   Name: df38bb84-1f2a-4335-a29d-28f02c88ad78-otto-iii.
   Type: {http://www.campuscontent.de/model/1.0}io
   Property: {http://www.alfresco.org/model/content/1.0}name
   Constraint: 111433649 Value 'df38bb84-1f2a-4335-a29d-28f02c88ad78-otto-iii.' is not valid as a file name. This property must be a valid file name.

I suspect this is caused by the trailing period (.).
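A node name could be sanitized before it is sent, as in the sketch below. The log only confirms that a trailing dot is rejected. Also treating trailing spaces and the characters `* " < > \ / ? : |` as invalid is an assumption based on Alfresco's usual file-name constraint. The title stored in the metadata can stay unchanged; only the node name needs cleaning.

```javascript
// Characters assumed to be rejected in Alfresco file names (cm:name).
const INVALID_NAME_CHARS = /[*"<>\\/?:|]/g;

// Makes a node name safe: replaces assumed-invalid characters with "-"
// and strips trailing dots and whitespace, which the constraint rejects.
function sanitizeNodeName(name, fallback = "untitled") {
  const cleaned = String(name).replace(INVALID_NAME_CHARS, "-").replace(/[.\s]+$/, "");
  return cleaned.length > 0 ? cleaned : fallback;
}
```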

Add language

Set the ISO 639-1 code in "cclom:general_language" (the language is already available in the episode data).
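The format of the Opencast language value is not specified here. The sketch below therefore assumes it may be an ISO 639-1 code (possibly with a region such as "de-DE") or an ISO 639-2 code such as "deu" or "ger". The lookup table only covers a few common languages and would need extending. Unknown values return `undefined`, so the field can be left unset instead of receiving a wrong code.

```javascript
// Partial lookup from ISO 639-2 (bibliographic and terminology) codes to ISO 639-1.
const ISO_639_2_TO_1 = new Map([
  ["deu", "de"], ["ger", "de"],
  ["eng", "en"],
  ["fra", "fr"], ["fre", "fr"],
  ["spa", "es"],
  ["ita", "it"],
  ["nld", "nl"], ["dut", "nl"],
]);

// Returns an ISO 639-1 code for cclom:general_language, or undefined if unknown.
function toIso6391(language) {
  if (typeof language !== "string") return undefined;
  const base = language.trim().toLowerCase().split(/[-_]/)[0]; // "de-DE" -> "de"
  if (/^[a-z]{2}$/.test(base)) return base; // already a two-letter code
  return ISO_639_2_TO_1.get(base);
}
```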

Add institution

Currently, the opencast-instance config contains the source organization. This should also be used when updating the edu-sharing data. But for twillo edu-sharing we need a more specific identifier than just a string: the ROR ID (for example 04qmmjx98 for Osnabrück University (https://ror.org/04qmmjx98)).

So, add the ROR ID to the oc-instance config and set it in edu-sharing (field ccm:university).
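A sketch of how the ROR ID could flow from the instance config into the metadata is shown below. It assumes a hypothetical `ror` property on each oc-instance object and writes the full `https://ror.org/...` URL. Whether ccm:university expects the URL or the bare ID is unconfirmed. Values are wrapped in arrays, like the other edu-sharing properties in this document, and the format check is deliberately loose.

```javascript
// Builds institution metadata from a hypothetical `ror` property in the
// oc-instance config. Accepts a bare ROR ID or a full ror.org URL.
function buildInstitutionMetadata(ocInstance) {
  const metadata = {};
  if (!ocInstance.ror) return metadata;
  const rorId = String(ocInstance.ror).trim().replace(/^https?:\/\/ror\.org\//, "");
  // Loose check: ROR IDs are 9 characters, start with "0" and end in two digits.
  if (!/^0[a-z0-9]{6}[0-9]{2}$/.test(rorId)) {
    throw new Error(`Invalid ROR ID in oc-instance config: ${ocInstance.ror}`);
  }
  metadata["ccm:university"] = [`https://ror.org/${rorId}`];
  return metadata;
}
```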

"is reachable" check requires access to path /search

If an instance is not reachable, the import is skipped. The check uses the URL <scheme>://<host>/search/ for this. This works for instances like UOS-oc that make this URL publicly accessible, but not for instances that restrict access to it, like TIHO-oc (which returns a 405 response, so the check fails).

Maybe it would be better to use just <scheme>://<host>/ for the check?
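The sketch below checks the base URL instead of /search/. Any HTTP answer below 500, including a 405, counts as "reachable", because it proves the server is up. Network errors, timeouts and 5xx responses count as unreachable. That split is a design choice, not something the issue prescribes. The `protocol` and `domain` properties come from the oc-instance config. The injectable `fetchImpl` (defaulting to Node 18's global `fetch`) is an assumption made so the check can be tested without network access.

```javascript
// Checks whether an Opencast instance answers at <scheme>://<host>/.
async function isReachable(ocInstance, { fetchImpl = globalThis.fetch, timeoutMs = 10000 } = {}) {
  const url = `${ocInstance.protocol}://${ocInstance.domain}/`;
  // AbortSignal.timeout exists in Node 17.3+; without it no timeout is applied.
  const signal =
    typeof AbortSignal !== "undefined" && typeof AbortSignal.timeout === "function"
      ? AbortSignal.timeout(timeoutMs)
      : undefined;
  try {
    const response = await fetchImpl(url, { method: "GET", signal });
    // 2xx-4xx: the server answered, even if it restricts access (e.g. 405).
    return response.status < 500;
  } catch {
    // DNS failure, refused connection or timeout.
    return false;
  }
}
```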

Add collection description to record

Often the description of a record is very sparse, and you cannot tell what the record is about. It would be helpful to have the description of the collection available directly at record level. This way, records could be found more easily and users could better understand what a record is about.

I would prefer a configurable solution: if activated, simply append the description of the series to the end of the episode description.
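The configurable variant could look like the sketch below. `appendSeriesDescription` is a hypothetical option name and is off by default. Empty parts are dropped, and identical texts are not repeated, so episodes that already copy the series text are not doubled.

```javascript
// Builds the record description, optionally appending the series description.
function buildDescription(episodeDescription, seriesDescription, { appendSeriesDescription = false } = {}) {
  const parts = [episodeDescription, appendSeriesDescription ? seriesDescription : undefined]
    .map((text) => (typeof text === "string" ? text.trim() : ""))
    .filter((text) => text.length > 0);
  // Avoid repeating the text when both descriptions are identical.
  return [...new Set(parts)].join("\n\n");
}
```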

vCards in edu-sharing not displayed correctly

Some vCards are not displayed correctly in edu-sharing: the name is not shown.

This seems to happen because the vCard set by the opencast-importer contains explicit values like undefined for non-existent fields. See for example:

      "ccm:metadatacontributer_creator": [
        "BEGIN:VCARD\r\nVERSION:3.0\r\nFN;CHARSET=UTF-8:Opencastundefined Importer\r\nN;CHARSET=UTF-8:Importer;Opencast;;;\r\nURL;CHARSET=UTF-8:https://github.com/virtUOS/edusharing-opencast-importer\r\nREV:2022-12-13T09:39:21.298Z\r\nEND:VCARD\r\n"
      ],
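The sketch below builds the vCard so that a missing name part is never stringified as "undefined". It mirrors the line layout the importer currently emits, including the CHARSET parameter. Missing parts become empty components in N (Family;Given;Additional;Prefix;Suffix) and are left out of FN. The input property names (`prefix`, `givenName`, `middleName`, `familyName`, `url`) are hypothetical.

```javascript
// Escapes vCard text values: backslash, comma, semicolon and newlines.
function escapeVCardText(value) {
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/,/g, "\\,")
    .replace(/;/g, "\\;")
    .replace(/\r?\n/g, "\\n");
}

// Builds a vCard 3.0 string; absent name parts are skipped, never "undefined".
function buildVCard({ prefix, givenName, middleName, familyName, url } = {}) {
  const clean = (v) => (v == null ? "" : escapeVCardText(String(v).trim()));
  const fn = [prefix, givenName, middleName, familyName].map(clean).filter(Boolean).join(" ");
  const lines = [
    "BEGIN:VCARD",
    "VERSION:3.0",
    `FN;CHARSET=UTF-8:${fn}`,
    // N = Family;Given;Additional;Prefix;Suffix
    `N;CHARSET=UTF-8:${clean(familyName)};${clean(givenName)};${clean(middleName)};${clean(prefix)};`,
  ];
  if (url) lines.push(`URL;CHARSET=UTF-8:${url}`);
  lines.push(`REV:${new Date().toISOString()}`, "END:VCARD");
  return lines.join("\r\n") + "\r\n";
}
```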

Different license representations in different oc-instances

The representation of the license differs from instance to instance. For example, UOS uses representations like CC-BY-SA, while TIHO uses representations like Creative Commons 3.0: Attribution-NonCommercial-NoDerivs.

Currently, all the different representations have to be added to the allowedLicences list, and there is only a mapping to the edu-sharing licenses for some of them.

I think it would be better to first map the licenses from the Opencast value to the edu-sharing representation and afterwards check against the allowedLicences list. The mapping should be configurable, as we cannot know all values from all Opencast instances.
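The proposed order (map first, then check) could be sketched as follows. The edu-sharing license keys ("CC_BY", "CC_BY_SA", "CC_BY_NC_ND") are assumed values, and the mapping entries only cover the examples from this issue. With this approach, allowedLicences only needs to list edu-sharing keys, and each new Opencast spelling becomes a single mapping entry.

```javascript
// Normalizes an Opencast license string for lookup.
function normalizeLicense(value) {
  return String(value).trim().toLowerCase().replace(/\s+/g, " ");
}

// Configurable mapping: normalized Opencast license -> assumed edu-sharing license key.
const DEFAULT_LICENSE_MAPPING = new Map([
  ["cc-by", "CC_BY"],
  ["cc-by-sa", "CC_BY_SA"],
  ["creative commons 3.0: attribution-noncommercial-noderivs", "CC_BY_NC_ND"],
]);

// Maps first, then checks the result against allowedLicences (edu-sharing keys).
// Returns the edu-sharing key, or null if the license is unknown or not allowed.
function resolveLicense(ocLicense, allowedLicences, mapping = DEFAULT_LICENSE_MAPPING) {
  if (ocLicense == null) return null;
  const key = mapping.get(normalizeLicense(ocLicense));
  return key && allowedLicences.includes(key) ? key : null;
}
```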

Parsing of honorific prefixes preceding a person's name is incomplete

The importer parses honorific prefixes preceding a person's name, such as Dr/Mrs/Mr/Prof. However, this seems to be incomplete: some parts of the prefix are currently treated as part of the name, although they should be included in the prefix.

Examples:

  • Dr. jur. Matthew LeMieux becomes (Title: Dr., Name: jur. Matthew LeMieux), but should be (Title: Dr. jur., Name: Matthew LeMieux)
  • Prof. Dr. phil. Thomas Vogtherr becomes (Title: Prof. Dr., Name: phil. Thomas Vogtherr), but should be (Title: Prof. Dr. phil., Name: Thomas Vogtherr)
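One way to fix this is to consume leading tokens for as long as they belong to a known prefix vocabulary, including degree qualifiers such as "jur." and "phil.". The token list below is hypothetical and would need extending. The parser always keeps at least one token for the name. This is not the importer's current implementation.

```javascript
// Hypothetical vocabulary of prefix tokens (lowercased), including degree qualifiers.
const PREFIX_TOKENS = new Set([
  "mr", "mr.", "mrs", "mrs.", "ms", "ms.", "dr", "dr.", "prof", "prof.",
  "jur.", "phil.", "med.", "rer.", "nat.", "pol.", "habil.", "h.c.", "dipl.-ing.",
]);

// Splits a full name into { title, name } by consuming leading prefix tokens.
function splitHonorificPrefix(fullName) {
  const tokens = fullName.trim().split(/\s+/);
  let i = 0;
  // Stop before the last token so the name is never empty.
  while (i < tokens.length - 1 && PREFIX_TOKENS.has(tokens[i].toLowerCase())) i++;
  return { title: tokens.slice(0, i).join(" "), name: tokens.slice(i).join(" ") };
}
```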

Get existing nodeIds if metadata.nodeId is undefined

Starting point: the error handling of the function sendPostRequest in src/edu-sharing/create-folders-structure.js, at the line if (error.response.status === 409) return true.
If this error code appears, the folder to be created already exists. If, additionally, `seriesData[0].nodeId === undefined`, there was an error saving the nodeId while creating an ES node (in this case: a folder).

Approach:

  1. http://localhost/edu-sharing/swagger/#!/NODE_v1/getMetadata
    http://localhost/edu-sharing/rest/node/v1/nodes/-home-/-userhome-/metadata
    Get the nodeId of the folder -userhome- (the main folder of a user).

  2. http://localhost/edu-sharing/swagger/#!/NODE_v1/getChildren
    http://localhost/edu-sharing/rest/node/v1/nodes/-home-/NODEID-FROM-REQUEST-ABOVE/children?maxItems=500&skipCount=0
    Get the children of this folder and look up the existing folder by its name to recover its nodeId.
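The two requests could be chained as in the sketch below. The response shapes it reads (`node.ref.id` from getMetadata, `nodes[].name` and `nodes[].ref.id` from getChildren) are assumptions to verify against the Swagger UI. Authentication is left to a caller-supplied `headers` object. More than 500 children would require paging with `skipCount`, which the sketch does not do. `findExistingFolderId` is a hypothetical helper name.

```javascript
// Recovers the nodeId of an existing folder in the user's home folder
// via the NODE_v1 getMetadata and getChildren endpoints.
async function findExistingFolderId(baseUrl, folderName, { fetchImpl = globalThis.fetch, headers = {} } = {}) {
  const getJson = async (url) => {
    const response = await fetchImpl(url, { headers: { Accept: "application/json", ...headers } });
    if (!response.ok) throw new Error(`GET ${url} failed with status ${response.status}`);
    return response.json();
  };
  // Step 1: nodeId of -userhome- (assumed at node.ref.id).
  const home = await getJson(`${baseUrl}/rest/node/v1/nodes/-home-/-userhome-/metadata`);
  const homeId = home.node.ref.id;
  // Step 2: first page of children, matched by folder name.
  const children = await getJson(
    `${baseUrl}/rest/node/v1/nodes/-home-/${homeId}/children?maxItems=500&skipCount=0`
  );
  const match = (children.nodes || []).find((node) => node.name === folderName);
  return match ? match.ref.id : undefined;
}
```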

Unable to create edu-sharing nodes if stored data was already "used"

Let's say we have a clean, empty edu-sharing instance and a local setup without any stored data, and we:

  1. Run the importer
  2. Delete all created edu-sharing folders/nodes
  3. Rerun the importer with the stored data

...then an error is thrown and no new edu-sharing folder is created. This is because we saved the node ID and parent ID of each node from the previous import and exclude them from being recreated (for example in create-folder-structure.js line 22).

Wouldn't it make sense to allow creating new nodes even though we have saved, "expired" node IDs?
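One way to allow this is to check whether a stored node ID still exists before skipping creation, and to recreate the node if it does not. The sketch below assumes the NODE_v1 metadata endpoint answers 404 for a deleted node. This is unverified; for example, nodes moved to the recycle bin might behave differently. Other errors are raised rather than silently treated as "deleted". `storedNodeExists` is a hypothetical helper name.

```javascript
// Returns true if the stored nodeId still resolves in edu-sharing,
// false if it is missing (404) or was never stored.
async function storedNodeExists(baseUrl, nodeId, { fetchImpl = globalThis.fetch, headers = {} } = {}) {
  if (!nodeId) return false;
  const url = `${baseUrl}/rest/node/v1/nodes/-home-/${encodeURIComponent(nodeId)}/metadata`;
  const response = await fetchImpl(url, { headers: { Accept: "application/json", ...headers } });
  if (response.status === 404) return false; // expired ID: create the node again
  if (!response.ok) throw new Error(`Unexpected status ${response.status} for node ${nodeId}`);
  return true;
}
```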

Better metadata mapping

  • What metadata from Opencast could we use in Edu-Sharing?
  • Check license mapping

see: getBodyUpdateMetadata() in src/edu-sharing/update-matadata.js

Duplicated and redundant author information

The author information seems to be set in two different places: in ccm:author_freetext and also in ccm:lifecyclecontributer_author.

We should avoid duplicated values. I think the preferred field should be ccm:lifecyclecontributer_author, because in so many edu-sharing instances the free-text author fields are not usable at all; therefore they are ignored by services like oersi.org.

Maybe this can be made configurable; otherwise I would prefer to remove the free-text author field.
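The configurable variant could look like the sketch below. Authors always go to ccm:lifecyclecontributer_author as vCards. The free-text field is only filled when a hypothetical `includeFreetextAuthor` option is enabled, and it is off by default. Joining multiple names with ", " is an assumption about the desired free-text format.

```javascript
// Builds author metadata without duplication by default.
function buildAuthorMetadata(authorVCards, authorNames, { includeFreetextAuthor = false } = {}) {
  const metadata = { "ccm:lifecyclecontributer_author": authorVCards };
  if (includeFreetextAuthor && authorNames.length > 0) {
    metadata["ccm:author_freetext"] = [authorNames.join(", ")];
  }
  return metadata;
}
```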
