metadata-exports's Introduction

Unofficial exports of the Serlo metadata API

Development

Using Pipenv

  • Install pipenv.
  • Run pipenv shell to activate the project's virtual environment.
  • Run pipenv install --dev to install the dev dependencies.
  • Run pipenv run lint to run the linter.
  • Run pipenv run format to format the code.
  • Run pipenv run test to run the tests in /tests.
  • Run exit to leave the shell.

Export Metadata

  • Run pipenv run python download_metadata.py [output_file] to download all metadata from serlo.org.
  • Run pipenv run python convert2rss.py [input_file] [output_file] to convert the downloaded .json file into an .rss file.

metadata-exports's People

Contributors

kulla, dependabot[bot], andreashuber, moehome, eliflores, hugotiburtino

metadata-exports's Issues

Fix bug with licenses

  • Not every RSS item has license information. This needs to be fixed for the elements that have a CC license.
  • Filter out all elements without a license.
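
The filtering step could look roughly like the sketch below. It is only an illustration: the issue does not say where the license lives in a record, so the top-level "license" key and the helper names has_license and filter_licensed are assumptions, not code from the repository.

```python
from typing import Any, Dict, List


def has_license(item: Dict[str, Any]) -> bool:
    """Return True if the item carries a non-empty license value.

    Assumption: the license is stored under a top-level "license" key,
    either as a string (for example a URL) or as a nested object.
    """
    license_value = item.get("license")
    if isinstance(license_value, str):
        return bool(license_value.strip())
    if isinstance(license_value, dict):
        return bool(license_value)
    return False


def filter_licensed(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the items that have license information."""
    return [item for item in items if has_license(item)]
```

Running the filter before the RSS conversion would keep unlicensed records out of the feed without touching the conversion code itself.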

Add script to enhance data from Metadata API

Problem

Currently we have duplicated logic in https://github.com/serlo/metadata-exports/blob/main/convert2rss.py and https://github.com/serlo/metadata-exports/blob/main/update_datenraum_nodes.py to enhance a metadata record with a description from our website. We should remove the code duplication.

Solution

https://github.com/serlo/metadata-exports/blob/main/download_metadata.py downloads metadata from our Metadata API. In an extra script we can enhance the metadata with descriptions from our website. The script should:

  1. Load the cached descriptions from the last run:
    def get_description_cache():
  2. Go through the metadata resources and add descriptions from the cache or the website when not present:
    from datetime import timedelta
    from typing import Any, Dict

    def get_description(
        resource: Dict[str, Any],
        description_cache: Dict[str, Any],
        time_passed: timedelta,
    ):
        # Prefer a description that is already part of the resource.
        if "description" in resource and isinstance(resource["description"], str):
            return resource["description"]
        # Otherwise reuse the cached description if it matches the current version.
        cached_value = description_cache.get(resource["id"], {})
        if cached_value.get("version", None) == resource["version"] and isinstance(
            cached_value.get("description", None), str
        ):
            return cached_value["description"]
        # Stop fetching from the website once the time budget is used up.
        if time_passed > timedelta(minutes=30):
            return None
        new_description = load_description_from_website(resource)
        description_cache[resource["id"]] = {
            "description": new_description,
            "version": resource["version"],
        }
        return new_description

    def load_description_from_website(resource: Dict[str, Any]):
        identifier = resource.get("identifier", {}).get("value", None)
        if not isinstance(identifier, int):
            return None
        # load_json_ld is not shown in this excerpt.
        data = load_json_ld(f"https://serlo.org/{identifier}")
        if (
            data is not None
            and "description" in data
            and isinstance(data["description"], str)
        ):
            return data["description"]
        return None
  3. Make sure that the script does not run for more than X minutes (let's use 20 min as a setting).
  4. Save the new JSON in public/ as a new file. This file should be the input of https://github.com/serlo/metadata-exports/blob/main/convert2rss.py and https://github.com/serlo/metadata-exports/blob/main/download_datenraum_nodes.py
  5. Delete the code from https://github.com/serlo/metadata-exports/blob/main/convert2rss.py and https://github.com/serlo/metadata-exports/blob/main/download_datenraum_nodes.py that is no longer needed.
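
A minimal sketch of how steps 1 to 4 might fit together is shown below. It is not the repository's implementation: the names load_cache, enhance and save_json are hypothetical, the fetch parameter stands in for a function such as load_description_from_website, and the file paths are left to the caller. The time budget is a parameter here, since the issue suggests 20 minutes while the quoted snippet checks against 30.

```python
import json
import time
from datetime import timedelta
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional

MAX_RUNTIME = timedelta(minutes=20)  # assumed default, taken from step 3


def load_cache(path: Path) -> Dict[str, Any]:
    """Step 1: load cached descriptions, or start empty if no cache exists."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as cache_file:
        return json.load(cache_file)


def enhance(
    resources: List[Dict[str, Any]],
    cache: Dict[str, Any],
    fetch: Callable[[Dict[str, Any]], Optional[str]],
    max_runtime: timedelta = MAX_RUNTIME,
) -> List[Dict[str, Any]]:
    """Steps 2 and 3: fill in missing descriptions within the time budget."""
    start = time.monotonic()
    enhanced = []
    for resource in resources:
        description = resource.get("description")
        if not isinstance(description, str):
            cached = cache.get(resource["id"], {})
            elapsed = timedelta(seconds=time.monotonic() - start)
            if cached.get("version") == resource["version"] and isinstance(
                cached.get("description"), str
            ):
                # Cache hit for the current version of the resource.
                description = cached["description"]
            elif elapsed <= max_runtime:
                # Still within budget: ask the website and remember the answer.
                description = fetch(resource)
                cache[resource["id"]] = {
                    "description": description,
                    "version": resource["version"],
                }
            else:
                description = None
        if isinstance(description, str):
            enhanced.append({**resource, "description": description})
        else:
            # No description available: keep the resource unchanged.
            enhanced.append(resource)
    return enhanced


def save_json(data: Any, path: Path) -> None:
    """Step 4: write the enhanced metadata (or the cache) to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as output_file:
        json.dump(data, output_file, ensure_ascii=False, indent=2)
```

Passing the fetch function in as an argument keeps the time-limit and cache logic testable without network access. Once the budget is spent, resources that still lack a description pass through unchanged and can be picked up from the cache or the website on a later run.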
