
nationalarchives / ds-caselaw-ingester


Parse judgments from the Transformation Engine and load them into MarkLogic as part of the National Archives Find Case Law service.

License: MIT License

Makefile 2.11% Python 92.15% Shell 5.74%
caselaw etl find-caselaw national-archives service

ds-caselaw-ingester's Introduction

The National Archives: Find Case Law

This repository is part of the Find Case Law project at The National Archives. For more information on the project, check the documentation.

Case Law Ingester

This is the repository for the lambda function used to parse Transformation Engine judgments and insert them into MarkLogic.

Development

We use Localstack, along with awslocal-cli, to enable local development of the lambda function.

Requirements

An installation of make is required to use the bundled Makefile for local development. Most operating systems come with this preinstalled, including Ubuntu Linux and macOS. On Windows, make can be installed via the Chocolatey package manager, or by using the Windows Subsystem for Linux (WSL).

You will also need both awscli and awslocal-cli installed. awslocal-cli is a Localstack-specific wrapper around awscli.

Install both from the requirements file using:

python3 -m pip install -r requirements/local.txt

Setup Localstack

First, copy .env.example to .env and fill in the missing variables. If you are using Localstack via Docker, leave MARKLOGIC_HOST as host.docker.internal.

Then, start Localstack using:

docker-compose up -d

This will start Localstack in detached mode; logs are accessible via Docker Desktop.

Once the Docker container is running, use the following make command to build a distribution of the lambda function and set up the Localstack AWS services:

make setup

This will create a folder, dist, on your local machine that contains a zip file called lambda.zip - this is our compiled lambda. You can also upload this directly to the AWS console.

Sending a message

To send the bundled example message, use the send-message-v2 make target:

make send-message-v2

This will publish a message to the SNS topic, triggering the handle function in our lambda.
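
As a rough illustration of what the lambda receives: SNS delivers a published message to a subscribed lambda wrapped in a Records list, with the message body as a JSON string under Sns.Message. The sketch below is not the repository's handler; extract_message and the sample payload (including its single field) are hypothetical and only show the unwrapping step.

```python
import json


def extract_message(event: dict) -> dict:
    """Return the decoded body of the first SNS record in a lambda event."""
    # SNS-triggered lambdas receive:
    # {"Records": [{"Sns": {"Message": "<json string>"}}]}
    raw = event["Records"][0]["Sns"]["Message"]
    return json.loads(raw)


# Hypothetical event; the real message fields are defined by the
# Transformation Engine and are not reproduced here.
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"consignment-reference": "TDR-2022-DNWR"})}}
    ]
}
message = extract_message(sample_event)
```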

(send-message-v1 exists, and sends a v1 message.)

Viewing Output

The lambda output will be logged in the Localstack logs. Look for the lines following:

localstack.services.awslambda.lambda_executors: Lambda arn:aws:lambda:us-east-1:000000000000:function:te-lambda result / log output:

The logs will show the response from the lambda directly below this line. Any values sent to stdout (e.g. print statements), will be output beneath.
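
If the Localstack logs are noisy, a small filter can pull out just the lambda output. This is a minimal sketch, assuming the logs have been saved or piped as plain text; lambda_output and its context parameter are hypothetical helpers, and the marker string is taken from the log line quoted above.

```python
from typing import Iterable, Iterator

# Trailing part of the Localstack log line that precedes lambda output.
MARKER = "result / log output:"


def lambda_output(lines: Iterable[str], context: int = 10) -> Iterator[str]:
    """Yield each marker line and up to `context` lines that follow it."""
    remaining = 0
    for line in lines:
        if MARKER in line:
            # Emit the marker line itself plus `context` following lines.
            remaining = context + 1
        if remaining > 0:
            yield line.rstrip("\n")
            remaining -= 1
```

For example, iterating over lambda_output(open("localstack.log")) would print only the sections of a saved log file that follow the marker; the file name here is an assumption.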

Unit tests

To run the tests:

  • [First time] Create a virtualenv (virtualenv venv -p `which python`)
  • Activate it with . venv/bin/activate
  • Run scripts/test
  • When you're done, you might want to deactivate

Note that you might get spurious errors about Django config and environment variables if you're running in the wrong environment.

Updating the lambda

If you make a change to the code and need to update the lambda function, use the update make command:

make update

And then send a message:

make send-message-v2

Local testing

To test a tarfile locally:

  1. Add your test tarfile to aws_examples/s3/te-editorial-out-int.
  2. Edit aws_examples/sns/parsed-judgment.json to contain your tarfile name in s3-folder-url and consignment reference in consignment-reference.
  3. Run make setup aws_examples/s3/te-editorial-out-int/<your tarfile>, for example make setup aws_examples/s3/te-editorial-out-int/XYZ-123.tar.gz. If you run make setup without an argument, the original test tarfile TDR-2022-DNWR.tar.gz will be used.
  4. Run make send-message-v2 to ingest your tarfile.
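
Step 2 can also be scripted. The sketch below is a hedged illustration only: the layout of parsed-judgment.json is not shown here, so the hypothetical set_keys helper walks the structure and replaces the two keys wherever they appear. The stand-in message, its "parameters" wrapper, and the replacement values are assumptions; in the real file s3-folder-url may need a fuller value that merely contains the tarfile name.

```python
import json
from typing import Any


def set_keys(node: Any, updates: dict) -> Any:
    """Return a copy of `node` with every matching key replaced, at any depth."""
    if isinstance(node, dict):
        return {
            key: updates[key] if key in updates else set_keys(value, updates)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [set_keys(item, updates) for item in node]
    return node


# Hypothetical stand-in for aws_examples/sns/parsed-judgment.json.
message = {"parameters": {"s3-folder-url": "old.tar.gz", "consignment-reference": "OLD"}}
updated = set_keys(
    message,
    {"s3-folder-url": "XYZ-123.tar.gz", "consignment-reference": "XYZ-123"},
)
print(json.dumps(updated, indent=2))
```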

Deployment

Every change to the main branch is automatically deployed to the staging environment via GitHub Actions.

Only releases are deployed to production. To trigger a deploy, create a new release named and tagged vX.Y.Z following semantic versioning. Autogenerate release notes, and publish; the release will then be tagged latest automatically and deployed to production.
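
Before publishing, a release name can be checked against the vX.Y.Z pattern. This is a minimal sketch, assuming only the core MAJOR.MINOR.PATCH form with no pre-release or build suffix; is_release_tag is a hypothetical helper and is not part of the deployment workflow.

```python
import re

# Matches vX.Y.Z where X, Y and Z are non-negative integers without leading zeros.
RELEASE_TAG = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_release_tag(tag: str) -> bool:
    """Return True if `tag` has the form vX.Y.Z."""
    return RELEASE_TAG.fullmatch(tag) is not None
```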

ds-caselaw-ingester's People

Contributors

anthonyhashemi, donna-h, dragon-dxw, floppy, jacksonj04, lewisdaleuk, lozette, renovate[bot], rjw1


ds-caselaw-ingester's Issues

Ligatures aren't normalised in PDF or HTML

The PDF of [2024] UKFTT 31 (TC) contains a number of instances of the "ﬂ" ligature (U+FB02).

This is seen repeatedly in the phrase "potato ﬂour" (screencast attachment: Screencast.from.17-01-24.08.49.19.webm).

I do not have access to the original DOCX, although I note the ligature is also present in the PDF judgement on the official Tribunals website.

The ligature is also present in the HTML version but not in the XML version.

I suggest that the text undergoes Unicode Normalisation before a PDF is created.
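
The suggested normalisation can be sketched with Python's standard library. This is an illustration of the idea, not the ingester's code: normalise_ligatures is a hypothetical helper, and NFKC is one possible choice of form. U+FB02 has a compatibility decomposition to the two letters "fl", so NFKC (or NFKD) replaces it, whereas NFC and NFD leave it untouched.

```python
import unicodedata


def normalise_ligatures(text: str) -> str:
    """Apply NFKC, which expands compatibility characters such as U+FB02."""
    return unicodedata.normalize("NFKC", text)


sample = "potato \ufb02our"  # contains the single-character "fl" ligature
print(normalise_ligatures(sample))  # potato flour
```

NFKC also rewrites other compatibility characters (for example, non-breaking spaces become ordinary spaces and superscript digits become plain digits), so applying it to whole judgments would need checking against text where those distinctions matter.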

(Apologies if this isn't the correct repo. Feel free to move it somewhere more suitable.)

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Preset is invalid JSON (github>nationalarchives/ds-find-caselaw-docs)

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Pending Approval

These branches will be created by Renovate only once you click their checkbox below.

  • Update dependency ds-caselaw-marklogic-api-client to v23
  • Update localstack/localstack Docker tag to v3
  • ๐Ÿ” Create all pending approval PRs at once ๐Ÿ”

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

docker-compose
docker-compose.yml
  • localstack/localstack 2.3.2
github-actions
.github/workflows/ci.yml
  • actions/checkout v4
  • actions/setup-python v5
  • pre-commit/action v3.0.1
  • actions/setup-python v5
  • ubuntu 22.04
.github/workflows/codeql.yml
  • actions/checkout v4
  • github/codeql-action v3
  • github/codeql-action v3
  • github/codeql-action v3
.github/workflows/deploy-production.yml
  • actions/checkout v4
  • actions/setup-python v5
  • aws-actions/setup-sam v2
  • aws-actions/configure-aws-credentials v4
.github/workflows/deploy.yml
  • actions/checkout v4
  • actions/setup-python v5
  • aws-actions/setup-sam v2
  • aws-actions/configure-aws-credentials v4
.github/workflows/secrets.yml
  • actions/checkout v4
pip_requirements
requirements/base.txt
  • django-environ ~=0.10
  • ds-caselaw-marklogic-api-client ==18.0.0
  • requests-toolbelt ~=1.0
  • urllib3 ~=1.26
  • notifications-python-client ~=9.0
requirements/local.txt
  • pytest ==8.0.2
  • callee ==0.3.1
pre-commit
.pre-commit-config.yaml
  • pre-commit/pre-commit-hooks v4.5.0
  • psf/black 24.3.0
  • PyCQA/isort 5.13.2
  • PyCQA/flake8 7.0.0
  • pre-commit/mirrors-mypy v1.8.0
  • pre-commit/mirrors-prettier v3.1.0
pyenv
.python-version
  • python 3.12

  • Check this box to trigger a request for Renovate to run again on this repository
