
datapackage-pipelines-fiscal's Introduction

OpenSpending

Build Status Issues Docs Discord

OpenSpending is a project to make government finances easier to explore and understand. It started out as "Where Does My Money Go", a platform to visualize the United Kingdom's state finances, but has been renamed and restructured to allow arbitrary financial data to be loaded and displayed.

The main use for the software is the site openspending.org which aims to track government finance around the world.

OpenSpending's code is licensed under the GNU Affero General Public License except where otherwise indicated. A copy of this licence is available in the file LICENSE.txt.

OpenSpending is a microservices platform made up of a number of separate apps, each maintained in its own git repository. This repository contains docker-compose files that can be used to run an instance of OpenSpending for development, or as the basis for a production deployment. This repository also acts as a central hub for managing issues for the entire platform.

What are these files?

Most applications that make up the OpenSpending platform are maintained in their own repositories, with their own Dockerfiles, built and pushed to the OpenSpending organisation on Docker Hub.

This repository maintains docker-compose files used to help get you started with the platform.

docker-compose.base.yml: This is the main docker-compose file for OpenSpending specific services. All installations will use this as the basis for running the platform.

docker-compose.dev-services.yml: This defines backing services used by the platform, such as Redis, ElasticSearch, and PostgreSQL. This file also includes fake-s3 in place of AWS S3, so you don't have to set up an S3 bucket for development. It is not recommended to use this for production.

docker-compose.data-importers.yml: This defines the services used for the separate os-data-importers application. They depend on services defined in docker-compose.dev-services.yml. Unless you are working on the data-importers or its associated source-spec files, it's not necessary to run this file.

docker-compose.local.yml: Create this file to add additional services, or overrides for the base configuration. It is ignored by git.

Dockerfiles/*: Most services are maintained in their own repositories, but a few small custom services used by the platform are maintained here. os-nginx-frontend is a basic frontend nginx server with configuration files that define resource locations for the platform. It will be built and run directly by docker-compose.base.yml.

I'm a developer, how can I start working on OpenSpending?

  1. Define the environment variables that applications in the platform need. The easiest way to do this is to create a .env file (use .env.example as a template).

  2. Use docker-compose up to start the platform from the base, dev-services, and optionally local compose files:

$ docker-compose -f docker-compose.base.yml -f docker-compose.dev-services.yml [-f docker-compose.local.yml] up

  3. Open localhost:8080 in your browser.

I'm a developer, how can I work on a specific OpenSpending application? Show me an example!

You can use volumes to map local files from the host to application files in the docker containers. For example, if you're working on OS-Conductor, add an override service to docker-compose.local.yml (create this file if necessary).

  1. Check out the os-conductor code from https://github.com/openspending/os-conductor into ~/src/dockerfiles/os-conductor on your local machine.
  2. Add the following to docker-compose.local.yml:
version: "3.4"

services:
  os-conductor:
    environment:
      # Force python not to write cached bytecode files
      PYTHONDONTWRITEBYTECODE: "1"
    # Override CMD and send `--reload` flag for os-conductor's gunicorn server
    command: /startup.sh --reload
    # Map local os-conductor app files to /app in container
    volumes:
      - ~/src/dockerfiles/os-conductor:/app
  3. Start up the platform with base, dev-services, and your local compose file:

$ docker-compose -f docker-compose.base.yml -f docker-compose.dev-services.yml -f docker-compose.local.yml up

Now you can start working on os-conductor application files in ~/src/dockerfiles/os-conductor and changes will reload the server in the Docker container.

I want to work on the data-importers application. Show me how!

In OpenSpending, the os-data-importers application provides a way to import data and create fiscal datapackages from source-spec files. You can either work on the app independently, by following the README in the os-data-importers repository, or within the context of an OpenSpending instance, by using the included docker-compose.data-importers.yml file and starting OpenSpending with:

$ docker-compose -f docker-compose.base.yml -f docker-compose.dev-services.yml -f docker-compose.data-importers.yml up

This will start OpenSpending locally as usual on port 8080, and the pipelines dashboard will be available at http://localhost:5000.

I have my own backing service I want to use for development

That's fine; just add the relevant resource locator to the .env file. E.g., if you're using a third-party ElasticSearch server:

OS_ELASTICSEARCH_ADDRESS=https://my-elasticsearch-provider.com/my-es-instance:9200

I want to run my own instance of OpenSpending in production

Great! There are many ways to orchestrate Docker containers in a network. E.g. for openspending.org we use Kubernetes. Use the docker-compose.base.yml file as a guide for networking the applications together, with their appropriate environment variables, and add resource locators pointing to your backing services for Postgres, ElasticSearch, Redis, memcached, AWS S3 etc. See the .env.example file for the required env vars you'll need to set up.

You'll also need to set up OAuth credentials for OS-Conductor (see https://github.com/openspending/os-conductor#oauth-credentials), and AWS S3 bucket details.

What happened to the old version of OpenSpending?

You can find the old OpenSpending v2, and the complete history for the codebase to that point, in the openspending-monolith branch.

datapackage-pipelines-fiscal's People

Contributors

akariv, brew, cyberbikepunk, vitorbaptista


datapackage-pipelines-fiscal's Issues

`deduplicate` treats label columns as part of the unique key

In order to submit an issue, please ensure you can check the following. Thanks!

  • Declare which version of Python you are using (python --version)
  • Declare which operating system you are using

Problem

Two rows where all the code columns are equal but one or more label columns differ (e.g. different capitalisation) don't get deduplicated.

Expected

An error indicating which codes and labels are conflicting.

Another option would be to randomly pick a "correct" label. This would be bad, because it would hide a problem in the data.

Actual

The dataset is output and uploaded to OpenSpending, which then fails its unique key check.
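The expected behaviour can be sketched in plain Python (a hypothetical illustration, not the pipeline's actual code): deduplicate on the code columns only, and raise when rows sharing a code key carry conflicting labels. The column names `econ_code`/`econ_label` are made up for the example.

```python
# Hypothetical sketch of the expected behaviour: dedupe on code columns only,
# and raise when rows with the same code key have conflicting labels.
def deduplicate(rows, code_cols, label_cols):
    seen = {}  # code key -> label values first seen for that key
    for row in rows:
        key = tuple(row[c] for c in code_cols)
        labels = tuple(row[c] for c in label_cols)
        if key in seen:
            if seen[key] != labels:
                raise ValueError(
                    "Conflicting labels for codes %r: %r vs %r"
                    % (key, seen[key], labels))
            continue  # exact duplicate: drop it
        seen[key] = labels
        yield row

rows = [
    {"econ_code": "A1", "econ_label": "Salaries"},
    {"econ_code": "A1", "econ_label": "SALARIES"},  # same code, label differs
]
# consuming the generator raises ValueError instead of emitting both rows
```

Treating only the code columns as the key is what makes the capitalisation conflict visible, instead of the two rows slipping through as "distinct".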

normalise-measures apparently incompatible with add_computed_field

Given this spec (sources and fields omitted):
measures:
  currency: ZAR
  title: Amount
  mapping:
    "2013/14 Audited outcome":
      budget_phase: "Audited Outcome"
      financial_year: "2013"
    "2014/15 Audited outcome":
      budget_phase: "Audited Outcome"
      financial_year: "2014"

#deduplicate: true

postprocessing:
  - processor: add_computed_field
    parameters:
      resources: dedupe-measures-test
      fields:
        - operation: constant
          target: multiplication_factor
          with: 1000
        - operation: multiply
          source:
            - value
            - multiplication_factor
          target: value2

results in:

ERROR log from processor fiscal.model:
+--------
| ERROR   :Missing OS Type for field multiplication_factor
| Traceback (most recent call last):
|   File "/home/jdb/proj/code4sa/treasury-portal/fiscal-data-package/env/src/datapackage-pipelines-fiscal/datapackage_pipelines_fiscal/processors/model.py", line 21, in <module>
|     field['type'] = os_types[field_name]
| KeyError: 'multiplication_factor'
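The failure mode can be reproduced in isolation. Here `os_types` is a hypothetical stand-in for the mapping that fiscal.model derives from the fiscal model; fields created later by add_computed_field have no entry in it, so the lookup raises KeyError.

```python
# Minimal reproduction of the failure mode. os_types is a hypothetical
# stand-in for the mapping fiscal.model builds; multiplication_factor was
# added afterwards by add_computed_field, so it has no entry.
os_types = {"value": "value", "budget_phase": "budget-phase"}

fields = [{"name": "value"}, {"name": "budget_phase"},
          {"name": "multiplication_factor"}]  # added by add_computed_field

errors = []
for field in fields:
    try:
        field["type"] = os_types[field["name"]]  # the model.py line 21 lookup
    except KeyError:
        errors.append("Missing OS Type for field %s" % field["name"])
# errors now holds one entry, for multiplication_factor
```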

Duplicate rows introduced by normalising measures aren't deduplicated


for the dataset

econ_1,2013/14 Audited outcome,2014/15 Audited outcome
A,123,0
A,0,0

the spec (ignoring sources and fields)

measures:
  currency: ZAR
  title: Amount
  mapping:
    "2013/14 Audited outcome":
      budget_phase: "Audited Outcome"
      financial_year: "2013"
    "2014/15 Audited outcome":
      budget_phase: "Audited Outcome"
      financial_year: "2014"

deduplicate: true

produces a fiscal data package which has duplicate primary keys.

I think this is because deduplication happens before measure normalisation, and measure normalisation introduces two rows with the key A,2014,Audited Outcome,0.
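The ordering problem can be sketched with a simplified model of the normalise-measures step (not the pipeline's actual code). Row-level dedup before this step keeps both source rows, since they differ in the 2013/14 column; unpivoting the measure columns then emits (A, 2014, Audited Outcome, 0) twice.

```python
# Simplified model of measure normalisation (unpivoting measure columns),
# not the pipeline's actual code. The two source rows survive row-level
# dedup because they differ in the 2013/14 column; after unpivoting, the
# row (A, 2014, Audited Outcome, 0) appears twice.
mapping = {
    "2013/14 Audited outcome": {"budget_phase": "Audited Outcome",
                                "financial_year": "2013"},
    "2014/15 Audited outcome": {"budget_phase": "Audited Outcome",
                                "financial_year": "2014"},
}
source = [
    {"econ_1": "A", "2013/14 Audited outcome": 123, "2014/15 Audited outcome": 0},
    {"econ_1": "A", "2013/14 Audited outcome": 0, "2014/15 Audited outcome": 0},
]

normalised = [
    {"econ_1": row["econ_1"], "value": row[col], **dims}
    for row in source
    for col, dims in mapping.items()
]
keys = [(r["econ_1"], r["financial_year"], r["budget_phase"], r["value"])
        for r in normalised]
assert len(keys) == 4 and len(set(keys)) == 3  # one key is duplicated
```

Running the deduplicate step after normalisation (or deduplicating on the normalised key) would collapse the two identical rows.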

Successful datapackage casts now error when publishing


I have a working example of my fiscal data on OpenSpending (data is linked at the end of the page).

When I try to reload the CSV, I now get an error:

dump.to_path
ERROR :Failed to cast row {'id': 1}
ERROR :1) Field "id" can't cast value "1" for type "string" with format "default"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/lib/dump/dumper_base.py", line 93, in schema_validator
    schema.cast_row(to_cast)
  File "/usr/local/lib/python3.6/site-packages/tableschema/schema.py", line 152, in cast_row
    raise exceptions.CastError(message, errors=errors)
tableschema.exceptions.CastError: There are 1 cast errors (see exception.errors)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/dump/to_path.py", line 31, in <module>
    PathDumper()()
  File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/lib/dump/dumper_base.py", line 47, in __call__
    finalizer=self.finalize
  File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/wrapper/wrapper.py", line 69, in spew
    for rec in res:
  File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/lib/dump/dumper_base.py", line 122, in row_counter
    for row in resource:
  File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/lib/dump/dumper_base.py", line 251, in rows_processor
    for row in resource:
  File "/usr/local/lib/python3.6/site-packages/datapackage_pipelines/lib/dump/dumper_base.py", line 98, in schema_validator
    raise ValueError(row) from e
ValueError: {'id': 1}
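The cast failure itself is easy to reproduce. The sketch below mimics tableschema's strict "string" cast rather than calling the library: a string-typed field refuses any non-str value, even an int like 1 whose string form is obvious. If "id" arrives as the integer 1 while the current schema declares it a string (e.g. the type was inferred differently on the reload), the dump step fails as in the traceback.

```python
# Sketch mimicking tableschema's strict "string" cast (not the library code):
# a string-typed field rejects any non-str Python value outright, rather than
# coercing it, so the row {'id': 1} fails against a schema where id is string.
def cast_string(value):
    if not isinstance(value, str):
        raise ValueError("can't cast value %r for type 'string'" % (value,))
    return value

cast_string("1")  # a real string casts fine
try:
    cast_string(1)  # the row carries the int 1, not the string "1": raises
except ValueError as err:
    print(err)
```

Checking whether the inferred schema for the reloaded CSV declares "id" with the same type as the published datapackage would confirm this.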
