
frictionlessdata / frictionless-ci


Data management service that brings continuous data validation to tabular data in your repository via a GitHub Action

Home Page: https://repository.frictionlessdata.io

License: MIT License

Languages: Dockerfile 2.11%, JavaScript 92.85%, Makefile 5.04%

frictionless-ci's Introduction

frictionless-ci



Purpose

  • Continuous Data Validation: With Frictionless Repository you can ensure the quality of your data. This GitHub Action reports any problems with your data, such as bad headers or missing cells.
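As a rough illustration of the two problem types named above, here is a minimal Python sketch that uses only the standard csv module. It is not the action's implementation; the function name find_basic_problems and the error labels are hypothetical, and the sketch assumes the header is row 1, so data rows start at position 2.

```python
import csv
import io


def find_basic_problems(text):
    """Report blank header labels and rows with too few cells (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    problems = []
    # A header label that is empty or whitespace-only counts as a bad header.
    for index, label in enumerate(header, start=1):
        if not label.strip():
            problems.append(("blank-label", 1, index))
    # Row 1 is the header, so the first data row is position 2.
    for position, row in enumerate(reader, start=2):
        if len(row) < len(header):
            # Report the first field that has no cell in this row.
            problems.append(("missing-cell", position, len(row) + 1))
    return problems


sample = "id,,name\n1,a,Alice\n2,b\n"
print(find_basic_problems(sample))
```

Running it on the sample prints [('blank-label', 1, 2), ('missing-cell', 3, 3)]: the second header label is empty, and the row at position 3 lacks its third cell.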

Features

  • Open Source (MIT)
  • Validation badges
  • Visual quality reports
  • Composable Github Action

Example

Take a look at the DEMO repository


Documentation

Please visit our documentation portal:

frictionless-ci's People

Contributors

7yl4r, aivuk, augusto-herrmann, gabrielbdornas, matthew-stacks, roll, sapetti9, shashigharti


frictionless-ci's Issues

Duplicated primary key entries are reported as errors in framework but validate in repository

Overview

Some checks that fail validation in Frictionless Framework pass validation in Frictionless Repository.

Example

I recently added primary keys to a schema. Since then, Frictionless Framework v5.8.1 has correctly reported duplicated entries:

validation report

frictionless validate data/valid/datapackage.json 

# -------
# invalid: brazilian-transparency-and-open-data-portals.csv 
# -------

## Summary 

+------------------+--------------------------------------------------+
| Name             | Value                                            |
+==================+==================================================+
| File Place       | brazilian-transparency-and-open-data-portals.csv |
+------------------+--------------------------------------------------+
| File Size        | 20.0 kB                                          |
+------------------+--------------------------------------------------+
| Total Time       | 0.04 Seconds                                     |
+------------------+--------------------------------------------------+
| Rows Checked     | 183                                              |
+------------------+--------------------------------------------------+
| Total Errors     | 1                                                |
+------------------+--------------------------------------------------+
| PrimaryKey Error | 1                                                |
+------------------+--------------------------------------------------+

## Errors 

+-------+---------+-------------+------------------------------------------------------------------------------------+
|   Row | Field   | Type        | Message                                                                            |
+=======+=========+=============+====================================================================================+
|     3 |         | primary-key | Row at position "3" violates the primary key: the same as in the row at position 2 |
+-------+---------+-------------+------------------------------------------------------------------------------------+


# -------
# invalid: brazilian-municipality-and-state-websites.csv 
# -------

## Summary 

+------------------+-----------------------------------------------+
| Name             | Value                                         |
+==================+===============================================+
| File Place       | brazilian-municipality-and-state-websites.csv |
+------------------+-----------------------------------------------+
| File Size        | 292.2 kB                                      |
+------------------+-----------------------------------------------+
| Total Time       | 0.207 Seconds                                 |
+------------------+-----------------------------------------------+
| Rows Checked     | 2880                                          |
+------------------+-----------------------------------------------+
| Total Errors     | 40                                            |
+------------------+-----------------------------------------------+
| PrimaryKey Error | 40                                            |
+------------------+-----------------------------------------------+

## Errors 

+-------+---------+-------------+------------------------------------------------------------------------------------------+
|   Row | Field   | Type        | Message                                                                                  |
+=======+=========+=============+==========================================================================================+
|  1030 |         | primary-key | Row at position "1030" violates the primary key: the same as in the row at position 1029 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1182 |         | primary-key | Row at position "1182" violates the primary key: the same as in the row at position 1181 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1208 |         | primary-key | Row at position "1208" violates the primary key: the same as in the row at position 1207 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1256 |         | primary-key | Row at position "1256" violates the primary key: the same as in the row at position 1255 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1267 |         | primary-key | Row at position "1267" violates the primary key: the same as in the row at position 1266 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1310 |         | primary-key | Row at position "1310" violates the primary key: the same as in the row at position 1309 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1375 |         | primary-key | Row at position "1375" violates the primary key: the same as in the row at position 1374 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1479 |         | primary-key | Row at position "1479" violates the primary key: the same as in the row at position 1478 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1501 |         | primary-key | Row at position "1501" violates the primary key: the same as in the row at position 1500 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1519 |         | primary-key | Row at position "1519" violates the primary key: the same as in the row at position 1518 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1534 |         | primary-key | Row at position "1534" violates the primary key: the same as in the row at position 1533 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1607 |         | primary-key | Row at position "1607" violates the primary key: the same as in the row at position 1606 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1612 |         | primary-key | Row at position "1612" violates the primary key: the same as in the row at position 1611 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1622 |         | primary-key | Row at position "1622" violates the primary key: the same as in the row at position 1621 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1639 |         | primary-key | Row at position "1639" violates the primary key: the same as in the row at position 1638 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1702 |         | primary-key | Row at position "1702" violates the primary key: the same as in the row at position 1701 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1741 |         | primary-key | Row at position "1741" violates the primary key: the same as in the row at position 1740 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1750 |         | primary-key | Row at position "1750" violates the primary key: the same as in the row at position 1749 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1760 |         | primary-key | Row at position "1760" violates the primary key: the same as in the row at position 1759 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1787 |         | primary-key | Row at position "1787" violates the primary key: the same as in the row at position 1786 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1940 |         | primary-key | Row at position "1940" violates the primary key: the same as in the row at position 1939 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1941 |         | primary-key | Row at position "1941" violates the primary key: the same as in the row at position 1940 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  1963 |         | primary-key | Row at position "1963" violates the primary key: the same as in the row at position 1962 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2022 |         | primary-key | Row at position "2022" violates the primary key: the same as in the row at position 2021 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2023 |         | primary-key | Row at position "2023" violates the primary key: the same as in the row at position 2022 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2048 |         | primary-key | Row at position "2048" violates the primary key: the same as in the row at position 2047 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2081 |         | primary-key | Row at position "2081" violates the primary key: the same as in the row at position 2080 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2194 |         | primary-key | Row at position "2194" violates the primary key: the same as in the row at position 2193 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2224 |         | primary-key | Row at position "2224" violates the primary key: the same as in the row at position 2223 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2299 |         | primary-key | Row at position "2299" violates the primary key: the same as in the row at position 2298 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2339 |         | primary-key | Row at position "2339" violates the primary key: the same as in the row at position 2338 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2346 |         | primary-key | Row at position "2346" violates the primary key: the same as in the row at position 2345 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2347 |         | primary-key | Row at position "2347" violates the primary key: the same as in the row at position 2346 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2396 |         | primary-key | Row at position "2396" violates the primary key: the same as in the row at position 2395 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2491 |         | primary-key | Row at position "2491" violates the primary key: the same as in the row at position 2490 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2561 |         | primary-key | Row at position "2561" violates the primary key: the same as in the row at position 2560 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2672 |         | primary-key | Row at position "2672" violates the primary key: the same as in the row at position 2671 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2673 |         | primary-key | Row at position "2673" violates the primary key: the same as in the row at position 2672 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2723 |         | primary-key | Row at position "2723" violates the primary key: the same as in the row at position 2722 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+
|  2846 |         | primary-key | Row at position "2846" violates the primary key: the same as in the row at position 2845 |
+-------+---------+-------------+------------------------------------------------------------------------------------------+

However, the Frictionless Repository v2 checks in this workflow pass validation for some reason. Aren't the two tools supposed to produce the same results?

Expected behaviour

Frictionless Repository was expected to fail validation because of the duplicated primary key violations in two of the CSV files.
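To make the expected result concrete, here is a minimal Python sketch of a primary-key uniqueness check over a CSV file. It is not the Frictionless implementation; the function name and sample data are hypothetical. Like the report above, it counts the header as row 1, so a duplicate in the second data row is reported at position 3 against position 2.

```python
import csv
import io


def find_primary_key_violations(text, primary_key):
    """Return (position, first_position) pairs for rows whose key repeats an earlier row."""
    reader = csv.DictReader(io.StringIO(text))
    seen = {}  # maps key tuple -> row position where it first appeared
    violations = []
    # Row position 1 is the header, so the first data row is position 2.
    for position, row in enumerate(reader, start=2):
        # A composite primary key is compared as a tuple of its field values.
        key = tuple(row[field] for field in primary_key)
        if key in seen:
            violations.append((position, seen[key]))
        else:
            seen[key] = position
    return violations


sample = "code,name\nAC,Acre\nAC,Acre\nAL,Alagoas\n"
print(find_primary_key_violations(sample, ["code"]))
```

On the sample it prints [(3, 2)], which mirrors the shape of the first framework error above: the row at position 3 repeats the key of the row at position 2.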

Cannot load a report

I am trying to create a Frictionless repo, but I have some problems.

  1. I cannot see the report of a failed validation.

  2. I have a question about the action YAML file: what do these attributes mean?

I would appreciate any help you can provide.

Goodtables migration guide missing from docs

Overview

I think there used to be a migration guide from Goodtables in Frictionless Repository's documentation. Right now, if you open the docs and search for "Goodtables", you still find a result there.


However, when you click on that result you get a "Page not found" error.

For why this guide is needed, see #10.

Download reports

It would be nice to be able to download the report in JSON or CSV format after validation.
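A minimal sketch of such an export in Python, using only the json and csv standard modules. The flat error records and their keys (file, row, field, type, message) are a hypothetical simplification, not the actual Frictionless report format, which is richer.

```python
import csv
import io
import json

# Hypothetical, simplified error records; the real Frictionless report
# structure is richer and its field names may differ.
errors = [
    {"file": "data.csv", "row": 3, "field": None, "type": "primary-key",
     "message": "Row at position 3 violates the primary key"},
]


def report_to_json(errors):
    """Serialize the error records as a JSON document."""
    return json.dumps({"errors": errors}, indent=2)


def report_to_csv(errors):
    """Flatten the error records into CSV text, one error per line."""
    columns = ["file", "row", "field", "type", "message"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for error in errors:
        # None values (e.g. row-level errors without a field) become empty cells.
        writer.writerow(error)
    return buffer.getvalue()


print(report_to_csv(errors))
```

Keeping one error per CSV line makes the export easy to filter in a spreadsheet, while the JSON form preserves nested structure if the records grow.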

GitHub action failing

Overview

I use the Frictionless GitHub action in a number of repositories to check datasets that are created or updated on a weekly schedule. Today the action failed to run, producing the same error in all repositories:

 exec /bin/sh: no such file or directory
  The command '/bin/sh -c curl -fsSL https://deb.nodesource.com/setup_14.x | bash - &&   apt-get install -y nodejs &&   npm install --production &&   pip install -r requirements.txt' returned a non-zero code: 1

My workflow file contains:

name: frictionless

on:
  workflow_dispatch:
  workflow_run:
    workflows: ["Fetch latest data"]
    types:
      - completed

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2
        with:
          lfs: true # add this to download LFS files
          submodules: true # add this to download submodules
      - name: Validate data
        uses: frictionlessdata/repository@v1

I tried updating to repository@v2 but that didn't help.

NodeSource deprecation warning makes build slow

Overview

The current build process uses a script hosted on nodesource.com to install Node.js. The script is deprecated, prints a warning, and adds a full minute of waiting to the build:

  #9 [4/4] RUN   curl -fsSL https://deb.nodesource.com/setup_16.x | bash - &&   apt-get install -y nodejs &&   npm install --omit dev &&   pip install -r requirements.txt
  #9 0.227 
  #9 0.227 ================================================================================
  #9 0.227 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
  #9 0.227 ================================================================================
  #9 0.227 
  #9 0.227                            SCRIPT DEPRECATION WARNING                    
  #9 0.227 
  #9 0.227   
  #9 0.227   This script, located at https://deb.nodesource.com/setup_X, used to
  #9 0.227   install Node.js is deprecated now and will eventually be made inactive.
  #9 0.227 
  #9 0.227   Please visit the NodeSource distributions Github and follow the
  #9 0.227   instructions to migrate your repo.
  #9 0.227   https://github.com/nodesource/distributions
  #9 0.227 
  #9 0.227   The NodeSource Node.js Linux distributions GitHub repository contains
  #9 0.227   information about which versions of Node.js and which Linux distributions
  #9 0.227   are supported and how to install it.
  #9 0.227   https://github.com/nodesource/distributions
  #9 0.227 
  #9 0.227 
  #9 0.227                           SCRIPT DEPRECATION WARNING
  #9 0.227 
  #9 0.227 ================================================================================
  #9 0.227 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
  #9 0.227 ================================================================================
  #9 0.227 
  #9 0.227 TO AVOID THIS WAIT MIGRATE THE SCRIPT
  #9 0.227 Continuing in 60 seconds (press Ctrl-C to abort) ...
  #9 0.227 
  #9 60.23 
  #9 60.23 ## Installing the NodeSource Node.js 16.x repo...

This should be migrated to avoid breaking the build in the future and to skip the 60-second wait.

A similar problem seems to have happened before (#47), but further action is required now.

Release stable v1

Overview

We need to gather feedback for 1-2 months and then release.

Detect and validate data packages?

Overview

Currently, the action looks only for CSV and Excel files, although it would be easy to support looking up data packages by default. However, the logic would need to handle duplication between files referenced by a data package and data files found independently.
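The de-duplication concern can be sketched in Python: walk the repository, treat each datapackage.json as a validation target, and validate a CSV or Excel file on its own only if no package resource already points to it. The function name collect_validation_targets and the extension list are assumptions for illustration; the sketch only matches local string paths from each resource's "path" property, so remote URLs and inline data never match a discovered file.

```python
import json
import os

DATA_EXTENSIONS = (".csv", ".xls", ".xlsx")


def collect_validation_targets(root):
    """Return data packages plus standalone data files not already covered by a package."""
    packages = []
    data_files = set()
    covered = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.normpath(os.path.join(dirpath, name))
            if name == "datapackage.json":
                packages.append(path)
                with open(path, encoding="utf-8") as handle:
                    descriptor = json.load(handle)
                for resource in descriptor.get("resources", []):
                    resource_path = resource.get("path")
                    # Per the Data Package spec, "path" may be a string or a list.
                    paths = resource_path if isinstance(resource_path, list) else [resource_path]
                    for item in paths:
                        if isinstance(item, str):
                            # Resource paths are relative to the descriptor's directory.
                            covered.add(os.path.normpath(os.path.join(dirpath, item)))
            elif name.lower().endswith(DATA_EXTENSIONS):
                data_files.add(path)
    # Files referenced by a package are validated through that package only.
    standalone = sorted(data_files - covered)
    return sorted(packages), standalone
```

With a datapackage.json that lists a.csv, plus a.csv and b.csv in the same folder, the function returns the package and b.csv only, so a.csv is not validated twice.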

Error: Cannot validate inquiry: stdout maxBuffer length exceeded

Validation report is not being generated (`Cannot load a report`)

Overview

The validation report is not being generated (message: "Cannot load a report") because of two errors shown in the 'report' artifact:

The task has an error: 'in <string>' requires string as left operand, not NoneType
The task has an error: 'fieldPosition'

The repo uses Git LFS, and the workflow YAML file points to frictionlessdata/repository@v1.

Nevertheless, the internal validation shows results as expected. Also, in another repository of reproducible examples that uses Git LFS as well, the report could be loaded.
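Both messages in the artifact match standard Python exceptions: a KeyError whose text is just the missing key ('fieldPosition'), and the TypeError raised when None is tested for membership in a string. One plausible trigger, which is an assumption and not confirmed by the report, is an error record without a field position, such as a row-level error. The sketch below reproduces both messages and shows a defensive formatter; describe_error and the rowPosition key are hypothetical, and only fieldPosition comes from the error text.

```python
def describe_error(error, field_names):
    """Format a hypothetical error record, tolerating row-level errors with no field."""
    position = error.get("fieldPosition")  # .get avoids KeyError: 'fieldPosition'
    if position is None:
        return f"row {error['rowPosition']}: {error['message']}"
    label = field_names[position - 1]
    return f"row {error['rowPosition']}, field {label!r}: {error['message']}"


# Reproduce the two messages from the report artifact with plain Python.
try:
    {"rowPosition": 3}["fieldPosition"]
except KeyError as exc:
    print(exc)  # 'fieldPosition'

try:
    None in "id,name"  # membership test with None on the left raises TypeError
except TypeError as exc:
    print(exc)  # 'in <string>' requires string as left operand, not NoneType
```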


Please preserve this line to notify @roll (lead of this repository)

Rename this to Workflows

While Frictionless CI makes more sense to a developer than the old name ("Repository"), I would suggest renaming it to Frictionless Workflow, or creating an umbrella project that incorporates the techniques used in this one.

Rationale: this project could also be very useful to people creating data pipelines, in some cases linked to ETL components, while others would be more interested in streaming data sources or other advanced scenarios. Continuous Integration is just a mechanism, mostly known as a term in software engineering, while Continuous Aggregation or Continuous Validation might be more appropriate when dealing with datasets.

Furthermore, user-friendly tools for managing workflows with Frictionless Data standards could be communicated and promoted well through an easily recognizable term, one that is, nota bene, currently promoted by GitHub itself.

I've created a #workflows channel to discuss this idea, and added some thoughts in the CKAN camp at ckan/ideas#211

Validation fails for zipped resources

Overview

I am using the pattern suggested here to describe zipped CSVs.

However, validation fails with an "Encoding Error" (see here for an example report), while the same data package validates the same files when they are unzipped and referenced one by one.

You can find both versions of my data packages (with zipped and with unzipped resources) in the attached zip file:
datapackages.zip
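An encoding error on a zipped resource is consistent with compressed bytes being decoded as text before decompression, though that is an assumption the report does not confirm. For comparison, a minimal sketch with Python's standard zipfile module decompresses the member first and only then decodes it; read_zipped_csv, the in-memory archive, and cities.csv are hypothetical.

```python
import csv
import io
import zipfile


def read_zipped_csv(archive_bytes, member, encoding="utf-8"):
    """Decompress one CSV member of a zip archive, then decode and parse it."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        raw = archive.read(member)  # decompressed bytes of the CSV file
    text = raw.decode(encoding)  # decode only after decompression
    return list(csv.reader(io.StringIO(text)))


# Build a small archive in memory for demonstration (writestr encodes str as UTF-8).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("cities.csv", "id,name\n1,São Paulo\n")
rows = read_zipped_csv(buffer.getvalue(), "cities.csv")
print(rows)  # [['id', 'name'], ['1', 'São Paulo']]
```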


Please preserve this line to notify @roll (lead of this repository)
