cwr-dataapi's Introduction

CWR Data Model API

This project offers a domain model for the CISAC CWR standard v2.1 for use in Python applications, along with a series of parsers which allow transformations between the model and various data structures.

CWR stands for Common Works Registration, and it is a common or standard format for the registration and revision of musical works, used by publishers and performing rights societies as a way to exchange musical works data.

While the CWR standard was created by CISAC, this library has been developed independently by WESO, with help from BMAT.

Join the chat at https://gitter.im/weso/CWR-DataApi

Features

  • Model for CWR files
  • Configurable parser for transforming a CWR file into the model classes (including Pyparsing grammar)
  • Parsers for model-JSON transformations

Documentation

Check the latest docs for the most current version of the documentation.

They are generated with the help of Sphinx. The source files for this are stored in the docs folder.

Usage

The application has been coded in Python. Dependencies are managed with pip, and an included makefile helps build the project.

Prerequisites

The project has been tested on the following versions of the interpreter:

  • Python 3.4
  • Python 3.5
  • Python 3.6

All other dependencies are listed in the requirements.txt file. The included makefile can install them with the command:

$ make requirements

Among them, the most important is the Pyparsing library, which is used to create the CWR file parser.

Installing

The project includes a setup.py file and a makefile allowing direct installation of the library.

This can be done with the following command:

$ make install

Additionally, the project is offered as a Pypi package, and can be installed through pip:

$ pip install cwr-api

Making use of the parser

Once the project is installed, it can be used in a way similar to this (using Python 3):

import codecs
import os

from cwr.parser.decoder.file import default_file_decoder
from cwr.parser.encoder.cwrjson import JSONEncoder

if __name__ == '__main__':
    print('File to JSON test')
    path = input(
        'Please enter the full path to a CWR file (e.g. c:/documents/file.cwr): ')
    output = input(
        'Please enter the full path to the file where the results will be stored: ')
    print('\n')
    print('Reading file %s' % path)
    print('Storing output on %s' % output)
    print('\n')

    decoder = default_file_decoder()

    # The decoder receives a dictionary with the file name and its contents
    data = {}
    data['filename'] = os.path.basename(path)
    with codecs.open(path, 'r', 'latin-1') as source:
        data['contents'] = source.read()

    # Parse the CWR contents into the model classes
    data = decoder.decode(data)

    # Transform the model into a JSON string
    encoder = JSONEncoder()
    result = encoder.encode(data)

    # The with block closes the output file once the result is written
    with codecs.open(output, 'w', 'latin-1') as target:
        target.write(result)

Collaborate

The project is still under development, and any help is welcome.

There are two ways to help: reporting errors and asking for extensions through the issues management, or forking the repository and extending the project.

Issues management

Issues are managed at the GitHub project issues page.

Everybody is allowed to report bugs or ask for features.

Getting the code

The code can be found at the GitHub project page.

Feel free to fork it, and share the changes.

License

The project has been released under the MIT License.

cwr-dataapi's People

Contributors

bernardo-mg, black-rosary, ginfante, kazeborja, ralic

cwr-dataapi's Issues

CWRTables coupling

While CWRTables is useful for keeping all the CWR tables info and files in a single place, the parser may be too dependent on it.

Right now it is used only for the Lookup fields in the grammar.field.table module, which reduces the coupling, but this can be improved.

It should be possible to implement a custom instance of this class, so an interface or abstract base class should be defined, and then an implementation should be accessible through a factory or a facade.
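
One possible shape for this, as a minimal sketch only. The names `TableProvider`, `InMemoryTables` and `default_tables`, as well as the sample table, are hypothetical and not part of the library:

```python
from abc import ABC, abstractmethod


class TableProvider(ABC):
    """Hypothetical interface for anything able to supply CWR lookup tables."""

    @abstractmethod
    def get_data(self, table_id):
        """Return the accepted values for the table with the given id."""


class InMemoryTables(TableProvider):
    """Illustrative implementation backed by a plain dictionary."""

    def __init__(self, tables):
        self._tables = dict(tables)

    def get_data(self, table_id):
        return self._tables[table_id]


def default_tables():
    """Hypothetical factory: the only place that picks an implementation."""
    # Sample values for illustration only, not real CWR table contents.
    return InMemoryTables({'example_table': ('A', 'B')})


tables = default_tables()
print(tables.get_data('example_table'))
```

With this layout the grammar modules would depend only on the abstract class, and a custom table source could be swapped in by changing the factory.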

Support for Acknowledgement files in the model

Right now only the first kind of CWR files is supported, the ones being sent to the receiver for processing.

A second type, the acknowledgement file, is created from the first, indicating the results from processing the file.

While this is closely related to the validation process, the parser, the console printer, and any other piece using the model classes should support reading Acknowledgement files.

CWRFileDecoder only decodes the new filename format

The CWRFileDecoder class only decodes the filename into a FileTag if it follows the new naming format.

It should try to decode the new format, but if it fails then it should try again with the old format. A second failure means the filename can't be decoded.
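
The fallback described above could look roughly like this. It is a sketch under assumptions: `decode_new_format` and `decode_old_format` are stand-ins invented for the example (they do not implement the real CWR naming rules), and a real version would return a FileTag:

```python
def decode_with_fallback(filename, decoders):
    """Try each decoder in order and return the first successful result.

    `decoders` is a sequence of callables that take a filename and either
    return a decoded value or raise ValueError.
    """
    errors = []
    for decode in decoders:
        try:
            return decode(filename)
        except ValueError as error:
            errors.append(str(error))
    # Every decoder failed, so the filename can't be decoded.
    raise ValueError(
        "Can't decode filename %r: %s" % (filename, '; '.join(errors)))


# Stand-in decoders for illustration; real ones would build a FileTag.
def decode_new_format(filename):
    if not filename.startswith('new-'):
        raise ValueError('not in the new format')
    return ('new', filename)


def decode_old_format(filename):
    if not filename.startswith('old-'):
        raise ValueError('not in the old format')
    return ('old', filename)


decoders = (decode_new_format, decode_old_format)
print(decode_with_fallback('old-example', decoders))
```

Keeping the decoders in an ordered sequence means the new format is always tried first, and the error is raised only after the second failure.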

JSON decoder

Create a decoder for JSON.

Can use the basic dictionary decoder for help.
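
A rough sketch of that idea, assuming the dictionary decoder exposes a `decode` method taking a dictionary. `JSONDecoder` and `EchoDictDecoder` are names made up for this example:

```python
import json


class JSONDecoder(object):
    """Hypothetical decoder: parses JSON text, then delegates the result."""

    def __init__(self, dict_decoder):
        # `dict_decoder` is any object with a decode(dict) method.
        self._dict_decoder = dict_decoder

    def decode(self, text):
        return self._dict_decoder.decode(json.loads(text))


class EchoDictDecoder(object):
    """Stand-in for the library's dictionary decoder, for illustration only."""

    def decode(self, data):
        return sorted(data.items())


decoder = JSONDecoder(EchoDictDecoder())
print(decoder.decode('{"title": "Example", "iswc": null}'))
```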

Make wiki more readable

Move most of the CWR details to another document (maybe the GitHub page?), and use the wiki just as a quick guide for the library.

Mongo decoder

Create a decoder for Mongo.

Can use the basic dictionary decoder for help.

Problem with the Transmission Trailer and line end

The Transmission Trailer rule should indicate that this record ends on a line end.

But this causes an error when reading from a file if it doesn't end at the end of the trailer.

I've been unable to replicate this with tests, and for now the Transmission Trailer rule lacks the end of line requirement.

Interested Party class

Check the InterestedParty class.

Is it really needed? Should it be removed? Should it be used more often?

Some related classes which don't use it:

  • AgreementInterestedParty

Prepare a GitHub page

A GitHub page giving details about the project and its background would help a lot to make the library usable for third parties.

Maybe Sphinx would help here?

Controlled publishers tree

The controlled publishers stored in the CWR file represent a tree which indicates the relationship between them and the territories.

The model contains classes to build this tree, but currently it is not being done.

Make sure the parser takes care of this.

ValueEntity inheritors

Classes created from ValueEntity seem to be all the same. It may be possible to remove them all, using only the base ValueEntity.

Make field grammar creation homogeneous

The grammar for the fields should be homogeneous. Try to make all of them be created through a method with the same type of parameters (columns, compulsory, name, etc).

Add a good validation system to the CWR file parser

Right now the validation is added manually to the nodes.

There should be a system where a node is assigned a constraint identifier, and then the validation gets configured.

It should be noted that the validation configuration is composed of, at least, the following pieces (check the CWR and error specifications for the actual requirements):

  • Constraint imposing limitations
  • Validation level (record, transaction, group, transmission...)
  • Action on failure (set to default, set to given value, reject...)
  • Failure message

They should be configurable for two reasons:

  • There is a huge number of errors and constraints (a quick look shows a few hundred of them)
  • There are variations (different societies have different constraints, and may even have different versions)

It would be better if this configuration was set on a file which could be easily modified.

Also, it should be possible to completely deactivate the validation system for testing purposes, or to swap it for another one.
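
As a sketch of how such a configuration might be expressed, the example below keeps the four pieces listed above in JSON and allows validation to be switched off. Every identifier, level, action and message in it is invented for illustration and is not taken from the CWR specification:

```python
import json
from collections import namedtuple

# Hypothetical rule layout mirroring the four pieces of the configuration.
Rule = namedtuple('Rule', ['constraint', 'level', 'action', 'message'])

# In practice this text would live in an editable file.
CONFIG_TEXT = """
{
  "not_empty_title": {
    "constraint": "not_empty",
    "level": "record",
    "action": "reject",
    "message": "The title is missing"
  }
}
"""

# Hypothetical registry mapping constraint names to checking functions.
CONSTRAINTS = {
    'not_empty': lambda value: bool(value and value.strip()),
}


def load_rules(text):
    """Parse the configuration into a dictionary of rule id -> Rule."""
    return {rule_id: Rule(**fields)
            for rule_id, fields in json.loads(text).items()}


def validate(value, rule_id, rules, enabled=True):
    """Return None when valid (or when disabled), else the failing Rule."""
    if not enabled:
        return None
    rule = rules[rule_id]
    if CONSTRAINTS[rule.constraint](value):
        return None
    return rule


rules = load_rules(CONFIG_TEXT)
failure = validate('   ', 'not_empty_title', rules)
print(failure.action, failure.message)
```

A node would then carry only the constraint identifier (here `not_empty_title`), while the level, the action and the message stay in the configuration and can vary per society.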

Grouping transactions into classes or into collections?

While there are classes in the model to represent transactions, these are right now being grouped into just a collection.

It is necessary to find out which of the two would be better, checking both the most simple and the most complex cases of each type of transaction.

Update for the revision 7

The current implementation was prepared for revision 3, and the current revision is 13. Check what changes were introduced and apply them.

File encoding tag on source files

The file encoding is indicated using:
'# -*- coding: utf-8 -*-'

Instead of:
'# -*- encoding: utf-8 -*-'

This should be corrected on all files.

JSON encoder

Create an encoder for JSON.

Can use the basic dictionary encoder for help.

Revise tests

Right now Travis runs a few hundred tests. While many more are needed to check the parser, it is also necessary to simplify the cases where the same groups of tests are repeated, as happens with the field grammar variations.
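
One way to collapse repeated groups of tests is a table of cases run through `subTest` (available in unittest since Python 3.4). This is a sketch only: `is_valid_numeric_field` is a toy stand-in for a field grammar, not the library's code:

```python
import unittest


def is_valid_numeric_field(text, columns):
    """Toy stand-in for a field grammar: exactly `columns` digits."""
    return len(text) == columns and text.isdigit()


class TestNumericFieldVariations(unittest.TestCase):
    # One table of cases replaces a separate test method per variation.
    CASES = (
        ('123', 3, True),
        ('12', 3, False),
        ('12A', 3, False),
        ('00001', 5, True),
    )

    def test_cases(self):
        for text, columns, expected in self.CASES:
            # Each case is reported separately if it fails.
            with self.subTest(text=text, columns=columns):
                self.assertEqual(
                    is_valid_numeric_field(text, columns), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    TestNumericFieldVariations)
result = unittest.TextTestRunner().run(suite)
```

Adding a new variation then means adding one row to `CASES` rather than another test method.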

Field factories should be configurable

Field factories should be fully configurable. For example, the adapters are currently hard-coded, but should be set with parameters or a configuration file.

Mongo encoder

Create an encoder for Mongo.

Can use the basic dictionary encoder for help.

AgreementTerritory & Territory

AgreementTerritory & Territory are very similar. It may be possible to swap AgreementTerritory for Territory, removing the former in the process.

Try to add Jython support

This is not required, but it would be nice to add Jython support.

In practice, it would mean adding Jython to the Tox test environment. I've already tried doing so, but Travis was unable to make it run (a problem with Java versions).

Still, some basic configurations, including a script, are still on the project, waiting for another try.

CWRConfiguration coupling

In a similar way to the problem with CWRTables, the CWRConfiguration is not configurable.

While this problem is reduced by the fact that it is meant to be set up by editing the configuration files, it is still referenced directly in several modules.

The CWRConfiguration class should implement an interface or an abstract class, and then be accessed through a factory or a facade.

Grammar exceptions messages

Some of the exceptions used on the grammar, mostly for fields values, seem to be incorrectly initialized, and they will cause an 'unprintable exception' error when raised.

Storing the Interested Parties information globally

When the same Interested Party appears several times in the file, a new instance containing all its data is created each time.

Instead, it may be a good idea to create a single instance for the party and reuse it each time it appears.
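
A small registry keyed by the party identifier is one way to get that behaviour. In this sketch `InterestedPartyRegistry` and `Party` are hypothetical names, and which field identifies a party in the model is an assumption:

```python
class InterestedPartyRegistry(object):
    """Hypothetical cache returning one shared instance per party id."""

    def __init__(self, factory):
        self._factory = factory
        self._parties = {}

    def get(self, party_id, **data):
        # Build the instance only the first time the id is seen.
        if party_id not in self._parties:
            self._parties[party_id] = self._factory(party_id, **data)
        return self._parties[party_id]


class Party(object):
    """Minimal stand-in for the model's interested party classes."""

    def __init__(self, party_id, name=None):
        self.party_id = party_id
        self.name = name


registry = InterestedPartyRegistry(Party)
first = registry.get('IP001', name='Example Writer')
second = registry.get('IP001')
print(first is second)
```

Note that in this sketch any data passed on later calls is ignored, so a real implementation would need to decide how to handle records that disagree.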

Use a factory on the parser

Right now the model instances are created in the same module that contains the rules.

Instead of this, a factory should be used to decouple the parser from the model.
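
A minimal sketch of that decoupling follows. `ModelFactory`, `RecordRule` and `ExampleRecord` are illustrative names, and the way rules hand over parsed values is an assumption:

```python
class ModelFactory(object):
    """Hypothetical factory mapping a record type to a builder callable."""

    def __init__(self):
        self._builders = {}

    def register(self, record_type, builder):
        self._builders[record_type] = builder

    def create(self, record_type, values):
        return self._builders[record_type](**values)


class RecordRule(object):
    """Stand-in for a grammar rule: it knows the factory, not the model."""

    def __init__(self, record_type, factory):
        self._record_type = record_type
        self._factory = factory

    def to_model(self, parsed_values):
        return self._factory.create(self._record_type, parsed_values)


class ExampleRecord(object):
    """Toy model class used only for this example."""

    def __init__(self, title):
        self.title = title


factory = ModelFactory()
factory.register('example', ExampleRecord)

rule = RecordRule('example', factory)
record = rule.to_model({'title': 'Example Title'})
print(type(record).__name__, record.title)
```

The rule module never imports the model classes here, so a different model could be plugged in by registering other builders.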

Grammar fields names issues

There is a problem with grammar fields where sometimes a field reports the base field name instead of its own.

For example, after creating a numeric field and naming it 'Field 1', its name may still be 'Numeric Field'. This seems to be related to fields being a combination of optional rules.

The easiest way to solve this is to add a 'name' parameter to the field creation methods.
