dctap-python's Introduction

dctap-python

Normalize and serialize a Tabular Application Profile.

Documentation

Installation

"dctap" requires Python 3.7 or higher. Running python3 --version shows whether Python 3 is installed on your machine and, if so, which version.

As explained below, the command-line utility can be installed either from the online PyPI repository, using the pip command, or from a local copy of the project on your own machine.

Installation from PyPI

The following command uses "pip" to install the most recently published version of the project from the PyPI repository:

python3 -m pip install dctap

To uninstall it:

python3 -m pip uninstall dctap

Installation from GitHub

"dctap" can be installed directly from GitHub with:

python3 -m pip install -U https://github.com/dcmi/dctap-python/archive/main.zip

Installing "dctap" this way, and periodically re-running this command to refresh, is the best way to keep up with the latest version of "dctap".

Installation with pip in a virtual environment

For developers who work a lot with Python projects, it is good practice to create and activate a virtual environment so that "dctap" and its dependencies are not installed into the global Python environment on your machine. The virtual environment is held in a directory of your choice. In the example below, a hidden directory .venv is created in your current working directory, and the virtual environment is activated by executing source .venv/bin/activate.

$ python3 -m venv .venv
$ source .venv/bin/activate
$ python3 -m pip install -U https://github.com/dcmi/dctap-python/archive/main.zip

"dctap" can be pip-installed without creating and activating a virtual environment, though this is not considered good practice. If you do install it into a virtual environment, that environment must be activated with source .venv/bin/activate or "dctap" will not work. Activation can be automated by adding the commands to a shell profile that is executed when the shell starts, for example by adding these lines to the file "~/.bash_profile":

cd /Users/foo/somedirectory
source .venv/bin/activate

Installation from a local clone of the Git repository

Cloning the "dctap-python" repository to your machine and installing it from the dctap-python directory is a good option if you want to keep up to date with the latest developments in the project. The following commands install "dctap" for the first time. To refresh your local copy afterwards, execute git pull from within the repository at any time; this brings the latest features and bug fixes into your local copy.

$ git clone https://github.com/dcmi/dctap-python.git
$ cd dctap-python
$ python3 -m venv .venv
$ source .venv/bin/activate
$ python3 -m pip install flit Pygments
$ flit install -s

Quick start

Run without arguments, "dctap" shows what options and commands are available.

$ dctap
Usage: dctap [OPTIONS] COMMAND [ARGS]...

  DC Tabular Application Profiles (DCTAP) - base module

  Examples:

  $ dctap read my_profile.csv
  $ dctap read --json my_profile.csv
  $ dctap read --expand-prefixes my_profile.csv
  $ dctap read --warnings my_profile.csv
  $ dctap read --warnings --expand-prefixes --json my_profile.csv
  $ dctap init
  Built-in settings written to dctap.yml - edit as needed.
  $ dctap init /Users/tbaker/dctap.yml
  Built-in settings written to /Users/tbaker/dctap.yml - edit as needed.
  $ dctap generate --configfile /Users/tbaker/dctap.yml

Options:
  --version  Show version and exit
  --help     Show help and exit

Commands:
  generate  Generate normalized text, JSON, or YAML of CSV, with warnings.
  init      Generate customizable configuration file [default: dctap.yml].

For more information, see the documentation on readthedocs.io.

dctap-python's People

Contributors

dublincore, kcoyle, nishad, tombaker

dctap-python's Issues

CSV file is valid as CSV

See dcmi/dctap#28

The program should verify that the input file is valid CSV:

  • each row has the same number of columns

If errors are found:

  • stop program with message to user
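The check described above could be sketched with the standard csv module. This is only an illustration: the function names check_uniform_columns and exit_if_not_uniform are hypothetical and are not part of dctap.

```python
import csv
import sys


def check_uniform_columns(csvfile_obj):
    """Return (line_number, column_count) for each row whose width differs from the header."""
    reader = csv.reader(csvfile_obj)
    try:
        header = next(reader)
    except StopIteration:
        return []  # empty input: nothing to compare
    expected = len(header)
    # reader.line_num is the number of source lines read so far, so it points
    # at the physical line where the offending row ends.
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]


def exit_if_not_uniform(csvfile_obj):
    """Stop the program with a message to the user if any row has the wrong width."""
    problems = check_uniform_columns(csvfile_obj)
    if problems:
        details = ", ".join(f"line {num} has {count} columns" for num, count in problems)
        sys.exit(f"Input is not valid CSV: {details}")
```

Note that in this sketch a completely blank line counts as a row with zero columns and is therefore reported.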

`yaml.safe_load` gets `PendingDeprecationWarning`

If ignore::PendingDeprecationWarning is commented out in pytest.ini one gets:

tests/test_config/test_config_get_config_dict.py::test_exit_if_configfile_has_bad_yaml
  /Users/tbaker/github/dcmi/dctap-python/dctap/config.py:73: PendingDeprecationWarning:
  safe_load will be removed, use

    yaml=YAML(typ='safe', pure=True)
    yaml.load(...)

  instead
    return yaml.safe_load(default_config_yaml)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

Warning about shapeID

The program issues a warning for each use of a shapeID that is not in URI form:

  "warnings": {
    "Scholarly Resource": {
      "shapeID": [
        "Value 'Scholarly Resource' does not look like a URI.",
        "Value 'Scholarly Resource' does not look like a URI.",

Because we do not require the shapeID to be a URI, this warning could be confusing. I think it should be removed.

Carriage returns?

At Hugging Face, the default text output does not have line endings:

['Tabular Application Profile (TAP)', '    Shape', '        shapeID                  Scholarly Resource', '        Statement Template', '            propertyID           dct:abstract', '            propertyLabel        Abstract', '            valueDataType        xsd:string', '            note                 Free text', '        Statement Template', '            propertyID           dct:accessRights', '            propertyLabel        Access rights', '            valueDataType        xsd:anyURI', "            valueConstraint      ['http://vocabularies.coar-repositories.org/documentation/access_rights/']", '            valueConstraintType  iristem', '            note                 A term from COAR vocabulary (http://vocabularies.coar-repositories.org

In a text program it looks like this: [screenshot, 2023-11-29]

I assume this is a question of carriage return types, as I think that Hugging Face doesn't make any modifications to the output.
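The output quoted above has the shape of a Python list of strings rather than of text with the wrong line endings. One possible explanation (an assumption, not confirmed in this thread) is that the list of lines is being displayed directly instead of being joined first. A minimal sketch of the join, using a hypothetical helper name:

```python
def lines_to_text(lines):
    """Join a list of output lines into one newline-terminated string."""
    return "\n".join(lines) + "\n"


# The first few lines from the output quoted in this issue.
sample = [
    "Tabular Application Profile (TAP)",
    "    Shape",
    "        shapeID                  Scholarly Resource",
]
print(lines_to_text(sample), end="")
```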

Issues with .dctaprc as a config file name

.*rc files are a standard convention, but using “.dctaprc” raises some issues.

  • As a dotfile, it is hidden on *nix operating systems, which can make it hard for many users to find or modify, especially within the working directory. In the long run, this can cause unexpected side effects.
  • Keeping the file extension yaml or yml helps text editors provide syntax highlighting and validation while editing these files.

DEFAULT_CONFIGFILE_NAME = ".dctaprc"

python version required for dctap

pyproject.toml currently requires Python 3.9, but I'm not at all sure that the version needs to be so recent.

Now that dctap can be installed with pip (see the PyPI project), it would be good to lower the required version as far as possible.

@nishad Do you have a way to test whether it will work with earlier Python versions?

Spurious warning about invalid valueNodeType when added in extra_value_node_types

I'm using the csvreader() method in another program. The TAP I am reading has one entry of valueNodeType of IRI BNODE (line 8). The dctap yaml config I am using has

extra_value_node_types:
 - iri bnode

but still I get {'valueNodeType': ["'iri bnode' is not a valid node type."]} in the "warnings_dict"

(I understand the values should be case-insensitive, but I've also tried IRI BNODE in the YAML config and it made no difference.)

The file loads fine and I can use it, but I think this is a spurious warning.
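The behavior expected here can be sketched as a case-insensitive membership test that includes the configured extras. This is not dctap's actual code: the function name and the built-in list of node types are assumptions made for illustration.

```python
# Assumed built-in node types; dctap's real defaults may differ.
BUILTIN_NODE_TYPES = ("iri", "literal", "bnode")


def _normalize(value):
    """Lowercase and collapse runs of whitespace, so 'IRI  BNODE' equals 'iri bnode'."""
    return " ".join(value.lower().split())


def is_valid_node_type(value, extra_value_node_types=()):
    """True if value matches a built-in or configured node type, ignoring case."""
    allowed = {_normalize(t) for t in (*BUILTIN_NODE_TYPES, *extra_value_node_types)}
    return _normalize(value) in allowed
```

With extra_value_node_types set to ["iri bnode"], a cell value of IRI BNODE passes this check, so no warning would be expected.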

Reverted to hand-entering version number in `docs/conf.py`

@nishad I noticed today that the last three RTD builds had failed - see https://readthedocs.org/projects/dctap-python/builds/14275110/ .

The builds started failing when we changed the hand-entered version number in conf.py to be dctap.__version__. At first it failed because this variable was unfindable until I added import dctap to conf.py. But I did not realize until today that import dctap was now causing the build to "fail", perhaps because the documentation online looked perfectly fine.

Now that I have reverted to hand-entering the version number (from dctap/__init__.py), the build is passing again.

I do not think that hand-entering the version number imposes an undue burden, but do you see a way to make it work with dctap.__version__?
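One way to avoid both hand-entering the number and importing the package is to read the version string out of dctap/__init__.py as plain text. The sketch below assumes that the file contains a line of the form __version__ = "x.y.z"; the helper name read_version is hypothetical.

```python
import re
from pathlib import Path


def read_version(init_path):
    """Extract the __version__ string from a Python file without importing it."""
    text = Path(init_path).read_text(encoding="utf-8")
    match = re.search(r"""^__version__\s*=\s*["']([^"']+)["']""", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"No __version__ string found in {init_path}")
    return match.group(1)
```

In docs/conf.py this could be used as release = read_version("../dctap/__init__.py"), where the relative path is an assumption about the repository layout. Because nothing is imported, the build does not depend on the package or its dependencies being installed.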

Installing from GitHub using pip

There is an alternative option to install from GitHub using pip.

pip install git+https://github.com/dcmi/dctap-python.git

Works with recent versions of pip.

Element aliases causing error in dctap.yaml?

Running the latest version of dctap with a dctap.yml that sets an element alias causes the error
Valid DCTAP CSV must have a 'propertyID' column.

To reproduce, use the following two files.

dctap.yaml:

### dctap configuration file (in YAML format)
extra_statement_template_elements:
 - severity

element_aliases:
     "Mand": "mandatory"
     "Rep": "repeatable"

tap.csv:

shapeID,propertyID,propertyLabel,Mand,Rep,valueNodeType,valueDataType,valueConstraint,valueConstraintType,valueShape,note,severity
BookShape,dct:title,Title,TRUE,FALSE,Literal,rdf:langString,,,,,Violation
BookShape,dct:creator,Author,FALSE,TRUE,IRI BNODE,,,,AuthorShape,,Warning
BookShape,sdo:isbn,ISBN-13,FALSE,FALSE,Literal,xsd:string,^(\\d{13})?$,pattern,,"Just the 13 numbers, no spaces or separators.",Violation
BookShape,rdf:type,Type,TRUE,FALSE,IRI,,sdo:Book,,,,Warning
AuthorShape,rdf:type,Type,TRUE,TRUE,IRI,,foaf:Person,,,,Warning
AuthorShape,foaf:givenName,Given name,FALSE,TRUE,Literal,xsd:string,,,,,Warning
AuthorShape,foaf:familyName,Family name,FALSE,TRUE,Literal,xsd:string,,,,,Warning

Removing the element_aliases block from dctap.yaml makes the error message go away.
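For reference, the behavior one would expect from element_aliases is that aliased headers are renamed and all other headers, including propertyID, pass through untouched. The sketch below illustrates that expectation only; apply_aliases is a hypothetical helper, not dctap's implementation, and it does not diagnose the bug.

```python
def apply_aliases(header, element_aliases):
    """Replace aliased column names with their canonical names, leaving others as-is."""
    # Match aliases case-insensitively (an assumption about the intended behavior).
    lowered = {alias.lower(): element for alias, element in element_aliases.items()}
    return [lowered.get(col.strip().lower(), col.strip()) for col in header]


header = ["shapeID", "propertyID", "propertyLabel", "Mand", "Rep"]
aliases = {"Mand": "mandatory", "Rep": "repeatable"}
print(apply_aliases(header, aliases))
```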

Pylint says return statements are inconsistent

At https://github.com/dcmi/dctap-python/blob/main/dctap/utils.py#L22

def is_uri_or_prefixed_uri(uri):
    """True if string is URI or superficially looks like a prefixed URI."""
    if is_uri(uri):
        return True
    if re.match("[A-Za-z0-9_]*:[A-Za-z0-9_]*", uri):  # looks like prefixed URI
        return True

Pylint says: "R1710: Either all return statements in a function should return an expression, or none of them should. (inconsistent-return-statements)"

@nishad Can you advise?
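Pylint raises R1710 here because the function returns True on two paths but falls off the end, implicitly returning None, on the third. Adding an explicit return False resolves it. In the sketch below, is_uri is a simplified stand-in written for this example (assumed, not the project's own function) so that the block runs on its own.

```python
import re
from urllib.parse import urlparse


def is_uri(uri):
    """Stand-in for the project's is_uri(): True if string has a scheme and a host."""
    parsed = urlparse(uri)
    return bool(parsed.scheme and parsed.netloc)


def is_uri_or_prefixed_uri(uri):
    """True if string is URI or superficially looks like a prefixed URI."""
    if is_uri(uri):
        return True
    if re.match("[A-Za-z0-9_]*:[A-Za-z0-9_]*", uri):  # looks like prefixed URI
        return True
    return False  # explicit, so every path returns an expression
```

Callers that only test truthiness behave the same as before, since None and False are both falsy; the difference is that the return value is now always a bool.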
