
CKAN extension for data.world

Home Page: https://data.world/integrations/ckan

License: Apache License 2.0


ckanext-datadotworld's Introduction

ckanext-datadotworld

With this extension enabled, the manage view for organizations gains an additional data.world tab. In that tab, organization admins can specify the synchronization options that apply to that organization.

Supported versions

CKAN version 2.4 or greater (including 2.7).

All supported versions can use the Celery backend, but version 2.7 uses RQ instead. No changes are required to use the new backend; just start the worker with:

paster --plugin=ckan jobs worker -c /config.ini

instead of:

paster --plugin=ckan celeryd run -c /config.ini

Details at http://docs.ckan.org/en/latest/maintaining/background-tasks.html

Installation

To install ckanext-datadotworld:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate
    
  2. If you already have an older version of this extension, remove it first:

    pip uninstall -y ckanext-datadotworld

  3. Install the ckanext-datadotworld Python package into your virtual environment:

    pip install git+https://github.com/datadotworld/ckanext-datadotworld

  4. Add datadotworld to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/production.ini).

  5. Create DB tables:

    paster --plugin=ckanext-datadotworld datadotworld init -c /config.ini
    paster --plugin=ckanext-datadotworld datadotworld upgrade -c /config.ini

  6. Start the Celery daemon, either with supervisor or using paster:

    paster --plugin=ckan celeryd run -c /config.ini
    

Config Settings

Attempts to push failed datasets can be scheduled by adding the following line to cron:

* 8 * * * paster --plugin=ckanext-datadotworld datadotworld push_failed -c /config.ini

A similar entry enables synchronization of remote (i.e. not uploaded) resources with data.world:

* 8 * * * paster --plugin=ckanext-datadotworld datadotworld sync_resources -c /config.ini

Note that the schedule "* 8 * * *" runs the command every minute from 08:00 to 08:59. To run it once a day at 08:00, use "0 8 * * *" instead.

Delay option

By default, a 1-second delay is applied between requests. The delay can be changed by setting the "ckan.datadotworld.request_delay" configuration variable in the CKAN ini file.

For example:

ckan.datadotworld.request_delay = 1

To ensure that the delay will work correctly, you also need to configure Celery to work in single thread mode. To do this, add the following flag to the Celery start command:

--concurrency=1

Details at http://celery.readthedocs.io/en/latest/userguide/workers.html#concurrency.

Template snippets

To add the data.world banner to the dataset page (it currently sits at the top of the package_resources block), add the following snippet to your template. The datadotworld_extras variable contains the object (model) holding the datadotworld extras of the currently viewed package, and org_id is the owner organization of that package:

{% snippet 'snippets/datadotworld/banner.html', org_id=pkg.owner_org, datadotworld_extras=c.pkg.datadotworld_extras %}

A sidebar label can be added by placing the following snippet in your template (org_id is the ID of the viewed organization):

{% snippet 'snippets/datadotworld/label.html', org_id=organization.id %}

Development Installation

To install ckanext-datadotworld for development, activate your CKAN virtualenv and do the following:

git clone https://github.com/datadotworld/ckanext-datadotworld.git
cd ckanext-datadotworld
python setup.py develop
paster datadotworld init -c /config.ini

Running the Tests

Make sure you follow the CKAN testing guide (http://docs.ckan.org/en/latest/contributing/test.html). To run the tests, do the following:

nosetests --ckan --nologcapture --with-pylons=test.ini

To run the tests and produce a coverage report, first make sure you have coverage installed in your virtualenv (pip install coverage) then run:

nosetests --ckan --nologcapture --with-pylons=test.ini --with-coverage --cover-package=ckanext.datadotworld --cover-inclusive --cover-erase --cover-tests

ckanext-datadotworld's People

Contributors

iaroslav13, luketully, rflprr, sarakbarr, smotornyuk, starl3n


ckanext-datadotworld's Issues

Installation error (Ubuntu and Centos7)

Hi,

When I try to install, I receive many errors.
Celery is required, but version 3.x rather than 4.x.

At the end of installation:

paster --plugin=ckanext-datadotworld datadotworld init -c /etc/ckan/default/production.ini

Traceback (most recent call last):
File "/usr/lib/ckan/default/bin/paster", line 11, in
sys.exit(run())
File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
invoke(command, command_name, options, args[1:])
File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
exit_code = runner.run(args)
File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
result = self.command()
File "/usr/lib/ckan/default/lib/python2.7/site-packages/ckanext/datadotworld/command.py", line 44, in command
self._init()
File "/usr/lib/ckan/default/lib/python2.7/site-packages/ckanext/datadotworld/command.py", line 93, in _init
main(argv, debug=False, repository=repository)
File "/usr/lib/ckan/default/lib/python2.7/site-packages/migrate/versioning/shell.py", line 209, in main
ret = command_func(**kwargs)
File "", line 2, in version_control
File "/usr/lib/ckan/default/lib/python2.7/site-packages/migrate/versioning/util/init.py", line 160, in with_engine
return f(*a, **kw)
File "/usr/lib/ckan/default/lib/python2.7/site-packages/migrate/versioning/api.py", line 250, in version_control
ControlledSchema.create(engine, repository, version)
File "/usr/lib/ckan/default/lib/python2.7/site-packages/migrate/versioning/schema.py", line 139, in create
repository = Repository(repository)
File "/usr/lib/ckan/default/lib/python2.7/site-packages/migrate/versioning/repository.py", line 77, in init
self.verify(path)
File "/usr/lib/ckan/default/lib/python2.7/site-packages/migrate/versioning/repository.py", line 98, in verify
raise exceptions.InvalidRepositoryError(path)
migrate.exceptions.InvalidRepositoryError: /usr/lib/ckan/default/lib/python2.7/site-packages/datadotworld_repository

CKAN 2.6.2 (same error on Ubuntu and CentOS 7 VMs)

thanks

Normalize tags to satisfy data.world constraints

Currently, if tags on the CKAN side contain characters not allowed by data.world, datasets fail to sync.

Instead, the CKAN extension should normalize tags so that they only contain lower-case letters, numbers or spaces.
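
The requested normalization could look roughly like the sketch below. It is an illustration only, not the extension's code: the function names are hypothetical, "letters" is assumed to mean ASCII a-z, and replacing disallowed characters with a space (rather than dropping them) is a design choice made here.

```python
import re


def normalize_tag(tag):
    """Return a tag made only of lower-case ASCII letters, digits and spaces."""
    lowered = tag.lower()
    # Assumption: every run of disallowed characters becomes a single space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", lowered)
    # Collapse repeated spaces and trim both ends.
    return re.sub(r" +", " ", cleaned).strip()


def normalize_tags(tags):
    """Normalize a list of tags, dropping empty results and duplicates."""
    seen = set()
    result = []
    for tag in tags:
        normalized = normalize_tag(tag)
        if normalized and normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```

Replacing rather than deleting keeps word boundaries visible, so a tag such as "Open-Data" becomes "open data" instead of "opendata". Tags that differ only in punctuation or case collapse into one entry.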

Rate limit sync requests

For large catalogs, the extension can produce a large number of API requests, and data.world rate limits API use. With that in mind, when datasets are refreshed in batch mode, the extension should limit its own API calls to a maximum of 1 per second. In addition, the extension should back off and retry if the API responds with HTTP 429.
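
A minimal sketch of this behavior, written for Python 3 (it uses time.monotonic). The RateLimitedClient class, its parameters and the send callable are hypothetical and not part of the extension. send is assumed to return an object with a status_code attribute, as a requests response does. Doubling the wait after each 429 is one possible backoff policy; the issue does not prescribe one.

```python
import time


class RateLimitedClient(object):
    """Space calls at least `min_interval` seconds apart and retry on HTTP 429."""

    def __init__(self, send, min_interval=1.0, max_retries=5,
                 sleep=time.sleep, clock=time.monotonic):
        self.send = send                  # hypothetical callable doing the HTTP call
        self.min_interval = min_interval  # 1 second gives at most 1 request/second
        self.max_retries = max_retries
        self.sleep = sleep                # injectable so tests need not really wait
        self.clock = clock
        self._last_call = None

    def _wait_for_slot(self):
        # Sleep only for the part of the interval that has not yet elapsed.
        if self._last_call is not None:
            remaining = self.min_interval - (self.clock() - self._last_call)
            if remaining > 0:
                self.sleep(remaining)
        self._last_call = self.clock()

    def request(self, *args, **kwargs):
        backoff = self.min_interval
        response = None
        for attempt in range(self.max_retries + 1):
            self._wait_for_slot()
            response = self.send(*args, **kwargs)
            if response.status_code != 429:
                return response
            if attempt < self.max_retries:
                # Assumption: exponential backoff, doubling after each 429.
                self.sleep(backoff)
                backoff *= 2
        # Retries exhausted: hand the last 429 response back to the caller.
        return response
```

The limiter is per process, so it only enforces the overall rate if a single worker issues the requests.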

sqlalchemy-migrate upgrade

The sqlalchemy-migrate version requirement is well below the version currently required by ckan@master. The currently required version is also well behind the latest minor release in the 0.9.x tag.

Since there do not appear to be any breaking changes in sqlalchemy-migrate between 0.9.1 and 0.10.0, it's worth updating this.

Preview some data.world UI elements

Right now, this extension is primarily for synchronization/plumbing purposes and doesn't expose any user-facing UI elements beyond a link to the corresponding data.world dataset.

Data.world should consider previewing/teasing some summary metadata on CKAN, e.g.

  • Comments à la ckanext-disqus or discourse
  • Allow embedding, so it can be showcased
  • A Resource View to show data.world summary elements

These interactions can be designed so that they don't dilute the data.world value proposition or brand, since most interactions will still happen natively on the data.world site.

Or per Tim O'Reilly - "Create more value than you capture" ;)

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Cannot find preset's package (github>datadotworld/renovate-config)

Delete datasets as they are deleted on CKAN

Currently, the CKAN extension does not delete datasets on data.world once they are deleted on CKAN.

Instead, now that a DELETE endpoint has been released, the CKAN extension should properly delete them on data.world.

More info here: https://dwapi.api-docs.io/v0/datasets/delete-a-dataset

IMPORTANT: This endpoint requires a token with admin permissions. Existing users should be advised to obtain a token with such permission if DELETE requests fail with HTTP 403.
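
The 403 handling described above might be sketched as follows. Everything here is illustrative: the function and exception names are hypothetical, the base URL and path are assumed from the public data.world API and should be checked against the documentation linked above, and http_delete stands for any callable with the same shape as requests.delete.

```python
class AdminTokenRequired(Exception):
    """Raised when data.world rejects a DELETE request with HTTP 403."""


def delete_remote_dataset(http_delete, owner, dataset_id, token,
                          base_url="https://api.data.world/v0"):
    """Send a DELETE for one dataset and return the HTTP status code."""
    # Assumption: the dataset is addressed as /datasets/{owner}/{id}.
    url = "{0}/datasets/{1}/{2}".format(base_url, owner, dataset_id)
    response = http_delete(url, headers={"Authorization": "Bearer " + token})
    if response.status_code == 403:
        # Surface an actionable message instead of a bare HTTP error.
        raise AdminTokenRequired(
            "DELETE was refused for {0}/{1}; the configured token may lack "
            "admin permissions".format(owner, dataset_id))
    return response.status_code
```

Passing the HTTP function in as an argument keeps the sketch testable without network access; in real use it could be called with requests.delete.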

ValueError: invalid literal for int() with base 10: 'cs'

$ paster --plugin=ckan celeryd run -c production.ini 
WARNING: This function is deprecated. Use `paster jobs worker` instead.
No handlers could be found for logger "ckan.lib.celery_app"
[2017-10-16 17:21:47,113: WARNING/MainProcess] /usr/lib/ckan/default/local/lib/python2.7/site-packages/celery/apps/worker.py:161: CDeprecationWarning: 
Starting from version 3.2 Celery will refuse to accept pickle by default.

The pickle serializer is a security concern as it may give attackers
the ability to execute any command.  It's important to secure
your broker from unauthorized access when using pickle, so we think
that enabling pickle should require a deliberate action and not be
the default choice.

If you depend on pickle then you should set a setting to disable this
warning and to be sure that everything will continue working
when you upgrade to Celery 3.2::

    CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']

You must only enable the serializers that you will actually use.


  warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
[2017-10-16 17:21:47,123: ERROR/MainProcess] Unrecoverable error: ValueError("invalid literal for int() with base 10: 'cs'",)
Traceback (most recent call last):
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
    self.blueprint.start(self)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    self.on_start()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/celery/apps/worker.py", line 169, in on_start
    string(self.colored.cyan(' \n', self.startup_info())),
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/celery/apps/worker.py", line 230, in startup_info
    results=self.app.backend.as_uri(),
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/celery/backends/base.py", line 118, in as_uri
    url = maybe_sanitize_url(self.url or '')
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/kombu/utils/url.py", line 63, in maybe_sanitize_url
    return sanitize_url(url, mask)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/kombu/utils/url.py", line 58, in sanitize_url
    return as_url(*_parse_url(url), sanitize=True, mask=mask)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/kombu/utils/url.py", line 24, in _parse_url
    return (scheme, unquote(parts.hostname or '') or None, parts.port,
  File "/usr/lib/python2.7/urlparse.py", line 113, in port
    port = int(port, 10)
ValueError: invalid literal for int() with base 10: 'cs'

Fix mapping of title, description and summary

When updating a dataset in CKAN:

  1. Changing the title does not change the dataset in data.world
  2. Changing the description changes both description and summary in data.world

Instead, the mapping from CKAN to data.world should be:

  1. ckan:ID -> dw:title
  2. ckan:title -> dw:description
  3. ckan:notes -> dw:summary
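
The proposed mapping can be written down as a small function. This is a sketch under assumptions: the function name and the shape of the returned dict are hypothetical, and "ckan:ID" is read here as the id key of the CKAN package dict (the name key, the URL slug, is the other plausible reading).

```python
def map_package_to_datadotworld(pkg_dict):
    """Map a CKAN package dict to the three data.world fields in the issue."""
    return {
        # Assumption: "ckan:ID" means the package's `id` key.
        "title": pkg_dict["id"],
        # The CKAN title becomes the data.world description.
        "description": pkg_dict.get("title") or "",
        # CKAN stores the long free-text description under `notes`.
        "summary": pkg_dict.get("notes") or "",
    }
```

For example, a package with id "abc-123", title "Fruit Prices" and notes "Weekly averages." would map to title "abc-123", description "Fruit Prices" and summary "Weekly averages.".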

Fix file names and extensions

Currently, when I add a resource using:

The file created on data.world has the wrong name: Fruit and Vegetable Prices.xlsx?v=42082

Instead, the name should be: Fruit and Vegetable Prices.xlsx

We need some heuristics to choose the right extension here:

  1. If user provides file type, use the correct extension for the type
  2. Otherwise, if user provides extension with the file name, use the name as provided
  3. Otherwise, extract extension from URL (discarding query string completely)
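
The three rules above could be sketched like this, in Python 3. The function name and the FORMAT_EXTENSIONS table are hypothetical; a real table would need to cover every format the catalog uses. The sketch also assumes that name is the resource name as entered by the user, not a value already derived from the URL.

```python
import os
from urllib.parse import unquote, urlsplit

# Hypothetical lookup from a declared file type to its extension.
FORMAT_EXTENSIONS = {
    "csv": ".csv",
    "json": ".json",
    "xls": ".xls",
    "xlsx": ".xlsx",
}


def choose_file_name(name, url, file_format=None):
    """Pick a file name for a resource following the three rules in order."""
    # Rule 1: a declared file type decides the extension.
    ext = FORMAT_EXTENSIONS.get((file_format or "").strip().lower())
    if ext:
        return name if name.lower().endswith(ext) else name + ext
    # Rule 2: a name that already carries an extension is used as provided.
    if os.path.splitext(name)[1]:
        return name
    # Rule 3: take the extension from the URL path; urlsplit keeps the
    # query string separate, so "?v=42082" never reaches the file name.
    path = unquote(urlsplit(url).path)
    return name + os.path.splitext(path)[1]
```

If none of the rules yields an extension, the name is returned unchanged.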
