This project forked from paulfitz/coopy

distributed spreadsheets with intelligent merges

Home Page: http://share.find.coop

License: Other

CMake 1.03% Shell 0.43% Java 23.89% PHP 0.01% Python 0.05% Ruby 0.51% NSIS 0.04% JavaScript 4.90% C++ 16.83% C 51.58% Makefile 0.18% HTML 0.46% Tcl 0.08% CSS 0.01%

The COOPY toolbox

Diffing, patching, merging, and revision control for spreadsheets and databases. Focused on keeping data in sync across different technologies (e.g. a MySQL table and an Excel spreadsheet).

The main programs

  • ssdiff - generate diffs for spreadsheets and databases.
  • sspatch - apply patches to spreadsheets and databases.
  • ssmerge - merge tables with a common ancestor.
  • ssformat - convert tables from one format to another.
  • ssfossil - the fossil DVCS, modified to use tabular diffs rather than line-based diffs. You can also work with git. If you use github, you may want to check out CSVHub, which uses a simplified version of ssdiff called daff to show pretty data diffs on github.
  • coopy - a graphical interface to ssfossil.
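
To make "merge tables with a common ancestor" concrete, here is a minimal sketch of a cell-level three-way merge. It is not ssmerge's implementation: it assumes rows are already matched by a known key and that all three tables share the same columns, and the names merge_cell and merge_tables are hypothetical.

```python
def merge_cell(ancestor, local, remote):
    """Three-way merge of one cell. Returns (value, is_conflict)."""
    if local == remote:
        return local, False      # both sides agree
    if local == ancestor:
        return remote, False     # only remote changed
    if remote == ancestor:
        return local, False      # only local changed
    return local, True           # both changed differently: keep local, flag it


def merge_tables(ancestor, local, remote):
    """Merge tables shaped as {key: {column: value}}.

    Assumption for this sketch: only rows present in all three tables
    are merged, and every row has the ancestor's columns.
    """
    merged, conflicts = {}, []
    for key in ancestor.keys() & local.keys() & remote.keys():
        merged[key] = {}
        for col in ancestor[key]:
            value, conflict = merge_cell(
                ancestor[key][col], local[key][col], remote[key][col]
            )
            merged[key][col] = value
            if conflict:
                conflicts.append((key, col))
    return merged, sorted(conflicts)
```

A change made on only one side is taken as-is; a cell changed differently on both sides is reported as a conflict instead of being silently overwritten.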

Supported data formats

  • CSV (comma separated values)
  • SSV (semicolon separated values)
  • TSV (tab separated values)
  • Excel formats (via gnumeric's libspreadsheet)
  • Other spreadsheet formats (via gnumeric's libspreadsheet)
  • Sqlite
  • PostgreSQL
  • MySQL
  • Microsoft Access format (via mdbtools - READ ONLY, or via jackcess for read/write)
  • A JSON representation of tables.
  • A custom "CSVS" format that is a minimal extension of CSV to handle multiple sheets in a single file, allow for unambiguous header rows, and have a clear representation of NULL.

Supported diff formats

Example uses

  • Enumerating differences between any pairwise combination of CSV files, database tables, or spreadsheets.
  • Applying changes to a database or spreadsheet, without losing meta-data (formatting of spreadsheet, indexing/type information for database). Particularly useful for applying changes in an exported CSV file back to the original source.
  • Editing a MySQL/Sqlite database in gnumeric/openoffice/Excel/...
  • Distributed editing of a spreadsheet/database using a DVCS. Benefits: revision history, offline editing in tool of choice, self-hosting possible.
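
As an illustration of applying changes without losing meta-data, the sketch below writes cell-level edits into an existing SQLite table with UPDATE statements, so column types and indexes are left alone. This is not how sspatch is implemented; apply_updates and its argument shapes are hypothetical, and table and column names are assumed to come from a trusted source because they are interpolated into the SQL.

```python
import sqlite3


def apply_updates(conn, table, key, updates):
    """Apply {key_value: {column: new_value}} to an existing table.

    Only UPDATE statements are issued, so the schema (types, indexes)
    is never dropped or recreated. Identifiers are assumed trusted;
    values are passed as bound parameters.
    """
    for key_value, changes in updates.items():
        assignments = ", ".join(f'"{col}" = ?' for col in changes)
        conn.execute(
            f'UPDATE "{table}" SET {assignments} WHERE "{key}" = ?',
            [*changes.values(), key_value],
        )
    conn.commit()
```

The alternative of re-importing the edited CSV into a fresh table would discard exactly the indexing and type information that the bullet above is about.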

Features

  • By default, when comparing tables, no initial assumption is made about schema similarity. Column names are not required to exist, or to be preserved between tables. The number and order of columns may also differ.
  • If schema changes are not expected, COOPY can be directed to use certain columns as a trusted identity for rows (a key).
  • Respects row order for table representations in which row order is meaningful (spreadsheets, CSV).
  • By default, COOPY assumes your data is very messy. If it is clean, you can get much faster results by tweaking some options.
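
When a column can be trusted as a key, comparison becomes a simple lookup instead of a fuzzy match. The sketch below shows that clean-data case; it is an illustration only, not COOPY's code, and keyed_diff and its return shape are hypothetical.

```python
def keyed_diff(old_rows, new_rows, key):
    """Compare two tables (lists of dicts) using `key` as the row identity.

    Returns (inserted, deleted, updated), where `updated` is a list of
    (key_value, {column: (old_value, new_value)}) pairs.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}

    inserted = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]

    updated = []
    for k in old:
        if k in new and old[k] != new[k]:
            columns = old[k].keys() | new[k].keys()
            changes = {
                c: (old[k].get(c), new[k].get(c))
                for c in columns
                if old[k].get(c) != new[k].get(c)
            }
            updated.append((k, changes))
    return inserted, deleted, updated
```

Each row is found with one dictionary lookup, which is why declaring a key is so much faster than matching rows by content.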

Algorithm

The core of the COOPY toolbox is a 3-way comparison between an ancestor and two descendants. First, rows are compared using bags of substrings drawn from across all columns. Once corresponding rows are known, columns are compared, again using bags of substrings. Row and column assignments are optimized and ordered using a Viterbi lattice. Once the pairwise relationships between each descendant and its ancestor are known, differences are computed, and a good merged ordering is determined (again using the Viterbi algorithm).
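
The row-comparison step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not COOPY's implementation: the substring length (3), the multiset Jaccard score, the 0.3 threshold, and the greedy pairing are all choices made for this example, and the greedy loop stands in for the Viterbi-based optimization, which is not reproduced here.

```python
from collections import Counter


def substring_bag(row, n=3):
    """Bag (multiset) of length-n substrings drawn from every cell of a row."""
    bag = Counter()
    for cell in row:
        text = str(cell)
        if not text:
            continue
        # Cells shorter than n contribute themselves as a single entry.
        for i in range(max(len(text) - n + 1, 1)):
            bag[text[i:i + n]] += 1
    return bag


def similarity(bag_a, bag_b):
    """Multiset Jaccard overlap between two bags, in the range 0..1."""
    shared = sum((bag_a & bag_b).values())
    total = sum((bag_a | bag_b).values())
    return shared / total if total else 0.0


def match_rows(ancestor, descendant, threshold=0.3):
    """Greedily pair each ancestor row with its most similar unused descendant row.

    Returns {ancestor_index: descendant_index}; rows scoring at or below
    `threshold` against every candidate stay unmatched.
    """
    bags = [substring_bag(row) for row in descendant]
    used, pairs = set(), {}
    for i, row in enumerate(ancestor):
        bag = substring_bag(row)
        best, best_score = None, threshold
        for j, other in enumerate(bags):
            if j in used:
                continue
            score = similarity(bag, other)
            if score > best_score:
                best, best_score = j, score
        if best is not None:
            used.add(best)
            pairs[i] = best
    return pairs
```

Because the bag pools substrings from all cells, a row can still be recognized after its columns are reordered or a cell is lightly edited, which is what allows comparison without assuming a shared schema.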

Installing on OSX

  • Use homebrew.
  • Do brew tap paulfitz/data to get a formula for coopy.
  • Install XQuartz from http://xquartz.macosforge.org
  • Then brew install coopy should work fine.

Installing on Windows

Installing on Linux

  • Sorry, this is where I develop myself, but I don't have an installer. Building is easy though!

Building

  • For a stripped-down js/py/rb/php version see http://paulfitz.github.io/daff/
  • See BUILD.md for information on building the programs.
    • Summary: CMake
  • See SERVE.txt for server-side information.
    • Summary: fossil
  • See COPYING.txt for copyright and license information.
    • Summary: GPL. Relicensing of library core planned for version 1.0.

Status

COOPY targets a stable, fully documented release at version 1.0. At the time of writing, the version number is just beyond 0.5, so it is about halfway there.

Apparently COOPY is the closest thing right now to git for data:

But if you deal with big data sets and don't care so much about diffs and patches and whatnot, you may want to look at dat:

Contributors

  • paulfitz
