kghmanuel / marklogic-contentpump

This project forked from marklogic/marklogic-contentpump

Command-line tool to import, export, and copy data to or from MarkLogic databases

Home Page: http://developer.marklogic.com/products/mlcp

License: Apache License 2.0

Java 98.82% HTML 0.41% CSS 0.15% Batchfile 0.05% Shell 0.03% JavaScript 0.26% XQuery 0.28%

MarkLogic Content Pump and MarkLogic Connector for Hadoop

MarkLogic Content Pump (mlcp) is a command-line tool that provides the fastest way to import, export, and copy data to or from MarkLogic databases. Core features of mlcp include:

  • Bulk load billions of local files
  • Split and load large, aggregate XML files or delimited text
  • Bulk load billions of triples or quads from RDF files
  • Archive and restore database contents across environments
  • Export data from a database to a file system
  • Copy subsets of data between databases
  • Load documents from HDFS, including Hadoop SequenceFiles
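
To make "split and load delimited text" concrete, the sketch below shows the general idea of turning each data row of a comma-separated file into its own XML document, with the header row supplying element names and the first column naming the document. It is loosely modeled on mlcp's `-input_file_type delimited_text` mode but is not mlcp's implementation. The function name `split_delimited` and the `<root>` wrapper element are illustrative assumptions, and the sketch ignores quoted fields and XML escaping.

```shell
#!/usr/bin/env bash
# Conceptual sketch only (hypothetical helper, NOT mlcp code): split a CSV
# file into one small XML document per data row.
# Assumptions: comma delimiter, no quoted fields, no XML escaping, and the
# first column holds a value that is safe to use as a file name.
split_delimited() {
  local input="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -F',' -v outdir="$outdir" '
    # Header row: remember column names to use as element names.
    NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; next }
    # Skip blank lines.
    NF == 0 { next }
    {
      # First column names the output document, one document per row.
      file = outdir "/" $1 ".xml"
      printf "<root>" > file
      for (i = 1; i <= NF; i++) printf "<%s>%s</%s>", header[i], $i, header[i] > file
      print "</root>" > file
      close(file)
    }' "$input"
}
```

For example, `split_delimited people.csv out/` writes `out/1.xml`, `out/2.xml`, and so on. mlcp itself performs this kind of splitting in parallel and loads the resulting documents directly into the database rather than writing them to disk.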

You can run mlcp across many threads on a single machine or across many nodes in a Hadoop cluster.

The Hadoop Connector is an extension to Hadoop’s MapReduce framework that allows you to communicate easily and efficiently with a MarkLogic database from within a Hadoop job. mlcp uses the Hadoop Connector internally, but you can also use the connector on its own to build Hadoop MapReduce jobs that interact with MarkLogic, for example as part of a larger Hadoop application. Core features of the Hadoop Connector include:

  • Process data in MarkLogic with Hadoop MapReduce for bulk analytics or transformation
  • Persist data from Hadoop to MarkLogic for query and update
  • Access MarkLogic text, geospatial, scalar, and document structure indexes to send only the most relevant data to Hadoop for processing
  • Write results from MapReduce jobs to MarkLogic in parallel

Getting Started

Documentation

For official product documentation, please refer to:

The wiki pages of this project contain useful information for development work:

Required Software

Build

mlcp and the Hadoop Connector can be built together. Steps to build:

$ git clone https://github.com/marklogic/marklogic-contentpump.git
$ cd marklogic-contentpump
$ mvn clean package -DskipTests=true

The build writes its deliverables to the respective directories under the top-level ./mlcp/ and ./mapreduce/ directories.

Alternatively, you can build mlcp and the Hadoop Connector independently by running the above command from each component’s root directory (that is, ./mlcp/ and ./mapreduce/). Note that mlcp depends on the Hadoop Connector, so the Hadoop Connector must build successfully before mlcp can be built.
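
The independent build can be scripted so that the Hadoop Connector is always built before mlcp. This is an illustrative sketch: the function name `build_components` is hypothetical, and it uses `mvn clean install` rather than `mvn clean package` on the assumption that a standalone mlcp build resolves the connector from the local Maven repository (`~/.m2`), which `package` alone does not populate.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): build the Hadoop Connector (./mapreduce/)
# first, then mlcp (./mlcp/), because mlcp depends on the connector.
# Assumption: "install" is used instead of "package" so the connector
# artifact lands in the local Maven repository for the mlcp build to find.
build_components() {
  local root="${1:-.}"
  local component
  for component in mapreduce mlcp; do
    echo "Building $component"
    # Subshell keeps the caller's working directory unchanged.
    (cd "$root/$component" && mvn clean install -DskipTests=true) || {
      echo "Build of $component failed; stopping." >&2
      return 1
    }
  done
}
```

For example, run `build_components .` from the marklogic-contentpump/ root. The loop stops at the first component that fails, so mlcp is never built against a missing or stale connector.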

For information on contributing to this project, see CONTRIBUTING.md. For information on developing this project, see the project wiki.

Tests

The unit tests included in this repository are designed to provide illustrative examples of the APIs and to sanity check external contributions. MarkLogic Engineering runs a more comprehensive set of unit, integration, and performance tests internally. To run the unit tests, execute the following command from the marklogic-contentpump/ root directory:

$ mvn test

For detailed information about running unit tests, see Guideline to Run Tests.
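
While working on one component, running a single test class is often quicker than running the whole suite. The sketch below relies on the Maven Surefire plugin's `-Dtest` property. The function name `run_single_test` and any test class name you pass to it are hypothetical and are not taken from this repository.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run one test class inside a single
# component directory using Maven Surefire's -Dtest property.
run_single_test() {
  local component="$1" test_class="$2"
  if [ -z "$component" ] || [ -z "$test_class" ]; then
    echo "usage: run_single_test <component-dir> <TestClass>" >&2
    return 2
  fi
  # Subshell keeps the caller's working directory unchanged.
  (cd "$component" && mvn test -Dtest="$test_class")
}
```

For example, `run_single_test mapreduce SomeTestClass`. Surefire also accepts the `Class#method` form to run a single test method.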

Have a question? Need help?

If you have questions about mlcp or the Hadoop Connector, ask on StackOverflow. Tag your question with mlcp and marklogic. If you find a bug or would like to propose a new capability, file a GitHub issue.

Support

mlcp and the Hadoop Connector are maintained by MarkLogic Engineering and distributed under the Apache 2.0 license. They are designed for use in production applications with MarkLogic Server. Everyone is encouraged to file bug reports, feature requests, and pull requests through GitHub. This input is critical and will be carefully considered. However, we can’t promise a specific resolution or timeframe for any request. In addition, MarkLogic provides technical support for release tags of mlcp and the Hadoop Connector to licensed customers under the terms outlined in the Support Handbook. For more information or to sign up for support, visit help.marklogic.com.

Contributors

jxchen-us, mattsunsjf, itsshivaverma, karshuntsoi, jmakeig