
webhdfs


Provides an interface to WebHDFS operations, leveraging configuration details from the clusterconf package. This includes functions for listing directory contents and for creating, renaming, and deleting directories. These common commands have corresponding shortcut functions in this package (e.g. hdfs_ls, hdfs_makedir, hdfs_rename, hdfs_delete).

For WebHDFS operations that do not have dedicated functions in this package, many commands can still be issued through the lower-level functions hdfs_get and hdfs_put.

Requirements

This package dynamically builds the URL for WebHDFS services based on cluster settings for the namenode, WebHDFS port number, and suffix. Specifically, the settings webhdfs.cluster.nn.url, webhdfs.cluster.webhdfs.port, and webhdfs.cluster.webhdfs.suffix must be defined. It is convenient to do this through the clusterconf package, although it is also possible to set the necessary variables manually through the set_name_node_url, set_webhdfs_port, and set_webhdfs_suffix functions provided in this package.
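For example, a fully manual configuration using the three setter functions named above might look like the following sketch. The hostnames are hypothetical, and the exact argument types accepted by each setter may differ from what is shown.

```r
library(webhdfs)

# Hypothetical namenode hostnames; list every namenode so the
# active one can be discovered at request time
set_name_node_url(c("http://nn1.example.com",
                    "http://nn2.example.com"))

# 50070 is the default WebHDFS port; "webhdfs/v1" is the standard suffix
set_webhdfs_port(50070)
set_webhdfs_suffix("webhdfs/v1")
```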

Note that the namenode setting (webhdfs.cluster.nn.url) is a character vector of length one or more. When the WebHDFS URL is built, the get_webhdfs_url function dynamically checks the provided namenodes for one that is active. If the Hadoop cluster is configured for high availability through namenode failover, one namenode is active and one is in standby mode at all times. Providing both URLs in the namenode setting spares the user from having to know which one is active at the time of the WebHDFS request.

The other settings are expected to be character vectors of length one.
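Taken together, these settings produce request URLs of the standard form defined by the Hadoop WebHDFS REST API (LISTSTATUS shown here as an example operation):

```
http://<active-namenode>:<webhdfs.port>/<webhdfs.suffix>/<hdfs-path>?op=LISTSTATUS
```

With the default port and suffix, a directory listing of /data would therefore resolve to a URL like http://nn1.example.com:50070/webhdfs/v1/data?op=LISTSTATUS (hostname hypothetical).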

Installation

The easiest way is to install the latest development version from GitHub, for example with the devtools package.

devtools::install_github("mitre/webhdfs")

Usage

Here is a simple example to get up and running with this package, where my cluster name corresponds to an HDFS cluster whose configuration is defined using the protocol of the clusterconf package.

library(webhdfs)
set_cluster("my cluster name")
hdfs_ls("/data/")

The above example requires that a clusterconf package for my cluster name (e.g. clusterconf.myclustername) exists and has already been installed. If such a package does not exist, the same capability can be achieved manually by setting the namenode URL(s) as shown below. Suppose that my cluster has its primary namenode at mycluster-nn1.mydomain.com and its backup namenode at mycluster-nn2.mydomain.com. (Also assume that WebHDFS is enabled at the default port 50070 with the standard URL suffix webhdfs/v1.) Then the following commands will reproduce the capability of the block above.

library(webhdfs)
set_name_node_url(c("http://mycluster-nn1.mydomain.com",
                    "http://mycluster-nn2.mydomain.com"))
hdfs_ls("/data/")
