ning / hfind

This project forked from pierre/hfind

Find implementation for hadoop

Home Page: http://www.opengroup.org/onlinepubs/009695399/utilities/find.html

License: Apache License 2.0

Shell 0.28% Java 99.72%

hfind's Introduction

hfind, find(1) implementation for Hadoop

Getting started

Download the latest version of the hfind tarball at:

http://github.com/pierre/hfind/downloads

The tarball contains two files, a shell script and a jar.

You first need to configure hfind to find your Hadoop installation, via the HFIND_OPTS variable:

export HFIND_OPTS="-Dhfind.hadoop.ugi='pierre\,pierre' -Dhfind.hadoop.namenode.url=hdfs://namenode.company.com:9000"

You can then simply run ./hfind from the exploded tarball directory:

[pierre@mouraf ~/downloads]$ ls
metrics.hfind-1.0.0-SNAPSHOT.tar.gz
[pierre@mouraf ~/downloads]$ tar zxvf metrics.hfind-1.0.0-SNAPSHOT.tar.gz
hfind
target/metrics.hfind-1.0.0-SNAPSHOT-jar-with-dependencies.jar
[pierre@mouraf ~/downloads]$ export HFIND_OPTS="-Dhfind.hadoop.ugi='pierre\,pierre' -Dhfind.hadoop.namenode.url=hdfs://namenode.ning.com:9000"
[pierre@mouraf ~/downloads]$ ./hfind /user/pierre -type f -name "file*.dat" -o -type d -name a
/user/pierre/testing/a
/user/pierre/testing/a/b/fileb.dat
/user/pierre/testing/a/c/filec.dat
/user/pierre/testing/a/filea.dat
/user/pierre/testing/a/fileaa.dat

Examples

[pierre@mouraf ~]$ hfind /user/pierre -type f -name "file*.dat" -o -type d -name a
/user/pierre/testing/a
/user/pierre/testing/a/b/fileb.dat

[pierre@mouraf ~]$ hfind /user/pierre -mtime +3 -type d
/user/pierre/2010/09
/user/pierre/2010/09/20
/user/pierre/2010/09/21
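
The first example combines two tests with -o. In POSIX find, juxtaposed primaries (an implicit AND) bind more tightly than -o, so the expression reads as "(regular file named file*.dat) OR (directory named a)". The sketch below reproduces that grouping with the local find(1) on a scratch directory; it does not call hfind, and the directory layout and the notes.txt file are made up for illustration.

```shell
# Hypothetical scratch tree, loosely modelled on the paths shown above.
tmp=$(mktemp -d)
mkdir -p "$tmp/testing/a/b" "$tmp/testing/a/c"
touch "$tmp/testing/a/filea.dat" \
      "$tmp/testing/a/b/fileb.dat" \
      "$tmp/testing/a/c/filec.dat" \
      "$tmp/testing/a/notes.txt"
cd "$tmp" || exit 1

# Implicit AND binds tighter than -o, so this matches
#   (-type f AND -name "file*.dat") OR (-type d AND -name a)
matches=$(find testing -type f -name "file*.dat" -o -type d -name a | LC_ALL=C sort)

echo "$matches"
```

The directories b and c and the file notes.txt are not listed, because they satisfy neither side of the -o.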

Usage

hfind follows the POSIX specification as closely as possible. The specification is documented here:

http://www.opengroup.org/onlinepubs/009695399/utilities/find.html

As of 2010, some options cannot be implemented (e.g. those that deal with symbolic links).

-maxdepth and -d follow the BSD implementation:

 -d      Cause find to perform a depth-first traversal, i.e., directories are visited in post-order and all entries in a
         directory will be acted on before the directory itself.  By default, find visits directories in pre-order, i.e.,
         before their contents.  Note, the default is not a breadth-first traversal.
         This option is equivalent to the -depth primary of IEEE Std 1003.1-2001 (``POSIX.1'').

-maxdepth n
         Descend at most n directory levels below the command line arguments.  If any -maxdepth
         primary is specified, it applies to the entire expression even if it would not normally be evaluated.
         -maxdepth 0 limits the whole search to the command line arguments.
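
As a rough illustration of these two options, the sketch below runs the local find(1) on a scratch directory; it does not call hfind. It uses -depth, which the text above describes as equivalent to -d, because some find implementations warn about -d. The scratch layout is an assumption made for the example.

```shell
# Scratch tree: a/hello and a/b/hello.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/a/hello" "$tmp/a/b/hello"
cd "$tmp" || exit 1

# Default traversal is pre-order: a directory is visited before its contents.
pre_first=$(find a | head -n 1)

# -depth is post-order: the contents are visited first, the directory last.
post_last=$(find a -depth | tail -n 1)

# -maxdepth 0 limits the search to the command-line arguments themselves.
depth0=$(find a -maxdepth 0)

# -maxdepth 1 also visits the direct children of a, but not a/b/hello.
depth1=$(find a -maxdepth 1 | LC_ALL=C sort | tr '\n' ' ')

echo "pre-order, first entry: $pre_first"
echo "post-order, last entry: $post_last"
echo "maxdepth 0: $depth0"
echo "maxdepth 1: $depth1"
```

The order of siblings within a directory depends on the filesystem, which is why the sketch only looks at the first or last entry and sorts the -maxdepth 1 listing.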

See TODO below for work in progress.

Things to remember

A directory size is always reported as zero. Use -empty to find empty directories.

A directory's mtime is the date of the latest modification of any entry directly inside it (one level only). Files in subdirectories can be more recent:

[pierre@mouraf ~]$ find /tmp/a -ls
45404725        0 drwxr-xr-x    4 pierre   wheel         136 Sep 23 17:10 /tmp/a
45404727        0 drwxr-xr-x    3 pierre   wheel         102 Sep 23 17:11 /tmp/a/b
45404730        0 -rw-r--r--    1 pierre   wheel           0 Sep 23 17:11 /tmp/a/b/hello
45404726        0 -rw-r--r--    1 pierre   wheel           0 Sep 23 17:10 /tmp/a/hello

In the previous example, the mtime of /tmp/a is 17:10, although it contains /tmp/a/b/hello, which was modified later (17:11).
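
One practical consequence is that a test such as -mtime +3 -type d can match a directory that still holds recently modified files deeper down. The sketch below shows this with the local find(1) on a scratch directory, like the /tmp/a listing above; it does not call hfind, and the backdated timestamp is chosen only for illustration.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
cd "$tmp" || exit 1

# Backdate a to 2010-09-23 17:10 (illustrative timestamp).
touch -t 201009231710 a

# Creating a file two levels down updates the mtime of a/b, not of a.
touch a/b/hello

# Entries under a that were modified more recently than a itself.
newer=$(find a -newer a | LC_ALL=C sort | tr '\n' ' ')

# a still passes an "older than 3 days" test despite the fresh file below it.
stale=$(find a -maxdepth 0 -mtime +3)

echo "newer than a: $newer"
echo "older than 3 days: $stale"
```

To find directories whose whole subtree is old, the mtime of the nested files has to be checked as well, not only that of the directory.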

TODO

  • Respect parentheses

  • Implement -prune

  • Implement -exec and -ok

  • Implement -perm

  • Implement -depth and -mindepth

Build

mvn assembly:assembly

License

See COPYING file.

Copyright 2010 Ning

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
