
hdfs-file-slurper's People

Contributors

alexholmes, kawaa, killerwhile, miguno


hdfs-file-slurper's Issues

Not able to download the hdfs-file-slurper package

Hi,

I want to transfer files from a Linux drop-zone server to HDFS.
If I am not mistaken, the hdfs-file-slurper package meets my requirement.

Can you help me with the steps to use this component in my application?
I went through the readme.txt, but it is not clear where I can download the package and run mvn package.

Slurper doesn't work with Hadoop 2.2

The Slurper currently packages and includes Hadoop in the distro. The Hadoop jars shouldn't be included in the distro, and we should pick up whatever version is locally installed.
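
One way to pick up the locally installed Hadoop at launch time is sketched below. It is an illustration under assumptions, not the project's actual launcher: `build_classpath` and the `HADOOP_CMD` override are hypothetical names, and the sketch relies only on the standard `hadoop classpath` command, which prints the classpath of the local installation.

```shell
# Hypothetical helper: prepend the Slurper jar to the classpath reported by
# the locally installed Hadoop. HADOOP_CMD is an illustrative override that
# defaults to the `hadoop` command found on PATH.
build_classpath() {
    slurper_jar=$1
    # `hadoop classpath` prints the jars and directories Hadoop itself uses.
    hadoop_cp=$("${HADOOP_CMD:-hadoop}" classpath) || return 1
    printf '%s:%s\n' "$slurper_jar" "$hadoop_cp"
}
```

The printed value could then be passed to `java -cp`, so that the distro ships only the Slurper's own jar.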

Mvn install issue

Hi, I'm hitting an error when trying to run mvn package (on OS X, if that matters):

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.585s
[INFO] Finished at: Wed Oct 16 15:35:58 EDT 2013
[INFO] Final Memory: 6M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hdfs-slurper: Could not resolve dependencies for project com.alexholmes:hdfs-slurper:jar:0.1.5: Failed to collect dependencies for [commons-cli:commons-cli:jar:1.2 (compile), org.apache.commons:commons-exec:jar:1.1 (compile), commons-io:commons-io:jar:2.1 (compile), commons-lang:commons-lang:jar:2.6 (compile), commons-logging:commons-logging:jar:1.1.1 (compile), log4j:log4j:jar:1.2.16 (compile), org.mortbay.jetty:jetty:jar:6.1.26 (compile), org.apache.hadoop:hadoop-core:jar:0.20.2-cdh3u2 (compile), com.hadoop.compression.lzo:hadoop-lzo:jar:0.4.14 (compile), org.apache.hadoop:hadoop-test:jar:0.20.2-cdh3u2 (test), junit:junit:jar:4.9 (test)]: Failed to read artifact descriptor for xmlenc:xmlenc:jar:0.52: Could not transfer artifact xmlenc:xmlenc:pom:0.52 from/to cdh.snapshots.repo (https://repository.cloudera.com/content/repositories/snapshots): Failed to transfer file: https://repository.cloudera.com/content/repositories/snapshots/xmlenc/xmlenc/0.52/xmlenc-0.52.pom. Return code is: 409, ReasonPhrase:The repository 'libs-snapshot-local' rejected the artifact 'libs-snapshot-local:xmlenc/xmlenc/0.52/xmlenc-0.52.pom' due to its snapshot/release handling policy..
Thanks

Missing dependency: jetty

The build of the current 0.1.0 version of the Slurper fails with the following error:

   Tests in error: 
     testCopyFromLocalToDfs(com.alexholmes.hdfsslurper.WorkerThreadTest): org/mortbay/thread/ThreadPool

Enabling debug output (mvn ... -e -X) yields:

-------------------------------------------------------------------------------
Test set: com.alexholmes.hdfsslurper.WorkerThreadTest
-------------------------------------------------------------------------------
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.225 sec <<< FAILURE!
testCopyFromLocalToDfs(com.alexholmes.hdfsslurper.WorkerThreadTest)  Time elapsed: 2.156 sec  <<< ERROR!
java.lang.NoClassDefFoundError: org/mortbay/thread/ThreadPool
    at org.apache.hadoop.hdfs.server.namenode.NameNode$1.run(NameNode.java:367)

The problem is that pom.xml is missing the dependency on org.mortbay.jetty:jetty.

ClassNotFoundException during the maven Build

While trying to build hdfs-file-slurper, the build fails with the following error:

[root@01HW288075 hdfs-file-slurper-master]# mvn clean install
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building HDFS File Slurper 0.1.7
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for commons-cli:commons-cli:jar:1.2 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.apache.commons:commons-exec:jar:1.1 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for commons-io:commons-io:jar:2.1 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for commons-lang:commons-lang:jar:2.6 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for commons-logging:commons-logging:jar:1.1.1 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for log4j:log4j:jar:1.2.16 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.apache.hadoop:hadoop-common:jar:2.2.0 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.apache.hadoop:hadoop-hdfs:jar:2.2.0 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for com.hadoop.gplcompression:hadoop-lzo:jar:0.4.19 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.apache.hadoop:hadoop-test:jar:1.2.1 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for junit:junit:jar:4.9 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[INFO]
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ hdfs-slurper ---
[INFO] Deleting /home/root1/softs/dataslurper/hdfs-file-slurper-master/target
[INFO]
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ hdfs-slurper ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/root1/softs/dataslurper/hdfs-file-slurper-master/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.0.2:compile (default-compile) @ hdfs-slurper ---
[INFO] Compiling 6 source files to /home/root1/softs/dataslurper/hdfs-file-slurper-master/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ hdfs-slurper ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/root1/softs/dataslurper/hdfs-file-slurper-master/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.0.2:testCompile (default-testCompile) @ hdfs-slurper ---
[INFO] Compiling 1 source file to /home/root1/softs/dataslurper/hdfs-file-slurper-master/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ hdfs-slurper ---
[INFO] Surefire report directory: /home/root1/softs/dataslurper/hdfs-file-slurper-master/target/surefire-reports


T E S T S

Running com.alexholmes.hdfsslurper.WorkerThreadTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.156 sec <<< FAILURE!

Results :

Tests in error:
testCopyFromLocalToDfs(com.alexholmes.hdfsslurper.WorkerThreadTest): com/google/common/collect/Interners

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.459s
[INFO] Finished at: Fri Jul 25 14:02:48 IST 2014
[INFO] Final Memory: 15M/108M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project hdfs-slurper: There are test failures.
[ERROR]
[ERROR] Please refer to /home/root1/softs/dataslurper/hdfs-file-slurper-master/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

When I look at the surefire report for com.alexholmes.hdfsslurper.WorkerThreadTest, I get this message:

Test set: com.alexholmes.hdfsslurper.WorkerThreadTest

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.155 sec <<< FAILURE!
testCopyFromLocalToDfs(com.alexholmes.hdfsslurper.WorkerThreadTest) Time elapsed: 0.075 sec <<< ERROR!
java.lang.NoClassDefFoundError: com/google/common/collect/Interners
at org.apache.hadoop.util.StringInterner.<clinit>(StringInterner.java:48)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2108)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2001)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1918)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:893)
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:338)
at com.alexholmes.hdfsslurper.WorkerThreadTest.testCopyFromLocalToDfs(WorkerThreadTest.java:89)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Interners
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
... 38 more
Kindly help me resolve the issue.

Customizable destination via a script - Unable to copy files from LFS to HDFS

I want to dynamically transfer files from the LFS (local file system) to HDFS based on their extension. For that I have written a small shell script that checks the extension of each file so that the file is moved to its specified directory in HDFS. My problem is that the files fail to copy from LFS to HDFS.

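#!/bin/bash
# Walk a directory tree and print the HDFS destination chosen for each file
# based on its extension. Nothing is copied here; the script only prints
# the destination directory.
loop() {
    local i
    for i in "$1"/*
    do
        if [ -d "$i" ]; then
            # Recurse into subdirectories.
            loop "$i"
        elif [ -e "$i" ]; then
            if [[ $i == *.txt ]]; then
                hdfs_dest="hdfs://data/txtfiles/"
                echo "$hdfs_dest"
            elif [[ $i == *.csv ]]; then
                hdfs_dest="hdfs://data/csvfiles/"
                echo "$hdfs_dest"
            else
                # Any other extension has no destination.
                echo "Hello!!"
            fi
        else
            # The glob matched nothing, so "$i" is the unexpanded pattern.
            echo "$i - Folder Empty"
        fi
    done
}
loop "/path/to/source/directory"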
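
A destination script of this kind usually handles one file per invocation rather than walking a directory. The sketch below assumes the contract is "source path in, full HDFS destination path out"; that contract should be checked against the README of the installed Slurper version. `dest_for` is a hypothetical name, and the destination directories are taken from the script above.

```shell
# Hypothetical helper: print the HDFS destination for one local file,
# chosen by extension. Returns non-zero when the extension is unknown.
dest_for() {
    name=${1##*/}   # drop the directory part, keep the file name
    case "$name" in
        *.txt) printf '%s\n' "hdfs://data/txtfiles/$name" ;;
        *.csv) printf '%s\n' "hdfs://data/csvfiles/$name" ;;
        *)     return 1 ;;   # unknown extension: no destination
    esac
}

# Possible use when the source path arrives on standard input (assumption):
#   read -r src && dest_for "$src"
```

Unlike the directory walk, this prints the destination including the file name, and it prints nothing else, so a caller that reads standard output receives exactly one path.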

Slurper does not detect that a file is still being copied

The Slurper starts copying a file as soon as it detects the file in the local file system. It does not wait until the file has been fully written.

I am setting the property below in the configuration file:
VERIFY = true
Please let me know if I'm missing anything.

Note: I am using the WinSCP tool to upload files into the Slurper's input directory.
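
A common way to keep a watcher from picking up partially written files, independent of any Slurper setting, is to upload into a staging directory and rename the file into the watched directory only once the upload has finished. The sketch below is an illustration, not part of the Slurper: `publish_file` and the directory arguments are hypothetical, and it assumes both directories are on the same filesystem so that `mv` is a rename rather than a second copy.

```shell
# Hypothetical helper: copy a file into a staging directory first, then
# rename it into the directory the Slurper watches.
# Assumption: staging_dir and watched_dir are on the same filesystem, so the
# file appears in watched_dir only when it is complete.
publish_file() {
    src=$1
    staging_dir=$2
    watched_dir=$3
    name=${src##*/}
    cp -- "$src" "$staging_dir/$name" || return 1
    mv -- "$staging_dir/$name" "$watched_dir/$name"
}
```

With an upload tool, the equivalent is to upload to the staging directory and then issue a rename or move on the server after the transfer completes.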

Post processing script?

Any possibility to get a PostProcessing script config entry developed?

Also would you clarify what is meant by "Capability to write "done" file after completion of copy"? Does this refer to the "COMPLETE_DIR," config entry?

Build fails at downloading xmlenc 0.52

When trying to build via mvn clean install, it fails at downloading xmlenc 0.52.

$ mvn clean install
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building HDFS File Slurper 0.1.5
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for commons-cli:commons-cli:jar:1.2 is missing, no dependency information available
[WARNING] The POM for org.apache.commons:commons-exec:jar:1.1 is missing, no dependency information available
[WARNING] The POM for commons-io:commons-io:jar:2.1 is missing, no dependency information available
[WARNING] The POM for commons-lang:commons-lang:jar:2.6 is missing, no dependency information available
[WARNING] The POM for commons-logging:commons-logging:jar:1.1.1 is missing, no dependency information available
[WARNING] The POM for org.mortbay.jetty:jetty:jar:6.1.26 is missing, no dependency information available
Downloading: https://repository.cloudera.com/content/repositories/snapshots/xmlenc/xmlenc/0.52/xmlenc-0.52.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/commons-net/commons-net/1.4.1/commons-net-1.4.1.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/tomcat/jasper-runtime/5.5.23/jasper-runtime-5.5.23.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/tomcat/jasper-compiler/5.5.23/jasper-compiler-5.5.23.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/org/codehaus/jackson/jackson-core-asl/1.5.2/jackson-core-asl-1.5.2.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/org/codehaus/jackson/jackson-mapper-asl/1.5.2/jackson-mapper-asl-1.5.2.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/commons-el/commons-el/1.0/commons-el-1.0.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/net/java/dev/jets3t/jets3t/0.6.1/jets3t-0.6.1.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/javax/servlet/servlet-api/2.5/servlet-api-2.5.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/hsqldb/hsqldb/1.8.0.7/hsqldb-1.8.0.7.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/oro/oro/2.0.8/oro-2.0.8.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/org/eclipse/jdt/core/3.1.1/core-3.1.1.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/org/apache/ftpserver/ftplet-api/1.0.0/ftplet-api-1.0.0.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/org/apache/mina/mina-core/2.0.0-M5/mina-core-2.0.0-M5.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/org/apache/ftpserver/ftpserver-core/1.0.0/ftpserver-core-1.0.0.pom
Downloading: https://repository.cloudera.com/content/repositories/snapshots/org/apache/ftpserver/ftpserver-deprecated/1.0.0-M2/ftpserver-deprecated-1.0.0-M2.pom
[WARNING] The POM for junit:junit:jar:4.9 is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.563s
[INFO] Finished at: Mon Sep 23 12:08:16 CEST 2013
[INFO] Final Memory: 6M/238M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hdfs-slurper: Could not resolve dependencies for project com.alexholmes:hdfs-slurper:jar:0.1.5: Failed to collect dependencies at org.apache.hadoop:hadoop-core:jar:0.20.2-cdh3u2 -> xmlenc:xmlenc:jar:0.52: Failed to read artifact descriptor for xmlenc:xmlenc:jar:0.52: Could not transfer artifact xmlenc:xmlenc:pom:0.52 from/to cdh.snapshots.repo (https://repository.cloudera.com/content/repositories/snapshots): Failed to transfer file: https://repository.cloudera.com/content/repositories/snapshots/xmlenc/xmlenc/0.52/xmlenc-0.52.pom. Return code is: 409 , ReasonPhrase:Conflict. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

xmlenc is an indirect dependency through hadoop-core. Is that something which I should poke Hadoop's devs about? (Or is it something on my end?)

Files are getting moved instead of copied

Hi Alex,

I am going through the hdfs-file-slurper API and it is great stuff.

My requirement is a little bit different.

I am transferring files from Hadoop to the LFS. Each file is transferred, but it is then moved to the complete directory. I want to keep the files in the source directory (srcdir) only.

Can you please suggest how to keep the files in srcdir?

I appreciate your help.
