
Seismic Hadoop

Introduction

Seismic Hadoop combines Seismic Unix with Cloudera's Distribution including Apache Hadoop to make it easy to execute common seismic data processing tasks on a Hadoop cluster.

Build and Installation

You will need to install Seismic Unix on both your client machine and the servers in your Hadoop cluster.

To create the jar file that coordinates job execution, run mvn package.

This will create a seismic-0.1.0-job.jar file in the target/ directory, which includes all of the necessary dependencies for running a Seismic Unix job on a Hadoop cluster.
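For example, from the project root (assuming Maven and a JDK are already installed):

mvn package
ls target/seismic-0.1.0-job.jar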

Running Seismic Hadoop

The suhdp script in the bin/ directory may be used as a shortcut for running the commands described below. It requires that the HADOOP_HOME environment variable be set on the client machine.
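For example, you might set the following on the client machine before invoking suhdp (the paths below are illustrative and depend on your installation):

export HADOOP_HOME=/usr/lib/hadoop   # Hadoop installation used by the suhdp script
export CWPROOT=/usr/local/su         # Seismic Unix installation; used when loading SEG-Y and for X Windows commands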

Writing SEG-Y or SU data files to the Hadoop Cluster

The load command to suhdp will take SEG-Y or SU formatted files on the local machine, format them for use with Hadoop, and copy them to the Hadoop cluster.

suhdp load -input <local SEG-Y/SU files> -output <HDFS target> [-cwproot <path>]

The cwproot argument only needs to be specified if the CWPROOT environment variable is not set on the client machine. Seismic Hadoop will use the segyread command to parse a local file unless it ends with ".su".
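For example, loading a local SEG-Y file might look like the following (the file names are hypothetical):

suhdp load -input /data/shots.segy -output shots.su -cwproot /usr/local/su

Because the input file does not end in ".su", segyread is used to parse it before the traces are written to HDFS.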

Reading SU data files from the Hadoop Cluster

The unload command will read Hadoop-formatted data files from the Hadoop cluster and write them to the local machine.

suhdp unload -input <SU file/directory of files on HDFS> -output <local file to write>
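For example, to copy a processed dataset from HDFS back to the local machine (the paths are hypothetical):

suhdp unload -input sorted.su -output /tmp/sorted_local.su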

Running SU Commands on data in the Hadoop cluster

The run command will execute a series of Seismic Unix commands on data stored in HDFS by converting the commands to a series of MapReduce jobs.

suhdp run -command "seismic | unix | commands" -input <HDFS input path> -output <HDFS output path> \
    -cwproot <path to SU on the cluster machines>

For example, we might run:

suhdp run -command "sufilter f=10,20,30,40 | suchw key1=gx,cdp key2=offset,gx key3=sx,sx b=1,1 c=1,1 d=1,2 | susort cdp gx" \
    -input aniso.su -output sorted.su -cwproot /usr/local/su
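For reference, a sketch of the equivalent single-machine Seismic Unix pipeline (run without Hadoop, assuming $CWPROOT/bin is on the PATH) would be roughly:

sufilter f=10,20,30,40 < aniso.su | \
    suchw key1=gx,cdp key2=offset,gx key3=sx,sx b=1,1 c=1,1 d=1,2 | \
    susort cdp gx > sorted.su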

For this example, Seismic Hadoop runs a MapReduce job that applies the sufilter and suchw commands to each trace during the Map phase, sorts the data by the CDP field in the trace header during the Shuffle phase, and performs a secondary sort on the receiver locations within each CDP gather in the Reduce phase. There are a few things to note about running SU commands on the cluster:

  1. Most SU commands are run as-is by the system. The most notable exception is susort, which is performed by the framework but is designed to be compatible with the standard susort command.
  2. If the last SU command in the command argument is an X Windows command (e.g., suximage, suxwigb), the system will stream the results of the pipeline to the client machine, where the X Windows command is executed locally (see the example after this list). Make sure that the CWPROOT environment variable is set on the client machine in order to support this option.
  3. Certain commands that are not trace parallel (e.g., suop2) will not work correctly on Seismic Hadoop. Also, commands that take additional input files will not work properly because the system will not copy those input files to the jobs running on the cluster. We plan to fix this limitation soon.
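For example, ending the pipeline with suximage streams the processed traces back to the client machine, where the viewer is launched locally (the parameters and paths here are illustrative):

suhdp run -command "sufilter f=10,20,30,40 | suximage perc=95" \
    -input aniso.su -output filtered.su -cwproot /usr/local/su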

seismichadoop's People

Contributors

jwills, mortenbpost


seismichadoop's Issues

Processed file not getting re-loaded

Hi,

I'm using an Apache Hadoop cluster + seismic-0.1.0-job.jar.

A SEG-Y file loads properly, and I'm able to perform seismic operations on it, e.g. a Hilbert transform + whitening.

When I unloaded this processed file to the local file system as AGC_Hilbert.segy (signifying that a Hilbert transform has been performed on it) and tried to reload it, I got an error:

/Loading processed file/
./suhdp load -input /home/hd/omkar/AGC_Hilbert.segy -output /sufiles/AGC_Hilbert.su /home/hd/seismicunix

Reading input file: /home/hd/omkar/AGC_Hilbert.segy
BIG_ENDIAN :BIG_ENDIAN

/home/hd/seismicunix/bin/segyread: format not SEGY standard (1, 2, 3, 5, or 8)
1+0 records in
6+1 records out
3200 bytes (3.2 kB) copied, 0.000103235 s, 31.0 MB/s
Bytes read: 0
Callback list size 1
Bytes written: 0
path: /sufiles/AGC_Hilbert.su
path: hdfs://172.25.38.87:9000/user/hd
path: hdfs://172.25.38.87:9000/user/hd
parent: /sufiles

I looked through the code and I think the issue exists because (not sure!) in SegyUnloader.java the part files produced by processing are written directly to a DataOutputStream for the local file WITHOUT enforcing big-endian byte order, unlike SUReader.java.

Thanks and regards !!!

Processing error

Hey,

I was trying to run some commands with seismichadoop and encountered the following error.
It seems there is some mismatch between classes.

Can you please help me resolve it?

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getNamedOutputsList(CrunchMultipleOutputs.java:210)
at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.checkNamedOutputName(CrunchMultipleOutputs.java:197)
at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.addNamedOutput(CrunchMultipleOutputs.java:256)
at org.apache.crunch.io.impl.FileTargetImpl.configureForMapReduce(FileTargetImpl.java:65)
at org.apache.crunch.io.impl.FileTargetImpl.configureForMapReduce(FileTargetImpl.java:50)
at org.apache.crunch.impl.mr.plan.MSCROutputHandler.configure(MSCROutputHandler.java:63)
at org.apache.crunch.io.impl.FileTargetImpl.accept(FileTargetImpl.java:71)
at org.apache.crunch.impl.mr.plan.MSCROutputHandler.configureNode(MSCROutputHandler.java:51)
at org.apache.crunch.impl.mr.plan.JobPrototype.build(JobPrototype.java:138)
at org.apache.crunch.impl.mr.plan.JobPrototype.getCrunchJob(JobPrototype.java:114)
at org.apache.crunch.impl.mr.plan.MSCRPlanner.plan(MSCRPlanner.java:111)
at org.apache.crunch.impl.mr.MRPipeline.plan(MRPipeline.java:144)
at org.apache.crunch.impl.mr.MRPipeline.run(MRPipeline.java:154)
at org.apache.crunch.impl.mr.MRPipeline.done(MRPipeline.java:183)
at com.cloudera.seismic.crunch.SUPipeline.run(SUPipeline.java:152)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cloudera.seismic.segy.Main.main(Main.java:35)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Experiences with large segy files?

Hi,

I'm playing around with this again, and this time I've brought some larger files, more specifically SEG-Y files > 100 GB each. When loading these files using suhdp load it's insanely slow. I'm getting write speeds to HDFS of around 500-600 KB/s. I've tried on vanilla Hadoop, CDH, and also on a CDH cluster of xlarge instances on EC2 with the SEG-Y located locally on the node loading it.

Anyone with similar experience?

Thanks! :)

Log4J vulnerability

Hi all,
A vulnerability (CVE-2021-44228) was announced by the Apache Foundation in their Log4J2 (a.k.a. Log4J v2) logging library. This library is a common way to write log files in Java applications.
Does the Seissee software use Log4J, and is it affected by CVE-2021-44228?
If yes, which versions are impacted?
Does this software require internet access to function?
Regards,
