
reynoldsm88 / spark-testing-base


This project was forked from holdenk/spark-testing-base.


Base classes to use when writing tests with Spark

License: Apache License 2.0

Scala 78.06% Perl 0.92% Shell 1.11% Python 10.83% Java 9.08%

spark-testing-base's Introduction


spark-testing-base

Base classes to use when writing tests with Spark.

Why?

You've written an awesome program in Spark and now it's time to write some tests. Only you find yourself writing the code to set up and tear down local mode Spark in between each suite, and you say to yourself: This is not my beautiful code.
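For illustration, the per-suite boilerplate described above might look roughly like this hand-rolled ScalaTest suite. It is a hypothetical sketch, assuming Spark 2.x and ScalaTest 3.0.x on the test classpath; the suite name, app name, and `local[2]` master are arbitrary choices, not anything prescribed by this project.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Hypothetical suite showing the setup/teardown each Spark suite would otherwise repeat
class HandRolledSparkTest extends FunSuite with BeforeAndAfterAll {
  // A local-mode SparkContext owned by this suite only
  private var sc: SparkContext = _

  override protected def beforeAll(): Unit = {
    super.beforeAll()
    val conf = new SparkConf().setMaster("local[2]").setAppName("hand-rolled-test")
    sc = new SparkContext(conf)
  }

  override protected def afterAll(): Unit = {
    try {
      // Stop the context so that the next suite can start its own
      if (sc != null) sc.stop()
      sc = null
    } finally {
      super.afterAll()
    }
  }

  test("count words") {
    val words = sc.parallelize(Seq("a", "b", "a"))
    assert(words.countByValue() === Map("a" -> 2L, "b" -> 1L))
  }
}
```

Every suite that needs Spark would repeat this `beforeAll`/`afterAll` pair; that repetition is what the base classes in this library are meant to absorb.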

How?

So you include the com.holdenkarau spark-testing-base artifact at version [spark_version]_0.8.0, extend one of the classes, and write some simple tests instead. For example, to include this in an sbt project using Spark 2.2.0:

"com.holdenkarau" %% "spark-testing-base" % "2.2.0_0.8.0" % "test"

or, using Maven:

<dependency>
    <groupId>com.holdenkarau</groupId>
    <artifactId>spark-testing-base_2.11</artifactId>
    <version>${spark.version}_0.8.0</version>
    <scope>test</scope>
</dependency>

For more on how to use it in your code, have a look at the wiki page.
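A minimal sketch of such a test follows, assuming the 0.8.0 release and ScalaTest 3.0.x (where `FunSuite` lives in `org.scalatest`); the suite and test names are hypothetical. `SharedSparkContext` is one of the library's base traits and provides a local-mode `sc` that it starts before and stops after the suite; check the wiki for the full set of base classes in your version.

```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

// Hypothetical suite: SharedSparkContext supplies `sc` and manages its lifecycle,
// so no beforeAll/afterAll code is needed here
class SplitWordsTest extends FunSuite with SharedSparkContext {
  test("split lines into words") {
    val input = List("hi", "hi holden", "bye")
    val expected = List(List("hi"), List("hi", "holden"), List("bye"))
    val result = sc.parallelize(input).map(_.split(" ").toList).collect().toList
    assert(result === expected)
  }
}
```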

The Maven repositories page for spark-testing-base lists the releases available.

Minimum Memory Requirements and OOMs

The default sbt test Java options do not provide enough memory to run many of the tests, because Spark has to be launched in local mode. To increase the available memory, you can add the following to your build.sbt file:

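// Fork a separate JVM for tests so that javaOptions are applied
fork in Test := true
// -XX:MaxPermSize only affects Java 7 and earlier (PermGen was removed in Java 8)
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")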

If you use the Maven Surefire plugin, you can add:

<argLine>-Xmx2048m -XX:MaxPermSize=2048m</argLine>

Note: the specific memory values are examples only (they are the values used to run spark-testing-base's own tests).

Special considerations

Make sure to disable parallel execution.

In sbt you can add:

parallelExecution in Test := false

In Surefire, make sure that forkCount is set to 1 and reuseForks is set to true.
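With sbt 1.1 or later, the sbt test settings from this README can also be written in slash syntax and scoped to the Test configuration. This is a sketch under that assumption, not a recommended configuration: it keeps the same heap sizes but leaves out -XX:MaxPermSize (ignored since Java 8, which removed PermGen) and -XX:+CMSClassUnloadingEnabled (the CMS collector was removed in JDK 14).

```scala
// build.sbt sketch, assuming sbt 1.1+ slash syntax; memory values are illustrative
// Run tests in a forked JVM so Test / javaOptions is honored
Test / fork := true
// Heap settings for the forked test JVM (Spark runs in local mode inside it)
Test / javaOptions ++= Seq("-Xms512M", "-Xmx2048M")
// Run test suites sequentially rather than in parallel
Test / parallelExecution := false
```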

Where is this from?

Much of this code is a stripped-down version of the test suite base classes in Apache Spark, which are not accessible to outside projects. Other parts are inspired by sscheck (ScalaCheck generators for Spark).

Still other parts are built on top of these test suite bases to make your life even easier.

What are some other options?

While we hope you choose our library, https://github.com/juanrh/sscheck, https://github.com/hammerlab/spark-tests, https://github.com/wdm0006/DummyRDD, and others (https://www.google.com/search?q=python+spark+testing+libraries) are also available.

spark-testing-base's People

Contributors

holdenk, mahmoudhanafy, mrpowers, rylanhalteman, zouzias, kaatzee, bryanyang0528, hgiddens, markdessain, nightscape, ponkin, sgt, chiefmanc, mandoz, pchundi, rgarciate, siklosid, juanrh, joshrosen, jackcviers, falloutdurham, felipehummel, felipefzdz, brkyvz, bryanv, borisclemencon

