Exploration of HPC technologies
SciSpark is a project that extends Apache Spark.
scispark_zeppelin contains Zeppelin notebooks that explore climate data manipulation.
To install SciSpark, follow the installation instructions on the wiki page, up to the Zeppelin installation step.
Note: some of the examples below may need more memory than the default settings provide. To avoid memory limitation issues, add the following line to your zeppelin-env.sh:
export SPARK_SUBMIT_OPTIONS="--driver-memory 8G --executor-memory 8G"
You can view the Zeppelin notebooks with the Zeppelin viewer.
Calculate and plot the Canadian yearly precipitation average for 1950 and 1951
Data
The notebook needs the following files:
nrcan_canada_daily_pr_1950.nc
nrcan_canada_daily_pr_1951.nc
Both files are available from:
http://outarde.crim.ca:8083/thredds/dodsC/birdhouse/data/nrcan/nrcan_canada_daily/
You will need a text file listing the paths to the netCDF files, one path per line, in the following format:
/path/to/nrcan_canada_daily_pr_1950.nc
/path/to/nrcan_canada_daily_pr_1951.nc
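As a minimal sketch, the list file can be generated with a short shell loop. This assumes the two netCDF files were saved under /path/to (the placeholder used above) and that the list file is called nrcan_pr_files.txt, a hypothetical name chosen only for illustration; adjust both to your setup.

```shell
#!/bin/sh
# Assumption: directory that holds the downloaded .nc files (placeholder).
data_dir="/path/to"
# Assumption: hypothetical name for the list file; any name works.
list_file="nrcan_pr_files.txt"

# Write one path per line, one line per year needed by the notebook.
for year in 1950 1951; do
    printf '%s/nrcan_canada_daily_pr_%s.nc\n' "$data_dir" "$year"
done > "$list_file"

# Show the result so it can be checked against the expected format.
cat "$list_file"
```

The loop only writes the paths; it does not check that the files exist, so a typo in data_dir would only surface when the notebook tries to open them.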
Once the file is created, set the variable dataListPath in the notebook to its path.