
ECMWF Summer of Weather Code 2020 challenges

software-development machine-learning artifical-intelligence copernicus ecmwf meteorology climate atmosphere fire jupyter julia python


ECMWF Summer of Weather Code 2020

ECMWF Summer of Weather Code is a collaborative programme where each summer several developer teams work on innovative weather- and climate-related open-source software. ESoWC is organised by the European Centre for Medium-Range Weather Forecasts (ECMWF) and supported by Copernicus.




ESoWC 2020 Projects

Congratulations to the 11 teams that were part of ECMWF Summer of Weather Code 2020:

| Project title | Team | Mentors | Supported by |
| --- | --- | --- | --- |
| Elefridge.jl | Milan Kloewer | Miha Razinger, Juan-Jose Dominguez | WEkEO powered by CloudFerro |
| Forecasting Wildfire Danger with Deep Learning | Roshni Biswas, Anurag Saha Roy, Tejasvi S Tomar | Claudia Vitolo, Tianran Zhang | WEkEO powered by CloudFerro |
| ECMWF Conversational Virtual Assistant | Frank Lieber, Michael Kuhn | Anna Ghelli, Helen Setchell, Michela Gusti | European Weather Cloud |
| HPC Performance Profiling Tool | Tiberiu Lepadatu | Olivier Marsden, Michael Lange, Clara Brune | European Weather Cloud |
| Air Quality Observation Classification | Gordan Rates | Miha Razinger, Johannes Flemming | European Weather Cloud |
| Detect Anomaly in Air Quality Station (DAAQS) | Mohit Anand, Kumar Shridar | Miha Razinger, Johannes Flemming | European Weather Cloud |
| Exploring machine/deep learning techniques to detect and track tropical cyclones | Ashwin Samudre | Linus Magnusson, Pedro Maciel | European Weather Cloud |
| Applying AI capabilities to address Operations challenges in ECMWF Products Team | Adithya Niranjan, Aditya Ahuja | Matthew Manoussakis, Peter Dueben | European Weather Cloud |
| Creating Jupyter-based OpenIFS training material | Ayush Prasad | Olivier Marsden, Michael Lange, Adrien Oyono | European Weather Cloud |
| UNSEEN-Open | Timo Kelder | Julia Wagemann, Christel Prudhomme | |

Generous cloud computing resources are provided by:

  • the Copernicus DIAS service WEkEO (powered by CloudFerro)
  • European Weather Cloud

ESoWC 2020 Timeline

1. Application period: 27 Jan - 22 Apr 2020

Browse through the ESoWC 2020 challenges or develop your own project idea. Applications for ESoWC 2020 closed on 22 April 2020.

2. Announcement of selected proposals: 1 May 2020

The final ESoWC 2020 project teams were announced on 1 May 2020.

3. Coding period: 4 May - 30 Sep 2020

The 5-month coding period started 4 May 2020 and lasted until 30 September 2020. During this time, the selected teams worked with experienced mentors and experts in weather, climate, machine learning and cloud computing.
Follow the progress of the projects here on Github.

4. Final ESoWC day @ECMWF: 16 October 2020

The ESoWC day @ECMWF was the final day of the programme, where each team presented their project results. Watch the recordings:


Important links


challenges_2020's Issues

Challenge #12 - Performance-portable implementation of ECRad in Julia

Stream 1 - Weather-related software and applications

IMPORTANT: this challenge is eligible to apply for cloud credits from the European Weather Cloud.

Goal

Performance-portable implementation of ECRad radiation package in Julia

Mentors and skills

  • Mentors: @mlange05 , @marsdeno
  • Skills required
    • Julia programming language
    • GPU optimization (optional)

Challenge description

While traditional compiled programming languages, like Fortran or C/C++, provide the raw performance needed for HPC, modern dynamic languages like Python are often much more concise and user-friendly. The Julia programming language promises the best of both worlds: multiple dispatch, dynamic typing and native interoperability with C and Python, while using just-in-time (JIT) compilation to provide performance. In addition, since compilation is delayed until runtime, Julia code can also leverage hardware-specific compilation optimizations, allowing it to target CPU and GPU architectures from a single code base or with only a small amount of specialization [1].

In this project we aim to evaluate performance portability for Numerical Weather Prediction (NWP) models by porting the ECRad radiation package, a key component of the operational ECMWF model [2], to Julia. The key deliverable of this project is a new Julia-based version of ECRad that can be validated against existing test cases. The performance of the port will be compared to the operational Fortran baseline, and if possible further optimized for GPUs or ARM architectures [3]. The ultimate aim of this project would be a demonstration of production-grade NWP code that is capable of targeting multiple HPC architectures from a single code base.

References

Challenge #11 - Creating Jupyter-based OpenIFS training material

Stream 1 - Weather-related software and applications

IMPORTANT: this challenge is eligible to apply for cloud credits from the European Weather Cloud.

Goal

Create a new set of interactive online training materials for scientific training courses, based on OpenIFS and Jupyter notebooks in the cloud.

Mentors and skills


Challenge description

In this project we are aiming to demonstrate the use of modern cloud technologies to provide interactive online scientific OpenIFS training based on containerized Jupyter notebooks. Building on a previous ESoWC project that demonstrated the feasibility of running OpenIFS in Jupyter notebooks via Python wrappers, we are now looking to productize this solution and integrate it with elements of the ECMWF software stack, such as Metview and the Copernicus Climate Data Store (CDS), enabling a new style of training and teaching materials.

The current set of teaching materials uses a user-compiled binary version of OpenIFS, with some emphasis during the course put on installation. Configuration of OpenIFS runs during courses is performed via complex namelist files, and analysis of results is done in a separate manual workflow. In this project we are aiming to demonstrate an alternative approach that uses online, interactive teaching materials that hide some of the technical details behind a user-friendly Python API, allowing a greater scientific focus of the training material. Leveraging the outcome of a previous ESoWC project that successfully demonstrated how to run OpenIFS via Python wrappers, this project will focus on enhancing this early prototype with additional interfaces for model configuration and output plotting, as well as integrating it with the Copernicus CDS API. The final aim is to create a set of self-contained demonstrator notebooks that can be run via Jupyter, hosted locally or remotely. Hosting on Amazon Web Services (AWS) will be tested.

Challenge #26 - Forecasting wildfire danger using deep learning

Stream 2 - Machine-Learning and Artificial Intelligence

IMPORTANT: this challenge is eligible to apply for cloud credits from WEkEO. Please specify the cloud resources you think will be needed in your proposal.

Goal

The project aims to explore whether a deep learning model could be used to predict wildfire danger at various lead times.

Mentors and skills

  • Mentors: @cvitolo @tianranZH
  • Skills required
    • Experience with Deep Learning
    • Understanding of wildfire danger

Challenge description

The Global ECMWF Fire Forecasting (GEFF) system uses Numerical Weather Predictions to drive a number of empirical models that predict forest fire danger indices up to 10/15 days ahead. The current system, written in Fortran, works in both reanalysis and forecast (deterministic and probabilistic) mode. The most widely used fire danger indices were originally designed by Canadian fire experts and later calibrated to better predict the fire regimes and patterns occurring in Europe. This means the performance of the forecasts varies widely across the world, working well only in regions like North America and Europe. There might be underlying factors and phenomena that are ignored or not well captured by the GEFF system, or outdated inputs, which prevent the fire danger forecasts from performing equally well in other regions of the globe.

A machine learning (deep learning) approach (e.g. U-Net) could be used to explore the relationship between the weather information and the expected fire danger and to gain valuable insights.
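As a toy illustration of learning a weather-to-fire-danger mapping, the sketch below trains a single logistic unit by gradient descent on invented (temperature, humidity, wind) samples. It stands in for the deep model (e.g. U-Net) the project would actually build; all values are hypothetical.

```python
# Toy weather -> fire-danger classifier: one logistic unit trained by
# gradient descent on invented samples. A real attempt would use a deep
# model (e.g. U-Net) on gridded GEFF inputs, not hand-picked points.
import math

samples = [  # (temp_C, rel_hum_%, wind_ms), label: 1 = high danger
    ((35.0, 15.0, 8.0), 1), ((40.0, 10.0, 6.0), 1),
    ((18.0, 80.0, 3.0), 0), ((12.0, 90.0, 5.0), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = [0.0, 0.0, 0.0], 0.0
for _ in range(2000):
    for x, y in samples:
        x_scaled = [x[0] / 40.0, x[1] / 100.0, x[2] / 10.0]
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_scaled)) + b)
        g = p - y                        # gradient of the log-loss
        w = [wi - 0.1 * g * xi for wi, xi in zip(w, x_scaled)]
        b -= 0.1 * g

hot_dry = sigmoid(sum(wi * xi for wi, xi in zip(w, [38 / 40.0, 12 / 100.0, 7 / 10.0])) + b)
cool_wet = sigmoid(sum(wi * xi for wi, xi in zip(w, [15 / 40.0, 85 / 100.0, 4 / 10.0])) + b)
print(hot_dry, cool_wet)  # high vs low predicted danger
```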

This project will focus on the following:

  1. Build a model to forecast fire danger using the same inputs used by GEFF
  2. Explore possible improvements to the model by including additional inputs, e.g. SMOS (soil moisture and vegetation), other relevant remote sensing data, etc
  3. Explore the possibility to extend predictions to longer time scales (e.g. seasonal) 

We would like interested developers to provide a clear implementation plan, including a description of the model to be used and a validation strategy.

Suggested deliverables and milestones:

  1. Build a DL model using the inputs and output of GEFF, reporting accuracy and training speed.
  2. Use external validation data (ground-based meteorological observations, satellite-based products [GFAS]) to evaluate the performance of the DL model and GEFF, and propose potential solutions in the DL model to improve accuracy against the external validation data.
  3. Update the DL model with the solution agreed in deliverable 2 and explore the possibility of extending it to longer time scales.
  4. Final report on the established DL model.

We value proposals that are:

  • clearly described, including a timeline for deliverables and milestones,
  • technically feasible within 4 months,
  • proposing a scientifically-sound approach,
  • applicable to any place on Earth
  • open source, well documented and easy to maintain

Stream 3 - Questions

Propose your own project idea to receive cloud credits from the European Weather Cloud

Stream 3 - Cloud-based weather and climate innovations

IMPORTANT: this challenge is ONLY eligible to apply for cloud credits from European Weather Cloud.

Goal

Do you have a project idea that makes use of ECMWF or Copernicus open data but you would need cloud processing resources to process the data? This is your chance to get access to cloud credits and gain experience in cloud computing.

Mentors and eligibility requirements

  • Mentors:
    • you will get assistance (if needed) on how to use cloud services, and we will follow up on the progress of your project during the coding period
  • Requirements
    • Your project has to be related to a weather, climate or atmospheric application

Who is it for?

  • Are you a student with an idea for your final student project that requires cloud computing resources?
  • Are you interested in geospatial cloud services, but you never had a chance to work with cloud services?

What makes a good proposal?

  • it is well structured and precise
  • the solution proposed is technically feasible within 4 months
  • it contains a clear timeline with milestones and deliverables defined

An example structure could be:

  1. Brief description of the problem to be solved
  2. Proposed solution (including AWS tools, data used and a link to the AWS Simple Monthly Calculator)
  3. Key milestones and deliverables
  4. Timeline
  5. Plans to share the project outcomes

Questions

Post here any question related to Stream 3.

Challenge #21 - Exploring machine/deep learning techniques to detect and track tropical cyclones

Stream 2 - Machine-Learning and Artificial Intelligence

IMPORTANT: this challenge is eligible to apply for cloud credits from the European Weather Cloud.

Goal

From this project we expect to see a program (preferably based on Python 3) that will use a machine/deep learning module to learn to recognize tropical cyclones of different intensities.

The project will consist of two stages:

  1. learn to detect tropical cyclones and their intensities using historical information based on a set of satellite images and the BestTrack (https://www.ncdc.noaa.gov/ibtracs/) database
  2. test whether the algorithm is able to reliably detect tropical cyclones using real-time information (forecasts)

Mentors and skills


Challenge description

Tropical cyclones are among the most devastating weather systems on the planet, and accurate predictions of them are essential for reliable warnings. For these warnings, information about the track of the cyclone and its intensity is required and needs to be extracted from forecasts.
From observations (mainly satellite images), tropical cyclones are manually detected, and the estimated positions and intensities are broadcast and put into the IBTrACS (“BestTrack”) database. The classification of the cyclone intensity is in most cases based on the so-called Dvorak technique.

Numerical weather forecast models iteratively predict a range of variables in a 4-dimensional hypercube/tensor (3 spatial dimensions plus time). A challenge is to find tropical cyclone features in this vast amount of data and to extract the track of each feature. Currently, the applied tracking algorithms are based on the surface pressure, the circulation further up in the atmosphere and the temperature in the core of the cyclone. The algorithms are configured from human experience.
Current weather models have the capability to use the 3-dimensional output to simulate satellite images. This opens up the possibility of using the Dvorak technique on model output as well to classify the cyclones. While the Dvorak technique is based on human experience, we here seek to explore machine learning techniques to recognize and classify tropical cyclones.
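The pressure-based tracking step mentioned above can be sketched as a simple local-minimum search. The grid, pressure values and threshold below are invented for illustration; an operational tracker also uses upper-level circulation and core temperature.

```python
# Illustrative baseline: locate tropical-cyclone candidates as local
# minima in a 2-D surface-pressure field (hPa). Grid and threshold are
# invented for demonstration, not an operational tracker.

def find_pressure_minima(field, threshold=1000.0):
    """Return (row, col) cells below `threshold` that are lower than
    all of their neighbours."""
    rows, cols = len(field), len(field[0])
    minima = []
    for i in range(rows):
        for j in range(cols):
            p = field[i][j]
            if p >= threshold:
                continue
            neighbours = [
                field[x][y]
                for x in range(max(0, i - 1), min(rows, i + 2))
                for y in range(max(0, j - 1), min(cols, j + 2))
                if (x, y) != (i, j)
            ]
            if all(p < q for q in neighbours):
                minima.append((i, j))
    return minima

# Toy 4x4 field with one deep low at (1, 2)
field = [
    [1012.0, 1010.0, 1008.0, 1009.0],
    [1011.0,  998.0,  985.0, 1004.0],
    [1012.0, 1005.0,  999.0, 1007.0],
    [1013.0, 1011.0, 1010.0, 1012.0],
]
print(find_pressure_minima(field))  # -> [(1, 2)]
```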

We propose to explore face recognition techniques to

  1. Train the algorithm to detect a tropical cyclone from a satellite image (real or simulated from analysis data). For training we will use the observed tropical cyclones from the BestTrack database to build up a library of past cyclones matched with satellite images.

  2. In a second stage the outcome from the training period will be used on simulated satellite images from forecasts to track tropical cyclone features and classify the intensity.

If successful, we can in the future consider applying this technique to other meteorological features that lead to severe weather.

More information

Challenge #13 - Interactive visualization of HPC performance data

Stream 1 - Weather-related software and applications

IMPORTANT: this challenge is eligible to apply for cloud credits from the European Weather Cloud.

Goal

Develop interactive analysis tools for visualizing IFS performance data.

Mentors and skills


Challenge description

The continuous integration cycle of the IFS model is able to provide a regular stream of performance data, such as component runtimes, I/O and parallelisation overheads. This performance data is currently gathered and stored in a tabular format that can be converted to common data science formats, such as pandas.DataFrame.

In this project we are aiming to develop a set of interactive visualization tools based on Python visualization packages (e.g. matplotlib [1], Bokeh [2] or Altair [3]) to better track the HPC performance of the model. An initial set of performance metrics, based on the IFS's internal performance monitoring tools (DrHook, GStats, ECProf), will be provided and used to explore and demonstrate different interactive visualization interfaces.
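As a sketch of the data-handling step, assuming the performance stream arrives as simple tabular records (the rows and column names below are invented), one could aggregate per-component runtimes before handing them to an interactive plotting library such as Bokeh or Altair:

```python
# Aggregate per-component runtimes from tabular performance records into
# per-component means -- the kind of summary an interactive dashboard
# would then plot. Records are invented; real data would come from
# DrHook/GStats output.
import csv
import io
from collections import defaultdict

raw = """component,cycle,runtime_s
radiation,47r1,120.5
radiation,47r2,110.0
dynamics,47r1,300.25
dynamics,47r2,290.75
"""

runtimes = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    runtimes[row["component"]].append(float(row["runtime_s"]))

means = {name: sum(v) / len(v) for name, v in runtimes.items()}
print(means)  # -> {'radiation': 115.25, 'dynamics': 295.5}
```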

References

Challenge #24 - A Simple Global Air Quality Data Classification

Stream 2 - Machine-Learning and Artificial Intelligence

Goal

Simple clustering and quality control algorithm to scrutinize air quality observations from different networks worldwide.

Mentors and skills


Challenge description

Data

  • PM2.5, NO2 and ozone observations from the openAQ network (or similar).
  • CAMS operational forecast data for the same species and station locations

What is the current problem/limitation?

CAMS lacks credible surface air quality observations in many parts of the world, often in the most polluted areas, such as India or Africa. Some observations are available for these areas from data-harvesting efforts such as openAQ, but no quality control is applied to the data, and it is often not well known whether the observations are made in a rural, urban or heavily polluted local environment. This information on the environment is important because very locally influenced measurements are mostly not representative of the horizontal scale (40 km) of the CAMS forecasts and should therefore not be used for the evaluation of the CAMS model.

What could be the solution?

  • Use AI clustering techniques to identify classes in the observed AQ data.
  • Identify outliers in the data set and consult with CAMS experts on whether they are erroneous data.
  • Check the classification against metadata such as population statistics.
  • Derive similar clusters from the modelled data and compare against the classification derived from the observed data.
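The clustering bullet above can be illustrated with a minimal 1-D k-means sketch. The PM2.5 values and the choice of k are invented; a real study would cluster multi-species, multi-station feature vectors and bring in station metadata.

```python
# Minimal 1-D k-means: group PM2.5 observations (ug/m3) into regimes.
# Pure Python, invented data -- only a sketch of the clustering idea.
import random

def kmeans_1d(values, k=2, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest centroid
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # recompute centroids (keep old one if a cluster empties)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)

pm25 = [8.0, 9.5, 10.2, 11.0, 85.0, 92.3, 99.1]  # clean vs polluted regimes
print(kmeans_1d(pm25))  # two centroids: one low, one high
```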

Further directions

Investigate the potential to improve CAMS forecasts for major cities worldwide using the information from those observations.

Challenge #25 - Virtual assistant for users of ECMWF online products and services

Stream 2 - Machine-Learning and Artificial Intelligence

Goal

To reduce the number of questions which need answering in person. The AI should ‘converse’ with users to guide them to what they need.

Mentors and skills

  • Mentors: @annaghelli @kiden @MichyG
  • Skills required
    • Knowledge of Dialogflow or similar
    • Understanding of content (indexing, metadata, etc)
    • Confluence and Jira Service Desk and their APIs

Challenge description

Data and systems to be used

  • Dialogflow or similar (virtual assistant)
  • Confluence user doc space (current location of documentation and content to train the virtual assistant)
  • Jira Service Desk (current location of documentation and content to train the virtual assistant)

What is the current problem/limitation?

We want to create an effective and efficient support service for our users. Currently, answering any support question requires human resources, independently of the complexity of the request.

What could be the solution?

The idea is to move some of the basic queries to a self-service offering whereby users can answer their questions helped by a virtual assistant. The basic metadata to train the virtual assistant are available on an internal wiki and ticketing service (Confluence and Jira Service Desk, part of the Atlassian suite).
As an example of the virtual assistant we propose Dialogflow or similar.

Challenge #23 - What is an optimal number of vertical model levels to represent atmospheric trace gases?

Stream 2 - Machine-Learning and Artificial Intelligence

Goal

Finding an optimal vertical resolution that requires a minimum number of vertical levels to represent the vertical structure of atmospheric trace gases (ozone, NO2, CO2, and aerosols), which are simulated by the ECMWF model on 137 model levels. Different reduction scenarios, in the range between 70 and 10 model levels, should be explored.

Mentors and skills

  • Mentors: @miha-at-ecmwf @JohannesFlemming
  • Skills required
    • Spatial (3D) data analysis of large data set
    • Statistical analysis skills
    • Familiarity with the concept of the vertical discretization in meteorological models will be helpful but is not essential

Challenge description

Data

CAMS operational forecast data for ozone, NO2, aerosols and CO2 at the original 137 model levels.

What is the current problem/limitation?

The IFS uses 137 levels in the atmosphere to simulate the weather as well as the transport and the sources and sinks of atmospheric trace gases. The trace gases tend to have pronounced vertical gradients, predominantly close to the surface and, for ozone and NO2, also in the stratosphere. Far fewer than 137 model levels may be sufficient to represent the vertical variability of the trace gases.

What could be the solution?

  • Use data mining techniques to identify the areas of strong typical and episodic vertical variability in the CAMS data. These areas of strong gradients can vary for the different chemical species.
  • Define a set of useful evaluation metrics for the accuracy of the reduced level set. Scientific guidance will be given.
  • Find an optimal common subset of vertical model levels that provides the best cost (loss in accuracy) to benefit (data volume reduction) ratio valid for all species.
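One possible way to frame the subset search is a greedy selection that keeps the levels whose removal would hurt linear interpolation the most. The 10-point profile below is synthetic; a real analysis would use CAMS fields on all 137 levels and the agreed evaluation metrics.

```python
# Greedy sketch of the cost/benefit search: keep the model levels that
# contribute most to reconstructing a vertical profile by linear
# interpolation. Synthetic profile, illustrative only.

def interp(levels_kept, profile, level):
    """Linearly interpolate `profile` at `level` using only kept levels."""
    below = max(l for l in levels_kept if l <= level)
    above = min(l for l in levels_kept if l >= level)
    if below == above:
        return float(profile[below])
    w = (level - below) / (above - below)
    return (1 - w) * profile[below] + w * profile[above]

def greedy_select(profile, n_keep):
    n = len(profile)
    kept = {0, n - 1}  # always keep the top and bottom levels
    while len(kept) < n_keep:
        # add the level with the largest current interpolation error
        worst = max(
            (l for l in range(n) if l not in kept),
            key=lambda l: abs(profile[l] - interp(kept, profile, l)),
        )
        kept.add(worst)
    return sorted(kept)

profile = [0, 1, 2, 4, 8, 16, 8, 4, 2, 1]  # synthetic trace-gas profile
kept = greedy_select(profile, n_keep=5)
max_err = max(abs(profile[l] - interp(kept, profile, l)) for l in range(len(profile)))
print(kept, max_err)  # -> [0, 3, 5, 7, 9] 2.0
```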

Further directions

Apply the method to also reduce the horizontal resolution in areas with weak gradients, such as over the ocean or in the stratosphere.

Challenge #22 - Applying AI capabilities to address Operations challenges in ECMWF Products Team

Stream 2 - Machine-Learning and Artificial Intelligence

Goal

To apply AI capabilities to analyse log data in real-time to be able to predict issues before they occur.

Mentors and skills

  • Mentors: @dueben @Matthew-Manoussakis
  • Skills required
    • Data science experience
    • Strong experience in building AI/ML algorithms
    • Experience in Linux
    • Experience in Python3
    • Experience in applying AIOps ideas on Operational environments would be desirable

Challenge description

Due to the explosion of data in recent years - known as the data avalanche - many companies can no longer cope with the rapid growth in data volumes and the variety of logs produced by their IT environments. On the other hand, ensuring the availability and performance of services is more critical than ever for most businesses.
Leading companies are turning to artificial intelligence for IT operations (AIOps) to analyze data in real time and predict issues before they occur.
This enables them to continuously track and assess the status of their services to improve monitoring and troubleshooting.
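A minimal sketch of the AIOps idea, assuming per-minute error counts as input: flag minutes whose count exceeds a rolling mean-plus-3-sigma threshold. The counts are invented; real input would come from the Web-API/MARS logs stored in Splunk.

```python
# Flag anomalous minutes in a stream of per-minute error counts using a
# rolling mean/std threshold. Counts are synthetic; only a sketch of the
# anomaly-detection step, not a production AIOps pipeline.
import statistics

def flag_anomalies(counts, window=5, n_sigma=3.0):
    """Return indices whose count exceeds mean + n_sigma * std of the
    preceding `window` values."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = statistics.mean(history)
        sigma = statistics.pstdev(history)
        if counts[i] > mu + n_sigma * max(sigma, 1e-9):
            anomalies.append(i)
    return anomalies

errors_per_minute = [2, 3, 2, 4, 3, 2, 3, 40, 3, 2]
print(flag_anomalies(errors_per_minute))  # -> [7]
```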

Our services in brief

The ECMWF Meteorological Archival and Retrieval System (MARS) enables users to retrieve meteorological data in GRIB/NetCDF via:

  • the MARS client on ECMWF computers such as ecgate
  • the Web API service (supported Python client software)

In the Products team, we manage the services above and provide tailored data to Member State users, commercial users and public users.
These services produce massive amounts of multi-structured log files every day, spread across several disparate systems, which contain underused or hidden valuable information.

Project description

Naturally, the scale and complexity of our services and infrastructure make monitoring and troubleshooting an increasing challenge.
The suggested project is exploratory research that investigates how AI/ML techniques can be applied to improve Operations in the Products team.
This would enable our team to proactively understand the behaviour of our services, to take preventative actions manually or ideally through automation, to reduce MTTR and to improve the user experience.
If successful, the developed tools could be extended to improve the operational fidelity of other ECMWF services.

Possible datasets available:

Machine logs produced by Web-API and MARS (stored in Splunk)

Expected Outcomes

  • Documentation
  • Analysis
  • Working Python software

Additional information

Challenge #14 - Size, precision, speed - pick two

Stream 1 - Weather-related software and applications

IMPORTANT: this challenge is eligible to apply for cloud credits from WEkEO.

Goal

Optimising GRIB and NetCDF data encoding methods that we use operationally for CAMS atmospheric composition data at ECMWF.

The work on this project could help us reduce both the volume of data we store in our archive and the amount we disseminate to users, while preserving useful information.


Mentors and skills

  • Mentors: @juanjodd @miha-at-ecmwf
  • Skills required
    • Some knowledge of meteorological data formats (GRIB, NetCDF) and libraries which are used to decode and manipulate them (ecCodes, netcdf, cdo, nco, ..)
    • Some knowledge about data encoding (data packing, accuracy, compression methods)
    • Knowledge of statistical metrics to understand and quantify errors due to different data encoding methods
    • Knowledge of a software library to compute and present the above metrics
    • Familiarity with a Chemical Transport Model (CTM) to be able to better appreciate non-linear aspects of the problem would be beneficial

Challenge description

Data and software

We plan to use the CAMS global real-time forecast dataset and the ecCodes and NetCDF libraries to test different configurations and estimate data encoding errors, plus a software library to compute and present the results (possibly Python/numpy/matplotlib or R).

What is the current problem?

There is a lot of artificial precision in the current data encoding setup, and CAMS data takes a long time to archive and download.

What could be the solution?

We would like to remove artificial precision from the encoded fields without losing useful information. At the same time, we need to be conscious of operational constraints, so that the data encoding and decoding steps do not become prohibitively expensive.

The desired solution would be a combination of data encoding settings and steps that achieves this goal.

Ideas for the implementation

Some things to address: a more appropriate bitsPerValue, log packing, various data compression algorithms, bit grooming.
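As one concrete example of the ideas above, mantissa-bit masking ("bit shaving", a relative of bit grooming) zeroes the trailing mantissa bits of a float32 so the field compresses better. The number of kept bits below is an illustrative choice; a real setup would validate the induced error against the statistical metrics mentioned above.

```python
# Bit shaving sketch: keep only `keep_bits` of the 23 mantissa bits of
# an IEEE-754 float32, zeroing the rest so repeated trailing zeros
# compress well. Illustrative only -- not the operational encoding.
import struct

def shave_float32(value, keep_bits):
    """Keep `keep_bits` of the 23 mantissa bits of a float32."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (shaved,) = struct.unpack("<f", struct.pack("<I", as_int & mask))
    return shaved

print(shave_float32(0.1, 23))  # full float32 precision, ~0.1
print(shave_float32(0.1, 4))   # -> 0.09765625
```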

Relevant publications
