gfdrr / ccdr-tools

Geoanalytics for climate and disaster risk screening

Home Page: https://gfdrr.github.io/CCDR-tools/

Jupyter Notebook 72.89% Python 17.77% Makefile 0.07% CSS 0.13% R 9.13%
climate disaster hazard risk

ccdr-tools's People

Contributors

artessen, connectedsystems, matamadio, pzwsk


ccdr-tools's Issues

Disaster data collection and elaboration

We are mostly leveraging EM-DAT, with some additional stats from Desinventar if the country is covered.

EM-DAT gives subnational reference for each event in a string of values.

[image]

It takes some manual work to extract the subnational stats and map them:

[image]

[image]

It would be nice to produce a script to make this a bit faster.
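A minimal sketch of such a script, assuming the EM-DAT export carries the subnational references in a `Location` column as a comma/semicolon-separated string (column names are assumptions; adjust to the actual export):

```python
import pandas as pd

def explode_locations(df: pd.DataFrame, loc_col: str = "Location") -> pd.DataFrame:
    """Split the EM-DAT location string into one row per subnational
    unit, keeping the event columns, so units can be joined to an
    admin-boundary layer for mapping."""
    out = df.copy()
    out[loc_col] = out[loc_col].str.replace(";", ",").str.split(",")
    out = out.explode(loc_col)                 # one row per listed unit
    out[loc_col] = out[loc_col].str.strip()    # drop stray whitespace
    return out[out[loc_col] != ""]

# toy event record in EM-DAT-like shape
events = pd.DataFrame({
    "DisNo": ["2020-0001"],
    "Location": ["Punjab, Sindh; Balochistan"],
})
print(explode_locations(events))
```

The exploded frame can then be merged on admin-unit name against the boundaries used for mapping (fuzzy matching may still be needed for spelling variants).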

Preview to show custom color scale?

When viewing the result preview, the color scale may be misleading due to outliers.

Suggest using a logarithmic colormap instead:

With absolute values:
image

With a custom color map that maps colors to minimum value, 0.1 quantile, 0.25 quantile, 0.5 quantile and maximum:

image

Although the color bar needs fixing, as the bin values at the lower end get squashed together:

image
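A sketch of how the quantile-based scale could be built with matplotlib's `BoundaryNorm` (the data, quantile choices and colormap here are illustrative, not the actual preview code):

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm

# Skewed data with outliers, standing in for the risk preview raster
rng = np.random.default_rng(42)
data = rng.lognormal(mean=0.0, sigma=2.0, size=(100, 100))

# Bin edges at the minimum, selected quantiles and the maximum
bounds = np.quantile(data, [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0])
cmap = plt.colormaps["viridis"].resampled(len(bounds) - 1)
norm = BoundaryNorm(bounds, cmap.N)

fig, ax = plt.subplots()
im = ax.imshow(data, cmap=cmap, norm=norm)
# 'uniform' spacing gives each bin equal width on the colorbar,
# so the low-end bins are not squashed together
fig.colorbar(im, ax=ax, spacing="uniform")
fig.savefig("preview.png")
```

With `spacing="proportional"` instead, the colorbar segments follow the bin widths, which reproduces the squashed low-end bins noted above.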

TEMP - Global hazard data

The most relevant datasets (updated, high-resolution, of scientific quality) representing extreme events and long-term hazards that were considered for inclusion in the CCDR and other risk-related activities across the Bank are listed below for each hazard, with their pros and cons and suggestions for improvement.

| Geophysical | Hydro-meteorological | Environmental factors |
|---|---|---|
| Earthquake | River flood | Air pollution |
| Tsunami | Landslide | |
| Volcanic activity | Coastal flood | |
| | Tropical cyclones | |
| | Drought | |
| | Extreme heat | |
| | Wildfires | |

Some hazards are modelled using a probabilistic approach, providing a set of scenarios linked to hazard frequency for the period of reference. For the current data availability, this is the case for floods, storm surges, cyclones, heatwaves, and wildfires.
Others, such as landslides, use a deterministic approach, providing an individual map of hazard intensity or susceptibility.

GEOPHYSICAL HAZARDS

Earthquake

Tsunami

Volcanic activity

HYDRO-METEOROLOGICAL HAZARDS

River floods

Flood hazard is commonly described in terms of flood frequency (multiple scenarios) and severity, measured as water extent and related depth modelled over a Digital Elevation Model (DEM). Inland flood events can be split into two categories:

  • Fluvial (or river) floods occur when intense precipitation or snow melt collects in a catchment, causing rivers to exceed capacity and overflow or breach barriers, submerging land, especially along the floodplains.
  • Pluvial (or surface water) floods are a consequence of heavy rainfall, but unrelated to the presence of water bodies. Fast accumulation of rainfall is due to reduced soil absorption capacity or to saturation of the drainage infrastructure, meaning that the same event intensity can trigger very different risk outcomes depending on those parameters. For this reason, static hazard maps based on rainfall and DEM alone should be used with extreme caution.
| Name | Fathom flood hazard maps | Aqueduct flood hazard maps |
|---|---|---|
| Developer | Fathom | WRI |
| Hazard process | Fluvial flood, Pluvial flood | Fluvial flood |
| Resolution | 90 m | 900 m |
| Analysis type | Probabilistic | Probabilistic |
| Frequency type | Return Period (11 RPs) | Return Period (10 RPs) |
| Time reference | Baseline (1989-2018) | Baseline (1960-1999); Projections – CMIP5 (2030-2050-2080) |
| Intensity metric | Water depth [m] | Water depth [m] |
| License | Commercial | Open data |
| Other | Includes defended/undefended option | |
| Notes | Standard for WB analysis | The only open flood dataset addressing future hazard scenarios |
  • Despite missing projections, Fathom modelling has consistently proven the preferred option thanks to its higher quality (better resolution, more recent data, and a more advanced modelling approach). There are, however, important details and limitations to consider for correct use and interpretation of the model. The undefended model (FU) is typically preferred in assessments, since the defended model (FD) does not account for the physical presence of defence measures but approximates the protection standard using GDP as a proxy (FLOPROS database).

  • WRI hazard maps are the preferred choice only when 1) data needs to be open/public, or 2) explicit climate scenarios are required; however, the scientific quality and granularity of this dataset is far from that offered by Fathom – and far from optimal in general (low resolution, old baseline, simplified modelling).

It is important to note that pluvial (flash) flood events are extremely hard to model properly on the basis of global static hazard maps alone. This is especially true for densely populated urban areas, where hazardous water accumulation is often the result of undersized or under-maintained discharge infrastructure. Because of this, while Fathom does offer pluvial hazard maps, their application to pluvial risk assessment is questionable, as they cannot account for these key drivers.

A complementary perspective on flood risk is offered by the Global Surface Water layer produced by JRC from remote sensing data (Landsat 5, 7, 8) over the period 1984-2020. It provides information on all locations ever detected as water: max water level, water occurrence, occurrence change, recurrence, seasonality, and seasonality change. However, this layer does not seem to properly account for extreme flood events, i.e. recorded flood events for the period 1984-2020 most often exceed its extent. Hence it can be used to identify permanent and semi-permanent water bodies, but not to identify the baseline flood extent from past events.

[image]
Global Surface Water layer

Coastal floods (storm surge)

Coastal floods occur when the level of a water body (sea, estuary) rises to engulf otherwise dry land. This happens mainly due to storm surges, triggered by tropical cyclones and/or strong winds pushing surface water inland. As with inland floods, hazard intensity is measured by water extent and associated depth.

| Name | Aqueduct flood hazard maps | Global Flood map |
|---|---|---|
| Developer | WRI-Deltares | Deltares |
| Hazard process | Coastal flood | Coastal flood, SLR |
| Resolution | 1 km | 90 m, 1 km, 5 km |
| Analysis type | Probabilistic | |
| Frequency type | Return Period (10 RPs) | Return Period (6 RPs) |
| Time reference | Baseline (1960–1999); Projections – CMIP5 (2030-2050-2080) | Baseline (2018); Projections – SLR (2050) |
| Intensity metric | Water depth [m] | Water depth [m] |
| License | Open data | Access requested |
| Notes | Includes effect of local subsidence (2 datasets) and flood attenuation. Modelled future scenarios. | Essentially an evolution of the WRI |

The current availability of global datasets is poor, with WRI products (recently updated by Deltares) representing the best option in terms of resolution, time coverage (baseline + scenarios), and water routing, including inundation attenuation to generate more realistic flood extents. The latest version has a much better resolution of 90 m based on MERIT DEM or NASADEM, overcoming WRI limitations for local-scale assessment. Note that Fathom is working to include coastal floods and climate scenarios in the next version (3) of its dataset (expected sometime in 2023/24), which will likely become the best option for risk assessment in the near future.

Additional datasets that have been previously used in WB coastal flood analytics are:

| Name | Coastal flood hazard maps | Coastal risk screening |
|---|---|---|
| Developer | Muis et al. (2016, 2020) | Climate Central |
| Hazard process | Coastal flood | Mean sea level |
| Resolution | 1 km | |
| Analysis type | Probabilistic | |
| Frequency type | Return Period (10 RPs) | One layer per period |
| Time reference | Baseline (1979–2014) | Baseline; Projections |
| Intensity metric | Water depth [m] | Water extent |
| License | Open data | Licensed |
| Notes | The update of Muis 2020 has been considered; however, the available data does not include readily applicable land inundation, only extreme sea levels. | Uses a simple bathtub distribution without flood attenuation; does not simulate extreme sea events. |

Both these models seem to suffer from a simplified bathtub modelling approach, projecting unrealistic flood extents already under baseline climate conditions.

As shown in the figure below, considering the minimum baseline values (least-impact criteria), the flood extent drawn by the Climate Central layer is similar to the baseline RP100 from Muis (middle), with both generously overestimating water spreading inland even under less extreme scenarios. (The location of comparison was chosen because both the Netherlands and Northern Italy are low-lying areas, which are typically the hardest to model.)
In comparison, the WRI layer is far from perfect (it is also a bathtub model), but it seems to apply a more realistic maximum flood extent, which ultimately makes it better suited for application.

[image]
Quick comparison of coastal flood layers over Northern Europe under baseline conditions, RP 100 years.

Sea level rise

Landslide

Landslides (mass movements) are affected by geological features (rock type and structure) and geomorphological setting (slope gradient). Landslides can be split into two categories depending on their trigger:

  • Dry mass movements (rockfalls, debris flows) are driven by gravity and can be triggered by seismic events, but they can also be a consequence of soil erosion and environmental degradation.
  • Wet mass movements can be triggered by heavy precipitation and flooding and are strongly affected by geological features (e.g. soil type and structure) and geomorphological settings (e.g. slope gradient). They do not typically include avalanches.
| Name | Global landslide hazard layer | Global landslide susceptibility layer |
|---|---|---|
| Developer | ARUP | NASA |
| Hazard process | Dry (seismic) mass movement; Wet (rainfall) mass movement | Wet (rainfall) mass movement |
| Resolution | 1 km | 1 km |
| Analysis type | Deterministic | Deterministic |
| Frequency type | none | none |
| Time reference | Baseline (rainfall trigger) (1980-2018) | |
| Intensity metric | Hazard index [-] | Susceptibility index [-] |
| License | Open | |
| Notes | Based on NASA landslide susceptibility layer. Median and Mean layers provided. | Although not a hazard layer, it can be accounted for in addition to the ARUP layer. |

Landslide hazard description can rely on either the NASA Landslide Hazard Susceptibility map (LHASA) or the derived ARUP layer funded by GFDRR in 2019. The latter considers empirical events from the COOLR database and models both the earthquake and rainfall triggers over the existing LHASA map. The metric of choice is the frequency of occurrence of a significant landslide per km², which is however provided as a synthetic index (not directly translatable into a time-occurrence probability).

[image]
Example from the ARUP landslide hazard layer (rainfall trigger, median): Pakistan. The continuous index is displayed in three discrete classes (Low, Medium, High).

Tropical cyclones

Tropical cyclones (including hurricanes, typhoons) are events that can trigger different hazard processes at once such as strong winds, intense rainfall, extreme waves, and storm surges. In this category, we consider only the wind component of cyclone hazard, while other components (floods, storm surge) are typically considered separately.

| Name | GAR15-IBTrACS | IBTrACSv4 | STORMv3 |
|---|---|---|---|
| Developer | NOAA | NOAA | IVM |
| Hazard process | Strong winds | Strong winds | Strong winds |
| Resolution | 30 km | 10 km | 10 km |
| Analysis type | Probabilistic | Historical | Historical, Probabilistic |
| Frequency type | Return Period (5 RPs) | | Return periods (10–10,000 years) |
| Time reference | Baseline (1989-2007) | Baseline (1980-2022) | Baseline (1984-2022) |
| Intensity metric | Wind gust speed [5-sec m/s] | Many variables | Many variables |
| License | Open data | Open data | Open data |

A newer version (IBTrACSv4) was released in 2018 and could be leveraged to generate an updated wind-hazard layer, with better resolution and possibly the inclusion of orography effects. There are several attributes tied to each event; the map shows the USA_WIND variable (maximum sustained wind speed in knots: 0–300 kt) as a general intensity measure.
The STORM database has recently released its new version (STORMv3), which includes synthetic global maps of 1) maximum wind speeds for a fixed set of return periods; and 2) return periods for a fixed set of maximum wind speeds, at 10 km resolution over all ocean basins. In addition, it contains the same set for events occurring within 100 km of a selection of 18 coastal cities, and another for events occurring within 100 km of the capital city of an island.

More recently (2022), simulated tracks for climate change scenarios have been developed, as described in Bloemendaal et al., 2022. Both synthetic tracks and wind speed maps are available.

Drought & Water scarcity

Heat stress

Wildfires

ENVIRONMENTAL FACTORS

Air pollution

Accessing and processing Fathom 3 data

The Fathom 3 purchase is complete and the data currently sits on Dropbox; soon it will be moved to an Azure data lake.
The data is split into 1-degree tiles covering the whole world, for each scenario. A CSV lists the ISO alpha-2 country code for each tile.

We need an automatic selector of country, scenarios, and RPs to download the tiles and feed them directly into the processing.

It is unlikely we will be able to consider the full range of scenarios (280 layers) for each analysis; here is a proposed selection.

| Type | Period | Scenario | Defence |
|---|---|---|---|
| Fluvial | 2020 | SSP 1/2.6 | Undefended |
| Pluvial | 2030 | SSP 2/4.5 | Defended |
| Coastal | 2050 | SSP 5/8.5 | |

Return periods: 5, 10, 20, 50, 100, 200, 500, 1000

Fluvial: 112 global layers
2020: 2x8
2030: 2x3x8
2050: 2x3x8

Pluvial: 56 global layers
Note: there is no pluvial undefended; only the defended option.
2020: 1x8
2030: 1x3x8
2050: 1x3x8

Coastal: 112 global layers
2020: 2x8
2030: 2x3x8
2050: 2x3x8
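A sketch of the tile selector, assuming the index CSV has `tile` and `iso_a2` columns and a simple `type/period/scenario/RP` path layout (both the column names and the path scheme are assumptions about the actual Fathom 3 delivery):

```python
import io
import pandas as pd

def select_tiles(tiles_csv, iso_a2, flood_type, period, scenario, rps):
    """Return the list of layer paths to fetch for one country and one
    type/period/scenario combination, for the selected return periods."""
    tiles = pd.read_csv(tiles_csv)
    ids = tiles.loc[tiles["iso_a2"] == iso_a2, "tile"].unique()
    # one layer per (tile, RP) pair
    return [
        f"{flood_type}/{period}/{scenario}/RP{rp}/{t}.tif"
        for t in ids
        for rp in rps
    ]

# toy index file standing in for the real tile CSV
demo = io.StringIO("tile,iso_a2\nn30e070,PK\nn30e071,PK\nn51e000,GB\n")
layers = select_tiles(demo, "PK", "fluvial", "2050", "SSP2-4.5", [10, 100])
```

The returned list can then feed the downloader and the processing loop directly, so the full-country mosaic never has to be fetched.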

Update notebooks

The notebooks reflect the earlier version of the code, while the parallel version has implemented several improvements.
Update the individual hazard notebooks accordingly.
In particular, check the EAI calculation, as the notebooks seem to use frequency * impact instead of exceedance frequency * impact.
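For reference, the exceedance-frequency calculation can be sketched as a piecewise-trapezoid integration of impact over f = 1/RP; the toy numbers below also show how far it diverges from the naive frequency * impact sum:

```python
import numpy as np

def expected_annual_impact(rps, impacts):
    """EAI by integrating impact over exceedance frequency f = 1/RP
    (trapezoid rule), rather than summing f_i * impact_i directly.
    Note: contributions below the smallest RP and above the largest
    RP are truncated."""
    rps = np.asarray(rps, dtype=float)
    imp = np.asarray(impacts, dtype=float)
    order = np.argsort(rps)          # sort by RP ascending
    f = 1.0 / rps[order]             # exceedance frequency, descending
    i = imp[order]
    df = f[:-1] - f[1:]              # positive frequency steps
    return float(np.sum(0.5 * (i[:-1] + i[1:]) * df))

rps = [10, 100, 1000]
impacts = [100.0, 500.0, 900.0]
eai = expected_annual_impact(rps, impacts)
naive = sum((1.0 / rp) * imp for rp, imp in zip(rps, impacts))
```

With these toy values the trapezoid EAI is about 33.3 while the naive sum gives 15.9, so the two formulas are far from interchangeable.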

Exposure datasets: migrate to other sources

As exposure indicators, we are currently using:

  • Worldpop for population
  • World Settlement Footprint for built-up
  • ESA for land cover (agri land)

Worldpop has shown its limitations on several occasions across CCDR analytics, in terms of both population distribution and total value.
WSF is great at 10 m resolution, but underfunded and with uncertain development prospects; the associated population dataset does not seem to be coming soon.
Moreover, the two datasets are independent from each other, sometimes producing unaligned exposure results for the two indicators.

Meanwhile, GHS (by JRC) has updated its data offering, with better resolution, new height/volumetric building data, new population layers, etc. It also offers projections up to 2030!
I think it will ultimately offer a more consistent analysis, and it certainly seems more future-proof than what we are relying on now.

Add Wildfire hazard [optional feature, only wherever relevant]

Wildfire hazard is currently not included in the screening.

There are some recent options in terms of third-party global datasets to evaluate:

  • Fire danger indices historical data Copernicus
  • Fire burned area from 2001 to present derived from satellite observations Copernicus

A review of existing datasets and methodologies for fire hazard has been previously included in the hazard review document, but needs to be updated.

Wildfire

A wildfire is any uncontrolled burning of biomass and affected man-made assets, which spreads based on environmental conditions. The probability of wildfire occurrence is typically measured by the Fire Weather Index (FWI), possibly in conjunction with a fuel model.

| Name | Global Fire Weather Index | Global fire danger re-analysis (1980–2018) for the Canadian Fire Weather Indices |
|---|---|---|
| Developer | CSIRO | Vitolo et al. |
| Hazard process | Wildfire | Wildfire |
| Resolution | 10 km | |
| Analysis type | Probabilistic | |
| Frequency type | Return Period (3 RPs): 2, 5, 10 years | |
| Time reference | Baseline (36 years) | Baseline (1980-2018) |
| Intensity metric | Fire Weather Index | |
| License | Open data | |
| Notes | | |

The CSIRO dataset (Fig. 8), which drove the wildfire assessment in ThinkHazard and other applications, uses an approach entirely based on fire weather climatology (Fire Weather Index, FWI) to assess both the onset of conditions that allow fires to spread and the likelihood of fire at any point in the landscape. The method applies statistical modelling (extreme value analysis) to a 36-year fire weather climatology from GFWED to estimate the predicted fire weather intensity for specific return period intervals. These intensities are classified using conventional thresholds to provide hazard classes corresponding to conditions that can support problematic fire spread in the landscape, were an ignition and sufficient fuel present.

[image]
Figure 8. CSIRO FWI, RP 30 years.

Despite CSIRO's attempt to rebalance the distribution of hazard classes, the resulting FWI is strongly skewed towards extremes, as shown by the number of countries falling in each hazard rank.

| FWI | Hazard rank | N. of countries |
|---|---|---|
| > 30 | High | 163 |
| 20 – 30 | Medium | 17 |
| 15 – 20 | Low | 2 |
| < 15 | Very low | 92 |

This raster shows values up to 300 (ten times the "high" threshold). The raster uses FWI ranks as averaged from various country studies. According to the CSIRO report, the FWI method does not account for fuel, only the meteorological forcing related to wildfire generation; the only masking applied is for desert areas.

The index is compared to fire frequencies derived from the Global Fire Emissions Database (GFED4, Giglio et al. 2013) for the period 1997–present, shown as an overlay in Fig. 9. The large majority of recorded burning happened within the "high" hazard zones (in red), yet we notice some important discrepancies: general hazard overestimation for the Indian subcontinent and Europe, and underestimation of fire hazard in some northern regions such as North America and Northeast Asia.

[image]
Figure 9. FWI from CSIRO with overlay of pixels burnt over the period 1995 to 2016. Light grey indicates at least 10% of the pixel burnt over the period from 1995 to 2016; black indicates frequent recurrent burning of the entire pixel.

Simply put, meteorological conditions are not sufficient to trigger a wildfire if there is no ignition and no fuel to burn. A fuel layer should be applied to mask out the areas that cannot produce the hazard (e.g. no vegetation). Values from GlobCover 2009 (resolution 300 m) corresponding to vegetation are used to identify potential fuel and applied as a mask to the CSIRO layer. This simple masking of non-vegetated areas produces some improvement, especially in the Indian subcontinent (Fig. 10).

[image]
Figure 10. CSIRO (RP10) masked (white) for non-vegetated areas (300 m).
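The masking step can be sketched with plain NumPy; the GlobCover class codes treated as fuel below are illustrative (check the legend actually used in the analysis), and the real workflow operates on co-registered rasters:

```python
import numpy as np

# GlobCover classes treated as vegetated fuel (assumed set; the exact
# class list used in the analysis may differ)
VEGETATED = {40, 50, 60, 70, 90, 100, 110, 120, 130, 140}

def mask_fwi(fwi, landcover, nodata=np.nan):
    """Set FWI to nodata wherever land cover is not a fuel class.
    Both arrays must be on the same grid."""
    fuel = np.isin(landcover, list(VEGETATED))
    return np.where(fuel, fwi, nodata)

# toy 2x2 grids: 210 = water bodies, 200 = bare areas in GlobCover
fwi = np.array([[30.0, 50.0], [10.0, 80.0]])
lc = np.array([[40, 210], [70, 200]])
masked = mask_fwi(fwi, lc)
```

Cells over water or bare ground come out as NoData, so they drop out of any subsequent zonal statistics.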

To test whether vegetation aggregation and threshold criteria can further improve the filtering, a mask of vegetated areas is produced from GlobCover 2009 on a 10 km grid by flagging as "vegetated" only those cells with more than 10% vegetated area. This is also required to match the resolution of the FWI layer. The vegetation grid is used as a binary mask, meaning that vegetation density is not used as a weight (though that information is stored and available). The effect of this filtering is greatest in the North India/Nepal area, with little to no effect elsewhere (Fig. 11).

[image]
Figure 11. Vegetation aggregated on a 0.7-degree cell with the criterion vegetation >10% of area.
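The aggregation step can be sketched with NumPy block averaging; the toy 4x4 grid and 2x2 blocks below stand in for the 300 m to 10 km aggregation, and the grid is assumed divisible by the block size:

```python
import numpy as np

def vegetated_fraction(veg_binary, block):
    """Average a fine binary vegetation grid over block x block
    windows to get the vegetated fraction per coarse cell."""
    h, w = veg_binary.shape
    return veg_binary.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def vegetation_mask(veg_binary, block, threshold=0.10):
    # flag a coarse cell as vegetated when >10% of its area is vegetated
    return vegetated_fraction(veg_binary, block) > threshold

veg = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
])
mask = vegetation_mask(veg, block=2)
```

Because `vegetated_fraction` is kept separate, the density information stays available if one later wants to weight rather than threshold, as the text notes.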

Since filtering through the vegetation mask does not look sufficient to fix the apparent overestimation of fire hazard in the CSIRO layers, new global datasets were explored and compared. Updated fire indices from Vitolo et al. (2019) are aggregated from 38 years of global reanalysis of wildfire danger (Fig. 12). Both the dataset used to produce the analysis and the final products are available for download.

[image]
Figure 12. FWI 100-year mean (1980-2018) from Vitolo et al., masked for vegetation <10%.

The whole dataset consists of seven indices, each of which describes a different aspect of the effect that fuel moisture and wind have on fire ignition probability and, if a fire starts, its behaviour. Three indices measure fuel moisture: Fine Fuel Moisture Code (FFMC), Duff Moisture Code (DMC), and Drought Code (DC). From these, the FWI model generates two fire behaviour indices: Initial Spread Index (ISI) and Build Up Index (BUI). The model then generates the Fire Weather Index (FWI) and Daily Severity Rating (DSR). For convenience, each index is archived separately. All datasets are calculated on a daily time step by interpolating the atmospheric fields at local noon, when fire conditions are considered to be at their worst. Fig. 13 shows how much larger the CSIRO values are compared to Vitolo's. Even in areas that both datasets rank as high class, the difference in value is enormous.

[image]
Figure 13. Difference in the FWI index, calculated as CSIRO(value) – Vitolo(value).

To better understand how the hazard ranking matches observations, the FWI is compared with a fire density map (fig. 14) produced from the NASA MODIS fire archive M6 (2000 to present) and distributed by FIRMS. Points representing fire events are counted on the same grid as the FWI; only the "vegetation fire" type is considered (Type = 0). The confidence value (0-100) can also be used to filter out uncertain events; a threshold of 30% confidence is applied, reducing the sample from 42 to 38 thousand records. FRP (Fire Radiative Power, expressed in MW) depicts the pixel-integrated fire radiative power and can potentially be used as a weight for event severity.

[image]
Figure 14. Point density map of vegetation fire events (confidence >30%) from MODIS remote sensing, using Fire Radiative Power as the unit of intensity.
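The event filtering can be sketched as a simple pandas filter; the `type`, `confidence` and `frp` column names follow the FIRMS MODIS archive convention, but check the actual export headers:

```python
import pandas as pd

def filter_fires(df, min_confidence=30, fire_type=0):
    """Keep vegetation fires (type 0) above the confidence threshold,
    as in the text: confidence > 30%."""
    keep = (df["type"] == fire_type) & (df["confidence"] > min_confidence)
    return df.loc[keep]

# toy records standing in for the FIRMS archive
fires = pd.DataFrame({
    "type": [0, 0, 2, 0],          # 0 = presumed vegetation fire
    "confidence": [80, 20, 95, 45],
    "frp": [12.5, 3.1, 40.0, 7.8], # Fire Radiative Power [MW]
})
veg_fires = filter_fires(fires)
```

The retained `frp` column can then be used to weight the point-density grid by event severity, as suggested above.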

The MODIS map appears consistent (at least in relative terms) with the one from GFED4 (fig. 15).
[image]
Figure 15. Burned ground from GFED4.

In both cases, we notice some important differences when comparing empirical fire maps against the FWI rankings; see central Africa in fig. 16 as an example.

[image]
Figure 16. Comparing MODIS event grid and Vitolo FWI index.

One partial explanation is that the MODIS fire archive counts agricultural fires as vegetation fires, as found when comparing the vegetation mask against MODIS events. The vegetation map masks the MODIS grid almost perfectly; one notable exception is the Punjab region, which is excluded because it is identified as post-flooding agricultural land. The high number of events there are waste fires from agricultural activity, as confirmed by NASA, and the same applies in central Africa. These are fires that require no FWI severity or natural fuel to happen, which poses an issue for using these observed fires as validation for the FWI. Further details about wildfire data and comparison are found in a dedicated doc.

Align data analytics pipeline and risk dashboard

The objective is to plug country risk outputs from CCDR analytics into a global dashboard.

[image]

To do so effectively, we want to make the fewest possible changes between the output of the Python analytics and the input required by the dashboard.

I think we already agree on the list of output fields to plot, and mostly on the way to plot them.

| Field | Description |
|---|---|
| ADM0_CODE | Unique identifier |
| ADM1_CODE | Unique identifier |
| ADM2_CODE | Unique identifier |
| ADM3_CODE | Unique identifier |
| ADM3_NAME | ADM unit name |
| ADM2_NAME | ADM unit name |
| ADM1_NAME | ADM unit name |
| ADM0_NAME | ADM unit name |
| ADM4_pop | Total population count |
| ADM4_builtup | Total built-up extent (ha) |
| ADM4_agr | Total agricultural land (ha) |
| FL_pop_EAI | Expected mortality from river floods (population count) |
| FL_pop_EAI% | Expected mortality from river floods (% of ADM3 population) |
| FL_builtup_EAI | Expected damage on built-up from river floods (hectares) |
| FL_builtup_EAI% | Expected damage on built-up from river floods (% of ADM3 built-up) |
| FL_EAE_agri | Expected damage on agricultural land from river floods (hectares) |
| FL_EAE_agri% | Expected damage on agricultural land from river floods (% of ADM3 agricultural land) |
| CF_pop_EAI | Expected mortality from coastal floods (population count) |
| CF_pop_EAI% | Expected mortality from coastal floods (% of ADM3 population) |
| CF_builtup_EAI | Expected damage on built-up from coastal floods (hectares) |
| CF_builtup_EAI% | Expected damage on built-up from coastal floods (% of ADM3 built-up) |
| CF_EAE_agri | Expected damage on agricultural land from coastal floods (hectares) |
| CF_EAE_agri% | Expected damage on agricultural land from coastal floods (% of ADM3 agricultural land) |
| DR_S1_30p | Frequency of agricultural stress affecting at least 30% of arable land during Season 1 (percentage of historical period 1984-2022) |
| DR_S2_30p | Frequency of agricultural stress affecting at least 30% of arable land during Season 2 (percentage of historical period 1984-2022) |
| SW_BU_EAI | Expected annual impact from tropical cyclone strong winds on built-up (hectares) |
| SW_BU_EAI% | Expected annual impact from tropical cyclone strong winds on built-up (% of ADM3 built-up) |
| LS_pop_C3 | Population within landslide hazard zone class 3 (high) |
| LS_builtup_C3 | Built-up within landslide hazard zone class 3 (high) |

Ranking of risk: combining indices, comparability in time and space

In a couple of works along the CCDR (Caribbean) we were asked to:

  1. express the individual risk scores in some combined metric;
  2. rank the risk in a way that is comparable with other countries or future periods (i.e. a risk growing from "medium" today to "high" in the future).

For option 1, we thought of this:

  • Consider the EAI or EAE, both as absolute value and as % of the total, for each exposure category within a hazard (e.g. FL_Pop_EAI, FL_Pop_EAI%, FL_Builtup_EAI, FL_Builtup_EAI%, FL_Crops_EAE, FL_Crops_EAE%)
  • Normalise each column 0-1 using its max and min (excluding zeros)
  • Calculate the GEOMEAN of the normalised scores: we obtain a new 0-1 score indicating the intra-country risk ranking for each hazard (actually using the double-inverse GEOMEAN calculation to make it conservative towards the max)
  • In the same way, calculate the geomean of each hazard's normalised score to produce a general risk ranking
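The steps above can be sketched as follows (toy columns; the double-inverse geometric mean is 1 - geomean(1 - x), which pulls the combined score towards the maximum component):

```python
import numpy as np
import pandas as pd

def normalise(col):
    """Min-max normalise to 0-1, excluding zeros from min/max
    (assumption: zeros mean 'no exposure' and stay at 0)."""
    nz = col[col > 0]
    out = (col - nz.min()) / (nz.max() - nz.min())
    return out.clip(0.0, 1.0)

def risk_score(df):
    """Double-inverse geometric mean of the normalised columns:
    1 - geomean(1 - x), conservative towards the max."""
    norm = df.apply(normalise)
    return 1.0 - (1.0 - norm).prod(axis=1) ** (1.0 / norm.shape[1])

scores = risk_score(pd.DataFrame({
    "FL_Pop_EAI%": [0.0, 10.0, 5.0],
    "FL_Builtup_EAI%": [0.0, 2.0, 8.0],
}))
```

Note how any row containing the maximum of one column scores 1.0: that is the conservative behaviour of the double inverse (a plain geomean would instead collapse to 0 whenever any component is 0).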

This ranking is only useful to set priorities within a country. In this sense, a "low" risk score wouldn't necessarily mean the risk is low in absolute terms; the same non-normalised value could correspond to "high" risk in another country.
It is also not useful for comparing against the future of the same country, as the future scores would again be normalised 0-1 (i.e. you can't turn the amp to 11).

To tackle this, after discussion with OECS people, I came up in extremis with option 2: simply expert-based thresholds for each hazard, accounting only for the relative values (EAI% and EAE%). Each hazard-by-exposure pair has its own threshold, purely expert-based, accounting both for the data distribution and for general rules of thumb (generalisation potential). The individual scores are not combined. I reckon this is very sensitive to ADM size, i.e. an EAI of 50% could correspond to one person in some units and to 10,000 in others.

[image]

But I haven't had the chance to think about it much more, and I'm already well over the agreed contract time.

Any suggestion is welcome, as I feel it won't be the last time we get this kind of request.

Impact function used in classification?

So I tried to create a new notebook for heat stress.
It requires classification only, with multiple RPs, so I stripped out the code paths related to the impact function, including the lines that specify impact_array.

        if exp_cat_dd.value == 'pop':
            impact_array = mortality_factor(fld_array)
        elif exp_cat_dd.value == 'builtup':
            impact_array = damage_factor_builtup(fld_array)
        elif exp_cat_dd.value == 'agri':
            impact_array = damage_factor_agri(fld_array)

But these are required to build impact_rst, which seems to be used in both procedures:

        # Create raster from array
        impact_rst = xr.DataArray(np.array([impact_array]).astype(np.float32), 
                                  coords=hazard_data.coords, 
                                  dims=hazard_data.dims)
        
        if save_inter_rst_chk.value:
            impact_rst.rio.to_raster(os.path.join(OUTPUT_DIR, f"{country}_LS_{rp}_{exp_cat}_hazard_imp_factor.tif"))

Please clarify: am I missing something? impact_array should not be part of the classification approach!
I checked the classification results and I'm pretty sure they are correct (no function used), but then why doesn't the script work if no function is specified? :(

The expected notebook for this has:

  • NO choice of analytical procedures, just class intervals
  • Only combines the selected EXP category with the specified hazard classes (bins); no impact function!
  • Individual global hazard layers (3 RPs) instead of country clips
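One way to untangle the two paths is to compute the impact raster only on the "Function" branch, so classification never touches `impact_array`. A simplified sketch (plain NumPy standing in for the xarray rasters; names illustrative, not the notebook's actual structure):

```python
import numpy as np

def process_hazard(hazard, analysis_type, bin_edges=None, impact_fn=None):
    """Classification bins the hazard values directly; the impact
    function (and hence impact_array/impact_rst) is only needed on
    the 'Function' path."""
    if analysis_type == "Classes":
        # np.digitize assigns each cell to a class interval
        return np.digitize(hazard, bin_edges)
    if analysis_type == "Function":
        if impact_fn is None:
            raise ValueError("'Function' analysis requires an impact function")
        return impact_fn(hazard)
    raise ValueError(f"unknown analysis_type: {analysis_type}")

# classification-only call: no impact function anywhere
classes = process_hazard(np.array([0.5, 1.5, 3.0]), "Classes", bin_edges=[1.0, 2.0])
```

With this guard in place, a heat-stress notebook can call the "Classes" path with its bins and never define the mortality/damage factors at all.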

Framework for Return Period calculation

Some hazard layers are produced from annual data as individual "total" or "mean" value layers, such as:

  • Drought (frequency of events over an exposure threshold)
  • Air pollution (mean of mean annual values, see also #17 )
  • New heat indices (see #14)

To improve the representativeness of these data, we could develop an approach that derives a probabilistic representation of the hazard, in terms of multiple return periods, from a long series of observed or simulated past records.
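As a starting point, empirical return periods can be derived from an annual-maxima series using Weibull plotting positions, RP = (n + 1) / rank; a sketch with synthetic data (a fitted distribution, e.g. GEV, would be the natural next step for extrapolating beyond the record length):

```python
import numpy as np

def empirical_return_periods(annual_max):
    """Rank annual maxima in descending order and assign each the
    Weibull plotting-position return period RP = (n + 1) / rank."""
    x = np.sort(np.asarray(annual_max, dtype=float))[::-1]  # descending
    n = x.size
    rp = (n + 1) / np.arange(1, n + 1)
    return rp, x

def value_for_rp(annual_max, target_rp):
    """Interpolate the hazard value for a target return period
    (only valid within the range of the observed record)."""
    rp, x = empirical_return_periods(annual_max)
    return float(np.interp(target_rp, rp[::-1], x[::-1]))

series = np.arange(1.0, 40.0)  # 39 synthetic annual maxima
v20 = value_for_rp(series, 20.0)
```

Repeating this per pixel over the annual series would turn a single "mean" layer into a small stack of RP layers, matching the structure already used for floods and cyclones.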

CCDR development plan

See also Trello board

OBJECTIVES

Updated 07/2023

More efficient spatial processing to work on large countries at high resolution, with better user control on the input data and integrated output presentation. Align the climate indices to new CCKP service features.

NEW TOOL FEATURES

  • Code optimisation, data splitting and parallel processing
  • Uniformed interface with more user customisation
  • Risk Dashboard
  • Direct API access to data where allowed

DATA UPDATE

Hazard geodata

  • Full-range probabilistic analysis (all RPs considered)
  • Flood:
    • Damage to crops: check paper
    • Framework for geostatistical model comparison (i.e. quick maps and charts)
  • Coastal flood
    • New Deltares model 90 m (via STAC catalogue)
  • Drought:
    • differentiate drought types, each with an appropriate index:
      • Agricultural: ASI
      • Climatological: SPEI
      • Socio-economic: WCI or others
      • Energy production?
    • easier interpretation of results and projections - see #19
    • turn into probabilistic - see Veldkamp 2022
  • Heat:
    • New global layers proposed to be developed by Vito
    • Switch to or add the UTCI (new Copernicus dataset)
  • Wildfires - see #26
  • Air pollution
    • Frequency of days over threshold instead of period mean (#17)
  • Aridity - add new layer
  • Explore any useful stuff in the Planetary data catalogue

Exposure geodata

Vulnerability

  • Vulnerability functions
    • Air pollution - better mortality rates or function (#17)
    • #18

Poverty

  • Poverty data: compare DHS, RWI and census-based indices
  • New Poverty Mapping by UNICEF AI4D

Climate indices

Past disasters

Analytical approach

  • Investigate GEE potential for cloud processing

Data presentation

Extend to more return periods

I have been trying to add more hazard layers (return periods) in the flood analysis.

valid_RPs = [5, 10, 20, 50, 75, 100, 200, 250, 500, 1000]

But the output still only covers RP 10, 100 and 1000, because the output structure is hard-coded:

    if analysis_type == "Function":
        # Sum all EAI to get total EAI across all RPs
        result_df.loc[:, f"{exp_cat}_EAI"] = result_df.loc[:, result_df.columns.str.contains('_EAI')].sum(axis=1)

        # Calculate Exp_EAI% (percent of exposure affected per year)
        result_df.loc[:, f"{exp_cat}_EAI%"] = (result_df.loc[:, f"{exp_cat}_EAI"] / result_df.loc[:, f"{adm_name}_{exp_cat}"]) * 100.0

        # Reorder - need ADM code, name, and exp at the front regardless of ADM level
        result_df = result_df.loc[:, all_adm_code_tmp + all_adm_name_tmp +
                                  [f"{adm_name}_{exp_cat}", f"RP10_{exp_cat}_tot", f"RP100_{exp_cat}_tot", f"RP1000_{exp_cat}_tot",
                                   f"RP10_{exp_cat}_imp", f"RP100_{exp_cat}_imp", f"RP1000_{exp_cat}_imp",
                                   "RP10_EAI", "RP100_EAI", "RP1000_EAI", f"{exp_cat}_EAI", f"{exp_cat}_EAI%", "geometry"]]

The output format needs to be aligned with the custom RP selection.
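One way to do this is to build the column order from valid_RPs instead of hard-coding it. A sketch, using illustrative stand-in values for exp_cat, adm_name and the ADM column lists:

```python
# Build the output column order dynamically from valid_RPs, instead of
# hard-coding RP10/RP100/RP1000. Values below are illustrative stand-ins
# for the variables in the notebook.
valid_RPs = [5, 10, 20, 50, 75, 100, 200, 250, 500, 1000]
exp_cat = "Pop"
adm_name = "ADM2"
all_adm_code_tmp = ["ADM2_CODE"]
all_adm_name_tmp = ["ADM2_NAME"]

tot_cols = [f"RP{rp}_{exp_cat}_tot" for rp in valid_RPs]
imp_cols = [f"RP{rp}_{exp_cat}_imp" for rp in valid_RPs]
eai_cols = [f"RP{rp}_EAI" for rp in valid_RPs]

col_order = (all_adm_code_tmp + all_adm_name_tmp
             + [f"{adm_name}_{exp_cat}"]
             + tot_cols + imp_cols + eai_cols
             + [f"{exp_cat}_EAI", f"{exp_cat}_EAI%", "geometry"])
# result_df = result_df.loc[:, col_order]
```

The final reordering then works for whatever RP set the user selects.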

Curves for flood damage on crops

We could try to produce something based on this review:

https://link.springer.com/article/10.1007/s11069-022-05791-0

[image]

Although it mainly covers rice crops.

Vulnerability curves for crops other than cereals should be implemented, given the economic importance of perennial crops and vegetables. Functions for forage crops (alfalfa, pastures or similar) could be useful to evaluate the impacts of extreme events on livestock, but have not been considered in any of the reviewed studies.

The effect of extremes on the different crop growth stages should be assessed by including field observations in the analysis, rather than relying only on crop model results.

Coastal flood - wrong CSV output

The GeoPackage is correct, while the CSV assigns numbers to the wrong admin units.
This happens with the Function selection. I don't see how they can differ, since both come from the same dataframe.

    if analysis_type == "Function":
        no_geom.to_csv(os.path.join(OUTPUT_DIR, f"{country}_CF_{adm_name}_{exp_cat}_EAI.csv"), index=False)
        result_df.to_file(os.path.join(OUTPUT_DIR, f"{country}_CF_{adm_name}_{exp_cat}_EAI.gpkg"))
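A possible safeguard, as a sketch: derive the CSV table from result_df immediately before writing, so the two outputs cannot diverge through earlier re-sorting or reindexing of a stale no_geom copy. write_outputs is a hypothetical helper name, not part of the tool:

```python
import pandas as pd

def write_outputs(result_df, csv_path):
    # Drop the geometry column right before export, instead of reusing an
    # earlier no_geom copy that may have been re-sorted or reindexed
    no_geom = result_df.drop(columns="geometry")
    no_geom.to_csv(csv_path, index=False)
    return no_geom
```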

Parallel code refinement

Parallelization WORKS on Linux and Win! Thanks @artessen and @ConnectedSystems for this magic!

Some issues remain to be solved:

  • It works for the function approach, but not for the classes approach
  • The code can be more efficient: we don't need the EAI calculation as a raster. EAI is computed on the table output, after zonal aggregation, and presented as an output chart. See #2

Auto-collection of Population data layers

LOW PRIORITY

Right now all data must be fed by hand.
A prototype for auto-fetching from the API exists for WorldPop; it only needs refinement with a "year" selector.

It should be something like:

    if exp_cat_dd.value == 'pop':
        year = year_pop_dd.value
        exp_ras = f"{DATA_DIR}/EXP/{country}_WPOP{year}.tif"
        if not os.path.exists(exp_ras):
            # do the magic stuff (API harvesting, below)
            ...

The magic stuff (api harvesting):

    # Load or save ISO3 country list
    iso3_path = os.path.join(DATA_DIR, "cache/iso3.json")
    if not os.path.exists(iso3_path):
        resp = json.loads(requests.get(f"https://www.worldpop.org/rest/data/pop/wpgp?iso3={country}").text)

        with open(iso3_path, 'w') as outfile:
            json.dump(resp, outfile)
    else:
        with open(iso3_path, 'r') as infile:
            resp = json.load(infile)

    # TODO: Download WorldPop data from API if the layer is not found (see except before)
    # Target population data files are extracted from the JSON list downloaded above
    metadata = resp['data'][1]
    data_src = metadata['files']

    # Save population data to cache location
    for data_fn in tqdm(data_src):
        fid = metadata['id']
        cache_fn = os.path.basename(data_fn)

        # Look for indicated file in cache directory
        # Use the data file if it is found, but warn the user. 
        # (if data is incorrect or corrupted, they should delete it from cache)
        if f"{fid}_{cache_fn}" in os.listdir(CACHE_DIR):
            warnings.warn(f"Found {fid}_{cache_fn} in cache, skipping...")
            continue

        # Write to cache file if not found
        with open(os.path.join(CACHE_DIR, f"{fid}_{cache_fn}"), "wb") as handle:
            response = requests.get(data_fn)
            handle.write(response.content)

    # Run analysis

I am discussing with DLR to make the same thing possible with WSF19 and WSF-Evo data, which would be great for calculating the change in risk across years.

Heat stress aggregation

We want to condense results into one significant measure per ADM unit; the function approach does this nicely with EAI.
For the classification approach, we need to find an equivalent.

My idea for heat stress is to aggregate as Expected Annual Exposure (EAE), weighting each RP by its annual frequency 1/RP. It can be calculated after # End RP loop as:

    EAE = affected_exp_RP5 / 5 + affected_exp_RP20 / 20 + affected_exp_RP100 / 100
    EAE% = EAE / ADM_population
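A minimal sketch of the EAE aggregation, assuming the affected exposure per return period is already computed (all numbers are illustrative):

```python
# RP -> affected population (illustrative values)
affected_exp = {5: 120_000, 20: 80_000, 100: 30_000}
adm_population = 1_000_000

# Each return period contributes its affected exposure weighted by the
# annual frequency 1/RP
EAE = sum(exp_val / rp for rp, exp_val in affected_exp.items())
EAE_pct = EAE / adm_population * 100
```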

Plot Exceedance Frequency Curve

Use the results to plot a chart of Annual Exceedance Probability and related EAI at the end, above or below the map:

[image]

  • The total exposure curve (blue) has RP frequency on X and total exposure on Y.
  • The total impact curve (orange) has RP frequency on X and impacted exposure on Y, with a label for total EAI.

As in:
[image]
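A sketch of such a chart with matplotlib; all numbers are illustrative, and EAI here is taken as the sum of per-RP impact weighted by annual frequency 1/RP, consistent with the EAI columns of the result table:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt

RPs = [10, 100, 1000]
freq = [1 / rp for rp in RPs]              # annual exceedance probability (X axis)
total_exp = [500_000, 520_000, 530_000]    # total exposure per RP (illustrative)
impacted_exp = [20_000, 90_000, 200_000]   # impacted exposure per RP (illustrative)

# EAI as the sum of per-RP impact weighted by annual frequency
EAI = sum(imp / rp for rp, imp in zip(RPs, impacted_exp))

fig, ax = plt.subplots()
ax.plot(freq, total_exp, color="tab:blue", label="Total exposure")
ax.plot(freq, impacted_exp, color="tab:orange", label=f"Impact (EAI = {EAI:,.0f})")
ax.set_xlabel("Annual exceedance probability")
ax.set_ylabel("Population")
ax.legend()
```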

Drought hazard: better indicator and projections

The drought frequency analysis is currently based on the FAO Agricultural Stress Index (ASI). It relies on satellite observations of crop health since 1984, meaning there is no probabilistic modelling, just empirical data.

The current representation of drought hazard:

  • Combines cropland and pasture land
  • Shows 2 separate seasons
  • Measures hazard as the frequency of impact above two thresholds of affected land: one third (30%) and half (50%).

Example:

This is aligned with the approach used by the FAO website.
However, it is not the most intuitive metric to explain; either we simplify how it is expressed, or elaborate it into a new, easier-to-understand index.
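As a toy illustration of the current metric (values invented), the frequency is simply the share of observed seasons in which ASI exceeded the threshold:

```python
# % of cropland affected (ASI) per observed season, illustrative values
asi_by_season = [12, 35, 55, 8, 31, 60, 20, 45]

def drought_frequency(asi_values, threshold_pct):
    """Share of observed seasons with affected area at or above the threshold."""
    hits = sum(1 for v in asi_values if v >= threshold_pct)
    return hits / len(asi_values)

freq_30 = drought_frequency(asi_by_season, 30)  # >= 30% of land affected
freq_50 = drought_frequency(asi_by_season, 50)  # >= 50% of land affected
```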

@stufraser1 always interested in your suggestions if you have any

Efficiency of statistics

Currently we have this loop:

    for rp in valid_RPs:
        
        # Get total population for each ADM2 region
        pop_per_ADM = gen_zonal_stats(vectors=adm_data["geometry"], raster=pop_fn, stats=["sum"])
        
        result_df[f"{adm_name}_Pop"] = [x['sum'] for x in pop_per_ADM]

        # Load corresponding flood dataset
        flood_data = rxr.open_rasterio(os.path.join(flood_RP_data_loc, f"{country}_RP{rp}.tif"))

At the beginning, it runs the zonal statistics over total population. This should be moved out of the loop, since the total population does not depend on the RP: it gets extracted three times, but the value is always the same.

However, the code fails if I move the line before the loop :(
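If moving the line above the loop fails, it is likely because result_df or adm_data is only defined inside (or just before) the loop body; define them first, then hoist. A toy, self-contained sketch of the intended shape, where zonal_sum is a stand-in for gen_zonal_stats:

```python
# Self-contained sketch: the expensive zonal statistics over total
# population run once, before the RP loop. zonal_sum stands in for
# rasterstats.gen_zonal_stats(..., stats=["sum"]).
def zonal_sum(regions, raster):
    # stand-in: sum the raster cells that fall inside each region
    return [{"sum": sum(raster[i] for i in idx)} for idx in regions]

regions = [[0, 1], [2, 3]]       # cell indices per ADM unit (toy data)
pop_raster = [10, 20, 30, 40]    # toy population raster
valid_RPs = [10, 100, 1000]

# Hoisted: total population does not depend on the return period
pop_per_ADM = zonal_sum(regions, pop_raster)
adm_pop = [x["sum"] for x in pop_per_ADM]

for rp in valid_RPs:
    # only RP-dependent work (loading the flood layer, computing the
    # affected population) belongs here
    pass
```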

Align climate scenarios to CCDR guidance

There has been some confusion across national teams on which are the scenarios to include.
We should follow the following from Hallegatte team:

[image]

  • Old RCP scenarios: 2.6, 4.5, 8.5
  • New SSP scenarios: SSP1-1.9, SSP2-4.5, SSP3-7.0

data(.tif) files in notebook/EXP

Where can I get the .tif files? I followed all the instructions from the README but got an error (Top-down/notebooks/EXP/NPL_WPOP20.tif: No such file or directory). Please help me with it.

New heat stress data

We have quite outdated, low-resolution WBGT layers from the VITO analysis.
WBGT considers heavy labour under heat conditions, hence it is well suited to measuring impacts on health,

[image]

but it doesn't match physical temperature, so it 1) creates some confusion in map interpretation and 2) cannot be applied to other exposure categories such as crops. It also does not cover extreme cold.

[image]

We should explore the chance to switch to another metric, or add one: the Universal Thermal Climate Index (UTCI).

More details about the indicator and thresholds
Even more details

We would also have the projections from Copernicus.

However, it would need to be turned into a probabilistic layer of extremes. See #16
