gfdrr / ccdr-tools

Geoanalytics for climate and disaster risk screening

Home Page: https://gfdrr.github.io/CCDR-tools/

Jupyter Notebook 72.89% Python 17.77% Makefile 0.07% CSS 0.13% R 9.13%
climate disaster hazard risk

ccdr-tools's People

Contributors

artessen, connectedsystems, matamadio, pzwsk


ccdr-tools's Issues

Disaster data collection and elaboration

We are mostly leveraging EM-DAT, with some additional stats from Desinventar if the country is covered.

EM-DAT gives subnational reference for each event in a string of values.

[image]

It takes some manual work to extract the subnational stats and map them:

[image]

[image]

It would be nice to produce a script to make this a bit faster.
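A minimal sketch of such a script, assuming the EM-DAT export carries the subnational references in a `Location` column as a comma/semicolon-separated string (column names are assumptions; adjust to the actual export):

```python
import pandas as pd

def explode_locations(df: pd.DataFrame, loc_col: str = "Location") -> pd.DataFrame:
    """Split the EM-DAT location string into one row per subnational
    unit, keeping the event columns, so units can be joined to an
    admin-boundary layer for mapping."""
    out = df.copy()
    out[loc_col] = out[loc_col].str.replace(";", ",").str.split(",")
    out = out.explode(loc_col)                 # one row per listed unit
    out[loc_col] = out[loc_col].str.strip()    # drop stray whitespace
    return out[out[loc_col] != ""]

# toy event record in EM-DAT-like shape
events = pd.DataFrame({
    "DisNo": ["2020-0001"],
    "Location": ["Punjab, Sindh; Balochistan"],
})
print(explode_locations(events))
```

The exploded frame can then be merged on admin-unit name against the boundaries used for mapping (fuzzy matching may still be needed for spelling variants).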

Preview to show custom color scale?

When viewing the result preview, the color scale may be misleading due to outliers.

Suggest using a logarithmic colormap instead:

With absolute values:
image

With a custom color map that maps colors to minimum value, 0.1 quantile, 0.25 quantile, 0.5 quantile and maximum:

image

Although the color bar needs fixing, as the bin values at the lower end get squashed together:

image
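A sketch of how the quantile-based scale could be built with matplotlib's `BoundaryNorm` (the data, quantile choices and colormap here are illustrative, not the actual preview code):

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm

# Skewed data with outliers, standing in for the risk preview raster
rng = np.random.default_rng(42)
data = rng.lognormal(mean=0.0, sigma=2.0, size=(100, 100))

# Bin edges at the minimum, selected quantiles and the maximum
bounds = np.quantile(data, [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0])
cmap = plt.colormaps["viridis"].resampled(len(bounds) - 1)
norm = BoundaryNorm(bounds, cmap.N)

fig, ax = plt.subplots()
im = ax.imshow(data, cmap=cmap, norm=norm)
# 'uniform' spacing gives each bin equal width on the colorbar,
# so the low-end bins are not squashed together
fig.colorbar(im, ax=ax, spacing="uniform")
fig.savefig("preview.png")
```

With `spacing="proportional"` instead, the colorbar segments follow the bin widths, which reproduces the squashed low-end bins noted above.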

TEMP - Global hazard data

The most relevant datasets (updated, high-resolution, of scientific quality) representing extreme events and long-term hazards that were considered for inclusion in the CCDR and other risk-related activities across the Bank are listed below for each hazard, with their pros and cons and suggestions for improvement.

| Geophysical | Hydro-meteorological | Environmental factors |
|---|---|---|
| Earthquake | River flood | Air pollution |
| Tsunami | Landslide | |
| Volcanic activity | Coastal flood | |
| | Tropical cyclones | |
| | Drought | |
| | Extreme heat | |
| | Wildfires | |

Some hazards are modelled using a probabilistic approach, providing a set of scenarios linked to hazard frequency for the period of reference. For the current data availability, this is the case for floods, storm surges, cyclones, heatwaves, and wildfires.
Others, such as landslides, use a deterministic approach, providing an individual map of hazard intensity or susceptibility.

GEOPHYSICAL HAZARDS

Earthquake

Tsunami

Volcanic activity

HYDRO-METEOROLOGICAL HAZARDS

River floods

Flood hazard is commonly described in terms of flood frequency (multiple scenarios) and severity, measured as water extent and related depth modelled over a Digital Elevation Model (DEM). Inland flood events can be split into two categories:

  • Fluvial (or river) floods occur when intense precipitation or snow melt collects in a catchment, causing rivers to exceed capacity and overflow or breach barriers, submerging land, especially along the floodplains.
  • Pluvial (or surface water) floods are a consequence of heavy rainfall, but unrelated to the presence of water bodies. Fast accumulation of rainfall is due to reduced soil absorption capacity or to saturation of the drainage infrastructure, meaning that the same event intensity can trigger very different risk outcomes depending on those parameters. For this reason, static hazard maps based on rainfall and DEM alone should be used with extreme caution.
| Name | Fathom flood hazard maps | Aqueduct flood hazard maps |
|---|---|---|
| Developer | Fathom | WRI |
| Hazard process | Fluvial flood, Pluvial flood | Fluvial flood |
| Resolution | 90 m | 900 m |
| Analysis type | Probabilistic | Probabilistic |
| Frequency type | Return Period (11 RPs) | Return Period (10 RPs) |
| Time reference | Baseline (1989-2018) | Baseline (1960-1999); Projections – CMIP5 (2030-2050-2080) |
| Intensity metric | Water depth [m] | Water depth [m] |
| License | Commercial | Open data |
| Other | Includes defended/undefended option | |
| Notes | Standard for WB analysis | The only open flood dataset addressing future hazard scenarios |
  • Despite missing projections, Fathom modelling has consistently proven the preferred option thanks to its higher quality (better resolution, more recent data, and a more advanced modelling approach). There are, however, important details and limitations to consider for correct use and interpretation of the model. The undefended model (FU) is typically preferred in assessments, since the defended model (FD) does not account for the physical presence of defence measures but approximates the protection standard using GDP as a proxy (FLOPROS database).

  • WRI hazard maps are the preferred choice only when 1) data needs to be open/public, or 2) explicit climate scenarios are required; however, the scientific quality and granularity of this dataset is far from that offered by Fathom – and far from optimal in general (low resolution, old baseline, simplified modelling).

It is important to note that pluvial (flash) flood events are extremely hard to model properly on the basis of global static hazard maps alone. This is especially true for densely populated urban areas, where hazardous water accumulation is often the result of undersized or under-maintained discharge infrastructure. Because of this, while Fathom does offer pluvial hazard maps, their application to pluvial risk assessment is questionable, as they cannot account for these key drivers.

A complementary perspective on flood risk is offered by the Global Surface Water layer produced by JRC from remote sensing data (Landsat 5, 7, 8) over the period 1984-2020. It provides information on all locations ever detected as water: max water level, water occurrence, occurrence change, recurrence, seasonality, and seasonality change. However, this layer does not seem to properly account for extreme flood events, i.e. recorded flood events for the period 1984-2020 most often exceed its extent. Hence it can be used to identify permanent and semi-permanent water bodies, but not to identify the baseline flood extent from past events.

[image]
Global Surface Water layer

Coastal floods (storm surge)

Coastal floods occur when the level of a water body (sea, estuary) rises to engulf otherwise dry land. This happens mainly due to storm surges, triggered by tropical cyclones and/or strong winds pushing surface water inland. As with inland floods, hazard intensity is measured by water extent and associated depth.

| Name | Aqueduct flood hazard maps | Global Flood map |
|---|---|---|
| Developer | WRI-Deltares | Deltares |
| Hazard process | Coastal flood | Coastal flood, SLR |
| Resolution | 1 km | 90 m, 1 km, 5 km |
| Analysis type | Probabilistic | |
| Frequency type | Return Period (10 RPs) | Return Period (6 RPs) |
| Time reference | Baseline (1960–1999); Projections – CMIP5 (2030-2050-2080) | Baseline (2018); Projections – SLR (2050) |
| Intensity metric | Water depth [m] | Water depth [m] |
| License | Open data | Access requested |
| Notes | Includes effect of local subsidence (2 datasets) and flood attenuation. Modelled future scenarios. | Essentially an evolution of the WRI |

The current availability of global datasets is poor, with WRI products (recently updated by Deltares) representing the best option in terms of resolution, time coverage (baseline + scenarios), and water routing, including inundation attenuation to generate more realistic flood extents. The latest version has a much better resolution of 90 m based on MERIT DEM or NASADEM, overcoming WRI limitations for local-scale assessment. Note that Fathom is working to include coastal floods and climate scenarios in the next version (3) of its dataset (expected sometime in 2023/24), which will likely become the best option for risk assessment in the near future.

Additional datasets that have been previously used in WB coastal flood analytics are:

| Name | Coastal flood hazard maps | Coastal risk screening |
|---|---|---|
| Developer | Muis et al. (2016, 2020) | Climate Central |
| Hazard process | Coastal flood | Mean sea level |
| Resolution | 1 km | |
| Analysis type | Probabilistic | |
| Frequency type | Return Period (10 RPs) | One layer per period |
| Time reference | Baseline (1979–2014) | Baseline; Projections |
| Intensity metric | Water depth [m] | Water extent |
| License | Open data | Licensed |
| Notes | The update of Muis 2020 has been considered; however, the available data does not include readily applicable land inundation, only extreme sea levels. | Uses a simple bathtub distribution without flood attenuation; does not simulate extreme sea events. |

Both these models seem to suffer from a simplified bathtub modelling approach, projecting unrealistic flood extents already under baseline climate conditions.

As shown in the figure below, considering the minimum baseline values (least-impact criteria), the flood extent drawn by the Climate Central layer is similar to the baseline RP100 from Muis (middle), with both generously overestimating water spreading inland even under less extreme scenarios. (The location of comparison was chosen because both the Netherlands and Northern Italy are low-lying areas, which are typically the hardest to model.)
In comparison, the WRI layer is far from perfect (it is also a bathtub model), but it seems to apply a more realistic maximum flood extent, which ultimately makes it better suited for application.

[image]
Quick comparison of coastal flood layers over Northern Europe under baseline conditions, RP 100 years.

Sea level rise

Landslide

Landslides (mass movements) are affected by geological features (rock type and structure) and geomorphological setting (slope gradient). Landslides can be split into two categories depending on their trigger:

  • Dry mass movements (rockfalls, debris flows) are driven by gravity and can be triggered by seismic events, but they can also be a consequence of soil erosion and environmental degradation.
  • Wet mass movements can be triggered by heavy precipitation and flooding and are strongly affected by geological features (e.g. soil type and structure) and geomorphological settings (e.g. slope gradient). They do not typically include avalanches.
| Name | Global landslide hazard layer | Global landslide susceptibility layer |
|---|---|---|
| Developer | ARUP | NASA |
| Hazard process | Dry (seismic) mass movement; Wet (rainfall) mass movement | Wet (rainfall) mass movement |
| Resolution | 1 km | 1 km |
| Analysis type | Deterministic | Deterministic |
| Frequency type | none | none |
| Time reference | Baseline (rainfall trigger) (1980-2018) | |
| Intensity metric | Hazard index [-] | Susceptibility index [-] |
| License | Open | |
| Notes | Based on NASA landslide susceptibility layer. Median and Mean layers provided. | Although not a hazard layer, it can be accounted for in addition to the ARUP layer. |

Landslide hazard description can rely on either the NASA Landslide Hazard Susceptibility map (LHASA) or the derived ARUP layer funded by GFDRR in 2019. The latter considers empirical events from the COOLR database and models both the earthquake and rainfall triggers over the existing LHASA map. The metric of choice is the frequency of occurrence of a significant landslide per km², which is however provided as a synthetic index (not directly translatable into a time-occurrence probability).

[image]
Example from the ARUP landslide hazard layer (rainfall trigger, median): Pakistan. The continuous index is displayed in three discrete classes (Low, Medium, High).

Tropical cyclones

Tropical cyclones (including hurricanes, typhoons) are events that can trigger different hazard processes at once such as strong winds, intense rainfall, extreme waves, and storm surges. In this category, we consider only the wind component of cyclone hazard, while other components (floods, storm surge) are typically considered separately.

| Name | GAR15-IBTrACS | IBTrACSv4 | STORMv3 |
|---|---|---|---|
| Developer | NOAA | NOAA | IVM |
| Hazard process | Strong winds | Strong winds | Strong winds |
| Resolution | 30 km | 10 km | 10 km |
| Analysis type | Probabilistic | Historical | Historical, Probabilistic |
| Frequency type | Return Period (5 RPs) | | Return periods (10–10,000 years) |
| Time reference | Baseline (1989-2007) | Baseline (1980-2022) | Baseline (1984-2022) |
| Intensity metric | Wind gust speed [5-sec m/s] | Many variables | Many variables |
| License | Open data | Open data | Open data |

A newer version (IBTrACSv4) was released in 2018 and could be leveraged to generate an updated wind-hazard layer, with better resolution and possibly the inclusion of orography effects. There are several attributes tied to each event; the map shows the USA_WIND variable (maximum sustained wind speed in knots: 0–300 kt) as a general intensity measure.
The STORM database has recently released its new version (STORMv3), which includes synthetic global maps of 1) maximum wind speeds for a fixed set of return periods; and 2) return periods for a fixed set of maximum wind speeds, at 10 km resolution over all ocean basins. In addition, it contains the same set for events occurring within 100 km of a selection of 18 coastal cities, and another for events occurring within 100 km of the capital city of an island.

More recently (2022), simulated tracks for climate change scenarios have been developed, as described in Bloemendaal et al., 2022. Both synthetic tracks and wind speed maps are available.

Drought & Water scarcity

Heat stress

Wildfires

ENVIRONMENTAL FACTORS

Air pollution

Accessing and processing Fathom 3 data

The Fathom 3 purchase is complete and the data currently sits on Dropbox; soon it will be moved to an Azure data lake.
The data is split into 1-degree tiles covering the whole world, for each scenario. A CSV lists the ISO alpha-2 country code for each tile.

We need an automatic selector of country, scenarios, and RPs to download the tiles and feed them directly into the processing.

It is unlikely we will be able to consider the full range of scenarios (280 layers) for each analysis; here is a proposed selection.

| Type | Period | Scenario | Defence |
|---|---|---|---|
| Fluvial | 2020 | SSP 1/2.6 | Undefended |
| Pluvial | 2030 | SSP 2/4.5 | Defended |
| Coastal | 2050 | SSP 5/8.5 | |

Return periods: 5, 10, 20, 50, 100, 200, 500, 1000

Fluvial: 112 global layers
2020: 2x8
2030: 2x3x8
2050: 2x3x8

Pluvial: 56 global layers
Note: there is no pluvial undefended; only the defended option.
2020: 1x8
2030: 1x3x8
2050: 1x3x8

Coastal: 112 global layers
2020: 2x8
2030: 2x3x8
2050: 2x3x8
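A sketch of the tile selector, assuming the index CSV has `tile` and `iso_a2` columns and a simple `type/period/scenario/RP` path layout (both the column names and the path scheme are assumptions about the actual Fathom 3 delivery):

```python
import io
import pandas as pd

def select_tiles(tiles_csv, iso_a2, flood_type, period, scenario, rps):
    """Return the list of layer paths to fetch for one country and one
    type/period/scenario combination, for the selected return periods."""
    tiles = pd.read_csv(tiles_csv)
    ids = tiles.loc[tiles["iso_a2"] == iso_a2, "tile"].unique()
    # one layer per (tile, RP) pair
    return [
        f"{flood_type}/{period}/{scenario}/RP{rp}/{t}.tif"
        for t in ids
        for rp in rps
    ]

# toy index file standing in for the real tile CSV
demo = io.StringIO("tile,iso_a2\nn30e070,PK\nn30e071,PK\nn51e000,GB\n")
layers = select_tiles(demo, "PK", "fluvial", "2050", "SSP2-4.5", [10, 100])
```

The returned list can then feed the downloader and the processing loop directly, so the full-country mosaic never has to be fetched.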

Update notebooks

The notebooks reflect the earlier version of the code, while the parallel version has implemented several improvements.
Update the individual hazard notebooks accordingly.
In particular, check the EAI calculation, as the notebooks seem to use frequency * impact instead of exceedance frequency * impact.
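For reference, the exceedance-frequency calculation can be sketched as a piecewise-trapezoid integration of impact over f = 1/RP; the toy numbers below also show how far it diverges from the naive frequency * impact sum:

```python
import numpy as np

def expected_annual_impact(rps, impacts):
    """EAI by integrating impact over exceedance frequency f = 1/RP
    (trapezoid rule), rather than summing f_i * impact_i directly.
    Note: contributions below the smallest RP and above the largest
    RP are truncated."""
    rps = np.asarray(rps, dtype=float)
    imp = np.asarray(impacts, dtype=float)
    order = np.argsort(rps)          # sort by RP ascending
    f = 1.0 / rps[order]             # exceedance frequency, descending
    i = imp[order]
    df = f[:-1] - f[1:]              # positive frequency steps
    return float(np.sum(0.5 * (i[:-1] + i[1:]) * df))

rps = [10, 100, 1000]
impacts = [100.0, 500.0, 900.0]
eai = expected_annual_impact(rps, impacts)
naive = sum((1.0 / rp) * imp for rp, imp in zip(rps, impacts))
```

With these toy values the trapezoid EAI is about 33.3 while the naive sum gives 15.9, so the two formulas are far from interchangeable.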

Exposure datasets: migrate to other sources

As exposure indicators, we are currently using:

  • Worldpop for population
  • World Settlement Footprint for built-up
  • ESA for land cover (agri land)

Worldpop has shown its limitations on several occasions across CCDR analytics, in terms of both population distribution and total value.
WSF is great at 10 m resolution, but underfunded and with uncertain development prospects; the associated population dataset does not seem to be coming soon.
Moreover, the two datasets are independent from each other, sometimes producing unaligned exposure results for the two indicators.

Meanwhile, GHS (by JRC) has updated its data offering, with better resolution, new height/volumetric building data, new population layers, etc. It also offers projections up to 2030!
I think it will ultimately offer a more consistent analysis, and it certainly seems more future-proof than what we are relying on now.

Add Wildfire hazard [optional feature, only wherever relevant]

Wildfire hazard is currently not included in the screening.

There are some recent options in terms of third-party global datasets to evaluate:

  • Fire danger indices historical data Copernicus
  • Fire burned area from 2001 to present derived from satellite observations Copernicus

A review of existing datasets and methodologies for fire hazard has been previously included in the hazard review document, but needs to be updated.

Wildfire

A wildfire is any uncontrolled burning of biomass and affected man-made assets, which spreads based on environmental conditions. The probability of wildfire occurrence is typically measured by the Fire Weather Index (FWI), possibly in conjunction with a fuel model.

| Name | Global Fire Weather Index | Global fire danger re-analysis (1980–2018) for the Canadian Fire Weather Indices |
|---|---|---|
| Developer | CSIRO | Vitolo et al. |
| Hazard process | Wildfire | Wildfire |
| Resolution | 10 km | |
| Analysis type | Probabilistic | |
| Frequency type | Return Period (3 RPs): 2, 5, 10 years | |
| Time reference | Baseline (36 years) | Baseline (1980-2018) |
| Intensity metric | Fire Weather Index | |
| License | Open data | |
| Notes | | |

The CSIRO dataset (Fig. 8), which drove the wildfire assessment in ThinkHazard and other applications, uses an approach entirely based on fire weather climatology (Fire Weather Index, FWI) to assess both the onset of conditions that allow fires to spread and the likelihood of fire at any point in the landscape. The method applies statistical modelling (extreme value analysis) to a 36-year fire weather climatology from GFWED to estimate the predicted fire weather intensity for specific return period intervals. These intensities are classified using conventional thresholds to provide hazard classes corresponding to conditions that can support problematic fire spread in the landscape, were an ignition and sufficient fuel present.

[image]
Figure 8. CSIRO FWI, RP 30 years.

Despite CSIRO's attempt to rebalance the distribution of hazard classes, the resulting FWI is strongly skewed towards extremes, as shown by the number of countries falling in each hazard rank.

| FWI | Hazard rank | N. of countries |
|---|---|---|
| > 30 | High | 163 |
| 20 – 30 | Medium | 17 |
| 15 – 20 | Low | 2 |
| < 15 | Very low | 92 |

This raster shows values up to 300 (ten times the "high" threshold). The raster uses FWI ranks as averaged from various country studies. According to the CSIRO report, the FWI method does not account for fuel, only the meteorological forcing related to wildfire generation; the only masking applied is for desert areas.

The index is compared to fire frequencies derived from the Global Fire Emissions Database (GFED4, Giglio et al. 2013) for the period 1997–present, shown as an overlay in Fig. 9. The large majority of recorded burning happened within the "high" hazard zones (in red), yet we notice some important discrepancies: general hazard overestimation for the Indian subcontinent and Europe, and underestimation of fire hazard in some northern regions such as North America and Northeast Asia.

[image]
Figure 9. FWI from CSIRO with overlay of pixels burnt over the period 1995 to 2016. Light grey indicates at least 10% of the pixel burnt over the period from 1995 to 2016; black indicates frequent recurrent burning of the entire pixel.

Simply put, meteorological conditions are not sufficient to trigger a wildfire if there is no ignition and no fuel to burn. A fuel layer should be applied to mask out the areas that cannot produce the hazard (e.g. no vegetation). Values from GlobCover 2009 (resolution 300 m) corresponding to vegetation are used to identify potential fuel and applied as a mask to the CSIRO layer. This simple masking of non-vegetated areas produces some improvement, especially in the Indian subcontinent (Fig. 10).

[image]
Figure 10. CSIRO (RP10) masked (white) for non-vegetated areas (300 m).
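The masking step can be sketched with plain NumPy; the GlobCover class codes treated as fuel below are illustrative (check the legend actually used in the analysis), and the real workflow operates on co-registered rasters:

```python
import numpy as np

# GlobCover classes treated as vegetated fuel (assumed set; the exact
# class list used in the analysis may differ)
VEGETATED = {40, 50, 60, 70, 90, 100, 110, 120, 130, 140}

def mask_fwi(fwi, landcover, nodata=np.nan):
    """Set FWI to nodata wherever land cover is not a fuel class.
    Both arrays must be on the same grid."""
    fuel = np.isin(landcover, list(VEGETATED))
    return np.where(fuel, fwi, nodata)

# toy 2x2 grids: 210 = water bodies, 200 = bare areas in GlobCover
fwi = np.array([[30.0, 50.0], [10.0, 80.0]])
lc = np.array([[40, 210], [70, 200]])
masked = mask_fwi(fwi, lc)
```

Cells over water or bare ground come out as NoData, so they drop out of any subsequent zonal statistics.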

To test whether vegetation aggregation and threshold criteria can further improve the filtering, a mask of vegetated areas is produced from GlobCover 2009 on a 10 km grid by flagging as "vegetated" only those cells with more than 10% vegetated area. This is also required to match the resolution of the FWI layer. The vegetation grid is used as a binary mask, meaning that vegetation density is not used as a weight (though that information is stored and available). The effect of this filtering is greatest in the North India/Nepal area, with little to no effect elsewhere (Fig. 11).

[image]
Figure 11. Vegetation aggregated on a 0.7-degree cell with the criterion vegetation >10% of area.
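The aggregation step can be sketched with NumPy block averaging; the toy 4x4 grid and 2x2 blocks below stand in for the 300 m to 10 km aggregation, and the grid is assumed divisible by the block size:

```python
import numpy as np

def vegetated_fraction(veg_binary, block):
    """Average a fine binary vegetation grid over block x block
    windows to get the vegetated fraction per coarse cell."""
    h, w = veg_binary.shape
    return veg_binary.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def vegetation_mask(veg_binary, block, threshold=0.10):
    # flag a coarse cell as vegetated when >10% of its area is vegetated
    return vegetated_fraction(veg_binary, block) > threshold

veg = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
])
mask = vegetation_mask(veg, block=2)
```

Because `vegetated_fraction` is kept separate, the density information stays available if one later wants to weight rather than threshold, as the text notes.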

Since filtering through the vegetation mask does not look sufficient to fix the apparent overestimation of fire hazard in the CSIRO layers, new global datasets were explored and compared. Updated fire indices from Vitolo et al. (2019) are aggregated from 38 years of global reanalysis of wildfire danger (Fig. 12). Both the dataset used to produce the analysis and the final products are available for download.

[image]
Figure 12. FWI 100-year mean (1980-2018) from Vitolo et al., masked for vegetation <10%.

The whole dataset consists of seven indices, each of which describes a different aspect of the effect that fuel moisture and wind have on fire ignition probability and, if a fire starts, its behaviour. Three indices measure fuel moisture: Fine Fuel Moisture Code (FFMC), Duff Moisture Code (DMC), and Drought Code (DC). From these, the FWI model generates two fire behaviour indices: Initial Spread Index (ISI) and Build Up Index (BUI). The model then generates the Fire Weather Index (FWI) and Daily Severity Rating (DSR). For convenience, each index is archived separately. All datasets are calculated on a daily time step by interpolating the atmospheric fields at local noon, when fire conditions are considered to be at their worst. Fig. 13 shows how much larger the CSIRO values are compared to Vitolo's. Even in areas that both datasets rank as high class, the difference in value is enormous.

[image]
Figure 13. Difference in the FWI index, calculated as CSIRO(value) – Vitolo(value).

To better understand how the hazard ranking matches observations, the FWI is compared with a fire density map (fig. 14) produced from the NASA MODIS fire archive M6 (2000 to present) and distributed by FIRMS. Points representing fire events are counted on the same grid as the FWI; only the "vegetation fire" type is considered (Type = 0). The confidence value (0-100) can also be used to filter out uncertain events; a threshold of 30% confidence is applied, reducing the sample from 42 to 38 thousand records. FRP (Fire Radiative Power, expressed in MW) depicts the pixel-integrated fire radiative power and can potentially be used as a weight for event severity.

[image]
Figure 14. Point density map of vegetation fire events (confidence >30%) from MODIS remote sensing, using Fire Radiative Power as the unit of intensity.
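The event filtering can be sketched as a simple pandas filter; the `type`, `confidence` and `frp` column names follow the FIRMS MODIS archive convention, but check the actual export headers:

```python
import pandas as pd

def filter_fires(df, min_confidence=30, fire_type=0):
    """Keep vegetation fires (type 0) above the confidence threshold,
    as in the text: confidence > 30%."""
    keep = (df["type"] == fire_type) & (df["confidence"] > min_confidence)
    return df.loc[keep]

# toy records standing in for the FIRMS archive
fires = pd.DataFrame({
    "type": [0, 0, 2, 0],          # 0 = presumed vegetation fire
    "confidence": [80, 20, 95, 45],
    "frp": [12.5, 3.1, 40.0, 7.8], # Fire Radiative Power [MW]
})
veg_fires = filter_fires(fires)
```

The retained `frp` column can then be used to weight the point-density grid by event severity, as suggested above.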

The MODIS map appears consistent (at least in relative terms) with the one from GFED4 (fig. 15).
[image]
Figure 15. Burned ground from GFED4.

In both cases, we notice some important differences when comparing empirical fire maps against the FWI rankings; see central Africa in fig. 16 as an example.

[image]
Figure 16. Comparing MODIS event grid and Vitolo FWI index.

One partial explanation is that the MODIS fire archive counts agricultural fires as vegetation fires, as found when comparing the vegetation mask against MODIS events. The vegetation map masks the MODIS grid almost perfectly; one notable exception is the Punjab region, which is excluded because it is identified as post-flooding agricultural land. The high number of events there are waste fires from agricultural activity, as confirmed by NASA, and the same applies in central Africa. These are fires that require no FWI severity or natural fuel to happen, which poses an issue for using these observed fires as validation for the FWI. Further details about wildfire data and comparison are found in a dedicated doc.

Align data analytics pipeline and risk dashboard

The objective is to plug country risk outputs from CCDR analytics into a global dashboard.

[image]

To do so effectively, we want to make the fewest possible changes between the output of the Python analytics and the input required by the dashboard.

I think we already agree on the list of output fields to plot, and mostly on the way to plot them.

| Field | Description |
|---|---|
| ADM0_CODE | Unique identifier |
| ADM1_CODE | Unique identifier |
| ADM2_CODE | Unique identifier |
| ADM3_CODE | Unique identifier |
| ADM3_NAME | ADM unit name |
| ADM2_NAME | ADM unit name |
| ADM1_NAME | ADM unit name |
| ADM0_NAME | ADM unit name |
| ADM4_pop | Total population count |
| ADM4_builtup | Total built-up extent (ha) |
| ADM4_agr | Total agricultural land (ha) |
| FL_pop_EAI | Expected mortality from river floods (population count) |
| FL_pop_EAI% | Expected mortality from river floods (% of ADM3 population) |
| FL_builtup_EAI | Expected damage on built-up from river floods (hectares) |
| FL_builtup_EAI% | Expected damage on built-up from river floods (% of ADM3 built-up) |
| FL_EAE_agri | Expected damage on agricultural land from river floods (hectares) |
| FL_EAE_agri% | Expected damage on agricultural land from river floods (% of ADM3 agricultural land) |
| CF_pop_EAI | Expected mortality from coastal floods (population count) |
| CF_pop_EAI% | Expected mortality from coastal floods (% of ADM3 population) |
| CF_builtup_EAI | Expected damage on built-up from coastal floods (hectares) |
| CF_builtup_EAI% | Expected damage on built-up from coastal floods (% of ADM3 built-up) |
| CF_EAE_agri | Expected damage on agricultural land from coastal floods (hectares) |
| CF_EAE_agri% | Expected damage on agricultural land from coastal floods (% of ADM3 agricultural land) |
| DR_S1_30p | Frequency of agricultural stress affecting at least 30% of arable land during Season 1 (percentage of historical period 1984-2022) |
| DR_S2_30p | Frequency of agricultural stress affecting at least 30% of arable land during Season 2 (percentage of historical period 1984-2022) |
| SW_BU_EAI | Expected annual impact from tropical cyclone strong winds on built-up (hectares) |
| SW_BU_EAI% | Expected annual impact from tropical cyclone strong winds on built-up (% of ADM3 built-up) |
| LS_pop_C3 | Population within landslide hazard zone class 3 (high) |
| LS_builtup_C3 | Built-up within landslide hazard zone class 3 (high) |

Ranking of risk: combining indices, comparability in time and space

In a couple of works along the CCDR (Caribbean) we were asked to:

  1. express the individual risk scores in some combined metric;
  2. rank the risk in a way that is comparable with other countries or future periods (i.e. a risk growing from "medium" today to "high" in the future).

For option 1, we thought of this:

  • Consider the EAI or EAE, both as absolute value and as % of the total, for each exposure category within a hazard (e.g. FL_Pop_EAI, FL_Pop_EAI%, FL_Builtup_EAI, FL_Builtup_EAI%, FL_Crops_EAE, FL_Crops_EAE%)
  • Normalise each column 0-1 using its max and min (excluding zeros)
  • Calculate the GEOMEAN of the normalised scores: we obtain a new 0-1 score indicating the intra-country risk ranking for each hazard (actually using the double-inverse GEOMEAN calculation to make it conservative towards the max)
  • In the same way, calculate the geomean of each hazard's normalised score to produce a general risk ranking
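The steps above can be sketched as follows (toy columns; the double-inverse geometric mean is 1 - geomean(1 - x), which pulls the combined score towards the maximum component):

```python
import numpy as np
import pandas as pd

def normalise(col):
    """Min-max normalise to 0-1, excluding zeros from min/max
    (assumption: zeros mean 'no exposure' and stay at 0)."""
    nz = col[col > 0]
    out = (col - nz.min()) / (nz.max() - nz.min())
    return out.clip(0.0, 1.0)

def risk_score(df):
    """Double-inverse geometric mean of the normalised columns:
    1 - geomean(1 - x), conservative towards the max."""
    norm = df.apply(normalise)
    return 1.0 - (1.0 - norm).prod(axis=1) ** (1.0 / norm.shape[1])

scores = risk_score(pd.DataFrame({
    "FL_Pop_EAI%": [0.0, 10.0, 5.0],
    "FL_Builtup_EAI%": [0.0, 2.0, 8.0],
}))
```

Note how any row containing the maximum of one column scores 1.0: that is the conservative behaviour of the double inverse (a plain geomean would instead collapse to 0 whenever any component is 0).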

This ranking is only useful to set priorities within a country. In this sense, a "low" risk score wouldn't necessarily mean the risk is low in absolute terms; the same non-normalised value could correspond to "high" risk in another country.
It is also not useful for comparing against the future of the same country, as the future scores would again be normalised 0-1 (i.e. you can't turn the amp to 11).

To tackle this, after discussion with OECS people, I came up in extremis with option 2: simply expert-based thresholds for each hazard, accounting only for the relative values (EAI% and EAE%). Each hazard-by-exposure pair has its own threshold, purely expert-based, accounting both for the data distribution and for general rules of thumb (generalisation potential). The individual scores are not combined. I reckon this is very sensitive to ADM size, i.e. an EAI of 50% could correspond to one person in some units and to 10,000 in others.

[image]

But I haven't had the chance to think about it much more, and I'm already well over the agreed contract time.

Any suggestion is welcome, as I feel it won't be the last time we get this kind of request.

Impact function used in classification?

So I tried to create a new notebook for heat stress.
It requires classification only, with multiple RPs, so I stripped out the code paths related to the impact function, including the lines that specify impact_array.

        if exp_cat_dd.value == 'pop':
            impact_array = mortality_factor(fld_array)
        elif exp_cat_dd.value == 'builtup':
            impact_array = damage_factor_builtup(fld_array)
        elif exp_cat_dd.value == 'agri':
            impact_array = damage_factor_agri(fld_array)

But these are required to build impact_rst, which seems to be used in both procedures:

        # Create raster from array
        impact_rst = xr.DataArray(np.array([impact_array]).astype(np.float32), 
                                  coords=hazard_data.coords, 
                                  dims=hazard_data.dims)
        
        if save_inter_rst_chk.value:
            impact_rst.rio.to_raster(os.path.join(OUTPUT_DIR, f"{country}_LS_{rp}_{exp_cat}_hazard_imp_factor.tif"))

Please clarify: am I missing something? impact_array should not be part of the classification approach!
I checked the classification results and I'm pretty sure they are correct (no function used), but then why doesn't the script work if no function is specified? :(

The expected notebook for this has:

  • NO choice of analytical procedures, just class intervals
  • Only combines the selected EXP category with the specified hazard classes (bins); no impact function!
  • Individual global hazard layers (3 RPs) instead of country clips
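One way to untangle the two paths is to compute the impact raster only on the "Function" branch, so classification never touches `impact_array`. A simplified sketch (plain NumPy standing in for the xarray rasters; names illustrative, not the notebook's actual structure):

```python
import numpy as np

def process_hazard(hazard, analysis_type, bin_edges=None, impact_fn=None):
    """Classification bins the hazard values directly; the impact
    function (and hence impact_array/impact_rst) is only needed on
    the 'Function' path."""
    if analysis_type == "Classes":
        # np.digitize assigns each cell to a class interval
        return np.digitize(hazard, bin_edges)
    if analysis_type == "Function":
        if impact_fn is None:
            raise ValueError("'Function' analysis requires an impact function")
        return impact_fn(hazard)
    raise ValueError(f"unknown analysis_type: {analysis_type}")

# classification-only call: no impact function anywhere
classes = process_hazard(np.array([0.5, 1.5, 3.0]), "Classes", bin_edges=[1.0, 2.0])
```

With this guard in place, a heat-stress notebook can call the "Classes" path with its bins and never define the mortality/damage factors at all.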

Framework for Return Period calculation

Some hazard layers are produced from annual data as individual "total" or "mean" value layers, such as:

  • Drought (frequency of events over an exposure threshold)
  • Air pollution (mean of mean annual values, see also #17 )
  • New heat indices (see #14)

To improve the representativeness of these data, we could develop an approach that derives a probabilistic representation of the hazard, in terms of multiple return periods, from a long series of observed or simulated past records.
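As a starting point, empirical return periods can be derived from an annual-maxima series using Weibull plotting positions, RP = (n + 1) / rank; a sketch with synthetic data (a fitted distribution, e.g. GEV, would be the natural next step for extrapolating beyond the record length):

```python
import numpy as np

def empirical_return_periods(annual_max):
    """Rank annual maxima in descending order and assign each the
    Weibull plotting-position return period RP = (n + 1) / rank."""
    x = np.sort(np.asarray(annual_max, dtype=float))[::-1]  # descending
    n = x.size
    rp = (n + 1) / np.arange(1, n + 1)
    return rp, x

def value_for_rp(annual_max, target_rp):
    """Interpolate the hazard value for a target return period
    (only valid within the range of the observed record)."""
    rp, x = empirical_return_periods(annual_max)
    return float(np.interp(target_rp, rp[::-1], x[::-1]))

series = np.arange(1.0, 40.0)  # 39 synthetic annual maxima
v20 = value_for_rp(series, 20.0)
```

Repeating this per pixel over the annual series would turn a single "mean" layer into a small stack of RP layers, matching the structure already used for floods and cyclones.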

CCDR development plan

See also Trello board

OBJECTIVES

Updated 07/2023

More efficient spatial processing to work on large countries at high resolution, with better user control on the input data and integrated output presentation. Align the climate indices to new CCKP service features.

NEW TOOL FEATURES

  • Code optimisation, data splitting and parallel processing
  • Uniformed interface with more user customisation
  • Risk Dashboard
  • Direct API access to data where allowed

DATA UPDATE

Hazard geodata

  • Full-range probabilistic analysis (all RPs considered)
  • Flood:
    • Damage to crops: check paper
    • Framework for geostatistical model comparison (i.e. quick maps and charts)
  • Coastal flood
    • New Deltares model 90 m (via STAC catalogue)
  • Drought:
    • differentiate drought types, each with an appropriate index:
      • Agricultural: ASI
      • Climatological: SPEI
      • Socio-economic: WCI or others
      • Energy production?
    • easier interpretation of results and projections - see #19
    • turn into probabilistic - see Veldkamp 2022
  • Heat:
    • New global layers proposed to be developed by Vito
    • Switch to or add the UTCI (new Copernicus dataset)
  • Wildfires - see #26
  • Air pollution
    • Frequency of days over threshold instead of period mean (#17)
  • Aridity - add new layer
  • Explore any useful stuff in the Planetary data catalogue

Exposure geodata

Vulnerability

  • Vulnerability functions
    • Air pollution - better mortality rates or function (#17)
    • #18

Poverty

  • Poverty data: compare DHS, RWI and census-based indices
  • New Poverty Mapping by UNICEF AI4D

Climate indices

Past disasters

Analytical approach

  • Investigate GEE potential for cloud processing

Data presentation

Extend to more return periods

I have been trying to add more hazard layers (return periods) in the flood analysis.

valid_RPs = [5, 10, 20, 50, 75, 100, 200, 250, 500, 1000]

But the output still only covers RP 10, 100 and 1000, because the output structure is hard-coded:

    if analysis_type == "Function":
        # Sum all EAI to get total EAI across all RPs
        result_df.loc[:, f"{exp_cat}_EAI"] = result_df.loc[:, result_df.columns.str.contains('_EAI')].sum(axis=1)

        # Calculate Exp_EAI% (percent of exposure affected per year)
        result_df.loc[:, f"{exp_cat}_EAI%"] = (result_df.loc[:, f"{exp_cat}_EAI"] / result_df.loc[:, f"{adm_name}_{exp_cat}"]) * 100.0

        # Reorder - need ADM code, name, and exp at the front regardless of ADM level
        result_df = result_df.loc[:, all_adm_code_tmp + all_adm_name_tmp +
                                  [f"{adm_name}_{exp_cat}", f"RP10_{exp_cat}_tot", f"RP100_{exp_cat}_tot", f"RP1000_{exp_cat}_tot",
                                   f"RP10_{exp_cat}_imp", f"RP100_{exp_cat}_imp", f"RP1000_{exp_cat}_imp",
                                   "RP10_EAI", "RP100_EAI", "RP1000_EAI", f"{exp_cat}_EAI", f"{exp_cat}_EAI%", "geometry"]]

The output format needs to be aligned with the custom RP selection.
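One way to do this is to build the column order from valid_RPs instead of hard-coding it. A sketch, using illustrative stand-in values for exp_cat, adm_name and the ADM column lists:

```python
# Build the output column order dynamically from valid_RPs, instead of
# hard-coding RP10/RP100/RP1000. Values below are illustrative stand-ins
# for the variables in the notebook.
valid_RPs = [5, 10, 20, 50, 75, 100, 200, 250, 500, 1000]
exp_cat = "Pop"
adm_name = "ADM2"
all_adm_code_tmp = ["ADM2_CODE"]
all_adm_name_tmp = ["ADM2_NAME"]

tot_cols = [f"RP{rp}_{exp_cat}_tot" for rp in valid_RPs]
imp_cols = [f"RP{rp}_{exp_cat}_imp" for rp in valid_RPs]
eai_cols = [f"RP{rp}_EAI" for rp in valid_RPs]

col_order = (all_adm_code_tmp + all_adm_name_tmp
             + [f"{adm_name}_{exp_cat}"]
             + tot_cols + imp_cols + eai_cols
             + [f"{exp_cat}_EAI", f"{exp_cat}_EAI%", "geometry"])
# result_df = result_df.loc[:, col_order]
```

The final reordering then works for whatever RP set the user selects.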

Curves for flood damage on crops

We could try to produce something based on this review:

https://link.springer.com/article/10.1007/s11069-022-05791-0

[image]

Although it mainly covers rice crops.

Vulnerability curves for crops other than cereals should be implemented, given the economic importance of perennial crops and vegetables. Functions for forage crops (alfalfa, pastures or similar) could be useful to evaluate the impacts of extreme events on livestock, but have not been considered in any of the reviewed studies.

The effect of extremes on the different crop growth stages should be assessed by including field observations in the analysis, rather than relying only on crop model results.

Coastal flood - wrong CSV output

The GeoPackage is correct, while the CSV assigns numbers to the wrong admin units.
This happens with the Function selection. I don't see how they can differ, since both come from the same dataframe.

    if analysis_type == "Function":
        no_geom.to_csv(os.path.join(OUTPUT_DIR, f"{country}_CF_{adm_name}_{exp_cat}_EAI.csv"), index=False)
        result_df.to_file(os.path.join(OUTPUT_DIR, f"{country}_CF_{adm_name}_{exp_cat}_EAI.gpkg"))
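A possible safeguard, as a sketch: derive the CSV table from result_df immediately before writing, so the two outputs cannot diverge through earlier re-sorting or reindexing of a stale no_geom copy. write_outputs is a hypothetical helper name, not part of the tool:

```python
import pandas as pd

def write_outputs(result_df, csv_path):
    # Drop the geometry column right before export, instead of reusing an
    # earlier no_geom copy that may have been re-sorted or reindexed
    no_geom = result_df.drop(columns="geometry")
    no_geom.to_csv(csv_path, index=False)
    return no_geom
```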

Parallel code refinement

Parallelization WORKS on Linux and Win! Thanks @artessen and @ConnectedSystems for this magic!

Some issues remain to be solved:

  • It works for the function approach, but not for the classes approach
  • The code can be more efficient: we don't need the EAI calculation as a raster. EAI is computed on the table output, after zonal aggregation, and presented as an output chart. See #2

Auto-collection of Population data layers

LOW PRIORITY

Right now all data must be fed by hand.
A prototype for auto-fetching from the API exists for WorldPop; it only needs refinement with a "year" selector.

It should be something like:

    if exp_cat_dd.value == 'pop':
        year = year_pop_dd.value
        exp_ras = f"{DATA_DIR}/EXP/{country}_WPOP{year}.tif"
        if not os.path.exists(exp_ras):
            # do the magic stuff (API harvesting, below)
            ...

The magic stuff (api harvesting):

    # Load or save ISO3 country list
    iso3_path = os.path.join(DATA_DIR, "cache/iso3.json")
    if not os.path.exists(iso3_path):
        resp = json.loads(requests.get(f"https://www.worldpop.org/rest/data/pop/wpgp?iso3={country}").text)

        with open(iso3_path, 'w') as outfile:
            json.dump(resp, outfile)
    else:
        with open(iso3_path, 'r') as infile:
            resp = json.load(infile)

    # TODO: Download WorldPop data from API if the layer is not found (see except before)
    # Target population data files are extracted from the JSON list downloaded above
    metadata = resp['data'][1]
    data_src = metadata['files']

    # Save population data to cache location
    for data_fn in tqdm(data_src):
        fid = metadata['id']
        cache_fn = os.path.basename(data_fn)

        # Look for indicated file in cache directory
        # Use the data file if it is found, but warn the user. 
        # (if data is incorrect or corrupted, they should delete it from cache)
        if f"{fid}_{cache_fn}" in os.listdir(CACHE_DIR):
            warnings.warn(f"Found {fid}_{cache_fn} in cache, skipping...")
            continue

        # Write to cache file if not found
        with open(os.path.join(CACHE_DIR, f"{fid}_{cache_fn}"), "wb") as handle:
            response = requests.get(data_fn)
            handle.write(response.content)

    # Run analysis

I am discussing with DLR to make the same thing possible with WSF19 and WSF-Evo data, which would be great for calculating the change in risk across years.

Heat stress aggregation

We want to condense results into one significant measure per ADM unit; the function approach does this nicely with EAI.
For the classification approach, we need to find an equivalent.

My idea for heat stress is to aggregate as Expected Annual Exposure (EAE), weighting each RP by its annual frequency 1/RP. It can be calculated after # End RP loop as:

    EAE = affected_exp_RP5 / 5 + affected_exp_RP20 / 20 + affected_exp_RP100 / 100
    EAE% = EAE / ADM_population
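A minimal sketch of the EAE aggregation, assuming the affected exposure per return period is already computed (all numbers are illustrative):

```python
# RP -> affected population (illustrative values)
affected_exp = {5: 120_000, 20: 80_000, 100: 30_000}
adm_population = 1_000_000

# Each return period contributes its affected exposure weighted by the
# annual frequency 1/RP
EAE = sum(exp_val / rp for rp, exp_val in affected_exp.items())
EAE_pct = EAE / adm_population * 100
```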

Plot Exceedance Frequency Curve

Use the results to plot a chart of Annual Exceedance Probability and related EAI at the end, above or below the map:

[image]

  • The total exposure curve (blue) has RP frequency on X and total exposure on Y.
  • The total impact curve (orange) has RP frequency on X and impacted exposure on Y, with a label for total EAI.

As in:
[image]
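A sketch of such a chart with matplotlib; all numbers are illustrative, and EAI here is taken as the sum of per-RP impact weighted by annual frequency 1/RP, consistent with the EAI columns of the result table:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt

RPs = [10, 100, 1000]
freq = [1 / rp for rp in RPs]              # annual exceedance probability (X axis)
total_exp = [500_000, 520_000, 530_000]    # total exposure per RP (illustrative)
impacted_exp = [20_000, 90_000, 200_000]   # impacted exposure per RP (illustrative)

# EAI as the sum of per-RP impact weighted by annual frequency
EAI = sum(imp / rp for rp, imp in zip(RPs, impacted_exp))

fig, ax = plt.subplots()
ax.plot(freq, total_exp, color="tab:blue", label="Total exposure")
ax.plot(freq, impacted_exp, color="tab:orange", label=f"Impact (EAI = {EAI:,.0f})")
ax.set_xlabel("Annual exceedance probability")
ax.set_ylabel("Population")
ax.legend()
```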

Drought hazard: better indicator and projections

The drought frequency analysis is currently based on the FAO Agricultural Stress Index (ASI). It relies on satellite observations of crop health since 1984, meaning there is no probabilistic modelling, just empirical data.

The current representation of drought hazard:

  • Combines cropland and pasture land
  • Shows 2 separate seasons
  • Measures hazard as the frequency of impact above two thresholds of affected land: one third (30%) and half (50%).

Example:

This is aligned with the approach used by the FAO website.
However, it is not the most intuitive metric to explain; either we simplify how it is expressed, or elaborate it into a new, easier-to-understand index.
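As a toy illustration of the current metric (values invented), the frequency is simply the share of observed seasons in which ASI exceeded the threshold:

```python
# % of cropland affected (ASI) per observed season, illustrative values
asi_by_season = [12, 35, 55, 8, 31, 60, 20, 45]

def drought_frequency(asi_values, threshold_pct):
    """Share of observed seasons with affected area at or above the threshold."""
    hits = sum(1 for v in asi_values if v >= threshold_pct)
    return hits / len(asi_values)

freq_30 = drought_frequency(asi_by_season, 30)  # >= 30% of land affected
freq_50 = drought_frequency(asi_by_season, 50)  # >= 50% of land affected
```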

@stufraser1 always interested in your suggestions if you have any

Efficiency of statistics

Currently we have this loop:

    for rp in valid_RPs:
        
        # Get total population for each ADM2 region
        pop_per_ADM = gen_zonal_stats(vectors=adm_data["geometry"], raster=pop_fn, stats=["sum"])
        
        result_df[f"{adm_name}_Pop"] = [x['sum'] for x in pop_per_ADM]

        # Load corresponding flood dataset
        flood_data = rxr.open_rasterio(os.path.join(flood_RP_data_loc, f"{country}_RP{rp}.tif"))

At the beginning, it runs the zonal statistics over total population. This should be moved out of the loop, since the total population does not depend on the RP: it gets extracted three times, but the value is always the same.

However, the code fails if I move the line before the loop :(
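If moving the line above the loop fails, it is likely because result_df or adm_data is only defined inside (or just before) the loop body; define them first, then hoist. A toy, self-contained sketch of the intended shape, where zonal_sum is a stand-in for gen_zonal_stats:

```python
# Self-contained sketch: the expensive zonal statistics over total
# population run once, before the RP loop. zonal_sum stands in for
# rasterstats.gen_zonal_stats(..., stats=["sum"]).
def zonal_sum(regions, raster):
    # stand-in: sum the raster cells that fall inside each region
    return [{"sum": sum(raster[i] for i in idx)} for idx in regions]

regions = [[0, 1], [2, 3]]       # cell indices per ADM unit (toy data)
pop_raster = [10, 20, 30, 40]    # toy population raster
valid_RPs = [10, 100, 1000]

# Hoisted: total population does not depend on the return period
pop_per_ADM = zonal_sum(regions, pop_raster)
adm_pop = [x["sum"] for x in pop_per_ADM]

for rp in valid_RPs:
    # only RP-dependent work (loading the flood layer, computing the
    # affected population) belongs here
    pass
```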

Align climate scenarios to CCDR guidance

There has been some confusion across national teams on which are the scenarios to include.
We should follow the following from Hallegatte team:

[image]

  • Old RCP scenarios: 2.6, 4.5, 8.5
  • New SSP scenarios: SSP1-1.9, SSP2-4.5, SSP3-7.0

data(.tif) files in notebook/EXP

Where can I get the .tif files? I followed all the instructions from the README but got an error (Top-down/notebooks/EXP/NPL_WPOP20.tif: No such file or directory). Please help me with it.

New heat stress data

We have quite outdated, low-resolution WBGT layers from the VITO analysis.
WBGT considers heavy labour under heat conditions, hence it is well suited to measuring impacts on health,

[image]

but it doesn't match physical temperature, so it 1) creates some confusion in map interpretation and 2) cannot be applied to other exposure categories such as crops. It also does not cover extreme cold.

[image]

We should explore the chance to switch to another metric, or add one: the Universal Thermal Climate Index (UTCI).

More details about the indicator and thresholds
Even more details

We would also have the projections from Copernicus.

However, it would need to be turned into a probabilistic layer of extremes. See #16
