Hi @sahmoli,
Thanks for reaching out!
Yeah, I don't recommend trying to use erase overlaps with the global dataset unless you really need it (and don't mind waiting ages for it to complete). For example, if you plan on rasterizing the protected area data after cleaning it, then you don't need to worry about erasing the overlapping parts of protected areas. Additionally, if you want a vector layer showing protected area boundaries without any overlaps, and you don't need the attribute data for each protected area individually, then you can (1) clean the data without the erase overlaps step, and (2) dissolve all the protected areas together (removing overlaps) using wdpa_dissolve(). For step (1), you can tell the wdpa_clean() function to skip the erase overlaps step by setting erase_overlaps = FALSE.
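To make the idea concrete, here is a toy sketch in Python (not R, and not how wdpa_dissolve() is actually implemented). It assumes each protected area can be represented as a set of grid cells, so "overlap" means shared cells and "dissolving" means taking their union; the names and data are hypothetical. The dissolved result has no double-counted area, but it also has no per-PA attributes.

```python
# Toy model (assumption): each protected area is a set of grid cells, so
# "overlap" means shared cells and "dissolve" means set union.


def dissolve(areas):
    """Merge all geometries into one, dropping overlaps and per-PA attributes."""
    merged = set()
    for cells in areas.values():
        merged |= cells
    return merged


pas = {
    "PA1": {(0, 0), (0, 1), (1, 0)},
    "PA2": {(0, 1), (1, 1)},  # shares cell (0, 1) with PA1
}
union = dissolve(pas)
total_with_overlap = sum(len(c) for c in pas.values())
print(len(union), total_with_overlap)  # 4 5
```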
Also, regarding your question about parallelizing the code, I don't think it could be easily parallelized because the procedure for erasing overlaps is sequential. The algorithm works by iteratively removing overlapping bits, so the part of a protected area that gets erased depends on which protected areas have previously been processed. For reference, the relevant code is here: https://github.com/prioritizr/wdpar/blob/master/R/st_erase_overlaps.R. If you can come up with a way of parallelizing this, I'd be happy to review a PR.
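A toy Python sketch of why the erase step is inherently sequential (an illustration in the spirit of the linked R code, not a port of it). Geometries are assumed to be sets of integer cell IDs and the function name is hypothetical: each area keeps only the cells not already claimed by areas processed before it, so swapping the input order changes the output.

```python
def erase_overlaps(areas):
    """Toy sequential erase over (name, cells) pairs, in priority order.

    Each area keeps only the cells that no earlier area has claimed, so the
    result for area k depends on areas 0..k-1: the loop cannot simply be
    split across workers.
    """
    claimed = set()
    result = []
    for name, cells in areas:
        kept = cells - claimed  # erase bits already covered earlier
        claimed |= kept         # later areas will see these as taken
        result.append((name, kept))
    return result


a = ("A", {1, 2, 3})
b = ("B", {3, 4})
print(erase_overlaps([a, b]))  # [('A', {1, 2, 3}), ('B', {4})]
print(erase_overlaps([b, a]))  # [('B', {3, 4}), ('A', {1, 2})]
```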
Does that help? Let me know if you have any further questions.
from wdpar.
Thank you so much for the quick reply.
I am doing a global analysis using the WDPA. I need to calculate the cropland area within each PA and analyze it against different attributes of the PAs, so I need the vector layer with attributes.
I have run the code for the global analysis several times. It works well in the beginning, but it usually fails with an error after two or three weeks. It only uses 11% of the CPU and 16% of the RAM, so I wonder whether there is a way to speed it up.
You are right, the procedure seems to be sequential. I have not figured out how to do it in parallel.
Do you have any other suggestions for getting a global PA dataset with the overlaps erased?
No worries!
Oh I see - do you know beforehand which PA attributes you want to analyze? E.g., if you want to examine cropland area within different IUCN categories, then you could (1) run the cleaning process with erase_overlaps = FALSE, (2) split the single WDPA dataset/object into separate datasets/objects based on the different IUCN categories, (3) dissolve each of the IUCN category datasets separately, (4) combine (e.g. rbind) the separate datasets/objects into a single dataset/object (ensuring that the data are sorted according to how you want to deal with overlaps), and (5) then use the st_erase_overlaps() function to remove overlapping areas. This would be faster and more robust because the erase overlaps function would only iterate through one multi-polygon per IUCN category, instead of one for each and every protected area.
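The five steps above can be sketched with a toy model in Python, treating protected areas as sets of cell IDs. All names, data, and the category ranking are hypothetical and not part of wdpar; the point is that the sequential erase loop runs once per category rather than once per protected area.

```python
from collections import defaultdict


def dissolve_by_category(pas):
    """Steps (2)-(3): split PAs by IUCN_CAT and union each group's cells."""
    groups = defaultdict(set)
    for pa in pas:
        groups[pa["IUCN_CAT"]] |= pa["cells"]
    return groups


def erase_overlaps(areas):
    """Step (5): sequential erase over (name, cells) pairs in priority order."""
    claimed, result = set(), []
    for name, cells in areas:
        kept = cells - claimed
        claimed |= kept
        result.append((name, kept))
    return result


# Step (1) is assumed done: hypothetical PAs cleaned with erase_overlaps = FALSE.
pas = [
    {"id": 1, "IUCN_CAT": "Ia", "cells": {1, 2}},
    {"id": 2, "IUCN_CAT": "II", "cells": {2, 3}},
    {"id": 3, "IUCN_CAT": "Ia", "cells": {3, 4}},
    {"id": 4, "IUCN_CAT": "II", "cells": {4, 5}},
]
# Hypothetical ranking for step (4); a real one must cover every IUCN_CAT
# value present in the data (e.g. "Not Reported").
priority = ["Ia", "Ib", "II", "III", "IV", "V", "VI"]
groups = dissolve_by_category(pas)
combined = sorted(groups.items(), key=lambda kv: priority.index(kv[0]))
result = erase_overlaps(combined)
print(result)  # [('Ia', {1, 2, 3, 4}), ('II', {5})]
```

Here four protected areas collapse into two category geometries, so the erase loop runs twice instead of four times; at global scale the same change goes from one iteration per PA to one per category.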
Wow, super smart! Yes, I want to analyze the IUCN categories.
I will try this now, and let you know how it goes.
Thank you so much
Excellent - great to hear we might have a solution!
Yeah, please let me know if this doesn't work and we can brainstorm some more on how to get it working.
I think I've answered all your questions for now so, if you don't mind, could we please close this issue? I use open issues to keep track of which things on GitHub I need to respond to or focus on. Please feel free to re-open this issue, or open a new one, if you have any further questions.
Sounds great.
Awesome - thanks!
Yeah, that's what I would expect. If we dissolve an sf object, then this spatially merges all the data together (effectively the same as sf::st_union() but with extra steps to help avoid geometry issues). So, if we subset the global PA data to extract IUCN category "Ia" protected areas, and then dissolve that subset, we should get a new sf object with a single row that has the spatial boundaries for all category "Ia" protected areas (with no overlaps between them). If I understand correctly, this isn't an issue though, because you're only interested in IUCN categories and not other attributes? Or maybe I'm misunderstanding something?
Yeah, you are right. I will let you know how it goes.
You can close it now. I guess the whole process will take hours.
Ah ok - sounds good - thanks!
I realized that this method might have a problem. In one of the follow-up analyses, I need the STATUS_YR and GIS_AREA variables of each PA, which are not available with the current method.
If I want to keep multiple attributes of each PA after erasing overlaps in the global PA dataset, are there any other solutions?
Hmm, if you want to keep track of the (i) year each PA was established and (ii) IUCN category of each PA, you could split the full dataset into multiple subsets based on different combinations of STATUS_YR and IUCN_CAT, and then apply the procedure we talked about previously to dissolve and then recombine the data. Since each combination of STATUS_YR/IUCN_CAT is processed separately, it might be possible to speed up the dissolve step using parallel processing.
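A hedged Python sketch of the split/dissolve step with parallel workers, again treating geometries as sets of cell IDs (hypothetical data and function names, not wdpar code). Only the per-group dissolves run in parallel; the final erase over the recombined groups still has to run sequentially.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor


def split_by_year_and_category(pas):
    """Group each PA's cells by its (STATUS_YR, IUCN_CAT) combination."""
    groups = defaultdict(list)
    for pa in pas:
        groups[(pa["STATUS_YR"], pa["IUCN_CAT"])].append(pa["cells"])
    return dict(groups)


def dissolve(cell_sets):
    """Union all geometries in one group (toy stand-in for a dissolve)."""
    merged = set()
    for cells in cell_sets:
        merged |= cells
    return merged


def dissolve_groups_in_parallel(groups, max_workers=None):
    """Groups are independent, so each dissolve can run in its own process."""
    keys = list(groups)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(keys, pool.map(dissolve, [groups[k] for k in keys])))


if __name__ == "__main__":
    pas = [
        {"STATUS_YR": 2000, "IUCN_CAT": "II", "cells": {1, 2}},
        {"STATUS_YR": 2000, "IUCN_CAT": "II", "cells": {2, 3}},
        {"STATUS_YR": 2010, "IUCN_CAT": "Ia", "cells": {3, 4}},
    ]
    dissolved = dissolve_groups_in_parallel(split_by_year_and_category(pas))
    # Cell 3 is still shared across groups: the final erase stays sequential.
    print(dissolved)  # {(2000, 'II'): {1, 2, 3}, (2010, 'Ia'): {3, 4}}
```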
I can't think of a way to retain the GIS_AREA of each protected area though. Since you want to account for the overlapping bits of protected areas, maybe it would be better to manually calculate the total area of each protected area after removing the overlaps (e.g. using sf::st_area())? Otherwise, if you use the GIS_AREA column, then it will not match the updated spatial extent of each PA after removing overlaps.
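A small Python illustration of that mismatch, assuming an equal-area grid where each cell is 1 km2 (hypothetical data and names). In the real analysis the equivalent step would be sf::st_area() on the erased geometries.

```python
CELL_AREA_KM2 = 1.0  # assumption: equal-area grid of 1 km2 cells


def areas_after_erase(erased):
    """Recompute each PA's area from the geometry left after erasing overlaps."""
    return {name: len(cells) * CELL_AREA_KM2 for name, cells in erased}


# Hypothetical erase output: B originally covered cells {3, 4} (2 km2),
# but cell 3 was erased because A was processed first.
erased = [("A", {1, 2, 3}), ("B", {4})]
gis_area = {"A": 3.0, "B": 2.0}  # original, pre-erase areas
new_area = areas_after_erase(erased)
print(new_area)                        # {'A': 3.0, 'B': 1.0}
print(new_area["B"] == gis_area["B"])  # False
```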
Also, since you're looking at overlap with agricultural areas, may I ask what data/resolution you're using for the agricultural areas? E.g., if the agricultural data are in raster format, then maybe there could be some way to rasterize the WDPA data at some stage in the analysis to reduce computational burden?
You are right, I do not need GIS_AREA, which can be calculated later. I plan to rasterize the layer after erasing overlaps.
After I filter the global PA dataset to GIS_AREA > 1 km2 and STATUS_YR within a given range, there are only 85836 observations left. It has already been running for a whole day, and it seems that it will finish in 2 days.
I will try the STATUS_YR/IUCN_CAT combinations in parallel on another PC, to see how long that takes.
I will let you know how it goes.
Thank you so much.
No worries! Oh I see. Yeah, if you plan on rasterizing the WDPA data, then I don't think you need to worry about erasing overlaps? This is because the rasterization process (e.g. if using ArcGIS, raster::rasterize(), terra::rasterize(), fasterize::fasterize(), or gdalUtils::gdal_rasterize()) will assign values to the output raster based on whether or not the geometries overlap with each raster cell? I guess that might not work if you need to rasterize a specific field in the WDPA layer, or compute the percent coverage of each raster cell though?
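A toy Python check of this argument for presence/absence rasters, assuming geometries are sets of indices into a flat grid (hypothetical names and data, not any rasterization package's API): a cell covered by two PAs is still just marked 1, so erasing overlaps first produces the same raster.

```python
def rasterize_presence(areas, n_cells):
    """Mark a cell 1 if any protected area covers it, otherwise 0."""
    grid = [0] * n_cells
    for cells in areas:
        for c in cells:
            grid[c] = 1  # a second PA on the same cell changes nothing
    return grid


overlapping = [{0, 1, 2}, {2, 3}]  # cell 2 is covered twice
erased = [{0, 1, 2}, {3}]          # same PAs after erasing overlaps
print(rasterize_presence(overlapping, 6))  # [1, 1, 1, 1, 0, 0]
print(rasterize_presence(erased, 6) == rasterize_presence(overlapping, 6))  # True
```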
Yes, I want to rasterize STATUS_YR, so I need to erase overlaps before rasterizing, right?
Hmm, just to check, do you want (i) a single-layer raster where each grid cell contains a STATUS_YR value, or (ii) a multi-layer raster (one layer for each STATUS_YR) indicating whether a PA was established within each grid cell in that year (i.e. grid cell values contain 0s and 1s)?
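For option (ii), a toy Python sketch (geometries as sets of indices into a flat grid; names and data hypothetical) shows why overlaps stop being a problem: each year gets its own 0/1 layer, so two overlapping PAs from different years simply land in different layers.

```python
def rasterize_by_year(pas, n_cells):
    """Option (ii): one 0/1 layer per STATUS_YR value."""
    layers = {}
    for pa in pas:
        layer = layers.setdefault(pa["STATUS_YR"], [0] * n_cells)
        for c in pa["cells"]:
            layer[c] = 1
    return layers


pas = [
    {"STATUS_YR": 1990, "cells": {0, 1}},
    {"STATUS_YR": 2005, "cells": {1, 2}},  # overlaps the 1990 PA in cell 1
]
print(rasterize_by_year(pas, 4))  # {1990: [1, 1, 0, 0], 2005: [0, 1, 1, 0]}
```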
I think I want a single-layer raster where each grid cell contains a STATUS_YR value.
Sorry - I've been stuck in a meeting for the last while. Yeah, I can chat tomorrow - I'll respond to your email.
Ah - yeah if you want a single-layer raster with grid cells containing STATUS_YR, then that makes things a lot easier, because we don't necessarily have to worry about overlapping protected areas. To clarify, in cases where you have multiple protected areas overlapping with a single grid cell, how should this be handled? E.g., would you want the minimum value (corresponding to the earliest established protected area in the grid cell) or the maximum value (corresponding to the most recently established protected area in the grid cell)?
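For option (i), overlapping PAs need a rule. A toy Python sketch (flat grid, geometries as sets of cell indices, hypothetical data and names) shows how min and max give different answers in a shared cell. fasterize::fasterize() exposes a similar choice through its fun argument, whose documented options include "min" and "max".

```python
def rasterize_field(pas, n_cells, field="STATUS_YR", fun=min):
    """Option (i): one layer; PAs sharing a cell are combined with `fun`."""
    grid = [None] * n_cells  # None marks cells with no protected area
    for pa in pas:
        value = pa[field]
        for c in pa["cells"]:
            grid[c] = value if grid[c] is None else fun(grid[c], value)
    return grid


pas = [
    {"STATUS_YR": 1990, "cells": {0, 1}},
    {"STATUS_YR": 2005, "cells": {1, 2}},  # overlaps the 1990 PA in cell 1
]
print(rasterize_field(pas, 4, fun=min))  # [1990, 1990, 2005, None]  earliest PA
print(rasterize_field(pas, 4, fun=max))  # [1990, 2005, 2005, None]  most recent PA
```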
I would suggest first trying fasterize to rasterize the STATUS_YR values (https://www.rdocumentation.org/packages/fasterize/versions/1.0.3/topics/fasterize). If that doesn't work, then maybe try gdal_rasterize (https://www.rdocumentation.org/packages/gdalUtils/versions/2.0.3.2/topics/gdal_rasterize) (I find this works well for very large datasets).
I think we resolved this issue, so I'll close this now - but please feel free to reopen it if you have any further questions.