mggg / maup

The geospatial toolkit for redistricting data.

Home Page: https://maup.readthedocs.io/en/latest/

License: MIT License


maup's Introduction

maup


maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts, and districts, such as:

  • Assigning precincts to districts
  • Aggregating block data to precincts
  • Disaggregating data from precincts down to blocks
  • Prorating data between units that do not nest neatly
  • Fixing topological issues, overlaps, and gaps in shapefiles

The project's priorities are to be efficient, by using spatial indices whenever possible, and to integrate well with the existing ecosystem around pandas, geopandas, and shapely. The package is distributed under the MIT License.

Installation

To install maup from PyPI, run pip install maup from your terminal.

For development, maup uses Poetry. To develop new maup features, clone this repository and run poetry install.

Examples

Here are some basic situations where you might find maup helpful. For these examples, we use test data from Providence, Rhode Island, which you can find in our Rhode Island shapefiles repo, or in the examples folder of this repo, reprojected to a non-geographic coordinate reference system (CRS) optimized for Rhode Island.

**Note:** Many of maup's functions behave badly in geographic projections (i.e., lat/long coordinates), which are the default for shapefiles from the U.S. Census Bureau. To find an appropriate CRS for a particular shapefile, consult the database at https://epsg.org.

>>> import geopandas
>>> import pandas
>>>
>>> blocks = geopandas.read_file("zip://./examples/blocks.zip").to_crs(32030)
>>> precincts = geopandas.read_file("zip://./examples/precincts.zip").to_crs(32030)
>>> districts = geopandas.read_file("zip://./examples/districts.zip").to_crs(32030)

Assigning precincts to districts

The assign function in maup takes two sets of geometries called sources and targets and returns a pandas Series. The Series maps each geometry in sources to the geometry in targets that covers it. (Here, geometry A covers geometry B if every point of B and its boundary lies in A or its boundary.) If a source geometry is not covered by one single target geometry, it is assigned to the target geometry that covers the largest portion of its area.

>>> import maup
>>>
>>> precinct_to_district_assignment = maup.assign(precincts, districts)
>>> # Add the assigned districts as a column of the `precincts` GeoDataFrame:
>>> precincts["DISTRICT"] = precinct_to_district_assignment
>>> precinct_to_district_assignment.head()
0     7
1     5
2    13
3     6
4     1
dtype: int64

As an aside, you can use that precinct_to_district_assignment object to create a gerrychain Partition representing this districting plan.
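For example, a minimal sketch (assuming gerrychain is installed and that the "DISTRICT" column was added as above; gerrychain may warn about gaps or overlaps in real shapefiles):

>>> from gerrychain import Graph, Partition
>>>
>>> # Build the precinct adjacency graph; GeoDataFrame columns become node attributes.
>>> graph = Graph.from_geodataframe(precincts)
>>> # Use the assigned district labels to define the partition:
>>> partition = Partition(graph, assignment="DISTRICT")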

Aggregating block data to precincts

Precinct shapefiles usually come with election data, but not demographic data. In order to study their demographics, we need to aggregate demographic data from census blocks up to the precinct level. We can do this by assigning blocks to precincts and then aggregating the data with a Pandas groupby operation:

>>> variables = ["TOTPOP", "NH_BLACK", "NH_WHITE"]
>>>
>>> blocks_to_precincts_assignment = maup.assign(blocks, precincts)
>>> precincts[variables] = blocks[variables].groupby(blocks_to_precincts_assignment).sum()
>>> precincts[variables].head()
   TOTPOP  NH_BLACK  NH_WHITE
0    5907       886       380
1    5636       924      1301
2    6549       584      4699
3    6009       435      1053
4    4962       156      3713

If you want to move data from one set of geometries to another but your source geometries do not nest cleanly into your target geometries, see Prorating data when units do not nest neatly.

Disaggregating data from precincts down to blocks

It's common to have data at a coarser scale that you want to attach to finer-scale geometries. For instance, this may happen when vote totals for a certain election are only reported at the county level, and we want to attach that data to precinct geometries.

Let's say we want to prorate the vote totals in the columns "PRES16D", "PRES16R" from our precincts GeoDataFrame down to our blocks GeoDataFrame. The first crucial step is to decide how we want to distribute a precinct's data to the blocks within it. Since we're prorating election data, it makes sense to use a block's total population or voting-age population. Here's how we might prorate by population ("TOTPOP"):

>>> election_columns = ["PRES16D", "PRES16R"]
>>> blocks_to_precincts_assignment = maup.assign(blocks, precincts)
>>>
>>> # We prorate the vote totals according to each block's share of the overall
>>> # precinct population:
>>> weights = blocks.TOTPOP / blocks_to_precincts_assignment.map(blocks.TOTPOP.groupby(blocks_to_precincts_assignment).sum())
>>> prorated = maup.prorate(blocks_to_precincts_assignment, precincts[election_columns], weights)
>>>
>>> # Add the prorated vote totals as columns on the `blocks` GeoDataFrame:
>>> blocks[election_columns] = prorated
>>>
>>> # We'll call .round(2) to round the values for display purposes, but note that the 
>>> # actual values should NOT be rounded in order to avoid accumulation of rounding
>>> # errors.
>>> blocks[election_columns].round(2).head()
   PRES16D  PRES16R
0     0.00     0.00
1    12.26     1.70
2    15.20     2.62
3    15.50     2.67
4     3.28     0.45

Warning about areal interpolation

We strongly urge you not to prorate by area! The area of a census block is not a good predictor of its population. In fact, the correlation goes in the other direction: larger census blocks are less populous than smaller ones.

Warnings about data anomalies

(1) Many states contain Census blocks and precincts that have zero population. In the example above, a zero-population precinct leads to division by zero in the definition of the weights, which results in NaN values for some entries.

Although it is not strictly necessary to resolve this in the example above, sometimes this creates issues down the line. One option is to replace NaN values with zeros, using

>>> weights = weights.fillna(0)

(2) In some cases, zero-population precincts may have a small nonzero number of recorded votes in some elections. The procedure outlined above will lose these votes in the proration process due to the zero (or NaN) values for the weights corresponding to all the blocks in those precincts. If it is crucial to keep vote totals perfectly accurate, these votes will need to be assigned to the new units manually.
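A quick way to detect this is to compare column sums before and after proration; a minimal check using the names from the example above:

>>> # Any nonzero difference is votes dropped by zero/NaN weights:
>>> precincts[election_columns].sum() - blocks[election_columns].sum()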

Prorating data when units do not nest neatly

Suppose you have a shapefile of precincts with some election results data and you want to join that data onto a different, more recent precincts shapefile. The two sets of precincts will have overlaps, and will not nest neatly like the blocks and precincts did in the above examples. (Not that blocks and precincts always nest neatly---in fact, they usually don't!)

In most cases, election data should be prorated from each old precinct to the new precincts with weights proportional to the population of the intersections between the old precinct and each new precinct. The most straightforward way to accomplish this is to first disaggregate the data from the old precincts to Census blocks as in the example above, and then reaggregate from blocks to the new precincts.

>>> old_precincts = precincts
>>> new_precincts = geopandas.read_file("zip://./examples/new_precincts.zip").to_crs(32030)
>>>
>>> election_columns = ["SEN18D", "SEN18R"]
>>>
>>> blocks_to_old_precincts_assignment = maup.assign(blocks, old_precincts)
>>> blocks_to_new_precincts_assignment = maup.assign(blocks, new_precincts)
>>>
>>> # We prorate the vote totals according to each block's share of the overall
>>> # old precinct population:
>>> weights = (blocks.TOTPOP / blocks_to_old_precincts_assignment.map(blocks.TOTPOP.groupby(blocks_to_old_precincts_assignment).sum())).fillna(0)
>>> prorated = maup.prorate(blocks_to_old_precincts_assignment, old_precincts[election_columns], weights)
>>>
>>> # Add the prorated vote totals as columns on the `blocks` GeoDataFrame:
>>> blocks[election_columns] = prorated
>>>
>>> new_precincts[election_columns] = blocks[election_columns].groupby(blocks_to_new_precincts_assignment).sum()
>>> new_precincts[election_columns].round(2).head()
    SEN18D   SEN18R
0   728.17    49.38
1   370.00    21.00
2    97.00    17.00
3    91.16     5.55
4   246.00    20.00

As a sanity check, let's make sure that no votes were lost in either step. Total votes in the old precincts, blocks, and new precincts:

>>> old_precincts[election_columns].sum()
SEN18D    23401
SEN18R     3302
dtype: float64
>>>
>>> blocks[election_columns].sum()
SEN18D    23401.0
SEN18R     3302.0
dtype: float64
>>>
>>> new_precincts[election_columns].sum()
SEN18D    20565.656675
SEN18R     2947.046857
dtype: float64

Oh no - what happened? All votes were successfully disaggregated to blocks, but a significant share was lost when reaggregating to new precincts.

It turns out that when blocks were assigned to both old and new precincts, many blocks were not assigned to any precinct. We can count how many blocks were unassigned in each case:

>>> print(len(blocks))
3014
>>> print(blocks_to_old_precincts_assignment.isna().sum())
884
>>> print(blocks_to_new_precincts_assignment.isna().sum())
1227

So, out of 3,014 total Census blocks, 884 were not assigned to any old precinct and 1,227 were not assigned to any new precinct. If we plot the GeoDataFrames, we can see why:

>>> blocks.plot()

Providence blocks

>>> old_precincts.plot()

Providence old precincts

>>> new_precincts.plot()

Providence new precincts

The boundaries of the regions covered by these shapefiles are substantially different---and that doesn't even get into the possibility that the precinct shapefiles may have gaps between precinct polygons that some blocks may fall into.

Once we know to look for this issue, we can see that it affected the previous example as well:

>>> blocks[variables].sum()
TOTPOP      178040
NH_BLACK     23398
NH_WHITE     66909
dtype: int64
>>>
>>> precincts[variables].sum()
TOTPOP      140332
NH_BLACK     19345
NH_WHITE     46667
dtype: int64

Moral: Precinct shapefiles often have terrible topological issues!

These issues should be diagnosed and repaired to the greatest extent possible before moving data around between shapefiles; see Fixing topological issues, overlaps, and gaps below for details about how maup can help with this.

Progress bars

For long-running operations, the user might want to see a progress bar to estimate how much longer a task will take (and whether to abandon it altogether).

maup provides an optional progress bar for this purpose. To temporarily activate a progress bar for a certain operation, use `with maup.progress():`

>>> with maup.progress():
...     assignment = maup.assign(precincts, districts)
...

To turn on progress bars for all applicable operations (e.g. for an entire script), set maup.progress.enabled = True:

>>> maup.progress.enabled = True
>>> # Now a progress bar will display while this function runs:
>>> assignment = maup.assign(precincts, districts)
>>> # And this one too:
>>> pieces = maup.intersections(old_precincts, new_precincts, area_cutoff=0)

Fixing topological issues, overlaps, and gaps

Precinct shapefiles are often created by stitching together collections of precinct geometries sourced from different counties or different years. As a result, the shapefile often has gaps or overlaps between precincts where the different sources disagree about the boundaries. (And by "often," we mean "for almost every shapefile that isn't produced by the U.S. Census Bureau.") As we saw in the examples above, these issues can pose problems when moving data between shapefiles.

Even when working with a single shapefile, gaps and overlaps may cause problems if you are interested in working with the adjacency graph of the precincts. This adjacency information is especially important when studying redistricting, because districts are almost always expected to be contiguous.
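For instance, adjacency graphs for redistricting are often built with gerrychain; a hedged sketch (gerrychain typically warns about islands and overlaps when the shapefile has topological problems):

>>> from gerrychain import Graph
>>>
>>> # Rook adjacency: geometries adjacent along a boundary of positive length.
>>> graph = Graph.from_geodataframe(precincts)
>>> graph.islands  # nodes with no neighbors; ideally the empty set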

Before doing anything else, it is wise to understand the current status of a shapefile with regard to topological issues. maup provides a doctor function to diagnose gaps, overlaps, and invalid geometries. If a shapefile has none of these issues, maup.doctor returns a value of True; otherwise it returns False along with a brief summary of the problems that it found.

The blocks shapefile, like most shapefiles from the Census Bureau, is clean:

>>> maup.doctor(blocks)
True

The old precincts shapefile, however, has some minor issues:

>>> maup.doctor(old_precincts)
There are 2 overlaps.
There are 3 holes.
False

As of version 2.0.0, maup provides two repair functions with a variety of options for fixing these issues:

  1. quick_repair is the new name for the autorepair function from version 1.x (and autorepair still works as a synonym). This function makes fairly simplistic repairs to gaps and overlaps:

    • Any polygon $Q$ created by the overlapping intersection of two geometries $P_1$ and $P_2$ is removed from both polygons and reassigned to the one with which it shares the greatest perimeter.
    • Any polygon $Q$ representing a gap between geometries $P_1,\ldots, P_n$ is assigned to the one with which it shares the greatest perimeter.

    This function is probably sufficient when gaps and overlaps are all very small in area relative to the areas of the geometries, AND when the repaired file will only be used for operations like aggregating and prorating data. But it should NOT be relied upon when it is important for the repaired file to accurately represent adjacency relations between neighboring geometries, such as when a precinct shapefile is used as a basis for creating districting plans with contiguous districts.

    For instance, when a gap adjoins many geometries (which happens frequently along county boundaries in precinct shapefiles!), whichever geometry the gap is adjoined to becomes "adjacent" to all the other geometries adjoining the gap, which can lead to the creation of discontiguous districts in plans based on the repaired shapefile.

  2. smart_repair is a more sophisticated repair function designed to reproduce the "true" adjacency relations between geometries as accurately as possible. In the case of gaps that adjoin several geometries, this is accomplished by an algorithm that divides the gap into pieces to be assigned to different geometries instead of assigning the entire gap to a single geometry.

    In addition to repairing gaps and overlaps, smart_repair includes two optional features:

    • In many cases, the shapefile geometries are intended to nest cleanly into some larger units; e.g., in many states, precincts should nest cleanly into counties. smart_repair allows the user to optionally specify a second shapefile---e.g., a shapefile of county boundaries within a state---and then performs the repair process so that the repaired geometries nest cleanly into the units in the second shapefile.
    • Whether as a result of inaccurate boundaries in the original map or as an artifact of the repair algorithm, it may happen that some units share boundaries with very short perimeter but should actually be considered "queen adjacent"---i.e., intersecting at only a single point---rather than "rook adjacent"---i.e., intersecting along a boundary of positive length. smart_repair includes an optional step in which all rook adjacencies of length below a user-specified parameter are converted to queen adjacencies.

smart_repair can accept either a GeoSeries or GeoDataFrame as input, and the output type will be the same as the input type. The input must be projected to a non-geographic coordinate reference system (CRS)---i.e., not lat/long coordinates---in order to have sufficient precision for the repair. One option is to reproject a GeoDataFrame called gdf to a suitable UTM (Universal Transverse Mercator) projection via

gdf = gdf.to_crs(gdf.estimate_utm_crs())

At a minimum, all overlaps will be repaired in the output. Optional arguments include:

  • snapped (default value True): If True, all polygon vertices are snapped to a grid of size no more than $10^{-10}$ times the maximum of width/height of the entire shapefile extent. HIGHLY RECOMMENDED to avoid topological exceptions due to rounding errors.
  • fill_gaps (default value True): If True, all simply connected gaps with area less than fill_gaps_threshold times the largest area of all geometries adjoining the gap are filled. Default threshold is $0.1$; setting fill_gaps_threshold = None will fill all simply connected gaps.
  • nest_within_regions (default value None): If nest_within_regions is a secondary GeoSeries or GeoDataFrame of region boundaries (e.g., counties within a state) then the repair will be performed so that repaired geometries nest cleanly into the region boundaries; specifically, each repaired geometry will be contained in the region with which the original geometry has the largest area of intersection. Note that the CRS for the region GeoSeries/GeoDataFrame must be the same as that for the primary input.
  • min_rook_length (default value None): If min_rook_length is given a numerical value, all rook adjacencies with length below this value will be replaced with queen adjacencies. Note that this is an absolute value and not a relative value, so make sure that the value provided is in the correct units with respect to the input GeoSeries/GeoDataFrame's CRS.
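Putting these together, a call using all of the options might look like the following sketch (parameter values are illustrative only, and counties stands for a hypothetical region GeoDataFrame in the same CRS):

repaired = maup.smart_repair(
    precincts,
    snapped=True,                  # snap vertices to a fine grid (recommended)
    fill_gaps=True,
    fill_gaps_threshold=0.1,       # fill gaps up to 10% of the largest adjoining geometry
    nest_within_regions=counties,  # repaired precincts will nest into these regions
    min_rook_length=30.0,          # in CRS units; shorter rook adjacencies become queen
)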

Examples

First, we'll use shapely and geopandas to create a GeoDataFrame of "toy precincts" from scratch.

import random
import geopandas
import maup
from shapely.geometry import Polygon

random.seed(2023) # For reproducibility

ppolys = []
for i in range(4):
    for j in range(4):
        poly = Polygon(
            [(0.5*i + 0.1*k, 0.5*j + (random.random() - 0.5)/12) for k in range(6)] +
            [(0.5*(i+1) + (random.random() - 0.5)/12, 0.5*j + 0.1*k) for k in range(1,6)] +
            [(0.5*(i+1) - 0.1*k, 0.5*(j+1) + (random.random() - 0.5)/12) for k in range(1,6)] +
            [(0.5*i + (random.random() - 0.5)/12, 0.5*(j+1) - 0.1*k) for k in range(1,5)]
        )
        ppolys.append(poly)
        
toy_precincts_df = geopandas.GeoDataFrame(geometry = geopandas.GeoSeries(ppolys))
toy_precincts_df.plot(cmap = "tab20", alpha=0.7)

toy_precincts

Check for gaps and overlaps:

>>> maup.doctor(toy_precincts_df)
There are 28 overlaps.
There are 23 holes.
False

All the gaps between geometries in this example are below the default threshold, so a basic application of smart_repair will resolve all overlaps and fill all gaps:

toy_precincts_repaired_df = maup.smart_repair(toy_precincts_df)
toy_precincts_repaired_df.plot(cmap = "tab20", alpha=0.7)

toy_precincts_repaired

We can check that the repair succeeded:

>>> maup.doctor(toy_precincts_repaired_df)
True

Now suppose that the precincts are intended to nest cleanly into the following "toy counties":

cpoly1 = Polygon([(0,0), (1,0), (1,1), (0,1)])
cpoly2 = Polygon([(1,0), (2,0), (2,1), (1,1)])
cpoly3 = Polygon([(0,1), (1,1), (1,2), (0,2)])
cpoly4 = Polygon([(1,1), (2,1), (2,2), (1,2)])

toy_counties_df = geopandas.GeoDataFrame(geometry = geopandas.GeoSeries([cpoly1, cpoly2, cpoly3, cpoly4]))

toy_counties_df.plot(cmap='tab20')

toy_counties

We can perform a "county-aware" repair as follows:

toy_precincts_repaired_county_aware_df = maup.smart_repair(toy_precincts_df, nest_within_regions = toy_counties_df)
toy_precincts_repaired_county_aware_df.plot(cmap = "tab20", alpha=0.7)

toy_precincts_repaired_county_aware

Next, suppose that we'd like to get rid of small rook adjacencies at corner points where 4 precincts meet. We might reasonably estimate that these all have length less than $0.1$, so we can accomplish this as follows:

toy_precincts_repaired_county_aware_rook_to_queen_df = maup.smart_repair(toy_precincts_df, nest_within_regions = toy_counties_df, min_rook_length = 0.1)
toy_precincts_repaired_county_aware_rook_to_queen_df.plot(cmap = "tab20", alpha=0.7)

toy_precincts_repaired_county_aware_rook_to_queen

The difference is hard to see, so let's zoom in on the gap between the 4 original precincts in the upper left-hand corner.

Original precincts:

toy_precincts_corner

County-aware repair:

toy_precincts_corner_repaired

County-aware repair with rook adjacency converted to queen:

toy_precincts_corner_repaired_rook_to_queen

Modifiable areal unit problem

The name of this package comes from the modifiable areal unit problem (MAUP): the same spatial data will look different depending on how you divide up the space. Since maup is all about changing the way your data is aggregated and partitioned, we have named it after the MAUP to encourage users to use the toolkit thoughtfully and responsibly.

maup's People

Contributors

calebclimatecabinet, dependabot[bot], drdeford, innovativeinventor, jnclelland, maxhully, peterrrock2, pjrule, rkbuck1, tylerjarvis


maup's Issues

Dependency versions

setup.py doesn't specify any versions for the dependencies. I'm running into a problem that seems to be related to Shapely==1.7.1, in particular with the spatial index (STRtree):

AttributeError: 'MultiPolygon' object has no attribute 'index'

and:

AttributeError: 'Polygon' object has no attribute 'index'

from:

maup/indexed_geometries.py", line 20, in <listcomp>
    relevant_indices = [geom.index for geom in self.spatial_index.query(geometry)]

What version of Shapely should I be using?

Thanks

Interpolate data between overlapping geometries

E.g., if you have data on one precincts shapefile that you want to add to another different precincts shapefile.

The steps are:

  • use intersections to get the geometries' common refinement
  • aggregate population from smaller units (blocks) onto the refinement (if possible)
  • disaggregate the data from the source geometries onto the common refinement (using the population you aggregated to weight it)
  • aggregate the data from the refinement onto the target geometries

Should this be implemented as its own function, or just documented as an example?
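For reference, a hedged sketch of those steps (source_precincts, target_precincts, columns, and a blocks GeoDataFrame with a TOTPOP column are placeholders; maup.normalize is assumed to rescale weights within each source geometry):

pieces = maup.intersections(source_precincts, target_precincts, area_cutoff=0)

# Aggregate block population onto the pieces of the common refinement:
weights = blocks["TOTPOP"].groupby(maup.assign(blocks, pieces)).sum()
# Rescale so the weights within each source geometry sum to 1:
weights = maup.normalize(weights, level=0)

# Disaggregate onto the refinement and reaggregate to the targets:
target_precincts[columns] = maup.prorate(pieces, source_precincts[columns], weights=weights)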

Automated CRS Conversion

maup.assign should probably throw a warning if the input files have different CRSes, or auto-convert them to a common CRS.
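A hedged sketch of what that guard might look like (ensure_common_crs is a hypothetical helper, not part of maup):

import warnings

def ensure_common_crs(sources, targets):
    # Warn and reproject the targets onto the sources' CRS when they differ.
    if sources.crs != targets.crs:
        warnings.warn(f"CRS mismatch; reprojecting targets from {targets.crs} to {sources.crs}")
        targets = targets.to_crs(sources.crs)
    return sources, targets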

Enable shapely speedups automatically if possible

Shapely has a speedups flag that can be set in order to use their performance enhancements written in C. Something like this in __init__.py should suffice:

from shapely import speedups
...
if speedups.available:
    speedups.enable()

Or, perhaps making a light wrapper in maup around shapely's speedups would be better.

Commutativity checks and inverse checks

We should add tests to verify that indeed:

maup.assign(source, target)

is the inverse of

maup.assign(target, source)

That is, maup.assign should be invertible (bijective) under swapping the inputs, at least when the source and target geometries overlap each other.

maup.intersections(source, target)

produces the same list of geometries as

maup.intersections(target, source)

That is, maup.intersections is commutative.

This is the expected behavior for maup, but it may not necessarily hold for all geometries.
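A minimal sketch of such a test (hypothetical; it only checks that the total intersection area is symmetric, not that the pieces are identical):

import geopandas as gpd
from shapely.geometry import Polygon
import maup

def test_intersections_symmetric():
    # Two 2x2 squares overlapping in a 1x1 region.
    a = gpd.GeoSeries([Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])
    b = gpd.GeoSeries([Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])])
    forward = maup.intersections(a, b, area_cutoff=0)
    backward = maup.intersections(b, a, area_cutoff=0)
    assert abs(forward.area.sum() - backward.area.sum()) < 1e-9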

AttributeError when calling assign

Hi,

My intention is to disaggregate precincts to blocks. The traceback I receive below indicates an attribute error at the geopandas layer, which is being called from maup's assign method. Please advise.

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.py in crs(self)
436 try:
--> 437 return self.geometry.crs
438 except AttributeError:

14 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name)
6292 return self[name]
-> 6293 return object.__getattribute__(self, name)
6294

/usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.py in _get_geometry(self)
235
--> 236 raise AttributeError(msg)
237 return self[self._geometry_column_name]

AttributeError: You are calling a geospatial method on the GeoDataFrame, but the active geometry column to use has not been set.
There are columns with geometry data type (['geometry']), and you can either set one as the active geometry with df.set_geometry("name") or access the column as a GeoSeries (df["name"]) and call the method directly on it.

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __setattr__(self, name, value)
6318 try:
-> 6319 existing = getattr(self, name)
6320 if isinstance(existing, Index):

/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name)
6292 return self[name]
-> 6293 return object.__getattribute__(self, name)
6294

/usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.py in crs(self)
438 except AttributeError:
--> 439 raise AttributeError(
440 "The CRS attribute of a GeoDataFrame without an active "

AttributeError: The CRS attribute of a GeoDataFrame without an active geometry column is not defined. Use GeoDataFrame.set_geometry to set the active geometry column.

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in <cell line: 2>()
1 blocks = blocks.set_geometry("geometry")
----> 2 mauped_blocks = maup_precinct_to_blocks(blocks, precincts, ['G22SOSDLAF','G22SOSRALL'])

in maup_precinct_to_blocks(blocks, precincts, election_columns)
7
8 def maup_precinct_to_blocks(blocks, precincts, election_columns):
----> 9 blocks_to_precincts_assignment = maup.assign(blocks, precincts)
10 weights = blocks['TOTPOP20'] / blocks_to_precincts_assignment.map(blocks['TOTPOP20'].groupby(blocks_to_precincts_assignment).sum())
11 prorated = maup.prorate(blocks_to_precincts_assignment, precincts[election_columns], weights)

/usr/local/lib/python3.10/dist-packages/maup/crs.py in wrapped(*args, **kwargs)
12 )
13 )
---> 14 return f(*args, **kwargs)
15
16 return wrapped

/usr/local/lib/python3.10/dist-packages/maup/assign.py in assign(sources, targets)
21 if len(unassigned): # skip if done
22 assignments_by_area = pandas.Series(
---> 23 assign_by_area(unassigned, targets),
24 dtype="float"
25 )

/usr/local/lib/python3.10/dist-packages/maup/assign.py in assign_by_area(sources, targets)
36
37 def assign_by_area(sources, targets):
---> 38 return assign_to_max(intersections(sources, targets, area_cutoff=0).area)
39
40

/usr/local/lib/python3.10/dist-packages/maup/crs.py in wrapped(*args, **kwargs)
12 )
13 )
---> 14 return f(*args, **kwargs)
15
16 return wrapped

/usr/local/lib/python3.10/dist-packages/maup/intersections.py in intersections(sources, targets, output_type, area_cutoff)
40 df = GeoDataFrame.from_records(records, columns=["source", "target", "geometry"])
41 df = df.sort_values(by=["source", "target"]).reset_index(drop=True)
---> 42 df.crs = sources.crs
43
44 geometries = df.set_index(["source", "target"]).geometry

/usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.py in __setattr__(self, attr, val)
204 object.__setattr__(self, attr, val)
205 else:
--> 206 super().__setattr__(attr, val)
207
208 def _get_geometry(self):

/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __setattr__(self, name, value)
6333 stacklevel=find_stack_level(),
6334 )
-> 6335 object.__setattr__(self, name, value)
6336
6337 @Final

/usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.py in crs(self, value)
447 """Sets the value of the crs"""
448 if self._geometry_column_name is None:
--> 449 raise ValueError(
450 "Assigning CRS to a GeoDataFrame without a geometry column is not "
451 "supported. Use GeoDataFrame.set_geometry to set the active "

ValueError: Assigning CRS to a GeoDataFrame without a geometry column is not supported. Use GeoDataFrame.set_geometry to set the active geometry column.

Higher level `maup` API functions

We should expose higher-level maup API functions for moving data from small units to big ones, from big to small, and between units of the same granularity, to prevent user error.

maup / install new version

Hello, I'm trying to use maup.autorepair to fix problems in shapefiles for TX that I downloaded from MGGG States [TX_mggg.shp], but despite having reinstalled maup [using conda install -c conda-forge maup], when I run maup.autorepair() I get the error:

AttributeError: module 'maup' has no attribute 'autorepair'

Checking the conda environment, it says maup is version 0.7, which may not be up to date / include the latest features. How do I upgrade to the most recent version? (On OS X.)

My hope is that, with the shapefile suitably fixed, maup.assign(blocks, TX_mggg) won't end up with the dreaded 'cannot reindex from a duplicate axis' error.

Any clues welcome...

Document additional setup steps

This might be too much for the readme file, so maybe the Github wiki (or a new docs folder) would be the place for this.

Getting maup's dependencies to work in my environment required some additional steps that weren't obvious. I'm using conda, Visual Studio Code, and Windows 10. In order to find the DLLs (e.g. MKL), I ended up installing geopandas in conda's base environment, as well as the environment I created for maup. I'm not sure if both were necessary, but this is what worked for me.

maup configures pytest to check code coverage, apparently using pytest-cov. It's worth mentioning that developers will need to have pytest-cov installed.

pytest-cov conflicts with debuggers, so you need to disable it in order to run tests in debug mode. You can do this by giving pytest the "--no-cov" argument. For example, in Visual Studio Code, this is what my settings.json file looks like:

{
    "python.testing.pytestArgs": [
        "tests",
        "--no-cov"
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.nosetestsEnabled": false,
    "python.testing.pytestEnabled": true,
    "python.pythonPath": "C:\\Users\\jswis\\.conda\\envs\\maup\\python.exe"
}

You'll need to edit the settings file each time you switch between running code coverage and using debug mode.

ValueError: cannot reindex from a duplicate axis

Then maup.assign just crashes... after spending a while getting through the assignments. Example:

In [10]: assign1 = maup.assign(blocks20, vtds10)
100%|██████████| 8941/8941 [11:36<00:00, 12.85it/s]
Traceback (most recent call last):

File "", line 1, in
assign1 = maup.assign(blocks20, vtds10)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/crs.py", line 14, in wrapped
return f(*args, **kwargs)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/assign.py", line 12, in assign
assignment = assign_by_covering(sources, targets)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/assign.py", line 22, in assign_by_covering
return indexed_sources.assign(targets)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/indexed_geometries.py", line 42, in assign
assignment = pandas.concat(groups).reindex(self.index)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/series.py", line 4579, in reindex
return super().reindex(index=index, **kwargs)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4810, in reindex
axes, level, limit, tolerance, method, fill_value, copy

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4834, in _reindex_axes
allow_dups=False,

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4880, in _reindex_with_indexers
copy=copy,

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 663, in reindex_indexer
self.axes[axis]._validate_can_reindex(indexer)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
raise ValueError("cannot reindex from a duplicate axis")

ValueError: cannot reindex from a duplicate axis

TypeError raised in `maup.assign` when no targets cover an entire source

When calling maup.assign(sources, targets) where no sources are completely covered by a target, we get:

TypeError: Input must be valid geometry objects: 0

Reproducible example:

import geopandas as gpd
from shapely.geometry import Polygon
from shapely.affinity import translate
import maup

# Make a simple grid of 4 1x1 blocks
s1 = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
s2 = Polygon([(1, 0), (1, 1), (2, 1), (2, 0)])
s3 = Polygon([(0, 1), (0, 2), (1, 2), (1, 1)])
s4 = Polygon([(1, 1), (1, 2), (2, 2), (2, 1)])
sources = gpd.GeoSeries([s1, s2, s3, s4])

# Make 4 matching targets that overlap
targets = sources.apply(lambda x: translate(x, xoff=0.1))

# Raises error
maup.assign(sources, targets)

I would expect that the above would return the Series: pd.Series([0, 1, 2, 3])

ValueError raised in maup.assign when a source geometry is fully covered by more than one target

This line:

assignment = pandas.concat(groups).reindex(self.index)

causes ValueError: cannot reindex from a duplicate axis to be raised when a source geometry is fully covered by more than one target, as it assumes that every source geometry is mapped to at most one target geometry. The solution is to remove overlaps. This is annoying to debug, as the error message is very vague.

Disaggregating population from block groups to blocks changes total population

Using the latest version of maup (version 1.0.1), disaggregating CVAP data from block groups down to blocks gives strange behavior.

import geopandas as gpd
import maup
import math
from shapely.geometry import Polygon, MultiPolygon

bgs = gpd.read_file("bgs.shp")
blocks = gpd.read_file("blocks.shp")

Even though these shapefiles came from the Census, there are a couple geometry errors that need to be fixed:

blocks[blocks["geometry"].apply(lambda x: isinstance(x, (Polygon, MultiPolygon)))]
bgs["geometry"] = maup.close_gaps(bgs)

Cropping and dropping empty geometries (per @InnovativeInventor and @amybecker's suggestions) doesn't change anything — in this case there aren't any empty geometries:

blocks["geometry"] = maup.crop_to(blocks, bgs)
bgs["geometry"] = maup.crop_to(bgs, blocks)
blocks = blocks[~blocks["geometry"].is_empty]
bgs = bgs[~bgs["geometry"].is_empty]

After doing this, the area of the symmetric difference between these two regions is 0:

print(blocks.unary_union.symmetric_difference(bgs.unary_union).area)

Disaggregating CVAP from bgs -> blocks:

cvap_cols = ["CVAP", "WCVAP", "BCVAP", "HCVAP", "ASIANCVAP"]
assignment = maup.assign(blocks, bgs)
weights = blocks["VAP"] / assignment.map(bgs["VAP"])
prorated = maup.prorate(assignment, bgs[cvap_cols], weights)
blocks[cvap_cols] = prorated

The issue is that the following should be True, but it isn't:

math.isclose(blocks["CVAP"].sum(), bgs["CVAP"].sum())
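Following the analysis in the README above, a hedged first diagnostic is to count blocks that received no assignment or NaN weights, since their share of the block group's CVAP is silently dropped:

# Unassigned blocks get NaN weights and lose their share of the totals:
print(assignment.isna().sum())
print(weights.isna().sum())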

AttributeError: `Polygon` object has no attribute 'index'

I'm getting the above Attribute Error when trying to aggregate VAP data from blocks up to precincts in Arizona. I think I remember running into this with a different shapefile and can't remember how it was fixed, so putting this issue up here. Shapefiles can be found here.

Running the following code...

import maup
import geopandas as gpd

blocks = gpd.read_file("AZ_blocks_VAP/")
precincts = gpd.read_file("AZ_precincts_data/")

variables = ["VAP", "AMINVAP", "AMIN*VAP"]

assignment = maup.assign(blocks, precincts)
precincts[variables] = blocks[variables].groupby(assignment).sum()

gives this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-ee818f0bf79d> in <module>
      1 variables = ["VAP", "AMINVAP", "AMIN*VAP"]
      2 
----> 3 assignment = maup.assign(blocks, precincts)
      4 precincts[variables] = blocks[variables].groupby(assignment).sum()

~/miniconda3/envs/maup/lib/python3.9/site-packages/maup/crs.py in wrapped(*args, **kwargs)
     12                 )
     13             )
---> 14         return f(*args, **kwargs)
     15 
     16     return wrapped

~/miniconda3/envs/maup/lib/python3.9/site-packages/maup/assign.py in assign(sources, targets)
     10     target that covers the most of its area.
     11     """
---> 12     assignment = assign_by_covering(sources, targets)
     13     unassigned = sources[assignment.isna()]
     14     assignments_by_area = assign_by_area(unassigned, targets)

~/miniconda3/envs/maup/lib/python3.9/site-packages/maup/assign.py in assign_by_covering(sources, targets)
     20 def assign_by_covering(sources, targets):
     21     indexed_sources = IndexedGeometries(sources)
---> 22     return indexed_sources.assign(targets)
     23 
     24 

~/miniconda3/envs/maup/lib/python3.9/site-packages/maup/indexed_geometries.py in assign(self, targets)
     40     def assign(self, targets):
     41         target_geometries = get_geometries(targets)
---> 42         groups = [
     43             self.covered_by(container).apply(lambda x: container_index)
     44             for container_index, container in progress(

~/miniconda3/envs/maup/lib/python3.9/site-packages/maup/indexed_geometries.py in <listcomp>(.0)
     41         target_geometries = get_geometries(targets)
     42         groups = [
---> 43             self.covered_by(container).apply(lambda x: container_index)
     44             for container_index, container in progress(
     45                 target_geometries.items(), len(target_geometries)

~/miniconda3/envs/maup/lib/python3.9/site-packages/maup/indexed_geometries.py in covered_by(self, container)
     29 
     30     def covered_by(self, container):
---> 31         relevant_geometries = self.query(container)
     32         prepared_container = prep(container)
     33 

~/miniconda3/envs/maup/lib/python3.9/site-packages/maup/indexed_geometries.py in query(self, geometry)
     19 
     20     def query(self, geometry):
---> 21         relevant_indices = [geom.index for geom in self.spatial_index.query(geometry)]
     22         relevant_geometries = self.geometries.loc[relevant_indices]
     23         return relevant_geometries

~/miniconda3/envs/maup/lib/python3.9/site-packages/maup/indexed_geometries.py in <listcomp>(.0)
     19 
     20     def query(self, geometry):
---> 21         relevant_indices = [geom.index for geom in self.spatial_index.query(geometry)]
     22         relevant_geometries = self.geometries.loc[relevant_indices]
     23         return relevant_geometries

AttributeError: 'Polygon' object has no attribute 'index'
[AZ_precincts_data.zip](https://github.com/mggg/maup/files/6932307/AZ_precincts_data.zip)

Push releases to conda-forge

I installed maup a month or two ago using conda. The examples from the README file were getting errors. It looks like there are fixes on the GitHub site that weren't pushed to conda; perhaps they should be?

Resolve_overlaps complains about differing CRSes despite setting them manually (and other problems)

Thanks for the library! This seems to solve exactly the kind of issues I have. Unfortunately, I'm running into problems. Could you give some pointers on how to use resolve_overlaps?

I have a shapefile whose overlaps I am trying to resolve. When I call resolve_overlaps, it complains that source and target geometries must have the same CRS. I did manually set the CRS of the GeoSeries to epsg:28992, however, and I don't know how to set it for the target.

Since the error mentions the target and source CRSes being None and EPSG:28992, I also tried manually setting the CRS to None. That resolves the previous issue, but it now ends with a NoneType object has no attribute '_geom'.

In summary:

  • Calling resolve_overlaps complains about differing src and target CRSes
  • Setting src crs to None 'fixes' that problem, but introduces a new one

I included code and errors below.

from maup import resolve_overlaps, close_gaps
import geopandas as gpd

gdf = gpd.read_file("/home/workworkwork/Downloads/for_simplification/segmentation_weerribben_largetest_vegetatietypen_redef_sm3_mf5.shp")

print("Resolving self-intersections and removing empty polygons")
polygons = gpd.GeoSeries([pp.buffer(0) for pp in gdf.geometry])
polygons = polygons[~polygons.is_empty]
polygons.crs = {'init': 'epsg:28992'} # or set to None
print("Resolving overlaps")
polygons = resolve_overlaps(polygons, relative_threshold=None)

# Crashes

Wrong CRSes:

Traceback (most recent call last):
  File "resolve.py", line 17, in <module>
    polygons = resolve_overlaps(polygons, relative_threshold=None)
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/repair.py", line 98, in resolve_overlaps
    overlaps, with_overlaps_removed, relative_threshold=None
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/crs.py", line 11, in wrapped
    geoms1.crs, geoms2.crs
TypeError: the source and target geometries must have the same CRS. None {'init': 'epsg:28992'}

NoneType has no _geom:

Traceback (most recent call last):
  File "resolve.py", line 17, in <module>
    polygons = resolve_overlaps(polygons, relative_threshold=None)
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/repair.py", line 98, in resolve_overlaps
    overlaps, with_overlaps_removed, relative_threshold=None
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/crs.py", line 14, in wrapped
    return f(*args, **kwargs)
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/repair.py", line 117, in absorb_by_shared_perimeter
    assignment = assign_to_max(intersections(sources, targets, area_cutoff=None).length)
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/crs.py", line 14, in wrapped
    return f(*args, **kwargs)
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/intersections.py", line 33, in intersections
    reindexed_targets
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/intersections.py", line 31, in <listcomp>
    (sources.index[j], targets.index[i], geometry)
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/indexed_geometries.py", line 45, in enumerate_intersections
    for j, intersection in self.intersections(target).items():
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/indexed_geometries.py", line 24, in intersections
    relevant_geometries = self.query(geometry)
  File "/home/workworkwork/.local/lib/python3.7/site-packages/maup/indexed_geometries.py", line 19, in query
    relevant_indices = [geom.index for geom in self.spatial_index.query(geometry)]
  File "/usr/lib64/python3.7/site-packages/shapely/strtree.py", line 60, in query
    lgeos.GEOSSTRtree_query(self._tree_handle, geom._geom, lgeos.GEOSQueryCallback(callback), None)
AttributeError: 'NoneType' object has no attribute '_geom'

Example in README loses votes and contains non-explicit assumptions

import geopandas
import maup

blocks = geopandas.read_file("zip://./examples/blocks.zip")
precincts = geopandas.read_file("zip://./examples/precincts.zip")
districts = geopandas.read_file("zip://./examples/districts.zip")

election_columns = ["PRES16D", "PRES16R"]

assignment = maup.assign(blocks, precincts)
weights = blocks.TOTPOP / assignment.map(precincts.TOTPOP)
prorated = maup.prorate(assignment, precincts[election_columns], weights)
blocks[election_columns] = prorated

print(blocks[election_columns].sum())
print(precincts[election_columns].sum())

Forcing precinct TOTPOP to equal block TOTPOP doesn't resolve the issue:

import geopandas
import maup

blocks = geopandas.read_file("zip://./examples/blocks.zip")
precincts = geopandas.read_file("zip://./examples/precincts.zip")
districts = geopandas.read_file("zip://./examples/districts.zip")
precincts["TOTPOP"] *= blocks.TOTPOP.sum()/precincts.TOTPOP.sum()

assert precincts["TOTPOP"].sum() == blocks["TOTPOP"].sum()
election_columns = ["PRES16D", "PRES16R"]

assignment = maup.assign(blocks, precincts)
weights = blocks.TOTPOP / assignment.map(precincts.TOTPOP)
prorated = maup.prorate(assignment, precincts[election_columns], weights)
blocks[election_columns] = prorated

print(blocks[election_columns].sum())
print(precincts[election_columns].sum())
