microsoft / usbuildingfootprints
Computer generated building footprints for the United States
License: Other
Unbelievably cool announcement!
If there's anywhere in your pipeline where you could rough up the subatomic coordinate precision, consumers everywhere would be much obliged.
{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [-162.29951646880551, 67.065383815817825],
      [-162.29952962664339, 67.065303662945269],
      [-162.29931058186409, 67.065298202629734],
      [-162.29929742402621, 67.065378355520338],
      [-162.29951646880551, 67.065383815817825]
    ]]
  }
}
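For what it's worth, trimming to six decimal places (~0.1 m of longitude) loses nothing real and shrinks the files noticeably. A minimal consumer-side sketch in Python (`round_coords` is a hypothetical helper, not part of any released tooling):

```python
import json

def round_coords(obj, ndigits=6):
    """Recursively round every float in a (nested) coordinate array."""
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, list):
        return [round_coords(v, ndigits) for v in obj]
    return obj

feature = json.loads('''{
  "type": "Feature",
  "geometry": {"type": "Polygon", "coordinates":
    [[[-162.29951646880551, 67.065383815817825],
      [-162.29952962664339, 67.065303662945269]]]}
}''')
feature["geometry"]["coordinates"] = round_coords(
    feature["geometry"]["coordinates"])
print(feature["geometry"]["coordinates"][0][0])  # [-162.299516, 67.065384]
```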
Please note that you have a typo in the image here:
https://github.com/Microsoft/USBuildingFootprints/raw/master/images/segmentation.PNG
It reads "sematic" rather than "semantic"
;-)
I've attempted to parse the files with Node.js using all available encodings. What is the actual encoding of the files, so that I can manipulate them prior to parsing?
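For anyone else hitting this: the state files appear to be plain UTF-8 (the content is effectively ASCII, and RFC 7946 mandates UTF-8 for GeoJSON). A quick streaming sanity check in Python, so a multi-gigabyte file never has to fit in memory (`is_utf8` is just an illustrative helper):

```python
import codecs

def is_utf8(path, chunk_size=1 << 20):
    """Return True if the file decodes cleanly as UTF-8, streaming in chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    with open(path, "rb") as f:
        try:
            while chunk := f.read(chunk_size):
                decoder.decode(chunk)
            decoder.decode(b"", final=True)  # flush any dangling partial char
        except UnicodeDecodeError:
            return False
    return True
```

(If Node.js fails regardless of encoding, the more likely culprit is V8's maximum string length on files this large; a streaming JSON parser sidesteps that.)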
Where can I find the building footprint for Puerto Rico?
What is the coordinate reference system these data are expressed in? I couldn't find this in the documentation. EPSG 4326? EPSG 4269?
Less an issue, more a question: will we see a continuous stream of refreshes from your group as imagery/data continue to expand and evolve?
Hello,
Could you please confirm whether you used labelled training data from the same geographic location (US) to train your FCN? Second, since you mentioned that the training data had a resolution of 1 ft/pixel, do predictions also require imagery at the same resolution, or can the trained model generate building footprints from imagery at any resolution?
Thanks,
Amardeep
The team that produces these footprints should be congratulated for their efforts. A truly amazing piece of work.
There are, however, some boundary effects where footprints are missing. These seem to be tiles that have not gone through the analysis. You can see it in the Bay Area of San Francisco and, for example, around Philadelphia in the following screenshots:
Is there a reason they are missing, and will they be added in the future?
Thanks in advance,
Simon.
Hi to all,
Thanks to everyone in this project for publishing and sharing such useful data.
I have two simple questions about this project.
1) Is there any chance of getting the front-door (entrance) coordinates of the buildings in this data set? I particularly need the coordinates of the connection points between the roads (streets) and the buildings.
2) Does this data set provide the coordinates of the sub-properties inside combined buildings (those containing multiple properties), split out individually?
Thanks in advance for any kind of responses.
Regards,
Mustafa
Tremendous resource. Thank you for this ...
I just downloaded the DC example and, looking at the GeoJSON, all Feature objects are missing the "properties" member, which is required (at least according to the GeoJSON spec, §3.2):
A Feature object has a member with the name "properties". The value of the properties member is an object (any JSON object or a JSON null value)
I know at least http://geojson.io is throwing an error with the GeoJSON files:
Invalid JSON file: TypeError: Cannot convert undefined or null to object
Any possibility of adding "properties": {} to each Feature object?
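In the meantime, patching the files locally is straightforward. A sketch (assumes the whole file fits in memory, so the bigger states will be painful; `add_empty_properties` is a hypothetical helper):

```python
import json

def add_empty_properties(in_path, out_path):
    """Insert a "properties": {} member into every feature, as RFC 7946
    requires, leaving features that already have one untouched."""
    with open(in_path) as f:
        fc = json.load(f)
    for feature in fc["features"]:
        feature.setdefault("properties", {})
    with open(out_path, "w") as f:
        json.dump(fc, f)
```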
I used ogr2ogr to load buildings from southern Louisiana into SQL Server. That process detected a small number of topology errors. Here is a lightly edited version of the command and error messages:
ogr2ogr.exe -f "MSSQLSpatial" "MSSQL:server=mssql2016;database=Test;trusted_connection=yes;" "Louisiana.geojson" -t_srs "EPSG:4326" -a_srs "EPSG:4326" -lco "GEOM_TYPE=geography" -lco "GEOM_NAME=Geometry" -progress -clipsrc -93.911601 28.909162 -88.950615 30.485868
ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point -93.360444081030153 30.248059998115576
ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point -90.071937444444444 29.934727555555554
Warning 1: Ring Self-intersection at or near point -90.043338000000006 29.857880999999999
ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point -90.014427003208837 29.878139898386664
ERROR 1: TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point -93.468692000000004 30.358803999999999
ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point -92.034133058082276 30.273381288798021
ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point -90.104453000000007 29.898416000000001
ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point -91.989179968309486 30.283958535569884
ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point -90.33083200171086 29.532288062446536
ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point -90.768384909631834 30.408825994049835
The process was otherwise successful, producing a table with 1,145,195 rows. Thanks for making these data available.
When looking closely at Salt Lake City, for example, there are whole strips of missing buildings. It seems that the version in the NYT article doesn't have these missing strips, and in fact those exact areas seem to load first when panning around, as if they are a separate layer. Any idea as to how I can get these missing strips?
The horizontal black band in the attached image is an example of a strip of missing buildings; a second image shows the same area with NAIP imagery.
I'm seeing a large number of false positives, numbering at least 5,000 polygons, in the rocky desert areas of southern and eastern Utah. It appears that large rocks in certain types of geologic formations are being extracted as buildings. I have attached screenshots from the Navajo Nation area in southeastern Utah. All of these images are north of the Utah-Arizona border and south of the San Juan River.
In the area shown in the image below, building footprints are represented with a yellow outline and a transparent fill. This image shows approximately 1,500 individual footprints:
The area shown is very sparsely populated, with likely no more than a hundred actual buildings. When examined in detail, the errors are apparent:
Large rectangular rock features are being extracted as building footprints. Also, areas with no discernible features are being extracted as buildings. For example, this area near the Monument Valley Airport has no rocks, but large irregular footprints are present:
Errors like these are present over hundreds of square miles in the rocky desert and mountainous areas of the eastern half of the state. The total number is easily more than 5,000 false footprint polygons. I imagine it's similar in other desert states.
One thing that would be helpful in the ReadMe would be the dates of imagery acquisition (i.e., when the plane flew). Are these datasets 2017 US buildings or 2018 US buildings? Or earlier?
Hello,
I'm preparing to send a manuscript off for publication and I have included this layer as a variable in my analysis. I could not find metadata with a preferred citation for this layer. Do you have a preferred citation?
Thank you for your help,
Dan
Thank you for publishing this data. It is really wonderful.
You might want to consider more training for forested areas. I'm seeing a high false positive rate in stands of ponderosa pine on flat terrain. Here are some examples:
https://binged.it/2zordYk
https://binged.it/2ufYOOa
When converting the JSON with tippecanoe, I'm getting an error for every line.
Code line:
tippecanoe -o Wyoming.mbtiles -zg --drop-densest-as-needed Wyoming.json
Error:
Wyoming.json:376881: feature without properties hash
In JSON object {"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[-104.4228260023874,41.151506633415309],[-104.4229350980483,41.151537892349722],[-104.4229579082815,41.151492748294707],[-104.4230375721159,41.151515570002118],[-104.4230714466938,41.151448528312251],[-104.4228826937695,41.151394455225308],[-104.4228260023874,41.151506633415309]]]}}
Just curious whether you see your project doing a refresh of this data. It is a few years old now, and while some folks would have an interest in running this on a regular/annual basis, the horsepower isn't cheap.
Would it be possible to also create smaller .json files for smaller parts of the state or even cities? Because large files like CA, FL or OH are very difficult to read without special tools (that I'm not aware of).
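Until smaller extracts are published, one workaround is to cut your own. A rough sketch (keeps a feature if its first vertex falls inside a bounding box you supply, so buildings straddling the edge may be dropped; assumes the state file fits in memory; `clip_to_bbox` is a hypothetical helper):

```python
import json

def clip_to_bbox(in_path, out_path, west, south, east, north):
    """Write only the features whose first vertex lies inside the box."""
    with open(in_path) as f:
        fc = json.load(f)
    kept = []
    for feat in fc["features"]:
        lon, lat = feat["geometry"]["coordinates"][0][0]
        if west <= lon <= east and south <= lat <= north:
            kept.append(feat)
    fc["features"] = kept
    with open(out_path, "w") as f:
        json.dump(fc, f)
```

ogr2ogr's -clipsrc flag (used elsewhere in this thread) does the same job more rigorously, including clipping geometries at the boundary.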
A quick extract for my county (Nevada County, CA) shows that there's no height data. Is there height data for other locations, or is it not present for anything?
a priory should be a priori
Super awesome to see so much data released.
What would be really cool is to release the labeled training data - the verified buildings, along with the imagery that was used to create it. Then others could make their own computer vision models.
Related to #9 - the data would be much more useful if it could be a commons for others to do similar things. But definitely appreciated what you've done so far!
I would like to know whether you are going to release the polygonization algorithm? I'm really looking forward to that.
Or could you give a brief description of how it is implemented?
I wrote a little program to convert a bounding box's worth of data into a .osm file.
https://gist.github.com/RussNelson/a2da4104eb1b83b3a66fc6bc32a82ec4
Are you aware of round buildings like storage tanks?
They become very ugly:
I've been processing the California data and I'm seeing a lot of large "buildings" that really aren't buildings. Many of them are in the desert, south of the Salton Sea. Example:
Others are in farm land:
(I realize that sometimes farm acreage is covered in tents, which might appear to be a building. But in this case, it's just a field.)
Of the 100 largest buildings in California, it looks like about 50 of them aren't really buildings. I can provide a list if that will help.
Hi, Awesome work!
Could you provide the imagery dates and/or capture details in the documentation?
I am not able to open California data, is anyone else having this problem?
After importing the California JSON set into PostGIS, testing with ST_IsValid(geom) returns 193 invalid geometries out of 10,556,550 rows.
POSTGIS="2.4.3" PGSQL="100" GEOS="3.6.2-CAPI-1.10.2 4d2925d6" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.3.1, released 2018/06/22" LIBXML="2.9.3" LIBJSON="0.11.99" LIBPROTOBUF="1.2.1"
ogr2ogr -dim 2 -lco GEOMETRY_NAME=geom -lco SCHEMA=public -nln msft_ca_jun18 -nlt POLYGON -lco FID=gid -f PostgreSQL PG:dbname=msft_bldgs California.json -t_srs EPSG:4326
msft_bldgs-# COPY (select gid, st_asgeojson(geom) as geom from msft_ca_jun18 where not st_isvalid(geom)) to '/tmp/msft_bldgs_invalids.tsv' ;
I would like to cite this data source in a journal article. Do you have a recommended citation?
Hello,
I see that the datasets are available at the state level – however, these files are very large, and I did not have any luck converting the GeoJSON files to shapefiles (the process took far longer than usual and overheated my computer).
Besides suggesting a better computer to handle heavier processing – is it possible to get building footprint datasets for individual cities, so that the files are smaller and easier on my computer to convert?
I would prefer that the data are all from one source (for consistency), as opposed to going to multiple sources for the data. My team is searching for building footprints for the following cities:
• Minneapolis, MN
• Atlanta, GA
• Raleigh, NC
• St Paul, MN
• Charlotte, NC
• Winston-Salem, NC
• Chicago, IL
• Dallas, TX
Any other suggestions/solutions would be very much appreciated!
Thanks,
Elija
Just resurfacing #13 - it looks like this is still happening in several metro areas. So far I've seen the tile edge artifacts in Denver, San Francisco, Los Angeles, Houston, and New York.
I pasted some snapshots on the previous issue.
The readme says:
This dataset contains 124,885,597 computer generated building footprints in all 50 US states.
However, when I load all the files, I find 122,608,100 total, which exactly matches the sum of the state-level counts in the table toward the bottom of the readme. So it appears the dataset actually contains 122,608,100 footprints, correct?
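For anyone wanting to verify the totals themselves without parsing 30+ GB of JSON: the released files appear to serialize each feature compactly, starting with {"type":"Feature", so counting that byte pattern in a stream gives the feature count. A sketch (the needle is an assumption about the serialization; a real streaming parser is the safe route if the formatting differs):

```python
def count_features(path, needle=b'{"type":"Feature"', chunk_size=1 << 20):
    """Count occurrences of the per-feature marker without loading the file.
    A tail of len(needle)-1 bytes is carried over so matches that straddle
    chunk boundaries are found exactly once. The trailing quote in the
    needle keeps "FeatureCollection" from matching."""
    total, tail = 0, b""
    with open(path, "rb") as f:
        while data := f.read(chunk_size):
            data = tail + data
            total += data.count(needle)
            tail = data[-(len(needle) - 1):]
    return total
```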
Hi, and thanks for posting all this data! It should be very useful as described, but I'm having trouble opening/converting the JSON file (I'm trying to use the Georgia.json file). I've tried opening it in QGIS, but I think the file is so large that it crashes the program. In ArcMap, I've tried the JSON to Features tool, but I get "unexpected error" - not sure if this is due to file size or what. I also tried the ArcGIS Data Interoperability Quick Import tool, but it doesn't list JSON as an input file type.
It would be great if this data were broken up into smaller pieces, instead of an entire state in one big chunk...but since that's not currently available, any suggestions on how to use this huge file would be appreciated!
Does anyone know the details of, or a reference for, the polygonization method?
@jharpster @nitrif Would it be possible to share some details?
Hi All,
Thanks to the team for developing such a useful set of data. I'm interested in using building footprints for Santa Clara County, CA and need a rough year estimate of the underlying source Bing imagery used to generate the footprints. Does anyone have suggestions on how I can track down the source imagery to generate an estimate of the time period these footprints represent?
Any tips would be greatly appreciated.
Thanks!
Jenny
Could you please also publish one giant GeoJSON (or any other format) for the whole US? Doing the merge on your own hardware is nearly impossible with a standard PC.
I have 32 GB of memory and was still unable to merge the GeoJSONs with geojson-merge, despite increasing the Node heap limit to 30 GB.
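geojson-merge builds the whole output in memory, which is what blows the heap. Streaming the merged features out one input file at a time keeps peak memory at roughly one state rather than the whole country. A sketch (each input is still fully parsed, so California alone will want a lot of RAM; an incremental parser such as ijson would avoid even that; `merge_geojson` is a hypothetical helper):

```python
import json

def merge_geojson(paths, out_path):
    """Concatenate FeatureCollections, holding one input file at a time."""
    first = True
    with open(out_path, "w") as out:
        out.write('{"type":"FeatureCollection","features":[\n')
        for path in paths:
            with open(path) as f:
                fc = json.load(f)  # one state in memory at a time
            for feat in fc["features"]:
                if not first:
                    out.write(",\n")
                out.write(json.dumps(feat))
                first = False
        out.write("\n]}\n")
```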
Great to see so many more buildings have been added in the 2nd release!!
It seems that in this version, building height data is not available. Is there any plan to add building height data in the future?
Thanks.
Thanks for releasing such awesome data.
I wanted to know whether you are doing any correction to predictions when some of the images are off-nadir, or does the model itself produce corrected results?
Is it possible to gain access to the trained ResNet34 DNN model and/or the labeled training data used to detect building footprints, for academic use? Is this something I could ask for in a Microsoft AI for Earth training grant?
Seems like the links are broken for these two states; after I run wget I get this:
wget https://usbuildingdata.blob.core.windows.net/usbuildings/Texas.zip
--2018-06-28 14:12:01--  https://usbuildingdata.blob.core.windows.net/usbuildings/Texas.zip
Resolving usbuildingdata.blob.core.windows.net (usbuildingdata.blob.core.windows.net)... 13.93.168.80
Connecting to usbuildingdata.blob.core.windows.net (usbuildingdata.blob.core.windows.net)|13.93.168.80|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2018-06-28 14:12:01 ERROR 404: The specified blob does not exist..
We are working through the internal process to open source the segmentation models and polygonization algorithms.
Any idea (rough timeline) of when the model and algorithms will be open sourced?