Comments (2)
Is the problem that you have something like:
df = pl.select(a=1,b=2,c=pl.lit('{"foo":[1, 2, 3],"bar":[4,5,6]}'))
df.select(pl.col.c.str.json_decode()).write_ndjson()
# '{"c":{"foo":[1,2,3],"bar":[4,5,6]}}\n'
But you need the data written without the outer column name label?
# '{"foo":[1,2,3],"bar":[4,5,6]}\n'
from polars.
Here are some examples that hopefully help better explain my issue with that.
The issue is that null values are output as `"geometry":{"type":null,"coordinates":null,"features":null}` instead of just `null`.
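A rough way to picture the symptom, using only the standard library: when the whole column is decoded to a single struct type, every row gets the union of all keys seen, so a missing value seems to have nowhere to go except a struct whose fields are all null. The helper `to_uniform_struct` is hypothetical and only imitates the shape of the SUGGESTION 1 output further down; it is not a description of how polars implements `json_decode`.

```python
import json

def to_uniform_struct(values):
    """Imitate one shared struct schema: every row gets every key (hypothetical helper)."""
    decoded = [json.loads(v) if v is not None else None for v in values]
    # Union of keys in first-seen order, like a single shared struct schema.
    keys = []
    for d in decoded:
        for k in (d or {}):
            if k not in keys:
                keys.append(k)
    # Missing keys and missing rows are both filled with None.
    return [{k: (d or {}).get(k) for k in keys} for d in decoded]

rows = to_uniform_struct([
    '{"type":"Point","coordinates":[102.0,0.5]}',
    '{"type":"FeatureCollection","features":[]}',
    None,
])
print(json.dumps(rows[2]))
# {"type": null, "coordinates": null, "features": null}
```

What I would like instead is for the third row to serialize as a bare `null`.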
# Issue #17054
import polars as pl
import json
import tempfile
big_geojson_obj = {"type":"FeatureCollection","features":[{"id":"baddba6f1276e861263d05d9cbecff74","type":"Feature","properties":{"lineColor":"#ffa000","lineWidth":2,"fillColor":"#ffe082","fillOpacity":0.1},"geometry":{"coordinates":[[[-114.42286807,55.199035275],[-118.90384586,53.413681626],[-115.7853142,51.95024781],[-111.63559015,53.23660491],[-114.42286807,55.199035275]]],"type":"Polygon"}}]}
df = pl.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Location1", "Location2", "LocationWithLongGeom", "LocationWithNullGeom"],
    "geometry": [
        '{"type":"Point","coordinates":[102.0,0.5]}',
        '{"type":"Point","coordinates":[103.0,1.0]}',
        json.dumps(big_geojson_obj),
        None
    ]
})
print("================ START BASIC WAY ================")
# Just output the column as a String (not what I want)
with tempfile.NamedTemporaryFile(suffix=".ndjson") as f:
    df.write_ndjson(f.name)
    f.seek(0)
    print(f.read().decode())
print("================ END BASIC WAY ================")
print("================ START DEMO GOAL ================")
# Obviously this way is very slow
for row in df.iter_rows(named=True):
    row_out = row.copy()
    row_out["geometry"] = json.loads(row_out["geometry"]) if row_out["geometry"] is not None else None
    print(row_out)  # file write would happen here
print("================ END DEMO GOAL ================")
print("================ START SUGGESTION 1 ================")
# This is the previous suggestion.
# The issue is that null values are output as `"geometry":{"type":null,"coordinates":null,"features":null}` instead of just `null`.
df1 = df.with_columns(pl.col('geometry').str.json_decode())
with tempfile.NamedTemporaryFile(suffix=".ndjson") as f:
    df1.write_ndjson(f.name)
    f.seek(0)
    print(f.read().decode())
print("================ END SUGGESTION 1 ================")
================ START BASIC WAY ================
{"id":1,"name":"Location1","geometry":"{\"type\":\"Point\",\"coordinates\":[102.0,0.5]}"}
{"id":2,"name":"Location2","geometry":"{\"type\":\"Point\",\"coordinates\":[103.0,1.0]}"}
{"id":3,"name":"LocationWithLongGeom","geometry":"{\"type\": \"FeatureCollection\", \"features\": [{\"id\": \"baddba6f1276e861263d05d9cbecff74\", \"type\": \"Feature\", \"properties\": {\"lineColor\": \"#ffa000\", \"lineWidth\": 2, \"fillColor\": \"#ffe082\", \"fillOpacity\": 0.1}, \"geometry\": {\"coordinates\": [[[-114.42286807, 55.199035275], [-118.90384586, 53.413681626], [-115.7853142, 51.95024781], [-111.63559015, 53.23660491], [-114.42286807, 55.199035275]]], \"type\": \"Polygon\"}}]}"}
{"id":4,"name":"LocationWithNullGeom","geometry":null}
================ END BASIC WAY ================
================ START DEMO GOAL ================
{'id': 1, 'name': 'Location1', 'geometry': {'type': 'Point', 'coordinates': [102.0, 0.5]}}
{'id': 2, 'name': 'Location2', 'geometry': {'type': 'Point', 'coordinates': [103.0, 1.0]}}
{'id': 3, 'name': 'LocationWithLongGeom', 'geometry': {'type': 'FeatureCollection', 'features': [{'id': 'baddba6f1276e861263d05d9cbecff74', 'type': 'Feature', 'properties': {'lineColor': '#ffa000', 'lineWidth': 2, 'fillColor': '#ffe082', 'fillOpacity': 0.1}, 'geometry': {'coordinates': [[[-114.42286807, 55.199035275], [-118.90384586, 53.413681626], [-115.7853142, 51.95024781], [-111.63559015, 53.23660491], [-114.42286807, 55.199035275]]], 'type': 'Polygon'}}]}}
{'id': 4, 'name': 'LocationWithNullGeom', 'geometry': None}
================ END DEMO GOAL ================
================ START SUGGESTION 1 ================
{"id":1,"name":"Location1","geometry":{"type":"Point","coordinates":[102.0,0.5],"features":null}}
{"id":2,"name":"Location2","geometry":{"type":"Point","coordinates":[103.0,1.0],"features":null}}
{"id":3,"name":"LocationWithLongGeom","geometry":{"type":"FeatureCollection","coordinates":null,"features":[{"id":"baddba6f1276e861263d05d9cbecff74","type":"Feature","properties":{"lineColor":"#ffa000","lineWidth":2,"fillColor":"#ffe082","fillOpacity":0.1},"geometry":{"coordinates":[[[-114.42286807,55.199035275],[-118.90384586,53.413681626],[-115.7853142,51.95024781],[-111.63559015,53.23660491],[-114.42286807,55.199035275]]],"type":"Polygon"}}]}}
{"id":4,"name":"LocationWithNullGeom","geometry":{"type":null,"coordinates":null,"features":null}}
================ END SUGGESTION 1 ================
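One possible direction, sketched with the standard library only: each non-null `geometry` value is already JSON text, so it could be spliced into the output line as-is, with a missing value written as a bare `null`, without ever decoding it into a struct. `ndjson_line` is a hypothetical helper, and the sketch assumes every non-null string is well-formed JSON (nothing here validates it). It still loops in Python, so it only illustrates the target format while skipping the `json.loads` round trip.

```python
import json

def ndjson_line(row, raw_json_key="geometry"):
    """Serialize one row, embedding row[raw_json_key] as raw JSON text (hypothetical helper)."""
    parts = []
    for key, value in row.items():
        if key == raw_json_key:
            # Assumption: a non-null value is already well-formed JSON text.
            encoded = value if value is not None else "null"
        else:
            encoded = json.dumps(value, separators=(",", ":"))
        parts.append(f"{json.dumps(key)}:{encoded}")
    return "{" + ",".join(parts) + "}"

rows = [
    {"id": 1, "name": "Location1", "geometry": '{"type":"Point","coordinates":[102.0,0.5]}'},
    {"id": 4, "name": "LocationWithNullGeom", "geometry": None},
]
lines = [ndjson_line(r) for r in rows]
print("\n".join(lines))
```

With the DataFrame from the script above, the rows could come from `df.iter_rows(named=True)`, as in the DEMO GOAL loop.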