Comments (6)
Can reproduce.
It seems to be specific to n_unique()? If I use .unique().len() instead, it runs as fast as the others.
from polars.
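For anyone who wants to check this on their own machine, here is a rough timing sketch of the two ways of counting distinct values mentioned above. The helper names (best_of, compare_distinct_counts, demo) are made up for illustration and are not part of polars. The "date" column name is taken from the layout dump in this thread, and the assumption that the chunk count differs with and without rechunk is exactly that, an assumption to verify.

```python
import time


def best_of(fn, repeats=5):
    """Call fn `repeats` times; return (fastest wall time in seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


def compare_distinct_counts(series, repeats=5):
    """Time series.n_unique() against series.unique().len().

    `series` is expected to be a polars Series, but any object exposing
    n_unique() and unique().len() works.
    """
    n_unique_seconds, n_unique = best_of(series.n_unique, repeats)
    unique_len_seconds, unique_len = best_of(lambda: series.unique().len(), repeats)
    return {
        "n_unique": n_unique,
        "unique_len": unique_len,
        "n_unique_seconds": n_unique_seconds,
        "unique_len_seconds": unique_len_seconds,
    }


def demo(path="weird2.parquet", column="date"):
    """Hypothetical driver: compare timings with and without rechunking on read."""
    import polars as pl  # imported here so the helpers above need only the stdlib

    for rechunk in (False, True):
        series = pl.read_parquet(path, rechunk=rechunk)[column]
        # n_chunks() shows how many chunks back the column after reading.
        print(f"rechunk={rechunk} n_chunks={series.n_chunks()}")
        print(compare_distinct_counts(series))
```

Calling demo() next to the file should print both timings for each read mode. Both methods should report the same count; if only n_unique_seconds changes between the two reads, that would point at n_unique() and the chunk layout rather than at the data itself.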
Interestingly, if you do:
ds = pl.read_parquet("weird2.parquet", rechunk=True)
it becomes fast again. It has something to do with the layout of the Parquet file. Using parquet-layout, I see:
{
"row_groups": [
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4,
"compressed_bytes": 12313,
"uncompressed_bytes": 21688,
"header_bytes": 18,
"num_values": 5422
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 12335,
"compressed_bytes": 150331,
"uncompressed_bytes": 366078,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 162752,
"compressed_bytes": 480,
"uncompressed_bytes": 1272,
"header_bytes": 16,
"num_values": 159
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 163248,
"compressed_bytes": 64231,
"uncompressed_bytes": 216222,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 227568,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 227600,
"compressed_bytes": 11921,
"uncompressed_bytes": 24011,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 239598,
"compressed_bytes": 11534,
"uncompressed_bytes": 20328,
"header_bytes": 18,
"num_values": 5082
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 251150,
"compressed_bytes": 86553,
"uncompressed_bytes": 206159,
"header_bytes": 39,
"num_values": 148837
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 337791,
"compressed_bytes": 460,
"uncompressed_bytes": 1216,
"header_bytes": 16,
"num_values": 152
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 338267,
"compressed_bytes": 36050,
"uncompressed_bytes": 122588,
"header_bytes": 39,
"num_values": 148837
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 374406,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 374438,
"compressed_bytes": 6897,
"uncompressed_bytes": 13421,
"header_bytes": 32,
"num_values": 148837
}
]
}
],
"row_count": 148837
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 381410,
"compressed_bytes": 12348,
"uncompressed_bytes": 21744,
"header_bytes": 18,
"num_values": 5436
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 393776,
"compressed_bytes": 150910,
"uncompressed_bytes": 367061,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 544774,
"compressed_bytes": 476,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 545266,
"compressed_bytes": 64053,
"uncompressed_bytes": 216534,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 609408,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 609440,
"compressed_bytes": 11997,
"uncompressed_bytes": 23708,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 621514,
"compressed_bytes": 11664,
"uncompressed_bytes": 20560,
"header_bytes": 18,
"num_values": 5140
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 633196,
"compressed_bytes": 86374,
"uncompressed_bytes": 205668,
"header_bytes": 39,
"num_values": 149237
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 719658,
"compressed_bytes": 472,
"uncompressed_bytes": 1248,
"header_bytes": 16,
"num_values": 156
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 720146,
"compressed_bytes": 36122,
"uncompressed_bytes": 120994,
"header_bytes": 39,
"num_values": 149237
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 756357,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 756389,
"compressed_bytes": 6919,
"uncompressed_bytes": 13176,
"header_bytes": 32,
"num_values": 149237
}
]
}
],
"row_count": 149237
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 763383,
"compressed_bytes": 12389,
"uncompressed_bytes": 21816,
"header_bytes": 18,
"num_values": 5454
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 775790,
"compressed_bytes": 150689,
"uncompressed_bytes": 366531,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 926567,
"compressed_bytes": 478,
"uncompressed_bytes": 1272,
"header_bytes": 16,
"num_values": 159
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 927061,
"compressed_bytes": 64301,
"uncompressed_bytes": 217659,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 991451,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 991483,
"compressed_bytes": 11933,
"uncompressed_bytes": 23998,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1003493,
"compressed_bytes": 11623,
"uncompressed_bytes": 20472,
"header_bytes": 18,
"num_values": 5118
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1015134,
"compressed_bytes": 86669,
"uncompressed_bytes": 207107,
"header_bytes": 39,
"num_values": 149043
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1101891,
"compressed_bytes": 474,
"uncompressed_bytes": 1248,
"header_bytes": 16,
"num_values": 156
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1102381,
"compressed_bytes": 35985,
"uncompressed_bytes": 121520,
"header_bytes": 39,
"num_values": 149043
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1138456,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1138488,
"compressed_bytes": 6950,
"uncompressed_bytes": 13438,
"header_bytes": 32,
"num_values": 149043
}
]
}
],
"row_count": 149043
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1145514,
"compressed_bytes": 12295,
"uncompressed_bytes": 21660,
"header_bytes": 18,
"num_values": 5415
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1157827,
"compressed_bytes": 150064,
"uncompressed_bytes": 367783,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1307980,
"compressed_bytes": 472,
"uncompressed_bytes": 1256,
"header_bytes": 16,
"num_values": 157
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1308468,
"compressed_bytes": 64144,
"uncompressed_bytes": 218101,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1372702,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1372734,
"compressed_bytes": 12070,
"uncompressed_bytes": 24020,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1384882,
"compressed_bytes": 11563,
"uncompressed_bytes": 20384,
"header_bytes": 18,
"num_values": 5096
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1396463,
"compressed_bytes": 86456,
"uncompressed_bytes": 205516,
"header_bytes": 39,
"num_values": 148331
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1483008,
"compressed_bytes": 454,
"uncompressed_bytes": 1208,
"header_bytes": 16,
"num_values": 151
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1483478,
"compressed_bytes": 35664,
"uncompressed_bytes": 121053,
"header_bytes": 39,
"num_values": 148331
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1519232,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1519264,
"compressed_bytes": 6891,
"uncompressed_bytes": 13397,
"header_bytes": 32,
"num_values": 148331
}
]
}
],
"row_count": 148331
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1526231,
"compressed_bytes": 12348,
"uncompressed_bytes": 21748,
"header_bytes": 18,
"num_values": 5437
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1538597,
"compressed_bytes": 150050,
"uncompressed_bytes": 364520,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1688736,
"compressed_bytes": 481,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1689233,
"compressed_bytes": 64151,
"uncompressed_bytes": 215372,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1753474,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1753506,
"compressed_bytes": 12060,
"uncompressed_bytes": 24007,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1765644,
"compressed_bytes": 11570,
"uncompressed_bytes": 20380,
"header_bytes": 18,
"num_values": 5095
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1777232,
"compressed_bytes": 86628,
"uncompressed_bytes": 206800,
"header_bytes": 39,
"num_values": 149113
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1863949,
"compressed_bytes": 454,
"uncompressed_bytes": 1200,
"header_bytes": 16,
"num_values": 150
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1864419,
"compressed_bytes": 35929,
"uncompressed_bytes": 121826,
"header_bytes": 39,
"num_values": 149113
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1900438,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1900470,
"compressed_bytes": 6946,
"uncompressed_bytes": 13543,
"header_bytes": 32,
"num_values": 149113
}
]
}
],
"row_count": 149113
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 1907492,
"compressed_bytes": 12350,
"uncompressed_bytes": 21752,
"header_bytes": 18,
"num_values": 5438
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 1919860,
"compressed_bytes": 150814,
"uncompressed_bytes": 368460,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2070763,
"compressed_bytes": 481,
"uncompressed_bytes": 1272,
"header_bytes": 16,
"num_values": 159
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2071260,
"compressed_bytes": 64398,
"uncompressed_bytes": 216883,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2135748,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2135780,
"compressed_bytes": 12138,
"uncompressed_bytes": 24077,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2147996,
"compressed_bytes": 11573,
"uncompressed_bytes": 20392,
"header_bytes": 18,
"num_values": 5098
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2159587,
"compressed_bytes": 86817,
"uncompressed_bytes": 208011,
"header_bytes": 39,
"num_values": 148878
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2246493,
"compressed_bytes": 472,
"uncompressed_bytes": 1248,
"header_bytes": 16,
"num_values": 156
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2246981,
"compressed_bytes": 36123,
"uncompressed_bytes": 122951,
"header_bytes": 39,
"num_values": 148878
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2283194,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2283226,
"compressed_bytes": 7013,
"uncompressed_bytes": 13481,
"header_bytes": 32,
"num_values": 148878
}
]
}
],
"row_count": 148878
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2290315,
"compressed_bytes": 12306,
"uncompressed_bytes": 21680,
"header_bytes": 18,
"num_values": 5420
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2302639,
"compressed_bytes": 150092,
"uncompressed_bytes": 365903,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2452820,
"compressed_bytes": 478,
"uncompressed_bytes": 1272,
"header_bytes": 16,
"num_values": 159
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2453314,
"compressed_bytes": 64259,
"uncompressed_bytes": 216406,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2517663,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2517695,
"compressed_bytes": 12002,
"uncompressed_bytes": 24012,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2529775,
"compressed_bytes": 11633,
"uncompressed_bytes": 20504,
"header_bytes": 18,
"num_values": 5126
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2541426,
"compressed_bytes": 86793,
"uncompressed_bytes": 207327,
"header_bytes": 39,
"num_values": 148495
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2628308,
"compressed_bytes": 481,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2628805,
"compressed_bytes": 35995,
"uncompressed_bytes": 122310,
"header_bytes": 39,
"num_values": 148495
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2664890,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2664922,
"compressed_bytes": 6935,
"uncompressed_bytes": 13371,
"header_bytes": 32,
"num_values": 148495
}
]
}
],
"row_count": 148495
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2671933,
"compressed_bytes": 12341,
"uncompressed_bytes": 21736,
"header_bytes": 18,
"num_values": 5434
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2684292,
"compressed_bytes": 150442,
"uncompressed_bytes": 366688,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2834823,
"compressed_bytes": 473,
"uncompressed_bytes": 1256,
"header_bytes": 16,
"num_values": 157
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2835312,
"compressed_bytes": 64048,
"uncompressed_bytes": 217513,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2899450,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2899482,
"compressed_bytes": 12077,
"uncompressed_bytes": 23943,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 2911637,
"compressed_bytes": 11637,
"uncompressed_bytes": 20496,
"header_bytes": 18,
"num_values": 5124
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 2923292,
"compressed_bytes": 86157,
"uncompressed_bytes": 205770,
"header_bytes": 39,
"num_values": 148701
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3009538,
"compressed_bytes": 467,
"uncompressed_bytes": 1240,
"header_bytes": 16,
"num_values": 155
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3010021,
"compressed_bytes": 35974,
"uncompressed_bytes": 121958,
"header_bytes": 39,
"num_values": 148701
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3046085,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3046117,
"compressed_bytes": 6903,
"uncompressed_bytes": 13441,
"header_bytes": 32,
"num_values": 148701
}
]
}
],
"row_count": 148701
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3053096,
"compressed_bytes": 12341,
"uncompressed_bytes": 21736,
"header_bytes": 18,
"num_values": 5434
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3065455,
"compressed_bytes": 150138,
"uncompressed_bytes": 366283,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3215682,
"compressed_bytes": 475,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3216173,
"compressed_bytes": 64166,
"uncompressed_bytes": 215673,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3280429,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3280461,
"compressed_bytes": 12054,
"uncompressed_bytes": 24018,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3292593,
"compressed_bytes": 11575,
"uncompressed_bytes": 20396,
"header_bytes": 18,
"num_values": 5099
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3304186,
"compressed_bytes": 86428,
"uncompressed_bytes": 205403,
"header_bytes": 39,
"num_values": 148790
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3390703,
"compressed_bytes": 470,
"uncompressed_bytes": 1240,
"header_bytes": 16,
"num_values": 155
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3391189,
"compressed_bytes": 35934,
"uncompressed_bytes": 121449,
"header_bytes": 39,
"num_values": 148790
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3427213,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3427245,
"compressed_bytes": 6976,
"uncompressed_bytes": 13590,
"header_bytes": 32,
"num_values": 148790
}
]
}
],
"row_count": 148790
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3434297,
"compressed_bytes": 12320,
"uncompressed_bytes": 21704,
"header_bytes": 18,
"num_values": 5426
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3446635,
"compressed_bytes": 150885,
"uncompressed_bytes": 366781,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3597609,
"compressed_bytes": 478,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3598103,
"compressed_bytes": 64308,
"uncompressed_bytes": 216793,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3662501,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3662533,
"compressed_bytes": 11976,
"uncompressed_bytes": 23852,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3674587,
"compressed_bytes": 11545,
"uncompressed_bytes": 20344,
"header_bytes": 18,
"num_values": 5086
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3686150,
"compressed_bytes": 86843,
"uncompressed_bytes": 206726,
"header_bytes": 39,
"num_values": 148943
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3773082,
"compressed_bytes": 462,
"uncompressed_bytes": 1232,
"header_bytes": 16,
"num_values": 154
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3773560,
"compressed_bytes": 36254,
"uncompressed_bytes": 122803,
"header_bytes": 39,
"num_values": 148943
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3809904,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3809936,
"compressed_bytes": 7026,
"uncompressed_bytes": 13490,
"header_bytes": 32,
"num_values": 148943
}
]
}
],
"row_count": 148943
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3817038,
"compressed_bytes": 12326,
"uncompressed_bytes": 21708,
"header_bytes": 18,
"num_values": 5427
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3829382,
"compressed_bytes": 150643,
"uncompressed_bytes": 366520,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 3980114,
"compressed_bytes": 477,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 3980607,
"compressed_bytes": 64207,
"uncompressed_bytes": 216712,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4044904,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4044936,
"compressed_bytes": 12013,
"uncompressed_bytes": 23955,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4057027,
"compressed_bytes": 11581,
"uncompressed_bytes": 20408,
"header_bytes": 18,
"num_values": 5102
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4068626,
"compressed_bytes": 86745,
"uncompressed_bytes": 206617,
"header_bytes": 39,
"num_values": 148856
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4155460,
"compressed_bytes": 470,
"uncompressed_bytes": 1248,
"header_bytes": 16,
"num_values": 156
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4155946,
"compressed_bytes": 35966,
"uncompressed_bytes": 121804,
"header_bytes": 39,
"num_values": 148856
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4192002,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4192034,
"compressed_bytes": 6978,
"uncompressed_bytes": 13480,
"header_bytes": 32,
"num_values": 148856
}
]
}
],
"row_count": 148856
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4199088,
"compressed_bytes": 12336,
"uncompressed_bytes": 21728,
"header_bytes": 18,
"num_values": 5432
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4211442,
"compressed_bytes": 150475,
"uncompressed_bytes": 366875,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4362006,
"compressed_bytes": 477,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4362499,
"compressed_bytes": 64285,
"uncompressed_bytes": 217222,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4426874,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4426906,
"compressed_bytes": 11938,
"uncompressed_bytes": 23863,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4438922,
"compressed_bytes": 11687,
"uncompressed_bytes": 20596,
"header_bytes": 18,
"num_values": 5149
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4450627,
"compressed_bytes": 86696,
"uncompressed_bytes": 205907,
"header_bytes": 39,
"num_values": 148628
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4537412,
"compressed_bytes": 465,
"uncompressed_bytes": 1224,
"header_bytes": 16,
"num_values": 153
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4537893,
"compressed_bytes": 35854,
"uncompressed_bytes": 121727,
"header_bytes": 39,
"num_values": 148628
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4573837,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4573869,
"compressed_bytes": 6955,
"uncompressed_bytes": 13471,
"header_bytes": 32,
"num_values": 148628
}
]
}
],
"row_count": 148628
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4580900,
"compressed_bytes": 12336,
"uncompressed_bytes": 21728,
"header_bytes": 18,
"num_values": 5432
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4593254,
"compressed_bytes": 149892,
"uncompressed_bytes": 365900,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4743235,
"compressed_bytes": 480,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4743731,
"compressed_bytes": 64287,
"uncompressed_bytes": 215980,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4808108,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4808140,
"compressed_bytes": 11952,
"uncompressed_bytes": 23948,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4820170,
"compressed_bytes": 11589,
"uncompressed_bytes": 20428,
"header_bytes": 18,
"num_values": 5107
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4831777,
"compressed_bytes": 87168,
"uncompressed_bytes": 208270,
"header_bytes": 39,
"num_values": 149092
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4919034,
"compressed_bytes": 474,
"uncompressed_bytes": 1248,
"header_bytes": 16,
"num_values": 156
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4919524,
"compressed_bytes": 36175,
"uncompressed_bytes": 122168,
"header_bytes": 39,
"num_values": 149092
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4955789,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4955821,
"compressed_bytes": 7092,
"uncompressed_bytes": 13760,
"header_bytes": 32,
"num_values": 149092
}
]
}
],
"row_count": 149092
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 4962989,
"compressed_bytes": 12351,
"uncompressed_bytes": 21752,
"header_bytes": 18,
"num_values": 5438
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 4975358,
"compressed_bytes": 150592,
"uncompressed_bytes": 367334,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5126039,
"compressed_bytes": 473,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5126528,
"compressed_bytes": 64222,
"uncompressed_bytes": 215959,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5190840,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5190872,
"compressed_bytes": 12050,
"uncompressed_bytes": 24039,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5203000,
"compressed_bytes": 11534,
"uncompressed_bytes": 20320,
"header_bytes": 18,
"num_values": 5080
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5214552,
"compressed_bytes": 86708,
"uncompressed_bytes": 205898,
"header_bytes": 39,
"num_values": 148708
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5301349,
"compressed_bytes": 476,
"uncompressed_bytes": 1256,
"header_bytes": 16,
"num_values": 157
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5301841,
"compressed_bytes": 36081,
"uncompressed_bytes": 121193,
"header_bytes": 39,
"num_values": 148708
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5338012,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5338044,
"compressed_bytes": 7060,
"uncompressed_bytes": 13588,
"header_bytes": 32,
"num_values": 148708
}
]
}
],
"row_count": 148708
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5345180,
"compressed_bytes": 12334,
"uncompressed_bytes": 21720,
"header_bytes": 18,
"num_values": 5430
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5357532,
"compressed_bytes": 150163,
"uncompressed_bytes": 364784,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5507784,
"compressed_bytes": 478,
"uncompressed_bytes": 1264,
"header_bytes": 16,
"num_values": 158
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5508278,
"compressed_bytes": 64180,
"uncompressed_bytes": 215922,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5572548,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5572580,
"compressed_bytes": 11912,
"uncompressed_bytes": 23712,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5584570,
"compressed_bytes": 11579,
"uncompressed_bytes": 20396,
"header_bytes": 18,
"num_values": 5099
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5596167,
"compressed_bytes": 86735,
"uncompressed_bytes": 206633,
"header_bytes": 39,
"num_values": 148709
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5682991,
"compressed_bytes": 471,
"uncompressed_bytes": 1248,
"header_bytes": 16,
"num_values": 156
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5683478,
"compressed_bytes": 35796,
"uncompressed_bytes": 121758,
"header_bytes": 39,
"num_values": 148709
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5719364,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5719396,
"compressed_bytes": 6882,
"uncompressed_bytes": 13292,
"header_bytes": 32,
"num_values": 148709
}
]
}
],
"row_count": 148709
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5726354,
"compressed_bytes": 12368,
"uncompressed_bytes": 21784,
"header_bytes": 18,
"num_values": 5446
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5738740,
"compressed_bytes": 150123,
"uncompressed_bytes": 366118,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5888952,
"compressed_bytes": 474,
"uncompressed_bytes": 1256,
"header_bytes": 16,
"num_values": 157
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5889442,
"compressed_bytes": 64205,
"uncompressed_bytes": 217537,
"header_bytes": 39,
"num_values": 264562
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5953737,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5953769,
"compressed_bytes": 11949,
"uncompressed_bytes": 23691,
"header_bytes": 33,
"num_values": 264562
}
]
}
],
"row_count": 264562
},
{
"columns": [
{
"path": "date",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 5965796,
"compressed_bytes": 11459,
"uncompressed_bytes": 20184,
"header_bytes": 18,
"num_values": 5046
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 5977273,
"compressed_bytes": 85714,
"uncompressed_bytes": 203487,
"header_bytes": 39,
"num_values": 148705
}
]
},
{
"path": "ident",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 6063076,
"compressed_bytes": 462,
"uncompressed_bytes": 1224,
"header_bytes": 16,
"num_values": 153
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 6063554,
"compressed_bytes": 35758,
"uncompressed_bytes": 120171,
"header_bytes": 39,
"num_values": 148705
}
]
},
{
"path": "label",
"has_offset_index": true,
"has_column_index": true,
"has_bloom_filter": false,
"pages": [
{
"compression": "zstd",
"encoding": "plain",
"page_type": "dictionary",
"offset": 6099402,
"compressed_bytes": 19,
"uncompressed_bytes": 10,
"header_bytes": 13,
"num_values": 2
},
{
"compression": "zstd",
"encoding": "rle_dictionary",
"page_type": "data_page_v1",
"offset": 6099434,
"compressed_bytes": 7044,
"uncompressed_bytes": 13615,
"header_bytes": 32,
"num_values": 148705
}
]
}
],
"row_count": 148705
}
]
}
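A dump this long is easier to read as a per-row-group summary. The following is a minimal sketch, assuming the layout JSON has the shape shown above (`row_groups` → `columns` → `pages`). The sample string is trimmed by hand to two row groups and one column each, with values copied from the dump, and the helper name `summarize_layout` is made up for illustration.

```python
import json

# Trimmed sample in the same shape as the parquet-layout output above
# (two row groups, only the "date" column; values copied from the dump).
layout_json = """
{
  "row_groups": [
    {
      "columns": [
        {
          "path": "date",
          "pages": [
            {"page_type": "dictionary", "compressed_bytes": 12368, "num_values": 5446},
            {"page_type": "data_page_v1", "compressed_bytes": 150123, "num_values": 264562}
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "pages": [
            {"page_type": "dictionary", "compressed_bytes": 11459, "num_values": 5046},
            {"page_type": "data_page_v1", "compressed_bytes": 85714, "num_values": 148705}
          ]
        }
      ],
      "row_count": 148705
    }
  ]
}
"""


def summarize_layout(text):
    """Return one (row_count, compressed_bytes) tuple per row group."""
    layout = json.loads(text)
    summary = []
    for group in layout["row_groups"]:
        # Sum the compressed size of every page of every column in this group.
        compressed = sum(
            page["compressed_bytes"]
            for column in group["columns"]
            for page in column["pages"]
        )
        summary.append((group["row_count"], compressed))
    return summary


summary = summarize_layout(layout_json)
for index, (rows, compressed) in enumerate(summary):
    print(f"row group {index}: {rows} rows, {compressed} compressed bytes")
```

Run against the full output, a summary like this would make the alternating row-group sizes (264562 rows, then roughly 148k rows) visible at a glance.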
from polars.
Interesting!
This happened in the middle of my data pipeline (i.e. not right after reading a parquet file). The pattern is:
ds = pl.read_parquet(....)
(
ds
.lazy()
.with_columns(...)
.join(....)
.with_columns(...)
.with_columns(...) # adding this line suddenly makes the collect() call slow!
.collect()
)
In particular, there is no place in the pipeline to call .rechunk()! (That is, LazyFrames have no .rechunk().)
(This behavior does not seem to be particular to n_unique() either; other functions could trigger it as well.)
from polars.
@kszlim I marked your comment as spam not because it is irrelevant but because it made the issue unreadable to scroll through its pages and pages of JSON. Please post that as an attachment file or a link to a gist or similar.
Regarding this snippet:
# This does not happen with all input dataframes.
# I managed to produce a parquet file showcasing this behavior.
# The file `weird2.parquet` is attached: https://github.com/pola-rs/polars/files/15424892/weird2.parquet.zip
ds = pl.read_parquet("weird2.parquet")
# This takes around 4 minutes to compute (!!)
print(ds.select(pl.col("label").n_unique().over("date", "ident")))
I can't reproduce on 0.20.28 on Apple M1 (query finishes instantly), but can on 0.20.29 so it's a very recent regression.
from polars.
The problem is that we were rechunking the entire data for each group in the aggregation. This wasn't exposed before because in the past we always rechunked by default when reading a parquet file.
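To see why this cost explodes, here is a toy model in plain Python. It is not Polars' actual implementation: the chunked column is modelled as a list of lists, and the names `rechunk`, `copies_when_rechunking_per_group` and `copies_when_rechunking_once` are hypothetical. It only counts how many elements get copied when the whole column is made contiguous once per group, compared with once in total.

```python
def rechunk(chunks):
    """Concatenate all chunks into one contiguous list.

    Returns the flat list and the number of elements copied.
    """
    flat = [value for chunk in chunks for value in chunk]
    return flat, len(flat)


def copies_when_rechunking_per_group(chunks, n_groups):
    """Model of the regression: the entire column is rechunked for every group."""
    copied = 0
    for _ in range(n_groups):
        _, n = rechunk(chunks)
        copied += n
    return copied


def copies_when_rechunking_once(chunks, n_groups):
    """Model of the old behavior: data is already contiguous after one rechunk."""
    _, n = rechunk(chunks)
    return n


# Four chunks of 1,000 values each, aggregated over 500 groups.
chunks = [list(range(1000)) for _ in range(4)]
per_group = copies_when_rechunking_per_group(chunks, n_groups=500)
once = copies_when_rechunking_once(chunks, n_groups=500)
print(per_group, once)
```

In this model the per-group variant copies the full column once for each group, so the work grows with rows times groups, whereas a single up-front rechunk copies each element once. That would be consistent with the query being fast when the file is read with rechunk=True.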
from polars.
Just to add - I seem to have hit this with .over() using one of rolling_mean, len, shift or rank("ordinal") - not sure which yet.
from polars.