
Expression/context evaluation bug (polars, open)

Ge0rges commented on June 21, 2024

Comments (3)

Ge0rges commented on June 21, 2024

Prior to the traceback, the tail of the log file shows:

RUN STREAMING PIPELINE
[csv -> filter -> hstack -> generic-group_by -> callback -> filter -> ordered_sink, csv -> filter -> hstack -> generic_join_build]
STREAMING CHUNK SIZE: 3571 rows
STREAMING CHUNK SIZE: 7142 rows
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
process partition 0 during generic-group_by-source
process partition 1 during generic-group_by-source
process partition 2 during generic-group_by-source
process partition 3 during generic-group_by-source
process partition 4 during generic-group_by-source
process partition 5 during generic-group_by-source
process partition 6 during generic-group_by-source
process partition 7 during generic-group_by-source
process partition 8 during generic-group_by-source
process partition 9 during generic-group_by-source
process partition 10 during generic-group_by-source
process partition 11 during generic-group_by-source
process partition 12 during generic-group_by-source
process partition 13 during generic-group_by-source
process partition 14 during generic-group_by-source
process partition 15 during generic-group_by-source
process partition 16 during generic-group_by-source
process partition 17 during generic-group_by-source
process partition 18 during generic-group_by-source
process partition 19 during generic-group_by-source
process partition 20 during generic-group_by-source
process partition 21 during generic-group_by-source
process partition 22 during generic-group_by-source
process partition 23 during generic-group_by-source
process partition 24 during generic-group_by-source
process partition 25 during generic-group_by-source
process partition 26 during generic-group_by-source
process partition 27 during generic-group_by-source
process partition 28 during generic-group_by-source
process partition 29 during generic-group_by-source
process partition 30 during generic-group_by-source
process partition 31 during generic-group_by-source
process partition 32 during generic-group_by-source
process partition 33 during generic-group_by-source
process partition 34 during generic-group_by-source
process partition 35 during generic-group_by-source
process partition 36 during generic-group_by-source
process partition 37 during generic-group_by-source
process partition 38 during generic-group_by-source
process partition 39 during generic-group_by-source
process partition 40 during generic-group_by-source
process partition 41 during generic-group_by-source
process partition 42 during generic-group_by-source
process partition 43 during generic-group_by-source
process partition 44 during generic-group_by-source
process partition 45 during generic-group_by-source
process partition 46 during generic-group_by-source
process partition 47 during generic-group_by-source
process partition 48 during generic-group_by-source
process partition 49 during generic-group_by-source
process partition 50 during generic-group_by-source
process partition 51 during generic-group_by-source
process partition 52 during generic-group_by-source
process partition 53 during generic-group_by-source
process partition 54 during generic-group_by-source
process partition 55 during generic-group_by-source
process partition 56 during generic-group_by-source
process partition 57 during generic-group_by-source
process partition 58 during generic-group_by-source
process partition 59 during generic-group_by-source
process partition 60 during generic-group_by-source
process partition 61 during generic-group_by-source
process partition 62 during generic-group_by-source
process partition 63 during generic-group_by-source
RUN STREAMING PIPELINE
[csv -> filter -> hstack -> generic-group_by -> callback -> filter -> ordered_sink, csv -> filter -> hstack -> generic_join_build]
STREAMING CHUNK SIZE: 3571 rows
STREAMING CHUNK SIZE: 7142 rows
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
process partition 0 during generic-group_by-source
process partition 1 during generic-group_by-source
process partition 2 during generic-group_by-source
process partition 3 during generic-group_by-source
process partition 4 during generic-group_by-source
process partition 5 during generic-group_by-source
process partition 6 during generic-group_by-source
process partition 7 during generic-group_by-source
process partition 8 during generic-group_by-source
process partition 9 during generic-group_by-source
process partition 10 during generic-group_by-source
process partition 11 during generic-group_by-source
process partition 12 during generic-group_by-source
process partition 13 during generic-group_by-source
process partition 14 during generic-group_by-source
process partition 15 during generic-group_by-source
process partition 16 during generic-group_by-source
process partition 17 during generic-group_by-source
process partition 18 during generic-group_by-source
process partition 19 during generic-group_by-source
process partition 20 during generic-group_by-source
process partition 21 during generic-group_by-source
process partition 22 during generic-group_by-source
process partition 23 during generic-group_by-source
process partition 24 during generic-group_by-source
process partition 25 during generic-group_by-source
process partition 26 during generic-group_by-source
process partition 27 during generic-group_by-source
process partition 28 during generic-group_by-source
process partition 29 during generic-group_by-source
process partition 30 during generic-group_by-source
process partition 31 during generic-group_by-source
process partition 32 during generic-group_by-source
process partition 33 during generic-group_by-source
process partition 34 during generic-group_by-source
process partition 35 during generic-group_by-source
process partition 36 during generic-group_by-source
process partition 37 during generic-group_by-source
process partition 38 during generic-group_by-source
process partition 39 during generic-group_by-source
process partition 40 during generic-group_by-source
process partition 41 during generic-group_by-source
process partition 42 during generic-group_by-source
process partition 43 during generic-group_by-source
process partition 44 during generic-group_by-source
process partition 45 during generic-group_by-source
process partition 46 during generic-group_by-source
process partition 47 during generic-group_by-source
process partition 48 during generic-group_by-source
process partition 49 during generic-group_by-source
process partition 50 during generic-group_by-source
process partition 51 during generic-group_by-source
process partition 52 during generic-group_by-source
process partition 53 during generic-group_by-source
process partition 54 during generic-group_by-source
process partition 55 during generic-group_by-source
process partition 56 during generic-group_by-source
process partition 57 during generic-group_by-source
process partition 58 during generic-group_by-source
process partition 59 during generic-group_by-source
process partition 60 during generic-group_by-source
process partition 61 during generic-group_by-source
process partition 62 during generic-group_by-source
process partition 63 during generic-group_by-source

Ge0rges commented on June 21, 2024

When df = df.collect().lazy() is called prior to the problematic code, the log file (ending immediately after the call to with_columns) shows:

found multiple sources; run comm_subplan_elim
UNION: `parallel=false` union is run sequentially
join parallel: false
join parallel: false
read files in parallel
avg line length: 67.58008
std. dev. line length: 6.988767
initial row estimate: 2131851
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 0
CACHE HIT: cache id: 0
estimated unique values: 770990
estimated unique count: 770990 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 71.887695
std. dev. line length: 3.7583435
initial row estimate: 2079664
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 1
CACHE HIT: cache id: 1
estimated unique values: 512530
estimated unique count: 512530 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 71.74707
std. dev. line length: 4.2742543
initial row estimate: 2028932
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 2
CACHE HIT: cache id: 2
estimated unique values: 637402
estimated unique count: 637402 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 73.89453
std. dev. line length: 2.0176597
initial row estimate: 1954295
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 3
CACHE HIT: cache id: 3
estimated unique values: 736951
estimated unique count: 736951 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 72.50195
std. dev. line length: 2.815622
initial row estimate: 2055499
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 4
CACHE HIT: cache id: 4
estimated unique values: 403573
estimated unique count: 403573 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 65.9375
std. dev. line length: 4.904956
initial row estimate: 2201427
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 5
CACHE HIT: cache id: 5
estimated unique values: 764336
estimated unique count: 764336 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 70.17578
std. dev. line length: 4.5201983
initial row estimate: 2064369
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 6
CACHE HIT: cache id: 6
estimated unique values: 465196
estimated unique count: 465196 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished

Ge0rges commented on June 21, 2024

On polars==0.20.0 the log is as follows, with the same error:

join parallel: false
avg line length: 67.58008
std. dev. line length: 6.988767
initial row estimate: 2131851
no. of chunks: 8 processed by: 8 threads.
estimated unique values: 770990
estimated unique count: 770990 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join dataframes finished
dataframe filtered
join parallel: false
avg line length: 71.887695
std. dev. line length: 3.7583435
initial row estimate: 2079664
no. of chunks: 8 processed by: 8 threads.
estimated unique values: 512530
estimated unique count: 512530 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join dataframes finished
dataframe filtered
join parallel: false
avg line length: 71.74707
std. dev. line length: 4.2742543
initial row estimate: 2028932
no. of chunks: 8 processed by: 8 threads.
estimated unique values: 637402
estimated unique count: 637402 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join dataframes finished
dataframe filtered
join parallel: false
avg line length: 73.89453
std. dev. line length: 2.0176597
initial row estimate: 1954295
no. of chunks: 8 processed by: 8 threads.
estimated unique values: 736951
estimated unique count: 736951 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join dataframes finished
dataframe filtered
join parallel: false
avg line length: 72.50195
std. dev. line length: 2.815622
initial row estimate: 2055499
no. of chunks: 8 processed by: 8 threads.
estimated unique values: 403573
estimated unique count: 403573 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join dataframes finished
dataframe filtered
join parallel: false
avg line length: 65.9375
std. dev. line length: 4.904956
initial row estimate: 2201427
no. of chunks: 8 processed by: 8 threads.
estimated unique values: 764336
estimated unique count: 764336 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join dataframes finished
dataframe filtered
join parallel: false
avg line length: 70.17578
std. dev. line length: 4.5201983
initial row estimate: 2064369
no. of chunks: 8 processed by: 8 threads.
estimated unique values: 465196
estimated unique count: 465196 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join dataframes finished
dataframe filtered
Traceback (most recent call last):
  File "/Users/GeorgesKanaan/Documents/Development/Methylation/code/my_dmr_analysis.py", line 49, in <module>
    run_analysis("polaribacter_r-contigs", "dmr_by_gene", data_dir, fig_savepath="../plots/plots_5")
  File "/Users/GeorgesKanaan/Documents/Development/Methylation/code/my_dmr_analysis.py", line 28, in run_analysis
    df = group_methyl_data_by_genes(combined_methyl_data, genes)
  File "/Users/GeorgesKanaan/Documents/Development/Methylation/code/utilities/utils.py", line 228, in group_methyl_data_by_genes
    df.collect()
  File "/Users/GeorgesKanaan/micromamba/envs/jupyter/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1706, in collect
    return wrap_df(ldf.collect())
polars.exceptions.ColumnNotFoundError: name

Error originated just after this operation:
UNION
  PLAN 0:
    DF []; PROJECT */0 COLUMNS; SELECTION: "None"
  PLAN 1:
     WITH_COLUMNS:
     [Utf8(bottom).alias("sample")]
       SELECT [col("name"), col("21839"), col("a"), col("m"), col("Ncanonical")] FROM
         WITH_COLUMNS:
         [col("m").fill_null([0]), col("21839").fill_null([0]), col("a").fill_null([0]), col("Ncanonical").fill_null([0])]
          UNIQUE BY None
            LEFT JOIN:
            LEFT PLAN ON: [col("name")]
              DF ["name", "m", "21839", "a"]; PROJECT */4 COLUMNS; SELECTION: "None"
            RIGHT PLAN ON: [col("name")]
               SELECT [col("name"), col("Ncanonical")] FROM
                FILTER [(col("Nvalid_cov")) == (col("max_valid_cov"))] FROM

                INNER JOIN:
                LEFT PLAN ON: [col("name"), col("mod_group")]
                   WITH_COLUMNS:
                   [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                    FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                     WITH_COLUMNS:
                     [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                       SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                          Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/bottom.bed
                          PROJECT */18 COLUMNS
                RIGHT PLAN ON: [col("name"), col("mod_group")]
                  AGGREGATE
                  	[col("Nvalid_cov").max().alias("max_valid_cov")] BY [col("name"), col("mod_group")] FROM
                     WITH_COLUMNS:
                     [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                      FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                       WITH_COLUMNS:
                       [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                         SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                            Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/bottom.bed
                            PROJECT */18 COLUMNS
                END INNER JOIN
            END LEFT JOIN
  PLAN 2:
     WITH_COLUMNS:
     [Utf8(barcode11).alias("sample")]
       SELECT [col("name"), col("21839"), col("a"), col("m"), col("Ncanonical")] FROM
         WITH_COLUMNS:
         [col("a").fill_null([0]), col("m").fill_null([0]), col("21839").fill_null([0]), col("Ncanonical").fill_null([0])]
          UNIQUE BY None
            LEFT JOIN:
            LEFT PLAN ON: [col("name")]
              DF ["name", "a", "m", "21839"]; PROJECT */4 COLUMNS; SELECTION: "None"
            RIGHT PLAN ON: [col("name")]
               SELECT [col("name"), col("Ncanonical")] FROM
                FILTER [(col("Nvalid_cov")) == (col("max_valid_cov"))] FROM

                INNER JOIN:
                LEFT PLAN ON: [col("name"), col("mod_group")]
                   WITH_COLUMNS:
                   [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                    FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                     WITH_COLUMNS:
                     [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                       SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                          Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/barcode11.bed
                          PROJECT */18 COLUMNS
                RIGHT PLAN ON: [col("name"), col("mod_group")]
                  AGGREGATE
                  	[col("Nvalid_cov").max().alias("max_valid_cov")] BY [col("name"), col("mod_group")] FROM
                     WITH_COLUMNS:
                     [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                      FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                       WITH_COLUMNS:
                       [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                         SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                            Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/barcode11.bed
                            PROJECT */18 COLUMNS
                END INNER JOIN
            END LEFT JOIN
  PLAN 3:
     WITH_COLUMNS:
     [Utf8(barcode13).alias("sample")]
       SELECT [col("name"), col("21839"), col("a"), col("m"), col("Ncanonical")] FROM
         WITH_COLUMNS:
         [col("a").fill_null([0]), col("m").fill_null([0]), col("21839").fill_null([0]), col("Ncanonical").fill_null([0])]
          UNIQUE BY None
            LEFT JOIN:
            LEFT PLAN ON: [col("name")]
              DF ["name", "a", "m", "21839"]; PROJECT */4 COLUMNS; SELECTION: "None"
            RIGHT PLAN ON: [col("name")]
               SELECT [col("name"), col("Ncanonical")] FROM
                FILTER [(col("Nvalid_cov")) == (col("max_valid_cov"))] FROM

                INNER JOIN:
                LEFT PLAN ON: [col("name"), col("mod_group")]
                   WITH_COLUMNS:
                   [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                    FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                     WITH_COLUMNS:
                     [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                       SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                          Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/barcode13.bed
                          PROJECT */18 COLUMNS
                RIGHT PLAN ON: [col("name"), col("mod_group")]
                  AGGREGATE
                  	[col("Nvalid_cov").max().alias("max_valid_cov")] BY [col("name"), col("mod_group")] FROM
                     WITH_COLUMNS:
                     [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                      FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                       WITH_COLUMNS:
                       [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                         SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                            Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/barcode13.bed
                            PROJECT */18 COLUMNS
                END INNER JOIN
            END LEFT JOIN
  PLAN 4:
     WITH_COLUMNS:
     [Utf8(barcode12).alias("sample")]
       SELECT [col("name"), col("21839"), col("a"), col("m"), col("Ncanonical")] FROM
         WITH_COLUMNS:
         [col("m").fill_null([0]), col("21839").fill_null([0]), col("a").fill_null([0]), col("Ncanonical").fill_null([0])]
          UNIQUE BY None
            LEFT JOIN:
            LEFT PLAN ON: [col("name")]
              DF ["name", "m", "21839", "a"]; PROJECT */4 COLUMNS; SELECTION: "None"
            RIGHT PLAN ON: [col("name")]
               SELECT [col("name"), col("Ncanonical")] FROM
                FILTER [(col("Nvalid_cov")) == (col("max_valid_cov"))] FROM

                INNER JOIN:
                LEFT PLAN ON: [col("name"), col("mod_group")]
                   WITH_COLUMNS:
                   [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                    FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                     WITH_COLUMNS:
                     [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                       SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                          Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/barcode12.bed
                          PROJECT */18 COLUMNS
                RIGHT PLAN ON: [col("name"), col("mod_group")]
                  AGGREGATE
                  	[col("Nvalid_cov").max().alias("max_valid_cov")] BY [col("name"), col("mod_group")] FROM
                     WITH_COLUMNS:
                     [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                      FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                       WITH_COLUMNS:
                       [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                         SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                            Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/barcode12.bed
                            PROJECT */18 COLUMNS
                END INNER JOIN
            END LEFT JOIN
  PLAN 5:
     WITH_COLUMNS:
     [Utf8(barcode14).alias("sample")]
       SELECT [col("name"), col("21839"), col("a"), col("m"), col("Ncanonical")] FROM
         WITH_COLUMNS:
         [col("m").fill_null([0]), col("21839").fill_null([0]), col("a").fill_null([0]), col("Ncanonical").fill_null([0])]
          UNIQUE BY None
            LEFT JOIN:
            LEFT PLAN ON: [col("name")]
              DF ["name", "m", "21839", "a"]; PROJECT */4 COLUMNS; SELECTION: "None"
            RIGHT PLAN ON: [col("name")]
               SELECT [col("name"), col("Ncanonical")] FROM
                FILTER [(col("Nvalid_cov")) == (col("max_valid_cov"))] FROM

                INNER JOIN:
                LEFT PLAN ON: [col("name"), col("mod_group")]
                   WITH_COLUMNS:
                   [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                    FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                     WITH_COLUMNS:
                     [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                       SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                          Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/barcode14.bed
                          PROJECT */18 COLUMNS
                RIGHT PLAN ON: [col("name"), col("mod_group")]
                  AGGREGATE
                  	[col("Nvalid_cov").max().alias("max_valid_cov")] BY [col("name"), col("mod_group")] FROM
                     WITH_COLUMNS:
                     [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                      FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                       WITH_COLUMNS:
                       [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                         SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                            Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/barcode14.bed
                            PROJECT */18 COLUMNS
                END INNER JOIN
            END LEFT JOIN
  PLAN 6:
     WITH_COLUMNS:
     [Utf8(middle).alias("sample")]
       SELECT [col("name"), col("21839"), col("a"), col("m"), col("Ncanonical")] FROM
         WITH_COLUMNS:
         [col("m").fill_null([0]), col("21839").fill_null([0]), col("a").fill_null([0]), col("Ncanonical").fill_null([0])]
          UNIQUE BY None
            LEFT JOIN:
            LEFT PLAN ON: [col("name")]
              DF ["name", "m", "21839", "a"]; PROJECT */4 COLUMNS; SELECTION: "None"
            RIGHT PLAN ON: [col("name")]
               SELECT [col("name"), col("Ncanonical")] FROM
                FILTER [(col("Nvalid_cov")) == (col("max_valid_cov"))] FROM

                INNER JOIN:
                LEFT PLAN ON: [col("name"), col("mod_group")]
                   WITH_COLUMNS:
                   [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                    FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                     WITH_COLUMNS:
                     [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                       SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                          Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/middle.bed
                          PROJECT */18 COLUMNS
                RIGHT PLAN ON: [col("name"), col("mod_group")]
                  AGGREGATE
                  	[col("Nvalid_cov").max().alias("max_valid_cov")] BY [col("name"), col("mod_group")] FROM
                     WITH_COLUMNS:
                     [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                      FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                       WITH_COLUMNS:
                       [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                         SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                            Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/middle.bed
                            PROJECT */18 COLUMNS
                END INNER JOIN
            END LEFT JOIN
  PLAN 7:
     WITH_COLUMNS:
     [Utf8(top).alias("sample")]
       SELECT [col("name"), col("21839"), col("a"), col("m"), col("Ncanonical")] FROM
         WITH_COLUMNS:
         [col("m").fill_null([0]), col("21839").fill_null([0]), col("a").fill_null([0]), col("Ncanonical").fill_null([0])]
          UNIQUE BY None
            LEFT JOIN:
            LEFT PLAN ON: [col("name")]
              DF ["name", "m", "21839", "a"]; PROJECT */4 COLUMNS; SELECTION: "None"
            RIGHT PLAN ON: [col("name")]
               SELECT [col("name"), col("Ncanonical")] FROM
                FILTER [(col("Nvalid_cov")) == (col("max_valid_cov"))] FROM

                INNER JOIN:
                LEFT PLAN ON: [col("name"), col("mod_group")]
                   WITH_COLUMNS:
                   [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                    FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                     WITH_COLUMNS:
                     [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                       SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                          Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/top.bed
                          PROJECT */18 COLUMNS
                RIGHT PLAN ON: [col("name"), col("mod_group")]
                  AGGREGATE
                  	[col("Nvalid_cov").max().alias("max_valid_cov")] BY [col("name"), col("mod_group")] FROM
                     WITH_COLUMNS:
                     [col("modified base code and motif").replace([Series, Series, col("modified base code and motif")]).alias("mod_group")]
                      FILTER [(col("Ndiff")) < (col("Nvalid_cov"))] FROM

                       WITH_COLUMNS:
                       [[([([([([([(col("chrom")) + (Utf8(|))]) + (col("strand"))]) + (Utf8(|))]) + (col("inclusive start position").strict_cast(Utf8))]) + (Utf8(|))]) + (col("exclusive end position").strict_cast(Utf8))].alias("name")]
                         SELECT [col("chrom"), col("inclusive start position"), col("exclusive end position"), col("modified base code and motif"), col("strand"), col("Nvalid_cov"), col("fraction modified"), col("Nmod"), col("Ncanonical"), col("Nother_mod"), col("Ndelete"), col("Nfail"), col("Ndiff"), col("Nnocall")] FROM

                            Csv SCAN /Users/GeorgesKanaan/Documents/Development/Methylation/code/../data/methylation_5/polaribacter_r-contigs/top.bed
                            PROJECT */18 COLUMNS
                END INNER JOIN
            END LEFT JOIN
END UNION
