`add_missing_columns` sometimes adds same missing column multiple times

r-terada commented on June 16, 2024

`add_missing_columns` sometimes adds same missing column multiple times

Comments (3)

derinwalters commented on June 16, 2024

Nice find! This looks like a bug in the missing-column insertion logic. It occurs when multiple columns that are not in the schema, in this case "col_b" and "col_c", are positioned after the missing column's location in the dataframe being validated. Thank you for the great working example. I'll submit a pull request shortly.

r-terada commented on June 16, 2024

Thank you for the investigation and quick fix!
I'm waiting for your pull request to be merged :)

aphorton commented on June 16, 2024

First, thank you very much for this fantastic package.

The code in OP's example now runs as intended in pandera 0.18.0, but a dataframe with non-unique column names triggers a similar column-duplication problem.

import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema(
    {
        "col_a": pa.Column(str),
        "col_missing": pa.Column(str, nullable=True)
    },
    add_missing_columns=True
)

df = pd.DataFrame({
    "col_a": ["a", "b", "c"],
    "col_b": ["d", "e", "f"]
})

print(schema.validate(df))
# -> works well
#   col_a col_missing col_b
# 0     a        None     d
# 1     b        None     e
# 2     c        None     f

df.columns = ["col_a", "col_a"]
print(schema.validate(df))
# -> duplicates columns
#   col_a col_a col_missing col_a col_a
# 0     a     d        None     a     d
# 1     b     e        None     b     e
# 2     c     f        None     c     f

Expected behavior

Only one `col_missing` column should be added, and the existing columns should not be repeated:
#   col_a col_a col_missing
# 0     a     d        None
# 1     b     e        None
# 2     c     f        None
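
To make the expected ordering concrete, here is a minimal sketch in plain Python that works only on lists of column names. The helper `expected_column_order` is hypothetical, written for illustration; it is not pandera's implementation and not the fix from the pull request. It assumes each missing schema column should be inserted exactly once, right after the last occurrence of the nearest preceding schema column.

```python
def expected_column_order(schema_columns, df_columns):
    """Hypothetical helper: column order after adding missing schema columns.

    Each schema column absent from df_columns is inserted exactly once,
    directly after the last occurrence of the nearest preceding schema
    column, or at the front if there is no such column.
    """
    present = set(df_columns)
    result = list(df_columns)
    for idx, name in enumerate(schema_columns):
        if name in present:
            continue  # already in the dataframe, nothing to add
        insert_at = 0
        # Walk backwards through the schema to find an anchor column.
        for prev in reversed(schema_columns[:idx]):
            if prev in result:
                # Position just past the LAST occurrence, so duplicated
                # names such as ["col_a", "col_a"] stay together.
                insert_at = len(result) - result[::-1].index(prev)
                break
        result.insert(insert_at, name)
    return result


schema_columns = ["col_a", "col_missing"]
unique_case = expected_column_order(schema_columns, ["col_a", "col_b"])
duplicate_case = expected_column_order(schema_columns, ["col_a", "col_a"])
print(unique_case)
print(duplicate_case)
```

Under these assumptions, the duplicate-name case yields `col_a, col_a, col_missing`, matching the expected output above, and no existing column is repeated.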
