Comments (6)

mzinkevi commented on June 30, 2024

That looks really weird. No, in general it shouldn't have a new column and a column dropped at the same time. I have never seen such errors before.

If you can produce a small statistics object and schema object that reproduce these kinds of errors, that would help diagnose what is wrong.

from data-validation.

sudroy commented on June 30, 2024

One possible cause is a whitespace character in your eval data that makes our data validation code treat the column as a separate feature. Can you please check whether the edits you made to your eval data inadvertently changed the column name for the feature "f_key" in the CSV header so that it now contains a whitespace character?
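A minimal sketch of that header check, using only the Python standard library. The function name and the sample CSV text are hypothetical and for illustration only; this is not part of TFDV.

```python
import csv
import io


def suspicious_header_names(csv_text):
    """Return header names that differ from their whitespace-stripped form."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in header if name != name.strip()]


# Hypothetical sample: the first column name carries a trailing space.
sample = "f_key ,amount\nA,1\n"
print(suspicious_header_names(sample))
```

For a real file, pass the file's text (or adapt the function to take an open file handle) and compare the result for the training and eval CSVs.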

It would also be useful if you could share the statistics for this particular feature "f_key" from your eval statistics (i.e., the FeatureNameStatistics with the name field set to "f_key"), and confirm that there is only a single instance of FeatureNameStatistics with this name.
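One way to check for duplicate entries is sketched below. It assumes the statistics object exposes `datasets`, each with a `features` list whose items have a `name` field, as described above; the helper name is hypothetical, and the stand-in object built with `SimpleNamespace` only mimics that shape so the snippet runs without TFDV installed. If your statistics identify features by a path rather than `name`, the comparison would need adjusting.

```python
from types import SimpleNamespace


def features_named(stats, feature_name):
    """Collect every per-feature statistics entry whose name matches."""
    return [
        feature
        for dataset in stats.datasets
        for feature in dataset.features
        if feature.name == feature_name
    ]


# Stand-in for real eval statistics (assumption: same attribute layout).
fake_stats = SimpleNamespace(
    datasets=[
        SimpleNamespace(
            features=[
                SimpleNamespace(name="f_key"),
                SimpleNamespace(name="f_key "),  # note the trailing space
                SimpleNamespace(name="amount"),
            ]
        )
    ]
)

matches = features_named(fake_stats, "f_key")
print(len(matches))  # exactly one entry is the expected healthy case
```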

hanneshapke commented on June 30, 2024

Thank you @mzinkevi and @sudroy for your comments. I tried to reproduce the odd behavior with a smaller dataset, but I didn't see the same errors. I checked the data source for stray whitespace, but the file looks good.

I also noticed that validate_statistics reported a bunch of type changes for a large number of columns, when I only added a few new categorical tokens to some of the columns.

I increasingly think the odd behavior is related to my initial CSV data rather than to TFDV. Let me test a few more cases in the coming days and report back if I encounter similar errors.
Thank you for your help investigating the behavior.

paulgc commented on June 30, 2024

@hanneshapke Closing this issue for now. Feel free to reopen this in case you are able to reproduce the odd behavior.

jackhawa commented on June 30, 2024

I get the same issue even a year after this issue has been closed.

@hanneshapke did you end up solving your issue? If so, it would be great if you could share the solution. Thanks.

npoly commented on June 30, 2024

@jackhawa Can you provide a minimal repro so that we can investigate further?
