Comments (13)

root-11 commented on July 1, 2024

Hey @ypanagis
Thanks for this. I'll make the error message more informative in the next release.

The file you use claims to have 31 headers, but there are only 30 columns.
The column for "court" is missing.

This will work for you:

from tablite import Table
from pathlib import Path
path = Path(__file__).parent / "data" / 'long_text_test.csv'
assert path.exists()

columns = [
    "sharepointid","Rank","ITEMID","docname","doctype","application","APPNO",
    "ARTICLES","violation","nonviolation","CONCLUSION","importance","ORIGINATING BODY ID",
    "typedescription","kpdate","kpdateAsText","documentcollectionid","documentcollectionid2",
    "languageisocode","extractedappno","isplaceholder","doctypebranch","RESPONDENT",
    "respondentOrderEng","scl","ECLI","ORIGINATING BODY","YEAR","FULLTEXT","judges"]

t = Table.import_file(path, import_as='csv', columns={c:'f' for c in columns}, text_qualifier='"')
selection = columns[:5]
t.__getitem__(*selection).show()

PS: Note that your file test.csv is cut off mid-row (the last row has only 25 columns).

Here's what the output looks like on my machine:
image
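
The header/column mismatch described above can be checked before importing. The following is a minimal sketch using only Python's standard csv module, not tablite itself; the helper name check_field_counts and the sample data are made up for illustration. It reports every record whose field count differs from the header's, which would also flag a truncated last row.

```python
import csv
import io


def check_field_counts(text, delimiter=",", quotechar='"'):
    """Return (header_count, mismatches) for CSV text.

    mismatches is a list of (record_number, field_count) for every record
    whose number of fields differs from the header (the header is record 1).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    header = next(reader)
    mismatches = [
        (number, len(row))
        for number, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]
    return len(header), mismatches


# Hypothetical sample: record 2 is one field short, record 5 is cut off.
sample = 'a,b,c\n1,2\n"x, y",3,4\n5,6,7\n8\n'
header_count, mismatches = check_field_counts(sample)
print(header_count, mismatches)
```

For a real file, the text could be read with path.read_text() first, assuming the file fits in memory.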

from tablite.

ypanagis commented on July 1, 2024

Hi @root-11, and thank you for your reply. First of all, yes, this CSV is rather ill-structured and is missing values in some columns. One of those is the column "court", as you correctly noticed.

I didn't know about the columns and text_qualifier parameters of import_file, and it is a real convenience that they are included!

I played a bit with the example script that you gave, setting selection = columns[-3:] to see, e.g., how it works with the last few columns, which include FULLTEXT, the long one. However, I got the error in the attached file. After browsing the messages, it seems that lines 45-46 of the CSV cause an error, but it is not obvious what the problem is, and I couldn't see anything wrong in the CSV (there may be something, of course).
error.txt

root-11 commented on July 1, 2024

That's Python's multiprocessing module crashing.
As the test suite runs Python 3.8 on Linux, just like you, this seems strange. Could it be a difference between your conda env and Python's own venv?

ypanagis commented on July 1, 2024

I run the script on macOS. Could this also be an issue with multiprocessing?

root-11 commented on July 1, 2024

I'm not sure. Can you try running the multiprocessing test suite in this script:
https://github.com/root-11/mplite/blob/main/tests/test_basics.py

If that doesn't work, I'll have to do a deeper dive into why macOS behaves differently.

root-11 commented on July 1, 2024

I've added Windows and macOS to the test matrix, and the tests pass on all of them:

image

ypanagis commented on July 1, 2024

I changed to Python 3.9 as you suggested, but it now gives me the error in the attached file. My PC also has mamba installed, so the environment is now a mamba one, but I hope this is not a problem.

Note that I saw the same error when I removed the last two columns, which had some empty values, in case those caused issues.

tablite is in version 2022.10.08.
error.txt

root-11 commented on July 1, 2024

Thanks for that Yannis! I'll look into that immediately.

root-11 commented on July 1, 2024

So the error says that psutil.virtual_memory().free is zero.

Could you run this on your mac for me:

import psutil
psutil.virtual_memory().free

ypanagis commented on July 1, 2024

Thanks Bjorn, I just ran it and it gives this RuntimeError from the Python 3.8 bug:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

...
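
For context, Python raises this RuntimeError when new processes are created with the "spawn" start method (the default on macOS since Python 3.8) while the main module is still being imported. The usual remedy, which the full error message itself suggests, is to put process-starting code behind an `if __name__ == "__main__":` guard. Below is a minimal sketch of that idiom using only the standard library; the function names are illustrative, and whether the guard alone resolves the problem in this particular script is an assumption.

```python
import multiprocessing as mp


def square(x):
    """Toy worker function; must be defined at module level so workers can import it."""
    return x * x


def main():
    # The pool is created only when main() is called, never at import time.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # With "spawn", each worker re-imports this module; the guard keeps
    # the worker from trying to start its own pool during that import.
    print(main())
```

The same guard applies to a script that calls a library which starts worker processes internally: the call goes inside main(), not at module top level.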

root-11 commented on July 1, 2024

@ypanagis - do you think we can close this ticket now?

ypanagis commented on July 1, 2024

Yes @root-11, that makes total sense to me. I will try the package some more, but this part is definitely over now.

root-11 commented on July 1, 2024

Neat. Just FYI: I've released a new version today with slightly better memory management.
The details are in the changelog: https://github.com/root-11/tablite/blob/master/changelog.md