Comments (13)

ethanwhite commented on June 21, 2024

Windows was on Postgres. Ubuntu was on MySQL. I'm rerunning on Ubuntu from the command line to narrow in on the source of the problem.

from retriever.

ethanwhite commented on June 21, 2024

Ubuntu seems to be fine on MySQL and Postgres using the CLI. Maybe that issue was just a problem with a download or some other one-off thing. I'm rerunning now through the GUI (where I had the original failure) just to confirm that it's not something bizarre.

I've replicated the failure on Windows on the same laptop. I still need to test FIA on Windows using Master. I'll do that on Monday unless someone beats me to it over the weekend.

ethanwhite commented on June 21, 2024

There doesn't seem to be a problem with Master on Windows...

ethanwhite commented on June 21, 2024

I can confirm that this is a Postgres problem on Windows using Master. It does not seem to influence MySQL. The error is still empty, so I am rerunning it from the command line to see if I can get more information.

ethanwhite commented on June 21, 2024

It's an out-of-memory error, which is why it's been showing up inconsistently.

Creating table FIA.TREE...
Traceback (most recent call last):
  File "c:\Python27\Scripts\retriever-script.py", line 8, in <module>
    load_entry_point('retriever==1.2.1', 'console_scripts', 'retriever')()
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\main.py", line 93, in main
    script.download(engine)
  File "scripts\fia.py", line 76, in download
    engine.insert_data_from_file(engine.format_filename(prep_file_name))
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\engines\postgres.py", line 89, in insert_data_from_file
    return Engine.insert_data_from_file(self, filename)
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\lib\engine.py", line 612, in insert_data_from_file
    self.add_to_table()
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\lib\engine.py", line 51, in add_to_table
    if line.strip('\n\r\t ')]
MemoryError

ethanwhite commented on June 21, 2024

On lines 50-51 of engine.py we make an extra copy of the table, which in some cases runs us out of memory. Ben is implementing a switch to generators instead of lists, which will solve this.

bendmorris commented on June 21, 2024

This is not as simple as I thought.

The lines of data are coming from an iterator, usually a file object; we should avoid coercing the lines into a list, as the list may be quite large and cause memory issues like this one. It's easy to just use a generator expression instead to filter out blank lines.
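
For illustration, here is a minimal sketch of that filter written as a generator expression. The helper name `nonblank_lines` and the in-memory sample data are assumptions made for this example, not the Retriever's actual code; only the `strip('\n\r\t ')` test is taken from the traceback above.

```python
import io


def nonblank_lines(source):
    """Lazily yield the lines of `source` that contain something other
    than whitespace. `source` can be any iterable of strings, such as
    an open file object. (Hypothetical helper, not Retriever code.)"""
    # A generator expression yields one line at a time instead of
    # building the whole filtered list in memory.
    return (line for line in source if line.strip('\n\r\t '))


# Small in-memory stand-in for a data file with some blank lines.
source = io.StringIO("a,1\n\n  \t\nb,2\n")
rows = nonblank_lines(source)

first = next(rows)   # only the first non-blank line has been read so far
rest = list(rows)    # the remaining non-blank lines
```

Only the line currently being processed is held in memory, so the cost no longer grows with the size of the table.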

It is necessary, however, to get the total number of lines in the generator before looping over it and inserting the data. Counting the lines exhausts the generator, so there are no lines left to loop over. Using itertools.tee won't work either: the first iterator is completely exhausted before the second begins, so all of the data ends up buffered for the second iterator to go over, which is the memory equivalent of coercing the generator into a list. Instead of just a "source" iterator, the retriever needs something that can be called to re-generate the iterator once it has been used up. This will require some refactoring.
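
A rough sketch of the "re-generatable source" idea, assuming the data lives in a file that can simply be reopened. The names `nonblank_lines`, `load`, `source_factory`, and `insert_row` are hypothetical and chosen for this example; the actual refactoring in the Retriever may look quite different.

```python
import os
import tempfile


def nonblank_lines(path):
    """Generator function: each call opens the file afresh and yields
    its non-blank lines one at a time."""
    with open(path) as handle:
        for line in handle:
            if line.strip('\n\r\t '):
                yield line


def load(source_factory, insert_row):
    """Count the lines in one pass, then insert them in a second pass.

    `source_factory` is a zero-argument callable that returns a fresh
    iterator every time it is called, so exhausting the first iterator
    while counting does not matter."""
    total = sum(1 for _ in source_factory())      # first pass: count only
    for position, line in enumerate(source_factory(), start=1):
        insert_row(line, position, total)         # second pass: insert
    return total


# Demo with a small temporary file standing in for a large data file.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("a,1\n\nb,2\n\nc,3\n")
    demo_path = tmp.name

inserted = []
total = load(
    lambda: nonblank_lines(demo_path),
    lambda line, position, count: inserted.append((position, count, line.strip())),
)
os.remove(demo_path)
```

The trade-off is that the file is read twice, but neither pass holds more than one line in memory at a time, which is what itertools.tee cannot offer here.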

bendmorris commented on June 21, 2024

I think I've got this worked out; I'm testing now. The Retriever should require a lot less memory now.

ethanwhite commented on June 21, 2024

Nice. Post when you commit the fix and I'll run tests as well. I'm also going to update the name of this issue to reflect the eventual problem.

bendmorris commented on June 21, 2024

The changes have been committed to master.

ethanwhite commented on June 21, 2024

New error:

Scanning data for table TREE...
Traceback (most recent call last):
  File "c:\Python27\Scripts\retriever-script.py", line 8, in <module>
    load_entry_point('retriever==1.2.1', 'console_scripts', 'retriever')()
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\main.py", line 93, in main
    script.download(engine)
  File "scripts\fia.py", line 72, in download
    prep_file.write(line)
IOError: [Errno 28] No space left on device
close failed in file object destructor:
IOError: [Errno 28] No space left on device

ethanwhite commented on June 21, 2024

Never mind. Apparently in all of the testing I filled up my hard drive.

ethanwhite commented on June 21, 2024

Working great. Thanks.
