Comments (13)
Windows was on Postgres. Ubuntu was on MySQL. I'm rerunning on Ubuntu from the command line to narrow in on the source of the problem.
from retriever.
Ubuntu seems to be fine on MySQL and Postgres using the CLI. Maybe that issue was just a problem with a download or some other one-off thing. I'm rerunning now through the GUI (where I had the original failure) just to confirm that it's not something bizarre.
I've replicated the failure on Windows on the same laptop. I still need to test FIA on Windows using Master. I'll do that on Monday unless someone beats me to it over the weekend.
from retriever.
There doesn't seem to be a problem with Master on Windows...
from retriever.
I can confirm that this is a Postgres problem on Windows using Master. It does not seem to affect MySQL. The error message is still empty, so I am rerunning it from the command line to see if I can get more information.
from retriever.
It's an out of memory error, which is why it's been showing up inconsistently.
Creating table FIA.TREE...
Traceback (most recent call last):
  File "c:\Python27\Scripts\retriever-script.py", line 8, in <module>
    load_entry_point('retriever==1.2.1', 'console_scripts', 'retriever')()
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\main.py", line 93, in main
    script.download(engine)
  File "scripts\fia.py", line 76, in download
    engine.insert_data_from_file(engine.format_filename(prep_file_name))
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\engines\postgres.py", line 89, in insert_data_from_file
    return Engine.insert_data_from_file(self, filename)
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\lib\engine.py", line 612, in insert_data_from_file
    self.add_to_table()
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\lib\engine.py", line 51, in add_to_table
    if line.strip('\n\r\t ')]
MemoryError
from retriever.
On lines 50-51 of engine.py we make an extra copy of the table, which in some cases runs us out of memory. Ben is implementing a switch to generators instead of lists, which will solve this.
from retriever.
This is not as simple as I thought.
The lines of data are coming from an iterator, usually a file object; we should avoid coercing the lines into a list, as the list may be quite large and cause memory issues like this one. It's easy to just use a generator expression instead to filter out blank lines.
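To illustrate the point above, here is a minimal sketch of filtering blank lines lazily rather than building a list. The helper name `nonblank_lines` is hypothetical and is not taken from the retriever's code; `io.StringIO` simply stands in for an open data file.

```python
import io


def nonblank_lines(source):
    """Lazily yield the lines of `source` that are not blank.

    `source` can be any iterable of strings, such as an open file.
    A generator expression is returned, so only one line is held in
    memory at a time (hypothetical helper, for illustration only).
    """
    return (line for line in source if line.strip('\n\r\t '))


# io.StringIO stands in for a real data file here.
source = io.StringIO("a,1\n\n  \nb,2\n")

# list() is used only to inspect this tiny example; real code would
# loop over the generator directly instead of materializing it.
result = list(nonblank_lines(source))
```

The only change from the list comprehension in the traceback is the use of parentheses instead of square brackets, which is why the filtering itself is the easy part.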
It is necessary, however, to get the total number of lines in the generator before looping over it and inserting the data. Doing so will exhaust the generator, and there will be no lines left to loop over. Using itertools.tee won't work either, as the first iterator is completely exhausted before the second begins, meaning the data is now queued in a list for the second iterator to go over; this is the memory equivalent of coercing the generator into a list. The retriever needs, instead of just a "source" iterator, something that can be run to re-generate the iterator when it has been used up. This will require some refactoring.
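One way to sketch the "something that can be run to re-generate the iterator" idea is a zero-argument callable that opens the file afresh each time it is called. This is only an illustration under that assumption, not necessarily what was committed; `make_line_source` and `count_then_insert` are hypothetical names.

```python
import os
import tempfile


def make_line_source(path):
    """Return a callable that produces a fresh iterator over the
    non-blank lines of the file at `path` every time it is called
    (hypothetical helper, for illustration only)."""
    def lines():
        with open(path) as f:
            for line in f:
                if line.strip('\n\r\t '):
                    yield line
    return lines


def count_then_insert(line_source, insert):
    """Count the lines in a first pass, then feed each line to
    `insert` in a second pass over a freshly created iterator."""
    total = sum(1 for _ in line_source())  # first pass exhausts its own iterator
    for line in line_source():             # second pass starts from the top again
        insert(line)
    return total


# Small demonstration using a temporary file as the data source.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("a,1\n\nb,2\n")
    path = tmp.name

rows = []
total = count_then_insert(make_line_source(path), rows.append)
os.remove(path)
```

The trade-off in this sketch is reading the file twice in exchange for never holding more than one line in memory, which avoids the buffering that `itertools.tee` would do.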
from retriever.
I think I've got this worked out; I'm testing now. The Retriever should require a lot less memory now.
from retriever.
Nice. Post when you commit the fix and I'll run tests as well. I'm also going to update the name of this issue to reflect the eventual problem.
from retriever.
The changes have been committed to master.
from retriever.
New error:
Scanning data for table TREE...
Traceback (most recent call last):
  File "c:\Python27\Scripts\retriever-script.py", line 8, in <module>
    load_entry_point('retriever==1.2.1', 'console_scripts', 'retriever')()
  File "c:\Python27\lib\site-packages\retriever-1.2.1-py2.7.egg\retriever\main.py", line 93, in main
    script.download(engine)
  File "scripts\fia.py", line 72, in download
    prep_file.write(line)
IOError: [Errno 28] No space left on device
close failed in file object destructor:
IOError: [Errno 28] No space left on device
from retriever.
Never mind. Apparently in all of the testing I filled up my hard drive.
from retriever.
Working great. Thanks.
from retriever.