mirador / nhanes
Scripts to download and aggregate NHANES data
Home Page: http://www.cdc.gov/nchs/nhanes.htm
License: GNU General Public License v2.0
So far this has happened only with the 2013-2014 cycle: opening the CSV files with the default encoding on Mac results in the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position N: invalid start/continuing byte. The files should be opened with latin-1 encoding.
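A minimal sketch of the failure and the fix (the column names and value are hypothetical, chosen only to reproduce the error):

```python
# Byte 0xf6 is "o-umlaut" in Latin-1 but is not a valid UTF-8 sequence,
# which matches the error reported for the 2013-2014 files.
raw = b"RIAGENDR,DMDBORN\n1,K\xf6ln\n"  # hypothetical CSV contents

try:
    raw.decode("utf-8")          # raises UnicodeDecodeError on byte 0xf6
except UnicodeDecodeError as err:
    print(err)

# Latin-1 maps every byte to a code point, so decoding always succeeds:
text = raw.decode("latin-1")

# When reading the downloaded files directly, pass the encoding to open():
# with open("some_cycle_file.csv", encoding="latin-1") as f: ...
```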
The config.mira file is not copied into the mirador datasets, although running the steps individually works fine.
Hi! First of all, I'm a big fan of you guys; thanks for making such an effort to make the NHANES data easy to use!
That said, I'm struggling to generate the data. I've cloned your repo and am running
makeall.sh 1999 2018
from the terminal, which creates a folder (mirador/1999-2018/) containing a single file called config.mira. Am I doing anything wrong? How can I access the final dataset?
The "Data release cycle" variable is generated when aggregating consecutive cycles. It is a categorical variable with a coded value for each NHANES cycle (e.g. 7 for 2011-2012, 8 for 2013-2014, etc.). The order of these values is not correct in the dictionary.
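A minimal sketch of one way to keep the dictionary entries in cycle order (codes 7 and 8 come from the issue text; the code 6 for 2009-2010 is extrapolated here purely for illustration):

```python
# Hypothetical cycle-to-code mapping; 7 and 8 are given in the issue,
# 6 for 2009-2010 is an extrapolated, illustrative value.
cycle_codes = {"2013-2014": 8, "2009-2010": 6, "2011-2012": 7}

# Emitting the entries sorted by their numeric code keeps the dictionary
# in chronological cycle order regardless of insertion order.
ordered = sorted(cycle_codes.items(), key=lambda kv: kv[1])
for label, code in ordered:
    print(code, label)
```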
Right now, xpt2csv calls R to convert the .xpt files into .csv. The xport package for Python could be used instead. It was tested with one file and seems to work well, but more files need to be checked to make sure there are no discrepancies with the R-based conversion.
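As a sketch of what an R-free conversion path could look like, pandas' read_sas also parses the XPORT format and could serve as a cross-check against the xport package's output (the file names below are hypothetical):

```python
import pandas as pd

def xpt_to_csv(xpt_path: str, csv_path: str) -> None:
    # pandas reads SAS XPORT transport files natively with format="xport".
    df = pd.read_sas(xpt_path, format="xport")
    df.to_csv(csv_path, index=False)

# xpt_to_csv("DEMO_H.xpt", "DEMO_H.csv")  # hypothetical file names
```

Running the same file through both converters and diffing the CSVs would be one way to check for discrepancies.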
It appears that the FTP structure has changed, rendering getdata.py and download.py useless. If anyone has a fix, I'd love to see the changes merged. This looks like an incredibly useful set of scripts otherwise.
Command: python makemeta.py 2015-2016 Questionnaire data/sources/csv/2015-2016 data/mirador/2015-2016/question.xml_strings
XML validation error:
Traceback (most recent call last):
File "makemeta.py", line 461, in
doc = parseString(''.join(xml_strings))
File "/Users/andres/anaconda3/lib/python3.7/xml/dom/minidom.py", line 1968, in parseString
return expatbuilder.parseString(string)
File "/Users/andres/anaconda3/lib/python3.7/xml/dom/expatbuilder.py", line 925, in parseString
return builder.parseString(string)
File "/Users/andres/anaconda3/lib/python3.7/xml/dom/expatbuilder.py", line 223, in parseString
parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 847, column 256
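A likely cause is an unescaped character in one of the joined strings; a raw "&" or "<" in a label is exactly the kind of "invalid token" expat rejects. A minimal, illustrative reproduction (the actual offending string in question.xml_strings is not shown here):

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape

# A raw "&" inside an attribute value is an invalid token for expat.
bad = "<variables><var label='Income & assets'/></variables>"
try:
    parseString(bad)
except ExpatError as err:
    # err.lineno / err.offset point at the bad token, as in the traceback.
    print(f"not well-formed (invalid token): line {err.lineno}, column {err.offset}")

# Escaping text before assembling the XML strings avoids the error:
good = f"<variables><var label='{escape('Income & assets')}'/></variables>"
doc = parseString(good)  # parses cleanly
```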
Hi there, I really appreciate this great work for downloading the NHANES dataset.
But when I run
python getdata.py 2007-2008
it gets stuck at the DOWNLOADING XPT FILES message for 30 minutes.
Please take a look at it.
Thanks a lot!
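One thing worth checking is whether a transfer is simply hanging with no timeout. A hedged sketch of a fail-fast download helper (the URL pattern and file names are illustrative, not the exact paths getdata.py builds):

```python
import urllib.request

def fetch(url: str, dest: str, timeout: float = 60.0) -> None:
    # An explicit timeout makes a stalled connection raise an exception
    # instead of blocking forever at the "DOWNLOADING XPT FILES" step.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        with open(dest, "wb") as out:
            out.write(resp.read())

# fetch("https://wwwn.cdc.gov/.../DEMO_E.xpt", "DEMO_E.xpt")  # illustrative
```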
Add an argument to the composites.py script to indicate where to insert the composites group in the XML file.