asdfree's Issues

Server not ready (monetdb.program.path is incorrect)

Thanks for your amazing work on the ACS!

I don't know if the script can address this, but when I run your script, I see this "Server not ready" message:

db <- dbConnect( MonetDB.R() , monet.url , wait = TRUE )
/home/USERNAME/R/ACS/MonetDB/acs.sh: 2: /home/USERNAME/R/ACS/MonetDB/acs.sh: /bin/mserver5: not found
Server not ready(Could not connect to localhost:50001), retrying (ESC or CTRL+C to abort)
...

On my Ubuntu 14.04 machine, following install directions here (https://www.monetdb.org/Documentation/UserGuide/Downloads/UbuntuDebian), my mserver5 binary ends up in /usr/bin. However, the script appears to search for mserver5 in /bin. Both folders are in my $PATH.

I altered the script so that: " monetdb.program.path = "/usr" , "

This now connects to the Monet Server.
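
A minimal sketch of how the program path could be looked up instead of hard-coded. It assumes, going by the error message above, that the script appends `/bin/mserver5` to `monetdb.program.path`; the helper name `program_path_from_binary` is hypothetical and not part of the script.

```r
# hypothetical helper: strip the trailing "/bin/mserver5" from a binary path
# so the result can be used as monetdb.program.path
program_path_from_binary <- function( binary ) dirname( dirname( binary ) )

# Sys.which() returns "" when mserver5 is not on the PATH
mserver5 <- unname( Sys.which( "mserver5" ) )

if ( nzchar( mserver5 ) ) {
  monetdb.program.path <- program_path_from_binary( mserver5 )
  print( monetdb.program.path )
} else {
  message( "mserver5 not found on the PATH" )
}
```

On the Ubuntu setup described above, where the binary sits at /usr/bin/mserver5, the helper would give "/usr".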

Thanks again for outstanding work!

Download issue with ESS data

Excellent ESS code!

But I got the following error towards the end, while downloading ESS Round 1:

importing /download.html?file=ESS1cfNO&c=NO&y=2002 ...
Error in factor(x@.Data, levels = values[use.levels], labels = labels[use.levels]) : 
RMate stopped at line 0
  invalid 'labels'; length 2 should be 1 or 1
Calls: data.frame ... as.data.frame.double.item -> as.factor -> as.factor
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

Could it be due to some SPSS import setting?

P.S. I have not forgotten that I need to work on Insee data when I get some free time!

Provide combined db?

Is it possible to run all the scripts into the same database to create one table with data for all years?

think about converting censo demografico and IPUMSI over with `as.svrepdesign`?

looks like many linearization calculations are very slow, but once converted to jackknifed weights, things should move as fast as the ACS does.. need to speed up basic analysis commands like svymean in http://monetdb.cwi.nl/testweb/web/eanthony/fedora20-3/1452365149/censolite.log

@DjalmaPessoa from your perspective is there any methodological problem with taking the cluster/strata/fpc and coercing them all to replicate weights with survey:::as.svrepdesign?

http://r-survey.r-forge.r-project.org/survey/html/as.svrepdesign.html

you can read about what the function actually does by looking at the code in survey:::jknweights and survey:::jk1weights
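
for reference, a small sketch of the conversion itself using the `api` example data that ships with the survey package. it only shows the mechanics on a toy clustered design and says nothing about whether the conversion is methodologically sound for censo demografico or IPUMSI.

```r
library(survey)

# example data shipped with the survey package
data( api )

# linearization design: clusters (dnum), weights (pw), and fpc
dclus1 <- svydesign( id = ~ dnum , weights = ~ pw , data = apiclus1 , fpc = ~ fpc )

# coerce to a replicate-weights design. the default type = "auto"
# uses JK1 for unstratified designs and JKn for stratified ones
rclus1 <- as.svrepdesign( dclus1 )

# same point estimate under both designs; standard errors may differ
lin_mean <- svymean( ~ api00 , dclus1 )
rep_mean <- svymean( ~ api00 , rclus1 )

print( lin_mean )
print( rep_mean )
```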

cc @hannesmuehleisen

i cannot reopen the issue...

hey!

sorry but without admin privileges i cannot reopen the issue! :-)

anyway, left another comment there.

cheers

Suggestion: README files

Hi Anthony,

I'm a huge fan. Apologies for not being able to answer your call for user contributions: I have limited experience with complex survey objects, and French official statistics are locked into paper-based institutions that make automation impossible for most sources.

Reading your recent SEER scripts, I was thinking that a lot of the information would benefit from appearing in a folder-specific README file, as GitHub recommends. It would shorten the scripts, show up online when users browse your repo, and perhaps make your invitation to contribute more visible. It might also be easier to update.

All the best,

François

World Values Survey: longitudinal dataset

Dear Anthony,

I cannot seem to download the longitudinal file for the [World Values Survey](https://github.com/ajdamico/usgsd/tree/master/World%20Values%20Survey) through your script. I checked the code line by line, and it seems that the HTTP headers that you get at AJDocumentation.jsp?CndWAVE=-1 do not return the links to the data files (or the documentation) anymore.

Not sure if that's connected, but the WVS website recently updated its longitudinal data file.

Cheers,

François

Feature Request Economic Census

Hi:

I was wondering if there were any plans to port the US economic census into R? It is an important data source for some economists.

error downloading Current Population Survey

When running

library(downloader)
# setwd( "C:/My Directory/CPS/" )
cps.years.to.download <- 2014:1998
source_url( "https://raw.github.com/ajdamico/usgsd/master/Current%20Population%20Survey/download%20all%20microdata.R" , prompt = FALSE , echo = TRUE )

I received the error

Invalid file, or file has unsupported features.
In addition: Warning message:
In parse.SAScii(sas_ri, beginline, lrecl) : NAs introduced by coercion

The file being parsed was http://www.census.gov/housing/povmeas/spmresearch/spmresearch2013.sas7bdat

wrong to assume missing values stay missing and not zero

read.SAScii.sqlite suffers from this issue-

problem <- c( "v01\tv02" , "1000\t1000" , "\t", "\t" , "1000\t1000" )

library(RSQLite)
db <- dbConnect( SQLite() )

tf <- tempfile()
writeLines( problem , tf )
dbWriteTable(db, 'this_table', tf, sep = "\t", header = TRUE)

# missing data stays missing
read.table( tf , sep = '\t' , header = TRUE )

# missing data is zero
dbGetQuery( db , "SELECT * FROM this_table" )
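
One possible workaround, sketched under the assumption that the file fits in memory (which will not hold for the large files read.SAScii.sqlite is meant for): read the file into R first, then write the data.frame rather than the file path, so that `NA` is stored as SQL `NULL`. The connection string `":memory:"` is just for the demonstration.

```r
library(DBI)
library(RSQLite)

problem <- c( "v01\tv02" , "1000\t1000" , "\t" , "\t" , "1000\t1000" )
tf <- tempfile()
writeLines( problem , tf )

# read the file in R first so blank fields become NA
x <- read.table( tf , sep = "\t" , header = TRUE )

# write the data.frame itself, not the file path
db <- dbConnect( SQLite() , ":memory:" )
dbWriteTable( db , "this_table" , x )

# missing data comes back as NA rather than zero
y <- dbGetQuery( db , "SELECT * FROM this_table" )
print( y )

dbDisconnect( db )
```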

problem in read.SAScii.sqlite

When running the 2004 SIPP downloader script, I hit a bug at line 141 of the file

I tracked it down to this line in read.SAScii.sqlite.

dbSendQuery is called with a statement object that is

"ALTER TABLE w1 RENAME TO temp_backup"

after which it fails with

Error in sqliteExecStatement(conn, statement, ...) : RS-DBI driver: (error in statement: no such table: w1)

it seems there is no table w1? Notice that the read.SAScii.sqlite function has at that point been called successfully at line 122 of the file

this error occurs on both MacOS and unix in exactly identical fashion (posting both sessionInfo()s here).

Cheers
flo

sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] descr_1.0.2 RSQLite_0.11.4 DBI_0.2-6 SAScii_1.0 downloader_0.3 vimcom_0.9-8

loaded via a namespace (and not attached):
[1] digest_0.6.3 tools_3.0.0 xtable_1.7-1

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] descr_1.0.2 downloader_0.3 SAScii_1.0 RSQLite_0.11.4 DBI_0.2-7
[6] devtools_1.3

loaded via a namespace (and not attached):
[1] digest_0.6.3 evaluate_0.5.1 httr_0.2 memoise_0.1 parallel_3.0.1
[6] RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.1 whisker_0.3-2 xtable_1.7-1

OS X Yosemite 10.10.1 Script fails

It could be something specific to my system but this script would not work for me. For:

install.packages( "sqlsurvey" , repos = c( "http://cran.r-project.org" , "http://R-Forge.R-project.org" ) , dep=TRUE )

I kept getting

Installing package into ‘/Users/phparker/Library/R/3.1/library’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository http://R-Forge.R-project.org/bin/macosx/mavericks/contrib/3.1

   package ‘sqlsurvey’ is available as a source package but not as a binary

Warning message:
package ‘sqlsurvey’ is not available (for R version 3.1.2) 

Instead I had to:

a) install.packages( c( 'SAScii' , 'descr' , 'survey' , 'MonetDB.R' , 'downloader' , 'R.utils' ) )
Then in terminal:
b) $ svn checkout svn://r-forge.r-project.org/svnroot/sqlsurvey/
c) $ R --vanilla CMD INSTALL --build sqlsurvey/pkg/sqlsurvey
d) $ R --vanilla CMD INSTALL --build sqlsurvey/pkg/RMonetDB

Don't use `setwd()`

Generally it's a bad idea to use `setwd()` because it ties your code to a directory layout that may not exist on another machine, so the code is no longer portable.
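
A small sketch of one alternative: keep a single base directory in a variable and build every path from it with `file.path()`. The variable name `output_dir` and the file name are hypothetical, and `tempdir()` stands in for wherever the user wants the data to live.

```r
# hypothetical: one base directory, set once by the user
output_dir <- file.path( tempdir() , "CPS" )
dir.create( output_dir , recursive = TRUE , showWarnings = FALSE )

# build every path from that base instead of relying on the working directory
rds_file <- file.path( output_dir , "cps2014.rds" )
saveRDS( data.frame( x = 1:3 ) , rds_file )

# the working directory itself is never changed
print( file.exists( rds_file ) )
```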

prepare ipumsi for eanthony integration

for password--
what i would do is put username password into a list in a rds file and read that and put that file into gitignore list

[1:51:44 PM] Hannes Mühleisen: a <- list(username="bla", password="blubb")
[1:52:13 PM] Hannes Mühleisen: tf <- tempfile()
[1:52:17 PM] Hannes Mühleisen: saveRDS(a, tf)
[1:52:36 PM] Hannes Mühleisen: a <- readRDS(tf)
[1:52:50 PM] Hannes Mühleisen: its nicer because it only loads a single variable
[1:52:53 PM] Anthony Damico: and just make sure never to print the contents of `a` to the logs
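
the chat above, collected into one runnable sketch. the file name `ipumsi_credentials.rds` is hypothetical, and `tempdir()` stands in for a location inside the repository that would be listed in .gitignore.

```r
# hypothetical file name. in a real repository this path would be in .gitignore
cred_file <- file.path( tempdir() , "ipumsi_credentials.rds" )

# write once, outside of any logged run
saveRDS( list( username = "bla" , password = "blubb" ) , cred_file )

# read inside the script. only the single object `cred` is created
cred <- readRDS( cred_file )

# use cred$username and cred$password where needed, but never print( cred )
stopifnot( all( c( "username" , "password" ) %in% names( cred ) ) )
```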

censolite subsetting mistake

2016-01-10 10:56:33 > dbGetQuery(db, "SELECT SUM( pes_wgt * v6033 ) / SUM( pes_wgt ) AS mean_age FROM c10 WHERE v6033 < 900")
2016-01-10 10:56:33 QQ: 'SELECT SUM( pes_wgt * v6033 ) / SUM( pes_wgt ) AS mean_age FROM c10 WHERE v6033 < 900'
2016-01-10 10:56:35 II: Finished in 2s
2016-01-10 10:56:35 mean_age
2016-01-10 10:56:35 1 32.03549
2016-01-10 10:56:35
2016-01-10 10:56:35 > svymean(~v6033, pes.d, na.rm = TRUE)
2016-01-10 10:56:35 QQ: 'SELECT name, value from sys.env()'
2016-01-10 10:56:35 II: Finished in 0s
2016-01-10 10:56:35 QQ: 'select v6033 from c10'
2016-01-10 10:56:36 II: Finished in 0.99s
2016-01-10 21:27:28 mean SE
2016-01-10 21:27:28 v6033 44.532 0.0265
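
the two results in the log are computed on different rows: the SQL query keeps only `v6033 < 900`, while the `svymean` call reads every record. a toy illustration in base R with made-up values and weights, assuming that values of 900 or more are special codes rather than ages (an inference from the WHERE clause, the log does not say so):

```r
# made-up data: three real ages plus one record carrying a special code
toy <- data.frame( v6033 = c( 20 , 30 , 40 , 999 ) , pes_wgt = c( 1 , 1 , 1 , 1 ) )

# mean over every record, as the svymean call in the log does
mean_all <- weighted.mean( toy$v6033 , toy$pes_wgt )

# mean after applying the same restriction as the SQL query
keep <- toy$v6033 < 900
mean_subset <- weighted.mean( toy$v6033[ keep ] , toy$pes_wgt[ keep ] )

print( c( all = mean_all , subset = mean_subset ) )
```

for design objects from the survey package, `subset()` applies this kind of restriction before `svymean`. whether that is the right fix for `pes.d` still needs confirming.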

!MALException:setScenario:Scenario not initialized 'sql'

Is it normal to see this error a lot while the script is running?

I'll put it in a bit of context. This is the latest release of MonetDB, R, RStudio, and Windows:

> # loop through each possible acs year
> for ( year in 2050:2005 ){
+ 
+   # loop through each possible acs dataset size category
+   for ( size in c(  .... [TRUNCATED] 
Downloading from URL 'http://www2.census.gov/acs2013_1yr/pums/unix_hwy.zip' to file 'C:\Users\Jason\AppData\Local\Temp\RtmpaYkKCC\file39a43b7b650f'... 
trying URL 'http://www2.census.gov/acs2013_1yr/pums/unix_hwy.zip'
Content type 'application/zip' length 630801 bytes (616 KB)
downloaded 616 KB

MonetDB: Switching to single-threaded query execution.
Downloading from URL 'http://www2.census.gov/acs2013_1yr/pums/csv_hus.zip' to file 'C:\Users\Jason\AppData\Local\Temp\RtmpaYkKCC\file39a43b7b650f'... 
trying URL 'http://www2.census.gov/acs2013_1yr/pums/csv_hus.zip'
Content type 'application/zip' length 259462625 bytes (247.4 MB)
downloaded 247.4 MB

[1] "warning: column name 'type' unacceptable in monetdb.  changing to 'type_'"
SUCCESS: The process with PID 12204 has been terminated.
Downloading from URL 'http://www2.census.gov/acs2013_1yr/pums/unix_pwy.zip' to file 'C:\Users\hackr\AppData\Local\Temp\RtmpaYkKCC\file39a43b7b650f'... SUCCESS: The process with PID 12204 has been terminated.
Downloading from URL 'http://www2.census.gov/acs2013_1yr/pums/unix_pwy.zip' to file 'C:\Users\hackr\AppData\Local\Temp\RtmpaYkKCC\file39a43b7b650f'... 
trying URL 'http://www2.census.gov/acs2013_1yr/pums/unix_pwy.zip'
Content type 'application/zip' length 1240992 bytes (1.2 MB)
downloaded 1.2 MB

Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
MonetDB: Switching to single-threaded query execution.
Downloading from URL 'http://www2.census.gov/acs2013_1yr/pums/csv_pus.zip' to file 'C:\Users\Jason\AppData\Local\Temp\RtmpaYkKCC\file39a43b7b650f'... 
trying URL 'http://www2.census.gov/acs2013_1yr/pums/csv_pus.zip'
Content type 'application/zip' length 616326250 bytes (587.8 MB)
downloaded 587.8 MB

SUCCESS: The process with PID 12292 has been terminated.
Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
Server not ready(Authentication error: !MALException:setScenario:Scenario not initialized 'sql'
), retrying (ESC or CTRL+C to abort)
MonetDB: Switching to single-threaded query execution.
Error in .local(conn, statement, ...) : 
  Unable to execute statement 'create table acs2013_1yr_m as select 'M' as rt, a.serialno, a.division, a.puma, a.region, a.st, a.ad...'.
Server says '!GDK reported error.'.

By the way, why the single-threaded option? Won't that slow it down a good bit? I suppose something bad happens without it?

A few notes and suggestions

Hi Anthony,

I've written this Gist that contains pointers to bits and pieces of survey data online, and how to read them into R. There's also a short bibliography of example studies and survey-specific packages.

I have not documented the weighting schemes, but the European Social Survey has a very simple structure, and the ANES data extracts should be straightforward too. The GSS is best weighted as you show in your own, much more elaborate script.

Some quick notes after replicating the NHIS scripts: would it be a good idea to use file.exists and avoid re-downloading any existing file in the download-all-microdata routines? If you run the script in repeated runs (a plausible scenario, given the amount of data), it would help to skip previously done jobs, e.g. documentation files.
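
A rough sketch of the `file.exists` idea. The helper name `download_if_missing` and its `fetch` argument are hypothetical; `fetch` defaults to `download.file()` and is swapped for a stand-in here so the example runs without a network connection.

```r
# hypothetical helper: download `url` to `destfile` only when the file is not already there
download_if_missing <- function( url , destfile , fetch = utils::download.file , ... ) {
  if ( file.exists( destfile ) ) {
    message( "skipping " , destfile , " (already downloaded)" )
    return( invisible( FALSE ) )
  }
  fetch( url , destfile , ... )
  invisible( TRUE )
}

# stand-in for download.file() so no network is needed
fake_fetch <- function( url , destfile , ... ) writeLines( "pretend download" , destfile )
tf <- tempfile( fileext = ".txt" )

# the first call downloads, the second call skips
first <- download_if_missing( "http://example.com/doc.txt" , tf , fetch = fake_fetch )
second <- download_if_missing( "http://example.com/doc.txt" , tf , fetch = fake_fetch )
print( c( first , second ) )
```

One caveat: a file left behind by an interrupted download would also be skipped, so a size or checksum test may be needed on top of this.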

Adding a makefile could also help run all tasks in the background with the smallest possible amount of CPU. That would also hint to the reader that it is not a good idea to run some of the scripts in RStudio (which renders SAScii progress statements in a slightly strange way).

I can make a quick fork to illustrate both suggestions.

if HPSA was orig. designated 'withdrawn' *before* the time point, AND never updated, throw it out

I have one suggestion that should help pick out only those Designated areas. Currently, the code appears to keep those areas that were withdrawn on an early date, but then never updated.

In my tests, just below this snippet:

# if the hpsa was updated to 'withdrawn' before the user-defined time point, throw it out

x <- x[ !( no.na( x$ud < designated.time.point ) & x$status.description == 'Withdrawn' ) , ]

I added this snippet:

# RTM: if HPSA was orig. designated 'withdrawn' before the time point, AND never updated, throw it out:

x <- x[ !( (x$dd < designated.time.point) & (is.na(x$ud)) & (x$status.description == 'Withdrawn') ) , ]

and this appears to exclude those areas that were designated Withdrawn before the 'designated.time.point' and never updated. (For me, this eliminates 412 additional areas today).
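
A toy check of the two rules on made-up records, to show which rows each one removes. The definition of `no.na()` below is an assumption (NA treated as FALSE); the real helper lives in the script. As a small variation, the sketch also wraps the `dd` comparison in `no.na()` so that a missing designation date cannot produce NA rows.

```r
# assumed definition of no.na(): treat NA as FALSE in a logical vector
no.na <- function( x , value = FALSE ) { x[ is.na( x ) ] <- value ; x }

designated.time.point <- as.Date( "2010-01-01" )

# made-up records: dd = designation date, ud = update date
x <- data.frame(
  id = 1:4 ,
  dd = as.Date( c( "2005-01-01" , "2005-01-01" , "2005-01-01" , "2012-01-01" ) ) ,
  ud = as.Date( c( NA , "2008-01-01" , NA , NA ) ) ,
  status.description = c( "Withdrawn" , "Withdrawn" , "Designated" , "Withdrawn" )
)

# existing rule: updated to 'Withdrawn' before the time point (drops id 2)
x <- x[ !( no.na( x$ud < designated.time.point ) & x$status.description == 'Withdrawn' ) , ]

# proposed rule: designated 'Withdrawn' before the time point and never updated (drops id 1)
x <- x[ !( no.na( x$dd < designated.time.point ) & is.na( x$ud ) & x$status.description == 'Withdrawn' ) , ]

print( x$id )
```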

your.email

Hello,

Small bug in the (awesome) ANES code: your.username at lines 57 and 61 should be your.email to match the later call to that object. Keep on rocking.

World Values Survey: using countries as strata

Quick question related to #25:

Some time ago on Statalist, Stas Kolenikov recommended using countries as super-strata when weighting the WVS longitudinal dataset for multi-country analysis.

As far as I understand, this translates into this survey design:

wvs.design = svydesign(~ 1, data = wvs.multi.country.dataset, strata = ~ S009, weights = ~ S019)

where S009 designates the country variable and S019 designates the 'N = 1500' homogenized survey weights that give every country in the WVS wave the same sample weight, regardless of their actual number of observations.

Is that something you would recommend?

think of better ways to catch and restart broken downloads

currently working on 2002
Downloading from URL 'ftp://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_anual/microdados/reponderacao_2001_2012/PNAD_reponderado_2002.zip' to file 'C:\Users\AnthonyD\AppData\Local\Temp\3\RtmpKsOkkL\filefd0772f6b8a'...
trying URL 'ftp://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_anual/microdados/reponderacao_2001_2012/PNAD_reponderado_2002.zip'
ftp data connection made, file length unknown
downloaded 0 bytes


Error in file(con, "r") : invalid 'description' argument
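
one possible shape for this, as a sketch only: retry the download a few times and treat a zero-byte file (like the one in the log above) as a failure. the helper name `download_with_retry` and its arguments are hypothetical, and a stand-in replaces `download.file()` so the example runs offline.

```r
# hypothetical helper: retry until the file is non-empty or attempts run out
download_with_retry <- function( url , destfile , attempts = 3 , wait = 0 , fetch = utils::download.file , ... ) {
  for ( i in seq_len( attempts ) ) {
    # remove any leftover from a previous failed attempt
    unlink( destfile )
    # catch errors so one failure does not stop the loop
    ok <- tryCatch( { fetch( url , destfile , ... ) ; TRUE } , error = function( e ) FALSE )
    # a zero-byte file counts as a failure
    if ( ok && file.exists( destfile ) && file.size( destfile ) > 0 ) return( invisible( i ) )
    Sys.sleep( wait )
  }
  stop( "download failed after " , attempts , " attempts: " , url )
}

# stand-in for download.file(): writes an empty file twice, then real content
calls <- 0
flaky_fetch <- function( url , destfile , ... ) {
  calls <<- calls + 1
  writeLines( if ( calls < 3 ) character( 0 ) else "data" , destfile )
}

tf <- tempfile()
used <- download_with_retry( "ftp://example.com/file.zip" , tf , fetch = flaky_fetch )
print( used )
```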

pns `dom` table does not get used in the import script

djalma, i made some changes to the main script to get started--

e7d9b6e

--but the dom table never gets used, and i believe it should be merged onto the pes table before the survey design gets created? that way, it is a rectangular file (like the censo demografico gets dom and pes merged). what merge fields should be used? so that i can merge on the household-level information

using the current version of the import file, the intersecting fields are--

> intersect(names(dom),names(pes))
 [1] "v0001"     "v0024"     "upa_pns"   "v0006_pns" "upa"       "v0028"     "v0029"     "v00281"    "v00291"   "v00282"    "v00292"    "v00283"    "v00293" 
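
while the right merge fields are still an open question, here is a sketch of how a candidate key could be checked before merging. the tables and the column names `cluster_id` and `household_id` are made up for illustration and are not the actual pns fields.

```r
# made-up household (dom) and person (pes) tables with placeholder key names
dom <- data.frame( cluster_id = c( 1 , 1 , 2 ) , household_id = c( 1 , 2 , 1 ) , rooms = c( 3 , 5 , 4 ) )
pes <- data.frame( cluster_id = c( 1 , 1 , 1 , 2 ) , household_id = c( 1 , 1 , 2 , 1 ) , age = c( 34 , 8 , 61 , 25 ) )

key <- c( "cluster_id" , "household_id" )

# the candidate key must identify each household exactly once
stopifnot( !anyDuplicated( dom[ key ] ) )

# merge household-level columns onto the person records
x <- merge( pes , dom , by = key )

# a correct merge neither drops nor duplicates person records
stopifnot( nrow( x ) == nrow( pes ) )
print( x )
```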

thanks!

Consumer Expenditure 2 warnings generated.

Thanks for making such a great resource!

I was running the "download all microdata.R" file for the Consumer expenditure data and I got two warnings from the code used to find years.to.download.

Warning messages:
1: In readLines("http://www.bls.gov/cex/pumdhome.htm") :
  incomplete final line found on 'http://www.bls.gov/cex/pumdhome.htm'
2: In grep("/pumd_([0-9][0-9][0-9][0-9]).htm", readLines("http://www.bls.gov/cex/pumdhome.htm"),  :
  input string 785 is invalid in this locale

I was able to get the two warnings to go away by making the following changes

  • add warn=FALSE to the readLines call, so it doesn't complain about the HTML not ending with a newline
  • set the locale used for interpreting the string format, so that it doesn't complain about invalid input strings
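
A sketch of both changes on a local stand-in for the BLS page, so that it runs offline. Note that for the second warning it swaps in `useBytes = TRUE` in `grep` rather than changing the locale as described above; this is a different technique aimed at the same warning, and the file contents below are made up.

```r
# stand-in for the BLS page: a local file whose last line has no trailing newline
tf <- tempfile( fileext = ".htm" )
cat( '<a href="/cex/pumd_2012.htm">2012</a>\n<a href="/cex/pumd_2013.htm">2013</a>' , file = tf )

# warn = FALSE silences the "incomplete final line" warning
page <- readLines( tf , warn = FALSE )

# useBytes = TRUE matches byte by byte, so strings that are invalid
# in the current locale are not checked against it
hits <- grep( "/pumd_([0-9][0-9][0-9][0-9]).htm" , page , value = TRUE , useBytes = TRUE )

# pull the four-digit year out of each matching line
years <- as.numeric( gsub( ".*/pumd_([0-9]{4})\\.htm.*" , "\\1" , hits ) )
print( years )
```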
