iqss / amelia
Amelia: A Package for Missing Data
Home Page: http://gking.harvard.edu/amelia
Y2015W1_m1_NA <- Y2015W1_m1 %>%
dplyr::select(index, BidOpen, BidHigh, BidLow, BidClose, AskOpen, AskHigh, AskLow, AskClose) %>%
prodNA(noNA = 0.01)
> Y2015W1_m1_NA
# A tibble: 7,200 x 9
index BidOpen BidHigh BidLow BidClose AskOpen AskHigh AskLow AskClose
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2015-01-04 22:00:00 120. 121. 120. 121. 121. 121. 121. 121.
2 2015-01-04 22:01:00 121. 121. 120. 121. 121. 121. 121. 121.
3 2015-01-04 22:02:00 121. 121. 121. 121. 121. 121. 121. 121.
4 2015-01-04 22:03:00 121. 121. 121. 121. 121. 121. 121. 121.
5 2015-01-04 22:04:00 121. 121. 120. 121. 121. 121. 121. 121.
6 2015-01-04 22:05:00 121. 121. 120. 120. 121. 121. 121. 121.
7 2015-01-04 22:06:00 120. 121. 120. 121. 121. 121. 120. 121.
8 2015-01-04 22:07:00 121. 121. 121. 121. 121. 121. 121. 121.
9 2015-01-04 22:08:00 121. 121. 121. 121. 121. 121. 121. 121.
10 2015-01-04 22:09:00 121. 121. 121. 121. 121. 121. 121. 121.
# ... with 7,190 more rows
> Y2015W1_m1_NA %>% str
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 7200 obs. of 9 variables:
$ index : POSIXct, format: "2015-01-04 22:00:00" "2015-01-04 22:01:00" "2015-01-04 22:02:00" ...
$ BidOpen : num 120 121 121 121 121 ...
$ BidHigh : num 121 121 121 121 121 ...
$ BidLow : num 120 120 121 121 120 ...
$ BidClose: num 121 121 121 121 121 ...
$ AskOpen : num 121 121 121 121 121 ...
$ AskHigh : num 121 121 121 121 121 ...
$ AskLow : num 121 121 121 121 121 ...
$ AskClose: num 121 121 121 121 121 ...
> Y2015W1_m1_NA %>% amelia(ts = 'index')
-- Imputation 1 --
1 2
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
> Y2015W1_m1_NA %>% amelia(idvars = 'index')
-- Imputation 1 --
1 2
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
Following https://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf, I tried to impute the missing values in a tibble-format data frame, but amelia() stops with the date 'origin' error shown above.
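A possible workaround, sketched under assumptions: the error seems to arise when imputed numeric values are written back into the POSIXct index column. Converting the tibble to a plain data.frame and the POSIXct column to numeric seconds before calling amelia(), then converting back afterwards, may avoid it. The helpers prep_for_amelia() and restore_times() are hypothetical, and the UTC time zone is an assumption.

```r
# Hypothetical helpers (not part of Amelia): strip tibble classes and turn
# POSIXct columns into numeric seconds so amelia() only sees plain numbers.
prep_for_amelia <- function(x) {
  x <- as.data.frame(x)                       # drop tbl_df / tbl classes
  time_cols <- names(x)[vapply(x, inherits, logical(1), what = "POSIXct")]
  for (col in time_cols) {
    x[[col]] <- as.numeric(x[[col]])          # seconds since 1970-01-01 UTC
  }
  attr(x, "time_cols") <- time_cols           # remember what was converted
  x
}

# Convert the recorded columns back to POSIXct (UTC assumed).
restore_times <- function(x, time_cols, tz = "UTC") {
  for (col in time_cols) {
    x[[col]] <- as.POSIXct(x[[col]], origin = "1970-01-01", tz = tz)
  }
  x
}

# Intended use (not run; Y2015W1_m1_NA is the data from the report above):
# dat   <- prep_for_amelia(Y2015W1_m1_NA)
# a.out <- amelia(dat, ts = "index")
# imp1  <- restore_times(a.out$imputations[[1]], attr(dat, "time_cols"))
```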
From a user email:
I am currently trying to use the overimputation feature of "Amelia" in conjunction with parallelized computation in the "snow" package. Unfortunately, my computations fail with error code 3 and message "The setting for the data argument doesn't exist." On the CRAN, you're listed as the maintainer, so I was hoping that you might be able to help me sort out my problem.
Some additional details: Following the examples in the documentation, I generate a "molist" with moPrep(), which I then pass to amelia(). When run without parallelization, this works perfectly fine, and amelia() returns the imputed data as expected. However, in parallel, amelia() appears unable to find the data set that is pointed to in the "molist" generated by moPrep(). There appears to be no difference between the "molist" I get when running with/without parallelization (i.e., both have a symbolic reference to the data in the $data slot). However, in parallel, amelia() can't find it, whereas it can when run without parallelization.
This is likely due to scoping issues with "eval" and parallel. We could try to specify the frame in the amelia.molist function.
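The suspected scoping problem can be reproduced without Amelia. In this minimal sketch (make_ref() and resolve() are hypothetical names, not Amelia's moPrep() or amelia.molist internals), a list that stores only the name of a dataset fails when that name is looked up in an environment that does not contain it, which is roughly the situation of a fresh snow worker. Storing the data object itself removes the lookup.

```r
# A reference that stores only the symbol, like a $data slot holding a name.
make_ref <- function(data_name) list(data = as.name(data_name))

# Resolve the reference: a stored data frame is returned as-is; a symbol is
# evaluated in `envir`, which is where the scoping problem shows up.
resolve <- function(ref, envir) {
  if (is.data.frame(ref$data)) return(ref$data)
  eval(ref$data, envir = envir)
}

mydata <- data.frame(a = 1:3)
ref <- make_ref("mydata")

# Works where `mydata` is visible:
ok <- resolve(ref, envir = environment())

# Fails in an environment that cannot see `mydata` (similar to a worker
# process that never received the object):
worker_env <- new.env(parent = baseenv())
res <- tryCatch(resolve(ref, envir = worker_env),
                error = function(e) conditionMessage(e))

# Storing the data itself removes the dependency on the caller's scope:
ref_by_value <- list(data = mydata)
ok2 <- resolve(ref_by_value, envir = worker_env)
```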
While testing software I am working on that uses Amelia, I came across an error that makes R terminate suddenly. After some investigation, it turns out that chol() throws a std::runtime_error when the matrix it tries to decompose is singular.
I have attached a file that triggers the problem when running Amelia with the command:
test <- amelia(temp_data, m = 5, ts = "dates", p2s = 0, idvars = NULL, cs = NULL, parallel="no", lags = colnames(temp_data)[-1])
I have also forked the repository to https://github.com/jonlachmann/Amelia, where a fix is applied. Please let me know if you want me to create a pull request or if you find a better solution. My solution is quite simple: I added try/catch blocks around the chol() calls, and on failure the C++ code returns R_NilValue. This is then handled in R the same way Amelia already handled the case where it could detect a singular matrix before calling the C++ function.
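The same guard can be expressed at the R level. This sketch is an R analogue only (safe_chol() is a hypothetical name; the fork's actual fix wraps the C++ chol() calls): it catches the error that base R's chol() raises for a matrix that is not positive definite and returns NULL, so the caller can take the existing singular-matrix branch instead of aborting.

```r
# Return the Cholesky factor, or NULL if the matrix cannot be decomposed
# (e.g. it is singular / not positive definite).
safe_chol <- function(m) {
  tryCatch(chol(m), error = function(e) NULL)
}

good <- safe_chol(diag(2))           # positive definite: factor returned
bad  <- safe_chol(matrix(1, 2, 2))   # singular: NULL instead of an error

if (is.null(bad)) {
  message("matrix is singular; falling back to the singular-matrix branch")
}
```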
"overimpute()
This function temporarily treats each observed value in var as missing and imputes that value based on the imputation model of output. "
^^ Quote from R-Help.
The bounds parameter in amelia() applies to all NA values of a given dataset column.
However, if the bounds parameter is set, overimpute() does not use it.
The problem is that amelia() can be constrained in a specific way (for example, to use only values between 40 and 50),
but overimpute() silently ignores these bounds.
In any case, this behavior is not mentioned in the documentation.
library(Amelia)
data(africa)
set.seed(1234)
x.out_overimpute_bug <- amelia(africa, cs = 2, ts = 1, bounds = rbind(c(5, 40, 50)), lags = "infl")
test<-overimpute(x.out_overimpute_bug,var=c(5))
test$lower.overimputed[c(100:115)]
test$mean.overimputed[c(100:115)]
test$upper.overimputed[c(100:115)]
^^ The values are not between 40 and 50.
"tscsPlot()
Plots a time series for a given variable in a given cross-section and provides confidence intervals for the imputed values."
^^Quote from R help
My first thought was: great, tscsPlot() plots my imputed values, so there is no need for a separate ggplot().
After all, the function takes the output of the amelia() imputation as its input.
Normally I would expect repeated calls of the same function (tscsPlot) to produce identical output.
That's not the case: the output is not based only on the amelia() output.
Internal functions of amelia() and random numbers are involved too.
The question is: what is the information gain if the values change every time, or is this a bug?
In fact, identical results are only possible if the random number generator is seeded before every call.
First, the documentation should warn about this behavior (random numbers).
Second, the documentation should warn that the (mean) output is not equal to the imputed values of amelia().
library(Amelia)
data(africa)
set.seed(1234)
tcc <- amelia(africa, cs = "country", ts = "year")
set.seed(1234)
tscsPlot(output=tcc,cs="Cameroon",var="trade")
set.seed(4711)
tscsPlot(output=tcc,cs="Cameroon",var="trade")
tscsPlot(output=tcc,cs="Cameroon",var="trade",ylim=c(40,60))
tscsPlot(output=tcc,cs="Cameroon",var="trade",ylim=c(40,60))
tscsPlot(output=tcc,cs="Cameroon",var="trade",ylim=c(40,60))
tscsPlot(output=tcc,cs="Cameroon",var="trade",ylim=c(40,60))
tscsPlot(output=tcc,cs="Cameroon",var="trade",ylim=c(40,60))
tscsPlot(output=tcc,cs="Cameroon",var="trade",ylim=c(40,60))
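Until the documentation covers this, one way to get repeatable tscsPlot() output without disturbing the session's random stream is to run each call under a fixed seed and restore the previous RNG state afterwards. with_seed() below is a hypothetical base-R sketch (the withr package offers a similar with_seed()); it works for any expression that draws random numbers.

```r
# Evaluate `expr` with a fixed seed, then restore the caller's RNG state.
with_seed <- function(seed, expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else {
      rm(list = ".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  expr  # lazy evaluation: the expression is forced here, after set.seed()
}

# e.g. with_seed(1234, tscsPlot(output = tcc, cs = "Cameroon", var = "trade"))
```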
Per an email on the amelia list from Jonathan Zadra, the following code doesn't work:
library(Amelia)
library(tibble)
data(africa)
africa <- as_tibble(africa)
a.out <- amelia(africa, ts = "year", cs = "country")
tscsPlot(a.out, cs = "Burundi", var = "trade")
We get the following error:
Error: Unsupported use of matrix or array for column indexing
MWE:
library(Amelia)
data(freetrade)
freetrade$signed <- ifelse(freetrade$signed, "yes", "no")
out <- amelia(freetrade, ts = "year", cs = "country", noms = "signed")
with error:
-- Imputation 1 --
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Error in yy %*% unique(na.omit(x.orig[, i])) : non-conformable arguments
The imputed values are different on OS X with R 3.5.0 than they are on other platforms or other R versions. To demonstrate:
# Linux, R 3.5.0 (saved as linux350.rds)
library(Amelia)
data(africa)
set.seed(1)
a.out <- amelia(africa[ , c("infl", "trade")],
m = 1,
boot.type = "none")
saveRDS(a.out, "linux350.rds")
# macOS, R 3.4.4 (saved as mac344.rds)
library(Amelia)
data(africa)
set.seed(1)
a.out <- amelia(africa[ , c("infl", "trade")],
m = 1,
boot.type = "none")
saveRDS(a.out, "mac344.rds")
# macOS, R 3.5.0 (saved as mac350.rds)
library(Amelia)
data(africa)
set.seed(1)
a.out <- amelia(africa[ , c("infl", "trade")],
m = 1,
boot.type = "none")
saveRDS(a.out, "mac350.rds")
linux350 <- readRDS("linux350.rds")
mac344 <- readRDS("mac344.rds")
mac350 <- readRDS("mac350.rds")
all.equal(linux350, mac344)
## TRUE
all.equal(mac344, mac350)
## [1] "Component “imputations”: Component “imp1”: Component “trade”: Mean relative difference: 0.2454121"
This is a bit troubling since in some cases we would like to provide code that is exactly reproducible.
Add more options for spacing in the missmap() function. Ideas include adding spacing or allowing label rotation. This is especially needed for the x-axis.
Hello,
I cannot install Amelia in my R environment due to a compilation error:
/usr/bin/ld: cannot find -lgfortran
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o Amelia.so em.o init.o -llapack -lblas -lgfortran -lm -lquadmath -L/usr/lib/R/lib -lR
/usr/bin/ld: cannot find -lgfortran
collect2: error: ld returned 1 exit status
/usr/share/R/share/make/shlib.mk:10: recipe for target 'Amelia.so' failed
make: *** [Amelia.so] Error 1
ERROR: compilation failed for package ‘Amelia’
* removing ‘/home/user/R/x86_64-pc-linux-gnu-library/4.0/Amelia’
The downloaded source packages are in
‘/tmp/RtmpUGAush/downloaded_packages’
Warning message:
In install.packages("Amelia") :
installation of package ‘Amelia’ had non-zero exit status
Although I do have gcc, g++, and gfortran (all version 7.5.0) installed on my computer, the compilation still fails.
Amelia should be able to use its output to impute a new dataset. This would be ideal for held-out data.
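Under the multivariate normal model that Amelia assumes, imputing a new row amounts to conditioning on its observed entries. The sketch below computes the conditional mean mu_m + Sigma_mo Sigma_oo^-1 (x_o - mu_o) for the missing entries (m = missing, o = observed); Amelia's own imputations also add a random draw around this mean. impute_row() is a hypothetical helper, and it assumes mu and Sigma are already on the scale of the new data. Amelia's stored mu and covMatrices are centered, scaled, and transformed, so they would need to be mapped back first.

```r
# Conditional-mean imputation of one row under a multivariate normal model.
# mu: mean vector; Sigma: covariance matrix; x: numeric row with NAs.
impute_row <- function(x, mu, Sigma) {
  miss <- is.na(x)
  if (!any(miss)) return(x)
  if (all(miss)) return(mu)                  # nothing observed: use the mean
  obs  <- !miss
  S_mo <- Sigma[miss, obs, drop = FALSE]
  S_oo <- Sigma[obs, obs, drop = FALSE]
  # mu_m + S_mo %*% S_oo^{-1} %*% (x_o - mu_o)
  x[miss] <- mu[miss] + S_mo %*% solve(S_oo, x[obs] - mu[obs])
  x
}

mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
impute_row(c(2, NA), mu, Sigma)   # 2 1
```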
After running Amelia, the output structure includes a lot of information.
The covMatrices and mu output matrices include the corresponding field names only if the m parameter is set to one.
If you want to use this information with multiple imputations (m > 1), you have to call the function twice
(m = 1 and m > 1).
That's irritating.
names(amelia(x=africa,m=1,cs="country")$covMatrices[1,,])
names(amelia(x=africa,m=2,cs="country")$covMatrices[1,,])
When boot.type = "none", amelia() will run the identical EM algorithm m times. Instead, we should probably run EM once and then call amelia.impute() multiple times.
The xlim and ylim values for the disperse function are currently hardcoded. Users should be able to set them.
Sorry for the basic/stupid question, but during installation Amelia cannot find the R directory and does not accept the correct one (just reinstalled). I tried this on different PCs, with both Win10 and Win11 (see screenshot).
Surprisingly, I haven't found similar questions on the web or on GitHub, and thus suppose that I am the problem...
Any suggestions? Thanks a lot.
AmeliaView includes a logfile, which is very helpful if you don't know the syntax of the statements.
When the bounds parameter is used, however, the generated amelia() call does not work.
To reproduce: set a few bounds and call amelia to get a logfile.
amelia(x = getAmelia("amelia.data"), m = 5, idvars = "country",
ts = "year", cs = NULL, priors = NULL, lags = "infl",
empri = 0, intercs = FALSE, leads = "population", splinetime = NULL,
logs = NULL, sqrts = NULL, lgstc = NULL, ords = NULL, noms = NULL,
bounds = c(3, 5, 5, 10, 10, 20), max.resample = 1000, tolerance = 1e-04)
Running it in R produces the following error:
Error in bounds[, 1] : incorrect number of dimensions
I think the correct output is:
amelia(x = africa, m = 5, idvars = "country",
ts = NULL, cs = NULL, priors = NULL, lags = NULL, empri = 0,
intercs = FALSE, leads = NULL, splinetime = NULL, logs = NULL,
sqrts = NULL, lgstc = NULL, ords = "year", noms = NULL,
bounds = rbind(c(3, 5, 10), c(5, 10, 20)),
max.resample = 1000, tolerance = 1e-04)
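The logged value looks like the bounds matrix flattened by column: matrix() fills column by column, so reshaping the six numbers into three columns (column, lower, upper) recovers exactly the two rows above. A plausible fix for the logfile, sketched here as an assumption about how AmeliaView builds the string, is to write the values back wrapped in matrix(..., ncol = 3) instead of a bare c().

```r
# Bounds as written to the logfile (flattened, column-major):
logged <- c(3, 5, 5, 10, 10, 20)

# Reshaping into 3 columns (column, lower, upper) recovers the matrix:
bounds <- matrix(logged, ncol = 3)
bounds
#      [,1] [,2] [,3]
# [1,]    3    5   10
# [2,]    5   10   20

# One way to emit an argument string that parses back to the same matrix:
bounds_arg <- paste0("matrix(c(", paste(bounds, collapse = ", "), "), ncol = 3)")
bounds_arg
# "matrix(c(3, 5, 5, 10, 10, 20), ncol = 3)"
```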
I'm working on implementing a testthat testing regime for Amelia, and I'm running into an issue with the moPrep test. The test passes when I run it in the standard R console, but fails when I run R CMD check. Is there anything unique in the mo methods that could potentially cause this behavior?
amcheck() checks whether there are any NA values in any POSIXt (date/time) variables in the dataset and throws an error if so, but it runs the check on all variables in the dataset rather than only on those not named in idvars. It should ignore variables named in idvars, as these will neither be imputed nor used in the imputation but may exist in the dataset as metadata. The simple fix is to exclude the idvars variables from the check for POSIXt variables.
Lines 941 to 947 in 4de6306
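The proposed change can be sketched as follows. check_dates() is a hypothetical stand-in for the relevant part of amcheck(), not its actual code: it looks for NA values only in POSIXt columns that are not listed in idvars.

```r
# Error only if a POSIXt column that Amelia would actually use contains NA.
check_dates <- function(x, idvars = NULL) {
  is_time    <- vapply(x, inherits, logical(1), what = "POSIXt")
  candidates <- setdiff(names(x)[is_time], idvars)   # skip ID/metadata columns
  has_na     <- vapply(x[candidates], anyNA, logical(1))
  if (any(has_na)) {
    stop("NA values in date/time variable(s): ",
         paste(candidates[has_na], collapse = ", "))
  }
  invisible(TRUE)
}
```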
I'm using Amelia 1.7.1, R 3.0.1
When trying to plot the result of an amelia run, I get the following error:
Error in compare.density(output = x, var = which.vars[i], legend = FALSE, :
The 'var' option points to a non-existant column.
My test data is a 500-by-4 matrix with time in the first column and doubles in columns 2, 3, and 4 (columns 2, 3, and 4 contain missing values).
require(Amelia)
require(R.matlab)
with_missings <- readMat("Artdata_wm_xt2_ut2.mat")
time_series = as.matrix(with_missings$simp)
time_series[is.nan(time_series)] <- NA
colnames(time_series) <- c("time","S2", "u_sin","u_lin")
a.out <- amelia(x = time_series, m=1, ts=1, lgstc=c("S2"), lags=c("S2","u_sin"), leads=c("S2","u_sin"), polytime = 3)
summary(a.out)
plot(a.out)
If I change the last line to
plot(a.out, which.vars = c(2, 3, 4))
it works fine.
Calling plot.amelia() without specifying which.vars fails.
It seems that the problem is caused by the following line:
numericVars <- sapply(x$imputations[[1]], "is.numeric")
changing it to
numericVars <- sapply(x$imputations$imp1[1,], "is.numeric")
or
numericVars <- sapply(x$imputations[[1]][1,], "is.numeric")
solved the problem.
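The root cause appears to be that sapply() iterates over the columns of a data frame but over the individual cells of a matrix, so with matrix input the result has one entry per cell rather than one per variable. A sketch of a check that handles both input types (numeric_vars() is a hypothetical name):

```r
numeric_vars <- function(d) {
  if (is.matrix(d)) {
    # a matrix has a single type, so every column shares it
    rep(is.numeric(d), ncol(d))
  } else {
    vapply(d, is.numeric, logical(1))   # one entry per column
  }
}

m  <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
df <- as.data.frame(m)
length(sapply(m, is.numeric))    # 6: one per cell
length(sapply(df, is.numeric))   # 3: one per column
numeric_vars(m)                  # TRUE TRUE TRUE
```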
Where you only have aggregate-level data, e.g., data for a set of 100 studies, with partly missing summary statistics (such as the mean and standard deviation of age), is there any specific advice for imputing the standard deviations?
A log transformation makes sense to me since an SD must be positive, but is there anything else we should think about?
I'm trying to draw the overimpute plot on its own:
plot(amelia_result, var_names, overimpute=TRUE, compare=FALSE)
However, your function set.mfrow forces mfrow to c(2,1) when overimpute=TRUE.
I would propose extending the if statement to check whether both compare and overimpute are TRUE before forcing mfrow to c(2,1):
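set.mfrow <- function(nvars = 1, overimpute = FALSE, compare = TRUE) {
## `compare` is now an argument so the condition below is defined;
## the default of TRUE keeps the previous behavior for existing callers.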
if (compare && overimpute) {
## If we are overimputing as well, we need
## two plots per variable
mfrow <- switch(min(nvars, 13),
c(2,1), ## 2 plots: 2x1
c(2,2), ## 4 plots: 2x2
c(3,2), ## 6 plots: 3x2
c(4,2), ## 8 plots: 4x2
c(3,2), ## 10 plots: 3x2
c(3,2), ## 12 plots: 3x2
c(4,2), ## 14 plots: 4x2
c(4,2), ## 16 plots: 4x2
c(4,2), ## 18 plots: 4x2
c(3,2), ## 20 plots: 3x2
c(3,2), ## 22 plots: 3x2
c(3,2), ## 24 plots: 3x2
c(4,2)) ## 26 plots: 4x2
} else {
mfrow <- switch(min(nvars, 13),
c(1,1), ## 1 plot : 1x1
c(2,1), ## 2 plots: 2x1
c(2,2), ## 3 plots: 2x2
c(2,2), ## 4 plots: 2x2
c(3,2), ## 5 plots: 3x2
c(3,2), ## 6 plots: 3x2
c(3,3), ## 7 plots: 3x3
c(3,3), ## 8 plots: 3x3
c(3,3), ## 9 plots: 3x3
c(3,2), ## 10 plots: 3x2
c(3,2), ## 11 plots: 3x2
c(3,2), ## 12 plots: 3x2
c(3,3)) ## 13 plots: 3x3
}
return(mfrow)
}
As a workaround for #21, I used purrr::map() to run amelia() with m = 1. This returns a list. I was hoping that I could use ameliabind() to combine the list, but it seems to require typing out the names of the individual objects. dplyr::bind_rows() is an example of a function that can combine datasets and accepts either objects as separate arguments or a list. This might be a nice feature for ameliabind().
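Until such a feature exists, base R's do.call() can spread a list into separate arguments, so do.call(ameliabind, outs) should work on a list of amelia outputs, assuming (as its signature suggests) that ameliabind() collects its inputs through "...". The sketch below also shows a hypothetical bind_rows()-style wrapper, flexible_bind(), that accepts either separate objects or one plain list; it is demonstrated with rbind on data frames so that it runs without Amelia.

```r
# Accept objects either as separate arguments or as one plain list,
# then hand them to `bind_fun` as separate arguments via do.call().
flexible_bind <- function(..., bind_fun) {
  args <- list(...)
  # Unwrap only an unclassed list, so classed list objects (such as
  # "amelia" outputs) are still treated as individual items.
  if (length(args) == 1 && identical(class(args[[1]]), "list")) {
    args <- args[[1]]
  }
  do.call(bind_fun, args)
}

a <- data.frame(x = 1:2)
b <- data.frame(x = 3:4)
flexible_bind(a, b, bind_fun = rbind)         # separate arguments
flexible_bind(list(a, b), bind_fun = rbind)   # a single list

# With Amelia (not run; `outs` is a hypothetical list of amelia() results):
# combined <- do.call(ameliabind, outs)
```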
When using the Documentation menu, the linked page does not show the documentation (only an empty page).
Source code:
label = "Documentation", command = function() browseURL("http://gking.harvard.edu/amelia/docs/"),
For example, the link could be corrected to: http://r.iq.harvard.edu/docs/amelia/amelia.pdf
By the way, that document describes version 1.6.2.
ONLY when using the priors parameter AND swapping the positions of the cs and ts columns in the dataset does
amelia generate NAs in the imputed cs entries, as in the following example.
According to the documentation, ts and cs do not need a fixed position in the dataset:
the cs= and ts= parameters exist to make this flexible.
Version 1.8 throws a warning, but it's only a warning. It should be an error ;-)
africa_switch<-data.frame( country= africa$country,year= africa$year, gdp_pc= africa$gdp_pc ,
infl= africa$infl, trade= africa$trade, civlib= africa$civlib, population=africa$population)
imp_amelia<-amelia(africa_switch,ts=c("year"),cs=c("country"),m=2,logs=c("trade","population"),
lags=c("trade","population"),leads=c("trade","population"),
idvars=c("infl"),
polytime=2, boot.type="ordinary",
splinetime=1,
bound = rbind(c(5, 0, Inf)),
priors=matrix(c(35,5,100,95),nrow=1,ncol=4 ), p2s=1)
unique(imp_amelia$imputations$imp1$country)
[1] Burkina Faso NA Burundi Cameroon Congo Senegal Zambia
Levels: Burkina Faso Burundi Cameroon Congo Senegal Zambia
unique(africa_switch$country)
[1] Burkina Faso Burundi Cameroon Congo Senegal Zambia
Levels: Burkina Faso Burundi Cameroon Congo Senegal Zambia
(The angle brackets around NA in the printed output have been removed.)
In other situations (more cases and priors), the cs value is overwritten with a unique value (1, 2, 3, 4, ...) that could be identical to a real cs value.
It is not clear to me whether this has an impact on the EM process and other parts of the amelia() imputation.
In such a case, the generated output is not very useful.
To reproduce: change the column positions in the dataset.
If each row of the data has its own missingness pattern, there is an out-of-bounds error in the ameliaImpute() function in the C++ code. Here is the error:
error: Mat::operator(): index out of bounds libc++abi.dylib: terminating with uncaught exception of type std::logic_error: Mat::operator(): index out of bounds
This probably has to do with the loop over patterns and its last iteration.
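To check whether a dataset falls into this case, one can count its distinct missingness patterns; when the count equals the number of rows, every row has its own pattern. This is only a diagnostic, not a fix for the C++ indexing, and n_patterns() is a hypothetical helper.

```r
# Count distinct missingness patterns (one logical row per observation).
n_patterns <- function(d) nrow(unique(is.na(d)))

d <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 3), c = c(1, 2, NA))
n_patterns(d)               # 3 distinct patterns for 3 rows
n_patterns(d) == nrow(d)    # TRUE: every row has its own pattern
```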
Hi,
I have a dataset of around 2000 rows by 50ish columns that belong to around 170 cross-sectional units for a population. My call looks like this:
df_test <-
df[Cross.Sectional.ID %in% df[, pmax(response, na.rm = TRUE) > 100, by =
Cross.Sectional.ID][V1 == FALSE, Cross.Sectional.ID],]
start_time <- Sys.time()
df_amelia <- amelia(setDF(df_test[, c(-1,-2)])
, m = 1
, p2s = 2
, cs = "Cross.Sectional.ID"
, ts = "Time_Unit"
, ords = c("Ordinal.Variable.1", "Ordinal.Variable.2", "Ordinal.Variable.3")
)
end_time <- Sys.time()
This has been running on my business laptop for multiple days without completing. Oddly, R doesn't seem to be using much of my processor or RAM: processor usage stays at only about 30 percent of capacity, even when nothing else is running. Are there any common mistakes with a dataset of this size that might cause Amelia to run interminably or break silently? How could I adjust my settings to speed things up?
RStudio crashes when running Amelia on a 324,000-row by 17-column data frame at about 11 GiB: "R session aborted. R encountered a fatal error. The session was terminated."
It would be nice to have suggestions for (a) how to deal with memory issues or (b) features that actually deal with it, e.g., running on a distributed system or on a local database rather than in memory.
MI <- amelia(x = data,
m = 1,
p2s = 1,
idvars = c("index","var2"),
noms = c("var3","var4"),
ords = "var5",
ts = "dt",
cs = "var1",
empri = 0.05 * nrow(data),
polytime = 2,
intercs = TRUE,
bounds = matrix(c(11,0,1000, 12,0,1000, 13,0,1000), nrow = 3, ncol = 3, byrow = TRUE),
parallel = 'multicore',
ncpus = 8,
collect = TRUE)
Amelia has no problem running on subsets of the data, i.e., any one of the 1-10 datasets indicated by var2.
Debian Bullseye
Hi, first of all I would like to thank you for a great package!
As you see in the picture below the missmap plot looks a bit strange:
The y-axis is covered.
The plot legend ("missing" and "observed") is covered.
Secondly, the missing data in this example is located in observations 2 and 6 of X8 and observations 2, 3, and 7 of X13, while the plot makes it look as if it is located at the end of the variables (around observation 120).
Further, it would be great to also show the percentage missing in the legend, e.g., "missing 5%".
To reproduce the plot, see data and code-snippet below.
R 3.3.2
Amelia 1.7.4
---------------Script below---------------------
dates <- c("2004-01-01","2004-02-01","2004-03-01","2004-04-01","2004-05-01","2004-06-01","2004-07-01","2004-08-01","2004-09-01","2004-10-01","2004-11-01","2004-12-01","2005-01-01","2005-02-01","2005-03-01","2005-04-01","2005-05-01","2005-06-01","2005-07-01","2005-08-01","2005-09-01","2005-10-01","2005-11-01","2005-12-01","2006-01-01","2006-02-01","2006-03-01","2006-04-01","2006-05-01","2006-06-01","2006-07-01","2006-08-01","2006-09-01","2006-10-01","2006-11-01","2006-12-01","2007-01-01","2007-02-01","2007-03-01","2007-04-01","2007-05-01","2007-06-01","2007-07-01","2007-08-01","2007-09-01","2007-10-01","2007-11-01","2007-12-01","2008-01-01","2008-02-01","2008-03-01","2008-04-01","2008-05-01","2008-06-01","2008-07-01","2008-08-01","2008-09-01","2008-10-01","2008-11-01","2008-12-01","2009-01-01","2009-02-01","2009-03-01","2009-04-01","2009-05-01","2009-06-01","2009-07-01","2009-08-01","2009-09-01","2009-10-01","2009-11-01","2009-12-01","2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01","2011-01-01","2011-02-01","2011-03-01","2011-04-01","2011-05-01","2011-06-01","2011-07-01","2011-08-01","2011-09-01","2011-10-01","2011-11-01","2011-12-01","2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01","2012-07-01","2012-08-01","2012-09-01","2012-10-01","2012-11-01","2012-12-01","2013-01-01","2013-02-01","2013-03-01","2013-04-01","2013-05-01","2013-06-01","2013-07-01","2013-08-01","2013-09-01","2013-10-01","2013-11-01","2013-12-01")
c0 <- c(33736.25,NA,35005.65,35640.35,36275.05,NA,37604.00,35536.00,37919.25,38211.00,39905.75,38832.75,36678.75,37647.75,41619.50,39772.00,34867.25,38081.75,37346.50,41084.00,40469.00,40494.25,45103.50,44942.25,49926.50,49098.25,55861.75,49798.50,60079.50,54494.25,52755.50,54108.50,51919.50,58384.00,59443.75,53449.75,61783.50,56632.25,60741.25,53469.25,58679.25,56215.50,60113.75,55327.25,47813.50,56163.75,55138.25,42860.50,53791.75,58305.75,57092.25,65094.00,58048.50,62106.75,70625.75,58003.75,57788.25,48779.00,37041.50,31290.50,29668.50,26596.25,29381.00,28410.00,27741.25,34613.25,38353.25,38667.75,40339.25,41320.00,40927.75,45773.50,44696.75,40971.50,50719.75,46328.00,38762.75,42482.50,43731.25,44469.75,47563.50,49267.50,51317.50,49352.00,48782.50,50155.00,58700.25,47921.25,51833.50,56210.75,52743.50,52627.50,50518.75,45608.75,45609.25,40429.50,45020.25,46274.50,48017.00,38877.50,44003.75,35805.50,41224.25,40429.75,41069.75,45422.00,42740.25,39639.75,44829.50,41062.75,38255.50,38981.00,38435.50,36320.00,40649.25,38103.50,36962.00,41676.00,36727.25,12062.25)
c1 <- c(50885.5,NA,NA,58949.00,51924.25,59090.50,NA,59753.00,63674.50,63230.50,68688.25,66035.00,63383.50,65055.75,70958.75,71269.00,64961.75,77509.00,75885.00,83534.50,84858.00,85241.75,93909.75,91522.00,99407.00,99633.25,117389.50,114951.75,168926.25,158309.50,161919.50,169263.50,159617.50,164985.50,154616.75,126790.50,124711.25,113504.50,141918.00,147539.75,161306.75,156962.00,175396.50,165245.25,152951.00,184169.25,153251.25,118557.25,155322.25,165626.25,160326.00,191048.25,167630.75,173460.25,193503.50,152676.25,159510.75,113272.75,74325.25,64498.75,67625.75,66280.75,82473.00,88117.25,86794.75,110290.00,119936.50,123294.50,136306.75,138322.25,140173.25,146597.25,147712.25,136953.75,171634.50,154887.75,129906.75,142970.50,148161.75,152943.75,169596.50,174128.75,186321.00,192080.00,191098.25,197343.50,219192.50,170692.25,178529.75,198992.50,201994.75,198898.00,182915.25,154289.25,166130.00,151343.25,168903.00,176862.50,186044.00,156918.75,174224.25,140976.00,166951.50,164822.00,161360.50,185588.75,169266.25,151279.75,177073.00,161400.25,153244.75,151262.25,151801.00,140074.25,158527.75,150819.50,150383.25,165332.75,148387.25,49426.25)
data <- data.frame(dates, c0, c1)
colnames(data) <- c("dates", "X8", "X13")
ind_category <- NULL
ind_group <- NULL
target <- 0
amelia_results <- amelia(data, m = 5, ts = 'dates', p2s = 0, idvars = ind_category, cs = ind_group, lags = colnames(data)[-1])
missingMapAmelia <- missmap(amelia_results, legend = TRUE)
missingMapData <- missmap(data[,-1])
set.seed() does not seem to work when using parallel = "multicore". I assume that's because there's no way to pass the seed on to the parallel jobs. I'm not sure whether this is a bug or simply a limitation of using parallel processing with Amelia.
library(Amelia)
#> Warning: package 'Amelia' was built under R version 4.0.2
#> Loading required package: Rcpp
#> ##
#> ## Amelia II: Multiple Imputation
#> ## (Version 1.7.6, built: 2019-11-24)
#> ## Copyright (C) 2005-2020 James Honaker, Gary King and Matthew Blackwell
#> ## Refer to http://gking.harvard.edu/amelia/ for more information
#> ##
library(parallel)
data(africa)
# Reproducible:
set.seed(123)
a.out1 <- amelia(x = africa, cs = "country", ts = "year", logs = "gdp_pc", p2s = 0)
set.seed(123)
a.out2 <- amelia(x = africa, cs = "country", ts = "year", logs = "gdp_pc", p2s = 0)
## original
africa[38:42, ]
#> year country gdp_pc infl trade civlib population
#> 38 1989 Burundi 532 11.66 NA 0.1666667 5330730
#> 39 1990 Burundi 550 7.00 NA 0.1666667 5487000
#> 40 1991 Burundi 560 9.00 38.42 0.1666667 5643320
#> 41 1972 Cameroon 815 8.09 46.48 0.5000000 6835870
#> 42 1973 Cameroon NA 10.38 NA 0.5000000 7021850
## run 1
a.out1$imputations[[1]][38:42, ]
#> year country gdp_pc infl trade civlib population
#> 38 1989 Burundi 532.000 11.66 34.01444 0.1666667 5330730
#> 39 1990 Burundi 550.000 7.00 28.77401 0.1666667 5487000
#> 40 1991 Burundi 560.000 9.00 38.42000 0.1666667 5643320
#> 41 1972 Cameroon 815.000 8.09 46.48000 0.5000000 6835870
#> 42 1973 Cameroon 1534.801 10.38 85.77617 0.5000000 7021850
## run 2
a.out2$imputations[[1]][38:42, ]
#> year country gdp_pc infl trade civlib population
#> 38 1989 Burundi 532.000 11.66 34.01444 0.1666667 5330730
#> 39 1990 Burundi 550.000 7.00 28.77401 0.1666667 5487000
#> 40 1991 Burundi 560.000 9.00 38.42000 0.1666667 5643320
#> 41 1972 Cameroon 815.000 8.09 46.48000 0.5000000 6835870
#> 42 1973 Cameroon 1534.801 10.38 85.77617 0.5000000 7021850
# Not Reproducible:
set.seed(123)
a.out1 <- amelia(x = africa, cs = "country", ts = "year", logs = "gdp_pc", p2s = 0, parallel = "multicore", ncpus = detectCores() - 1)
set.seed(123)
a.out2 <- amelia(x = africa, cs = "country", ts = "year", logs = "gdp_pc", p2s = 0, parallel = "multicore", ncpus = detectCores() - 1)
## run 1
a.out1$imputations[[1]][38:42, ]
#> year country gdp_pc infl trade civlib population
#> 38 1989 Burundi 532.000 11.66 41.76351 0.1666667 5330730
#> 39 1990 Burundi 550.000 7.00 64.16109 0.1666667 5487000
#> 40 1991 Burundi 560.000 9.00 38.42000 0.1666667 5643320
#> 41 1972 Cameroon 815.000 8.09 46.48000 0.5000000 6835870
#> 42 1973 Cameroon 871.101 10.38 64.33208 0.5000000 7021850
## run 2
a.out2$imputations[[1]][38:42, ]
#> year country gdp_pc infl trade civlib population
#> 38 1989 Burundi 532.0000 11.66 40.37939 0.1666667 5330730
#> 39 1990 Burundi 550.0000 7.00 25.26368 0.1666667 5487000
#> 40 1991 Burundi 560.0000 9.00 38.42000 0.1666667 5643320
#> 41 1972 Cameroon 815.0000 8.09 46.48000 0.5000000 6835870
#> 42 1973 Cameroon 570.4258 10.38 48.06267 0.5000000 7021850
Created on 2020-08-17 by the reprex package (v0.3.0)
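A hedged explanation and sketch: with R's default generator, mclapply() (with its default mc.set.seed = TRUE) gives each forked worker its own seed, which set.seed() in the parent does not control. The parallel package provides the "L'Ecuyer-CMRG" generator for this case: after RNGkind("L'Ecuyer-CMRG") and set.seed(), mclapply() gives each worker a reproducible stream. The base-R sketch below uses a hypothetical parallel_draws() helper. Whether the same setting makes amelia(parallel = "multicore") reproducible depends on how Amelia dispatches its workers; that is an untested assumption here.

```r
library(parallel)

# Draw one random number per task on forked workers (forking is not
# available on Windows, so fall back to a single core there).
# Note: RNGkind() changes the session's generator globally.
parallel_draws <- function(seed, n_tasks = 4) {
  cores <- if (.Platform$OS.type == "windows") 1L else 2L
  RNGkind("L'Ecuyer-CMRG")   # per-worker streams derived from one seed
  set.seed(seed)
  unlist(mclapply(seq_len(n_tasks), function(i) runif(1), mc.cores = cores))
}

run1 <- parallel_draws(123)
run2 <- parallel_draws(123)
identical(run1, run2)

# Untested assumption for Amelia:
# RNGkind("L'Ecuyer-CMRG"); set.seed(123)
# a.out1 <- amelia(x = africa, cs = "country", ts = "year", logs = "gdp_pc",
#                  p2s = 0, parallel = "multicore", ncpus = 2)
```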
In case you had not seen this StackOverflow error report from 2020:
I just ran into the same problem. The solution is simple: just add this to convert x to a data frame if it is a tibble:
if (inherits(x, "tbl_df")) {
x <- as.data.frame(x)
}
When mi.combine() outputs confidence intervals, the lower and upper bounds are reversed, i.e., the lower bound (conf.low) is higher than the upper bound (conf.high).
Setting lower.tail=TRUE when calculating the critical value should fix this issue, as the critical value is currently negative.
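For reference, a t-based interval needs the positive upper quantile as its critical value. Depending on whether the probability passed to qt() is alpha/2 or 1 - alpha/2, that means lower.tail = FALSE or lower.tail = TRUE respectively; the wrong combination yields a negative value and swaps the bounds. The sketch below (t_interval() is a hypothetical helper, not mi.combine()'s code) uses qt(alpha / 2, df, lower.tail = FALSE), which equals qt(1 - alpha / 2, df).

```r
t_interval <- function(estimate, se, df, level = 0.95) {
  alpha <- 1 - level
  crit  <- qt(alpha / 2, df, lower.tail = FALSE)   # positive upper quantile
  c(conf.low = estimate - crit * se, conf.high = estimate + crit * se)
}

qt(0.025, 10)                              # negative: the lower-tail quantile
t_interval(estimate = 2, se = 0.5, df = 10)
```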
From a.out$imputations, is the last imputed dataset the final result, or do I need to average across all datasets?
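In the standard multiple-imputation workflow, neither: each imputed dataset is analyzed separately, and the per-dataset results are then combined with Rubin's rules (Amelia provides mi.meld() for this step). The sketch below shows the combination for a single coefficient; rubin_combine() is a hypothetical helper, and the inputs are made-up numbers for illustration.

```r
# Combine one coefficient across m imputed datasets (Rubin's rules).
# q: estimates from each dataset; se: their standard errors.
rubin_combine <- function(q, se) {
  m     <- length(q)
  q_bar <- mean(q)                  # pooled point estimate
  W     <- mean(se^2)               # within-imputation variance
  B     <- var(q)                   # between-imputation variance
  total <- W + (1 + 1 / m) * B      # total variance
  c(estimate = q_bar, se = sqrt(total))
}

# Illustrative inputs for m = 5 imputations:
res <- rubin_combine(q = c(1.0, 1.2, 0.9, 1.1, 1.3), se = rep(0.2, 5))
res   # estimate = 1.1, se = sqrt(0.07) (about 0.2646)
```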
Hello, I'm trying to figure out why I am suddenly getting a fatal error while running amelia. I'm fairly new to this package, but when I first used it (2 weeks ago), it ran fine. Now I can't run a multiple imputation without R crashing. Any suggestions? I've already deleted and reinstalled R on my laptop.
Here is the code that I am running:
library(lavaan)
library(readxl)
library(haven)
library(semTools)
set.seed(5)
library(naniar)
library(finalfit)
library(Amelia)
#load in new data set, adolescents only
trauma<- read_excel("C:/Users/PayneWinston/OneDrive - Newport Academy/Research Papers & Projects/Trauma Paper/adol_short.rev.xlsx")
View(trauma)
#multiple imputation
mi<-amelia(trauma, m=5, idvars = c("age","bothpar", "coerc","lsu",
"ECR_01M", "ECR_02M","ECR_03M","ECR_04M", "ECR_01F", "ECR_02F","ECR_03F","ECR_04F"),
ords =c("ECR_01Mr", "ECR_02Mr","ECR_03Mr","ECR_04Mr","ECR_05M","ECR_06M","ECR_07M","ECR_08M","ECR_09M",
"ECR_01Fr", "ECR_02Fr","ECR_03Fr","ECR_04Fr","ECR_05F","ECR_06F","ECR_07F","ECR_08F","ECR_09F"))
In AmeliaView it is possible to set the splinetime= parameter under "Splines" with anywhere from zero to ten knots.
amelia() is then called with parameter values from 0 to 10, but amelia() defines these values differently (see the help text below).
In short: the number of knots is limited to three and has to be translated into the values 4, 5, and 6.
If, on the other hand, you want a polynomial of time, the possible values are 1, 2, and 3.
"splinetime:
interger value of 0 or greater to control cubic smoothing splines of time. Values between 0 and 3 create a simple polynomial of time (identical to the polytime argument). Values k greater than 3 create a spline with an additional k-3 knotpoints."
To reproduce: choose 10 knots, which leads to an error.
Then read the debug log file for the amelia() call.
Normally a simple typo is no problem, but in this case it's different:
using AmeliaView() with ts without knowing about this bug leads to incorrect documentation of the functions and model used.
Conclusions from such work should be checked and reviewed.
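Following the help text quoted above (values 0 to 3 give a polynomial of time; a value k greater than 3 adds k - 3 knot points), a request for a given number of knots maps to splinetime = knots + 3. The helper below is a hypothetical sketch of the translation AmeliaView would need; it does not enforce any upper limit on the number of knots.

```r
# Translate a requested number of spline knots into amelia()'s splinetime
# value, per the help text: values k > 3 give k - 3 knot points.
knots_to_splinetime <- function(knots) {
  stopifnot(knots >= 1, knots == round(knots))
  as.integer(knots + 3)
}

knots_to_splinetime(1:3)   # 4 5 6
```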
From the R help file of Amelia:
"Note that the theta, mu and covMatrcies[!] objects refers to the data as seen by the EM algorithm and is thusly centered, scaled, stacked, tranformed and rearranged. See the manual for details and how to access this information."
There is no link to the detailed information (centering, scaling, stacking, ...).