All major functions in the convey
library have S3 methods for the classes: survey.design
, svyrep.design
and DBIdesign
. When the argument design
is a survey design object with replicate weights created by the survey
library, convey
uses the method svrepdesign
.
Considering the remarks in Wolter (1985, 163Wolter, K. M. 1985. Introduction to Variance Estimation. New York: Springer-Verlag.), concerning the deficiency of the Jackknife
method in estimating the variance of quantiles
, we suggest using bootstrap methods. This is specified in the survey::svrepdesign
or survey::as.svrepdesign
functions, as we show in the next example. For comparison, the function bootVar
from the laeken
library (Alfons and Templ 2013Alfons, Andreas, and Matthias Templ. 2013. “Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken.” Journal of Statistical Software 54 (15): 1–25. http://www.jstatsoft.org/v54/i15/.), also uses the bootstrap method to estimate variances.
It is worth pointing out tha bootstrap methods rely on randomly resampling the dataset.
Therefore, users are likely to have slightly different variance estimates by chance.
For reproducibility, the user can set a random seed using set.seed()
before running survey::as.svrepdesign
.
# load libraries
library( survey )
library( convey )
library( laeken ) # for the dataset
# get laeken eusilc data
data( eusilc ) ; names( eusilc ) <- tolower( names( eusilc ) )
# survey design object for TSL/ifluence function variance estimation
des_eusilc <- svydesign( ids = ~rb030 , strata = ~db040 , weights = ~rb050 , data = eusilc )
des_eusilc <- convey_prep( des_eusilc )
# influence function SE estimate for the gini index
convey:::svygini.survey.design( ~eqincome , design = des_eusilc )
## gini SE
## eqincome 0.26497 0.0019
# create survey design object for replicate-based variance estimation
des_eusilc_rep <- as.svrepdesign( des_eusilc , type = "bootstrap" )
des_eusilc_rep <- convey_prep( des_eusilc_rep )
# replicate-based (bootstrao) SE estimate for the gini index
# with option to keep replicates
( gini.repstat <- convey:::svygini.svyrep.design( ~eqincome , design = des_eusilc_rep , return.replicates = TRUE ) )
## gini SE
## eqincome 0.26497 0.0022
To understand how that variance is estimated, we can look at the replicates:
## [1] 0.2628816 0.2653672 0.2663951 0.2646311 0.2660021 0.2648396 0.2633737
## [8] 0.2611656 0.2613233 0.2647174 0.2638256 0.2621616 0.2646830 0.2587950
## [15] 0.2642947 0.2651559 0.2663231 0.2673018 0.2687169 0.2671058 0.2654078
## [22] 0.2632859 0.2636763 0.2643964 0.2659183 0.2676759 0.2637824 0.2661465
## [29] 0.2652060 0.2635074 0.2678686 0.2655230 0.2614917 0.2636580 0.2670249
## [36] 0.2671674 0.2629817 0.2612323 0.2618645 0.2641601 0.2646633 0.2637810
## [43] 0.2631755 0.2639137 0.2618225 0.2598284 0.2670922 0.2672737 0.2625786
## [50] 0.2677954
## attr(,"scale")
## [1] 0.02042526
## attr(,"rscales")
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1
## attr(,"mse")
## [1] FALSE
These are resampling (bootstrap) replicates. With them, we can look at the variance of these replicates to get an estimate of the Gini index estimator’s variance:
# variance estimate
des.scale <- des_eusilc_rep$scale
meantheta <- mean( gini.reps )[[1]]
v <- sum( ( gini.reps - meantheta )^2 ) * des.scale
# SE estimate
( gini.se <- ( sqrt( v ) ) )
## [1] 0.002226009
## [1] TRUE