5.1 Richness Measures (svyrich)

✔️ focuses on the top -- i.e., the "rich"
✔️ can have an inequality or polarization interpretation
✔️ interesting complementary approach to income inequality or polarization
❌ hard to interpret as parameters change
❌ convex richness measures are severely affected by outliers, unreliable for inference
❌ requires a richness line definition
❌ hardly ever used

Peichl, Schaefer, and Scheicher (2010Peichl, Andreas, Thilo Schaefer, and Christoph Scheicher. 2010. Measuring Richness and Poverty: A micro data application to Europe and Germany.” Review of Income and Wealth 56 (3): 597–619. https://doi.org/https://doi.org/10.1111/j.1475-4991.2010.00404.x.) also presented a general class of richness measures, proposing three particular sub-classes: the (concave) Chakravarty class, the concave-FGT class (T1) and the convex-FGT class (T2), defined at the population level as:

\[ \begin{aligned} R^{Cha}_\gamma &= \frac{1}{N} \sum_{k \in U} \bigg[( 1 - \bigg( \frac{z_r}{y_k}\bigg)^\gamma \bigg] \delta( y_k \geq z_r ) , \quad \gamma > 0 \\ R^{FGT,T1}_\gamma &= \frac{1}{N} \sum_{k \in U} \bigg( \frac{y_k - z_r}{y_k} \bigg)^\gamma \delta( y_k \geq z_r ) , \quad \gamma \in [0,1) \\ R^{FGT,T2}_\gamma &= \frac{1}{N} \sum_{k \in U} \bigg( \frac{y_k - z_r}{z_r} \bigg)^\gamma \delta( y_k \geq z_r ) , \quad \gamma > 1 \\ \end{aligned} \] where \(z_r\) is the richness threshold and \(\gamma\) is a sensitivity parameter.

To estimate these measures, Brzezinski (2014Brzezinski, Michal. 2014. “Statistical Inference for Richness Measures.” Applied Economics 46 (14): 1599–1608. https://doi.org/10.1080/00036846.2014.880106.) proposed the estimators

\[ \begin{aligned} \widehat{R}^{Cha}_\gamma &= \frac{1}{\widehat{N}} \sum_{k \in s} w_k \bigg[( 1 - \bigg( \frac{z_r}{y_k}\bigg)^\gamma \bigg] \delta( y_k \geq z_r ) , \quad \gamma > 0 \\ \widehat{R}^{FGT,T1}_\gamma &= \frac{1}{\widehat{N}} \sum_{k \in s} w_k \bigg( \frac{y_k - z_r}{y_k} \bigg)^\gamma \delta( y_k \geq z_r ) , \quad \gamma \in [0,1) \\ \widehat{R}^{FGT,T2}_\gamma &= \frac{1}{\widehat{N}} \sum_{k \in s} w_k \bigg( \frac{y_k - z_r}{z_r} \bigg)^\gamma \delta( y_k \geq z_r ) , \quad \gamma > 1 \\ \end{aligned} \] where \(w_k\) is the sampling (or calibration) weight.

In order to estimate the variance of these estimators, Brzezinski (2014Brzezinski, Michal. 2014. “Statistical Inference for Richness Measures.” Applied Economics 46 (14): 1599–1608. https://doi.org/10.1080/00036846.2014.880106.) derived influence functions under the Deville (1999Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf.) approach. These functions are the ones used in the svyrich function.

Brzezinski (2014Brzezinski, Michal. 2014. “Statistical Inference for Richness Measures.” Applied Economics 46 (14): 1599–1608. https://doi.org/10.1080/00036846.2014.880106.) also studied the reliability of the inference based on these estimators using a model-based Monte Carlo simulation, which are also valid for (design-based) simple random sampling with replacement. His results showed that inferences for convex richness measures are unreliable. The (convex) FGT-T2 estimator is highly sensitive to outliers, and confidence intervals are invalid (i.e., their actual coverage is much smaller than the nominal level). The vignette below shows a design-based simulation reproducing the same conclusions.


5.1.1 Monte Carlo Simulation

Brzezinski (2014Brzezinski, Michal. 2014. “Statistical Inference for Richness Measures.” Applied Economics 46 (14): 1599–1608. https://doi.org/10.1080/00036846.2014.880106.) presented results using a model-based Monte Carlo — i.e., he simulated several samples from the model and compared the behaviour of the estimators with the superpopulation model parameters.

In the simulation below, we take a design-based approach: we take several samples from a fixed finite population using a particular sampling design, compute the estimator for each sample and compare them to the finite population parameter (not the superpopulation model parameter!).

For the sake of similarity, we start by simulating a large finite population (\(N = 10^5\)) using the same distribution from Brzezinski (2014Brzezinski, Michal. 2014. “Statistical Inference for Richness Measures.” Applied Economics 46 (14): 1599–1608. https://doi.org/10.1080/00036846.2014.880106.):

# load libraries
library(survey)
library(convey)
library(sampling)
library(VGAM)

# set random seed
set.seed(2023)

# superpopulation parameters
scale.x0 <- 1
shape.theta <- 2
cha.beta <- 2
fgt.alpha <- 1.5
n.pop <- as.integer(10 ^ 5)

# generate finite population
pop.df <-
  data.frame(y1 = rparetoI(n.pop  ,  scale.x0 , shape.theta))

Then, we compute the finite population parameters using the simulated population:

# richness measures: finite population parameters
cha.scores <-
  function(y , b , rho)
    ifelse(y > rho , (1 - (rho / y) ^ b) , 0)
fgtt2.scores <-
  function(y , g , rho)
    ifelse(y > rho , (y / rho - 1) ^ a , 0)
median.fp <- quantile(pop.df$y1 , .50)
rho.fp <- 3 * median.fp
rHC.fp <- mean(pop.df$y1 > rho.fp)
rCha.fp <- mean(cha.scores(pop.df$y1  , cha.beta , rho.fp))
rFGTT2.fp <- mean(cha.scores(pop.df$y1  , fgt.alpha , rho.fp))

For our sampling design, we select \(n = 1000\) units using multinomial sampling, with the variable x.aux as the size variable for the selection probabilities:

# define sample size
n.sample <- 1000L

# selection probability
pop.df$x.aux <- plogis( pop.df$y1 ) / 1.1
pop.df$pi1 <- sampling::inclusionprobabilities( pop.df$x.aux , n.sample )

We run the procedure 5000 times and store the estimate objects using the code below:

# define the number of simulation runs
mc.reps <- 5000L

# simulation runs
rep.list <- lapply(seq_len(mc.reps) , function(this.iter) {
  # multinomial sampling
  this.sample <- sampling::UPmultinomial(pop.df$pi1)
  this.sample <- rep(1:n.pop , this.sample)
  sample.df <- pop.df[this.sample ,]
  sample.df$weights <- 1 / sample.df$pi1
  des.obj <-
    svydesign(
      ids = ~ 1 ,
      weights = ~ weights ,
      data = sample.df ,
      nest = FALSE
    )
  
  # run estimation
  des.obj <- convey_prep(des.obj)
  rCha.hat <-
    svyrich(
      ~ y1 ,
      des.obj ,
      type_measure = "Cha" ,
      g = cha.beta ,
      type_thresh = "relq" ,
      percent = 3
    )
  suppressWarnings(
    rHC.hat <-
      svyrich(
        ~ y1 ,
        des.obj ,
        type_measure = "FGTT1" ,
        g = 0 ,
        type_thresh = "relq"  ,
        percent = 3
      )
  )
  suppressWarnings(
    rFGTT2.hat <-
      svyrich(
        ~ y1 ,
        des.obj ,
        type_measure = "FGTT2" ,
        g = fgt.alpha ,
        type_thresh = "relq"  ,
        percent = 3
      )
  )
  est.list <- list(rHC.hat , rCha.hat , rFGTT2.hat)
  est.list
  
})

To study the behaviour of the estimators, we estimate their expected values, empirical variance (for the main parameter) and mean squared error. To study the validity of the normal approximation, we also estimate the percent coverage rate of the nominal 95% confidence interval. This is done using the function below:

sim.compile <- function(ll ,
                        pv ,
                        level = .95 ,
                        na.rm = FALSE) {
  # collect estimates
  mhat.vec <- sapply(ll , coef)
  vhat.vec <- sapply(ll , vcov)
  
  # estimate expected value
  mhat.exp <- mean(mhat.vec , na.rm = na.rm)
  vhat.exp <- mean(vhat.vec , na.rm = na.rm)
  
  # calculate empirical variance
  mhat.empvar <- var(mhat.vec , na.rm = na.rm)
  
  # estimate squared bias
  mhat.bias2 <- (mhat.exp - pv) ^ 2
  vhat.bias2 <- (vhat.exp - mhat.empvar) ^ 2
  
  # estimate mse
  mhat.mse <- mhat.bias2 + mhat.empvar
  
  # estimate coverage rate
  ci.hats <- t(sapply(ll , confint))
  ci.check <-
    matrix(as.logical(NA) , nrow = nrow(ci.hats) , ncol = 3)
  ci.check[, 1] <- ci.hats[, 1] <= pv
  ci.check[, 2] <- ci.hats[, 2] >= pv
  ci.check[, 3] <- apply(ci.check[, 1:2] , 1 , all)
  pcr.emp <- mean(ci.check[, 3] , na.rm = na.rm)
  
  # setup final table
  data.frame(
    "mc.reps" = length(ll) ,
    "theta" = pv ,
    "theta.hat" = mhat.exp ,
    "theta.bias2" = mhat.bias2 ,
    "theta.empvar" = mhat.empvar ,
    "theta.hat.mse" = mhat.mse ,
    "theta.varhat" = vhat.exp ,
    "pcr" = pcr.emp
  )
}

For the Headcount Richness Ratio (computed using the concave FGT measure), we have:

rhc.list <- lapply(rep.list , `[[` , 1)
sim.compile(rhc.list, rHC.fp)
##   mc.reps   theta  theta.hat  theta.bias2 theta.empvar theta.hat.mse
## 1    5000 0.05469 0.05452618 2.683797e-08 4.112616e-05    4.1153e-05
##   theta.varhat    pcr
## 1 4.336252e-05 0.9524
stopifnot(round(
  sim.compile(rhc.list, rHC.fp)["theta.bias2"] / sim.compile(rhc.list, rHC.fp)["theta.hat.mse"] ,
  4
) == 0.0007)

stopifnot(round(
  sim.compile(rhc.list, rHC.fp)["theta.varhat"] / sim.compile(rhc.list, rHC.fp)["theta.empvar"] ,
  2
) == 1.05)

stopifnot(round(sim.compile(rhc.list, rHC.fp)["pcr"] , 4) == 0.9524)

Under this approach, the squared bias accounts for approx. 0.07% of the MSE, indicating that the MSE of the estimator is reasonably approximated by its variance. Additionally, the ratio between the expected value of the variance estimator and the empirical variance is approx. 1.05, indicating that the variance estimates are expected to be a (slightly conservative, but) good approximation of the empirical variance. Finally, the estimated percent coverage rate of 95.24% is close to the nominal level of 95%, indicating that the confidence intervals are approximately valid.

For the Chakravarty measure, we have:

rcha.list <- lapply(rep.list , `[[` , 2)
sim.compile(rcha.list, rCha.fp)
##   mc.reps      theta  theta.hat  theta.bias2 theta.empvar theta.hat.mse
## 1    5000 0.02743915 0.02737177 4.538966e-09 1.428639e-05  1.429093e-05
##   theta.varhat    pcr
## 1 1.402416e-05 0.9428
stopifnot(round(
  sim.compile(rcha.list, rCha.fp)["theta.bias2"] / sim.compile(rcha.list, rCha.fp)["theta.hat.mse"] ,
  4
) == 0.0003)

stopifnot(round(
  sim.compile(rcha.list, rCha.fp)["theta.varhat"] / sim.compile(rcha.list, rCha.fp)["theta.empvar"] ,
  2
) == 0.98)

stopifnot(round(sim.compile(rcha.list, rCha.fp)["pcr"] , 4) == 0.9428)

Under this approach, the squared bias is approx 0.03% of the MSE, indicating that the MSE of the estimator is reasonably approximated by its variance. Additionally, the ratio between the expected value of the variance estimator and the empirical variance is approx. 0.98, indicating that the variance estimates are expected to be a good approximation of the empirical variance. Finally, the estimated percent coverage rate of 94.28% is close to the nominal level of 95%, indicating that the confidence intervals are approximately valid.

For the convex FGT measure, we have:

rfgtt2.list <- lapply(rep.list , `[[` , 3)
sim.compile(rfgtt2.list, rFGTT2.fp)
##   mc.reps      theta theta.hat theta.bias2 theta.empvar theta.hat.mse
## 1    5000 0.02349189 0.1045973  0.00657808   0.01865219    0.02523027
##   theta.varhat    pcr
## 1   0.01857144 0.5212
stopifnot(round(
  sim.compile(rfgtt2.list, rFGTT2.fp)["theta.bias2"] / sim.compile(rfgtt2.list, rFGTT2.fp)["theta.hat.mse"] ,
  2
) == 0.26)

stopifnot(round(
  sim.compile(rfgtt2.list, rFGTT2.fp)["theta.varhat"] / sim.compile(rfgtt2.list, rFGTT2.fp)["theta.empvar"] ,
  2
) == 1)

stopifnot(round(sim.compile(rfgtt2.list, rFGTT2.fp)["pcr"] , 4) == 0.5212)

Under this approach, the squared bias is approx 26% of the MSE, indicating that the bias is substantial. The ratio between the expected value of the variance estimator and the empirical variance is approx. 1.00, indicating that the variance estimates are expected to be a good approximation of the empirical variance (but not of the MSE!). Finally, the estimated percent coverage rate of 52.12% is far from the nominal level of 95%, indicating that the confidence intervals are invalid. This comes from the fact that the estimator is very sensitive to the extreme values and is the reason why Brzezinski (2014Brzezinski, Michal. 2014. “Statistical Inference for Richness Measures.” Applied Economics 46 (14): 1599–1608. https://doi.org/10.1080/00036846.2014.880106.) does not recommend using convex richness measures.

For additional usage examples of svyrich, type ?convey::svyrich in the R console.

5.1.2 Real World Examples

This section displays example results using nationally-representative surveys from both the United States and Brazil. We present a variety of surveys, levels of analysis, and subpopulation breakouts to provide users with points of reference for the range of plausible values of the svyrich function.

To understand the construction of each survey design object and respective variables of interest, please refer to section 1.4 for CPS-ASEC, section 1.5 for PNAD Contínua, and section 1.6 for SCF.

5.1.2.1 CPS-ASEC Household Income

# richness gap index, richness threshold equal to the median
svyrich(
  ~ htotval ,
  cps_household_design ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##         Cha-1-richness measure     SE
## htotval                0.11926 0.0017
svyby(
  ~ htotval ,
  ~ sex ,
  cps_household_design ,
  svyrich ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##           sex   htotval          se
## male     male 0.1384483 0.002267358
## female female 0.0998675 0.001604764

5.1.2.2 CPS-ASEC Family Income

# richness gap index, richness threshold equal to the median
svyrich(
  ~ ftotval ,
  cps_family_design ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##         Cha-1-richness measure     SE
## ftotval                0.13662 0.0014
svyby(
  ~ ftotval ,
  ~ sex ,
  cps_family_design ,
  svyrich ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##           sex   ftotval          se
## male     male 0.1550754 0.002071117
## female female 0.1170849 0.001642368

5.1.2.3 CPS-ASEC Worker Earnings

# richness gap index, richness threshold equal to the median
svyrich(
  ~ pearnval ,
  cps_ftfy_worker_design ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##          Cha-1-richness measure     SE
## pearnval               0.085477 0.0017
svyby(
  ~ pearnval ,
  ~ sex ,
  cps_ftfy_worker_design ,
  svyrich ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##           sex   pearnval          se
## male     male 0.10498563 0.002103013
## female female 0.06013727 0.001630926

5.1.2.4 PNAD Contínua Per Capita Income

# richness gap index, richness threshold equal to the median
svyrich(
  ~ deflated_per_capita_income ,
  pnadc_design ,
  na.rm = TRUE ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##                            Cha-1-richness measure     SE
## deflated_per_capita_income                0.12399 0.0013
svyby(
  ~ deflated_per_capita_income ,
  ~ sex ,
  pnadc_design ,
  svyrich ,
  na.rm = TRUE ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##           sex deflated_per_capita_income          se
## male     male                  0.1271896 0.001330992
## female female                  0.1209397 0.001431801

5.1.2.5 PNAD Contínua Worker Earnings

# richness gap index, richness threshold equal to the median
svyrich(
  ~ deflated_labor_income ,
  pnadc_design ,
  na.rm = TRUE ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##                       Cha-1-richness measure     SE
## deflated_labor_income                 0.1247 0.0015
svyby(
  ~ deflated_labor_income ,
  ~ sex ,
  pnadc_design ,
  svyrich ,
  na.rm = TRUE ,
  type_measure = "Cha" ,
  g = 1 ,
  type_thresh = "relq"
)
##           sex deflated_labor_income          se
## male     male             0.1405740 0.001669085
## female female             0.1031143 0.001636094

5.1.2.6 SCF Family Net Worth

# richness gap index, richness threshold equal to the median
scf_MIcombine(with(
  scf_design ,
  svyrich(
    ~ networth ,
    type_measure = "Cha" ,
    g = 1 ,
    type_thresh = "relq"
  )
))
## Multiple imputation results:
##       with(scf_design, svyrich(~networth, type_measure = "Cha", g = 1, 
##     type_thresh = "relq"))
##       scf_MIcombine(with(scf_design, svyrich(~networth, type_measure = "Cha", 
##     g = 1, type_thresh = "relq")))
##            results          se
## networth 0.2516477 0.005687832
scf_MIcombine(with(
  scf_design ,
  svyby(
    ~ networth,
    ~ hhsex ,
    svyrich ,
    type_measure = "Cha" ,
    g = 1 ,
    type_thresh = "relq"
  )
))
## Multiple imputation results:
##       with(scf_design, svyby(~networth, ~hhsex, svyrich, type_measure = "Cha", 
##     g = 1, type_thresh = "relq"))
##       scf_MIcombine(with(scf_design, svyby(~networth, ~hhsex, svyrich, 
##     type_measure = "Cha", g = 1, type_thresh = "relq")))
##          results          se
## male   0.2979309 0.007576695
## female 0.1318583 0.007329980

5.1.2.7 SCF Family Income

# richness gap index, richness threshold equal to the median
scf_MIcombine(with(
  scf_design ,
  svyrich(
    ~ income ,
    type_measure = "Cha" ,
    g = 1 ,
    type_thresh = "relq"
  )
))
## Multiple imputation results:
##       with(scf_design, svyrich(~income, type_measure = "Cha", g = 1, 
##     type_thresh = "relq"))
##       scf_MIcombine(with(scf_design, svyrich(~income, type_measure = "Cha", 
##     g = 1, type_thresh = "relq")))
##         results          se
## income 0.139043 0.003921955
scf_MIcombine(with(
  scf_design ,
  svyby(
    ~ income,
    ~ hhsex ,
    svyrich ,
    type_measure = "Cha" ,
    g = 1 ,
    type_thresh = "relq"
  )
))
## Multiple imputation results:
##       with(scf_design, svyby(~income, ~hhsex, svyrich, type_measure = "Cha", 
##     g = 1, type_thresh = "relq"))
##       scf_MIcombine(with(scf_design, svyby(~income, ~hhsex, svyrich, 
##     type_measure = "Cha", g = 1, type_thresh = "relq")))
##           results          se
## male   0.17971360 0.005720753
## female 0.03378472 0.003350917