✔️ can be interpreted in terms of GEI indices
✔️ can be group-decomposed into within-inequality and between-inequality
✔️ does not need to (explicitly) choose inequality aversion parameters
❌ does not handle zero or negative incomes
❌ hard to interpret
❌ the decomposition is not interpreted in exactly the same way as the GEI decomposition, though it is very similar
The J-divergence measure (Rohde, Nicholas. 2016. “J-Divergence Measurements of Economic Inequality.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 179 (3): 847–70. https://doi.org/10.1111/rssa.12153) can be seen as the sum of \(GE_0\) and \(GE_1\), satisfying axioms that, individually, those two indices do not. The J-divergence measure is defined as:
\[ J = \frac{1}{N} \sum_{i \in U} \bigg( \frac{ y_i }{ \mu } -1 \bigg) \ln \bigg( \frac{y_i}{\mu} \bigg) \]
Since it is a sum of two additive decomposable measures, \(J\) itself is decomposable. The within and between parts of the J-divergence decomposition are:
\[ \begin{aligned} J_B &= \sum_{g \in G} \frac{N_g}{N} \bigg( \frac{ \mu_g }{ \mu } - 1 \bigg) \ln \bigg( \frac{\mu_g}{\mu} \bigg) \\ J_W &= \sum_{g \in G} \bigg[ \frac{Y_g}{Y} GE^{(1)}_g + \frac{N_g}{N} GE^{(0)}_g \bigg] = \sum_{g \in G} \bigg[ \bigg( \frac{Y_g N + Y N_g}{YN} \bigg) J_g \bigg] - \sum_{g \in G} \bigg[ \frac{Y_g}{Y} GE^{(0)}_g + \frac{N_g}{N} GE^{(1)}_g \bigg] \end{aligned} \]
where the subscript \(g\) denotes one of the \(G\) subgroups of the population. The main difference from the additively decomposable Generalized Entropy family is that the “within” component of the J-divergence cannot be written as a weighted sum of the J-divergence of each group. Instead, \(J_W\) is the sum of two terms: (1) the income-share weighted mean of \(GE^{(1)}\) across groups; and (2) the population-share weighted mean of \(GE^{(0)}\) across groups. In other words, the “within” component still accounts for the inequality inside each group; it just lacks a direct interpretation as a weighted mean of group-level J-divergences.
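To make this concrete, below is a minimal base-R sketch (toy unweighted data, illustrative only; the convey implementation handles survey weights) that computes \(J\), computes its between component from the group means, and verifies that the remaining within component equals the weighted sum of group-level GE indices just described:
# minimal base-R sketch of the J-divergence decomposition
# (toy unweighted data; not the convey implementation)
y <- c(2 , 4 , 6 , 8 , 10 , 30)            # toy incomes
g <- c("a" , "a" , "a" , "b" , "b" , "b")  # toy groups
# generalized entropy indices and the J-divergence
ge0 <- function(y) mean(log(mean(y) / y))
ge1 <- function(y) mean((y / mean(y)) * log(y / mean(y)))
jdiv <- function(y) mean((y / mean(y) - 1) * log(y / mean(y)))
# J is the sum of GE(0) and GE(1)
all.equal(jdiv(y) , ge0(y) + ge1(y))
# between component: the index formula applied to the group means
mu <- mean(y)
mu.g <- tapply(y , g , mean)
n.share <- prop.table(table(g))          # population shares
j.between <- sum(n.share * (mu.g / mu - 1) * log(mu.g / mu))
# within component: income-share weighted GE(1) plus
# population-share weighted GE(0), summed across groups
y.share <- tapply(y , g , sum) / sum(y)  # income shares
j.within <- sum(y.share * tapply(y , g , ge1) + n.share * tapply(y , g , ge0))
# the two components add up to the overall index
all.equal(jdiv(y) , j.within + j.between)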
First, we should check that the finite-population values make sense. The J-divergence can be seen as the sum of \(GE^{(0)}\) and \(GE^{(1)}\). So, taking the starting population from the svyzenga section of this text, we have:
# compute the finite population J-divergence
(jdivt.pop <-
   convey:::CalcJDiv(pop.df$x , ifelse(pop.df$x > 0 , 1 , 0)))
## [1] 0.4332649
# compute the finite population GE indices
(gei0.pop <-
   convey:::CalcGEI(pop.df$x , ifelse(pop.df$x > 0 , 1 , 0) , 0))
## [1] 0.2215037
(gei1.pop <-
   convey:::CalcGEI(pop.df$x , ifelse(pop.df$x > 0 , 1 , 0) , 1))
## [1] 0.2117612
# check that the J-divergence matches the sum of the GE indices
all.equal(jdivt.pop , gei0.pop + gei1.pop)
## [1] TRUE
As expected, the J-divergence matches the sum of the two GE indices in the finite population. And since the GE measures were already checked earlier in this text, the J-divergence computation function seems safe.
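The objects pop.df and survey.list come from the svyzenga replication example. For readers working through this section on its own, the sketch below builds stand-in versions of those objects under assumed settings (lognormal incomes, a two-level SEX variable, simple random samples); since this is not the actual svyzenga population, it will not reproduce the exact values printed in this section:
# stand-in versions of the svyzenga simulation objects (assumed settings)
library(survey)
library(convey)
set.seed(2023)
# synthetic finite population with an income and a sex variable
pop.df <-
  data.frame(x = rlnorm(100000 , meanlog = 8 , sdlog = 1) ,
             SEX = sample(1:2 , 100000 , replace = TRUE))
# draw simple random samples, keeping one prepped design per sample
survey.list <-
  lapply(seq_len(500) , function(i) {
    smp.df <- pop.df[sample(nrow(pop.df) , 1000) , ]
    this.design <-
      svydesign(ids = ~ 1 ,
                data = smp.df ,
                # each sampled unit represents 100 population units
                weights = rep(100 , 1000))
    convey_prep(this.design)
  })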
In order to assess the estimators implemented in svyjdiv and svyjdivdec, we can run a Monte Carlo experiment. Using the same samples we used in the svyzenga replication example, we have:
# estimate J-divergence with each sample
jdiv.estimate.list <-
lapply(survey.list ,
function(this.sample) {
svyjdiv( ~ x ,
subset(this.sample , x > 0) ,
na.rm = TRUE)
})
# compute the (finite population overall) J-divergence
(theta.pop <-
convey:::CalcJDiv(pop.df$x , ifelse(pop.df$x > 0 , 1 , 0)))
## [1] 0.4332649
# estimate the expected value of the J-divergence estimator
# using the average of the estimates
(theta.exp <- mean(sapply(jdiv.estimate.list , coef)))
## [1] 0.4327823
# estimate the percentage relative bias (PRB) of the estimator
100 * (theta.exp / theta.pop - 1)
## [1] -0.1113765
# estimate the variance of the J-divergence estimator
# using the variance of the estimates
(vartheta.popest <- var(sapply(jdiv.estimate.list , coef)))
## [1] 0.0005434848
# estimate the expected value of the J-divergence index variance estimator
# using the average of the variance estimates
(vartheta.exp <- mean(sapply(jdiv.estimate.list , vcov)))
## [1] 0.0005342947
# estimate the percentage relative bias of the variance estimator
100 * (vartheta.exp / vartheta.popest - 1)
## [1] -1.690964
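Using the quantities just computed, we can also gauge how much of the overall estimator's mean squared error comes from its squared bias; this short check anticipates the component-wise version shown later in this section:
# share (%) of the estimator's MSE attributable to its squared bias
theta.bias2 <- (theta.exp - theta.pop) ^ 2
100 * theta.bias2 / (theta.bias2 + vartheta.popest)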
For the decomposition, we repeat the same procedure:
# estimate J-divergence decomposition with each sample
jdivdec.estimate.list <-
lapply(survey.list ,
function(this.sample) {
svyjdivdec( ~ x ,
~ SEX ,
subset(this.sample , x > 0) ,
na.rm = TRUE)
})
# compute the (finite population) J-divergence decomposition per sex
jdivt.pop <-
convey:::CalcJDiv(pop.df$x , ifelse(pop.df$x > 0 , 1 , 0))
overall.mean <- mean(pop.df$x[pop.df$x > 0])
group.mean <-
c(by(pop.df$x[pop.df$x > 0] , list("SEX" = factor(pop.df$SEX[pop.df$x > 0])) , FUN = mean))
group.pshare <-
c(prop.table(by(rep(1 , sum(
pop.df$x > 0
)) , list(
"SEX" = factor(pop.df$SEX[pop.df$x > 0])
) , FUN = sum)))
jdivb.pop <-
sum(group.pshare * (group.mean / overall.mean - 1) * log(group.mean / overall.mean))
jdivw.pop <- jdivt.pop - jdivb.pop
(theta.pop <- c(jdivt.pop , jdivw.pop , jdivb.pop))
## [1] 0.433264877 0.428398928 0.004865949
# estimate the expected value of the J-divergence decomposition estimator
# using the average of the estimates
(theta.exp <- rowMeans(sapply(jdivdec.estimate.list , coef)))
## total within between
## 0.432782322 0.427480257 0.005302065
# estimate the percentage relative bias (PRB) of each component
100 * (theta.exp / theta.pop - 1)
## total within between
## -0.1113765 -0.2144430 8.9626164
The estimated PRB for the total is the same as before, so we will focus on the within and between components. While the within component has a small relative bias (-0.21%), the between component PRB is sizeable, amounting to 8.96%.
For the variance estimator, we do:
# estimate the variance of the J-divergence estimator
# using the variance of the estimates
(vartheta.popest <-
diag(var(t(
sapply(jdivdec.estimate.list , coef)
))))
## total within between
## 5.434848e-04 5.391901e-04 8.750879e-06
# estimate the expected value of the J-divergence decomposition variance estimator
# using the average of the variance estimates
(vartheta.exp <-
rowMeans(sapply(jdivdec.estimate.list , function(z)
diag(vcov(
z
)))))
## total within between
## 5.342947e-04 5.286750e-04 8.891772e-06
# estimate the percentage relative bias of the variance estimator
100 * (vartheta.exp / vartheta.popest - 1)
## total within between
## -1.690964 -1.950177 1.610034
The PRBs of the variance estimators for both components are small: -1.95% for the within component and 1.61% for the between component.
How much should we care about the bias of the between component? Our simulations show that the squared bias of this estimator accounts for only about 2% of its mean squared error:
theta.bias2 <- (theta.exp - theta.pop) ^ 2
theta.mse <- theta.bias2 + vartheta.popest
100 * (theta.bias2 / theta.mse)
## total within between
## 0.04282728 0.15627854 2.12723212
Next, we evaluate the Percentage Coverage Rate (PCR). In theory, under repeated sampling, the estimated 95% CIs should cover the population parameter 95% of the time. We can evaluate that using:
# estimate confidence intervals of the J-divergence decomposition
# for each of the samples
est.coverage <-
t(sapply(jdivdec.estimate.list, function(this.stat)
confint(this.stat)[, 1] <= theta.pop &
confint(this.stat)[, 2] >= theta.pop))
# evaluate the empirical coverage rates
colMeans(est.coverage)
## total within between
## 0.9390 0.9376 0.9108
Our coverage rates are not too far from the nominal level of 95%; however, the bias of the between component estimator seems to hurt its coverage rate, which drops to roughly 91%.
For additional usage examples of svyjdiv or svyjdivdec, type ?convey::svyjdiv or ?convey::svyjdivdec in the R console.
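For a quick self-contained illustration of both functions, here is a sketch using the api dataset shipped with the survey package; api00 is a school test score rather than an income, used here only to demonstrate the calls:
# self-contained example with the survey package's api data
library(survey)
library(convey)
data(api)
# stratified sample of California schools
dstrat <-
  svydesign(
    ids = ~ 1 ,
    strata = ~ stype ,
    weights = ~ pw ,
    data = apistrat ,
    fpc = ~ fpc
  )
# prepare the design for convey functions
dstrat <- convey_prep(dstrat)
# overall J-divergence of the api00 score (strictly positive)
svyjdiv( ~ api00 , dstrat)
# decomposition by school type
svyjdivdec( ~ api00 , ~ stype , dstrat)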
This section displays example results using nationally-representative surveys from both the United States and Brazil. We present a variety of surveys, levels of analysis, and subpopulation breakouts to provide users with points of reference for the range of plausible values of the svyjdiv function.
To understand the construction of each survey design object and respective variables of interest, please refer to section 1.4 for CPS-ASEC, section 1.5 for PNAD Contínua, and section 1.6 for SCF.
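The CPS-ASEC results below follow the same pattern as the PNAD Contínua code further down. A sketch of the calls, in the order of the outputs, where cps_asec_design is an assumed name for the section 1.4 design object:
# sketch of the CPS-ASEC calls, in the order of the outputs below;
# `cps_asec_design` is an assumed name for the section 1.4 design object
# household income, overall and by sex
svyjdiv( ~ htotval , subset(cps_asec_design , htotval > 0) , na.rm = TRUE)
svyby( ~ htotval , ~ sex , subset(cps_asec_design , htotval > 0) ,
       svyjdiv , na.rm = TRUE)
# family income, overall and by sex
svyjdiv( ~ ftotval , subset(cps_asec_design , ftotval > 0) , na.rm = TRUE)
svyby( ~ ftotval , ~ sex , subset(cps_asec_design , ftotval > 0) ,
       svyjdiv , na.rm = TRUE)
# personal earnings, overall and by sex
svyjdiv( ~ pearnval , subset(cps_asec_design , pearnval > 0) , na.rm = TRUE)
svyby( ~ pearnval , ~ sex , subset(cps_asec_design , pearnval > 0) ,
       svyjdiv , na.rm = TRUE)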
## j-divergence SE
## htotval 0.91009 0.0091
## sex htotval se.htotval
## male male 0.8468719 0.01304558
## female female 0.9601656 0.01229789
## j-divergence SE
## ftotval 0.78947 0.0097
## sex ftotval se.ftotval
## male male 0.7255059 0.0150296
## female female 0.8478025 0.0123227
## j-divergence SE
## pearnval 0.63767 0.01
## sex pearnval se.pearnval
## male male 0.6493827 0.01177377
## female female 0.5888245 0.01833507
svyjdiv(
~ deflated_per_capita_income ,
subset(pnadc_design , deflated_per_capita_income > 0),
na.rm = TRUE
)
## j-divergence SE
## deflated_per_capita_income 0.99732 0.0162
svyby(
~ deflated_per_capita_income ,
~ sex ,
subset(pnadc_design , deflated_per_capita_income > 0),
svyjdiv ,
na.rm = TRUE
)
## sex deflated_per_capita_income se.deflated_per_capita_income
## male male 1.0074672 0.01724635
## female female 0.9864209 0.01639167
svyjdiv(
~ deflated_labor_income ,
subset(pnadc_design , deflated_labor_income > 0) ,
na.rm = TRUE
)
## j-divergence SE
## deflated_labor_income 0.92486 0.0178
svyby(
~ deflated_labor_income ,
~ sex ,
subset(pnadc_design , deflated_labor_income > 0) ,
svyjdiv ,
na.rm = TRUE
)
## sex deflated_labor_income se.deflated_labor_income
## male male 0.9412711 0.02098598
## female female 0.8628107 0.01653023
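The SCF estimates below combine results across the multiple implicates. The calls, echoed at the top of each output, use the scf_MIcombine helper from section 1.6:
# SCF calls, as echoed in the multiple imputation outputs below
scf_MIcombine(with(subset(scf_design , networth > 0) ,
                   svyjdiv( ~ networth)))
scf_MIcombine(with(subset(scf_design , networth > 0) ,
                   svyby( ~ networth , ~ hhsex , svyjdiv)))
scf_MIcombine(with(subset(scf_design , income > 0) ,
                   svyjdiv( ~ income)))
scf_MIcombine(with(subset(scf_design , income > 0) ,
                   svyby( ~ income , ~ hhsex , svyjdiv)))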
## Warning in subset.svyimputationList(scf_design, networth > 0): subset differed
## between imputations
## Multiple imputation results:
## with(subset(scf_design, networth > 0), svyjdiv(~networth))
## scf_MIcombine(with(subset(scf_design, networth > 0), svyjdiv(~networth)))
## results se
## networth 3.755306 0.06761237
## Warning in subset.svyimputationList(scf_design, networth > 0): subset differed
## between imputations
## Multiple imputation results:
## with(subset(scf_design, networth > 0), svyby(~networth, ~hhsex,
## svyjdiv))
## scf_MIcombine(with(subset(scf_design, networth > 0), svyby(~networth,
## ~hhsex, svyjdiv)))
## results se
## male 3.604860 0.07501126
## female 3.311156 0.26160038
## Warning in subset.svyimputationList(scf_design, income > 0): subset differed
## between imputations
## Multiple imputation results:
## with(subset(scf_design, income > 0), svyjdiv(~income))
## scf_MIcombine(with(subset(scf_design, income > 0), svyjdiv(~income)))
## results se
## income 1.67673 0.1004542
## Warning in subset.svyimputationList(scf_design, income > 0): subset differed
## between imputations
## Multiple imputation results:
## with(subset(scf_design, income > 0), svyby(~income, ~hhsex, svyjdiv))
## scf_MIcombine(with(subset(scf_design, income > 0), svyby(~income,
## ~hhsex, svyjdiv)))
## results se
## male 1.6283323 0.10752174
## female 0.8483978 0.07380227