✔️ flexible inequality-aversion parameter -- varying its epsilon parameter can highlight the effect of inequality in different parts of the income distribution
✔️ can be group-decomposed into within-inequality and between-inequality
✔️ the epsilon parameter can also be tuned (somewhat) to make the index less sensitive to outliers
❌ does not handle zero or negative incomes
❌ hard to interpret
❌ can be very sensitive to outliers
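The restriction to strictly positive incomes follows from the logarithms and powers that these indices are built on. A minimal base R sketch, using a made-up income vector `y` (an assumption for illustration only), shows how a single zero breaks the terms that the index averages:

```r
# hypothetical incomes, including one zero
y <- c(0, 10, 20, 30)

# the mean log deviation averages log(mean / y): the zero produces Inf
mld_terms <- log(mean(y) / y)

# GE(-1) averages (y / mean)^(-1): the zero also produces Inf
ge_neg1_terms <- (y / mean(y))^(-1)

# keeping only positive incomes gives a finite mean log deviation
y_pos <- y[y > 0]
mld_pos <- mean(log(mean(y_pos) / y_pos))
```

This is why the examples in this section subset to strictly positive values (for instance `eybhc0 > 0`) before calling `svygei`.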
Using a generalization of the information function, now defined as:
\[ g(f) = \frac{1}{\alpha-1} [ 1 - f^{\alpha - 1} ] \]
the \(\alpha\)-class entropy is:
\[ H^{(\alpha)} (f) = \frac{1}{\alpha - 1} \bigg[ 1 - \int_{-\infty}^{\infty} f(y)^{ \alpha - 1} f(y) dy \bigg] \text{.} \]
This relates to a class of inequality measures, the generalized entropy indices, defined as:
\[ GE^{(\alpha)} = \frac{1}{\alpha^2 - \alpha} \int_{0}^\infty \bigg[ \bigg( \frac{y}{\mu} \bigg)^\alpha - 1 \bigg]dF(y) = - \frac{-H^{(\alpha)}(s) }{ \alpha } \text{.} \]
The parameter \(\alpha\) also has an economic interpretation: as \(\alpha\) increases, the influence of high incomes upon the index increases. In some cases, this measure takes special forms, such as the mean log deviation and the aforementioned Theil-T index.
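A small numerical sketch can make the role of \(\alpha\) concrete. The function `ge_index` and the two income vectors below are illustrative assumptions written in base R for an unweighted sample; they are not part of the convey package.

```r
# unweighted generalized entropy index for strictly positive incomes
ge_index <- function(y, alpha) {
  stopifnot(all(y > 0))
  r <- y / mean(y)
  if (alpha == 0) {
    # mean log deviation
    mean(-log(r))
  } else if (alpha == 1) {
    # Theil-T index
    mean(r * log(r))
  } else {
    mean(r^alpha - 1) / (alpha^2 - alpha)
  }
}

# hypothetical incomes, then the same incomes with the top one multiplied by ten
base_incomes <- c(10, 20, 30, 40, 50)
top_shock <- c(10, 20, 30, 40, 500)

alphas <- c(-1, 0, 1, 2, 3)
before <- sapply(alphas, ge_index, y = base_incomes)
after <- sapply(alphas, ge_index, y = top_shock)

# ratio of the index after the shock to the index before, one entry per alpha
ratio <- setNames(after / before, paste0("GE(", alphas, ")"))
ratio
```

In this toy example, the relative jump caused by the inflated top income is larger for GE(3) than for GE(2), and larger for GE(2) than for the Theil-T index.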
Biewen and Jenkins (2003) [Biewen, Martin, and Stephen Jenkins. 2003. “Estimation of Generalized Entropy and Atkinson Inequality Indices from Complex Survey Data.” Discussion Papers of DIW Berlin 345. DIW Berlin, German Institute for Economic Research. http://EconPapers.repec.org/RePEc:diw:diwwpp:dp345] use the following finite-population formula as the basis for a plug-in estimator:
\[ GE^{(\alpha)} = \begin{cases} ( \alpha^2 - \alpha)^{-1} \big[ U_0^{\alpha - 1} U_1^{-\alpha} U_\alpha -1 \big], & \text{if } \alpha \in \mathbb{R} \setminus \{0,1\} \\ - T_0 U_0^{-1} + \log ( U_1 / U_0 ), &\text{if } \alpha \rightarrow 0 \\ T_1 U_1^{-1} - \log ( U_1 / U_0 ), & \text{if } \alpha \rightarrow 1 \end{cases} \]
where \(U_\gamma = \sum_{i \in U} y_i^\gamma\) and \(T_\gamma = \sum_{i \in U} y_i^\gamma \log y_i\). Since these are all functions of totals, the linearization of the indices is easily achieved using the theorems described in Deville (1999) [Deville, Jean-Claude. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–203. http://www.statcan.gc.ca/pub/12-001-x/1999002/article/4882-eng.pdf].
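To make the plug-in formula concrete, the sketch below computes the index from weighted totals in base R. The function name `ge_plugin`, the sampling weights `w`, and the toy data are assumptions for illustration. The population totals \(U_\gamma\) and \(T_\gamma\) are replaced with weighted sums over the sample, and the sketch returns point estimates only, without the linearized variance.

```r
# plug-in estimate of GE(alpha) from weighted totals
# y: strictly positive incomes; w: sampling weights (both assumed inputs)
ge_plugin <- function(y, w, alpha) {
  stopifnot(all(y > 0), length(y) == length(w))
  # estimated totals U_gamma and T_gamma
  U <- function(gamma) sum(w * y^gamma)
  T_total <- function(gamma) sum(w * y^gamma * log(y))
  if (alpha == 0) {
    # mean log deviation
    -T_total(0) / U(0) + log(U(1) / U(0))
  } else if (alpha == 1) {
    # Theil-T index
    T_total(1) / U(1) - log(U(1) / U(0))
  } else {
    (U(0)^(alpha - 1) * U(1)^(-alpha) * U(alpha) - 1) / (alpha^2 - alpha)
  }
}

# hypothetical sample of five incomes with unequal weights
y <- c(120, 250, 400, 800, 1500)
w <- c(30, 25, 20, 15, 10)

# one estimate per value of alpha
sapply(c(-1, 0, 1, 2, 3), function(a) ge_plugin(y, w, a))
```

Because every branch depends on the data only through a handful of totals, the same totals are the inputs a linearization-based variance estimator would work from.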
This class of inequality measures also has several desirable properties, such as additive decomposition. Additive decomposition allows researchers to compare the effects of inequality within and between population groups on the population’s level of inequality. Put simply, taking \(G\) groups, an additively decomposable index allows for:
\[ \begin{aligned} I ( \mathbf{y} ) &= I_{Within} + I_{Between} \\ \end{aligned} \]
where \(I_{Within} = \sum_{g \in G} W_g I( \mathbf{y}_g )\), with \(W_g\) being measure-specific group weights; and \(I_{Between}\) is a function of the group means and population sizes.
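As a sketch of this identity, the base R code below decomposes an unweighted index for two hypothetical groups. The group weights used here, \(W_g = p_g^{1-\alpha} s_g^{\alpha}\), with \(p_g\) the population share and \(s_g\) the income share of group \(g\), are the standard choice for generalized entropy indices and are stated as an assumption, since the text above does not spell them out. The between component is the index evaluated after replacing each income with its group mean. Function names and data are illustrative.

```r
# unweighted generalized entropy index for strictly positive incomes
ge_index <- function(y, alpha) {
  r <- y / mean(y)
  if (alpha == 0) {
    mean(-log(r))
  } else if (alpha == 1) {
    mean(r * log(r))
  } else {
    mean(r^alpha - 1) / (alpha^2 - alpha)
  }
}

# split the index into within-group and between-group components
ge_decompose <- function(y, group, alpha) {
  p <- tapply(y, group, length) / length(y)          # population shares
  s <- tapply(y, group, sum) / sum(y)                # income shares
  ge_g <- tapply(y, group, ge_index, alpha = alpha)  # index inside each group
  within <- sum(p^(1 - alpha) * s^alpha * ge_g)
  # between: every unit is assigned the mean income of its group
  between <- ge_index(ave(y, group), alpha)
  c(total = ge_index(y, alpha), within = within, between = between)
}

# hypothetical incomes for two groups of four units each
y <- c(10, 20, 30, 40, 60, 80, 100, 200)
group <- rep(c("a", "b"), each = 4)

# columns correspond to alpha = 0, 1, 2
sapply(c(0, 1, 2), function(a) ge_decompose(y, group, a))
```

For each value of \(\alpha\), the within and between rows add up to the total row.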
In July 2006, Jenkins (2008) [Jenkins, Stephen. 2008. “Estimation and Interpretation of Measures of Inequality, Poverty, and Social Welfare Using Stata.” North American Stata Users' Group Meetings 2006. Stata Users Group. http://EconPapers.repec.org/RePEc:boc:asug06:16] presented the Stata Generalized Entropy Index command at the North American Stata Users’ Group Meetings. The example below reproduces those statistics.
Load and prepare the same data set:
# load the convey package
library(convey)
# load the survey library
library(survey)
# load the foreign library
library(foreign)
# create a temporary file on the local disk
tf <- tempfile()
# store the location of the presentation file
presentation_zip <-
"https://web.archive.org/web/20150928053959/http://repec.org/nasug2006/nasug2006_jenkins.zip"
# download jenkins' presentation to the temporary file
download.file(presentation_zip , tf , mode = 'wb')
# unzip the contents of the archive
presentation_files <- unzip(tf , exdir = tempdir())
# load the institute for fiscal studies' 1981, 1985, and 1991 data.frame objects
x81 <-
read.dta(grep("ifs81" , presentation_files , value = TRUE))
x85 <-
read.dta(grep("ifs85" , presentation_files , value = TRUE))
x91 <-
read.dta(grep("ifs91" , presentation_files , value = TRUE))
# stack each of these three years of data into a single data.frame
x <- rbind(x81 , x85 , x91)
Replicate the author’s survey design statement from Stata code..
. * account for clustering within HHs
. version 8: svyset [pweight = wgt], psu(hrn)
pweight is wgt
psu is hrn
.. into R code:
# initiate a linearized survey design object
y <- svydesign( ~ hrn , data = x , weights = ~ wgt)
# immediately run the `convey_prep` function on the survey design
z <- convey_prep(y)
Replicate the author’s subset statement and each of his svygei results..
. svygei x if year == 1981
Warning: x has 20 values = 0. Not used in calculations
Complex survey estimates of Generalized Entropy inequality indices
pweight: wgt Number of obs = 9752
Strata: <one> Number of strata = 1
PSU: hrn Number of PSUs = 7459
Population size = 54766261
---------------------------------------------------------------------------
Index | Estimate Std. Err. z P>|z| [95% Conf. Interval]
---------+-----------------------------------------------------------------
GE(-1) | .1902062 .02474921 7.69 0.000 .1416987 .2387138
MLD | .1142851 .00275138 41.54 0.000 .1088925 .1196777
Theil | .1116923 .00226489 49.31 0.000 .1072532 .1161314
GE(2) | .128793 .00330774 38.94 0.000 .1223099 .135276
GE(3) | .1739994 .00662015 26.28 0.000 .1610242 .1869747
---------------------------------------------------------------------------
..using R code:
## gei SE
## eybhc0 0.19021 0.0247
## gei SE
## eybhc0 0.11429 0.0028
## gei SE
## eybhc0 0.11169 0.0023
## gei SE
## eybhc0 0.12879 0.0033
## gei SE
## eybhc0 0.174 0.0066
Confirm this replication applies for subsetted objects as well. Compare Stata output..
. svygei x if year == 1985 & x >= 1
Complex survey estimates of Generalized Entropy inequality indices
pweight: wgt Number of obs = 8969
Strata: <one> Number of strata = 1
PSU: hrn Number of PSUs = 6950
Population size = 55042871
---------------------------------------------------------------------------
Index | Estimate Std. Err. z P>|z| [95% Conf. Interval]
---------+-----------------------------------------------------------------
GE(-1) | .1602358 .00936931 17.10 0.000 .1418723 .1785993
MLD | .127616 .00332187 38.42 0.000 .1211052 .1341267
Theil | .1337177 .00406302 32.91 0.000 .1257543 .141681
GE(2) | .1676393 .00730057 22.96 0.000 .1533304 .1819481
GE(3) | .2609507 .01850689 14.10 0.000 .2246779 .2972235
---------------------------------------------------------------------------
..to R code:
## gei SE
## eybhc0 0.16024 0.0094
## gei SE
## eybhc0 0.12762 0.0033
## gei SE
## eybhc0 0.13372 0.0041
## gei SE
## eybhc0 0.16764 0.0073
## gei SE
## eybhc0 0.26095 0.0185
Replicate the author’s decomposition by population subgroup (work status) shown on PDF page 57..
# define work status (PDF page 22)
z <-
update(z , wkstatus = c(1 , 1 , 1 , 1 , 2 , 3 , 2 , 2)[as.numeric(esbu)])
z <-
update(z , wkstatus = factor(wkstatus , labels = c("1+ ft working" , "no ft working" , "elderly")))
# subset to 1991 and remove records with zero income
z91 <- subset(z , year == 1991 & eybhc0 > 0)
# population share
svymean( ~ wkstatus, z91)
## mean SE
## wkstatus1+ ft working 0.61724 0.0067
## wkstatusno ft working 0.20607 0.0059
## wkstatuselderly 0.17669 0.0046
## wkstatus eybhc0 se
## 1+ ft working 1+ ft working 278.8040 3.703790
## no ft working no ft working 151.6317 3.153968
## elderly elderly 176.6045 4.661740
## wkstatus eybhc0 se
## 1+ ft working 1+ ft working 0.2300708 0.02853959
## no ft working no ft working 10.9231761 10.65482557
## elderly elderly 0.1932164 0.02571991
## wkstatus eybhc0 se
## 1+ ft working 1+ ft working 0.1536921 0.006955506
## no ft working no ft working 0.1836835 0.014740510
## elderly elderly 0.1653658 0.016409770
## wkstatus eybhc0 se
## 1+ ft working 1+ ft working 0.1598558 0.008327994
## no ft working no ft working 0.1889909 0.016766120
## elderly elderly 0.2023862 0.027787224
## wkstatus eybhc0 se
## 1+ ft working 1+ ft working 0.2130664 0.01546521
## no ft working no ft working 0.2846345 0.06016394
## elderly elderly 0.3465088 0.07362898
## gei decomposition SE
## total 3.682893 3.3999
## within 3.646572 3.3998
## between 0.036321 0.0028
## gei decomposition SE
## total 0.195236 0.0065
## within 0.161935 0.0061
## between 0.033301 0.0025
## gei decomposition SE
## total 0.200390 0.0079
## within 0.169396 0.0076
## between 0.030994 0.0022
## gei decomposition SE
## total 0.274325 0.0167
## within 0.245067 0.0164
## between 0.029258 0.0021
For additional usage examples of svygei or svygeidec, type ?convey::svygei or ?convey::svygeidec in the R console.
This section displays example results using nationally-representative surveys from both the United States and Brazil. We present a variety of surveys, levels of analysis, and subpopulation breakouts to provide users with points of reference for the range of plausible values of the svygei function.
To understand the construction of each survey design object and respective variables of interest, please refer to section 1.4 for CPS-ASEC, section 1.5 for PNAD Contínua, and section 1.6 for SCF.
## gei SE
## htotval 0.4252 0.0052
## sex htotval se.htotval
## male male 0.3972009 0.007639779
## female female 0.4491281 0.006953494
## gei SE
## ftotval 0.37484 0.0055
## sex ftotval se.ftotval
## male male 0.3494651 0.008638860
## female female 0.3990923 0.006595221
## gei SE
## pearnval 0.34162 0.0062
## sex pearnval se.pearnval
## male male 0.3456834 0.007121393
## female female 0.3178379 0.011904099
svygei(
~ deflated_per_capita_income ,
subset(pnadc_design , deflated_per_capita_income > 0),
na.rm = TRUE
)
## gei SE
## deflated_per_capita_income 0.52363 0.0107
svyby(
~ deflated_per_capita_income ,
~ sex ,
subset(pnadc_design , deflated_per_capita_income > 0),
svygei ,
na.rm = TRUE
)
## sex deflated_per_capita_income se.deflated_per_capita_income
## male male 0.5304924 0.01124340
## female female 0.5163178 0.01081883
## gei SE
## deflated_labor_income 0.49544 0.0119
svyby(
~ deflated_labor_income ,
~ sex ,
subset(pnadc_design , deflated_labor_income > 0) ,
svygei ,
na.rm = TRUE
)
## sex deflated_labor_income se.deflated_labor_income
## male male 0.5106575 0.01436090
## female female 0.4510399 0.01024883
## Warning in subset.svyimputationList(scf_design, networth > 0): subset differed
## between imputations
## Multiple imputation results:
## with(subset(scf_design, networth > 0), svygei(~networth))
## scf_MIcombine(with(subset(scf_design, networth > 0), svygei(~networth)))
## results se
## networth 1.834597 0.05022745
## Warning in subset.svyimputationList(scf_design, networth > 0): subset differed
## between imputations
## Multiple imputation results:
## with(subset(scf_design, networth > 0), svyby(~networth, ~hhsex,
## svygei))
## scf_MIcombine(with(subset(scf_design, networth > 0), svyby(~networth,
## ~hhsex, svygei)))
## results se
## male 1.770600 0.05227645
## female 1.563828 0.20502736
## Warning in subset.svyimputationList(scf_design, income > 0): subset differed
## between imputations
## Multiple imputation results:
## with(subset(scf_design, income > 0), svygei(~income))
## scf_MIcombine(with(subset(scf_design, income > 0), svygei(~income)))
## results se
## income 0.9948022 0.07542657
## Warning in subset.svyimputationList(scf_design, income > 0): subset differed
## between imputations
## Multiple imputation results:
## with(subset(scf_design, income > 0), svyby(~income, ~hhsex, svygei))
## scf_MIcombine(with(subset(scf_design, income > 0), svyby(~income,
## ~hhsex, svygei)))
## results se
## male 0.9715715 0.07988096
## female 0.4653207 0.05566310