✔️ used widely because encompasses interpretable measures
✔️ allows an arbitrary poverty threshold
✔️ can incorporate intensity and inequality among the poor
✔️ when `g >= 2`, measure can be decomposed in interpretable measures of extension, intensity and inequality in poverty
❌ scales are dependent on `g`
❌ becomes increasingly difficult to interpret as `g` parameter grows
❌ no component that allows for time to exit poverty, unlike the watts index
Foster, Greer, and Thorbecke (1984Foster, James, Joel Greer, and Erik Thorbecke. 1984. “A Class of Decomposable Poverty Measures.” Econometrica 52 (3): 761–66. http://www.jstor.org/stable/1913475.) proposed a family of indicators to measure poverty. This class of \(FGT\) measures, can be defined as
\[ p=\frac{1}{N}\sum_{k\in U}h(y_{k},\theta ), \]
where
\[ h(y_{k},\theta )=\left[ \frac{(\theta -y_{k})}{\theta }\right] ^{\gamma }\delta \left\{ y_{k}\leq \theta \right\} , \]
where: \(\theta\) is the poverty threshold; \(\delta\) the indicator function that assigns value \(1\) if the condition \(\{y_{k}\leq \theta \}\) is satisfied and \(0\) otherwise, and \(\gamma\) is a non-negative constant.
If \(\gamma =0\), the FGT(0) equals the poverty headcount ratio, which accounts for the spread of poverty. If \(\gamma =1\), FGT(1) is the mean of the normalized income shortfall of the poor. By doing so, the measure takes into account both the spread and the intensity of poverty. When \(\gamma =2\), the relative weight of larger shortfalls increases even more, which yields a measure that accounts for poverty severity, i.e., the inequality among the poor. This way, a transfer from a poor person to an even poorer person would reduce the FGT(2).
Although Foster, Greer, and Thorbecke (1984Foster, James, Joel Greer, and Erik Thorbecke. 1984. “A Class of Decomposable Poverty Measures.” Econometrica 52 (3): 761–66. http://www.jstor.org/stable/1913475.) already presented a decomposition for the FGT(2) index, Aristondo, De La Vega, and Urrutia (2010Aristondo, Oihana, Casilda Lasso De La Vega, and Ana Urrutia. 2010. “A New Multiplicative Decomposition for the Foster–Greer–Thorbecke Poverty Indices.” Bulletin of Economic Research 62 (3): 259–67. https://doi.org/10.1111/j.1467-8586.2009.00320.x.) provided a general formula that decomposes the FGT(\(\gamma\)) for any \(\gamma \geqslant 2\). Put simply, any such FGT(\(\gamma\)) index can be seen as function of the headcount ratio, the average normalized income gap among the poor, and a generalized entropy index of the normalized income gaps among poor. In mathematical terms,
\[ FGT_\gamma = FGT_0 \cdot I^\gamma \cdot \big[ 1 + \big( \gamma^2 -\gamma \big) GEI_\gamma^* \big] , \text{ } \gamma \geq 2 \]
where \(I\) is the average normalized income gap among the poor and \(GEI_\gamma^*\) is a generalized entropy index of such income gaps among the poor.
This result is particularly useful, as one can attribute cross-sectional differences of a FGT index to differences in the spread, depth, and inequality of poverty.
The FGT poverty class and its decomposition is implemented in the convey
library by the function svyfgt
and svyfgtdec
, respectively.
The argument thresh_type
of this function defines the type of poverty threshold adopted.
There are three possible choices:
abs
– fixed and given by the argument thresh_value.relq
– a fraction of a quantile fixed by the argument percent
and the quantile is defined by the argument quantiles
.relm
– a fraction of the mean fixed by the argument percent
.The quantile and the mean involved in the definition of the threshold are estimated for the whole population. When \(\gamma=0\) and \(\theta= .6*MED\), this measure is equal to the indicator ARPR computed by the function svyarpr
. The linearization of the FGT(0) is presented in Berger and Skinner (2003Berger, Yves G., and Chris J. Skinner. 2003. “Variance Estimation for a Low Income Proportion.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 52 (4): 457–68. https://doi.org/10.1111/1467-9876.00417.).
Next, we give some examples of the function svyfgt
to estimate the values of the FGT poverty index.
Consider first the poverty threshold fixed (\(\gamma=0\)) in the value \(10000\). The headcount ratio (FGT0) is
fgt0 SE
eqincome 0.11444 0.0027
The poverty gap ratio (FGT(1)) (\(\gamma=1\)) index for the poverty threshold fixed at the same value is
fgt1 SE
eqincome 0.032085 0.0011
To estimate the FGT(0) with the poverty threshold fixed at \(0.6* MED\), we first fix the argument type_thresh="relq"
and then use the default values for percent
and order
:
fgt0 SE
eqincome 0.14444 0.0028
This matches the estimate obtained by:
arpr SE
eqincome 0.14444 0.0028
To estimate the poverty gap ratio with the poverty threshold equal to \(0.6*MEAN\), we use:
fgt1 SE
eqincome 0.051187 0.0011
In July 2006, Jenkins (2008Jenkins, Stephen. 2008. “Estimation and Interpretation of Measures of Inequality, Poverty, and Social Welfare Using Stata.” North American Stata Users' Group Meetings 2006. Stata Users Group. http://EconPapers.repec.org/RePEc:boc:asug06:16.) presented at the North American Stata Users’ Group Meetings on the stata Atkinson Index command. The example below reproduces those statistics.
In order to match the presentation’s results using the svyfgt
function from the convey
library, the poverty threshold was considered absolute despite being directly estimated from the survey sample. This effectively treats the variance of the estimated poverty threshold as zero; svyfgt
does not account for the uncertainty of the poverty threshold when the level has been stated as absolute with the abs_thresh=
parameter. In general, we would instead recommend using either relq
or relm
in the type_thresh=
parameter in order to account for the added uncertainty of the poverty threshold calculation. This example serves only to show that svyfgt
behaves properly as compared to other software.
Load and prepare the same data set:
# load the convey package
library(convey)
# load the survey library
library(survey)
# load the foreign library
library(foreign)
# create a temporary file on the local disk
tf <- tempfile()
# store the location of the presentation file
presentation_zip <-
"https://web.archive.org/web/20150928053959/http://repec.org/nasug2006/nasug2006_jenkins.zip"
# download jenkins' presentation to the temporary file
download.file(presentation_zip , tf , mode = 'wb')
# unzip the contents of the archive
presentation_files <- unzip(tf , exdir = tempdir())
# load the institute for fiscal studies' 1981, 1985, and 1991 data.frame objects
x81 <-
read.dta(grep("ifs81" , presentation_files , value = TRUE))
x85 <-
read.dta(grep("ifs85" , presentation_files , value = TRUE))
x91 <-
read.dta(grep("ifs91" , presentation_files , value = TRUE))
# NOTE: we recommend using ?convey::svyarpt rather than this unweighted calculation #
# calculate 60% of the unweighted median income in 1981
unwtd_arpt81 <- quantile(x81$eybhc0 , 0.5) * .6
# calculate 60% of the unweighted median income in 1985
unwtd_arpt85 <- quantile(x85$eybhc0 , 0.5) * .6
# calculate 60% of the unweighted median income in 1991
unwtd_arpt91 <- quantile(x91$eybhc0 , 0.5) * .6
# stack each of these three years of data into a single data.frame
x <- rbind(x81 , x85 , x91)
Replicate the author’s survey design statement from stata code..
. ge poor = (year==1981)*(x < $z_81)+(year==1985)*(x < $z_85)+(year==1991)*(x < $z_91)
. * account for clustering within HHs
. svyset hrn [pweight = wgt]
.. into R
code:
# initiate a linearized survey design object
y <- svydesign( ~ hrn , data = x , weights = ~ wgt)
# immediately run the `convey_prep` function on the survey design
z <- convey_prep(y)
Replicate the author’s headcount ratio results with stata..
. svy: mean poor if year == 1981
(running mean on estimation sample)
Survey: Mean estimation
Number of strata = 1 Number of obs = 9772
Number of PSUs = 7476 Population size = 5.5e+07
Design df = 7475
--------------------------------------------------------------
| Linearized
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
poor | .1410125 .0044859 .132219 .149806
--------------------------------------------------------------
. svy: mean poor if year == 1985
(running mean on estimation sample)
Survey: Mean estimation
Number of strata = 1 Number of obs = 8991
Number of PSUs = 6972 Population size = 5.5e+07
Design df = 6971
--------------------------------------------------------------
| Linearized
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
poor | .137645 .0046531 .1285235 .1467665
--------------------------------------------------------------
. svy: mean poor if year == 1991
(running mean on estimation sample)
Survey: Mean estimation
Number of strata = 1 Number of obs = 6468
Number of PSUs = 5254 Population size = 5.6e+07
Design df = 5253
--------------------------------------------------------------
| Linearized
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
poor | .2021312 .0062077 .1899615 .2143009
--------------------------------------------------------------
..using R
code:
headcount_81 <-
svyfgt( ~ eybhc0 ,
subset(z , year == 1981) ,
g = 0 ,
abs_thresh = unwtd_arpt81)
stopifnot(round(coef(headcount_81) , 5) == .14101)
stopifnot(round(SE(headcount_81) , 5) == .00449)
headcount_81_ci <-
confint(headcount_81 , df = degf(subset(z , year == 1981)))
stopifnot(round(headcount_81_ci[1] , 5) == .13222)
stopifnot(round(headcount_81_ci[2] , 5) == .14981)
headcount_85 <-
svyfgt(~ eybhc0 ,
subset(z , year == 1985) ,
g = 0 ,
abs_thresh = unwtd_arpt85)
stopifnot(round(coef(headcount_85) , 5) == .13764)
stopifnot(round(SE(headcount_85) , 5) == .00465)
headcount_85_ci <-
confint(headcount_85 , df = degf(subset(z , year == 1985)))
stopifnot(round(headcount_85_ci[1] , 5) == .12852)
stopifnot(round(headcount_85_ci[2] , 5) == .14677)
headcount_91 <-
svyfgt(~ eybhc0 ,
subset(z , year == 1991) ,
g = 0 ,
abs_thresh = unwtd_arpt91)
stopifnot(round(coef(headcount_91) , 5) == .20213)
stopifnot(round(SE(headcount_91) , 5) == .00621)
headcount_91_ci <-
confint(headcount_91 , df = degf(subset(z , year == 1991)))
stopifnot(round(headcount_91_ci[1] , 5) == .18996)
stopifnot(round(headcount_91_ci[2] , 5) == .21430)
Confirm this replication applies for the normalized poverty gap as well, comparing stata code..
. ge ngap = poor*($z_81- x)/$z_81 if year == 1981
. svy: mean ngap if year == 1981
(running mean on estimation sample)
Survey: Mean estimation
Number of strata = 1 Number of obs = 9772
Number of PSUs = 7476 Population size = 5.5e+07
Design df = 7475
--------------------------------------------------------------
| Linearized
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
ngap | .0271577 .0013502 .0245109 .0298044
--------------------------------------------------------------
..to R
code:
norm_pov_81 <-
svyfgt( ~ eybhc0 ,
subset(z , year == 1981) ,
g = 1 ,
abs_thresh = unwtd_arpt81)
stopifnot(round(coef(norm_pov_81) , 5) == .02716)
stopifnot(round(SE(norm_pov_81) , 5) == .00135)
norm_pov_81_ci <-
confint(norm_pov_81 , df = degf(subset(z , year == 1981)))
stopifnot(round(norm_pov_81_ci[1] , 5) == .02451)
stopifnot(round(norm_pov_81_ci[2] , 5) == .02980)
To provide a example for our code, we proceed with a Monte Carlo experiment.
Using the eusilcP
data from the simPop
package (Templ et al. 2017Templ, Matthias, Bernhard Meindl, Alexander Kowarik, and Olivier Dupriez. 2017. “Simulation of Synthetic Complex Data: The R Package simPop.” Journal of Statistical Software 79 (10): 1–38. https://doi.org/10.18637/jss.v079.i10.), we can compute the actual values of the FGT components for that population:
# load libraries
library(sampling)
library(survey)
library(convey)
library(parallel)
# load pseudo population data
data("eusilcP" , package = "simPop")
# compute population value of the FGT(2) index decomposition
inc.all <- eusilcP$eqIncome
gfun <- function(y , thresh)
(thresh - y) / thresh
hfun <-
function(y , thresh , g)
(((thresh - y) / thresh) ^ g) * (y <= thresh)
fgt2 <- mean(hfun(inc.all , 10000 , 2) , na.rm = TRUE)
fgt1 <- mean(hfun(inc.all , 10000 , 1) , na.rm = TRUE)
fgt0 <- mean(hfun(inc.all , 10000 , 0) , na.rm = TRUE)
igr.fgt <-
mean(hfun(inc.all , 10000 , 1)[inc.all <= 10000] , na.rm = TRUE)
gei.poor <-
convey:::CalcGEI(gfun(inc.all , 10000) , ifelse(inc.all < 10000 , 1 , 0) , 2)
theta.pop <-
c(
"fgt2" = fgt2 ,
"fgt0" = fgt0 ,
"fgt1" = fgt1 ,
"igr" = igr.fgt ,
"gei(poor;epsilon=2)" = gei.poor
)
theta.pop
## fgt2 fgt0 fgt1 igr
## 0.01571192 0.11412691 0.03259544 0.28560699
## gei(poor;epsilon=2)
## 0.34386588
Now, to study the distribution of the estimator under a particular sampling design, we select 5000 samples under one-stage cluster sampling of 100 households using the cluster
function from the sampling
package (Tillé and Matei 2021Tillé, Yves, and Alina Matei. 2021. Sampling: Survey Sampling. https://CRAN.R-project.org/package=sampling.), and use the svyfgtdec
function to estimate the FGT components for each of those samples:
# define the number of monte carlo replicates
mc.rep <- 5000L
# simulation function
fgtdec_sim_fun <- function(this.iter) {
set.seed(this.iter)
library(sampling)
library(survey)
library(convey)
# load pseudo population data
data("eusilcP" , package = "simPop")
# compute size-like variable for PPS sampling design
eusilcP$aux <-
log(ifelse(eusilcP$eqIncome >= 1000 , eusilcP$eqIncome , 1000))
# select sample
tt <-
sampling::cluster(
data = eusilcP[sample.int(nrow(eusilcP) , nrow(eusilcP) , replace = FALSE) , ] ,
clustername = "hid" ,
size = 1000L ,
method = "systematic" ,
pik = eusilcP$aux
)
# collect data
this.sample <- getdata(eusilcP , tt)
# create survey design object
this.desobj <-
svydesign(
ids = ~ hid ,
probs = ~ Prob ,
data = this.sample ,
nest = FALSE
)
# prepare for convey functions
this.desobj <- convey_prep(this.desobj)
# compute estimates
svyfgtdec( ~ eqIncome , this.desobj , abs_thresh = 10000 , g = 2)
}
# run replications
cl <- makeCluster(detectCores() - 1)
fgtdec.estimate.list <-
clusterApply(cl, seq_len(mc.rep) , fgtdec_sim_fun)
stopCluster(cl)
The PRB of each component is estimated using the code below. Notice that PRBs are relatively small, with absolute values below 1%, with the largest bias in the GEI index component.
## fgt2 fgt0 fgt1 igr
## 0.01571192 0.11412691 0.03259544 0.28560699
## gei(poor;epsilon=2)
## 0.34386588
# estimate the expected values of the components estimators
# using the average of the estimates
(theta.exp <- rowMeans(sapply(fgtdec.estimate.list , coef)))
## fgt2 fgt0 fgt1 igr
## 0.01577844 0.11432776 0.03269125 0.28591312
## gei(poor;epsilon=2)
## 0.34234370
# estimate the percentage relative bias
(percentage_relative_bias <- 100 * (theta.exp / theta.pop - 1))
## fgt2 fgt0 fgt1 igr
## 0.4233956 0.1759893 0.2939132 0.1071867
## gei(poor;epsilon=2)
## -0.4426679
For the variance estimators, we estimate the PRB using the code below. Note that the bias is still relatively small, with absolute values of the PRB below 5%.
# estimate the variance of the components estimators
# using the empirical variance of the estimates
(vartheta.popest <-
diag(var(t(
sapply(fgtdec.estimate.list , coef)
))))
## fgt2 fgt0 fgt1 igr
## 6.447720e-06 1.015546e-04 1.468049e-05 4.753320e-04
## gei(poor;epsilon=2)
## 1.841360e-03
# estimate the expected value of the Watts index variance estimator
# using the (estimated) expected value of the variance estimates
(vartheta.exp <-
rowMeans(sapply(fgtdec.estimate.list , function(z)
diag(vcov(
z
)))))
## fgt2 fgt0 fgt1 igr
## 6.462904e-06 1.014859e-04 1.473047e-05 4.923412e-04
## gei(poor;epsilon=2)
## 1.911158e-03
# estimate the percentage relative bias of the variance estimators
(percentage_relative_bias_variance <- 100 * (vartheta.exp / vartheta.popest - 1))
## fgt2 fgt0 fgt1 igr
## 0.23548684 -0.06759586 0.34047392 3.57838870
## gei(poor;epsilon=2)
## 3.79052600
Regarding the MSE of the decomposition estimator, the squared bias accounts for less than 0.1% of the MSE. This means that, with a good estimate of the variance, we should be able to have a good approximation for the MSE.
# estimate MSE
theta.bias2 <- (theta.exp - theta.pop) ^ 2
(theta.mse <- theta.bias2 + vartheta.popest)
## fgt2 fgt0 fgt1 igr
## 6.452146e-06 1.015949e-04 1.468967e-05 4.754257e-04
## gei(poor;epsilon=2)
## 1.843678e-03
## fgt2 fgt0 fgt1 igr
## 0.06858780 0.03970787 0.06247986 0.01971227
## gei(poor;epsilon=2)
## 0.12567511
The CIs based on the normal approximation work reasonably well for all components. The code below shows that the coverage rates are close to the 95% nominal coverage rate.
# estimate confidence intervals of the Watts index
# for each of the samples
est.coverage <-
sapply(fgtdec.estimate.list, function(this.stat)
confint(this.stat)[, 1] <= theta.pop &
confint(this.stat)[, 2] >= theta.pop)
# evaluate empirical coverage
stopifnot( abs( rowMeans(est.coverage) - 0.95 ) < 0.015 )
For additional usage examples of svyfgt
and svyfgtdec
, type ?convey::svyfgt
or ?convey::svyfgtdec
in the R
console.
This section displays example results using nationally-representative surveys from both the United States and Brazil. We present a variety of surveys, levels of analysis, and subpopulation breakouts to provide users with points of reference for the range of plausible values of the svyfgt
function.
To understand the construction of each survey design object and respective variables of interest, please refer to section 1.4 for CPS-ASEC, section 1.5 for PNAD Contínua, and section 1.6 for SCF.
# poverty gap ratio, with poverty threshold fixed at 0.6 x median
svyfgt( ~ htotval ,
cps_household_design ,
g = 1 ,
type_thresh = "relq")
## fgt1 SE
## htotval 0.14559 0.0014
## sex htotval se.htotval
## male male 0.1194914 0.001658544
## female female 0.1719668 0.001996960
# poverty gap ratio, with poverty threshold fixed at 0.6 x median
svyfgt( ~ ftotval ,
cps_family_design ,
g = 1 ,
type_thresh = "relq")
## fgt1 SE
## ftotval 0.10372 0.0014
## sex ftotval se.ftotval
## male male 0.078008 0.001594309
## female female 0.130937 0.002294015
# poverty gap ratio, with poverty threshold fixed at 0.6 x median
svyfgt(~ pearnval , cps_ftfy_worker_design , g = 1 , type_thresh = "relq")
## fgt1 SE
## pearnval 0.063625 0.0016
## sex pearnval se.pearnval
## male male 0.05386479 0.001552206
## female female 0.07630316 0.002094740
# poverty gap ratio, with poverty threshold fixed at 0.6 x median
svyfgt( ~ deflated_per_capita_income , pnadc_design , na.rm = TRUE , g = 1 , type_thresh = "relq")
## fgt1 SE
## deflated_per_capita_income 0.11801 0.001
svyby(~ deflated_per_capita_income ,
~ sex ,
pnadc_design ,
svyfgt ,
na.rm = TRUE , g = 1 , type_thresh = "relq")
## sex deflated_per_capita_income se.deflated_per_capita_income
## male male 0.1147690 0.001002209
## female female 0.1211002 0.001160411
# poverty gap ratio, with poverty threshold fixed at 0.6 x median
svyfgt(
~ deflated_labor_income ,
pnadc_design ,
na.rm = TRUE ,
g = 1 ,
type_thresh = "relq"
)
## fgt1 SE
## deflated_labor_income 0.073191 7e-04
svyby(
~ deflated_labor_income ,
~ sex ,
pnadc_design ,
svyfgt ,
na.rm = TRUE ,
g = 1 ,
type_thresh = "relq"
)
## sex deflated_labor_income se.deflated_labor_income
## male male 0.05871789 0.0008038429
## female female 0.09286495 0.0011567031
# poverty gap ratio, with poverty threshold fixed at 0.6 x median
scf_MIcombine(with(scf_design , svyfgt(
~ networth , g = 1 , type_thresh = "relq"
)))
## Multiple imputation results:
## with(scf_design, svyfgt(~networth, g = 1, type_thresh = "relq"))
## scf_MIcombine(with(scf_design, svyfgt(~networth, g = 1, type_thresh = "relq")))
## results se
## networth 0.3272587 0.005178114
scf_MIcombine(with(
scf_design ,
svyby(
~ networth,
~ hhsex ,
svyfgt ,
g = 1 ,
type_thresh = "relq"
)
))
## Multiple imputation results:
## with(scf_design, svyby(~networth, ~hhsex, svyfgt, g = 1, type_thresh = "relq"))
## scf_MIcombine(with(scf_design, svyby(~networth, ~hhsex, svyfgt,
## g = 1, type_thresh = "relq")))
## results se
## male 0.2608188 0.006588845
## female 0.4992153 0.013970985
# poverty gap ratio, with poverty threshold fixed at 0.6 x median
scf_MIcombine(with(scf_design , svyfgt( ~ income , g = 1 , type_thresh = "relq")))
## Multiple imputation results:
## with(scf_design, svyfgt(~income, g = 1, type_thresh = "relq"))
## scf_MIcombine(with(scf_design, svyfgt(~income, g = 1, type_thresh = "relq")))
## results se
## income 0.1220061 0.003308005
## Multiple imputation results:
## with(scf_design, svyby(~income, ~hhsex, svyfgt, g = 1, type_thresh = "relq"))
## scf_MIcombine(with(scf_design, svyby(~income, ~hhsex, svyfgt,
## g = 1, type_thresh = "relq")))
## results se
## male 0.08097266 0.002983921
## female 0.22820761 0.008468860