Taylor-based Estimation

Samplics’s TaylorEstimator class uses Taylor linearization to estimate the variance of survey estimates of population parameters.
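To make the idea of linearization concrete, here is a minimal pure-Python sketch of the textbook linearized variance of a weighted mean under a stratified design with PSUs sampled with replacement. It is not samplics's internal code: the function name taylor_mean_variance and the toy records are hypothetical, and finite population corrections and single-PSU strata are deliberately not handled.

```python
from collections import defaultdict
from math import sqrt


def taylor_mean_variance(records):
    """Linearized variance of a weighted mean (illustrative sketch only).

    records: list of (stratum, psu, weight, y) tuples.
    Assumes at least two PSUs per stratum and no finite population correction.
    """
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized score of each unit, w * (y - mean) / sum(w), summed per PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    # Between-PSU variability of the score totals, accumulated over strata.
    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        z_bar = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return mean, variance


# Hypothetical toy sample: 2 strata, 2 PSUs per stratum.
toy_records = [
    (1, 1, 1.0, 2.0),
    (1, 1, 1.0, 4.0),
    (1, 2, 1.0, 6.0),
    (2, 1, 2.0, 1.0),
    (2, 2, 2.0, 3.0),
]

mean, variance = taylor_mean_variance(toy_records)
print(mean, sqrt(variance))
```

The weighted mean is a ratio of two weighted totals, so its variance is approximated by the variance of the total of the linearized scores. Only the PSU-level totals within each stratum enter the formula, which is why the stratum and psu arguments matter in the estimates that follow.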

from samplics.datasets import load_nhanes2
from samplics.estimation import TaylorEstimator

from samplics.utils.types import PopParam

# Load the NHANES sample data
nhanes2_dict = load_nhanes2()
nhanes2 = nhanes2_dict["data"]

nhanes2.head(15)

	stratid	psuid	race	highbp	highlead	zinc	diabetes	finalwgt
0	1	1	1	0	NaN	104.0	0.0	8995
1	1	1	1	0	0.0	111.0	0.0	25964
2	1	1	3	0	NaN	102.0	0.0	8752
3	1	1	1	1	NaN	109.0	1.0	4310
4	1	1	1	0	0.0	99.0	0.0	9011
5	1	1	1	1	NaN	101.0	0.0	4310
6	1	1	1	0	0.0	93.0	0.0	3201
7	1	1	1	1	NaN	83.0	0.0	25386
8	1	1	1	0	NaN	98.0	0.0	12102
9	1	1	2	0	0.0	98.0	0.0	4312
10	1	1	1	1	NaN	92.0	0.0	4031
11	1	1	2	0	0.0	90.0	0.0	3628
12	1	1	1	0	NaN	101.0	0.0	28590
13	1	1	1	0	0.0	NaN	0.0	22754
14	1	1	2	0	1.0	123.0	0.0	7119

Using samplics, we can estimate the mean level of zinc in the blood as follows:

zinc_mean_str = TaylorEstimator(PopParam.mean)
zinc_mean_str.estimate(
    y=nhanes2["zinc"],
    samp_weight=nhanes2["finalwgt"],
    stratum=nhanes2["stratid"],
    psu=nhanes2["psuid"],
    remove_nan=True,
)

print(zinc_mean_str)

SAMPLICS - Estimation of Mean

Number of strata: 31
Number of psus: 62
Degree of freedom: 31

     MEAN       SE       LCI       UCI       CV
87.182067 0.494483 86.173563 88.190571 0.005672

The estimation results are stored in the zinc_mean_str object. Users can convert the main estimation results into a pd.DataFrame with the to_dataframe() method.

zinc_mean_str.to_dataframe()

	_param	_estimate	_stderror	_lci	_uci	_cv
0	PopParam.mean	87.182067	0.494483	86.173563	88.190571	0.005672

The to_dataframe() method is especially useful for domain estimation: it produces a table in which each row corresponds to one level of the domain of interest, as shown below.

zinc_mean_by_race = TaylorEstimator(PopParam.mean)
zinc_mean_by_race.estimate(
    y=nhanes2["zinc"],
    samp_weight=nhanes2["finalwgt"],
    stratum=nhanes2["stratid"],
    domain=nhanes2["race"],
    psu=nhanes2["psuid"],
    remove_nan=True,
)

zinc_mean_by_race.to_dataframe()

	_param	_domain	_estimate	_stderror	_lci	_uci	_cv
0	PopParam.mean	1	87.495389	0.479196	86.518062	88.472716	0.005477
1	PopParam.mean	2	85.085744	1.165209	82.709286	87.462203	0.013695
2	PopParam.mean	3	83.570910	1.585463	80.337338	86.804483	0.018971

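If we omit the stratum parameter, the whole sample is treated as a single stratum. The point estimate is unchanged, but the standard error and the confidence interval change: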

zinc_mean_nostr = TaylorEstimator(PopParam.mean)
zinc_mean_nostr.estimate(
    y=nhanes2["zinc"],
    samp_weight=nhanes2["finalwgt"],
    psu=nhanes2["psuid"],
    remove_nan=True,
)

print(zinc_mean_nostr)

SAMPLICS - Estimation of Mean

Number of strata: 1
Number of psus: 2
Degree of freedom: 1

     MEAN       SE       LCI       UCI       CV
87.182067 0.742622 77.746158 96.617976 0.008518
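The two printed summaries can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the outputs above and recovers the interval multiplier, the CV, and the design counts. It assumes that PSU labels restart at 1 and 2 inside every stratum, which is consistent with the printed counts but not stated in the text; the variable and function names are illustrative only.

```python
# Values copied from the stratified and unstratified outputs above.
stratified = {"mean": 87.182067, "se": 0.494483, "uci": 88.190571, "cv": 0.005672}
unstratified = {"mean": 87.182067, "se": 0.742622, "uci": 96.617976, "cv": 0.008518}


def implied_multiplier(res):
    """Half-width of the confidence interval expressed in standard errors."""
    return (res["uci"] - res["mean"]) / res["se"]


def implied_cv(res):
    """Coefficient of variation recomputed as SE / estimate."""
    return res["se"] / res["mean"]


mult_str = implied_multiplier(stratified)
mult_nostr = implied_multiplier(unstratified)
print(round(mult_str, 3), round(mult_nostr, 3))
print(round(implied_cv(stratified), 6), round(implied_cv(unstratified), 6))

# Assumed labelling: 31 strata, each reusing the PSU labels 1 and 2.
design_labels = [(h, p) for h in range(1, 32) for p in (1, 2)]
psus_with_strata = len(set(design_labels))               # distinct (stratum, psu) pairs
psus_without_strata = len({p for _, p in design_labels})  # distinct psu labels only

# Degrees of freedom as number of PSUs minus number of strata.
df_with_strata = psus_with_strata - 31
df_without_strata = psus_without_strata - 1
print(psus_with_strata, df_with_strata, psus_without_strata, df_without_strata)
```

The multipliers come out near 2.04 and 12.71, close to the 97.5% Student t quantiles for 31 and 1 degrees of freedom. This suggests that the much wider interval in the unstratified run is driven mainly by the loss of degrees of freedom: once the stratum identifier is dropped, PSUs that share a label in different strata are merged, leaving only two PSUs.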