Taylor-based Estimation

The samplics class TaylorEstimator uses Taylor linearization to estimate the variance of estimated population parameters.
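To make the idea of linearization concrete, here is a minimal textbook-style sketch in plain Python of the linearized standard error of a weighted mean under a stratified design with PSUs treated as sampled with replacement. It is not the samplics implementation: the function name taylor_mean_se and the toy data are hypothetical, and every stratum is assumed to contain at least two PSUs.

```python
from collections import defaultdict
from math import sqrt


def taylor_mean_se(y, w, stratum, psu):
    """Weighted mean and its linearized standard error (illustrative sketch)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized score of each unit, summed within each (stratum, PSU) pair.
    psu_totals = defaultdict(float)
    for yi, wi, h, j in zip(y, w, stratum, psu):
        psu_totals[(h, j)] += wi * (yi - mean) / total_w

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for z_list in by_stratum.values():
        n_h = len(z_list)  # assumed >= 2, otherwise the division fails
        z_bar = sum(z_list) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in z_list)

    return mean, sqrt(variance)


# Hypothetical toy data: 2 strata, 2 PSUs per stratum, equal weights.
y = [1, 2, 3, 4, 5, 6, 7, 8]
w = [1.0] * 8
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

toy_mean, toy_se = taylor_mean_se(y, w, stratum, psu)
print(toy_mean, toy_se)
```

The key step is replacing the nonlinear ratio estimator by a sum of per-unit scores, so that its variance reduces to the variance of PSU totals within strata.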

from samplics.datasets import load_nhanes2
from samplics.estimation import TaylorEstimator
# Load the NHANES sample data
nhanes2_dict = load_nhanes2()
nhanes2 = nhanes2_dict["data"]

nhanes2.head(15)
    stratid  psuid  race  highbp  highlead   zinc  diabetes  finalwgt
0         1      1     1       0       NaN  104.0       0.0      8995
1         1      1     1       0       0.0  111.0       0.0     25964
2         1      1     3       0       NaN  102.0       0.0      8752
3         1      1     1       1       NaN  109.0       1.0      4310
4         1      1     1       0       0.0   99.0       0.0      9011
5         1      1     1       1       NaN  101.0       0.0      4310
6         1      1     1       0       0.0   93.0       0.0      3201
7         1      1     1       1       NaN   83.0       0.0     25386
8         1      1     1       0       NaN   98.0       0.0     12102
9         1      1     2       0       0.0   98.0       0.0      4312
10        1      1     1       1       NaN   92.0       0.0      4031
11        1      1     2       0       0.0   90.0       0.0      3628
12        1      1     1       0       NaN  101.0       0.0     28590
13        1      1     1       0       0.0    NaN       0.0     22754
14        1      1     2       0       1.0  123.0       0.0      7119

Using samplics, we can estimate the average level of zinc in the blood with the following code:

zinc_mean_str = TaylorEstimator("mean")
zinc_mean_str.estimate(
    y=nhanes2["zinc"],
    samp_weight=nhanes2["finalwgt"],
    stratum=nhanes2["stratid"],
    psu=nhanes2["psuid"],
    remove_nan=True,
)

print(zinc_mean_str)
SAMPLICS - Estimation of Mean

Number of strata: 31
Number of psus: 62
Degree of freedom: 31

     MEAN       SE       LCI       UCI       CV
87.182067 0.494483 86.173563 88.190571 0.005672
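Some of the summary values are related in simple ways that can be checked by hand. The sketch below copies the printed values into plain Python variables (the variable names are illustrative only) and recomputes the coefficient of variation as SE / MEAN and the degrees of freedom as the number of PSUs minus the number of strata, which is the usual convention for this kind of design and is consistent with the output above.

```python
# Values copied from the printed summary above.
mean, se = 87.182067, 0.494483
n_strata, n_psus = 31, 62

# Coefficient of variation: standard error relative to the estimate.
cv = se / mean

# Design degrees of freedom under the usual PSUs-minus-strata convention.
df = n_psus - n_strata

print(round(cv, 6), df)
```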

The results of the estimation are stored in the object zinc_mean_str. Users can convert the main estimation results into a pd.DataFrame with the method to_dataframe().

zinc_mean_str.to_dataframe()
   _param  _estimate  _stderror       _lci       _uci       _cv
0    mean  87.182067   0.494483  86.173563  88.190571  0.005672

The method to_dataframe() is most useful for domain estimation, where it produces a table in which each row is a level of the domain of interest, as shown below.

zinc_mean_by_race = TaylorEstimator("mean")
zinc_mean_by_race.estimate(
    y=nhanes2["zinc"],
    samp_weight=nhanes2["finalwgt"],
    stratum=nhanes2["stratid"],
    domain=nhanes2["race"],
    psu=nhanes2["psuid"],
    remove_nan=True,
)

zinc_mean_by_race.to_dataframe()
   _param  _domain  _estimate  _stderror       _lci       _uci       _cv
0    mean        1  87.495389   0.479196  86.518062  88.472716  0.005477
1    mean        2  85.085744   1.165209  82.709286  87.462203  0.013695
2    mean        3  83.570910   1.585463  80.337338  86.804483  0.018971
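Conceptually, the point estimate for each domain level is the weighted mean of y over the sample units that belong to that level. The sketch below illustrates this with plain Python on made-up data. The helper domain_weighted_means is hypothetical and covers only the point estimates; the standard errors depend on the full stratum and PSU structure and are not reproduced here.

```python
from collections import defaultdict


def domain_weighted_means(y, w, domain):
    """Weighted mean of y within each domain level (point estimates only)."""
    weighted_sum = defaultdict(float)
    weight_sum = defaultdict(float)
    for yi, wi, d in zip(y, w, domain):
        weighted_sum[d] += wi * yi
        weight_sum[d] += wi
    return {d: weighted_sum[d] / weight_sum[d] for d in sorted(weighted_sum)}


# Hypothetical toy data with two domain levels.
y = [90.0, 80.0, 100.0, 70.0]
w = [1.0, 3.0, 2.0, 2.0]
domain = [1, 1, 2, 2]

domain_means = domain_weighted_means(y, w, domain)
print(domain_means)
```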

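If we remove the stratum parameter, the sample is treated as a single stratum. Because the psuid values are then no longer distinguished by stratum, only two PSUs remain, and we get the following: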

zinc_mean_nostr = TaylorEstimator("mean")
zinc_mean_nostr.estimate(
    y=nhanes2["zinc"],
    samp_weight=nhanes2["finalwgt"],
    psu=nhanes2["psuid"],
    remove_nan=True,
)

print(zinc_mean_nostr)
SAMPLICS - Estimation of Mean

Number of strata: 1
Number of psus: 2
Degree of freedom: 1

     MEAN       SE       LCI       UCI       CV
87.182067 0.742622 77.746158 96.617976 0.008518
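Comparing the two summaries shows why the design information matters: the point estimate is unchanged, but the standard error and especially the confidence interval differ. The sketch below uses only values copied from the two printed outputs (the variable names are illustrative) to compute the ratio of standard errors and the multiplier implied by each interval, that is, the half-width divided by the standard error. With a single degree of freedom the implied multiplier is about 12.7, close to the 97.5th percentile of a t distribution with 1 degree of freedom, which explains the much wider interval.

```python
# Values copied from the two printed summaries.
designs = {
    "stratified": {"se": 0.494483, "lci": 86.173563, "uci": 88.190571},
    "unstratified": {"se": 0.742622, "lci": 77.746158, "uci": 96.617976},
}

# Multiplier implied by each interval: half-width divided by the SE.
multipliers = {
    name: (d["uci"] - d["lci"]) / (2 * d["se"]) for name, d in designs.items()
}

# How much larger the standard error becomes without stratification.
se_ratio = designs["unstratified"]["se"] / designs["stratified"]["se"]

for name, m in multipliers.items():
    print(name, round(m, 3))
print("SE ratio:", round(se_ratio, 3))
```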