Sample Size for Stage Design

In the cells below, we illustrate a simple sample size calculation for household surveys that use stage sampling designs. Suppose we want to calculate the sample size for a vaccination survey in Senegal, stratified by administrative region. We will use the 2017 Senegal Demographic and Health Survey (DHS) to get an idea of the vaccination coverage rates for some of the main vaccine doses: the hepatitis B birth dose (HepB0), the first and third doses of diphtheria, tetanus and pertussis vaccine (DTP1 and DTP3), the first dose of measles-containing vaccine (MCV1), and basic vaccination. Basic vaccination refers to children aged 12-23 months who received the BCG vaccine, three doses of DTP-containing vaccine, three doses of polio vaccine, and the first dose of measles-containing vaccine. The table below shows the 2017 Senegal DHS vaccination coverage (in percent) for children aged 12 to 23 months.

Region HepB0 DTP1 DTP3 MCV1 Basic vaccination
Dakar 53.6 99.1 98.5 97.0 84.9
Ziguinchor 47.1 98.6 94.1 93.6 80.9
Diourbel 62.8 94.6 88.2 86.1 68.2
Saint-Louis 40.1 99.1 97.2 94.7 80.6
Tambacounda 45.0 83.3 72.7 65.3 47.0
Kaolack 63.9 99.6 92.2 89.3 79.7
Thies 62.3 100.0 98.8 91.6 83.4
Louga 49.8 96.2 87.8 81.5 67.8
Fatick 62.7 98.5 93.8 90.3 76.6
Kolda 32.8 94.4 87.3 85.6 63.7
Matam 43.1 94.3 88.1 79.4 68.7
Kaffrine 56.9 98.0 93.6 88.7 76.6
Kedougou 44.4 70.7 60.2 46.5 33.6
Sedhiou 46.6 96.8 90.4 89.9 74.2

The 2017 Senegal DHS data collection happened from April to December 2018. Therefore, the data shown in the table represent children born from October 2016 to December 2017. For the purpose of this tutorial, we will assume that these vaccine coverage rates still hold. Furthermore, we will use the basic vaccination coverage rates to calculate sample size.

from samplics.sampling import SampleSize

Number of children

Wald method

The first step is to create an object using the SampleSize class with the parameter of interest, the sample size calculation method, and the stratification status. In this example, we want to calculate sample sizes for proportions, using the Wald method for a stratified design. This is achieved with the following snippet of code.

SampleSize(
    param="proportion", method="wald", strat=True
)

Because we are using a stratified sample design, it is best to specify the expected coverage levels by stratum. If that information is not available, aggregated values can be used across the strata. The 2017 Senegal DHS published coverage rates by region, so the information is available by stratum. To provide it to Samplics, we use a Python dictionary as follows.

expected_coverage = {
    "Dakar": 0.849,
    "Ziguinchor": 0.809,
    "Diourbel": 0.682,
    "Saint-Louis": 0.806,
    "Tambacounda": 0.470,
    "Kaolack": 0.797,
    "Thies": 0.834,
    "Louga": 0.678,
    "Fatick": 0.766,
    "Kolda": 0.637,
    "Matam": 0.687,
    "Kaffrine": 0.766,
    "Kedougou": 0.336,
    "Sedhiou": 0.742,
}

Now, we want to calculate the sample size with a desired precision of 0.07, which means that we want the confidence intervals around the expected vaccination coverage rates to have a half-width of 7 percentage points. For example, an expected rate of 90% will have a confidence interval of [83%, 97%]. Note that the desired precision can also be specified by stratum, in the same way as the target coverage, using a Python dictionary.
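If different regions need different precision, the half-width can be passed as a dictionary keyed by stratum, as noted above. The snippet below is only a sketch of how such a dictionary might be built: the 0.05 value for Dakar and Thies is a hypothetical choice made for illustration, not a recommendation from the DHS or from Samplics.

```python
# Region names, matching the keys of the expected_coverage dictionary
regions = [
    "Dakar", "Ziguinchor", "Diourbel", "Saint-Louis", "Tambacounda",
    "Kaolack", "Thies", "Louga", "Fatick", "Kolda", "Matam",
    "Kaffrine", "Kedougou", "Sedhiou",
]

# Start from the common half-width of 0.07 for every region
half_ci_by_region = dict.fromkeys(regions, 0.07)

# Hypothetical override: ask for tighter estimates in two regions
half_ci_by_region.update({"Dakar": 0.05, "Thies": 0.05})

# The dictionary would then replace the scalar half_ci value in the
# calculate() call, in the same way expected_coverage is passed as target.
```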

Given that information, we can calculate the sample size using the SampleSize class as follows.

from samplics.utils.types import SizeMethod, PopParam

# Declare the sample size calculation parameters
sen_vaccine_wald = SampleSize(
    param=PopParam.prop, method=SizeMethod.wald, strat=True
)

# calculate the sample size
sen_vaccine_wald.calculate(target=expected_coverage, half_ci=0.07)

# show the calculated sample size
print("\nCalculated sample sizes by stratum: ")
sen_vaccine_wald.samp_size

Calculated sample sizes by stratum: 
{'Dakar': 101,
 'Ziguinchor': 122,
 'Diourbel': 171,
 'Saint-Louis': 123,
 'Tambacounda': 196,
 'Kaolack': 127,
 'Thies': 109,
 'Louga': 172,
 'Fatick': 141,
 'Kolda': 182,
 'Matam': 169,
 'Kaffrine': 141,
 'Kedougou': 175,
 'Sedhiou': 151}

SampleSize calculates the sample sizes and stores them in the samp_size attribute, which is a Python dictionary. If a dataframe is better suited for the use case, the to_dataframe() method returns a pandas dataframe.

sen_vaccine_wald.to_dataframe()
_param _stratum _target _sigma _half_ci _samp_size
0 PopParam.prop Dakar 0.849 0.128199 0.07 101
1 PopParam.prop Ziguinchor 0.809 0.154519 0.07 122
2 PopParam.prop Diourbel 0.682 0.216876 0.07 171
3 PopParam.prop Saint-Louis 0.806 0.156364 0.07 123
4 PopParam.prop Tambacounda 0.470 0.249100 0.07 196
5 PopParam.prop Kaolack 0.797 0.161791 0.07 127
6 PopParam.prop Thies 0.834 0.138444 0.07 109
7 PopParam.prop Louga 0.678 0.218316 0.07 172
8 PopParam.prop Fatick 0.766 0.179244 0.07 141
9 PopParam.prop Kolda 0.637 0.231231 0.07 182
10 PopParam.prop Matam 0.687 0.215031 0.07 169
11 PopParam.prop Kaffrine 0.766 0.179244 0.07 141
12 PopParam.prop Kedougou 0.336 0.223104 0.07 175
13 PopParam.prop Sedhiou 0.742 0.191436 0.07 151
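
The numbers in this table can be checked by hand. For a proportion, the textbook Wald formula is n = z^2 * p * (1 - p) / d^2, where d is the half-width of the confidence interval and z is the normal quantile for the chosen confidence level. The sketch below assumes a 95% confidence level (the calls above do not state one explicitly) and rounds up to the next integer. The helper name wald_size is ours and is not part of Samplics.

```python
import math
from statistics import NormalDist


def wald_size(p, half_ci, alpha=0.05):
    """Textbook Wald sample size for a proportion, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    variance = p * (1 - p)                   # same quantity as the _sigma column
    return math.ceil(z**2 * variance / half_ci**2)


print(round(0.849 * (1 - 0.849), 6))  # 0.128199, the _sigma value for Dakar
print(wald_size(0.849, 0.07))         # 101, as in the Dakar row
print(wald_size(0.470, 0.07))         # 196, as in the Tambacounda row
```

Note that the _sigma column holds p * (1 - p), which is why it peaks for coverage rates close to 50% and why Tambacounda requires the largest sample.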

The sample size calculation above assumes that the design effect (DEFF) is equal to 1. A design effect of 1 corresponds to a sampling design whose variance is equivalent to that of a simple random sample of the same size. In complex sampling designs, DEFF is often different from 1: stage sampling and unequal weights usually push the design effect above 1. The 2017 Senegal DHS indicated a design effect of 1.963 (1.401^2) for basic vaccination. Hence, we will use the design effect provided by DHS to calculate the sample size.

sen_vaccine_wald.calculate(
    target=expected_coverage, half_ci=0.07, deff=1.401 ** 2
)

sen_vaccine_wald.to_dataframe()
_param _stratum _target _sigma _half_ci _samp_size
0 PopParam.prop Dakar 0.849 0.128199 0.07 198
1 PopParam.prop Ziguinchor 0.809 0.154519 0.07 238
2 PopParam.prop Diourbel 0.682 0.216876 0.07 334
3 PopParam.prop Saint-Louis 0.806 0.156364 0.07 241
4 PopParam.prop Tambacounda 0.470 0.249100 0.07 384
5 PopParam.prop Kaolack 0.797 0.161791 0.07 249
6 PopParam.prop Thies 0.834 0.138444 0.07 214
7 PopParam.prop Louga 0.678 0.218316 0.07 336
8 PopParam.prop Fatick 0.766 0.179244 0.07 276
9 PopParam.prop Kolda 0.637 0.231231 0.07 356
10 PopParam.prop Matam 0.687 0.215031 0.07 331
11 PopParam.prop Kaffrine 0.766 0.179244 0.07 276
12 PopParam.prop Kedougou 0.336 0.223104 0.07 344
13 PopParam.prop Sedhiou 0.742 0.191436 0.07 295
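
One way to read the design effect is through the effective sample size: a complex sample of n children carries roughly as much information as a simple random sample of n / DEFF children. A small sketch, using the Dakar figure from the table above (the helper name is hypothetical):

```python
def effective_sample_size(n, deff):
    """Size of the simple random sample with the same variance."""
    return n / deff


deff = 1.401 ** 2  # about 1.963

# Dakar: 198 children under the complex design
n_eff = effective_sample_size(198, deff)
print(round(n_eff, 1))  # 100.9
```

The result is close to the 101 children obtained for Dakar when DEFF was left at 1, which is the expected behaviour.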

Since the sample design is stratified, the sample size calculation will be more precise if DEFF is specified at the stratum level, which is available from the 2017 Senegal DHS report. Some regions have a design effect below 1. To be conservative, we will use 1.21 (1.100^2) as the minimum design effect in the sample size calculation.

# Design effects by region, with a minimum of 1.100 ** 2 = 1.21
expected_deff = {
    "Dakar": 1.100 ** 2,
    "Ziguinchor": 1.100 ** 2,
    "Diourbel": 1.346 ** 2,
    "Saint-Louis": 1.484 ** 2,
    "Tambacounda": 1.366 ** 2,
    "Kaolack": 1.360 ** 2,
    "Thies": 1.109 ** 2,
    "Louga": 1.902 ** 2,
    "Fatick": 1.100 ** 2,
    "Kolda": 1.217 ** 2,
    "Matam": 1.403 ** 2,
    "Kaffrine": 1.256 ** 2,
    "Kedougou": 2.280 ** 2,
    "Sedhiou": 1.335 ** 2,
}

# Calculate sample sizes using deff at the stratum level
sen_vaccine_wald.calculate(
    target=expected_coverage, half_ci=0.07, deff=expected_deff
)

# Convert sample sizes to a dataframe
sen_vaccine_wald.to_dataframe()
_param _stratum _target _sigma _half_ci _samp_size
0 PopParam.prop Dakar 0.849 0.128199 0.07 122
1 PopParam.prop Ziguinchor 0.809 0.154519 0.07 147
2 PopParam.prop Diourbel 0.682 0.216876 0.07 309
3 PopParam.prop Saint-Louis 0.806 0.156364 0.07 270
4 PopParam.prop Tambacounda 0.470 0.249100 0.07 365
5 PopParam.prop Kaolack 0.797 0.161791 0.07 235
6 PopParam.prop Thies 0.834 0.138444 0.07 134
7 PopParam.prop Louga 0.678 0.218316 0.07 620
8 PopParam.prop Fatick 0.766 0.179244 0.07 171
9 PopParam.prop Kolda 0.637 0.231231 0.07 269
10 PopParam.prop Matam 0.687 0.215031 0.07 332
11 PopParam.prop Kaffrine 0.766 0.179244 0.07 222
12 PopParam.prop Kedougou 0.336 0.223104 0.07 910
13 PopParam.prop Sedhiou 0.742 0.191436 0.07 268
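
The flooring rule used to build expected_deff can be written as a small helper. The sketch below takes the values that get squared (the square roots of the design effects, called DEFT here), raises any value below 1.1 to 1.1, and then squares the result, so the minimum DEFF is 1.21. The function name and the two input values are hypothetical and only illustrate the rule; they are not taken from the DHS report.

```python
def floored_deff(deft_by_region, min_deft=1.1):
    """Square each DEFT after raising values below min_deft to min_deft."""
    return {
        region: max(deft, min_deft) ** 2
        for region, deft in deft_by_region.items()
    }


# Hypothetical inputs, for illustration only
example = floored_deff({"Region A": 0.95, "Region B": 1.346})
print({region: round(deff, 3) for region, deff in example.items()})
# {'Region A': 1.21, 'Region B': 1.812}
```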

The sample size calculation above does not account for attrition due to non-response. In the 2017 Senegal DHS, the overall household and women response rate was about 94.2%.

# Calculate sample sizes with a resp_rate of 94.2%
sen_vaccine_wald.calculate(
    target=expected_coverage, 
    half_ci=0.07, 
    deff=expected_deff, 
    resp_rate=0.942
)

# Convert sample sizes to a dataframe
sen_vaccine_wald.to_dataframe(
    col_names=[
        "Parameter",
        "region",
        "vaccine_cov",
        "stderr",
        "half_ci",
        "count_12_23",
    ]
)
Parameter region vaccine_cov stderr half_ci count_12_23
0 PopParam.prop Dakar 0.849 0.128199 0.07 130
1 PopParam.prop Ziguinchor 0.809 0.154519 0.07 156
2 PopParam.prop Diourbel 0.682 0.216876 0.07 328
3 PopParam.prop Saint-Louis 0.806 0.156364 0.07 287
4 PopParam.prop Tambacounda 0.470 0.249100 0.07 387
5 PopParam.prop Kaolack 0.797 0.161791 0.07 250
6 PopParam.prop Thies 0.834 0.138444 0.07 142
7 PopParam.prop Louga 0.678 0.218316 0.07 658
8 PopParam.prop Fatick 0.766 0.179244 0.07 181
9 PopParam.prop Kolda 0.637 0.231231 0.07 286
10 PopParam.prop Matam 0.687 0.215031 0.07 353
11 PopParam.prop Kaffrine 0.766 0.179244 0.07 236
12 PopParam.prop Kedougou 0.336 0.223104 0.07 966
13 PopParam.prop Sedhiou 0.742 0.191436 0.07 284
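
The adjustments made so far compose multiplicatively: the simple random sampling size is inflated by DEFF and then divided by the response rate. The sketch below assumes a 95% confidence level and a single rounding up at the very end, which reproduces the values shown above for the rows checked here. It is not a copy of the Samplics implementation, and the function name is ours.

```python
import math
from statistics import NormalDist


def adjusted_wald_size(p, half_ci, deff=1.0, resp_rate=1.0, alpha=0.05):
    """Wald size inflated by DEFF and by expected non-response."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_srs = z**2 * p * (1 - p) / half_ci**2  # unrounded simple random sample size
    return math.ceil(n_srs * deff / resp_rate)


print(adjusted_wald_size(0.849, 0.07, deff=1.100**2, resp_rate=0.942))  # 130 (Dakar)
print(adjusted_wald_size(0.678, 0.07, deff=1.902**2, resp_rate=0.942))  # 658 (Louga)
print(adjusted_wald_size(0.336, 0.07, deff=2.280**2, resp_rate=0.942))  # 966 (Kedougou)
```

Rounding at each intermediate step instead can shift a result by one child: for Louga, dividing the already rounded 620 by 0.942 and rounding up gives 659 rather than 658.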

Fleiss method

The World Health Organization (WHO) recommends using the Fleiss method for calculating sample sizes for vaccination coverage surveys, as specified in the following guideline document: https://www.who.int/immunization/documents/who_ivb_18.09/en/. To use the Fleiss method, the examples shown above remain the same except that the method is set to "fleiss" (SizeMethod.fleiss in the code below).

sen_vaccine_fleiss = SampleSize(
    param=PopParam.prop, 
    method=SizeMethod.fleiss, 
    strat=True
)

sen_vaccine_fleiss.calculate(
    target=expected_coverage, 
    half_ci=0.07, 
    deff=expected_deff, 
    resp_rate=0.942
)

sen_vaccine_sample = sen_vaccine_fleiss.to_dataframe(
    col_names=[
        "Parameter",
        "region",
        "vaccine_cov",
        "stderr",
        "half_ci",
        "count_12_23",
    ]
)
sen_vaccine_sample.head(15)
Parameter region vaccine_cov stderr half_ci count_12_23
0 PopParam.prop Dakar 0.849 0.128199 0.07 190
1 PopParam.prop Ziguinchor 0.809 0.154519 0.07 210
2 PopParam.prop Diourbel 0.682 0.216876 0.07 398
3 PopParam.prop Saint-Louis 0.806 0.156364 0.07 384
4 PopParam.prop Tambacounda 0.470 0.249100 0.07 410
5 PopParam.prop Kaolack 0.797 0.161791 0.07 329
6 PopParam.prop Thies 0.834 0.138444 0.07 201
7 PopParam.prop Louga 0.678 0.218316 0.07 794
8 PopParam.prop Fatick 0.766 0.179244 0.07 228
9 PopParam.prop Kolda 0.637 0.231231 0.07 325
10 PopParam.prop Matam 0.687 0.215031 0.07 432
11 PopParam.prop Kaffrine 0.766 0.179244 0.07 297
12 PopParam.prop Kedougou 0.336 0.223104 0.07 1140
13 PopParam.prop Sedhiou 0.742 0.191436 0.07 348
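
To see how much more conservative the Fleiss method is in this example, the two sets of results can be compared directly. The snippet below copies a few counts from the Wald and Fleiss tables above (both computed with stratum-level DEFF and a 94.2% response rate) and computes their ratio.

```python
# (coverage, Wald count, Fleiss count) copied from the tables above
rows = {
    "Dakar": (0.849, 130, 190),
    "Tambacounda": (0.470, 387, 410),
    "Kedougou": (0.336, 966, 1140),
}

# Ratio of the Fleiss sample size to the Wald sample size, by region
ratios = {
    region: n_fleiss / n_wald
    for region, (_, n_wald, n_fleiss) in rows.items()
}

for region, (coverage, _, _) in rows.items():
    print(f"{region}: coverage={coverage:.3f}, Fleiss/Wald={ratios[region]:.2f}")
```

In these three rows the gap is smallest for Tambacounda, where coverage is close to 50%, and largest for Dakar, where coverage is furthest from 50%.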

At this point, we have the number of children aged 12-23 months needed to achieve the desired precision given the expected proportions, using either the Wald or the Fleiss calculation method.

Number of households

To obtain the number of households, we need to know the expected average number of children aged 12-23 months per household. This information can be obtained from census data or from survey rosters. Since the design is stratified, it is best to obtain the information per stratum. In this example, we will assume that 5.2% of the population is between 12 and 23 months of age and apply that figure to all strata and households (in the code, 0.052 is used directly as the expected number of eligible children per household). Hence, the minimum number of households to select is:

sen_vaccine_sample["number_hhs"] = round(
    sen_vaccine_sample["count_12_23"] / 0.052, 0
)

sen_vaccine_sample.head(15)
Parameter region vaccine_cov stderr half_ci count_12_23 number_hhs
0 PopParam.prop Dakar 0.849 0.128199 0.07 190 3654.0
1 PopParam.prop Ziguinchor 0.809 0.154519 0.07 210 4038.0
2 PopParam.prop Diourbel 0.682 0.216876 0.07 398 7654.0
3 PopParam.prop Saint-Louis 0.806 0.156364 0.07 384 7385.0
4 PopParam.prop Tambacounda 0.470 0.249100 0.07 410 7885.0
5 PopParam.prop Kaolack 0.797 0.161791 0.07 329 6327.0
6 PopParam.prop Thies 0.834 0.138444 0.07 201 3865.0
7 PopParam.prop Louga 0.678 0.218316 0.07 794 15269.0
8 PopParam.prop Fatick 0.766 0.179244 0.07 228 4385.0
9 PopParam.prop Kolda 0.637 0.231231 0.07 325 6250.0
10 PopParam.prop Matam 0.687 0.215031 0.07 432 8308.0
11 PopParam.prop Kaffrine 0.766 0.179244 0.07 297 5712.0
12 PopParam.prop Kedougou 0.336 0.223104 0.07 1140 21923.0
13 PopParam.prop Sedhiou 0.742 0.191436 0.07 348 6692.0
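
When stratum-specific figures are available, the same division can be done per region. The sketch below uses plain dictionaries; the per-household averages are hypothetical placeholders (not census or DHS values), and math.ceil is used instead of round so that the result never falls below the required minimum.

```python
import math

# Children needed, taken from the Fleiss results above
children_needed = {"Dakar": 190, "Kedougou": 1140}

# Hypothetical average number of children aged 12-23 months per household
children_per_hh = {"Dakar": 0.045, "Kedougou": 0.065}

households_needed = {
    region: math.ceil(n / children_per_hh[region])
    for region, n in children_needed.items()
}
print(households_needed)  # {'Dakar': 4223, 'Kedougou': 17539}
```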

Similarly, the number of clusters to select can be obtained by dividing the number of households by the number of households per cluster to be selected.
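
As a sketch, assuming a hypothetical take of 20 households per cluster (the tutorial does not fix this value), the cluster counts for two regions in the table above would be:

```python
import math

households_per_cluster = 20  # hypothetical value, for illustration only

# Household counts taken from the number_hhs column above
number_hhs = {"Dakar": 3654, "Kedougou": 21923}

# Round up so that the selected clusters cover at least the required households
number_clusters = {
    region: math.ceil(hhs / households_per_cluster)
    for region, hhs in number_hhs.items()
}
print(number_clusters)  # {'Dakar': 183, 'Kedougou': 1097}
```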