from pprint import pprint
from samplics.datasets import load_auto
from samplics.categorical.comparison import Ttest
T-test
The t-test module allows comparing means of continuous variables of interest to known means or across two groups. There are four main types of comparisons.
- Comparison of one-sample mean to a known mean
- Comparison of two groups from the same sample
- Comparison of two means from two different samples
- Comparison of two paired means
Ttest() is the class that implements all four types of comparison. To run a comparison, the user calls the method compare() with the appropriate parameters.
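The short sketch below illustrates this call pattern on a small made-up array before turning to the Auto sample data used in the rest of this page. It assumes compare() accepts an array-like y, as it does the pandas columns used below; the values are purely illustrative.
import numpy as np

toy_y = np.array([19.5, 22.0, 21.5, 18.0, 24.5, 20.0])  # made-up values, for illustration only

toy_test = Ttest(samp_type="one-sample")
toy_test.compare(y=toy_y, known_mean=20)  # H0: mean(toy_y) = 20
print(toy_test)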
Comparison of one-sample mean to a known mean
For this comparison, the mean of a continuous variable (here, mpg) is compared to a known mean. In the example below, the user tests whether the average mpg is equal to 20. Hence, the null hypothesis is H0: mean(mpg) = 20. There are three possible alternatives to this null hypothesis:
- Ha: mean(mpg) < 20 (less_than alternative)
- Ha: mean(mpg) > 20 (greater_than alternative)
- Ha: mean(mpg) != 20 (not_equal alternative)
All three alternatives are automatically computed by the method compare(). This behavior is the same across the four types of comparison.
# Load Auto sample data
auto_dict = load_auto()
auto = auto_dict["data"]
mpg = auto["mpg"]

one_sample_known_mean = Ttest(samp_type="one-sample")
one_sample_known_mean.compare(y=mpg, known_mean=20)
print(one_sample_known_mean)
Design-based One-Sample T-test
Null hypothesis (Ho): mean = 20
t statistics: 1.9289
Degrees of freedom: 73.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.9712
Prob(|T| > |t|) = 0.0576
Prob(T > t) = 0.0288
Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
74 21.297297 0.672551 5.785503 19.956905 22.63769
The print below shows the information encapsulated in the object. point_est provides the sample mean. Similarly, stderror, stddev, lower_ci, and upper_ci provide the standard error, standard deviation, lower confidence interval (CI) bound, and upper CI bound, respectively. The class member stats provides the statistics related to the three t-tests (one for each alternative hypothesis). There is additional information encapsulated in the object, as shown below.
pprint(one_sample_known_mean.__dict__)
{'alpha': 0.05,
'deff': {},
'group_levels': {},
'group_names': [],
'lower_ci': 19.95690491373974,
'paired': False,
'point_est': 21.2972972972973,
'samp_type': 'one-sample',
'stats': {'df': 73,
'known_mean': 20,
'number_obs': 74,
'p_value': {'greater_than': 0.02881433507499831,
'less_than': 0.9711856649250017,
'not_equal': 0.05762867014999661},
't': 1.9289200809064198},
'stddev': 5.785503209735141,
'stderror': 0.6725510870764976,
'upper_ci': 22.637689680854855,
'vars_names': ['mpg']}
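The members listed above can also be accessed individually. For example, the point estimate, the confidence interval bounds, and the two-sided p-value (using the keys shown in the dictionary above) can be pulled out directly:
print(one_sample_known_mean.point_est)  # 21.2972...
print(one_sample_known_mean.lower_ci, one_sample_known_mean.upper_ci)  # 19.9569..., 22.6376...
print(one_sample_known_mean.stats["p_value"]["not_equal"])  # two-sided p-value, 0.0576...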
Comparison of two groups from the same sample
This type of comparison is used when the two groups come from the same sample. For example, after running a survey, the user wants to know whether domestic cars have the same average mpg as foreign cars. The parameter group indicates the categorical variable that defines the groups. Note that, at this point, Ttest() does not take into account potential dependencies between the groups.
= auto["foreign"]
foreign
= Ttest(samp_type="one-sample")
one_sample_two_groups =mpg, group=foreign)
one_sample_two_groups.compare(y
print(one_sample_two_groups)
Design-based One-Sample T-test
Null hypothesis (Ho): mean(Domestic) = mean(Foreign)
Equal variance assumption:
t statistics: -3.6632
Degrees of freedom: 72.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.0002
Prob(|T| > |t|) = 0.0005
Prob(T > t) = 0.9998
Unequal variance assumption:
t statistics: -3.2245
Degrees of freedom: 30.81
Alternative hypothesis (Ha):
Prob(T < t) = 0.0015
Prob(|T| > |t|) = 0.0030
Prob(T > t) = 0.9985
Group Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
Domestic 52 19.826923 0.655868 4.729532 18.519780 21.134066
Foreign 22 24.772727 1.386503 6.503276 22.009431 27.536024
Since there are two groups for this comparison, the sample mean, standard error, standard deviation, lower bound CI, and upper bound CI are provided by group as Python dictionaries. The class member stats
provides statistics for the comparison assuming both equal and unequal variances.
print("These are the group means for mpg:")
pprint(one_sample_two_groups.point_est)
These are the group means for mpg:
{'Domestic': 19.826923076923077, 'Foreign': 24.772727272727273}
print(f"These are the group standard error for mpg:")
pprint(one_sample_two_groups.stderror)
These are the group standard error for mpg:
{'Domestic': 0.6558681110509441, 'Foreign': 1.3865030562044942}
print("These are the group standard deviation for mpg:")
pprint(one_sample_two_groups.stddev)
These are the group standard deviation for mpg:
{'Domestic': 4.7295322086717775, 'Foreign': 6.50327578586491}
print("These are the computed statistics:")
pprint(one_sample_two_groups.stats)
These are the computed statistics:
{'df_eq_variance': 72,
'df_uneq_variance': 30.814287872636015,
'number_obs': {'Domestic': 52, 'Foreign': 22},
'p_value_eq_variance': {'greater_than': 0.9997637712766184,
'less_than': 0.00023622872338158258,
'not_equal': 0.00047245744676316517},
'p_value_uneq_variance': {'greater_than': 0.9985090924569335,
'less_than': 0.00149090754306649,
'not_equal': 0.00298181508613298},
't_eq_variance': -3.663245852011623,
't_uneq_variance': -3.2245353733260638}
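Individual entries of this dictionary can be extracted in the same way; for example, the two-sided p-value under the unequal-variance assumption:
print(one_sample_two_groups.stats["p_value_uneq_variance"]["not_equal"])  # 0.00298...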
Comparison of two means from two different samples
This type of comparison should be used when the two groups come from different samples or different strata. The groups are assumed to be independent. Otherwise, the information is similar to the previous test. Note that, when instantiating the class, we used samp_type="two-sample".
= Ttest(samp_type="two-sample", paired=False)
two_samples_unpaired =mpg, group=foreign)
two_samples_unpaired.compare(y
print(two_samples_unpaired)
Design-based Two-Sample T-test
Null hypothesis (Ho): mean(Domestic) = mean(Foreign)
Equal variance assumption:
t statistics: -3.6308
Degrees of freedom: 72.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.0003
Prob(|T| > |t|) = 0.0005
Prob(T > t) = 0.9997
Unequal variance assumption:
t statistics: -3.1797
Degrees of freedom: 30.55
Alternative hypothesis (Ha):
Prob(T < t) = 0.0017
Prob(|T| > |t|) = 0.0034
Prob(T > t) = 0.9983
Group Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
Domestic 52 19.826923 0.657777 4.743297 18.506381 21.147465
Foreign 22 24.772727 1.409510 6.611187 21.841491 27.703963
print("These are the group means for mpg:")
pprint(two_samples_unpaired.point_est)
These are the group means for mpg:
{'Domestic': 19.826923076923077, 'Foreign': 24.772727272727273}
print("These are the group standard error for mpg:")
pprint(two_samples_unpaired.stderror)
These are the group standard error for mpg:
{'Domestic': 0.6577769784877484, 'Foreign': 1.409509782735444}
print("These are the group standard deviation for mpg:")
pprint(two_samples_unpaired.stddev)
These are the group standard deviation for mpg:
{'Domestic': 4.7432972475147, 'Foreign': 6.611186898567625}
print("These are the computed statistics:")
pprint(two_samples_unpaired.stats)
These are the computed statistics:
{'df_eq_variance': 72,
'df_uneq_variance': 30.546277725121076,
'number_obs': {'Domestic': 52, 'Foreign': 22},
'p_value_eq_variance': {'greater_than': 0.9997372920330829,
'less_than': 0.00026270796691710003,
'not_equal': 0.0005254159338342001},
'p_value_uneq_variance': {'greater_than': 0.9983149592187673,
'less_than': 0.0016850407812326069,
'not_equal': 0.0033700815624652138},
't_eq_variance': -3.6308484477318372,
't_uneq_variance': -3.1796851846684073}
Comparison of two paired means
When two measures are taken from the same observations, the paired t-test is appropriate for comparing the means.
= Ttest(samp_type="two-sample", paired=True)
two_samples_paired =auto[["y1", "y2"]], group=foreign)
two_samples_paired.compare(y
print(two_samples_paired)
Design-based Two-Sample T-test
Null hypothesis (Ho): mean(Diff = y1 - y2) = 0
t statistics: 0.8733
Degrees of freedom: 73.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.8073
Prob(|T| > |t|) = 0.3853
Prob(T > t) = 0.1927
Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
74 4.054054e-07 4.641962e-07 0.000004 -5.197363e-07 0.000001
varnames can be used to rename the variables in the output.
= auto["y1"]
y1 = auto["y2"]
y2
= Ttest(samp_type="two-sample", paired=True)
two_samples_paired
two_samples_paired.compare(=[y1, y2],
y= ["group_1", "gourp_2"],
varnames=foreign
group
)
print(two_samples_paired)
Design-based Two-Sample T-test
Null hypothesis (Ho): mean(Diff = group_1 - group_2) = 0
t statistics: 0.8733
Degrees of freedom: 73.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.8073
Prob(|T| > |t|) = 0.3853
Prob(T > t) = 0.1927
Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
74 4.054054e-07 4.641962e-07 0.000004 -5.197363e-07 0.000001