from pprint import pprint
from samplics.datasets import load_auto
from samplics.categorical.comparison import Ttest
T-test
The t-test module allows comparing means of continuous variables of interest to known means or across two groups. There are four main types of comparisons.
- Comparison of one-sample mean to a known mean
- Comparison of two groups from the same sample
- Comparison of two means from two different samples
- Comparison of two paired means
Ttest() is the class that implements all four types of comparison. To run a comparison, the user calls the method compare() with the appropriate parameters.
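The short sketch below illustrates this call pattern on a small made-up array before turning to the Auto sample data used in the rest of this page. It assumes compare() accepts an array-like y, as it does the pandas columns used below; the values are purely illustrative.
import numpy as np

toy_y = np.array([19.5, 22.0, 21.5, 18.0, 24.5, 20.0])  # made-up values, for illustration only

toy_test = Ttest(samp_type="one-sample")
toy_test.compare(y=toy_y, known_mean=20)  # H0: mean(toy_y) = 20
print(toy_test)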
Comparison of one-sample mean to a known mean
For this comparison, the mean of a continuous variable (here, mpg) is compared to a known mean. In the example below, the user tests whether the average mpg is equal to 20. Hence, the null hypothesis is H0: mean(mpg) = 20. There are three possible alternatives to this null hypothesis:
- Ha: mean(mpg) < 20 (less_than alternative)
- Ha: mean(mpg) > 20 (greater_than alternative)
- Ha: mean(mpg) != 20 (not_equal alternative)
All three alternatives are automatically computed by the method compare(). This behavior is the same across the four types of comparison.
# Load Auto sample data
auto_dict = load_auto()
auto = auto_dict["data"]
mpg = auto["mpg"]

one_sample_known_mean = Ttest(samp_type="one-sample")
one_sample_known_mean.compare(y=mpg, known_mean=20)
print(one_sample_known_mean)
Design-based One-Sample T-test
Null hypothesis (Ho): mean = 20
t statistics: 1.9289
Degrees of freedom: 73.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.9712
Prob(|T| > |t|) = 0.0576
Prob(T > t) = 0.0288
Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
74 21.297297 0.672551 5.785503 19.956905 22.63769
The print below shows the information encapsulated in the object. point_est provides the sample mean. Similarly, stderror, stddev, lower_ci, and upper_ci provide the standard error, standard deviation, lower confidence interval (CI) bound, and upper CI bound, respectively. The class member stats provides the statistics related to the three t-tests (one for each alternative hypothesis). There is additional information encapsulated in the object, as shown below.
pprint(one_sample_known_mean.__dict__)
{'alpha': 0.05,
'deff': {},
'group_levels': {},
'group_names': [],
'lower_ci': 19.95690491373974,
'paired': False,
'point_est': 21.2972972972973,
'samp_type': 'one-sample',
'stats': {'df': 73,
'known_mean': 20,
'number_obs': 74,
'p_value': {'greater_than': 0.02881433507499831,
'less_than': 0.9711856649250017,
'not_equal': 0.05762867014999661},
't': 1.9289200809064198},
'stddev': 5.785503209735141,
'stderror': 0.6725510870764976,
'upper_ci': 22.637689680854855,
'vars_names': ['mpg']}
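The members listed above can also be accessed individually. For example, the point estimate, the confidence interval bounds, and the two-sided p-value (using the keys shown in the dictionary above) can be pulled out directly:
print(one_sample_known_mean.point_est)  # 21.2972...
print(one_sample_known_mean.lower_ci, one_sample_known_mean.upper_ci)  # 19.9569..., 22.6376...
print(one_sample_known_mean.stats["p_value"]["not_equal"])  # two-sided p-value, 0.0576...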
Comparison of two groups from the same sample
This type of comparison is used when the two groups come from the same sample. For example, after running a survey, the user wants to know whether domestic cars have the same average mpg as foreign cars. The parameter group indicates the categorical variable that defines the groups. Note that, at this point, Ttest() does not take into account potential dependencies between the groups.
= auto["foreign"]
foreign
= Ttest(samp_type="one-sample")
one_sample_two_groups =mpg, group=foreign)
one_sample_two_groups.compare(y
print(one_sample_two_groups)
Design-based One-Sample T-test
Null hypothesis (Ho): mean(Domestic) = mean(Foreign)
Equal variance assumption:
t statistics: -3.6632
Degrees of freedom: 72.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.0002
Prob(|T| > |t|) = 0.0005
Prob(T > t) = 0.9998
Unequal variance assumption:
t statistics: -3.2245
Degrees of freedom: 30.81
Alternative hypothesis (Ha):
Prob(T < t) = 0.0015
Prob(|T| > |t|) = 0.0030
Prob(T > t) = 0.9985
Group Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
Domestic 52 19.826923 0.655868 4.729532 18.519780 21.134066
Foreign 22 24.772727 1.386503 6.503276 22.009431 27.536024
Since there are two groups for this comparison, the sample mean, standard error, standard deviation, lower bound CI, and upper bound CI are provided by group as Python dictionaries. The class member stats
provides statistics for the comparison assuming both equal and unequal variances.
print("These are the group means for mpg:")
pprint(one_sample_two_groups.point_est)
These are the group means for mpg:
{'Domestic': 19.826923076923077, 'Foreign': 24.772727272727273}
print(f"These are the group standard error for mpg:")
pprint(one_sample_two_groups.stderror)
These are the group standard error for mpg:
{'Domestic': 0.6558681110509441, 'Foreign': 1.3865030562044942}
print("These are the group standard deviation for mpg:")
pprint(one_sample_two_groups.stddev)
These are the group standard deviation for mpg:
{'Domestic': 4.7295322086717775, 'Foreign': 6.50327578586491}
print("These are the computed statistics:")
pprint(one_sample_two_groups.stats)
These are the computed statistics:
{'df_eq_variance': 72,
'df_uneq_variance': 30.814287872636015,
'number_obs': {'Domestic': 52, 'Foreign': 22},
'p_value_eq_variance': {'greater_than': 0.9997637712766184,
'less_than': 0.00023622872338158258,
'not_equal': 0.00047245744676316517},
'p_value_uneq_variance': {'greater_than': 0.9985090924569335,
'less_than': 0.00149090754306649,
'not_equal': 0.00298181508613298},
't_eq_variance': -3.663245852011623,
't_uneq_variance': -3.2245353733260638}
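Individual entries of this dictionary can be extracted in the same way; for example, the two-sided p-value under the unequal-variance assumption:
print(one_sample_two_groups.stats["p_value_uneq_variance"]["not_equal"])  # 0.00298...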
Comparison of two means from two different samples
This type of comparison should be used when the two groups come from different samples or different strata. The groups are assumed to be independent. Otherwise, the information is similar to the previous test. Note that, when instantiating the class, we used samp_type="two-sample".
= Ttest(samp_type="two-sample", paired=False)
two_samples_unpaired =mpg, group=foreign)
two_samples_unpaired.compare(y
print(two_samples_unpaired)
Design-based Two-Sample T-test
Null hypothesis (Ho): mean(Domestic) = mean(Foreign)
Equal variance assumption:
t statistics: -3.6308
Degrees of freedom: 72.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.0003
Prob(|T| > |t|) = 0.0005
Prob(T > t) = 0.9997
Unequal variance assumption:
t statistics: -3.1797
Degrees of freedom: 30.55
Alternative hypothesis (Ha):
Prob(T < t) = 0.0017
Prob(|T| > |t|) = 0.0034
Prob(T > t) = 0.9983
Group Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
Domestic 52 19.826923 0.657777 4.743297 18.506381 21.147465
Foreign 22 24.772727 1.409510 6.611187 21.841491 27.703963
print("These are the group means for mpg:")
pprint(two_samples_unpaired.point_est)
These are the group means for mpg:
{'Domestic': 19.826923076923077, 'Foreign': 24.772727272727273}
print("These are the group standard error for mpg:")
pprint(two_samples_unpaired.stderror)
These are the group standard error for mpg:
{'Domestic': 0.6577769784877484, 'Foreign': 1.409509782735444}
print("These are the group standard deviation for mpg:")
pprint(two_samples_unpaired.stddev)
These are the group standard deviation for mpg:
{'Domestic': 4.7432972475147, 'Foreign': 6.611186898567625}
print("These are the computed statistics:")
pprint(two_samples_unpaired.stats)
These are the computed statistics:
{'df_eq_variance': 72,
'df_uneq_variance': 30.546277725121076,
'number_obs': {'Domestic': 52, 'Foreign': 22},
'p_value_eq_variance': {'greater_than': 0.9997372920330829,
'less_than': 0.00026270796691710003,
'not_equal': 0.0005254159338342001},
'p_value_uneq_variance': {'greater_than': 0.9983149592187673,
'less_than': 0.0016850407812326069,
'not_equal': 0.0033700815624652138},
't_eq_variance': -3.6308484477318372,
't_uneq_variance': -3.1796851846684073}
Comparison of two paired means
When two measures are taken from the same observations, the paired t-test is appropriate for comparing the means.
= Ttest(samp_type="two-sample", paired=True)
two_samples_paired =auto[["y1", "y2"]], group=foreign)
two_samples_paired.compare(y
print(two_samples_paired)
Design-based Two-Sample T-test
Null hypothesis (Ho): mean(Diff = y1 - y2) = 0
t statistics: 0.8733
Degrees of freedom: 73.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.8073
Prob(|T| > |t|) = 0.3853
Prob(T > t) = 0.1927
Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
74 4.054054e-07 4.641962e-07 0.000004 -5.197363e-07 0.000001
varnames can be used to rename the variables in the output.
= auto["y1"]
y1 = auto["y2"]
y2
= Ttest(samp_type="two-sample", paired=True)
two_samples_paired
two_samples_paired.compare(=[y1, y2],
y= ["group_1", "gourp_2"],
varnames=foreign
group
)
print(two_samples_paired)
Design-based Two-Sample T-test
Null hypothesis (Ho): mean(Diff = group_1 - group_2) = 0
t statistics: 0.8733
Degrees of freedom: 73.00
Alternative hypothesis (Ha):
Prob(T < t) = 0.8073
Prob(|T| > |t|) = 0.3853
Prob(T > t) = 0.1927
Nb. Obs PopParam.mean Std. Error Std. Dev. Lower CI Upper CI
74 4.054054e-07 4.641962e-07 0.000004 -5.197363e-07 0.000001