Datasets

The datasets module allows the user to load the datasets used in this tutorial. Note that the datasets are only used to illustrate the syntax and APIs of Samplics. Many of the datasets in this tutorial are subsets of actual samples but DO NOT represent these samples. The datasets were subsetted from existing samples to reduce the file sizes.

Tip

A dataset can be loaded using the function load_xxx(), where xxx is the dataset name.

For example, load_psu_frame() loads the PSU frame dataset.
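Because every loader follows the load_xxx() naming convention, a loader can also be looked up from the dataset name at runtime. The sketch below is a minimal illustration under that assumption: load_dataset is a hypothetical helper (not part of Samplics), and demo_module is an invented stand-in so the example runs without Samplics installed.

```python
from types import SimpleNamespace
from typing import Any


def load_dataset(name: str, module: Any) -> dict:
    """Call module.load_<name>() per the load_xxx() naming convention.

    Hypothetical helper for illustration; it is not part of Samplics.
    """
    # Look up the loader by its conventional name, e.g. "psu_frame" -> load_psu_frame
    loader = getattr(module, f"load_{name}", None)
    if not callable(loader):
        raise ValueError(f"no loader named load_{name}() was found")
    # Each loader returns a dictionary of metadata plus the data itself
    return loader()


# Stand-in module with one fake loader (invented; not real Samplics data)
demo_module = SimpleNamespace(
    load_psu_frame=lambda: {"name": "psu_frame", "data": []}
)
psu = load_dataset("psu_frame", demo_module)
print(psu["name"])  # psu_frame
```

With Samplics installed, passing the real module (import samplics.datasets as ds, then load_dataset("psu_frame", ds)) should be equivalent to calling load_psu_frame() directly, assuming the naming convention holds for every dataset.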

These functions return a dictionary with the following members: name, description, nrows, ncols, design, source, and data. The current list of datasets is the following:

Suppose we want to load the PSU frame; we could write the following code:

import samplics

# Import the loader function for the PSU frame
from samplics.datasets import load_psu_frame

# Load the dataset and its metadata into
# the dictionary psu_frame_dict
psu_frame_dict = load_psu_frame()

# Store the data itself in the variable psu_frame (optional)
psu_frame = psu_frame_dict["data"]
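The metadata members listed above can be inspected in the same way as the "data" member. As a minimal sketch, the hypothetical helper summarize_dataset below (not a Samplics function) checks that a dictionary has those members and prints a one-line summary; the toy dictionary uses made-up placeholder values, not real Samplics metadata.

```python
# Members listed in the Samplics documentation for every load_xxx() result
REQUIRED_KEYS = ("name", "description", "nrows", "ncols", "design", "source", "data")


def summarize_dataset(ds: dict) -> str:
    """Return a one-line summary of a dataset dictionary (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in ds]
    if missing:
        raise KeyError(f"dataset dictionary is missing members: {missing}")
    return f"{ds['name']}: {ds['nrows']} rows x {ds['ncols']} columns (source: {ds['source']})"


# Toy dictionary with made-up values, standing in for psu_frame_dict
toy = {
    "name": "toy_frame",
    "description": "Illustrative placeholder",
    "nrows": 2,
    "ncols": 3,
    "design": None,
    "source": "made up",
    "data": None,
}
print(summarize_dataset(toy))  # toy_frame: 2 rows x 3 columns (source: made up)
```

With Samplics installed, summarize_dataset(psu_frame_dict) should behave the same way, assuming the returned dictionary contains the members listed above.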

Important

The datasets should not be used for any statistical analysis.
No number shown in this tutorial shall be used for any statistical analysis.
All the examples are exclusively for illustrating the syntax and APIs of Samplics.

References

Battese, G E, R M Harter, and W A Fuller. 1988. “An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data.” J. Amer. Statist. Assoc. 83: 28–36.
Gonzalez Jr, J F, N Krauss, and C Scott. 1992. “Estimation in the 1988 National Maternal and Infant Health Survey.” In Proceedings of the Section on Statistics Education, edited by American Statistical Association, 343–48. https://doi.org/10.25080/Majora-92bf1922-00a.
McDowell, A, A Engel, J T Massey, and K Maurer. 1981. “Plan and Operation of the Second National Health and Nutrition Examination Survey, 1976–1980.” Vital and Health Statistics 1 (15): 1–144.