CEO’s Perspective to Statistical Analysis: Sampling Techniques

This is the second in a series of white papers published by KINDUZ Consulting on Statistical Analysis. This paper is an introduction to population and samples, why we use samples, and the various sampling strategies that can be applied.


Context for the Reader

You are the Global Chief Executive of KINDUZ Corp., a global organization with revenue of USD 10 billion. The vision of KINDUZ is “Creating Universal Prosperity”. In line with its vision, KINDUZ has established itself in multiple industries, including Pharmaceuticals, Oil and Gas, IT/ITES, Automotive Manufacturing, and Cement.

You and your entire team believe that Creating Universal Prosperity is achieved through alignment of organizational goals to individual goals of stakeholders (Employees, Stockholders, Customers, and Societies).

Approach of the Whitepaper

This whitepaper is one in a series of whitepapers published by KINDUZ to give CEOs a deeper understanding of tools and techniques and how to apply them, enabling them to deliver sustainable outcomes.

The learning in the whitepaper is structured around cases, where the concepts and their applications are explained through the case description, analysis, and solutions.

Executive Summary

CEOs make important decisions about their organizations based on the questions they ask, the data provided in response to these questions, and the analysis of this data. Typically, the data used for analysis is a small portion of the entire data. The entire data set is called the population, and the small portion selected is called the sample.

It is important to understand why we sometimes use samples instead of the population, and having decided to use samples, what sampling technique will give us the most representative picture.

“To understand God’s thoughts,
one must study statistics… the measure of his purpose.”

– Florence Nightingale

This white paper first enables a CEO to understand the differences and similarities between a population and a sample. It then discusses the following sampling techniques, which are categorized as probability sampling:

Simple random sampling
Stratified sampling
Systematic sampling
Cluster sampling (single stage and two stage)
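The four techniques listed above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the cases: the batch records, the `plant` field, and all function names are hypothetical, and only the standard `random` module is assumed.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n units so that every unit has the same chance of selection."""
    return rng.sample(units, n)

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Split units into strata, then draw a simple random sample within each."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}

def systematic_sample(units, step, rng):
    """Pick a random start within the first `step` units, then every step-th unit."""
    start = rng.randrange(step)
    return units[start::step]

def cluster_sample(clusters, n_clusters, rng, n_within=None):
    """Single stage: keep every unit in the randomly chosen clusters.
    Two stage: additionally subsample n_within units inside each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    if n_within is None:
        return {c: list(clusters[c]) for c in chosen}
    return {c: rng.sample(clusters[c], min(n_within, len(clusters[c])))
            for c in chosen}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 120 batch records spread evenly over 4 plants.
batches = [{"id": i, "plant": f"plant_{i % 4}"} for i in range(120)]
by_plant = {}
for batch in batches:
    by_plant.setdefault(batch["plant"], []).append(batch)

srs = simple_random_sample(batches, 12, rng)
strat = stratified_sample(batches, lambda b: b["plant"], 3, rng)
syst = systematic_sample(batches, 10, rng)
one_stage = cluster_sample(by_plant, 2, rng)
two_stage = cluster_sample(by_plant, 2, rng, n_within=5)
```

Note the difference the sketch makes visible: stratified sampling draws some units from every group, whereas cluster sampling draws whole groups (or parts of them) and leaves the other groups out entirely.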

In this paper, we have focused exclusively on probability sampling techniques. The primary reason is that non-probability sampling does not give every unit a known, non-zero chance of being selected, so the resulting data could be biased and is not suited to holistic decision making by a CEO.

For example, KINDUZ Pharma had to recall an entire batch of medicines because of assay stability failure. The team at KINDUZ Pharma wants to study why the batch failed on assay stability. If the team collects random samples from all batches (both those that passed assay stability and those that did not) to study the impact of process variables on assay stability, then it is probability sampling. If the team collects samples only from the batch that failed on assay stability, then it is non-probability sampling. The samples collected from the failed batch are not representative of the entire population (all batches produced in a specified time period) and will not enable the KINDUZ Pharma team to holistically find the causes of assay stability failure.
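The contrast in this example can be illustrated with a small simulation. It is a sketch under loud assumptions: the 500 batches, the process variable `temp`, and the rule that a batch fails at 28.0 or above are all invented for illustration and do not come from KINDUZ Pharma data.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic population of 500 batches (hypothetical data).
# Assumed rule, for illustration only: a batch fails when the hypothetical
# process variable `temp` is 28.0 or higher.
batches = []
for i in range(500):
    temp = 20.0 + (i % 100) / 10.0  # cycles through 20.0 .. 29.9
    batches.append({"id": i, "temp": temp, "failed": temp >= 28.0})

def failure_rate(sample):
    """Share of failed batches in the sample."""
    return sum(b["failed"] for b in sample) / len(sample)

def outcomes(sample):
    """Set of distinct outcomes (True = failed, False = passed) seen in the sample."""
    return {b["failed"] for b in sample}

# Probability sample: every batch in the period can be selected.
probability_sample = rng.sample(batches, 50)

# Non-probability sample: only batches already known to have failed.
failed_only_sample = [b for b in batches if b["failed"]][:50]

print("population failure rate: ", failure_rate(batches))
print("probability sample rate: ", failure_rate(probability_sample))
print("failed-only sample rate: ", failure_rate(failed_only_sample))
```

The failed-only sample reports a failure rate of 100% and contains no passed batches to compare against, so it cannot show which values of the process variable separate passing batches from failing ones. The probability sample includes passed batches as well, which is what makes that comparison possible.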

This paper provides five cases that explain population, samples and probability sampling techniques. The situations in which these sampling techniques should be used are summarized in the table below:

Sampling Technique	Situations to use in
Simple Random	High risk of data manipulation; speed of sampling is more important than precision
Stratified	Population needs to be divided into homogeneous subpopulations to identify the specific problem area; population shows periodicity
Systematic	Population does not show any patterns; low risk of data manipulation
Cluster	Very large population where a large sample size is required (relative to the population) and not all members of the population are equally accessible
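The periodicity rows of the table can be illustrated with a small sketch. The data is synthetic and assumed purely for illustration: a daily measurement that is 100 on six days of a seven-day cycle and 40 on the seventh. When the sampling interval matches the period, a systematic sample always lands on the same point in the cycle, whereas stratifying by position in the cycle covers every point.

```python
import random
from statistics import mean

PERIOD = 7  # assumed cycle length (e.g. a weekly pattern)

# Synthetic population: 364 daily values, low on every 7th day (hypothetical data).
values = [40 if day % PERIOD == 6 else 100 for day in range(364)]
population_mean = mean(values)

# Systematic sampling with an interval equal to the period.
# Each possible starting day yields a sample from one point in the cycle only.
systematic_means = [mean(values[start::PERIOD]) for start in range(PERIOD)]

# Stratified sampling: one stratum per position in the cycle,
# with a simple random sample of 4 values drawn from each stratum.
# The strata are equal in size, so an unweighted mean of the pooled sample is fine.
rng = random.Random(1)
strata = {k: values[k::PERIOD] for k in range(PERIOD)}
stratified = [v for members in strata.values() for v in rng.sample(members, 4)]
stratified_mean = mean(stratified)

print("population mean:  ", round(population_mean, 2))
print("systematic means: ", systematic_means)
print("stratified mean:  ", round(stratified_mean, 2))
```

In this constructed case every systematic sample estimates either 100 or 40, never the true mean of roughly 91.4, while the stratified sample recovers it. Real data is noisier, but the direction of the risk is the same.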

Based on these cases, here is a list of Do’s and Don’ts for you to leverage:

Do's:

Check whether your baselines are based on population data or sample data
Always ask about the sampling technique used and the sample size when presented with data and/or its analysis
Ensure the right sampling technique is used and the right sample size is chosen before undertaking any analysis and decisions
Collect samples at the lowest granularity to minimize the risk of sampling error

Don'ts:

Assume that all sampling techniques have the same precision
Accept simple random sampling as the default practice
Confuse stratified sampling with cluster sampling (this is a common mistake)
