How accurate is your sampling method? How well do the results represent the population at large?
These are pertinent questions when you’re analyzing the quantitative data from your recent product testing research. You will never get a perfect sample, but you can use inferential statistics to get an idea of how close you’ve come.
Two key concepts in inferential statistics are the confidence level and the confidence interval. People often confuse the two because they’re closely linked, and it’s hard to define one without defining the other. (Even career scientists get confused, so you have permission to cut yourself some slack if your grasp of these concepts is shaky.)
In this article, we’ll make the difference between confidence level and confidence interval clear as day. But first, let’s get a sense of what they’re used for.
You can’t test the whole world. If you could really sample everyone and everything, and you could calculate the mean or average value, then you’d know the value of the “true population parameter” (also called the “true mean”). But you can’t, so you must rely on approximations.
The difference between the mean value of your sample and the true population parameter is what’s known as the sampling error. With a good sampling method, you might be able to get quite close to the true mean, but there’s always going to be some degree of sampling error that you need to account for. If you don’t account for it, then the inferences you make might be bogus.
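To make sampling error concrete, here is a minimal Python sketch. The "population" of 100,000 ratings is simulated purely for illustration (in real research you would never have it), and the sample size of 200 is an arbitrary assumption.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical "population": 100,000 product ratings on a 1-5 scale.
population = [random.choice([1, 2, 3, 4, 5]) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# A random sample of 200 ratings, standing in for one round of testing.
sample = random.sample(population, 200)
sample_mean = statistics.fmean(sample)

# Sampling error: how far the sample mean lands from the true mean.
sampling_error = sample_mean - true_mean

print(f"true mean      = {true_mean:.3f}")
print(f"sample mean    = {sample_mean:.3f}")
print(f"sampling error = {sampling_error:+.3f}")
```

Changing the seed gives a different sample and a different sampling error, which is exactly the variation the rest of this article tries to quantify.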
Thankfully, with some statistical analysis, you can get to the heart of this mystery. By selecting a confidence level that’s appropriate to your study and calculating the associated confidence interval, you’ll get the necessary insight into how representative your samples are.
Each time you take a sample of the general population, even one that is highly representative, the sample statistic (here, the sample mean) won’t match the true parameter exactly. If it does, that’s just a very rare coincidence. Variation among the means of separate samples is the norm.
You can, however, use your sample data to establish an interval that is likely to contain the true mean. If you were to keep taking (and analyzing) many, many samples and built an interval around each one in the same way, a certain percentage of those intervals would contain the true mean. This interval, which stretches both above and below the sample’s mean value in equal measure, is known as the confidence interval, and the percentage of intervals that should contain the true mean is the confidence level.
You get to choose the confidence level based on what’s appropriate for your study (more on this later).
The confidence interval depends on the sample mean and the calculated margin of error, which you both add to and subtract from the sample mean to create a range centered on it. The width of this interval varies with the confidence level you choose.
Here we have:
Confidence interval (CI) = sample mean ± margin of error
What’s the margin of error, though? This value depends on three things: the confidence level (which you choose), the population standard deviation, and the sample size.
The equation for the margin of error is as follows. The symbol z* refers to the critical value associated with your confidence level (you can check a table of confidence levels and critical values to get the right number). The Greek letter σ represents the population standard deviation, and n is the sample size.
Margin of error = z* · σ / √n
You are multiplying the confidence level’s critical value by the population standard deviation and then dividing it by the square root of the sample size. You then need to add and subtract the resulting value from the sample mean.
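If you’d rather compute the critical value than look it up in a table, the arithmetic can be sketched in a few lines of Python. The function names and the example inputs (σ = 1.2 rating points, n = 200) are assumptions made for illustration, and the sketch relies on the standard normal distribution from Python’s built-in statistics module.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """Two-sided critical value z* for a confidence level such as 0.95."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # Put half of the leftover probability in each tail.
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def margin_of_error(confidence_level: float, sigma: float, n: int) -> float:
    """Margin of error = z* * sigma / sqrt(n)."""
    return critical_value(confidence_level) * sigma / sqrt(n)


# Hypothetical inputs: sigma = 1.2 rating points, n = 200 testers.
for level in (0.90, 0.95, 0.99):
    z = critical_value(level)
    moe = margin_of_error(level, sigma=1.2, n=200)
    print(f"{level:.0%}: z* = {z:.3f}, margin of error = {moe:.3f}")
```

For the familiar 90%, 95%, and 99% confidence levels, the critical values come out at roughly 1.645, 1.960, and 2.576, matching what you would find in a standard table.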
We now have:
Confidence interval (CI) = sample mean ± z* · σ / √n
It looks complicated, but you’re really just plugging in values and doing some basic arithmetic!
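Here is what that plugging-in might look like in Python. Treat it as a sketch under stated assumptions: the ten ratings are made up, σ = 1.0 is an assumed population standard deviation, and the function name is ours rather than part of any particular tool.

```python
from math import sqrt
from statistics import NormalDist, fmean


def confidence_interval(sample, sigma, confidence_level=0.95):
    """Return (lower, upper) for the mean, assuming sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * sigma / sqrt(len(sample))
    mean = fmean(sample)
    return mean - margin, mean + margin


# Hypothetical ratings from ten testers; sigma = 1.0 is an assumed value.
ratings = [4, 5, 3, 4, 4, 5, 2, 4, 5, 4]
low, high = confidence_interval(ratings, sigma=1.0)

print(f"sample mean = {fmean(ratings):.2f}")
print(f"95% CI      = ({low:.2f}, {high:.2f})")
```

With these numbers the sample mean is 4.00 and the 95% interval runs from about 3.38 to 4.62. In practice the population standard deviation is rarely known, so analysts commonly substitute the sample standard deviation and a t-distribution critical value, which this sketch does not cover.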
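Unless you’re comfortable with statistics, it might not be obvious from the equation above which statistical elements will stretch or shrink the interval. Here’s a breakdown: a higher confidence level means a larger z* and therefore a wider interval; a larger population standard deviation (σ) also widens the interval; and a larger sample size (n) narrows it, because the margin of error is divided by √n.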
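One way to see which inputs stretch or shrink the interval is to change them one at a time and compare the resulting widths. The baseline values below (95% confidence, σ = 1.0, n = 100) are arbitrary assumptions chosen only to make the comparison easy to read.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(confidence_level, sigma, n):
    """Full width of the interval: 2 * z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z_star * sigma / sqrt(n)


baseline = interval_width(0.95, sigma=1.0, n=100)

print(f"baseline (95%, sigma=1.0, n=100): {baseline:.3f}")
print(f"higher confidence level (99%):    {interval_width(0.99, 1.0, 100):.3f}")
print(f"more variable data (sigma=2.0):   {interval_width(0.95, 2.0, 100):.3f}")
print(f"larger sample (n=400):            {interval_width(0.95, 1.0, 400):.3f}")
```

Doubling σ doubles the width, while quadrupling the sample size only halves it, because n sits under a square root.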
How do you choose a confidence level for your data?
Let’s start off by saying there’s one confidence level you really shouldn’t choose, and that’s 100%. You would then have zero information as to where the true population parameter likely lies. The only way to have 100% confidence is to include all the possible values.
What we have here is a balance between precision and confidence. The more “confident” you are, the less precise your estimate will be. If you want to be 100% confident, then your confidence interval has to stretch to include every single foreseeable outlier. It’s like saying you’re 100% confident that the buried treasure is somewhere on Earth, which is no help at all. You could be 99% confident that it’s in Oregon, but there’s that pesky 1% that says it might be anywhere else — perhaps even at the bottom of the ocean.
Researchers often choose a confidence level of 95% because this offers a good precision-confidence balance. The chance of being “wrong” (or, more technically, the percentage of the time that the true population parameter will be expected to fall outside the confidence interval) would then be 5%, which statisticians usually find acceptable. (Hence the T-shirts emblazoned with phrases like “Statistics: The only field where you can be 95% sure and still be wrong.”)
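You can check the “wrong about 5% of the time” idea with a small simulation. This is a sketch, not a proof: the population mean of 4.0, standard deviation of 1.0, sample size of 50, and 2,000 repeated samples are all hypothetical values, and the ratings are drawn from a normal distribution for simplicity.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(7)  # fixed seed so the sketch is repeatable

TRUE_MEAN, SIGMA = 4.0, 1.0        # hypothetical population values
N, TRIALS, LEVEL = 50, 2_000, 0.95

z_star = NormalDist().inv_cdf(1 - (1 - LEVEL) / 2)
margin = z_star * SIGMA / sqrt(N)

hits = 0
for _ in range(TRIALS):
    # Draw a fresh sample and build a 95% interval around its mean.
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    mean = fmean(sample)
    if mean - margin <= TRUE_MEAN <= mean + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of {TRIALS} intervals contained the true mean")
```

The share of intervals that capture the true mean should land close to 95%, with the remainder missing it through nothing more than the luck of the draw.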
In the grab bag of confusing statistics terms, there’s another one that often gets tossed around in relation to confidence interval vs. confidence level: the significance level. This relates to a very important concept in statistics, which is statistical significance.
Let’s say that you’re comparing the average ratings by 200 Highlighters for two different brands of cat litter as part of a sensory test. With Litter Brand A getting an average rating of 4.5/5 and Litter Brand B getting an average rating of 3.7/5, what’s the likelihood that the rating difference was just a fluke, and that a re-test would show no real difference between the two brands?
This is the essence of statistical significance. There’s always a chance that the sample you selected just happened to have a really unbalanced distribution of tested values relative to the general population the sample is supposed to represent. However, greater degrees of skew get increasingly unlikely, so if you see a strong difference between the average litter brand ratings, then that difference is probably seen in the general population also.
With the Highlight product intelligence platform, your Scorecard shows the statistical significance of your results.
The significance level, written as α, is the probability of rejecting what’s called the “null hypothesis” when it’s actually true. In other words, it represents the risk of concluding that there’s a statistically significant difference when there isn’t one. (The null hypothesis is the one that says there’s no difference.)
We won’t get into a detailed discussion of confidence level vs. significance level here, but we’ll just say that comparing confidence intervals can help determine statistical significance.
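As a rough illustration of how a confidence interval can speak to significance, the sketch below revisits the cat litter example. Only the two average ratings (4.5 and 3.7) come from the example; the standard deviation of 1.0 for each brand and the treatment of the ratings as two independent groups of 200 are assumptions made purely so the arithmetic can run. It uses a significance level of α = 0.05, the counterpart of a 95% confidence level.

```python
from math import sqrt
from statistics import NormalDist

# From the example: average ratings of 4.5 and 3.7.
mean_a, mean_b = 4.5, 3.7

# Assumptions for illustration only: 200 independent ratings per brand,
# each with a standard deviation of 1.0 rating points.
n_a = n_b = 200
sd_a = sd_b = 1.0
alpha = 0.05  # significance level = 1 - confidence level (95%)

diff = mean_a - mean_b
std_err = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

# 95% confidence interval for the difference between the two means.
z_star = NormalDist().inv_cdf(1 - alpha / 2)
low, high = diff - z_star * std_err, diff + z_star * std_err

# Two-sided p-value for the null hypothesis of no difference.
p_value = 2 * (1 - NormalDist().cdf(abs(diff) / std_err))

print(f"difference = {diff:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
print(f"p-value = {p_value:.3g}, significant at alpha={alpha}: {p_value < alpha}")
```

Under these assumptions the interval for the difference sits well above zero, so “no difference” is not a plausible value and the result would count as statistically significant. With noisier ratings or far fewer testers, the same 0.8-point gap could produce an interval that includes zero.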
Let’s revisit the original conundrum: sampling the general population in a way that represents it as closely as possible. This is something Highlight has a lot of experience with. Highlight is constantly curating and cultivating an engaged, nationwide community of product testers who are motivated to give honest, thorough feedback.
Highlight users can build their ideal audience within the platform to reflect the demographics and psychographics of their target audience. If the user is a massive brand that’s in every grocery store in America, they might want their testing audience to reflect the general population nationwide. If they’re a maker of specialty products, they might want to test with a specialized, “low incidence rate” audience, such as women who are 38 weeks pregnant and open to taking post-partum supplements (read the case study!).
Because of considerations like confidence interval vs. level, it’s important to work with a consumer product testing company that enables you to 1) test a large-enough sample size, and 2) test with an audience that authentically represents your consumer.
Ultimately, the goal is to make good decisions based on authentic testing results, and authentic testing results can only come from well-constructed, representative samples!