6 Most Common Data Mining Mistakes
Six common data mining mistakes that lead to wrong conclusions. Avoid analysis pitfalls in your research.
Introduction
Many companies conduct surveys to collect mountains of data, which they then comb through separately to uncover new information. This practice is known as “data mining”. The basic idea is that the data can answer questions you may not have considered when you first created the dataset, and that the right analysis can uncover those answers. While data mining is a very valuable way to find potentially interesting information, it’s also prone to mistakes. The following are six very common data mining mistakes that you’ll need to avoid in order to improve the quality of your analysis.
Data Mining Mistakes
- Small Samples: One of the main problems with data mining is that every time you narrow down the data, you may be creating a sample that is too small to support accurate conclusions. For example, if you’re looking at demographic data and narrow down the results to women between the ages of 35 and 37 with a 20-year-old child living below the poverty level, the sample you have left is likely so small that any conclusions you draw could be based on random noise. Remember that any time you narrow down data, your sample shrinks with it, and you never want to draw conclusions from a sample that is too small to support them.
- Originally Problematic Data: Data mining is only effective if the data itself is reliable. An example: in baseball, defensive data is notoriously unreliable. It’s very difficult to figure out whether a player got a good jump on the ball, whether they ran the right route, and how many “runs” they saved or cost their team. When a player made an amazing catch, did they save a single? A double? A triple? Or did the catch only look amazing because the player was slow to reach the ball, when a normal fielder would have caught it easily? Defensive data is exceedingly prone to errors, and so are the statistics built to measure defense. Data mining often runs into similar problems.
- Overreacting to Results: Another common problem is overreacting to the results of your data mining efforts. Uncovering something within the data is only the first step. You still need to make sure that what you’ve uncovered is accurate and can be generalized. Often that means follow-up studies, other forms of evaluation, and more. The results of your analysis can be very exciting, but they are not the end of the road.
- Correlation and Causation: It’s also important to interpret what you find correctly, and a classic trap is confusing correlation with causation. You may find in the data that people with low incomes are less likely to shop at your company, but you cannot understand why until you investigate further. Is it something about your company? Is it because they’re low income? There is simply no way to know until you’ve conducted further studies.
- Being Closed-Minded: The purpose of data mining is to allow the data to speak to you. Often that means shutting off the instinct that wants to answer the question for you and paying attention to what the data actually shows. You need to be willing to accept what you find within the data, rather than fighting it or automatically assuming it is an anomaly.
- Asking Only Obvious Questions: When you’re going through the data, you should also be willing to ask unusual questions. Don’t simply query for the data that answers an easy question. Try to find out whether seemingly random information in the data may be meaningful, and don’t be afraid to use exploratory analysis techniques that make no judgments about the data in advance.
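To make the small-sample problem above concrete, here is a minimal Python sketch of how the uncertainty around a survey percentage grows as filters shrink the sample. The respondent counts are hypothetical, made up purely for illustration, and the function uses the standard normal-approximation margin of error for a proportion at roughly 95% confidence (z = 1.96). That approximation is itself unreliable for very small samples, which only reinforces the point.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical respondent counts remaining after each successive filter.
# These numbers are illustrative assumptions, not real survey data.
filters = [
    ("all respondents", 5000),
    ("women", 2600),
    ("women aged 35-37", 140),
    ("... with a 20-year-old child", 9),
]

for label, n in filters:
    # p_hat = 0.5 is the worst case (widest margin) for a proportion.
    moe = margin_of_error(0.5, n)
    print(f"{label:30s} n={n:5d}  margin of error: +/-{moe * 100:.1f} points")
```

With only a handful of respondents left, the margin of error spans tens of percentage points, so an apparent difference between groups can easily be noise. A simple safeguard is to check the remaining sample size after every filter and decide in advance the minimum count you are willing to report on.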
The Data Mining Process
Data mining can be an invaluable tool, but it’s also very difficult. It requires training, an excellent dataset, and the experience to tell useful information from noise. Don’t be afraid to try data mining large datasets, but don’t fall victim to the most common data mining mistakes.