Using Statistics for Data-Driven Decisions
The need to make a decision is a clear indication of uncertainty. Where there is uncertainty there is risk, and we all want to reduce that risk. Data analysis is one of the tools we use to reduce uncertainty.
In the digital era there is more data than we can handle, and everybody has access to some kind of data. Unfortunately, not everybody has the know-how to create and use data, so a lot of wrong insights and information are being created, intentionally or unintentionally. We need to make sure that we are not trying to validate preconceived ideas. Preconceived ideas create bias, and that bias can hurt your analysis. We analytics professionals, especially when we are young, might be pushed into this situation by our senior managers, but we need to be careful to do what we are supposed to do. We all have one top goal, which is to get as close to the TRUTH as possible. We all want to find SECRET PATTERNS that will help us understand our data set better. Don’t let others drive you onto the cheating path.
Data mining tries to find meaningful patterns, and statistics tries to confirm the patterns it finds. If you think about data and information, you will realize that they are changing all the time. Every second there is a new click, a new customer, a new sale and so on. When we draw a conclusion from the data we have, are we sure that we can draw the same conclusion from the new data that is coming in? That’s the dilemma…
One thing we always have to remember: we are always working with SAMPLE data, even if we are analyzing all traffic, all customers, the whole population. Each conclusion we draw from data belongs to that particular instant. We really don’t know whether the same conclusion will hold the next minute, the next day or the next year. With data analysis we are all trying to predict the future. So sometimes it also makes sense to work with smaller groups if we want to make predictions.
If you have no statistical background or education, you can start with simple statistics. I would start with the null hypothesis. It suggests that an observed difference is simply due to chance. The purpose of this technique is to make sure that we don’t automatically accept any difference at face value. We want to accept a difference only when chance alone is an unlikely explanation for it.
According to Explorable (https://explorable.com/null-hypothesis):
The null hypothesis (H0) is a hypothesis which the researcher tries to disprove, reject or nullify. The ‘null’ often refers to the common view of something, while the alternative hypothesis is what the researcher really thinks is the cause of a phenomenon.
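To make "simply due to chance" concrete, here is a small sketch in Python. The conversion numbers are made up for illustration, and the function name chance_differences is my own, not something from a library. The idea: if the page a visitor saw really made no difference, the labels "old page" and "new page" are interchangeable, so we can shuffle them many times and see how big a difference chance alone produces.

```python
import random


def chance_differences(group_a, group_b, n_shuffles=2000, seed=0):
    """Shuffle the group labels to see what differences chance alone produces."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        diffs.append(mean_a - mean_b)
    return diffs


# Hypothetical data: 1 = visitor converted, 0 = visitor did not convert.
old_page = [1] * 20 + [0] * 180  # 20 conversions out of 200 visitors
new_page = [1] * 30 + [0] * 170  # 30 conversions out of 200 visitors

observed = sum(new_page) / len(new_page) - sum(old_page) / len(old_page)
diffs = chance_differences(new_page, old_page)

# Share of shuffles whose difference is at least as large as the observed one.
# The tiny tolerance guards against floating point ties.
share = sum(abs(d) >= abs(observed) - 1e-12 for d in diffs) / len(diffs)

print(f"observed difference: {observed:.3f}")
print(f"share of shuffles with a difference at least this large: {share:.3f}")
```

If that share is large, a difference like the one we observed is nothing unusual under the null hypothesis, and we should not take it at face value.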
To quantify the evidence against the null hypothesis we use p values.
According to StatsDirect (http://www.statsdirect.com/help/basics/p_values.htm):
The P value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true – the definition of ‘extreme’ depends on how the hypothesis is being tested.
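If the p value is big, a difference like the observed one happens often by chance alone. This is what we expect when the null hypothesis is true, so the data are consistent with there being no real difference.
If the p value is small, the observed difference might be real, since a difference this large is not expected under the null hypothesis. For instance, if p is below 5%, the difference is called statistically significant at the 5% level, which corresponds to a 95% confidence level. This 95% is the confidence level, not the q value; a q value is a p value adjusted for multiple comparisons.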
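As a rough illustration of reading a p value, here is a minimal sketch in Python of a two-proportion z-test, one common way to compare two conversion rates. The numbers, the 5% threshold stored in ALPHA, and the function name two_proportion_p_value are my own assumptions for the example. It relies on a normal approximation, so treat it as a sketch for reasonably large samples rather than a definitive method.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p value for the difference between two conversion rates.

    Uses the pooled two-proportion z-test (normal approximation).
    Assumes the pooled rate is strictly between 0 and 1.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # assumed significance level (95% confidence level)

# Hypothetical tests: the same 10% vs 15% conversion rates at two sample sizes.
small_sample_p = two_proportion_p_value(20, 200, 30, 200)
large_sample_p = two_proportion_p_value(200, 2000, 300, 2000)

for label, p in [("200 visitors per page", small_sample_p),
                 ("2000 visitors per page", large_sample_p)]:
    verdict = "reject" if p < ALPHA else "do not reject"
    print(f"{label}: p = {p:.6f} -> {verdict} the null hypothesis")
```

The same difference in rates gives a big p value with few visitors and a small one with many visitors, which is why the size of a difference alone does not tell you whether to trust it.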
While Google Analytics helped everybody access data for their online assets, it also created wrong assumptions about analytics and talent. Some people who had access to Google Analytics started to describe themselves on LinkedIn as data analysts or analytics gurus. Since many senior managers are new to the concept of data, they kept hiring these people, who can only read things from Google Analytics, and expecting things from them.
When data analysis is not supported by statistics, you might have serious problems, or your results might not be significant at all. If you are a new analyst, or a manager who makes decisions based on data, you should check every important insight with statistical tests.
More to come…