We continue with the tools for analyzing and prioritizing problems, and today we are going to learn about one of the most used tools: The scatter diagram or scatter plot. How to Make a Scatter Plot in Excel? It is perhaps one of the graphs that you learn first in statistical training, so you already have an idea of its importance.
We are going to understand what a scatter diagram is, how it is made and of course, an application example to ensure learning.
What is a Scatter Plot in Excel
Table of Contents
Before answering this question, it is necessary to answer what is dispersion. The definition of dispersion has multiple answers, as wikipedia shows us: Dispersion . We are left with the mathematical definition:
Dispersion is defined as the degree of distance of a set of values from its mean value.
From this definition, the measures of dispersion that we learned in college statistics class are derived: Range, variance, deviation, covariance, correlation coefficient, etc.
Now, the scatter plot, also known as the scatter plot or graph correlation graph, consists of the graphical representation of two variables for a set of data. In other words, we analyze the relationship between two variables, knowing how much they affect each other or how independent they are from each other.
In this sense, both variables are represented as a point in the Cartesian plane and according to the relationship that exists between them, we define their type of correlation.
Correlation Types in a Scatter Plot
Based on the behavior of the study variables, we can find 3 types of correlation: Positive, negative and null.
It occurs when one variable increases or decreases and the other also increases, respectively. There is a proportional relationship. For example, for a car salesman, if he sells more cars (variable 1), he will earn more money (variable 2).
It occurs when one variable behaves in the opposite way or the other, that is, if one variable increases, the other decreases. There is a proportional inverse relationship. For example, for the construction of a building, the more workers are constructing a building (variable 1), the less time it will take to get it ready (variable 2)
If you don’t find a behavior between the variables, there is a null correlation.
These are, then, the most visible types of correlation. Although if we look at it from a perspective that evaluates how strong or weak the correlation is, we find another classification.
The correlation coefficient in a scatter plot
The correlation coefficient describes the relationship between two variables, in other words, knowing this number we know if the correlation is positive or negative and how strong or weak it is. The letter r is used to express it, let’s see how:
- r = 1
The correlation is perfect positive. If one variable grows, the other also grows at a constant rate. It is a direct relationship, so if we draw an adjustment line it will go through each and every one of the points.
- 0 <r <1
It is when r is between 0 and 1 without becoming 0 and 1. It is a positive correlation. The degree of closeness of 1 defines how direct and proportional the relationship between the two variables is, therefore the closer it is to 0, the weaker its negative correlation will be.
- r = 0
The correlation is null, that is, there is no linear relationship between both variables. What if you try looking for another type of relationship.
- -1 <r <0
It is when r is between -1 and 0 without becoming –1 and 0. It is a negative correlation. The degree of closeness to -1 defines how inverse and proportional the relationship is between both variables, therefore the closer it is to 0, the weaker its negative correlation will be.
- r = -1
The correlation is perfect negative. If one variable grows, the other will decrease in constant proportion. It is a direct and inverse relationship, therefore an adjustment line will touch all the plotted points.
A clearer example of all the aforementioned is shown by wikipedia in an image: Types of correlation coefficient
How to make a scatter plot step by step
- Step 1 : Determine what the situation is. If we do not understand what is happening, we will not be able to establish the variables to study.
- Step 2 : Determine the variables to study. If you have already determined the variables to study, it is because you believe that there may be a relationship between them that allows you to characterize the situation.
- Step 3 : Collect the data of the variables: If you already have it, perfect. If not, we define a period of time to get the data of the variables previously defined. Remember that the data of the two variables must be there in the same period of time.
- Step 4 : Locate the values on the respective axis. In general, the independent variable is one that does not influence by the other and is located on the x-axis . The dependent variable that is affected by the other variable is located on the y-axis. Thus, we proceed to locate the values in the Cartesian plane according to their variable (x, y)
- Step 5 : Determine the correlation coefficient: The correlation coefficient should reflect in the form that the scatter plot takes. It is the quotient of the covariance and the multiplication of the standard deviation of the two variables. With excel we can calculate it very simply.
- Step 6 : We analyze: Based on the coefficient and the graph, we define the relationship of the two variables and make the relevant decisions.
Scatter plot example
Let’s see from a business problem, a resolved example of a scatter diagram for the quality area.
Imagine that a lithographic company is opening a new production area for poster printing, and at this moment it is doing all the trials and tests to determine the amount of ink of each color that the machines should have.
As an initial test, they have decided to establish the ratio of printing errors according to the degree of filling of the ink containers of the machine.
Well, defined the situation, we start from step 2 :
The variables to study for this example of a quality scatter plot are:
- Ink quantity in liters
- Number of printing errors
For step 3 , we begin to collect the variables. In our case, the quality control department does 50 runs or tests for 5 continuous days.
The results, below:
For step 4 we place the axes according to the variables we have. Since the number of errors influence by the amount of ink, we place it as the y-axis. Therefore, the x-axis is the amount of ink. Now yes, we do the scatter plot.
Step 5 : We determine the correlation coefficient. In excel we calculate it with the formula COEF.DE.CORREL. For our worked example we get 0.94, is this reflected in the graph? Of course, yes, notice that the dots are very close to each other, which indicates that the values are strongly correlated, that is, the relationship between an increase in the liters of ink, directly impacts the number of errors in the poster printing. In fact it becomes clear if we look at the table, there are no big jumps between data if we look at the number of errors.
Step 6 : We analyze. Obviously there is a strong positive relationship between the amount of ink that the machine tube has with and the number of errors generated in the printing of posters. A next step for a problem of this type would be to find a way to take advantage of the remaining capacity of the machine, for example to use more and smaller tubes.
Hope you might have understood How to Make a Scatter Plot in Excel with easy steps. If you still facing problems in making scatter Plot in Excel then let us know in comment.