Curve Fitting In Excel For Mac
Section 1.5 Using Excel to find best-fit curves
Excel’s method for fitting an exponential trendline: “The exponential model creates a trendline using the equation $y = c e^{bx}$. Excel uses a log transformation of the original y data to determine fitted values, so the values of the dependent variable in your data set must be positive.”
Overview
In Sections 1.1 and 1.2 we looked at useful mathematical models and formulas that we anticipate seeing repeatedly in the business environment. If we are given equations that model the processes we are interested in, then this approach works. What happens, though, if we are not given equations? Many important functions in business are defined by data rather than by formulas. Examples include past sales, material costs, and consumer demand.
If we are given a data set, we can find a best fitting curve. A straightforward approach is to assume that the data represents the output of a nice formula. In real-life applications, so-called noise often complicates the situation. (For example, if we are looking at sales at a fast food restaurant, our model will have noise from traffic jams and bad weather outside.) For the purposes of this course we will assume that the data is reasonably nice, although some noise may be evident. The problem of producing a best fitting curve to data can be broken into two pieces:
We need to decide what kind of curve, or what model we want to use.
We want to be able to set the parameters (the constants) in the model to give the best fit.
Coming up with a theoretical reason why we want to use a particular model in a given case forms the content of a large number of your business courses, both courses you have already taken and courses you are yet to take. The models that come up repeatedly in the theoretical courses are given names and used without redoing the theoretical foundation for the model. (This is why we introduced the normal distribution and the logistic growth function, neither of which looks like a simple equation.) In this course, we will be happy with simple heuristic arguments on which model to choose.
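The second half of the problem is deciding how to choose the parameters to give the curve that does the best job of fitting the data. A moment of reflection shows that deciding on the correct definition of “best fitting” is a nontrivial task beyond the scope of this course. For the time being we will accept the standard definition, least squares: the best fitting curve is the one that minimizes the sum of the squares of the differences between the measured y-values and the y-values predicted by the curve.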
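In symbols, and assuming that least squares is the standard definition meant here, the quantity being minimized can be sketched as follows. The notation is ours rather than the text's: $(x_i, y_i)$ are the $n$ data points and $f$ is the candidate curve for a particular choice of parameters.

```latex
E = \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
```

Under this definition, the best fitting curve of a given model is the one whose parameters make $E$ as small as possible.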
We will come back to that definition later in the course, when we know more calculus, but for now we simply note that it is the standard definition, and is used by Excel. Instead, we will focus on using Excel to produce a best fitting curve of the appropriate model. Excel has a preprogrammed feature that will find the best fitting equation for a data set for a select number of functions:
Linear model
Exponential model
Polynomial model
Logarithmic model
Power model
We will show how to find an equation for a data set, assuming we know what model would be the best one to represent the data.
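To make the list of models concrete, here is a minimal Python sketch of how several of them reduce to an ordinary straight-line fit after a logarithmic change of variables. It illustrates the idea and does not describe Excel's internal code. The function names are our own, and we assume (it is not stated in this section) that the power and logarithmic trendlines are fitted on transformed data in the same spirit as the exponential one. The polynomial model is left out because it requires solving a small linear system.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar

def fit_exponential(xs, ys):
    """Fit y = c*exp(b*x) via ln(y) = ln(c) + b*x; needs every y > 0."""
    b, ln_c = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_c), b

def fit_logarithmic(xs, ys):
    """Fit y = a*ln(x) + b, a straight line in ln(x); needs every x > 0."""
    return fit_line([math.log(x) for x in xs], ys)

def fit_power(xs, ys):
    """Fit y = c*x**b via ln(y) = ln(c) + b*ln(x); needs x > 0 and y > 0."""
    b, ln_c = fit_line([math.log(x) for x in xs],
                       [math.log(y) for y in ys])
    return math.exp(ln_c), b
```

Because the exponential and power fits here minimize the squared error of ln(y) rather than of y itself, they can differ slightly from a direct least-squares fit of the untransformed data.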
Best fitting linear curves
For a first example, we are running a widget factory and have the following data on employee performance:
(A parenthetical note: In economics, widget is a placeholder name for a generic manufactured device. It is only in recent times that it has also become a small computer GUI unit.)
We would like a formula for widgets produced as a function of hours worked. Since the data contains two entries each for 36, 43, and 44 hours worked, no function can hit all of our data exactly. While we expect a linear function, we are not surprised to see random noise, as a worker may take a break or be particularly focused on a given day. We start by creating a scatterplot of our data.
We right-click (control-click on a Mac) on one of the data points to get a contextual menu, and select Add Trendline.
When adding a trend line, we need to select from a number of options. The first option concerns the mathematical model we want to choose. Given that we suspect the number of widgets produced will be roughly proportional to the hours worked, we want to use a linear model, so we make that choice. Under options, we want to display the equation on the chart.
We have added a linear trend line to the graph and can also see the equation for the line. We could use that equation to plan how many hours we want our workers on the job based on the number of widgets we expect to sell.
Having found a best fitting line, we want to copy the equation back into our spreadsheet so that we can compare the values in our data with the projections from our equation. You should notice that the equation Excel produces in the chart is written in standard mathematical notation, while the corresponding equation in cell B3 is in Excel notation. (In Excel notation we need a symbol for multiplication rather than simply putting a number and variable together. In Excel notation, we also use a cell reference, B1, rather than a variable, x.)
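The same comparison of data against projections can be sketched outside the spreadsheet. The Python below is only an illustration: the hours, widget counts, slope, and intercept are hypothetical stand-ins, not the table or the trendline equation from the text. As in Excel notation, the multiplication has to be written explicitly.

```python
# Hypothetical data, NOT the table from the text.
hours = [30, 36, 36, 40, 43, 44]
widgets = [180, 215, 222, 240, 258, 262]

# Hypothetical coefficients standing in for the equation shown on the chart.
SLOPE = 5.8
INTERCEPT = 8.0

def predict(h):
    # Math notation: y = 5.8x + 8. In code, as in an Excel formula such as
    # =5.8*B1+8, the multiplication symbol is required.
    return SLOPE * h + INTERCEPT

# One row per observation: hours, actual, predicted, residual (actual - predicted).
rows = [(h, w, predict(h), w - predict(h)) for h, w in zip(hours, widgets)]

for h, w, p, r in rows:
    print(f"{h:>5} {w:>7} {p:>9.2f} {r:>+8.2f}")
```

Residuals that are small and change sign without an obvious pattern suggest the line is a reasonable description of the data over the observed range.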
Checking and improving our equations
When finding the best fitting curve to data we have gathered, we need to pay attention to the model we have chosen and to the range to which we want to apply it. In our example, the linear fit looks pretty good. However, we should be careful about using it on too wide a domain. According to our model, a worker who works no hours produces 12.52 widgets a week, which is obviously silly. In the other direction, it predicts that a worker who worked 168 (= 7 × 24) hours a week would produce almost 970 widgets, instead of predicting a collapse from exhaustion.
The other issue is the choice of a model. We chose a linear model. An argument could easily be made for a proportional model. (A worker who works no hours produces no widgets.) We can switch to the proportional model by setting the y-intercept to 0 in options for the trend line. Then the equation is
instead of our original equation of
We should also be careful about trying to get a better fit by using an inappropriate model. In our case, we can get a better fit by allowing the curve to be a 6th-degree polynomial. However, the resulting equation does not make sense. It predicts that a worker will produce about a quarter of a million widgets in a 1-hour work week, and -1500 widgets in a 55-hour work week.
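Returning to the proportional model discussed above, the following Python sketch shows what forcing the intercept to 0 amounts to under a least-squares criterion: the slope becomes the sum of x·y divided by the sum of x². The data are hypothetical and the function names are ours. We assume, without having confirmed it, that Excel's set-intercept option computes the same quantity.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar

def fit_through_origin(xs, ys):
    """Least-squares slope m for the proportional model y = m*x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def sum_squared_error(xs, ys, m, b):
    """Sum of squared differences between the data and the line m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, NOT the table from the text.
hours = [30, 36, 36, 40, 43, 44]
widgets = [180, 215, 222, 240, 258, 262]

m_lin, b_lin = fit_line(hours, widgets)
m_prop = fit_through_origin(hours, widgets)

sse_lin = sum_squared_error(hours, widgets, m_lin, b_lin)
sse_prop = sum_squared_error(hours, widgets, m_prop, 0.0)
print(f"linear:       y = {m_lin:.3f}x + {b_lin:.3f}, SSE = {sse_lin:.2f}")
print(f"proportional: y = {m_prop:.3f}x, SSE = {sse_prop:.2f}")
```

The proportional fit can never have a smaller squared error on the data than the unrestricted line, since it is a special case of it. What it buys is a sensible prediction of zero widgets for zero hours.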
Fitting the Consumer Price Index (CPI) to a best fitting curve; an extended example
For our second example, we will look at the consumer price index and try to fit it to a model. This example will illustrate several issues we need to keep in mind when building models. We obtained data for the consumer price index from http://inflationdata.com/inflation/Consumer_Price_Index/HistoricalCPI.aspx.
The data from 1960 to 2011 is in the worksheet Section1-5-Examples.xlsx.
Since we expect prices to rise as a percentage of the current prices, we expect the CPI to be modeled by an exponential curve. We start by selecting the data, producing a scatterplot, and adding a best fitting curve using an exponential model. We will always select the option to show the equation on the chart.
This first attempt gives an exponential formula, but it is unsatisfactory for a number of reasons.
That constant only shows one significant digit, which is not enough to make meaningful predictions.
The font size is too small to easily read off the resulting equations.
The constant coefficient is ridiculously small because it gives the projected value of the index in the year 0. Another way of thinking about this is that the x-values at which we are evaluating this exponential function run in the thousands.
The graph does not look like a very good fit. The plot of the numbers actually looks as though it represents three different graphs.
We will work through the problems one at a time.
The first problem is that the equation Excel has given us does not have enough significant digits to make useful predictions. We right-click on the equation and select “Format Trendline Label”. This opens a dialog box that lets us choose formatting options. Since the leading coefficient is so small, we want the numbers formatted in scientific notation. We choose 4 digits beyond the decimal point in that notation.
This gives us a better equation. Note that the pictures in this book use the font option in the formatting dialog to set a larger font size.
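To see why the extra digits matter, here is a small Python sketch. The coefficients are hypothetical, not the values Excel reports for the CPI data, and `round_sig` is a helper of our own. It compares a prediction made with the full constant against one made with the constant rounded to a single significant digit.

```python
import math

def round_sig(value, digits):
    """Round value to the given number of significant digits."""
    return float(f"{value:.{digits - 1}e}")

# Hypothetical coefficients for y = c * exp(b * x); not taken from the chart.
c, b = 3.2187e-35, 0.0423
x = 2011

full = c * math.exp(b * x)
coarse = round_sig(c, 1) * math.exp(b * x)
rel_error = abs(coarse - full) / full

# Scientific notation with 4 digits beyond the decimal point.
print(f"c = {c:.4E}")
print(f"relative error from rounding c to one digit: {rel_error:.1%}")
```

Only the rounding of the constant is shown here. Because x runs in the thousands, rounding the exponent b in the displayed label can distort predictions even more.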
The next issue to deal with is adjusting the year. Looking at the raw data, the CPI was 100 sometime in 1983. Thus we simply add an extra column to our spreadsheet where the adjusted year is the current year minus 1983. In our graph, we also adjust the labels so a reader can still understand our chart.
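The effect of this shift on the exponential model can be worked out directly. In the sketch below, $t$ and $C$ are our own symbols for the adjusted year and the new constant, and $c$ and $b$ are the constants of the original fit $y = c e^{bx}$.

```latex
y = c\,e^{bx} = c\,e^{1983b}\,e^{b(x-1983)} = C\,e^{bt},
\qquad t = x - 1983, \quad C = c\,e^{1983b}
```

The exponent $b$ is unchanged, while the new constant $C$ is the model's value at $t = 0$, that is, in 1983. Since the index was about 100 in 1983, we expect $C$ to be a number of ordinary size rather than a tiny value that must be written in scientific notation.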
Now we want to look at the more serious question, the one that says the model does not fit very well. Looking at our data, the inflation rate seems to fall into roughly three blocks: the years before 1973, the years from 1973 to 1983, and the years after 1983. We would want to go back to our economics classes and find an argument that says this division of years is reasonable. Using the same menu that lets us add a trend line, we can edit the source data. We want to restrict to the years after 1983. In our case, that means restricting to rows 1 to 30.
This breaks the data into two pieces. The first piece is the period from 1983 till 2011. As we see, the exponential model fits quite well in that case.
The second piece is the period from 1973 till 1982. Once again, the exponential model fits quite well over that period. Notice that the exponent is quite different in the two periods.
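The idea of fitting each period separately can be sketched in Python. The series below is synthetic: it is generated from two made-up growth rates (0.08 through 1983 and 0.03 afterwards) and is not the CPI data. The fit uses the log-transformation approach, and the function names are ours.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar

def fit_exponential(xs, ys):
    """Fit y = c*exp(b*x) by a straight-line fit of ln(y) against x."""
    b, ln_c = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_c), b

def synthetic_index(year):
    # Made-up growth rates; the index is pinned to 100 in 1983.
    rate = 0.08 if year <= 1983 else 0.03
    return 100 * math.exp(rate * (year - 1983))

years = list(range(1973, 2012))
adjusted = [year - 1983 for year in years]   # adjusted year, as in the text
values = [synthetic_index(year) for year in years]

# Split into the two periods used in the text: 1973-1982 and 1983-2011.
early = [(t, v) for t, v in zip(adjusted, values) if t < 0]
late = [(t, v) for t, v in zip(adjusted, values) if t >= 0]

c_early, b_early = fit_exponential(*zip(*early))
c_late, b_late = fit_exponential(*zip(*late))
c_all, b_all = fit_exponential(adjusted, values)

print(f"1973-1982: y = {c_early:.2f} * exp({b_early:.4f} t)")
print(f"1983-2011: y = {c_late:.2f} * exp({b_late:.4f} t)")
print(f"all years: y = {c_all:.2f} * exp({b_all:.4f} t)")
```

On this synthetic series each piece recovers its own growth rate, while a single fit over all the years settles on a compromise exponent that matches neither period.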
The obvious question that arises is to figure out what happened in 1983 that caused the economic model to shift. That question is beyond the scope of this course.
Exercises: Using Excel to find best fit curves
Excel has a limited set of models that can be used for trend lines to automatically fit curves to data. In later sections we will look at how we can use calculus to find best fitting curves for other models. Until we develop those techniques, we can make a guess at parameters that will make curves fit.
Projects: