# What Are You Doing? Log-Linear Regression

For this month’s article, I offer an overview of a log-linear regression that I conducted for a client recently. While I find the mathematics and statistics that I used for this project fascinating in and of themselves, I do not want to get bogged down in the actual equations or any distracting minutiae. What I think is more valuable is exploring how the analysis was used as a tool to help drive a management decision about corporate strategy.

__The Situation__

We were recently approached by a client that wanted to overhaul its corporate strategy. Specifically, the client needed help setting appropriate list prices for its products and then managing the discounts and rebates offered to customers. As is common, the number and magnitude of those discounts and rebates had proliferated over time. The client now had to determine which discounts and rebates should be continued, which should be modified, and which should be scrapped. Most importantly, the client needed to know how these potential changes could affect its customer relationships and its bottom line.

__The Analysis__

As with most pricing projects, I began this one with a deep dive into the client’s transactional data. After cleaning the data and looking at a few pivot tables, I started creating visualizations to discover insights. Luckily, one of the simplest visualizations can also be one of the most insightful: the humble scatterplot.

A scatterplot generally displays two variables at a time, with one variable on the x-axis and the other variable on the y-axis. A scatterplot allows you to quickly spot relationships between variables. In pricing, the y-axis is often used for the unit price, and another marketing variable is placed on the x-axis. My challenge was to find the marketing variable that showed the strongest relationship with price. More precisely, I was looking for a marketing variable with a strong relationship to the net price, i.e., the actual price that a customer pays after accounting for on-invoice discounts and off-invoice rebates.
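To make the net price definition concrete, here is a minimal sketch in Python. The function name, the percentages, and the order in which the discount and rebate are applied are all assumptions made for illustration; an actual pricing waterfall may define these differently.

```python
def net_unit_price(list_price, on_invoice_discount_pct, off_invoice_rebate_pct):
    """Return the unit price the customer effectively pays.

    Assumption for this sketch: the discount is a fraction of the list
    price, and the rebate is a fraction of the invoiced (post-discount)
    price.
    """
    invoice_price = list_price * (1 - on_invoice_discount_pct)
    return invoice_price * (1 - off_invoice_rebate_pct)


# Hypothetical transaction: 100.00 list price, 10% discount, 5% rebate.
price = net_unit_price(100.0, 0.10, 0.05)
print(round(price, 2))  # 85.5
```

This per-unit figure is the value that would go on the y-axis of the scatterplots discussed here.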

I usually begin my scatterplots by looking for a relationship between net price and the amount of product ordered. A negative correlation is expected: as the amount of product ordered increases, the net price is expected to decrease. In this example, I opted to compare transactional net price to transactional revenue. Figure 1 shows the resulting visualization.

__Figure 1__

As you can see, there does appear to be a negative correlation, but the correlation is weak. So, I kept looking. After a few additional dead ends, I decided to check transactional net price against total annual customer revenue. My results are shown in Figure 2.

__Figure 2__

I found that not only was there a correlation, but it was the strongest one I had found yet. It appeared as though the sales team was factoring in each customer’s total annual business when negotiating that customer’s pricing. What’s more, this phenomenon was not confined to a single product. Enthused by my initial results for a few specific products, I generalized the approach and applied it to all products sold by our client by expressing each price as a percentage deviation from the product’s average price. As shown in Figure 3, the relationship held across the board.

__Figure 3__
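The normalization behind a chart like Figure 3 can be sketched as follows. The transactions are invented, and the use of a volume-weighted average price per product is an assumption; the project data and the exact averaging method are not reproduced here.

```python
from collections import defaultdict

# Hypothetical transactions: (customer, product, units, net unit price).
transactions = [
    ("A", "widget", 10, 9.00),
    ("B", "widget", 10, 11.00),
    ("A", "gadget", 5, 48.00),
    ("B", "gadget", 5, 52.00),
]

# Total annual revenue per customer (units * net price, summed).
annual_revenue = defaultdict(float)
# Revenue and volume per product, used for a volume-weighted average price.
product_revenue = defaultdict(float)
product_units = defaultdict(float)

for customer, product, units, price in transactions:
    annual_revenue[customer] += units * price
    product_revenue[product] += units * price
    product_units[product] += units

average_price = {p: product_revenue[p] / product_units[p] for p in product_units}

# One point per transaction:
# (customer annual revenue, % deviation of net price from product average).
points = [
    (annual_revenue[c], 100.0 * (price - average_price[p]) / average_price[p])
    for c, p, _, price in transactions
]

for revenue, deviation in points:
    print(f"{revenue:10.2f}  {deviation:+.1f}%")
```

Expressing price as a deviation from each product's own average is what allows products with very different price levels to share a single chart.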

The trendlines displayed in Figures 2 and 3 are the result of performing a linear regression to find the line of best fit. The line describes how the dependent variable changes, on average, as the independent variable changes. The astute reader may have noticed that for Figures 2 and 3, I switched the x-axis to a log scale. These visualizations use a log scale with a base of 10. That means that each major tick mark on the axis represents a value 10 times larger than the tick mark to its immediate left.
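For readers who want to see what "line of best fit" means in practice, the following is a minimal ordinary least squares sketch in plain Python. The function name `fit_line` and the sample points are made up for illustration; a spreadsheet or statistics package computes the same quantities.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y that the line accounts for.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up points with a downward trend.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 8.5, 8.0, 6.0]
slope, intercept, r_squared = fit_line(xs, ys)
print(slope, intercept)  # -1.25 11.25
```

A negative slope corresponds to the downward-sloping trendlines described above, and an R-squared closer to 1 indicates a tighter fit around the line.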

Why use a log scale? In Figure 1, you can see that the data points are clustered on the left side of the chart, close to the y-axis. When I reviewed the initial data for Figure 2, the points were likewise clustered on the left, and I was not able to find a strong correlation. Since no relationship was readily apparent on a linear scale, I tried a log scale. In many cases where the variable of interest ranges across several orders of magnitude, log scaling allows an analyst to visualize the data without everything getting crammed against one axis.
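The effect is easy to reproduce with a handful of invented numbers. In the sketch below, the revenue and price values are hypothetical, and `pearson_r` is a small helper written for this example. Because revenue spans several orders of magnitude, one large customer dominates the linear-scale correlation, while the base-10 log spaces the customers evenly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical customers: annual revenue from 1,000 to 10,000,000.
annual_revenue = [1e3, 1e4, 1e5, 1e6, 1e7]
net_price = [10.0, 9.0, 8.1, 6.9, 6.0]

r_linear = pearson_r(annual_revenue, net_price)
r_log = pearson_r([math.log10(x) for x in annual_revenue], net_price)

print(f"linear scale: r = {r_linear:.2f}")
print(f"log10 scale:  r = {r_log:.2f}")
```

With these made-up values, the correlation on the log scale is markedly stronger than on the linear scale, which mirrors the pattern described above.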

Using a log scale does slightly complicate performing a linear regression. By definition, a log scale is nonlinear. However, by taking the natural log of the customer annual revenue and using the result in a regression analysis, I was able to successfully complete a log-linear regression. While I do not wish to overburden the reader with all the details, the key takeaway is that transforming variables in this manner allows you to explore nonlinear relationships between variables.
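For readers who do want one equation, the model can be written as below. The symbols are my own notation for this sketch: $P$ is the net price (or its percentage deviation from the average), $R$ is the customer's annual revenue, and $a$ and $b$ are the fitted intercept and slope. Because only the x variable is logged, some textbooks label this form a linear-log or level-log specification.

```latex
\begin{aligned}
P &= a + b \,\ln R \\
P(10R) - P(R) &= b \,\ln 10 \approx 2.303\, b \\
b \,\ln R &= (b \ln 10)\,\log_{10} R
\end{aligned}
```

The second line says that a tenfold increase in annual revenue shifts the expected price by about $2.303\,b$. The third line shows that using the natural log in the regression while plotting on a base-10 axis is consistent: changing the base only rescales the slope and leaves the fitted line unchanged.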

__The Outcome__

We had determined that the largest single contributor to price variation was the annual revenue of the customer. Armed with this newfound insight, we worked with the client to create a commercial policy that formalized the behavior of the sales force into rules that would be applied to all customers. Naturally, we incorporated a rebate based on customer annual revenue into the policy.

Additionally, we had to work through some red herrings during the project. For instance, our client was certain that order size was affecting the net price negotiated by the sales team. However, we did not find a strong relationship in the data. The client also expected the price to be affected by the customer classification. Yet the analysis revealed that the customer classification had little impact on the net price.

__Final Thoughts__

Ultimately, this project was about more than doing fancy math and then showing our work to a client. Our final result was so powerful because we were able to give our client a more accurate view of their reality. Our client entered the process with an assumption about the relationship between price and order size. Not only were we able to demonstrate to the client that the assumed relationship was weak, but we were able to show the client a stronger relationship that did exist in the data.

Furthermore, we were able to turn this powerful insight into action because we used this information to drive decisions on the creation of a new commercial policy. Our role was not to critique our client’s historical interaction with customers. Rather, we analyzed our client’s data, found a strong correlation between variables that was previously unrecognized and unexplored, and then used that insight to guide a discussion regarding whether this pre-existing behavior should be formally incorporated into the new commercial policy.