# Economic Price Optimization through A/B Tests: Limited Applicability, Part 4 in our Series

Which is a better price: $15.99 or $16.49? From the customer’s perspective, the lower of the two is normally preferred, but that doesn’t mean it is the best price. Customer relationships need to be mutually beneficial. From the firm’s perspective, the higher of the two may be preferred. To balance these two perspectives, we have the profit equation of the firm and, in a limited number of situations, the ability to conduct an A/B experimental test and optimize prices.

In this missive, we describe an application of this approach similar to that taught to DePaul sophomore marketing students. This is followed by a discussion of its limitations and a positioning of this approach within the larger context of pricing.

### A/B Price Testing

In a simple design of an A/B experimental price test, the exact same product, promotional material, and distribution outlet are presented to two sample sets of customers drawn from a representative population of prospective customers. The price varies in a controlled manner between specific high and low points. One sample sees only the high price. The other sample sees only the low price. After the samples have been shown the offers, the firm measures how often each sample does and does not purchase.

If the sample shown the high price purchases more often than that shown the low price, executives can safely conclude that the higher price is more profitable. Otherwise, two key analyses need to be conducted: (1) a Chi-Squared test to determine whether the difference in purchase frequencies between the two samples is real or just a result of the inherent randomness of life, and (2) a profit analysis to determine whether the volume gains at the lower price outweigh the margin losses, which identifies the more profitable price.

The first analysis, the Chi-Squared test, is a statistical test of significance. While keeping it at a high level for the general businessperson audience, let me clarify briefly.

The Chi-Squared test estimates how likely it is that a difference in sales volumes at least as large as the one observed would arise purely from random variations between the samples, also known as random sample error, if price actually had no effect. If this probability (the p-value) is greater than 5%, a commonly used significance level, then executives generally conclude that the difference is meaningless. Otherwise, they generally conclude that the difference is statistically significant.

People don’t need to program a calculator or be an expert in statistics to conduct a Chi-Squared test. They can do it easily in something as simple as Microsoft Excel (Chi-Squared in Excel Online Tutorial), or they can turn to any of a number of statistical software packages.
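For readers who prefer a script to a spreadsheet, the test can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name and the purchase counts are hypothetical, and it computes the plain Pearson statistic without a continuity correction. A statistics package such as SciPy (`scipy.stats.chi2_contingency`) would normally be used instead.

```python
import math


def chi_squared_2x2(buy_high, n_high, buy_low, n_low):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows are the high-price and low-price samples; columns are
    purchases and non-purchases. Returns (statistic, p_value).
    Assumes at least one purchase and one non-purchase overall.
    """
    observed = [
        [buy_high, n_high - buy_high],
        [buy_low, n_low - buy_low],
    ]
    total = n_high + n_low
    row_totals = [n_high, n_low]
    col_totals = [buy_high + buy_low, total - buy_high - buy_low]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if price had no effect on purchasing.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With one degree of freedom, the chi-squared tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 50 of 1,000 buy at the high price, 80 of 1,000 at the low price.
stat, p = chi_squared_2x2(50, 1000, 80, 1000)
print(f"chi-squared = {stat:.2f}, p-value = {p:.4f}")
print("significant at 5%" if p <= 0.05 else "not significant at 5%")
```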

The second analysis, the profit analysis, enables executives to identify which price delivers the greater profit.

Let us call the high price PH and the low price PL. Similarly, let us call the frequency of purchases by the sample shown the high price %QH and the frequency of purchases by the sample shown the low price %QL. Finally, let V be the variable costs associated with the product. Since the product, promotion, and distribution are the same for both samples, V doesn’t change.

Mathematically, if %QL (PL-V) > %QH (PH-V), then the lower price is more profitable. Otherwise, the higher price is superior.
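Rearranging the inequality shows how much more often the low-price sample must purchase before the lower price wins: %QL / %QH must exceed (PH - V) / (PL - V). The sketch below works this through for the two prices in the opening question. The variable cost of $10.00 and the function names are assumptions chosen purely for illustration.

```python
def contribution_per_prospect(purchase_rate, price, variable_cost):
    """Expected contribution from one prospect shown the offer: %Q * (P - V)."""
    return purchase_rate * (price - variable_cost)


def breakeven_rate_ratio(price_high, price_low, variable_cost):
    """Ratio %QL / %QH at which the low price exactly matches the high price.

    Rearranged from %QL (PL - V) > %QH (PH - V); requires PL > V.
    """
    return (price_high - variable_cost) / (price_low - variable_cost)


# Hypothetical inputs: the prices from the opening question and an assumed V.
PH, PL, V = 16.49, 15.99, 10.00
ratio = breakeven_rate_ratio(PH, PL, V)
print(f"The low price wins only if its purchase rate exceeds {ratio:.3f}x the high-price rate")

# Example comparison with assumed purchase rates of 5% (high) and 6% (low).
high = contribution_per_prospect(0.05, PH, V)
low = contribution_per_prospect(0.06, PL, V)
print("lower price is more profitable" if low > high else "higher price is superior")
```

The closer V sits to PL, the larger the volume gain the lower price must deliver, since each unit sold at PL then contributes proportionally less.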

Importantly, the profit analysis is contingent upon the Chi-Squared test. If the p-value of the Chi-Squared test is greater than 5%, then any variation between %QL and %QH is treated as meaningless and the firm should choose the higher price. Only when the statistical analysis has ruled out random sample error as the culprit behind observed purchase frequency differences can the profit analysis be conducted with any level of decision reliability.
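The two-step rule described above can be written as a short decision function. This is a sketch under the stated rule only: the function name, argument names, and example figures are hypothetical, and the p-value is assumed to have been computed separately by a Chi-Squared test.

```python
def choose_price(p_value, rate_high, rate_low, price_high, price_low,
                 variable_cost, alpha=0.05):
    """Return "high" or "low" following the two-step rule in the text."""
    # Step 1: if the difference in purchase rates could plausibly be chance,
    # treat the rates as equal; the higher price then carries the higher margin.
    if p_value > alpha:
        return "high"
    # Step 2: the difference is statistically significant, so compare
    # expected contribution per prospect, %Q * (P - V), at each price.
    profit_high = rate_high * (price_high - variable_cost)
    profit_low = rate_low * (price_low - variable_cost)
    return "low" if profit_low > profit_high else "high"


# Hypothetical cases with an assumed variable cost of $10.00.
significant_case = choose_price(0.007, 0.05, 0.08, 16.49, 15.99, 10.00)
inconclusive_case = choose_price(0.49, 0.05, 0.06, 16.49, 15.99, 10.00)
print(significant_case, inconclusive_case)
```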

### Simple, But Limited

This approach is relatively simple to execute and analyze, and it leads directly to a recommendation, but it is also highly limited in its applicability. It often requires well over a thousand experimental runs before any reliable pricing decision can be made. Moreover, these experimental runs must all be completed in a relatively short time period so that executives can rule out exogenous factors interfering with the experimental control.

For example, let us consider a relatively typical scenario. Executives are considering a 10% price reduction on an item. Based on experience, these executives know that only 5% of the people who see their offer purchase. The other 95% don’t, perhaps because they were just curious about the market, checking availability, collecting a budgetary estimate, or, in a minority of situations, comparison shopping. And even the comparison shoppers aren’t necessarily all price sensitive; some were comparing differences in the whole customer experience (utility).

They may expect lower prices to be associated with higher sales volumes, but even assuming a slightly high market elasticity of 2 (for a refresher on elasticity of demand, see part 2 of this series), the 10% price reduction can only be expected to drive a 20% sales volume increase. Doing that math, we see that this is equivalent to expecting a 10% price cut to increase the purchase frequency from 5% to 6%. Detecting a 1 percentage point (10% x 2 x 5%) difference in purchase rates isn’t easy.
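The arithmetic behind the move from 5% to 6% can be sketched as follows. The function name is hypothetical, and the calculation assumes the simple linear approximation used in the text, in which the percentage change in volume equals the elasticity times the percentage change in price.

```python
def expected_purchase_rate(base_rate, price_change, elasticity):
    """Purchase rate after a price change, using a linear elasticity approximation.

    price_change is a signed fraction (-0.10 for a 10% cut). elasticity is
    given as a positive magnitude, as in the text.
    """
    volume_change = -elasticity * price_change  # a 10% cut at elasticity 2 gives +20%
    return base_rate * (1 + volume_change)


base_rate = 0.05
new_rate = expected_purchase_rate(base_rate, price_change=-0.10, elasticity=2)
print(f"purchase rate moves from {base_rate:.1%} to {new_rate:.1%}")
print(f"difference to detect: {(new_rate - base_rate) * 100:.1f} percentage point(s)")
```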

If each sample had only 100 customers in it, that would mean a difference of 6 purchases versus 5. That evidence alone would not suffice in convincing most rational business leaders that the lower price is more profitable.

Even 1,000 experimental runs, 500 per sample, would show a difference of only 30 purchases at the low price versus 25 at the high price, which a Chi-Squared test cannot distinguish from chance. For observed purchase rates of 5% and 6% to clear the 5% significance level, each sample would need roughly 4,000 customers.

Worse, if the executives were considering a smaller difference, such as $15.99 vs. $16.49, they may need tens of thousands of experimental runs to come to a reliable decision.
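The sample-size arithmetic in this example can be checked directly. The sketch below assumes two samples of equal size whose observed purchase rates come out exactly at the expected values, and it finds the smallest per-sample size at which the Pearson Chi-Squared statistic reaches 3.841, the 5% critical value for one degree of freedom. The function names are hypothetical. It also ignores statistical power: observed rates vary from run to run, so a formal power analysis would call for more runs than this minimum.

```python
import math

CRITICAL_VALUE = 3.841  # chi-squared critical value, 1 degree of freedom, 5% level


def chi_squared_equal_samples(rate_high, rate_low, n_per_sample):
    """Pearson chi-squared statistic for two samples of n_per_sample customers
    whose observed purchase rates are exactly rate_high and rate_low."""
    pooled = (rate_high + rate_low) / 2
    return n_per_sample * (rate_low - rate_high) ** 2 / (2 * pooled * (1 - pooled))


def min_sample_size(rate_high, rate_low):
    """Smallest per-sample size at which the observed rates reach significance."""
    pooled = (rate_high + rate_low) / 2
    n = CRITICAL_VALUE * 2 * pooled * (1 - pooled) / (rate_low - rate_high) ** 2
    return math.ceil(n)


# The scenario from the text: 5% purchase at the high price, 6% at the low price.
stat_at_500 = chi_squared_equal_samples(0.05, 0.06, 500)
n_needed = min_sample_size(0.05, 0.06)
print(f"statistic with 500 per sample: {stat_at_500:.2f} (needs {CRITICAL_VALUE})")
print(f"minimum customers per sample: {n_needed}")
```

The required size scales with the inverse square of the gap in purchase rates, so halving the gap roughly quadruples the number of runs needed.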

While running thousands of experiments in a day may be reasonable for fast-moving consumer goods, it presents a challenge for most products sold in business markets. In fact, it presents an insurmountable challenge for a wide host of long-tail (infrequently purchased) products. Purchase decisions about fiberglass roving, apartment units at a specific address, and obscure textbooks just aren’t made that frequently.

### Context

Economic price optimization through A/B tests is probably the simplest applied approach to using economic price optimization reliably. The approaches explored in *Price Optimization with Globally Linear Demand — Both Useful and Useless* and *Economic Price Optimization with Locally Measured Elasticity of Demand — Unreliable* are overly simplistic and suffer from unrealistic assumptions about the market’s response function. The approach explored in *Economic Price Optimization Part 3 – Mental Models Matter* more accurately describes the market’s response function but generally relies on market research data, not historical data.

But still, this approach can’t be used very often. So what should an executive do?

First, begin by deciding whether making pricing decisions based on historical data is the right approach. Market research approaches are known to be superior for pricing many products and services, especially in cases where the company is making the product or delivering the service itself, when the offering is differentiated, or when the product is new. Moreover, through market research executives can uncover actions which improve their pricing power.

Second, if managers really believe pricing based on historical data is appropriate for their market, then more sophisticated econometric approaches can be used. Numerous consulting firms and software vendors have developed algorithms and data-collection solutions to serve this need, and many executives have reported positive results.

In other words, yes, executives can determine which price is better, $15.99 or $16.49, using economic price optimization through an A/B test, but only under certain circumstances, only after conducting a controlled experiment, and only after performing the appropriate statistical tests. Even then, it may not be a clear-cut decision. Very limited in its application indeed.