In 1933, William R. Thompson published an article on a Bayesian model-based algorithm that would ultimately become known as Thompson sampling. This heuristic was largely ignored by the academic community until recently, when it became the subject of intense study, thanks in part to internet companies that successfully implemented it for online ad display.
Thompson sampling chooses actions for addressing the exploration-exploitation in the multiarmed bandit problem to maximize performance and continually learn, acquiring new information to improve future performance.
In a new study, “Online Network Revenue Management Using Thompson Sampling,” MIT Professor David Simchi-Levi and his team have now demonstrated that Thompson sampling can be used for a revenue management problem, where demand function is unknown.
Incorporating inventory constraints
A main challenge to adopting Thompson sampling for revenue management is that the original method does not incorporate inventory constraints. However, the authors show that Thompson sampling can be naturally combined with a classical linear program formulation to include inventory constraints.
The result is a dynamic pricing algorithm that incorporates domain knowledge and has strong theoretical performance guarantees as well as promising numerical performance results.
Interestingly, the authors demonstrate that Thompson sampling achieves poor performance when it does not take into account domain knowledge.
Simchi-Levi says, “It is exciting to demonstrate that Thompson sampling can be adapted to combine a classical linear program formulation, to include inventory constraints, and to see that this method can be applied to general revenue management problems in the business-to-consumer and business-to-business environments.”
Industry application improves revenue
The proposed dynamic pricing algorithm is highly flexible and is applicable in a range of industries, from airlines and internet advertising all the way to online retailing.
The new study, which has just been accepted by the journal Operations Research, is part of a larger research project by Simchi-Levi that combines machine learning and stochastic optimization to improve revenue, margins, and market share.
Algorithms developed in this research stream have been implemented at companies such as Groupon, a daily market maker, Rue La La, a U.S. online flash sales retailer, B2W Digital, a large online retailer in Latin America, and at a large brewing company, where Simchi-Levi and his team optimized the company’s promotion and pricing in various retail channels.