The New York Times integrates data science not only into its digital business but also into its print operations. Sending an optimal number of newspapers to each of our sales locations is a long-standing problem that we are newly addressing with a modeling and experimentation platform deployed on Google Cloud services.
Our models combine custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. In particular, we probabilistically account for censored data (as demand is unknown when the paper sells out) and perform a constrained optimization to maximize profit while minimizing any decrease in circulation.
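The censoring issue can be made concrete with a small sketch. The abstract does not specify the actual model, so the Poisson demand assumption, function names, and data below are purely illustrative: when a store sells out, sales only tell us that demand was at least the stock level, so the likelihood uses a survival term for those days instead of the exact count.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

def censored_poisson_loglik(lam, sales, stock):
    """Log-likelihood of Poisson demand observed through a stock limit.

    When sales < stock, demand is fully observed (demand = sales).
    When sales == stock (a sellout), we only know demand >= stock,
    so those observations contribute P(demand >= stock) instead.
    """
    sold_out = sales >= stock
    ll = poisson.logpmf(sales[~sold_out], lam).sum()
    # P(D >= stock) = survival function evaluated at stock - 1
    ll += poisson.logsf(stock[sold_out] - 1, lam).sum()
    return ll

def fit_demand(sales, stock):
    """Maximize the censored likelihood over the demand rate lambda."""
    res = minimize_scalar(
        lambda lam: -censored_poisson_loglik(lam, sales, stock),
        bounds=(1e-6, max(stock.max(), 1) * 10),
        method="bounded",
    )
    return res.x
```

Ignoring the censoring (fitting the mean of raw sales) would bias demand estimates downward at exactly the stores where stock is tightest; the survival term corrects for that.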
The algorithms are tested using paired treatment and control stores, in which we can directly compare profits and sales. This "single copy" modeling must be executed regularly and robustly, as it drives our weekly sales in many stores throughout the country; these concerns informed our recent redesign of the system as part of our company's move to Google Cloud Platform. This is one of the group's longest-running projects, and I will share some surprising lessons we've learned along the way.
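One plausible way to analyze such a paired design is a paired difference test: matching each treatment store to a similar control store removes variation shared within a pair. The abstract does not describe the actual analysis, so this function and the simulated data in its usage are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import ttest_rel

def paired_store_test(treatment_profit, control_profit):
    """Paired comparison of profit across matched store pairs.

    Each index i corresponds to one treatment/control pair, so the
    test operates on within-pair differences rather than raw profits.
    Returns the mean profit lift and the paired t-test p-value.
    """
    treatment_profit = np.asarray(treatment_profit, dtype=float)
    control_profit = np.asarray(control_profit, dtype=float)
    diffs = treatment_profit - control_profit
    stat, pvalue = ttest_rel(treatment_profit, control_profit)
    return diffs.mean(), pvalue
```

Because store-level effects (location, foot traffic) are shared within a pair, the paired test can detect a small profit lift that an unpaired comparison of the same stores would drown in between-store noise.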
Anne is a senior data scientist at The New York Times, where she builds models and data products to interpret and act on the company's data with a variety of groups, including marketing, print circulation, and the newsroom. Before becoming a data scientist, she was an astrophysics and cosmology postdoc in Barcelona and Munich, and she received her Ph.D. in physics from Yale. She enjoys building systems to turn intractable amounts of data into usable insights.