How to deal with the seasonality of a market?

Marguerite Graveleau
Published in Lyft Engineering · Nov 14, 2018 · 8 min read


At Lyft, we want to ensure that over the next few weeks there will be enough drivers for every passenger to get to their destination on time, and also enough passengers for drivers to be able to work whenever they want to; in other words, that the market will be balanced, providing the best transportation to everyone. Lyft has built many tools and bonuses to incentivize drivers and passengers to use Lyft more often or at specific times. But can we predict a few weeks in advance when we will need to launch this machinery, and whether it will be enough to close the gap between drivers and passengers?

How can we predict daily demand and supply a few weeks in advance? Before predicting a raw time series, we need to understand how people ride and drive, and what affects their patterns of behavior: what we call seasonality. Only then can we predict the underlying evolution of the trend, the overall growth of driver hours and passenger ride requests. Seasonality patterns result from various phenomena, whether it is the work / home / weekend balance, recurring holidays or events, or the effect of the seasons themselves (summer vs. winter), and each needs to be dealt with differently.

Work / Home / Weekend balance: the weekly seasonality

We don’t move around a city the same way during the weekend as during the week. Depending on which city you live in, the effect of weekly seasonality varies, but typically there will be more rides on the weekend than during the week. Evaluating these effects precisely, however, can be tricky.

Daily rides in a city, and its underlying trend.

The seasonality is defined as a multiplicative coefficient relative to the underlying trend (the long term evolution).

rides = trend * (1 + seasonality)

Equivalently, we can take the log of the time series, which turns the multiplicative seasonality into an additive one.

log(rides) = log(trend) + additive seasonality
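
As a purely hypothetical example: if the trend is 10,000 rides per day and Saturday carries a +20% seasonality, then rides = 10,000 * 1.2 = 12,000, and in log space log(12,000) = log(10,000) + log(1.2), so the additive seasonality for Saturday is log(1.2) ≈ 0.18.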

An easy first attempt at evaluating the weekly seasonality is to guess the trend (with a rolling average over the past 7 days, for example) and then average the day-of-week effects; a quick sketch of this naive approach is shown after the list below. However, this approach can fail or be inaccurate:

  • First, the time series is not affected only by weekly seasonality: drops or peaks can be due to holidays, or to an unexpected weather event like a big snowstorm.
  • Second, the weekly seasonality can vary over the years. The people using Lyft three years ago are different from today’s users, and may use it for different reasons, for example as a regular daily commute.
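
To make this concrete, here is a minimal sketch of the naive approach, assuming rides is a daily pandas Series indexed by date (the function and its structure are illustrative, not Lyft's production code):

```python
import numpy as np
import pandas as pd

def naive_weekly_seasonality(rides: pd.Series) -> pd.Series:
    """Naive additive weekly seasonality: rolling-average trend + day-of-week averages."""
    log_rides = np.log(rides)
    # Guess the trend with a centered 7-day rolling average of the log series.
    log_trend = log_rides.rolling(7, center=True).mean()
    residual = log_rides - log_trend
    # Average the residuals per day of week to get the additive weekly seasonality.
    return residual.groupby(residual.index.dayofweek).mean()
```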

Instead, we use a very common and powerful model, the Kalman Filter.

Kalman Filters

Kalman Filters are a powerful tool for estimating the hidden state of a system when we only have access to measurements containing inaccuracies or errors. The estimate combines the prior state from the previous step with the current measurements. For example, a Kalman Filter can be used to estimate the position of a car based on its GPS signal: the position of the car at time t is a combination of its prior estimates of position and speed at t-1, and of the current GPS measurement of position (which can be inaccurate or contain random errors). For more details on Kalman Filters, check out this class.

States and observations in a Kalman system

With the notation from the figure above, Kalman Filters are defined as follows:
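
In a generic formulation (the exact notation in the figure may differ), the hidden state x(t) evolves linearly with Gaussian noise, and we only observe a noisy linear measurement z(t):

x(t) = F · x(t-1) + w(t),    with w(t) ~ N(0, Q)
z(t) = H · x(t) + v(t),      with v(t) ~ N(0, R)

where F is the state transition model, H the observation model, and Q and R the process and observation noise covariances.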

With those definitions, Kalman Filters can be applied to the car movement we just described, but also to the weekly seasonality of a time series. For each day of the week, we assume the observed value on that day can be decomposed into the level of the given week (the trend described in the graph above) and the specific seasonality of that day within the week.
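
One common way to write this (a generic structural time series formulation, not necessarily the exact production model) is:

log(rides)(t) = level(t) + s(day_of_week(t)) + noise(t)
level(t) = level(t-1) + level_noise(t)

where the hidden state contains the weekly level level(t) and the seven day-of-week effects s(·), conventionally constrained to sum to approximately zero. The Kalman Filter estimates this hidden state from the observed daily rides.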

After defining the problem, its state transition, and its observation model, the Kalman Filter can be tuned, which means fitting the unknown parameters of the model. The Kalman Filter is tuned iteratively by going through the time series, so it needs one more thing before tuning: a first guess of the initial state, for example the observations during the first week.
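
As an illustration, here is a minimal sketch using statsmodels' structural time series model (a local level plus a day-of-week seasonal component), which runs a Kalman Filter under the hood; it assumes rides is a daily pandas Series and is not Lyft's production code:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def weekly_deseasonalize(rides: pd.Series) -> pd.Series:
    """Fit a local level + day-of-week seasonal model and remove the seasonality."""
    log_rides = np.log(rides)
    model = sm.tsa.UnobservedComponents(log_rides, level='local level', seasonal=7)
    result = model.fit()  # fits the noise variances by maximum likelihood
    seasonal = result.seasonal.smoothed  # smoothed additive day-of-week component
    # Remove the seasonal component and go back to the original scale.
    return np.exp(log_rides - seasonal)
```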

After tuning the Kalman Filter, we can use it to evaluate the state of our system every week, giving us the additive weekly seasonality. By removing this seasonality, we obtain the weekly-deseasonalized time series:

Daily Rides in a city (light purple) and deseasonalized time series (purple)

As we can observe, the weekend peaks are captured very well. But some peaks and dips are still unexplained: those are due to holidays, local events, or random weather conditions. Although a snowstorm cannot be forecasted weeks in advance, we should have a pretty good idea of what Christmas is going to look like, based on what happened in past years.

Holidays and Events effects

Now that we have the weekly effects out of the way, we can focus on the upcoming holidays and events in each city: Halloween is coming up pretty fast, and before that students are starting classes again in big college cities, which is going to drive the numbers up. Once again, we look for the additive seasonality coefficients describing how much each recurring phenomenon affects our time series:

log(rides) = log(trend) + additive seasonality

Halloween and classes starting actually have very different effects on people’s transportation behavior. On one hand, Halloween’s effect, similar to a marathon or other holidays, is very narrow in time: it is concentrated on the Friday and Saturday nights when people use Lyft to go celebrate and come back home safely. On the other hand, students going back to school have a long-term effect spanning several months, the opposite of the summer, when the population of a college city drops.

Modeling holidays and yearly effects is based on the very simple idea that they will impact the time series in the same way year after year. Using the definition above, we fit seasonality components on the residual Z(t) = log(rides) - log(trend). Short, localized events are modeled by a few indicator data points, whereas yearly seasonality (winter vs. summer, classes starting, etc.) is modeled by a time series described by a Fourier decomposition.

The linear model is fitted by minimizing the sum of squared errors, with a quadratic penalty.
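
Written out, this is a weighted ridge-style objective of the (generic) form:

beta_hat = argmin over beta of  || W · (Z - H · beta) ||^2  +  lambda · || beta ||^2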

where H is the matrix containing the time indicators and the Fourier decomposition. You will recognize an optimization problem very similar to ridge regression. However, a few tricks were added: each time series is very different from the others, and we need more flexibility than a regular ridge regression to truly adapt to the time series we are modeling:

  • H can be modified (Hreg) to account for the decreasing impact of seasonality over time, due to the growth of the market.
  • W is a diagonal matrix; when a diagonal value is set to 0, it cancels the corresponding term in the regression. This is very useful when we want to ignore a period of time, for example during a hurricane.
  • Finally, lambda is a penalty term that can be adapted to each component, for example putting a bigger penalty on the yearly effects than on the holidays.
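
To make the mechanics concrete, here is a minimal numpy sketch of such a fit. The input names (holiday_dummies, day_of_year) and the single scalar penalty are illustrative assumptions rather than Lyft's actual implementation; in practice the penalty can be a diagonal matrix with a different value per component:

```python
import numpy as np

def fit_seasonality(z, holiday_dummies, day_of_year, n_harmonics=3,
                    weights=None, lam=1.0):
    """Weighted ridge fit of holiday + yearly (Fourier) seasonality on the residual z.

    z:               residual log(rides) - log(trend), shape (T,)
    holiday_dummies: 0/1 indicators for each holiday's affected days, shape (T, n_holidays)
    day_of_year:     day of year of each observation, shape (T,)
    """
    T = len(z)
    # Fourier decomposition for the smooth yearly seasonality.
    fourier = []
    for k in range(1, n_harmonics + 1):
        fourier.append(np.sin(2 * np.pi * k * day_of_year / 365.25))
        fourier.append(np.cos(2 * np.pi * k * day_of_year / 365.25))
    H = np.column_stack([holiday_dummies] + fourier)
    # W is diagonal; setting a weight to 0 ignores that day (e.g. during a hurricane).
    w = np.ones(T) if weights is None else weights
    # Closed-form weighted ridge solution: beta = (H' W H + lambda I)^-1 H' W z
    HtW = H.T * w
    beta = np.linalg.solve(HtW @ H + lam * np.eye(H.shape[1]), HtW @ z)
    return beta
```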

After setting all the parameters and fitting the seasonality coefficients, we obtain the holiday and yearly seasonality effects.

A big part of understanding seasonality is knowing what to expect. The local teams in each city know what events affect Lyft’s users, and their input is key to ensuring that the seasonality captures all the recurring changes in the markets.

After incorporating both weekly and yearly seasonality, the residual time series represents the underlying evolution of our market.

Forecasting, to be continued …

Now that we have removed the effects of seasonality, we can focus on forecasting the evolution of the trend. First of all, the trend is computed by smoothing the deseasonalized time series. To do so, we use a Kalman Filter again, this time modeling the trend and its growth.

Daily rides (purple), deseasonalized daily rides (red), and its trend (black).
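
As a minimal sketch of this smoothing step, here is a local linear trend model (a hidden level plus a hidden slope) fitted with statsmodels; this is illustrative only, not the production implementation:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def smooth_trend(deseasonalized: pd.Series) -> pd.Series:
    """Smooth the deseasonalized daily series with a local linear trend model."""
    model = sm.tsa.UnobservedComponents(np.log(deseasonalized),
                                        level='local linear trend')
    result = model.fit()
    # Smoothed estimate of the hidden level, back on the original scale.
    return np.exp(pd.Series(result.level.smoothed, index=deseasonalized.index))
```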

Forecasting the trend can then be done directly from the Kalman Filter estimates. But the trend of the market is affected by many other factors: the number of new drivers / passengers, marketing campaigns, changes in pricing, etc. A machine learning model is not particularly well equipped to deal with raw time series, which is why other techniques are used to compute the seasonality. However, it can do a great job of incorporating external indicators, and that’s why we built a machine learning model to predict the evolution of the trend.

Stay tuned on the Lyft blog for part 2 of this article, with more details on trend computation and trend forecasting.

Conclusion

Predicting holiday and seasonal impacts is as key to understanding the evolution of the markets as predicting their trend. It allows Lyft to anticipate unbalanced market conditions and make plans to avoid those situations, using all the different tools our growth teams have built.

Further reading

If you enjoyed this post, follow and recommend! To learn more, check out our other Science posts!

As always, Lyft is hiring! If you’re passionate about developing state of the art machine learning models or building the infrastructure that powers them, read more about our Research Science and Engineering roles and reach out to me!

This post would not have been possible without the help and work of Su Wang. Many thanks!
