Implementation of Marketing Mix Modeling

This project is based on the PyMC implementation of the MMM presented in the paper Jin, Yuxue, et al. “Bayesian methods for media mix modeling with carryover and shape effects.” (2017). Here we work on simulated data so that we set the parameters ourselves and can conduct a parameter recovery exercise. The data generation process is an adaptation of the blog posts “Media Effect Estimation with PyMC: Adstock, Saturation & Diminishing Returns” and “Mastering Marketing Mix Modelling In Python”. It also uses the following sources as references:

🔎 https://github.com/sibylhe/mmm_stan/tree/main
🔎 https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_example.html

1. Introduction

Marketing mix models or Media mix models are used to understand how media spending affects sales, so as to optimize future budget allocation.

These models are usually based on weekly or monthly aggregated national or geo level data. The data may include sales, price, product distribution, media spend in different channels, and external factors such as macroeconomic forces, weather, seasonality, and market competition.

ROAS (return on ad spend) and mROAS (marginal ROAS) are the key metrics to look at. A high ROAS indicates that the channel is efficient; a high mROAS means that increasing spend in the channel will yield a high return at the current spending level.
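
As a rough illustration of the two metrics, the sketch below computes ROAS as attributed sales divided by spend, and approximates mROAS as the extra sales from a small spend increase divided by that increase. The `response` curve, its parameters, and the spend level are hypothetical values chosen only to show diminishing returns; they are not taken from this project, and definitions of mROAS vary between teams.

```python
import numpy as np

def response(spend, beta=50_000.0, lam=0.0002):
    """Hypothetical saturating response: sales attributed to one channel."""
    return beta * (1 - np.exp(-lam * spend)) / (1 + np.exp(-lam * spend))

def roas(spend):
    """Average return: attributed sales per unit of spend."""
    return response(spend) / spend

def mroas(spend, delta=0.01):
    """Marginal return: extra sales per extra unit of spend,
    approximated with a small relative increase `delta` in spend."""
    extra = spend * delta
    return (response(spend + extra) - response(spend)) / extra

current_spend = 10_000.0  # illustrative spend level
avg = roas(current_spend)
marg = mroas(current_spend)
print(f"ROAS: {avg:.2f}, mROAS: {marg:.2f}")
```

Because this response curve is concave, the marginal return is lower than the average return: a channel can look efficient overall while additional budget in it pays back less.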

The ultimate goal is to build the marketing funnel that yields the best ROI (return on investment).

2. Business Case Study

2.1 Problem Definition

Let's first define the business problem we are trying to solve. We want to optimize the marketing budget allocation of our client with the following characteristics:

Sales data: weekly sales of the client.
Media spend data: weekly spend on 5 different media channels
Domain knowledge:
- We know that there has been a positive sales trend, which we believe comes from strong economic growth.
- We also know that there is a yearly seasonality effect.

There is a causal relationship between marketing and sales, but what is the nature of that relationship? We have to take into account that there is:

- a carry-over effect (adstock): the effect of spend on sales is not instantaneous but accumulates over time.
- a saturation effect: the effect of spend on sales is not linear but saturates at some point.

2.2 Data Generation

📄 Find the full generation process in the notebook 1-Data_generation.ipynb
📄 Find a simplified data generation function in the script data_generator_function.py

As described in the section above, we want a dataset with:

Sales variable: sales (the target variable)
Media variables:
- ooh (out-of-home spend)
- tv (television spend)
- print (print media spend)
- facebook (Facebook ads spend)
- search (Google search ads spend)
- facebook_I (Facebook impressions)
- search_clicks_P (Google search ads performance, number of clicks)
Control variables: competitor_sales_B (competitor sales baseline)

To construct our dataset, we considered 4 years of weekly data.

Based on our domain knowledge, we described the demand with an increasing trend for organic growth and a seasonality (oscillation) that repeats each year.

We also created a proxy for demand because, in reality, the true demand is never observable, but we can find proxies.
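
A minimal sketch of this step, assuming a linear trend, a single yearly sine wave, and a proxy built by adding Gaussian noise to the true demand. The coefficients, the noise level, the start date, and the column names are illustrative assumptions; the notebook may use different forms and values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# 4 years of weekly observations (Mondays); the start date is illustrative.
n_weeks = 4 * 52
dates = pd.date_range(start="2021-01-04", periods=n_weeks, freq="W-MON")
t = np.arange(n_weeks)

# Organic growth: a slowly increasing linear trend (assumed slope).
trend = 1.0 + 0.5 * t / n_weeks
# Yearly seasonality: one full oscillation every 52 weeks (assumed amplitude).
seasonality = 0.2 * np.sin(2 * np.pi * t / 52)
demand = trend + seasonality

# The true demand is unobservable, so we keep only a noisy proxy of it.
demand_proxy = demand + rng.normal(loc=0.0, scale=0.05, size=n_weeks)

df = pd.DataFrame({"date": dates, "demand": demand, "demand_proxy": demand_proxy})
print(df.head())
```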

After that, we created synthetic data for each marketing channel. The channel spends are correlated with demand and are also shaped by different marketing strategies (for example, high-budget, bursty campaigns for TV, or relatively consistent spend with moderate noise for out-of-home).

Next, we pass the raw signal through two transformations: first the geometric adstock (carryover effect) and then the logistic saturation. For the adstock, we set a maximum lag effect of 8 weeks and chose the alpha parameter according to the media channel:

Channel	Type	alpha	Justification
`tv`	Offline, mass media	0.5 – 0.8	Strong long-term effect (brand awareness, memorability)
`ooh`	Offline, visual	0.4 – 0.7	Moderately lasting impact, repeated exposure in public spaces
`print`	Offline, print media	0.2 – 0.5	Lower memorability, short-lived effect, rarely drives direct action
`facebook`	Digital, paid social	0.1 – 0.4	Short-term performance focus, quick decay of impact
`search`	Digital, intent-based	0.0 – 0.2	No carryover effect: impact is immediate (direct response channel)

Similarly, we chose the saturation parameter λ for each channel:

Channel	Type	λ	Justification
`tv`	Offline, mass media	0.5 – 1.5	Strong saturation: TV reach saturates quickly (broad audience)
`ooh`	Offline, visual	1.0 – 2.0	Moderate saturation, especially in high-exposure urban areas
`print`	Offline, print media	1.5 – 3.0	Low saturation (narrow audience); hard to reach saturation point
`facebook`	Digital, paid social	0.5 – 1.5	Can saturate fast with high budget, algorithmically optimized
`search`	Digital, intent-based	2.0 – 4.0	Very low saturation: conversion effectiveness remains linear longer (pull channel)
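
The two transformations parameterized in the tables above can be sketched in plain NumPy as follows. This is a simplified stand-in written for illustration, not the code used in the notebooks: it assumes that a maximum lag of 8 weeks means 8 decaying weights (lags 0 to 7), that the logistic saturation has the form (1 - exp(-λx)) / (1 + exp(-λx)), and that spend has already been rescaled to a small range such as [0, 1].

```python
import numpy as np

def geometric_adstock(x, alpha, l_max=8, normalize=False):
    """Carryover: each week's effect is a geometrically decaying
    weighted sum of the current week and the previous l_max - 1 weeks."""
    x = np.asarray(x, dtype=float)
    weights = alpha ** np.arange(l_max)  # 1, alpha, alpha^2, ...
    if normalize:
        weights = weights / weights.sum()
    # Full convolution, truncated so the output aligns with the input weeks.
    return np.convolve(x, weights)[: len(x)]

def logistic_saturation(x, lam):
    """Diminishing returns: maps non-negative input into [0, 1)."""
    x = np.asarray(x, dtype=float)
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))

# A single unit of spend in week 0, followed by nine weeks without spend.
spend = np.array([1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
adstocked = geometric_adstock(spend, alpha=0.5)      # adstock first
effect = logistic_saturation(adstocked, lam=1.0)     # then saturation
print(np.round(adstocked, 3))
print(np.round(effect, 3))
```

With a single pulse of spend, the adstocked signal halves each week for 8 weeks and then drops to zero, which makes the role of alpha and of the maximum lag easy to inspect before applying the transformations to real channel data.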

And we can visualize the effect signal for each channel after each transformation:

We then add the Facebook impressions and Google search clicks, followed by the control variable, the competitor sales baseline (which also follows the trend and seasonality).

Finally, we create our target variable, the sales, which we assume to be a linear combination of the effect signals, the trend and seasonal components, the two events, and an intercept. We also add some Gaussian noise.
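
The composition of the target described above can be sketched as follows. Every component here is a placeholder: the channel effects are random stand-ins for the adstocked and saturated signals, only two channels are shown, and the coefficients, event weeks, and noise level are hypothetical values, not the ones used to generate data.csv.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_weeks = 208
t = np.arange(n_weeks)

# Placeholder components (in practice these come from the earlier steps).
trend = 0.5 * t / n_weeks
seasonality = 0.2 * np.sin(2 * np.pi * t / 52)
channel_effects = {
    "tv": rng.uniform(0, 1, n_weeks),
    "search": rng.uniform(0, 1, n_weeks),
}
# Two one-off events encoded as dummy variables (hypothetical weeks).
events = {
    "event_1": (t == 30).astype(float),
    "event_2": (t == 120).astype(float),
}

# Hypothetical "true" parameters that a model would later try to recover.
intercept = 2.0
betas = {"tv": 3.0, "search": 1.5}
gammas = {"event_1": 1.0, "event_2": 0.8}
sigma = 0.1

media_part = sum(betas[c] * channel_effects[c] for c in betas)
event_part = sum(gammas[e] * events[e] for e in gammas)
noise = rng.normal(loc=0.0, scale=sigma, size=n_weeks)

sales = intercept + trend + seasonality + media_part + event_part + noise
```

Since every coefficient is set by hand, the fitted model's estimates can be checked against these known values, which is the point of the parameter recovery exercise.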

2.3 Exploratory Data Analysis and Feature Engineering

📄 Find more details on the EDA and FE in 2-EDA and FE.ipynb

Now that we have our data, let's do the exploratory data analysis. (This one will be faster than usual because we already know our dataset well, given that we constructed it.)

We reduced our dataset to only the columns that would be available in real raw data delivered by the company (sales, spend per channel, competitor sales, impressions, and clicks), and started by visualizing them.

We also compare the summary statistics on ad spend per channel:

	tv_s	ooh_s	print_s	facebook_s	search_s
count	209	209	209	209	209
mean	$6,342	$1,657	$142	$6,886	$3,328
std	$11,416	$436	$476	$2,470	$809
min	$0	$810	$0	$2,052	$1,589
25%	$0	$1,326	$0	$4,927	$2,673
50%	$0	$1,611	$0	$6,642	$3,322
75%	$12,002	$1,959	$0	$8,313	$3,834
max	$49,311	$2,779	$2,839	$15,080	$5,387

NB: The media contributions estimated by the model should be compared with those of our generated data.

We can see the current allocation across the different channels, but the question now is whether this is the most profitable split and level of spending (always remembering that ad spend is not the only factor explaining sales).

We now do a quick feature engineering step before modeling. For this step, we add more layers to describe temporality and include a trend feature (this will help us model the seasonality as 4 Fourier modes).

	date	sales	ooh_s	facebook_s	search_s	trend	year	month	dayofyear
0	2021-01-04	162277.109282	963.639807	4296.052070	2182.363211	0	2021	1	4
1	2021-01-11	170493.217138	1015.604279	4324.637643	2151.854375	1	2021	1	11
2	2021-01-18	144523.455074	1102.396049	4926.757799	2168.730601	2	2021	1	18
3	2021-01-25	239399.578756	1371.427530	7538.774028	3306.440497	3	2021	1	25
4	2021-02-01	195422.511307	1015.958803	5212.689979	2378.653779	4	2021	2	32
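
The features shown in the table above, together with the 4 Fourier modes mentioned earlier, can be derived from the date column roughly as follows. This is a sketch: the `sin_k` / `cos_k` column names and the 365.25-day period are assumptions made for illustration, and the frame here contains only the first five dates from the table.

```python
import numpy as np
import pandas as pd

# First five weekly dates, matching the table above.
df = pd.DataFrame({"date": pd.date_range("2021-01-04", periods=5, freq="W-MON")})

# Temporal features.
df["trend"] = np.arange(len(df))          # linear trend index: 0, 1, 2, ...
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["dayofyear"] = df["date"].dt.dayofyear

# Yearly seasonality as 4 Fourier modes (one sine/cosine pair per mode).
n_modes = 4
periods = df["dayofyear"] / 365.25
for k in range(1, n_modes + 1):
    df[f"sin_{k}"] = np.sin(2 * np.pi * k * periods)
    df[f"cos_{k}"] = np.cos(2 * np.pi * k * periods)

print(df.head())
```

The sine and cosine pairs give a model a smooth, periodic basis for the yearly pattern, so the seasonal shape is learned as a weighted sum of a few waves instead of one coefficient per week.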
