Skip to main content
Ctrl+K
Wakana Morlan Marketing Data Analyst - Home
  • Konnichiwa! (“Hello!” in Japanese) đź‘‹

Projects

  • Travel Google Search Trend Prediction
  • Online Event Optimization 🎟️
  • Paid Social A/B Testing
  • Social Media Content Analysis
  • Credit Worthiness Analysis
  • WeRateDogs Twitter Analysis

Tutorials

  • Portfolio with Jupyter book
  • Repository
  • Open issue
  • .ipynb

Paid Social A/B Testing

Contents

  • Summary
  • Background
  • Situation
    • Current performance
  • Technology
  • Designing experiment
    • Fomulate hypotheses
    • Variable selection
    • Methodology
    • Sample size
  • A/B testing python codes for a sample dataset
    • Calculate the sample size
  • Prepare the data
  • Sampling
    • Calculate conversion rate
  • Test the hypothesis using two-sample Z-test/T-test
    • Z-test (sample size >= 30)
    • T-test (sample size < 30)
  • Results
  • References

Paid Social A/B Testing#

Summary#

I have conducted A/B Testing to validate using simplified lead form improve online ad conversion rate. The improvement in the conversion rate had to be big enough to change the company’s data capture policy and apply the change to over 30 variations of the landing page.

The null hypothesis was there is no difference between the landing page with the simplified lead form and the existing landing page. And the alternative hypothesis was there is a significant difference between two landing pages. The alpha level was set to 0.05, meaning the probability of the test is correctly reject null hypothesis is 95% but I would live with 5% of false positive.

I used Python’s statsmodels library to calculate the sample size, performed two-way Z test (since the sample size is larger than 30). The test resulted 0.001 p-value. Therefore, I rejected null hypothesis and we went ahead and adopted new simplified lead form.

Background#

  • Landing pages are essential elements on online ad campaigns (e.g., Google Search, paid social).

  • The purpose of landing pages is to communicate the benefits of your products quickly (You got about 5 seconds!) and to capture the contact information about the prospective customers (leads).

  • Online ads are considered “successful” only when the ad campaigns generate leads not just clicking the ads.

  • There are many ways to improve the landing page performances but here are a few examples:

    • Optimize the form (The less field to fill out, the higher changes prospective customers complete the form. But it’s a tradeoff of giving up important customer information.)

    • Optimize copy to convince the benefits and/or match the ad copy

    • Personalize the content on the landing page

    • Improve the page load speed

Situation#

Note: Numbers and some scenarios were altered to protect confidential information.

  • Our marketing team has over 30 variations of landing page because we run over 1,000 online ads for different products and different audience.

  • We recognized the lead form is a bit lengthy due to the company’s data capture policy.

  • However, the benefits of getting more qualified leads might surpass the company’s data capture policy even though the sales team potentially have to fill out missing information manually, which is a cost too.

  • Also, making changes to all different landing page costs developers’ resources.

  • So, we decided to perform A/B testing to evaluate the performance of a landing page with a simplified form against existing landing page with a long form.

Current performance#

  • The current conversion rate from the landing page 10%

  • 5% of leads become customers (5% lead conversion rate)

  • The average customer lifetime value is 30,000 USD

Let’s say, we get 1000 clicks from the online ads, we acquire 100 leads, and eventually 5 leads become customers, which generate 150,000 USD profits.

We’re aiming to improve the landing page conversion rate by 10 points. The potential profit will be 300,000 USD, which is 150,000 USD more than what we are currently generating. After the discussion with stakeholders, additional 150,000 USD profit is reasonable to make the data capture policy. Also, we’ll make the changes to all other landing page if the test is successful because the total potential profits generated from over 30 landing pages can easily pay the development cost.

Technology#

To perform A/B testing (Two sample Z/T test), the following functions from statsmodels were used.

  • proportion_effectsize: Caluclate effect size for a test omparing two propertions. Or, effect size is calculated by taking the difference between two groups and dividing it by the standard devisation of one of the groups.

  • statsmodels.stats.power.NormalIndPower.solve_power: Solve for any one parameter of the power of a two sample z-test. (effect size, sample size, alpha, power, and ratio) For this case example, the sample size was calculated using this function.

  • statsmodels.stats.power.tt_ind_solve_power: Solve for any one parameter of the power of a two sample t-test Caluclate the sample size for two sample t-test.

  • proportions_ztest, proportion_confint from statsmodels.stats.proportion module: calculate the p-value and confidence intervals for the z test.

  • ttest_ind from statsmodels.stats.weightstats module: calculate the p-values for the t-test.

Designing experiment#

Fomulate hypotheses#

  • Null hypothesis: There is no difference in lead conversion rate between the existing landing page (a long lead form) and the new landing page (a short lead form).

  • Alternative hypothesis: There is a statistically significant difference between the new and old landing pages.

Variable selection#

  • Dependent variable: user’s conversion rate

  • Independent variable: existing or new landing page (control and treatment)

Methodology#

  • Use Facebook A/B test tool otherwise Facebook will optimize the campaign and we will not get the same ad performance for each ad.

    • Create a new campaign on Facebook and create two identical ads.

    • One of ads has the current landing page as the ad landing page and another ad has the new landing page as the ad landing page. Other variables are same. (Copy, image, target audience, ad placement, etc.)

Sample size#

The number of sample size affect the test’s statistical power (i.e. The probability that test can reject null hypothesis correctly)

  • Alpha value (critical value): The probability that we are willing to reject the null hpyothesis when it is actually the null hypothesis is correct. i.e. We’re willing to live with the 5% of change that we conclude there is a difference even though it isn’t. This is a TYPE 1 ERRO or False Positive

    • 1 - Confidence Level

    • Usually set to 0.05 or 5% (Convidende level of 95% and alpha of 5%)

  • Beta : The probability that we failed to reject the null hypothesis when the alternative hypothesis is actually true. This is a TYPE II Error or False negative.

    • Typically set to 0.2

  • Power: The probability that test will correctly support the null hypothesis. (1 - β)

  • Effect size: How big of a difference we expect there to be between conversion rates. Effect size is calculated by taking the difference between the two groups (e.g., the mean of treatment group minus the mean of the control group) and dividing it by the standard deviation of one of the groups – This will need to be agreed by various stakeholders. This this case, 5%.

Note

  • < 0.1 = Trivial effect

  • 0.1 - 0.3 = small effect

  • 0.3 - 0.5 moderate effect

  • > 0.5 large effect

A/B testing python codes for a sample dataset#

import pandas as pd
import numpy as np
import scipy.stats as stats
import statsmodels.stats.api as sms
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from math import ceil

%matplotlib inline

Calculate the sample size#

#Calculate effect size based on expected rates
effect_size = sms.proportion_effectsize(0.20, 0.10)
effect_size
0.2837941092083278
#calculate samplesize n
required_n = sms.NormalIndPower().solve_power(effect_size, power = 0.8, alpha = 0.05, ratio = 1) #solve for any one parameter of the power of a two sample z-test
required_n = ceil(required_n)
print(f" We will need {required_n} observations for each group.")
 We will need 195 observations for each group.

Prepare the data#

I’m using the dammy data. This data set is avaialbe on Kaggle: https://www.kaggle.com/zhangluyuan/ab-testing?select=ab_data.csv

filename = "ab_data.csv"
df = pd.read_csv(filename)
df.head()
user_id timestamp group landing_page converted
0 851104 2017-01-21 22:11:48.556739 control old_page 0
1 804228 2017-01-12 08:01:45.159739 control old_page 0
2 661590 2017-01-11 16:55:06.154213 treatment new_page 0
3 853541 2017-01-08 18:28:03.143765 treatment new_page 0
4 864975 2017-01-21 01:52:26.210827 control old_page 1
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   user_id       294478 non-null  int64 
 1   timestamp     294478 non-null  object
 2   group         294478 non-null  object
 3   landing_page  294478 non-null  object
 4   converted     294478 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.2+ MB
pd.crosstab(df["group"], df["landing_page"])
landing_page new_page old_page
group
control 1928 145274
treatment 145311 1965
#Remove the duplicate users

session_counts = df['user_id'].value_counts(ascending=False)
multi_users = session_counts[session_counts > 1].count()

print(f"Duplicates: {multi_users}")

multi_users = session_counts[session_counts > 1].count()
users_to_drop = session_counts[session_counts >1].index
df[~df["user_id"].isin(users_to_drop)]
Duplicates: 3894
user_id timestamp group landing_page converted
0 851104 2017-01-21 22:11:48.556739 control old_page 0
1 804228 2017-01-12 08:01:45.159739 control old_page 0
2 661590 2017-01-11 16:55:06.154213 treatment new_page 0
3 853541 2017-01-08 18:28:03.143765 treatment new_page 0
4 864975 2017-01-21 01:52:26.210827 control old_page 1
... ... ... ... ... ...
294473 751197 2017-01-03 22:28:38.630509 control old_page 0
294474 945152 2017-01-12 00:51:57.078372 control old_page 0
294475 734608 2017-01-22 11:45:03.439544 control old_page 0
294476 697314 2017-01-15 01:20:28.957438 control old_page 0
294477 715931 2017-01-16 12:40:24.467417 treatment new_page 0

286690 rows Ă— 5 columns

Sampling#

control_sample = df[df["group"]=="control"].sample(n=required_n, random_state=22)
control_sample.head()
user_id timestamp group landing_page converted
254532 644179 2017-01-16 04:15:36.663685 control old_page 0
196706 729672 2017-01-20 19:04:10.409185 control old_page 0
272726 866186 2017-01-09 02:56:47.675707 control old_page 0
118044 884303 2017-01-18 04:49:04.225284 control old_page 0
205159 882576 2017-01-15 13:36:49.854723 control old_page 0
treatment_sample = df[df["group"]=="treatment"].sample(n=required_n, random_state=22)
treatment_sample.head()
user_id timestamp group landing_page converted
155301 634761 2017-01-13 16:48:49.838750 treatment new_page 0
175664 867837 2017-01-06 06:08:36.284810 treatment new_page 0
286642 750518 2017-01-13 10:27:54.415101 treatment new_page 0
230892 657043 2017-01-18 09:49:00.374240 treatment new_page 0
96436 678762 2017-01-14 07:50:35.538789 treatment new_page 0
ab_test = pd.concat([control_sample, treatment_sample], axis=0)

#Reset index
ab_test.reset_index(drop=True, inplace = True)

ab_test.head(10)
user_id timestamp group landing_page converted
0 644179 2017-01-16 04:15:36.663685 control old_page 0
1 729672 2017-01-20 19:04:10.409185 control old_page 0
2 866186 2017-01-09 02:56:47.675707 control old_page 0
3 884303 2017-01-18 04:49:04.225284 control old_page 0
4 882576 2017-01-15 13:36:49.854723 control old_page 0
5 922404 2017-01-04 16:28:51.385675 control old_page 0
6 797435 2017-01-14 07:44:19.849593 control old_page 0
7 806644 2017-01-10 17:43:00.799286 control old_page 0
8 654993 2017-01-03 03:58:54.528626 control old_page 0
9 929263 2017-01-13 15:33:37.297036 control old_page 0
ab_test["group"].value_counts()
control      195
treatment    195
Name: group, dtype: int64

Calculate conversion rate#

conversion_rates = ab_test.groupby("group")["converted"]

#standard_deviation of the proposition
std_p = lambda x: np.std(x, ddof=0)

#standard_error of the population
se_p = lambda x: stats.sem(x, ddof=0)
conversion_rates = conversion_rates.agg([np.mean, std_p, se_p])
conversion_rates.columns = ["Conversion_rate", "std_deviation", "standard error"]
conversion_rates.style.format('{:.3f}')
  Conversion_rate std_deviation standard error
group      
control 0.123 0.329 0.024
treatment 0.087 0.282 0.020
plt.figure(figsize = (8,6))
sns.barplot(x=ab_test["group"], y=ab_test["converted"], ci=False)

plt.ylim(0,0.15)
plt.title("conversion rate by group", pad = 20)
plt.xlabel("Group", labelpad=15)
plt.ylabel("Converted (propotion)", labelpad = 15);
/var/folders/23/pk6hmsss0pbbzbxmbw44c90c0000gn/T/ipykernel_8577/1455529305.py:2: FutureWarning: 

The `ci` parameter is deprecated. Use `errorbar=('ci', False)` for the same effect.

  sns.barplot(x=ab_test["group"], y=ab_test["converted"], ci=False)
_images/a0c876ef273a6f092230cede57bd816bf3e36c3da0e5c23da6c296fe467e9f21.png

Test the hypothesis using two-sample Z-test/T-test#

  • Use Z-statistic when Sample size is >= 30 or we know the variance of population (which is very are)

  • Use T-statistic otherwise

control_results = ab_test[ab_test["group"]== "control"]["converted"]
treatment_results = ab_test[ab_test["group"]=="treatment"]["converted"]

Z-test (sample size >= 30)#

#z-test
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

n_con = control_results.count()
n_treat = treatment_results.count()

success = [control_results.sum(), treatment_results.sum()]
nobs = [n_con, n_treat]

z_stat, p_val = proportions_ztest(success, nobs=nobs)
(lower_con, lower_treat), (upper_con, upper_treat) = proportion_confint(success, nobs = nobs, alpha = 0.05)

print(f"z statistic: {z_stat:.2f}")
print(f"p-value: {p_val:.2f}")
print(f"ci 95% for contril group: [{lower_con:.3f}, {upper_con:.3f}]")
print(f"ci 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]")
z statistic: 1.16
p-value: 0.25
ci 95% for contril group: [0.077, 0.169]
ci 95% for treatment group: [0.048, 0.127]

T-test (sample size < 30)#

from statsmodels.stats.weightstats import ttest_ind
t_stat, p_value, degree_of_freedom = ttest_ind(controll_results, treatment_results)
print(f"T-statistic: {t_stat:0.2f}")
print(f"p-value: {p_value:0.2f}")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[18], line 2
      1 from statsmodels.stats.weightstats import ttest_ind
----> 2 t_stat, p_value, degree_of_freedom = ttest_ind(controll_results, treatment_results)
      3 print(f"T-statistic: {t_stat:0.2f}")
      4 print(f"p-value: {p_value:0.2f}")

NameError: name 'controll_results' is not defined

Results#

if p_val < 0.05:
    print("Reject the null hypothesis.")
    
else:
    print("Failed to reject the null hypothesis.")
Failed to reject the null hypothesis.

Since the p-value is larger than the alpha level, we failed to reject the null hypothesis for this sample data set. Therefore, we will not move forward to implementing the changes to the landing page.

References#

  • Dallanoce, F. (2021, August 22). ANOVA, T-test and other statistical tests with python. Medium. Retrieved March 21, 2022, from https://towardsdatascience.com/anova-t-test-and-other-statistical-tests-with-python-e7a36a2fdc0c

  • DiFrancesco, V. (2021, April 10). Understanding alpha, beta, and statistical power. Medium. Retrieved March 21, 2022, from https://towardsdatascience.com/understanding-alpha-beta-and-statistical-power-525b84453687

  • Ding, E. (2022, February 7). 7 A/B testing questions and answers in data science interviews. Medium. Retrieved March 21, 2022, from https://towardsdatascience.com/7-a-b-testing-questions-and-answers-in-data-science-interviews-eee6428a8b63#5072

  • Fillinich, R. (2021, August 23). AB testing with python. Medium. Retrieved March 21, 2022, from https://towardsdatascience.com/ab-testing-with-python-e5964dd66143

  • Hari, K. (2019, November 5). A/B testing clearly explained. Medium. Retrieved March 21, 2022, from https://medium.com/analytics-vidhya/a-b-testing-clearly-explained-56488430156

  • Hypothesis testing: Difference between Z-test and T-test. Analytics Vidhya. (2020, December 23). Retrieved March 21, 2022, from https://www.analyticsvidhya.com/blog/2020/06/statistics-analytics-hypothesis-testing-z-test-t-test/

previous

Online Event Optimization 🎟️

next

Social Media Content Analysis

Contents
  • Summary
  • Background
  • Situation
    • Current performance
  • Technology
  • Designing experiment
    • Fomulate hypotheses
    • Variable selection
    • Methodology
    • Sample size
  • A/B testing python codes for a sample dataset
    • Calculate the sample size
  • Prepare the data
  • Sampling
    • Calculate conversion rate
  • Test the hypothesis using two-sample Z-test/T-test
    • Z-test (sample size >= 30)
    • T-test (sample size < 30)
  • Results
  • References

By Wakana Morlan

© Copyright 2022-2024.