This is a very basic question, so please bear with me.
I've been learning about AB Testing, which is largely used in internet
marketing to examine the effectiveness of certain aspects of ads, websites,
etc. Here are a couple of links for people who want to know more about AB Testing:
http://visualwebsiteoptimizer.com/split-testing-blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/
http://20bits.com/articles/statistical-analysis-and-ab-testing/
http://elem.com/~btilly/effective-ab-testing/
Let's say that I have a website that registers users for a forum. I want to
know if Headline 1 or Headline 2 is more effective at getting visitors on
the web site to register for the forum. So I have the following data.
dat = data.frame(Headline=c("Headline 1", "Headline 2"),
                 Visitors=c(1000, 1300),
                 Clicks=c(500, 600),
                 Conversions=c(100, 150))
And here are the click through rates and conversion rates for each of the
headlines.
ctr1 = (500/1000)*100 # for headline 1
ctr2 = (600/1300)*100 # for headline 2
ctr1; ctr2
conv1 = (100/1000)*100 # for headline 1
conv2 = (150/1300)*100 # for headline 2
conv1; conv2
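The same rates can also be computed for both headlines at once from a data frame. This is only a minimal sketch: it assumes the counts are stored as numbers rather than quoted strings, and the column names CTR and Conv are just illustrative choices.

```r
# Same counts as above, stored as numeric columns (an assumption of this sketch)
dat <- data.frame(Headline    = c("Headline 1", "Headline 2"),
                  Visitors    = c(1000, 1300),
                  Clicks      = c(500, 600),
                  Conversions = c(100, 150))

# Rates in percent, one value per headline
dat$CTR  <- 100 * dat$Clicks / dat$Visitors       # click-through rate
dat$Conv <- 100 * dat$Conversions / dat$Visitors  # conversion rate
dat
```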
According to the sites above, I'm really interested in determining the
confidence intervals for the conversion rates for each headline. While 95%
confidence would be ideal, I'm really open to anything 80% and up, so I need
to calculate confidence intervals where I am 80%/85%/90%/95% confident that
the conversion rate for a headline is within a certain range.
I'm really not sure how to go about this. Are there specific tests and/or
functions in R that will provide me with the appropriate information?
(confint, chisquare, gtest, etc.?)
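To make concrete what I am after, here is a rough sketch of the kind of output I would like: one interval per headline and confidence level. It assumes that prop.test() from the base stats package is a reasonable tool here (its one-sample interval is a score interval with continuity correction, which may not be the method the sites above have in mind). The names conf_levels and ci are just illustrative.

```r
# Conversions and visitors for each headline (from the data above)
conversions <- c("Headline 1" = 100, "Headline 2" = 150)
visitors    <- c("Headline 1" = 1000, "Headline 2" = 1300)
conf_levels <- c(0.80, 0.85, 0.90, 0.95)

# Build one row per headline/level combination, bounds in percent
ci <- do.call(rbind, lapply(names(conversions), function(h) {
  do.call(rbind, lapply(conf_levels, function(lv) {
    int <- prop.test(conversions[[h]], visitors[[h]],
                     conf.level = lv)$conf.int
    data.frame(Headline = h, Level = lv,
               Lower = 100 * int[1], Upper = 100 * int[2])
  }))
}))
ci
```

binom.test() takes the same x, n and conf.level arguments and returns an exact (Clopper-Pearson) interval in $conf.int, so it could be swapped in if an exact interval is preferred.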
Thanks for your patience and help.
EDIT:
So I tried the following, and I'm not sure if I'm doing it properly or
making the right conclusions. Furthermore, there has to be a more efficient
way to perform this task in R.
For a given conversion rate (p) and number of trials (n):
p1 = 0.1
n1 = 1000
se1 = sqrt( p1 * (1-p1) / n1 )
se1
se1 * 1.96
(p1 + 1.96*se1) * 100
(p1 - 1.96*se1) * 100
p2 = 150/1300   # 150 conversions / 1300 visitors
n2 = 1300
se2 = sqrt( p2 * (1-p2) / n2 )
se2
se2 * 1.96
(p2 + 1.96*se2) * 100
(p2 - 1.96*se2) * 100
# resulting 95% intervals, in percent:
# (8.1, 11.9) headline 1
# (9.8, 13.3) headline 2
# these confidence intervals for the two headlines overlap.
# therefore, I would conclude that the variation (headline 2)
# isn't more effective than the control headline
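A possibly more compact version of the hand calculation above wraps the normal-approximation (Wald) formula in a function and replaces the hard-coded 1.96 with qnorm(), so other confidence levels come for free. It works from the raw counts rather than rounded rates. wald_ci is a hypothetical helper name, not an existing R function. The last lines are a sketch of comparing the two headlines directly with a two-sample prop.test(), on the assumption that a test of the difference is more appropriate than checking whether two separate intervals overlap (overlapping intervals do not by themselves settle whether the difference is significant).

```r
# Normal-approximation (Wald) interval for a proportion, in percent.
# wald_ci is a hypothetical helper, not part of base R.
wald_ci <- function(x, n, level = 0.95) {
  p  <- x / n                        # observed rate
  se <- sqrt(p * (1 - p) / n)        # standard error of the rate
  z  <- qnorm(1 - (1 - level) / 2)   # about 1.96 for level = 0.95
  100 * c(lower = p - z * se, upper = p + z * se)
}

wald_ci(100, 1000)                # headline 1, 95%
wald_ci(150, 1300)                # headline 2, 95%
wald_ci(150, 1300, level = 0.80)  # headline 2, 80%

# Two-sample test of equal conversion rates, with a confidence
# interval for the difference (headline 1 minus headline 2)
cmp <- prop.test(x = c(100, 150), n = c(1000, 1300))
cmp$p.value
cmp$conf.int
```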
Thanks again.
I'm running R 2.13 on Windows 7