A/B Testing & Eye Tracking

design · 2021 · UX researcher

Role: UX Researcher · Course: CSCI 1300, Brown University

Team: Miranda Mo, Jennifer He, Milanca Wang

Context

This project was made for CSCI 1300: User Interfaces and User Experiences, a class at Brown University taught by Jeff Huang. Our goal was to research how affordances affect user behavior by analyzing data through A/B testing and eye tracking.

For this project, we created two redesigned versions of a taxi booking site. For Version A, we used a vertical layout. We added grid boxes around each taxi option and a drop shadow to emphasize the different sections. We also used different font sizes for the taxi name and the description to distinguish the two pieces of information. For Version B, we used a horizontal layout that placed all of the taxi options side by side so that they fit within a single screen. We also added grid boxes around each taxi to make the content more readable. Both versions used the same color palette so that color was not a factor. Instead, we focused on how the two layouts would affect behavioral metrics such as click-through rate, time to click, dwell time, and return rate.

Part 1: A/B Testing

First, we made a series of hypotheses about how the two versions of our site would perform on four metrics (click-through rate, time to click, dwell time, and return rate). Then, we tested our hypotheses by having users navigate through one of the two versions and analyzed the data with either a chi-squared test or a t-test to determine whether the differences were statistically significant.

Hypotheses

Click Through Rate

Null: The click-through rate of Version A will be equal to that of Version B.

Alternative: The click-through rate of Version A will be greater than that of Version B because the layout of Version A is cleaner and entices the user to select a service offered.

Time to Click

Null: The time to click will be the same in Version A and Version B.

Alternative: The time to click a button on Version A will be shorter than that of Version B because the button on Version A is larger, and there is less content on the initial screen for Version A compared to Version B.

Dwell Time

Null: The dwell time will be the same in Version A and Version B.

Alternative: The dwell time for Version A will be shorter than that of Version B because users of Version B see all taxi companies on the page at once and are therefore more likely to click on a company that they will stick with.

Return Rate

Null: The return rate of Version A will be equal to that of Version B.

Alternative: The return rate for Version A will be higher than the return rate for Version B because users of Version B see more content on the screen and are more likely to read the information for all of the taxi companies before clicking on the reserve button. Users of Version A, on the other hand, see less content on the screen and are more likely to click on the reserve button before fully reading the information for the other taxi companies, which leads them to return to the web page.

Data Analysis

Click Through Rate

We began by creating a pivot table in Excel with, for each user, the count of page load times (how many times the page was loaded) and the maximum click time (0 if the user did not click on any links). We tallied the users whose maximum click time was 0 and subtracted them from the total number of users to find the percentage who clicked. We chose the chi-squared test for click-through rate because the data is categorical (each user either clicked or did not).
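The same tally can be sketched in Python instead of a pivot table. This is a minimal sketch under assumptions: the log layout (one row per event with a user id, version, page load time, and click time) and the sample rows are hypothetical and are not the study's data. Only the rule that a maximum click time of 0 means "no click" comes from the description above.

```python
from collections import defaultdict

# Hypothetical log rows: (user_id, version, page_load_time_ms, click_time_ms).
# click_time_ms is 0 when the row records no click, mirroring the pivot table.
rows = [
    ("u1", "A", 1000, 9500),
    ("u1", "A", 30000, 0),
    ("u2", "A", 2000, 0),
    ("u3", "B", 1500, 8000),
    ("u4", "B", 1200, 0),
]

def click_through_rates(rows):
    """Return {version: (clicked_users, total_users, rate)}."""
    max_click = {}  # (version, user_id) -> largest click time seen for that user
    for user_id, version, _load, click in rows:
        key = (version, user_id)
        max_click[key] = max(max_click.get(key, 0), click)

    totals = defaultdict(int)
    clicked = defaultdict(int)
    for (version, _user), click in max_click.items():
        totals[version] += 1
        if click > 0:  # a maximum click time of 0 means the user never clicked
            clicked[version] += 1

    return {v: (clicked[v], totals[v], clicked[v] / totals[v]) for v in totals}

rates = click_through_rates(rows)
```

Grouping by user before counting matters because a user who reloads the page produces several rows but should be counted once.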

Observed

	Click	No click	Total
A	29	5	34
B	28	10	38
Total	57	15	72

Expected

	Click	No click
A	26.9	7.1
B	30.1	7.9

Interface A: 85.3% · Interface B: 73.7%

χ² = 1.4898, df = 1, critical value = 3.84 (α = 0.05)

We did not reject the null hypothesis because our calculated value (1.4898) was less than the critical value (3.84) at α = 0.05 and df = 1. The difference between click-through rates was not statistically significant.
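The chi-squared calculation for the click-through table can be reproduced with a short Python sketch. The function name `chi_squared` is illustrative, and no continuity correction is applied, which is an assumption about how the original figure was computed. With unrounded expected counts the statistic comes out near 1.47, slightly below the value reported above, which is what the expected counts rounded to one decimal produce. The conclusion is the same either way.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count for each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Observed counts from the table above: rows A and B, columns click / no click.
observed = [[29, 5], [28, 10]]
statistic, df, expected = chi_squared(observed)

CRITICAL_VALUE = 3.84  # chi-squared critical value for df = 1 at the 0.05 level
reject_null = statistic > CRITICAL_VALUE
```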

Time to Click

We calculated the average time to click by excluding users with no clicks, subtracting the page load time from the click time for each click entry, then averaging the results across users for Version A and Version B. We chose a t-test because we are comparing the average time to click between the two interfaces.

Interface A: 8,847.86 ms · Interface B: 9,119.82 ms

p-value = 0.1490, df = 55, reference t = 1.673

95% CI: −271.96 ± 3,052.70 (contains 0)

We did not reject the null hypothesis because our calculated p-value (0.1490) was greater than 0.05. The difference in average time to click was not statistically significant.
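A two-sample t-test of this kind can be sketched as follows. This assumes an equal-variance (pooled) test, which gives df = n_A + n_B − 2 and would match the reported df = 55 for 29 and 28 clicking users; the source does not say which variant was used. The sample values and the critical t below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Return (t, df, mean difference, standard error) for a pooled t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean(sample_a) - mean(sample_b)
    return diff / std_err, df, diff, std_err

# Hypothetical times to click in milliseconds (not the study's data).
times_a = [8000, 9000, 10000]
times_b = [9000, 10000, 11000]

t, df, diff, std_err = pooled_t(times_a, times_b)

# Two-sided 95% critical t for df = 4, taken from a t-table for these samples.
T_CRIT_95 = 2.776
half_width = T_CRIT_95 * std_err
confidence_interval = (diff - half_width, diff + half_width)
contains_zero = confidence_interval[0] <= 0 <= confidence_interval[1]
```

When the confidence interval for the difference contains 0, as it does here and in the reported result, the difference is not significant at that level.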

Dwell Time

We excluded users who did not both click and return to the page, then subtracted the first click time from the next page load time to calculate dwell time. We used a t-test to compare average dwell times.

Interface A: 24.39 seconds · Interface B: 76.38 seconds

t = 0.7360, df = 36, reference t = 1.688

Since |t| (0.7360) was less than 1.688, the difference was not statistically significant. We did not reject the null hypothesis that dwell time would be the same in Version A and Version B.
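The dwell-time rule described above (first click time subtracted from the next page load time) can be sketched like this. The event format, a list of `(kind, time_ms)` pairs per user, is a hypothetical layout and not the study's actual log schema.

```python
def dwell_time(events):
    """Dwell time in ms for one user's session, or None if it is undefined.

    events: list of (kind, time_ms) pairs, where kind is "load" or "click".
    """
    events = sorted(events, key=lambda event: event[1])

    # Time of the first click, if the user clicked at all.
    first_click = next((t for kind, t in events if kind == "click"), None)
    if first_click is None:
        return None  # user never clicked, so they are excluded

    # First page load after that click marks the return to the page.
    next_load = next(
        (t for kind, t in events if kind == "load" and t > first_click), None
    )
    if next_load is None:
        return None  # user clicked but never returned, so they are excluded

    return next_load - first_click

# Hypothetical sessions (times in milliseconds).
returned_session = [("load", 0), ("click", 9000), ("load", 33000)]
no_click_session = [("load", 0)]
no_return_session = [("load", 0), ("click", 7000)]
```

Returning `None` for users without both a click and a return keeps them out of the average, matching the exclusion step in the text.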

Return Rate

We found the total number of people who clicked for each version, then determined who returned by checking whether the maximum page load time exceeded the maximum click time. We used a Chi-squared test because this is categorical data.

	Clicked & Returned	Clicked & Did Not Return	Total Clicked
Interface A	18	11	29
Interface B	20	8	28
Total	38	19	57

Interface A: 62.1% (18 of 29) · Interface B: 71.4% (20 of 28)

χ² = 0.5616, df = 1, critical value = 3.84 (α = 0.05)

Since 0.5616 is not greater than 3.84, the difference between versions A and B is not statistically significant. We did not reject the null hypothesis.

Part 2: Eye Tracking

For the eye tracking portion of this project, we used an eye tracker on two participants, one for each version of the site. Then, we used JavaScript and Python to create a heatmap and an animated replay of the users' eye movements based on data collected from each eye tracking session.
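One common way to turn gaze samples into a heatmap is to count how many samples fall in each cell of a grid laid over the screen. The sketch below shows that binning step only. It is not the project's actual code, and the point format, screen size, and cell size are hypothetical.

```python
def gaze_heatmap(points, width, height, cell):
    """Count gaze samples per grid cell.

    points: iterable of (x, y) screen coordinates in pixels.
    Returns a list of rows, each a list of counts per column.
    """
    cols = -(-width // cell)   # ceiling division so edge pixels get a cell
    rows = -(-height // cell)
    grid = [[0] * cols for _ in range(rows)]

    for x, y in points:
        # Skip samples that fall outside the screen (e.g. tracker noise).
        if 0 <= x < width and 0 <= y < height:
            grid[int(y // cell)][int(x // cell)] += 1
    return grid

# Hypothetical gaze samples on a 300 x 100 px area with 100 px cells.
samples = [(10, 10), (15, 12), (250, 90), (-5, 3)]
heatmap = gaze_heatmap(samples, width=300, height=100, cell=100)
```

Each count can then be mapped to a color intensity. Because samples arrive at a roughly fixed rate, the total count also gives a rough sense of how long the user spent on the page.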

Hypothesis

We instructed the user to book the cheapest taxi. Our hypothesis is that users of Version A will focus on the descriptions and spend more time searching for the taxi that they want to click due to having to scroll through options. Meanwhile, users of Version B are able to view all of the information at once since it is all contained on one screen, so they will not look at the information as closely and will click on a button in a shorter amount of time. Since our instruction for the eye-tracking experiment differs from that of the A/B testing data collection (to simply choose a taxi), our hypotheses differ slightly from Part 1.

Eye Tracking Replays

Version A eye tracking replay

Version B eye tracking replay

Heatmaps

Interpretation

The eye tracking data shows that the user for Version A focused on the description of each taxi option as they scrolled through the site, while the user for Version B did not focus on the descriptions as much and looked at the images before clicking a button. Additionally, the number of data points shows that the user for Version A spent more time on the site than the user for Version B. These findings are consistent with our hypothesis, though each version was tested with only one participant.

Part 3: Comparison

Based on the data analysis and eye-tracking results, we propose that Memphis Taxis Co. use Interface A because of its higher click-through rate and shorter time to click, although neither difference was statistically significant. The click-through rate of Interface A (85.3%) is higher than that of Interface B (73.7%), which may be caused by the cleaner design simplifying information retrieval and decision-making. The average time to click for Interface A (8,847.86 ms) is also lower than that of Interface B (9,119.82 ms), suggesting that Interface A's vertical layout helps people consider options individually and make quicker decisions, whereas users of Interface B read and compare every option before deciding. In the eye-tracking session, by contrast, Interface A's user focused on several descriptions before making her final decision and took longer to click than Interface B's user. This runs contrary to the A/B test results, most likely because the eye-tracking participants were given a different instruction. Therefore, although the results favor Interface A, we should conduct more tests to reach statistical significance and use the same instructions for A/B data collection and eye tracking.

When comparing A/B testing and eye tracking, we noticed that eye tracking has no measurement for behavioral trends such as return rate and dwell time, since the session ends once the user clicks a button. The advantage of A/B testing is that it captures time-based metrics like dwell time and return rate. The advantage of eye tracking is that it reveals where on the screen users are looking, informing specific design choices — and the visual heatmap enables direct comparisons between versions.

One metric that could be used unethically is click through rate, if images or text are intentionally misleading in order to get users to click. Another is conversion rate — the number of users who completed a desired goal divided by total visitors — which can be manipulated to push users into unwanted purchases without taking into consideration what is most desirable for the user.

Conclusion

This project combines design with data collection. We used the eye tracker to create a heatmap, which offered a different kind of data visualization from the A/B testing results. Both methods provided insight into how users interact with the two versions of the site. To build on this project, we would keep our hypotheses for A/B testing and eye tracking more consistent and conduct tests on a larger group of users.
