Bellabeat
Coursera: Google Data Analytics Capstone Project
Introduction
Bellabeat, cofounded by Urška Sršen and Sando Mur, is a small company that manufactures high-tech, health-focused products for women. These products include the “Leaf”, a wellness tracker; “Time”, a wellness watch; and “Spring”, a water bottle that tracks daily water intake. The company has gone as far as developing the “Bellabeat app”, which lets users keep track of health-related data such as the exercise activities they complete, sleep cycles, stress, and menstrual cycles. Bellabeat also offers month-to-month memberships that allow users to fully personalize and track their nutrition, exercise, sleep, health, beauty, and mindfulness.
Current Project
For this project, Bellabeat’s cofounders, Urška Sršen and Sando Mur, and Bellabeat’s marketing analytics team have asked us to gain better insight into how current Bellabeat product consumers use their products and then provide recommendations for continued growth and future marketing strategies.
We will analyze usage data from non-Bellabeat smart devices and later compare it with usage of Bellabeat’s own devices to answer the following questions:
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
Preparing the Data
Participants & Measures
The data come from the Fitbit Fitness Tracker Data set, which is in the public domain and was made available through Mobius. All responses came from unique Fitbit users, all of whom consented to take part in the data collection. The personal tracker data cover daily activities, daily steps, heart rate, and sleep. For this project I used the "Daily Activity" and "Sleep" data sets.
The structure of each data set is as follows. Daily_Activity has 15 columns and 940 rows, and the "Id" column contains 33 distinct ID numbers. The Sleep data set has 5 columns and 413 rows, with 24 distinct ID numbers. Scanning over the outputs, we see that we are working with numerical data.
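The structure check described above can be sketched as follows. The original analysis was done in R; this is only a small Python illustration using the standard library, and the rows, column names, and values are made up to stand in for the real Daily Activity file.

```python
# Hypothetical rows standing in for the Daily Activity data (values are made up).
daily_activity = [
    {"Id": 1001, "TotalSteps": 5200, "Calories": 1900},
    {"Id": 1001, "TotalSteps": 7400, "Calories": 2100},
    {"Id": 1002, "TotalSteps": 3100, "Calories": 1700},
    {"Id": 1003, "TotalSteps": 9800, "Calories": 2600},
]


def describe(rows):
    """Return (row count, column count, number of distinct Id values)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_ids = len({row["Id"] for row in rows})
    return n_rows, n_cols, n_ids


print(describe(daily_activity))
```

Counting distinct Id values separately from rows matters here because each participant contributes many daily records, so the row count alone overstates the number of people in the sample.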
Cleaning the Data
By scanning and cleaning the datasets we can verify the integrity of the data. Once cleaned and verified we are able to run all analyses.
In this session:
First, I transformed the variables that were measured in minutes into hours using the mutate function. The new variables were added to their corresponding data frames.
Next, I calculated the means of all the variables of interest: Total Hours Asleep and Total Hours in Bed from the Sleep data, and Sedentary Hours, Total Steps, and Calories from the Daily Activity data. The means were calculated by ID, which reduces the data to one row per unique participant rather than treating each daily record as a separate participant. Also, since the data were compiled across a 1-month time span, it was best to show how each participant performed over that time span rather than per day.
Lastly, I created the two final versions of the data sets I would be working with, containing the newly calculated variables: DA_MEANS_FINAL and SLEEP_MEANS_FINAL. Each data set contained the means of each variable of interest, and the data frames were now 33 x 3 and 24 x 2, respectively.
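The cleaning steps above can be sketched roughly as follows. The actual work was done in R with mutate; this is a Python illustration using only the standard library, and the field names and values are assumptions chosen for the example, not the real data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sleep records; field names and values are made up for illustration.
sleep = [
    {"Id": 1001, "TotalMinutesAsleep": 420, "TotalTimeInBed": 450},
    {"Id": 1001, "TotalMinutesAsleep": 360, "TotalTimeInBed": 390},
    {"Id": 1002, "TotalMinutesAsleep": 480, "TotalTimeInBed": 540},
]


def add_hours(rows):
    """Return copies of the rows with the minute fields also expressed in hours."""
    return [
        {
            **row,
            "TotalHoursAsleep": row["TotalMinutesAsleep"] / 60,
            "TotalHoursInBed": row["TotalTimeInBed"] / 60,
        }
        for row in rows
    ]


def means_by_id(rows, fields):
    """Collapse many daily rows into one set of means per Id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Id"]].append(row)
    return {
        user_id: {field: mean(r[field] for r in user_rows) for field in fields}
        for user_id, user_rows in grouped.items()
    }


sleep_means = means_by_id(add_hours(sleep), ["TotalHoursAsleep", "TotalHoursInBed"])
print(sleep_means)
```

Averaging per Id gives one observation per participant, so a person with many logged days does not weigh more heavily in the later correlations than a person with few.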
Analyzing the Data in R Studio
Daily Activity
For the Daily Activity data set I first measured the relationship between the number of Total Steps walked and the number of Calories burned with a correlation. I then measured the correlation between the number of Total Steps and Sedentary Minutes.
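For readers unfamiliar with the statistic, here is a minimal sketch of a Pearson correlation coefficient, which is assumed to be the kind of correlation reported as r in the findings. It is written in Python rather than the R used for the analysis, and the step and calorie values are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products divided by the product of the root sums of squares.
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical per-participant means (not the real data).
steps = [4000, 6000, 8000, 10000]
calories = [1800, 2100, 2000, 2700]
print(round(pearson_r(steps, calories), 3))
```

The coefficient ranges from -1 to 1: the sign gives the direction of the relationship and the magnitude gives its strength.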
Sleep
For the Sleep data set, I measured the correlation between Total Hours Asleep and Total Hours In Bed.
Joining Data Sets:
Daily Activity + Sleep
To understand how sleeping habits relate to daily activity, we joined the data sets by user ID so that each participant appears only once. The combined data set had 24 unique ID numbers, and the data frame was 24 x 5. For this data set, I measured the correlation between Total Hours Asleep and Total Steps.
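The join can be sketched as below. An inner join is assumed, since the combined data set has 24 IDs, the same number as the Sleep data. The original join was done in R; this Python version and its column names and values are made up for illustration.

```python
# Hypothetical per-user means keyed by Id; names and values are made up.
da_means = {
    1001: {"MeanSteps": 7200, "MeanCalories": 2100, "MeanSedentaryHours": 12.5},
    1002: {"MeanSteps": 4100, "MeanCalories": 1800, "MeanSedentaryHours": 14.0},
    1003: {"MeanSteps": 9800, "MeanCalories": 2600, "MeanSedentaryHours": 10.2},
}
sleep_means = {
    1001: {"MeanHoursAsleep": 6.8, "MeanHoursInBed": 7.4},
    1003: {"MeanHoursAsleep": 7.5, "MeanHoursInBed": 8.0},
}


def inner_join(left, right):
    """Keep only the Ids present in both tables and merge their columns."""
    return {
        user_id: {**left[user_id], **right[user_id]}
        for user_id in left
        if user_id in right
    }


joined = inner_join(da_means, sleep_means)
print(sorted(joined))
```

Users who logged activity but no sleep (Id 1002 in this toy example) drop out of the joined table, which is why the combined sample is smaller than the Daily Activity sample.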
Summarizing Findings/Visualizations
Daily Activity Part 1:
Consistent with existing data and previous research studies, we see a positive relationship between the number of daily steps and calories burned (1). Looking at the scatter plot, we also see that the majority of the participants walked between 4,000 and 8,000 steps a day, burning between 1,500 and 3,000 calories a day. This is consistent with the reported 4,000 steps a day (1). The number of calories burned is also consistent with existing data, where the average American woman burns between 1,600 and 2,400 calories a day and the average American man burns about 2,000 to 3,000 calories a day (2).
Statistically, the strength of the relationship (r = 0.436) can be categorized as weak (5).
Daily Activity Part 2:
As expected, there is a negative relationship between the amount of time spent sitting down and the number of daily steps: those who walk more spend less time sedentary than those who walk less. However, even for those who walked the average of about 10,000 steps, as shown in the graph, sedentary time ranged from an average of 10 to 15 hours a day. A research study published in 2019 examined sedentary behavior in the US population and noted that, over the years 2001-2016, sedentary time increased to about 8.2 hours per day (3). Sedentary time in our data is therefore a lot higher than the national average.
Statistically, the strength of the relationship (r = -0.385) can be categorized as weak (5).
Sleep:
In this graph, we see a positive relationship between time spent in bed and time spent sleeping. We also see that the average time spent asleep is between 6 and 9 hours a day. As recommended by the CDC, the average U.S. adult should sleep about 7 hours or more (4). The average time spent in bed ranged from 5 to 10 hours. This means that these participants were mostly in bed for the purpose of sleeping rather than, for example, watching TV or doing other leisure activities. These are signs of good sleeping habits.
Statistically, the strength of the relationship (r = 0.94) is very strong (5).
Joined Data Sets:
This is the most interesting graph we see. This one shows the relationship between the number of steps taken daily versus the amount of time asleep. As we can see the relationship is not linear. We can categorize the results we see into three groups:
We have some people who are walking above the average, but sleeping below recommended hours.
We have the "happy" medium of those sleeping the recommended hours and walking the recommended daily steps.
Then we have those who are walking the average daily steps, but are sleeping more than the hours recommended.
Statistically, the strength of the relationship (r = -0.218) can be categorized as weak (5).
Next Steps
Answering the Questions
What are some trends in smart device usage?
As we have seen, some people use their smart devices to try to stay as healthy as possible holistically. This includes sleep, the amount of time spent sedentary, and the number of steps taken per day. Then there are those who are not in the "happy medium", falling either above or below the recommended averages.
How could these trends apply to Bellabeat customers?
These trends can apply to Bellabeat customers because Bellabeat, like Fitbit, sells health and fitness tracking devices. Fitbit is simply more popular and is one of the leading brands in this niche. The two companies' target consumers are very similar, if not the same.
How could these trends help influence Bellabeat marketing strategy?
My recommendation is to target consumers who want to be in that "happy medium" and to promote overall holistic wellness. This includes promoting the trackers as a way to walk above the average without overdoing it. The app could also encourage appropriate bedtimes for the best possible rest, keeping in mind each user's wake-up time, track daily water consumption and encourage appropriate amounts throughout the day, and prompt people to get up and stretch after hours of sitting. The whole point is to promote holistic wellness and help people stay within a healthy range. I would even go as far as adding a "happiness" tracker to the app to relate how users feel to how they take care of their health and physical activity throughout the day, tying together health, fitness, and mental health.
Limitations
Having a larger sample size could provide better insights
Having demographics like age and gender/sex can provide more detail of the data
The participants volunteered and were not randomly selected
All variables were self-reported/automatically collected through their trackers which could have led to overreporting or underreporting their experiences
References
1. Centers for Disease Control and Prevention. (2020, March 24). Higher daily step count linked with lower all-cause mortality. Centers for Disease Control and Prevention. Retrieved October 28, 2021, from https://www.cdc.gov/media/releases/2020/p0324-daily-step-count.html.
2. Fetters, K. A., Bedosky, L., Manning, J., Groth, L., Rapaport, L., Lawler, M., Millard, E., Gorin, A., Carstensen, M., & Upham, B. (2020, June 10). How to achieve 1 pound of weight loss. Everyday Health. Retrieved October 28, 2021, from https://www.everydayhealth.com/weight/how-to-achieve-one-pound-of-weight-loss.aspx.
3. Yang, L., Cao, C., Kantor, E. D., et al. (2019). Trends in sedentary behavior among the US population, 2001-2016. JAMA, 321(16), 1587–1597. doi:10.1001/jama.2019.3636.
4. Centers for Disease Control and Prevention. (2017, May 2). Sleep and sleep disorders. Centers for Disease Control and Prevention. Retrieved October 28, 2021, from https://www.cdc.gov/sleep/data_statistics.html.
5. LaMorte, W. W. (2021, April 21). PH717 module 9 - Correlation and regression: Evaluating association between two continuous variables. The correlation coefficient (r). Retrieved from https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module9-Correlation-Regression/PH717-Module9-Correlation-Regression4.html.