Bellabeat Project

Summary

This case study was completed by Osbaldo Albornoz in February 2023 as part of the Google Data Analytics Professional Certificate capstone unit. The analysis was carried out in R and is hosted online on GitHub.

In this case study we work through the real-world tasks of a junior data analyst. We are working for a fictional company, Bellabeat, where we meet different characters and team members.

Scenario

You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy. 

About the Company

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. 

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital marketing extensively. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy. 

Case Details

Stakeholders

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer 

  • Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team 

  • Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy. You joined this team six months ago and have been busy learning about Bellabeat’s mission and business goals, as well as how you, as a junior data analyst, can help Bellabeat achieve them.

Products

  • Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products. 

  • Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress. 

  • Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness. 

  • Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

  • Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals. 

Business task: Analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices.

METHODOLOGY STEPS

ASK

  1. What are some trends in smart device usage? 

  2. How could these trends apply to Bellabeat customers? 

  3. How could these trends help influence Bellabeat marketing strategy? 

PREPARE

Data source

A Kaggle data set: FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle data set contains consented personal fitness tracker data from thirty Fitbit users which includes minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

No data is available on Bellabeat customers or on usage of Bellabeat products, so proxy data will be used for this analysis. Public data on Fitbit users will be the primary data. This dataset was obtained from Kaggle, where it is published under a CC0 (public domain) license.

PROCESS

Installing and loading packages
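
The package list is not shown in this report. A minimal sketch, assuming the functions used further down come from janitor (clean_names), dplyr (n_distinct), data.table (fwrite), ggplot2 (ggplot), corrplot (corrplot) and reshape2 (melt); this mapping is our assumption, not something the report states:

```r
# Assumed package list, inferred from the function names used in this report.
pkgs <- c("janitor", "dplyr", "data.table", "ggplot2", "corrplot", "reshape2")

# Check which packages are already installed (TRUE/FALSE per package).
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything missing instead of installing it silently.
if (any(!installed)) {
  message("Not installed yet: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach the packages that are available.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Missing packages can then be installed with install.packages() before running the rest of the code.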

Loading the datasets (raw data)

# Loading raw data
activity <- read.csv("/Users/osbaldoealbornoz/Documents/Bellabeat_Project/dailyActivity_merged.csv")
calories <- read.csv("/Users/osbaldoealbornoz/Documents/Bellabeat_Project/dailyCalories_merged.csv")
intensities <- read.csv("/Users/osbaldoealbornoz/Documents/Bellabeat_Project/dailyIntensities_merged.csv")
sleep <- read.csv("/Users/osbaldoealbornoz/Documents/Bellabeat_Project/sleepDay_merged.csv")
weight <- read.csv("/Users/osbaldoealbornoz/Documents/Bellabeat_Project/weightLogInfo_merged.csv")
steps <- read.csv("/Users/osbaldoealbornoz/Documents/Bellabeat_Project/dailySteps_merged.csv")

Checking and Cleaning the datasets

Let's take a look at the datasets. We use the head and str functions, starting with the activity dataset.

activity dataset

head(activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
str(activity)
## 'data.frame':    940 obs. of  15 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...


calories dataset

head(calories)
##           Id ActivityDay Calories
## 1 1503960366   4/12/2016     1985
## 2 1503960366   4/13/2016     1797
## 3 1503960366   4/14/2016     1776
## 4 1503960366   4/15/2016     1745
## 5 1503960366   4/16/2016     1863
## 6 1503960366   4/17/2016     1728
str(calories)
## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ Calories   : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

Note: The information in the calories dataset also seems to be contained in the activity dataset: every field of the calories dataset is present in activity, and only the date field has a different name. We can check this with the %in% operator and the all function, which here compare the two datasets column by column.

all(calories %in% activity)
## [1] TRUE

The result TRUE indicates that every column of the calories dataset is contained in the activity dataset.

Let's check the other datasets as well.

all(intensities %in% activity)
## [1] TRUE
all(sleep %in% activity)
## [1] FALSE
all(weight %in% activity)
## [1] FALSE
all(steps %in% activity)
## [1] TRUE

So, the calories, intensities and steps datasets are contained in the activity dataset. This means that we will use the following datasets:

activity, sleep, weight.
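
A row-wise check is another way to confirm that one table is contained in another. The sketch below uses made-up toy tables (activity_toy and calories_toy are hypothetical names) and base R only; it is an illustration, not the check that was run for this report:

```r
# Toy stand-ins for the activity and calories tables (made-up values).
activity_toy <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(13162, 10735, 8000),
  Calories = c(1985, 1797, 2100)
)
calories_toy <- data.frame(
  Id = c(2, 1, 1),
  ActivityDay = c("4/12/2016", "4/12/2016", "4/13/2016"),
  Calories = c(2100, 1985, 1797)
)

# Give the date column the same name in both tables.
names(calories_toy)[names(calories_toy) == "ActivityDay"] <- "ActivityDate"

# Keep only the calories rows that have an identical row in activity.
key_cols <- c("Id", "ActivityDate", "Calories")
matched <- merge(unique(calories_toy), unique(activity_toy[key_cols]), by = key_cols)

# TRUE when every distinct calories row was found in activity.
is_contained <- nrow(matched) == nrow(unique(calories_toy))
is_contained
```

Because the comparison is done on the values of each row, it does not matter in which order the rows are stored.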

Column names consistency

We will use the clean_names function just to make sure that the column names in all datasets contain only letters, numbers and underscores.
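
The same rule can be checked in base R. This is a small sketch on a made-up data frame; has_clean_name is a hypothetical helper, and the gsub line is only a crude stand-in, not a reimplementation of clean_names:

```r
# Hypothetical helper: TRUE for column names made only of letters, digits and underscores.
has_clean_name <- function(df) grepl("^[A-Za-z0-9_]+$", names(df))

# Made-up data frame with one awkward column name.
toy <- data.frame(Id = 1, `Total Steps` = 2, check.names = FALSE)
before <- has_clean_name(toy)

# Replace every run of other characters with a single underscore.
names(toy) <- gsub("[^A-Za-z0-9_]+", "_", names(toy))
after <- has_clean_name(toy)
```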

Checking the sleep and weight datasets

str(sleep)
## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
str(weight)
## 'data.frame':    67 obs. of  8 variables:
##  $ Id            : num  1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
##  $ Date          : chr  "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
##  $ WeightKg      : num  52.6 52.6 133.5 56.7 57.3 ...
##  $ WeightPounds  : num  116 116 294 125 126 ...
##  $ Fat           : int  22 NA NA NA NA 25 NA NA NA NA ...
##  $ BMI           : num  22.6 22.6 47.5 21.5 21.7 ...
##  $ IsManualReport: chr  "True" "True" "False" "True" ...
##  $ LogId         : num  1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...

Date Format

The date columns in the activity, sleep and weight datasets are of type chr. Let's convert them to date types and rename them to Date.
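
The conversion code is not shown here. A minimal base R sketch on toy rows, assuming the formats seen in the str output above (month/day/year, with a 12-hour clock time in the sleep data); activity_toy and sleep_toy are hypothetical names, and parsing AM/PM with %p can depend on the system locale:

```r
# Toy rows in the same text formats as the raw files.
activity_toy <- data.frame(Id = 1503960366,
                           ActivityDate = c("4/12/2016", "4/13/2016"))
sleep_toy <- data.frame(Id = 1503960366,
                        SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"))

# Parse the text into Date and POSIXct values.
activity_toy$ActivityDate <- as.Date(activity_toy$ActivityDate, format = "%m/%d/%Y")
sleep_toy$SleepDay <- as.POSIXct(sleep_toy$SleepDay,
                                 format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Rename both columns to Date so the tables share a key name.
names(activity_toy)[names(activity_toy) == "ActivityDate"] <- "Date"
names(sleep_toy)[names(sleep_toy) == "SleepDay"] <- "Date"

str(activity_toy)
str(sleep_toy)
```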

Verifying the changes

str(activity)
## 'data.frame':    940 obs. of  15 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ Date                    : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(sleep)
## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ Date              : POSIXct, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
str(weight)
## 'data.frame':    67 obs. of  8 variables:
##  $ Id            : num  1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
##  $ Date          : POSIXct, format: "2016-05-02 23:59:59" "2016-05-03 23:59:59" ...
##  $ WeightKg      : num  52.6 52.6 133.5 56.7 57.3 ...
##  $ WeightPounds  : num  116 116 294 125 126 ...
##  $ Fat           : int  22 NA NA NA NA 25 NA NA NA NA ...
##  $ BMI           : num  22.6 22.6 47.5 21.5 21.7 ...
##  $ IsManualReport: chr  "True" "True" "False" "True" ...
##  $ LogId         : num  1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...

Checking for Uniformity and valid sample of the data

This will help us find out whether each dataset has enough subjects for the analysis. We will use the n_distinct function.

n_distinct(activity$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 24
n_distinct(weight$Id)
## [1] 8

This output tells us that the weight dataset, with only 8 distinct users, does not have enough subjects for the analysis.

Checking for duplicates

sum(duplicated(activity))
## [1] 0
sum(duplicated(sleep))
## [1] 3

There are 3 duplicates in the sleep dataset. Let's remove them.

sleep <- unique(sleep)
sum(duplicated(sleep))
## [1] 0

Checking missing values

activity dataframe

sum(is.na(activity))
## [1] 0

sleep dataframe

sum(is.na(sleep))
## [1] 0

There are no missing values in the dataframes.

Merging the datasets

Now that we have checked and cleaned the datasets (activity and sleep), we can merge them in a single dataset.

# Merging the activity and sleep datasets
all_activity <- merge(activity, sleep, by = c("Id", "Date"), all.x = TRUE)
head(all_activity)
##           Id       Date TotalSteps TotalDistance TrackerDistance
## 1 1503960366 2016-04-12      13162          8.50            8.50
## 2 1503960366 2016-04-13      10735          6.97            6.97
## 3 1503960366 2016-04-14      10460          6.74            6.74
## 4 1503960366 2016-04-15       9762          6.28            6.28
## 5 1503960366 2016-04-16      12669          8.16            8.16
## 6 1503960366 2016-04-17       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
##   TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1                 1                327            346
## 2                 2                384            407
## 3                NA                 NA             NA
## 4                 1                412            442
## 5                 2                340            367
## 6                 1                700            712

In this case the resulting dataset has missing values in the sleep columns, because all.x = TRUE keeps every row of the activity dataframe, including the days that have no matching sleep record. This is OK for this analysis.
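
To see how all.x = TRUE produces these NA values, here is a small sketch with made-up rows (activity_toy and sleep_toy are hypothetical names):

```r
# Three activity days, but sleep records for only two of them.
activity_toy <- data.frame(
  Id = c(1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(13162, 10735, 8000)
)
sleep_toy <- data.frame(
  Id = c(1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(327, 400)
)

# Left join: keep every activity row, fill unmatched sleep columns with NA.
joined <- merge(activity_toy, sleep_toy, by = c("Id", "Date"), all.x = TRUE)

# Number of activity days without a sleep record.
n_unmatched <- sum(is.na(joined$TotalMinutesAsleep))
joined
```

The activity row for user 1 on 2016-04-13 has no sleep record, so its TotalMinutesAsleep is NA.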

Saving the Dataset

Once the data set is clean and free of errors, it is ready for analysis and it is good practice to keep it in a safe place.

# Saving the Dataset in our system
fwrite(
  all_activity, 
  "/Users/osbaldoealbornoz/Documents/Bellabeat_Project/all_activity.csv", 
  col.names = TRUE,
  row.names = FALSE
  )

ANALYSIS

Let's summarize the data.

summary(all_activity)
##        Id                 Date              TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   1st Qu.:2016-04-19   1st Qu.: 3790   1st Qu.: 2.620  
##  Median :4.445e+09   Median :2016-04-26   Median : 7406   Median : 5.245  
##  Mean   :4.855e+09   Mean   :2016-04-26   Mean   : 7638   Mean   : 5.490  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:10727   3rd Qu.: 7.713  
##  Max.   :8.878e+09   Max.   :2016-05-12   Max.   :36019   Max.   :28.030  
##                                                                           
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.000   Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 2.620   1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 5.245   Median :0.0000           Median : 0.210    
##  Mean   : 5.475   Mean   :0.1082           Mean   : 1.503    
##  3rd Qu.: 7.710   3rd Qu.:0.0000           3rd Qu.: 2.053    
##  Max.   :28.030   Max.   :4.9421           Max.   :21.920    
##                                                              
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   : 0.000      Min.   :0.000000       
##  1st Qu.:0.0000           1st Qu.: 1.945      1st Qu.:0.000000       
##  Median :0.2400           Median : 3.365      Median :0.000000       
##  Mean   :0.5675           Mean   : 3.341      Mean   :0.001606       
##  3rd Qu.:0.8000           3rd Qu.: 4.782      3rd Qu.:0.000000       
##  Max.   :6.4800           Max.   :10.710      Max.   :0.110000       
##                                                                      
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0        1st Qu.: 729.8  
##  Median :  4.00    Median :  6.00      Median :199.0        Median :1057.5  
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8        Mean   : 991.2  
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0        3rd Qu.:1229.5  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0  
##                                                                             
##     Calories    TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
##  Min.   :   0   Min.   :1.00      Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:1828   1st Qu.:1.00      1st Qu.:361.0      1st Qu.:403.8  
##  Median :2134   Median :1.00      Median :432.5      Median :463.0  
##  Mean   :2304   Mean   :1.12      Mean   :419.2      Mean   :458.5  
##  3rd Qu.:2793   3rd Qu.:1.00      3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :4900   Max.   :3.00      Max.   :796.0      Max.   :961.0  
##                 NA's   :530       NA's   :530        NA's   :530

Some observations and conclusions that can be made from the summary statistics:

  • On average, participants have one sleep record per day and sleep for 419.2 minutes, or about 7 hours.

  • On average, individuals take 7,638 steps per day and cover a distance of 5.49 km.

  • The median value of very active distance is 0.21 km, which suggests that most people in the dataset are not very active.

  • The mean value of total sleep records is 1.12 and the minimum is 1, so most logged days have a single sleep record and a few have two or three. The 530 NA's show that on many activity days no sleep was recorded at all.

  • The maximum values for total steps and distance are quite high, which suggests that some individuals in the dataset are very active.

  • The average number of sedentary minutes per day is 991.2 (about 16.5 hours), which is quite high and indicates that many people in the dataset may have a sedentary lifestyle.

  • The maximum calories burned per day is 4,900, which is a very high value and suggests that some individuals in the dataset may be professional athletes or have physically demanding jobs.
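
The minute values above are easier to read as hours. A short sketch of the arithmetic, using the means from the summary output (the variable names are ours):

```r
# Means taken from the summary(all_activity) output above.
mean_minutes_asleep <- 419.2
mean_sedentary_minutes <- 991.2

# Convert minutes to hours.
hours_asleep <- round(mean_minutes_asleep / 60, 2)        # 6.99
hours_sedentary <- round(mean_sedentary_minutes / 60, 2)  # 16.52

# Share of a 24-hour day (1440 minutes) spent sedentary, in percent.
pct_sedentary <- round(100 * mean_sedentary_minutes / 1440, 1)  # 68.8
```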

SHARE

After cleaning, summarizing and analyzing the data, it’s time to give it some life with some graphs and charts that will help us visualize, understand, and identify trends, patterns, and correlations in the data.

Device use over time

Let's see how frequently the smart devices are used:

# Illustrating the average amount of time each user uses the device over time
ggplot(data = device_use_frequency, mapping = aes(x = Date, y = use_time)) +
  geom_smooth(color = "blue", fill = "lightblue") +
  labs(title = "Device use frequency over time")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

This visualization suggests that the use of the smart wearable declines over the observation period. If the same pattern held for Bellabeat customers, it could lead to very little or no use of the wearable over time and hurt Bellabeat's position in the market.
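
The code that builds device_use_frequency is not shown. One way it might be built, assuming use_time means the total tracked minutes per day averaged over all users (this definition is our assumption), sketched on made-up rows:

```r
# Made-up daily records for two users over two days.
toy <- data.frame(
  Id = c(1, 2, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13", "2016-04-13")),
  VeryActiveMinutes = c(25, 0, 21, 10),
  FairlyActiveMinutes = c(13, 0, 19, 5),
  LightlyActiveMinutes = c(328, 100, 217, 150),
  SedentaryMinutes = c(728, 1340, 776, 1000)
)

# Assumed definition: minutes tracked per day = sum of the four minute columns.
minute_cols <- c("VeryActiveMinutes", "FairlyActiveMinutes",
                 "LightlyActiveMinutes", "SedentaryMinutes")
toy$use_time <- rowSums(toy[minute_cols])

# Average tracked minutes across users for each date.
device_use_frequency_toy <- aggregate(use_time ~ Date, data = toy, FUN = mean)
device_use_frequency_toy
```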

User types

Let's create some categories for the users according to how active they are.
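
The code that creates the categories is not shown. A minimal sketch of one possible approach, assuming users are grouped by their mean daily steps; the step thresholds and category labels below are illustrative assumptions, not the ones used for this report:

```r
# Made-up daily step counts for four users.
toy <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3, 4, 4),
  TotalSteps = c(3000, 4000, 6000, 7000, 8000, 9000, 12000, 14000)
)

# Mean daily steps per user, repeated on every row of that user.
toy$mean_daily_steps <- ave(toy$TotalSteps, toy$Id)

# Assumed thresholds: below 5000, 5000 to 7499, 7500 to 9999, 10000 and above.
toy$user_category <- cut(
  toy$mean_daily_steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("sedentary", "lightly active", "fairly active", "very active"),
  right = FALSE
)
table(toy$user_category)
```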

head(user_categories)
## # A tibble: 6 × 26
## # Groups:   Id [1]
##           Id Date       TotalS…¹ Total…² Track…³ Logge…⁴ VeryA…⁵ Moder…⁶ Light…⁷
##        <dbl> <date>        <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1503960366 2016-04-12    13162    8.5     8.5        0    1.88   0.550    6.06
## 2 1503960366 2016-04-13    10735    6.97    6.97       0    1.57   0.690    4.71
## 3 1503960366 2016-04-14    10460    6.74    6.74       0    2.44   0.400    3.91
## 4 1503960366 2016-04-15     9762    6.28    6.28       0    2.14   1.26     2.83
## 5 1503960366 2016-04-16    12669    8.16    8.16       0    2.71   0.410    5.04
## 6 1503960366 2016-04-17     9705    6.48    6.48       0    3.19   0.780    2.51
## # … with 17 more variables: SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <int>, FairlyActiveMinutes <int>,
## #   LightlyActiveMinutes <int>, SedentaryMinutes <int>, Calories <int>,
## #   TotalSleepRecords <int>, TotalMinutesAsleep <int>, TotalTimeInBed <int>,
## #   mean_daily_steps <dbl>, mean_very_active_minutes <dbl>,
## #   mean_fairly_active_minutes <dbl>, mean_lightly_active_minutes <dbl>,
## #   mean_sedentary_minutes <dbl>, total_daily_minutes <dbl>, …

Types of users by activity category

ggplot(data = user_categories) + geom_bar(mapping=aes(x= user_category, fill = user_category))+
  scale_fill_manual(values = c("#66BB6A", "orange", "purple","#2196F3")) +
  labs(title = "Categories of users")

The chart suggests that users who are very active account for more of the logged days, that is, they use their smart devices more.

But, it’s important to see how these types of users behave over time.

Let’s check what happens in the VeryActiveMinutes category

ggplot(data = user_categories) + geom_line(mapping = aes(x= Date, y = VeryActiveMinutes, group = Id, color = user_category)) + 
  labs(title = "Very Active Minutes Across Days for each user") +
  scale_color_manual(values = c("#66BB6A", "orange", "purple","#2196F3"))

The intensity of physical activity appears to decrease over the observation period, which covers about one month.

Let's see if the smoothed trend across days shows the same thing.

ggplot(data = user_categories, mapping = aes(x = Date, y = VeryActiveMinutes)) + 
  geom_smooth(color = "#2196F3", fill = "lightblue") + 
  labs(title = "Trend of very active minutes across days")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The trend continues throughout the days.

Now let's see these trends for the lightly and fairly active categories.

For light activity

ggplot(data = user_categories, mapping = aes(x = Date, y = LightlyActiveMinutes)) + 
  geom_smooth(color = "orange", fill = "#F9D2B4") + 
  labs(title = "Trend of light active minutes across days")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

For fairly active minutes

ggplot(data = user_categories, mapping = aes(x = Date, y = FairlyActiveMinutes)) + 
  geom_smooth(color = "#66BB6A", fill = "lightgreen") + 
  labs(title = "Trend of fairly active minutes across days")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

These trends indicate that the more active a user is, the more they use their device.

Let's see some specific correlations.

correlations <- cor(all_activity[,c("TotalSteps", "TotalDistance", "VeryActiveMinutes", "Calories")])
corrplot(correlations, type = "upper", order = "hclust", tl.cex = 0.8)

ggplot(all_activity, aes(x = TotalSteps, y = Calories)) + 
  geom_point(color = "#2196F3") + 
  ggtitle("Relationship between calories and total steps")

Now let's see the sedentary minutes.

ggplot(all_activity, aes(x=SedentaryMinutes)) +
  geom_density(color = "#2196F3") +
  xlab("Sedentary Minutes") +
  ylab("Density") +
  ggtitle("Distribution of Sedentary Minutes")

The distribution indicates that most daily records have a high number of sedentary minutes, so users tend toward a sedentary lifestyle.

Let's see the correlations between each pair of variables.

corr_matrix <- cor(all_activity[,3:15])
melted_corr_matrix <- melt(corr_matrix)
ggplot(melted_corr_matrix, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
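
A heatmap shows the overall pattern, but a sorted list makes the strongest pairs easier to read. A base R sketch, using three columns from the six activity rows printed by head(activity) earlier (corr_pairs is a hypothetical name):

```r
# Three columns from the first six rows of the activity data shown above.
toy <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  TotalDistance = c(8.50, 6.97, 6.74, 6.28, 8.16, 6.48),
  SedentaryMinutes = c(728, 776, 1218, 726, 773, 539)
)
corr <- cor(toy)

# Take each pair once (upper triangle) and sort by absolute correlation.
upper <- which(upper.tri(corr), arr.ind = TRUE)
corr_pairs <- data.frame(
  var1 = rownames(corr)[upper[, "row"]],
  var2 = colnames(corr)[upper[, "col"]],
  r = corr[upper]
)
corr_pairs <- corr_pairs[order(-abs(corr_pairs$r)), ]
corr_pairs
```

With only six rows these values are just an illustration; the same steps can be applied to corr_matrix.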

ACT

Final Observations and Conclusions.

Based on the visualizations, we can see some trends in smart device usage, such as:

  • Users tend to be more active during weekdays than weekends.

  • TotalSteps and TotalDistance are positively correlated, meaning that users who take more steps also tend to travel farther distances.

  • SedentaryMinutes and VeryActiveMinutes are negatively correlated, meaning that users who are more active tend to be less sedentary.

  • There is a wide range of variability in the amount of sleep users get each night.

  • While some individuals appear to be very active, the high number of sedentary and sleep minutes suggests that many people in the dataset may have a sedentary lifestyle.

Recommendations
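
One way to check the weekday versus weekend pattern is to label each date and compare mean steps. The sketch below uses made-up step counts for one week, so the numbers are not results from the Fitbit data; day_type is a hypothetical column name:

```r
# One made-up week, Monday 2016-04-11 to Sunday 2016-04-17.
toy <- data.frame(
  Date = as.Date("2016-04-11") + 0:6,
  TotalSteps = c(8000, 9000, 7000, 8000, 8000, 6000, 5000)
)

# "%u" gives the ISO weekday number as text: "1" = Monday ... "7" = Sunday.
toy$day_type <- ifelse(format(toy$Date, "%u") %in% c("6", "7"), "weekend", "weekday")

# Mean steps for weekdays and for weekend days.
steps_by_day_type <- aggregate(TotalSteps ~ day_type, data = toy, FUN = mean)
steps_by_day_type
```

The same labelling can be applied to the Date column of all_activity.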

  • Use social media platforms to create a community of users who can share their progress, tips and experiences with the device. This can create a sense of accountability and motivation for users.

  • Offer discounts and promotions to customers who reach certain activity goals, or who consistently use the device over a certain period of time. This can incentivize customers to continue using the device and achieve their fitness goals.

  • Partner with fitness influencers or athletes to showcase the device and its benefits to a wider audience. This can help establish credibility and trust with potential customers.

  • Use targeted online advertising to reach potential customers who have shown an interest in fitness or health-related products. This can help increase awareness of the device and drive sales.

  • Provide customers with personalized insights and recommendations based on their activity data, such as suggesting specific workouts or tips to improve sleep quality. This can help create a more engaging and personalized experience for users.

by Osbaldo Albornoz