This Python data analysis project conducts a market analysis to assist a fitness studio based in the Philippines with:
- Assessing global demand for workouts between mid-March 2018 and mid-March 2023.
- Exploring the top three fitness keywords that generated the most search interest in workouts during various time periods.
- Gauging how interest in these three keywords is split between the Philippines and its near and far neighbouring countries, including those in the Middle East.
- Identifying the most popular workout types in the Philippines and Singapore.
The main goal of these activities is to enable the firm to create unique new digital products and services for its existing customers and potential users, informed by a data-driven product management strategy.
Four fundamental datasets were used for this analysis:
- workout.csv
- three_keywords.csv
- workout_global.csv
- geo_three_keywords.csv
Anaconda Navigator was used to access the Jupyter Notebook for Python.
- All datasets, found in the Datasets folder, were first inspected and then loaded into pandas DataFrames in the relevant sections of the code.
- During inspection, duplicated, corrupt and missing values were identified in the data.
- Before any file was loaded into a DataFrame, code was written to pre-emptively handle these problems in the datasets.
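The clean-up described above might look roughly like the following. This is a minimal sketch, not the notebook's actual code: the helper name `load_trends_csv`, the in-memory sample and the choice to drop (rather than fill) bad rows are all assumptions made for illustration.

```python
import io

import pandas as pd


def load_trends_csv(source, date_col="Week"):
    """Read a trends-style CSV and apply basic clean-up (hypothetical helper)."""
    df = pd.read_csv(source)
    # Remove rows that exactly duplicate an earlier row.
    df = df.drop_duplicates()
    # Parse the date column; values that cannot be parsed become NaT.
    df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
    # Coerce all remaining columns to numbers; corrupt entries become NaN.
    value_cols = df.columns.drop(date_col)
    df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")
    # Assumption: rows with any missing date or value are simply dropped.
    return df.dropna().reset_index(drop=True)


# Made-up sample with one duplicate, one corrupt and one missing value.
sample = io.StringIO(
    "Week,workout\n"
    "2018-03-18,59\n"
    "2018-03-18,59\n"
    "2018-03-25,oops\n"
    "2018-04-01,\n"
    "2018-04-08,61\n"
)
clean = load_trends_csv(sample)
print(clean)
```

For the real files, the same helper could be pointed at a path inside the Datasets folder instead of the in-memory sample. Whether dropping or imputing missing values is appropriate depends on how many rows are affected.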
Drawing on Google Trends data and, in niche instances, YouTube keyword searches, the EDA aimed to answer some key questions, such as:
- How does global demand for workouts trend between mid-March 2018 and mid-March 2023?
- Which of the three keywords (gym workout, home workout and home gym) generated the most interest in 2020, and which during 2022-2023?
- What are the top 20 countries with the highest interest in workouts?
- What is the breakdown of the three workout keywords for certain Middle Eastern and South Asian (MESA) countries, including the Philippines and Singapore?
- Based on YouTube keyword searches, which workout type (e.g., yoga, weight lifting) generated the highest interest in the Philippines during the timeframe at hand?
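As an illustration of how a ranking question such as the top 20 countries can be answered in pandas, here is a minimal sketch. The column names `country` and `workout_interest`, the helper `top_countries` and the sample values are hypothetical and do not come from the project's datasets.

```python
import pandas as pd

# Hypothetical sample; names and values are illustrative only.
by_country = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "workout_interest": [80, 100, None, 65],
})


def top_countries(df, value_col, n=20):
    """Return the n rows with the highest value_col, highest first."""
    ranked = df.dropna(subset=[value_col]).nlargest(n, value_col)
    return ranked.reset_index(drop=True)


top = top_countries(by_country, "workout_interest", n=3)
print(top)
```

Calling `top_countries(df, value_col)` with the default `n=20` would return the top 20 rows of a real per-country table.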
While inspecting datasets that include multiple categorical variables, I realised that reshaping those categorical variables from columns into rows would help tremendously with the next steps of the exploratory data analysis.
Hence, I used the pandas stack() method to achieve this:
df = file.set_index('Week').stack().reset_index()
Before stacking, I needed to set the index to the 'Week' column so that I could also group each of the three keywords by date without losing that column.
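To make the reshaping concrete, the sketch below applies the same `set_index('Week').stack().reset_index()` chain to a tiny made-up table and then groups by year and keyword. The numbers, the renamed columns (`keyword`, `interest`) and the added `year` column are assumptions for illustration only.

```python
import pandas as pd

# Made-up wide table: one column per keyword, one row per week.
wide = pd.DataFrame({
    "Week": pd.to_datetime(
        ["2020-04-05", "2020-04-12", "2022-04-03", "2022-04-10"]
    ),
    "gym workout": [10, 12, 40, 44],
    "home workout": [90, 80, 30, 28],
    "home gym": [30, 28, 20, 18],
})

# Keep 'Week' as the index so it survives stacking, then move the
# keyword columns into rows (long format).
long_df = wide.set_index("Week").stack().reset_index()
long_df.columns = ["Week", "keyword", "interest"]

# Group by year and keyword, then find the keyword with the highest
# average interest in each year.
long_df["year"] = long_df["Week"].dt.year
yearly = long_df.groupby(["year", "keyword"])["interest"].mean().unstack("keyword")
top_keyword = yearly.idxmax(axis=1)
print(top_keyword)
```

Each week now contributes one row per keyword, which is what allows the later grouping by both date and keyword.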
The critical analysis results are summarised as follows:
- Interest in workouts tends to be rather seasonal, spiking at the beginning of each year, presumably as part of people's New Year's resolutions.
- However, 2020 shows an anomaly: a second, stronger spike in global interest in workouts in April 2020, shortly after the COVID announcement.
- During the peak COVID period in 2020, home workouts stood out as the most popular workout trend, whereas from 2022 onwards the trend shifted globally towards gym workouts.
- In relative terms, the United States showed the highest interest in workouts, while the Philippines, where the firm is based, ranked 7th globally. In addition, almost half of the top 20 countries are in the Middle East or South Asia (MESA).
- Interestingly, among the prominent MESA countries in the rankings, the Philippines came out on top for interest in home workout. Nevertheless, it ranked last for interest in home gym.
- Linked to Finding #4 to a certain extent, Zumba was the most preferred workout type in the Philippines over the last 5 years, which may also help explain why people in the Philippines tend to work out at home rather than visit a gym.
Based on the analysis, I recommend the following actions:
- Invest human and financial capital to devise a data-driven product management strategy.
- Launch marketing communications campaigns, particularly ones built around a Zumba theme, to shift potential users' behaviour towards visiting the firm's fitness studios.
- Consider Zumba and Yoga as the pilot content while developing products and services designed to more intelligently interact with potential users.
- Continue to undertake similar data research every 6 months to stay abreast of new developments in the industry.
All of this data analysis work hinges on the datasets made available by DataCamp. Consequently, the accuracy of the findings and proposed suggestions is inherently bound to the integrity of that data.