This project involves a comprehensive analysis of the IMDB movie dataset, aiming to understand the various factors that influence a movie's success and its IMDB rating.
The central question is: which factors contribute to a movie's success on IMDB? Here, success is defined as a high IMDB rating. Understanding these factors matters to movie producers, directors, and investors who want to make informed decisions when planning and funding future projects.
Data preprocessing is the first step in the analysis. This phase includes:
- Handling missing values
- Removing duplicates
- Converting data types where necessary
- Optionally engineering new features to make the dataset easier to analyze
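
The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration that assumes pandas is used; the column names (`title`, `budget`, `title_year`, `imdb_score`), the sample rows, and the `preprocess` helper are all hypothetical and not taken from the project's actual dataset or code.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's schema may differ.
raw = pd.DataFrame(
    {
        "title": ["A", "A", "B", "C", "D"],
        "budget": ["1000000", "1000000", "2500000", None, "500000"],
        "title_year": [2010, 2010, 2012, 2015, None],
        "imdb_score": [7.1, 7.1, 6.4, 8.0, 5.9],
    }
)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pipeline (an assumption, not the project's code)."""
    # Remove exact duplicate rows.
    out = df.drop_duplicates()
    # Drop rows missing the target or the release year.
    out = out.dropna(subset=["imdb_score", "title_year"]).copy()
    # Convert types: budget arrives as text, year as float.
    out["budget"] = pd.to_numeric(out["budget"], errors="coerce")
    out["title_year"] = out["title_year"].astype(int)
    # Fill remaining missing budgets with the median budget.
    out["budget"] = out["budget"].fillna(out["budget"].median())
    # Simple engineered feature: the decade of release.
    out["decade"] = (out["title_year"] // 10) * 10
    return out.reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

Whether to drop or impute a missing value depends on the column; the median fill here is only one possible choice.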
Next, the dataset is explored to uncover relationships between IMDB ratings and the factors that may influence them. The analysis examines how ratings vary with variables such as:
- Genre
- Director
- Budget
- Year of release
- Actors
The goal is to identify which features are most strongly associated with a movie's rating.
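
As a rough sketch of this kind of exploration, numeric variables can be compared with the rating through a correlation coefficient, while categorical variables such as genre or director can be summarized by group averages. The example assumes pandas; the column names and the values in `movies` are made up for illustration and do not come from the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only (budget in millions).
movies = pd.DataFrame(
    {
        "genre": ["Drama", "Drama", "Action", "Action", "Comedy", "Comedy"],
        "budget": [10, 20, 80, 100, 15, 25],
        "title_year": [2001, 2005, 2010, 2012, 2003, 2008],
        "imdb_score": [7.8, 8.0, 6.5, 6.9, 6.0, 6.4],
    }
)

# Pearson correlation of each numeric feature with the rating.
numeric_corr = (
    movies[["budget", "title_year", "imdb_score"]]
    .corr()["imdb_score"]
    .drop("imdb_score")
)

# Average rating and movie count per genre, highest average first.
genre_summary = (
    movies.groupby("genre")["imdb_score"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)

print(numeric_corr)
print(genre_summary)
```

The same `groupby` pattern would apply to director or actor columns, though groups with very few movies give unreliable averages, which is why the count is reported alongside the mean.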
The Five Whys technique is then applied to dig deeper into the patterns found in the data. For example, if movies with higher budgets tend to receive higher ratings, asking "Why?" repeatedly helps uncover the underlying factors behind that trend.
The project culminates in a detailed report that narrates the findings from the data analysis. The report presents:
- The initial problem
- The analysis process
- Insights gained from the data
Visualizations accompany the findings to make the results easier to understand. The ultimate goal is to provide actionable insights that help stakeholders in the movie industry make informed, data-driven decisions.