This project uses SQL to calculate and compare monthly churn rates for two user segments (30 and 87) between December 2016 and March 2017. The goal is to identify which segment demonstrates higher retention and provide data-driven recommendations for business expansion.
- How many months has the company been operating? How can this be used to find the churn rate?
- What segments of users exist?
- What is the overall churn trend since the company started?
- Compare the churn rates between user segments.
- Which segment should the company expand on?
- The company's data ranges from December 2016 to March 2017. Because users have a minimum 1-month subscription length, we only need to use January, February, and March to find the churn rate.
- There are two segments: segment 87 and segment 30.
- The overall churn rate of the company has increased since data collection started in December. With one exception being a slight decrease in segment 30 from January to February.
- Segment 30 had a much lower churn rate than segment 87. While churn rate for both groups increased in March, users in segment 30 were less likely to cancel in all three months.
- The company should expand on segment 30. Future analysis into why the churn rate for segment 80 is higher should be done. Has segment 87 been around longer? Is segment 30 newer? How much are we spending on these two segments? What kind of customers are in each segment? What is the account age of canceled users?
The data used in this project is stored in a SQL table called subscriptions (from Codecademy). It has the following columns:
- id - primary key id.
- subscription_start - start date of subscription.
- subscription_end - last date of subscription(Null if still active).
- segment - identifies which segment or group the subscription belongs to.
The company has a minimum subscription length of 31 days, so no user can start and end their subscription in the same month.
- Currently, in the data, there are only two segments. If we wanted to add more segments to this query so we can reuse it, this code won't scale well. We could add a segment to group by in the status aggregate, which would let us use this query for any number of segments.
- For large databases, this query will not be fast

