Problem 1: A local pizza restaurant wants to get a better sense of who its customers are and how much they buy. The file Pizza_Customers.xlsx shows data collected on 30 randomly selected customers. Variables include Age, Female (1 if female, 0 otherwise), Annual Income, Married (1 if married, 0 otherwise), Own (1 if own residence, 0 otherwise), College (1 if completed college degree, 0 otherwise), Size (household size), and Spending (annual store spending).

1. Perform hierarchical clustering to group the customers based on the numerical variables only (Age, Annual Income, Size, and Spending). Describe each cluster based on the cluster characteristics.
2. Perform hierarchical clustering to group the customers based on the categorical variables (Female, Married, Own, and College) together with Spending. Describe each cluster based on the cluster characteristics.
3. Experiment with other combinations of variables together with Spending, and identify one that you believe is more insightful than the groupings in parts 1 and 2. Justify your recommendation.
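
Parts 1 and 2 can be sketched in Python as below. This is a minimal illustration, not the tool the assignment prescribes. It assumes numpy and scipy are installed and that the worksheet headers match the variable names listed above. Its modeling choices are assumptions rather than requirements of the problem: Ward linkage on z-scores for the numerical variables, and average linkage for part 2 after rescaling Spending to [0, 1] so it sits on the same scale as the 0/1 dummies (on 0/1 columns the squared Euclidean distance simply counts mismatched attributes). The helper names (`zscore`, `minmax`, `hierarchical_clusters`, `cluster_profile`) are hypothetical, and the cluster count of 3 in the commented usage is a placeholder to be chosen from the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def zscore(X):
    """Standardize each column to mean 0, std 1 so Income does not dominate Age or Size."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns at 0 instead of dividing by zero
    return (X - X.mean(axis=0)) / sd


def minmax(X):
    """Rescale each column of a 2-D array to [0, 1] so it is comparable to 0/1 dummies."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    return (X - lo) / span


def hierarchical_clusters(X, n_clusters, method="ward"):
    """Agglomerative clustering on Euclidean distances, cut into at most
    n_clusters groups (tied merge heights can yield fewer). Labels start at 1."""
    Z = linkage(np.asarray(X, dtype=float), method=method, metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def cluster_profile(raw, labels):
    """Map each cluster label to (size, column means of the unscaled variables)."""
    raw = np.asarray(raw, dtype=float)
    labels = np.asarray(labels)
    return {int(c): (int((labels == c).sum()), raw[labels == c].mean(axis=0))
            for c in np.unique(labels)}


# Hypothetical usage (worksheet headers assumed to match the variable names above;
# 3 clusters is a placeholder to be read off the dendrogram):
# import pandas as pd
# df = pd.read_excel("Pizza_Customers.xlsx")
# num = df[["Age", "Annual Income", "Size", "Spending"]]                  # part 1
# print(cluster_profile(num, hierarchical_clusters(zscore(num), 3)))
# cat = df[["Female", "Married", "Own", "College"]]                        # part 2
# mixed = np.column_stack([cat.to_numpy(), minmax(df[["Spending"]])])
# labels = hierarchical_clusters(mixed, 3, method="average")
# print(cluster_profile(df[["Female", "Married", "Own", "College", "Spending"]], labels))
```

Because the profile is computed on the unscaled columns, the per-cluster means of the dummy variables read directly as proportions (for example, the share of married customers in a cluster), which is what the cluster descriptions call for.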

Problem 2: A telecommunications company wants to identify customers who are likely to unsubscribe from its telephone service. The file Telecom.xlsx shows the data collected from 100 customers: ID (customer ID), Age, Income (annual income), Usage (monthly usage, in minutes), Tenure (time as a subscriber, in months), and Unsubscribe (1 if unsubscribed, 0 if still subscribed).

1. Perform k-means clustering to group the 100 customers into four clusters based on Age, Income, Usage, and Tenure. Describe the characteristics of each cluster.
2. Compute the percentage of customers in each cluster who have unsubscribed from the telephone service. Which cluster has the highest percentage?
3. Experiment with a different number of clusters. In your opinion, which of the two choices is more insightful? Explain your answer.
4. What would you recommend to the telecommunications company as a result of this analysis?
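
The k-means steps in parts 1 to 3 can be sketched in Python with numpy alone. This is an illustrative sketch under stated assumptions, not the method of any particular software package: it implements Lloyd's algorithm with k-means++ seeding and several restarts, standardizes the four clustering variables first (Income is on a much larger scale than Age or Tenure), leaves ID and Unsubscribe out of the distance calculation, and uses Unsubscribe only afterwards to profile the clusters. The function names are hypothetical, and the worksheet headers are assumed to match the variable names above. Because k-means depends on its starting centroids, cluster numbering and even memberships may differ from those produced by other tools.

```python
import numpy as np


def zscore(X):
    """Standardize each column to mean 0, std 1 so no variable dominates the distance."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns at 0 instead of dividing by zero
    return (X - X.mean(axis=0)) / sd


def _kmeanspp_init(X, k, rng):
    """k-means++ seeding: later centroids are drawn with probability proportional
    to their squared distance from the nearest centroid already chosen."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # uniform if all points coincide
        centroids.append(X[rng.choice(len(X), p=p)])
    return np.array(centroids)


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means; returns (labels, centroids, wcss) for the restart with
    the smallest within-cluster sum of squares (WCSS)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _kmeanspp_init(X, k, rng)
        for _ in range(max_iter):
            # Assignment step: nearest centroid by squared Euclidean distance.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Update step: centroid = mean of its members (kept in place if a cluster empties).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        wcss = float(((X - centroids[labels]) ** 2).sum())
        if best is None or wcss < best[2]:
            best = (labels, centroids, wcss)
    return best


def unsubscribe_rate(labels, unsubscribe):
    """Percent of customers with Unsubscribe == 1 in each cluster."""
    labels = np.asarray(labels)
    unsubscribe = np.asarray(unsubscribe, dtype=float)
    return {int(c): 100.0 * unsubscribe[labels == c].mean() for c in np.unique(labels)}


# Hypothetical usage (worksheet headers assumed to match the variable names above):
# import pandas as pd
# df = pd.read_excel("Telecom.xlsx")
# cols = ["Age", "Income", "Usage", "Tenure"]
# X = zscore(df[cols])
# labels, _, _ = kmeans(X, k=4)
# print(df[cols].groupby(labels).mean())               # part 1: cluster profiles
# print(unsubscribe_rate(labels, df["Unsubscribe"]))   # part 2
# print({k: kmeans(X, k)[2] for k in range(2, 7)})     # part 3: WCSS by k
```

For part 3, the WCSS values can be compared across cluster counts as an elbow check. WCSS generally shrinks as k grows, so the useful signal is where the decrease levels off, weighed against whether the resulting clusters differ meaningfully in their unsubscribe rates.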

