Quiz Instructions
• Start with datafile: Presidential Elections.jmp
• Change the file name to: YourFirstName_YourLastName_presidential.jmp
• Review what we did in the lab.
• You must use JMP to answer the following multiple-choice questions.
• Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.

Question 1 2 pts
Based on a hierarchical clustering of the election results, what optimal number of State clusters did you get from your analysis?

4

6

5

8

Question 2 2 pts
Which cluster of States has the lowest proportion of Democratic voters in the Presidential elections?

Alaska, Nebraska, Idaho, Wyoming, Utah

Iowa, Wisconsin, Pennsylvania, Minnesota

Virginia, Texas, Indiana

Arizona, Nevada, New Hampshire, Colorado, Florida

Question 3 2 pts
If you were an advisor to a Republican presidential nominee, in which States would you not run many campaign ads against your opponent, the Democratic presidential nominee?

Iowa, Wisconsin, Pennsylvania, Minnesota

Alaska, Nebraska, Idaho, Wyoming, Utah

Arizona, Nevada, New Hampshire, Colorado, Florida

Virginia, Texas, Indiana

Question 4 2 pts
Show a parallel plot that demonstrates how the States are clustered together by the hierarchical clustering analysis. Submit a screenshot.
Upload

Question 5 2 pts
Perform a K-means clustering with the number of clusters ranging from 3 to 10. What optimal number of clusters did you get from this analysis?

8

4

5

6
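Question 5 relies on K-means clustering. The Python sketch below shows the standard Lloyd's iteration (assign each row to its nearest centroid, then move each centroid to the mean of its rows) for one fixed number of clusters. It is not JMP's implementation: JMP's K Means platform fits each cluster count in the requested range and reports its own criterion for choosing among them, which this sketch does not reproduce. The function name `kmeans` and its parameters are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    `init` optionally supplies the k starting centroids; otherwise k distinct
    points are sampled with a fixed seed so runs are reproducible.
    Returns (centroids, labels).
    """
    if init is None:
        init = random.Random(seed).sample(points, k)
    centroids = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Assuming each point holds one State's Democratic percentages across the election cycles, the returned centroids are the cluster means that Questions 8 and 9 ask you to compare.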

Question 6 2 pts
Show a parallel plot that demonstrates how the States are clustered together by the K-means clustering analysis. Submit a screenshot.
Upload

Question 7 2 pts
Based on the K-means clustering, show how the States are grouped together using a color-coded US map. Submit a screenshot.
Upload

Question 8 2 pts
Based on the cluster means from the K-means clustering, which cluster has the lowest overall Democratic percentage of voters over the 9 election cycles?

2

4

8

3

Question 9 2 pts
Based on the cluster means from the K-means clustering, which cluster has the highest overall Democratic percentage of voters over the 9 election cycles?

8

4

3

2

Question 10 2 pts
Which cluster represents swing States, where the percentages for the Democratic and Republican nominees were similar in the last couple of election cycles?

Pivot Tables and Charts
Introduction
Excel lets you perform cross-tabulation (cross-tab) analysis in what it calls a Pivot Table.
Pivot tables can be one-, two-, or three-dimensional. They offer multiple statistical analysis and summary options, can include data from multiple worksheets, and can be modified dynamically.
Creating a Pivot Table
1. Open ExcelPivotsWorksheetCIS360.xlsx, found on Canvas.
2. Click on the Expenses tab.
3. Click on cell A2.
4. On the Insert tab, click the [PivotTable] button.
5. The Create PivotTable dialog box opens:

Change the file name to: YourFirstName_YourLastName_Assignment7.jmp

You must use JMP to answer the following multiple-choice questions.

Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.

Question 1 2.5 pts

Since we are interested in understanding ADMITTED students, change the Value Order in the "Column Properties" of the "ADMIT" column so that 1 (admitted) appears first in graphics and analyses. Submit a screenshot as evidence of this step.

Question 2 2.5 pts

Execute a contingency analysis to find out the relationship between students' ‘Admit’ and ‘Rank’. Which of the following statements is correct based on your analysis?

Group of answer choices

76.9% of Rank-2 students didn’t get admitted

45.9% of Rank-1 students didn’t get admitted.

35.8% of Rank-4 students didn’t get admitted.

17.9% of Rank-3 students got admitted
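Statements such as "76.9% of Rank-2 students didn’t get admitted" are within-rank (row) percentages: each cell count is divided by the total for its Rank. In JMP's contingency table these should correspond to Row % when Rank is the row (X) variable. A minimal Python sketch of the calculation follows; the function name is illustrative and the counts in the test are hypothetical, not taken from the assignment's data file.

```python
def row_percentages(counts):
    """Convert {row_level: {col_level: count}} into within-row percentages."""
    table = {}
    for row, cells in counts.items():
        total = sum(cells.values())  # all students in this Rank
        table[row] = {col: 100 * n / total for col, n in cells.items()}
    return table
```

For example, a hypothetical Rank with 30 admitted and 20 not admitted students gives 60% admitted and 40% not admitted.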

Question 3 2.5 pts

Execute a simple logistic regression analysis to test if ‘Admit’ is significantly related to ‘GPA’. Based on your analysis, which of the following statements is correct? [save the script to the datafile]

Group of answer choices

‘Admit’ is significantly related to the independent variable ‘GPA’ because the p-value is less than 0.05

‘Admit’ is not related to ‘GPA’ because one of them is a categorical variable

‘Admit’ is not related to the dependent variable ‘GPA’ because the p-value is more than 0.05

As ‘GPA’ increases, ‘Admit’ decreases.

Question 4 2.5 pts

Share an appropriate screenshot to support your answer in Question 3. Make sure you use different colors and different markers for students who got admitted and who didn’t.

Question 5 2.5 pts

Develop a simple logistic regression analysis to test if ‘Admit’ is significantly related to ‘GRE’. Based on your analysis, which of the following statements is correct?

Group of answer choices

GRE is significantly related to the dependent variable ‘Admit’

GRE is not related to the dependent variable ‘Admit’

As GRE increases, the Admit decreases.

GRE is related to ‘Admit’ but it is not significant

Question 6 2.5 pts

Share a screenshot of the Logistic Fit Plot to support your answer in Q5. Make sure you use different colors and different markers for students who got admitted and who didn’t.

Question 7 2.5 pts

Use Analyze > Fit Model to perform a stepwise fit for ‘ADMIT’ using all relevant predictors in the data. Set the Stopping Rule to Minimum BIC, the Direction to Forward, and the Rules to Whole Effects. Then run the logistic regression model using the variables selected in the previous step. Which of the following variables is/are significant in the model for predicting whether students get admitted? [Save the script to the data file]

Group of answer choices

GPA

None of the variables are significant

GRE

Both GRE and GPA

Question 8 2.5 pts

Submit the Parameter Estimates table from the model in Question 7.

Upload

Question 9 2.5 pts

Submit the final equation of the model (the resulting logit of the probability model, saved as Lin[1] in the data table). [save script to the data file]

Upload
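The saved Lin[1] formula is the model's linear predictor, that is, the log-odds of the first response level (ADMIT = 1 once its value order is changed as in Question 1), assuming JMP's usual convention of modeling the first level. The Python sketch below evaluates a linear predictor and converts it to a probability with the inverse logit, p = 1 / (1 + e^(-Lin)). The function names are illustrative, and the coefficients in the test are hypothetical, not estimates from the assignment's data.

```python
import math


def linear_predictor(intercept, coefs, x):
    """Evaluate intercept + sum(b_i * x_i) for one observation (the log-odds)."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def logit_to_probability(lin):
    """Inverse logit: turn a log-odds value into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-lin))
```

A positive coefficient raises the log-odds, and therefore the predicted probability, as its predictor increases.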

Question 10 2.5 pts

Submit the JMP data file with the saved scripts for all the analyses used to answer the questions in this assignment.

Change the file name to: YourFirstName_YourLastName_titanicpassengers.jmp

Review what we did in the lab.

You must use JMP to answer the following multiple-choice questions.

Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.

Question 1 2 pts

Based on the available data sample, what percentage of the 3rd-class passengers on board survived the accident?

Group of answer choices

74.5%

25.5%

43%

62%

Question 2 2 pts

What percentage of the male passengers on board did not survive the accident?

Group of answer choices

72.3%

19.1%

27.3%

80.9%

Question 3 2 pts

Develop a simple logistic regression model to test if ‘Age’ is a significant predictor of the outcome ‘survived’. Based on your analysis, which of the following statements is correct?

Group of answer choices

As 'Age' increases, the survival rate also increases.

Age is not a significant predictor of the dependent variable ‘survived’

Age is a significant predictor of the dependent variable ‘survived’

As 'Age' decreases, the survival rate decreases.

Question 4 2 pts

Share a screenshot to support your answer in Q3. Make sure you use different colors and different markers for passengers who survived and who didn’t.

Upload

Question 5 2 pts

Develop a simple logistic regression model to test if ‘Fare’ is a significant predictor of the outcome ‘survived’. Based on your analysis, which of the following statements is correct?

Group of answer choices

Fare is not correlated with the dependent variable ‘survived’ because the p-value is more than 0.05

Fare is not correlated with ‘survived’ because one of them is a categorical variable

As Fare increases, the survival rate decreases

Fare is a significant predictor of the dependent variable ‘survived’ because the p-value is less than 0.05

Question 6 2 pts

Share a screenshot to support your answer in Q5. Make sure you use different colors and different markers for passengers who survived and who didn’t.

Upload

Question 7 2 pts

As shown in the lab, develop a multiple logistic regression model (use stepwise regression) to predict survival. Which of the following variables is most significant in predicting survival?

Group of answer choices

Age

Port

Sex

Passenger class

Question 8 2 pts

As shown in the lab, develop a multiple logistic regression model (use stepwise regression) to predict survival. Which of the following variables is not significant to predict survival?

Group of answer choices

Parents and children

Age

Port

Passenger class

Question 9 2 pts

Provide a screenshot of the appropriate table to support your answers to Q7 and Q8.

Upload

Question 10 2 pts

Provide the screenshot of the equation of the final model (the resulting logit of the probability model, saved as Lin[Yes] in the data table).

In classification analysis, we are determining the probability of an observation ________.

Group of answer choices

To be undefined

To be one

To be part of a certain class or not

To be zero

Question 2 1 pts

A loan officer wants to know whether the next customer is likely to default on a loan. How can she assess the risk of extending the loan to that customer?

Group of answer choices

By utilizing a simple linear regression model developed by an in-house analyst

By asking her colleague if he knows the person

By asking the customer whether he is planning to default on the loan

By utilizing a multiple logistic regression model developed by an in-house analyst

Question 3 1 pts

In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model.

Group of answer choices

Training and Binary

Training and validation/testing

Binary and numeric

Testing and validation

Question 4 1 pts

The odds are defined as ________, where p is the probability of success.

Group of answer choices

p/(1-p)

1/(p-1)

p/(p-1)

1/(1-p)
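Strictly speaking, p/(1 - p) is the odds of success; an odds ratio divides the odds in one group by the odds in another, and the logit is the natural log of the odds. The short Python sketch below makes the three definitions concrete; for example, p = 0.8 gives odds of 0.8/0.2 = 4, that is, 4 to 1. The function names are illustrative.

```python
import math


def odds(p):
    """Odds of success for a probability p, with 0 <= p < 1."""
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds in group 1 divided by odds in group 2."""
    return odds(p1) / odds(p2)


def logit(p):
    """Log-odds: the transformed response that logistic regression models."""
    return math.log(odds(p))
```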

Question 5 1 pts

The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known.

Group of answer choices

Effect summary table

ANOVA table

Parameter estimates table

Confusion matrix
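A confusion matrix cross-tabulates actual classes against predicted classes, and accuracy is the share of observations whose prediction matches the actual class (the diagonal cells). The Python sketch below builds one for a classifier with `collections.Counter`; the function names are illustrative and the labels in the test are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) label pair; missing pairs read as 0."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of observations whose predicted class matches the actual class."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)
```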

Question 6 1 pts

If you want to find out whether body weight, calorie intake, fat intake, and age influence the probability of having a heart attack (yes or no), which of the following kinds of analysis will help determine the answer?

Group of answer choices

Multiple logistic regression

Simple logistic regression

Simple linear regression

Multiple linear regression

Question 7 1 pts

In classification problems, the primary source for accuracy estimation of the model is ________.

Group of answer choices

Probability of success

Logit

Confusion matrix

Odds ratio

Question 8 1 pts

In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________.

Group of answer choices

Logit

Odds

Odds ratio

Log of Y

Question 9 1 pts

Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables.

Group of answer choices

independent

numeric dependent

binary numeric

binary categorical

Question 10 1 pts

In logistic regression, the dependent variable y is defined as:

Change the file name to: YourFirstName_YourLastName_Assignment6.jmp

You must use JMP to answer the following multiple-choice questions.

Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.

Question 1 2.5 pts

You have been asked to test whether a car's 'City Mileage' can be predicted from its 'Fuel Tank Capacity'. Which of the following statements is correct?

Group of answer choices

Dependent variable ‘Fuel Tank Capacity’ is negatively correlated with independent variable ‘City Mileage of Car’

The test shows that the variables are significantly correlated. The dependent variable ‘City Mileage’ is predicted as City Mileage (MPG) = 45.587166 - 1.3934743*Fuel Tank Capacity

The test shows that both variables are correlated. The dependent variable ‘Fuel Tank Capacity’ is predicted as Fuel Tank Capacity = 45.587166 + 1.3934743*City Mileage of Car.

The test shows that the variables are not correlated. The dependent variable ‘City Mileage’ is predicted as City Mileage (MPG) = 45.587166 + 1.3934743*Fuel Tank Capacity

Question 2 2.5 pts

What is the coefficient of determination between 'City Mileage' and 'Weight'? [Save the script to the data file]

Group of answer choices

47.05

0.71

-0.008

0.66
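Questions 1 and 2 rest on a simple least squares line and its coefficient of determination (R²). The Python sketch below fits y = b0 + b1*x in closed form and computes R² = 1 - SS_res/SS_tot; the sign of b1 gives the direction of the relationship. The function name `fit_line` is illustrative, and the data in the test are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept: the line passes through (mean x, mean y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot
```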

Question 3 2.5 pts

The team wants to find out whether any other pairs of variables are strongly correlated, with a correlation coefficient greater than +0.8 or less than -0.8. Execute an appropriate analysis to answer this question. Which of the following combinations of variables satisfies this condition? [Save the script to the data file]

Group of answer choices

Luggage Capacity & Weight

Fuel Tank Capacity & Weight

Weight & Maximum Horsepower

Maximum Horsepower & Engine Size
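Question 3 can be screened with pairwise Pearson correlations, which JMP's Multivariate platform reports as a correlation matrix. The Python sketch below computes r for every pair of columns and keeps the pairs with |r| above the threshold. The function names, column names, and values in the test are hypothetical.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| > threshold."""
    out = []
    for a, b in itertools.combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) > threshold:
            out.append((a, b, r))
    return out
```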

Question 4 2.5 pts

Use the least squares method to develop a linear model that predicts the city mileage of a car using all other variables in the data file, except ‘Model’ and ‘Manufacturer’, as independent variables. Remove the insignificant parameters one at a time: check the LogWorth of each parameter and remove the least important parameter first. Which of the following variables remain significant in the final model? [Save the script to the data file]

Group of answer choices

Maximum Horsepower & Fuel Tank Capacity

Vehicle category & Weight

Luggage capacity & Weight

Rear seat room & Weight
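JMP's Effect Summary ranks terms by LogWorth, defined as -log10(p-value), so the least important term has the smallest LogWorth. The Python sketch below picks the next term to drop in a backward-elimination pass; it does not refit the model, so the p-values must be recomputed after each removal. The function names, the 0.05 cutoff, and the term names in the test are assumptions for illustration.

```python
import math


def logworth(p_value):
    """LogWorth = -log10(p); larger values mean a more important term."""
    return -math.log10(p_value)


def least_important(p_values, alpha=0.05):
    """Term to drop next, or None if every remaining term is significant.

    `p_values` maps each model term to its p-value from the current fit.
    """
    term = min(p_values, key=lambda t: logworth(p_values[t]))
    return term if p_values[term] > alpha else None
```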

Question 5 2.5 pts

What is the coefficient of determination of the model obtained in Question 4?

Group of answer choices

0.43

0.64

0.78

0.80

Question 6 2.5 pts

The team also asked you to check your model from Question 4 for multi-collinearity effects. After testing for multi-collinearity (using VIF), what did you find?

Group of answer choices

‘Passenger capacity’ and ‘Length’ variables show multi-collinearity effects in the model

‘Passenger capacity’ and ‘Weight’ variables show multi-collinearity effects in the model

‘Fuel Tank Capacity’, ‘Width’ and ‘Weight’ variables show multi-collinearity effects in the model

‘Vehicle Category’ and ‘Engine Size’ variables show multi-collinearity effects in the model

Question 7 2.5 pts

If you discovered multi-collinearity effects in the model, remove the variables in question one at a time, starting with the highest VIF, and stop when the remaining variables have acceptable VIF values and p-values. After this process, submit a screenshot of the ‘Effect Summary’ of the final model.

Upload
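The variance inflation factor for predictor j is VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the other predictors. The Python sketch below computes VIFs from such R² values and names the term with the largest VIF if it exceeds a cutoff. The function names and the cutoff of 10 are assumptions (5 and 10 are both commonly cited); use the threshold your course specifies, and refit after each removal.

```python
def vif(r_squared_j):
    """Variance inflation factor from the R^2 of regressing X_j on the other predictors."""
    return 1 / (1 - r_squared_j)


def next_to_remove(r_squared_by_term, limit=10):
    """Term with the largest VIF if it exceeds `limit`; otherwise None."""
    vifs = {t: vif(r2) for t, r2 in r_squared_by_term.items()}
    worst = max(vifs, key=vifs.get)
    return worst if vifs[worst] > limit else None
```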

Question 8 2.5 pts

Based on the model in Question 7, which are the strongest and weakest variables for predicting the city mileage of a car?

Group of answer choices

Passenger Capacity (strongest) and Weight (weakest)

Vehicle Category (strongest) and Passenger Capacity (weakest)

Weight (strongest) and ‘Length’ (weakest)

Fuel Tank Capacity (strongest) and Passenger Capacity (weakest)

Question 9 2.5 pts

What is the model equation of the final model in Question 7 for predicting the city mileage of a car? [Save the script to the data file] Submit a screenshot.

Question 10 2.5 pts

Submit the JMP data file with the saved scripts for all the analyses used to answer the questions in this assignment.

Change the file name to: YourFirstName_YourLastName_housingprices.jmp

Review what we did in the lab.

You must use JMP to answer the following multiple-choice questions.

Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.

Question 1 2 pts

In a simple linear regression model that predicts home price based on the number of bedrooms, the coefficient of determination is:

Group of answer choices

None of the given answers is correct

0.44

0.98

0.46

Question 2 2 pts

Is the linear model obtained in the previous question significant? Support your answer with an appropriate screenshot from JMP analysis.

Upload

Question 3 2 pts

When you develop a model to predict home price using beds, baths, and square feet as independent variables, which, if any, of the following independent variables is/are significant in the model?

Group of answer choices

All of the other answers are correct

Square footage

Baths

Beds

Question 4 2 pts

In the final model (after removing all non-significant predictors) for predicting home price from the other variables in the data file, the most important significant predictor is:

Group of answer choices

Square feet

Acres

Baths

Miles to base

Question 5 2 pts

What is the model equation, based on multiple regression analysis, for predicting home price (see Q4)? Provide a screenshot of the JMP output showing the model equation.

Upload

Question 6 2 pts

What is the coefficient of determination for the final model in the previous question?

Group of answer choices

0.61

0.78

-0.34

0.80

Question 7 2 pts

In the final model (reference Q4), do you have any multicollinearity issue to be addressed?

Group of answer choices

Need additional information such as ‘residual’ to answer this question

Need more data to answer this question

No

Yes

Question 8 2 pts

Share a screenshot of the appropriate table to support your answer in Q7.

Upload

Question 9 2 pts

Based on the final regression equation, holding all other variables constant, for each additional bath in the house, the home price will:

Group of answer choices

Increase by $197 K

Increase by $59.2 K

Decrease by $3.8 K

Increase by $5.00 K

Question 10 2 pts

Create a frequency distribution (with summary statistics) of the residuals (observed value - predicted value) to show how well the model predicts the actual home price. Share a screenshot of this plot.
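The residual for each home is its observed price minus its predicted price. The Python sketch below computes the residuals and a few summary statistics with the standard `statistics` module; in JMP the same residuals can be saved to the data table from the regression report and then examined with Distribution. The function name and the prices in the test are hypothetical.

```python
import statistics


def residual_summary(observed, predicted):
    """Residuals (observed - predicted) plus basic summary statistics."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    summary = {
        "mean": statistics.mean(residuals),
        "std_dev": statistics.stdev(residuals),  # sample standard deviation
        "min": min(residuals),
        "max": max(residuals),
    }
    return residuals, summary
```

For a well-behaved least squares model, the residuals should center near zero.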

Question 1 2.5 pts
What statistical test would you perform to test the hypothesis that the population average time to deliver a pizza, once the order is placed, is less than or equal to 25 minutes?

No test is necessary

ANOVA

T-test

Z-test
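For a hypothesis about a single population mean when the population standard deviation is unknown, the usual statistic is the one-sample t. The Python sketch below computes t = (sample mean - mu0) / (s / sqrt(n)) and its degrees of freedom; the standard library has no t distribution, so it stops short of the p-value. For the one-sided alternative "mean > 25 minutes", large positive t values are evidence against H0, and JMP's Test Mean report should list the corresponding one-sided p-value. The function name and the sample in the test are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - mu0) / se, n - 1
```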

Question 2 2.5 pts
The null hypothesis for the statistical test for Question 1 is:

Mean delivery time is less than or equal to 25 minutes

None of the other answers are correct

Mean delivery time is less than 0.05

Mean delivery time is more than 25 minutes

Question 3 2.5 pts
Based on the sample, is there sufficient evidence to conclude that the population average time to deliver a pizza, once the order is placed, is greater than 25 minutes? Based on the statistical analysis (at a 5% significance level), what is your conclusion? [Save the script to the data file]

We could not conclude anything

We accept the null hypothesis and hence conclude that the delivery time is more than 25 min

We reject the null hypothesis, and hence conclude that the delivery time is more than 25 min

We could not reject the null hypothesis, and hence conclude the mean delivery time is less than or equal to 25 min

Question 4 2.5 pts
Submit a screenshot of an appropriate analysis table to support your answer in Question 3.

Upload

Question 5 2.5 pts
Given the sample data, what kind of statistical test would you perform to find out whether the day of the week has an effect on the mean delivery time?

Z-test

T-test

ANOVA

Descriptive

Question 6 2.5 pts
Based on the test in Question 5, what would you conclude at a 5% significance level?

We reject the null hypothesis; the mean delivery time is different for every day of the week

We reject the null hypothesis; the mean delivery time is different at least on one of the days of the week

We could not reject the null hypothesis; the mean delivery time is not different for different days of the week

We accept the null hypothesis; the mean delivery time is not different for different days of the week
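A one-way ANOVA compares the means of several groups, here the days of the week (or, in Question 8, the hours of the day). The Python sketch below computes the F statistic from the between-group and within-group sums of squares; the p-value would come from an F distribution with the returned degrees of freedom, which the standard library does not provide. Rejecting H0 says that at least one group mean differs, not that every mean differs; a follow-up such as JMP's Tukey HSD all-pairs comparison identifies which pairs differ. The function name and the group data in the test are hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic and (df_between, df_within) for a one-way ANOVA.

    `groups` maps each level (e.g. a day of the week) to its observations.
    """
    all_obs = [x for obs in groups.values() for x in obs]
    grand = statistics.mean(all_obs)
    means = {lv: statistics.mean(obs) for lv, obs in groups.items()}
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(obs) * (means[lv] - grand) ** 2 for lv, obs in groups.items())
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - means[lv]) ** 2 for lv, obs in groups.items() for x in obs)
    df_b, df_w = len(groups) - 1, len(all_obs) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), (df_b, df_w)
```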

Question 7 2.5 pts
Execute an appropriate follow-up test to determine on which days of the week the mean delivery time differs. What is your conclusion? [Save the script to the data file]

Mean delivery times on Tuesday and Wednesday are different

Mean delivery times on Friday and Thursday are different

Mean delivery times on Saturday and Thursday are different

All of the answers are correct

Question 8 2.5 pts
Execute an appropriate statistical test, at a 5% significance level, to determine whether the hour of the day has an effect on the mean delivery time. What is your conclusion? [Save the script to the data file]

Mean delivery time is different on the 9th and 10th hour of the day

Nothing can be concluded as additional information is necessary

Mean delivery time does not vary by the hour of the day

Mean delivery time is different at every hour of the day

Question 9 2.5 pts
Based on your analysis, what action would you recommend to the owner of Pizza Perk to improve their operation?

The owner should focus on reducing the mean delivery time to 20 minutes.

The owner should collect additional delivery time data to make better recommendation on pizza delivery operation

The owner should consider hiring you full time

The owner should consider hiring more staff on Friday and Saturday to reduce wait time on those days.

Question 10 2.5 pts
Submit the JMP data file with the saved scripts for all the analyses to answer questions from this assignment.
Upload