Pivot Tables and Charts
Introduction
Excel lets you create a cross-tab (cross-tabulation) analysis in what it calls a Pivot Table.
Pivot tables can be one-, two-, or three-dimensional. You can apply a range of statistical and summary options, include data from multiple worksheets, and modify the tables dynamically.
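As a rough illustration of what a two-dimensional pivot table computes, the sketch below sums an amount for each row and column label pair. The records, labels, and the cross_tab helper are hypothetical and are not taken from the course worksheet.

```python
from collections import defaultdict

# Hypothetical expense records: (department, quarter, amount).
records = [
    ("Sales", "Q1", 1200.0),
    ("Sales", "Q2", 950.0),
    ("IT", "Q1", 400.0),
    ("IT", "Q1", 250.0),
    ("IT", "Q2", 700.0),
]


def cross_tab(rows):
    """Sum amounts into a {row_label: {column_label: total}} mapping."""
    table = defaultdict(lambda: defaultdict(float))
    for row_label, col_label, amount in rows:
        table[row_label][col_label] += amount
    # Convert the nested defaultdicts to plain dicts for easy printing.
    return {r: dict(cols) for r, cols in table.items()}


pivot = cross_tab(records)
for dept in sorted(pivot):
    print(dept, pivot[dept])
```

Swapping the sum for a count or an average gives the other common summary options.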
Creating a Pivot Table
1. Open ExcelPivotsWorksheetCIS360.xlsx, found on Canvas.
2. Click on the Expenses tab.
3. Click on cell A2.
4. On the Insert tab click on the [PivotTable] button.
5. The Create Pivot Table box opens:
Change the file name to: YourFirstName_YourLastName_Assignment7.jmp
You must use JMP to answer the following multiple-choice questions.
Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.
Question 1 2.5 pts
Since we are interested in understanding ADMITTED students, change the Value Order in "Column Properties" of the "ADMIT" column so that 1 (admitted) appears first in graphics and analyses. Provide a screenshot as evidence of this step.
Question 2 2.5 pts
Execute a contingency analysis to find out the relationship between ‘Admit’ and ‘Rank’ of the students. Which of the following statements is correct based on your analysis?
Group of answer choices
76.9% of Rank-2 students didn’t get admitted
45.9% of Rank-1 students didn’t get admitted.
35.8% of Rank-4 students didn’t get admitted.
17.9% of Rank-3 students got admitted
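The percentages in statements like these are row percentages of a contingency table: within each rank, the share of students in each admission outcome. A minimal sketch of that arithmetic follows, using made-up (rank, admitted) pairs rather than the assignment data; the row_percentages helper is hypothetical.

```python
from collections import Counter

# Hypothetical (rank, admitted) pairs; NOT the assignment data.
observations = [(1, 1), (1, 0), (1, 1), (2, 0), (2, 0), (2, 1), (2, 0)]


def row_percentages(pairs):
    """Return {row: {outcome: percent of that row}} for a two-way table."""
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    outcomes = sorted({outcome for _, outcome in pairs})
    return {
        row: {
            outcome: 100.0 * cell_counts[(row, outcome)] / row_totals[row]
            for outcome in outcomes
        }
        for row in sorted(row_totals)
    }


percentages = row_percentages(observations)
for rank, by_outcome in percentages.items():
    print(rank, by_outcome)
```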
Question 3 2.5 pts
Execute a simple logistic regression analysis to test if ‘Admit’ is significantly related to ‘GPA’. Based on your analysis, which of the following statements is correct? [save the script to the datafile]
Group of answer choices
‘Admit’ is significantly related to the independent variable ‘GPA’ because the p-value is less than 0.05
‘Admit’ is not related to ‘survived’ because one of them is a categorical variable
‘Admit’ is not related to the dependent variable ‘GPA’ because the p-value is more than 0.05
As ‘GPA’ increases, ‘Admit’ decreases.
Question 4 2.5 pts
Share an appropriate screenshot to support your answer in Question 3. Make sure you use different colors and different markers for students who got admitted and who didn’t.
Question 5 2.5 pts
Develop a simple logistic regression analysis to test if ‘Admit’ is significantly related to ‘GRE’. Based on your analysis, which of the following statements is correct?
Group of answer choices
GRE is significantly related to the dependent variable ‘Admitted’
GRE is not related to the dependent variable ‘Admit’
As GRE increases, the Admit decreases.
GRE is related to ‘Admit’ but it is not significant
Question 6 2.5 pts
Share a screenshot of the Logistic Fit Plot to support your answer in Q5. Make sure you use different colors and different markers for students who got admitted and who didn’t.
Question 7 2.5 pts
Use Analyze > Fit Model to perform a 'stepwise' fit for “ADMIT” using all relevant predictors in the data. Use Minimum BIC (as the stopping rule), Forward (as the direction), and Whole Effects (as the rules) in the model. Then run the logistic regression model using the variables selected in the previous step. Which of the following variables is/are significant in the model to predict when students get admitted? [save the script to the data file]
Group of answer choices
GPA
None of the variables are significant
GRE
Both GRE and GPA
Question 8 2.5 pts
Submit the parameter estimate table from the above model in Question 7.
Upload
Question 9 2.5 pts
Submit the final equation of the model (the resulting logit of the probability model, saved as Lin[1] in the data table). [save script to the data file]
Upload
Question 10 2.5 pts
Submit the JMP data file with saved script for all the analysis to answer questions from this assignment.
Change the file names to: YourFirstName_YourLastName_titanicpassengers.jmp
Review what we did in the lab.
You must use JMP to answer the following multiple-choice questions.
Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.
Question 1 2 pts
Based on the available data sample, what percentage of the 3rd class passengers on board survived the accident?
Group of answer choices
74.5%
25.5%
43%
62%
Question 2 2 pts
What percentage of the males on board did not survive the accident?
Group of answer choices
72.3%
19.1%
27.3%
80.9%
Question 3 2 pts
Develop a simple logistic regression model to test if ‘Age’ is a significant predictor of the outcome ‘survived’. Based on your analysis, which of the following statements is correct?
Group of answer choices
As 'Age' increases, the survival rate also increases.
Age is not a significant predictor of the dependent variable ‘survived’
Age is a significant predictor of the dependent variable ‘survived’
As 'Age' decreases, the survival rate decreases.
Question 4 2 pts
Share a screenshot to support your answer in Q3. Make sure you use different colors and different markers for passengers who survived and who didn’t.
Upload
Question 5 2 pts
Develop a simple logistic regression model to test if ‘Fare’ is a significant predictor of the outcome ‘survived’. Based on your analysis, which of the following statements is correct?
Group of answer choices
Fare is not correlated with the dependent variable ‘survived’ because the p-value is more than 0.05
Fare is not correlated with ‘survived’ because one of them is a categorical variable
As Fare increases, the survival rate decreases
Fare is a significant predictor of the dependent variable ‘survived’ because the p-value is less than 0.05
Question 6 2 pts
Share a screenshot to support your answer in Q5. Make sure you use different colors and different markers for passengers who survived and who didn’t.
Upload
Question 7 2 pts
As shown in the lab, develop a multiple logistic regression model (use stepwise regression) to predict survival. Which of the following variables is most significant in predicting survival?
Group of answer choices
Age
Port
Sex
Passenger class
Question 8 2 pts
As shown in the lab, develop a multiple logistic regression model (use stepwise regression) to predict survival. Which of the following variables is not significant to predict survival?
Group of answer choices
Parents and children
Age
Port
Passenger class
Question 9 2 pts
Provide a screenshot of the appropriate table to support your answers in Q7 and Q8.
Upload
Question 10 2 pts
Provide the screenshot of the equation of the final model (the resulting logit of the probability model, saved as Lin[Yes] in the data table).
In classification analysis, we are determining the probability of an observation ________.
Group of answer choices
To be undefined
To be one
To be part of a certain class or not
To be zero
Question 2 1 pts
A loan officer wants to know if the next customer is likely to default or not on a loan. How can she assess the risk of extending the loan to that customer?
Group of answer choices
By utilizing a simple linear regression model developed by an in-house analyst
By asking her colleague if he knows the person
By asking the customer if he is planning to default on the loan or not
By utilizing a multiple logistic regression model developed by an in-house analyst
Question 3 1 pts
In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model.
Group of answer choices
Training and Binary
Training and validation/testing
Binary and numeric
Testing and validation
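The idea of splitting data into two mutually exclusive sets can be sketched as a shuffle followed by a cut. The 30% validation fraction, the seed, and the train_validation_split helper are illustrative assumptions, not values from the course.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into two disjoint lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


rows = list(range(10))  # stand-in for ten observations
training, validation = train_validation_split(rows)
print(len(training), len(validation))
```

The model is fitted on the first list only, and the second list is held back to check how well the model performs on observations it has not seen.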
Question 4 1 pts
Odds ratio is defined as ________, where p is the probability of success.
Group of answer choices
p/(1-p)
1/(p-1)
p/(p-1)
1/(1-p)
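For reference, the ratio of the probability of success to the probability of failure can be computed as below. Strictly speaking this quantity is the odds; an odds ratio compares two such odds. The function name is a hypothetical choice for this sketch.

```python
def odds(p):
    """Odds of success for a probability p with 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


print(odds(0.5))  # success and failure equally likely
print(odds(0.8))  # success roughly four times as likely as failure
```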
Question 5 1 pts
The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known.
Group of answer choices
Effect summary table
ANOVA table
Parameter estimates table
Confusion matrix
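A confusion matrix simply counts how often each predicted class coincides with each true class. The sketch below builds one for a binary classifier from hypothetical labels and derives the overall accuracy from it.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical true outcomes and model predictions for eight test rows.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(actual, predicted)
accuracy = (matrix["TP"] + matrix["TN"]) / len(actual)
print(matrix, accuracy)
```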
Question 6 1 pts
If you want to find out if body weight, calorie intake, fat intake and age have an influence on the probability of having a heart attack (yes or no), which of the following kind of analysis will help determine the answer?
Group of answer choices
Multiple logistic regression
Simple logistic regression
Simple linear regression
Multiple linear regression
Question 7 1 pts
In classification problems, the primary source for accuracy estimation of the model is ________.
Group of answer choices
Probability of success
Logit
Confusion matrix
Odds ratio
Question 8 1 pts
In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________.
Group of answer choices
Logit
Odds
Odds ratio
Log of Y
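To make the transformation concrete, the sketch below shows the log-odds function and its inverse, which maps a fitted linear predictor back to a probability. The coefficients b0 and b1 are hypothetical and do not come from any model in this course.

```python
import math


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for logit(p) = b0 + b1 * x.
b0, b1 = -4.0, 1.0
p_at_4 = inverse_logit(b0 + b1 * 4.0)  # linear predictor is 0 here
print(p_at_4)
```

A linear predictor of zero corresponds to a probability of 0.5, which is why the sign of the predictor indicates which class is more likely.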
Question 9 1 pts
Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables.
Group of answer choices
independent
numeric dependent
a binary numeric
a binary categorical
Question 10 1 pts
In logistic regression, the dependent variable y is defined as:
Change the file name to: YourFirstName_YourLastName_Assignment6.jmp
You must use JMP to answer the following multiple-choice questions.
Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.
Question 1 2.5 pts
You have been asked to test if 'city mileage' of a car can be predicted based on the 'Fuel Tank capacity'. Which of the following statements is correct?
Group of answer choices
Dependent variable ‘Fuel Tank Capacity’ is negatively correlated with independent variable ‘City Mileage of Car’
The test shows that the variables are significantly correlated. Dependent variable ‘City mileage car’ is predicted as City Mileage (MPG) = 45.587166 - 1.3934743*Fuel Tank Capacity
The test shows that both variables are correlated. The dependent variable ‘Fuel Tank Capacity’ is predicted as Fuel Tank Capacity = 45.587166 + 1.3934743*City Mileage of Car.
The test shows that both variables are not correlated. Dependent variable City mileage car is predicted as City Mileage (MPG) = 45.587166 + 1.3934743*Fuel Tank Capacity
Question 2 2.5 pts
What is the coefficient of determination between 'City Mileage' and 'Weight'? [Save the script to the data file]
Group of answer choices
47.05
0.71
-0.008
0.66
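The coefficient of determination is the share of the variation in the response that the fitted line explains, 1 - SS_residual / SS_total. The sketch below computes it for a simple least-squares line; the weights and mileage figures are invented for illustration and are not the assignment data.

```python
def fit_line(xs, ys):
    """Least-squares intercept, slope, and R-squared for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)
    )
    return intercept, slope, 1.0 - ss_residual / ss_total


# Hypothetical weights (thousands of lb) and city mileage (MPG).
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mileage = [30.0, 27.0, 25.0, 22.0, 21.0]

intercept, slope, r_squared = fit_line(weights, mileage)
print(intercept, slope, r_squared)
```

In simple regression with one predictor, this value equals the square of the correlation coefficient between the two variables.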
Question 3 2.5 pts
The team wants to find out if there are any other variables that are significantly correlated, with a correlation coefficient greater than +0.8 or less than -0.8. Execute an appropriate analysis to answer this question. Which of the following combinations of variables satisfy this condition? [Save the script to the data file]
Group of answer choices
Luggage Capacity and weight
Fuel Tank Capacity & Weight
Weight & Maximum Horsepower
Maximum Horsepower & Engine Size
Question 4 2.5 pts
Use the least squares method to develop a linear model to predict the city mileage of a car using all other variables in the data file as independent variables, except ‘Model’ and ‘Manufacturer’. Remove the insignificant parameters from the model one at a time by checking the LogWorth of each parameter and removing the least important parameter first. Which of the following variables remain significant in the final model? [Save the script to the data file]
Group of answer choices
Maximum Horsepower & Fuel Tank Capacity
Vehicle category & Weight
Luggage capacity & Weight
Rear seat room & Weight
Question 5 2.5 pts
What is the coefficient of determination of the model equation obtained in Question 4?
Group of answer choices
0.43
0.64
0.78
0.80
Question 6 2.5 pts
The team also asked you to check for any multi-collinearity effects in your model obtained from Question 4. After testing for any multi-collinearity effects (using VIF), what did you find out?
Group of answer choices
‘Passenger capacity’ and ‘Length’ variables show multi-collinearity effects in the model
‘Passenger capacity’ and ‘Weight’ variables show multi-collinearity effects in the model
‘Fuel Tank Capacity’, ‘Width’ and ‘Weight’ variables show multi-collinearity effects in the model
‘Vehicle Category’ and ‘Engine Size’ variables show multi-collinearity effects in the model
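The variance inflation factor for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. In the special case of exactly two predictors, that R² is the squared correlation between them, which is what the sketch below uses. The data and function names are hypothetical; a commonly cited rule of thumb treats values above roughly 5 or 10 as a warning sign, though the accepted cutoff for this course may differ.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical, strongly correlated predictors.
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
tank_capacity = [12.0, 14.0, 15.0, 18.0, 20.0]

vif = vif_two_predictors(weight, tank_capacity)
print(vif)
```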
Question 7 2.5 pts
If you discovered multi-collinearity effects in the model, remove the variables in question one at a time (starting from the highest VIF) from the model and then stop when you don’t need to remove any further variable(s) from the model based on accepted VIF and p-values. After this process, submit the screenshot of the ‘Effect Summary’ of the final model.
Upload
Question 8 2.5 pts
Based on the model in question 7, what are the strongest and weakest variables in predicting the city mileage of a car?
Group of answer choices
Passenger Capacity (strongest) and Weight (weakest)
Vehicle Category (strongest) and Passenger Capacity (weakest)
Weight (strongest) and ‘Length’ (weakest)
Fuel Tank Capacity (strongest) and Passenger Capacity (weakest)
Question 9 2.5 pts
What is the model equation for the final model in Question 7 to predict City mileage of a car? [Save script] Submit a screenshot.
Question 10 2.5 pts
Submit the JMP data file with saved script for all the analysis to answer questions from this assignment.
Change the file names to: YourFirstName_YourLastName_housingprices.jmp
Review what we did in the lab.
You must use JMP to answer the following multiple-choice questions.
Note: When you are asked to submit a screen capture, you need to make sure that your name is part of the capture.
Question 1 2 pts
In a simple linear regression model that predicts home price based on the number of bedrooms, the coefficient of determination is:
Group of answer choices
None of the given answers is correct
0.44
0.98
0.46
Question 2 2 pts
Is the linear model obtained in the previous question significant? Support your answer with an appropriate screenshot from JMP analysis.
Upload
Question 3 2 pts
When you develop a prediction model for home-price, based on the beds, baths, and square feet as independent variables, which, if any, of the following independent variables is significant in the model?
Group of answer choices
All of the other answers are correct
Square footage
Baths
Beds
Question 4 2 pts
In a final model (after removing all the non-significant predictors) to predict the home price based on other variables available in the data file, the most important predictor that contributes significantly is:
Group of answer choices
Square feet
Acres
Baths
Miles to base
Question 5 2 pts
What is the model equation based on multiple regression analysis to predict home price? (reference Q4). Provide a screenshot from JMP output to show the model equation.
Upload
Question 6 2 pts
What is the coefficient of determination for the final model in the previous question?
Group of answer choices
0.61
0.78
-0.34
0.80
Question 7 2 pts
In the final model (reference Q4), do you have any multicollinearity issue to be addressed?
Group of answer choices
Need additional information such as ‘residual’ to answer this question
Need more data to answer this question
No
Yes
Question 8 2 pts
Share a screenshot of the appropriate table to support your answer in Q7.
Upload
Question 9 2 pts
Based on the final regression equation, if all other variables remain the same, for an additional bath in the house, the home price will:
Group of answer choices
Increase by $197 K
Increase by $59.2 K
Decrease by $3.8 K
Increase by $5.00 K
Question 10 2 pts
Create a frequency distribution (with summary statistics) of the residuals (observed value - predicted value) to show how well the model predicts the actual home price. Share a screenshot of this plot.
Question 1 2.5 pts
What statistical test would you perform to test your hypothesis that the average time to deliver a pizza, once the order is placed, is less than or equal to 25 minutes in the population?
No test is necessary
ANOVA
T-test
Z-test
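When the population standard deviation is unknown and has to be estimated from the sample, a one-sample test of a mean is usually based on the t statistic, t = (sample mean - hypothesized mean) / (s / sqrt(n)). The sketch below computes only that statistic, from hypothetical delivery times; turning it into a p-value needs the t distribution, which JMP reports directly.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for a one-sample test of a mean."""
    n = len(sample)
    # statistics.stdev uses the sample (n - 1) denominator.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error, n - 1


# Hypothetical delivery times in minutes; NOT the assignment data.
times = [24.0, 27.0, 26.0, 29.0, 25.0, 28.0, 27.0, 30.0]

t_stat, dof = one_sample_t(times, 25.0)
print(t_stat, dof)
```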
Question 2 2.5 pts
The null hypothesis for the statistical test for Question 1 is:
Mean delivery time is less than or equal to 25 minutes
None of the other answers are correct
Mean delivery time is less than 0.05
Mean delivery time is more than 25 minutes
Question 3 2.5 pts
Based on the sample, is there sufficient evidence in the data to conclude that the population average time to deliver a pizza, once the order is placed, is greater than 25 minutes? Based on the statistical analysis (with margin of error = 5%), what is your conclusion? [Save the script to the data file]
We could not conclude anything
We accept the null hypothesis and hence conclude that the delivery time is more than 25 min
We reject the null hypothesis, and hence conclude that the delivery time is more than 25 min
We could not reject the null hypothesis, and hence conclude the mean delivery time is less than or equal to 25 min
Question 4 2.5 pts
Submit a screenshot of an appropriate analysis table to support your answer in Question 3.
Upload
Question 5 2.5 pts
Given the sample data, what kind of statistical test would you perform in order to find out if the days of the week have an effect on the mean delivery time?
Z-test
T-test
ANOVA
Descriptive
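Comparing a mean across more than two groups is typically done with a one-way analysis of variance, whose F statistic is the ratio of between-group to within-group mean squares. A minimal sketch follows, assuming three hypothetical days with three delivery times each; the one_way_anova_f helper is not part of JMP or the course material.

```python
def one_way_anova_f(groups):
    """Return the F statistic with between and within degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical delivery times (minutes) for three days of the week.
by_day = [[20.0, 22.0, 24.0], [25.0, 27.0, 29.0], [30.0, 32.0, 34.0]]

f_stat, df_between, df_within = one_way_anova_f(by_day)
print(f_stat, df_between, df_within)
```

A large F value relative to the F distribution with these degrees of freedom suggests that at least one group mean differs; it does not say which one, which is why a follow-up comparison is needed.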
Question 6 2.5 pts
Based on executing the test in Question 5, what would you conclude, given a margin of error of 5%?
We reject the null hypothesis; the mean delivery time is different for every day of the week
We reject the null hypothesis; the mean delivery time is different at least on one of the days of the week
We could not reject the null hypothesis; the mean delivery time is not different for different days of the week
We accept the null hypothesis; the mean delivery time is not different for different days of the week
Question 7 2.5 pts
Execute an appropriate follow-up test to determine on which days of the week the mean delivery time is different. What is your conclusion? [Save the script to the data file]
Mean delivery times on Tuesday and Wednesday are different
Mean delivery times on Friday and Thursday are different
Mean delivery times on Saturday and Thursday are different
All of the answers are correct
Question 8 2.5 pts
Execute an appropriate statistical test, with a margin of error of 5%, to determine if hour of the day has an effect on the mean delivery time. What is your conclusion? [Save the script to the data file]
Mean delivery time is different on the 9th and 10th hour of the day
Nothing can be concluded as additional information is necessary
Mean delivery time does not vary by the hour of the day
Mean delivery time is different at every hour of the day
Question 9 2.5 pts
Based on your analysis, what action would you recommend to the owner of Pizza Perk to improve their operation?
The owner should focus on reducing the mean delivery time to 20 minutes.
The owner should collect additional delivery time data to make a better recommendation on the pizza delivery operation
The owner should consider hiring you full time
The owner should consider hiring more staff on Friday and Saturday to reduce wait time on those days.
Question 10 2.5 pts
Submit the JMP data file with the saved scripts for all the analyses to answer questions from this assignment.
Upload
What would be the null hypothesis for testing a linear regression model with profit as the dependent variable and sales as the independent variable?
There is a negative relationship between profit and sales.
There is a linear relationship between profit and sales that can be either positive or negative.
There is no linear relationship between profit and sales.
There is a positive relationship between profit and sales.
Question 2 1 pts
A market analyst is developing a regression model to predict monthly household expenditures on groceries as a function of family size, household income, and household neighborhood (urban, suburban, and rural). The "neighborhood" variable in this model is ________.
a linear variable
an independent variable
a continuous variable
a dependent variable
Question 3 1 pts
Which of the following statements is true based on the following regression equation?
IQ = 4.0 + Reading Label * 5.6
A unit point change in IQ will result in 9.6-point increase in reading label.
Reading label is not a good predictor of IQ.
A unit point change in IQ will result in 5.6-point increase in reading label.
A unit point change in reading label will increase IQ by 5.6 points.
Question 4 1 pts
The value of R-Squared always falls between ________ and ________, inclusive.
-infinity and +infinity
0 and 1
-1 and +1
0 and -1
Question 5 1 pts
The correlation coefficient between the age of a vehicle and the money spent to repair it is 0.9. Which of the following statements is true?
90% of the repair cost will be explained by the age of the vehicle
90% of the money spent on repair is explained by the age of the vehicle
81% of the variation in the money spent on repairs is explained by the age of the vehicle
81% of money spent on repairs is explained by the age of the vehicle
Question 6 1 pts
Which of the following is true about multi-collinearity?
The effect of an independent variable on the dependent variable becomes easy to isolate.
The regression coefficients become clearer and are easier to interpret.
It is measured using a measure called variance inflation factor (VIF).
The P-value reduces significantly, leading to rejection of the null hypothesis.
Question 7 1 pts
The unexplained variance in the regression analysis is also known as:
Total variance
Residual variance
Predicted variance
Regression variance
Question 8 1 pts
A correlation coefficient between “college entrance exam” grades and scholastic achievement was found to be -1.08. On the basis of this, you would tell the university that:
The entrance exam is a good predictor of success.
They should hire a new statistician.
The exam is a poor predictor of success.
Students who do best on this exam will make the worst students.
Question 9 1 pts
A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = $1500 + 0.36x. If a car is driven 15000 miles in a year, the model predicts the annual cost of the car to be:
$7400
$3850
$2090
$6900
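Using a fitted regression equation for prediction is a matter of substituting the input value. The sketch below evaluates the model stated in the question, y = 1500 + 0.36x, at x = 15000 miles; the function name is a hypothetical choice.

```python
def predicted_annual_cost(miles):
    """Annual cost in dollars from the model y = 1500 + 0.36x."""
    return 1500.0 + 0.36 * miles


cost = predicted_annual_cost(15000)
print(f"${cost:,.0f}")  # prints $6,900
```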
Question 10 1 pts
Which of the following assumptions is not true for multiple linear regression?
The independent variables are not correlated.
The relationship between dependent and independent variables is linear.