# (Solved): WPC300 Practice Test (Practical) March 2021...

**WPC300**

**Practice Test (Practical)**

**Due** Mar 7

**Instructions**

__PRACTICE Practical Exam Instructions__

**This exam has a total of two sections. You are required to use two sets of data to answer all the questions. Please see the individual instructions for each section. The total time allowed for this exam is 75 minutes. You must complete the exam in one sitting. You will need JMP Pro and Excel software to analyze data and answer questions.**

**Note: You are expected to work individually to complete this exam before the due date. Getting help from outside resources other than what is made available to you via the Canvas course site is considered a violation of the code of academic integrity, for which you will be liable for the consequences.**

Submitted Mar

**Score for this attempt: 20 out of 20**

**Data background**

The sample includes various demographic and blood test responses for 442 diabetes patients (respondents). The response variable Y is a quantitative measure of disease progression one year after baseline measurements were taken. The ten variables measured at baseline time are age, gender (1 = male, 2 = female), body mass index (BMI), average blood pressure (BP), and six blood serum measurements (Total Cholesterol, LDL, HDL, TCH, LTG, & Glucose). The response Y Binary is constructed from the response Y and defined as high if Y is above 200 or low otherwise.

**Section A:**

**Instructions:**

- Use the following data file for this section: SampleDiabetes.xlsx
- Remember the honor code.
- Use Excel to prepare your responses to the questions in this section
- Note that sometimes numbers have been rounded.

Create a new column using the VLOOKUP() function to categorize the age variable into age categories as follows:

| Age | Category |
| --- | --- |
| 70+ | 1 |
| 60-69 | 2 |
| 50-59 | 3 |
| 40-49 | 4 |
| 30-39 | 5 |
| 19-29 | 6 |
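The age-to-category mapping above behaves like an approximate-match lookup: each age is assigned to the band whose lower bound is the largest value not exceeding it. In Excel, `VLOOKUP` with approximate match expects the lookup column to be sorted ascending, so the bands would be listed from 19 upward rather than from 70 downward. The following Python sketch mimics that behavior; the names `LOWER_BOUNDS`, `CATEGORIES`, and `age_category` are illustrative assumptions, not part of the exam.

```python
from bisect import bisect_right

# Lower bound of each age band, sorted ascending (as an approximate-match
# lookup requires), with the category assigned to that band.
# These names are hypothetical; the bands come from the table in the exam.
LOWER_BOUNDS = [19, 30, 40, 50, 60, 70]
CATEGORIES = [6, 5, 4, 3, 2, 1]


def age_category(age):
    """Return the category of the band whose lower bound is <= age."""
    idx = bisect_right(LOWER_BOUNDS, age) - 1
    if idx < 0:
        # An approximate-match lookup finds nothing below the first bound.
        raise ValueError(f"age {age} is below the lowest band")
    return CATEGORIES[idx]


if __name__ == "__main__":
    for age in (19, 29, 30, 45, 69, 70, 85):
        print(age, "->", age_category(age))
```

Ages of 70 and above all fall in category 1 because 70 is the last lower bound, which matches the open-ended "70+" band.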

**Question 1**

**1 / 1 pts**

Using a pivot table, determine which of the following statements is incorrect.

- Category 4 has 97 respondents
**Correct!** - Category 3 has 54 respondents
- Category 5 has 73 respondents
- Category 2 has 90 respondents

**Question 2**

**1 / 1 pts**

Using a pivot table, determine which of the following statements is incorrect about the average age of respondents in each age category.

- Category 3 average age is 54.0 years
- Category 2 average age is 63.8 years
- Category 4 average age is 44.9 years
**Correct!** - Category 1 average age is 71.2 years
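Questions 1 and 2 both ask for a pivot table with the age category as the row field, summarized once by count and once by average age. A rough Python equivalent is sketched below. The ages in the usage example are made up for illustration and are not taken from SampleDiabetes.xlsx; `age_category` and `pivot_by_category` are hypothetical helper names.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

LOWER_BOUNDS = [19, 30, 40, 50, 60, 70]  # band lower bounds, ascending
CATEGORIES = [6, 5, 4, 3, 2, 1]          # category for each band


def age_category(age):
    """Map an age to its category using the bands from the exam."""
    idx = bisect_right(LOWER_BOUNDS, age) - 1
    if idx < 0:
        raise ValueError(f"age {age} is below the lowest band")
    return CATEGORIES[idx]


def pivot_by_category(ages):
    """Return {category: (count, average age)}, like a two-value pivot table."""
    groups = defaultdict(list)
    for age in ages:
        groups[age_category(age)].append(age)
    return {cat: (len(vals), mean(vals)) for cat, vals in sorted(groups.items())}


if __name__ == "__main__":
    # Illustrative ages only, NOT the exam data.
    sample_ages = [25, 34, 45, 47, 52, 61, 66, 72]
    for cat, (count, avg) in pivot_by_category(sample_ages).items():
        print(f"Category {cat}: n={count}, average age={avg:.1f}")
```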

**Question 3**

**1 / 1 pts**

Create a pivot table pie chart for people of age 40 or older using the same age categories as before, and determine which of the following statements is correct.

- Category 3 has 28% of the respondents
- Category 2 has 20% of the respondents
**Correct!** - Category 2 has 28% of the respondents
- Category 4 has 22% of the respondents

**Section B**

**Instructions**

- Use the following JMP data file for this section [Diabetes.JMP]
- Remember the honor code.
- Use JMP Pro to prepare your responses to the questions in this section
- Note that sometimes numbers have been rounded.

**Question 4**

**1 / 1 pts**

Which of the following statements is not correct based on the sample data provided?

- The mean for LDL is 115
- The upper limit of the 95% confidence interval for BP is 95.9
- The median for Total Cholesterol is 186
- The standard deviation for HDL is 0.6152

**Question 5**

**1 / 1 pts**

Looking at the distribution of BMI, you observe that the data centrality is measured as:

- n = 442
- Standard Error = 0.21
- Standard deviation = 4.41
**Correct!** - Mean = 26.4
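Of the four quantities listed, only the mean describes the center of the data; the standard deviation and standard error describe variability, and n is the sample size. The standard error is tied to the other two by SE = s / sqrt(n), which can be checked against the BMI figures quoted in the options. The function name below is an illustrative assumption.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


if __name__ == "__main__":
    # BMI values quoted in the answer options: sd = 4.41, n = 442.
    se = standard_error(4.41, 442)
    print(round(se, 2))  # → 0.21
```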

**Question 6**

**1 / 1 pts**

Looking at the distribution of Glucose, you observe that the distribution spread is measured as:

- Mean is 91.3
- 95% confidence interval is 90.2 to 92.3
- Standard error is 11.5
- Interquartile range is 15

**Question 7**

**1 / 1 pts**

It is generally believed that the average population age is 50. You claim that the population average age is less than 50. Perform a statistical test on the sample to see if the average age for the sample is consistent with your hypothesis (use a margin of error of 5%). What is the p-value from the test?

- 0.05
**Correct!** - 0.0089
- 0.9911
- 0.0179
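Three of the answer options are linked: for a t-test, the lower-tail, upper-tail, and two-sided p-values are all determined by any one of them because the t distribution is symmetric. The sketch below starts from the lower-tail value 0.0089 and derives the other two, reading the exam's "margin of error of 5%" as a significance level of 0.05 (an assumption). The function names are hypothetical.

```python
def related_p_values(p_lower):
    """Given the lower-tail p-value of a t-test, derive the other two."""
    if not 0.0 <= p_lower <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    p_upper = 1.0 - p_lower
    p_two_sided = 2.0 * min(p_lower, p_upper)
    return {"lower": p_lower, "upper": p_upper, "two_sided": p_two_sided}


def reject_null(p_value, alpha=0.05):
    """Decision rule: reject the null hypothesis when p is below alpha."""
    return p_value < alpha


if __name__ == "__main__":
    p = related_p_values(0.0089)
    for tail, value in p.items():
        print(f"{tail}: {value:.4f}")
    print("reject H0 for the 'less than 50' claim:", reject_null(p["lower"]))
```

Doubling the rounded value 0.0089 gives 0.0178, while the option list shows 0.0179; the small gap is presumably because the software doubles the unrounded p-value before rounding.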

**Question 8**

**1 / 1 pts**

It is generally believed that the average population age is 50. You claim that the population average age is more than 50. Perform a statistical test on the sample to see if the average age for the sample is consistent with your hypothesis (use a margin of error of 5%). What can you conclude?

- We fail to reject the null hypothesis
- We accept the null hypothesis
- We do not have enough information to make a judgement on the null hypothesis
**You Answered** - We reject the null hypothesis

**Question 9**

**1 / 1 pts**

A pairwise correlation analysis of the variables Y, age, BMI, BP, Total Cholesterol, LDL, HDL, TCH, LTG, & Glucose in the sample suggests that:

- The population has a significant negative correlation between TCH and Total Cholesterol
- The population has no correlation between Total Cholesterol and LDL
**Correct!** - The population has a significant negative correlation between HDL and TCH
- The population has a significant negative correlation between Y and BMI
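Each entry in a pairwise correlation table is a Pearson coefficient r between two columns; its sign gives the direction of the linear relationship. A minimal hand-rolled sketch follows, using toy numbers rather than the Diabetes data. The significance test that JMP reports alongside each coefficient is not reproduced here, and `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy data only: y falls as x rises, so r is negative.
    print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```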

**Question 10**

**1 / 1 pts**

If we are interested in determining a possible cause and effect relationship where BMI and Age are causing disease progression (Y), _____ is the independent variable and ____ is the dependent variable?

- Age, BMI respectively
**You Answered** - BMI, Y respectively
- BMI, Age respectively
- Y, BMI respectively

**Question 11**

**1 / 1 pts**

Perform a simple linear regression to predict Y using respondents’ BMI. What is the correct equation for the regression line?

- Y = 10.2*BMI
- BMI = -118 + 10.2*Y
**Correct!** - Y = -118 + 10.2*BMI
- BMI = 21 + 0.034*Y
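The fitted line marked correct, Y = -118 + 10.2*BMI, can be used directly for prediction. As a quick sanity check, a least-squares line passes through the point of means, so plugging in the mean BMI of 26.4 reported in Question 5 should give roughly the mean of Y (about 151 with these rounded coefficients). The sketch below uses the rounded coefficients from the answer option; `predict_y` is a hypothetical name.

```python
INTERCEPT = -118.0  # rounded intercept from the answer option
SLOPE = 10.2        # rounded BMI coefficient from the answer option


def predict_y(bmi):
    """Predicted disease progression Y for a given BMI."""
    return INTERCEPT + SLOPE * bmi


if __name__ == "__main__":
    print(round(predict_y(26.4), 1))  # prediction at the mean BMI
    # Each additional BMI unit raises the prediction by the slope.
    print(round(predict_y(27.4) - predict_y(26.4), 1))
```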

**Question 12**

**1 / 1 pts**

Perform a multiple regression analysis (with a margin of error of 5%) that examines all of the variables in the sample (excluding Y binary) as potential predictors of Y. Which of the following conclusions can be made based on the analysis without removing any of the predictor variables?

- LDL is a significant predictor in the model, LTG is not.
- TCH is a significant predictor in the model, Glucose is not.
**Correct!** - BMI is a significant predictor in the model, HDL is not.
- Age is a significant predictor in the model, Total Cholesterol is not.

**Question 13**

**1 / 1 pts**

After performing model building by applying backward deletion to the model described in Q12, which of the following conclusions is valid based on the final model?

- Glucose is not a significant predictor, but Gender is
- Total Cholesterol is not a significant predictor, but BP is
**You Answered** - HDL is not a significant predictor, but LTG is
- Age is not a significant predictor, but LDL is

**Question 14**

**1 / 1 pts**

Based on the final model developed in Q13, which is the strongest predictor in the model?

- Intercept
- BMI
**You Answered** - Total Cholesterol
- Gender

**Question 15**

**1 / 1 pts**

Based on the final model developed in Q13, which is the weakest predictor in the model?

- Gender
- Total Cholesterol
- Intercept
- BMI

**Question 16**

**1 / 1 pts**

How much of the variation in the dependent variable can be explained by the final regression model developed in Q13?

- 50.8%
**You Answered** - 51.5%
- <.0001
- We cannot determine this quantity

**Question 17**

**1 / 1 pts**

Is there a multicollinearity concern for the final model developed in Q13?

- There is a multicollinearity problem in the final model and we should delete the LTG variable
**Correct!** - There is a multicollinearity problem in the final model and we should delete the Total Cholesterol variable
- There is no multicollinearity problem in the final model
- There is a multicollinearity problem in the final model and we should delete the Gender variable
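Multicollinearity is usually judged from the variance inflation factor (VIF) of each predictor, where VIF = 1 / (1 - R²) and R² comes from regressing that predictor on all the other predictors in the model. The sketch below shows only the formula. The R² values in the usage example are hypothetical, and the cutoff of 10 is a common rule of thumb rather than a threshold stated in the exam.

```python
def vif(r_squared):
    """Variance inflation factor from the R^2 of predictor-on-predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def flag_collinear(r_squared_by_predictor, threshold=10.0):
    """Return the predictors whose VIF exceeds the chosen threshold."""
    return [name for name, r2 in r_squared_by_predictor.items()
            if vif(r2) > threshold]


if __name__ == "__main__":
    # Hypothetical R^2 values for illustration, NOT results from the exam data.
    example = {"predictor_a": 0.30, "predictor_b": 0.95, "predictor_c": 0.60}
    for name, r2 in example.items():
        print(f"{name}: VIF = {vif(r2):.2f}")
    print("flagged:", flag_collinear(example))
```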

**Question 18**

**1 / 1 pts**

The Y Binary variable was developed to categorize respondents into high and low development of Diabetes over the year since their baseline measurements were taken. What proportion of high development respondents are female?

- 75.3%
- 69.6%
- 24.7%
- 30.4%

**Question 19**

**1 / 1 pts**

In an initial logistic regression analysis attempting to establish if all of the variables (excluding Y) in the sample can predict (with a margin of error of 5%) the level (high/low) of the disease, it can be concluded that:

- Some of the predictors are not significant and can be deleted from the model
- The overall model is significant in predicting the level of development of the disease
- The model accuracy can be determined by the confusion matrix
- All the other answer choices are correct

**Question 20**

**1 / 1 pts**

In the final logistic regression model to predict/classify Y binary, which of the following statements is true:

- 77 respondents were correctly classified by the model as high disease development
- 291 respondents were correctly classified by the model as low disease development
- 44 respondents were incorrectly classified by the model as low disease development
**Correct!** - All of the other answer choices are correct
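If all three counts above hold, the rest of the confusion matrix follows by subtraction. The sketch below assumes that all 442 respondents were classified, that "high" is treated as the positive class, and that the 44 respondents "incorrectly classified as low" were actually high. Under those assumptions it derives the remaining cell and a few summary rates; the function and key names are illustrative.

```python
def confusion_summary(true_high, true_low, missed_high, total):
    """Fill in the fourth confusion-matrix cell and compute summary rates.

    true_high   -- actual high, predicted high
    true_low    -- actual low, predicted low
    missed_high -- actual high, predicted low
    total       -- number of classified respondents (assumed: all of them)
    """
    false_high = total - true_high - true_low - missed_high
    if false_high < 0:
        raise ValueError("counts exceed the total")
    return {
        "false_high": false_high,  # actual low, predicted high
        "accuracy": (true_high + true_low) / total,
        "sensitivity": true_high / (true_high + missed_high),
        "specificity": true_low / (true_low + false_high),
    }


if __name__ == "__main__":
    summary = confusion_summary(true_high=77, true_low=291,
                                missed_high=44, total=442)
    for key, value in summary.items():
        print(key, round(value, 3))
```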