WPC 300: Final Exam
Summer 2021 update
Question 1
2.5 pts
Which of the following techniques is a combination of data, mathematical models, and various business rules?
- Prescriptive analytics
- Predictive analytics
- Explanatory analytics
- Descriptive analytics
Question 2
2.5 pts
Which of the following is not an important component of the data analytics process?
- Communication
- Interpretation
- Team building
- Discovery
Question 3
________ is the hypothesis that people value a product more once their property right to it is established.
- Framing effect
- Overconfidence
- Endowment effect
- Clustering illusion
Question 4
2.5 pts
Which of the following analytics techniques would 'Costco Corporation' use to estimate its likely revenue for the next five years?
- Descriptive analytics
- Predictive analytics
- Prescriptive analytics
- Explanatory analytics
Question 5
Which of the following is true of heuristics?
- We value quantitative information and models
- We learn by analyzing
- We seek optimal solution
- We rely on common sense
Question 6
Gambler's fallacy is
- A clustering illusion bias
- A zero risk bias
- Framing effect bias
- An endowment effect bias
Question 7
Over-reliance on the first piece of information is a bias arising from the
- Zero risk effect
- Bandwagon effect
- Clustering illusion
- Anchoring effect
Question 9
Which of the following analytic techniques is useful for discovering and understanding the causal relationships behind an outcome?
- Prescriptive analytics
- Explanatory analytics
- Predictive analytics
- Descriptive analytics
Question 10
Which of the following is NOT considered a drawback of analytical decision-making?
- Lack of flexibility
- Delayed action
- Frustrations in teams
- Comparison of all alternatives
Question 11
What are the four types of data analytical methods?
- Descriptive, analytical, predictive and prescriptive
- Descriptive, explanatory, predictive and prescriptive
- Descriptive, logical, predictive and prescriptive
- Critical, analytical, predictive and explanatory
Question 12
Which of the following is an example of primary data?
- Internet data
- Simulated data
- Firm's proprietary database
- Interview data
Question 13
2.5 pts
You conducted a survey of 200 randomly selected students from the freshman class at ASU to find out the average height of ASU students. What is the 'population' in this example?
- The 200 selected students
- All freshmen at the University of Arizona
- 1000 freshman students from the W. P. Carey School of Business
- All students at ASU.
Question 14
Which of the following statements is true?
- A/B testing is only done for direct mail campaigns.
- A/B testing is often done in brick-and-mortar stores.
- A/B testing is only done for websites.
- A/B testing is only done in digital environments.
Question 15
The kurtosis of a perfectly normal distribution is
Question 16
When two variables are highly positively correlated, the correlation coefficient could be
- More than 1
- Close to 0
- Close to -1
- Close to 1
Question 17
In a controlled experiment, the subjects in the control group
- Are given a placebo
- Are given a placebo and treatment
- Are tested for confounding variables
- Are given the treatment
Question 18
Which is true of A/B testing?
- It compares two samples of customers to test their behavior
- It compares two versions of a website to see which one performs better
- It compares two different versions of a non-disclosure agreement to see which one is better
- It compares two random events to find the best
Question 19
How do blind experiments increase the validity of research results?
- They allow experimenters to manipulate the expectations of participants.
- They allow the experimenters to control the results of an experiment.
- They decrease the chance of experimenter and participant biases affecting experimental results
- They allow for a subjective interpretation of experimental results
Question 20
___________ is an extraneous variable in an observational study that correlates with both dependent and independent variables.
- Control
- Confounder
- Treatment
- Sample
Question 21
An experiment is said to be double-blinded if ____________
- A placebo is given to some of the subjects
- Researchers don't know who is being given the treatment.
- The researcher is not aware of confounding variables.
- Subjects and those working with the subjects are not aware of who is given which treatment.
Question 22
The central tendency of a data sample is measured by ____________
- inferential statistics that identify the best single value for representing a set of data
- inferential statistics that identify the spread of the scores in a data set
- descriptive statistics that identify the best single value for representing a set of data
- descriptive statistics that identify the spread of the scores in a data set
Question 23
The mean value for ________ data is computed by summing all values in the data set and dividing the sum by the number of values in the data set.
- Nominal
- Categorical
- Any
- Continuous
Question 24
What is a dependent variable in an experiment?
- A factor that responds to changes made to the treatment
- A factor that researchers can hold constant
- The factor that researchers typically manipulate during the experiment
- A condition that may negatively affect the outcome of the experiment
Question 25
One of the assumptions in One-Way ANOVA is _________
- Equal variance of each population
- Unequal variances of samples
- Population means are different
- Observations are quite dependent
Question 26
A paired sample t-test evaluates if the mean of the difference between two variables is significantly different from ________
- The variance
- Each other
- Zero
- One
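A paired-sample t statistic can be sketched in a few lines. This is only an illustration, assuming hypothetical before/after measurements that are not part of the exam: it takes the per-subject differences and divides their mean by their standard error, which is the quantity a paired test compares against a reference value.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return the t statistic for the mean of the paired differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    if len(before) < 2:
        raise ValueError("need at least two pairs")
    # One difference per subject: the test works on these, not on the raw scores.
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical measurements, made up for illustration only.
before = [10, 12, 9, 11]
after = [12, 13, 11, 12]
t_stat = paired_t_statistic(before, after)
print(t_stat)
```

The statistic would then be compared with a t distribution on n - 1 degrees of freedom; that lookup is left out to keep the sketch dependency-free.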
Question 27
The mean and standard deviation of a population are 500 and 50, respectively. The sample size is 25. What is the mean of the sampling distribution of the sample mean?
Question 28
One-way ANOVA is useful when
- You are testing the validity of the sample
- You are comparing two groups from one sample
- You are comparing more than two sample means
- You are comparing one sample mean
Question 29
The figure below is based on a random sample collected to study alcohol contents in a certain drug. What is the standard deviation of the sample?
Question 30
The margin of error in your inference comes from
- Standard deviation
- Sample size
- Sampling error
- Sample mean
Question 31
2.5 pts
A sample of size 25 is selected from a population with a mean of 40 and a standard deviation of 5. The standard error of the sampling distribution of the sample mean is:
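The standard error of the sample mean is the population standard deviation divided by the square root of the sample size. A minimal sketch of that arithmetic follows, using the figures stated in the question; the function name is an illustrative choice, not something defined by the course.

```python
import math


def standard_error(population_sd: float, sample_size: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return population_sd / math.sqrt(sample_size)


# Values taken from the question: sigma = 5, n = 25.
se = standard_error(5.0, 25)
print(se)  # 1.0
```

Note that the population mean (40) does not enter the calculation; it only fixes the center of the sampling distribution.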
Question 32
All things being equal, the lower the p-value
- The greater is the chance of rejecting the null hypothesis
- The smaller is the sampling error
- The smaller is the value of the population mean
- The smaller is the chance of rejecting the null hypothesis
Question 33
2.5 pts
You find a statistically significant ANOVA. In order to determine which groups are different, you must conduct a
- correlation analysis
- Tukey's test
- regression analysis
- Student's t-test
Question 35
What is the purpose of an inferential statistical test?
- To see if your results are accurate
- To randomize the sample
- To make sure you have not made a mistake in your data collection
- To check the probability of your results applying to the entire population
Question 36
The null hypothesis in the analysis of variance (ANOVA) asks whether means of
- any groups are the same
- all groups are the same
- specific groups are the same
- selected groups are the same
Question 37
Which of the following is the first stage of agglomerative hierarchical clustering?
- By separating a cluster into two finer groups
- By separating two pairs of clusters with minimal Euclidean distance between them
- By joining two clusters that are closest to each other
- By joining two clusters farthest away from each other
Question 38
2.5 pts
Which method of analysis does not classify variables as dependent and independent variables?
- Analysis of variance
- Linear Regression
- Logistic regression
- Cluster analysis
Question 39
After which process in ETL is the data ready for in-depth analysis?
- Data separation
- Data extraction
- Data loading
- Data transformation
Question 40
Clustering is part of ________ data mining.
- Supervised
- Predictive
- Unsupervised
- Explanatory
Question 41
2.5 pts
The ________ clustering method uses information on all pairs of distances, not merely the minimum or maximum distances.
- Average linkage
- Single linkage
- Medium linkage
- Complete linkage
Question 42
Which of the following is not true of cluster analysis?
- Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters.
- Cluster analysis is a technique for analyzing data when the dependent variable is categorical and the independent variables are categorical in nature.
- Cluster analysis is also called segmentation analysis.
- Groups or clusters are suggested by the data, not defined a priori.
Question 43
2.5 pts
Which analysis would you perform to segment your customers for a targeted marketing campaign?
- Linear Regression
- Logistic Regression
- ANOVA
- Clustering
Question 44
2.5 pts
In the data transformation process, the ETL tool transforms data in accordance with ________ established by the organization.
- Standard protocol
- Business rules and standards
- Business plan
- Business model
Question 45
2.5 pts
Which of the following is a definition of the distance between two clusters in single-linkage clustering?
- The average of the distances between all pairs of objects, where each pair is made up of one object from each group
- The distance between the least distant pair of objects, one from each group
- The sum of squares of the distances between clusters
- The distance between the most distant pair of objects, one from each group
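The pairwise definitions listed above can be compared side by side with a short sketch. It assumes Euclidean distance and two tiny hypothetical clusters of 2-D points (neither comes from the exam), and computes the minimum, maximum, and mean of all cross-cluster pair distances, which correspond to the single, complete, and average linkage rules.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Least distant pair, one object from each cluster.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Most distant pair, one object from each cluster.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    # Mean over all cross-cluster pairs.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters, chosen so the distances are easy to check by hand.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # 2.0
print(complete_linkage(cluster_a, cluster_b))  # 5.0
print(average_linkage(cluster_a, cluster_b))   # 3.5
```

`math.dist` requires Python 3.8 or later.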
Question 46
2.5 pts
In the data extraction process, the ETL tool gathers data primarily from which source?
- Operational systems
- Online Vendor
- Hard disk
- Competition
Question 47
Which of the following is a false statement?
- Reducing SSE (sum of squared error) within a cluster increases cohesion.
- In cluster analysis, the objects within clusters should exhibit a high degree of similarity.
- The k-means algorithm is a method for doing partitional clustering.
- To predict sales from transactional data, one should perform clustering analysis.
Question 48
2.5 pts
________ is a clustering procedure characterized by the development of a dendrogram.
- Hierarchical clustering
- Divisive clustering
- k-Means clustering
- Classification technique
Question 49
In classification problems, the primary source for accuracy estimation is
- R-squared
- Slope
- Confusion matrix
- Correlation coefficient
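As an illustration of how accuracy is read off a table of actual versus predicted classes, here is a minimal sketch. The labels are hypothetical and the helper names are illustrative; it simply counts each (actual, predicted) pair and reports the share of matching pairs.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) label pair."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of observations whose predicted label matches the actual one."""
    matrix = confusion_matrix(actual, predicted)
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / len(actual)


# Hypothetical labels, made up for illustration only.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

matrix = confusion_matrix(actual, predicted)
for (a, p), count in sorted(matrix.items()):
    print(f"actual={a} predicted={p} count={count}")
print(accuracy(actual, predicted))
```

The diagonal cells (actual equals predicted) are the correct classifications; the off-diagonal cells separate the two kinds of error, which a single accuracy figure hides.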
Question 50
To make sure that multicollinearity is not an issue in your regression model, the measured variance inflation factor should be
- Equal to 20
- Equal to 0
- More than 20
- Less than 5
Question 51
For hypothesis testing with correlation, the null hypothesis is:
- Correlation coefficient is -1
- Correlation coefficient is 1
- Alternative hypothesis is not true
- Correlation coefficient is 0
Question 52
Which of the following is true about multicollinearity?
- The effect of a dependent variable on another becomes difficult to isolate.
- It is best measured using the statistical variance inflation factor (VIF)
- P-value reduces significantly leading to rejection of the null hypothesis.
- Regression coefficients become clearer and are easier to interpret.
Question 53
In regression analysis, one uses data _______
- From an independent variable to predict the dependent variable
- From an extreme value to predict an outlier
- From any variables to predict any other variable
- From a dependent variable to predict an independent variable
Question 54
Correlation coefficients between dependent and independent variables cannot be
Question 56
The lowest value of the coefficient of determination is 0
Question 57
The highest value of the correlation coefficient is 1
Question 58
2.5 pts
Classification analysis can be done using
- Multiple linear regression
- Logistic regression
- Non-linear regression
- Linear regression
Question 60
For the best-fit line diagram (shown below), which of the following statements is not true?
Question 61
When is a data table a better way to show insights than a chart?
- With large sample data (n=1000)
- With large sample data (n=1000) and 10 different data variables.
- With small sample data (n=10) and 1000 data variables.
- With small sample data (n=10) with a couple of data variables
Question 62
2.5 pts
You expect a correlation between sales and profit, as shown in the graph below. What kind of visualization is this?
Question 63
2.5 pts
Which of the following statements describes one of the basic principles for creating a good chart, as defined by Edward Tufte?
- The chart should display a grid for easy reading
- The chart should tell a story
- The chart should apply additional visual effects so it will stand out
- The chart should have a lot of ink
Question 66
Visualizations of spatial data are most illustrative when shown using
- Bar graph
- Maps
- Bubble graphs
- Line graphs
Question 68
Which are useful principles for data visualization?
- The use of a wide range of colors is critical to emphasize distinctions
- It is important to include every possible piece of information in a chart
- Including as many grids as possible is vital for fully specifying the data to be represented
- The chart should yield insights beyond text
Question 69
2.5 pts
Which of the following charts should not be used to display total sales by salesperson when evaluated from a data-ink perspective?
- A 2-D bar chart
- A 3-D bar chart
- A line chart
- A 2-D horizontal bar chart
Question 70
Which of the following statements is a reason not to use a table?
- Tables cannot easily show trends
- A large amount of information can be included in a very small space
- The table has more precise numbers
- Tables display more information in less space than a chart
Question 71
A set of data that describes the data in a relational database is called
- Semi-structured data
- Structured data
- Metadata
- Unstructured data
Question 72
2.5 pts
When you access information from two different tables connected by an identifier key, the SQL keyword you should use is
- COUNT
- ORDER BY
- GROUP BY
- INNER JOIN
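Combining rows from two tables through a shared identifier key can be sketched with Python's built-in `sqlite3` module. The `customer` and `purchase` tables, their columns, and the rows are all hypothetical, invented for this illustration; only rows whose key appears in both tables are returned.

```python
import sqlite3

# In-memory database with two hypothetical tables linked by customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO purchase VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """
)

# Match each purchase to its customer through the shared key, then sort.
rows = conn.execute(
    """
    SELECT c.name, p.amount
    FROM customer AS c
    INNER JOIN purchase AS p ON p.customer_id = c.customer_id
    ORDER BY c.name, p.amount
    """
).fetchall()
conn.close()

print(rows)  # [('Ada', 25.0), ('Ada', 40.0), ('Ben', 15.0)]
```

The customer with no purchases ('Cy') is absent from the result, because no row in `purchase` carries a matching key.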
Question 73
The following are among the 4V's of big data except
- Vitality
- Velocity
- Volume
- Veracity
Question 74
In a database table for 'Product', the information about a single product resides in a single
Question 75
Results can be sorted in a database using the ________ SQL statement.
- SELECT
- WHERE
- ORDER BY
- FROM
Question 76
Which SQL statement is used to extract data from a relational database?
Question 77
Which of the following is not an on-demand computing service obtained over the network?
- Software as a service
- Consulting service
- Infrastructure as a service
- Platform as a service
Question 78
NoSQL is primarily designed for
- Improving data integrity
- Big data
- Structured data
- Data that cannot be stored in flat files
Question 79
What does the acronym "SaaS" stand for?
- Software as a Service
- Storage as a Service
- Software as application service
- None of the other answers is true
Question 80
2.5 pts
What type of values should you use when creating a primary key column in a database table?
- Values that contain meaningful information
- Same value for each record
- Unique values for every record
- Values that are null