WPC 300: Final Exam
Summer 2021 update
Question 1
2.5 pts
Which of the following techniques is a combination of data, mathematical models, and various business rules?
 Prescriptive analytics
 Predictive analytics
 Explanatory analytics
 Descriptive analytics
Question 2
2.5 pts
Which of the following is not an important component of the data analytics process?
 Communication
 Interpretation
 Team building
 Discovery
Question 3
________ is a hypothesis that people value a product more once their property right to it is established.
 Framing effect
 Overconfidence
 Endowment effect
 Clustering illusion
Question 4
2.5 pts
Which of the following analytics techniques would 'Costco Corporation' use to find out its likely revenue for the next five years?
 Descriptive analytics
 Predictive analytics
 Prescriptive analytics
 Explanatory analytics
Question 5
Which of the following is true of heuristics?
 We value quantitative information and models
 We learn by analyzing
 We seek optimal solution
 We rely on common sense
Question 6
Gambler's fallacy is
 A clustering illusion bias
 A zero risk bias
 Framing effect bias
 An endowment effect bias
Question 7
An overreliance on the first piece of information is a bias from the
 Zero risk effect
 Bandwagon effect
 Clustering illusion
 Anchoring effect
Question 9
Which of the following analytic techniques is useful to discover and understand the causal relationship behind an outcome?
 Prescriptive analytics
 Explanatory analytics
 Predictive analytics
 Descriptive analytics
Question 10
Which of the following is NOT considered a drawback of analytical decision-making?
 Lack of flexibility
 Delayed action
 Frustrations in teams
 Comparison of all alternatives
Question 11
What are the four types of data analytical methods?
 Descriptive, analytical, predictive and prescriptive
 Descriptive, explanatory, predictive and prescriptive
 Descriptive, logical, predictive and prescriptive
 Critical, analytical, predictive and explanatory
Question 12
Which of the following is an example of primary data?
 Internet data
 Simulated data
 Firm's proprietary database
 Interview data
Question 13
2.5 pts
You conducted a survey with 200 randomly selected students from the freshman class at ASU to find out the average height of ASU students. What is the 'population' in this example?
 The 200 selected students
 All freshmen at the University of Arizona
 1000 freshman students from the W. P. Carey School of Business
 All students at ASU.
Question 14
Which of the following statements is true?
 A/B testing is only done for direct mail campaigns.
 A/B testing is often done in brick-and-mortar stores.
 A/B testing is only done for websites.
 A/B testing is only done in a digital environment.
Question 15
Kurtosis for a perfectly normal distribution is
Question 16
When two variables are highly positively correlated, the correlation coefficient could be
 More than 1
 Close to 0
 Close to 1
 Close to -1
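As an illustration of the correlation coefficient referred to in Question 16, the following is a minimal Python sketch of Pearson's r. The function name pearson_r and the sample data are hypothetical and not part of the exam; the sketch only shows that r always falls between -1 and 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: scores rise almost linearly with hours studied.
hours = [1, 2, 3, 4, 5]
scores = [52, 61, 68, 81, 88]
r = pearson_r(hours, scores)
print(round(r, 3))
```

A perfectly decreasing relationship, such as pearson_r([1, 2, 3], [3, 2, 1]), sits at the other end of the range.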
Question 17
In a controlled experiment, the subjects in the control group
 Are given a placebo
 Are given a placebo and treatment
 Are tested for confounding variables
 Are given the treatment
Question 18
Which is true of A/B testing?
 It compares two samples of customers to test their behavior
 It compares two versions of a website to see which one performs better
 It compares two different versions of a nondisclosure agreement to see which one is better
 It compares two random events to find the best
Question 19
How do blind experiments increase the validity of research results?
 They allow experimenters to manipulate the expectations of participants.
 They allow the experimenters to control the results of an experiment.
 They decrease the chance of experimenter and participant biases affecting experimental results
 They allow for a subjective interpretation of experimental results
Question 20
___________ is an extraneous variable in an observational study that correlates with both dependent and independent variables.
 Control
 Confounder
 Treatment
 Sample
Question 21
An experiment is said to be double-blinded if ____________
 A placebo is given to some of the subjects
 Researchers don't know who is being given the treatment.
 The researcher is not aware of confounding variables.
 Subjects and those working with the subjects are not aware of who is given which treatment.
Question 22
The central tendency of a data sample is measured by ____________
 inferential statistics that identify the best single value for representing a set of data
 inferential statistics that identify the spread of the scores in a data set
 descriptive statistics that identify the best single value for representing a set of data
 descriptive statistics that identify the spread of the score in a data set
Question 23
The mean value for ________ data is computed by summing all values in the data set and dividing the sum by the number of values in the data set.
 Nominal
 Categorical
 Any
 Continuous
Question 24
What is a dependent variable in an experiment?
 A factor that responds to changes made to the treatment
 A factor that researchers can hold constant
 The factor that researchers typically manipulate during the experiment
 A condition that may negatively affect the outcome of the experiment
Question 25
One of the assumptions in one-way ANOVA is _________
 Equal variance of each population
 Unequal variances of samples
 Population means are different
 Observations are quite dependent
Question 26
A paired-sample t-test evaluates if the mean of the difference between two variables is significantly different from ________
 The variance
 Each other
 Zero
 One
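To make the paired-sample t-test in Question 26 concrete, here is a minimal Python sketch that computes the test statistic from the per-subject differences. The function name paired_t_statistic and the before/after values are hypothetical; a full test would also compare the statistic against a t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """t statistic for paired samples, computed from the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Mean difference divided by its standard error.
    return mean_diff / (sd_diff / math.sqrt(n))


# Hypothetical measurements on the same four subjects.
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t = paired_t_statistic(before, after)
print(round(t, 2))
```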
Question 27
The mean and standard deviation of a population are 500 and 50, respectively. The sample size is 25. What is the mean value of the sample mean distribution?
Question 28
One-way ANOVA is useful when
 You are testing the validity of the sample
 You are comparing two groups from one sample
 You are comparing more than two sample means
 You are comparing one sample mean
Question 29
The figure below is based on a random sample collected to study alcohol contents in a certain drug. What is the standard deviation of the sample?
Question 30
The margin of error in your inference comes from
 Standard deviation
 Sample size
 Sampling error
 Sample mean
Question 31
2.5 pts
A sample of size 25 is selected from a population with a mean of 40 and a standard deviation of 5. The standard error of the sampling distribution of the mean is:
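The standard error in Question 31 follows the usual formula: population standard deviation divided by the square root of the sample size. A minimal Python sketch, in which the function name standard_error is hypothetical:

```python
import math


def standard_error(population_sd, sample_size):
    """Standard error of the mean: sigma / sqrt(n)."""
    return population_sd / math.sqrt(sample_size)


# Values taken from the question: sigma = 5, n = 25.
se = standard_error(5, 25)
print(se)
```

The population mean (40) does not enter the calculation; it only locates the center of the sampling distribution.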
Question 32
All things being equal, the lower the p-value
 The greater is the chance of rejecting the null hypothesis
 The smaller is the sampling error
 The smaller is the value of the population mean
 The smaller is the chance of rejecting the null hypothesis
Question 33
2.5 pts
You find a statistically significant ANOVA. In order to determine which groups are different, you must conduct a
 correlation analysis
 Tukey's test
 regression analysis
 Student's t-test
Question 35
What is the purpose of an inferential statistical test?
 To see if your results are accurate
 To randomize the sample
 To make sure you have not made a mistake in your data collection
 To check the probability of your results applying to the entire population
Question 36
The null hypothesis in the analysis of variance (ANOVA) asks whether means of
 any groups are the same
 all groups are the same
 specific groups are the same
 selected groups are the same
Question 37
Which of the following is the first stage of agglomerative hierarchical clustering?
 By separating cluster into two finer groups
 By separating two pairs of clusters with minimal Euclidean distance between them
 By joining two clusters that are closest to each other
 By joining two clusters farthest away from each other
Question 38
2.5 pts
Which method of analysis does not classify variables as dependent and independent variables?
 Analysis of variance
 Linear Regression
 Logistic regression
 Cluster analysis
Question 39
After which process in ETL would the data be ready for in-depth analysis?
 Data separation
 Data extraction
 Data loading
 Data transformation
Question 40
Clustering is part of ________ data mining.
 Supervised
 Predictive
 Unsupervised
 Explanatory
Question 41
2.5 pts
The ________ clustering method uses information on all pairs of distances, not merely the minimum or maximum distances.
 Average linkage
 Single linkage
 Medium linkage
 Complete linkage
Question 42
Which of the following is not true of cluster analysis?
 Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters.
 Cluster analysis is a technique for analyzing data when the dependent variable is categorical and the independent variables are categorical in nature.
 Cluster analysis is also called segmentation analysis.
 Groups or clusters are suggested by the data, not defined a priori.
Question 43
2.5 pts
Which analysis would you perform to segment your customers for a target marketing campaign?
 Linear Regression
 Logistic Regression
 ANOVA
 Clustering
Question 44
2.5 pts
In the data transformation process, the ETL tool transforms data in accordance with ________ established by the organization.
 Standard protocol
 Business rules and standards
 Business plan
 Business model
Question 45
2.5 pts
Which of the following is a definition of distance between two clusters in a single linkage clustering?
 The average of the distances between all pairs of objects, where each pair is made up of one object from each group
 The distance between the least distant pair of objects, one from each group
 The sum of square of the distance between clusters
 The distance between the most distant pair of objects, one from each group
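The linkage definitions listed in Question 45 differ only in how they summarize the pairwise distances between two clusters. A minimal Python sketch, assuming Euclidean distance and two hypothetical clusters of 2-D points (all function names are illustrative):

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distances for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Distance between the closest pair of points.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Distance between the farthest pair of points.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    # Mean of all pairwise distances.
    dists = pairwise_distances(cluster_a, cluster_b)
    return sum(dists) / len(dists)


# Hypothetical clusters on a line, so distances are easy to check by hand.
a = [(0, 0), (1, 0)]
b = [(3, 0), (5, 0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
```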
Question 46
2.5 pts
In the data extraction process, the ETL tool gathers data primarily from which source?
 Operational systems
 Online Vendor
 Hard disk
 Competition
Question 47
Which of the following is a false statement?
 Reducing SSE (sum of squared error) within a cluster increases cohesion.
 In cluster analysis, the objects within clusters should exhibit a high amount of similarity.
 The k-means algorithm is a method for doing partitional clustering.
 To predict sales from transactional data, one should perform clustering analysis.
Question 48
2.5 pts
________ is a clustering procedure characterized by the development of a dendrogram.
 Hierarchical clustering
 Divisive clustering
 k-means clustering
 Classification technique
Question 49
In classification problems, the primary source for accuracy estimation is
 R-squared
 Slope
 Confusion matrix
 Correlation coefficient
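For readers unfamiliar with the confusion matrix mentioned in Question 49, the following minimal Python sketch tallies actual versus predicted labels and derives accuracy from the tally. The function names and the labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of cases where the predicted label matches the actual one."""
    matrix = confusion_matrix(actual, predicted)
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / len(actual)


# Hypothetical classifier output for six cases.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]
matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
print(dict(matrix), round(acc, 3))
```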
Question 50
To make sure that multicollinearity is not an issue in your regression model, the measured variance inflation factor should be
 Equal to 20
 Equal to 0
 More than 20
 Less than 5
Question 51
For hypothesis testing with correlation, the null hypothesis is:
 Correlation coefficient is 1
 Correlation coefficient is -1
 Alternative hypothesis is not true
 Correlation coefficient is 0
Question 52
Which of the following is true about multicollinearity?
 The effect of a dependent variable on another becomes difficult to isolate.
 It is best measured using the statistical variance inflation factor (VIF)
 The p-value reduces significantly, leading to rejection of the null hypothesis.
 Regression coefficients become clearer and are easier to interpret.
Question 53
In regression analysis, one uses data _______
 From an independent variable to predict the dependent variable
 From an extreme value to predict outlier
 From any variables to predict any other variable
 From a dependent variable to predict an independent variable
Question 54
Correlation coefficients between dependent and independent variables cannot be
Question 56
The lowest value of coefficient of determination is 0
Question 57
Highest value of correlation coefficient is 1
2.5
Question 58
Classification analysis can be done using
 Multiple linear regression
 Logistic regression
 Nonlinear regression
 Linear regression
Question 60
For the best-fit line diagram (shown below), which of the following statements is not true?
Question 61
When is a data table a better way to show insights than a chart?
 With large sample data (n=1000)
 With large sample data (n=1000) and 10 different data variables.
 With small sample data (n=10) and 1000 data variables.
 With small sample data (n=10) with a couple of data variables
Question 62
2.5 pts
When you are expecting a correlation between sales and profit, as shown in the graph below, what kind of visualization is this?
Question 63
2.5 pts
Which of the following statements describes one of the basic principles for creating a good chart, as defined by Edward Tufte?
 The chart should display a grid for easy reading
 The chart should tell a story
 The chart should apply additional visual effects so it will stand out
 The chart should have a lot of ink
Question 66
Visualizations of spatial data are most illustrative when shown using
 Bar graph
 Maps
 Bubble graphs
 Line graphs
Question 68
Which are useful principles for data visualization?
 The use of a wide range of colors is critical to emphasize distinctions
 It is important to include every possible piece of information in a chart
 Including as many grids as possible is vital for fully specifying the data to be represented
 The chart should yield insights beyond text
Question 69
2.5 pts
Which of the following charts should not be used to display the total sales by salesperson when it is evaluated from a data-ink perspective?
 A 2D bar chart
 A 3D bar chart
 A line chart
 A 2D horizontal bar chart
Question 70
Which of the following statements is a reason not to use a table?
 Tables cannot easily show trends
 A large amount of information can be included in a very small space
 The table has more precise numbers
 Tables display more information in less space than a chart
Question 71
A set of data that describes the data in a relational database is called
 Semistructured data
 Structured data
 Metadata
 Unstructured data
Question 72
2.5 pts
When you access information from two different tables connected by an identifier key, the SQL keyword you should use is
 COUNT
 ORDER BY
 GROUP BY
 INNER JOIN
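As a hedged illustration of combining two tables through a shared key, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and rows are hypothetical; ORDER BY is included only to make the result order predictable.

```python
import sqlite3

# In-memory database with two hypothetical tables linked by customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO purchase VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """
)

# Only customers with at least one matching purchase appear in the result.
rows = conn.execute(
    """
    SELECT customer.name, purchase.amount
    FROM customer
    INNER JOIN purchase ON customer.customer_id = purchase.customer_id
    ORDER BY purchase.amount
    """
).fetchall()
conn.close()
print(rows)
```

The customer with no purchases (Cy) is absent from the output because the join keeps matched rows only.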
Question 73
The following are among the 4V's of big data except
 Vitality
 Velocity
 Volume
 Veracity
Question 74
In a database table for 'Product', the information about a single product resides in a single
Question 75
Results can be sorted in a database using the ________ SQL statement.
 SELECT
 WHERE
 ORDER BY
 FROM
Question 76
Which SQL statement is used to extract data from a relational database?
Question 77
Which of the following is not an on-demand computing service obtained over the network?
 Software as a service
 Consulting service
 Infrastructure as a service
 Platform as a service
Question 78
NoSQL is primarily designed for
 Improving data integrity
 Big data
 Structured data
 Data that cannot be stored in flat files
Question 79
What does the acronym "SaaS" stand for?
 Software as a Service
 Storage as a Service
 Software as application service
 None of the other answers is true
Question 80
2.5 pts
What type of values should you use when creating a primary key column of a database table?
 Values that contain meaningful information
 Same value for each record
 Unique values for every record
 Values that are null