### (Solved): CIS370 – Business Analytics - Case 3: Data Mining (50 poin...

Case 3: Data Mining (50 points)

Problem 1: A local pizza restaurant wants to get a better sense of who its customers are and how much they buy. The file Pizza_Customers.xlsx shows data collected on 30 randomly selected customers. Variables include Age, Female (1 if female, 0 otherwise), Annual Income, Married (1 if married, 0 otherwise), Own (1 if own residence, 0 otherwise), College (1 if completed college degree, 0 otherwise), Size (household size) and Spending (annual store spending).

1. Perform hierarchical clustering to group the customers based on the numerical variables only (Age, Annual Income, Size, and Spending). Describe each cluster based on the cluster characteristics.
2. Perform hierarchical clustering to group the customers based on the categorical variables (Female, Married, Own, and College), and Spending. Describe each cluster based on the cluster characteristics.
4. Experiment with other combinations of variables with Spending and find one that you believe is more insightful. Justify your recommendation.
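The clustering task in Problem 1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the assignment's required tool: the six customer rows are made up (Pizza_Customers.xlsx is not reproduced here), complete linkage and z-score standardization are assumed choices, and `standardize` and `agglomerative` are hypothetical helper names.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so large-scale variables (e.g. income) do not dominate."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def agglomerative(points, k):
    """Repeatedly merge the two closest clusters (complete linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up rows for illustration: (Age, Annual Income, Size, Spending)
customers = [
    (25, 30000, 1, 200),
    (27, 32000, 2, 250),
    (45, 90000, 4, 900),
    (48, 95000, 5, 950),
    (33, 60000, 3, 500),
    (35, 62000, 3, 520),
]

clusters = agglomerative(standardize(customers), k=3)

# Describe each cluster by the mean of every variable on the original scale.
profiles = [[round(mean(customers[i][c] for i in members), 1) for c in range(4)]
            for members in clusters]
```

Each entry of `profiles` lists the average Age, Annual Income, Size, and Spending for one cluster, which is the kind of summary the "describe each cluster" instruction asks for.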

Problem 2: A telecommunications company wants to identify customers who are likely to unsubscribe to their telephone service. The file Telecom.xlsx shows the data collected from 100 customers: ID (customer ID), Age, Income (annual income), Usage (monthly usage, in minutes), Tenure (time as a subscriber, in months), and Unsubscribe (1 if unsubscribed, 0 if still subscribed).

1. Perform k-means clustering to group the 100 customers into four clusters based on Age, Income, Usage, and Tenure. Describe the characteristics of each cluster.
2. Compute the percent of customers that have unsubscribed to the telephone service from each cluster. Which cluster has the highest percent of customers who have unsubscribed to the telephone service?
3. Experiment with another cluster size. In your opinion, which of these two cluster sizes is more insightful? Explain your answer.
4. What would you recommend to the telecommunication company as a result of this analysis?
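Items 1 and 2 of Problem 2 can be sketched as follows. This is a rough illustration under stated assumptions: the eight customer rows and the unsubscribe flags are made up (Telecom.xlsx is not reproduced here), the variables are z-scored before clustering, the starting centroids are drawn at random with a fixed seed, and `zscores`, `kmeans`, and `percent_unsubscribed` are hypothetical helper names.

```python
import math
import random
from statistics import mean, pstdev

def zscores(rows):
    """Standardize each column so years, dollars, minutes, and months are comparable."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:      # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # keep the old centroid if a cluster empties
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels

def percent_unsubscribed(labels, flags):
    """Percent of customers with Unsubscribe = 1 inside each cluster."""
    rates = {}
    for c in sorted(set(labels)):
        in_cluster = [f for lab, f in zip(labels, flags) if lab == c]
        rates[c] = 100 * sum(in_cluster) / len(in_cluster)
    return rates

# Made-up rows for illustration: (Age, Income, Usage, Tenure)
customers = [
    (22, 28000, 900, 3), (24, 30000, 850, 5),
    (38, 55000, 400, 24), (40, 58000, 420, 30),
    (55, 90000, 150, 60), (58, 95000, 120, 72),
    (30, 40000, 700, 10), (33, 42000, 650, 12),
]
unsubscribed = [1, 1, 0, 0, 0, 0, 1, 0]

labels = kmeans(zscores(customers), k=4)
rates = percent_unsubscribed(labels, unsubscribed)
```

Rerunning with a different `k` gives the comparison asked for in item 3. Because k-means depends on its starting centroids, different seeds can yield different groupings.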

### (Solved): AVS 5205: Test 2 Take - Home Part (50 points) Fall, 2023 Gi...

Test 2 Take-Home Part (50 points)

Given: October, 2023
Due: October, 2023

Part B (Take-Home): Total 50 points
1. This part of the examination contains 3 items worth a total of 50 points. You must use Times New Roman, size 12 font, and double spacing throughout the paper.
3. Include copies of your statistical output at the end of the exam. Please do NOT incorporate the output as part of your solutions. Failure to follow this requirement could result in a loss of 10 points. To facilitate my grading, please label the output appropriately and include a note in your solutions that directs me to the output (e.g., “See summary statistics on page 5”).
4. All pages of your solutions, and corresponding output are to be submitted on Canvas, Wednesday, October 25th, 2023 by 5 pm EDT. No late work will be accepted.
5. You may use Gallo et al. (2023), class slides, and other resources that I provided you for reference.
6. Important Note: Although this part of the exam is being given as a take-home activity, you are expected to do your own work and follow the same exam protocols applied to an in-class examination. Please sign below to confirm that this was indeed the case.

I confirm that I completed this exam on my own and did not receive any assistance from anyone other than Dr. Sharma. I also confirm that the output attached to this exam as required in Item 3 above was the result of the analyses I performed personally and is not a copy of someone else’s output. I understand that noncompliance to these requirements could result in a grade of 0 for this part of the exam.
____________________________________________________ __________________
Signature

Use the following research description and corresponding Excel data file to answer Items A-C.

A researcher wanted to investigate the relationship between different variables associated with airline transport pilots (ATPs). Therefore, she collected the variables listed below from a sample of commercial airline pilots in the U.S. All the pilots are FAA-approved Part 121 Airline Transport Pilots (ATPs).

• Age. A continuous variable and measured in years.
• Biological Sex. A self-reported dichotomous nominal variable coded 1 = Female and 0 = Male.
• Marital status. A self-reported dichotomous nominal variable coded 1 = Married and 0 = Not Married.
• Psychological distress. A continuous variable that was measured using Goldberg and Williams’ (1988) General Health Questionnaire (GHQ). The GHQ is a 12-item instrument, scored on a 4-point Likert-type response scale that is designed to measure a person’s general psychological health. Possible responses are 0 = Not at all, 1 = No more than usual, 2 = Rather more than usual, and 3 = Much more than usual. Thus, scores could range from 0 to 36, with higher scores indicating higher levels of psychological distress such as depression and anxiety.
• Self-Efficacy. A continuous variable that was measured using Chen, Gully, and Eden’s (2001) New General Self-Efficacy Scale (NGSES). The NGSES is an 8-item instrument scored on a 5-point traditional Likert response scale ranging from 1 = Strongly Disagree to 5 = Strongly Agree, with higher scores reflecting a higher level of self-efficacy.
• Total flight hours. A self-reported continuous variable.
• Hazardous events. A continuous variable that was measured using Hunter’s (2002a) 10-item Hazardous Events Scale (HES). The HES asked participants to reflect on the number of times they were involved in various aviation-related hazardous events. Possible responses for each item are 0, 1, 2, 3, and 4 or more, with higher scores indicating a higher level of actions by pilots that could lead or contribute to an unplanned or undesired event such as an accident.
• Attitudes toward aviation safety. A continuous variable that was measured using Hunter’s (2002b) Aviation Safety Attitudes Scale (ASAS). The ASAS is a 27-item instrument scored on a traditional 5- point Likert response scale ranging from 1 = Strongly Disagree to 5 = Strongly Agree, with higher scores reflecting a more positive attitude toward aviation safety.
• Risk perception. A continuous variable that was measured using Hunter’s (2006) Risk Perception-Other (RP-O) instrument. The RP-O is a 17-item instrument scored on a 100-point response scale (1 to 100). Each item presents a scenario and participants are asked to assess the amount of risk they perceive in the given scenario. Higher scores indicate a higher degree of risk perception, which suggests the participant is more risk averse than risk prone.
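The stated GHQ range follows from simple arithmetic: 12 items, each scored 0 to 3, summed into a total. A small sketch of that calculation is given here. `score_range` is a hypothetical helper, and applying it to the other instruments assumes their item responses are also summed, which the description above does not state.

```python
def score_range(n_items, low, high):
    """Smallest and largest total when n_items responses are summed."""
    return n_items * low, n_items * high

# GHQ: 12 items scored 0..3, matching the 0-to-36 range given in the description.
ghq_range = score_range(12, 0, 3)

# Assumption only: NGSES totals are formed by summing its 8 items scored 1..5.
ngses_range_if_summed = score_range(8, 1, 5)
```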

The file “Part B_Test_2” contains a set of hypothetical data relative to this imaginary study.
A. Descriptive Statistics (2 x 7 = 10 points)
1. Prepare a frequency table for all the categorical variables.
2. Prepare a descriptive statistics summary table for all the continuous variables.
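The two tables in Item A can be sketched with the Python standard library. The values are made up for illustration (the “Part B_Test_2” file is not reproduced here), and `frequency_table` and `describe` are hypothetical helper names.

```python
from collections import Counter
from statistics import mean, median, stdev

def frequency_table(values):
    """Map each category code to (count, percent of sample)."""
    counts = Counter(values)
    n = len(values)
    return {level: (count, round(100 * count / n, 1))
            for level, count in sorted(counts.items())}

def describe(values):
    """Basic summary statistics for one continuous variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),      # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }

# Made-up data: Biological Sex coded 1 = Female, 0 = Male; Age in years.
sex = [0, 0, 1, 0, 1, 0, 0, 1]
age = [34, 41, 29, 52, 38, 45, 33, 40]

sex_table = frequency_table(sex)
age_summary = describe(age)
```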

Use the below description to answer items B and C (2 x 20 points = 40 points)
The researcher initially wanted to explore four primary objectives using the data that she collected:
(a) The first objective of the researcher was to compare the psychological distress of the ATPs to the military pilots. Based on her previous research experience, she found that the average psychological distress of military pilots in the U.S. was 14 when measured using GHQ (scored on a scale of 0 to 36). Although she believed that ATPs should feel less depressed than the military pilots, the related literature had mixed results and therefore she was uncertain whether ATPs’ psychological distress would be higher or lower than the military pilots.
(b) The second objective of the researcher was to investigate the relationship between ATPs’ self-efficacy and attitude towards aviation safety. Based on her literature review, the researcher claims that the two variables should have a positive correlation.
(c) The third objective of the researcher was to derive a prediction equation that could be used to predict ATPs’ risk perception, based on their total flight hours, independent of any other factor.
(d) The fourth objective was to compare the mean self-efficacy of male ATPs to the mean self-efficacy of female ATPs, to see which group has a higher self-efficacy.
However, due to time, budget, and other resource constraints, she decided to investigate only two of the four primary objectives. Help the researcher in conducting the analyses listed in items B and C.
B. Conduct a Pearson Correlation analysis for one of the four objectives you think is appropriate.
C. Conduct a single sample t test for one of the four objectives you think is appropriate.
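The test statistics behind items B and C can be sketched as below. This is an illustration only: the numbers are made up, the reference value 14 is used purely as an example, and `one_sample_t`, `pearson_r`, and `r_to_t` are hypothetical helper names. The standard library has no t distribution, so the sketch stops at the test statistic and degrees of freedom, which would then be compared against a critical value from a t table or statistical software.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)   # standard error of the mean
    return (mean(sample) - mu0) / se, n - 1

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic and degrees of freedom for H0: population correlation = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2

# Made-up scores for illustration.
scores = [12, 15, 11, 13, 14, 10, 16, 13]
t_mean, df_mean = one_sample_t(scores, mu0=14)

x = [30, 32, 34, 36, 38]
y = [100, 110, 115, 110, 115]
r = pearson_r(x, y)
t_r, df_r = r_to_t(r, len(x))
```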

For the items B and C perform all the tasks listed below (20 points breakdown)
Pre-Data Analysis (4 points)
1. Specify the research question. (1 point)
2. Specify the research hypothesis. (1 point)
3. Determine the appropriate research methodology/design and explain why it is appropriate. (1 point)
4. Conduct an a priori power analysis to determine the minimum sample size needed. Compare this result to the size of the given data set and explain what impact the size of the given sample will have on the results relative to the minimum size needed. (1 point)
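A rough version of the a priori sample-size calculation for a one-sample test of the mean is sketched here. It uses the normal approximation n ≈ ((z_alpha + z_power) / d)², so dedicated power software that works with the t distribution will usually report a slightly larger n. The effect size d = 0.5, alpha = .05, and power = .80 are assumed inputs, and `n_one_sample_mean` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def n_one_sample_mean(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate minimum n for a one-sample mean test (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_power = z(power)
    return math.ceil(((z_alpha + z_power) / d) ** 2)

# Assumed inputs: medium standardized effect, two-tailed test.
n_min = n_one_sample_mean(d=0.5)
```

Comparing `n_min` with the number of rows in the data file indicates whether the given sample is likely under- or over-powered for the assumed effect size.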

Data Analysis (8 points)
1. Conduct a hypothesis test by applying all four steps. For regression you also need to report and interpret the terms of the regression equation (i.e., B and B0). (4 steps x 2 points = 8 points)

Post-Data Analysis (8 points)
1. Determine and interpret the effect size from both explained variance and prediction perspectives, wherever it is required. (3 points)
2. Determine and interpret the 95% confidence interval, including its precision and AIPE. Also determine and interpret the standard error of estimate, RMSE, wherever it is required. (2 points)
3. Determine and interpret the power of the study from a post-hoc perspective. (1 point)
4. Present at least two plausible explanations for the results (2 points)

### (Solved): TVS 5205: Aviation Statistics ...

TVS 5205: Aviation Statistics

An aviation student researcher examined the relationship between pilots’ simple visual reaction times and experience, which she defined as number of flight hours including simulator time. Visual reaction time was measured by having participants press a joystick immediately at the appearance of a suprathreshold spot target and was measured in milliseconds. (Note: 1000 milliseconds = 1 second.) The reaction times are considered “simple” because participants did not have to make any decisions regarding if or when to press the joystick. The file “Graded Assignment 5” contains a copy of the data the student collected from a random sample of airline transport pilots (ATPs) who agreed to participate in the study and self-reported their total flight hours. The researcher surmised that these two variables should have a strong, negative correlation: As pilot experience increases, their visual reaction times should decrease, which indicates faster reaction times.

A. Pre-Data Analysis (5 points)

1. Specify the research question and corresponding operational definitions.
2. Specify the research hypothesis.
3. Determine the appropriate research methodology/design and explain why it is appropriate.
4. Conduct an a priori power analysis to determine the minimum sample size needed. Compare this result to the size of the given data set and explain what impact the size of the given sample will have on the results relative to the minimum size needed.
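For a correlation study such as this one, the minimum sample size can be approximated with the Fisher z transformation: n ≈ ((z_alpha + z_power) / atanh(|r|))² + 3. The sketch below assumes a population correlation of -0.5 as a stand-in for "strong, negative", a one-tailed test because the hypothesis is directional, alpha = .05, and power = .80. These inputs are assumptions, `n_for_correlation` is a hypothetical helper name, and exact power software may differ by a few cases.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80, two_tailed=False):
    """Approximate minimum n to detect correlation r (Fisher z approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_power = z(power)
    effect = math.atanh(abs(r))          # Fisher z of the assumed correlation
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)

# Assumed effect size for illustration only.
n_min = n_for_correlation(r=-0.5)
```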

Conduct a hypothesis test of the mean by applying the four steps presented in Section 6.6 of the textbook.

C. Post-Data Analysis (10 points)

1. Determine and interpret the effect size, N, relative to explained variance and predictive perspective.
2. Determine and interpret the 95% confidence interval, including its precision and AIPE.
3. Determine and interpret the power of the study from a post-hoc perspective.
4. Present at least two plausible explanations for the result.
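Two of the post-data quantities for a correlation can be sketched as follows: explained variance (r²) and an approximate 95% confidence interval built with the Fisher z transformation. The values r = -0.6 and n = 28 are made up for illustration, `fisher_ci` is a hypothetical helper name, and the interval width is used here as a simple stand-in for precision.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)                       # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Made-up results for illustration.
r, n = -0.6, 28
explained_variance = r ** 2          # proportion of variance shared by the two variables
lower, upper = fisher_ci(r, n)
width = upper - lower                # a narrower interval means a more precise estimate
```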

### (Solved): CIS 300 QUIZ | Week 2 | Attempt 1 | Score 96 out of 100 | S...

CIS 300 QUIZ | Week 2

Score for this quiz: 96 out of 100
Submitted Sep 10
This attempt took 26 minutes.

Question 1
4 / 4 pts
Which of the following is not a web design recommended practice?

• design your site to be easy to navigate
• limit the use of animated items
• use dark text on dark backgrounds

Question 2
4 / 4 pts
Select the true statement below.

• There is no need for a lot of navigation because web site visitors can just use the back button.
• Very small text is good to use for web sites appealing to the baby-boomer and older market.
• Placing white space around graphics and headings helps them to stand out.
• Children like sites with dark colors.

Question 3
4 / 4 pts
When applying the design principle of ______ related items are grouped together.

• linear
• alignment
• proximity
• repetition

Question 4
4 / 4 pts
A ______ is a sketch or blueprint of a web page that shows the structure (but not the detailed design) of basic page elements such as the logo, navigation, content, and footer.

• wireframe
• drawing
• hierarchy
• site map

Question 5
4 / 4 pts
Select the group whose mission is to create guidelines and standards for Web Accessibility.

• Web Accessibility Initiative (WAI)
• International Webmasters Association (IWA)
• ICANN
• Internet Society

Question 6
4 / 4 pts
Select a good design recommendation for text hyperlinks.

• use a key phrase as a hyperlink
• none of the answers are correct
• create the entire sentence as a hyperlink

Question 7
4 / 4 pts
Select the false statement below.

• Flat web design is a minimalistic design style.
• Flat web design always displays well on all devices.
• Flat web design avoids the use of 3D effects.
• Flat web design often features vertical scrolling.

Question 8
4 / 4 pts
Select the best description of “white space”.

• empty screen area around blocks of text and images
• both “empty screen area around blocks of text and images” and “using the background color of white for a page”
• none of the answers are correct
• using the background color of white for a page

Question 9
4 / 4 pts
Applying the design principle of ______ serves to add visual interest and draw attention.

• alignment
• proximity
• contrast
• linear

Question 10
4 / 4 pts
A ______ color scheme consists of three colors that are equidistant on the color wheel.

• analogous
• complementary

Question 11
4 / 4 pts
The four principles of WCAG 2.0 are as follows:

• perceivable, operable, understandable, robust
• linear, hierarchical, random, sequential
• contrast, repetition, alignment, proximity

Question 12
4 / 4 pts
Select the items below that can help to appeal to the intended or target audience of a site.

• the amount of color used on the site
• the overall look and feel for the site
• the font size and styles used on the site
• all of the answers are correct

Question 13
4 / 4 pts
Which of the following is a CSS selector that will configure the paragraph elements within the footer element?

• footer p
• footer p
• p footer
• #footer p

Question 14
4 / 4 pts
Which of the following is an HTML attribute that configures inline styles?

• id
• style
• type

Question 15
4 / 4 pts
Embedded styles override or take precedence over external styles.

• True
• False

Question 16
4 / 4 pts
Use the ______ element to code embedded styles on a web page.

• style
• css
• embed

Question 17
4 / 4 pts
To apply a style to one or more elements on a web page, configure a CSS ______.

• class
• attribute
• group
• id

Question 18
4 / 4 pts
Which of the following is the CSS property used to set the background color?

• bgcolor
• color
• none of the answers are correct
• background-color

Question 19
4 / 4 pts
Which of the following is correct CSS syntax?

• p : color #000000;
• p { color: #000000; }
• p { color;#000000; }
• p { color=#000000; }

Question 20
4 / 4 pts
Use the ______ element to create logical areas on a web page that are embedded within paragraphs or other block formatting elements.

• neither div nor span
• div
• both div and span
• span

Question 21
4 / 4 pts
When CSS is coded in the body of the web page as an attribute of an HTML tag it is called ______.

• External
• Inline
• Embedded
• Imported

Question 22
4 / 4 pts
Select the code below that uses CSS to configure a background color of #eaeaea for a web page.

• body {bgcolor:#eaeaea; }
• body {background-color:#eaeaea; }
• none of the answers are correct
• document {background-page:#eaeaea; }

Question 23
4 / 4 pts
Which of the following is a CSS selector that will configure the anchor elements within the nav element?

• nav a
• #nav a
• .nav a
• nav anchor

Question 24
4 / 4 pts
External styles override or take precedence over inline styles.

• True
• False

Question 25
4 / 4 pts
To apply a style to exactly one element on a web page, configure a CSS ______.

• group
• none of the answers are correct
• id
• class

### (Solved): CIS 300 Quiz | Week 1 | Attempt 1 | 24 minutes | 96 out ...

Quiz | Week 1

Attempt 1

24 minutes

Score for this quiz 96 out of 100

Question 1
4 / 4 pts
The first widely used graphical web browser was developed at:

• W3C
• ARPA
• CERN
• NCSA

Question 2
4 / 4 pts
_________ combines the formatting strengths of HTML 4.0 and the data structure and extensibility strengths of XML.

• SGML
• DHTML
• HTML 5.0
• XHTML

Question 3
4 / 4 pts
The Domain Name System is used to purchase domain names.

• True
• False

Question 4
4 / 4 pts
Select the true statement from the list below.

• A country code domain name can only be owned by someone who resides in that country.
• When a domain name ends in .com it indicates that it is a computer company.
• None of these statements are true
• Only non-profit organizations can purchase a .org domain name

Question 5
4 / 4 pts
The ____ protocol is a set of rules that controls how data is sent between computers on the Internet.

• TCP
• FTP
• HTTP
• IP

Question 6
4 / 4 pts
Select the item below that lists the top level domain name for the URL http://www.yahoo.com.

• yahoo
• com
• www
• http

Question 7
4 / 4 pts
New Top Level Domains (TLDs) are coordinated by

• no one, because anyone can add a TLD to the Domain Name System
• W3C
• TCP
• ICANN

Question 8
4 / 4 pts
A ________ consists of two or more computers connected for the purpose of communicating and sharing resources.

• MIME
• client
• server
• network

Question 9
4 / 4 pts
The purpose of the ___________ protocol is to ensure the integrity of the communication.

• FTP
• TCP
• IP
• HTTP

Question 10
4 / 4 pts
A language using a text-based syntax intended to extend the power of HTML by separating data from presentation is called _______.

• SGML
• DHTML
• XHTML
• XML

Question 11
4 / 4 pts
What type of HTML list will automatically place a list marker, or bullet point, indicator in front of each item?

• description list
• bullet list
• unordered list
• ordered list

Question 12
4 / 4 pts
The purpose of the ________ element is to contain information that would typically be some type of fine print or a disclaimer of some kind.

• legal
• small
• em
• dd

Question 13
4 / 4 pts
What element contains each item in an ordered or unordered list?

• li
• dd
• dt
• item

Question 14
4 / 4 pts
The ________ element is used to configure the main navigation area on a web page.

• command
• main
• nav

Question 15
4 / 4 pts
Use the ______ element to create a generic area or section on a web page that is physically separated from others.

• h1
• strong
• div
• small

Question 16
4 / 4 pts
The text contained between title tags is:

• Never seen by your web page visitor.
• Not used by search engines
• Displayed in the title bar of the browser window
• Not displayed by browsers

Question 17

4 / 4 pts
What type of HTML list will automatically place a number in front of the items?

• numbered list
• ordered list
• description list
• unordered list

Question 18
4 / 4 pts
How would you configure a hyperlink from the index.html file to another file named services.html which is located in the same folder?

Question 19
4 / 4 pts
Choose the special character that is used to indicate a blank space.
*Ignore the extra spacing. It was added because Canvas doesn't render it correctly.

• & nbsp ;
• & copy ;
• & blank ;
• & space ;

Question 20
4 / 4 pts
Choose the preferred element to use when configuring important text that is intended to be displayed in a bold font style.

• strong
• em
• i
• b

Question 21
4 / 4 pts
Choose the preferred element to use when displaying text in a bold font style when there is no special importance to the words in the text.

• b
• strong
• i
• em

Question 22
4 / 4 pts
The __________ attribute of the anchor element can cause the new web page to open in its own browser window.

• window
• name
• id
• target

Question 23
4 / 4 pts
Choose the elements that are used in a description list.

• all of these answers are correct
• dd
• dl
• dt

Question 24
4 / 4 pts
Select the true statement from the choices below.

• A web page must pass syntax validation testing before it is used.
• Validation testing guarantees that your web page will look good.
• A web page will not display in a browser unless it passes syntax validation testing.
• Invalid code may cause browsers to render the pages slower than otherwise.

Question 25
4 / 4 pts
Select the element used to hyperlink web pages to each other from the list below:

• target
• anchor

### (Solved): CIS 355 Final Exam | Attempt 1 | Scored 120 out of 120 | May...

CIS 355 Final Exam | Attempt 1 | Scored 120 out of 120
Submitted May 6, 2022
This attempt took 69 minutes.

Question 1
3 / 3 pts
When a fact can be summed in certain situations but not in others, it is referred to as:

• Type 3
• None of the above

Question 2
3 / 3 pts
True or False: Once an organization builds data warehouses and data marts, they can no longer receive business reports from their operational source systems.

• True
• False

Question 3
3 / 3 pts
True or False: With a "rolling append" ETL pattern, you will start with some fixed period of history - for example, 3 years of sales data - and then expand the time frame of your historical data from that point.

• True
• False

Question 4
3 / 3 pts
True or false: you will find data warehousing used in governmental settings as well as in business.

• True
• False

Question 5
3 / 3 pts
True or False: In a star schema, you can always determine the order of a hierarchy (e.g., a PRODUCT hierarchy that also includes BRAND and CATEGORY) by looking at the order of how the database columns are listed. For example, if the PRODUCT columns are listed first, then the BRAND columns, and then the CATEGORY columns last, that means that products are part of brands, which are then part of categories.

• True
• False

Question 6
3 / 3 pts
Which slowly changing dimension (SCD) model does not retain history?

• Type 1
• Type 2
• Type 3
• Neither Type 1 nor Type 2 retains history
• Neither Type 1 nor Type 3 retains history
• None of the above is correct

Question 7
3 / 3 pts
True or False: Type 3 slowly changing dimensions (SCDs) are used less frequently and commonly than Type 1 and Type 2 SCDs in real-world data warehousing.

• True
• False

Question 8
3 / 3 pts
True or False: A data element that is numeric is always a fact; it can never be a dimension.

• True
• False

Question 9
3 / 3 pts
Which of the following is not one of the original "Bill Inmon 4 rules for data warehousing":

• Subject-oriented
• Volatile
• Time-variant
• All of the above are "Inmon rules"

Question 10
3 / 3 pts
True or False: For the same set of data that we want to dimensionalize, a snowflake schema has more foreign keys in a fact table than in a star schema.

• True
• False

Question 11
3 / 3 pts
Which of the following statements is/are correct?

• A. In SQL, a primary key/surrogate key must have an INT data type
• B. In SQL, if your dimension table is including natural keys, a natural key must have an INT data type
• C. In SQL, a fact must have an INT data type
• A and B are correct
• A and C are correct
• B and C are correct
• A, B, and C are correct

Question 12
3 / 3 pts
An accumulating snapshot fact table is used to track and analyze the progress of a business process through formally defined stages.

• True
• False

Question 13
3 / 3 pts
True or False: In a snowflake schema, you can have at most one "flat" dimension that isn't part of a hierarchy.

• True
• False

Question 14
3 / 3 pts
The most "architecturally flexible" slowly changing dimension (SCD) model for maintaining history of changes in a data warehouse is:

• Type 1
• Type 2
• Type 3
• All of the above maintain history and do so with equivalent "architectural flexibility"

Question 15
3 / 3 pts
"Branching" in a snowflake schema can only occur at the lowest-level dimension table in a hierarchy.

• True
• False

Question 16
3 / 3 pts
True or False: If a source system contains a natural key for data that you will be putting into a fact table - for example, SHIPMENT_ID for data about a shipment of products, how much revenue will come in from the shipment, etc. - you should use that natural key such as SHIPMENT_ID for the primary key of the fact table, rather than a combination key made up of all the surrogate keys/foreign keys. So your SQL would look like:
CREATE TABLE SHIPMENT_FACT (
SHIPMENT_ID CHAR(15) NOT NULL,
PRODUCT_KEY INT NOT NULL,
CUSTOMER_KEY INT NOT NULL,
PRIMARY KEY (SHIPMENT_ID),
FOREIGN KEY (PRODUCT_KEY) REFERENCES PRODUCT_DIM (PRODUCT_KEY),
...

• True
• False
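The two key designs contrasted in Question 16 can be made runnable with Python's built-in `sqlite3` module so they can be inspected side by side. This sketch does not indicate which answer is correct. The dimension tables, the REVENUE column, the `SHIPMENT_FACT_ALT` table name, and the sample rows are hypothetical additions for illustration; only SHIPMENT_ID, PRODUCT_KEY, CUSTOMER_KEY, and PRODUCT_DIM come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE PRODUCT_DIM  (PRODUCT_KEY  INTEGER PRIMARY KEY, PRODUCT_NAME  TEXT);
CREATE TABLE CUSTOMER_DIM (CUSTOMER_KEY INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT);

-- Design described in the question: the natural key is the primary key.
CREATE TABLE SHIPMENT_FACT (
    SHIPMENT_ID  CHAR(15) NOT NULL,
    PRODUCT_KEY  INT NOT NULL,
    CUSTOMER_KEY INT NOT NULL,
    REVENUE      REAL,
    PRIMARY KEY (SHIPMENT_ID),
    FOREIGN KEY (PRODUCT_KEY)  REFERENCES PRODUCT_DIM (PRODUCT_KEY),
    FOREIGN KEY (CUSTOMER_KEY) REFERENCES CUSTOMER_DIM (CUSTOMER_KEY)
);

-- Alternative mentioned in the question: a combination key of the surrogate keys.
CREATE TABLE SHIPMENT_FACT_ALT (
    PRODUCT_KEY  INT NOT NULL,
    CUSTOMER_KEY INT NOT NULL,
    REVENUE      REAL,
    PRIMARY KEY (PRODUCT_KEY, CUSTOMER_KEY),
    FOREIGN KEY (PRODUCT_KEY)  REFERENCES PRODUCT_DIM (PRODUCT_KEY),
    FOREIGN KEY (CUSTOMER_KEY) REFERENCES CUSTOMER_DIM (CUSTOMER_KEY)
);

INSERT INTO PRODUCT_DIM  VALUES (1, 'Widget');
INSERT INTO CUSTOMER_DIM VALUES (1, 'Acme');
INSERT INTO SHIPMENT_FACT     VALUES ('S-001', 1, 1, 250.0);
INSERT INTO SHIPMENT_FACT_ALT VALUES (1, 1, 250.0);
""")

def insert_is_rejected(sql):
    """Return True if the statement violates a key or foreign-key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A fact row pointing at a product that does not exist in PRODUCT_DIM.
orphan_rejected = insert_is_rejected(
    "INSERT INTO SHIPMENT_FACT VALUES ('S-999', 42, 1, 10.0)")
# A second row with the same surrogate-key combination in the alternative design.
duplicate_rejected = insert_is_rejected(
    "INSERT INTO SHIPMENT_FACT_ALT VALUES (1, 1, 99.0)")
```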

Question 17
3 / 3 pts
True or False: In a snowflake schema, you will typically have less duplicated dimension data overall (especially for the higher levels of your hierarchies) than in a star schema

• True
• False

Question 18
3 / 3 pts
Which of the following scenarios is/are most applicable to using a Type 3 slowly changing dimension?

• A. Employees receiving salary increases
• B. Sales territory reorganizations
• C. Employee home addresses changing
• Both A and B
• Both A and C
• Both B and C
• A, B, and C are all applicable for Type 3 SCDs
• Neither A, B, nor C is applicable for Type 3 SCDs

Question 19
3 / 3 pts
True or False: A factless fact table is the most appropriate dimensional modeling structure for recording online users viewing web pages along with the amount of time a user spent on each web page during each view.

• True
• False

Question 20
3 / 3 pts
A "cube" is typically used for VERY large volumes of data in a data warehouse.

• True
• False

Question 21
3 / 3 pts
Which of the following scenarios is/are most applicable to using a Type 1 slowly changing dimension?

• A. Employees receiving salary increases
• B. Sales territory reorganizations
• C. Employee home addresses changing
• Both A and B
• Both A and C
• Both B and C
• A, B, and C are all applicable for Type 1 SCDs
• Neither A, B, nor C is applicable for Type 1 changes

Question 22
3 / 3 pts
Which of the following statements is/are true about a "non-terminal dimension table" in a snowflake schema?

• A. You have a compound primary key made up of more than one surrogate key
• B. You have only one foreign key constraint
• C. You have one or more foreign key constraints
• D. You have no foreign key constraints
• A and B are correct
• A and C are correct
• A and D are correct
• Neither A, B, C, nor D is correct

Question 23
3 / 3 pts
True or False: If you decide to use a "tracking fact" in a factless fact table, that tracking fact must always have a value of zero.

• True
• False

Question 24
3 / 3 pts
Which of the following words are signals to look for a dimension?

• "COUNT" and "SUM"
• "SELECT" and "BY"
• "FOR" and "BY"
• None of the above

Question 25
3 / 3 pts
True or False: A "factless fact table" is another name for a dimension table.

• True
• False

Question 26
3 / 3 pts
For your ETL design, which type of dimensional data do you process first?

• Dimensional data
• Fact table data
• Either dimensional data or fact table data, based on whichever data volume is lesser
• Either dimensional data or fact data, depending on which data is more important

Question 27
3 / 3 pts
With a persistent staging layer:

• Once data has been transformed and loaded into the performance layer, that data is no longer needed and is deleted
• After data has been transformed and loaded into the performance layer, the data is still retained in the staging layer
• The data in the staging layer is stored in the form of dimension tables and fact tables
• None of the above is true

Question 28
3 / 3 pts
A capability introduced into Relational Database Management Systems (RDBMSs) to improve performance for data warehousing was:

• A. Star joins
• B. Snowflake joins
• C. Bitmapped indices
• A and B above
• A and C above
• B and C above
• A, B, and C above
• None of the above

Question 29
3 / 3 pts
Which of the following statements is/are true about a "coverage/eligibility/state of being" factless fact table?

• A. You typically need 2 relationships back to a date/time dimension
• B. You will always have one and only one fact/measurement in the table
• C. You do not need to have FOREIGN KEY/REFERENCES constraint clauses, unlike other fact tables
• All of the above - A, B, and C - are correct

Question 30
3 / 3 pts
With a non-persistent staging layer:

• Once data has been transformed and loaded into the performance layer, that data is no longer needed and is deleted
• After data has been transformed and loaded into the performance layer, the data is still retained in the staging layer
• The data in the staging layer is stored in the form of dimension tables and fact tables
• None of the above is true

Question 31
3 / 3 pts
True or False: For the same set of data that we want to dimensionalize, a snowflake schema typically has the same number of dimension tables as a star schema.

• True
• False

Question 32
3 / 3 pts
Which of the following scenarios is/are most applicable to using a Type 2 slowly changing dimension?

• A. Employees receiving salary increases
• B. Sales territory reorganizations
• C. Employee home addresses changing
• Both A and B
• Both A and C
• Both B and C
• A, B, and C are all applicable for Type 2 SCDs
• Neither A, B, nor C is applicable for Type 2 SCDs

Question 33
3 / 3 pts
True or False: In a CREATE TABLE statement for a fact table, presuming that you have a combination primary key, you will have one FOREIGN KEY constraint clause for each column/field used as part of your combination primary key.

• True
• False

Question 34
3 / 3 pts
Which of the following statements is not true about a star schema?

• We can work with dimension tables without needing to use fact tables
• We can work with fact tables without needing to use dimension tables
• In a star schema, you only have one “level” of dimension tables connected to a fact table
• All 3 of the above statements are actually true

Question 35
3 / 3 pts
In a snowflake schema, each "non-terminal" dimension table will contain:

• At least one FOREIGN KEY clause, but possibly more than one of the hierarchy branches
• A composite/combination PRIMARY KEY
• Always at least two FOREIGN KEY clauses

Question 36
3 / 3 pts
True or False: Best practices call for storing natural keys in fact tables, but not in dimension tables.

• True
• False

Question 37
3 / 3 pts
True or False: In a star schema, if you have a many-to-many relationship between dimensional data - for example, a doctor can perform multiple surgeries, and any given surgery can be performed by multiple doctors - then your model needs to use foreign keys in your dimension tables to show that many-to-many relationship between the two dimension tables (e.g., DOCTOR_DIM and SURGERIES_DIM) that you create.

• True
• False

Question 38
3 / 3 pts
When a fact cannot be meaningfully summed in any situation, it is referred to as:

• Hybrid
• None of the above

Question 39
3 / 3 pts
True or false: when doing ETL design, you will always make a Type 1 change to exactly one row in a dimension table.

• True
• False

Question 40
3 / 3 pts
True or False: when phrasing a dimensional question, every applicable dimension must be "signaled" by either the word "BY" or the word "FOR."

• True
• False


### (Solved): CIS 355 Midterm Exam | Points 100 | Attempt 1 - Score for t...

CIS 355 Midterm Exam

Points 100 |  Questions 25

Attempt 1
Score for this quiz: 100 out of 100

Time Limit 75 Minutes

Question 1
4 / 4 pts
Which of the following is not a common incremental ETL pattern?

• Virtual append
• Complete replacement
• All of the above are common incremental ETL patterns

Question 2
4 / 4 pts
True or False: “Data Virtualization” is a related but different approach to building a “Data Warehouse” – the two terms do not mean exactly the same thing.

• True
• False

Question 3
4 / 4 pts
Which of the following is not one of the four original “Bill Inmon Rules” for data warehousing:

• Integrated
• Time-variant
• Dimensional data
• All of the answer selections are among the four original “Bill Inmon Rules”

Question 4
4 / 4 pts
True or False: all enterprise data warehouses are built using relational databases, while all data marts are built using cubes.

• True
• False

Question 5
4 / 4 pts
In early-generation data warehouses, architects and planners would typically need to make design compromises because of technology limitations. Which of the following was a common design compromise found in early data warehouses:

• Building centralized rather than component-based data warehouses
• Aggregation rather than individual transaction data
• Incorporating front-end data marts into the overall architecture
• All of the answer selections were early-generation DW design compromises

Question 6
4 / 4 pts
Today’s enterprise-scale data warehouses commonly receive data from _____ (number) of source applications:

• No more than ten
• Thousands
• Hundreds of thousands
• None of the answer selections are correct

Question 7
4 / 4 pts
True or False: Today’s data warehouses can easily support many terabytes of data.

• True
• False

Question 8
4 / 4 pts
Which of the following is one of the common “flavors” or types of business intelligence (BI)?

• Tell me what happened, and why
• Tell me what is likely to happen (predictive analytics)
• Tell me something interesting and important
• All of the answer selections are common “flavors” or types of BI

Question 9
4 / 4 pts
The unlabeled “disk symbol” on the right side of the diagram could represent which one or more of the following:

• A. An independent data mart
• B. A dependent data mart
• C. A centralized data warehouse
• Answers (A) and (C) both are correct
• Answers (B) and (C) both are correct
• Answers (A), (B), and (C) are all correct

Question 10
4 / 4 pts
True or False: A multidimensional database “cube” is commonly used rather than a relational database when an organization has exceptionally large volumes of data to include in the data warehouse.

• True
• False

Question 11
4 / 4 pts
In the data warehousing “wholesaler-retailer” paradigm, which of the following components can be thought of as a “supplier” of data?

• Source application
• Data warehouse
• Data mart

Question 12
4 / 4 pts
True or False: An independent data mart is architecturally identical to a data warehouse.

• True
• False

Question 13
0 / 4 pts
In an environment that makes use of front-end data marts, the enterprise data warehouse (EDW) component in such an environment provides which type of business intelligence/reporting/analytics?

• Organization-specific operational reporting
• Enterprise-wide strategic reporting and BI
• Organization-specific predictive analytics
• None of the answer selections are correct

Question 14
4 / 4 pts
True or False: When you have an enterprise data warehouse that feeds data to dependent data marts, every dependent data mart must have its data organized dimensionally to support “classic” business intelligence and online analytical processing (OLAP). You would never feed data from an EDW into a data mart that is structured to support data mining.

• True
• False

Question 15
4 / 4 pts
We’ve discussed how one particular relational database capability created difficulties for early-generation relational databases, and how significant work needed to occur in RDBMS products to give RDBMSs satisfactory performance for data warehousing. What relational database capability is this?

• CREATE TABLE
• WHERE clauses
• JOIN operations
• None of the answer selections

Question 16
4 / 4 pts
True or False: We typically do not delete all information about a customer in a data warehouse or data mart even if that customer has been deleted from a source application.

• True
• False

Question 17
4 / 4 pts
Two common ETL transformation models are:

• Data hubs and source merging
• Non-persistent staging layers and de-duplication
• Address matching/standardization and data virtualization
• Dropping columns and value based row/record filtering

Question 18
4 / 4 pts
When we refer to a relational database management system (RDBMS) as sort of a “Swiss Army Knife” for data management, we mean:

• An RDBMS can only support analytical processing
• An RDBMS can only support transactional processing
• An RDBMS can be built using either physical pointers among database tables or logical relationships among database tables
• None of the answer selections are correct

Question 19
4 / 4 pts
True or False: “ETL” is an important aspect of data warehousing, and is an acronym for several combinations of words; one of those combinations is “Extraction, Transaction, and Loading.”

• True
• False

Question 20
4 / 4 pts
True or False: “Front end data marts” and “Dependent data marts” are the same thing.

• True
• False

Question 21
4 / 4 pts
True or False: a “Foreign Key” in a relational database table is the formal name for the unique identifier of any given row in that table.

• True
• False

Question 22
4 / 4 pts
True or False: a “star join” is a technical means for a relational database management system to efficiently join data from potentially a large number of database tables, typically when a database is being used for data warehousing purposes.

• True
• False

Question 23
4 / 4 pts
True or False: In a Corporate Information Factory (CIF) approach to component-oriented data warehousing, users access data from either the EDW component or the data marts, whichever suits their needs best.

• True
• False

Question 24
4 / 4 pts
You will find which of the following in the Federated EDW architectural approach:

• A. An architected, component-oriented data warehousing environment
• B. Independent data marts
• C. An integrated, centralized EDW
• A and B above
• A and C above
• B and C above
• A, B, and C above
• None of the answer selections are correct

Question 25
4 / 4 pts
True or False: a “cube” used for data warehousing could have more than 3 dimensions, despite the usage of the “cube” term.

• True
• False


### (Solved): CIS 355 Assignment 4 ...

Using the same subject area as you did for Assignment #3 (unless directed by your instructor to choose another subject area), you will create a star schema SQL Server model, with tables pasted as in Assignment 3, containing both types of factless fact tables, as discussed in the course lecture:

Case 1: business activity (“something happened”), but nothing to measure

Case 2: “conditions, coverage, or eligibility” – a “state of being” – but nothing to measure

If you had any points deducted for your dimension tables in Assignment #3, you will need to "fix" any issues for Assignment #4.

You will remove your transaction-grained fact tables from your model and replace them with factless fact tables, as described above.

You also may need to add a DATE/TIME-related dimension table if you didn't have one in Assignment #3; if you do, you can EITHER delete one of your Assignment 3 dimension tables and replace with DATE/TIME, or just add the new DATE/TIME one to the existing 3; your choice based on what makes sense for your model.

25 points total: 9 points for dimension tables, 8 points for each of your factless fact tables

• Dimension tables: 3 points for each one (including DATE/TIME dimension; if you have 4 total, all 4 will factor into this portion of the grading), "all or nothing" correctness:
• surrogate key as primary key with correct data type and NOT NULL clause
• correct PRIMARY KEY clause
• no FOREIGN KEY clause or other syntax violations
• Fact tables: 8 points for each, based on:
• SQL syntax: 4 points, "all or nothing" including PRIMARY and FOREIGN KEY clauses/constraints; correct primary key designation; no facts/measurements, only keys; other syntax
• Correct fact table usage: 4 points
• For both fact tables: your dimensions are correctly and fully related to the business subject area that you selected (example: if you were to do "SEASON TICKETS" then you must have one or more appropriate dimensions related to SEASON TICKETS; in other words, you need to have more than just syntactically correct PK and FK columns, your model needs to "make sense")
• "Business event with nothing to measure" must be correct and related to health care/hospital
• "Coverage/eligibility/'state of being'" must be correct and related to ASU ATHLETICS - make sure that your relation(s) to your date/time dimensions are syntactically and semantically correct
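
The two factless fact table cases and the grading checklist above can be sketched roughly as follows. This is only an illustration, not the expected submission: it uses Python's built-in `sqlite3` module rather than SQL Server, so data types and tooling differ, and every table and column name (`PATIENT_ADMISSION_FACT`, `ATHLETE_ELIGIBILITY_FACT`, and so on) is a hypothetical choice made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# All table and column names below are hypothetical, chosen for illustration.
conn.executescript("""
-- Dimension tables: surrogate key as NOT NULL primary key, no foreign keys.
CREATE TABLE DATE_DIM (
    DATE_KEY      INTEGER NOT NULL,
    CALENDAR_DATE TEXT,
    PRIMARY KEY (DATE_KEY)
);
CREATE TABLE PATIENT_DIM (
    PATIENT_KEY  INTEGER NOT NULL,
    PATIENT_NAME TEXT,
    PRIMARY KEY (PATIENT_KEY)
);
CREATE TABLE ATHLETE_DIM (
    ATHLETE_KEY  INTEGER NOT NULL,
    ATHLETE_NAME TEXT,
    PRIMARY KEY (ATHLETE_KEY)
);
CREATE TABLE SPORT_DIM (
    SPORT_KEY  INTEGER NOT NULL,
    SPORT_NAME TEXT,
    PRIMARY KEY (SPORT_KEY)
);

-- Case 1: a business event with nothing to measure (only keys, no facts).
CREATE TABLE PATIENT_ADMISSION_FACT (
    DATE_KEY    INTEGER NOT NULL,
    PATIENT_KEY INTEGER NOT NULL,
    PRIMARY KEY (DATE_KEY, PATIENT_KEY),
    FOREIGN KEY (DATE_KEY)    REFERENCES DATE_DIM (DATE_KEY),
    FOREIGN KEY (PATIENT_KEY) REFERENCES PATIENT_DIM (PATIENT_KEY)
);

-- Case 2: coverage/eligibility over a date range (a "state of being").
CREATE TABLE ATHLETE_ELIGIBILITY_FACT (
    ATHLETE_KEY         INTEGER NOT NULL,
    SPORT_KEY           INTEGER NOT NULL,
    EFFECTIVE_DATE_KEY  INTEGER NOT NULL,
    EXPIRATION_DATE_KEY INTEGER NOT NULL,
    PRIMARY KEY (ATHLETE_KEY, SPORT_KEY, EFFECTIVE_DATE_KEY),
    FOREIGN KEY (ATHLETE_KEY)         REFERENCES ATHLETE_DIM (ATHLETE_KEY),
    FOREIGN KEY (SPORT_KEY)           REFERENCES SPORT_DIM (SPORT_KEY),
    FOREIGN KEY (EFFECTIVE_DATE_KEY)  REFERENCES DATE_DIM (DATE_KEY),
    FOREIGN KEY (EXPIRATION_DATE_KEY) REFERENCES DATE_DIM (DATE_KEY)
);
""")

# Made-up sample rows, only to show how a factless table is queried.
conn.executemany("INSERT INTO DATE_DIM VALUES (?, ?)",
                 [(1, "2024-01-15"), (2, "2024-05-15")])
conn.executemany("INSERT INTO PATIENT_DIM VALUES (?, ?)",
                 [(10, "Patient A"), (11, "Patient B")])
conn.execute("INSERT INTO ATHLETE_DIM VALUES (20, 'Athlete A')")
conn.execute("INSERT INTO SPORT_DIM VALUES (30, 'Swimming')")
conn.executemany("INSERT INTO PATIENT_ADMISSION_FACT VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10)])
conn.execute("INSERT INTO ATHLETE_ELIGIBILITY_FACT VALUES (20, 30, 1, 2)")

# With no measurements stored, analysis is done by counting rows.
admissions_by_date = conn.execute(
    "SELECT DATE_KEY, COUNT(*) FROM PATIENT_ADMISSION_FACT "
    "GROUP BY DATE_KEY ORDER BY DATE_KEY"
).fetchall()
print(admissions_by_date)
```

Each column in a combination primary key gets its own FOREIGN KEY clause, and the eligibility table relates to the date dimension twice (effective and expiration), which is one common way to represent a state that holds over a period.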


### (Solved): Knowledge Check 1 | Managerial Economics (JWI515006JWONL-12...

Knowledge Check 1

Managerial Economics

(JWI515006JWONL-1224-001)

Attempt Score    4 out of 4 points

QUESTION 1
GDP is expressed in inflation adjusted dollars.

• True
• False

0.4 points

QUESTION 2
Which of the following statements is NOT true?
A.    Raising the Interest rate decreases the rate of Inflation.
B.    Lagging Indicators can predict the economy's future direction.
C.    Perfect Competition involves many sellers offering similar goods.
D.    Cost-push is one of the causes of Inflation.
0.4 points

QUESTION 3
GDP calculation does not reflect depreciation of machinery and other capital assets. Therefore, it is considered to be a ______ measure.
A.    Net
B.    Nominal
C.    Gross
D.    Annual
E.    Marginal
0.4 points

QUESTION 4
The Federal Reserve regulates banks and enacts U.S. monetary policy.

• True
• False

0.4 points

QUESTION 5
Decreasing the Interest Rate ...
A.    Slows down the Rate of Inflation.
B.    Makes it harder for consumers to borrow money.
C.    Increases the flow of cash into the marketplace.
D.    Makes it harder for businesses to borrow money.
0.4 points

QUESTION 6
Which of the following is NOT a significant Indicator in Macroeconomics?
A.    The Unemployment Rate.
B.    The Gross Domestic Product.
C.    Consumption of Goods and Services.
D.    The Rate of Inflation.
0.4 points

QUESTION 7
The Consumer Price Index (CPI) is reported once every quarter.

• True
• False

0.4 points

QUESTION 8
A basket of goods bought by a typical consumer is used to ...
A.    Calculate the Interest Rate.
B.    Measure the Gross Domestic Product.
C.    Calculate the Consumer Price Index.
D.    Illustrate the law of Supply and Demand.
0.4 points

QUESTION 9
Prior to policy changes due to COVID-19, to be counted in the Unemployment Rate, a person must ...
A.    Have given up trying to find employment.
B.    Be less than the retirement age of 65 years old.
C.    Have lost a job through involuntary layoffs.
D.    Have been actively seeking a job for the past 4 weeks.
0.4 points

QUESTION 10
The Federal Reserve (Fed) sets the Interest Rate for the U.S.A. In this function, the Fed operates as a ...
A.    Monopoly.
B.    Coincident Indicator.
C.    Regulating entity.
D.    Market Structure.


### (Solved): Business data analytics|jmp|tableau|excel|Create Pricing For...

Task: Create Pricing Formula for renting Airbnb in Chicago.

Create a new column in the data with a formula that adds all the fees together to produce a "cost per day" (or a similar measure) that can be used to rank each rental from least to most expensive.

Hypothesis:  All areas have the same average cost?

USE DATA: Fees: (cleaning, minimum nights required, security deposit, cancellation fee)
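
One possible reading of the "cost per day" column is sketched below in plain Python. It is only an assumption about how the fees might be combined: the one-time fees (cleaning, security deposit, cancellation fee) are summed and spread over the minimum number of nights. The column names and the sample listings are hypothetical and do not come from the actual dataset; a nightly price, if the data has one, could be added to the result.

```python
# Hypothetical listings; column names and values are made up for illustration.
listings = [
    {"area": "Loop", "cleaning_fee": 60.0, "security_deposit": 200.0,
     "cancellation_fee": 40.0, "minimum_nights": 2},
    {"area": "Lakeview", "cleaning_fee": 30.0, "security_deposit": 100.0,
     "cancellation_fee": 20.0, "minimum_nights": 3},
    {"area": "Hyde Park", "cleaning_fee": 45.0, "security_deposit": 0.0,
     "cancellation_fee": 15.0, "minimum_nights": 1},
]


def fee_cost_per_day(row):
    """Sum the one-time fees and spread them over the minimum stay (assumed formula)."""
    nights = max(row["minimum_nights"], 1)  # guard against 0 or missing-as-0 nights
    total_fees = (row["cleaning_fee"]
                  + row["security_deposit"]
                  + row["cancellation_fee"])
    return total_fees / nights


# Add the new column, then rank rentals from least to most expensive.
for row in listings:
    row["cost_per_day"] = round(fee_cost_per_day(row), 2)

ranked = sorted(listings, key=lambda r: r["cost_per_day"])
for row in ranked:
    print(row["area"], row["cost_per_day"])
```

The same formula translates directly to an Excel or JMP column formula; the Python version is just a compact way to check the arithmetic.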

• Use JMP Pro, Excel, and Tableau
• Must perform some data cleaning (outlier analysis)
• Test correlations between data variables
• Perform appropriate statistical test (t-test, ANOVA, Regression, Cluster analysis)
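
For the hypothesis that all areas have the same average cost, a one-way ANOVA is one of the listed tests that fits. The sketch below computes only the F statistic by hand using the Python standard library; the area names and cost values are invented for illustration, and in practice JMP (or another statistics package) would also report the p-value.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict of {group: [values]}."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)
    k = len(groups)       # number of groups (areas)
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - fmean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical cost-per-day samples by area (not real data).
cost_by_area = {
    "Loop": [150.0, 140.0, 160.0],
    "Lakeview": [50.0, 55.0, 45.0],
    "Hyde Park": [60.0, 65.0, 70.0],
}

f_stat = one_way_anova_f(cost_by_area)
print(f"F statistic: {f_stat:.2f}")
```

A large F statistic relative to the critical value for (k - 1, n - k) degrees of freedom is evidence against the hypothesis that all areas share the same mean cost.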

Questions

Q1: Is the hypothesis true? Explain why in a few sentences.

Q2: Use Tableau to show the areas with rentals from least to most expensive. Explain in your own words why they are expensive and why they are cheap.

Q3: If the hypothesis is not true, explain why it's not true in a few sentences.
