## Data Analysis

BSB123 Data Analysis

Assessment Item 2 Research Report (2017 S1)

Data

The file Birthweights.xlsx contains data on the following variables for a sample of 1000 births recorded at a large local hospital in 2015:

| Variable | Description |
|---|---|
| Birthweight | Birthweight in grams |
| Gestation | Length of pregnancy in days |
| Smoke | Whether the mother is a smoker or not |
| Pre-pregnancy weight | Mother’s pre-pregnancy weight in kilograms |
| Height | Mother’s height in centimetres |
| Status | Mother’s indigenous status |
| Age | Mother’s age in years |

Background

Management at the hospital is interested in being able to better manage room allocations and bookings in their maternity ward. They are keen to identify mothers at risk of having low birth weight babies who may require additional hospital resources during their stay in the hospital.

The hospital has collected data on a number of previous births at the hospital, covering the variables outlined in the table above. Management has approached you, as a consultant, to analyse this dataset.

Tasks

Part 1 - Analysis (80%)

Past records (2004) show that the average birthweight was 3500 grams. Test at the 5% significance level whether the average birthweight in 2015 has increased with the improvement in general nutrition.

(Include all six steps for hypothesis testing.)

(2 marks)
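The computational part of this test (test statistic, critical value, decision) can be cross-checked outside Excel. Below is a minimal sketch using only the Python standard library, assuming the 2015 birthweights have been loaded into a plain list; the function name is hypothetical. Because the standard library has no t distribution, it approximates the t(n − 1) critical value and p-value with the standard normal, which is close for n = 1000 but not for small samples. The hypotheses, assumptions and written conclusion still have to be stated separately.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_t_upper(sample, mu0, alpha=0.05):
    """Upper-tailed test of H0: mu <= mu0 against H1: mu > mu0.

    Returns (t statistic, critical value, approximate p-value, reject H0?).
    """
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divisor n - 1)
    t_stat = (xbar - mu0) / (s / math.sqrt(n))
    # Normal approximation to t(n - 1): adequate for n near 1000, not for small n.
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)
    p_value = 1 - z.cdf(t_stat)
    return t_stat, critical, p_value, t_stat > critical


# Hypothetical usage (birthweights_2015 stands for the real column):
# t, crit, p, reject = one_sample_t_upper(birthweights_2015, mu0=3500)
```

Excel's own one-sample output, or T.DIST-based values, should agree closely with this approximation at this sample size.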

Perform a two-sample t-test for each of the following tasks. (Include all six steps for hypothesis testing in each.)

(a) Determine whether there is evidence that, on average, the birthweight of a baby whose mother smokes is less than that of a baby whose mother does not. (α = 5%)

(2 marks)

(b) Determine whether being indigenous is a disadvantage in terms of birthweight. (α = 5%)

(2 marks)
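Both (a) and (b) are lower-tailed comparisons: group 1 (smokers, or indigenous mothers) is suspected of having the lower mean. Excel's two-sample t-test tools offer both equal-variance and unequal-variance versions, and which one to use is a judgment call based on the sample variances. The sketch below shows the unequal-variance (Welch) form, with hypothetical names, using the same large-sample normal approximation as before because the standard library has no t distribution.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_lower(group1, group2, alpha=0.05):
    """Lower-tailed test of H0: mu1 >= mu2 against H1: mu1 < mu2 (unequal variances).

    Returns (t statistic, Welch df, critical value, approximate p-value, reject H0?).
    """
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (divisor n - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)
    t_stat = (mean(group1) - mean(group2)) / se
    # Welch-Satterthwaite degrees of freedom, reported for reference.
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    # Normal approximation to t(df); reasonable only when df is large.
    z = NormalDist()
    critical = z.inv_cdf(alpha)  # negative, because the rejection region is the lower tail
    p_value = z.cdf(t_stat)
    return t_stat, df, critical, p_value, t_stat < critical
```

For (a), pass the smokers' birthweights as `group1`; for (b), pass the indigenous mothers' birthweights as `group1`.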

The hospital management is particularly interested in whether you can develop a regression model to help them predict the birthweight of a baby from the variables in the data supplied. The model could then be used to predict birthweight and identify babies at risk in future.

Using the forward stepwise method, develop a multiple regression model to predict birthweight, entering the variables in the following order:

Step 1: Gestation only

Step 2: Gestation and Smoke

Step 3: Gestation, Smoke and Pre-pregnancy Weight

Step 4: Gestation, Smoke, Pre-pregnancy Weight and Height

Step 5: Gestation, Smoke, Pre-pregnancy Weight, Height and Status

Step 6: Gestation, Smoke, Pre-pregnancy Weight, Height, Status and Age
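Excel's Regression tool fits each step directly. As an independent cross-check, the sketch below fits Steps 1 to 6 by solving the least-squares normal equations with plain Python and reports R² and adjusted R² for each step. It assumes Smoke and Status are coded 0/1 (1 = smoker, 1 = indigenous); that coding, the column names and all function names are hypothetical. Solving the normal equations is fine for a handful of predictors but is less numerically robust than the QR-based methods used by dedicated statistics packages.

```python
# Entry order taken from Steps 1 to 6; the column names are hypothetical.
STEP_ORDER = ["Gestation", "Smoke", "Pre-pregnancy weight", "Height", "Status", "Age"]


def ols(x_rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in x_rows]  # prepend an intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda row_idx: abs(A[row_idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def predict(b, row):
    """Point prediction b0 + b1*x1 + ... + bk*xk for one row of predictor values."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


def r_squared(x_rows, y, b):
    """Return (R^2, adjusted R^2) for a fitted model with k = len(b) - 1 predictors."""
    n, k = len(y), len(b) - 1
    sse = sum((yi - predict(b, row)) ** 2 for row, yi in zip(x_rows, y))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return r2, 1 - (1 - r2) * (n - 1) / (n - k - 1)


def forward_steps(data, y):
    """Fit one model per step; data maps column name -> list of numeric values."""
    results = []
    for step in range(1, len(STEP_ORDER) + 1):
        cols = STEP_ORDER[:step]
        x_rows = [list(vals) for vals in zip(*(data[c] for c in cols))]
        b = ols(x_rows, y)
        r2, adj_r2 = r_squared(x_rows, y, b)
        results.append({"step": step, "columns": cols, "coefficients": b,
                        "r2": r2, "adj_r2": adj_r2})
    return results
```

For (f), `predict` can be called with the final model's coefficients and the mother's values in the same column order, for example Smoke = 1 and Status = 1 under the assumed coding.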

(a) Interpret the regression coefficients of all six (6) independent variables in the model obtained in Step 6, and comment on the statistical significance of each.

(3 marks)

(b) Use Excel to obtain the correlation matrix for the following variables: Gestation, Pre-pregnancy Weight, Height, Age and Birthweight. Do you think multicollinearity is a problem in the regression model? Are the correlation coefficients consistent with the regression coefficients obtained in the model in Step 6? Discuss briefly.

(3 marks)

(c) Focusing on Steps 3 and 4, discuss fully how the introduction of Height in Step 4 affects the regression coefficient of Pre-pregnancy Weight.

(3 marks)
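One way to structure the discussion in (c): for least-squares fits on the same data, the Step 3 coefficient of Pre-pregnancy Weight decomposes exactly into its Step 4 coefficient plus an omitted-variable term. The notation is introduced here for illustration only: G, S, W and H stand for Gestation, Smoke, Pre-pregnancy Weight and Height; tildes mark Step 3 estimates, hats mark Step 4 estimates, and the d coefficients come from an auxiliary regression of Height on the Step 3 predictors.

```latex
\begin{aligned}
\text{Step 3:}\quad \hat{y} &= \tilde{b}_0 + \tilde{b}_1 G + \tilde{b}_2 S + \tilde{b}_3 W \\
\text{Step 4:}\quad \hat{y} &= \hat{b}_0 + \hat{b}_1 G + \hat{b}_2 S + \hat{b}_3 W + \hat{b}_4 H \\
\text{Auxiliary:}\quad \hat{H} &= \hat{d}_0 + \hat{d}_1 G + \hat{d}_2 S + \hat{d}_3 W \\
\Rightarrow\quad \tilde{b}_3 &= \hat{b}_3 + \hat{b}_4 \, \hat{d}_3
\end{aligned}
```

So if heavier mothers also tend to be taller (positive d₃, holding G and S fixed) and Height is positively associated with birthweight (positive b₄), part of what Step 3 attributes to weight is picked up by Height in Step 4, and the weight coefficient shrinks. The actual signs and sizes have to be read from your own Excel output.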

(d) Based on the results in (a) to (c), explain which independent variables should be included or excluded to formulate the final model. State the final model.

(2 marks)

(e) Comment on the overall adequacy of the final model.

(2 marks)
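Comments on overall adequacy usually draw on R², adjusted R², the standard error and the overall F test, which Excel's regression output reports in its ANOVA table as F and Significance F. The short sketch below, with a hypothetical function name, shows how adjusted R² and the F statistic follow from R², the number of observations n and the number of predictors k. The p-value requires the F distribution, which the Python standard library does not provide, so it is left to Excel.

```python
def overall_adequacy(r2, n, k):
    """Adjusted R^2 and overall F statistic for k predictors fitted to n observations."""
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra predictors
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))    # equals MSR / MSE in the ANOVA table
    return adj_r2, f_stat
```

Plugging in the R² from Excel should reproduce the reported adjusted R² and F, which is a quick check that the right model output is being discussed.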

(f) Consider an indigenous mother who is a smoker, 20 years of age and 160 cm tall, with a pre-pregnancy weight of 58 kg and a gestation of 267 days. What is the expected birthweight of her baby, using the final model you developed in (d)?

(2 marks)

Compute the difference in the average birthweight of babies of indigenous and non-indigenous mothers (called the birthweight difference, for simplicity). Discuss fully whether there is any discrepancy between the regression coefficient of Status obtained in the regression model and the birthweight difference.

(3 marks)

Part 2 – Report (20%)

You are required to submit a concise report (word limit: 400) presenting any important features or relationships in the data. The content of your report should be based on, but not restricted to, insights gleaned from your analyses conducted in Part 1.

(6 marks)

Notes:

Part 1 - Analysis

• For presentation and ease of marking, it is advisable to include the relevant Excel output in your answer to each question in this part rather than placing it in appendices.

• There is no word limit in Part 1.

Part 2 - Report

• The report should be based primarily on the data provided. If, however, you wish to include and refer to additional information, you may use any referencing system as long as it is used consistently.

• You can include relevant charts and Excel objects in your report.

• Use 1.5 line spacing and an 11-point font.

• The word limit of 400 (with a tolerance of 10%) excludes words in tables, appendices and the reference list (if any).

Submission

• You should submit your response to both parts as a single pdf document saved in the format:

BSB123 Report_StudentName.pdf

• After uploading your research report, it is your responsibility to go back to the Assignment Upload page to check that your report was properly uploaded.

• Due: 11:59 pm 28 May 2017 (Sunday) via Blackboard