# Fit the regression model discussed above

## Estimate the linear effect of dose

Background: This second part of the assignment requires some more theoretical work based on fitting a linear regression model to investigate the effect of three dosage levels on an outcome. Suppose a clinical investigator is interested in examining the effect of increasing doses of vitamin D supplement given to individuals who are vitamin D deficient. She performs a randomised trial in which she allocates volunteers at random to three groups, receiving 1000 IU (International Units), 2000 IU or 3000 IU of supplement per day for a period of three months, after which the serum levels of a key metabolite of vitamin D called 25(OH)D are measured in each participant. (N.B. This is a hypothetical scenario based on a real question that is current in epidemiology at the moment.)

Question 1
One possible analysis of the data described is to estimate the linear effect of dose, i.e. to assume a linear relationship between the expected outcome (labelled Y, as usual) and dose level, which for simplicity we will code as X = 1, 2, 3, representing doses of 1000 IU, 2000 IU and 3000 IU respectively. To estimate the average rate of change in Y with dose we would fit the simple linear regression model with the standard assumptions for the error term:

$$Y_i = B_0 + B_1 x_i + e_i$$

The objective is to show (algebraically) that if the sample size allocation between group 1, group 2 and group 3 is 1:1:4 (i.e. n1 = n, n2 = n, n3 = 4n), then the least squares estimate of B1 is

$$\hat{B}_1 = \frac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}$$
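One possible route to this result is sketched below, assuming the usual closed form for the least squares slope as a ratio of sums of squares. The symbols $S_{xx}$ and $S_{xy}$ are shorthand introduced here and are not part of the assignment's notation; $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$ denote the group mean outcomes.

```latex
\begin{align*}
\bar{x} &= \frac{n(1) + n(2) + 4n(3)}{6n} = \frac{5}{2} \\[4pt]
S_{xx} &= \sum_i (x_i - \bar{x})^2
        = n\left(-\tfrac{3}{2}\right)^2 + n\left(-\tfrac{1}{2}\right)^2 + 4n\left(\tfrac{1}{2}\right)^2
        = \frac{7n}{2} \\[4pt]
S_{xy} &= \sum_i (x_i - \bar{x})\,Y_i
        = -\tfrac{3}{2}\,n\bar{Y}_1 - \tfrac{1}{2}\,n\bar{Y}_2 + \tfrac{1}{2}\,(4n)\bar{Y}_3
        = \frac{n}{2}\left(4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2\right) \\[4pt]
\hat{B}_1 &= \frac{S_{xy}}{S_{xx}}
        = \frac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}
\end{align*}
```

The step for $S_{xy}$ uses the fact that $\sum_i (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_i (x_i - \bar{x})\,Y_i$, since the deviations $x_i - \bar{x}$ sum to zero.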

Question 2
The dataset provided contains some simulated data that might have arisen from the study just described, with 15 participants in each of groups 1 and 2, and 60 participants in dose group 3. Fit the regression model discussed above and demonstrate that the result obtained for B1 in Question 1 holds in this sample.

Dataset for use in this assignment:
dosevd_reg_KA.xlsx
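
As a rough illustration of the check asked for in Question 2, the sketch below compares a fitted slope with the closed-form expression from Question 1. It does not read the spreadsheet, because its column names are not given here; instead it generates made-up outcome values with NumPy for group sizes of 15, 15 and 60. The intercept, slope and noise level used to simulate the data are arbitrary assumptions, as are all variable names.

```python
import numpy as np

# Reproducible random numbers for the made-up data
rng = np.random.default_rng(0)

# Group sizes in the 1:1:4 ratio described in Question 2
sizes = {1: 15, 2: 15, 3: 60}

# Dose level coded as X = 1, 2, 3, repeated once per participant
x = np.concatenate([np.full(n, level, dtype=float) for level, n in sizes.items()])

# Hypothetical outcome values (arbitrary intercept, slope and noise)
y = 20.0 + 5.0 * x + rng.normal(scale=4.0, size=x.size)

# Least squares fit of Y on X; polyfit returns the slope first for degree 1
slope, intercept = np.polyfit(x, y, 1)

# Group mean outcomes and the closed-form slope from Question 1
ybar = {level: y[x == level].mean() for level in sizes}
closed_form = (4 * ybar[3] - 3 * ybar[1] - ybar[2]) / 7

print("fitted slope:     ", slope)
print("closed-form slope:", closed_form)
```

The two printed values should agree up to floating-point rounding whenever the group sizes are in the ratio 1:1:4, whatever the outcome values are. With the real data, the same comparison could be made after loading the spreadsheet into arrays of dose level and outcome.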