# Statistics Midterm Project

Build a regression model to explain the price of the house
Save the file as MKT9740_LastName when you submit. I would prefer that you submit the file as a PDF.
Answer the questions as clearly as possible, explaining the main results and findings in your response. You can append the SAS code and other analysis as an Appendix.
Do not email me SAS files or SAS code separately. Copy and paste the output neatly into the Appendix and refer to it as you see fit.
Use the “Equation” option in Word to write models, if necessary.

Question 1

Download the data set on the 401(k) contributions of individuals (401subs.xls). The descriptions of the variables are given below:

| Variable | Description |
| --- | --- |
| e401k | Equals 1 if the individual is eligible for 401(k) contributions. |
| inc | Income of the individual, in thousands of dollars (\$1000s). |
| married | Equals 1 if the individual is married. |
| male | Equals 1 if the individual is male. |
| age | Age of the individual. |
| fsize | Family size of the individual. |
| netfina | Net financial assets of the individual, in thousands of dollars (\$1000s). |
| p401k | Equals 1 if the individual participates in the 401(k) program. |
| pira | Equals 1 if the individual participates in an IRA program. |
| incsq | Square of income (inc). |
| agesq | Square of age. |

1. Build a regression model to explain the net financial assets of an individual. The explanatory/independent variables you can use include age, income, family size, and whether the individual is married, is male, participates in a 401(k), etc. Summarize your findings with detailed explanations. What do you find that is interesting?
2. Can you include both p401k and pira as explanatory variables at the same time in a model to explain the net financial assets of an individual? Why or why not? Explain clearly using analysis as needed.
3. Based on your model, what are the expected net financial assets of an individual who is 45, has a family size of 4, is married, is male, and has an income of \$50,000 (that is, inc = 50, since income is measured in thousands)? If your model uses additional variables, you can set them to their sample means. You may make any other reasonable assumptions.
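
As an illustration of item 3, the prediction is just the fitted equation evaluated at the stated values. The specification below is an assumption made for the sketch (a linear model with inc, age, fsize, married, male, and p401k); your own model may contain different regressors, and the coefficients $\hat\beta_j$ come from your SAS output.

$$
\widehat{\text{netfina}} = \hat\beta_0 + \hat\beta_1(50) + \hat\beta_2(45) + \hat\beta_3(4) + \hat\beta_4(1) + \hat\beta_5(1) + \hat\beta_6\,\overline{\text{p401k}}
$$

The result is in thousands of dollars, because netfina is measured in thousands. If the model also includes incsq or agesq, plug in $50^2 = 2500$ and $45^2 = 2025$ rather than the sample means of those variables, since they are determined by the stated income and age.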

Question 2
Download the data set on house prices (Hprice.xls). The description of the different variables is given below:

| Variable | Description |
| --- | --- |
| price | House price, in thousands of dollars (\$1000s) |
| assess | Assessed value, in thousands of dollars (\$1000s) |
| bdrms | Number of bedrooms in the house |
| lotsize | Size of lot, in square feet |
| sqrft | Size of house, in square feet |
| colonial | Equals 1 if the home is colonial style |
| lprice | Log(price) |
| lassess | Log(assess) |
| llotsize | Log(lotsize) |
| lsqrft | Log(sqrft) |

1. Build a regression model to explain the price of the house. The explanatory variables that need to be used are the number of bedrooms, lot size, home size and whether the house is a colonial. Summarize your findings. Does this model make sense?
2. Perform the various regression diagnostics to make sure that the model is appropriate. Report your findings.
3. Now re-estimate the model using log(price), log(assess), log(lotsize), and log(sqrft) in place of price, assess, lotsize, and sqrft. Perform the regression diagnostics for this model as well. Summarize your findings.
4. Do the assessed values accurately reflect the actual house prices?
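
One possible way to frame item 4 (an assumption for illustration, not the only acceptable approach) is as a hypothesis test on the log model from item 3. If assessments track prices accurately, a 1% higher assessment should go with a 1% higher price, and the other house characteristics should add no further explanatory power once the assessment is known:

$$
\text{lprice} = \beta_0 + \beta_1\,\text{lassess} + \beta_2\,\text{llotsize} + \beta_3\,\text{lsqrft} + \beta_4\,\text{bdrms} + \beta_5\,\text{colonial} + u
$$

$$
H_0:\ \beta_1 = 1,\quad \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0
$$

The single restriction $\beta_1 = 1$ can be checked with $t = (\hat\beta_1 - 1)/\operatorname{se}(\hat\beta_1)$, while the joint hypothesis calls for an F test. Here $\hat\beta_1$ and its standard error are whatever your estimated model reports.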

Question 3
Download the apples data set (apples.xlsx). This data set contains individuals’ purchases of regular and eco-labelled (e.g., organic) apples along with other relevant variables. The descriptions of the variables are given below:

| Variable | Description |
| --- | --- |
| id | Identifier of the individual |
| educ | Education level, in years of schooling |
| date | Date: month/day/year |
| state | Home state |
| regprc | Price of regular apples, \$/lb |
| ecoprc | Price of eco-labeled apples, \$/lb |
| inseason | Equals 1 if the individual was surveyed about apples in November |
| hhsize | Household size of the individual |
| male | Equals 1 if the individual is male |
| faminc | Family income, in thousands of dollars |
| age | Age of the individual |
| reglbs | Quantity of regular apples purchased by the individual, in pounds |
| ecolbs | Quantity of eco-labeled apples purchased by the individual, in pounds |
| numlt5 | Number of people in the individual’s household younger than 5 years old |
| num5_17 | Number of people in the individual’s household aged 5 to 17 |
| num18_64 | Number of people in the individual’s household aged 18 to 64 |
| numgt64 | Number of people in the individual’s household older than 64 |

1. From the above, choose the variables that you think are relevant for modeling the sales of regular and eco-labelled apples (i.e., reglbs, ecolbs). Present the summary statistics of the relevant variables (and also the sales of regular and eco-labelled apples).
2. Estimate the regression model to explain the sales of regular apples. That is, run the regression model with sales of regular apples as the response/dependent variable and the variables you have chosen (in #1 above) as the independent/explanatory variables.
3. Calculate the (own price) elasticity of regular apples. Do this for the Linear model, Semi-Log model and the Log-Log model.
4. Calculate the (own price) elasticity of eco-labelled apples. Do this for the Linear model, Semi-Log model and the Log-Log model.
5. Calculate the cross-price elasticity of regular apples and the cross-price elasticity of eco-labelled apples. You can do this for just one model.
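
The elasticity calculations in items 3 to 5 follow directly from the functional form of each model. The sketch below is a minimal Python illustration of those formulas, not a replacement for the SAS analysis. The function names are hypothetical, and "Semi-Log" is ambiguous, so both common variants are shown; use the one that matches the model you estimated. In the linear and lin-log forms the elasticity depends on the point of evaluation, and the sample means are a common choice.

```python
"""Price elasticity formulas for the functional forms named in items 3-5.

The function names are illustrative assumptions. The coefficient passed in
is the estimated coefficient on the price term in your own regression.
"""


def elasticity_linear(beta_price: float, price: float, quantity: float) -> float:
    """Linear model q = a + b*p: elasticity = b * p / q at the chosen point."""
    if quantity == 0:
        raise ValueError("elasticity is undefined when quantity is zero")
    return beta_price * price / quantity


def elasticity_log_lin(beta_price: float, price: float) -> float:
    """Semi-log model log(q) = a + b*p: elasticity = b * p."""
    return beta_price * price


def elasticity_lin_log(beta_log_price: float, quantity: float) -> float:
    """Semi-log model q = a + b*log(p): elasticity = b / q."""
    if quantity == 0:
        raise ValueError("elasticity is undefined when quantity is zero")
    return beta_log_price / quantity


def elasticity_log_log(beta_log_price: float) -> float:
    """Log-log model log(q) = a + b*log(p): elasticity = b at every point."""
    return beta_log_price
```

For example, `elasticity_linear(b, p_bar, q_bar)` would be called with the estimated coefficient on regprc and the sample means of regprc and reglbs. The cross-price elasticity in item 5 uses the same formulas with the coefficient and price of the other good, for instance the ecoprc coefficient and the mean of ecoprc in the reglbs equation. Models with log(q) on the left-hand side cannot use observations where the quantity is zero, because the log is undefined there.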