# Statistics Midterm Project

**Build a regression model to explain the price of the house**

Save the file as MKT9740_LastName when you submit. I would prefer that you submit the file as a PDF.

Answer the questions as clearly as possible with the main results and findings explained as your response. You can append the SAS code and other analysis as an Appendix.

Do not email me SAS files or SAS code separately. Copy and paste the output neatly into an Appendix and refer to it as you see fit.

Use the “Equation” option in Word to write models, if necessary.

**Question 1**

Download the data set on individuals' 401(k) contributions (401subs.xls). The description of the different variables is given below:

| Variable | Description |
| --- | --- |
| e401k | Equals 1 if the individual is eligible for 401(k) contributions. |
| inc | Income of the individual, in thousands of $ ($1000s). |
| married | Equals 1 if the individual is married. |
| male | Equals 1 if the individual is male. |
| age | Age of the individual. |
| fsize | Family size of the individual. |
| netfina | Net financial assets of the individual, in thousands of $ ($1000s). |
| p401k | Equals 1 if the individual participates in the 401(k) program. |
| pira | Equals 1 if the individual participates in an IRA program. |
| incsq | Square of income. |
| agesq | Square of age. |

- Build a regression model to explain the net financial assets of an individual. The explanatory/independent variables you can use include age, income, family size, and whether the individual is married, is male, participates in a 401(k), etc. Summarize your findings, providing detailed explanations. What do you find that is interesting?
- Can you include both p401k and pira as explanatory variables at the same time in a model to explain the net financial assets of an individual? Why or why not? Explain clearly using analysis as needed.
- Based on your model, what are the expected net financial assets of an individual who is 45, has a family size of 4, is married, is male and has an income of $50,000 (inc = 50, since income is measured in thousands)? You can use the mean of any additional variables in your model. You may make any other reasonable assumptions.
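As a sketch of how the prediction in the last item is assembled, assume a purely linear specification with only the five variables named there. The coefficient labels $\hat\beta_0,\dots,\hat\beta_5$ are notation introduced here for illustration; your own model may contain different terms.

```latex
\[
\widehat{\text{netfina}}
  = \hat\beta_0
  + \hat\beta_1 (50)   % inc, in thousands of dollars
  + \hat\beta_2 (45)   % age
  + \hat\beta_3 (4)    % fsize
  + \hat\beta_4 (1)    % married
  + \hat\beta_5 (1)    % male
\]
```

If the model also includes incsq or agesq, the corresponding values are $50^2 = 2500$ and $45^2 = 2025$. The result is in thousands of dollars, the same units as netfina.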

**Question 2**

Download the data set on house prices (Hprice.xls). The description of the different variables is given below:

| Variable | Description |
| --- | --- |
| price | House price, in thousands of $ ($1000s) |
| assess | Assessed value, in thousands of $ ($1000s) |
| bdrms | Number of bedrooms in the house |
| lotsize | Size of lot in square feet |
| sqrft | Size of house in square feet |
| colonial | Equals 1 if the home is colonial style |
| lprice | Log(price) |
| lassess | Log(assess) |
| llotsize | Log(lotsize) |
| lsqrft | Log(sqrft) |

- Build a regression model to explain the price of the house. The explanatory variables that need to be used are the number of bedrooms, lot size, home size and whether the house is a colonial. Summarize your findings. Does this model make sense?
- Perform the various regression diagnostics to make sure that the model is appropriate. Report your findings.
- Now re-estimate the model using log(price), log(assess), log(lotsize) and log(sqrft) in place of price, assess, lotsize and sqrft. Also perform the various regression diagnostics. Summarize your findings.
- Do the assessed values accurately reflect the actual house prices?
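One common way to frame the last item, offered as a sketch under stated assumptions rather than the required approach, is a simple log-log regression of price on assessed value. The symbols $\beta_0$, $\beta_1$ and $u$ are notation assumed here, not taken from the assignment.

```latex
\[
\text{lprice} = \beta_0 + \beta_1\,\text{lassess} + u,
\qquad
H_0:\ \beta_0 = 0 \ \text{and}\ \beta_1 = 1 .
\]
```

Under this null hypothesis, a 1% higher assessment goes with a 1% higher price on average, with no systematic over- or under-valuation. The two restrictions are joint, so they would typically be checked with an F test rather than two separate t tests.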

**Question 3**

Download the apples data set (apples.xlsx). This data set contains individuals’ purchases of regular and eco-labelled (e.g., organic) apples along with other relevant variables. The description of the different variables is given below:

| Variable | Description |
| --- | --- |
| id | Identifier of the individual |
| educ | Education level, in years of schooling |
| date | Date: month/day/year |
| state | Home state |
| regprc | Price of regular apples, $/lb |
| ecoprc | Price of eco-labelled apples, $/lb |
| inseason | Equals 1 if the individual was surveyed about apples in November |
| hhsize | Household size of the individual |
| male | Equals 1 if the individual is male |
| faminc | Family income, in thousands of $ |
| age | Age of the individual |
| reglbs | Quantity of regular apples purchased by the individual, in pounds |
| ecolbs | Quantity of eco-labelled apples purchased by the individual, in pounds |
| numlt5 | Number of people in the individual’s household younger than 5 years old |
| num5_17 | Number of people in the individual’s household aged 5 to 17 |
| num18_64 | Number of people in the individual’s household aged 18 to 64 |
| numgt64 | Number of people in the individual’s household older than 64 |

- From the above, choose the variables that you think are relevant for modeling the sales of regular and eco-labelled apples (i.e., reglbs, ecolbs). Present the summary statistics of the relevant variables (and also the sales of regular and eco-labelled apples).
- Estimate the regression model to explain the sales of regular apples. That is, run the regression model with sales of regular apples as the response/dependent variable and the variables you have chosen (in the first part above) as the independent/explanatory variables.
- Calculate the (own price) elasticity of regular apples. Do this for the Linear model, Semi-Log model and the Log-Log model.
- Calculate the (own price) elasticity of eco-labelled apples. Do this for the Linear model, Semi-Log model and the Log-Log model.
- Calculate the cross-price elasticity of regular apples and the cross-price elasticity of eco-labelled apples. You can do this for just one model.
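The elasticity calculations above follow standard formulas once a price coefficient has been estimated. The sketch below is in Python purely to show the arithmetic; it is not a substitute for the SAS estimation. It assumes that "Semi-Log" means log(quantity) regressed on price; if you instead regress quantity on log(price), the elasticity is the coefficient divided by quantity. The function names and the numbers in the usage lines are hypothetical placeholders, not estimates from the apples data.

```python
def elasticity_linear(beta, price, quantity):
    """Linear model q = a + beta * p: elasticity = beta * p / q,
    evaluated at a chosen point (commonly the sample means)."""
    return beta * price / quantity


def elasticity_log_lin(beta, price):
    """Semi-log model ln(q) = a + beta * p (assumed form): elasticity = beta * p."""
    return beta * price


def elasticity_log_log(beta):
    """Log-log model ln(q) = a + beta * ln(p): elasticity = beta."""
    return beta


# Hypothetical placeholder values, for illustration only.
beta_hat = -2.0    # assumed price coefficient from a linear model
mean_price = 1.0   # assumed mean price, $/lb
mean_qty = 4.0     # assumed mean quantity, lbs

print(elasticity_linear(beta_hat, mean_price, mean_qty))  # -0.5
```

The same functions give a cross-price elasticity when `beta` is the coefficient on the other good's price and `price` is that other good's price, for example the coefficient on ecoprc in the reglbs equation.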