Hawkes Learning Chapter 8 Certification: Chapter 8 Review Questions in Statistics

Qn1. A direct mail company wishes to estimate the proportion of people on a large mailing list who will purchase a product. Suppose the true proportion is 0.07. If 402 are sampled, what is the probability that the sample proportion will be less than 0.04? Round your answer to four decimal places.

Qn2. Suppose a large shipment of laser printers contained 17% defectives. If a sample of size 224 is selected, what is the probability that the sample proportion will be greater than 16%? Round your answer to four decimal places.

Qn3. A carpet expert believes that 7% of Persian carpets are counterfeits. If the expert is accurate, what is the probability that the proportion of counterfeits in a sample of 574 Persian carpets would be less than 6%? Round your answer to four decimal places.
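Qn1 to Qn3 all rely on the normal approximation to the sampling distribution of the sample proportion, which is centred at p with standard error sqrt(p(1 - p)/n). A minimal Python sketch, assuming the normal approximation is adequate (np and n(1 - p) comfortably large) and using the standard library's `statistics.NormalDist` (Python 3.8+). Answers that round z to two decimals before reading a normal table may differ slightly in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution, mean 0 and sd 1


def prop_se(p, n):
    """Standard error of the sample proportion under the normal approximation."""
    return sqrt(p * (1 - p) / n)


def prob_prop_less(p, n, cutoff):
    """P(sample proportion < cutoff) for true proportion p and sample size n."""
    return Z.cdf((cutoff - p) / prop_se(p, n))


def prob_prop_greater(p, n, cutoff):
    """P(sample proportion > cutoff) for true proportion p and sample size n."""
    return 1 - Z.cdf((cutoff - p) / prop_se(p, n))


q1 = prob_prop_less(0.07, 402, 0.04)     # Qn1
q2 = prob_prop_greater(0.17, 224, 0.16)  # Qn2
q3 = prob_prop_less(0.07, 574, 0.06)     # Qn3
print(round(q1, 4), round(q2, 4), round(q3, 4))
```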

Qn4. The mean life of a television set is 138 months with a variance of 324. If a sample of 83 televisions is randomly selected, what is the probability that the sample mean would differ from the true mean by less than 5.4 months? Round your answer to four decimal places.

Qn5. A courier service company wishes to estimate the proportion of people in various states who will use its services. Suppose the true proportion is 0.07. If 330 are sampled, what is the probability that the sample proportion will differ from the population proportion by more than 0.03? Round your answer to four decimal places.

Qn6. Suppose babies born in a large hospital have a mean weight of 3225 grams, and a standard deviation of 535 grams. If 106 babies are sampled at random from the hospital, what is the probability that the mean weight of the sample babies would differ from the population mean by more than 53 grams? Round your answer to four decimal places.

Qn7. Suppose 55% of the population has a college degree. If a random sample of size 496 is selected, what is the probability that the proportion of persons with a college degree will differ from the population proportion by less than 4%? Round your answer to four decimal places.
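Questions asking whether an estimate will "differ from" the true value by less than (or more than) some amount d are two-sided: P(|estimate - parameter| < d) = 2Φ(d/SE) - 1, and the "more than" case is its complement. The same helper covers sample means (SE = σ/√n) and sample proportions (SE = sqrt(p(1 - p)/n)), as in Qn4 to Qn7. A minimal sketch, assuming the normal approximation holds and using `statistics.NormalDist` from the Python standard library:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def prob_within(se, d):
    """P(|estimate - parameter| < d) for an approximately normal estimate with standard error se."""
    return 2 * Z.cdf(d / se) - 1


def prob_beyond(se, d):
    """P(|estimate - parameter| > d), the complement of prob_within."""
    return 1 - prob_within(se, d)


se_q4 = sqrt(324) / sqrt(83)     # Qn4: variance 324, so sigma = 18
se_q5 = sqrt(0.07 * 0.93 / 330)  # Qn5: sample proportion
se_q6 = 535 / sqrt(106)          # Qn6: sample mean
se_q7 = sqrt(0.55 * 0.45 / 496)  # Qn7: sample proportion

q4 = prob_within(se_q4, 5.4)
q5 = prob_beyond(se_q5, 0.03)
q6 = prob_beyond(se_q6, 53)
q7 = prob_within(se_q7, 0.04)
print([round(v, 4) for v in (q4, q5, q6, q7)])
```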

Qn8. The mean score on an aptitude examination is 183 points with a standard deviation of 13 points. If 73 exams are sampled, what is the probability that the sample mean would be greater than 186.2 points? Round your answer to four decimal places.

Qn9. Suppose cattle in a large herd have a mean weight of 1158 lbs and a standard deviation of 92 lbs. What is the probability that the mean weight of the sample of cows would be less than 1149 lbs if 55 cows are sampled at random from the herd? Round your answer to four decimal places.

Qn10. Thompson and Thompson is a steel bolt manufacturing company. Its current steel bolts have a mean diameter of 135 millimeters and a variance of 64. If a random sample of 32 steel bolts is selected, what is the probability that the sample mean would be less than 133 millimeters? Round your answer to four decimal places.
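Qn8 to Qn10 use the Central Limit Theorem: with samples this large, the sample mean is approximately normal with mean μ and standard error σ/√n. Qn10 gives a variance, so σ is its square root. A minimal sketch, assuming the normal approximation holds and using the standard library's `statistics.NormalDist`; table-based answers that round z first may differ slightly in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def mean_se(sigma, n):
    """Standard error of the sample mean."""
    return sigma / sqrt(n)


def prob_mean_less(mu, sigma, n, cutoff):
    """P(sample mean < cutoff) under the normal approximation."""
    return Z.cdf((cutoff - mu) / mean_se(sigma, n))


def prob_mean_greater(mu, sigma, n, cutoff):
    """P(sample mean > cutoff) under the normal approximation."""
    return 1 - prob_mean_less(mu, sigma, n, cutoff)


q8 = prob_mean_greater(183, 13, 73, 186.2)     # Qn8
q9 = prob_mean_less(1158, 92, 55, 1149)        # Qn9
q10 = prob_mean_less(135, sqrt(64), 32, 133)   # Qn10: variance 64, so sigma = 8
print(round(q8, 4), round(q9, 4), round(q10, 4))
```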

Analysis of a Fitted Multiple Regression Model

If your “best model” selected in the previous chapter contains more than one x-variable, run that regression model in SAS. If it only has one x-variable, use the best model that had more than one x-variable. Answer the following questions.

  1. Analysis of Output
    (a) The t-tests
    (i) What is being tested?
    (ii) What are the results of the tests? (Careful, these are partial coefficients.)

(b) The F-test

(i) State the null and alternate hypotheses for your model. 
(ii) State the conclusion of the test and the grounds for the decision.

(c) The ŷ-equation

    (i) State the equation for your fitted model;
    (ii) Explain how ŷ is related to E(Y|X).
  2. R-square
    (a) Report the value of R-square;
    (b) Show how it was computed from the software output;
    (c) Explain the meaning of your R-square using the approach found in Lecture Notes 5;
    (d) Give the naïve interpretation of your R-square, as discussed in LN5.
  3. Adjusted R-square
    (a) Report the value of the adjusted R-square;
    (b) Show how it was computed from the output;
    (c) What is the meaning of your adjusted R-square value?
  4. The Partial Regression Coefficients
    (a) Run the simple regression of y on one of the x-variables from your best model, and show that it produces a sample slope coefficient that differs from the partial sample slope coefficient for the same x-variable when your other x-variables are in the model.
    (b) Describe the steps by which a series of regressions can be used to compute the partial regression coefficient for the selected x-variable. This involves regressing y on the other x-variables, saving the residuals, and so on, until finally you regress one set of residuals on a second set of residuals.
    (c) Demonstrate that what you described in (b) actually works!
    (d) Output from SAS the partial R-square and the partial correlation coefficient for the same x-variable whose partial coefficient you analyzed in your best model.
    (e) Using the same approach as in (c), show that the ordinary R-square for the regression of those residuals yields the partial R-square. Likewise for the partial correlation coefficient.
  5. Model Diagnostics
    (a) Explain what is shown by the Cook’s distance diagnostic plot for your model.
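The residual-on-residual procedure in the partial-coefficient questions is the Frisch-Waugh-Lovell result, and it can be rehearsed outside SAS before running it on real data. The sketch below is a minimal pure-Python illustration for a model with only two x-variables, using made-up numbers (the lists `x1`, `x2` and `y` are hypothetical, not from any assignment data set). It fits the simple and the two-variable regressions, carries out the residual steps, and checks that the R-square of the residual regression and the squared correlation of the two residual series both equal the partial R-square. It also shows R-square as 1 - SSE/SST and the usual adjusted R-square formula. With more x-variables the steps are the same, but each intermediate regression uses all of the other x-variables.

```python
# Hypothetical illustration data, not from any assignment data set.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 9]
y = [3.1, 4.0, 6.2, 6.9, 9.8, 10.1, 13.2, 14.9]


def mean(v):
    return sum(v) / len(v)


def simple_fit(y, x):
    """OLS of y on x with an intercept; returns (intercept, slope, residuals)."""
    xb, yb = mean(x), mean(y)
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    slope = sxy / sxx
    intercept = yb - slope * xb
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, resid


def two_x_fit(y, x1, x2):
    """OLS of y on x1 and x2 with an intercept, via the centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1, c2, cy = [a - m1 for a in x1], [a - m2 for a in x2], [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    resid = [c - (b0 + b1 * a + b2 * b) for a, b, c in zip(x1, x2, y)]
    return b0, b1, b2, resid


def sse(resid):
    return sum(e * e for e in resid)


def r_square(y, resid):
    """Ordinary R-square = 1 - SSE/SST."""
    yb = mean(y)
    return 1 - sse(resid) / sum((a - yb) ** 2 for a in y)


def adj_r_square(r2, n, k):
    """Adjusted R-square for n observations and k x-variables (model with intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


_, simple_slope, _ = simple_fit(y, x1)       # simple slope of y on x1
_, b1, _, full_resid = two_x_fit(y, x1, x2)  # partial slope of x1 in the full model

# Regress y on x2 and x1 on x2, keep both residual series,
# then regress the y-residuals on the x1-residuals.
_, _, e_y = simple_fit(y, x2)
_, _, e_x1 = simple_fit(x1, x2)
_, fwl_slope, fwl_resid = simple_fit(e_y, e_x1)

# Partial R-square for x1: share of SSE(y on x2) removed by adding x1.
partial_r2 = (sse(e_y) - sse(full_resid)) / sse(e_y)
resid_r2 = r_square(e_y, fwl_resid)
partial_corr = sum(a * b for a, b in zip(e_y, e_x1)) / (sse(e_y) * sse(e_x1)) ** 0.5

r2_full = r_square(y, full_resid)
print("slopes (simple, partial, residual-on-residual):", simple_slope, b1, fwl_slope)
print("partial R2, residual-regression R2, partial r^2:", partial_r2, resid_r2, partial_corr ** 2)
print("R2 and adjusted R2 of full model:", r2_full, adj_r_square(r2_full, len(y), 2))
```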

Chapter 7
Run a ridge regression analysis of your data. Try to explain in your own words what that technique does, and interpret the output. Supply your own headers.
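Ridge regression adds a non-negative constant λ to the diagonal of X'X before solving for the coefficients, b(λ) = (X'X + λI)⁻¹X'y, which shrinks the coefficients toward zero and stabilizes them when x-variables are highly correlated. A minimal pure-Python sketch for two standardized x-variables, using hypothetical data in which `x2` is nearly collinear with `x1`, prints a small "ridge trace" of the coefficients as λ grows. The λ values here are on the scale of the standardized cross-products used in this sketch and need not match the ridge constant reported by SAS or other software.

```python
from math import sqrt

# Hypothetical data: x2 is deliberately close to x1 to mimic multicollinearity.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1]
y = [3.0, 4.1, 6.3, 6.8, 9.9, 11.2, 12.7, 15.1]


def standardize(v):
    """Center and scale a variable by its sample standard deviation."""
    m = sum(v) / len(v)
    s = sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))
    return [(a - m) / s for a in v]


def ridge_two_x(y, x1, x2, lam):
    """Ridge coefficients for standardized x1, x2 and centered y: solve (Z'Z + lam*I) b = Z'y."""
    z1, z2 = standardize(x1), standardize(x2)
    my = sum(y) / len(y)
    cy = [a - my for a in y]
    s11 = sum(a * a for a in z1) + lam
    s22 = sum(a * a for a in z2) + lam
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * b for a, b in zip(z1, cy))
    s2y = sum(a * b for a, b in zip(z2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# lam = 0 reproduces ordinary least squares on the standardized scale.
ridge_trace = {lam: ridge_two_x(y, x1, x2, lam) for lam in (0, 0.5, 1, 2, 5, 10)}
coef_norms = [sqrt(b1 ** 2 + b2 ** 2) for b1, b2 in ridge_trace.values()]

for (lam, (b1, b2)), norm in zip(ridge_trace.items(), coef_norms):
    print(f"lambda={lam:>4}: b1={b1:8.4f}  b2={b2:8.4f}  |b|={norm:8.4f}")
```

The overall size of the coefficient vector falls steadily as λ increases; choosing λ is a trade-off between that extra stability and the bias the shrinkage introduces.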

Build a regression model to explain the price of the house
Save the file as MKT9740_LastName when you submit. I would prefer that you submit the file as a PDF.
Answer the questions as clearly as possible, explaining your main results and findings in your response. You can append the SAS code and other analysis as an appendix.
Do not email me SAS files or SAS code separately. Copy and paste the output neatly into the appendix and refer to it as you see fit.
Use “Equation” option in Word to write models, if necessary.

Question 1

Download the data set on individuals' 401(k) contributions (401subs.xls). The descriptions of the variables are given below:

Variable Description
e401k This equals 1 if the individual is eligible for 401(k) contributions.
inc The income level of the individual, in thousands of dollars ($1000s).
married This equals 1 if the individual is married.
male This equals 1 if the individual is male.
age Age of the individual.
fsize The family size of the individual.
netfina The net financial assets of the individual, in thousands of dollars ($1000s).
p401k This equals 1 if the individual participates in the 401(k) program.
pira This equals 1 if the individual participates in an IRA program.
incsq Square of the income.
agesq Square of the age.

  1. Build a regression model to explain the net financial assets of an individual. The explanatory/independent variables you can use include age, income, family size, whether the individual is married or male, whether the individual participates in a 401(k) plan, etc. Summarize your findings with detailed explanations. What do you find interesting?
  2. Can you include both p401k and pira as explanatory variables at the same time in a model to explain the net financial assets of an individual? Why or why not? Explain clearly, using analysis as needed.
  3. Based on your model, what are the expected net financial assets of an individual who is 45 years old, has a family size of 4, is married, is male, and has an income of $50 thousand? You can use the means of any other variables you include in your model. You may make any other reasonable assumptions.
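For item 2, the key question is whether the two 0/1 variables are perfectly collinear: they cannot both enter the model only if one is an exact linear function of the other (together with the intercept), for example if pira always equalled p401k or always equalled 1 - p401k. Otherwise the model can be estimated, and a variance inflation factor (VIF) indicates how strongly they overlap. A minimal sketch with made-up 0/1 vectors (not the 401(k) data); in the actual model each VIF comes from regressing that variable on all the other x-variables, which the VIF option on the PROC REG MODEL statement reports.

```python
def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def vif_two(a, b):
    """VIF for either variable when the other is the only remaining x-variable."""
    r2 = corr(a, b) ** 2
    if r2 >= 1 - 1e-12:
        raise ValueError("perfectly collinear: both variables cannot enter the model")
    return 1 / (1 - r2)


# Made-up illustration values, not taken from 401subs.xls.
p401k = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
pira = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
print(round(corr(p401k, pira), 3), round(vif_two(p401k, pira), 3))
```

A VIF near 1 means the two indicators carry largely separate information; a complementary pair (p401k versus 1 - p401k) has correlation -1 and cannot be used together with an intercept.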

Question 2
Download the data set on house prices (Hprice.xls). The description of the different variables is given below:

Variable Description
price House price, in thousands of dollars ($1000s)
assess Assessed value, in thousands of dollars ($1000s)
bdrms Number of bedrooms in the house
lotsize Size of lot in square feet
sqrft Size of house in square feet
colonial This equals 1 if the home is colonial style
lprice Log(price)
lassess Log(assess)
llotsize Log(lotsize)
lsqrft Log(sqrft)

  1. Build a regression model to explain the price of the house. The explanatory variables to be used are the number of bedrooms, lot size, home size and whether the house is colonial style. Summarize your findings. Does this model make sense?
  2. Perform the various regression diagnostics to make sure that the model is appropriate. Report your findings.
  3. Now re-estimate the model using log(price), log(assess), log(lotsize) and log(sqrft) in place of price, assess, lotsize and sqrft. Also perform the various regression diagnostics. Summarize your findings.
  4. Do the assessed values accurately reflect the actual prices of the houses?
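For item 4, one common approach is to regress price on assess (or lprice on lassess) and test whether the slope equals 1; in the log form, a slope of 1 means prices move one-for-one in percentage terms with assessed values. A minimal sketch of that t-test, using tiny hypothetical vectors `assess_demo` and `price_demo` rather than the Hprice data; compare |t| with the t critical value on n - 2 degrees of freedom. A joint F-test of intercept 0 and slope 1 is a stricter check and is not shown here.

```python
from math import sqrt


def slope_t_vs_value(y, x, b_null=1.0):
    """Fit y = b0 + b1*x by OLS; return (b1, se_b1, t) for H0: b1 = b_null."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    b1 = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sxx
    b0 = yb - b1 * xb
    mse = sum((c - b0 - b1 * a) ** 2 for a, c in zip(x, y)) / (n - 2)
    se_b1 = sqrt(mse / sxx)
    return b1, se_b1, (b1 - b_null) / se_b1


# Hypothetical values in $1000s, for illustration only.
assess_demo = [100, 200, 300, 400, 500]
price_demo = [110, 190, 320, 380, 500]

b1_demo, se_demo, t_demo = slope_t_vs_value(price_demo, assess_demo)
print(f"slope={b1_demo:.4f}  se={se_demo:.4f}  t(H0: slope=1)={t_demo:.4f}")
```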

Question 3
Download the apples data set (apples.xlsx). This data set contains individuals’ purchases of regular and eco-labelled (e.g., organic) apples along with other relevant variables. The descriptions of the variables are given below:

Variable Description
id Identifier of the individual
educ Education level in terms of the number of years of schooling
date date: month/day/year
state home state
regprc price of regular apples $/lb
ecoprc price of eco-labeled apples $/lb
inseason =1 if the individual was surveyed about apples in November
hhsize household size of the individual
male =1 if individual is male
faminc family income, thousands of $
age Age of the individual
reglbs quantity of regular apples purchased by the individual in pounds
ecolbs quantity of eco-labeled apples purchased by the individual in pounds
numlt5 Number of people in the individual’s household younger than 5 years old
num5_17 Number of people in the individual’s household with ages from 5-17
num18_64 Number of people in the individual’s household with ages from 18-64
numgt64 Number of people in the individual’s household older than 64

  1. From the above, choose the variables that you think are relevant for modeling the sales of regular and eco-labelled apples (i.e., reglbs, ecolbs). Present the summary statistics of the relevant variables (and also the sales of regular and eco-labelled apples).
  2. Estimate the regression model to explain the sales of regular apples. That is, run the regression model with sales of regular apples as the response/dependent variable and the variables you chose in #1 above as the independent/explanatory variables.
  3. Calculate the own-price elasticity of regular apples. Do this for the linear model, the semi-log model and the log-log model.
  4. Calculate the own-price elasticity of eco-labelled apples. Do this for the linear model, the semi-log model and the log-log model.
  5. Calculate the cross-price elasticity of regular apples and the cross-price elasticity of eco-labelled apples. You can do this for just one model.
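The elasticity formula depends on the functional form. A minimal sketch, assuming "semi-log" means ln(Q) regressed on price (a log-lin model); the lin-log alternative, Q regressed on ln(P), is included in case that is the intended form. Each function takes a fitted coefficient and sample means; evaluating at the means is a common convention, not the only one. Cross-price elasticities use the same formulas with the coefficient and mean of the other good's price. Rows with zero pounds purchased have no logarithm, so the log forms require a decision about those rows.

```python
def elasticity_linear(b, p_mean, q_mean):
    """Q = b0 + b*P + ...: elasticity evaluated at the sample means."""
    return b * p_mean / q_mean


def elasticity_log_lin(b, p_mean):
    """ln(Q) = b0 + b*P + ...: elasticity evaluated at the mean price."""
    return b * p_mean


def elasticity_lin_log(b, q_mean):
    """Q = b0 + b*ln(P) + ...: elasticity evaluated at the mean quantity."""
    return b / q_mean


def elasticity_log_log(b):
    """ln(Q) = b0 + b*ln(P) + ...: the coefficient itself is the elasticity."""
    return b


# Hypothetical coefficient and means, for illustration only.
print(elasticity_linear(-2.0, 1.5, 3.0))
```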
Homework Assignment #3
First assignment on Semester Project

As the first step in your Regression Project, find a data set that is of interest to you. The data set should contain at least 50 rows of data and have a y-variable and an x-variable for now, and at least 4 x-variables as regressors later on (to make the "model selection" sections interesting). Some possible sources of data sets are given in the posted guide.
However, if you do not readily find a data set, do not waste all weekend trying to find the "perfect" data set. Rather, just grab some baseball data (but not mine) or some quarterly Bureau of Labor Statistics data and use that for this assignment. Then, if you decide to use something different for your project, you will find it fairly easy to redo this assignment for inclusion in your project, because you will already have done the assignment once.

For the SAS and R questions, you are free to insert your data (and change the variable names) into the SAS and R templates given in the lecture notes.

Embed all graphs and tables in the document. Do NOT put them on separate pages, as the reader will soon give up looking for them.

This homework, and all the following homework, will be drafts of chapters of your semester project. Therefore, with that goal in mind, please structure as follows:
(a) show the given chapter headings, such as
Chapter 1
(b) show the given underlined section headers, such as

  1. Scatterplots
(c) do NOT show the questions asked: just answer them! That means that the answer has to include the question. Example: “Why are you going home?” A good answer: “I am going home to feed the dog.” A weak answer: “To feed the dog.”

Show Title: A Multiple Regression Analysis of

Show your Name: _

Chapter 1

 1. Topic

This study investigates the relationship between the sale price of a house and its lot size. The purpose is to inform my future house-purchasing decision by predicting the probable cost of a house given its characteristics. The goal is to purchase a house after determining the range within which its true price is likely to fall.

  1. Data Source
    The data set was obtained from the Rdatasets directory hosted at github.io:

https://vincentarelbundock.github.io/Rdatasets/datasets.html
https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/Ecdat/Housing.csv

  1. Variables

The data set has 546 observations (n = 546) and 13 variables. The variables in this dataset are:
Price (sale price of a house)

Lotsize (the lot size of a property in square feet)

Bedrooms (number of bedrooms)

Bathrms (number of full bathrooms)

Stories (number of stories excluding basement)

Driveway (does the house have a driveway?)

Recroom (does the house have a recreational room?)

Fullbase (does the house have a full finished basement?)

Gashw (does the house use gas for hot water heating?)

Airco (does the house have central air conditioning?)

Garagepl (number of garage places)

Prefarea (is the house located in the preferred neighbourhood of the city?)

  1. Data View

[Figure: SAS data view]

The first 15 observations are displayed above.

Chapter 2 A Simple Regression Model

  
  Predicting house price from its lot size: x = lot size, y = price.
SAS output
  1. Scatterplots

[Figure: SAS scatterplot]

Scatterplot of price vs. lot size.

  1. Analysis of Scatterplot
  2. The Linear Regression Model
    State your regression model and briefly explain

The regression model is:
(a) the meaning of the Y|X term in your model;
(b) how the terms on the right side are related to E(Y|X);
(c) how the terms on the right side are related to V(Y|X).
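For reference, one standard way to write the simple linear regression model for this chapter (y = price, x = lot size), assuming independent, normally distributed errors with constant variance, is shown below. The β₀ + β₁X part gives the mean of the price distribution at a given lot size, and the error term carries all of the variance around that mean.

```latex
\begin{aligned}
Y \mid X &= \beta_0 + \beta_1 X + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),\\
E(Y \mid X) &= \beta_0 + \beta_1 X,\\
V(Y \mid X) &= V(\varepsilon) = \sigma^2.
\end{aligned}
```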

  1. SAS Output for the Fitted Model

[Figure: PROC REG output for the fitted model]
Run PROC REG in SAS to fit your model. Show the cleaned-up table output and the SAS regression plot with the confidence and prediction bands. Otherwise, show only what you are going to use.

  1. Analysis of Output
    (a) The t-tests
    (i) What is being tested?
    (ii) What are the results of the test?
    (iii) Use the Story of Many Possible Samples to explain how the test is done.

(b) The ŷ-equation

    (i) State the equation for your fitted model;
    (ii) Explain how ŷ is related to E(Y|X) using the Story of Many Possible Samples.

(c) In the regression plot, explain what is being shown by the 95% confidence band and the 95% prediction band. Include a vertical x-cut to provide a focus for your explanation.
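At a vertical x-cut x₀, both bands are centred on ŷ(x₀) = b₀ + b₁x₀, but they answer different questions: the 95% confidence band covers E(Y|X = x₀), the mean price at that lot size, while the 95% prediction band covers a single new price at that lot size. A minimal sketch of the textbook half-widths, using tiny hypothetical data (`x_demo`, `y_demo`) and assuming the user supplies the two-sided t critical value with n - 2 degrees of freedom (about 3.182 for n = 5). The extra "1 +" under the square root is why the prediction band is always wider, and the (x₀ - x̄)² term is why both bands flare out away from the mean of x.

```python
from math import sqrt


def bands_at(x, y, x0, t_crit):
    """Return (yhat, CI half-width for E(Y|X=x0), PI half-width for a new Y at x0).

    t_crit is the two-sided t critical value on n - 2 degrees of freedom,
    supplied by the caller (assumption: 95% level).
    """
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    b1 = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sxx
    b0 = yb - b1 * xb
    s = sqrt(sum((c - b0 - b1 * a) ** 2 for a, c in zip(x, y)) / (n - 2))  # root MSE
    yhat = b0 + b1 * x0
    ci_half = t_crit * s * sqrt(1 / n + (x0 - xb) ** 2 / sxx)
    pi_half = t_crit * s * sqrt(1 + 1 / n + (x0 - xb) ** 2 / sxx)
    return yhat, ci_half, pi_half


# Hypothetical illustration data (n = 5, so 3 degrees of freedom).
x_demo = [1, 2, 3, 4, 5]
y_demo = [1.1, 1.9, 3.2, 3.8, 5.0]

yhat_mid, ci_mid, pi_mid = bands_at(x_demo, y_demo, 3, 3.182)  # x-cut at the mean of x
yhat_end, ci_end, pi_end = bands_at(x_demo, y_demo, 5, 3.182)  # x-cut at the edge
print(f"x0=3: {yhat_mid:.3f} +/- {ci_mid:.3f} (CI), +/- {pi_mid:.3f} (PI)")
print(f"x0=5: {yhat_end:.3f} +/- {ci_end:.3f} (CI), +/- {pi_end:.3f} (PI)")
```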