## Hawkes Learning Chapter 8 Certification: Review Questions in Statistics

Qn1. A direct mail company wishes to estimate the proportion of people on a large mailing list who will purchase a product. Suppose the true proportion is 0.07. If 402 are sampled, what is the probability that the sample proportion will be less than 0.04? Round your answer to four decimal places.

Qn2. Suppose a large shipment of laser printers contained 17% defectives. If a sample of size 224 is selected, what is the probability that the sample proportion will be greater than 16%? Round your answer to four decimal places.

Qn3. A carpet expert believes that 7% of Persian carpets are counterfeits. If the expert is accurate, what is the probability that the proportion of counterfeits in a sample of 574 Persian carpets would be less than 6%? Round your answer to four decimal places.
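
Qn1 to Qn3 all rest on the normal approximation to the sampling distribution of $\hat{p}$. Under that approximation, $\hat{p}$ has mean $p$ and standard error $\sqrt{p(1-p)/n}$. The sketch below is a minimal illustration using only the Python standard library. It assumes no continuity correction. It also does not round z to two decimals, so the fourth decimal can differ slightly from a z-table answer. The helper names are our own.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prop_se(p: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1.0 - p) / n)


def prob_phat_below(p: float, n: int, cutoff: float) -> float:
    """P(p_hat < cutoff) under the normal approximation (no continuity correction)."""
    return normal_cdf((cutoff - p) / prop_se(p, n))


def prob_phat_above(p: float, n: int, cutoff: float) -> float:
    """P(p_hat > cutoff), the complement of prob_phat_below."""
    return 1.0 - prob_phat_below(p, n, cutoff)


# Qn1, Qn2 and Qn3 exactly as stated above
q1 = prob_phat_below(0.07, 402, 0.04)
q2 = prob_phat_above(0.17, 224, 0.16)
q3 = prob_phat_below(0.07, 574, 0.06)
print(round(q1, 4), round(q2, 4), round(q3, 4))
```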

Qn4. The mean life of a television set is 138 months with a variance of 324. If a sample of 83 televisions is randomly selected, what is the probability that the sample mean would differ from the true mean by less than 5.4 months? Round your answer to four decimal places.

Qn5. A courier service company wishes to estimate the proportion of people in various states who will use its services. Suppose the true proportion is 0.07. If 330 are sampled, what is the probability that the sample proportion will differ from the population proportion by more than 0.03? Round your answer to four decimal places.

Qn6. Suppose babies born in a large hospital have a mean weight of 3225 grams and a standard deviation of 535 grams. If 106 babies are sampled at random from the hospital, what is the probability that the mean weight of the sampled babies would differ from the population mean by more than 53 grams? Round your answer to four decimal places.

Qn7. Suppose 55% of the population has a college degree. If a random sample of size 496 is selected, what is the probability that the proportion of persons with a college degree will differ from the population proportion by less than 4%? Round your answer to four decimal places.
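
Qn5 and Qn7 ask about a two-sided distance $|\hat{p} - p|$. Under the normal approximation, $P(|\hat{p} - p| < d) = 2\Phi(d/SE) - 1$, and "differs by more than $d$" is the complement. A minimal standard-library sketch follows. As before, it assumes no continuity correction and does not round z, and the function names are our own.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_phat_within(p: float, n: int, d: float) -> float:
    """P(|p_hat - p| < d) = 2 * Phi(d / SE) - 1 under the normal approximation."""
    z = d / sqrt(p * (1.0 - p) / n)
    return 2.0 * normal_cdf(z) - 1.0


def prob_phat_outside(p: float, n: int, d: float) -> float:
    """P(|p_hat - p| > d), the complement of prob_phat_within."""
    return 1.0 - prob_phat_within(p, n, d)


q5 = prob_phat_outside(0.07, 330, 0.03)   # Qn5: differs by more than 0.03
q7 = prob_phat_within(0.55, 496, 0.04)    # Qn7: differs by less than 4%
print(round(q5, 4), round(q7, 4))
```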

Qn8. The mean score on an aptitude examination is 183 points, with a standard deviation of 13 points. If 73 exams are sampled, what is the probability that the sample mean will be greater than 186.2 points? Round your answer to four decimal places.

Qn9. Suppose cattle in a large herd have a mean weight of 1158 lbs and a standard deviation of 92 lbs. If 55 cows are sampled at random from the herd, what is the probability that the mean weight of the sample will be less than 1149 lbs? Round your answer to four decimal places.

Qn10. Thompson and Thompson is a steel bolt manufacturer. Their current steel bolts have a mean diameter of 135 millimeters and a variance of 64. If a random sample of 32 steel bolts is selected, what is the probability that the sample mean will be less than 133 millimeters? Round your answer to four decimal places.
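
The sample-mean questions (Qn4, Qn6, Qn8, Qn9, Qn10) use the normal sampling distribution of $\bar{x}$, which has mean $\mu$ and standard error $\sigma/\sqrt{n}$. When a variance is given instead of a standard deviation (Qn4, Qn10), take its square root first. The standard-library sketch below assumes the normal approximation is adequate at these sample sizes. Its helper names are our own, and z is not rounded.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def mean_se(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def prob_xbar_below(mu: float, sigma: float, n: int, cutoff: float) -> float:
    return normal_cdf((cutoff - mu) / mean_se(sigma, n))


def prob_xbar_above(mu: float, sigma: float, n: int, cutoff: float) -> float:
    return 1.0 - prob_xbar_below(mu, sigma, n, cutoff)


def prob_xbar_within(sigma: float, n: int, d: float) -> float:
    """P(|x_bar - mu| < d) = 2 * Phi(d / SE) - 1; the value of mu does not matter."""
    return 2.0 * normal_cdf(d / mean_se(sigma, n)) - 1.0


q4 = prob_xbar_within(sqrt(324), 83, 5.4)       # variance 324, so sigma = 18
q6 = 1.0 - prob_xbar_within(535, 106, 53)       # "more than 53 grams" is the complement
q8 = prob_xbar_above(183, 13, 73, 186.2)
q9 = prob_xbar_below(1158, 92, 55, 1149)
q10 = prob_xbar_below(135, sqrt(64), 32, 133)   # variance 64, so sigma = 8
print([round(q, 4) for q in (q4, q6, q8, q9, q10)])
```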

## STA 9700: Homework 6

As with the previous assignment, this is a first draft of a section of your semester project.

Lecture Notes 6: The Matrix Approach to Regression

1. Simple Linear Regression in Matrix Terms

In this exercise, you will use the matrix approach to do a simple linear regression of y on x for just one of your x-variables. Use a modest subset of rows, say, n=8, which is the sample size I used in the Excel example.

(a) Show the X matrix using your x-variable and the necessary column of 1's.

(b) Show the y-vector, composed of your observed y-values.

(c) In Excel, use matrix operations to compute the b-vector, the hat matrix, and the y-hat vector. Show your Excel work, using the posted Excel file as a guide.

(d) Using R or Proc Reg, run the regression of y on x and show the output. Check that the b-vector agrees with the values found using the matrix approach in Excel. If the values do not agree, remain calm. Just state, "There was a problem," and go on. You can fix it later.

(e) On page 7 of STA 9700 Lecture Notes 6, it is shown that for simple regression we can compute the entries of the hat matrix directly from the data. Compute the value of $h_{2,3}$ by the method shown, and check that your result matches the value in the second row, third column of the Excel hat matrix. If the result does not match, don't drive yourself crazy: double-check your work once, then just say, "There was a problem," and go on.
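
Parts (a) to (e) can be mirrored outside Excel as a cross-check. The sketch below uses plain Python lists to compute $b = (X'X)^{-1}X'y$, $H = X(X'X)^{-1}X'$ and $\hat{y} = Hy$. It then compares one hat-matrix entry with the standard simple-regression identity $h_{ij} = 1/n + (x_i - \bar{x})(x_j - \bar{x})/S_{xx}$, which we assume is the result shown on page 7 of Lecture Notes 6. The eight data points are hypothetical and are not the course's Excel data. All helper names are our own.

```python
# Simple linear regression in matrix form, standard library only.
# The data are HYPOTHETICAL toy values, not the posted Excel data set.


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply A (m x k) by B (k x p), both stored as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(M):
    """Invert [[a, b], [c, d]] with the closed-form 2 x 2 formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


x = [2.0, 3.0, 5.0, 7.0, 8.0, 10.0, 11.0, 14.0]   # n = 8, hypothetical
y = [5.1, 6.8, 10.9, 15.2, 16.8, 21.1, 23.0, 28.9]
n = len(x)

X = [[1.0, xi] for xi in x]           # column of 1's, then the x-variable
Y = [[yi] for yi in y]                # y as an n x 1 column vector
Xt = transpose(X)
XtX_inv = inverse_2x2(matmul(Xt, X))
b = matmul(XtX_inv, matmul(Xt, Y))    # b = (X'X)^{-1} X'y, a 2 x 1 vector
H = matmul(X, matmul(XtX_inv, Xt))    # H = X (X'X)^{-1} X', an n x n matrix
y_hat = matmul(H, Y)                  # y_hat = H y

# Part (e): hat-matrix entries straight from the data
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)


def h_direct(i, j):
    """h_{i,j} using 1-based i and j, matching the h_{2,3} notation."""
    return 1.0 / n + (x[i - 1] - xbar) * (x[j - 1] - xbar) / sxx


print("b =", [row[0] for row in b])
print("h_23 from H:", H[1][2], " from the formula:", h_direct(2, 3))
```

Python lists are 0-based, so $h_{2,3}$ is `H[1][2]`. In Excel it is the second row, third column.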

2. Multiple Linear Regression in Matrix Terms

In this exercise, add a second regressor variable to the data set used above.

(a) Show the new X matrix.

(b) Use R or SAS Proc IML to fit the model, following the example given in the Lecture Notes (which is also contained in the posted SAS example file). Show the output of that program.

(c) Using Proc Reg in SAS, check the values in the Proc IML b-vector. If the output does not match, just say so. We will get it corrected later!

(d) Show the Proc IML program you used.
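
With a second regressor, X gains a third column and $X'X$ becomes 3 x 3, so the closed-form 2 x 2 inverse no longer applies. The sketch below computes the same quantity, $b = (X'X)^{-1}X'y$, that the exercise asks Proc IML to compute. It uses Gauss-Jordan elimination in plain Python, and it is not the Proc IML program from the Lecture Notes. The regressors are hypothetical, and y is constructed exactly as $1 + 2x_1 - 3x_2$, so the correct b-vector is known in advance.

```python
# Multiple regression b = (X'X)^{-1} X'y with two regressors, standard library only.
# x1, x2 and y are HYPOTHETICAL; y = 1 + 2*x1 - 3*x2 exactly, so b should be (1, 2, -3).


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(M)
    # Augment M with the identity matrix: [M | I]
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(M)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear columns")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]            # scale the pivot row to a leading 1
        for r in range(k):
            if r != col:                            # clear this column in every other row
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[k:] for row in A]                   # the right half is now M^{-1}


x1 = [2.0, 3.0, 5.0, 7.0, 8.0, 10.0, 11.0, 14.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0]
y = [1.0 + 2.0 * a - 3.0 * c for a, c in zip(x1, x2)]

X = [[1.0, a, c] for a, c in zip(x1, x2)]           # new X matrix: 1's, x1, x2
Y = [[v] for v in y]
Xt = transpose(X)
b = matmul(inverse(matmul(Xt, X)), matmul(Xt, Y))
print("b =", [round(row[0], 6) for row in b])
```

With real data the b-vector will not be round numbers. It should still agree with the Proc Reg estimates to several decimals.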

## STA 9700: Homework 2

Reading: STA 9700 Lecture Notes 2. Read it once, then read it again and write questions in the margins. (There is some related material in Kutner, pp. 2-27.) Also read STA 9708 LN 5 (Expectation and variance of random variables).

Questions based on STA 9700 Lecture Notes 2
2.1 Looking at Fig. 2.1 in Lecture Notes 2, we see that there is a general rise in the NetWt of the bags as the Count increases. While the phrase "general rise" is not clearly defined, it is certainly better than the following commonplace description, "Bags with more M&M's are heavier." That statement is far too simplistic!

(a) The data for Fig. 2.1 is shown on pages 15-17 of Lecture Notes 2. Using the data, give several examples of pairs of bags for which the statement "Bags with more M&M's are heavier" is false.

(b) Having shown that not all bags with more M&M's are heavier than all bags with fewer M&M's, consider this next vague description: "The average bag containing 18 M&M's weighs more than the average bag containing 17 M&M's." What is vague about that statement? Hint: which bag is the average bag? What is the definition of the average bag? (That is as hard as defining or locating the average American, which should be easy because we hear about that dude every day on the news.)

(c) Critique this statement: “Since on page 12 the sample slope is 1.276 when regressing net weight on count for the 192 bags, the sample average for bags with Count=18 must be higher than for Count=17.” Then find a counterexample in the data set itself!

(d) What statement are we struggling to make here about the relationship between the sub-populations of Net Weights and their Count?

2.2 Putting together the BigMM SAS program and the following Proc Reg routine, we can create a SAS program that computes the sample slope, the sample intercept, and the root mean square error for each of the 8 groups of bags of M&M's (there are 24 bags per group), outputs those statistics to a SAS file, and prints the file.

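``````sas
/* Assumes the BigMM data set (Group, Count, NetWt) is the most recently created data set. */
proc sort; by Group; run;          /* BY processing requires the data sorted by Group */
proc reg outest=LTatum;            /* save one row of estimates per group in LTatum */
  model NetWt=Count;
  by Group;
run;
proc print data=LTatum; run;
``````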

The Proc Reg option "outest=LTatum" instructs SAS to save the regression statistics (or "estimates") in a SAS file named "LTatum." The output is shown below because of difficulties with SAS, but I would be delighted if you are able to produce it yourself! The sample slopes are in the Count column.
``````
Obs  Group  _MODEL_  _TYPE_  _DEPVAR_   _RMSE_  Intercept    Count  NetWt
  1      2  MODEL1   PARMS   NetWt     1.52202    25.2154  1.28176     -1
  2      3  MODEL1   PARMS   NetWt     0.94023    27.2769  1.16531     -1
  3      5  MODEL1   PARMS   NetWt     0.96081    17.6571  1.65238     -1
  4      6  MODEL1   PARMS   NetWt     1.01435    19.1121  1.59741     -1
  5      7  MODEL1   PARMS   NetWt     1.53226    26.1459  1.22875     -1
  6      8  MODEL1   PARMS   NetWt     1.09972    28.7744  1.11778     -1
  7      9  MODEL1   PARMS   NetWt     0.99709    22.1760  1.42708     -1
  8     10  MODEL1   PARMS   NetWt     1.10568    26.5912  1.18456     -1
``````

(a) You now have 8 different sample slopes, or 8 different values of $b_1$. These can be viewed as 8 values drawn from what population? (Hint: You need The Story of Many Possible Samples.)
(b) Imagine that, for our production run of 10,000 bags of Peanut M&M's, we regressed the 10,000 net weights on their respective 10,000 counts. What would we call the resulting intercept and slope? Give the answer in words and in Greek letters.
(c) Using The Story of Many Possible Samples, explain what it would mean to say that $b_1$ is an unbiased estimator.
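
The Story of Many Possible Samples in (a) and (c) can be acted out with a short simulation. This is a minimal sketch under stated assumptions. The population intercept, slope and error standard deviation are made-up values, not estimates from the M&M data. The Count values are held fixed from sample to sample, and each sample has 24 bags, like one Group. If $b_1$ is unbiased, the average of many sample slopes should land close to the population slope.

```python
import random


def slope(x, y):
    """Least-squares sample slope b1 = Sxy / Sxx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


# HYPOTHETICAL population model: NetWt = BETA0 + BETA1 * Count + eps, eps ~ N(0, SIGMA^2)
BETA0, BETA1, SIGMA = 25.0, 1.3, 1.2
counts = [15, 16, 17, 18, 19, 20, 21, 22] * 3   # 24 bags per sample, Counts held fixed

rng = random.Random(9700)                       # fixed seed so the run is reproducible
slopes = []
for _ in range(2000):                           # "many possible samples"
    netwt = [BETA0 + BETA1 * c + rng.gauss(0.0, SIGMA) for c in counts]
    slopes.append(slope(counts, netwt))

mean_b1 = sum(slopes) / len(slopes)
print(f"average of {len(slopes)} sample slopes: {mean_b1:.4f} (population slope {BETA1})")
```

Each sample slope scatters around `BETA1`, and their average sits very near it. That is what "unbiased" means in this story.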

2.3 Refer to the SAS output on page 12 for the regression using all 192 bags.
(a) Compute the value for Count = 18.
(b) What is estimated by $b_1$?
(c) How is the value related to ?

## Expected Value and Variance Review Questions

2.4 For a roll of a fair die with 4 sides, numbered 1 to 4, find the expected value and the variance.
2.5 Find the probability distribution for the average of two rolls of a fair die with four sides. Then compute the expected value and variance of the average from that distribution.
2.6 How were the answers to question 2.5 related to those of question 2.4?
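
Questions 2.4 to 2.6 can be checked by enumerating the outcomes exactly. The sketch below uses `fractions.Fraction` so that the probabilities, expected values and variances come out as exact fractions. It assumes the two rolls are independent. The function name is our own.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]
p_face = Fraction(1, 4)                 # fair four-sided die


def mean_var(dist):
    """dist maps value -> probability; returns (E[X], Var[X])."""
    e = sum(v * p for v, p in dist.items())
    var = sum((v - e) ** 2 * p for v, p in dist.items())
    return e, var


one_roll = {f: p_face for f in faces}

# Distribution of the average of two independent rolls: all 16 outcomes equally likely
avg_two = {}
for a, b in product(faces, repeat=2):
    v = Fraction(a + b, 2)
    avg_two[v] = avg_two.get(v, 0) + p_face * p_face

e1, v1 = mean_var(one_roll)
e2, v2 = mean_var(avg_two)
print(e1, v1)   # prints: 5/2 5/4
print(e2, v2)   # prints: 5/2 5/8
```

The expected value is unchanged and the variance is halved. This is the $\sigma^2/n$ pattern with $n = 2$.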

2.7 Generic Calculus Questions; warming up to least squares: Find the derivative with respect to x of the following functions:

(a) $y = x^2$

(b) $y = (4x + 3)^2$

(c) $y = (-3x^2 + x)$
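
Part (b) needs the chain rule, and least squares relies on the same step. As a sketch, the second line below is our own illustration for regression through the origin with a single coefficient $b$. It is not a result quoted from the Lecture Notes.

$$\frac{d}{dx}\,(ax + c)^2 = 2(ax + c)\cdot a$$

$$\frac{d}{db}\sum_{i=1}^{n}(y_i - b x_i)^2 = \sum_{i=1}^{n} 2(y_i - b x_i)(-x_i) = -2\sum_{i=1}^{n} x_i(y_i - b x_i)$$

Setting the last expression equal to zero gives $b = \sum_{i=1}^{n} x_i y_i \big/ \sum_{i=1}^{n} x_i^2$.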

2.8 The R function `lm(y ~ x)` will regress y on x, and the function `summary(lm(y ~ x))` produces output similar to the SAS regression output. For BigMM, see whether you can get output with values similar to those given by SAS on page 16. Locate the estimate of the variance of $\varepsilon$.
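
In simple regression, the estimate of the variance of $\varepsilon$ is the mean squared error, $s^2 = \text{SSE}/(n-2)$. R's `summary(lm(y ~ x))` reports its square root as "Residual standard error". SAS Proc Reg shows $s^2$ as the error mean square and its square root as Root MSE. The minimal Python sketch below shows the arithmetic. The five data points are made up for illustration and are not BigMM.

```python
def simple_ols(x, y):
    """Return (b0, b1, s2) for y = b0 + b1*x, where s2 = SSE / (n - 2) estimates Var(eps)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))   # residual sum of squares
    return b0, b1, sse / (n - 2)                                 # n - 2 error degrees of freedom


# Tiny HYPOTHETICAL data set, only to show the arithmetic
b0, b1, s2 = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1, s2, s2 ** 0.5)   # s2 ** 0.5 corresponds to R's "Residual standard error"
```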

## Doing Independent-Samples T-tests Quiz

1. Download the file timeonsite.csv from the course materials. This file describes a website A/B test in which visitors’ time-on-site was measured in two website variations. How many subjects were in this website A/B test?
2. How many subjects were exposed to each website variation, A and B?
3. To the nearest tenth (one decimal place), what was the median time on site for site “B”?
4. To the nearest hundredth (two decimal places), what was the standard deviation of time on site for site “A”?
5. Conduct an independent-samples t-test on Time by Site, assuming equal variances. To the nearest hundredth (two decimal places), what is the t-statistic?
6. How many degrees of freedom resulted from the t-test?
7. To the nearest ten-thousandth (four decimal places), what was the p-value for this test?
8. What is the most appropriate way to report this t-test result?
9. The results show that, on average, website “A” resulted in almost 13 seconds more time-on-site than website “B”, and that this difference was statistically significant. (Note: assume Time is measured in seconds.)
10. Setting the `var.equal` parameter to `FALSE` runs a Welch t-test, which does not assume equal variances.
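
Items 5, 6 and 10 contrast the pooled (equal-variance) t-test with the Welch t-test. The sketch below computes each test's t statistic and degrees of freedom. The pooled test has $n_A + n_B - 2$ degrees of freedom, and Welch uses the Welch-Satterthwaite formula. The sketch uses made-up numbers rather than timeonsite.csv. It stops short of p-values, which need the t distribution. In R these come from `t.test(Time ~ Site, data = tos, var.equal = TRUE)`, where `tos` is a hypothetical name for the data frame read from the CSV. In Python they come from `scipy.stats.ttest_ind(a, b, equal_var=False)` for Welch.

```python
from math import sqrt
from statistics import mean, variance   # variance() is the sample (n - 1) variance


def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# HYPOTHETICAL times for two site variations (not timeonsite.csv)
site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
site_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_p, df_p = pooled_t(site_a, site_b)
t_w, df_w = welch_t(site_a, site_b)
print(round(t_p, 4), df_p, round(t_w, 4), round(df_w, 3))
```

With equal group sizes the two t statistics coincide. Only the degrees of freedom, and therefore the p-value, differ. With unequal sizes and unequal variances, the t statistics differ as well.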