

Constructing a Confidence Interval for the Difference between Two Population Proportions
In order to determine if a new instructional technology improves students' scores, a professor wants to know if a larger percentage of students using the instructional technology passed the class than the percentage of students who did not use the new technology. Records show that 45 out of 50 randomly selected students who were in classes that used the instructional technology passed the class and 38 out of 51 randomly selected students who were in classes that did not use the instructional technology passed the class. Construct a 95% confidence interval for the true difference between the proportion of students using the technology who passed and the proportion of students not using the technology who passed.

Solution

We are going to show how to construct the confidence interval first without a TI-83/84 Plus calculator and then with one.
Step 1: Find the point estimate.

First, we'll let Population 1 be those students who used the new technology and Population 2 be those students who did not. Next, we need to calculate the sample proportions. The sample proportion for Sample 1 (using instructional technology) is calculated as follows.

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{45}{50} = 0.9$$

The sample proportion for Sample 2 (without the instructional technology) is found as follows.

$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{38}{51} \approx 0.745098$$

Now that we have the sample proportions, we can calculate the point estimate.

$$\hat{p}_1 - \hat{p}_2 = 0.9 - 0.745098 = 0.154902$$

Step 2: Find the margin of error.

Notice that the samples are indeed independent of one another. Because they are two separate groups of students, they are not connected in any way. We can assume that the other necessary conditions are met to allow us to use the standard normal distribution to calculate the margin of error. The level of confidence is $c = 0.95$, so the critical value is $z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$. Substituting the values into the formula gives us the following.

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 1.96\sqrt{\frac{0.9(1-0.9)}{50} + \frac{0.745098(1-0.745098)}{51}} \approx 0.145675$$
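The arithmetic in this step can be checked with a few lines of Python. This is only a sketch that mirrors the hand calculation above; the function name `margin_of_error` is our own, and the critical value 1.96 is passed in as the rounded table value used in the text.

```python
from math import sqrt

def margin_of_error(x1, n1, x2, n2, z_crit):
    """Margin of error for the difference between two sample proportions."""
    p1 = x1 / n1  # sample proportion for Sample 1
    p2 = x2 / n2  # sample proportion for Sample 2
    # Standard error of (p1 - p2) for two independent samples
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z_crit * std_error

# 45 of 50 passed with the technology, 38 of 51 without; z = 1.96 for 95%
E = margin_of_error(45, 50, 38, 51, z_crit=1.96)
print(round(E, 6))
```

The printed value should agree with the margin of error found by hand, up to rounding.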

Step 3: Subtract the margin of error from and add the margin of error to the point estimate.

Subtracting the margin of error from the point estimate and then adding the margin of error to the point estimate gives us the following endpoints of the confidence interval.

Lower endpoint: $(\hat{p}_1 - \hat{p}_2) - E = 0.154902 - 0.145675 \approx 0.009$

Upper endpoint: $(\hat{p}_1 - \hat{p}_2) + E = 0.154902 + 0.145675 \approx 0.301$

Thus, the 95% confidence interval for the difference between the two population proportions ranges from 0.009 to 0.301. The confidence interval can be written mathematically using either inequality symbols or interval notation, as shown below.

0.009<p1−p2<0.301

or

(0.009,0.301)

Therefore, we are 95% confident that the percentage of students who passed the class is between 0.9 and 30.1 percentage points higher for the population of students who used the new instructional technology (Population 1) than for the population of students who did not use the technology (Population 2). Because the entire interval lies above 0, the professor can conclude, with 95% confidence, that a larger percentage of students pass the class with the new instructional technology than without it.

To calculate the confidence interval for the difference between two proportions on the calculator, we don't need to find the individual sample proportions; we just need to enter the number of successes and the sample size for each sample, as well as the level of confidence. Press STAT , scroll to TESTS, and then choose option B:2-PropZInt. x1 is the number of successes from the first sample and n1 is the first sample's size. Similarly, x2 is the number of successes from the second sample and n2 is the second sample's size. As usual, C-Level is the confidence level, which must be entered as a decimal. The data should be entered as shown in the first screenshot below. After you select Calculate and press ENTER , the results will be displayed on the screen as shown in the second screenshot below.
[Screenshot 1: 2-PropZInt data entry screen with x1 = 45, n1 = 50, x2 = 38, n2 = 51, and C-Level = .95.]
[Screenshot 2: 2-PropZInt results screen showing (.00923, .30057), p̂1 = .9, p̂2 = .7450980392, n1 = 50, and n2 = 51.]

Notice that the calculator gives the same interval but with more decimal places. The interpretation of the confidence interval is still the same. The proportion of students passing the class was higher for the population of students who used the new instructional technology than for the population of students who did not use the technology.
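For readers without a TI-83/84 Plus, the same kind of calculation can be sketched in Python. The function below is our own illustration, not the calculator's actual routine: like 2-PropZInt, it takes the number of successes and the sample size for each sample plus the confidence level, and it obtains the critical value from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, c_level=0.95):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Critical value z_{alpha/2}: area alpha/2 lies to its right
    z = NormalDist().inv_cdf(1 - (1 - c_level) / 2)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2  # point estimate
    return diff - margin, diff + margin

lower, upper = two_prop_z_interval(45, 50, 38, 51, c_level=0.95)
print(f"({lower:.5f}, {upper:.5f})")
```

With the data from this example, the endpoints should agree with the calculator's results to the displayed precision.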

In this section we will turn our attention to comparing two population proportions. Once again, there are times when we aren't necessarily focused on the exact proportion, but rather how proportions from two populations compare, that is, if they are equal, or if one is larger than the other.

When we were comparing population means, we constructed a confidence interval for the difference between the two population means. Similarly, when comparing two population proportions, we use a confidence interval for the difference between the population proportions. The best point estimate for the difference is $\hat{p}_1 - \hat{p}_2$. In this section we will restrict our discussion to comparing two population proportions when the following conditions are met. Notice that the conditions are similar to those discussed for estimating a single population proportion.

All possible samples of a given size have an equal probability of being chosen; that is, simple random samples are used.

The samples are independent.

The conditions for a binomial distribution are met for both samples.

The sample sizes are large enough to ensure that $n_1\hat{p}_1 \ge 5$, $n_1(1-\hat{p}_1) \ge 5$, $n_2\hat{p}_2 \ge 5$, and $n_2(1-\hat{p}_2) \ge 5$.

When these conditions are met, we can apply the Central Limit Theorem to the sampling distribution of the differences between the sample proportions for two independent samples. This means that we will use the standard normal distribution to calculate the margin of error of a confidence interval for the difference between two population proportions. You can assume that the necessary criteria are met for all examples and exercises in this lesson.
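The sample-size condition is easy to check before constructing an interval. The sketch below uses a helper name of our own, `large_enough`. Since n·p̂ equals the number of successes and n·(1 − p̂) equals the number of failures, it compares the counts directly, which also avoids floating-point rounding at the boundary.

```python
def large_enough(successes, n, minimum=5):
    """Check n*p_hat >= 5 and n*(1 - p_hat) >= 5 using the raw counts."""
    failures = n - successes          # equals n * (1 - p_hat)
    return successes >= minimum and failures >= minimum

# Data from the instructional technology example
sample1_ok = large_enough(45, 50)     # 45 successes, 5 failures
sample2_ok = large_enough(38, 51)     # 38 successes, 13 failures
print(sample1_ok and sample2_ok)
```

Note that Sample 1 in the example only just meets the condition, with exactly 5 students who did not pass.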
Memory Booster

Population Proportion

$$p = \frac{x}{N} = \frac{\text{\# of successes}}{\text{population size}}$$

Sample Proportion

$$\hat{p} = \frac{x}{n} = \frac{\text{\# of successes}}{\text{sample size}}$$

Properties of a Binomial Distribution

    The experiment consists of a fixed number, n, of identical trials.

    Each trial is independent of the others.

    For each trial, there are only two possible outcomes. For counting purposes, one outcome is labeled a success, and the other a failure.

    For every trial, the probability of getting a success is called p. The probability of getting a failure is then 1−p.

The binomial random variable, X, counts the number of successes in n trials.

If there are n pairs of data values and the population distribution of the paired differences is approximately normal, then the sampling distribution for the sample statistic $\bar{d}$ follows a t-distribution with n−1 degrees of freedom. Hence, the formula for the margin of error is as follows. This is the same formula that is used when estimating a single population mean when σ is unknown. This is because we use the paired differences as a single set of sample data rather than using the data from the two samples separately when working with paired data.
Margin of Error of a Confidence Interval for the Mean of the Paired Differences for Two Populations (σ Unknown, Dependent Samples)

When both population standard deviations are unknown, the samples taken are dependent, simple random samples of paired data, and either the number of pairs of data values in the sample data is greater than or equal to 30 or the population distribution of the paired differences is approximately normal, the margin of error of a confidence interval for the mean of the paired differences for two populations is given by

$$E = \left(t_{\alpha/2}\right)\left(\frac{s_d}{\sqrt{n}}\right)$$

where $t_{\alpha/2}$ is the critical value for the level of confidence, $c = 1 - \alpha$, such that the area under the t-distribution with $n - 1$ degrees of freedom to the right of $t_{\alpha/2}$ is equal to $\frac{\alpha}{2}$,

$s_d$ is the sample standard deviation of the paired differences for the sample data, and

$n$ is the number of paired differences in the sample data.

To use paired data to construct a confidence interval, the following conditions must be met.

All possible samples of a given size have an equal probability of being chosen; that is, simple random samples are used.

The samples are dependent.

Both population standard deviations, $\sigma_1$ and $\sigma_2$, are unknown.

Either the number of pairs of data values in the sample data is greater than or equal to 30 ($n \ge 30$) or the population distribution of the paired differences is approximately normal.

In this lesson, you may assume that these conditions are met for all examples and exercises involving paired data.

The value that we want to estimate is the mean of the paired differences for the two populations of dependent data, $\mu_d$. Recall that the first step in constructing a confidence interval is to find the point estimate, and the best point estimate for a population mean is a sample mean. Therefore, the mean of the paired differences for the sample data, $\bar{d}$, is the point estimate used here.

Formula: Mean of Paired Differences

When two dependent samples consist of paired data, the mean of the paired differences for the sample data is given by

$$\bar{d} = \frac{\sum d_i}{n}$$

where $d_i$ is the paired difference for the $i$th pair of data values and

n is the number of paired differences in the sample data.
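The paired-data formulas can be put together in a short Python sketch. Everything specific here is an assumption made for illustration: the five pairs of values are invented, the differences are taken as first minus second, and the population of paired differences is assumed to be approximately normal because n is less than 30. The critical value 2.776 is the usual table value of $t_{0.025}$ with 4 degrees of freedom and is passed in by hand, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def paired_interval(first, second, t_crit):
    """Confidence interval for the mean of the paired differences."""
    diffs = [a - b for a, b in zip(first, second)]  # d_i for each pair
    n = len(diffs)
    d_bar = mean(diffs)            # point estimate: mean of the differences
    s_d = stdev(diffs)             # sample standard deviation (divides by n - 1)
    margin = t_crit * s_d / sqrt(n)
    return d_bar, margin, (d_bar - margin, d_bar + margin)

# Hypothetical paired measurements (not from the lesson)
first = [12, 15, 11, 14, 10]
second = [10, 11, 8, 9, 9]

# t_{0.025} with n - 1 = 4 degrees of freedom, read from a t-table
d_bar, margin, interval = paired_interval(first, second, t_crit=2.776)
print(d_bar, round(margin, 3), interval)
```

Passing the critical value in as an argument keeps the sketch free of third-party libraries; a statistics package could supply it instead.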

Faculty of Science, Technology, Engineering and Mathematics M248 Analysing data

Please read the Student guidance for preparing and submitting TMAs on the M248 website before beginning work on a TMA. You can submit a TMA either by post or electronically using the University’s online TMA/EMA service.

You are advised to look at the general advice on answering TMAs provided on the M248 website. Each TMA is marked out of 50. The marks allocated to each part of each question are indicated in brackets in the margin. Your overall score for each TMA will be the sum of your marks for these questions.

Note that the Minitab files that you require for TMA 05 are not part of the M248 data files and must be downloaded from the ‘Assessment’ area of the M248 website.

Question 1, which covers topics in Unit 9, and Question 2, which covers topics in Unit 10, form M248 TMA 05. Question 1 is marked out of 32; Question 2 is marked out of 18.

Minitab Question one
You should be able to answer this question after working through Unit 9.
(a) A study was undertaken to examine the tensile strength of a new type of polyester fibre. The Minitab worksheet polyester-fibre.mtw gives the breaking strengths (in grams/denier, denier being a unit of fineness) of a random sample of n = 30 observations, given in the variable Strength.

The existing type of polyester fibre which the new type is designed to replace has a mean breaking strength of 0.26 grams/denier. Interest centres on using the data in polyester-fibre.mtw to test whether the mean breaking strength of the new type of polyester fibre differs from the mean breaking strength of the existing type of polyester fibre.

(i) Write down appropriate null and alternative hypotheses for a test of whether the mean breaking strength of the new type of polyester fibre differs from the mean breaking strength of the existing type of polyester fibre. Define any notation that you use. [3]

(ii) It is proposed to use a z-test to test the hypotheses specified in part (a)(i). Justify this choice of test in terms of the sample size, n. [1]

(iii) Write down the formula for the test statistic used in the z-test of part (a)(ii). Define any further notation that you use. [2]

(iv) Write down the null distribution of the test statistic in part (a)(iii). What is the reason for the use of the word ‘null’ in the phrase ‘null distribution’? [2]

(v) Using Minitab, obtain the standard deviation of the values in Strength, then perform the z-test that you have been considering throughout part (a) of this question. Provide a copy of the Minitab output produced by performing this test. (This output should comprise four lines which start with the words Test, The, Variable and Strength, respectively.) [3]

(vi) Interpret the result of the test that you have just performed, as given by its p-value. [3]
(vii) Would you have rejected H0 or not rejected H0 if you had tested the hypotheses of interest in this question at the 5% significance level? Would you have rejected H0 or not rejected H0 if you had tested these hypotheses at the 1% significance level? Justify each of your answers separately. [4]

(b) The proportion, p0, of foraging bumblebees not exposed to pesticides who bring very little pollen back to their nest is 0.4. A recent study of foraging bumblebees investigated the effect of exposure to a widely used neonicotinoid pesticide called imidacloprid on pollen foraging rates. (Neonicotinoid pesticides are commonly used in agriculture due to their low toxicity in mammals.) Let p denote the proportion of foraging bumblebees exposed to imidacloprid who bring very little pollen back to their nest.

A sample of 60 bumblebees were exposed to a low (field realistic) dose of imidacloprid: 39 of these bumblebees brought back very little pollen to their nest. Use these data to perform the test of the hypotheses H0: p = 0.4, H1: p > 0.4, by working through the following subparts of this part of the question.

(i) Calculate the observed value of the test statistic for this test. [2]
(ii) Using the approximate normal null distribution of this test statistic, identify the rejection region of a test of the stated hypotheses using a 1% significance level. [2]
(iii) Report and interpret the outcome of this hypothesis test. [2]
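As a rough check on hand calculations for this part, the standard large-sample z statistic for a single proportion can be sketched in Python. This is the usual textbook form, (p̂ − p0) / √(p0(1 − p0)/n), and is assumed rather than taken from the M248 units, which may present the statistic differently. The helper name `one_prop_z` is our own.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z(x, n, p0):
    """Large-sample z statistic for testing H0: p = p0."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# 39 of 60 exposed bumblebees brought back very little pollen; p0 = 0.4
z = one_prop_z(39, 60, 0.4)

# One-sided test at the 1% level: reject H0 when z is at least this value
critical = NormalDist().inv_cdf(1 - 0.01)
print(round(z, 3), round(critical, 3), z >= critical)
```

The comparison in the last line only reports whether the observed statistic falls in the upper-tail rejection region; the interpretation still has to be written in the context of the study.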

(c) The isotopic abundance ratio of natural silver (Ag) is the ratio of the stable isotopes Ag107 to Ag109. Its mean is 1.076 and measurements on a random sample of observed isotopic abundance ratios suggested that they are plausibly normally distributed with a sample standard deviation of 0.0026. Interest in this part of the question concerns the planning of a further experiment to detect whether this ratio is different in observations from a certain source of silver nitrate. The new study will use a two-sided test at the 5% significance level, assuming normality. It is desired to make sufficient observations of the isotopic abundance ratio on the silver nitrate so that the power of the test to distinguish a difference between the null hypothesis of a true underlying mean of 1.076 and a value that is 0.0015 larger or 0.0015 smaller is 90%. For the purpose of performing the necessary sample size calculation, it will be assumed that the population standard deviation of the isotopic abundance ratio measurements is equal to the sample standard deviation given above.

(i) Calculate, by hand, the size of the sample required to achieve the desired power of the test. Show your working. [6]

(ii) Ignoring rounding up to an integer, and assuming that no other aspect of the problem changes, by what multiple is the required sample size changed if, instead of seeking to distinguish between the underlying mean and values that are 0.0015 larger or smaller than it, it was decided to seek to distinguish between the underlying mean and values that are 2/3 as much (that is, 0.001) larger or smaller than it? [2]
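A by-hand answer to this part can be sanity-checked with the standard normal-approximation formula for the sample size of a two-sided z-test, n = ((z_{1−α/2} + z_{power}) σ / d)². That formula is an assumption on our part: it is the common textbook version, and the form given in Unit 9 may be written differently. The function name `required_n` is also our own.

```python
from math import ceil
from statistics import NormalDist

def required_n(sigma, d, alpha=0.05, power=0.90):
    """Unrounded sample size for a two-sided z-test to detect a shift of d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ((z_alpha + z_power) * sigma / d) ** 2

n_original = required_n(sigma=0.0026, d=0.0015)
n_smaller_shift = required_n(sigma=0.0026, d=0.001)

print(ceil(n_original))              # round up to a whole number of observations
print(n_smaller_shift / n_original)  # multiple by which the sample size changes
```

Because d enters the formula only through 1/d², shrinking the detectable difference to 2/3 of its original size multiplies the unrounded sample size by (3/2)² whatever the values of σ, α and the power.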

Minitab statistics question 2
Question 2 – 18 marks
You should be able to answer this question after working through Unit 10.
(a) Halofenate has been shown to be effective in the treatment of conditions associated with abnormally high levels of lipids in the blood; triglyceride is a lipid of particular importance. A group of 22 patients were treated with halofenate medication. The changes between the patients’ triglyceride levels after treatment with halofenate and before treatment with halofenate were measured. These changes are in the Minitab worksheet triglyceride.mtw, in the column Halofenate. (Note that a negative change corresponds to the desirable outcome of a reduction in triglyceride levels.)

The column Placebo contains the changes between triglyceride levels after treatment with an inactive placebo and before treatment with the placebo, for an independently drawn control group of 21 patients. The main question of interest is whether halofenate makes a more favourable change to triglyceride levels, in comparison to a placebo.

Graphical investigation of the data shows that normality cannot be assumed for the distribution of either Halofenate or Placebo. It is therefore decided to compare the effects of halofenate and a placebo on triglyceride reduction using the Mann-Whitney test.

(i) Write down appropriate null and alternative hypotheses for a test of whether the difference between the location of the changes between triglyceride levels after and before treatment with halofenate and the location of the changes between triglyceride levels after and before treatment with a placebo is negative. Define any notation that you use. [3]

(ii) Use Minitab to carry out the Mann-Whitney test of the hypotheses discussed above. Provide a copy of one line of the Minitab output which includes the p-value associated with the test. [2]

(iii) Interpret the result of the test that you have just performed, as given by its p-value. [2]
(b) In Table 5 of Unit 3, data were given on the month of death (January = 1, February = 2, . . . , December = 12) for 82 descendants of Queen Victoria; they all died of natural causes. The data are repeated here in Table 1.
[Table 1: month of death for 82 descendants of Queen Victoria (image 2023-04-04T10:00:33.png not reproduced)]

The question of whether or not these royal deaths could be claimed to be from a discrete uniform distribution on the range 1, 2, ..., 12 was considered informally in Example 20 of Unit 3 and, at some length, in Chapter 8 of Computer Book A. From these investigations, it looked as though the discrete uniform distribution may be a plausible model for these data, but no firm conclusion was reached.

In this part of the question, you are going to perform a chi-squared goodness-of-fit test of the discrete uniform distribution to these data.

(i) Obtain the expected frequencies of the values 1, 2, ..., 12 assuming a discrete uniform distribution. Why is it not necessary to pool categories before performing a chi-squared goodness-of-fit test in this case? [3]

(ii) Carry out the remainder of the chi-squared goodness-of-fit test: report the individual elements of the chi-squared test statistic, the value of the test statistic itself, the number of degrees of freedom of the chi-squared null distribution, and whatever this tells you about the p-value associated with the test. Interpret the outcome of the test.

m248 TM 05 sample statistics solution.docx
