## Faculty of Science, Technology, Engineering and Mathematics M248 Analysing data

Please read the Student guidance for preparing and submitting TMAs on the M248 website before beginning work on a TMA. You can submit a TMA either by post or electronically using the University’s online TMA/EMA service.

You are advised to look at the general advice on answering TMAs provided on the M248 website. Each TMA is marked out of 50. The marks allocated to each part of each question are indicated in brackets in the margin. Your overall score for each TMA will be the sum of your marks for these questions.

Note that the Minitab files that you require for TMA 05 are not part of the M248 data files and must be downloaded from the ‘Assessment’ area of the M248 website.

Question 1, which covers topics in Unit 9, and Question 2, which covers topics in Unit 10, form M248 TMA 05. Question 1 is marked out of 32; Question 2 is marked out of 18.

Question 1 - 32 marks
You should be able to answer this question after working through Unit 9.
(a) A study was undertaken to examine the tensile strength of a new type of polyester fibre. The Minitab worksheet polyester-fibre.mtw gives the breaking strengths (in grams/denier, denier being a unit of fineness) of a random sample of n = 30 observations, given in the variable Strength.

The existing type of polyester fibre which the new type is designed to replace has a mean breaking strength of 0.26 grams/denier. Interest centres on using the data in polyester-fibre.mtw to test whether the mean breaking strength of the new type of polyester fibre differs from the mean breaking strength of the existing type of polyester fibre.

(i) Write down appropriate null and alternative hypotheses for a test of whether the mean breaking strength of the new type of polyester fibre differs from the mean breaking strength of the existing type of polyester fibre. Define any notation that you use. [3]

(ii) It is proposed to use a z-test to test the hypotheses specified in part (a)(i). Justify this choice of test in terms of the sample size, n. [1]

(iii) Write down the formula for the test statistic used in the z-test of part (a)(ii). Define any further notation that you use. [2]

(iv) Write down the null distribution of the test statistic in part (a)(iii). What is the reason for the use of the word ‘null’ in the phrase ‘null distribution’? [2]

(v) Using Minitab, obtain the standard deviation of the values in Strength, then perform the z-test that you have been considering throughout part (a) of this question. Provide a copy of the **Minitab output** produced by performing this test. (This output should comprise four lines which start with the words Test, The, Variable and Strength, respectively.) [3]

(vi) Interpret the result of the test that you have just performed, as given by its p-value. [3]
(vii) Would you have rejected H0 or not rejected H0 if you had tested the hypotheses of interest in this question at the 5% significance level? Would you have rejected H0 or not rejected H0 if you had tested these hypotheses at the 1% significance level? Justify each of your answers separately. [4]

(b) The proportion, p0, of foraging bumblebees not exposed to pesticides who bring very little pollen back to their nest is 0.4. A recent study of foraging bumblebees investigated the effect of exposure to a widely used neonicotinoid pesticide called imidacloprid on pollen foraging rates. (Neonicotinoid pesticides are commonly used in agriculture due to their low toxicity in mammals.) Let p denote the proportion of foraging bumblebees exposed to imidacloprid who bring very little pollen back to their nest.

A sample of 60 bumblebees were exposed to a low (field realistic) dose of imidacloprid: 39 of these bumblebees brought back very little pollen to their nest. Use these data to perform the test of the hypotheses $H_0: p = 0.4$, $H_1: p > 0.4$, by working through the following subparts of this part of the question.

(i) Calculate the observed value of the test statistic for this test. [2]
(ii) Using the approximate normal null distribution of this test statistic, identify the rejection region of a test of the stated hypotheses using a 1% significance level. [2]
(iii) Report and interpret the outcome of this hypothesis test. [2]
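
The calculation that part (b) walks through can be sketched in Python. This is only an illustrative sketch, assuming the usual large-sample test statistic for a proportion, $z = (\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n}$; the variable names are ours, and the module's own notation and table values may differ slightly.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the question text.
n = 60      # bumblebees exposed to imidacloprid
x = 39      # number that brought back very little pollen
p0 = 0.4    # proportion under the null hypothesis

p_hat = x / n

# Test statistic, using the standard error implied by H0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Upper-tail critical value for a one-sided test at the 1% level.
critical = NormalDist().inv_cdf(0.99)

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, critical value = {critical:.3f}")
print("reject H0" if z > critical else "do not reject H0")
```

The rejection region is one-sided because the alternative hypothesis is p > 0.4.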

(c) The isotopic abundance ratio of natural silver (Ag) is the ratio of the stable isotopes Ag107 to Ag109. Its mean is 1.076 and measurements on a random sample of observed isotopic abundance ratios suggested that they are plausibly normally distributed with a sample standard deviation of 0.0026. Interest in this part of the question concerns the planning of a further experiment to detect whether this ratio is different in observations from a certain source of silver nitrate. The new study will use a two-sided test at the 5% significance level, assuming normality. It is desired to make sufficient observations of the isotopic abundance ratio on the silver nitrate so that the power of the test to distinguish a difference between the null hypothesis of a true underlying mean of 1.076 and a value that is 0.0015 larger or 0.0015 smaller is 90%. For the purpose of performing the necessary sample size calculation, it will be assumed that the population standard deviation of the isotopic abundance ratio measurements is equal to the sample standard deviation given above.

(i) Calculate, by hand, the size of the sample required to achieve the desired power of the test. Show your working. [6]

(ii) Ignoring rounding up to an integer, and assuming that no other aspect of the problem changes, by what multiple is the required sample size changed if, instead of seeking to distinguish between the underlying mean and values that are 0.0015 larger or smaller than it, it was decided to seek to distinguish between the underlying mean and values that are 2/3 as much (that is, 0.001) larger or smaller than it? [2]
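
As a cross-check on the hand calculation in part (c), the sketch below assumes the common normal-approximation formula $n = (\sigma/d)^2 (q_{1-\alpha/2} + q_{\text{power}})^2$, where the $q$ are standard normal quantiles. The formula's notation and the variable names are assumptions of this sketch; Unit 9 may write it differently, and quantiles read from tables are rounded, so the unrounded value of n can differ a little.

```python
from math import ceil
from statistics import NormalDist

sigma = 0.0026   # assumed population standard deviation (from the question)
d = 0.0015       # difference in means to be detected
alpha = 0.05     # two-sided significance level
power = 0.90     # desired power

std_normal = NormalDist()
q_alpha = std_normal.inv_cdf(1 - alpha / 2)   # 0.975-quantile
q_power = std_normal.inv_cdf(power)           # 0.90-quantile

n_exact = (sigma / d) ** 2 * (q_alpha + q_power) ** 2
n_required = ceil(n_exact)   # round up to a whole number of observations

# Part (ii): only d changes, and n is proportional to 1 / d^2.
d_new = 0.001
multiple = (d / d_new) ** 2

print(f"n (unrounded) = {n_exact:.2f}, n required = {n_required}")
print(f"sample size multiple for d = {d_new}: {multiple:.2f}")
```

Because n depends on d only through $1/d^2$, the multiple in part (ii) does not depend on the significance level, the power or the standard deviation.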

Question 2 - 18 marks
You should be able to answer this question after working through Unit 10.
(a) Halofenate has been shown to be effective in the treatment of conditions associated with abnormally high levels of lipids in the blood; triglyceride is a lipid of particular importance. A group of 22 patients were treated with halofenate medication. The changes between the patients’ triglyceride levels after treatment with halofenate and before treatment with halofenate were measured. These changes are in the Minitab worksheet triglyceride.mtw, in the column Halofenate. (Note that a negative change corresponds to the desirable outcome of a reduction in triglyceride levels.)

The column Placebo contains the changes between triglyceride levels after treatment with an inactive placebo and before treatment with the placebo, for an independently drawn control group of 21 patients. The main question of interest is whether halofenate makes a more favourable change to triglyceride levels, in comparison to a placebo.

Graphical investigation of the data shows that normality cannot be assumed for the distribution of either Halofenate or Placebo. It is therefore decided to compare the effects of halofenate and a placebo on triglyceride reduction using the Mann-Whitney test.

(i) Write down appropriate null and alternative hypotheses for a test of whether the difference between the location of the changes between triglyceride levels after and before treatment with halofenate and the location of the changes between triglyceride levels after and before treatment with a placebo is negative. Define any notation that you use. [3]

(ii) Use Minitab to carry out the Mann-Whitney test of the hypotheses discussed above. Provide a copy of one line of the Minitab output which includes the p-value associated with the test. [2]

(iii) Interpret the result of the test that you have just performed, as given by its p-value. [2]
(b) In Table 5 of Unit 3, data were given on the month of death (January = 1, February = 2, . . . , December = 12) for 82 descendants of Queen Victoria; they all died of natural causes. The data are repeated here in Table 1.

The question of whether or not these royal deaths could be claimed to be from a discrete uniform distribution on the range 1, 2, ..., 12 was considered informally in Example 20 of Unit 3 and, at some length, in Chapter 8 of Computer Book A. From these investigations, it looked as though the discrete uniform distribution may be a plausible model for these data, but no firm conclusion was reached.

In this part of the question, you are going to perform a chi-squared goodness-of-fit test of the discrete uniform distribution to these data.

(i) Obtain the expected frequencies of the values 1, 2, ..., 12 assuming a discrete uniform distribution. Why is it not necessary to pool categories before performing a chi-squared goodness-of-fit test in this case? [3]

(ii) Carry out the remainder of the chi-squared goodness-of-fit test: report the individual elements of the chi-squared test statistic, the value of the test statistic itself, the number of degrees of freedom of the chi-squared null distribution, and whatever this tells you about the p-value associated with the test. Interpret the outcome of the test.


## Faculty of Science, Technology, Engineering and Mathematics M248 Analysing data

M248 - TMA 04

Please read the Student guidance for preparing and submitting TMAs on the M248 website before beginning work on a TMA. You can submit a TMA either by post or electronically using the University’s online TMA/EMA service.

Question 1, which covers topics in Unit 7, and Question 2, which covers topics in Unit 8, form M248 TMA 04. Question 1 is marked out of 23; Question 2 is marked out of 27.

Question 1 - 23 marks
You should be able to answer this question after working through Unit 7.
(a) Let X and Y be independent random variables both with the same mean $\mu \neq 0$. Define a new random variable $W = aX + bY$ where a and b are constants.
(i) Obtain an expression for E(W). [2]
(ii) What constraint is there on the values of a and b so that W is an unbiased estimator of µ? Hence write all unbiased versions of W as a formula involving a, X and Y only (and not b). [3]
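
The reasoning behind part (a) can be outlined as follows. This is a sketch that uses only the linearity of expectation and the definition of unbiasedness:

$$
E(W) = a\,E(X) + b\,E(Y) = (a + b)\,\mu .
$$

W is unbiased for $\mu$ when $E(W) = \mu$. Since $\mu \neq 0$, this requires $a + b = 1$, so that

$$
W = aX + (1 - a)Y .
$$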
(b) An otherwise fair six-sided die has been tampered with in an attempt to cheat at a dice game. The effect is that the 1 and 6 faces have different probability of occurring than the 2, 3, 4 and 5 faces.
Let θ be the probability of obtaining a 1 on this biased die. Then, the outcomes of rolling the biased die have the following probability mass function.
Table 1 The p.m.f. of outcomes of rolls of a biased die

(i) By consideration of the p.m.f. in Table 1, explain why it is necessary for θ to be such that $0 < \theta < 1/2$. [2]
(ii) The value of θ is unknown. Data from which to estimate the value of θ were obtained by rolling the biased die 1000 times. The result of this experiment is shown in Table 2.

Table 2 Outcomes of 1000 independent rolls of a biased die

Show that the likelihood of θ based on these data is
$L(\theta) = C\,\theta^{395}(1 - 2\theta)^{605}$, where C is a positive constant, not dependent on θ. [5]
(iii) Show that $L'(\theta) = C\,\theta^{394}(1 - 2\theta)^{604}(395 - 2000\theta)$. [4]

(iv) What is the value of the maximum likelihood estimate, $\hat{\theta}$, of θ based on these data? Justify your answer. What does the value of $\hat{\theta}$ suggest about the value of θ for this biased die compared with the value of θ associated with a fair, unbiased, die? [4]
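
A numerical check of the maximisation can be sketched as below. It assumes the likelihood stated in part (b)(ii) and works with its logarithm, which has the same maximiser. The grid search is our illustrative device, not the method the question asks for, which is an argument based on the sign of the derivative.

```python
from math import log

def log_likelihood(theta: float) -> float:
    """Log-likelihood of theta, up to the additive constant log C."""
    return 395 * log(theta) + 605 * log(1 - 2 * theta)

# Closed form: inside (0, 1/2) the derivative vanishes only when 395 - 2000 * theta = 0.
theta_closed_form = 395 / 2000

# Numerical check on a fine grid over the open interval (0, 1/2).
grid = [i / 10000 for i in range(1, 5000)]
theta_grid = max(grid, key=log_likelihood)

print(f"closed form: {theta_closed_form}, grid search: {theta_grid}")
print(f"fair die: theta = {1 / 6:.4f}")
```

Comparing the estimate with 1/6, the probability of a 1 on a fair die, indicates the direction of the bias.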

(c) Studies of the size and range of wild animal populations often involve tagging observed individual animals and recording how many times each is caught in a trap (from which it is then released back into the wild). The dataset presented in Table 3 consists of the numbers of times each of n = 334 wood mice were caught in a particular trap (over a two-year time period). The data are also provided in the Minitab file wood-mice.mtw.

Table 3 Numbers of trappings of wood mice

The geometric distribution with parameter p is a good model for these data.
(i) What is the maximum likelihood estimator of p for a geometric model? [1]
(ii) What is the maximum likelihood estimate of p for the data in Table 3? You are recommended to use Minitab to help you to answer this part of the question. [2]

Question 2 - 27 marks
You should be able to answer this question after working through Unit 8.
(a) In this part of the question, you should calculate the required confidence interval by hand, using tables, and show your working. (You may use Minitab to check your answers, if you wish.)

Modern aircraft cockpit windscreens are complex items, comprising several layers of material and a heating system. Such windscreens are replaced upon damage to any of their components. A dataset was collected on the times to replacement of n = 84 windscreens of a particular modern airliner. The sample mean windscreen replacement time was 23 515 hours of flight. The sample standard deviation of windscreen replacement times was 5168 hours of flight.

(i) Obtain an approximate 90% confidence interval for the mean replacement time of this type of aircraft windscreen. What property of the dataset justifies using this type of confidence interval, and why? [6]
(ii) Interpret the particular confidence interval that you found in part (a)(i) in terms of repeated experiments. [3]
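
The interval in part (a)(i) can be checked with a short sketch, assuming the usual large-sample z-interval $\bar{x} \pm z\, s/\sqrt{n}$. The variable names are ours, and a hand calculation that takes the quantile from tables (for example 1.645) will agree to within rounding.

```python
from math import sqrt
from statistics import NormalDist

n = 84            # number of windscreens
mean = 23515.0    # sample mean replacement time (hours of flight)
sd = 5168.0       # sample standard deviation (hours of flight)

# A 90% two-sided interval uses the 0.95-quantile of the standard normal.
z = NormalDist().inv_cdf(0.95)
half_width = z * sd / sqrt(n)

lower, upper = mean - half_width, mean + half_width
print(f"approximate 90% CI: ({lower:.0f}, {upper:.0f}) hours of flight")
```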

(b) In this part of the question, you should calculate the required confidence interval by hand, using tables, and show your working. (You may use Minitab to check your answers, if you wish.)
In a large study of patients who were being treated for hypertension (high blood pressure), 148 out of 5493 patients receiving the conventional treatment for hypertension later suffered a stroke. Also, 192 out of 5492 patients receiving an alternative drug to treat their hypertension later suffered a stroke.

(i) Obtain an approximate 95% confidence interval for the difference in proportions between the number of conventionally treated hypertension patients who later suffered a stroke and the number of hypertension patients treated with the alternative drug who later suffered a stroke. (You are advised to work with proportions rounded to four decimal places throughout; also, you may assume that the numbers involved are large enough that your approximation is a good one.) [5]

(ii) Some clinicians had suggested that the proportions of hypertension patients who suffered a stroke would not depend on which treatment they were being given. Are the data consistent with that

(c) In various places in this module, data on the silver content of coins minted in the reign of the twelfth-century Byzantine king Manuel I Comnenus have been considered. The full dataset is in the Minitab file coins.mtw. The dataset includes, amongst others, the values of the silver content of nine coins from the first coinage (variable Coin1) and seven from the fourth coinage (variable Coin4) which was produced a number of years later. (For the purposes of this question, you can ignore the variables Coin2 and Coin3.) In particular, in Activity 8 and Exercise 2 of Computer Book B, it was argued that the silver contents in both the first and the fourth coinages can be assumed to be normally distributed. The question of interest is whether there were differences in the silver content of coins minted early and late in Manuel’s reign. You are about to investigate this question using a two-sample t-interval.

(i) Using Minitab, find either the sample standard deviations of the two variables Coin1 and Coin4, or their sample variances. Hence, check for equality of variances using the rule of thumb given in Subsection 4.4 of Unit 8. [3]

(ii) Whatever the outcome of part (c)(i), use Minitab to obtain a 90% two-sample t-interval for the difference E(X1) − E(X4) where X1 denotes the mean silver content in coins of the first coinage and X4 denotes the mean silver content in coins of the fourth coinage.

State that interval and comment briefly on what it tells us about the silver content of coins in the earlier and later coinages. [3]

(iii) Name the distribution used in constructing the confidence interval in part (c)(ii), state the value of its parameter and show why the parameter takes the value that it does. [2]
(iv) What would have been the outcome if you had obtained a 90% two-sample t-interval for E(X4) − E(X1) instead of for E(X1) − E(X4)? Justify your conclusion in terms of the derivative of the parameter transformation involved. [3]


## The Open University Statistics - M248 TMA 03

Please read the Student guidance for preparing and submitting TMAs on the M248 website before beginning work on a TMA. You can submit a TMA either by post or electronically using the University’s online TMA/EMA service.

You are advised to look at the general advice on answering TMAs provided on the M248 website.
Each TMA is marked out of 50. The marks allocated to each part of each question are indicated in brackets in the margin. Your overall score for each TMA will be the sum of your marks for these questions.

Note that the Minitab files that you require for TMA 03 are not part of the M248 data files and must be downloaded from the ‘Assessment’ area of the M248 website.

Question 1, which covers topics in Unit 5, and Question 2, which covers topics in Unit 6, form M248 TMA 03. Question 1 is marked out of 28; Question 2 is marked out of 22.

Question 1 - 28 marks
You should be able to answer this question after working through Unit 5.
(a) In this part of the question, you should calculate the required probabilities without using Minitab, and show your working. (You may use Minitab to check your answers, if you wish.)

In England, the most serious emergency calls requesting an ambulance are classified as ‘Red 1’. According to data from NHS England, in March 2017, the London Ambulance Service (LAS) received a total of 1597 Red 1 emergency calls. Based on this number and adjusting for variations during the day, suppose that Red 1 calls arriving at LAS in daylight hours may be modelled as a Poisson process with rate 3 per hour.

(i) (1) Write down the distribution of the number of Red 1 calls arriving at the LAS in a 30-minute period during daylight hours, including the values of any parameters. [2]

(2) Calculate and report the probability that three Red 1 calls arrive at the LAS in 30 minutes during daylight hours. [2]

(ii) (1) Write down the distribution of the waiting time (in hours) between the arrival of two successive Red 1 calls at the LAS during daylight hours, including the values of any parameters. [2]

(2) Calculate and report the probability that the gap between the arrival of two successive Red 1 calls at the LAS during daylight hours will exceed 20 minutes. [4]
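
The two models in part (a) can be illustrated with the sketch below. It assumes the standard Poisson process results: the count in an interval of length t is Poisson with mean (rate × t), and the waiting time between events is exponential with the same rate. The variable names are ours.

```python
from math import exp, factorial

rate_per_hour = 3.0   # Red 1 calls per hour in daylight (from the question)

# (i) Number of calls in 30 minutes: Poisson with mean rate * t.
t_hours = 0.5
mu = rate_per_hour * t_hours
prob_three_calls = exp(-mu) * mu**3 / factorial(3)

# (ii) Waiting time T between calls is exponential: P(T > t) = exp(-rate * t).
gap_hours = 20 / 60
prob_gap_exceeds = exp(-rate_per_hour * gap_hours)

print(f"P(3 calls in 30 minutes) = {prob_three_calls:.4f}")
print(f"P(gap > 20 minutes)      = {prob_gap_exceeds:.4f}")
```

The 20-minute gap is converted to hours so that it matches the units of the rate.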

(b) This part of the question concerns data on the lengths of the 51 time intervals (in days) between successive earthquakes in California starting from a major earthquake on 9 January 1857, up to an earthquake on 24 August 2014. (To qualify for inclusion in this dataset, earthquakes had to be single mainshocks with magnitude of at least 4.9.) These time intervals are in the variable Interval in the worksheet california-earthquakes.mtw. In this part of the question, you will explore whether or not a Poisson process is a suitable model for these data.

(i) The intervals between successive events in a Poisson process are exponentially distributed. Using Minitab, find the mean and standard deviation of the intervals between earthquakes in California. Are these values consistent with the data being observations from an exponential distribution? Give a reason for your answer. [3]

(ii) Using Minitab, obtain a histogram with the following properties:
• the ticks on the horizontal axis are at the cutpoints
• the bins have width 500 days
• the first bin starts at 0 days and the last bin finishes at 7500 days.
Include a copy of your histogram in your answer. Is the shape of the histogram consistent with the data being observations from an

(iii) The data are listed in the order in which they arose. Using Minitab, produce an appropriate graph to investigate whether, for the period of observation, the data are consistent with the rate at which earthquakes occur in California remaining constant. Include a copy of your graph in your answer. On the basis of your graph, explain whether or not you think that the rate at which earthquakes occur in California remained constant over the course of the period studied. If you think that the rate did not remain constant, then say how you think it changed. [6]

(c) A certain form of ‘triangular’ distribution has c.d.f.
$F(x) = 1 - (1 - x)^2, \quad 0 < x < 1,$
which is plotted in Figure 1 below. (It is called a triangular distribution because its p.d.f. is a line which, together with the axes, forms a triangle.)

(i) Calculate the value of the upper quartile for this distribution. [3]
(ii) On a copy of, or very rough sketch based on, Figure 1, show the values of α and its corresponding quantile qα for the upper quartile that you calculated in part (c)(i). [2]
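
Inverting the c.d.f. gives the quantile function, which the sketch below illustrates. Solving $1 - (1 - x)^2 = \alpha$ gives $x = 1 \pm \sqrt{1 - \alpha}$, and only the minus root lies in (0, 1). The function names are ours.

```python
from math import sqrt

def cdf(x: float) -> float:
    """C.d.f. of the triangular distribution on (0, 1) given in part (c)."""
    return 1 - (1 - x) ** 2

def quantile(alpha: float) -> float:
    """alpha-quantile: the root of cdf(x) = alpha that lies in (0, 1)."""
    return 1 - sqrt(1 - alpha)

upper_quartile = quantile(0.75)
print(f"upper quartile = {upper_quartile}, F(upper quartile) = {cdf(upper_quartile)}")
```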

Question 2
You should be able to answer this question after working through Unit 6.
(a) In this part of the question, you should calculate the required probabilities using tables, and not Minitab, and show your working.

A model for normal human body temperature, X, when measured orally in °F, is that it is normally distributed, $X \sim N(98.2, 0.5184)$.
(i) According to the model, what proportion of people have a normal body temperature of 99 °F or more? [3]
(ii) Find the normal body temperature such that, according to the model, only 10% of people have a lower normal body temperature. [2]
(iii) Let W denote normal human body temperature, when measured orally in °C. Given that $W = \frac{5}{9}(X - 32)$ and that $X \sim N(98.2, 0.5184)$, what is the distribution of W? [3]
(iv) According to the model you just derived for W, what proportion of people have a normal body temperature of between 36 °C and 36.8 °C? [4]
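
The question asks for table-based working, but the answers can be checked with the sketch below. It assumes that the second parameter of N(98.2, 0.5184) is the variance, as is conventional in this notation, and that a linear function of a normal variable is again normal. Table look-ups round the z-values, so hand answers may differ in the third or fourth decimal place.

```python
from statistics import NormalDist

# X ~ N(98.2, 0.5184): NormalDist takes the standard deviation, not the variance.
X = NormalDist(mu=98.2, sigma=0.5184 ** 0.5)

p_99_or_more = 1 - X.cdf(99)          # part (i)
tenth_percentile = X.inv_cdf(0.10)    # part (ii)

# Part (iii): W = (5/9)(X - 32) is a linear transformation of X.
scale = 5 / 9
W = NormalDist(mu=scale * (X.mean - 32), sigma=scale * X.stdev)

p_between = W.cdf(36.8) - W.cdf(36)   # part (iv)

print(f"P(X >= 99) = {p_99_or_more:.4f}")
print(f"0.1-quantile of X = {tenth_percentile:.2f} degrees F")
print(f"W: mean = {W.mean:.4f}, variance = {W.variance:.4f}")
print(f"P(36 < W < 36.8) = {p_between:.4f}")
```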

(b) The Minitab file body-temperature.mtw contains values of the normal body temperature, measured orally, of n = 130 people. The model for normal human body temperature used in part (a) of this question was obtained partly by consideration of these data. The data can be used to check whether or not the assumption of normality of normal human body temperature is appropriate. Suggest a suitable graph to investigate specifically whether or not a normal distribution might be a good model for the normal body temperature of people, measured orally. Using Minitab, produce this graph. Include a copy of your graph in your answer. On the basis of this graph, do you think that a normal distribution is a plausible model for

(c) Suppose that the mean weight of a particular type of ripe tomato is 155 g and the variance of the weight of this type of ripe tomato is 576 g². A random sample of n = 36 such ripe tomatoes is obtained.
(i) What is the approximate distribution of the sample mean weight of the random sample of 36 ripe tomatoes? [2]

(ii) Use Minitab to find the probability that the sample mean weight of the sample of 36 ripe tomatoes lies between 150 g and 157.5 g. To show that you used Minitab, write down the results of any intermediate calculations you make in Minitab to the same number of decimal places as given by Minitab. [3]
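
The question requires Minitab, but the same calculation can be sketched in Python for comparison. It assumes the central limit theorem approximation, under which the sample mean is approximately normal with mean µ and variance σ²/n. The variable names are ours.

```python
from statistics import NormalDist

mu = 155.0        # mean weight of a ripe tomato (g)
variance = 576.0  # variance of the weight (g^2)
n = 36            # sample size

# Approximate distribution of the sample mean: N(mu, variance / n).
sample_mean = NormalDist(mu=mu, sigma=(variance / n) ** 0.5)

prob = sample_mean.cdf(157.5) - sample_mean.cdf(150)
print(f"sd of sample mean = {sample_mean.stdev} g")
print(f"P(150 < sample mean < 157.5) = {prob:.4f}")
```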


## The Open University Statistics TMA02

Question 1 - 27 marks
You should be able to answer this question after working through Unit 3.
(a) In 1986, the US Space Shuttle Challenger tragically exploded in flight. This accident was caused by the catastrophic failure of rubber ‘O-ring’ seals that linked segments of its rocket boosters together. There were six O-ring seals in Challenger (and all other Space Shuttles at the time). Table 1 shows the numbers of O-ring seal failures that had occurred on each of 23 previous Space Shuttle flights.

Table 1 Number of O-ring seal failures

(i) Let p be the probability that an O-ring seal fails on a flight. What distribution is appropriate to describe the failure or non-failure of a particular O-ring seal on a particular flight? (Ensure that you define the corresponding random variable appropriately.)

(ii) A reasonable estimate of p is $3/46 \simeq 0.065$. Explain where this number comes from.

(iii) It is suggested that an appropriate model for the number of O-ring seals that fail on a particular flight might be a binomial distribution B(6, p). What assumptions are made by using this model? In your opinion, is a binomial model appropriate? Briefly justify your answer.

(iv) Use Minitab to obtain a table containing both the p.m.f. and c.d.f. of the B(6, p) distribution with p = 0.065. (Do not change the number of decimal places of the values obtained from those provided by Minitab.)

(v) Use the information in Table 1 and the solution to part (a)(iv) to complete the following table, giving your values rounded to three decimal places.

Comment briefly on how close the observed proportions of flights on which 0, 1, 2, ..., 6 O-ring seals failed are to those predicted by the binomial model. What does this suggest about the appropriateness, or otherwise, of the binomial model?

(b) Records show that 6% of blood samples tested for a certain condition test positive. Assuming that whether or not a blood sample tests positive is independent of whether or not any other blood sample tests positive, calculate by hand the following probabilities correct to four decimal places. In each case, state clearly the probability model that you use (including the values of any parameters) and show your working.

(i) The probability that, out of 20 samples tested, at least three will test positive. [6]
(ii) The probability that the first blood sample that tests positive tomorrow will be the ninth sample tested. [3]

(c) The number of flaws in a fibre optic cable follows a Poisson distribution with parameter λ = 1.25. Calculate by hand the probability that there are two or fewer flaws in such a fibre optic cable, giving your answer correct to three decimal places. Show your working.
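
Parts (b) and (c) ask for hand calculations. The sketch below is only a way to check them, under these assumed models: a binomial B(20, 0.06) for the number of positives among 20 samples, a geometric distribution with parameter 0.06 (counting trials up to and including the first positive) for part (b)(ii), and the stated Poisson(1.25) for part (c). The variable names are ours.

```python
from math import comb, exp, factorial

p = 0.06   # probability that a blood sample tests positive

# (b)(i) X ~ B(20, p): P(X >= 3) = 1 - P(X <= 2).
n = 20
prob_at_least_three = 1 - sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3)
)

# (b)(ii) Geometric: eight negatives followed by a positive on the ninth test.
prob_first_positive_ninth = (1 - p) ** 8 * p

# (c) Y ~ Poisson(1.25): P(Y <= 2).
lam = 1.25
prob_two_or_fewer = sum(exp(-lam) * lam**k / factorial(k) for k in range(3))

print(f"P(at least 3 positives in 20)    = {prob_at_least_three:.4f}")
print(f"P(first positive is ninth tested) = {prob_first_positive_ninth:.4f}")
print(f"P(two or fewer flaws)             = {prob_two_or_fewer:.3f}")
```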

Question 2
You should be able to answer this question after working through Unit 4.
(a) In Question 2(b) of TMA 01, the probability mass function of a discrete random variable X representing the number of bicycles available at a docking station each morning was introduced. This p.m.f. is repeated here, in Table 2.

(i) What is the mean number of bicycles available at the docking station each morning? [2]
(ii) What is the variance of the number of bicycles available at the docking station each morning?

(b) The Atacama Desert in Chile is known as the driest place on Earth. Suppose that in one part of the Atacama Desert, whether it rains at all in a given year has probability 0.2 and that whether or not it rains in one year is independent of whether or not it rains in any other year.

Answer the following questions, in each case stating clearly the probability model that you use (including the values of any parameters).

(i) Suppose that a random variable X is defined to take the value 1 when there is rainfall in a particular year and 0 when there is not.
What is the mean of the random variable X? [2]
(ii) What is the expected number of years with some rainfall in a period of 100 years? [2]
(iii) What is the expected value of the number of years up to and including the first year in which there is some rainfall? [2]

(c) A scout on a camping trip is requested to find some dry sticks of length at most one metre to use as firewood. A model for the distribution of the lengths, X, in metres, of sticks that she brings back to the camp has probability density function
$f(x) = \frac{3}{2}\sqrt{x}, \quad 0 < x < 1.$
(i) According to the model, what is the mean length of the sticks that the scout brings back? [3]
(ii) According to the model, what is the standard deviation of the lengths of the sticks that the scout brings back? [5]
(iii) What are the units of the mean and the standard deviation that you have just calculated? [1]
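
The moments needed in part (c) follow from one integral: for $f(x) = \frac{3}{2}\sqrt{x}$ on (0, 1), $E(X^k) = \int_0^1 \frac{3}{2} x^{k + 1/2}\,dx = 3/(2k + 3)$. The sketch below evaluates this with exact fractions; the helper `moment` is our own illustrative function.

```python
from fractions import Fraction
from math import sqrt

def moment(k: int) -> Fraction:
    """E(X^k) for the p.d.f. (3/2) sqrt(x) on (0, 1), namely 3 / (2k + 3)."""
    return Fraction(3, 2 * k + 3)

total_probability = moment(0)       # should equal 1 for a valid p.d.f.
mean = moment(1)
variance = moment(2) - mean ** 2    # V(X) = E(X^2) - (E(X))^2
std_dev = sqrt(variance)

print(f"total probability = {total_probability}")
print(f"mean = {mean} m, variance = {variance} m^2, sd = {std_dev:.4f} m")
```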

(d) The number of customers buying a cooked breakfast at a high-street cafe on a weekday morning is a random variable X with mean 24 and variance 36. As a loss leader (a product sold at a loss to attract customers to also partake of its other offerings), the cafe charges £3.50 per breakfast and has fixed breakfast-specific daily costs (ingredients, labour) of £105. Let Y be the cafe’s daily loss on breakfasts, in pounds, where $Y = 105 - 3.5X$. What are the mean and standard deviation of this loss? [3]
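
For a linear function Y = a + bX, the mean is a + bE(X) and the standard deviation is |b| times the standard deviation of X. A minimal sketch applying these two rules to part (d), with variable names of our choosing:

```python
mean_x = 24.0   # mean number of breakfasts sold
var_x = 36.0    # variance of the number of breakfasts sold

a, b = 105.0, -3.5   # Y = a + b X, the daily loss in pounds

mean_y = a + b * mean_x
sd_y = abs(b) * var_x ** 0.5   # the additive constant does not affect the spread

print(f"mean loss = {mean_y} pounds, sd of loss = {sd_y} pounds")
```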


## Complete the following paragraph by selecting words

You should be able to answer this question after working through Unit 2.
(a) Complete the following paragraph by selecting words or phrases from the list that follows it to fill in the underlined gaps.

In a long sequence of repetitions of a study or experiment, random samples tend to settle down towards probability distributions in the sense that, for discrete data, bar charts settle down towards probability ______ functions and, for continuous data, histograms settle down towards probability ______ functions. As the sample size increases, the amount of difference between successive graphical displays obtained from the data ______.

Available words and phrases: continuous, cumulative, decreases, density, discrete, frequency, increases, mass, model, models, relative frequency, remains constant, unimodal, unit-area [3]

(b) Kevin lives in a city which operates a bicycle hire scheme using a large number of bicycle ‘docking stations’ spread around the city. He walks past a small docking station, for up to six bicycles, each morning. Kevin has come up with the following probability mass function (p.m.f.) for the distribution of the random variable X which denotes the number of bicycles available at the docking station each morning.
It is given in Table 1.

Table 1 The p.m.f. of X

| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| p(x) | 0.3 | 0.2 | 0.2 | 0.1 | 0.1 | 0.05 | 0.05 |

(i) What is the range of X? [1]
(ii) Explain why the p.m.f. suggested by Kevin is a valid p.m.f. [2]
(iii) What is the probability that, on any particular morning, there is one bicycle at the docking station? [1]
(iv) Write down a table containing values of F(x), the cumulative distribution function (c.d.f.) of X, for x = 0, 1, 2, 3, 4, 5, 6. [2]
(v) Write the probabilities P(X < 3) and P(X ≥ 5) in terms of the c.d.f. F(x). Use the c.d.f. to calculate the values of these two probabilities.
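
The c.d.f. is the running total of the p.m.f., which the sketch below illustrates using the values in Table 1. Exact fractions are used to avoid floating-point rounding in the sums; the variable names are ours.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5, 6]
probs = [Fraction(s) for s in ("0.3", "0.2", "0.2", "0.1", "0.1", "0.05", "0.05")]

# F(x) = P(X <= x) is the cumulative sum of p(x).
cdf = dict(zip(values, accumulate(probs)))

# X is integer-valued, so P(X < 3) = F(2) and P(X >= 5) = 1 - F(4).
p_less_than_3 = cdf[2]
p_at_least_5 = 1 - cdf[4]

for x in values:
    print(f"F({x}) = {float(cdf[x]):.2f}")
print(f"P(X < 3) = {float(p_less_than_3)}, P(X >= 5) = {float(p_at_least_5)}")
```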

(c) In 1955, C.W. Topp and F.C. Leone introduced a number of distributions in the context of the statistical modelling of the reliability of electronic components in engineering. One of these distributions has probability density function (p.d.f.) given by $f(x) = 4x(1 - x)(2 - x)$ on the range $0 < x < 1$.
(i) Verify, by integration, that $\int 4x(1 - x)(2 - x)\,dx = x^2(2 - x)^2 + c$, where $c$ is an arbitrary constant.

(ii) Explain why the p.d.f. suggested by Topp and Leone is a valid p.d.f. [4]
(iii) What is the c.d.f. associated with this p.d.f.? [2]
(iv) Suppose that X is a random variable following this p.d.f., and that we are interested in evaluating $P(1/3 < X < 2/3)$. Write this probability in terms of the c.d.f., and hence show that $P(1/3 < X < 2/3) = 39/81$ (which is approximately 0.481).
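
The final probability can be confirmed with exact arithmetic. The sketch below assumes the c.d.f. $F(x) = x^2(2 - x)^2$ on (0, 1), which follows from the antiderivative in part (c)(i) with F(0) = 0; the function name is ours.

```python
from fractions import Fraction

def cdf(x: Fraction) -> Fraction:
    """C.d.f. on (0, 1) for the p.d.f. 4x(1 - x)(2 - x)."""
    return x**2 * (2 - x) ** 2

# P(1/3 < X < 2/3) = F(2/3) - F(1/3)
prob = cdf(Fraction(2, 3)) - cdf(Fraction(1, 3))

print(f"F(1) = {cdf(Fraction(1))}")                  # total probability
print(f"P(1/3 < X < 2/3) = {prob} = {float(prob):.3f}")
```

Python reduces the fraction to lowest terms, so the result prints as 13/27, which equals 39/81.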