## Submit this project to the proctor at the time you take Test 3.

When the problem involves hypothesis testing, use the following structure for written reports.

# Hypothesis testing steps

• Step 1: State the hypotheses.
• Step 3: Give the value of the test statistic and the p-value.
• Step 4: Use the p-value to draw a conclusion. State the conclusion in statistical
terms: Reject Ho in favor of Ha, or retain Ho (fail to reject Ho).
• Step 5: State the conclusion in layman's terms and in the context of the application. Use the
p-value to state the strength of the evidence.
When a significance level is not given, use the following guidelines and language associated
with the p-value. Note that the lower the p-value, the stronger the evidence against Ho and in
favor of Ha. We go from insufficient evidence, to some evidence, to fairly strong evidence, to
strong evidence, to very strong evidence.
• p-value > .10
retain Ho – there is insufficient evidence to reject Ho in favor of Ha
• .05 < p-value ≤ .10
gray area -- decision to reject Ho or retain Ho is up to the investigators – there is some
evidence against Ho and in support of Ha
• .01 < p-value ≤ .05
reject Ho in favor of Ha – there is fairly strong evidence against Ho and in favor of Ha
• .001 < p-value ≤ .01
reject Ho in favor of Ha – there is strong evidence against Ho and in favor of Ha
• p-value ≤ .001
reject Ho in favor of Ha – there is very strong evidence against Ho and in favor of Ha
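
The guidelines above amount to a simple lookup from p-value to decision and wording. A minimal Python sketch follows; the function name `evidence_language` and the returned phrases are illustrative choices, not part of the assignment.

```python
def evidence_language(p_value):
    """Map a p-value to (decision, strength of evidence against Ho)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value > 0.10:
        return "retain Ho", "insufficient evidence"
    if p_value > 0.05:
        # Gray area: the decision is left to the investigators.
        return "gray area", "some evidence"
    if p_value > 0.01:
        return "reject Ho", "fairly strong evidence"
    if p_value > 0.001:
        return "reject Ho", "strong evidence"
    return "reject Ho", "very strong evidence"


# Example p-values chosen only to exercise each band.
for p in (0.23, 0.07, 0.03, 0.004, 0.0002):
    decision, strength = evidence_language(p)
    print(f"p = {p}: {decision} ({strength} against Ho)")
```

The boundaries follow the list exactly: a p-value equal to .10, .05, .01, or .001 falls in the band whose upper limit it is.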
Use your TI-83/TI-84 calculator for all of these problems. You will not need any tables.
Use the Sample Test 3 Questions—Answer Key (posted in Canvas) as an example of what my
expectations are.

1. A sociologist suspects that, for married couples with young children, the husbands watch more TV
than the wives. Twenty married couples are randomly selected and their weekly viewing times, in
hours, are recorded in the table below. Assume the population of differences between husband’s
and wife’s TV time is mound-shaped and symmetrical.
a) Do the sample results provide sufficient evidence to support the sociologist’s claim? Perform a
hypothesis test to find out.
b) If there is sufficient evidence to support the sociologist’s claim, estimate how much more TV the
husbands watch, on average, with a 95% confidence interval. Interpret.

2. The data below show the sugar content (as a percentage of weight) of several national brands of
children’s and adults’ cereals. Assume the distributions of sugar content in both children’s cereals
and adults’ cereals are mound-shaped and symmetrical.
a) Does the sample data provide sufficient evidence to conclude that the sugar content in
children’s cereals is higher than that in adults’ cereals, on average? Perform a hypothesis test to
find out.
b) If you conclude that children’s cereals have more sugar than adults’ cereals, estimate how much
more with a 95% confidence interval for the difference in mean sugar content. Interpret.
Children’s cereals: 40.3, 55, 45.7, 43.3, 50.3, 45.9, 53.5, 43, 44.2, 44, 47.4, 44, 33.6, 55.1, 48.8,
50.4, 37.8, 60.3, 46.6
Adults’ cereals: 20, 30.2, 2.2, 7.5, 4.4, 22.2, 16.6, 14.5, 21.4, 3.3, 6.6, 7.8, 10.6, 16.2, 14.5, 4.1,
15.8, 4.1, 2.4, 3.5, 8.5, 10, 1, 4.4, 1.3, 8.1, 4.7, 18.4
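
As a cross-check on the calculator output for the cereal data, the two-sample t statistic can be sketched in Python. This sketch assumes the unpooled (Welch) form of the test, which may or may not be the form the course expects; the function name `welch_t` is illustrative.

```python
import math
import statistics

children = [40.3, 55, 45.7, 43.3, 50.3, 45.9, 53.5, 43, 44.2, 44, 47.4, 44,
            33.6, 55.1, 48.8, 50.4, 37.8, 60.3, 46.6]
adults = [20, 30.2, 2.2, 7.5, 4.4, 22.2, 16.6, 14.5, 21.4, 3.3, 6.6, 7.8,
          10.6, 16.2, 14.5, 4.1, 15.8, 4.1, 2.4, 3.5, 8.5, 10, 1, 4.4, 1.3,
          8.1, 4.7, 18.4]


def welch_t(sample1, sample2):
    """Return (t, df) for the unpooled two-sample t statistic (sample1 minus sample2)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t_stat, df = welch_t(children, adults)
diff = statistics.mean(children) - statistics.mean(adults)
print(f"mean difference = {diff:.2f}")
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The p-value and confidence interval still come from the t distribution with these degrees of freedom, for example via the calculator.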
3. A randomly selected sample of entering college freshmen has participated in a special program to
enhance their academic abilities, and their GPAs at the end of one year have been recorded. A
group of 20 students from the same class who did not participate in the program has been selected
as a control group, and they have been matched with the experimental group by gender, age, high school class rank, ACT scores, and declared major. The results (GPAs) are presented below. Assume
the population of differences between the project student GPA and the control group student GPA
is mound-shaped and symmetrical.
a) Can the program claim that it was successful? Carry out a hypothesis test to find out.
b) If you conclude that the program was successful, make a judgment regarding the size of the
effect of program participation on student GPAs by constructing a 95% confidence interval.

4. Michelle Sayther is a fashion design artist who designs the display windows in front of a large
clothing store in New York City. Electronic counters at the entrances total the number of people
entering the store each business day. Before Michelle was hired by the store, the mean number of
people entering the store each day was 3218. Management would like to investigate whether this
number has changed since Michelle has started working. A random sample of 42 business days after
Michelle began work gave an average of x̄ = 3392 people entering the store each day. The sample
standard deviation was s = 287 people. Assume the population of daily number of people entering
the store is mound-shaped and symmetrical.
a) Perform a hypothesis test to decide if the average number of people entering the store each day
since Michelle was hired is different from what it was before Michelle was hired.
b) If you find that the average number of people entering the store each day since Michelle was
hired is different from what it was before Michelle was hired, estimate the average number of
people entering the store each day since Michelle was hired with a 95% confidence interval and
interpret. (Has the number of people entering the store each day increased or decreased since
Michelle was hired, and by how much has it increased or decreased?)
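
The summary statistics in this problem (n = 42, x̄ = 3392, s = 287, hypothesized mean 3218) are enough to sketch the one-sample t test and the 95% interval without a calculator. The sketch below uses only the Python standard library and integrates the t density numerically; the helper names `t_pdf`, `t_central_area`, and `t_critical` are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_area(t, df, steps=2000):
    """Area under the t density between -|t| and |t| (composite Simpson's rule)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(df, conf=0.95):
    """Find t* with central area conf by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_central_area(mid, df) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, xbar, s, mu0 = 42, 3392, 287, 3218
se = s / math.sqrt(n)                           # standard error of the mean
t_stat = (xbar - mu0) / se                      # test statistic
p_value = 1 - t_central_area(t_stat, n - 1)     # two-sided p-value
t_star = t_critical(n - 1)
ci_low, ci_high = xbar - t_star * se, xbar + t_star * se

print(f"t = {t_stat:.3f}, two-sided p-value = {p_value:.5f}")
print(f"95% CI: ({ci_low:.1f}, {ci_high:.1f})")
```

The two-sided p-value matches the "is different from" wording of part a). Comparing the interval with 3218 indicates the direction and rough size of the change.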
5. An experiment was conducted to evaluate the effectiveness of a treatment for tapeworm in the
stomachs of sheep. A random sample of 24 worm-infected lambs of approximately the same age
and health was randomly divided into two groups. Twelve of the lambs were injected with the drug
and the remaining twelve were left untreated. After a 6-month period, the lambs were slaughtered
and the following worm counts were recorded. Assume the distribution of worm counts of drug-treated sheep is mound-shaped and symmetrical. Assume the distribution of worm counts of
untreated sheep is also mound-shaped and symmetrical.
a) Does the sample data provide sufficient evidence to conclude that the treatment is effective in
reducing the occurrence of tapeworm in sheep? Perform a test of significance to find out.
b) If you conclude that the treatment is effective, estimate the average reduction in tapeworm
count with a 95% confidence interval. Interpret.

6. In each of the problems above, #1 - #5, an assumption of normality is made about the distribution of
the population(s) from which the sample data is obtained. For each of #1 - #5, provide the page
number in the e-book where the assumption is described by the author. You will be citing page
numbers from Sections 9.2, 10.1, and 10.2.
7. A study of the health behavior of school-aged children asked a sample of 15-year-olds in several
different countries if they had been drunk at least twice. The results are shown in the table, by
gender. (Health and Health Behavior Among Young People. Copenhagen. World Health
Organization, 2000)
a) Perform a hypothesis test to determine if there is a gender effect. That is, is there a difference
in the average percent of 15-year-old males who have been drunk at least twice and the average
percent of 15-year-old females who have been drunk at least twice? Assume the distributions
for both males and females are mound-shaped and symmetrical.
b) If there is sufficient evidence that there is a difference between average percent of 15-year-old
males who have been drunk at least twice and the average percent of 15-year-old females who
have been drunk at least twice, estimate the difference with a 95% confidence interval and


## MyMathLab Binomial Probability Questions

Qn22.
Write the binomial probability in words. Then, use a continuity correction to convert the binomial probability to a normal distribution probability.
P(x = 81)

Write the probability in words.
Which of the following is the normal probability statement that corresponds to the binomial probability statement?
a. P(x < 81.5)
b. P(x > 81.5)
c. P(80.5 < x < 81.5)
d. P(x > 80.5)
e. P(x < 80.5)
Qn23.
A student answers all 48 questions on a multiple-choice test by guessing. Each question has four possible answers, only one of which is correct. Find the probability that the student gets exactly 15 correct answers. Use the normal distribution to approximate the binomial distribution.

``````
0.0823
0.8577
0.0606
0.7967
``````
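
For Qn22 and Qn23, the continuity correction turns the binomial statement P(x = k) into the normal statement P(k - 0.5 < x < k + 0.5), with mean np and standard deviation sqrt(np(1 - p)). A minimal Python sketch for the guessing example (n = 48, p = 0.25, k = 15) follows; the function names are illustrative.

```python
import math
from statistics import NormalDist


def normal_approx_exactly(k, n, p):
    """Approximate binomial P(x = k) by normal P(k - 0.5 < x < k + 0.5)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    dist = NormalDist(mu, sigma)
    return dist.cdf(k + 0.5) - dist.cdf(k - 0.5)


def binomial_exactly(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p, k = 48, 0.25, 15
approx = normal_approx_exactly(k, n, p)
exact = binomial_exactly(k, n, p)

# A printed z-table forces z to two decimals, which shifts the answer slightly.
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
z_hi = round((k + 0.5 - mu) / sigma, 2)
z_lo = round((k - 0.5 - mu) / sigma, 2)
table_style = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)

print(f"normal approximation: {approx:.4f}")
print(f"table-style (rounded z): {table_style:.4f}")
print(f"exact binomial: {exact:.4f}")
```

The unrounded and table-style results differ in the third decimal place, so a multiple-choice key built from a z-table may not match software output exactly.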

## Planned comparisons between domestic and international bookings

Qn11. Download the file bookflights.csv from the course materials. This file describes a survey in which website visitors booked a flight on either Expedia, Orbitz, or Priceline. Whether they booked a domestic or international flight was recorded. The survey response was a 1-7 rating for Ease on a Likert-type scale, with “7” being easiest. The research question is which site felt easiest to use overall, and specifically for domestic vs. international bookings. How many subjects took part in this study?

Qn12. Create an interaction plot with Website on the X-axis and International as the traces. How many times, if any, do the two traces cross? Hint: if you already recoded Ease as an ordinal response, you must use as.numeric when passing it to interaction.plot.

Qn13. Use ordinal logistic regression to examine Ease by Website and International. To the nearest ten-thousandth (four digits), what is the p-value of the Website main effect? Hint: Use the MASS library and its polr function with Hess = TRUE to create the ordinal logistic model. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for both Website and International.

Qn14. Conduct three planned comparisons between domestic and international bookings for each website. Adjust for multiple comparisons using Holm’s sequential Bonferroni procedure. What is the highest p-value from such tests? Hint: use the multcomp and lsmeans libraries and the lsmeans, pairs, and as.glht functions. (The lsm formulation from within glht will not work in this case.) Because we only have three planned pairwise comparisons, use "none" for the multiple comparisons adjustment to avoid correcting for all possible pairwise comparisons. Instead, just find the three planned and as-yet uncorrected p-values and pass them manually to p.adjust with method="holm". Since the formulation for simultaneous comparisons is a bit different, we place the code for those aspects of this question here:

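``````
# Unadjusted pairwise comparisons via as.glht; m is the polr model fitted for Qn13.
summary(as.glht(pairs(lsmeans(m, pairwise ~ Website * International))), test = adjusted(type = "none"))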
``````
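
Holm's sequential Bonferroni procedure, which the hint applies through p.adjust, can be sketched in a few lines. The Python version below is a stand-in for illustration only; the three p-values are hypothetical and are not results from bookflights.csv.

```python
def holm_adjust(p_values):
    """Holm's sequential Bonferroni adjustment, returned in the input order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values are never allowed to decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


planned = [0.010, 0.040, 0.030]  # hypothetical uncorrected p-values
print(holm_adjust(planned))
```

Because only the three planned p-values are passed in, the correction is for three tests rather than for every possible pairwise comparison.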

Qn15. Which of the following conclusions are supported by the analyses we performed on bookflights.csv?

``````
There was a significant main effect of Website on Ease.
There was a significant main effect of International on Ease.
There was a significant Website*International interaction.
Expedia was perceived as significantly easier for booking international flights than domestic flights.
Orbitz was perceived as significantly easier for booking domestic flights than international flights.
Priceline was perceived as significantly easier for booking domestic flights than international flights.

``````

## Doing Factorial ANOVAs

Qn20. Download the file socialvalue.csv from the course materials. This file describes a study of people viewing a positive or negative film clip before going onto social media and then judging the value of the first 100 posts they see there. The number of valued posts was recorded. Examine the data and indicate what kind of experiment design this was.

``````
- A 2x2 between-subjects design with factors for clip (positive, negative) and social (Facebook, Twitter).
- A 2x2 within-subjects design with factors for clip (positive, negative) and social (Facebook, Twitter).
- A 2x2 mixed factorial design with a between-subjects factor for clip (positive, negative) and a within-subjects factor for social (Facebook, Twitter).
- None of the above
``````

Qn21. How many subjects took part in this experiment?
Qn22. To the nearest hundredth (two digits), on average how many posts out of 100 were valued for the most valued combination of Clip and Social?

Qn23. Create an interaction plot with social on the X-axis and clip as the traces. Do the lines cross?

``````
Yes
No
``````

Qn24. Create an interaction plot with clip on the X-axis and social as the traces. Do the lines cross?

``````
Yes
No
``````

Qn25. Conduct a factorial ANOVA to test for any order effects that the presentation order of the Clip factor and/or the Social factor may have had. To the nearest ten-thousandth (four digits), what is the p-value for the ClipOrder main effect? Hint: Use the ez library and its ezANOVA function. Pass both ClipOrder and SocialOrder as the within parameter using a vector created with the “c” function.

Qn26. Conduct a factorial ANOVA on Valued by Clip and Social. To the nearest hundredth (two digits), what is the largest F statistic produced by such a test? Hint: use the ez library and its ezANOVA function. Pass both Clip and Social as the within parameter using a vector created with the “c” function.

Qn27. Conduct two planned pairwise comparisons using paired-samples t-tests. The first question is whether, on Facebook, the number of valued posts was different after people saw a positive film clip versus a negative film clip. The second question is whether, on Twitter, the number of valued posts was different after people saw a positive film clip versus a negative film clip. Assuming equal variances and using Holm’s sequential Bonferroni procedure to correct for multiple comparisons, what to within a ten-thousandth (four digits) is the lowest p-value from these tests? Hint: use the reshape2 library and its dcast function to make a wide-format table with columns for Subject and the combination of Social*Clip, and then do a paired-samples t-test between columns with the same Social level.
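
The reshaping step in the hint (long format to one row per subject, then a paired comparison between two columns) can be sketched outside R as well. The Python sketch below uses a few hypothetical rows, not the contents of socialvalue.csv, and the column naming `Social_Clip` is an assumption made for the illustration.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical long-format rows: (subject, social, clip, valued).
rows = [
    (1, "Facebook", "positive", 60), (1, "Facebook", "negative", 58),
    (2, "Facebook", "positive", 71), (2, "Facebook", "negative", 67),
    (3, "Facebook", "positive", 55), (3, "Facebook", "negative", 49),
]


def to_wide(long_rows):
    """One dict per subject, keyed by 'Social_Clip' column names."""
    wide = defaultdict(dict)
    for subject, social, clip, valued in long_rows:
        wide[subject][f"{social}_{clip}"] = valued
    return dict(wide)


def paired_t(wide, col_a, col_b):
    """Paired-samples t statistic and degrees of freedom for col_a minus col_b."""
    diffs = [row[col_a] - row[col_b] for row in wide.values()]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


wide = to_wide(rows)
t_stat, df = paired_t(wide, "Facebook_positive", "Facebook_negative")
print(f"t({df}) = {t_stat:.4f}")
```

The same call with the two Twitter columns would give the second planned comparison. The two resulting p-values would then be corrected together with Holm's procedure.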

Qn28. Which of the following conclusions are supported by the planned pairwise comparisons just conducted? (Mark all that apply)

``````
On Facebook, people valued significantly more posts after seeing a positive film clip than a negative film clip.
On Facebook, people valued significantly more posts after seeing a negative film clip than a positive film clip.
On Twitter, people valued significantly more posts after seeing a positive film clip than a negative film clip.
On Twitter, people valued significantly more posts after seeing a negative film clip than a positive film clip.
``````

Qn29. Continue using the file socialvalue.csv from the course materials. Conduct a nonparametric Aligned Rank Transform procedure on Valued by Clip and Social. To the nearest hundredth (two digits), what is the largest F statistic produced by this procedure?

Hint: use the ARTool library and its art function with the formula

``````
Valued ~ Clip * Social + (1|Subject)
``````

The above formula indicates that Subject is to be treated as a random effect.

Qn30. Pairwise comparisons among levels of clip and among levels of social could be conducted using the following code, but these are unnecessary after our main effects tests because each of these factors only has two levels.

``````
library(lsmeans)
lsmeans(artlm(m, "Clip"), pairwise ~ Clip)
lsmeans(artlm(m, "Social"), pairwise ~ Social)
``````

True
False

Qn31. Conduct interaction contrasts (i.e., difference-of-differences) to discover whether the difference in the number of valued posts after viewing a negative clip vs. a positive clip on Facebook was itself different from that same difference on Twitter. To the nearest hundredth (two digits), what is the chi-square statistic from such a test? Hint: use the phia library and its testInteractions function with the artlm function.
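
A difference-of-differences is plain arithmetic on the four cell means. The Python sketch below uses hypothetical means, not values from socialvalue.csv, to show what quantity the interaction contrast examines; it does not reproduce the chi-square test itself.

```python
# Hypothetical cell means of valued posts, keyed by (social, clip).
cell_means = {
    ("Facebook", "negative"): 50.0,
    ("Facebook", "positive"): 62.0,
    ("Twitter", "negative"): 48.0,
    ("Twitter", "positive"): 51.0,
}


def interaction_contrast(means, social_a, social_b,
                         clip_a="negative", clip_b="positive"):
    """(clip_a - clip_b within social_a) minus (clip_a - clip_b within social_b)."""
    diff_a = means[(social_a, clip_a)] - means[(social_a, clip_b)]
    diff_b = means[(social_b, clip_a)] - means[(social_b, clip_b)]
    return diff_a - diff_b


contrast = interaction_contrast(cell_means, "Facebook", "Twitter")
print(f"difference of differences = {contrast}")
```

A contrast near zero would mean the clip effect is about the same on both platforms. The test in the question asks whether the contrast differs from zero by more than chance would explain.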

Qn32. The difference in the number of valued posts after people saw negative film clips vs. positive film clips in the Facebook condition is significantly different from that difference in the Twitter condition. An interaction plot makes it clear that the difference in valued posts was much greater in the Facebook condition than in the Twitter condition, with positive film clips resulting in more valued posts.

## Understanding Experiment Designs

1. What might account for random error in an experimental measure?

• Natural variation among and within subjects
• A systematic flaw in the logging software
• A pattern of dropped data for every fifth subject
• Biased observations
2. Which of the following would be an ordinal response? (Mark all that apply)

• Responses on a Likert-type scale
• Height in centimeters of each subject
• Favorite color of each subject
• How spicy each subject prefers their Thai food using 1-5 stars

• The number of heads resulting from one-hundred coin flips
3. In an experiment, factors are the independent variables manipulated by the experimenter, and levels are the specific values a factor can take on.

• True
• False
4. A between-subjects factor is most precisely defined by which of the following characteristics?

• Each subject experiences more than one level of the factor
• Each subject experiences only one level of the factor.
• Each subject experiences all levels of the factor.
• Each subject experiences all but one level of the factor.
• None of the above
5. A within-subjects factor is most precisely defined by which of the following characteristics?

• Each subject experiences more than one level of the factor.
• Each subject experiences only one level of the factor.
• Each subject experiences all levels of the factor.
• Each subject experiences all but one level of the factor.
• None of the above
6. If a given factor has four levels and subjects experience two of the four levels, that factor is most precisely described as:

• A within-subjects factor
• A between-subjects factor
• A partial within-subjects factor
• A partial between-subjects factor
• None of the above
7. Balanced experimental designs are those in which every subject experiences every level of every factor.

• True
• False
8. The most common use of an independent-samples t-test is to examine which of the following?

• One set of subjects that all does the same thing
• One set of subjects that does two different things
• Two sets of subjects that do the exact same thing
• Two sets of subjects that do different things
• None of the above
9. Which of the following is the most proper way to report a t-test result?

• t(14) = 2.76, p = .015
• t(14) = 2.76, p < .05
10. A t-test is a test suited to one factor with two levels.

• True
• False
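
The distinction drawn in questions 4 through 6 depends on how many levels of a factor each subject experiences, which can be checked mechanically from a design table. The Python sketch below assumes the usual terminology (one level per subject is between-subjects, all levels is within-subjects, more than one but not all is partial within-subjects); the function name and the sample designs are hypothetical.

```python
def classify_factor(levels_by_subject, all_levels):
    """Classify a factor from the set of levels each subject experienced."""
    counts = {len(levels) for levels in levels_by_subject.values()}
    total = len(all_levels)
    if counts == {1}:
        return "between-subjects"
    if counts == {total}:
        return "within-subjects"
    if min(counts) > 1:
        return "partial within-subjects"
    return "no single label (exposure differs across subjects)"


levels = {"A", "B", "C", "D"}
between = {"s1": {"A"}, "s2": {"B"}, "s3": {"C"}, "s4": {"D"}}
within = {"s1": set(levels), "s2": set(levels)}
partial = {"s1": {"A", "B"}, "s2": {"C", "D"}, "s3": {"A", "D"}}

for name, design in (("between", between), ("within", within), ("partial", partial)):
    print(name, "->", classify_factor(design, levels))
```

The third design mirrors question 6, where a four-level factor is experienced two levels at a time.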