
STA 9700: Homework 2

Reading Assignment

Read STA 9700 Lecture Notes 2. Read them again and write questions in the margins.
(There is some related material in Kutner, pp. 2-27.)
STA 9708 LN 5 (Expectation and variance of random variables)
        

Questions based on STA 9700 Lecture Notes 2
2.1 Looking at Fig. 2.1 in Lecture Notes 2, we see that there is a general rise in the NetWt of the bags as the Count increases. While the phrase "general rise" is not clearly defined, it is certainly better than the following commonplace description, "Bags with more M&M's are heavier." That statement is far too simplistic!

(a) The data for Fig. 2.1 is shown on pages 15-17 of Lecture Notes 2. Using the data, give several examples of pairs of bags for which the statement "Bags with more M&M's are heavier" is false.

(b) Having shown that not all bags with more M&M's are heavier than all bags with fewer M&M's, consider this next vague description, "The average bag containing 18 M&M's weighs more than the average bag containing 17 M&M's." What is vague about that statement? Hint: which bag is the average bag? What is the definition of the average bag? (That is as hard as defining or locating the average American, which should be easy because we hear about that dude every day on the news.)

(c) Critique this statement: “Since on page 12 the sample slope is 1.276 when regressing net weight on count for the 192 bags, then the sample average for bags with Count=18 must be higher than for Count=17.” Then find a counterexample in the data set itself!

(d) What statement are we struggling to make here about the relationship between the sub-populations of Net Weights and their Count?

2.2 Putting together the BigMM SAS program and the following Proc Reg routine, we can create a SAS program that computes the sample slope, the sample intercept, and the root mean square error for each of the 8 groups of bags of M&M's (there are 24 bags per group), outputs those statistics to a SAS file, and prints the file.

               proc reg outest=LTatum;
                   model NetWt=Count;
                   by Group;
               run;
               proc print data=LTatum; run;

The Proc Reg option "outest=LTatum" instructs SAS to save the regression statistics (or "estimates") into a SAS file named "LTatum." Because of difficulties with SAS, the output is shown below, but I would be delighted if you are able to produce it yourself! The sample slopes are in the Count column.

Obs    Group    _MODEL_    _TYPE_    _DEPVAR_     _RMSE_    Intercept     Count     NetWt

 1        2     MODEL1     PARMS      NetWt      1.52202     25.2154     1.28176     -1
 2        3     MODEL1     PARMS      NetWt      0.94023     27.2769     1.16531     -1
 3        5     MODEL1     PARMS      NetWt      0.96081     17.6571     1.65238     -1
 4        6     MODEL1     PARMS      NetWt      1.01435     19.1121     1.59741     -1
 5        7     MODEL1     PARMS      NetWt      1.53226     26.1459     1.22875     -1
 6        8     MODEL1     PARMS      NetWt      1.09972     28.7744     1.11778     -1
 7        9     MODEL1     PARMS      NetWt      0.99709     22.1760     1.42708     -1
 8       10     MODEL1     PARMS      NetWt      1.10568     26.5912     1.18456     -1

(a) You now have 8 different sample slopes, or 8 different values for b1. These can be viewed as 8 values drawn from what population? (Hint: You need The Story of Many Possible Samples.)
(b) Imagine that for our production run of 10,000 bags of Peanut M&M's that we regressed the 10,000 net weights on their respective 10,000 counts. What would we call the resulting intercept and slope? Show the answer in words and Greek letters.
(c) Using The Story of Many Possible Samples, explain what it would mean to say that b1 is an unbiased estimator.

2.3 Refer to the SAS output on page 12, for the regression using all 192 bags.
(a) Compute the value for count=18.
(b) What is estimated by b1?
(c) How is the value related to ?

Expected Value and Variance Review Questions

2.4 For a roll of a fair die with 4 sides, numbered 1 to 4, find the expected value and the variance.
2.5 Find the probability distribution for the average of two rolls of a fair die with four sides. Then, compute the expected value and variance of the average from the distribution.
2.6 How were the answers to question 2.5 related to those of question 2.4?
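
As a hedged illustration of the mechanics behind questions 2.4 to 2.6, here is a minimal base R sketch that uses a fair six-sided die, deliberately not the four-sided die from the questions. The variable names are my own, and the definitions used are the standard ones: E[X] is the probability-weighted sum of the values, and Var(X) is the probability-weighted sum of squared deviations from E[X].

```r
# Illustrative only: a fair SIX-sided die, not the four-sided die in the questions
faces <- 1:6
probs <- rep(1 / 6, 6)

# Expected value and variance straight from the definitions
ev <- sum(faces * probs)
v  <- sum((faces - ev)^2 * probs)

# Distribution of the average of two independent rolls:
# enumerate all 36 ordered pairs, then add up the probability of each distinct average
avg   <- outer(faces, faces, "+") / 2
joint <- outer(probs, probs)
avg_dist <- tapply(as.vector(joint), as.vector(avg), sum)
avg_vals <- as.numeric(names(avg_dist))

# Expected value and variance of the average, computed from its own distribution
ev_avg  <- sum(avg_vals * avg_dist)
var_avg <- sum((avg_vals - ev_avg)^2 * avg_dist)

c(ev = ev, v = v, ev_avg = ev_avg, var_avg = var_avg)
```

Comparing `ev` with `ev_avg` and `v` with `var_avg` shows the kind of relationship question 2.6 asks about; the same steps apply to the four-sided die after changing `faces` and `probs`.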

2.7 Generic Calculus Questions; warming up to least squares: Find the derivative with respect to x of the following functions:

(a) y = x^2
(b) y = (4x + 3)^2
(c) y = (-3x^2 + x)

2.8 The R function

            lm(y~x) 

will regress y on x, and the function

          summary(lm(y~x)) 

produces output similar to the SAS regression output. For BigMM, see if you can get output with similar values as those given by SAS on page 16. Locate the estimate of the variance of epsilon.
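
A minimal sketch of where these quantities live in the R output, assuming simulated data in place of BigMM (the data, seed, and variable names below are hypothetical and are not the M&M's values). In the `summary()` output, the "Residual standard error" corresponds to what SAS labels Root MSE, and its square is the usual estimate of the variance of epsilon.

```r
# Simulated stand-in data; NOT the BigMM data set
set.seed(1)
x <- rep(15:22, each = 5)                      # hypothetical counts
y <- 24 + 1.3 * x + rnorm(length(x), sd = 1)   # hypothetical net weights

fit <- lm(y ~ x)
s <- summary(fit)

coef(fit)     # sample intercept (b0) and sample slope (b1)
s$sigma       # residual standard error (Root MSE in SAS terms)
s$sigma^2     # estimate of the variance of epsilon (the MSE)

# The same estimate by hand: SSE divided by its degrees of freedom (n - 2)
mse_by_hand <- sum(resid(fit)^2) / df.residual(fit)
```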

Mixed Effects R statistic Quiz

Qn17. Download the file vocab.csv from the course materials. This file describes a study in which 50 recent posts by men and women on social media were analyzed for how many unique words they used, i.e., the size of their operational vocabulary on social media. The research question is how men’s and women’s vocabulary may differ on each of three social media websites. How many subjects took part in this study?

Qn18. Create an interaction plot with Social on the X-axis and Sex as the traces. How many times, if any, do these lines cross?

Qn19. Perform three Kolmogorov-Smirnov goodness-of-fit tests on Vocab, one for each level of Social, using exponential distributions. To the nearest ten-thousandth (four digits), what is the lowest p-value of these three tests? Hint: use the MASS library and its fitdistr function on Vocab separately for each level of Social. Use "exponential" as the distribution type. Save the estimate as fit. Then use ks.test with "pexp", passing fit[1] as the rate and requesting an exact test. Ignore any warnings produced about ties.
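
The hint in Qn19 can be sketched roughly as follows for a single level of Social. This is only an illustration under stated assumptions: the values are simulated rather than read from vocab.csv, and the name `vocab_one_level` is hypothetical.

```r
library(MASS)  # provides fitdistr

set.seed(42)
# Simulated stand-in for the Vocab values at ONE level of Social (not the real data)
vocab_one_level <- rexp(50, rate = 1 / 100)

# Maximum-likelihood fit of an exponential distribution; keep only the estimate
fit <- fitdistr(vocab_one_level, "exponential")$estimate

# Kolmogorov-Smirnov test against pexp with the fitted rate, requesting an exact test
res <- ks.test(vocab_one_level, "pexp", rate = fit[1], exact = TRUE)
res$p.value
```

With the real data, the same three steps would be repeated on the subset of Vocab for each level of Social, and the smallest of the three p-values reported.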

Test of order effects using Generalized Linear Mixed Model

Qn20. Use a generalized linear mixed model (GLMM) to conduct a test of order effects on Vocab to ensure counterbalancing worked. To the nearest ten-thousandth (four digits), what is the p-value for the Order main effect? Hint: use the lme4 library and its glmer function with family=Gamma(link="log") and subject as a random effect. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for Sex and Order.

Qn21. Use a generalized linear mixed model (GLMM) to conduct a test of Vocab by Sex and Social. To the nearest ten-thousandth (four digits), what is the p-value for the interaction effect? Hint: use the lme4 library and its glmer function with family=Gamma(link="log") and subject as a random effect. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for Sex and Social.
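
The recipe in the Qn20 and Qn21 hints (set sum-to-zero contrasts, fit with glmer, then call Anova with type = 3) might look roughly like the sketch below. It assumes the lme4 and car packages are installed and uses simulated data in place of vocab.csv; the level labels, effect sizes, and seed are all hypothetical, so the printed p-values say nothing about the quiz answers.

```r
library(lme4)  # glmer
library(car)   # Anova

set.seed(2)
n_subj <- 20
# Simulated stand-in for vocab.csv: every subject posts on three (hypothetical) sites
d <- expand.grid(Subject = factor(1:n_subj),
                 Social  = factor(c("SiteA", "SiteB", "SiteC")))
# Sex varies between subjects
d$Sex <- factor(ifelse(as.integer(d$Subject) %% 2 == 0, "F", "M"))

# Positive, right-skewed response generated on the log scale
subj_eff <- rnorm(n_subj, sd = 0.2)
mu <- exp(4 + subj_eff[as.integer(d$Subject)] +
          c(0, 0.2, 0.4)[as.integer(d$Social)])
d$Vocab <- rgamma(nrow(d), shape = 5, rate = 5 / mu)

# Sum-to-zero contrasts, set before fitting
contrasts(d$Sex)    <- "contr.sum"
contrasts(d$Social) <- "contr.sum"

# Gamma GLMM with a log link and a random intercept per subject
m <- glmer(Vocab ~ Sex * Social + (1 | Subject),
           data = d, family = Gamma(link = "log"))
a <- Anova(m, type = 3)  # Wald chi-square tests
a
```

For the order-effects test in Qn20, the same pattern would apply with Order in place of Social in the model formula and in the contrasts.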

Qn22. The only significant effect on Vocab was Social. Therefore, perform post hoc pairwise comparisons among levels of Social adjusted with Holm’s sequential Bonferroni procedure. To the nearest ten-thousandth (four digits), what is the p-value of the only non-significant pairwise comparison? Hint: Use the multcomp library and its mcp function called from within its glht function. Ignore any warnings produced.

Qn23. In module *, we conducted a generalized linear model (GLM) for ordinal logistic regression using the polr function from the MASS library. We also conducted a GLM for nominal logistic regression using the multinom function from the nnet library. It is, therefore, reasonable to ponder whether variants of such functions exist for generalized linear mixed models (GLMMs), i.e., variants that can handle random effects and therefore repeated measures. Unfortunately, although certain approaches exist, they are arcane and difficult to use, and the R community has not converged upon any widely adopted approaches to multinomial models with random effects. Our lectures did not venture into such territory, but as a final topic pointing toward the future, here is a brief treatment of ordinal logistic regression with random effects. Let’s begin by revisiting our file websearch3.csv from the course materials. Effort is a Likert-type response. How many ordered categories does Effort have? Recode Effort as an ordinal response.

Qn24. Use a generalized linear mixed model (GLMM) for ordinal logistic regression to examine Effort by Engine. Specifically, we will use what is called a “cumulative link mixed model” (CLMM). We find the clmm function in the ordinal library. To produce significance tests, we use a special version of the Anova function from the RVAideMemoire library. There are two quirks. One is that we must remake our data frame (with as.data.frame) before passing it to clmm. The second is that the type parameter seems to be ignored by Anova, resulting in a Type II ANOVA. (With a Type II ANOVA, if an interaction is present, then main effects are ignored; this is not an issue for our one-way analysis of Effort by Engine here.) To the nearest ten-thousandth (four digits), what is the p-value of the Engine main effect? Hint: Here is the code to use:

# Assuming df contains websearch3.csv
# Assuming Subject has been recoded as nominal
# Assuming Effort has been recoded as ordinal
library(ordinal)        # clmm
library(RVAideMemoire)  # Anova method for clmm models
df2 <- as.data.frame(df)  # quirk: remake the data frame before calling clmm
contrasts(df2$Engine) <- "contr.sum"  # sum-to-zero contrasts
m <- clmm(Effort ~ Engine + (1|Subject), data = df2)
Anova(m, type = 3)  # type is ignored; a Type II ANOVA results

Qn25. In light of the significant main effect of Engine on Effort, post hoc pairwise comparisons are justified among the levels of Engine. However, there is no glht equivalent for clmm, so the best we can do is to treat Effort as a numeric value. Plot the Effort ratings by Engine and perform pairwise comparisons with the following code. To the nearest ten-thousandth (four digits), what is the p-value of the one non-significant pairwise comparison?

# Assuming code continuing from Qn24
plot(as.numeric(Effort) ~ Engine, data = df2)
library(lme4)      # lmer
library(multcomp)  # glht, mcp, adjusted
m <- lmer(as.numeric(Effort) ~ Engine + (1|Subject), data = df2)
summary(glht(m, mcp(Engine = "Tukey")), test = adjusted(type = "holm"))


Questions from Generalized mixed-model Coursera Quiz

Qn11. Download the file teaser.csv from the course materials. This file describes a survey in which respondents recruited online saw five different teaser trailers for upcoming movies of different genres. Respondents simply indicated whether they liked each teaser or not. The research question is whether trailers from certain film genres were liked more than others. How many respondents took part in this survey?

Qn12. By viewing the data table, discern which counterbalancing scheme was used for the Teaser factor, if any:

Full counterbalancing
Latin Square
Balanced Latin square
Random
None random

Qn13. Create a plot of Liked by Teaser. Which teaser trailer genre was liked the most?

Action
Comedy
Horror
Romance
Thriller 

Qn14. Using a generalized linear mixed model (GLMM), conduct a test of order effects on Liked to ensure counterbalancing worked. To the nearest ten-thousandth (four digits), what is the p-value for the Order main effect? Hint: use the lme4 library and its glmer function with family=binomial and subject as a random effect. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for Order.

Qn15. Using a generalized linear mixed model (GLMM), conduct a test of Liked by Teaser. To the nearest ten-thousandth (four digits), what is the chi-square statistic for the Teaser main effect? Hint: Use the lme4 library and its glmer function with family=binomial and subject as a random effect. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for Teaser.

Qn16. Conduct simultaneous post hoc pairwise comparisons among levels of Teaser. Be sure to use Holm’s sequential Bonferroni procedure. How many of the tests are statistically significant? Hint: use the multcomp library and its mcp function called from within its glht function.

Conducting Linear Mixed Model using Social.sav Data

Qn6. Because the omnibus linear mixed model (LMM) did not result in a significant main effect of Engine on Searches, post hoc pairwise comparisons were not justified. As a result, despite one such comparison having p < 0.05, strictly speaking this “finding” must be disregarded.

True
False

Qn7. Recall our file socialvalue.csv. If you have not done so already, please download it from the course materials. This file describes a study of people viewing a positive or negative film clip before going onto social media and then judging the value of the first 100 posts they see there. The number of valued posts was recorded. You originally analyzed this data with a 2x2 within-subjects ANOVA. Now you will use a linear mixed model (LMM). Let’s refresh our memory: How many subjects took part in this study?

Qn8. To the nearest whole number, how many more posts were valued on Facebook than Twitter after seeing a positive film clip?

Qn9. Conduct a linear mixed model (LMM) on Valued by Social and Clip. To the nearest ten-thousandth (four digits), what is the p-value of the interaction effect? Hint: use the lme4 library and its lmer function with subject as a random effect. Then use the car library and its Anova function with type = 3 and test.statistic = "F". Prior to either, set sum-to-zero contrasts for both Social and Clip.

Planned Pairwise comparisons of the data
Qn10. Conduct two planned pairwise comparisons of how the film clips may have influenced judgements about the value of social media. The first question is whether on Facebook, the number of valued posts was different after people saw a positive film clip versus a negative film clip. The second question is whether on Twitter, the number of valued posts was different after people saw a positive film clip versus a negative film clip. Correcting for these two planned comparisons using Holm’s sequential Bonferroni procedure, to the nearest ten-thousandth (four digits), what is the lowest corrected p-value of the two tests? Hint: use the multcomp and lsmeans libraries and the lsm function within the glht function. Do not correct for multiple comparisons yet, as only two planned comparisons will be regarded. After retrieving the two as-yet uncorrected p-values of interest, manually pass them to p.adjust for correction.
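
The final correction step in the hint can be sketched in base R as follows. The two uncorrected p-values here are hypothetical placeholders, not results from socialvalue.csv; in practice they would be read off the uncorrected glht output.

```r
# Hypothetical uncorrected p-values for the two planned comparisons (NOT real results)
p_raw <- c(facebook = 0.010, twitter = 0.040)

# Holm's sequential Bonferroni: the smallest p-value is multiplied by 2, the next by 1,
# and the adjusted values are kept non-decreasing
p_holm <- p.adjust(p_raw, method = "holm")
p_holm        # facebook: 0.02, twitter: 0.04
min(p_holm)   # lowest corrected p-value
```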


Doing Mixed Effects Models
Qn1. Recall our file websearch3.csv. If you have not done so already, please download it from the course materials. This file describes a study of the number of searches people did with various search engines to successfully find 100 facts on the web. You originally analyzed this data with a one-way repeated measures ANOVA. Now you will use a linear mixed model (LMM). Let’s refresh our memory: How many subjects took part in this study?

Qn2. To the nearest hundredth (two digits), how many searches on average did subjects require with the Google search engine?

Qn3. Conduct a linear mixed model (LMM) on Searches by Engine. To the nearest ten-thousandth (four digits), what is the p-value of such a test? Hint: use the lme4 library and its lmer function with subject as a random effect. Then use the car library and its Anova function with type = 3 and test.statistic = "F". Prior to either, set sum-to-zero contrasts for Engine.

Qn4. In light of your p-value result, are post hoc pairwise comparisons among levels of Engine justified, strictly speaking?

Yes
No

Qn5. Regardless of your answer to the previous question, conduct simultaneous pairwise comparisons among all levels of Engine. Correct your p-values with Holm’s sequential Bonferroni procedure. To the nearest ten-thousandth (four digits), what is the lowest corrected p-value resulting from such tests? Hint: use the multcomp library and its mcp function from within a call to its glht function.
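
Taken together, the hints in Qn3 and Qn5 describe a workflow that might be sketched as below. The sketch assumes the lme4, car, and multcomp packages are installed (to my understanding, the F test in car's Anova for lmer models also needs the pbkrtest package) and uses simulated data in place of websearch3.csv. Apart from Google, the engine labels are hypothetical, as are the effect sizes and seed.

```r
library(lme4)      # lmer
library(car)       # Anova
library(multcomp)  # glht, mcp, adjusted

set.seed(3)
n_subj <- 30
# Simulated stand-in for websearch3.csv: each subject uses every engine once
d <- expand.grid(Subject = factor(1:n_subj),
                 Engine  = factor(c("EngineB", "Google", "EngineC")))
subj_eff <- rnorm(n_subj, sd = 10)
d$Searches <- 150 + subj_eff[as.integer(d$Subject)] +
  c(0, -5, 10)[as.integer(d$Engine)] + rnorm(nrow(d), sd = 15)

# Sum-to-zero contrasts, set before fitting
contrasts(d$Engine) <- "contr.sum"

# Linear mixed model with a random intercept per subject
m <- lmer(Searches ~ Engine + (1 | Subject), data = d)
a <- Anova(m, type = 3, test.statistic = "F")  # omnibus test of Engine
a

# All pairwise comparisons among engines, Holm-corrected
ph <- summary(glht(m, mcp(Engine = "Tukey")), test = adjusted(type = "holm"))
ph
```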
