Posts tagged with r statistics help

Doing Mixed Effects Models
Qn1. Recall our file websearch3.csv. If you have not done so already, please download it from the course materials. This file describes a study of the number of searches people did with various search engines to successfully find 100 facts on the web. You originally analyzed this data with a one-way repeated measures ANOVA. Now you will use a linear mixed model (LMM). Let’s refresh our memory: How many subjects took part in this study?

Qn2. To the nearest hundredth (two digits), how many searches on average did subjects require with the Google search engine?

Qn3. Conduct a linear mixed model (LMM) on Searches by Engine. To the nearest ten-thousandth (four digits), what is the p-value of such a test? Hint: use the lme4 library and its lmer function with Subject as a random effect. Then use the car library and its Anova function with type = 3 and test.statistic = "F". Prior to either, set sum-to-zero contrasts for Engine.

Qn4. In light of your p-value result, are post hoc pairwise comparisons among levels of Engine justified, strictly speaking?

Yes
No

Qn5. Regardless of your answer to the previous question, conduct simultaneous pairwise comparisons among all levels of Engine. Correct your p-values with Holm’s sequential Bonferroni procedure. To the nearest ten-thousandth (four digits), what is the lowest corrected p-value resulting from such tests? Hint: use the multcomp library and its mcp function from within a call to its glht function.
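
The hints in Qn3 and Qn5 can be sketched roughly as follows. This is only an illustration on simulated data: the data frame `df`, the subject count, and the engine names other than Google are assumptions standing in for websearch3.csv, so the numbers it prints are not the answers to the questions. It also assumes the lme4, car, multcomp and pbkrtest packages are installed (car needs pbkrtest for the F test on a mixed model).

```r
library(lme4)      # lmer
library(car)       # Anova
library(multcomp)  # glht, mcp, adjusted

set.seed(1)
# Hypothetical long-format stand-in for websearch3.csv (not the real data)
df <- data.frame(
  Subject = factor(rep(1:20, each = 3)),
  Engine  = factor(rep(c("Bing", "Google", "Yahoo"), times = 20))
)
subject_effect <- rep(rnorm(20, sd = 15), each = 3)
df$Searches <- 150 + 10 * as.integer(df$Engine) + subject_effect + rnorm(60, sd = 10)

# Sum-to-zero contrasts must be set before fitting for a Type III test
contrasts(df$Engine) <- "contr.sum"

# Subject enters as a random intercept
m <- lmer(Searches ~ Engine + (1 | Subject), data = df)
print(Anova(m, type = 3, test.statistic = "F"))

# All pairwise comparisons among Engine levels, Holm-corrected
print(summary(glht(m, mcp(Engine = "Tukey")), test = adjusted(type = "holm")))
```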

Understanding Oneway Repeated Measures Designs

Qn1. What primarily distinguishes a oneway repeated measures ANOVA from a one-way ANOVA?

- The presence of multiple factors
- The presence of a between-subjects factor
- The presence of a within-subjects factor
- None of the above

Qn2. All else being equal, which of the following is a reason to use a within-subjects factor instead of a between-subjects factor?

- The data is more reliable
- The data exhibits less variance
- The factors are easier to analyze
- The exposure to confounds is less
- Less time from each subject is required

Qn3. In a repeated measures experiment, why should we encode an Order factor and test whether it is statistically significant? (Mark all that apply)

- To examine whether the presentation order of conditions exerts a statistically significant effect on the response.
- To examine whether any counterbalancing strategies we may have used were effective 
- To examine whether confounds may have affected our results
- To examine whether our factors cause changes in our response
- To examine whether our experiment discovered any differences

Qn4. How many subjects would be needed to fully counterbalance a repeated measures factor with four levels?

- 4
- 8
- 16
- 24
- 32
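
Full counterbalancing uses every possible presentation order of the conditions, so the number of sequences for k levels is k factorial, and the subject count has to be a multiple of it. A minimal base-R sketch that enumerates the orders for four conditions (the labels A to D and the helper `all_orders` are made up for illustration):

```r
# Recursively build every ordering of the given items
all_orders <- function(items) {
  if (length(items) <= 1) return(list(items))
  out <- list()
  for (i in seq_along(items)) {
    for (rest in all_orders(items[-i])) {
      out[[length(out) + 1]] <- c(items[i], rest)
    }
  }
  out
}

orders <- all_orders(c("A", "B", "C", "D"))
length(orders)   # same value as factorial(4)
```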

Qn5. For an even number of conditions, a balanced Latin Square contains more sequences than a Latin Square.

- True
- False

Qn6. For a within-subjects factor of five levels, a balanced Latin Square would distribute which of the following number of subjects evenly across all sequences?

- 5
- 15
- 20
- 25
- 35
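
One common way to build a balanced Latin Square is the Williams construction: start from the row 0, 1, n-1, 2, n-2, ..., shift it cyclically, and for an odd number of conditions append the mirror image of every row. The sketch below is an illustration of that construction, not code from the course; the function name `balanced_latin_square` is hypothetical.

```r
balanced_latin_square <- function(n) {
  # First row: 0, 1, n-1, 2, n-2, ...
  first <- numeric(n)
  lo <- 1
  hi <- n - 1
  for (j in seq_len(n)[-1]) {
    if (j %% 2 == 0) {
      first[j] <- lo
      lo <- lo + 1
    } else {
      first[j] <- hi
      hi <- hi - 1
    }
  }
  # Each further row adds 1 (mod n); conditions are relabelled 1..n
  sq <- t(sapply(0:(n - 1), function(i) (first + i) %% n)) + 1
  # Odd n: add the reversed sequences so carry-over stays balanced
  if (n %% 2 == 1) sq <- rbind(sq, sq[, n:1])
  sq
}

bls4 <- balanced_latin_square(4)   # one row per sequence
bls5 <- balanced_latin_square(5)
nrow(bls4)
nrow(bls5)
```

Each row is one presentation sequence, so subjects are spread evenly when their number is a multiple of the row count.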

Qn7. Which is the key property of a long-format data table?

- Each row contains only one data point per response for a given subject.
- Each row contains all of the data points per response for a given subject.
- Each row contains all of the dependent variables for a given subject.
- Multiple columns together encode all levels of a single factor.
- Multiple columns together encode all measures for a given subject
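
To make the wide versus long distinction concrete, here is a small base-R sketch that converts a wide table (one row per subject) into long format with `reshape`. The column names and every value are invented for illustration.

```r
# Hypothetical wide table: one row per subject, one column per engine
wide <- data.frame(
  Subject = 1:3,
  Google  = c(150, 162, 171),
  Bing    = c(180, 175, 190),
  Yahoo   = c(165, 158, 177)
)

# Long format: the three engine columns collapse into Engine + Searches
long <- reshape(wide, direction = "long",
                varying = c("Google", "Bing", "Yahoo"),
                v.names = "Searches",
                timevar = "Engine",
                times   = c("Google", "Bing", "Yahoo"),
                idvar   = "Subject")
long <- long[order(long$Subject), ]
print(long)
```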

Qn8. Which is not a reason why Likert-type responses often do not satisfy the assumptions of ANOVA for parametric analyses?

- Despite having numbers on a scale, the response is not actually numeric.
- Responses may violate normality
- The response distribution cannot be calculated
- The response is ordinal
- The response is bound to within, say, a 5- or 7-point scale.

Qn9. When is the Greenhouse-Geisser Correction necessary?

- When a within-subjects factor of 2+ levels violates sphericity
- When a within-subjects factor of 2+ levels exhibits sphericity
- When a within-subjects factor of 3+ levels violates sphericity
- When a within-subjects factor of 3+ levels exhibits sphericity
- None of the above

Qn10. If an omnibus Friedman test is non-significant, post hoc pairwise comparisons should be carried out with Wilcoxon signed-rank tests.

- True
- False

Doing Oneway ANOVAs

Qn1. Download the file alphabets.csv from the course materials. This file describes a study in which people used pen-based stroke alphabets to enter a set of text phrases. How many different stroke alphabets are being compared?

Qn2. To the nearest hundredth (two digits), what was the average text entry speed in words per minute (WPM) of the EdgeWrite alphabet?

Qn3. Conduct Shapiro-Wilk normality tests on the WPM response for each Alphabet. Which of the following, if any, violate the normality test? (Mark all that apply.)

- Unistrokes
- Graffiti
- EdgeWrite
- None of the above

Qn4. Conduct a Shapiro-Wilk normality test on the residuals of a WPM by Alphabet model. To the nearest ten-thousandth (four digits), what is the p-value from such a test? Hint: Fit a model with aov and then run shapiro.test on the model residuals.
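
A minimal sketch of both normality checks (per level, as in Qn3, and on model residuals, as in Qn4) using base R only. The data frame `df` is simulated here as a stand-in for alphabets.csv, so the group sizes, means and resulting p-values are assumptions, not the quiz answers.

```r
set.seed(42)
# Hypothetical stand-in for alphabets.csv
df <- data.frame(
  Alphabet = factor(rep(c("EdgeWrite", "Graffiti", "Unistrokes"), each = 20)),
  WPM      = c(rnorm(20, 12, 3), rnorm(20, 14, 3), rnorm(20, 20, 3))
)

# Shapiro-Wilk on the response within each level of Alphabet
by_level <- tapply(df$WPM, df$Alphabet, function(v) shapiro.test(v)$p.value)
print(round(by_level, 4))

# Shapiro-Wilk on the residuals of the WPM by Alphabet model
m <- aov(WPM ~ Alphabet, data = df)
res_p <- shapiro.test(residuals(m))$p.value
print(round(res_p, 4))
```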

Qn5. Conduct a Brown-Forsythe homoscedasticity test on WPM by Alphabet. To the nearest ten-thousandth (four digits), what is the p-value from such a test? Hint: Use the car library and its leveneTest function with center = median.
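
The Brown-Forsythe test is a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below spells that computation out in base R as a stand-in for the car call named in the hint, again on simulated data (`df` and its values are assumptions).

```r
set.seed(42)
# Hypothetical stand-in for alphabets.csv
df <- data.frame(
  Alphabet = factor(rep(c("EdgeWrite", "Graffiti", "Unistrokes"), each = 20)),
  WPM      = c(rnorm(20, 12, 3), rnorm(20, 14, 3), rnorm(20, 20, 3))
)

# Absolute deviation of each WPM value from its own group's median
meds <- tapply(df$WPM, df$Alphabet, median)
df$absdev <- abs(df$WPM - meds[as.character(df$Alphabet)])

# One-way ANOVA on those deviations gives the Brown-Forsythe F and p
bf <- anova(lm(absdev ~ Alphabet, data = df))
bf_p <- bf[["Pr(>F)"]][1]
print(bf)
```

With car installed, `leveneTest(WPM ~ Alphabet, data = df, center = median)` is the call the hint refers to.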

Qn6. Conduct a oneway ANOVA on WPM by Alphabet. To the nearest hundredth (two digits), what is the F statistic from such a test?

Qn7. Perform simultaneous pairwise comparisons among levels of Alphabet using the Tukey approach. Adjust for multiple comparisons using Holm’s sequential Bonferroni procedure. To the nearest ten-thousandth (four digits), what is the corrected p-value for the comparison of Unistrokes to Graffiti? Hint: use the multcomp library and its mcp function called from within its glht function.

Qn8. According to the results of the simultaneous pairwise comparisons, which of the following levels of Alphabet are significantly different in terms of WPM? (Mark all that apply.)

- Unistrokes vs. Graffiti
- Unistrokes vs. EdgeWrite
- Graffiti vs. EdgeWrite
- None of the above
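
A rough base-R sketch of the omnibus ANOVA and Holm-corrected pairwise comparisons from Qn6 to Qn8. Note the substitution: it uses `pairwise.t.test` with a pooled standard deviation as a stand-in, not the multcomp `glht` call named in the hint, and it runs on simulated data (`df` is an assumption), so its output is illustrative only.

```r
set.seed(42)
# Hypothetical stand-in for alphabets.csv
df <- data.frame(
  Alphabet = factor(rep(c("EdgeWrite", "Graffiti", "Unistrokes"), each = 20)),
  WPM      = c(rnorm(20, 12, 3), rnorm(20, 14, 3), rnorm(20, 20, 3))
)

# Omnibus oneway ANOVA; pull the F statistic out of the summary table
m <- aov(WPM ~ Alphabet, data = df)
f_stat <- summary(m)[[1]][["F value"]][1]
print(round(f_stat, 2))

# All pairwise comparisons with Holm's sequential Bonferroni correction
pw <- pairwise.t.test(df$WPM, df$Alphabet, p.adjust.method = "holm")
print(pw$p.value)
```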

Qn9. Conduct a Kruskal-Wallis test on WPM by Alphabet. To the nearest ten-thousandth (four digits), what is the p-value from such a test? Hint: use the coin library and its kruskal_test function with distribution = "asymptotic".

Qn10. Conduct nonparametric post hoc pairwise comparisons of WPM among all levels of Alphabet manually using separate Mann-Whitney U tests. Adjust the p-values using Holm’s sequential Bonferroni procedure. To the nearest ten-thousandth (four digits), what is the corrected p-value for Unistrokes vs. Graffiti? Hint: The coin library’s wilcox_test only takes a model formula specification. For this, you need wilcox.test with paired = FALSE (and, to avoid warnings, exact = FALSE).
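
The nonparametric steps in Qn9 and Qn10 might look like the sketch below. It uses base R's `kruskal.test` as a stand-in for the coin function in the hint, then runs each Mann-Whitney U test separately and corrects the p-values with `p.adjust`. The data are simulated (`df` is an assumption), so the p-values are not the quiz answers.

```r
set.seed(42)
# Hypothetical stand-in for alphabets.csv
df <- data.frame(
  Alphabet = factor(rep(c("EdgeWrite", "Graffiti", "Unistrokes"), each = 20)),
  WPM      = c(rnorm(20, 12, 3), rnorm(20, 14, 3), rnorm(20, 20, 3))
)

# Omnibus Kruskal-Wallis test
kw_p <- kruskal.test(WPM ~ Alphabet, data = df)$p.value

# One Mann-Whitney U test per pair of levels
pairs <- combn(levels(df$Alphabet), 2, simplify = FALSE)
raw_p <- sapply(pairs, function(pr) {
  wilcox.test(df$WPM[df$Alphabet == pr[1]],
              df$WPM[df$Alphabet == pr[2]],
              paired = FALSE, exact = FALSE)$p.value
})
names(raw_p) <- sapply(pairs, paste, collapse = " vs. ")

# Holm's sequential Bonferroni correction across the three tests
holm_p <- p.adjust(raw_p, method = "holm")
print(round(holm_p, 4))
```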

Sample assignment on R statistics help

Answer all questions. Marks are indicated beside each question. You should submit your solutions before the
You should submit both
• a .pdf file containing written answers (word processed, or hand-written and scanned), and
• an .R file containing R code.
For all answers include
• the code you have written to determine the answer, the relevant output from this code, and a justification of how you got your answer.
Total marks: 60

1. Consider the one-parameter family of probability density functions

fb     for − b ≤ x ≤ b

where b > 0.
(a) Write R code to plot this pdf for various values of b > 0. [2 marks]
(b) Determine the method of moments estimator for the parameter b. (No R code necessary) [4 marks]
(c) Determine the likelihood function for the parameter b. By writing R code to plot a suitable graph, determine that the derivative of this likelihood function is never zero. [4 marks]
(d) Hence find the Maximum Likelihood Estimator for the parameter b. (No R code necessary) [4 marks]
(e) The data in the file Question 1 data.csv contains 100 independent draws from the probability distribution with pdf $f_b(x)$, where the parameter b is unknown. Load the data into R using the command
D <- read.csv(path_to_file)$x
where path_to_file indicates the path where you have saved the .csv file
e.g. path_to_file = "c:/My R Downloads/Question 1 data.csv"
Note that forward slashes are used to indicate folders (this is not consistent with the usual syntax for Microsoft operating systems).
Write R code to calculate an appropriate Method of Moments Estimate and a Maximum Likelihood Estimate for the parameter b, given this data. [4 marks]

  2. The data in the file Question 2 data.csv is thought to be a realisation of Geometric Brownian Motion
    $S_t = S_0 e^{\sigma W_t + \mu t}$
    where $W_t$ is a Wiener process and σ, µ and $S_0$ are unknown parameters. Load the data into R using the command
    S <- read.csv(path_to_file)
    where path_to_file indicates the path where you have saved the .csv file.
    (a) Write R code to determine the parameter $S_0$. [2 marks]
    (b) Write R code to determine if Geometric Brownian Motion is suitable to model this data.
    You may do this by
    • plotting an appropriate scatter plot/histogram, and/or
    • using an appropriate statistical test.
    [6 marks]
    (c) Write R code to determine an estimate for µ and σ² using Maximum Likelihood Estimators.
    (You do not have to derive these estimators). [5 marks]
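
One possible approach to parts (a) to (c), sketched on a simulated path because the real file is not available here. Under the model as written, the log-returns $\log(S_{t+\Delta t}/S_t)$ are independent $N(\mu \Delta t, \sigma^2 \Delta t)$, which gives both the suitability check and the estimators. The time step `dt = 1`, the true parameter values and all variable names are assumptions for illustration.

```r
set.seed(7)
# Simulated stand-in for the Question 2 data (assumed unit time step)
n <- 1000
dt <- 1
mu_true <- 0.01
sigma_true <- 0.05
W <- c(0, cumsum(rnorm(n, sd = sqrt(dt))))   # Wiener process on the grid
time_grid <- (0:n) * dt
S <- 100 * exp(sigma_true * W + mu_true * time_grid)

# (a) If the series starts at t = 0, S_0 is the first observation
S0_hat <- S[1]

# (b) Under GBM the log-returns should look i.i.d. normal
r <- diff(log(S))
qqnorm(r)
qqline(r)
print(shapiro.test(r))

# (c) MLEs from the log-returns (note the 1/n variance, not 1/(n-1))
mu_hat <- mean(r) / dt
sigma2_hat <- mean((r - mean(r))^2) / dt
print(c(S0 = S0_hat, mu = mu_hat, sigma2 = sigma2_hat))
```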
  3. The data in the file Question 3 data.csv is a matrix of transition probabilities of a Markov Chain. Load the data into R using the command
    P <- as.matrix(read.csv(path_to_file))
    with an appropriate value for path_to_file.
    (a) Verify that this Markov Chain is ergodic. (No R code necessary) [4 marks]
    (b) Suppose that an initial state vector is given by
    x = (0.1, 0.2, 0.4, 0.1, 0.2)    (1)
    Write R code to determine the state vector after 10 time steps. Do this without diagonalising the matrix P. [3 marks]
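
Without diagonalisation, the state vector after 10 steps is the row vector x multiplied by P ten times. A minimal sketch, where the random 5 by 5 matrix `P` is only a hypothetical stand-in for the matrix in Question 3 data.csv:

```r
set.seed(1)
# Hypothetical transition matrix: positive entries, rows rescaled to sum to 1
P <- matrix(runif(25), nrow = 5)
P <- P / rowSums(P)

x <- c(0.1, 0.2, 0.4, 0.1, 0.2)   # initial state vector (1)

# Ten steps of x <- x P; x is treated as a row vector
x10 <- x
for (k in 1:10) x10 <- x10 %*% P
print(x10)
```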
    (c) Write R code to verify this answer by diagonalising the matrix P. Note that the eigen(A) function produces the right-eigenvectors of a matrix A (solutions of Av = λv). However, we want the left-eigenvectors (solutions of vA = λv).
    These are related by
    v is a left-eigenvector of A if and only if $v^T$ is a right-eigenvector of $A^T$.
    [8 marks]
    (d) Hence, or otherwise, determine the limiting distribution with the initial state vector given in (1).
    [4 marks]
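
Parts (c) and (d) could be approached along these lines. If $P^T = V D V^{-1}$, then $P = (V^{-1})^T D V^T$, so $x P^{10} = x (V^{-1})^T D^{10} V^T$; for an ergodic chain the limiting distribution is the left-eigenvector for eigenvalue 1, rescaled to sum to 1, whatever the initial vector. The matrix `P` below is a random hypothetical stand-in for the one in the data file.

```r
set.seed(1)
# Hypothetical transition matrix standing in for Question 3 data.csv
P <- matrix(runif(25), nrow = 5)
P <- P / rowSums(P)
x0 <- c(0.1, 0.2, 0.4, 0.1, 0.2)

# Right-eigenvectors of t(P) are the (transposed) left-eigenvectors of P
e <- eigen(t(P))
V <- e$vectors

# x P^10 via the diagonalisation; Re() drops round-off imaginary parts
x10_eig <- Re(x0 %*% t(solve(V)) %*% diag(e$values^10) %*% t(V))

# Cross-check against plain repeated multiplication
x10_iter <- x0
for (k in 1:10) x10_iter <- x10_iter %*% P
print(rbind(as.vector(x10_eig), as.vector(x10_iter)))

# eigen() sorts by modulus, so column 1 belongs to eigenvalue 1
pi_hat <- Re(V[, 1] / sum(V[, 1]))
print(pi_hat)
```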
  4. The data in the file Question 4 data.csv is a generator matrix for a Markov Process. Load the data into R using the command
    A <- as.matrix(read.csv(path_to_file))
    with an appropriate value for path_to_file.
    (a) Suppose that $X_0 = 0$. Write R code to simulate one realisation of the Markov Process $X_t$. The output should be two vectors (or one data frame with two variables).
    • The first vector indicates transition times.
    • The second vector indicates which state the Markov Process takes at this time (i.e. one of 0,1,2,3,4).
    How to proceed:
    • The first line of your code must read
    set.seed(4311)
    to ensure that this realisation is repeatable.
    • For each $X_t$ you must determine
    – what the transition time s to the next state is,
    – what the probabilities to transfer to each state are, and hence randomly select a suitable value for $X_{t+s}$.
    [9 marks]
    (b) Write R code to plot an appropriate graph that describes this realisation. [1 mark]
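
The procedure described above can be sketched as follows. In state i the holding time is exponential with rate $-A_{ii}$, and the next state j is drawn with probability $A_{ij} / (-A_{ii})$. The generator `A` here is randomly generated as a hypothetical stand-in for the file, and stopping after `n_jumps` transitions is an assumption, since the question does not say how long to simulate.

```r
set.seed(4311)   # first line, as the question requires

# Hypothetical generator matrix: positive off-diagonal rates, rows sum to 0
A <- matrix(runif(25), nrow = 5)
diag(A) <- 0
diag(A) <- -rowSums(A)

n_jumps <- 20                       # assumed stopping rule
times  <- numeric(n_jumps + 1)      # transition times, starting at 0
states <- integer(n_jumps + 1)      # visited states, X_0 = 0

for (k in 1:n_jumps) {
  i <- states[k] + 1                # states are 0..4, R indices are 1..5
  rate <- -A[i, i]
  s <- rexp(1, rate)                # time until the next transition
  probs <- A[i, ] / rate            # jump probabilities to the other states
  probs[i] <- 0
  states[k + 1] <- sample(0:4, 1, prob = probs)
  times[k + 1] <- times[k] + s
}

realisation <- data.frame(time = times, state = states)

# (b) Step plot: the process is constant between transitions
plot(realisation$time, realisation$state, type = "s",
     xlab = "t", ylab = "X_t", main = "One realisation of the Markov process")
```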