## Assignment: Formulating logistic model using SAS

A)
Download the Excel dataset named online_buying. The dataset consists of purchases made by consumers across two channels of a firm: online and the physical store (multi-channel purchases). More specifically, the dataset has the following columns:

1) Customer: The ID of the customer.
2) Online_buy: This takes the value 1 if the consumer buys online and 0 if he/she buys in the physical store.
3) distance: The distance in miles from the customer's home/residence to the physical store.
4) experience: The number of months that the customer has been shopping with the firm. Thus, this can be interpreted as the experience the customer has with the firm.

5) c_rank: This is the convenience rank. Customers were asked how convenient it would be for them to shop online versus in the physical store, on a scale from 1 (least convenient) to 4 (most convenient).

1. Formulate and estimate a logistic model to understand how these factors affect the likelihood of customers shopping online. Comment on the model parameter estimates; you can calculate the odds ratios, etc. What are the managerial implications of your findings?
2. For each customer, calculate the probability of him/her shopping online. You should do this based on the model estimates and the data, using the formula for p. You can do this either in Excel or in SAS.
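For step 2, the formula for p is p = exp(xb)/(1 + exp(xb)), where xb is the linear predictor built from the estimates in step 1. A minimal R sketch (the coefficient values and the customer's data below are hypothetical, not estimates from online_buying):

```r
# hypothetical logistic coefficients standing in for the SAS estimates
b0 <- -2.0   # intercept
b1 <- 0.15   # distance
b2 <- -0.03  # experience
b3 <- 0.80   # c_rank

# one hypothetical customer: 10 miles away, 24 months with the firm, c_rank 3
xb <- b0 + b1 * 10 + b2 * 24 + b3 * 3
p  <- exp(xb) / (1 + exp(xb))  # equivalently 1 / (1 + exp(-xb))
p
```

In Excel the same calculation is one formula per row, with xb assembled from the estimated coefficients and that row's distance, experience, and c_rank values.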

B)
Download the Excel dataset brand_choice. You have disaggregate data on the choice of brands (denoted by 1, 2 and 3) in the beverage category by 50 consumers. Let's say brands 1, 2 and 3 are Fuze, Honestea and Suja, respectively. The dataset consists of the following:

1) Customer_id: This denotes/indexes the customer who is making the purchase.
2) Price: This indicates the price per 12 oz. of the brand.
3) Brand: If the particular brand is chosen by the customer, then this takes the value of 1; if the brand is not chosen then this takes the value of 0.
4) Product: This number tells you which brand it is: brands 1, 2 and 3 are Fuze, Honestea
and Suja respectively.

The objective is to model the consumer's choice of beverage brands using the brands' pricing information. Estimate the model and interpret the results.
Note: You should use proc mdc to do this.
You can modify the code given below and use it. You should first generate the intercept terms in the data for each of the products.

You can do this as follows:

· When product=1, then int1=1; when product=2 or product=3, then int1=0.
· When product=2, then int2=1; when product=1 or product=3, then int2=0.
· When product=3, then int3=1; when product=1 or product=2, then int3=0.

So int1=1 for product 1 only and 0 otherwise; int2=1 for product 2 only and 0 otherwise; int3=1 for product 3 only and 0 otherwise. You can do the above in Excel or SAS.

```
/* generate the intercept terms int1-int3 described above */
data brand_choice;
    set brand_choice;
    int1 = (product = 1);
    int2 = (product = 2);
    int3 = (product = 3);
run;

proc sort data=brand_choice;
    by customer_id;
run;

/* conditional logit; int3 is left out as the base alternative */
proc mdc data=brand_choice;
    model brand = int1 int2 price /
        type=clogit
        nchoice=3;
    id customer_id;
run;
```

C)

Please read the article attached (“Retailers’ Emails Are Misfires for Many Holiday Shoppers”)
and answer the following questions.

1) In your own words, what does it mean to 'personalize' digital messaging? What tools,
technologies and methods are necessary to personalize digital messaging?

2) The article says “the ugly truth is that most retailers haven’t done the unsexy work of
understanding how to use the data”. How can retailers use the data available to them to
better design personalized emails? What should they do? How can customers be targeted with the right product?

https://www.wsj.com/articles/retailers-emails-are-misfires-for-many-holiday-shoppers-1511778600

## Scenario: Melbourne Property Prices

Housing affordability is an important issue for many young Australians, with recent booms in property prices in Australia's largest cities making it more difficult for first-time homebuyers to enter the market. Sellers and buyers alike are interested in what drives property prices, and buyers are generally interested in knowing where bargains can be found.

We will consider data on more than 23,000 properties listed on
domain.com.au in the Melbourne metropolitan area between January 2016 and September 2017.

These data include key geographic information (e.g., suburb, address, distance from Melbourne CBD), property information (e.g., property type, number of rooms, land
and building area), and selling information (e.g., agent, sale method, price).

For this project, there are three primary questions to be investigated:

(a) Does property price increase the closer you get to the Melbourne CBD, and does the relationship between distance from the Melbourne CBD and property price change depending on the property type?

(b) What factors are most relevant in helping to predict property prices, and which general region (REGION NAME) appears to be the best bargain for houses (i.e., excluding other property types) based on what you would predict house prices to be for that region?

(c) Are there certain (non-geographic) attributes of properties that characterize a general region (REGION NAME)? In other words, is (non-geographic) information on the property sufficient to allow a buyer to understand where a property is likely to be located?

Expectation:
Methods and Analysis:
Provides a description of the data and the statistical analyses that will be performed. If a linear regression, principal component analysis, or linear discriminant analysis is to be carried out, this section should provide an explanation of and motivation for the variables that are included in the model. This section should also include descriptive statistics (statistics, tables, graphs) that are useful in describing the data and providing a glimpse of what you might expect from your statistical analyses. A good deal of thought should go into your descriptive statistics, as these must clearly show some relevance to your questions of interest, and you must explain what you can derive from these.

Results:
Provides a thorough description of the results of the analyses you described in the previous section. Include tables with relevant output. If analyses are carried out that involve the estimation of parameters, this should include an interpretation of the parameters for the variables of interest. Any issues with significant violations of the requirements/assumptions needed to perform the analyses carried out must be addressed.

R code and summary output should not be pasted into the document, but instead relevant results should be presented in nicely formatted tables.

## Mixed Effects R statistic Quiz

Qn17. Download the file vocab.csv from the course materials. This file describes a study in which 50 recent posts by men and women on social media were analyzed for how many unique words they used, i.e., the size of their operational vocabulary on social media. The research question is how men's and women's vocabulary may differ on each of three social media websites. How many subjects took part in this study?

Qn18. Create an interaction plot with Social on the X-axis and Sex as the traces. How many times, if any, do these lines cross?

Qn19. Perform three Kolmogorov-Smirnov goodness-of-fit tests on Vocab, one for each level of Social, using exponential distributions. To the nearest ten-thousandth (four digits), what is the lowest p-value of these three tests? Hint: use the MASS library and its fitdistr function on Vocab separately for each level of Social. Use "exponential" as the distribution type. Save the estimate as fit. Then use ks.test with "pexp", passing fit[1] as the rate and requesting an exact test. Ignore any warnings produced about ties.
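For a single level of Social, the hinted workflow can be sketched as follows (base R on simulated data; vocab.csv itself is needed for the real answer, and note that fitdistr's exponential MLE for the rate equals 1/mean, so MASS is not strictly required for this sketch):

```r
# simulated vocabulary sizes standing in for Vocab at one level of Social
set.seed(1)
vocab <- rexp(50, rate = 0.02)

# exponential rate estimate (what fitdistr(vocab, "exponential") returns)
fit <- 1 / mean(vocab)

# Kolmogorov-Smirnov goodness-of-fit test against the fitted exponential
res <- ks.test(vocab, "pexp", fit, exact = TRUE)
res$p.value
```

Repeating this for each of the three levels of Social and taking the smallest p-value answers the question.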


## Test of order effects using Generalized Linear Mixed Model

Qn20. Use a generalized linear mixed model (GLMM) to conduct a test of order effects on Vocab to ensure counterbalancing worked. To the nearest ten-thousandth (four digits), what is the p-value for the Order main effect? Hint: use the lme4 library and its glmer function with family=Gamma(link="log") and Subject as a random effect. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for Sex and Order.
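Setting sum-to-zero contrasts, as the hint requires, looks like this in base R (shown on a toy three-level factor rather than the vocab.csv columns):

```r
# toy factor standing in for a column such as Sex or Order
f <- factor(c("a", "b", "c"))
contrasts(f) <- "contr.sum"  # sum-to-zero (deviation) coding
contrasts(f)                 # each column of the contrast matrix sums to zero
```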

Qn21. Use a generalized linear mixed model (GLMM) to conduct a test of Vocab by Sex and Social. To the nearest ten-thousandth (four digits), what is the p-value for the interaction effect? Hint: use the lme4 library and its glmer function with family=Gamma(link="log") and Subject as a random effect. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for Sex and Social.

Qn22. The only significant effect on Vocab was Social. Therefore, perform post hoc pairwise comparisons among the levels of Social, adjusted with Holm's sequential Bonferroni procedure. To the nearest ten-thousandth (four digits), what is the p-value of the only non-significant pairwise comparison? Hint: use the multcomp library and its mcp function called from within its glht function. Ignore any warnings produced.

Qn23. In module *, we conducted a generalized linear model (GLM) for ordinal logistic regression using the polr function from the MASS library. We also conducted a GLM for nominal logistic regression using the multinom function from the nnet library. It is, therefore, reasonable to ponder whether variants of such functions exist for generalized linear mixed models (GLMMs), i.e., variants that can handle random effects and therefore repeated measures. Unfortunately, although certain approaches exist, they are arcane and difficult to use, and the R community has not converged upon any widely adopted approach to multinomial models with random effects. Our lectures did not venture into such territory, but as a final topic pointing toward the future, here is a brief treatment of ordinal logistic regression with random effects. Let's begin by revisiting our file websearch3.csv from the course materials. Effort is a Likert-type response. How many ordered categories does Effort have? Recode Effort as an ordinal response.
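Recoding a Likert-type response as ordinal is a one-liner in base R (the values below are hypothetical stand-ins for the Effort column of websearch3.csv):

```r
# hypothetical Likert responses standing in for df$Effort
effort <- c(3, 5, 2, 4, 4, 1)
effort <- ordered(effort)  # recode as an ordered factor, i.e., an ordinal response
is.ordered(effort)         # TRUE
levels(effort)             # the ordered categories present in the data
```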

Qn24. Use a generalized linear mixed model (GLMM) for ordinal logistic regression to examine Effort by Engine. Specifically, we will use what is called a "cumulative link mixed model" (CLMM). We find the clmm function in the ordinal library. To produce significance tests, we use a special version of the Anova function from the RVAideMemoire library. There are two quirks. One is that we must remake our data frame before passing it to clmm. The second is that the type parameter seems to be ignored by Anova, resulting in a Type II ANOVA. (With a Type II ANOVA, if an interaction is present, then main effects are ignored; this is not an issue for our one-way analysis of Effort by Engine here.) To the nearest ten-thousandth (four digits), what is the p-value of the Engine main effect? Hint: here is the code to use:

```
# assuming df contains websearch3.csv
# assuming Subject has been recoded as nominal
# assuming Effort has been recoded as ordinal
library(ordinal)
library(RVAideMemoire)
df2 <- as.data.frame(df) # quirk
contrasts(df2$Engine) <- "contr.sum"
m <- clmm(Effort ~ Engine + (1|Subject), data = df2)
Anova(m, type = 3) # type ignored
```

Qn25. In light of the significant main effect of Engine on Effort, post hoc pairwise comparisons are justified among the levels of Engine. However, there is no glht equivalent for clmm, so the best we can do is to treat Effort as a numeric value. Plot the Effort ratings by Engine and perform pairwise comparisons with the following code. To the nearest ten-thousandth (four digits), what is the p-value of the one non-significant pairwise comparison?

```
# assuming code continuing from Qn24
plot(as.numeric(Effort) ~ Engine, data = df2)
library(lme4)
library(multcomp)
m <- lmer(as.numeric(Effort) ~ Engine + (1|Subject), data = df2)
summary(glht(m, mcp(Engine = "Tukey")), test = adjusted(type = "holm"))
```


## Questions from Generalized mixed-model Coursera Quiz

Qn11. Download the file teaser.csv from the course materials. This file describes a survey in which respondents recruited online saw five different teaser trailers for upcoming movies of different genres. Respondents simply indicated whether they liked each teaser or not. The research question is whether trailers from certain film genres were liked more than others. How many respondents took part in this survey?

Qn12. By viewing the data table, discern which counterbalancing scheme was used for the Teaser factor, if any:

```
Full counterbalancing
Latin square
Balanced Latin square
Random
None
```

Qn13. Create a plot of Liked by Teaser. Which teaser trailer genre was liked the most?

```
Action
Comedy
Horror
Romance
Thriller
```

Qn14. Using a generalized linear mixed model (GLMM), conduct a test of order effects on Liked to ensure counterbalancing worked. To the nearest ten-thousandth (four digits), what is the p-value for the Order main effect? Hint: use the lme4 library and its glmer function with family=binomial and Subject as a random effect. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for Order.

Qn15. Using a generalized linear mixed model (GLMM), conduct a test of Liked by Teaser. To the nearest ten-thousandth (four digits), what is the chi-square statistic for the Teaser main effect? Hint: use the lme4 library and its glmer function with family=binomial and Subject as a random effect. Then use the car library and its Anova function with type = 3. Prior to either, set sum-to-zero contrasts for Teaser.

Qn16. Conduct simultaneous post hoc pairwise comparisons among the levels of Teaser. Be sure to use Holm's sequential Bonferroni procedure. How many of the tests are statistically significant? Hint: use the multcomp library and its mcp function called from within its glht function.

## Coursera: Conducting Linear Mixed Model using Social.sav Data

Qn6. Because the omnibus linear mixed model (LMM) did not result in a significant main effect of Engine on Searches, post hoc pairwise comparisons were not justified. As a result, despite one such comparison having p < 0.05, strictly speaking this "finding" must be disregarded.

```
True
False
```

Qn7. Recall our file socialvalue.csv. If you have not done so already, please download it from the course materials. This file describes a study of people viewing a positive or negative film clip before going onto social media and then judging the value of the first 100 posts they see there. The number of valued posts was recorded. You originally analyzed these data with a 2x2 within-subjects ANOVA. Now you will use a linear mixed model (LMM). Let's refresh our memory: how many subjects took part in this study?

Qn8. To the nearest whole number, how many more posts were valued on Facebook than on Twitter after seeing a positive film clip?

Qn9. Conduct a linear mixed model (LMM) on Valued by Social and Clip. To the nearest ten-thousandth (four digits), what is the p-value of the interaction effect? Hint: use the lme4 library and its lmer function with Subject as a random effect. Then use the car library and its Anova function with type = 3 and test.statistic = "F". Prior to either, set sum-to-zero contrasts for both Social and Clip.

Planned Pairwise comparisons of the data
Qn10. Conduct two planned pairwise comparisons of how the film clips may have influenced judgements about the value of social media. The first question is whether on Facebook the number of valued posts was different after people saw a positive film clip versus a negative film clip. The second question is whether on Twitter the number of valued posts was different after people saw a positive film clip versus a negative film clip. Correcting for these two planned comparisons using Holm's sequential Bonferroni procedure, to the nearest ten-thousandth (four digits), what is the lowest corrected p-value of the two tests? Hint: use the multcomp and lsmeans libraries and the lsm function within the glht function. Do not correct for multiple comparisons yet, as only the two planned comparisons will be regarded. After retrieving the two as-yet uncorrected p-values of interest, manually pass them to p.adjust for correction.
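The final correction step in the hint can be sketched in base R (the two uncorrected p-values below are hypothetical stand-ins for those retrieved from the glht output):

```r
# hypothetical uncorrected p-values from the two planned comparisons
p_raw  <- c(0.0300, 0.2000)
p_holm <- p.adjust(p_raw, method = "holm")  # Holm's sequential Bonferroni
p_holm  # the smaller p-value is doubled; the larger is then compared unadjusted
```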
