# Assignment: Formulating logistic models using SAS

A)

Download the Excel dataset named **online_buying**. The dataset consists of purchases made by consumers across two channels of a firm: online and the physical store (multi-channel purchases). More specifically, the dataset has the following columns:

1) **Customer**: The ID of the customer.

2) **Online_buy**: This takes the value of 1 if the consumer buys online and 0 if he/she buys in the physical store.

3) **distance**: The distance in miles from the customer's home/residence to the physical store.

4) **experience**: The number of months that the customer has been shopping with the firm. Thus, this can be interpreted as the experience the customer has with the firm.

5) **c_rank**: This is the convenience rank. Customers were asked how convenient it would be for them to shop online versus the physical store on a scale from 1 (least convenient) to 4 (most convenient).

- Formulate and estimate a (logistic) model to understand how these factors affect the likelihood of customers shopping online. Comment on the model parameter estimates. You can calculate the odds ratios, etc. What are the managerial implications of your findings?
- For each customer, calculate the probability of him/her shopping online. You should do this based on the model estimates and the data, using the formula for p. You can do this either in Excel or in SAS.
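As a starting point, the two tasks above might be sketched in SAS as follows. This is only a sketch under stated assumptions, not a required solution: the file name `online_buying.xlsx`, the output dataset names `est` and `online_prob`, and the helper variables `b0`, `b_distance`, `b_experience`, `b_c_rank`, `xb` and `p_online` are all hypothetical, and `c_rank` is treated as a continuous predictor (it could instead be listed in a `class` statement). The last step applies the logistic formula `p = exp(xb) / (1 + exp(xb))`, where `xb` is the intercept plus the sum of each coefficient times its variable.

```
/* Assumption: the workbook is saved as online_buying.xlsx in the
   current directory and the XLSX import engine is available */
proc import datafile="online_buying.xlsx"
    out=online_buying dbms=xlsx replace;
    getnames=yes;
run;

/* Binary logit; event='1' models the probability that Online_buy = 1.
   outest= stores the coefficient estimates in the dataset est */
proc logistic data=online_buying outest=est;
    model online_buy(event='1') = distance experience c_rank;
run;

/* Apply p = exp(xb) / (1 + exp(xb)) to every customer.
   The coefficients are renamed so they do not overwrite the data columns */
data online_prob;
    if _n_ = 1 then set est(keep=Intercept distance experience c_rank
                            rename=(Intercept=b0 distance=b_distance
                                    experience=b_experience c_rank=b_c_rank));
    set online_buying;
    xb = b0 + b_distance*distance + b_experience*experience + b_c_rank*c_rank;
    p_online = exp(xb) / (1 + exp(xb));
    drop b0 b_distance b_experience b_c_rank;
run;
```

`proc logistic` prints an odds ratio table by default, so it can be read directly from the output when commenting on the estimates.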

B)

Download the Excel dataset **brand_choice**. You have disaggregate data on the choice of brands (denoted by 1, 2 and 3) in the beverage category by 50 consumers. Let’s say brands 1, 2 and 3 are Fuze, Honestea and Suja respectively. The dataset consists of the following:

1) **Customer_id**: This denotes/indexes the customer who is making the purchase.

2) **Price**: This indicates the price per 12 oz. of the brand.

3) **Brand**: If the particular brand is chosen by the customer, then this takes the value of 1; if the brand is not chosen then this takes the value of 0.

4) **Product**: This number tells you which brand it is: brands 1, 2 and 3 are Fuze, Honestea and Suja respectively.

The objective is to model the consumer’s choice of beverage brands using the brands’ pricing information. Estimate the model and interpret the results.

Note: You should use proc mdc to do this.

You can modify the code given below and use it. You should first generate the intercept terms in the data for each of the products.

**You can do this as follows.**

- When product=1, set int1=1; when product=2 or product=3, set int1=0.
- When product=2, set int2=1; when product=1 or product=3, set int2=0.
- When product=3, set int3=1; when product=1 or product=2, set int3=0.

So int1=1 for product 1 only and 0 otherwise; int2=1 for product 2 only and 0 otherwise; int3=1 for product 3 only and 0 otherwise. You can do the above in Excel or SAS.
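If you choose SAS, the indicator construction described above can be sketched in a single data step. This assumes the data has already been imported into a SAS dataset named `brand_choice` and that the brand identifier column is named `product`. A comparison such as `(product = 1)` evaluates to 1 when true and 0 when false in SAS, so no `if`/`else` logic is needed.

```
/* Create one 0/1 intercept (dummy) variable per product */
data brand_choice;
    set brand_choice;
    int1 = (product = 1);  /* 1 for product 1, 0 otherwise */
    int2 = (product = 2);  /* 1 for product 2, 0 otherwise */
    int3 = (product = 3);  /* 1 for product 3, 0 otherwise */
run;
```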

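```
/* Each customer's three alternatives must be stored together */
proc sort data=brand_choice;
    by customer_id;
run;

/* Conditional logit; int3 is omitted so brand 3 is the reference */
proc mdc data=brand_choice;
    model brand = int1 int2 price /
        type=clogit
        nchoice=3;
    id customer_id;
run;
```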
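When interpreting the output, it may help to write out the conditional logit model that this kind of specification corresponds to. The notation below is not part of the assignment and is introduced only for illustration: $i$ indexes customers, $j$ indexes brands, $\alpha_j$ is the brand intercept (the coefficient on the matching intercept dummy), $\beta$ is the price coefficient, and brand 3 is taken as the reference brand with $\alpha_3 = 0$.

$$
U_{ij} = \alpha_j + \beta \,\text{price}_{ij} + \varepsilon_{ij},
\qquad
P_{ij} = \frac{\exp(\alpha_j + \beta \,\text{price}_{ij})}{\sum_{k=1}^{3} \exp(\alpha_k + \beta \,\text{price}_{ik})}
$$

Under this form, a negative $\beta$ would mean that a higher price lowers a brand's choice probability, and each estimated $\alpha_j$ is read relative to the reference brand.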

C)

Please read the attached article (“Retailers’ Emails Are Misfires for Many Holiday Shoppers”) and answer the following questions.

1) In your own words, what does it mean to 'personalize' digital messaging? What tools, technologies and methods are necessary to personalize digital messaging?

2) The article says “the ugly truth is that most retailers haven’t done the unsexy work of understanding how to use the data”. How can retailers use the data available to them to better design personalized emails? What should they do? How can customers be targeted with the right product?

https://www.wsj.com/articles/retailers-emails-are-misfires-for-many-holiday-shoppers-1511778600