## Math 338 Lab One: Visualizing and Interpreting Data

The goal of this lab is to start getting you comfortable using the Rguroo point-and-click interface and using the software to help visualize and interpret data.

## Part I. Eye Color Dataset

For this part of the lab, we will explore the graphical features of Rguroo using the dataset called HairEyeColor, which can be found on Titanium. Download the dataset to your desktop. In Rguroo, open the Data dropdown in the left-hand column and select Data Import. Within Data Import, select Data Frame, then choose the file and select Upload.

Question #1 Once you have imported the data, double-click on the dataset name to display the raw data. Right-clicking on the dataset name brings up many features, one of which is the summary function. Using these features, answer the following questions.
(a) How many variables are there in this dataset?
(b) Are the variables quantitative or categorical?
(c) Specifically name one of the variables, and state what values it can take.
(d) How many cases are in this dataset?

Question #2 Now let’s look only at the Eye variable and obtain a barplot of its values. Do this by clicking on the drop-down menu for Create Plot and selecting Barplot. First select the dataset by clicking the Select a Dataset drop-down menu and choosing HairEyeColor. Switch from the Numerical/Freq tab to the Categorical tab. Select the Factor 1 drop-down menu and click on Eye. Now click on the Relative Frequency selection. Fill in the Labels for the Title, X-Axis, and Y-Axis. Click on the eye icon to view the bar graph.
Copy the Barplot and paste it below.

Question #3 You can add the specific percentage of each category as well as other features by clicking on the Details tab. To add the specific percentages, go to Bar, Value Labels, Error Bars and select Add Value Labels. Press the eye icon to see the change.
Copy the new more detailed Barplot and paste it below.

Question #4 Based on your Barplot in the previous question, which category has the most people? Which has the least?

Question #5 We can also look at Eye broken down by Sex (gender). This allows us to visually compare the distribution of eye color for males and females. To do this, click on the Basics tab, select Sex in the Factor 2 box, and select the eye icon.
Copy the Barplot with eye color and gender and paste it below.

Question #6 Which color is most prevalent for females; which color for males?
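
For readers who want to cross-check the Rguroo plots in plain R, here is a minimal sketch. It assumes the lab file matches R's built-in `HairEyeColor` table (counts cross-classified by Hair, Eye, and Sex), which may not be identical to the file on Titanium. The names `eye_counts`, `eye_prop`, and `eye_by_sex` are illustrative only.

```r
# Assumption: R's built-in HairEyeColor table stands in for the lab dataset.
data(HairEyeColor)

# Collapse the 3-way table to eye color only, then convert to relative frequencies.
eye_counts <- margin.table(HairEyeColor, margin = 2)
eye_prop <- prop.table(eye_counts)

# Relative-frequency barplot of eye color (compare with Questions #2 to #4).
barplot(eye_prop, main = "Eye Color", xlab = "Eye color", ylab = "Relative frequency")

# Eye color split by sex (compare with Questions #5 and #6).
eye_by_sex <- margin.table(HairEyeColor, margin = c(2, 3))
barplot(t(eye_by_sex), beside = TRUE, legend.text = TRUE,
        main = "Eye Color by Sex", xlab = "Eye color", ylab = "Count")
```

This is not a replacement for the point-and-click steps; it is only a way to confirm that the bars you see in Rguroo are plausible.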

## Data Analysis Using Excel

For each of the following problems, save your work to a .r file. Name your files like
<Last Name>_<First Name>_HW3_<Problem Number>.r
So my file for problem 2 would be Hendrix_Jeremy_HW3_2.r

I have provided you with an Excel spreadsheet called Last_FM_data_shuffled.xlsx. It contains the log of all the music I have listened to on my phone since I began using the Last.fm website. As the name implies, however, I have shuffled the entries so that they are no longer in chronological order. There is a header row at the top of the spreadsheet, and there are four columns of data: Band, Album, Song, and Date.

1. Assuming you are not using packages that let you read from Excel, what must you do first in order to prepare this data to import to an R dataframe? What command will you use to import it?
For this problem, submit a .r file where the first line is a comment telling me what you have to do, and the second line is the R command to import the data. Remember that # is the comment character.
2. What is a single R command that can be used to count how many different bands are represented in the data file?
3. Write an R script that will sort the data back into chronological order and store it in a new dataframe.
4. Recall that the table() function can be used to quickly summarize data. As an example, assuming I have attached the dataframe with the song data, I can type

table(Song)

and get the following output:

```
Song
(Song For My) Sugar Spun Sister                    1901                      45
                              2                       1                       2
         50 Ways to Say Goodbye    6th Avenue Heartache              8:02:00 PM
                              1                       2                       1
```

Each song title appears as a column heading, and the number underneath it is the number of times the song appears in the Song column of the dataframe.
Using this, what is the R command to determine the name of the song that has been played the most times? What is the R command to determine how many times that song has been played?

5. Using R, determine the average number of songs I listened to per day over the time period in the dataset.
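
The table() idea from question 4 can be tried on a small made-up vector before touching the real data. The sketch below uses a hypothetical vector `toy_songs` (not taken from the spreadsheet) and shows one possible way, among several, to pull the most frequent entry and its count out of a table.

```r
# Hypothetical toy data standing in for the Song column.
toy_songs <- c("1901", "45", "1901", "6th Avenue Heartache", "1901", "45")

# table() counts how many times each distinct value occurs.
song_counts <- table(toy_songs)

# Name of the most frequent entry (the first one if there is a tie).
most_played <- names(song_counts)[which.max(song_counts)]

# Number of times that entry occurs.
times_played <- max(song_counts)

most_played
times_played
```

The same pattern should carry over to the real Song column once the data frame has been imported.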

## Using normal and Binomial distribution tables

Qn14. Seventy-six percent of adults want to live to age 100. You randomly select five adults and ask them whether they want to live to age 100. The random variable represents the number of adults who want to live to age 100. Complete parts (a) through (c) below. Graph the binomial distribution using a histogram and describe its shape.
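
As a rough illustration of how the distribution in Qn14 could be computed and graphed with technology, here is a short R sketch. It assumes the five responses are independent, so the count follows a binomial distribution with n = 5 and p = 0.76. The variable names are illustrative.

```r
# Qn14 setup: n = 5 adults, each wanting to live to 100 with probability p = 0.76.
n <- 5
p <- 0.76

# P(X = x) for x = 0, 1, ..., 5 using the binomial probability mass function.
x <- 0:n
probs <- dbinom(x, size = n, prob = p)
round(probs, 4)

# Histogram-style bar chart of the distribution, with bars labeled by x.
barplot(probs, names.arg = x, space = 0,
        xlab = "Number who want to live to 100", ylab = "Probability")
```

Because p is greater than 0.5, most of the probability sits at the larger values of x, so the plot has a longer tail to the left.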

Qn15. 38% of employees judge their peers by the cleanliness of their workspaces. You randomly select 8 employees and ask them whether they judge their peers by the cleanliness of their workspaces. The random variable represents the number of employees who judge their peers by the cleanliness of their workspaces. Complete parts (a) through (c) below.

Qn16. Find the area of the shaded region under the normal curve. If convenient, use technology to find the area.

Qn17. A standardized exam’s scores are normally distributed. In a recent year, the mean test score was 20.8 and the standard deviation was 5.6. The test scores of four students selected at random are 13, 23, 9, and 36. Find the z-scores that correspond to each value and determine whether any of the values are unusual.
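
The z-score calculation for the four exam scores can be sketched in R as follows. The cutoff used here is an assumption: a value is treated as unusual when its z-score is more than 2 standard deviations from the mean, which is a common textbook convention but is not stated in the question.

```r
# Values given in the exam-score question.
mu <- 20.8
sigma <- 5.6
scores <- c(13, 23, 9, 36)

# z-score: how many standard deviations each score lies from the mean.
z <- (scores - mu) / sigma
round(z, 2)

# Assumption: a value is flagged as "unusual" when |z| > 2.
unusual <- scores[abs(z) > 2]
unusual
```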

## Write an R function called Cleaner that accepts a single vector of numbers

1. Write an R function called Cleaner that accepts a single vector of numbers that may contain NA entries and returns a vector where the NA’s have been replaced with -1.
2. Write an R function that accepts three parameters: a lower bound, an upper bound, and an increment. Then use a repeat loop to generate a vector of the numbers from the lower bound to the upper bound by increment.
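For example, if my function was called counter, calling it with a lower bound of 2, an upper bound of 10, and an increment of 2 would return
 2 4 6 8 10
i.e. the numbers from 2 to 10 in increments of 2. Starting from 2 with an increment of 3, it could return
 2 5 8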
3. Assuming I have three variables called lower, upper, and increment, how could I produce the same thing as number 2 with a single R statement that does not employ a loop?
4. Write an R function that accepts two parameters: a vector of strings and a single search character. The function will then return a vector that contains the input strings that contain the search character.

For example, if my function was called searcher and
names <- c("Bob", "Bill", "William", "Tom")
then searching for a character found only in Bill and William would return
 "Bill" "William"

5. Write an R function that accepts three parameters: a vector of strings, a single search character, and a single replacement character. The function will return the vector of strings, but with all instances of the search character replaced with the replacement character.
For example, if my function was called replacer and
names <- c("Bob", "Bill", "William", "Tom")
then replacing "o" with "O" would return
 "BOb" "Bill" "William" "TOm"
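
The five tasks above could be approached along the following lines. This is one possible sketch, not the only acceptable answer. The function names `counter`, `searcher`, and `replacer` follow the examples in the questions, the parameter names are illustrative, and `counter` assumes a positive increment.

```r
# 1. Replace NA entries with -1.
Cleaner <- function(x) {
  x[is.na(x)] <- -1
  x
}

# 2. Build lower, lower + increment, ... up to upper with a repeat loop.
#    Assumes increment > 0.
counter <- function(lower, upper, increment) {
  result <- numeric(0)
  current <- lower
  repeat {
    if (current > upper) break
    result <- c(result, current)
    current <- current + increment
  }
  result
}

# 3. The loop-free equivalent uses seq().
lower <- 2; upper <- 10; increment <- 2
seq(lower, upper, by = increment)

# 4. Keep the strings that contain the search character (literal match).
searcher <- function(strings, ch) {
  strings[grepl(ch, strings, fixed = TRUE)]
}

# 5. Replace every occurrence of one character with another (literal match).
replacer <- function(strings, ch, replacement) {
  gsub(ch, replacement, strings, fixed = TRUE)
}

names <- c("Bob", "Bill", "William", "Tom")
searcher(names, "i")
replacer(names, "o", "O")
```

Using `fixed = TRUE` treats the search character literally, so characters such as "." are not interpreted as regular-expression patterns.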

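```r
# Question 5
# Assumes the song log has already been imported into a data frame named FM.

# Count the number of songs played on each date
byday <- table(FM$Date)

# Average number of songs listened to per day over the time period
# (only dates that appear in the log are counted)
avsongs <- mean(byday)
avsongs

# Maximum number of songs listened to in a single day
max(byday)
```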

## Lesson Chapter 5 Review Questions

Qn1. Charity is planting trees along her driveway, and she has 6 pine trees and 6 willows to plant in one row. What is the probability that she randomly plants the trees so that all 6 pine trees are next to each other and all 6 willows are next to each other? Express your answer as a fraction or a decimal number rounded to four decimal places.
Qn2. A person rolls a standard six-sided die 12 times. In how many ways can he get 6 fours, 5 ones, and 1 two?

Qn3. A card is drawn from a standard deck of 52 playing cards. What is the probability that the card will be a heart and not a seven? Express your answer as a fraction or a decimal number rounded to four decimal places.
Qn4. You are going to play mini golf. A ball machine that contains 21 green golf balls, 18 red golf balls, 23 blue golf balls, and 17 yellow golf balls, randomly gives you your ball. What is the probability that you end up with a blue golf ball? Express your answer as a simplified fraction or a decimal rounded to four decimal places.
Qn6. A newspaper company classifies its customers by gender and location of residence. The research department has gathered data from a random sample of 1738 customers. The data is summarized in the table below.

[Table: Probability Distribution Table]

What is the probability that a customer is female? Express your answer as a fraction or a decimal number rounded to four decimal places.

Qn7. A coin is tossed 3 times. What is the probability that the number of tails obtained will be 1? Express your answer as a fraction or a decimal number rounded to four decimal places.
Qn8. If a coin is tossed 5 times, and then a standard six-sided die is rolled 4 times, and finally a group of two cards are drawn from a standard deck of 52 cards without replacement, how many different outcomes are possible?

You can also solve this using technology.
Use the Fundamental Principle of Counting with the Combination Rule.
The experiments or tasks in this problem can be grouped into three basic types of activities, namely, tossing a coin 5 times, rolling a standard six-sided die 4 times, and drawing two cards from a deck of cards without replacement. To obtain the solution to the problem, the number of possible outcomes for each task is computed and then the Fundamental Principle of Counting is applied to the three tasks.
There are $2^5$ outcomes possible when tossing a coin 5 times, $6^4$ outcomes possible when rolling a standard six-sided die 4 times, and ${}_{52}C_{2}$ outcomes possible when drawing two cards from a deck of cards without replacement. Applying the Fundamental Principle of Counting to these three tasks, we see that the total number of different outcomes possible is
$$2^5 \cdot 6^4 \cdot {}_{52}C_{2} = 32 \cdot 1296 \cdot 1326 = 54991872.$$
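
For anyone solving this with technology, the count above can be checked with a few lines of R. The variable names are illustrative; `choose(52, 2)` is R's built-in combination function.

```r
coin_outcomes <- 2^5           # 5 coin tosses, 2 results each
die_outcomes  <- 6^4           # 4 die rolls, 6 results each
card_outcomes <- choose(52, 2) # unordered pair of cards, no replacement

# Fundamental Principle of Counting: multiply the counts for each task.
total <- coin_outcomes * die_outcomes * card_outcomes
total
```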
Qn9. 6 cards are drawn from a standard deck without replacement. What is the probability that at least one of the cards drawn is a black card? Express your answer as a fraction or a decimal number rounded to four decimal places.
Qn11. In a history class there are 8 history majors and 8 non-history majors. 4 students are randomly selected to present a topic. What is the probability that at least 2 of the 4 students selected are non-history majors? Express your answer as a fraction or a decimal number rounded to four decimal places.
Qn12. Jill is ordering pizza at a restaurant, and the server tells her that she can have up to three toppings: black olives, chicken, and spinach. Since she cannot decide how many of the toppings she wants, she tells the server to surprise her. If the server randomly chooses which toppings to add, what is the probability that Jill gets just spinach? Express your answer as a fraction or a decimal number rounded to four decimal places.

Qn13. There are 77 students in a history class. The instructor must choose two students at random.

| Academic Year | History majors | Non-History majors |
|---|---|---|
| Freshmen | 13 | 5 |
| Sophomores | 2 | 9 |
| Juniors | 12 | 12 |
| Seniors | 14 | 10 |

What is the probability that a junior non-History major and then another junior non-History major are chosen at random? Express your answer as a fraction or a decimal number rounded to four decimal places.
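
A hedged R sketch of the reasoning for this question: the two picks are made without replacement, so the second probability is conditional on the first pick having removed one junior non-History major. The variable names are illustrative, and the counts come from the class table in the question.

```r
# Counts from the class table, listed row by row.
counts <- c(13, 5, 2, 9, 12, 12, 14, 10)
total_students <- sum(counts)  # should match the 77 students stated
junior_non_hist <- 12

# First pick is a junior non-History major; the second pick is another one
# from the students that remain (sampling without replacement).
p_both <- (junior_non_hist / total_students) *
          ((junior_non_hist - 1) / (total_students - 1))
round(p_both, 4)
```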

Qn14. Customer account "numbers" for a certain company consist of 3 letters followed by 5 numbers.
Step 1 of 2: How many different account numbers are possible if repetitions of letters and digits are allowed?
Qn15. A coin is tossed 6 times. What is the probability that the number of heads obtained will be between 4 and 6 inclusive? Express your answer as a fraction or a decimal number rounded to four decimal places.
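
One way to check this kind of "between a and b inclusive" coin question with technology is sketched below. It assumes the coin is fair, which the question does not state explicitly, and computes the probability both from the binomial distribution and by direct counting.

```r
# Number of heads in 6 tosses of a fair coin: Binomial(n = 6, p = 0.5).
# Assumption: the coin is fair.
p_4_to_6 <- sum(dbinom(4:6, size = 6, prob = 0.5))
p_4_to_6

# Same probability by counting: favorable sequences over all 2^6 sequences.
p_by_counting <- sum(choose(6, 4:6)) / 2^6
p_by_counting
```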
Qn16. A mail order company classifies its customers by gender and location of residence. The research department has gathered data from a random sample of 1936 customers. The data is summarized in the table below. What is the probability that a customer lives in a dorm? Express your answer as a fraction or a decimal number rounded to four decimal places.

Qn18. A bag contains 9 red, 8 orange, and 7 green jellybeans. What is the probability of reaching into the bag and randomly withdrawing 16 jellybeans such that the number of red ones is 5, the number of orange ones is 7, and the number of green ones is 4? Express your answer as a fraction or a decimal number rounded to four decimal places.
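
The jellybean question combines the Combination Rule with the Fundamental Principle of Counting: count the ways to choose each color separately, multiply, and divide by the number of ways to choose any 16 of the 24 jellybeans. A minimal R sketch, with illustrative variable names:

```r
# Ways to pick 5 of 9 red, 7 of 8 orange, and 4 of 7 green jellybeans.
favorable <- choose(9, 5) * choose(8, 7) * choose(7, 4)

# Ways to pick any 16 of the 24 jellybeans.
possible <- choose(24, 16)

p_draw <- favorable / possible
round(p_draw, 4)
```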

Qn19. Larissa is ordering apple pie at a restaurant, and the server tells her that she can have up to four toppings: walnuts, pecans, whipped cream, and caramel. Since she cannot decide how many of the toppings she wants, she tells the server to surprise her. If the server randomly chooses which toppings to add, what is the probability that Larissa gets just walnuts and pecans? Express your answer as a fraction or a decimal number rounded to four decimal places.