Probability Simulations and Statistical Analysis using R Programming
Repeat 1000 times the experiment you performed in Task 1, that is rolling a tetrahedron die 10 times and computing the average. Report the average and standard deviation of the 1000 experiments. The standard deviation function in R is sd(x).
S = 1000
rolls.Avgs = vector(length = S)
for(simnum in 1:S){
x = 1:4
roll = sample(x, 10, replace = TRUE)
rolls.Avgs[simnum] = mean(roll)
}
mean(rolls.Avgs)
sd(rolls.Avgs)
# compute the mean of the 1000 experiments
mean(rolls.Avgs)
hist(rolls.Avgs, main=””, xlab=“Average of set of 10 rolls”)
//
We will see how many times a six occurs at least once out of ten rolls across all these experiments. The code for this counting exercise is sum(roll==6)>0 since a “success” is an experiment where the total number of sixes showing on ten rolls is more than zero!
six = 0 # start a counter for number of times at least one six shows in 5 rolls
S = 1000 # number of experiments
# for-loop to repeat die rolling experiment 1000 times
for(simnum in 1:S){
x = 1:6 # 6-sided die
roll = sample(x, 10, replace = TRUE) # roll the die five times
# two ways to count: with an if-then statement, or more elegantly with a Boolean computation
#if(sum(roll==2) > 0){st = st + 1} # if-then statement
six = six + (sum(roll==6)>0) # Boolean computation: add one to the counter only if at least one 6 shows.
}
six
//
Roll the tetrahedron die 5 times and repeat this experiment 1000 times as in Task 2. Report the proportion of 1000 simulations where a two occurred. (This derives from Dobrow problem 1.44: Probability of rolling at least one 2 in five rolls of a tetrahedron die.)
Dobrow problem 1.30: Exact answer is 0.7627
n = 1000
success = 0
for (i in 1:n) {
rolls = sample(1:4, 5, replace = TRUE)
if (sum(rolls == 2) > 0) {
success = success + 1
}
}
success / n
//
choose(n,k) – computes combination of choosing k objects from n.
prod(n:m) – computes the product of values from n to m. To do a permutation of r objects out of n,
you want to use prod(n:n-r+1). (For whatever reason, this doesn’t work if you assign n and r first, just
put in the numbers).
factorial(x) – computes x!
//
We perform the experiment of observing the number of heads after tossing a fair coin 100 times (probability of a heads on any one toss is 50%). Just like rolling a die, we can use the R function `sample` to flip a coin. Though recognize there are only two outcomes: heads (1) and tails (0). We will report the number of heads after 50 tosses and intermediary output. We will also graphically display the cumulative proportion of heads again the coin toss (1 to 100). The `type=”l”` parameter in the R function `plot` will draw a solid line.
simnum = 100 # number of coin flips
coinflips = sample(0:1, simnum, replace = TRUE) # flip the coin: heads = 1, tails = 0
heads = cumsum(coinflips) # cumulative sum of number of heads after each coin toss
prop = heads/(1:simnum) # running proportion of heads after each coin toss
head(heads) # report cumulative number of heads after each of the first 6 flips
heads[50]
# running mean plot for proportion of heads.
plot(1:simnum, prop, type=”l”, xlab=”Number of coins”, ylab=”Running average”, main=”Proportion of heads in 100 coin flips”)
abline(h=0.5) # add a line at 50%
//
Let us now flip a “biased” coin. Perform the experiment of observing the number of heads after tossing a coin 1000 times, with the probability of getting a heads on any one toss being 40%. To change the probability in the R function `sample` use the parameter `prob=c(0.6,0.4)`; note that we need to specify the probability of a tails (0) and a heads (1) in this parameter.
n = 1000 # number of coinflips
toss = sample(0:1, n, replace=TRUE, prob=c(0.6, 0.4))# heads = 1, tails = 0
# cumulative number of heads
heads <- cumsum(toss)
# cumulative proportion of heads
prop <- heads / (1:n)
prop[10]
prop[50]
prop[100]
prop[200]
prop[500]
# plot
plot(1:n, prop, type = “l”, xlab = “Number of Tosses”, ylab = “Proportion of Heads”, main = “Cumulative Proportion of Heads (Biased Coin)”)
abline(h=0.4)
//
a function is written which draws the number at random from the integers 1 to 1000 and then checks if it is divisible by 3, 5, or 6. It uses modular arithmetic, `x%%n` being *x mod n* in R. Simulate the probablity that a random integer between 1 and 5000 is divisible by 4, 7, or 10.
# simdivis() simulates one trial
simdivis = function(){
num = sample(1:5000, 1) # draw a number at random from the integers 1 to 5000
# determine if the number is divisible by 4, 7 or 10 by checking if the remainder is 0
if (num%%4==0 || num%%7==0 || num%%10==0) 1 else 0
}
simlist = replicate(1000, simdivis()) # replicate the experiment 1000 times
mean(simlist) # compute the estimated probability as the proportion of times the number is divisible by 4, 7, or 10
//
Present the empirical probability based on repeating the experiment 100, 1000, 10000, and 100000 times. Consider using the `xtable` code chunk from the first task to build a table for these values.
library(knitr)
library(xtable)
library(pander)
estimate100 <- mean(replicate(100, simdivis()))
estimate1000 <- mean(replicate(1000, simdivis()))
estimate10000 <- mean(replicate(10000, simdivis()))
estimate100000 <- mean(replicate(100000, simdivis()))
reps = formatC(c(100, 1000, 10000, 100000), digits=0, format=”d”, flag=”#”)
emp = c(estimate100, estimate1000, estimate10000, estimate100000)
emp = formatC(signif(emp, digits=6), digits=6, format=”f”, flag=”#”)
table.Elts = rbind(reps, emp)
row.Names(table.Elts) = cbind(“# of repetitions”, “Empirical Probability”)
lab2.Table = xtable(table.Elts, caption = “Empirical probability (1…5000 divisible by 4, 7, or 10) at different repetition counts.”, label=”div4710″, align = “|l|rrrr|”)
pander(lab2.Table)
//
Simulate in Rstudio the nontransitive dice probabilities in Exercise 2.6. Reference: (Consider three nonstandard dice. Instead of the numbers 1 through 6,
die A has two 3’s, two 5’s, and two 7’s;
die B has two 2’s, two 4’s, and two 9’s;
die C has two 1’s, two 6’s, and two 8’s, Suppose dice A and B are rolled. Show that A is more likely to get the higher number. That is, P(A > B) > 0.50, where {A > B} denotes the event that A beats B. Hint: Condition on the outcome of die A. Now show that if B and C are rolled, B is more likely to get the higher number. And, remarkably, if C and A are rolled, C is more likely to get the higher number.
n = 10000
dieA = c(3,3,3,5,5,7,7)
dieB = c(2,2,4,4,9,9)
dieC = c(1,1,6,6,8,8)
rollA = sample(dieA, n, replace = TRUE)
rollB = sample(dieB, n, replace = TRUE)
rollC = sample(dieC, n, replace = TRUE)
mean(rollA > rollB)
mean(rollB > rollC)
mean(rollC > rollA)
//
Consider the simple experiment of rolling a six-sided die and estimating the probability of rolling a 2.
success = 0 # storage vector for whether a roll is a 2 or not
nrolls = 10 # number of rolls of the die in the simulation experiment
for(i in 1:nrolls){
roll = sample(1:6, 1) # roll the six-sided die once
# Two ways to determine if the roll is a 2
# In each case, we want to store the success result in
# the ith element of the vector `success’.
# If roll is a 2, store value of ‘1’ in the success vector;
# otherwise store a value of zero
# 1) If-then statement: an if-else syntax for the Boolean expression
if(roll==2){success[i]=1}else{success[i]=0}
# 2) Straight Boolean expression
#success[i] = (roll == 2)
}
mean(success) # proportion of 1s (successes) in the simulation experiment
//
Simulate the birthday problem to obtain an empirical estimate of the probability that two people in a class of 23 will have the same birth date. In particular, simulate birthdays for 1000 classes (`for(i in 1:1000){…}`) each of size 23 and compute the proportion of these classes in which at least one pair of students has the same birth date.
success = 0
n = 1000
numstudents = 50
for (i in 1:n){
trial = sample(1:365, numstudents, replace = TRUE)
if (any(table(trial) >= 2)){ // at least one pair (2 or more). If (3 %in% table(trial)) is used for finding the approximate probability that three people have the same birthday in a class of 50 students.
success[i] = 1
}
else{
success[i] = 0
}
}
mean(success)
The algorithm sequentially moves down the list. At each position, say $i$, it swaps the number in slot $i$ with a randomly chosen element from positions `i:n` (so later in the list or no swap at all).
# Simulating random permutations
# Code from Example 2.14 of Dobrow (2014)
n = 12 # permutation of size n
perm = 1:n # store list of n integers
# sequentially move down the list
for(card in 1:(n-1)){
x = sample(card:n,1) # randomly chose a position for swapping
old = perm[card] # store the orginal number in position i
perm[card] = perm[x] # replace the number in position i with the number in the randomly chosen position x
perm[x] = old # complete the swap by placing the original number in position i in position x
}
1:n # original list
perm # randomly permuted list
//
In this task, we are going to shuffle a deck of cards and determine the probability that the top (first element in permutation) and bottom (last element in permutation) card are the same. The code
`(floor((perm[1]-1)/13) == floor((perm[52]-1)/13))`
checks if the first and last cards from the random permutation are the same suit. After performing the random permutation, you can store this value in a success vector (Boolean expression), and then wrap the code in a for-loop to repeat the simulation. The proportion of experiments reporting a 1 in the success vector is an estimate of the probability that the top and bottom cards of a shuffle are the same.
Based on Dobrow problem 2.46. Revise the code chunk above to shuffle a standard deck of cards (52 cards). Simulate the probability that in a randomly shuffled deck, the top and bottom cards are the same suit.
trials = 10000
n= 52
success = numeric(trials)
perm = 1:n
for (trial in 1:trials){
# shuffle perm in place
for(card in 1:(n-1)){
random = sample(card:n, 1)
old = perm[card]
perm[card] = perm[random]
perm[random] = old
}
# check if first and last cards are of the same suit
success[trial] = (floor((perm[1]-1)/13) == floor((perm[52]-1)/13))
}
mean(success)
//
A bag contains one red, two blue, three green, and four yellow balls. A sample of three balls is taken without replacement. Let upper B be the number of blue balls and Y the number of yellow balls in the sample.
Simulate the mean and variance of the total number of blue and yellow balls in the sample.
bag = c(“R”, “B”, “B”, “G”, “G”, “G”, “Y”, “Y”, “Y”, “Y”)
n=10000
list = numeric(n)
for (i in 1:n){
trial = sample(bag,3, replace = F)
total = sum(trial == “B”) + sum(trial == “Y”) # T = B + Y
list[i] = total
}
mean(list)
var(list)
//
Play the dice game described in Example 4.3 multiple times. Estimate both the expected value and variance of your earnings.
(For reference: Example 4.3 The following dice game cost $10 to play. If you roll 1, 2, or 3, you lose your money. If you roll 4 or 5, you get your money back. If you roll a 6, you win $24.)
Use R to simulate and estimate the expected value and the variance for the winnings of this game.
Hint: if you lose it is -10, getting money back is 0, and winnings is 14.
n=10000
win = numeric(n)
for (i in 1:n){
roll = sample(1:6,1, replace = T)
if (roll %in% 1:3) {
win[i] = -10
} else if (roll %in% 4:5) {
win[i] = 0
} else {
win[i] = 14
}
}
mean(win)
var(win)
//
Consider rolling a six-sided die. Estimate the probability of rolling a 2 by embedding this experiment in a for-loop and storing success results.
success = 0
nrolls = 10
for(i in 1:nrolls){
roll = sample(1:6, 1)
if(roll==2){success[i]=1}else{success[i]=0}
}
mean(success)
//
Simulate the probability of exactly one head in four coin tosses.
n <- 10000
simlist <- numeric(n)
for (i in 1:n) {
trial <- sample(0:1, 4, replace=TRUE)
success <- if (sum(trial)==1) 1 else 0
simlist[i] <- success
}
mean(simlist) # ~0.25
//
Simulate the probability of getting at least 8 in the sum of two dice rolls.
n = 10000
resultList <- numeric(n)
for (i in 1:n) {
rolls <- sample(1:6, 2, replace=TRUE)
resultList[i] <- if (sum(rolls) >= 8) 1 else 0
}
mean(resultList) # ~0.417
//
Simulate the probability that Judith’s coin (random penny/nickel/dime/quarter) is greater than Kory’s.
coins <- c(1, 5, 10, 25)
trial <- function() {
judith <- sample(coins, 1)
kory <- sample(coins, 1)
if (judith > kory) 1 else 0
}
n <- 10000
resultList <- numeric(n)
for (i in 1:n) {
resultList[i] <- trial()
}
//
Create a sample from 0/1 and compute its mean. Then sample from digits 0–9 and compute its mean.
n <- 100
sample(0:1, n, replace = TRUE)
mean(sample(0:1, n, replace = TRUE))
x = sample(0:9, n, replace = TRUE)
mean(x)
choose(10, 2) # 45
prod(8:6) # 336
factorial(4) # 24
choose(30, 10) # 30045015
prod(10:1) # 3628800
factorial(9) # 362880
//
(a)
Simulate the probability of exactly two heads in 4 tosses of a fair coin.
(b)
Compare your simulation to the true probability (42)(0.5)2(0.5)2\binom{4}{2}(0.5)^2(0.5)^2(24)(0.5)2(0.5)2.
(c)
Now simulate rolling two dice. Estimate the probability that the sum is ≥ 9.
(d)
Repeat the experiment 100, 1000, 10000 times and tabulate the empirical probabilities.
n = 10000
simlist = numeric(n)
for (i in 1:n) {
trial = sample(c(“H”,”T”), 4, replace = TRUE)
if (sum(trial == “H”) == 2) {
simlist[i] = 1
} else {
simlist[i] = 0
}
}
mean(simlist) # simulated P(exactly 2 heads)
true_prob = choose(4,2) * (0.5^2) * (0.5^2)
true_prob
(c)
n = 10000
resultList = numeric(n)
for (i in 1:n) {
rolls = sample(1:6, 2, replace = TRUE)
if (sum(rolls) >= 9) {
resultList[i] = 1
} else {
resultList[i] = 0
}
}
mean(resultList)