forked from jtleek/modules
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathindex.Rmd
More file actions
224 lines (171 loc) · 5.18 KB
/
index.Rmd
File metadata and controls
224 lines (171 loc) · 5.18 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
---
title : Simulation
subtitle :
author : Roger D. Peng, Associate Professor of Biostatistics
job : Johns Hopkins Bloomberg School of Public Health
logo : bloomberg_shield.png
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : tomorrow #
url:
lib: ../../libraries
assets: ../../assets
widgets : [mathjax] # {mathjax, quiz, bootstrap}
mode : selfcontained # {standalone, draft}
---
## Generating Random Numbers
Functions for probability distributions in R
- `rnorm`: generate random Normal variates with a given mean and standard deviation
- `dnorm`: evaluate the Normal probability density (with a given mean/SD) at a point (or vector of points)
- `pnorm`: evaluate the cumulative distribution function for a Normal distribution
- `rpois`: generate random Poisson variates with a given rate
---
## Generating Random Numbers
Probability distribution functions usually have four functions associated with them. The functions are prefixed with a
- `d` for density
- `r` for random number generation
- `p` for cumulative distribution
- `q` for quantile function
---
## Generating Random Numbers
Working with the Normal distributions requires using these four functions
```r
dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)
```
If $\Phi$ is the cumulative distribution function for a standard
Normal distribution, then `pnorm(q)` = $\Phi(q)$ and `qnorm(p)` =
$Φ^{−1}(p)$.
---
## Generating Random Numbers
```r
> x <- rnorm(10)
> x
[1] 1.38380206 0.48772671 0.53403109 0.66721944
[5] 0.01585029 0.37945986 1.31096736 0.55330472
[9] 1.22090852 0.45236742
> x <- rnorm(10, 20, 2)
> x
[1] 23.38812 20.16846 21.87999 20.73813 19.59020
[6] 18.73439 18.31721 22.51748 20.36966 21.04371
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
18.32 19.73 20.55 20.67 21.67 23.39
```
---
## Generating Random Numbers
Setting the random number seed with `set.seed` ensures reproducibility
```r
> set.seed(1)
> rnorm(5)
[1] -0.6264538 0.1836433 -0.8356286 1.5952808
[5] 0.3295078
> rnorm(5)
[1] -0.8204684 0.4874291 0.7383247 0.5757814
[5] -0.3053884
> set.seed(1)
> rnorm(5)
[1] -0.6264538 0.1836433 -0.8356286 1.5952808
[5] 0.3295078
```
Always set the random number seed when conducting a simulation!
---
## Generating Random Numbers
Generating Poisson data
```r
> rpois(10, 1)
[1] 3 1 0 1 0 0 1 0 1 1
> rpois(10, 2)
[1] 6 2 2 1 3 2 2 1 1 2
> rpois(10, 20)
[1] 20 11 21 20 20 21 17 15 24 20
> ppois(2, 2) ## Cumulative distribution
[1] 0.6766764 ## Pr(x <= 2)
> ppois(4, 2)
[1] 0.947347 ## Pr(x <= 4)
> ppois(6, 2)
[1] 0.9954662 ## Pr(x <= 6)
```
---
## Generating Random Numbers From a Linear Model
Suppose we want to simulate from the following linear model
\[
y = \beta_0 + \beta_1 x + \varepsilon
\]
where $\varepsilon\sim\mathcal{N}(0, 2^2)$. Assume
$x\sim\mathcal{N}(0,1^2)$, $\beta_0 = 0.5$ and $\beta_1 = 2$.
```r
> set.seed(20)
> x <- rnorm(100)
> e <- rnorm(100, 0, 2)
> y <- 0.5 + 2 * x + e
> summary(y)
Min. 1st Qu. Median
-6.4080 -1.5400 0.6789 0.6893 2.9300 6.5050
> plot(x, y)
```
---
## Generating Random Numbers From a Linear Model
<img src="../assets/img/plot4.png" height=500>
---
## Generating Random Numbers From a Linear Model
What if `x` is binary?
```r
> set.seed(10)
> x <- rbinom(100, 1, 0.5)
> e <- rnorm(100, 0, 2)
> y <- 0.5 + 2 * x + e
> summary(y)
Min. 1st Qu. Median
-3.4940 -0.1409 1.5770 1.4320 2.8400 6.9410
> plot(x, y)
```
---
## Generating Random Numbers From a Linear Model
<img src="../assets/img/plot5.png" height=500>
---
## Generating Random Numbers From a Generalized Linear Model
Suppose we want to simulate from a Poisson model where
Y ~ Poisson(μ)
log μ = $\beta_0 + \beta_1x$
and $\beta_0 = 0.5$ and $\beta_1 = 0.3$. We need to use the `rpois` function for this
```r
> set.seed(1)
> x <- rnorm(100)
> log.mu <- 0.5 + 0.3 * x
> y <- rpois(100, exp(log.mu))
> summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 1.00 1.00 1.55 2.00 6.00
> plot(x, y)
```
---
## Generating Random Numbers From a Generalized Linear Model
<img src="../assets/img/plot6.png" height=500>
---
## Random Sampling
The `sample` function draws randomly from a specified set of (scalar) objects allowing you to sample from arbitrary distributions.
```r
> set.seed(1)
> sample(1:10, 4)
[1] 3 4 5 7
> sample(1:10, 4)
[1] 3 9 8 5
> sample(letters, 5)
[1] "q" "b" "e" "x" "p"
> sample(1:10) ## permutation
[1] 4 710 6 9 2 8 3 1 5
> sample(1:10)
[1] 2 3 4 1 9 5 10 8 6 7
> sample(1:10, replace = TRUE) ## Sample w/replacement
[1] 2 9 7 8 2 8 5 9 7 8
```
---
## Simulation
Summary
- Drawing samples from specific probability distributions can be done with `r*` functions
- Standard distributions are built in: Normal, Poisson, Binomial, Exponential, Gamma, etc.
- The `sample` function can be used to draw random samples from arbitrary vectors
- Setting the random number generator seed via set.seed is critical for reproducibility