- formals(), body(), and environment()
- NULL
- Conditional Execution: message(), warning(), and stop()
- missing()
- ...
- base R
- purrr
- furrr
Use cmd + enter to execute a line. As of webR 0.2.3, webr does not support smart execution, so for multiline code, highlight the entire section before hitting cmd + enter.
for Loop

while loops are a good skill to know¹

apply Functions

apply Functions (cont’d)

pbapply package if you regularly parallelize your code.

Make a column in the data created below (a subset of the weather station data) for week. When X (the row ID, i.e., the day of the month) is 1-7, Week should be 1; when X is 8-14, Week should be 2; when X is 15-21, Week should be 3; when X is 22-28, Week should be 4; and when X is greater than 28, Week should be NA.
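One possible way to build the Week column with a for loop is sketched below. The data frame `weather` is a hypothetical stand-in for the weather station subset: only the day column `X` (days 1-30) is assumed, and integer division does the binning.

```r
# Hypothetical stand-in for the weather station subset: only the day column X
weather <- data.frame(X = 1:30)

# Start with every Week missing, then fill in days 1-28 one row at a time
weather$Week <- NA_integer_
for (i in seq_len(nrow(weather))) {
  day <- weather$X[i]
  if (day <= 28) {
    # 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, 22-28 -> 4
    weather$Week[i] <- (day - 1L) %/% 7L + 1L
  }
  # days after 28 keep their NA
}

# Vectorized alternative without a loop
weather$Week2 <- ifelse(weather$X <= 28, (weather$X - 1L) %/% 7L + 1L, NA_integer_)
```

`cut(weather$X, breaks = c(0, 7, 14, 21, 28), labels = FALSE)` is another loop-free option; it also returns NA for days outside the breaks.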
Functions are like grammatically correct sentences; they require arguments, a body, and an environment¹.
…x or a return(x)) in order to return output from x to the global environment.

formals(), body(), and environment()
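A quick illustration of the three components, using a hypothetical `add_tax()` function defined at the top level:

```r
# Hypothetical example function
add_tax <- function(price, rate = 0.08) {
  price * (1 + rate)
}

formals(add_tax)      # the arguments: price (no default) and rate = 0.08
body(add_tax)         # the body: { price * (1 + rate) }
environment(add_tax)  # where it was defined; the global environment at top level

add_tax(100)          # 108
```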
NULL
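The content of this slide is not preserved here; as a sketch, these are a few standard NULL behaviors that come up when writing functions (the function names are hypothetical):

```r
# A function whose body evaluates nothing returns NULL
f_empty <- function() {}
is.null(f_empty())        # TRUE

# NULL has length 0 and disappears when combined with other values
length(NULL)              # 0
c(1, NULL, 2)             # 1 2

# Assigning NULL to a list element removes that element
lst <- list(a = 1, b = 2)
lst$b <- NULL
names(lst)                # "a"

# NULL is a common default for optional arguments
describe <- function(x = NULL) {
  if (is.null(x)) "no value supplied" else paste("got", x)
}
describe()                # "no value supplied"
describe(5)               # "got 5"
```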
Conditional Execution (1)

message(), warning(), and stop()
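A minimal sketch of the three signals inside conditional branches, using a hypothetical `safe_sqrt()` helper: `message()` is informational, `warning()` lets execution continue but flags a problem, and `stop()` halts with an error.

```r
# Hypothetical helper: square root with input checks
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric")                   # error: execution halts
  }
  if (any(x < 0)) {
    warning("negative values replaced with NA")  # continue, but flag it
    x[x < 0] <- NA
  }
  message("taking the square root of ", length(x), " value(s)")  # informational
  sqrt(x)
}

safe_sqrt(c(4, 9))    # message; returns 2 3
safe_sqrt(c(4, -1))   # message and warning; returns 2 NA
try(safe_sqrt("a"))   # error: `x` must be numeric
```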
NULL
Conditional Execution (2)

missing()
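As an illustration, `missing()` lets a function branch on whether an argument was supplied at all. The `area()` function below is hypothetical:

```r
# Hypothetical example: treat the shape as a square when height is not given
area <- function(width, height) {
  if (missing(height)) {
    message("`height` not supplied; assuming a square")
    height <- width
  }
  width * height
}

area(3, 4)   # 12
area(3)      # message, then 9
```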
...
dot-dot-dot (1)

The `...` argument is also known as an ellipsis or simply dot-dot-dot.
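A short sketch of the two most common uses of `...`: forwarding extra arguments to another function, and collecting an arbitrary number of arguments. Both function names are hypothetical.

```r
# Forward any extra arguments on to mean()
my_mean <- function(x, ...) {
  mean(x, ...)
}

vals <- c(1, 2, 3, NA)
my_mean(vals)                 # NA
my_mean(vals, na.rm = TRUE)   # 2 (na.rm travels through ...)

# Capture the dots as a list and inspect them
count_args <- function(...) {
  args <- list(...)
  length(args)
}
count_args(1, "a", TRUE)      # 3
```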
dot-dot-dot (2)

`...` has pros and cons; for more info, see here.
...
dot-dot-dot (3)

So why does f5 work?
“An Anonymous Function (also known as a lambda expression) is a function definition that is not bound to an identifier. That is, it is a function that is created and used, but never assigned to a variable” (see link)
base R anonymous function syntax:

purrr’s anonymous function syntax:
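The slide code is not preserved; as a sketch, here is the same anonymous function (squaring each element) in both styles. The backslash shorthand assumes R 4.1.0 or later.

```r
library(purrr)

nums <- 1:3

# base R: function(x) ..., or the shorthand \(x) ... (R >= 4.1.0)
sapply(nums, function(x) x^2)   # 1 4 9
sapply(nums, \(x) x^2)          # 1 4 9

# purrr: a one-sided formula, where .x is the current element
map_dbl(nums, ~ .x^2)           # 1 4 9
```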
base R

| Function | Description |
|---|---|
| `apply(X, MARGIN, FUN, ...)` | Applies a function over the margins (rows or columns) of an array or matrix. |
| `sapply(X, FUN, ...)` | A wrapper around `lapply()` that attempts to simplify the result to a vector, matrix, or higher-dimensional array. |
| `vapply(X, FUN, FUN.VALUE, ...)` | Similar to `sapply()`, but with a pre-specified type of return value (`FUN.VALUE`), making it safer (and sometimes faster) by avoiding unexpected type coercion. |
| `lapply(X, FUN, ...)` | Applies a function to each element of a list or vector and returns a list. |
| `tapply(X, INDEX, FUN = NULL, ...)` | Applies a function over subsets of a vector, split by the levels of a factor or list of factors. |
| `do.call(what, args, ...)` | Constructs and executes a function call from a name or a function and a list of arguments to be passed to it. |
| `mapply(FUN, ...)` | A multivariate version of `sapply()`: applies a function to the 1st elements of each argument, then the 2nd elements of each argument, and so on. |
| `Map(f, ...)` | Similar to `mapply()`, but always returns a list, regardless of the output type. |
| `Reduce(f, x, init, right = FALSE, accumulate = FALSE)` | Applies a function successively to the elements of a vector (left to right by default) so as to reduce the vector to a single value. |
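A compact sketch exercising each function in the table above on small toy inputs (all object names are illustrative):

```r
m <- matrix(1:6, nrow = 2)      # 2 x 3 matrix, filled by column

apply(m, 1, sum)                # row sums: 9 12
apply(m, 2, sum)                # column sums: 3 7 11

lst <- list(a = 1:3, b = 4:6)
lapply(lst, mean)               # list(a = 2, b = 5)
sapply(lst, mean)               # named vector: a = 2, b = 5
vapply(lst, mean, numeric(1))   # same, but each result must be one double

# tapply: mean of values within each group
values <- c(10, 20, 30, 40)
groups <- c("x", "y", "x", "y")
tapply(values, groups, mean)    # x = 20, y = 30

do.call(sum, list(1, 2, 3))     # 6, same as sum(1, 2, 3)

mapply(function(a, b) a + b, 1:3, 4:6)  # 5 7 9
Map(function(a, b) a + b, 1:3, 4:6)     # list(5, 7, 9)

Reduce(`+`, 1:4)                        # ((1 + 2) + 3) + 4 = 10
Reduce(`+`, 1:4, accumulate = TRUE)     # 1 3 6 10
```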
base R Examples (1)

base R Examples (2)

purrr

| Function | Description |
|---|---|
| `map(.x, .f, ...)` | Applies a function to each element of a list or vector and returns a list. Useful for operations on list elements. |
| `map2(.x, .y, .f, ...)` | Applies a function to the corresponding elements of two vectors/lists; useful for element-wise operations on two inputs. |
| `pmap(.l, .f, ...)` | Applies a function to elements taken side by side from several vectors/lists supplied together in `.l` (e.g., the columns of a data frame). "Parallel" here means parallel iteration, not parallel computing. |
| `reduce(.x, .f, ..., .init, .dir = c("forward", "backward"))` | Reduces a list or vector to a single value by iteratively applying a function that takes two arguments. |
| `_dbl`, `_int`, `_chr`, `_lgl`, and `_vec` | Variants of `map`, `map2`, and `pmap` that change the output type, e.g., `map_dbl`, `map_int`, `map_chr`, `map_lgl`, `map_vec`, `map2_dbl`, ... |
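A small sketch of the purrr functions in the table above, on toy inputs (object names are illustrative):

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)

map(x, mean)                     # list: a = 2, b = 5
map_dbl(x, mean)                 # named double vector: a = 2, b = 5

map2_dbl(1:3, 4:6, ~ .x * .y)    # 4 10 18

# pmap(): each column of the data frame is matched to an argument by name
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
pmap_dbl(df, function(a, b, c) a + b + c)   # 12 15 18

reduce(1:4, `+`)                 # 10
```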
purrr Examples

furrr
Parallel computing is not a magic bullet. Performance depends on the overhead of parallelization, the granularity of the tasks, and whether or not the task is sequential.
…seq_func, the internals of seq_func cannot be parallelized, but the calls to seq_func can be.

random_walk() source code

library(furrr)
library(tictoc)
nworkers = max(1, parallel::detectCores() - 1, na.rm = TRUE) # use all but one core, but never fewer than 1 worker
random_walk <- function(steps) {
position <- numeric(steps) # Initialize the position vector
position[1] <- 0 # Start at the origin
for (i in seq_len(steps)[-1]) { # Simulate each step of the walk (no iterations when steps = 1)
if (runif(1) < 0.5) {
position[i] <- position[i - 1] + 1 # Move forward
} else {
position[i] <- position[i - 1] - 1 # Move backward
}
}
return(position)
}
steps = 10000; n_random_walks = 300 # Define the number of steps and walks
future::plan(multisession, workers = 1) # setting num of cores/workers
tic() # Measure time taken to execute the random walk
set.seed(1); walks = future_map(1:n_random_walks, ~random_walk(steps), .options = furrr_options(seed = TRUE))
toc() # 3.088 sec elapsed
tic()
future::plan(multisession, workers = nworkers) # setting num of cores/workers
set.seed(1); walks = future_map(1:n_random_walks, ~random_walk(steps), .options = furrr_options(seed = TRUE))
toc() # 1.713 sec elapsed
pdf("random_walks.pdf")
invisible(
lapply(1:10, function(i)
plot(walks[[i]],type = "l", ylab = "Position", xlab = "Step",
main = paste("Random Walk",i)))
);dev.off()
boot() and samp.o()
boot <- function(x, B = 5000, m, theta.f, w = 1, rdist, ...) {
plan(multisession, workers = w) # Set up for parallel execution
b_indices <- 1:B # vector of indices for bootstrapping iterations
iterate_func <- function(b) { # apply for each bootstrap iteration
if (m == "p") {
d.b <- rdist(...) # parametric bootstrap
} else if (m == "np") {
d.b <- x[sample(1:length(x), replace = TRUE)] # nonparametric bootstrap
} else {
stop("possible arguments for m are 'p' (parametric) or 'np' (nonparametric)")
}
theta.f(d.b)
}
# future_map_dbl to apply iterate_func over each index in parallel with proper seeding
t.s <- future_map_dbl(b_indices, iterate_func, .options = furrr_options(seed = TRUE))
samp.o(t.s) # Summarize the bootstrap results
}
samp.o = function(t.s) { # mean, sd, and 95% percentile interval of the bootstrap statistics
round(c(mean = mean(t.s), sd = sd(t.s),
lower = quantile(t.s, 0.025, names = FALSE),
upper = quantile(t.s, 0.975, names = FALSE)), digits = 6)
}
library(furrr) # future_map_dbl() and furrr_options() come from furrr, not purrr
library(future)
library(tictoc)
# boot <- function(x, B = 5000, m, theta.f, w = 1, rdist, ...) {} # see above
# samp.o = function(t.s) {} # see above
theta.f = function(d.b) {p = mean(d.b); p/(1-p)} # odds; mean(d.b) equals sum(d.b)/n without relying on a global n
set.seed(1); n = 800000; y = 480; B = 5000
data <- c(rep(1, y), rep(0, n-y)); phat <- sum(data)/n