Speeding up random sampling of an array in R

November 15, 2020

Problem statement

Given a set $S = \{s_1, s_2, \dots, s_n\}$, one would like to sample a subset $X \subset S$ of size $m$. If this operation needs to be repeated a very large number of times $k$, what is the most efficient way?

    set_S = c(1:100)
    microbenchmark::microbenchmark(sample(set_S, size = 50), times = 10)
    ## Unit: microseconds
    ##                      expr min lq mean median uq max neval
    ##  sample(set_S, size = 50) 5.
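The excerpt is cut off before any comparison, so what follows is only a sketch of how the repeated-sampling problem could be set up, not the post's method. The values of `m` and `k` and the helper names `draw_by_value` and `draw_by_index` are assumptions made for illustration. Each helper draws `k` subsets of size `m`, either from the values directly or by sampling positions with `sample.int()` and subsetting.

```r
set_S <- 1:100
m <- 50    # subset size (assumed)
k <- 1000  # number of repetitions (assumed)

# Sample the values directly, k times; each column is one subset
draw_by_value <- function(S, m, k) replicate(k, sample(S, size = m))

# Sample positions with sample.int(), then subset; each column is one subset
draw_by_index <- function(S, m, k) {
  replicate(k, S[sample.int(length(S), size = m)])
}

X1 <- draw_by_value(set_S, m, k)
X2 <- draw_by_index(set_S, m, k)
dim(X1)
```

Both helpers return an $m \times k$ matrix. Sampling by index also sidesteps the documented special case in which `sample(x)` treats a single number `x` as `1:x`.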

Sweep vs Matrix multiplication

April 25, 2020

Sweeping along an axis can be represented by matrix multiplication. Given the matrix $A$ and diagonal matrix $D$, $DA$ is equivalent to multiplying each row $i$ of $A$ by $d_{ii}$, and $AD$ is equivalent to multiplying each column $j$ of $A$ by $d_{jj}$.

    A = matrix(runif(50000), ncol = 100)
    w = apply(A, 1, norm, '2')
    all(abs(sweep(A, 1, w, '/') - (diag(1/w) %*% A)) < .Machine$double.eps)
    ## [1] TRUE

One would reasonably expect sweeping over individual row/column vectors to be more efficient than the equivalent matrix operation, because no additional memory is required to store the off-diagonal entries of $D$.
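As a small sketch of the same equivalence in both directions, the block below scales rows (matching $DA$) and columns (matching $AD$). The variable names and the use of `rowSums()`/`colSums()` to compute the Euclidean norms are choices made here, not taken from the post. It also shows plain recycling (`A / w_row`) as a third way to scale rows.

```r
set.seed(1)
A <- matrix(runif(50000), ncol = 100)   # 500 x 100

# Row scaling: sweep(), diag() %*% A, and plain recycling
w_row <- sqrt(rowSums(A^2))             # Euclidean norm of each row
by_sweep   <- sweep(A, 1, w_row, "/")
by_matmul  <- diag(1 / w_row) %*% A
by_recycle <- A / w_row                 # w_row is recycled down the columns

# Column scaling: sweep() along margin 2 versus A %*% D
w_col <- sqrt(colSums(A^2))
col_sweep  <- sweep(A, 2, w_col, "/")
col_matmul <- A %*% diag(1 / w_col)

all.equal(by_sweep, by_matmul)
all.equal(by_sweep, by_recycle)
all.equal(col_sweep, col_matmul)
```

Note that `diag(1 / w_row)` materializes a full 500 x 500 matrix, which is the extra storage the paragraph above refers to.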

Efficient `reduce` in R

November 28, 2018

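When you need to do a Reduce operation on a list with a function that accepts any number of arguments, such as rbind() or cbind(), it's more efficient to use do.call(): Reduce() calls the function once per pair and copies the growing result each time, whereas do.call() makes a single call.

Combining rows

    library(microbenchmark)
    library(magrittr)  # provides %>%
    a = lapply(1:100, rnorm, n = 50)
    microbenchmark(Reduce(rbind, a), do.call(rbind, a)) %>% boxplot(unit = 'ms', boxwex = 0.2)

Combining columns

    microbenchmark(Reduce(cbind, a), do.call(cbind, a)) %>% boxplot(unit = 'ms', boxwex = 0.2)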
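Before comparing timings it is worth confirming that the two approaches give the same values. The following is a minimal check with a list built like the one in the post; the names `r_reduce` and `r_docall` are illustrative. `unname()` is used because `Reduce()` leaves row names behind that `do.call()` does not.

```r
set.seed(1)
a <- lapply(1:100, function(i) rnorm(50, mean = i))  # list of 100 vectors

r_reduce <- Reduce(rbind, a)    # 99 pairwise rbind() calls
r_docall <- do.call(rbind, a)   # one rbind() call with 100 arguments

dim(r_docall)
all.equal(unname(r_reduce), unname(r_docall))
```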

Allocation cost

September 17, 2017

Many high-level programming languages give their users the luxury of extending an existing matrix or vector. The question is, how costly can that luxury be? The two functions below both return an $m \times n$ matrix by calling a random generator $n$ times. The first function does that by initializing the whole matrix with zeros and filling in the values until done. The second extends the result one column at a time.
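The post's own functions are not part of this excerpt, so the pair below is a hypothetical reconstruction of the two strategies described: `fill_prealloc()` and `fill_growing()` are made-up names, and `rnorm()` stands in for whatever random generator the post used.

```r
# Hypothetical names: fill_prealloc() and fill_growing() are illustrative only
fill_prealloc <- function(m, n) {
  out <- matrix(0, nrow = m, ncol = n)   # allocate the full matrix once
  for (j in seq_len(n)) out[, j] <- rnorm(m)
  out
}

fill_growing <- function(m, n) {
  out <- NULL
  # each cbind() allocates a new, larger matrix and copies the old one
  for (j in seq_len(n)) out <- cbind(out, rnorm(m))
  out
}

set.seed(42); x1 <- fill_prealloc(100, 200)
set.seed(42); x2 <- fill_growing(100, 200)
all.equal(x1, unname(x2))
```

With the same seed both versions produce the same matrix, so any timing difference between them comes from allocation and copying alone.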

.Internal(sample)

July 7, 2016

.Internal(sample()) requires all four arguments to be given explicitly, in order: n, size, replacement, probabilities. If probabilities is not NULL, the first argument has to be an integer. To get output equivalent to that of sample(), we need to map the sampled integers back to the desired values.

    internal_boolean <- function(size, replace, prob) {
      s = .Internal(sample(2, size, replace, prob))
      return(s < 2)
    }
    internal_boolean_rle <- function(size, replace, prob) {
      s = .Internal(sample(2, size, replace, prob))
      return(rle(s < 2))
    }
    N = 100000
    probs = c(0.0001, 1-0.
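The same integer-to-value mapping can be sketched with `sample.int()`, the documented function, rather than `.Internal()`. The name `sample_boolean` and the probabilities used here are assumptions for illustration; this is not the post's benchmark.

```r
# sample.int() is the documented interface; used here instead of .Internal()
sample_boolean <- function(size, replace, prob) {
  s <- sample.int(2L, size = size, replace = replace, prob = prob)
  s < 2   # map 1 -> TRUE, 2 -> FALSE
}

set.seed(1)
flags <- sample_boolean(1000, TRUE, c(0.3, 0.7))  # probabilities assumed
mean(flags)          # share of TRUE values, expected to be near 0.3
runs <- rle(flags)   # run-length encoding, as in internal_boolean_rle()
sum(runs$lengths)    # total length covered by the runs
```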

How much harm can nested if's do?

July 7, 2016

The definitions of a(), b(), and sw() below achieve the same effect with different implementations: an if/else chain, nested ifelse() calls, and switch().

    a <- function(x) {
      if (x == 'A') {
        paste('Apple')
      } else if (x == 'R') {
        paste('Ready')
      } else if (x == 'N') {
        paste('Novel')
      } else if (x == 'G') {
        paste('Ginger')
      } else {
        paste("Bingo")
      }
    }
    b <- function(x) {
      ifelse(x == 'A', paste('Apple'),
        ifelse(x == 'R', paste('Ready'),
          ifelse(x == 'N', paste('Novel'),
            ifelse(x == 'G', paste('Ginger'), paste("Bingo")))))
    }
    sw <- function(x) {
      switch(x,
        A = paste('Apple'),
        R = paste('Ready'),
        N = paste('Novel'),
        G = paste('Ginger'),
        "Bingo"
      )
    }

Timing

    microbenchmark(a('R'), b('R'), sw('R'), times = 1000)
    ## Unit: microseconds
    ##    expr min lq mean median uq max neval
    ##  a("R") 1.
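Apart from speed, one practical difference between these styles is that `ifelse()` is vectorized, while `switch()` (like `if`) expects a single value. The sketch below uses hypothetical helpers `label_switch` and `label_ifelse` that mirror `sw()` and `b()` and shows how both can be applied to a whole vector of keys.

```r
# Hypothetical helpers mirroring the switch() and nested ifelse() styles
label_switch <- function(x) {
  switch(x, A = "Apple", R = "Ready", N = "Novel", G = "Ginger", "Bingo")
}
label_ifelse <- function(x) {
  ifelse(x == "A", "Apple",
  ifelse(x == "R", "Ready",
  ifelse(x == "N", "Novel",
  ifelse(x == "G", "Ginger", "Bingo"))))
}

keys <- c("A", "R", "N", "G", "Z")
# switch() handles one key at a time, so loop over the keys
via_switch <- vapply(keys, label_switch, character(1), USE.NAMES = FALSE)
# ifelse() takes the whole vector at once
via_ifelse <- label_ifelse(keys)
identical(via_switch, via_ifelse)
```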

String manipulation efficiency

July 7, 2016

String concatenation can be done with paste, paste0, sprintf, or .Internal(sprintf).

sprintf vs paste

There's not much difference in performance among these options.

    x = 3
    y = 5.7
    microbenchmark(paste0("The value of x is ", x, " and y is ", y, "."),
                   sprintf("The value of x is %s and y is %s.", x, y),
                   times = 1000, unit = 's')
    ## Unit: seconds
    ##  expr min lq
    ##  paste0("The value of x is ", x, " and y is ", y, ".
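For the benchmark to be a fair comparison, the candidates should build the same string. A quick check, using the same `x` and `y` as above (the result variable names are illustrative):

```r
x <- 3
y <- 5.7

s_paste0  <- paste0("The value of x is ", x, " and y is ", y, ".")
s_paste   <- paste("The value of x is ", x, " and y is ", y, ".", sep = "")
s_sprintf <- sprintf("The value of x is %s and y is %s.", x, y)

identical(s_paste0, s_sprintf)
identical(s_paste0, s_paste)
s_sprintf
```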


Trang Tran
