Problem statement Given a set \(S = \{s_1, s_2, \dots, s_n\}\), one would like to sample a subset \(X \subset S\) of size \(m\). If this operation needs to be repeated a very large number of times \(k\), what is the most efficient way? set_S = c(1:100) microbenchmark::microbenchmark(sample(set_S, size = 50), times = 10) ## Unit: microseconds ## expr min lq mean median uq max neval ## sample(set_S, size = 50) 5.

Continue reading

Sweeping along an axis can be represented by matrix multiplication. Given a matrix \(A\) and a diagonal matrix \(D\), \(DA\) is equivalent to multiplying each row \(i\) of \(A\) by \(d_{ii}\), and \(AD\) is equivalent to multiplying each column \(j\) of \(A\) by \(d_{jj}\). A = matrix(runif(50000),ncol=100) w = apply(A, 1, norm, '2') all(abs(sweep(A,1, w, '/') - (diag(1/w) %*% A) ) < .Machine$double.eps) ## [1] TRUE It is reasonable to expect that sweeping over the individual row/column vectors will be more efficient than the equivalent matrix operation, because no additional memory is required to store the off-diagonal entries of \(D\).
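The excerpt's code checks only the row case (\(DA\)). As a minimal sketch of the column case (\(AD\)), the snippet below compares `sweep()` along margin 2 with right-multiplication by a diagonal matrix. The data and the names `v`, `D`, `by_sweep`, and `by_matmul` are illustrative assumptions, not taken from the original post.

```r
# Hypothetical example data; the names A, v, and D are illustrative only.
set.seed(1)
A <- matrix(runif(12), nrow = 3, ncol = 4)
v <- c(2, 3, 5, 7)                 # one scaling factor per column
D <- diag(v)                       # 4 x 4 diagonal matrix with v on the diagonal

by_sweep  <- sweep(A, 2, v, '*')   # multiply column j of A by v[j]
by_matmul <- A %*% D               # the same scaling written as A D

# all.equal() compares with a numerical tolerance
cols_match <- isTRUE(all.equal(by_sweep, by_matmul))

# D stores length(v)^2 numbers, while v stores only length(v)
size_ratio <- length(D) / length(v)
```

Here `size_ratio` makes the memory argument concrete: the diagonal matrix holds as many entries per scaling factor as there are columns, and that gap grows with the dimension.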

Continue reading

Efficient `reduce` in R

When you need to combine the elements of a list with a function that accepts any number of arguments, such as rbind() or cbind(), it’s more efficient to use do.call() than Reduce(). Combining rows a = lapply(1:100, rnorm, n = 50) microbenchmark(Reduce(rbind, a), do.call(rbind, a)) %>% boxplot(unit = 'ms', boxwex=0.2) Combining columns microbenchmark(Reduce(cbind, a), do.call(cbind, a)) %>% boxplot(unit = 'ms', boxwex = 0.2)
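A small sketch of why the two calls are interchangeable in result but not in cost: `Reduce()` applies `rbind()` pairwise and builds an intermediate matrix at every step, whereas `do.call()` issues one `rbind()` call with all list elements as arguments. The variable names `by_reduce`, `by_docall`, and `same_values` are illustrative, and no timings are claimed here.

```r
# Illustrative data, mirroring the shape used in the excerpt above
set.seed(42)
a <- lapply(1:100, rnorm, n = 50)   # 100 numeric vectors of length 50

# Reduce() binds pairwise: rbind(rbind(rbind(a1, a2), a3), ...),
# so it creates 99 intermediate matrices of growing size.
by_reduce <- Reduce(rbind, a)

# do.call() builds a single call rbind(a1, a2, ..., a100).
by_docall <- do.call(rbind, a)

# Drop dimnames before comparing, since the two calls may label rows differently
same_values <- isTRUE(all.equal(unname(by_reduce), unname(by_docall)))
```

Both results are 100 x 50 matrices with the same values, so switching from `Reduce()` to `do.call()` should only change the running time, not the output.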

Continue reading

Allocation cost

Many high-level programming languages give their users the luxury of extending an existing matrix or vector. The question is, how costly is that luxury? The two functions below both return an \(m \times n\) matrix by calling a random generator \(n\) times. The first function initializes the whole matrix with zeros and then fills in the values. The second extends the result one column at a time.
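The excerpt is cut off before the two functions appear, so the following is only a sketch of what such a pair could look like. The names `fill_preallocated()` and `fill_by_growing()`, the `gen` argument, and the choice of `rnorm` as the generator are all assumptions made for illustration; they are not the original post's code.

```r
# Hypothetical stand-ins for the two strategies described above.
fill_preallocated <- function(m, n, gen = rnorm) {
  out <- matrix(0, nrow = m, ncol = n)   # allocate the full matrix once
  for (j in seq_len(n)) {
    out[, j] <- gen(m)                   # overwrite column j
  }
  out
}

fill_by_growing <- function(m, n, gen = rnorm) {
  out <- NULL                            # start with nothing
  for (j in seq_len(n)) {
    out <- cbind(out, gen(m))            # each step copies into a larger matrix
  }
  out
}

# With the same seed both strategies draw the same numbers
set.seed(1); x1 <- fill_preallocated(4, 3)
set.seed(1); x2 <- fill_by_growing(4, 3)
```

The two functions can then be compared with `microbenchmark()` for larger `m` and `n`; the growing version repeats the copy of everything built so far at each of its `n` steps.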

Continue reading

.Internal(sample)

.Internal(sample()) requires exactly four arguments, in order: n, size, replacement, probabilities. If probabilities is not NULL, the first argument has to be an integer. To produce output equivalent to that of sample(), we need to map the sampled integers back to the desired values: internal_boolean <- function(size,replace,prob) { s = .Internal(sample(2,size,replace, prob)) return(s<2) } internal_boolean_rle <- function(size,replace,prob) { s = .Internal(sample(2,size,replace, prob)) return(rle(s<2)) } N = 100000 probs = c(0.0001, 1-0.

Continue reading

The definitions of a(), b(), and sw() below achieve the same effect with different implementations: an if/else chain, nested ifelse(), and switch(). a <- function(x) { if (x == 'A') { paste('Apple') } else if (x == 'R') { paste('Ready') } else if (x == 'N') { paste('Novel') } else if (x == 'G') { paste('Ginger') } else { paste("Bingo") } } b <- function(x) { ifelse (x == 'A', paste('Apple'), ifelse(x == 'R', paste('Ready'), ifelse(x == 'N', paste('Novel'), ifelse(x == 'G', paste('Ginger'), paste("Bingo"))))) } sw <- function(x) { switch (x, A = paste('Apple'), R = paste('Ready'), N = paste('Novel'), G = paste('Ginger'), "Bingo" ) } Timing microbenchmark(a('R'), b('R'), sw('R'), times=1000) ## Unit: microseconds ## expr min lq mean median uq max neval ## a("R") 1.

Continue reading

String concatenation can be done with paste, paste0, sprintf, or .Internal(sprintf). sprintf vs paste There’s not much difference in performance among these options. x = 3 y = 5.7 microbenchmark(paste0("The value of x is ", x, " and y is ", y, "."), sprintf("The value of x is %s and y is %s.", x, y),times = 1000, unit = 's') ## Unit: seconds ## expr min lq ## paste0("The value of x is ", x, " and y is ", y, ".

Continue reading


Trang Tran


Student

USA