.Internal(sample)

.Internal(sample()) requires all 4 arguments explicitly, in this order: n, size, replace, prob. If prob is not NULL, the first argument has to be an integer. To get the same output as sample, we need to map the sampled integers back to the desired values:

internal_boolean <- function(size, replace, prob) {
  s = .Internal(sample(2, size, replace, prob))
  return(s < 2)   # 1 -> TRUE, 2 -> FALSE
}

internal_boolean_rle <- function(size, replace, prob) {
  s = .Internal(sample(2, size, replace, prob))
  return(rle(s < 2))   # run-length encoding of the TRUE/FALSE sequence
}

N = 100000
probs = c(0.0001, 1-0.
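The mapping can be checked directly. Below is a minimal sketch, assuming illustrative values size = 100000 and prob = c(0.3, 0.7) (not the post's truncated probs). For a length-2 vector, sample() goes through sample.int(), which ends up calling .Internal(sample(n, size, replace, prob)); so with the same seed, the high-level and internal routes should produce the same draws once 1 and 2 are mapped back to TRUE and FALSE.

```r
# Illustrative parameters (assumptions, not taken from the post)
size <- 100000
prob <- c(0.3, 0.7)

# Route 1: high-level sample() on the logical values themselves
set.seed(123)
via_sample <- sample(c(TRUE, FALSE), size, replace = TRUE, prob = prob)

# Route 2: the internal sampler draws integers 1 or 2; map 1 -> TRUE, 2 -> FALSE
set.seed(123)
s <- .Internal(sample(2L, size, TRUE, prob))
via_internal <- s < 2

# Same seed, same generator: the two routes should agree draw for draw
print(identical(via_sample, via_internal))

# The share of TRUE should be close to prob[1]
print(mean(via_internal))

# rle() summarises runs of consecutive TRUE/FALSE values
runs <- rle(via_internal)
print(head(runs$lengths))
```

Skipping the wrapper only saves the R-level argument handling in sample() and sample.int(); the random draws themselves come from the same internal routine.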

Contents: Obtaining data; Fitting with pre-determined distribution; The effects of sample size; Goodness-of-fit; Assuming Poisson distribution; Assuming NB distribution.

The MASS package provides a function, fitdistr, that fits a univariate distribution (including discrete ones) to observed data by maximum likelihood.

Obtaining data

We first need to generate some data to fit. The rnegbin(n, mu, theta) function can be used to generate n samples from a negative binomial distribution with mean mu and variance mu + mu^2/theta.
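A minimal base-R sketch of the same workflow, under stated assumptions: mu = 5, theta = 2 and n = 10000 are illustrative values; rnbinom(size = theta, mu = mu) stands in for MASS::rnegbin(n, mu, theta) because it samples the same distribution (not the same draws); and the negative binomial fit uses optim() on the log-likelihood instead of MASS::fitdistr. For the Poisson model, the maximum likelihood estimate of lambda is simply the sample mean.

```r
set.seed(42)
n <- 10000
mu <- 5       # assumed true mean
theta <- 2    # assumed dispersion; variance = mu + mu^2 / theta = 17.5

# Same distribution as MASS::rnegbin(n, mu, theta)
x <- rnbinom(n, size = theta, mu = mu)

# Poisson fit: the ML estimate of lambda is the sample mean
lambda_hat <- mean(x)
loglik_pois <- sum(dpois(x, lambda_hat, log = TRUE))

# Negative binomial fit: minimise the negative log-likelihood.
# Parameters are optimised on the log scale so mu and size stay positive.
nll <- function(p) {
  -sum(dnbinom(x, size = exp(p[2]), mu = exp(p[1]), log = TRUE))
}
opt <- optim(c(log(mean(x)), 0), nll)
est <- setNames(exp(opt$par), c("mu", "size"))
loglik_nb <- -opt$value

print(est)
# Overdispersed data: the NB model should have a much higher log-likelihood
print(c(poisson = loglik_pois, negbin = loglik_nb))
```

With MASS loaded, fitdistr(x, "negative binomial") reports the corresponding estimates under the names size and mu, and fitdistr(x, "Poisson") reports lambda.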

The definitions of a(), b(), and sw() below achieve the same effect with different implementations: an if/else if chain, nested ifelse() calls, and switch().

a <- function(x) {
  if (x == 'A') {
    paste('Apple')
  } else if (x == 'R') {
    paste('Ready')
  } else if (x == 'N') {
    paste('Novel')
  } else if (x == 'G') {
    paste('Ginger')
  } else {
    paste("Bingo")
  }
}

b <- function(x) {
  ifelse(x == 'A', paste('Apple'),
    ifelse(x == 'R', paste('Ready'),
      ifelse(x == 'N', paste('Novel'),
        ifelse(x == 'G', paste('Ginger'), paste("Bingo")))))
}

sw <- function(x) {
  switch(x,
    A = paste('Apple'),
    R = paste('Ready'),
    N = paste('Novel'),
    G = paste('Ginger'),
    "Bingo"
  )
}

Timing

library(microbenchmark)
microbenchmark(a('R'), b('R'), sw('R'), times = 1000)
## Unit: microseconds
## expr min lq mean median uq max neval
## a("R") 1.
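Before comparing timings, it is worth confirming that the three versions really agree. A quick base-R sketch: the paste() wrappers are dropped (paste() of a single string returns it unchanged), and 'Z' is an arbitrary unmatched key chosen to hit the default branch. Since ifelse() is vectorised, b() handles the whole vector in one call, while a() and sw() expect a single value (from R 4.2, if() errors on a condition of length greater than one), so they are applied with vapply().

```r
# Same logic as a(), b() and sw(), without the redundant paste() calls
a <- function(x) {
  if (x == 'A') 'Apple'
  else if (x == 'R') 'Ready'
  else if (x == 'N') 'Novel'
  else if (x == 'G') 'Ginger'
  else 'Bingo'
}

b <- function(x) {
  ifelse(x == 'A', 'Apple',
    ifelse(x == 'R', 'Ready',
      ifelse(x == 'N', 'Novel',
        ifelse(x == 'G', 'Ginger', 'Bingo'))))
}

sw <- function(x) {
  # the unnamed last argument is the default
  switch(x, A = 'Apple', R = 'Ready', N = 'Novel', G = 'Ginger', 'Bingo')
}

keys <- c('A', 'R', 'N', 'G', 'Z')   # 'Z' exercises the default branch
res_a <- vapply(keys, a, character(1), USE.NAMES = FALSE)
res_b <- b(keys)                     # vectorised: one call for all keys
res_sw <- vapply(keys, sw, character(1), USE.NAMES = FALSE)

print(res_a)  # "Apple" "Ready" "Novel" "Ginger" "Bingo"
```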

String concatenation can be done with paste, paste0, sprintf, or .Internal(sprintf).

sprintf vs paste

There’s not much difference in performance among these options.

library(microbenchmark)
x = 3
y = 5.7
microbenchmark(paste0("The value of x is ", x, " and y is ", y, "."),
               sprintf("The value of x is %s and y is %s.", x, y),
               times = 1000, unit = 's')
## Unit: seconds
## expr min lq
## paste0("The value of x is ", x, " and y is ", y, ".
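All four options build the same string, so the choice is mostly about readability. A small sketch using the post's x and y: the %s conversion in sprintf() formats numbers the same way as.character() does in paste(), and base R's sprintf() is itself a thin wrapper around .Internal(sprintf(fmt, ...)).

```r
x <- 3
y <- 5.7
fmt <- "The value of x is %s and y is %s."

s_paste0 <- paste0("The value of x is ", x, " and y is ", y, ".")
s_paste <- paste("The value of x is ", x, " and y is ", y, ".", sep = "")
s_sprintf <- sprintf(fmt, x, y)
# sprintf() simply forwards to this internal call
s_internal <- .Internal(sprintf(fmt, x, y))

print(s_sprintf)  # [1] "The value of x is 3 and y is 5.7."
```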

This note compares the performance of 2 methods for calculating Pearson correlation:
1. the R stats::cor function
2. the WGCNA::cor function (or corFast)

SparkR (1.6) provides a function, corr, that calculates the Pearson correlation between two columns of a data frame, but not between every pair of columns. For that, we would need to use the Scala/Python interface.

Correlation at 100 data points

library(WGCNA)
enableWGCNAThreads(nThreads = 32)
## Allowing parallel execution with up to 32 working processes.
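Computing all pairwise correlations at once reduces to linear algebra: after standardising each column to mean 0 and standard deviation 1, the full correlation matrix is a single cross-product divided by n - 1. The sketch below checks this textbook identity against stats::cor with base R only; it is not WGCNA's implementation, and the 50-column matrix of random normals is an arbitrary assumption.

```r
set.seed(1)
n <- 100   # data points, matching the first comparison in the post
p <- 50    # number of columns: an arbitrary assumption
m <- matrix(rnorm(n * p), nrow = n)

# All pairwise Pearson correlations between the columns of m
r_builtin <- cor(m)

# Standardise each column (mean 0, sd 1 with the n - 1 denominator) ...
z <- scale(m)
# ... then one matrix cross-product yields every pairwise correlation
r_manual <- crossprod(z) / (n - 1)

print(all.equal(r_builtin, r_manual))  # [1] TRUE
print(dim(r_builtin))                  # [1] 50 50
```

Because the heavy lifting is one matrix product, speed depends largely on the BLAS library R is linked against.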

a = matrix(rnorm(10000^2), ncol = 10000)
Ns = c(1000, 5000, 10000)
times = data.frame(N = Ns)
# elapsed time to draw the top-left n x n block with raster rendering
t1 = sapply(Ns, function(n) {
  system.time(image(a[1:n, 1:n], useRaster = TRUE))['elapsed']
})
times$raster <- t1

Recursive indexing failed

An example where the "recursive indexing failed" error message is not very informative:

mydf = list(a = c(1, 2, 3, 5, 7, 11, 13),
            b = c(1, 3, 5, 7, 9, 11, 13),
            c = c(2, 4, 6, 8, 10, 12, 14))

Elements of mydf can be accessed with the usual list indexing syntax:

mydf$a
## [1] 1 2 3 5 7 11 13
mydf[['b']]
## [1] 1 3 5 7 9 11 13

However, the same indexing fails inside a function passed to outer():

k = names(mydf)
myfunc = function(x, y) {
  return(length(mydf[[x]]) + length(mydf[[y]]))
}
tryCatch(expr = {
  outer(k, k, myfunc)
}, error = function(e) {
  print(e)
})
## <simpleError in mydf[[x]]: recursive indexing failed at level 2
## >

Similarly with sapply
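The error comes from how outer() calls FUN: it does not loop over pairs, but expands both vectors (here to length 9 each) and calls FUN once. Inside myfunc, mydf[[x]] therefore receives a character vector of length 9, and [[ treats a vector index as a recursive path, looking up "b" inside mydf$a at level 2. A minimal fix, sketched below, is to wrap the function with base R's Vectorize(), which applies it element by element through mapply().

```r
mydf <- list(a = c(1, 2, 3, 5, 7, 11, 13),
             b = c(1, 3, 5, 7, 9, 11, 13),
             c = c(2, 4, 6, 8, 10, 12, 14))
k <- names(mydf)
myfunc <- function(x, y) length(mydf[[x]]) + length(mydf[[y]])

# outer() calls FUN once with expanded length-9 vectors,
# so mydf[[x]] gets a vector index and fails
failed <- tryCatch(outer(k, k, myfunc), error = function(e) e)
print(inherits(failed, "error"))  # [1] TRUE

# Vectorize() makes myfunc apply element-wise (via mapply)
res <- outer(k, k, Vectorize(myfunc))
dimnames(res) <- list(k, k)
print(res)  # every element of mydf has length 7, so each entry is 14
```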

Trang Tran


Student

USA