.Internal(sample)

July 7, 2016

.Internal(sample()) requires all 4 arguments to be passed explicitly and in order: n, size, replace, prob. If prob is not NULL, the first argument has to be an integer. To get output equivalent to that of sample(), we need to map the sampled integers back to the desired values.

```r
# Logical draws via the internal sampler:
# value 1 (-> TRUE) is drawn with probability prob[1]
internal_boolean <- function(size, replace, prob) {
  s = .Internal(sample(2, size, replace, prob))
  return(s < 2)
}

# Same draws, returned as a run-length encoding
internal_boolean_rle <- function(size, replace, prob) {
  s = .Internal(sample(2, size, replace, prob))
  return(rle(s < 2))
}

N = 100000
probs = c(0.0001, 1-0.
```
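The mapping step can be checked directly. A minimal sketch, assuming a hypothetical two-element vector `values` and made-up probabilities: with the same seed, indexing `values` by the integers from `.Internal(sample())` should reproduce what the public `sample()` returns, because `sample()` itself indexes its input with the integers drawn by `sample.int()`.

```r
values <- c("heads", "tails")  # hypothetical values to sample from
probs  <- c(0.3, 0.7)          # hypothetical probabilities

set.seed(42)
idx <- .Internal(sample(length(values), 10, TRUE, probs))  # integers in 1..2
draw_internal <- values[idx]   # map integers back to the desired values

set.seed(42)
draw_public <- sample(values, 10, replace = TRUE, prob = probs)

stopifnot(identical(draw_internal, draw_public))
```

Note that `.Internal()` calls are not part of R's public API, so this shortcut suits personal scripts more than packages.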

Fitting negative binomial distribution and goodness-of-fit

July 7, 2016

- Obtaining data
- Fitting with pre-determined distribution
- The effects of sample size
- Goodness-of-fit
  - Assuming Poisson distribution
  - Assuming NB distribution

The package MASS provides a function, fitdistr, that fits observed data to a distribution, including discrete ones, by maximum likelihood.

Obtaining data

We first need to generate some data to fit. The function rnegbin(n, mu, theta) generates n samples from a negative binomial distribution with mean mu and variance mu + mu^2/theta.
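A minimal sketch of the generate-then-fit workflow, assuming illustrative values (mu = 5, theta = 2, n = 10000) that are not from the post. It uses `MASS::rnegbin()` for the data and `MASS::fitdistr()` under both a Poisson and a negative binomial model; comparing the maximized log-likelihoods (`$loglik`) is only a rough stand-in for the goodness-of-fit tests the post outlines.

```r
library(MASS)

set.seed(2016)
# Overdispersed counts: mean 5, variance 5 + 5^2 / 2 = 17.5
x <- rnegbin(10000, mu = 5, theta = 2)

# Maximum-likelihood fits under two candidate distributions
fit_nb   <- fitdistr(x, "negative binomial")  # estimates size (= theta) and mu
fit_pois <- fitdistr(x, "Poisson")            # estimates lambda

print(fit_nb$estimate)

# On overdispersed data the NB model should reach a clearly higher log-likelihood
print(c(poisson = fit_pois$loglik, negbin = fit_nb$loglik))
```

Note that `fitdistr()` calls the dispersion parameter `size`, which plays the role of `theta` in `rnegbin()`.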

How much harm can nested if's do?

July 7, 2016

The definitions of a(), b(), and sw() below achieve the same effect with different implementations: an if / else if chain, nested ifelse() calls, and switch().

```r
# a(): chain of if / else if
a <- function(x) {
  if (x == 'A') {
    paste('Apple')
  } else if (x == 'R') {
    paste('Ready')
  } else if (x == 'N') {
    paste('Novel')
  } else if (x == 'G') {
    paste('Ginger')
  } else {
    paste("Bingo")
  }
}

# b(): nested ifelse()
b <- function(x) {
  ifelse(x == 'A', paste('Apple'),
    ifelse(x == 'R', paste('Ready'),
      ifelse(x == 'N', paste('Novel'),
        ifelse(x == 'G', paste('Ginger'), paste("Bingo")))))
}

# sw(): switch() with an unnamed default
sw <- function(x) {
  switch(x,
    A = paste('Apple'),
    R = paste('Ready'),
    N = paste('Novel'),
    G = paste('Ginger'),
    "Bingo"
  )
}
```

Timing

```r
library(microbenchmark)
microbenchmark(a('R'), b('R'), sw('R'), times = 1000)
## Unit: microseconds
##    expr min lq mean median uq max neval
##  a("R") 1.
```
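Speed aside, the three styles differ in one practical way: `ifelse()` is vectorised, while `if` and `switch()` expect a single value. A minimal sketch, with the post's functions restated compactly (the `paste()` wrappers dropped) so the block runs on its own, confirming that all three agree on every key and that only `b()` handles a whole vector in one call:

```r
a <- function(x) {
  if (x == 'A') 'Apple' else if (x == 'R') 'Ready' else if (x == 'N') 'Novel' else if (x == 'G') 'Ginger' else 'Bingo'
}
b <- function(x) {
  ifelse(x == 'A', 'Apple', ifelse(x == 'R', 'Ready',
    ifelse(x == 'N', 'Novel', ifelse(x == 'G', 'Ginger', 'Bingo'))))
}
sw <- function(x) {
  switch(x, A = 'Apple', R = 'Ready', N = 'Novel', G = 'Ginger', 'Bingo')
}

keys <- c('A', 'R', 'N', 'G', 'Z')  # 'Z' exercises the default branch

# a() and sw() take one string at a time, so apply them element-wise
res_a  <- vapply(keys, a,  character(1), USE.NAMES = FALSE)
res_sw <- vapply(keys, sw, character(1), USE.NAMES = FALSE)
# b() is vectorised through ifelse(), so one call covers all keys
res_b  <- b(keys)

stopifnot(identical(res_a, res_b), identical(res_a, res_sw))
print(res_b)
```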

String manipulation efficiency

July 7, 2016

String concatenation can be done with paste, paste0, sprintf, or .Internal(sprintf).

sprintf vs paste

There's not much difference in performance among these options.

```r
library(microbenchmark)
x = 3
y = 5.7
microbenchmark(paste0("The value of x is ", x, " and y is ", y, "."),
               sprintf("The value of x is %s and y is %s.", x, y),
               times = 1000, unit = 's')
## Unit: seconds
##                                                    expr min lq
## paste0("The value of x is ", x, " and y is ", y, ".
```
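Before comparing timings it helps to confirm that the options actually build the same string. A minimal sketch using the post's `x` and `y`: the `%s` conversion in `sprintf()` formats numbers with `as.character()`, the same conversion `paste0()` applies, and base `sprintf()` is itself a thin wrapper around `.Internal(sprintf(fmt, ...))`.

```r
x = 3
y = 5.7

s_paste0   <- paste0("The value of x is ", x, " and y is ", y, ".")
s_paste    <- paste("The value of x is ", x, " and y is ", y, ".", sep = "")
s_sprintf  <- sprintf("The value of x is %s and y is %s.", x, y)
s_internal <- .Internal(sprintf("The value of x is %s and y is %s.", x, y))

stopifnot(identical(s_paste0, s_sprintf),
          identical(s_paste, s_sprintf),
          identical(s_internal, s_sprintf))
print(s_sprintf)
# [1] "The value of x is 3 and y is 5.7."
```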

Fast Pearson Correlation

June 27, 2016

This note compares the performance of 2 methods for calculating Pearson correlation:

1. R's stats::cor function
2. WGCNA::cor function (or corFast)

SparkR (1.6) provides a function, corr, to calculate the Pearson correlation between two columns of a data frame, but not between every pair of columns. For that, we would need to use the Scala or Python interface.

Correlation at 100 data points

```r
library(WGCNA)
enableWGCNAThreads(nThreads = 32)
## Allowing parallel execution with up to 32 working processes.
```
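It may help to see why an all-pairs correlation matrix is amenable to fast computation at all. A base-R sketch (not WGCNA's implementation) on a hypothetical 100 x 20 random matrix: once every column is standardised, the full Pearson matrix is a single cross-product divided by n - 1, which an optimised BLAS can accelerate.

```r
set.seed(1)
# 100 observations (rows) of 20 variables (columns); illustrative data only
m <- matrix(rnorm(100 * 20), nrow = 100)

# Centre each column and scale it to unit sd (sd uses the n - 1 denominator)
z <- scale(m)

# All pairwise Pearson correlations as one matrix product: t(z) %*% z / (n - 1)
r_manual <- crossprod(z) / (nrow(m) - 1)

stopifnot(isTRUE(all.equal(r_manual, cor(m), check.attributes = FALSE)))
```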

Plotting large matrix in R

April 9, 2016

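```r
# 10000 x 10000 matrix of standard normal values
a = matrix(rnorm(10000^2), ncol = 10000)
Ns = c(1000, 5000, 10000)
times = data.frame(list(N = Ns))
# Elapsed time to draw the top-left n x n block with raster rendering
t1 = sapply(Ns, function(n) {
  system.time(image(a[1:n, 1:n], useRaster = TRUE))['elapsed']
})
times$raster <- t1
```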
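The timing above records only raster rendering. A scaled-down sketch that also times the default non-raster path for comparison; the `no_raster` column, the helper `time_image()`, the 500 x 500 matrix, and the null PDF device are all illustrative assumptions rather than the post's setup.

```r
pdf(file = NULL)  # draw to a null PDF device so no window or file is needed

set.seed(1)
a <- matrix(rnorm(500^2), ncol = 500)
Ns <- c(100, 250, 500)

# Hypothetical helper: elapsed seconds to draw the top-left n x n block
time_image <- function(n, raster) {
  system.time(image(a[1:n, 1:n], useRaster = raster))[["elapsed"]]
}

times <- data.frame(N = Ns,
                    raster    = sapply(Ns, time_image, raster = TRUE),
                    no_raster = sapply(Ns, time_image, raster = FALSE))
invisible(dev.off())
print(times)
```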

Recursive indexing failed

February 18, 2016

An example where the "recursive indexing failed" error message is not clear.

```r
mydf = list(a = c(1, 2, 3, 5, 7, 11, 13),
            b = c(1, 3, 5, 7, 9, 11, 13),
            c = c(2, 4, 6, 8, 10, 12, 14))
```

Elements of mydf are accessed with list-indexing syntax:

```r
mydf$a
## [1] 1 2 3 5 7 11 13
mydf[['b']]
## [1] 1 3 5 7 9 11 13
```

However, inside an outer() call, it does not work:

```r
k = names(mydf)
myfunc = function(x, y) {
  return(length(mydf[[x]]) + length(mydf[[y]]))
}
tryCatch(expr = {
  outer(k, k, myfunc)
}, error = function(e) {
  print(e)
})
## <simpleError in mydf[[x]]: recursive indexing failed at level 2
## >
```

Similarly with sapply
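The cause is that outer() does not loop: it calls FUN once, with X and Y expanded to vectors holding all 9 combinations. Inside myfunc, x is therefore a character vector, and [[ with a vector of length > 1 indexes recursively (mydf[[c('a', 'b')]] means mydf[['a']][['b']]), which fails at level 2. A minimal sketch of two workarounds, not necessarily the ones the post goes on to use: wrap the function in Vectorize(), or rewrite it with single-bracket indexing and lengths(). The name `myfunc_vec` is illustrative.

```r
mydf <- list(a = c(1, 2, 3, 5, 7, 11, 13),
             b = c(1, 3, 5, 7, 9, 11, 13),
             c = c(2, 4, 6, 8, 10, 12, 14))
k <- names(mydf)

myfunc <- function(x, y) {
  length(mydf[[x]]) + length(mydf[[y]])
}

# Vectorize() wraps myfunc in mapply(), so it runs once per (x, y) pair
res_vec <- outer(k, k, Vectorize(myfunc))

# Vectorised by construction: mydf[x] (single brackets) returns a sub-list,
# and lengths() returns the length of each element
myfunc_vec <- function(x, y) lengths(mydf[x]) + lengths(mydf[y])
res_lengths <- outer(k, k, myfunc_vec)

# Every vector in mydf has 7 elements, so every pair sums to 14
stopifnot(identical(dim(res_vec), c(3L, 3L)), all(res_vec == 14), all(res_lengths == 14))
```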

Trang Tran

Transforming normal to uniform distribution

A tabulated list of Markdown editors

Speeding up random sampling of an array in R

PCA, SVD and Eigen decomposition

Sweep vs Matrix multiplication

SVD in different languages

Multi-core parallel computing in Julia

Affinity propagation - step by step

Configure Nginx as a reverse proxy for Rstudio server

Fast SVD