The definitions of a(), b(), and sw() below achieve the same effect with different implementations: an if/else chain, nested ifelse(), and switch().

```r
a <- function(x) {
  if (x == 'A') {
    paste('Apple')
  } else if (x == 'R') {
    paste('Ready')
  } else if (x == 'N') {
    paste('Novel')
  } else if (x == 'G') {
    paste('Ginger')
  } else {
    paste("Bingo")
  }
}

b <- function(x) {
  ifelse(x == 'A', paste('Apple'),
         ifelse(x == 'R', paste('Ready'),
                ifelse(x == 'N', paste('Novel'),
                       ifelse(x == 'G', paste('Ginger'), paste("Bingo")))))
}

sw <- function(x) {
  switch(x,
         A = paste('Apple'),
         R = paste('Ready'),
         N = paste('Novel'),
         G = paste('Ginger'),
         "Bingo")
}
```

Timing

```r
library(microbenchmark)
microbenchmark(a('R'), b('R'), sw('R'), times = 1000)
## Unit: microseconds
##    expr min lq mean median uq max neval
##  a("R") 1.
```

Continue reading

String concatenation can be done with paste, paste0, sprintf, or .Internal(sprintf).

sprintf vs paste

There is not much difference in performance among these options.

```r
library(microbenchmark)
x = 3
y = 5.7
microbenchmark(paste0("The value of x is ", x, " and y is ", y, "."),
               sprintf("The value of x is %s and y is %s.", x, y),
               times = 1000, unit = 's')
## Unit: seconds
##                                                   expr min lq
##  paste0("The value of x is ", x, " and y is ", y, ".
```
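The benchmark above covers paste0 and sprintf; the .Internal(sprintf) variant mentioned in the first sentence can be sketched as follows. It bypasses the R-level wrapper and is not meant for ordinary user code; it is shown only because the post lists it as an option.

```r
# Calling the internal directly skips sprintf's R-level wrapper.
# Not recommended outside of benchmarking experiments.
.Internal(sprintf("The value of x is %s and y is %s.", x, y))
```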

Continue reading

This note compares the performance of two methods for calculating the Pearson correlation:

1. the R stats::cor function
2. the WGCNA::cor function (or corFast)

SparkR (1.6) provides a corr function to calculate the Pearson correlation between two columns of a data frame, but not between every pair of columns in a data frame; we would need the Scala/Python interface for that.

Correlation at 100 data points

```r
library(WGCNA)
enableWGCNAThreads(nThreads = 32)
## Allowing parallel execution with up to 32 working processes.
```
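A minimal sketch of such a comparison at 100 data points is given below; the matrix dimensions, the number of variables, and the microbenchmark call are illustration choices, not the benchmark from the full post.

```r
library(microbenchmark)

# Hypothetical data: 100 observations of 1000 variables.
m <- matrix(rnorm(100 * 1000), nrow = 100)

microbenchmark(
  stats = stats::cor(m),   # base R, single-threaded
  WGCNA = WGCNA::cor(m),   # WGCNA's implementation, can use multiple threads
  times = 10
)
```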

Continue reading

```r
# Time image() with raster rendering for increasing matrix sizes
a = matrix(rnorm(10000^2), ncol = 10000)
Ns = c(1000, 5000, 10000)
times = data.frame(list(N = Ns))
t1 = sapply(Ns, function(n) {
  system.time(image(a[1:n, 1:n], useRaster = T))['elapsed']
})
times$raster <- t1
```
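The timing column is named raster, which suggests a comparison against the default (non-raster) rendering; a sketch of that counterpart, assuming the same matrix and sizes, could be:

```r
# Assumed counterpart: time the default (non-raster) rendering for comparison.
t2 <- sapply(Ns, function(n) {
  system.time(image(a[1:n, 1:n], useRaster = FALSE))['elapsed']
})
times$no_raster <- t2
times
```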

Continue reading

\(k^{th}\)-nearest neighbor entropy estimator

Nearest neighbor estimator

\[ H(X) \text{(nats)} \approx -\psi(1) + \psi(N) + \frac{1}{N-1} + \ln c_d + \frac{1}{N}\sum\limits_{i=1}^{N} \ln (d_1(x_i)) \]

in which

- \(\psi(x) = \frac{\Gamma'(x)}{\Gamma(x)}\) is the digamma function; \(\psi(1) = -\gamma = -0.5772156\ldots\), the negative of the Euler-Mascheroni constant
- \(c_d\) is the volume of the d-dimensional unit sphere
- \(d_1(x_i)\) is the distance from \(x_i\) to its nearest neighbor

\(k^{th}\)-nearest neighbor estimator

\[ H(X) \approx -\psi(\color{red}{k}) + \psi(N) + \frac{1}{N-1} + \ln c_d + \frac{1}{N}\sum\limits_{i=1}^{N} \ln (d_{\color{red}{k}}(x_i)) \]
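A minimal R sketch of the nearest-neighbor (k = 1) estimator for one-dimensional data, written directly from the formula above; the function name and the use of dist() are illustration choices, not code from the post.

```r
# Nearest-neighbor (k = 1) entropy estimate for 1-D data, in nats.
# For d = 1 the "volume" of the unit sphere is c_d = 2.
nn_entropy <- function(x) {
  N <- length(x)
  D <- as.matrix(dist(x))   # pairwise distances
  diag(D) <- Inf            # exclude the zero self-distance
  d1 <- apply(D, 1, min)    # distance to each point's nearest neighbor
  -digamma(1) + digamma(N) + 1/(N - 1) + log(2) + mean(log(d1))
}

# Example: N(0, 1) samples; the true differential entropy is
# 0.5 * log(2 * pi * exp(1)), roughly 1.42 nats.
set.seed(1)
nn_entropy(rnorm(1000))
```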

Continue reading

See Lecture by (???).

Kernel density estimation

Given a data set sampled from the Gaussian distribution \(\mathcal{N}(0,1)\):

```r
gauss.kernel <- function(x) {
  return(1/sqrt(2*pi) * exp(-(x^2)/2))
}

kde <- function(x, observations, h, kernel) {
  s = sapply(x, function(i) { sum(kernel((i - observations)/h)) })
  return(s/length(observations)/h)
}

experiment <- function(n, bandwidths = c(0.5, 1, 5)) {
  obs = rnorm(n, 0, 1)
  x = -300:300/100
  hist(obs, freq = F, breaks = 12, main = paste('KDE on', n, 'samples'), ylim = c(0, 1.))
  # Append Silverman's rule-of-thumb bandwidth to the list
  bandwidths = c(bandwidths, 1.06*sd(obs)*(n^(-0.2)))
  lines(x, dnorm(x, 0, 1), col = 1, lwd = 2)
  for (j in 1:length(bandwidths)) {
    lines(x, kde(x, obs, h = bandwidths[j], gauss.kernel), col = j + 1, lwd = 2)
  }
  legend('topright', c('Original distribution', paste('Gaussian KDE, h=', bandwidths)),
         col = c(1:(length(bandwidths) + 1)), lwd = 2)
}
```

References
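A quick call of the experiment() function defined above; the sample sizes are illustration choices, not necessarily the ones used in the full post.

```r
# Hypothetical usage: compare the KDE fits as the sample size grows.
set.seed(42)
for (n in c(50, 500)) experiment(n)
```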

Continue reading

Recursive indexing failed

An example where the "recursive indexing failed" error is not clear:

```r
mydf = list(a = c(1, 2, 3, 5, 7, 11, 13),
            b = c(1, 3, 5, 7, 9, 11, 13),
            c = c(2, 4, 6, 8, 10, 12, 14))
```

Indexing of mydf is done with the usual list indexing syntax:

```r
mydf$a
## [1] 1 2 3 5 7 11 13
mydf[['b']]
## [1] 1 3 5 7 9 11 13
```

However, the same syntax fails when the function is used with outer():

```r
k = names(mydf)
myfunc = function(x, y) {
  return(length(mydf[[x]]) + length(mydf[[y]]))
}
tryCatch(expr = { outer(k, k, myfunc) },
         error = function(e) { print(e) })
## <simpleError in mydf[[x]]: recursive indexing failed at level 2
## >
```

Similarly with sapply
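The failure comes from outer() itself: it calls myfunc once with the full character vectors, so x is c('a', 'b', 'c') and mydf[[x]] attempts recursive indexing (mydf[['a']][['b']] and so on), which fails at level 2. One common workaround, not necessarily the one the full post settles on, is to vectorize the function first:

```r
# Vectorize() wraps myfunc so that outer() effectively calls it element-wise.
vfunc <- Vectorize(myfunc)
outer(k, k, vfunc)
##      [,1] [,2] [,3]
## [1,]   14   14   14
## [2,]   14   14   14
## [3,]   14   14   14
```

Each list element has length 7, so every entry is 14.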

Continue reading


Trang Tran


Student

USA