Efficient `reduce` in R

November 28, 2018

When you need to apply a Reduce operation to a list, and the function accepts any number of arguments (as rbind() and cbind() do), it is more efficient to use do.call().

Combining rows

library(microbenchmark)
library(magrittr)  # provides %>%
a = lapply(1:100, rnorm, n = 50)
microbenchmark(Reduce(rbind, a), do.call(rbind, a)) %>% boxplot(unit = 'ms', boxwex = 0.2)

Combining columns

microbenchmark(Reduce(cbind, a), do.call(cbind, a)) %>% boxplot(unit = 'ms', boxwex = 0.2)
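The comparison can also be checked with base R alone. The sketch below is not the post's benchmark: it assumes a fixed seed and 20 repetitions, both chosen arbitrarily, and uses system.time() instead of microbenchmark. It first confirms that the two approaches build the same matrix (up to row names), then times them.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
a <- lapply(1:100, rnorm, n = 50)            # 100 vectors of length 50

by_reduce <- Reduce(rbind, a)                # 99 successive two-argument rbind() calls
by_docall <- do.call(rbind, a)               # a single rbind() call with 100 arguments

# Row names may differ between the two results, so compare the values only.
same_values <- isTRUE(all.equal(unname(by_reduce), unname(by_docall)))

# Rough timings; the repetition count of 20 is an assumption.
t_reduce <- system.time(for (i in 1:20) Reduce(rbind, a))
t_docall <- system.time(for (i in 1:20) do.call(rbind, a))
print(rbind(Reduce = t_reduce, do.call = t_docall)[, "elapsed"])
```

Reduce() copies the growing intermediate matrix at every step, whereas do.call() lets rbind() allocate the result once, which is the likely source of the difference.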

Importance sampling

August 7, 2018

\[ \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\E}{\mathbb{E}} \]

An example

The example below is taken from [1]. Let \(X\) be a random variable uniformly distributed on \([0,10]\),

\[ X \sim \mathrm{Uniform}(0,10) \]

Consider the function \(h(x) = 10 e^{-2|x-5|}\). Suppose we want to calculate \(\E_X[h(X)]\). By definition, with the density \(f(x) = 1/10\) on \([0,10]\),

\[\begin{align} \E_X[h(X)] &= \int_{0}^{10} h(x) f(x)\,dx \\ &= \int_{0}^{10} \exp(-2|x-5|)\,dx \end{align}\]

A straightforward way to do this is to sample \(X_i\) from the Uniform(0,10) density and calculate the mean of \(h(X_i)\).
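The straightforward estimator can be sketched in a few lines of R. The seed and the sample size are assumptions made for illustration. The integral also has a closed form, \(\int_{0}^{10} e^{-2|x-5|}\,dx = 1 - e^{-10}\), which gives a reference value for the estimate.

```r
set.seed(42)                                  # arbitrary seed, for reproducibility only

h <- function(x) 10 * exp(-2 * abs(x - 5))    # h(x) as defined in the text

n <- 1e5                                      # sample size, chosen for illustration
x <- runif(n, min = 0, max = 10)              # X_i ~ Uniform(0, 10)

estimate <- mean(h(x))                        # plain Monte Carlo estimate of E[h(X)]
exact    <- 1 - exp(-10)                      # closed-form value of the integral

print(c(estimate = estimate, exact = exact))
```

Most uniform draws fall where \(h\) is close to zero, so the estimator spends much of its sample on points that contribute almost nothing to the mean.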

SVD of rank-deficient matrices

April 18, 2018

\[ \newcommand{\matrix}[1]{\mathbf{#1}} \]

Let \(\matrix{A}\) be a data set of \(m\) points in \(\mathbb{R}^d\). One application of SVD is to create a compressed representation of \(\matrix{A}\). A rank-\(k\) approximation of \(\matrix{A}\) is created by calculating the singular value decomposition of \(\matrix{A}\)

\[ \matrix{A} = \matrix{U}\matrix{\Sigma}\matrix{V}^T \]

and reconstructing it from the first \(k \leq d\) singular values and the corresponding singular vectors:

\[ \matrix{A_k} = \matrix{U_k}\matrix{\Sigma_k}\matrix{V_k}^T \]

SVD in R

Implementations of SVD differ in how they represent the output.
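As a minimal sketch of the rank-\(k\) reconstruction, the helper below uses base R's svd(), which returns the singular values in `d` and the singular vectors in `u` and `v` (with `v` not transposed). The function name `rank_k_approx` and the test matrix are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: rank-k approximation A_k = U_k Sigma_k V_k^T.
rank_k_approx <- function(A, k) {
  s   <- svd(A)
  U_k <- s$u[, seq_len(k), drop = FALSE]      # first k left singular vectors
  V_k <- s$v[, seq_len(k), drop = FALSE]      # first k right singular vectors
  S_k <- diag(s$d[seq_len(k)], nrow = k)      # k x k diagonal; nrow keeps k = 1 correct
  U_k %*% S_k %*% t(V_k)
}

set.seed(1)                                   # arbitrary seed
# A 20 x 5 matrix of rank 2, built as a product of two thin matrices.
A <- matrix(rnorm(20 * 2), 20, 2) %*% matrix(rnorm(2 * 5), 2, 5)

A_2 <- rank_k_approx(A, 2)
print(max(abs(A - A_2)))                      # reconstruction error of the rank-2 approximation
print(round(svd(A)$d, 6))                     # singular values of A
```

Because this `A` has rank 2, the rank-2 approximation should reproduce it up to floating-point error, and the remaining singular values should be numerically zero.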

Parallel Pearson Correlation

November 9, 2017

Motivation

Let’s look at the time it takes to calculate all pairwise correlations for \(n\) variables, with \(m = 200\) samples.

      n            dt
  1e+02  1.433043e+00
  1e+03  1.359290e+02
  2e+03  5.371534e+02
  1e+05  1.230446e+06

Given the timing above, and the extrapolated timing for \(10^{5}\) genes, which is roughly the order of the number of genes/transcripts in a transcriptomic profile, it would take 14.
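The idea of handing a chunk of variable pairs to each worker can be sketched with the base `parallel` package. This is not the post's implementation: the function name `pairwise_cor_chunked` and its parameters `n_chunks` and `cores` are hypothetical. With `cores = 1`, mclapply() runs sequentially; values above 1 fork worker processes and are not supported on Windows.

```r
library(parallel)

# Hypothetical helper: Pearson correlation for every pair of columns of X,
# with the list of pairs split into chunks that are handed to workers.
pairwise_cor_chunked <- function(X, n_chunks = 4, cores = 1) {
  n     <- ncol(X)
  pairs <- combn(n, 2)                              # 2 x choose(n, 2) matrix of column pairs
  n_pairs  <- ncol(pairs)
  chunk_id <- ceiling(seq_len(n_pairs) * n_chunks / n_pairs)
  chunks   <- split(seq_len(n_pairs), chunk_id)     # pair indices grouped by chunk

  res <- mclapply(chunks, function(idx) {
    vapply(idx, function(p) cor(X[, pairs[1, p]], X[, pairs[2, p]]), numeric(1))
  }, mc.cores = cores)

  R <- diag(n)                                      # correlations on the diagonal are 1
  R[t(pairs)] <- unlist(res, use.names = FALSE)     # fill the upper triangle
  R[lower.tri(R)] <- t(R)[lower.tri(R)]             # mirror into the lower triangle
  R
}

set.seed(1)                                         # arbitrary seed
X <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)   # m = 200 samples, 6 variables
R_chunked <- pairwise_cor_chunked(X, n_chunks = 3)
print(all.equal(R_chunked, cor(X)))
```

Calling cor() once per pair is slow for a single worker; the point of the sketch is only how the pairs are partitioned, not the per-pair speed.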

Allocation cost

September 17, 2017

Many high-level programming languages give their users the luxury of extending an existing matrix or vector. The question is, how costly is that luxury? The two functions below both return an \(m \times n\) matrix by calling a random generator \(n\) times. The first initializes the whole matrix with zeros and fills in the values one column at a time. The second starts empty and extends the result one column at a time.
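The excerpt ends before the functions themselves. As a rough illustration of the two strategies it describes (not the author's original code), the sketch below uses hypothetical names `fill_preallocated` and `fill_by_extending` and assumes rnorm() as the random generator.

```r
# Strategy 1: allocate the full m x n matrix of zeros, then fill column by column.
fill_preallocated <- function(m, n) {
  out <- matrix(0, nrow = m, ncol = n)
  for (j in seq_len(n)) out[, j] <- rnorm(m)
  out
}

# Strategy 2: start empty and grow the result one column at a time.
# Each cbind() allocates a new, larger matrix and copies the old one into it.
fill_by_extending <- function(m, n) {
  out <- NULL
  for (j in seq_len(n)) out <- cbind(out, rnorm(m))
  out
}

# Both produce the same matrix when started from the same seed.
set.seed(1); a <- fill_preallocated(5, 3)
set.seed(1); b <- fill_by_extending(5, 3)
print(isTRUE(all.equal(a, b)))
```

Timing the two, for example with `system.time(fill_preallocated(1000, 2000))` against `system.time(fill_by_extending(1000, 2000))`, gives a feel for the cost of repeated allocation; the sizes here are arbitrary.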

data.table subsetting

March 2, 2017

The data.table package supports a powerful syntax to select rows and columns.

Selecting a single column

library(data.table)
data("iris")
iris = iris[sample.int(nrow(iris), size = 10, replace = FALSE), ]
DT = data.table(iris)
DT

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
##  1:          4.9         3.0          1.4         0.2    setosa
##  2:          4.6         3.6          1.0         0.2    setosa
##  3:          7.2         3.0          5.8         1.6 virginica
##  4:          5.4         3.4          1.5         0.4    setosa
##  5:          6.7         3.1          5.6         2.4 virginica
##  6:          5.
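A few common ways to select a single column from a data.table are sketched below. The excerpt is cut off before the post's own examples, so these are illustrations rather than the author's code; they use the full iris data instead of the random 10-row sample.

```r
library(data.table)

DT <- data.table(iris)

v1 <- DT[, Species]            # unquoted column name in j: returns a vector
v2 <- DT[["Species"]]          # list-style extraction: also a vector
d1 <- DT[, .(Species)]         # wrapped in .(): returns a one-column data.table
d2 <- DT[, "Species"]          # quoted name in j: returns a one-column data.table

# Row filter in i combined with column selection in j.
setosa_len <- DT[Species == "setosa", .(Sepal.Length)]

print(class(v1))
print(class(d1))
print(dim(setosa_len))
```

The distinction that matters most in practice is whether the result is a plain vector or still a data.table, since that determines what can be chained after it.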

Non-trivial operation on data.table columns

March 1, 2017

This note explores the use of the data.table package to calculate pairwise correlations between columns, with the iris data set as an example.

library(data.table)
DT = data.table(iris)

The iris data is now data.table-ized:

##      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##   1:          5.1         3.5          1.4         0.2  setosa
##   2:          4.9         3.0          1.4         0.2  setosa
##   3:          4.7         3.2          1.3         0.2  setosa
##   4:          4.6         3.1          1.5         0.2  setosa
##   5:          5.0         3.6          1.4         0.2  setosa
##  ---
## 146:          6.
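One possible way to compute the pairwise correlations is sketched below. It is an assumption about the approach, not the post's code: it enumerates the pairs of numeric columns with combn() and fills in a correlation column with `:=`. The names `num_cols`, `pairs`, `x`, `y` and `r` are hypothetical.

```r
library(data.table)

DT <- data.table(iris)

# Names of the numeric columns (Species is a factor and is skipped).
num_cols <- names(DT)[sapply(DT, is.numeric)]

# One row per unordered pair of numeric columns.
pairs <- as.data.table(t(combn(num_cols, 2)))
setnames(pairs, c("x", "y"))

# Pearson correlation for each pair, added by reference as column r.
pairs[, r := mapply(function(a, b) cor(DT[[a]], DT[[b]]), x, y, USE.NAMES = FALSE)]

print(pairs)
```

For a handful of columns, `cor(DT[, ..num_cols])` gives the same numbers as a matrix in one call; the long format above is convenient when the pairs need to be filtered or sorted afterwards.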

Trang Tran

Transforming normal to uniform distribution

A tabulated list of Markdown editors

Speeding up random sampling of an array in R

PCA, SVD and Eigen decomposition

Sweep vs Matrix multiplication

SVD in different languages

Multi-core parallel computing in Julia

Affinity propagation - step by step

Configure Nginx as a reverse proxy for Rstudio server

Fast SVD