Usage

- Click on the scatter plot area, or type in coordinates, to add a new point. For example, type -3 4 to add a point at \((-3,4)\).
- [To be implemented] Right-click on a point to remove it.
- Drag a point around to change its coordinates.
- Observe the correlation measures to see how they change with your data points.

Correlation measures

Pearson correlation

The Pearson correlation coefficient between a pair of variables \((X,Y)\) is defined as [1]

Continue reading

This task is embarrassingly parallel, as explored in a previous post.

    import numpy as np
    import pandas as pd
    import time
    from scipy.stats import pearsonr
    from pyspark import SparkContext, SparkConf
    from scipy.sparse import coo_matrix

    ## The measurement (input data) is specified in a matrix
    ## samples x variables
    m = 150
    n = 1000
    measurements = np.random.rand(m*n).reshape((m,n))

    nThreads = [1,2,4,6,8,10,12,14,16]
    dt = np.zeros(len(nThreads))
    for i in range(len(nThreads)):
        ## Parameters
        NMACHINES = nThreads[i]
        NPARTITIONS = NMACHINES*4
        conf = (SparkConf() .

Continue reading

Motivation
Parallelization
  Simple parallelization: one variable per worker
  Massive parallel: chunk of pairs per worker

Motivation

Let’s look at the time it takes to calculate all pairwise correlations for \(n\) variables, with \(m = 200\) samples.

    n        dt
    1e+02    1.433043e+00
    1e+03    1.359290e+02
    2e+03    5.371534e+02
    1e+05    1.230446e+06

Given the timings above and the extrapolated timing for \(10^{5}\) genes, which is roughly the number of genes/transcripts in a transcriptomic profile, it would take 14.
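The measured timings above grow roughly quadratically, which matches the pair count: \(n\) variables give \(n(n-1)/2\) pairs. The sketch below is a rough check under that assumption alone. It scales the measured \(n = 2000\) timing by the ratio of pair counts. The helper names `n_pairs` and `extrapolate` are hypothetical, and this is a back-of-the-envelope estimate, not necessarily the extrapolation used to produce the table.

```python
# Measured timings copied from the table above (same units as the dt column).
measured = {100: 1.433043, 1000: 135.9290, 2000: 537.1534}


def n_pairs(n):
    """Number of distinct variable pairs among n variables."""
    return n * (n - 1) // 2


def extrapolate(n_target, n_ref=2000):
    """Scale the timing measured at n_ref by the ratio of pair counts.

    Assumes the cost per pair is constant, which is only an approximation.
    """
    return measured[n_ref] * n_pairs(n_target) / n_pairs(n_ref)


# Doubling n from 1000 to 2000 should roughly quadruple the time.
ratio_measured = measured[2000] / measured[1000]
ratio_pairs = n_pairs(2000) / n_pairs(1000)

estimate = extrapolate(100_000)
print(f"measured ratio: {ratio_measured:.3f}, pair-count ratio: {ratio_pairs:.3f}")
print(f"estimated dt for n = 1e5: {estimate:.3e}")
```

Under this assumption the estimate for \(n = 10^{5}\) lands in the same order of magnitude as the extrapolated value in the table.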

Continue reading

This note compares the performance of 2 methods for calculating Pearson correlation:

1. R stats::cor function
2. WGCNA::cor function (or corFast)

SparkR (1.6) provides a function corr to calculate the Pearson correlation between two columns of a data frame, but not between every pair of columns in a data frame. We would need to use Scala/Python interface for that.

Correlation at 100 data points

    library(WGCNA)
    enableWGCNAThreads(nThreads = 32)
    ## Allowing parallel execution with up to 32 working processes.
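To illustrate the difference between correlating two columns and correlating every pair of columns, here is a minimal sketch in plain NumPy, outside Spark and not taken from the post. `np.corrcoef` with `rowvar=False` returns the full column-by-column Pearson matrix in one call, and a hand-written two-column function (the hypothetical `pearson_pair`) checks it pair by pair. The matrix size and random seed are illustrative assumptions.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
m, n = 100, 5                    # assumed: 100 samples (rows), 5 variables (columns)
data = rng.random((m, n))


def pearson_pair(x, y):
    """Pearson correlation between two 1-D arrays of equal length."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# All pairs at once: treat columns as variables.
corr_matrix = np.corrcoef(data, rowvar=False)

# The same matrix built one pair at a time.
pairwise = np.eye(n)
for i, j in itertools.combinations(range(n), 2):
    pairwise[i, j] = pairwise[j, i] = pearson_pair(data[:, i], data[:, j])

max_abs_diff = float(np.abs(corr_matrix - pairwise).max())
print(corr_matrix.shape, max_abs_diff)
```

The pair-at-a-time loop is the pattern a two-column function forces, and it visits \(n(n-1)/2\) pairs, while the matrix call handles every pair in a single vectorized step.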

Continue reading


Trang Tran


Student

USA