As of this writing, Julia supports three types of concurrency:

- Coroutines
- Multi-threading
- Multi-core or distributed processing

This post explores multi-core parallelization in Julia.

Using multiple cores in Julia

If more than one core is to be used, this must be specified either when starting Julia with the `-p <n_cpus>` flag, for example

```bash
julia -p 8  # to use 8 cores
```

or by adding worker processes in an interactive session.
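As a minimal illustration of the interactive route (a sketch, not code from the post; the worker count of 4 is arbitrary), workers can be added with `addprocs` from the `Distributed` standard library:

```julia
using Distributed

addprocs(4)            # start 4 extra worker processes on the local machine
println(nworkers())    # => 4 workers now available

# distribute a toy computation across the workers
squares = pmap(x -> x^2, 1:10)
println(squares)
```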

Continue reading

This is an embarrassingly parallel task, as explored in a previous post.

```python
import numpy as np
import pandas as pd
import time
from scipy.stats import pearsonr
from pyspark import SparkContext, SparkConf
from scipy.sparse import coo_matrix

## The measurement (input data) is specified in a matrix
## of samples x variables
m = 150
n = 1000
measurements = np.random.rand(m*n).reshape((m, n))

nThreads = [1, 2, 4, 6, 8, 10, 12, 14, 16]
dt = np.zeros(len(nThreads))

for i in range(len(nThreads)):
    ## Parameters
    NMACHINES = nThreads[i]
    NPARTITIONS = NMACHINES*4
    conf = (SparkConf()
```

Continue reading

Contents:

- Motivation
- Parallelization
- Simple parallelization: one variable per worker
- Massively parallel: chunk of pairs per worker

Motivation

Let’s look at the time it takes to calculate all pairwise correlations for \(n\) variables, with \(m = 200\) samples.

| n     | dt           |
|-------|--------------|
| 1e+02 | 1.433043e+00 |
| 1e+03 | 1.359290e+02 |
| 2e+03 | 5.371534e+02 |
| 1e+05 | 1.230446e+06 |

Given the timing above, and the extrapolated timing for \(10^{5}\) genes, which is roughly the order of the number of genes/transcripts in a transcriptomic profile, it would take 14.
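As a back-of-the-envelope check (not spelled out in the excerpt, and assuming the dt column is in seconds): the number of pairs grows as \(n(n-1)/2\), so the runtime scales roughly quadratically in \(n\),

\[
t(10^{5}) \;\sim\; t(2\times 10^{3}) \left(\frac{10^{5}}{2\times 10^{3}}\right)^{2}
= 5.37\times 10^{2}\,\mathrm{s} \times 2500
\approx 1.3\times 10^{6}\,\mathrm{s},
\]

which is in line with the tabulated extrapolation of \(1.230446\times 10^{6}\,\mathrm{s} \approx 14.2\) days.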

Continue reading
