As of this writing, Julia supports three types of concurrency:
- Coroutines
- Multi-threading
- Multi-core or distributed processing

This post will explore multi-core parallelization in Julia.
Using multiple cores in Julia

To use more than one core in Julia, the number of worker processes must be specified, either when starting Julia with `-p <n_cpus>`, for example:

```
julia -p 8  # to use 8 cores
```

or by adding worker processes in an interactive session (with `addprocs`).
This is an embarrassingly parallel task, as explored in a previous post.
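Assuming "this task" refers to computing all pairwise correlations (the problem described under Motivation), the minimal Python sketch below shows why it is embarrassingly parallel: each variable's row of the correlation matrix can be computed without any communication with the other rows. It uses Python rather than the post's Julia, threads from the standard library's `concurrent.futures` only to keep the sketch portable, and hypothetical helper names (`standardize`, `corr_with_later`, `pairwise_corr_parallel`).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def standardize(X):
    """Center each column and scale it to unit norm, so that the dot product
    of two columns equals their Pearson correlation."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)


def corr_with_later(Z, i):
    """One independent task: correlations of variable i with variables i+1..n-1."""
    return i, Z[:, i + 1:].T @ Z[:, i]


def pairwise_corr_parallel(X, n_workers=4):
    Z = standardize(X)
    n = Z.shape[1]
    R = np.eye(n)
    # Tasks only read Z; the main thread writes each returned row into R.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for i, row in pool.map(lambda k: corr_with_later(Z, k), range(n)):
            R[i, i + 1:] = row
            R[i + 1:, i] = row
    return R


rng = np.random.default_rng(0)
X = rng.random((150, 50))  # samples x variables
R = pairwise_corr_parallel(X)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
```

Note that with one variable per task the tasks are unequal in size: variable \(i\) is paired with \(n-1-i\) later variables.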
```python
import numpy as np
import pandas as pd
import time
from scipy.stats import pearsonr
from pyspark import SparkContext, SparkConf
from scipy.sparse import coo_matrix

## The measurement (input data) is specified in a matrix
## samples x variables
m = 150
n = 1000
measurements = np.random.rand(m*n).reshape((m,n))

nThreads = [1,2,4,6,8,10,12,14,16]
dt = np.zeros(len(nThreads))
for i in range(len(nThreads)):
    ## Parameters
    NMACHINES = nThreads[i]
    NPARTITIONS = NMACHINES*4
    conf = (SparkConf() ...
```
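Assuming the Spark job above computes all pairwise Pearson correlations between the columns of `measurements` (the `pearsonr` import suggests so), a serial baseline for the same quantity is a double loop over pairs. The sketch below is not the post's code; `pairwise_pearson_serial` is a hypothetical name, and \(n\) is reduced to 100 so the loop finishes quickly.

```python
import time

import numpy as np
from scipy.stats import pearsonr


def pairwise_pearson_serial(measurements):
    """All pairwise Pearson correlations between columns (variables),
    computed one pair at a time with scipy.stats.pearsonr."""
    n = measurements.shape[1]
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            r, _ = pearsonr(measurements[:, i], measurements[:, j])
            R[i, j] = R[j, i] = r
    return R


m, n = 150, 100  # samples x variables; small n keeps the serial run short
measurements = np.random.rand(m * n).reshape((m, n))

t0 = time.time()
R = pairwise_pearson_serial(measurements)
dt_serial = time.time() - t0
print(f"{n * (n - 1) // 2} pairs in {dt_serial:.2f} s")
```

A baseline like this gives the single-worker time against which the `dt` values for each entry of `nThreads` can be compared.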
- Motivation
- Parallelization
- Simple parallelization: one variable per worker
- Massively parallel: a chunk of pairs per worker

Motivation

Let's look at the time it takes to calculate all pairwise correlations for \(n\) variables, with \(m = 200\) samples.
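The "chunk of pairs per worker" strategy named in the outline can be sketched in Python as follows. This is an illustration under assumptions, not the post's implementation: all \(n(n-1)/2\) pairs are enumerated, dealt out into interleaved chunks of near-equal size, and each chunk is processed independently. The helper names (`pair_chunks`, `corr_chunk`, `pairwise_corr_chunked`) are hypothetical, and `n_chunks` must not exceed the number of pairs.

```python
from itertools import combinations

import numpy as np


def pair_chunks(n, n_chunks):
    """Split all pairs (i, j) with i < j into n_chunks interleaved lists of near-equal size."""
    pairs = list(combinations(range(n), 2))
    return [pairs[k::n_chunks] for k in range(n_chunks)]


def corr_chunk(Z, chunk):
    """One worker's task: Pearson r for every pair in `chunk`.
    Z holds columns that are centered and scaled to unit norm."""
    i, j = np.array(chunk).T
    return chunk, (Z[:, i] * Z[:, j]).sum(axis=0)


def pairwise_corr_chunked(X, n_chunks=8):
    Xc = X - X.mean(axis=0)
    Z = Xc / np.linalg.norm(Xc, axis=0)
    n = X.shape[1]
    R = np.eye(n)
    # The built-in map() runs the chunks one after another; because no chunk
    # depends on another, the same call could be handed to a pool of workers.
    for chunk, r in map(lambda c: corr_chunk(Z, c), pair_chunks(n, n_chunks)):
        i, j = np.array(chunk).T
        R[i, j] = r
        R[j, i] = r
    return R


rng = np.random.default_rng(1)
X = rng.random((200, 30))  # m = 200 samples, n = 30 variables
R = pairwise_corr_chunked(X)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
```

Dealing pairs out by stride keeps chunk sizes within one pair of each other, whereas one variable per worker gives unequal tasks (variable \(i\) has \(n-1-i\) partners after it).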
| n | dt |
|---|---|
| 100 | 1.433043e+00 |
| 1000 | 1.359290e+02 |
| 2000 | 5.371534e+02 |
| 100000 (extrapolated) | 1.230446e+06 |

Given the timings above, and the extrapolated timing for \(10^{5}\) genes, which is roughly the number of genes/transcripts in a transcriptomic profile, it would take 14.
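The growth in the table follows from the number of distinct pairs, \(\binom{n}{2} = n(n-1)/2\), which grows quadratically with \(n\). The short Python check below uses only the measured rows of the table (`n_pairs` is a hypothetical helper name): doubling \(n\) from 1000 to 2000 roughly quadruples both the number of pairs and the elapsed time, so the time spent per pair stays roughly constant.

```python
def n_pairs(n):
    """Number of distinct variable pairs: n choose 2."""
    return n * (n - 1) // 2


# Measured rows of the table above: n -> dt
timings = {100: 1.433043, 1000: 135.9290, 2000: 537.1534}

for n, dt in timings.items():
    print(f"n={n}: {n_pairs(n)} pairs, dt per pair = {dt / n_pairs(n):.2e}")

print(f"{n_pairs(2000) / n_pairs(1000):.2f}")    # 4.00
print(f"{timings[2000] / timings[1000]:.2f}")    # 3.95
```

At \(n = 10^{5}\) there are \(n(n-1)/2 = 4\,999\,950\,000\) pairs, about 2500 times as many as at \(n = 2000\).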