BIDMach library for GPU computing with Intel Parallel Studio XE: simply amazing [install on Mac OS and EC2]
Built with Intel Parallel Studio, the BIDMach library delivers very competitive results (see the benchmarks): on some tasks, a single GPU instance can achieve the speed of a cluster of a few hundred instances, at a cost 10 to 1000 times lower.
In this article we'll see how to install the library on Mac OS, or run an EC2 instance with the library pre-installed, as well as a few first operations. For a deeper tutorial, see my next article.
Install on Mac OS
Having the library installed locally has the great advantage of letting you develop and test on small datasets directly on your computer, before renting a GPU-enabled instance in the cloud.
For a CPU-only install
For a GPU install, you need a Mac with an NVIDIA GPU and its CUDA library installed.
Since I’m using CUDA 7.5 instead of 7.0, I had to recompile JCuda, BIDMat, and BIDMach.
First download Intel Parallel Studio (if you later need to uninstall it, run the command `for i in $(rpm -qa | grep intel); do sudo rpm -e $i; done` a few times).
In the `bidmach` script, change the CUDA version to the current one and increase the memory:
JCUDA_VERSION="0.7.5" # Fix if needed
MEMSIZE="-Xmx12G"
then start it with the `./bidmach` command, which gives:
Everything works well, my GPU is found correctly.
./scripts/getdata.sh
./bidmach
val a = loadSMat("data/rcv1/docs.smat.lz4")
returns
a: BIDMat.SMat =
( 33, 0) 1
( 47, 0) 1
( 94, 0) 1
( 104, 0) 1
( 112, 0) 3
( 118, 0) 1
( 141, 0) 2
( 165, 0) 2
... ... ...
Let's continue with the Quickstart tutorial:
To clear the cache:
EC2 launch
To launch an EC2 G2 (GPU-enabled) instance with BIDMach, there exist AMIs with BIDMach pre-installed:
- in the US west zone (Oregon): the EC2 AMI created by the BIDMach team
- in the EU west zone (Ireland): an AMI I created with BIDMach, and another AMI to launch a cluster of GPU instances with BIDMach
First, add an EC2 permission policy to your user:
Create an EC2 security group and a keypair, and start the instance from an AMI, all in the zone where the AMI lives.
In the US west zone (Oregon):
In the EU west zone (Ireland):
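These launch steps can be sketched with the AWS CLI; the group name, key name, and AMI ID below are placeholders of my choosing, not the actual ones (substitute the AMI ID of the zone you picked, and its region):

```shell
# Placeholders -- use the region and AMI ID of the zone you chose.
REGION=us-west-2
AMI=ami-xxxxxxxx

# Security group allowing inbound SSH.
aws ec2 create-security-group --region $REGION \
  --group-name bidmach-sg --description "BIDMach instances"
aws ec2 authorize-security-group-ingress --region $REGION \
  --group-name bidmach-sg --protocol tcp --port 22 --cidr 0.0.0.0/0

# Keypair, saved locally for SSH access.
aws ec2 create-key-pair --region $REGION --key-name bidmach-key \
  --query 'KeyMaterial' --output text > bidmach-key.pem
chmod 400 bidmach-key.pem

# Launch a G2 instance from the AMI.
aws ec2 run-instances --region $REGION --image-id $AMI \
  --instance-type g2.2xlarge --key-name bidmach-key \
  --security-groups bidmach-sg
```

The same steps can of course be done from the EC2 web console, as shown here.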
Let's download the data:
Start BIDMach with the `bidmach` command and you get:
Data should be available in /opt/BIDMach/data/. Let's load the data, partition it into train and test sets, train the model, predict on the test set, and compute the accuracy:
During training, you get:
- the percentage of training data consumed,
- the negative log likelihood,
- the gigaflops,
- the elapsed time in seconds,
- the gigabytes of data consumed,
- the megabytes per second, and
- the fraction of free GPU memory
as here:
corpus perplexity=14737,915077
Predicting
3,00%, ll=-0,00783, gf=9,558, secs=0,0, GB=0,00, MB/s=436,75, GPUmem=0,70
6,00%, ll=-0,00806, gf=9,610, secs=0,0, GB=0,01, MB/s=439,78, GPUmem=0,70
10,00%, ll=-0,00804, gf=10,101, secs=0,0, GB=0,01, MB/s=462,40, GPUmem=0,70
13,00%, ll=-0,00802, gf=10,380, secs=0,0, GB=0,01, MB/s=475,39, GPUmem=0,70
16,00%, ll=-0,00813, gf=10,550, secs=0,0, GB=0,02, MB/s=483,29, GPUmem=0,70
20,00%, ll=-0,00804, gf=10,605, secs=0,0, GB=0,02, MB/s=485,10, GPUmem=0,70
23,00%, ll=-0,00793, gf=10,444, secs=0,0, GB=0,02, MB/s=477,68, GPUmem=0,70
26,00%, ll=-0,00820, gf=10,548, secs=0,1, GB=0,02, MB/s=482,65, GPUmem=0,70
30,00%, ll=-0,00797, gf=10,625, secs=0,1, GB=0,03, MB/s=486,27, GPUmem=0,70
33,00%, ll=-0,00798, gf=10,685, secs=0,1, GB=0,03, MB/s=489,04, GPUmem=0,70
36,00%, ll=-0,00795, gf=10,750, secs=0,1, GB=0,03, MB/s=492,29, GPUmem=0,70
40,00%, ll=-0,00769, gf=10,813, secs=0,1, GB=0,04, MB/s=495,43, GPUmem=0,70
43,00%, ll=-0,00811, gf=10,718, secs=0,1, GB=0,04, MB/s=491,17, GPUmem=0,70
46,00%, ll=-0,00824, gf=10,746, secs=0,1, GB=0,04, MB/s=492,30, GPUmem=0,70
50,00%, ll=-0,00798, gf=10,786, secs=0,1, GB=0,05, MB/s=494,21, GPUmem=0,70
53,00%, ll=-0,00784, gf=10,802, secs=0,1, GB=0,05, MB/s=494,82, GPUmem=0,70
56,00%, ll=-0,00809, gf=10,832, secs=0,1, GB=0,05, MB/s=496,25, GPUmem=0,70
60,00%, ll=-0,00817, gf=9,144, secs=0,1, GB=0,06, MB/s=418,94, GPUmem=0,70
63,00%, ll=-0,00765, gf=9,239, secs=0,1, GB=0,06, MB/s=423,33, GPUmem=0,70
66,00%, ll=-0,00818, gf=9,323, secs=0,1, GB=0,06, MB/s=427,19, GPUmem=0,70
70,00%, ll=-0,00779, gf=9,346, secs=0,2, GB=0,07, MB/s=428,33, GPUmem=0,70
73,00%, ll=-0,00782, gf=9,418, secs=0,2, GB=0,07, MB/s=431,64, GPUmem=0,70
76,00%, ll=-0,00761, gf=9,494, secs=0,2, GB=0,07, MB/s=435,24, GPUmem=0,70
80,00%, ll=-0,00806, gf=9,555, secs=0,2, GB=0,07, MB/s=438,00, GPUmem=0,70
83,00%, ll=-0,00791, gf=9,559, secs=0,2, GB=0,08, MB/s=438,16, GPUmem=0,70
86,00%, ll=-0,00812, gf=9,616, secs=0,2, GB=0,08, MB/s=440,77, GPUmem=0,70
90,00%, ll=-0,00817, gf=9,666, secs=0,2, GB=0,08, MB/s=443,01, GPUmem=0,70
93,00%, ll=-0,00797, gf=9,711, secs=0,2, GB=0,09, MB/s=445,04, GPUmem=0,70
96,00%, ll=-0,00817, gf=9,757, secs=0,2, GB=0,09, MB/s=447,12, GPUmem=0,70
100,00%, ll=-0,00799, gf=9,705, secs=0,2, GB=0,09, MB/s=444,77, GPUmem=0,70
Time=0,2090 secs, gflops=9,71
The accuracies are :
0,99035
0,92883
0,98513
0,98612
0,95681
0,96348
...
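Each of these numbers is simply the fraction of test documents whose thresholded prediction matches the 0/1 category label. A minimal plain-Scala sketch of the computation (the helper is mine, not the BIDMach API):

```scala
// Hypothetical helper, not BIDMach's API: per-category accuracy as the
// fraction of documents where the thresholded score equals the 0/1 label.
def accuracy(scores: Array[Double], labels: Array[Int], thresh: Double = 0.5): Double = {
  require(scores.length == labels.length)
  val hits = scores.zip(labels).count { case (s, y) =>
    (if (s > thresh) 1 else 0) == y
  }
  hits.toDouble / scores.length
}

// Example: 3 of 4 predictions match the labels.
println(accuracy(Array(0.9, 0.2, 0.7, 0.6), Array(1, 0, 0, 1)))  // 0.75
```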
To get the training options :
The GPUmem command
gives you the fraction of free memory, the free memory and the total memory capacity in bytes:
(Float, Long, Long) = (0.69568384,2987802624,4294770688)
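A quick sanity check with the two byte counts from the output above confirms that the first element is simply free bytes divided by total bytes:

```scala
// Free and total GPU memory in bytes, taken from the GPUmem output above.
val free  = 2987802624L
val total = 4294770688L

// The ratio matches the first element of the tuple (~0.6957).
println(free.toDouble / total)
```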
Stop the instance :
To get an updated AMI with the new version of BIDMach and CUDA 7.5, have a look at my article about the new AMI.
Well done!