Off we go

Getting started with TensorFlow is relatively easy, assuming one is familiar with ANNs. TensorFlow's website provides good examples to get one started. For an example of how ANNs are implemented with TensorFlow, go straight to "Get started" => "Deep MNIST for experts".

Installing TensorFlow on a computer without GPU support is easy if one follows the instructions on the website. A server installation goes into a virtualenv, while an Anaconda installation uses Anaconda's own package management system. Note that a server install is not visible from Anaconda even when both run on the same system; in that case one needs separate installs.
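As a rough sketch, the two install routes look like this (package and environment names follow the tensorflow.org instructions of the time; adjust Python versions to taste):

```shell
# Server-style install into a virtualenv (CPU-only build):
virtualenv --system-site-packages ~/tensorflow
source ~/tensorflow/bin/activate
pip install --upgrade tensorflow

# Anaconda keeps its own environments, so the virtualenv install above
# is not visible there; it needs a separate install of its own:
conda create -n tensorflow python=3.5
source activate tensorflow
pip install --upgrade tensorflow
```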

Below, I'll describe my experiences with GPU installation (GeForce GTX 1070 on Ubuntu 16.04 LTS), along with notes about running TensorFlow-based programs on both GPU and CPU.

GPU installation

Installation of TensorFlow with GPU support is a bit trickier, primarily because TensorFlow relies on the NVIDIA CUDA library. This in itself is not a problem; the challenge arises from the fact that TensorFlow 1.4 uses CUDA version 8 rather than the latest version 9. One therefore needs to be careful to install CUDA 8, and GPU drivers that support it, rather than the latest driver versions. Using the apt package manager for NVIDIA drivers gave me too much headache, so I used the runfile installation method - one of the options provided on the NVIDIA website - to install both CUDA and the GPU driver. Prior to installation, one needs to remove any apt-installed drivers and purge the kernel parameters as described in the instructions. If one does manage to install the driver with apt, system updates may break GPU support in TensorFlow unless the package is marked as manually installed.
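In outline, the runfile route looks something like the following (the runfile name is from the CUDA 8 downloads of the time and may differ for you; the driver package name is likewise only an example):

```shell
# First purge any apt-installed NVIDIA drivers:
sudo apt-get remove --purge 'nvidia-*'

# Then run the CUDA 8 installer downloaded from NVIDIA's site; it can
# install the bundled, CUDA-8-compatible driver as part of the same run:
sudo sh cuda_8.0.61_375.26_linux-run

# If you keep an apt-installed driver instead, mark it as manually
# installed so that system updates don't silently remove it:
sudo apt-mark manual nvidia-375
```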

One also needs to install the cuDNN library into the CUDA directory hierarchy, as described in the instructions.
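For reference, the cuDNN step amounts to copying the headers and libraries into the CUDA tree. A sketch, assuming the cuDNN 6 for CUDA 8 archive (the combination TF 1.4 expects) has been downloaded from NVIDIA:

```shell
# Unpack the archive; it extracts into a local cuda/ directory:
tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz

# Copy the header and libraries into the CUDA installation:
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```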

Support for CUDA 9 has been promised for TF 1.5, which will hopefully resolve some of these worries. An enterprising individual may also venture into compiling TF from source, which should provide CUDA 9 support already now. In principle, at least.

Hey, it's running now

Once your favourite version is running, it's time to do some testing. You can start with some basic linear algebra to explore the kind of performance enhancements TF can achieve. For the CPU-only version, TF enhances performance for two reasons:
  • With a properly defined computation graph, data flows within TF and stays out of Python.
  • TF parallelizes computation across the available CPU cores. Typical Python libraries don't do this; instead, one has to explicitly map processing to threads.
With a GPU, the relative oomph of the GPU compared to the CPU sets an upper limit on the speedup. To get the best performance from the GPU, one has to pay close attention to where data structures are located: if TF needs to ship data between system RAM and the GPU, performance suffers. This is particularly true if the computations done on the GPU are relatively lightweight. The best way to utilize the GPU is to keep data local and perform largish computations. In my own testing, which I'll describe later on, the largest tensor was 800 MB, which fits nicely in the 8 GB of GTX 1070 memory.

Momentary GPU utilization can be checked with nvidia-smi. It may vary a lot if the GPU spends much of its time twiddling its thumbs while data are transferred over the PCIe bus. The combined effect of the PCIe bus and GPU execution, as seen from the CPU's viewpoint, shows up in the kernel timing info, a.k.a. "sys" time.
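In practice this means keeping two terminals open ("my_model.py" below is a placeholder for your own script):

```shell
# Poll GPU utilization and memory usage once a second while training runs:
nvidia-smi -l 1

# In another terminal, time the run; a large "sys" share relative to
# "user" hints at PCIe transfers and other kernel-side work rather
# than pure computation:
time python my_model.py
```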
