TensorFlow 1.6-rc1 on macOS High Sierra 10.13.3 with GPU Acceleration
source: https://byai.io/howto-tensorflow-1-6-on-mac-with-gpu-acceleration/
Caution: you have to disable SIP (System Integrity Protection).
First, check your macOS build version. You can inspect it from the shell:
$ system_profiler SPSoftwareDataType
or by opening the System-Profiler and clicking on the macOS version string.
With the build version in hand, grab the appropriate driver from NVIDIA and install it.
macOS High Sierra Version 10.13.3 (17D102) https://images.nvidia.com/mac/pkg/387/WebDriver-387.
macOS High Sierra Version 10.13.3 (17D47) https://images.nvidia.com/mac/pkg/387/WebDriver-387.
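If you prefer to script the build check, here is a minimal Python sketch. It extracts the version and build number from `system_profiler SPSoftwareDataType` and compares the build against the two builds listed above. It assumes the output contains a line such as `System Version: macOS 10.13.3 (17D102)`. The helper names are my own.

```python
import re
import subprocess
import sys

# Builds covered by the driver / eGPU-support lists in this guide.
SUPPORTED_BUILDS = {"17D47", "17D102"}

# Assumed line format, e.g. "System Version: macOS 10.13.3 (17D102)"
_VERSION_RE = re.compile(r"System Version:\s*macOS\s+([\d.]+)\s+\((\w+)\)")


def parse_macos_build(profiler_output):
    """Return (version, build) parsed from system_profiler text."""
    match = _VERSION_RE.search(profiler_output)
    if match is None:
        raise ValueError("no 'System Version' line found")
    return match.group(1), match.group(2)


def current_build():
    """Run system_profiler (macOS only) and parse its output."""
    output = subprocess.run(
        ["system_profiler", "SPSoftwareDataType"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_macos_build(output)


if __name__ == "__main__" and sys.platform == "darwin":
    version, build = current_build()
    covered = "covered" if build in SUPPORTED_BUILDS else "NOT covered"
    print("macOS {} ({}) is {} by this guide".format(version, build, covered))
```

If the build is reported as not covered, the driver and eGPU-support files listed here are probably the wrong ones for your system.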
At this point, a big thank you to the egpu.io community. If you encounter any issues related to the eGPU installation, or if you need a recommendation for an enclosure or graphics card, I highly recommend visiting the egpu.io website.
# run in the Terminal of Recovery Mode, then reboot
$ csrutil disable
Caution! In case of incompatibility, there is a chance that you will not be able to boot after installing the eGPU support file.
If that happens, boot into Recovery Mode and delete the following file: /Library/Extensions/NVDAEGPUSupport.kext
So again, depending on your macOS build version, you will need to pick the correct file and install it.
macOS High Sierra Version 10.13.3 (17D102) nvidia-egpu-v7.zip
macOS High Sierra Version 10.13.3 (17D47) nvidia-egpu-v7.zip
Did the Mac boot gracefully? Great! Now it is time to shut down your Mac and attach your eGPU enclosure. Boot again and check whether the GPU is listed in the System-Profiler.
Next, install Homebrew. If they are not already present, this installation process will also install the latest Apple Command-Line-Tools.
$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
I recommend using pyenv for installing Python. On top of that, I will use pyenv-virtualenv to create a virtual environment for the custom build.
$ brew update
$ brew install pyenv pyenv-virtualenv
# add to bottom of `.bash_profile`
if command -v pyenv 1>/dev/null 2>&1; then
  eval "$(pyenv init -)"
  eval "$(pyenv virtualenv-init -)"
fi
$ source ~/.bash_profile
# install Python
$ pyenv install 3.6.0
# create virtualenv
$ pyenv virtualenv 3.6.0 tensorflow-gpu
$ pyenv activate tensorflow-gpu
$ pip install six numpy wheel
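Before building, it is worth confirming that the shell really uses the pyenv virtualenv. The following sketch uses a hypothetical helper, `check_build_env`. It checks three things:
- the interpreter path contains `tensorflow-gpu`;
- the interpreter is Python 3.6;
- the packages installed above can be found.

Run it with `python` inside the activated environment.

```python
import importlib.util
import sys

# Packages installed with pip above.
REQUIRED_MODULES = ("six", "numpy", "wheel")


def check_build_env(executable=sys.executable, version_info=sys.version_info,
                    env_name="tensorflow-gpu", modules=REQUIRED_MODULES):
    """Return a list of problems with the interpreter (empty list means OK)."""
    problems = []
    if env_name not in executable:
        problems.append("{} is not inside the '{}' virtualenv".format(executable, env_name))
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 6):
        problems.append("expected Python 3.6, found {}.{}".format(major, minor))
    for name in modules:
        # find_spec returns None when the module cannot be imported
        if importlib.util.find_spec(name) is None:
            problems.append("module '{}' is not installed".format(name))
    return problems


if __name__ == "__main__":
    for problem in check_build_env() or ["build environment looks fine"]:
        print(problem)
```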
$ brew install coreutils
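Do not install Bazel with Homebrew, which would give you its current Bazel release. This build uses Bazel 0.8.1, so use the official installer script instead: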
$ mkdir ~/temp && cd ~/temp
$ curl -L https://github.com/bazelbuild/bazel/releases/download/0.8.1/bazel-0.8.1-installer-darwin-x86_64.sh -o bazel-0.8.1-installer-darwin-x86_64.sh
$ chmod +x bazel-0.8.1-installer-darwin-x86_64.sh
$ ./bazel-0.8.1-installer-darwin-x86_64.sh
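To make sure the right Bazel ended up on your PATH, you can parse the `Build label:` line printed by `bazel version`. This is a small sketch that assumes that line format. The constant and function names are illustrative.

```python
import re
import shutil
import subprocess

EXPECTED_BAZEL = (0, 8, 1)


def parse_bazel_version(output):
    """Extract (major, minor, patch) from the 'Build label:' line of `bazel version`."""
    match = re.search(r"^Build label:\s*(\d+)\.(\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Build label' line found")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__" and shutil.which("bazel"):
    output = subprocess.run(["bazel", "version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True).stdout
    found = parse_bazel_version(output)
    verdict = "OK" if found == EXPECTED_BAZEL else "expected 0.8.1"
    print("bazel {}.{}.{}: {}".format(*found, verdict))
```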
Unfortunately, because we need an older version of clang, we have to downgrade to an older version of the Apple Command-Line-Tools.
You can download version 8.3.2 directly from the Apple Developer Portal or from within Xcode (Xcode -> Support -> Apple Developer).
$ sudo mv /Library/Developer/CommandLineTools /Library/Developer/CommandLineTools_backup
# now install the downloaded Command-Line-Tools 8.3.2 package, then select it
$ sudo xcode-select --switch /Library/Developer/CommandLineTools
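Here is a quick sanity check for the downgrade. `xcode-select -p` should now print `/Library/Developer/CommandLineTools`, and `clang --version` shows which compiler is active. The sketch below parses Apple's version banner. Its regular expression assumes one of two wordings:
- `Apple LLVM version X.Y.Z` (older tools);
- `Apple clang version X.Y.Z` (newer tools).

It is my assumption, not something stated in this guide, that the 8.3.2 tools report Apple LLVM 8.1.x.

```python
import re
import subprocess
import sys

EXPECTED_DEVELOPER_DIR = "/Library/Developer/CommandLineTools"

# Assumed banner formats: "Apple LLVM version 8.1.0 (clang-802.0.42)" (older
# tools) or "Apple clang version 12.0.0 (...)" (newer tools).
_CLANG_RE = re.compile(r"Apple (?:LLVM|clang) version (\d+)\.(\d+)\.(\d+)")


def parse_apple_clang(banner):
    """Return (major, minor, patch) from the output of `clang --version`."""
    match = _CLANG_RE.search(banner)
    if match is None:
        raise ValueError("not an Apple clang version banner")
    return tuple(int(part) for part in match.groups())


def run(*cmd):
    """Run a command and return its stripped stdout."""
    return subprocess.run(list(cmd), stdout=subprocess.PIPE,
                          universal_newlines=True, check=True).stdout.strip()


if __name__ == "__main__" and sys.platform == "darwin":
    developer_dir = run("xcode-select", "-p")
    print("active developer dir:", developer_dir,
          "(OK)" if developer_dir == EXPECTED_DEVELOPER_DIR else "(unexpected)")
    print("clang version:", parse_apple_clang(run("clang", "--version")))
```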
Next, install the CUDA 9.1 Toolkit from NVIDIA and add the CUDA paths to your shell profile:
$ vim ~/.bash_profile
# add to .bash_profile
export PATH=/usr/local/cuda/bin:/Developer/NVIDIA/CUDA-9.1/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/Developer/NVIDIA/CUDA-9.1/lib
$ source ~/.bash_profile
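A typo in `.bash_profile` fails silently, so a small check helps. This sketch looks for the CUDA directories from the two export lines above in the current environment. Run it from a shell in which you have sourced the profile. The directory lists simply mirror those exports, and the function name is illustrative.

```python
import os

# Directories taken from the two export lines above.
CUDA_BIN_DIRS = ("/usr/local/cuda/bin", "/Developer/NVIDIA/CUDA-9.1/bin")
CUDA_LIB_DIRS = ("/usr/local/cuda/lib", "/Developer/NVIDIA/CUDA-9.1/lib")


def missing_cuda_paths(env):
    """List the expected CUDA directories that are absent from env."""
    # macOS separates PATH-style entries with ':'
    path_entries = env.get("PATH", "").split(":")
    lib_entries = env.get("DYLD_LIBRARY_PATH", "").split(":")
    missing = ["PATH: " + d for d in CUDA_BIN_DIRS if d not in path_entries]
    missing += ["DYLD_LIBRARY_PATH: " + d for d in CUDA_LIB_DIRS if d not in lib_entries]
    return missing


if __name__ == "__main__":
    for entry in missing_cuda_paths(os.environ) or ["all CUDA paths are set"]:
        print(entry)
```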
Let's quickly check whether the CUDA driver is loaded.
$ kextstat | grep -i cuda
164 0 0xffffff7f83c65000 0x2000 0x2000 com.nvidia.CUDA (1.1.0) 4329B052-6C8A-3900-8E83-744487AEDEF1 <4 1>
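If you want to script this check, note that the bundle identifier and version are the sixth and seventh columns of the `kextstat` output. Here is a minimal parser that assumes the column layout shown above. The function name is my own.

```python
import re
import subprocess
import sys

# kextstat columns: Index Refs Address Size Wired Name (Version) UUID <Linked Against>
_KEXT_RE = re.compile(
    r"^\s*\d+\s+\d+\s+0x[0-9a-fA-F]+\s+0x[0-9a-fA-F]+\s+0x[0-9a-fA-F]+\s+"
    r"(\S+)\s+\(([^)]+)\)")


def loaded_kexts(kextstat_output):
    """Map bundle identifier -> version for each kext line of `kextstat` output."""
    kexts = {}
    for line in kextstat_output.splitlines():
        match = _KEXT_RE.match(line)
        if match:  # the header line does not start with a number, so it is skipped
            kexts[match.group(1)] = match.group(2)
    return kexts


if __name__ == "__main__" and sys.platform == "darwin":
    output = subprocess.run(["kextstat"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True).stdout
    version = loaded_kexts(output).get("com.nvidia.CUDA")
    print(("CUDA driver loaded: " + version) if version else "CUDA driver NOT loaded")
```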
Next, compile one of the CUDA samples to check whether the GPU is correctly recognized and supported.
$ cp -R /Developer/NVIDIA/CUDA-9.1/samples ~/temp/cuda_samples
$ cd ~/temp/cuda_samples/
$ make -C 1_Utilities/deviceQuery
# execute sample
$ ~/temp/cuda_samples/bin/x86_64/darwin/release/deviceQuery
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6144 MBytes (6442254336 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1709 MHz (1.71 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 195 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
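The `CUDA Capability Major/Minor version number` line is what you will later type into `./configure` when it asks for compute capabilities. The sketch below turns the deviceQuery output into that comma-separated list. The list is deduplicated and sorted, since each additional capability increases build time and binary size. The binary path is the one used above, and the function name is illustrative.

```python
import os
import re
import subprocess

DEVICE_QUERY = os.path.expanduser(
    "~/temp/cuda_samples/bin/x86_64/darwin/release/deviceQuery")

_DEVICE_RE = re.compile(r'^\s*Device (\d+): "([^"]+)"', re.MULTILINE)
_CAPABILITY_RE = re.compile(r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)")


def configure_capabilities(device_query_output):
    """Return (device names, capability list for ./configure) from deviceQuery text."""
    names = [name for _, name in _DEVICE_RE.findall(device_query_output)]
    caps = sorted({(int(major), int(minor))
                   for major, minor in _CAPABILITY_RE.findall(device_query_output)})
    return names, ",".join("{}.{}".format(major, minor) for major, minor in caps)


if __name__ == "__main__" and os.path.exists(DEVICE_QUERY):
    output = subprocess.run([DEVICE_QUERY], stdout=subprocess.PIPE,
                            universal_newlines=True).stdout
    names, caps = configure_capabilities(output)
    print("devices:", ", ".join(names) or "none found")
    print("compute capabilities for ./configure:", caps or "none found")
```

For the GTX 1060 above this yields `6.1`, which is the value entered in the configure step below.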
If you have not already done so, register at https://developer.nvidia.com/cudnn and download cuDNN 7.0.5 for CUDA 9.1.[1]
Change into your download directory and follow the post-installation steps:
$ tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
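To verify which cuDNN header is now in place, you can read the version `#define`s from `cudnn.h`. This sketch assumes the cuDNN 7 header layout, in which `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` are defined directly in `cudnn.h`. The function name is my own.

```python
import re

CUDNN_HEADER = "/usr/local/cuda/include/cudnn.h"


def cudnn_version(header_text):
    """Read the CUDNN_MAJOR/MINOR/PATCHLEVEL #defines from cudnn.h text."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+{}\s+(\d+)".format(macro), header_text)
        if match is None:
            raise ValueError(macro + " not found")
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    try:
        with open(CUDNN_HEADER) as header:
            print("cuDNN {}.{}.{}".format(*cudnn_version(header.read())))
    except OSError as err:
        print("cannot read {}: {}".format(CUDNN_HEADER, err))
```

After the copy above, this should report 7.0.5. That also matches the cuDNN 7 default accepted in `./configure`.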
$ cd ~/temp
$ git clone https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ git checkout v1.6.0-rc1
Unfortunately, the unmodified repository fails to build.
I created a patch to work around this (the full patch is listed at the end of this post). Grab it from GitHub and apply it:
$ git apply tensorflow_v1.6.0-rc1_osx.patch
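Apart from disabling Google Cloud Platform, Hadoop and Amazon S3 support, and setting CUDA support, the CUDA SDK version and the CUDA compute capability (6.1 for the GTX 1060), I left the other settings at their defaults.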
$ ./configure
You have bazel 0.8.1 installed.
Please specify the location of python. [Default is /Users/user/.pyenv/versions/tensorflow-gpu/bin/python]:
Found possible Python library paths:
Please input the desired Python library path to use. Default is [/Users/user/.pyenv/versions/tensorflow-gpu/lib/python3.6/site-packages]
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]6.1
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
$ export CUDA_HOME=/usr/local/cuda
# replace USERNAME with your Mac user name
$ export DYLD_LIBRARY_PATH=/Users/USERNAME/lib:/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
You can download my wheel file instead, but hey, no build, no fun ;-) The build took about one hour on my machine.
$ bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ cd ~
$ pip install /tmp/tensorflow_pkg/tensorflow-1.6.0rc0-cp36-cp36m-macosx_10_13_x86_64.whl
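pip may refuse the wheel with "not a supported wheel on this platform". If so, compare the filename tags with your interpreter. The tags follow the PEP 427 naming scheme `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The sketch below splits them apart and checks the Python tag against the running interpreter. The helper names are my own.

```python
import glob
import os
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1], "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def matches_interpreter(tags, version_info=sys.version_info):
    """True if the wheel's CPython tag (e.g. cp36) matches the given Python version."""
    return tags["python"] == "cp{}{}".format(version_info[0], version_info[1])


if __name__ == "__main__":
    for path in glob.glob("/tmp/tensorflow_pkg/*.whl"):
        tags = parse_wheel_filename(os.path.basename(path))
        verdict = "matches" if matches_interpreter(tags) else "does NOT match"
        print(path, verdict, "this interpreter")
```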
Give it a shot and open the Python interpreter...
>>> import tensorflow as tf
>>> tf.__version__
>>> tf.Session()
tensorflow/core/common_runtime/gpu/gpu_device.cc:1331] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:c3:00.0
totalMemory: 6.00GiB freeMemory: 5.91GiB
tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5699 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:c3:00.0, compute capability: 6.1)
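The `Creating TensorFlow device` line is the one to look for. The sketch below extracts the GPU details from that log line. It also includes a small TensorFlow 1.x smoke test that pins a matrix multiplication to `/device:GPU:0` with device-placement logging enabled. The smoke test is only an illustration: it imports TensorFlow lazily and is not run automatically. Both function names are my own.

```python
import re

_GPU_LOG_RE = re.compile(
    r"physical GPU \(device: (\d+), name: ([^,]+), pci bus id: ([^,]+), "
    r"compute capability: (\d+\.\d+)\)")


def parse_gpu_device_log(line):
    """Extract device index, name, PCI bus id and compute capability, or None."""
    match = _GPU_LOG_RE.search(line)
    if match is None:
        return None
    index, name, bus_id, capability = match.groups()
    return {"device": int(index), "name": name, "pci_bus_id": bus_id,
            "compute_capability": capability}


def matmul_on_gpu():
    """TF 1.x smoke test: pin a small matmul to GPU:0 and log device placement."""
    import tensorflow as tf  # imported lazily; requires the wheel built above
    with tf.device("/device:GPU:0"):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
        product = tf.matmul(a, b)
    config = tf.ConfigProto(log_device_placement=True)
    with tf.Session(config=config) as sess:
        return sess.run(product)
```

If no GPU is visible, `matmul_on_gpu()` should fail rather than silently fall back to the CPU, because soft placement is not enabled in this config.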
Finally, we will use Theano, Keras and TensorFlow to test the GPU acceleration.
$ pip install git+git://github.com/Theano/Theano.git
$ pip install keras
$ cd ~/temp
$ git clone https://github.com/fchollet/keras.git
$ cd keras/examples
# Run in CPU mode
$ THEANO_FLAGS=mode=FAST_RUN python imdb_cnn.py
# Run in GPU mode
$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python imdb_cnn.py
25000/25000 [==============================] - 15s 595us/step - loss: 0.4028 - acc: 0.8008 - val_loss: 0.3038 - val_acc: 0.8690
Epoch 2/2
25000/25000 [==============================] - 10s 387us/step - loss: 0.2298 - acc: 0.9072 - val_loss: 0.2858 - val_acc: 0.8817
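To compare the CPU and GPU runs, the per-step time (`us/step`, here one step per sample) is more precise than the rounded epoch duration. Here is a small sketch that parses a finished Keras progress line. As a cross-check, 595 us × 25,000 samples = 14.875 s, which matches the reported 15 s. The function names are my own.

```python
import re

_PROGRESS_RE = re.compile(r"(\d+)/(\d+) \[=+\] - (\d+)s (\d+)us/step")
_METRIC_RE = re.compile(r"(\w+): (\d+\.\d+)")


def parse_keras_epoch(line):
    """Parse a finished Keras progress-bar line into timings and metrics."""
    match = _PROGRESS_RE.search(line)
    if match is None:
        raise ValueError("not a finished Keras progress line")
    return {
        "samples": int(match.group(2)),
        "epoch_s": int(match.group(3)),
        "us_per_step": int(match.group(4)),
        "metrics": {name: float(value) for name, value in _METRIC_RE.findall(line)},
    }


def estimated_epoch_seconds(parsed):
    """Per-step time multiplied by the number of samples, in seconds."""
    return parsed["us_per_step"] * parsed["samples"] / 1e6
```

For the epoch-2 line, 387 us × 25,000 = 9.675 s, consistent with the reported 10 s.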
[1] Detailed installation instructions are available in cuDNN-Installation-Guide.pdf.
From 5ae119ee950c036619342b462aacff23769e2343 Mon Sep 17 00:00:00 2001
From: Damian Broncel <damian@tests-MBP.localdomain>
Date: Fri, 9 Mar 2018 14:26:17 +0100
Subject: [PATCH 1/3] eigen_archive url changed
tensorflow/workspace.bzl | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index 14a4281fae..aa4b05af55 100644
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -120,11 +120,11 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
name = "eigen_archive",
urls = [
- "https://mirror.bazel.build/bitbucket.org/eigen/eigen/get/2355b229ea4c.tar.gz",
- "https://bitbucket.org/eigen/eigen/get/2355b229ea4c.tar.gz",
+ "https://mirror.bazel.build/bitbucket.org/dtrebbien/eigen/get/374842a18727.tar.gz",
+ "https://bitbucket.org/dtrebbien/eigen/get/374842a18727.tar.gz",
- sha256 = "0cadb31a35b514bf2dfd6b5d38205da94ef326ec6908fc3fd7c269948467214f",
- strip_prefix = "eigen-eigen-2355b229ea4c",
+ sha256 = "fa26e9b9ff3a2692b092d154685ec88d6cb84d4e1e895006541aff8603f15c16",
+ strip_prefix = "dtrebbien-eigen-374842a18727",
build_file = str(Label("//third_party:eigen.BUILD")),
@@ -353,11 +353,11 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
name = "protobuf_archive",
urls = [
- "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
- "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
+ "https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
+ "https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
- sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
- strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
+ sha256 = "eb16b33431b91fe8cee479575cee8de202f3626aaf00d9bf1783c6e62b4ffbc7",
+ strip_prefix = "protobuf-50f552646ba1de79e07562b41f3999fe036b4fd0",
# We need to import the protobuf library under the names com_google_protobuf
2.11.0 (Apple Git-81)
From 067ab153a782ded0d812e832b957b6072223825a Mon Sep 17 00:00:00 2001
From: Damian Broncel <damian@tests-MBP.localdomain>
Date: Fri, 9 Mar 2018 14:29:08 +0100
Subject: [PATCH 2/3] remove __align__(sizeof(T)) entries
tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc | 2 +-
tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc | 8 ++++----
tensorflow/core/kernels/split_lib_gpu.cu.cc | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
index 0f7adaf24a..8d89c66f3f 100644
--- a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
+++ b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
@@ -69,7 +69,7 @@ __global__ void concat_variable_kernel(
IntType num_inputs = input_ptr_data.size;
// verbose declaration needed due to template
- extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+ extern __shared__ unsigned char smem[];
IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
if (useSmem) {
diff --git a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
index 505d33046e..9bed380f38 100644
--- a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
@@ -172,7 +172,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNHWCSmall(
const DepthwiseArgs args, const T* input, const T* filter, T* output) {
// Holds block plus halo and filter data for blockDim.x depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -450,7 +450,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNCHWSmall(
const DepthwiseArgs args, const T* input, const T* filter, T* output) {
// Holds block plus halo and filter data for blockDim.z depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -1099,7 +1099,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNHWCSmall(
const DepthwiseArgs args, const T* output, const T* input, T* filter) {
assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.z));
// Holds block plus halo and filter data for blockDim.x depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
@@ -1367,7 +1367,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNCHWSmall(
const DepthwiseArgs args, const T* output, const T* input, T* filter) {
assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.x));
// Holds block plus halo and filter data for blockDim.z depths.
- extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+ extern __shared__ unsigned char shared_memory[];
T* const shared_data = reinterpret_cast<T*>(shared_memory);
const int num_batches = args.batch;
diff --git a/tensorflow/core/kernels/split_lib_gpu.cu.cc b/tensorflow/core/kernels/split_lib_gpu.cu.cc
index 9f234fc093..5115a96d17 100644
--- a/tensorflow/core/kernels/split_lib_gpu.cu.cc
+++ b/tensorflow/core/kernels/split_lib_gpu.cu.cc
@@ -119,7 +119,7 @@ __global__ void split_v_kernel(const T* input_ptr,
int num_outputs = output_ptr_data.size;
// verbose declaration needed due to template
- extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+ extern __shared__ unsigned char smem[];
IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
if (useSmem) {
2.11.0 (Apple Git-81)
From 7f108ef6b616b29607b30db10e436cee02e371e8 Mon Sep 17 00:00:00 2001
From: Damian Broncel <damian@tests-MBP.localdomain>
Date: Fri, 9 Mar 2018 14:30:39 +0100
Subject: [PATCH 3/3] comment out lgomp
third_party/gpus/cuda/BUILD.tpl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/third_party/gpus/cuda/BUILD.tpl b/third_party/gpus/cuda/BUILD.tpl
index 2a37c65bc7..61b203e005 100644
--- a/third_party/gpus/cuda/BUILD.tpl
+++ b/third_party/gpus/cuda/BUILD.tpl
@@ -110,7 +110,7 @@ cc_library(
- linkopts = ["-lgomp"],
+ # linkopts = ["-lgomp"],
linkstatic = 1,
visibility = ["//visibility:public"],
2.11.0 (Apple Git-81)