… the home of Uncia uncia, the Snow Leopard.
Going from Mac OS 10.5 Leopard Server to 10.6 Snow Leopard is not completely trivial, but not too painful. Snow Leopard of course has OpenCL built in (see earlier post). My main office machine is an 8-core Mac Pro with an NVidia 8800 GT graphics card. As expected, the clDeviceQuery program detects 2 devices: the 8800 GT with 14 compute units and the Xeon CPU with 8 compute units.
No real surprise, but it’s nice when it all works!
My modification of the NVidia Device Query program also compiles on a MacMini running Snow Leopard (Mac OS 10.6), which has OpenCL built in. This time, it reports that there is a twin core CPU, but no GPU. The program can be compiled with Xcode, or use the Linux Makefile, but without defining INCLUDEDIR or LIBDIR and simply setting:
CC = g++
CFLAGS = -c -Wall
LDFLAGS = -framework OpenCL
I also have a Linux box running OpenSuse. Again no GPU. I installed the AMD drivers (I’m not sure whether this is necessary) and the SDK. This time the twin processor Core 2 was reported and no GPU. I would expect this.
So, why isn’t the CPU reported on the other machines?
First, the NVidia documentation makes no reference to the CPU, so it just isn’t supported. The list of supported devices on the AMD page includes:
X86 CPU w/ SSE 3.x or later
The FireStream card is in a box with Opterons of 2005 vintage. They don’t have SSE3. I don’t see this as a reason not to support OpenCL, however.
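A quick way to check the SSE3 requirement on a Linux box is to look at the flags line of /proc/cpuinfo, where the kernel reports SSE3 under the name pni. The following is a minimal sketch under that assumption; the helper names has_sse3_flag and cpu_has_sse3 are made up for illustration and are not part of either SDK.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns true if any "flags" line in cpuinfo-style text lists "pni",
// which is how the Linux kernel reports SSE3.
// (has_sse3_flag is a hypothetical helper, not part of any OpenCL SDK.)
bool has_sse3_flag(std::istream& cpuinfo)
{
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.compare(0, 5, "flags") != 0)
            continue;                          // only "flags : ..." lines matter
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        std::istringstream flags(line.substr(colon + 1));
        std::string flag;
        while (flags >> flag)                  // flags are whitespace-separated
            if (flag == "pni")
                return true;
    }
    return false;
}

// Convenience wrapper for the machine the program runs on (Linux only).
bool cpu_has_sse3()
{
    std::ifstream in("/proc/cpuinfo");
    return in.is_open() && has_sse3_flag(in);
}
```

Calling cpu_has_sse3() before asking the AMD SDK for a CPU device would at least show whether a missing CPU is down to the SSE3 requirement.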
The next experiment was to run HelloCL on the basic Linux box. This has a simple switch: gpu = 1 or 0. If set to 0, the program attempts to run on the CPU. As there’s no gpu, the program fails with gpu=1. It runs with gpu=0. In other words, hello.cl is being compiled and run on the host CPU. I’m not yet clear whether it is being directly compiled or whether there is a virtual machine.
One of NVidia’s useful examples is oclDeviceQuery. This interrogates the OpenCL environment and prints out data about the hardware etc. A useful tool. Will it run with the AMD SDK? Ha ha!
The OpenCL specification describes what needs to be done and the NVidia example follows this quite closely, but it uses various other functions. So, I hacked it! My portable version is here. This compiles and runs on both boxes (and more).
On the NVidia box, the program finds the Tesla C870 and the GeForce 9400.
On the AMD box, it finds one “ATI RV770”.
In neither case are the CPUs found. This led me to some further experiments…
In most of the NVidia and AMD examples, the kernel code is kept in a separate .cl file and streamed in. In Apple’s HelloCL example, the kernel code is stored as a string in the file.
The NVidia examples use a helper function, oclLoadProgSource. The AMD examples use a non-portable stream. Here’s how to do it without using any vendor-specific functions.
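#include <cstdio>    // printf
#include <cstdlib>   // EXIT_FAILURE
#include <fstream>
#include <string>
…
std::ifstream in;
std::string KernelSource, line;
…
in.open("hello.cl");
if (in.is_open())
    // getline strips the newline, so put it back on each line
    while (getline(in, line))
        KernelSource += line + "\n";
else
{
    printf("Error: hello.cl does not exist!\n");
    return EXIT_FAILURE;
}
in.close();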
Note that the end of line markers must be included, otherwise the number of characters in the string must be supplied.
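The same idea can be wrapped in a small reusable function. This is only a sketch: load_kernel_source is a hypothetical name, and it copies the whole file through the stream buffer rather than line by line, so the end-of-line markers are kept without having to add them back.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Reads a whole text file into 'source'.
// Returns false if the file cannot be opened.
// (load_kernel_source is a hypothetical helper name.)
bool load_kernel_source(const std::string& path, std::string& source)
{
    std::ifstream in(path.c_str());
    if (!in.is_open())
        return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();      // copies the file verbatim, newlines included
    source = buffer.str();
    return true;
}
```

The resulting string is null-terminated when accessed through c_str(), so that pointer can be handed to clCreateProgramWithSource with the lengths argument set to NULL.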
Download the source: hello.cpp, hello.cl
(Imagine the post’s title spoken in Leslie Phillips’ drawl. This will probably either amuse you or mean absolutely nothing!)
OK. It’s party time! I have two Linux boxes, both apparently with OpenCL SDKs, each with a different GPGPU:
euler – NVidia Tesla C870
crout – AMD/ATI FireStream 9270
The supplied examples work, but they have vendor-specific bits. Can I get a “pure” OpenCL program to compile and run (without modification) on both machines?
Apple provide an example that sets up the environment and runs a small kernel. I had to make two fixes. First, the include needs to be portable:
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif
instead of simply
#include <OpenCL/opencl.h>
Second, there was a bug in the call to clGetKernelWorkGroupInfo, which Apple fixed yesterday! (I did this a week ago.) (I also changed the extension from .c to .cpp – this is not critical.)
We also need a Makefile. Both NVidia and AMD provide very complex Makefiles for their examples. Here’s my stripped down version. (Change the commented out paths as needed.)
# NVIDIA directories
# OCLROOTDIR = /usr/local/NVIDIA_GPU_Computing_SDK/OpenCL
# INCLUDEDIR = $(OCLROOTDIR)/inc
# LIBDIR = $(OCLROOTDIR)/lib
# ATI directories
OCLROOTDIR = /usr/local/ati-stream-sdk-v2.0-beta4-lnx64
INCLUDEDIR = $(OCLROOTDIR)/include
LIBDIR = $(OCLROOTDIR)/lib/x86_64
CC = g++
CFLAGS = -c -Wall -I$(INCLUDEDIR)
LDFLAGS = -L$(LIBDIR) -lOpenCL
SOURCES = hello.cpp
OBJECTS = $(SOURCES:.cpp=.o)
EXECUTABLE = hello
all: $(SOURCES) $(EXECUTABLE)
# Recipe lines must start with a tab. Libraries go after the objects.
$(EXECUTABLE): $(OBJECTS)
	$(CC) $(OBJECTS) $(LDFLAGS) -o $@
.cpp.o:
	$(CC) $(CFLAGS) $< -o $@
And it works on both machines!
We were donated a FireStream 9270 by the University’s computing service (iSolutions), who had been given it by AMD. This fitted happily into the PCI-e slot of a 4-year-old machine. The 550W power supply is adequate. The existing (NVidia) graphics card was removed. The FireStream has a DVI socket.
The system is running Ubuntu 9.04, 64 bit. The current release of the drivers does not work with Ubuntu 9.10. (I tried!)
The drivers and SDK are available from AMD. The drivers cannot be installed in a live X session, but they can be installed using a remote login (and rebooting).
I put the SDK in /usr/local/ati-stream-sdk-v2.0-beta4-lnx64
If the drivers are installed correctly, there is an AMD logo at the bottom right corner of the console. But remote access is a problem. AMD provide an applications note. This requires some changes for Ubuntu. The first file to be edited is /etc/gdm/gdm.conf-custom – add the lines listed. In /etc/gdm/Init/Default, the line to be added is:
chmod uog+rw /dev/ati/card*
BUT, having done all that, logins using NX or VNC will not give access to the card. The only thing that works is ssh. Even ssh with X tunnelling fails! That’s fine if all you care about is numerical computation. If in doubt, ensure that the environment variable DISPLAY is set to :0.0. (Not localhost:0.0 – even that won’t work.) This can produce some weird results if a graphical app is run – it appears on the main physical display. (I have also tried shadowing the console with NX. It sort of works but it’s not really usable.)
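For a compute-only program, the DISPLAY setting can also be forced from inside the program before any OpenCL call is made. This is a sketch, assuming the behaviour described above for this driver release; use_console_display is a hypothetical helper, and setenv is a POSIX call rather than standard C++.

```cpp
#include <cstdlib>
#include <cstring>
#include <stdlib.h>   // setenv (POSIX, not ISO C++)

// Forces DISPLAY to ":0.0" if it is unset or set to anything else.
// Returns true if the variable was changed, false if it was already
// ":0.0" or could not be set.
// (use_console_display is a hypothetical helper name.)
bool use_console_display()
{
    const char* current = std::getenv("DISPLAY");
    if (current != NULL && std::strcmp(current, ":0.0") == 0)
        return false;                          // already pointing at the console
    return setenv("DISPLAY", ":0.0", 1) == 0;  // 1 = overwrite existing value
}
```

Calling this at the top of main() means an ssh session with X tunnelling, which sets DISPLAY to something like localhost:10.0, would still reach the card. Any graphical output would then go to the physical console, as noted above.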
One other useful tip. Create a file, /etc/ld.so.conf.d/opencl.conf with the line
/usr/local/ati-stream-sdk-v2.0-beta4-lnx64/lib/x86_64
(or whatever is appropriate). Run sudo ldconfig (or reboot). This makes the dynamic library available to any application – there’s no need to set LD_LIBRARY_PATH.
The major problem with the Tesla cards is power. The second problem is finding a motherboard with suitable slots. We settled on taking an old PC (that had a fault on its motherboard) and reusing the case, disk etc. We used an Earthwatts EA-750 supply. The Tesla C870 needs two six-wire power connections – like an HDD. See picture.
The board has an NVidia graphics chip set. This has some interesting implications…
The operating system is Ubuntu 9.10, 64 bit. We have another Tesla card in a Windows system. The disadvantage of Ubuntu over other Linux distros is that there is little choice about versions of associated packages. I wanted gcc 4.4 because that includes OpenMP 3.0. Ubuntu 9.04 has gcc 4.3, which would have required a separate gcc build. The downside of this choice is that the NVidia CUDA examples don’t compile with gcc 4.4.
The OpenCL drivers and SDK (Software Development Kit) were downloaded from NVidia’s site. (Once you have registered, bookmark the page to avoid re-registering!)
To install the drivers, the X Window system needs to be off. So this requires a “safe” boot from the console.
I copied the SDK to /usr/local/NVIDIA_GPU_Computing_SDK
I put freenx on the system. It is possible to run OpenCL and CUDA examples remotely with freenx and with vnc (and with ssh login), but obviously(?) the graphics acceleration isn’t there.
Last year I bought an NVidia Tesla C870 card. This is a power-hungry beast and it was a little while before I got the whole thing working. Recently, we were given an AMD/ATI FireStream card. I also have a Mac Pro. In subsequent posts, I’ll describe these systems.
I’m going to use this to record my experiences with NVidia and ATI GPGPUs, my attempts at using OpenCL, OpenMP and MPI and connecting the whole lot as a heterogeneous, desktop HPC resource.
Mark Zwolinski