NVIDIA’s CUDA is definitely a great thing, and I have to admit that I got excited years ago when I first learned about it. For many operations, a nicely optimized GPU implementation definitely seems to be the way to go.

GPUs are traditionally used for scientific operations and massively parallel tasks. However, some important work is also going into the PGStrom project, which is all about bringing the power of modern GPUs to PostgreSQL: https://wiki.postgresql.org/wiki/PGStrom

Installing CUDA

At this point, installing CUDA on Linux might be the hardest part of the entire undertaking. The CUDA installer only works nicely when no X server is running. A simple "init 1" should solve this problem, however.

Before you get started with pgstrom, it is usually a good idea to check whether the GPU has been detected properly:

[hs@laura ~]$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
[hs@laura deviceQuery]$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 970"
  CUDA Driver Version / Runtime Version          7.5 / 7.0
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 4095 MBytes (4294246400 bytes)
  (13) Multiprocessors, (128) CUDA Cores/MP:     1664 CUDA Cores
  GPU Max Clock rate:                            1253 MHz (1.25 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 1835008 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.0, NumDevs = 1, Device0 = GeForce GTX 970
Result = PASS

If the test passes, CUDA is ready for PostgreSQL.
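If you want to script this check instead of reading the output by eye, here is a minimal sketch. It only looks for the "Result = PASS" line shown above; the function name cuda_ready is made up for this example, and the path is the one from my session, so it may differ on your machine.

```shell
# Succeed only if the deviceQuery output read from stdin contains
# a line starting with "Result = PASS".
cuda_ready() {
    grep -q '^Result = PASS'
}

# Usage, with the sample path from the session above:
#   /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery | cuda_ready \
#       && echo "CUDA is ready" || echo "CUDA check failed"
```

The function reads from stdin, so it can also be pointed at a saved copy of the output.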

Installing PostgreSQL with CUDA

Installing pg_strom for PostgreSQL is not hard. Here is how it works:

$ git clone https://github.com/postgres/postgres.git pgsql
$ cd pgsql
$ ./configure --enable-debug --enable-cassert
$ make
$ sudo make install

$ git clone https://github.com/pg-strom/devel pg_strom
$ cd pg_strom
$ which pg_config
$ make
$ sudo make install

In my case, I had to uncomment three lines in a pg_strom header file because my version of PostgreSQL was a bit more up to date than expected. However, this is nothing major, just a small fix.

Once pg_strom has been added to shared_preload_libraries, the system is ready for action. In my case, starting the database produces the following output:

[hs@laura ~]$ pg_ctl -D /data/dbstrom/ start
server starting
LOG:  CUDA Runtime version: 7.5.0
LOG:  NVIDIA driver version: 352.30
LOG:  GPU0 GeForce GTX 970 (1664 CUDA cores, 1253MHz), L2 1792KB, RAM 4095MB (256bits, 3505MHz), capability 5.2
LOG:  NVRTC - CUDA Runtime Compilation vertion 7.0
LOG:  database system shutdown was interrupted; last known up at 2015-08-27 21:05:41 CEST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/A54973F8
LOG:  redo is not required
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

The important point here is that the CUDA device has to show up in the LOG messages during PostgreSQL startup; otherwise there is a problem with the driver.
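The two steps described above, adding pg_strom to shared_preload_libraries and then looking for the GPU in the startup log, can be sketched as two small shell helpers. Please treat this as a sketch under assumptions: I am assuming the library is registered under the name 'pg_strom', the helper names are made up, and the log file path in the usage comment is hypothetical.

```shell
# Add pg_strom to shared_preload_libraries in the given postgresql.conf.
# Assumption: the library is loaded under the name 'pg_strom'.
# Note: the last setting in the file wins, so merge by hand if other
# libraries are already listed there.
enable_pg_strom() {
    conf="$1"
    if grep -Eq "^[[:space:]]*shared_preload_libraries[[:space:]]*=.*pg_strom" "$conf"; then
        return 0    # already configured, nothing to do
    fi
    printf "shared_preload_libraries = 'pg_strom'\n" >> "$conf"
}

# Print the GPU lines from a server log file; fails if there are none.
gpu_log_lines() {
    grep -E 'LOG:[[:space:]]+GPU[0-9]+ ' "$1"
}

# Usage (the log file path is hypothetical):
#   enable_pg_strom /data/dbstrom/postgresql.conf
#   pg_ctl -w -D /data/dbstrom/ -l /tmp/dbstrom.log restart
#   gpu_log_lines /tmp/dbstrom.log || echo "no GPU in the log, check the driver"
```

A restart is needed because shared_preload_libraries is only read when the server starts.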

The beauty is that pgstrom automatically uses GPU code when it seems useful. The user does not have to worry about where the code is actually executed. The optimizer will make those decisions for you automatically.
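If you are curious what the optimizer decided for a given query, the execution plan is the place to look. The helper below is a rough sketch and rests on an assumption of mine: that GPU-backed steps appear in EXPLAIN output as custom scan nodes whose name starts with "Gpu". The function name, the database "test" and the table "t_test" are all made up for illustration.

```shell
# Report whether an EXPLAIN plan read from stdin contains a GPU-backed node.
# Assumption: pg_strom labels such nodes "Custom Scan (Gpu...)".
plan_uses_gpu() {
    if grep -q 'Custom Scan (Gpu'; then
        echo "GPU"
    else
        echo "CPU only"
    fi
}

# Usage (hypothetical database "test" and table "t_test"):
#   psql -d test -c "EXPLAIN SELECT count(*) FROM t_test" | plan_uses_gpu
```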

So far, pgstrom seems pretty promising. Of course, it is not ready for production yet, but it is definitely worth investigating further and running tests next week.