
pgstrom: Checking GPU performance

09.2015

Finally, I got around to taking a more extensive look at pgstrom (a module that lets PostgreSQL make use of GPUs). The whole GPU thing fascinates me, and I was excited to see the first real performance data.

Here is some simple test data:

5 million rows should be enough to get a first impression of what is going on.

Queries can benefit

To make sure that a real difference can be observed, I decided not to use any indexes. This is not very realistic, because in real life performance would suffer horribly without them. However, pgstrom was not designed to speed up index lookups anyway, so this should not be an issue.

The first thing I tried was to filter and group some data on the CPU:

My box (4 GHz AMD) can do that in just under 4 seconds. Note that I am using the standard PostgreSQL storage manager here (no column store or anything similar).

Let us try the same thing on the GPU:

We see a nice improvement here. The speedup is incredible, especially considering that merely reading the data already takes more than a second. Moving the work out to the GPU definitely pays off in this case.

The interesting thing to notice is that the real improvement comes from the GROUP BY clause. A plain filter shows no benefit:

It makes sense that there is no improvement in this case: moving the data to the GPU is simply too expensive compared with the work done there. Remember: GPUs only pay off if the work can be done in parallel and if data arrives fast enough. sqrt is not complex enough to justify the cost of moving data around, and PostgreSQL cannot supply data fast enough.
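To make this trade-off concrete, here is a back-of-the-envelope cost model in Python. It is only a sketch: the per-row costs, the row width, and the transfer bandwidth are made-up assumptions rather than measurements from this test, and the model ignores any overlap between transfer and computation. It merely shows why a cheap per-row expression loses while a more expensive per-row operation can win.

```python
def offload_pays_off(n_rows, bytes_per_row, cpu_ns_per_row,
                     gpu_ns_per_row, transfer_gb_per_s):
    """Return (pays_off, cpu_seconds, gpu_seconds) for a simple offload model.

    All inputs are caller-supplied assumptions; the model only compares
    CPU compute time with host-to-GPU transfer time plus GPU compute time.
    """
    transfer_s = n_rows * bytes_per_row / (transfer_gb_per_s * 1e9)
    cpu_s = n_rows * cpu_ns_per_row * 1e-9
    gpu_s = transfer_s + n_rows * gpu_ns_per_row * 1e-9
    return gpu_s < cpu_s, cpu_s, gpu_s


if __name__ == "__main__":
    # Hypothetical numbers: 5 million rows of 100 bytes over a 6 GB/s link.
    cheap = offload_pays_off(5_000_000, 100, cpu_ns_per_row=5,
                             gpu_ns_per_row=0.5, transfer_gb_per_s=6)
    costly = offload_pays_off(5_000_000, 100, cpu_ns_per_row=400,
                              gpu_ns_per_row=4, transfer_gb_per_s=6)
    print("cheap per-row work:", cheap)
    print("expensive per-row work:", costly)
```

With these assumed numbers, the transfer alone (about 0.08 s) outweighs the 0.025 s of CPU work in the cheap case, while in the expensive case the GPU side stays around 0.1 s against 2 s on the CPU.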

Or queries can be slower

It is important to mention that many queries won't benefit from the GPU at all. In fact, I would expect that the majority of queries in a typical system will not behave differently.

Here is an example of a query that is actually slower with pgstrom:

In this case, the GPU seems to be a loss; at least, no benefit can be expected at this stage.

One word about sorts

According to the main developer of pgstrom, sorting is not yet as good as he wants it to be, so I skipped sorts for now. As sorts are key to many queries, this is one piece of pgstrom functionality I am really looking forward to.

I assume that sorts can benefit greatly from a GPU, because many sort algorithms contain a lot of intrinsic parallelism. The speedup we can expect is hard to predict, but I firmly believe that it can be quite substantial.
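To illustrate what "intrinsic parallelism" means here, below is a Python sketch of bitonic sort, a classic sorting network that is often used on GPUs. It is my choice of example, not necessarily what pgstrom implements. Within each (k, j) stage, every compare-and-swap touches a disjoint pair of elements, so on a GPU each pair could be handled by its own thread; the sketch simply runs them in a loop and, for simplicity, assumes the input length is a power of two.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two using a bitonic network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # distance between compared elements
            # All iterations of this loop are independent of each other:
            # each i is paired with i ^ j, and the pairs never overlap.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or \
                       (not ascending and a[i] < a[partner]):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([5, 1, 4, 2, 8, 7, 3, 6]))
```

The number of stages grows as O(log² n), and each stage is one fully parallel step, which is exactly the kind of structure GPUs handle well.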

Stability

What stunned me is that I did not encounter a single segmentation fault during my tests. I definitely did not expect that. I assumed there would be more loose ends, but things worked as expected most of the time. Given the stage of the project, I am pretty excited. pgstrom certainly feels like the future ...


19 Comments
Ernst-Georg Schmid
8 years ago

> The interesting thing to notice is that the real improvement can be seen because of the GROUP BY clause.

Having done a bit of programming in OpenCL, I'd say the improvement comes from the GROUP BY being implemented as a parallel reduction, a pattern where GPGPUs really shine.
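A rough Python sketch of that pattern for a SUM ... GROUP BY: each chunk of rows (standing in for one GPU thread block) builds its own partial aggregate, and the partials are then merged pairwise in about log2(number of chunks) rounds. The chunking scheme and the function names are illustrative assumptions, not how pgstrom actually implements it.

```python
def partial_sums(rows):
    """Per-chunk step: sum values per group key (one 'thread block')."""
    acc = {}
    for key, value in rows:
        acc[key] = acc.get(key, 0) + value
    return acc


def merge(a, b):
    """Combine two partial aggregates into one."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out


def group_by_sum(rows, n_chunks=4):
    """SUM(value) GROUP BY key via partial aggregation plus tree reduction."""
    chunks = [rows[i::n_chunks] for i in range(n_chunks)]
    partials = [partial_sums(c) for c in chunks]   # independent, parallelizable
    while len(partials) > 1:
        # Each round merges disjoint pairs, so the merges could run in parallel.
        nxt = [merge(partials[i], partials[i + 1])
               for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            nxt.append(partials[-1])               # odd one out carries over
        partials = nxt
    return partials[0] if partials else {}


if __name__ == "__main__":
    rows = [(i % 3, i) for i in range(10)]
    print(group_by_sum(rows))
```

The point of the design is that neither phase needs a global lock: the per-chunk work and each merge round consist only of independent tasks.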

Unfortunately, the choice of CUDA limits the range of supported devices. As you mentioned, transferring data between host and device memory can easily kill performance, so integrated GPGPUs like Intel's Iris Pro often show comparatively good performance: they use the same slow main memory as the CPU, but in return they do not need those transfers. In addition, this allows spreading the work over both the CPU and the GPGPU.
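The idea of spreading work over both processors can be sketched with a little arithmetic: if an integrated GPU needs no host-to-device copies, the rows can be split in proportion to each side's throughput so that both finish at the same time. The throughput figures below are invented for illustration, and the sketch assumes zero transfer cost and perfectly linear scaling.

```python
def split_rows(n_rows, cpu_rows_per_s, gpu_rows_per_s):
    """Split rows between CPU and GPU in proportion to their throughput.

    Assumes no transfer overhead (shared memory) and linear scaling.
    Returns (cpu_rows, gpu_rows, estimated_seconds).
    """
    gpu_share = gpu_rows_per_s / (cpu_rows_per_s + gpu_rows_per_s)
    gpu_rows = round(n_rows * gpu_share)
    cpu_rows = n_rows - gpu_rows
    seconds = max(cpu_rows / cpu_rows_per_s, gpu_rows / gpu_rows_per_s)
    return cpu_rows, gpu_rows, seconds


if __name__ == "__main__":
    # Hypothetical: the CPU handles 1M rows/s, the integrated GPU 4M rows/s.
    print(split_rows(5_000_000, 1_000_000, 4_000_000))
```

In this made-up scenario both sides finish after about 1 s, compared with 5 s on the CPU alone or 1.25 s on the GPU alone.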

Since recent GPGPUs allow multiple concurrent kernels, combining GPGPU parallel processing with columnar data stores also seems very promising. Ah, so many options, so little time...

Hans-Jürgen Schönig
8 years ago

This seems to be the issue here. I am planning to run some more tests with more grouping and so on. It seems grouping is where pgstrom really excels. In addition, I am really looking forward to seeing what sorts can do once they are implemented the way they are planned. We have interesting times ahead.

KaiGai Kohei
8 years ago

> Unfortunately, the choice of CUDA limits the range of supported devices.

The previous version of PG-Strom used OpenCL; however, I went back to CUDA because of driver quality and debugging support.
CUDA has a widespread user base and thus gives us a stable run-time environment. On the other hand, I ran into some strange behavior with *ntel's driver when PG-Strom used the OpenCL implementation. It was a hard time for me...

Hans-Jürgen Schönig
8 years ago
Reply to  KaiGai Kohei

What you have achieved is definitely beyond incredible. I was stunned when I tested things. Not a single segfault ... despite the size of the code.

KaiGai Kohei
8 years ago

If you send me a backtrace of the crash, it may help me fix it.
Also, I merged cumulative bug fixes around the GpuJoin code. If you can, please retry with the latest master branch.

Ernst-Georg Schmid
8 years ago
Reply to  KaiGai Kohei

Oh, this was no criticism. GPGPU acceleration for PostgreSQL is way cool, no matter what API. 🙂

Ernst-Georg Schmid
8 years ago
Reply to  KaiGai Kohei

I just tried pg_strom:

LOG: CUDA Runtime version: 6.5.0
LOG: NVIDIA driver version: 340.76
LOG: GPU0 Quadro K1100M (384 CUDA cores, 705MHz), L2 256KB, RAM 2047MB (128bits, 1400MHz), capability 3.0
LOG: NVRTC - CUDA Runtime Compilation vertion 7.5

But when I try to run a query with pg_strom.enabled = on:

ERROR: failed on cuModuleLoadData (CUDA_ERROR_NO_BINARY_FOR_GPU - no kernel image is available for execution on the device)

Any hints what causes this?

Ernst-Georg Schmid
8 years ago

It seems to be a driver conflict. But on Ubuntu 14.04, the latest official driver is 346, and with it pg_strom does not compile:

src/cuda_control.c:2522:4: error: ‘CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED’ undeclared (first use in this function)
{CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED,
^

src/cuda_control.c:2522:4: note: each undeclared identifier is reported only once for each function it appears in
src/cuda_control.c:2524:4: error: ‘CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED’ undeclared (first use in this function)
{CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED,
^

and so on...

Ernst-Georg Schmid
8 years ago

OK, it works now.

KaiGai Kohei
8 years ago

I guess you are using an unsupported CUDA version (6.5). Please make sure that CUDA 7.0 or later is installed.

LOG: CUDA Runtime version: 6.5.0
LOG: NVIDIA driver version: 340.76

Also, PG-Strom should have a version check here.
Thanks for your feedback. If you can, please file your issues on the project's GitHub issue tracker:
https://github.com/pg-strom/devel/issues
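A version check of the kind mentioned above could be as simple as comparing the runtime version against a minimum. The Python sketch below parses the log line format shown in this thread; the function name and the approach of parsing log text are hypothetical and only illustrate the comparison. A real check inside PG-Strom would presumably obtain the version from the CUDA API instead.

```python
import re

# Minimum runtime version: CUDA 7.0 or later, per the requirement stated in this thread.
MIN_CUDA = (7, 0)


def cuda_version_ok(log_line, minimum=MIN_CUDA):
    """Return True if the 'CUDA Runtime version: X.Y.Z' line meets the minimum."""
    match = re.search(r"CUDA Runtime version:\s*(\d+)\.(\d+)", log_line)
    if match is None:
        raise ValueError("no CUDA runtime version found in: " + log_line)
    found = (int(match.group(1)), int(match.group(2)))
    return found >= minimum  # tuple comparison: major first, then minor


if __name__ == "__main__":
    print(cuda_version_ok("LOG: CUDA Runtime version: 6.5.0"))   # False
```

Comparing (major, minor) tuples avoids the classic mistake of comparing version strings lexically, where "10.0" would sort before "7.0".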

⚡ ⚕ Ayy LOLz LMAO ⚕ ⚡
7 years ago
Reply to  KaiGai Kohei

Could you revisit the OpenCL implementation in the current landscape (2016)?

Maybe things have improved.

Worst case, ignore *ntel and focus on AMD/Nvidia.

KaiGai Kohei
7 years ago

No, what I have to do "first" is provide working, valuable, and stable software for users.
Other things might change; however, I have already built a lot on CUDA.
Switching platforms would delay v1.0. Sorry.

⚡ ⚕ Ayy LOLz LMAO ⚕ ⚡
7 years ago
Reply to  KaiGai Kohei

HIP : C++ Heterogeneous-Compute Interface for Portability
http://gpuopen.com/compute-product/hip-convert-cuda-to-portable-c-code/

OpenCL for AMD
CUDA for NVIDIA
WIN / WIN

From the link: "To further reduce the learning curve when moving from Cuda to HIP, we developed the hipify tool to automate your application’s core conversion."

Maybe for future consideration?

⚡ ⚕ Ayy LOLz LMAO ⚕ ⚡
7 years ago
Reply to  KaiGai Kohei

GPUOpen is being pushed a lot:
http://gpuopen.com/

HIP : C++ Heterogeneous-Compute Interface for Portability:
http://gpuopen.com/compute-product/hip-convert-cuda-to-portable-c-code/

OpenCL for AMD
CUDA for NVIDIA
WIN / WIN

Nikos
7 years ago

Hi! Very interesting article. Is it possible to use PG-Strom with PostgreSQL on Windows 10 Pro? Alternatively, would it be straightforward to rebuild it for Windows, or does it depend on Linux-specific libraries?

Hans-Jürgen Schönig
7 years ago
Reply to  Nikos

We did not dare to run it on Windows.

Nikos
7 years ago

Why not? Is it not possible? I've downloaded PG-Strom and am trying to figure out if and how to build it using Visual Studio or another appropriate toolchain.

Any help or advice would be gratefully accepted! 🙂

Anthony DeMaio
5 years ago

Hi Hans-Jürgen, can you please provide an update to this article with testing of PG-Strom 2.0? I'm interested to know whether the performance has improved.

PG-Strom v2.0 features highlight (from the PG-Strom v2.0 Release Technical Brief, 17-Apr-2018):

Storage Enhancement
- SSD-to-GPU Direct SQL Execution
- In-memory Columnar Cache
- GPU memory store (gstore_fdw)

Advanced SQL Infrastructure
- PostgreSQL v9.6/v10 support – CPU+GPU Hybrid Parallel
- SCAN+JOIN+GROUP BY combined GPU kernel
- Utilization of demand paging of GPU device memory

Miscellaneous
- PL/CUDA related enhancement
- New data type support
- Documentation and Packaging

CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

© 2024 CYBERTEC PostgreSQL International GmbH