September 2014

This week I started my preparations for one of my talks in Madrid. The topic is: "Joining 1 million tables". Actually 1 million tables is quite a lot and I am not sure if there is anybody out there who has already tried to do something similar. Basically the idea is to join 1 million tables containing just a single integer column (and no data). My idea was to come up with a join like this:

[hs@jacqueline 1million]$ ./make_join.pl 4
timing 
SELECT 1 
FROM   t_tab_1, 
       t_tab_2, 
       t_tab_3, 
       t_tab_4 
WHERE  t_tab_1.id = t_tab_2.id AND 
       t_tab_2.id = t_tab_3.id AND 
       t_tab_3.id = t_tab_4.id AND 
       1 = 1;

[hs@jacqueline 1million]$ ./make_join.pl 4

timing

SELECT 1

FROM t_tab_1,

t_tab_2,

t_tab_3,

t_tab_4

WHERE t_tab_1.id = t_tab_2.id AND

t_tab_2.id = t_tab_3.id AND

t_tab_3.id = t_tab_4.id AND

1 = 1;

To give you an impression of the size of the problem I am trying to solve - here is some data on the SQL:

[hs@jacqueline 1million]$ ./make_join.pl 1000000 | wc
2000007 5000024 54666838

1 2	[hs@jacqueline 1million]$ ./make_join.pl 1000000 \| wc 2000007 5000024 54666838

The statement itself is 54 MB in size and is around 2 million lines long.

First steps

The first observation is that the standard planner doing exhaustive search clearly cannot do it. The potential number of joins simply sky rockets - even ways below a thousand tables the system won't respond anymore because planning simply takes too long.

So, I tried to focus my tests on genetic query optimization, which proved to be a lot better for this little competition. Joining a couple hundreds of tables with Geqo is actually possible straight away without having to change any parameters at all. This was pretty impressive actually, as in real life you would never attempt to do this kind of operation. Given some runtime estimations it seems that gradually increasing the number of tables would mean that 1 million tables can actually be joined in 1575 days. The thing is just: I don't have time to run this for 5 years as I got to produce my slides for Madrid fairly soon ;).

Finally...

Geqo is highly configurable, so some parameters can be changed to speed up things by tweaking the generic optimization process and making it a bit more lazy. I managed to cut runtime to roughly one year. To speed this up it seems that I got to inspect the backend process and see what is going on ... stay tuned 😉

More recent blog posts about joins can be found here in our join-related blog spot.

Writing a complex database server like PostgreSQL is not an easy task. Especially memory management is an important task, which needs special attention. Internally PostgreSQL makes use of so called “memory contexts”. The idea of a memory context is to organize memory in groups, which are organized hierarchically. The main advantage is that in case of an error, all relevant memory can be freed at once.

Understanding PostgreSQL memory contexts can be useful to solve a bunch of interesting support cases. Here is an example: Recently we have stumbled across a problem. A database server was constantly running out of memory and was finally killed by the OOM killer over and over again. Backend processes were constantly increasing memory consumption for non-obvious reasons. How can a problem like this be approached?

GDB comes to the rescue

GDB can come to the rescue and solve the riddle of memory consumption nicely. The basic procedure works as follows:

Create a core dump of the process in question
Come up with a GDB macro to debug memory
Run the macro

The first part is actually quite simple. To extract a core dump of a running process we have to find out the process ID first:

[hs@jacqueline debug]$ ps ax | grep post
1987 pts/1 S 0:00 /usr/local/pg94/bin/postgres -D /tmp/db94
1989 ? Ss 0:00 postgres: checkpointer process
1990 ? Ss 0:00 postgres: writer process
1991 ? Ss 0:00 postgres: wal writer process
1992 ? Ss 0:00 postgres: autovacuum launcher process
1993 ? Ss 0:00 postgres: stats collector process
1999 ? Ss 0:00 postgres: hs test [local] idle
2004 pts/1 S+ 0:00 grep post

[hs@jacqueline debug]$ ps ax | grep post

1987 pts/1 S 0:00 /usr/local/pg94/bin/postgres -D /tmp/db94

1989 ? Ss 0:00 postgres: checkpointer process

1990 ? Ss 0:00 postgres: writer process

1991 ? Ss 0:00 postgres: wal writer process

1992 ? Ss 0:00 postgres: autovacuum launcher process

1993 ? Ss 0:00 postgres: stats collector process

1999 ? Ss 0:00 postgres: hs test [local] idle

2004 pts/1 S+ 0:00 grep post

In this example the process ID of the process we want to inspect is 1999 (a simple, idle local backend).
Then it is time to create the core file. gcore can do exactly that for you:

[hs@jacqueline debug]$ gcore 1999
[Thread debugging using libthread_db enabled]
0x00000033e6ee98c2 in recv () from /lib64/libc.so.6
Saved corefile core.1999

[hs@jacqueline debug]$ gcore 1999

[Thread debugging using libthread_db enabled]

0x00000033e6ee98c2 in recv () from /lib64/libc.so.6

Saved corefile core.1999

The beauty here is that gcore is just a simple shell script calling some gdb magic internally. The result will be a core file we can then make use of:

[hs@jacqueline debug]$ ls -l
total 251220
-rw-rw-r-- 1 hs hs 257244232 Sep 2 09:23 core.1999

[hs@jacqueline debug]$ ls -l

total 251220

-rw-rw-r-- 1 hs hs 257244232 Sep 2 09:23 core.1999

The harder part

Then comes the harder part: Writing the gdb macro to debug those memory contexts. gdb has a scripting language to handle that. Here is the code:

[hs@jacqueline debug]$ cat /tmp/debug/pg_gdb_marcos
define sum_context_blocks
set $context = $arg0
set $block = ((AllocSet) $context)->blocks
set $size = 0
while ($block)
set $size = $size + (((AllocBlock) $block)->endptr - ((char *) $block))
set $block = ((AllocBlock) $block)->next
end
printf "%s: %dn",((MemoryContext)$context)->name, $size
end

define walk_contexts
set $parent_$arg0 = ($arg1)
set $indent_$arg0 = ($arg0)
set $i_$arg0 = $indent_$arg0
while ($i_$arg0)
printf " "
set $i_$arg0 = $i_$arg0 - 1
end
sum_context_blocks $parent_$arg0
set $child_$arg0 = ((MemoryContext) $parent_$arg0)->firstchild
set $indent_$arg0 = $indent_$arg0 + 1
while ($child_$arg0)
walk_contexts $indent_$arg0 $child_$arg0
set $child_$arg0 = ((MemoryContext) $child_$arg0)->nextchild
end
end

walk_contexts 0 TopMemoryContext

[hs@jacqueline debug]$ cat /tmp/debug/pg_gdb_marcos

define sum_context_blocks

set $context = $arg0

set $block = ((AllocSet) $context)->blocks

set $size = 0

while ($block)

set $size = $size + (((AllocBlock) $block)->endptr - ((char *) $block))

set $block = ((AllocBlock) $block)->next

end

printf "%s: %dn",((MemoryContext)$context)->name, $size

end

define walk_contexts

set $parent_$arg0 = ($arg1)

set $indent_$arg0 = ($arg0)

set $i_$arg0 = $indent_$arg0

while ($i_$arg0)

printf " "

set $i_$arg0 = $i_$arg0 - 1

end

sum_context_blocks $parent_$arg0

set $child_$arg0 = ((MemoryContext) $parent_$arg0)->firstchild

set $indent_$arg0 = $indent_$arg0 + 1

while ($child_$arg0)

walk_contexts $indent_$arg0 $child_$arg0

set $child_$arg0 = ((MemoryContext) $child_$arg0)->nextchild

end

walk_contexts 0 TopMemoryContext

The last line in the file is the actual call executing the code just written. walk_contexts will go through those memory contexts starting at the TopMemoryContext.
To run the script the following line will be useful. The script can simply be piped into gdb. The result will list information about memory consumption:

[hs@jacqueline debug]$ gdb -c ./core.1999 /usr/local/pg94/bin/postgres < pg_gdb_marcos 
*snip* 
Loaded symbols for /lib64/libnss_files.so.2 
Core was generated by `postgres: hs test [local] idle '. 
#0 0x00000033e6ee98c2 in recv () from /lib64/libc.so.6 
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.2.x86_64 
(gdb) >>>> > > >>>(gdb) (gdb) >>>> > > >>>>> > > >>(gdb) (gdb)
TopMemoryContext: 69936
MessageContext: 8192
Operator class cache: 8192
smgr relation table: 24576
TransactionAbortContext: 32768
Portal hash: 8192
PortalMemory: 0
Relcache by OID: 24576
CacheMemoryContext: 516096
pg_db_role_setting_databaseid_rol_index: 1024
pg_user_mapping_user_server_index: 1024
pg_user_mapping_oid_index: 1024

*snip*

pg_class_oid_index: 1024
MdSmgr: 8192
ident parser context: 0
hba parser context: 3072
LOCALLOCK hash: 8192
Timezones: 83472
ErrorContext: 8192
(gdb) TopMemoryContext: 69936

[hs@jacqueline debug]$ gdb -c ./core.1999 /usr/local/pg94/bin/postgres < pg_gdb_marcos

*snip*

Loaded symbols for /lib64/libnss_files.so.2

Core was generated by `postgres: hs test [local] idle '.

#0 0x00000033e6ee98c2 in recv () from /lib64/libc.so.6

Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.2.x86_64

(gdb) >>>> > > >>>(gdb) (gdb) >>>> > > >>>>> > > >>(gdb) (gdb)

TopMemoryContext: 69936

MessageContext: 8192

Operator class cache: 8192

smgr relation table: 24576

TransactionAbortContext: 32768

Portal hash: 8192

PortalMemory: 0

Relcache by OID: 24576

CacheMemoryContext: 516096

pg_db_role_setting_databaseid_rol_index: 1024

pg_user_mapping_user_server_index: 1024

pg_user_mapping_oid_index: 1024

*snip*

pg_class_oid_index: 1024

MdSmgr: 8192

ident parser context: 0

hba parser context: 3072

LOCALLOCK hash: 8192

Timezones: 83472

ErrorContext: 8192

(gdb) TopMemoryContext: 69936

The output is actually quite long so I decided to remove a couple of lines. What you see here is how memory contexts are organized and how much memory is in each memory context.
If you happen to see any context which uses insane amounts of memory, it will definitely bring you one step closer to finding the root cause of a memory related problem.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.