PostgreSQL is a powerful database which supports partitioning. In contrast to Oracle, partitioning is part of the PostgreSQL core engine and does not require any additional licensing or extensions. If you are migrating from Oracle to PostgreSQL, this is a major advantage.

However, just like any other technology, partitioning is not without risk. While it might help to handle large quantities of data efficiently, it can also have downsides that have to be taken into consideration when using this powerful feature.

Querying data without partitions

The most basic example involves a table consisting of two columns:

test=# CREATE TABLE t_simple (
		id 		serial, 
		val 		int 		DEFAULT random()*100000
);
CREATE TABLE
test=# INSERT INTO t_simple 
		SELECT FROM generate_series(1, 10000000);
INSERT 0 10000000
test=# CREATE INDEX ON t_simple (val);
CREATE INDEX
test=# VACUUM ANALYZE;
VACUUM

In this case we have created a table and loaded 10 million rows. The column we want to query is indexed. When looking for a handful of values, we end up with a single bitmap index scan on one table:

test=# explain (analyze, buffers, costs) 
	SELECT 	* 
	FROM 		t_simple 
	WHERE 	val = 454650;
                           QUERY PLAN                                                        
--------------------------------------------------------------
 Bitmap Heap Scan on t_simple  
	(cost=5.20..388.39 rows=99 width=8) 
	(actual time=0.010..0.011 rows=0 loops=1)
   Recheck Cond: (val = 454650)
   Buffers: shared hit=3
   ->  Bitmap Index Scan on t_simple_val_idx  
		(cost=0.00..5.18 rows=99 width=0) 
		(actual time=0.009..0.010 rows=0 loops=1)
         Index Cond: (val = 454650)
         Buffers: shared hit=3
 Planning Time: 0.066 ms
 Execution Time: 0.028 ms
(8 rows)

This is not really a surprise. What is noteworthy here is that the query is executed in a fraction of a millisecond. PostgreSQL can execute thousands of such queries per second per CPU core. On a large server we can easily reach more than 1 million queries per second.
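
If you want to verify this kind of throughput on your own hardware, one simple approach is to put the query into a file and replay it with pgbench, e.g. pgbench -n -f lookup.sql -c 8 -j 8 -T 30 test (the file name and run parameters here are just an example, not a recommendation):

-- lookup.sql: the single-row lookup replayed by pgbench
SELECT * FROM t_simple WHERE val = 454650;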

Using PostgreSQL partitioning to store data

Let’s deploy the same data using partitions. To prove our point, we’ll use hash partitioning:

test=# CREATE TABLE t_part (
		id 		serial, 
		val 		int 	DEFAULT random()*100000) 
	 				PARTITION BY HASH (id);
CREATE TABLE

After deploying the parent table, we can create the partitions. For the sake of simplicity, I have created a set of only 8 partitions, which is enough to make the point. The effect you are going to see is even larger if the number of partitions increases:

CREATE TABLE t_part_1 
	PARTITION OF t_part 
	FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE t_part_2 
	PARTITION OF t_part 
	FOR VALUES WITH (MODULUS 8, REMAINDER 1);
CREATE TABLE t_part_3 
	PARTITION OF t_part 
	FOR VALUES WITH (MODULUS 8, REMAINDER 2);
CREATE TABLE t_part_4 
	PARTITION OF t_part 
	FOR VALUES WITH (MODULUS 8, REMAINDER 3);
CREATE TABLE t_part_5 
	PARTITION OF t_part 
	FOR VALUES WITH (MODULUS 8, REMAINDER 4);
CREATE TABLE t_part_6 
	PARTITION OF t_part 
	FOR VALUES WITH (MODULUS 8, REMAINDER 5);
CREATE TABLE t_part_7 
	PARTITION OF t_part 
	FOR VALUES WITH (MODULUS 8, REMAINDER 6);
CREATE TABLE t_part_8 
	PARTITION OF t_part 
	FOR VALUES WITH (MODULUS 8, REMAINDER 7);
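
If you want to repeat the experiment with a much larger number of partitions, you don't have to type every statement by hand. The following sketch uses a separate parent table (t_part_many is a made-up name) and generates 32 hash partitions in a DO block; the existing t_part keeps its 8 partitions:

CREATE TABLE t_part_many (
	id	serial,
	val	int	DEFAULT random()*100000)
		PARTITION BY HASH (id);

DO $$
BEGIN
	-- create 32 hash partitions; adjust the modulus and loop range as needed
	FOR i IN 0..31 LOOP
		EXECUTE format(
			'CREATE TABLE t_part_many_%s PARTITION OF t_part_many
				FOR VALUES WITH (MODULUS 32, REMAINDER %s)',
			i + 1, i);
	END LOOP;
END;
$$;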

Once the data structure has been created, we can load the same data we loaded before:

test=# INSERT INTO t_part 
		SELECT FROM generate_series(1, 10000000);
INSERT 0 10000000

Again, an index on val is created to make the comparison fair:

test=# CREATE INDEX ON t_part (val);
CREATE INDEX
test=# VACUUM ANALYZE;
VACUUM

Running the same query as before will result in a far more complex execution plan, which already points to the root cause we’re going to discuss:

test=# explain (analyze, buffers, costs) 
		SELECT * FROM t_part WHERE val = 454650;
                         QUERY PLAN                                                           
------------------------------------------------------
 Append  (cost=4.53..438.92 rows=104 width=8) 
	(actual time=0.141..0.145 rows=0 loops=1)
   Buffers: shared read=24
   ->  Bitmap Heap Scan on t_part_1  
		(cost=4.53..54.80 rows=13 width=8) 
		(actual time=0.024..0.025 rows=0 loops=1)
         Recheck Cond: (val = 454650)
         Buffers: shared read=3
         ->  Bitmap Index Scan on t_part_1_val_idx  
			(cost=0.00..4.53 rows=13 width=0) 
			(actual time=0.023..0.023 rows=0 loops=1)
               Index Cond: (val = 454650)
               Buffers: shared read=3
   ->  Bitmap Heap Scan on t_part_2  		
		(cost=4.53..54.80 rows=13 width=8) 
		(actual time=0.018..0.019 rows=0 loops=1)
         Recheck Cond: (val = 454650)
         Buffers: shared read=3
         ->  Bitmap Index Scan on t_part_2_val_idx  
			(cost=0.00..4.53 rows=13 width=0) 
			(actual time=0.018..0.018 rows=0 loops=1)
               Index Cond: (val = 454650)
               Buffers: shared read=3
   ->  Bitmap Heap Scan on t_part_3  
		(cost=4.53..54.80 rows=13 width=8) 
		(actual time=0.014..0.015 rows=0 loops=1)
         Recheck Cond: (val = 454650)
         Buffers: shared read=3
         ->  Bitmap Index Scan on t_part_3_val_idx  
			(cost=0.00..4.53 rows=13 width=0) 
			(actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: (val = 454650)
               Buffers: shared read=3
   ->  Bitmap Heap Scan on t_part_4  
		(cost=4.53..54.80 rows=13 width=8) 
		(actual time=0.014..0.014 rows=0 loops=1)
         Recheck Cond: (val = 454650)
         Buffers: shared read=3
         ->  Bitmap Index Scan on t_part_4_val_idx  
			(cost=0.00..4.53 rows=13 width=0) 
			(actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: (val = 454650)
               Buffers: shared read=3
   ->  Bitmap Heap Scan on t_part_5  
		(cost=4.53..54.80 rows=13 width=8) 
		(actual time=0.016..0.017 rows=0 loops=1)
         Recheck Cond: (val = 454650)
         Buffers: shared read=3
         ->  Bitmap Index Scan on t_part_5_val_idx  
			(cost=0.00..4.53 rows=13 width=0) 
			(actual time=0.016..0.016 rows=0 loops=1)
               Index Cond: (val = 454650)
               Buffers: shared read=3
   ->  Bitmap Heap Scan on t_part_6  
		(cost=4.53..54.80 rows=13 width=8) 
		(actual time=0.018..0.018 rows=0 loops=1)
         Recheck Cond: (val = 454650)
         Buffers: shared read=3
         ->  Bitmap Index Scan on t_part_6_val_idx  
			(cost=0.00..4.53 rows=13 width=0) 
			(actual time=0.017..0.017 rows=0 loops=1)
               Index Cond: (val = 454650)
               Buffers: shared read=3
   ->  Bitmap Heap Scan on t_part_7  
		(cost=4.53..54.80 rows=13 width=8) 
		(actual time=0.017..0.017 rows=0 loops=1)
         Recheck Cond: (val = 454650)
         Buffers: shared read=3
         ->  Bitmap Index Scan on t_part_7_val_idx  
			(cost=0.00..4.53 rows=13 width=0) 
			(actual time=0.017..0.017 rows=0 loops=1)
               Index Cond: (val = 454650)
               Buffers: shared read=3
   ->  Bitmap Heap Scan on t_part_8  
		(cost=4.53..54.80 rows=13 width=8) 
		(actual time=0.017..0.018 rows=0 loops=1)
         Recheck Cond: (val = 454650)
         Buffers: shared read=3
         ->  Bitmap Index Scan on t_part_8_val_idx  
			(cost=0.00..4.53 rows=13 width=0) 
			(actual time=0.017..0.017 rows=0 loops=1)
               Index Cond: (val = 454650)
               Buffers: shared read=3
 Planning:
   Buffers: shared hit=100
 Planning Time: 0.769 ms
 Execution Time: 0.235 ms
(54 rows)

Wow, what an execution plan. Partitioning has not done us any favors here. The reason is that the query filters on val, while the table is partitioned by id, so the planner cannot prune any partitions and every single one has to be scanned to find the data. This causes a significantly higher runtime, since it is simply more work for the database engine.
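
To see the contrast, filter on the partitioning key instead. Since t_part is hash-partitioned by id, the planner can prune all but one partition for an equality condition on id. There is no index on id in this example, so expect a sequential scan, but only of a single partition; the exact plan and timings will of course differ on your system:

EXPLAIN (analyze, buffers, costs) 
	SELECT * FROM t_part WHERE id = 454650;
-- only one of the eight partitions should show up in the plan,
-- because partition pruning eliminates the other seven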

Conclusion and takeaways

The takeaway is that if the partitioning key is not part of the query, or if many partitions have to be touched in general, runtime will suffer, which is exactly what happened here. It is also important to note that using too many partitions significantly increases the time the planner needs to do its job. Having hundreds of partitions can easily lead to a real disaster.
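
If you are unsure how many partitions a table has accumulated over time, pg_partition_tree() (available since PostgreSQL 12) gives a quick answer. The query below is just a small helper sketch:

-- count the leaf partitions of t_part (PostgreSQL 12 or later)
SELECT count(*) AS leaf_partitions
FROM pg_partition_tree('t_part')
WHERE isleaf;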

Partitioning has many advantages. However, it has to be used cleverly, not blindly. No feature is advantageous in every situation, and partitioning is no exception, so weigh the pros and cons before you rely on it.
