When cleaning up some old paperwork this weekend I stumbled over a very old tutorial on regular expressions. In fact, I received this little handout during a UNIX course I attended voluntarily during my first year at university. It seems that those two days really changed my life – the price tag: 100 Austrian Shillings which translates to something like 7 Euros in today’s money.
While looking over this old thing, I noticed a nice example showing how to test regular expression support in grep. Over the years I had almost forgotten this little test. Here is the idea: There is no single, unique way to transliterate the name of Libya’s former dictator. According to this example there are around 30 ways to do it:
test=# CREATE TABLE t_code (name text); CREATE TABLE test=# Copy t_code FROM stdin; Gadaffi Gadafi Gadafy Gaddafi Gaddafy Gaddhafi Gadhafi Gathafi Ghadaffi Ghadafi Ghaddafi Ghaddafy Gheddafi Kadaffi Kadafi Kaddafi Kadhafi Kazzafi Khadaffy Khadafy Khaddafi Qadafi Qaddafi Qadhafi Qadhdhafi Qadthafi Qathafi Quathafi Qudhafi Kad'afi
Of course I couldn’t resist trying it in PostgreSQL to see if things work just like 18 years ago.
Regular Expressions in PostgreSQL
In PostgreSQL there are a handful of operators to work with regular expressions:
~ Matches regular expression, case-sensitive
~* Matches regular expression, case-insensitive
!~ Does not match regular expression, case-sensitive
!~* Does not match regular expression, case-insensitive
In our case, we have a case-sensitive regular expression, so that ~ operator should work. To make sure we get the right result, we will add one more row to the data set:
test=# INSERT INTO t_code VALUES ('hans'); INSERT 0 1
Let’s give it a try now:
test=# SELECT name, name ~ '(Kh?|Gh?|Qu?)[aeu](d[''dt]?|t|zz|dhd)h?aff?[iy]' FROM t_code; name | ?column? -----------+---------- Gadaffi | t Gadafi | t ... Qudhafi | t Kad'afi | t hans | f (31 rows)
WOW 🙂 Things work just like 20 years ago – without any changes, without a single problem. I guess that can be called a “safe investment”. Coding which still works after 20+ years without any changes can be considered good code!
Read more about coding topics in this blog about case-insensitive pattern matching by Laurenz Albe.