embedded SQL taken literally
© Laurenz Albe 2020

 

Embedded SQL is by no means a new feature — in fact it is so old-fashioned that many people may not know about it at all. Still, it has lots of advantages for client code written in C. So I’d like to give a brief introduction and talk about its benefits and problems.

What is embedded SQL?

Typically you use a database from application code by calling the API functions or methods of a library. With embedded SQL, you put plain SQL code (decorated with “EXEC SQL”) smack in the middle of your program source. To turn that into correct syntax, you have to run a pre-processor on the code that converts the SQL statements into API function calls. Only then you can compile and run the program.

Embedded SQL is mostly used with old-fashioned compiled languages like C, Fortran, Pascal, PL/I or COBOL, but with SQLJ there is also a Java implementation. One reason for its wide adoption (at least in the past) is that it is specified by the SQL standard ISO/IEC 9075-2 (SQL/Foundation). This enables you to write fairly portable applications.

To be able to discuss the features in some more detail, I’ll introduce a sample C program using embedded SQL.

Sample embedded SQL program

The sample program operates on a database table defined like this:

CREATE TABLE atable(
   key integer PRIMARY KEY,
   value character varying(20)
);

The program is in a file sample.pgc and looks like this:

#include <stdlib.h>
#include <stdio.h>

/* error handlers for the whole program */
EXEC SQL WHENEVER SQLERROR CALL die();
EXEC SQL WHENEVER NOT FOUND DO BREAK;

static void die(void)
{
	/* avoid recursion on error */
	EXEC SQL WHENEVER SQLERROR CONTINUE;

	fprintf(
		stderr,
		"database error %s:\n%s\n",
		sqlca.sqlstate,
		sqlca.sqlerrm.sqlerrmc
	);

	EXEC SQL ROLLBACK;
	EXEC SQL DISCONNECT;

	exit(1);

	/* restore the original handler */
	EXEC SQL WHENEVER SQLERROR CALL die();
}

int main(int argc, char **argv)
{
	EXEC SQL BEGIN DECLARE SECTION;
	int v_key, v_val_ind;
	char v_val[81];
	EXEC SQL END DECLARE SECTION;

	EXEC SQL DECLARE c CURSOR FOR
		SELECT key, value
		FROM atable
		ORDER BY key;

	/* connect to the database */
	EXEC SQL CONNECT TO
		tcp:postgresql://localhost:5432/test?application_name=embedded
		USER laurenz;

	/* open a cursor */
	EXEC SQL OPEN c;

	/* loop will be left if the cursor is done */
	for(;;)
	{
		/* get the next result row */
		EXEC SQL FETCH NEXT FROM c INTO :v_key, :v_val :v_val_ind;

		printf(
			"key = %d, value = %s\n",
			v_key,
			v_val_ind ? "(null)" : v_val
		);
	}

	EXEC SQL CLOSE c;
	EXEC SQL COMMIT;
	EXEC SQL DISCONNECT;

	return 0;
}

Embedded SQL syntax explained

Each SQL statement or declaration starts with the magic words EXEC SQL and ends with a semicolon.

Most statements are translated by the preprocessor right where they are.

Connection and disconnection

EXEC SQL CONNECT TO
	tcp:postgresql://localhost:5432/test?application_name=embedded
	USER laurenz;

There are several ways to specify a connect string to the database, and of course the value does not have to be hard-coded. You can also use CONNECT TO DEFAULT and use libpq environment variables and a password file to connect.

It is possible to open several database connections at once; then you have to name the connections. Connections are like global variables: they are available everywhere once opened, and you don’t have to pass them to functions.

Host variables

Special “host variables” exchange data between the program and the SQL statements.

They have to be declared in the declare section:

EXEC SQL BEGIN DECLARE SECTION;
int v_key, v_val_ind;
char v_val[81];
EXEC SQL END DECLARE SECTION;

This is just like any other C variable declaration, but only these variables can be used in SQL statements. They have to be prepended with a colon (:) in SQL statements:

EXEC SQL FETCH NEXT FROM c INTO :v_key, :v_val :v_val_ind;

The last variable, v_val_ind, is an indicator variable: it is set to a negative number to indicate a NULL value.

One advantage of embedded SQL is that the conversion between PostgreSQL data types and C data types is done automatically. With libpq, you can either get strings or the internal PostgreSQL representation. There is the additional (non-standard) libpgtypes library that provides convenient support for data types like `timestamp` or `numeric`.

Note that v_val was declared as char[81] so that it can contain any 20 UTF-8 characters. If you cannot predict the size of the result, you can use descriptor areas to exchange data.

Exception handling

If SQL statements cause an error or warning or return no data, you define the program behavior with WHENEVER declarations:

EXEC SQL WHENEVER SQLERROR CALL die();
EXEC SQL WHENEVER NOT FOUND DO BREAK;

These slightly diverge from the SQL standard, which only has CONTINUE and GO TO label actions. Also, the standard uses SQLEXCEPTION instead of SQLERROR. The DO BREAK action inserts a break; statement to break out of the containing loop.

Different from other embedded SQL statements, these directives apply to all embedded SQL statements below them in the source file. They define how the pre-processor translates SQL statements and are independent of the control flow in the program.

To avoid recursion, it is best to set the action to CONTINUE (ignore) in the exception handler.

Note that this syntax allows to write code without adding error handling to every database call, which is convenient and makes the code concise and readable.

Interplay of embedded SQL with libpq

Some advanced or PostgreSQL-specific features of libpq are not available in embedded SQL, for example the Large Object API or, more importantly, COPY.

To use these, you can call the function ECPGget_PGconn(const char *connection_name), which returns the underlying libpq connection object.

Building the executable

The pre-processor is called ecpg and part of the PostgreSQL core distribution. By default, ecpg assumes that source files have the extension .pgc.

To compile the resulting C program, you can use any C compiler, and you have to link with the libecpg library.

Here is a Makefile that can be used to build the sample program with PostgreSQL v12 on RedHat-like systems:

CFLAGS ::= $(CFLAGS) -I/usr/pgsql-12/include -g -Wall
LDFLAGS ::= $(LDFLAGS) -L/usr/pgsql-12/lib -Wl,-rpath,/usr/pgsql-12/lib
LDLIBS ::= $(LDLIBS) -lecpg

PROGRAMS = sample

.PHONY: all clean

%.c: %.pgc
	ecpg $<

all: $(PROGRAMS)

clean:
	rm -f $(PROGRAMS) $(PROGRAMS:%=%.c) $(PROGRAMS:%=%.o)

Conclusion

We have only scratched the surface, but I hope I could demonstrate that embedded SQL is a convenient way of writing database client code in C.

Here is a list of advantages and disadvantages of embedded SQL compared to directly using the C API of libpq:

Disadvantages:

  • requires preprocessing
  • unknown to most present-day developers
  • less powerful than libpq (e.g., no direct COPY support)

Advantages:

  • standardized API for good portability between databases
  • more concise and readable code
  • simple but elegant exception handling
  • translation between C and SQL data types