Wednesday, 2 April 2008

Nifty Postgresql Features

I was poking around and happened across a postgres feature matrix which brought to my attention the addition of the new (8.2 and 8.3) RETURNING feature of INSERT/UPDATE/DELETE. This is really nice as it eliminates the often needed ugly/wasteful SELECT after one of these operations. It seems particularly useful for inserts into tables with SERIAL primary keys.

Posted by james at 4:03 PM in Postgres

Wednesday, 14 March 2007

PostgreSQL user defined aggregates

PostgreSQL is really nice. I have just been playing around with user defined aggregates and find the nice example:
CREATE AGGREGATE array_accum (
    sfunc = array_append,
    basetype = anyelement,
    stype = anyarray,
    initcond = '{}'
);
Which allows constructions like
pooky=# SELECT * from foo;

 row | col 
-----+-----
 a   |   1
 a   |   2
 a   |   3
 b   |   4
 b   |   5
 c   |   5
 c   |   4
(7 rows)

pooky=# SELECT row, array_accum(col) AS cols FROM (SELECT * FROM foo ORDER BY row,col) AS tmp GROUP BY row;

 row |  cols   
-----+---------
 a   | {1,2,3}
 b   | {4,5}
 c   | {4,5}
(3 rows)

the sorted sub-select is to ensure that we end up with "{4,5}" rather than "{5,4}". Finally, we can use the result

pooky=# SELECT cols, count(row) FROM (SELECT row, array_accum(col) AS cols FROM 
  (SELECT * FROM foo ORDER BY row,col) AS tmp GROUP BY row) as foo group by cols;

  cols   | count 
---------+-------
 {1,2,3} |     1
 {4,5}   |     2
(2 rows)
to perform simple clustering.
Posted by james at 11:33 AM in Postgres

Thursday, 8 March 2007

Dispatch-Merge Database Pattern

I am currently using PostgreSQL to store intrusion detection logging information. For my purposes it is useful to process it on a daily basis but older data is of no use. The difficulty I have is that due to the high volume of data (several million records per day), the delete+vacuum process tends to takes an unworkably long period of time and has a very negative impact upon the logging and analysis process.

This describes a construction that I have found useful in maintaining a running system. The basic idea is to split table T into series of tables and then round-robin dispatch events based on time. In the example, I use two tables T_0 and T_1 and use a trigger to add events destined for T into one of T_0 or T_1 based on the day. In that day that one of the split tables _0 is being written to, I can TRUNCATE the other table thereby avoiding the cost of delete and vacuum. For convenience, I use a VIEW to merge the T_i.

diagram

The following is an implementation PostgreSQL:

-- -- execute these two commands as postgres user
-- CREATE FUNCTION plpgsql_call_handler() RETURNS language_handler AS '\$libdir/plpgsql' LANGUAGE C;
-- CREATE TRUSTED PROCEDURAL LANGUAGE plpgsql HANDLER plpgsql_call_handler;

CREATE SCHEMA events;
SET search_path TO "events";


CREATE SEQUENCE "seq_event";

CREATE TABLE event (
       seqid integer DEFAULT nextval('seq_event') NOT NULL,
       src INET,
       description TEXT,
       time TIMESTAMP,
       PRIMARY KEY (seqid));

-- copy the event database structure to event_0 and event_1
-- NB should actually use create table to keep primary key but
-- want the write-up to remain short
SELECT * into event_0 from event limit 0;
SELECT * into event_1 from event limit 0;

-- a merged view of event_0 and event_1
CREATE VIEW event_merge AS SELECT * FROM event_0 UNION SELECT * from event_1;


-- dispatch function returns NULL so that nothing ever ends up in event
CREATE OR REPLACE FUNCTION dispatch_event () RETURNS TRIGGER AS $$
  DECLARE
     mydest INTEGER;
  BEGIN
    -- doy is day of year so that the destination table alternates day by day
    -- thus one can truncate the table not being inserted into
    mydest := (date_part('doy',NEW.time)::integer)&1;

    IF mydest = 0 THEN
      INSERT INTO event_0 SELECT NEW.*;
    ELSE
      INSERT INTO event_1 SELECT NEW.*;
  END IF;
  RETURN NULL;
  END;
$$ LANGUAGE 'plpgsql';
CREATE TRIGGER dispatch_event BEFORE INSERT ON event FOR EACH ROW EXECUTE PROCEDURE dispatch_event ();

Some data to try it out...


-- some test data
INSERT INTO event ("src","time","description") VALUES ('127.0.0.1'::inet, now(),'event 0');
INSERT INTO event ("src","time","description") VALUES ('127.0.0.1'::inet, now(),'event 1');
INSERT INTO event ("src","time","description") VALUES ('127.0.0.1'::inet, now(),'event 2');
INSERT INTO event ("src","time","description") VALUES ('127.0.0.1'::inet, now(),'event 3');
-- wait 24 hours or...
INSERT INTO event ("src","time","description") VALUES ('127.0.0.1'::inet, now()+interval '24 hours','event 4');
INSERT INTO event ("src","time","description") VALUES ('127.0.0.1'::inet, now()+interval '24 hours','event 5');
INSERT INTO event ("src","time","description") VALUES ('127.0.0.1'::inet, now()+interval '24 hours','event 6');
INSERT INTO event ("src","time","description") VALUES ('127.0.0.1'::inet, now()+interval '24 hours','event 7');
Posted by james at 11:43 AM in Postgres

Monday, 19 February 2007

Postgres schema trick

Every so often I encounter an application that wants to log to a database but is not schema aware. This leads to chaos and name conflicts (if you want several applications writing to the same database).

Postgres permits a solution by using search_path to set default schemas. To create tables one can add an explict set statement to database creation scripts:

  SET search_path TO toto;
  CREATE TABLE bar (...);
  CREATE TABLE foo (...);

And then extend this mapping by altering the seach_path of the database user:

  ALTER USER appuser SET search_path TO toto;
Now the application will automatically write into the schema toto.
Posted by james at 2:34 PM in Postgres