Difference between revisions of "PostgreSQL/From MySQL"

Revision as of 22:02, 7 February 2008

The following table shows how to get around some MySQL-specific non SQL-92 syntax.

MySQL	PostgreSQL	Supported by both?
LIMIT offset, count	LIMIT count OFFSET offset	Yes
MATCH (column(s)) AGAINST query	ftsindex @@ query::tsquery	No
BOOLEAN INT INT(n) INTEGER(n) TINYINT TINYINT(n) TINYINTEGER TINYINTEGER(n) MEDIUMINT MEDIUMINT(n) MEDIUMINTEGER MEDIUMINTEGER(n) SMALLINT SMALLINT(n) SMALLINTEGER SMALLINTEGER(n)	INTEGER	Yes⁽¹⁾
BOOLEAN	INTEGER CHECK (column IN ('0', '1'))	No
INTEGER UNSIGNED	INTEGER CHECK (column > -1)	No
INTEGER SIGNED	INTEGER	Yes
INTEGER AUTO_INCREMENT	SERIAL PRIMARY KEY	No⁽²⁾
VARCHAR(n) CHAR(n) LONGTEXT BLOB LONGBLOB	TEXT	Yes⁽³⁾
DATETIME DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00'	TIMESTAMP TIMESTAMP NOT NULL DEFAULT NOW()	No
'0000-00-00 00:00:00'	NOW()	No
CHARACTER SET UTF8 COLLATE UTF8_BIN		No⁽⁴⁾
column SET('value1', 'value2')	column TEXT CHECK (column IN ('value1', 'value2'))	No⁽⁵⁾
<columndef> COMMENT comment	COMMENT ON COLUMN column IS 'comment'	No
[UNIQUE] KEY indexname (column1, column2)	CREATE [UNIQUE] INDEX indexname ON tablename (column1, column2)	Yes

Notes

PostgreSQL has INT2 (SMALLINT), INT4 (INTEGER) and INT8 (BIGINT), but use INTEGER unless you want ones > 2³¹ in which case use BIGINT.
PostgreSQL full text searching is done with tsearch2, now integrated into PostgreSQL 8.3 or in contrib for earlier versions.
The SERIAL type is an auto-incrementing INTEGER sequence (implicitly created at CREATE TABLE time), maintained externally from the table (akin to an index). Autoincrementing integers suck in distributed scenarios, consider a UUID or composite key instead.
PostgreSQL couldn't care less, and stores them all as varying text anyway. Insisting on a length won't make it faster or use less disk space.
Create your database with UTF8 encoding.
As of 8.3 one can now go
CREATE TYPE type AS ENUM ('value1', 'value2')
column type

Useful magic MySQL incantation:

mysqldump fez_svn_empty -v -n --compatible=ansi,postgresql --complete-insert=TRUE --extended-insert=FALSE --compact --default-character-set=UTF8 -r fez_svn_empty.sql

For now, remove all KEY and FULLTEXT KEY declarations from the above sql file

Importing rows from a MySQL dump that uses INTEGER AUTO_INCREMENT for primary keys into a new PostgreSQL database using SERIAL primary keys is problematic. Options include:

reset the PostgreSQL sequences to start from MAX(idcolumn) + 1.
SELECT MAX(idcolumn) + 1 AS startvalue
CREATE SEQUENCE newsequence INCREMENT BY 1 MINVALUE 1 NO MAXVALUE START WITH startvalue CACHE 1 NO CYCLE
ALTER TABLE table ALTER COLUMN idcolumn SET DEFAULT NEXTVAL('newsequence')
edit the dump to remove idcolumn values from the INSERT statements (highly tedious and probably not easily scriptable)

Script To Update Sequences

A quick dirty script to update sequences to start from above the maximum value of the id field. Adjust to requirements.

#!/usr/bin/python
import sys
import string
from pyPgSQL import PgSQL

def main(filename):
    try:
        conn = PgSQL.connect(database='dbname', user='username', host='localhost', password='password')
        cur = conn.cursor()
    except PgSQL.Error, msg:
        print "*** PostgreSQL Connection Error: %s" % msg
        return 1

    try:
        f = open(filename, "r")
    except:
        print "*** File not found: %s" % filename
        return 1

    for line in f:
        if (line.find("CREATE TABLE ") != -1):
            tablename = line.split()[2]
            seqname = ""
        elif (line.find(" SERIAL ") != -1):
            columnname = line.split()[0]
            seqname = "\"%s_%s_seq\"" % (tablename.strip('"'), columnname.strip('"'))

        if seqname != "":
            maxsql = "SELECT MAX(%s)+1 AS seqrestart FROM %s;" % (columnname, tablename)
            cur.execute(maxsql)
            res = cur.fetchall()
            seqrestart = res[0].seqrestart
            if seqrestart > 0:
                seqsql = "ALTER SEQUENCE %s RESTART WITH %s;" % (seqname, seqrestart)
                print seqsql
                cur.execute(seqsql)
            seqname = ""
    
    cur.close()
    conn.commit()
    del cur, conn
    return 0

if __name__ == "__main__":
    sys.exit(main("/path/to/your_schema.sql"))

@@ Line 6: / Line 6: @@
 |-
 | LIMIT ''offset'', ''count''    ||  LIMIT ''count'' OFFSET ''offset''  ||  Yes
+|-
+| MATCH (''column(s)'') AGAINST ''query''    ||  ''ftsindex'' @@ ''query''::tsquery ||  No
 |-
 | BOOLEAN <br /> INT <br /> INT(''n'') <br /> INTEGER(''n'') <br /> TINYINT <br /> TINYINT(''n'') <br /> TINYINTEGER <br /> TINYINTEGER(''n'') <br /> MEDIUMINT <br /> MEDIUMINT(''n'') <br /> MEDIUMINTEGER <br /> MEDIUMINTEGER(''n'') <br /> SMALLINT <br /> SMALLINT(''n'') <br /> SMALLINTEGER <br /> SMALLINTEGER(''n'')   ||  INTEGER  ||  Yes<sup>(1)</sup>
@@ Line 36: / Line 38: @@
 === Notes ===
 # PostgreSQL has INT2 (SMALLINT), INT4 (INTEGER) and INT8 (BIGINT), but use INTEGER unless you want ones > 2<sup>31</sup> in which case use BIGINT.
+# [[PostgreSQL full text searching]] is done with tsearch2, now integrated into PostgreSQL 8.3 or in contrib for earlier versions.
 # The SERIAL type is an auto-incrementing INTEGER sequence (implicitly created at CREATE TABLE time), maintained externally from the table (akin to an index). Autoincrementing integers suck in distributed scenarios, consider a UUID or composite key instead.
 # PostgreSQL couldn't care less, and stores them all as varying text anyway. Insisting on a length won't make it faster or use less disk space.