Difference between revisions of "PostgreSQL/From MySQL"

From Jon's Wiki
 
(10 intermediate revisions by the same user not shown)
Line 1: Line 1:
The following table shows how to get around some MySQL-specific non SQL-92 syntax.
+
The following table shows how to get around some of the weird MySQL-specific, non-standard SQL syntax.
 +
 
 +
;Note
 +
:After a bunch of editing here, I found this [http://en.wikibooks.org/wiki/Programming:Converting_MySQL_to_PostgreSQL "Converting MySQL to PostgreSQL" wikibook].
  
 
{| border=1
 
{| border=1
Line 5: Line 8:
 
|  '''MySQL'''  ||  '''PostgreSQL'''  ||  '''Supported by both?'''
 
|  '''MySQL'''  ||  '''PostgreSQL'''  ||  '''Supported by both?'''
 
|-
 
|-
| LIMIT x, y   ||  LIMIT x OFFSET y ||  Yes
+
| LIMIT ''offset'', ''count''   ||  LIMIT ''count'' OFFSET ''offset'' ||  Yes
 +
|-
 +
| MATCH (''column(s)'') AGAINST ''query''    ||  ''ftsindex'' @@ ''query''::tsquery ||  No<sup>(1)</sup>
 +
|-
 +
| IFNULL(''expr1'', ''expr2'') || CASE WHEN ''expr1'' IS NULL THEN ''expr2'' ELSE ''expr1'' END || Yes
 
|-
 
|-
| BOOLEAN <br /> INT <br /> INT(''n'') <br /> INTEGER(''n'') <br /> TINYINT <br /> TINYINT(''n'') <br /> TINYINTEGER <br /> TINYINTEGER(''n'') <br /> MEDIUMINT <br /> MEDIUMINT(''n'') <br /> MEDIUMINTEGER <br /> MEDIUMINTEGER(''n'') <br /> SMALLINT <br /> SMALLINT(''n'') <br /> SMALLINTEGER <br /> SMALLINTEGER(''n'')  ||  INTEGER  ||  Yes<sup>(1)</sup>
+
| REPLACE INTO ''table'' (''pkcolumn'', ''column2'') <br /> VALUES ('key', 'value2') || INSERT INTO ''table'' (''pkcolumn'', ''column2'') <br /> SELECT 'key', 'value2' FROM ''table'' <br /> WHERE 'key' NOT IN (SELECT ''pkcolumn'' FROM ''table'') LIMIT 1
 +
|| No
 +
|-
 +
| BOOLEAN <br /> INT <br /> INT(''n'') <br /> INTEGER(''n'') <br /> TINYINT <br /> TINYINT(''n'') <br /> TINYINTEGER <br /> TINYINTEGER(''n'') <br /> MEDIUMINT <br /> MEDIUMINT(''n'') <br /> MEDIUMINTEGER <br /> MEDIUMINTEGER(''n'') <br /> SMALLINT <br /> SMALLINT(''n'') <br /> SMALLINTEGER <br /> SMALLINTEGER(''n'')  ||  INTEGER  ||  Yes<sup>(2)</sup>
 
|-
 
|-
 
| BOOLEAN      ||  INTEGER CHECK (''column'' IN ('0', '1')) || No
 
| BOOLEAN      ||  INTEGER CHECK (''column'' IN ('0', '1')) || No
Line 15: Line 25:
 
| INTEGER SIGNED    ||  INTEGER                            || Yes
 
| INTEGER SIGNED    ||  INTEGER                            || Yes
 
|-
 
|-
| INTEGER AUTO_INCREMENT    ||  SERIAL PRIMARY KEY    || No<sup>(2)</sup>
+
| INTEGER AUTO_INCREMENT    ||  SERIAL PRIMARY KEY    || No<sup>(3)</sup>
 
|-
 
|-
| VARCHAR(''n'') <br /> CHAR(''n'') <br /> LONGTEXT <br /> BLOB <br /> LONGBLOB    ||  TEXT    || Yes<sup>(3)</sup>  
+
| VARCHAR(''n'') <br /> CHAR(''n'') <br /> LONGTEXT <br /> BLOB <br /> LONGBLOB    ||  TEXT    || Yes<sup>(4)</sup>  
 
|-
 
|-
 
| DATETIME <br /> DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00'  || TIMESTAMP <br> TIMESTAMP NOT NULL DEFAULT NOW() || No
 
| DATETIME <br /> DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00'  || TIMESTAMP <br> TIMESTAMP NOT NULL DEFAULT NOW() || No
Line 23: Line 33:
 
| '0000-00-00 00:00:00' || NOW() || No
 
| '0000-00-00 00:00:00' || NOW() || No
 
|-
 
|-
| CHARACTER SET UTF8 COLLATE UTF8_BIN ||    || No<sup>(4)</sup>
+
| CHARACTER SET UTF8 COLLATE UTF8_BIN ||    || No<sup>(5)</sup>
 
|-
 
|-
| ''column'' SET('value1', 'value2') || ''column'' TEXT CHECK (''column'' IN ('value1', 'value2')) || No<sup>(5)</sup>
+
| ''column'' SET('value1', 'value2') || ''column'' TEXT CHECK (''column'' IN ('value1', 'value2')) || No<sup>(6)</sup>
 
|-
 
|-
 
| ''<columndef>'' COMMENT ''comment'' || COMMENT ON COLUMN ''column'' IS 'comment'  || No
 
| ''<columndef>'' COMMENT ''comment'' || COMMENT ON COLUMN ''column'' IS 'comment'  || No
Line 35: Line 45:
  
 
=== Notes ===
 
=== Notes ===
 +
# [[PostgreSQL full text searching]] is done with tsearch2, now integrated into PostgreSQL 8.3 or in contrib for earlier versions.
 
# PostgreSQL has INT2 (SMALLINT), INT4 (INTEGER) and INT8 (BIGINT), but use INTEGER unless you want ones > 2<sup>31</sup> in which case use BIGINT.
 
# PostgreSQL has INT2 (SMALLINT), INT4 (INTEGER) and INT8 (BIGINT), but use INTEGER unless you want ones > 2<sup>31</sup> in which case use BIGINT.
 
# The SERIAL type is an auto-incrementing INTEGER sequence (implicitly created at CREATE TABLE time), maintained externally from the table (akin to an index). Autoincrementing integers suck in distributed scenarios, consider a UUID or composite key instead.
 
# The SERIAL type is an auto-incrementing INTEGER sequence (implicitly created at CREATE TABLE time), maintained externally from the table (akin to an index). Autoincrementing integers suck in distributed scenarios, consider a UUID or composite key instead.
Line 45: Line 56:
  
 
For now, remove all KEY and FULLTEXT KEY declarations from the above sql file
 
For now, remove all KEY and FULLTEXT KEY declarations from the above sql file
 +
 +
Importing rows from a MySQL dump that uses INTEGER AUTO_INCREMENT for primary keys into a new PostgreSQL database using SERIAL primary keys is problematic. Options include:
 +
# reset the PostgreSQL sequences to start from MAX(''idcolumn'') + 1. <br /> SELECT MAX(''idcolumn'') + 1 AS ''startvalue'' <br /> ALTER SEQUENCE ''sequence'' RESTART WITH ''startvalue''
 +
# edit the dump to remove ''idcolumn'' values from the INSERT statements (highly tedious and probably not easily scriptable)
 +
 +
== Script To Update Sequences ==
 +
 +
A quick dirty script to update sequences to start from above the maximum value of the id field. Adjust to requirements.
 +
 +
<pre>
 +
#!/usr/bin/python
 +
import sys
 +
import string
 +
from pyPgSQL import PgSQL
 +
 +
def main(filename):
 +
    try:
 +
        conn = PgSQL.connect(database='dbname', user='username', host='localhost', password='password')
 +
        cur = conn.cursor()
 +
    except PgSQL.Error, msg:
 +
        print "*** PostgreSQL Connection Error: %s" % msg
 +
        return 1
 +
 +
    try:
 +
        f = open(filename, "r")
 +
    except:
 +
        print "*** File not found: %s" % filename
 +
        return 1
 +
 +
    for line in f:
 +
        if (line.find("CREATE TABLE ") != -1):
 +
            tablename = line.split()[2]
 +
            seqname = ""
 +
        elif (line.find(" SERIAL ") != -1):
 +
            columnname = line.split()[0]
 +
            seqname = "\"%s_%s_seq\"" % (tablename.strip('"'), columnname.strip('"'))
 +
 +
        if seqname != "":
 +
            maxsql = "SELECT MAX(%s)+1 AS seqrestart FROM %s;" % (columnname, tablename)
 +
            cur.execute(maxsql)
 +
            res = cur.fetchall()
 +
            seqrestart = res[0].seqrestart
 +
            if seqrestart > 0:
 +
                seqsql = "ALTER SEQUENCE %s RESTART WITH %s;" % (seqname, seqrestart)
 +
                print seqsql
 +
                cur.execute(seqsql)
 +
            seqname = ""
 +
   
 +
    cur.close()
 +
    conn.commit()
 +
    del cur, conn
 +
    return 0
 +
 +
if __name__ == "__main__":
 +
    sys.exit(main("/path/to/your_schema.sql"))
 +
</pre>

Latest revision as of 02:25, 24 October 2014

The following table shows how to get around some of the weird MySQL-specific, non-standard SQL syntax.

Note
After a bunch of editing here, I found this "Converting MySQL to PostgreSQL" wikibook.
MySQL PostgreSQL Supported by both?
LIMIT offset, count LIMIT count OFFSET offset Yes
MATCH (column(s)) AGAINST query ftsindex @@ query::tsquery No(1)
IFNULL(expr1, expr2) CASE WHEN expr1 IS NULL THEN expr2 ELSE expr1 END Yes
REPLACE INTO table (pkcolumn, column2)
VALUES ('key', 'value2')
INSERT INTO table (pkcolumn, column2)
SELECT 'key', 'value2' FROM table
WHERE 'key' NOT IN (SELECT pkcolumn FROM table) LIMIT 1
No
BOOLEAN
INT
INT(n)
INTEGER(n)
TINYINT
TINYINT(n)
TINYINTEGER
TINYINTEGER(n)
MEDIUMINT
MEDIUMINT(n)
MEDIUMINTEGER
MEDIUMINTEGER(n)
SMALLINT
SMALLINT(n)
SMALLINTEGER
SMALLINTEGER(n)
INTEGER Yes(2)
BOOLEAN INTEGER CHECK (column IN ('0', '1')) No
INTEGER UNSIGNED INTEGER CHECK (column > -1) No
INTEGER SIGNED INTEGER Yes
INTEGER AUTO_INCREMENT SERIAL PRIMARY KEY No(3)
VARCHAR(n)
CHAR(n)
LONGTEXT
BLOB
LONGBLOB
TEXT Yes(4)
DATETIME
DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00'
TIMESTAMP
TIMESTAMP NOT NULL DEFAULT NOW()
No
'0000-00-00 00:00:00' NOW() No
CHARACTER SET UTF8 COLLATE UTF8_BIN No(5)
column SET('value1', 'value2') column TEXT CHECK (column IN ('value1', 'value2')) No(6)
<columndef> COMMENT comment COMMENT ON COLUMN column IS 'comment' No
[UNIQUE] KEY indexname (column1, column2) CREATE [UNIQUE] INDEX indexname ON tablename (column1, column2) Yes

Notes

  1. PostgreSQL full text searching is done with tsearch2, now integrated into PostgreSQL 8.3 or in contrib for earlier versions.
  2. PostgreSQL has INT2 (SMALLINT), INT4 (INTEGER) and INT8 (BIGINT), but use INTEGER unless you want ones > 231 in which case use BIGINT.
  3. The SERIAL type is an auto-incrementing INTEGER sequence (implicitly created at CREATE TABLE time), maintained externally from the table (akin to an index). Autoincrementing integers suck in distributed scenarios, consider a UUID or composite key instead.
  4. PostgreSQL couldn't care less, and stores them all as varying text anyway. Insisting on a length won't make it faster or use less disk space.
  5. Create your database with UTF8 encoding.
  6. As of 8.3 one can now go
    CREATE TYPE type AS ENUM ('value1', 'value2')
    column type

Useful magic MySQL incantation:

mysqldump fez_svn_empty -v -n --compatible=ansi,postgresql --complete-insert=TRUE --extended-insert=FALSE --compact --default-character-set=UTF8 -r fez_svn_empty.sql

For now, remove all KEY and FULLTEXT KEY declarations from the above sql file

Importing rows from a MySQL dump that uses INTEGER AUTO_INCREMENT for primary keys into a new PostgreSQL database using SERIAL primary keys is problematic. Options include:

  1. reset the PostgreSQL sequences to start from MAX(idcolumn) + 1.
    SELECT MAX(idcolumn) + 1 AS startvalue
    ALTER SEQUENCE sequence RESTART WITH startvalue
  2. edit the dump to remove idcolumn values from the INSERT statements (highly tedious and probably not easily scriptable)

Script To Update Sequences

A quick dirty script to update sequences to start from above the maximum value of the id field. Adjust to requirements.

#!/usr/bin/python
import sys
import string
from pyPgSQL import PgSQL

def main(filename):
    try:
        conn = PgSQL.connect(database='dbname', user='username', host='localhost', password='password')
        cur = conn.cursor()
    except PgSQL.Error, msg:
        print "*** PostgreSQL Connection Error: %s" % msg
        return 1

    try:
        f = open(filename, "r")
    except:
        print "*** File not found: %s" % filename
        return 1

    for line in f:
        if (line.find("CREATE TABLE ") != -1):
            tablename = line.split()[2]
            seqname = ""
        elif (line.find(" SERIAL ") != -1):
            columnname = line.split()[0]
            seqname = "\"%s_%s_seq\"" % (tablename.strip('"'), columnname.strip('"'))

        if seqname != "":
            maxsql = "SELECT MAX(%s)+1 AS seqrestart FROM %s;" % (columnname, tablename)
            cur.execute(maxsql)
            res = cur.fetchall()
            seqrestart = res[0].seqrestart
            if seqrestart > 0:
                seqsql = "ALTER SEQUENCE %s RESTART WITH %s;" % (seqname, seqrestart)
                print seqsql
                cur.execute(seqsql)
            seqname = ""
    
    cur.close()
    conn.commit()
    del cur, conn
    return 0

if __name__ == "__main__":
    sys.exit(main("/path/to/your_schema.sql"))