

copying table to table efficiently in postgres

Here's the background: I want to merge three tables into one,
and I don't care about primary keys. We also need to leave some
metadata recording which table each record was copied from.
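Conceptually, the merge boils down to one statement per source table that selects the rows and tags each one with the table it came from. A minimal sketch of that idea (table and column names here are made up for illustration):

```ruby
# Hypothetical sketch: the merge is equivalent to one statement per
# source table, with the source table's name added as a literal column.
# "alphas", "betas", "gammas", "merged", "name", "body" are invented names.
sources = %w[alphas betas gammas]

statements = sources.map do |table|
  <<~SQL
    INSERT INTO merged (name, body, from_table)
    SELECT name, body, '#{table}' FROM #{table}
  SQL
end

puts statements
```

The actual post uses COPY through an intermediate file instead, but the shape of the data is the same: every row carries a from_table value.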

We are using PostgreSQL and ActiveRecord. The latter is not
important; it's just a convenient tool. All the tricks are in the SQL.

Suppose we have a "From" model backed by a "froms" table, and we want
to copy into a "To" model backed by a "tos" table. Then we'll do:

columns = From.columns.map(&:name) - ['id']  # every column except the primary key
conn = ActiveRecord::Base.connection

puts "Dumping #{From}..."
conn.execute(<<~SQL)
  COPY (SELECT #{columns.join(', ')}, '#{From.table_name}' FROM #{From.table_name})
  -- the extra '#{From.table_name}' literal is the metadata
  TO '/tmp/path-to-intermediate-file'
SQL

puts "Restoring #{To}..."
conn.execute(<<~SQL)
  COPY #{To.table_name}(#{columns.join(', ')}, from_table)
  -- the from_table column receives the metadata
  FROM '/tmp/path-to-intermediate-file'
SQL
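To make the output concrete, suppose the "froms" table has columns name and body (hypothetical names). Here is a small sketch that builds the same two COPY statements as above, without needing a database connection:

```ruby
# Build the COPY pair for a hypothetical source table "froms"
# (columns name, body) and target table "tos".
columns = %w[name body]  # 'id' excluded

dump_sql = <<~SQL
  COPY (SELECT #{columns.join(', ')}, 'froms' FROM froms)
  TO '/tmp/path-to-intermediate-file'
SQL

restore_sql = <<~SQL
  COPY tos(#{columns.join(', ')}, from_table)
  FROM '/tmp/path-to-intermediate-file'
SQL

puts dump_sql, restore_sql
```

Note that server-side COPY TO/FROM a file path reads and writes on the database server's filesystem, so the path must be accessible to the server process.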
Previously, this ran through ActiveRecord with batched finds and inserts
(because that's what others had done before).
I ran it overnight and it still hadn't finished...
Now with COPY it runs in minutes!
36000 seconds => 120 seconds. That's a 300x (30000%) improvement...

This reminds me of when Heroku used taps to transfer databases.
That was awfully, awfully, awfully slow, and I must say stupid.

At that time, I wrote an EventMachine-based receiver and used the
Heroku console to send data in batches. It wasn't very convenient to
use, since some tunneling was also involved, but it also ran about
300x faster, or even more...

I guess they didn't need to handle large data...


All texts are licensed under CC Attribution 3.0