Stochastic Nonsense

Put something smart here.

MySQL, Batch Imports, and Rails

I really love Rails, but it’s not the most performant code in the world. Though it doesn’t often arise in CRUD programming, if you do any sort of stats, ML, or data analytics, you’ll frequently find yourself wanting to import lots of data into your db. You could create an ActiveRecord object for each row, but this is glacial, requiring one round trip to the db server per row, and is likely to abuse the kindness of your dba. Instead, there is a wonderful gem called ar-extensions that allows you to access mysql’s native bulk import facilities. To use it you just call Model.import with arrays of data and their corresponding fields. For example, say I have a table like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> describe adsense_analytics_days;
+------------------+----------+------+-----+---------+----------------+
| Field            | Type     | Null | Key | Default | Extra          |
+------------------+----------+------+-----+---------+----------------+
| id               | int(11)  | NO   | PRI | NULL    | auto_increment | 
| page_id          | int(11)  | NO   | MUL | NULL    |                | 
| impressions      | int(11)  | YES  |     | NULL    |                | 
| clicked          | int(11)  | YES  |     | NULL    |                | 
| ecpm             | float    | YES  |     | NULL    |                | 
| ctr              | float    | YES  |     | NULL    |                | 
| cpc              | float    | YES  |     | NULL    |                | 
| revenue          | float    | YES  |     | NULL    |                | 
| start_date       | date     | YES  | MUL | NULL    |                | 
| end_date         | date     | YES  |     | NULL    |                | 
| created_at       | datetime | YES  |     | NULL    |                | 
+------------------+----------+------+-----+---------+----------------+
11 rows in set (0.09 sec)
mysql>

This has a corresponding model AdsenseAnalyticsDay. Batch importing with rails is then trivial:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
require 'ar-extensions'
require 'ar-extensions/import/mysql'

# instead of
if false
  rows.each do |row|
    AdsenseAnalyticsDay.create( ) # etc
  end
end

# you can accomplish a bulk import from, eg, a csv as such:
f = File.new('bulk_import.csv', 'r')
data = []
while line = f.gets
  puts "#{line}" if rand(1000) >= 999
  # pid, impr, clicked, ecpm, ctr, cpc, revenue, start_date, end_date, created_at
  d = line.split(',')
  (0..2).each{ |i| d[i] = d[i].to_i  }
  (3..6).each{ |i| d[i] = d[i].to_f }
  data << d[0..8]
end
f.close

fields = [:page_id, :impressions, :clicked, :ecpm, :ctr, :cpc, :revenue, :start_date, :end_date]
AdsenseAnalyticsDay.import(fields, data, {:validate => false })

where the csv looks like:

1
2
3
4
5
6
7
8
9
10
11
12
$ head bulk_import.csv
0,344,5,0.755814,0.0145349,0.052,0.26,2009-08-06,2009-08-06,2009-08-10 19:49:12
1,8,1,0,0.125,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12
2,32,9,76.875,0.28125,0.273333,2.46,2009-08-06,2009-08-06,2009-08-10 19:49:12
4,16,1,1.875,0.0625,0.03,0.03,2009-08-06,2009-08-06,2009-08-10 19:49:12
6,17,2,8.82353,0.117647,0.075,0.15,2009-08-06,2009-08-06,2009-08-10 19:49:12
12,15,1,80,0.0666667,1.2,1.2,2009-08-06,2009-08-06,2009-08-10 19:49:12
34,5,0,0,0,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12
36,2,0,0,0,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12
39,46,2,11.7391,0.0434783,0.27,0.54,2009-08-06,2009-08-06,2009-08-10 19:49:12
41,3,0,0,0,0,0,2009-08-06,2009-08-06,2009-08-10 19:49:12
$