Stochastic Nonsense

Put something smart here.

Querying Databases in R

One of the first things you’ll want to do in R is set it up to talk to databases. The easiest way to do this is using ODBC, via package RODBC.

To get the package, run `

> install.packages(RODBC)

`

Once you have RODBC installed, you call it in R as follows. But it’s very simple: a bit of setup, then sqlQuery will run your sql and return the results in a data frame. `

library(RODBC)

db <- odbcConnect( dsn='your dsn name' )
sql <- 'select page_id, count(*) as cnt
           from document_ads
           group by page_id
           having count(*) > 1'

results <- sqlQuery(db, sql, errors=T, rows_at_time=1024)
str(results)
'data.frame':   282432 obs. of  2 variables:
 $ page_id: int  17646774 17115332 17606022 15899428 17099174 17283774 8604200 16315025 17259751 17283270 ...
 $ cnt            : int  489 1119 132 113 148 200 112 121 1135 633 ...

`

On Windows, you setup the DSNs in the ODBC Data Sources inside the control panel; on MacOS, mysql includes a program called ODBC Administrator; on linux, you’ll have to install unixODBC .

Also, it’s often convenient to write code that caches your query results, particularly if the query takes a while. I’ve found that the easiest thing to do is write the results into a data file and check for the file existence like such: `

filename <- 'query cache.RData'
if (!file.exists(filename)){
   # don't have a cached copy so run the query
   library(RODBC)
   [snip]
   query1 <- sqlQuery(db, sql, errors=T, rows_at_time=1024)

   # save the query results for the future
   save(list=c('query1', 'sql'), file=filename)
   rm(list=c('query1', 'sql') )
}
load(file=filename)

</code