One of the first things you’ll want to do in R is set it up to talk to databases. The easiest way to do this is using ODBC, via package RODBC.
To get the package, run `
> install.packages(RODBC)
`
Once you have RODBC installed, you call it in R as follows. But it’s very simple: a bit of setup, then sqlQuery will run your sql and return the results in a data frame. `
library(RODBC)
db <- odbcConnect( dsn='your dsn name' )
sql <- 'select page_id, count(*) as cnt
from document_ads
group by page_id
having count(*) > 1'
results <- sqlQuery(db, sql, errors=T, rows_at_time=1024)
str(results)
'data.frame': 282432 obs. of 2 variables:
$ page_id: int 17646774 17115332 17606022 15899428 17099174 17283774 8604200 16315025 17259751 17283270 ...
$ cnt : int 489 1119 132 113 148 200 112 121 1135 633 ...
`
On Windows, you setup the DSNs in the ODBC Data Sources inside the control panel; on MacOS, mysql includes a program called ODBC Administrator; on linux, you’ll have to install unixODBC .
Also, it’s often convenient to write code that caches your query results, particularly if the query takes a while. I’ve found that the easiest thing to do is write the results into a data file and check for the file existence like such: `
filename <- 'query cache.RData'
if (!file.exists(filename)){
# don't have a cached copy so run the query
library(RODBC)
[snip]
query1 <- sqlQuery(db, sql, errors=T, rows_at_time=1024)
# save the query results for the future
save(list=c('query1', 'sql'), file=filename)
rm(list=c('query1', 'sql') )
}
load(file=filename)
</code