Column Names of R Data Frames - Stochastic Nonsense

Say you read a data frame from a file but you don’t like the column names. Here’s how you go about labelling them as you like. Start with a simple CSV file:

col1, col2, col3
"1,233", "$12.79", "$1,333,233.17"
"470", "$1,113.22", "$0.12"

Load it, and see what we get:

> data <- read.csv(file='~/stuff/blog/dirty.csv', header=TRUE, sep=',')
> data
   col1       col2           col3
1 1,233     $12.79  $1,333,233.17
2   470  $1,113.22          $0.12
> str(data)
'data.frame': 2 obs. of  3 variables:
$ col1: Factor w/ 2 levels "1,233","470": 1 2
$ col2: Factor w/ 2 levels " $1,113.22"," $12.79": 2 1
$ col3: Factor w/ 2 levels " $0.12"," $1,333,233.17": 2 1
>
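
If you want to follow along without creating the file, here is a minimal sketch that feeds the same three lines to read.csv through its text argument. The name csv_text is made up for this example, and stringsAsFactors = FALSE is my choice rather than part of the call above. It has been the default since R 4.0.0, which is why a recent R reports chr columns instead of Factor in str.

```r
# Hypothetical inline copy of dirty.csv, so no file on disk is needed.
csv_text <- 'col1, col2, col3
"1,233", "$12.79", "$1,333,233.17"
"470", "$1,113.22", "$0.12"'

# Assumption: keep the columns as character vectors instead of factors.
data <- read.csv(text = csv_text, header = TRUE, sep = ',',
                 stringsAsFactors = FALSE)

# The blank after each comma in the file leaves a leading space on some
# values (see the factor levels above); trimws() removes it from every column.
data[] <- lapply(data, trimws)

colnames(data)
str(data)
```

The column names come out the same either way, so everything below works on this copy too.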

Now, let's examine the column names with colnames, and check how many rows and columns there are with nrow, ncol, and dim:

> colnames( data )
[1] "col1" "col2" "col3"
> nrow(data)
[1] 2
> ncol(data)
[1] 3
> dim(data)
[1] 2 3
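
For a data frame, names gives the same answer as colnames, and length counts columns just like ncol. Here is a small sketch of those equivalences, plus two lookups that come in handy before renaming. The data frame df is a hand-built stand-in, not the one read from the file.

```r
# Hypothetical stand-in with the same three column names.
df <- data.frame(col1 = c("1,233", "470"),
                 col2 = c("$12.79", "$1,113.22"),
                 col3 = c("$1,333,233.17", "$0.12"),
                 stringsAsFactors = FALSE)

# names() and colnames() return the same character vector for a data frame.
same_names <- identical(names(df), colnames(df))

# length() of a data frame is its number of columns.
same_count <- length(df) == ncol(df)

# Check whether a column exists, and find its position by name.
has_col2 <- "col2" %in% colnames(df)
col3_pos <- which(colnames(df) == "col3")

print(c(same_names, same_count, has_col2))
print(col3_pos)
```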

R also lets us modify the column names of a data frame by assigning to the character vector returned by colnames:

> colnames(data)
[1] "col1" "col2" "col3"
>
> # set the name of column 2
> colnames(data)[2] <- 'column 2'
> colnames(data)
[1] "col1"     "column 2" "col3"
>
> # you can assign all of the columns at once, if you wish
> colnames(data) <- c( 'col 1', 'col 2', 'col 3')
> colnames(data)
[1] "col 1" "col 2" "col 3"
> str(data)
'data.frame': 2 obs. of  3 variables:
$ col 1: Factor w/ 2 levels "1,233","470": 1 2
$ col 2: Factor w/ 2 levels " $1,113.22"," $12.79": 2 1
$ col 3: Factor w/ 2 levels " $0.12"," $1,333,233.17": 2 1
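
Indexing by position breaks as soon as the column order changes, so it can be safer to pick the column by its current name. Names containing spaces are also awkward to type: with $ they need backticks. The sketch below shows both points on a made-up data frame df, and uses make.names to turn the names back into syntactically valid ones. Whether you want that last step is a matter of taste.

```r
# Hypothetical data frame, used only for illustration.
df <- data.frame(col1 = 1:2, col2 = 3:4, col3 = 5:6)

# Rename by matching the current name instead of a hard-coded position.
colnames(df)[colnames(df) == "col2"] <- "column 2"

# A name with a space needs backticks with $, or a string with [[ ]].
via_dollar  <- df$`column 2`
via_bracket <- df[["column 2"]]

# make.names() replaces characters that are invalid in a variable name with ".".
clean <- make.names(colnames(df))
colnames(df) <- clean

print(colnames(df))
```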