Stochastic Nonsense

Plotting With Custom X Axis Labels in R -- Part 5 in a Series

This is post #05 in a running series about plotting in R.

There are a variety of ways to control how R creates x and y axis labels for plots. Let’s walk through the typical process of creating good labels for our YHOO stock price close plot (see part 4).

Reviewing our plot from last time, we left off with code that plots two line series in different colors and different line widths.

plot(x=yahoo2$date, y=yahoo2$close, ylim=c(0,1.1*max(yahoo2$close)),
    col='black', type='l',
    main='YHOO stock close', xlab='date', ylab='close ($)')
points(x=yahoo2$date, y=yahoo2$close30, col='red', type='l', lwd=2)
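
If you don't have the yahoo2 data frame from the earlier posts handy, here is a minimal sketch of a stand-in so the code in this post can be run. Everything in it is an assumption for illustration: the prices are a synthetic random walk rather than real YHOO closes, close30 is taken to be a trailing 30-day average, and only weekends (not market holidays) are dropped, so the dates you see will differ slightly from the output shown in this post.

```r
# Stand-in for yahoo2 (synthetic data, NOT real YHOO prices).
set.seed(1)

all_days <- seq(as.Date('2008-01-02'), as.Date('2009-07-31'), by = 'day')

# keep Monday-Friday only; wday is 0 for Sunday and 6 for Saturday
trading_days <- all_days[!(as.POSIXlt(all_days)$wday %in% c(0, 6))]

# a positive random walk that starts near 20
price <- 20 * exp(cumsum(rnorm(length(trading_days), sd = 0.02)))

# assumed definition of close30: trailing 30-day mean (first 29 values are NA)
avg30 <- as.numeric(stats::filter(price, rep(1 / 30, 30), sides = 1))

yahoo2 <- data.frame(date = trading_days, close = price, close30 = avg30)
head(yahoo2)
```

With this in place, the plot and points calls above run unchanged.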

Unfortunately, while R understands our X axis data as dates, it doesn’t choose optimal labels for our purposes.

Instead, let’s try labeling the first day of the month in each business quarter. To do this, we use the format function on dates to pick out the first (day 01) of every month, and select months 1, 4, 7, and 10, the months that start the business quarters. Note that R allows us to use the %in% operator to ask if a value is contained in a vector. Further, note that format produces text, not numeric values, so we have to match the results against a vector of strings.

at = format(yahoo2$date, '%m') %in% c('01', '04', '07', '10') & format(yahoo2$date, '%d') == '01'

# the first of the month is often missing from the data
yahoo2$date[ at ]
[1] "2008-04-01" "2008-07-01" "2008-10-01" "2009-04-01" "2009-07-01"

Which is a little disappointing: we’re left with only five dates, and neither January appears. Nonetheless, let’s see what it looks like:

plot(x=yahoo2$date, y=yahoo2$close, ylim=c(0,1.1*max(yahoo2$close)),
  col='black', type='l',
  main='YHOO stock close', xlab='date', ylab='close ($)',
  xaxt='n')

points(x=yahoo2$date, y=yahoo2$close30, col='red', type='l', lwd=2)

# create labels at side 1 (bottom), at the dates we've selected, and with abbreviated month - year labels
axis(side=1, at=yahoo2$date[ at ], labels=format(yahoo2$date[at], '%b-%y'))

Which produces a plot labeled only at those five dates.

Walking through the code, in the plot call, we use xaxt='n' to tell plot not to draw the X axis. The first format call asks which dates are in the months (1, 4, 7, 10) that start quarters, and the second format call asks which days are the first of the month. You’ll note that we don’t have many dates on our graph. That’s because the first day of the month often isn’t in our data! Only five days in the data are both on the first and at the beginning of a new quarter, and January 1 is missing in both years.
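
To make the point about text versus numbers concrete, here is a small sketch on four hand-picked dates (the dates and variable names are just for illustration). It shows that format returns zero-padded strings, and what goes wrong if we match those strings against numbers instead.

```r
d <- as.Date(c('2008-01-02', '2008-04-01', '2008-09-02', '2008-10-01'))

mon <- format(d, '%m')   # character vector of zero-padded months
day <- format(d, '%d')   # character vector of zero-padded days

is_quarter_month <- mon %in% c('01', '04', '07', '10')
is_first <- day == '01'

# dates that are both in a quarter-start month and on the first
picked <- d[is_quarter_month & is_first]
print(picked)

# Matching against numbers misses the zero-padded months: 1 is compared
# as "1", which is not "01". Only "10" happens to match.
wrong <- mon %in% c(1, 4, 7, 10)
print(wrong)
```

The second result is the kind of bug that fails quietly, which is why the code above compares against strings.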

Instead, let’s just find the first day of each month that is present in the data:

locs <- tapply(X=yahoo2$date, FUN=min, INDEX=format(yahoo2$date, '%Y%m'))
t(t(locs))
[,1]
200801 13880
200802 13910
200803 13941
200804 13970
200805 14000
200806 14032
200807 14061
200808 14092
200809 14124
200810 14153
200811 14186
200812 14214
200901 14246
200902 14277
200903 14305
200904 14335
200905 14365
200906 14396
200907 14426

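# find the dates we have selected; compare as day counts so the match
# doesn't depend on how %in% treats Date objects
at = as.numeric(yahoo2$date) %in% as.numeric(locs)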
yahoo2$date[at]
[1] "2008-01-02" "2008-02-01" "2008-03-03" "2008-04-01" "2008-05-01" "2008-06-02" "2008-07-01" "2008-08-01"
[9] "2008-09-02" "2008-10-01" "2008-11-03" "2008-12-01" "2009-01-02" "2009-02-02" "2009-03-02" "2009-04-01"
[17] "2009-05-01" "2009-06-01" "2009-07-01"

tapply is an extraordinarily handy R function that runs a user-supplied function, in this case min, on data, returning one value for each unique level of the factor supplied in INDEX. When we print locs (the t(t(locs)) idiom just turns the result into a one-column matrix so it prints vertically), the first column is our unique factor, a combination of year and month, and the second column is the minimum date value for that factor, stored as the number of days since 1970-01-01. We then select the first dates in each month and further select just those months that are the beginning of new business quarters:

at = at & format(yahoo2$date, '%m') %in% c('01', '04', '07', '10')
yahoo2$date[at]
[1] "2008-01-02" "2008-04-01" "2008-07-01" "2008-10-01" "2009-01-02" "2009-04-01" "2009-07-01"
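
To see what tapply is doing on a smaller scale, here is a sketch on five hand-picked dates (the variable names are made up for this example). It groups the dates by year and month, takes the minimum in each group, and then converts the day counts back into Dates so they are easier to read.

```r
d <- as.Date(c('2008-01-02', '2008-01-03', '2008-02-01',
               '2008-02-04', '2008-03-03'))

# one group per year-month, e.g. "200801"
key <- format(d, '%Y%m')

# minimum day count (days since 1970-01-01) within each group
first_num <- tapply(as.numeric(d), key, min)
print(first_num)

# turn the day counts back into Dates, keeping the group names
first_date <- as.Date(as.numeric(first_num), origin = '1970-01-01')
names(first_date) <- names(first_num)
print(first_date)
```

Working with as.numeric(d) explicitly makes it clear that the values tapply hands back are plain numbers, which is why a conversion step is needed to get Dates again.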

Finally, we bring this all together to plot the data and format the X axis to show the first date in each quarter, adding vertical lines to draw the eye to the quarter divisions.

plot(x=yahoo2$date, y=yahoo2$close, ylim=c(0,1.1*max(yahoo2$close)),
  col='black', type='l',
  main='YHOO stock close', xlab='date', ylab='close ($)',
  xaxt='n')

points(x=yahoo2$date, y=yahoo2$close30, col='red', type='l', lwd=2)

axis(side=1, at=yahoo2$date[at], labels=format(yahoo2$date[at], '%b-%y'))
abline(v=yahoo2$date[at], col='grey', lwd=0.5)
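
If you find yourself doing this for more than one plot, the selection steps can be rolled into a small function. The sketch below is one way to do that. The name quarter_starts is made up for this post, and the example runs on a synthetic weekday calendar rather than the real yahoo2 data.

```r
# Hypothetical helper: for each month that starts a quarter, return the
# first date that is actually present in `dates`.
quarter_starts <- function(dates) {
  day_count <- as.numeric(dates)

  # earliest day count within each year-month
  first_in_month <- tapply(day_count, format(dates, '%Y%m'), min)

  is_first <- day_count %in% first_in_month
  is_quarter_month <- format(dates, '%m') %in% c('01', '04', '07', '10')

  dates[is_first & is_quarter_month]
}

# synthetic example: weekdays from 2008-01-02 through 2008-07-31
all_days <- seq(as.Date('2008-01-02'), as.Date('2008-07-31'), by = 'day')
weekdays_only <- all_days[!(as.POSIXlt(all_days)$wday %in% c(0, 6))]

ticks <- quarter_starts(weekdays_only)
print(ticks)
```

The result can be passed straight to axis(side=1, at=ticks, labels=format(ticks, '%b-%y')) and abline(v=ticks, col='grey', lwd=0.5).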