Stochastic Nonsense

Put something smart here.

Multiple Y Axes in R Plots -- Part 9 in a Series

This is post #09 in a running series about plotting in R.

Frequently, you want to plot data that is not at all on the same scale. In R, this is done by drawing a second plot on top of the first and building the second set of axis labels by hand. Here's a rough outline:

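> plot( ... )                  # first plot, with its axes drawn as usual
> par(new=T)                   # tell R not to clear the page before the next plot
> plot( ..., axes=F, ... )     # plot our second plot on top, but don't touch the axes
> axis( 4, ... )               # add the second scale by hand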
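To make the outline concrete, here is a minimal, self-contained sketch. The data are two simulated random walks on very different scales (made up for illustration, not stock prices), and the names `x`, `big` and `small` are hypothetical. It follows the same steps: draw the first plot normally, set `par(new=TRUE)`, draw the second plot with `axes=FALSE` and blank labels, then add its scale on the right with `axis(4, ...)`.

```r
# Hypothetical data: two simulated series on very different scales
set.seed(1)
x     <- 1:100
big   <- 500 + cumsum(rnorm(100, sd = 10))   # values in the hundreds
small <- 20  + cumsum(rnorm(100, sd = 0.5))  # values in the tens

# widen the right margin so the second axis and its label fit
op <- par(mar = c(5, 4, 4, 5) + 0.1)

# first plot: draws its own axes and labels as usual
plot(x, big, type = 'l', col = 'blue', xlab = 'day', ylab = 'big series')

# the next high-level plot should draw on top instead of starting a new page
par(new = TRUE)

# second plot: no axes and no labels, so nothing collides with the first plot
plot(x, small, type = 'l', col = 'red', axes = FALSE, xlab = '', ylab = '')

# add the second series' scale on the right-hand side (side 4)
axis(4, col = 'red')
mtext('small series', side = 4, line = 3)

usr2 <- par('usr')  # user coordinates now belong to the second (overlaid) plot
par(op)             # restore the previous margins
```

Note that after the second `plot()` call, anything you add with `points()` or `lines()` uses the second plot's scale, which is why the order of the calls matters.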

With that in mind, let's continue our long-running example and plot both YHOO and GOOG stock prices on the same graph, along with 30-day moving averages for both.

Here, again, are both data series: google data and yahoo data.

First, let’s prep our google data:

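> goog <- read.csv(file='~/stuff/blog/GOOG stock prices [19960412, 20090702].csv', header=T, sep=',')
> colnames(goog) <- tolower( colnames(goog) )
> 
> # convert the dates and sort oldest first so the moving average runs forward in time
> goog$date <- as.Date( as.character( goog$date ) )
> goog <- goog[order(goog$date),]
> 
> # util functions
> # apply FUN to a trailing window of (up to) 30 observations ending at each row
> summary30 <- function( x, FUN, na.rm=F ){
+     val <- rep( 0, length( x ) )
+     for( j in seq_along( x ) ){
+         val[ j ] <- FUN( x[ max( j - 29, 1 ):j ], na.rm=na.rm)
+     }
+     val
+ }
> 
> # 30-observation moving average, assumed here to be the mean over summary30's window
> ma30 <- function( x, na.rm=F ) summary30( x, FUN=mean, na.rm=na.rm )
> 
> goog$close30 <- ma30(goog$close)
> # keep only 2008 onward for plotting
> goog2 <- goog[ goog$date >= as.Date('2008-01-01'),]
> 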
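The loop in summary30 is easy to read, but it recomputes every 30-element window from scratch. If all you need is the mean, a cumulative-sum version produces the same trailing averages (including the shorter windows over the first 29 rows) in one pass. This is a sketch of an alternative, not the code used for the plots here; `ma_fast` is a made-up name, and unlike summary30 it has no na.rm option, so a single NA propagates to every later value.

```r
# Hypothetical vectorized trailing moving average: mean of x[max(j - k + 1, 1):j]
ma_fast <- function(x, k = 30) {
  n  <- seq_along(x)
  cs <- c(0, cumsum(x))                 # cs[i + 1] holds sum(x[1:i]); cs[1] is 0
  lo <- pmax(n - k, 0)                  # leading elements that fall outside the window
  (cs[n + 1] - cs[lo + 1]) / (n - lo)   # window sum divided by window length
}

# quick check against a plain loop with the same window definition (k = 3 here)
x    <- c(5, 3, 8, 1, 9, 2, 7)
loop <- sapply(seq_along(x), function(j) mean(x[max(j - 2, 1):j]))
all.equal(ma_fast(x, k = 3), loop)
```

With this in place, `goog$close30 <- ma_fast(goog$close)` would give the same column as the ma30 call. Cumulative sums can pick up tiny floating-point differences on long series, which is why the check uses all.equal rather than ==.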

This is exactly how we prepped the yahoo data, which gives us the yahoo2 data frame used below.

Initially, let’s just try plotting both sets of series on the same scale and see what happens.

> # Google daily close, with headroom above the maximum; the X axis is skipped for now
> plot(x=goog2$date, y=goog2$close, ylim=c(0,1.1*max(goog2$close)),
+     col='black', type='l',
+     main='goog stock close', xlab='date', ylab='close ($)',
+     xaxt='n')
> 
> points(x=goog2$date, y=goog2$close30, col='green', type='l', lwd=2)
> 
> # Yahoo close and moving average, drawn on the same (Google) scale
> points(x=yahoo2$date, y=yahoo2$close, col='black', type='l')
> points(x=yahoo2$date, y=yahoo2$close30, col='red', type='l')

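While we did get all four time series onto the same plot, the yahoo data is squashed against the bottom: Yahoo never closes much above $30, while the Y axis runs to about $750, so its lines use only the bottom 4% or so of the plot and you can't really tell what's going on.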

So let’s try the above approach and create independent scales / Y axes for the two sets of time series:

> # Google on its own scale, as before
> plot(x=goog2$date, y=goog2$close, ylim=c(0,1.1*max(goog2$close)),
+     col='black', type='l',
+     main='goog stock close', xlab='date', ylab='close ($)',
+     xaxt='n')
> 
> points(x=goog2$date, y=goog2$close30, col='green', type='l', lwd=2)
> 
> # draw the yahoo series on top with its own Y range, suppressing its axes
> par(new=T)
> plot(x=yahoo2$date, y=yahoo2$close, ylim=c(0,1.1*max(yahoo2$close)),
+     col='black', type='l', lty=2,
+     xaxt='n', axes=F, ylab='')
> 
> points(x=yahoo2$date, y=yahoo2$close30, col='red', type='l', lwd=2, lty=2)

I attempted to use dashed lines (lty=2) instead of solid ones to differentiate the two sets of data, but the result is clearly not good. Instead, let's give both time series for each stock (the daily observations and the moving average) the same color, and rely on line width to tell them apart within a stock. I also switched Google from green to blue, as blue shows up much better. You can pass a col parameter to axis(), so let's set each axis line and its tick marks to the color of its series to help associate each series with its proper scale.

> # Google: thin line for the daily close, thick line for the moving average, both blue
> plot(x=goog2$date, y=goog2$close, ylim=c(0,1.1*max(goog2$close)),
+     col='blue', type='l',
+     main='Google (GOOG) vs Yahoo (YHOO) stock close', xlab='date', ylab='close ($)',
+     xaxt='n', yaxt='n', lwd=0.75)
> 
> points(x=goog2$date, y=goog2$close30, col='blue', type='l', lwd=2.5)
> # blue left axis, with pretty() tick values for the Google range
> axis(2, pretty(c(0, 1.1*max(goog2$close))), col='blue')
> 
> # Yahoo on top with its own scale, in red
> par(new=T)
> plot(x=yahoo2$date, y=yahoo2$close, ylim=c(0,1.1*max(yahoo2$close)),
+     col='red', type='l', lwd=0.75,
+     xaxt='n', axes=F, ylab='')
> 
> points(x=yahoo2$date, y=yahoo2$close30, col='red', type='l', lwd=2.5)
> 
> # red right axis (side 4), with pretty() tick values for the Yahoo range
> axis(4, pretty(c(0, 1.1*max(yahoo2$close))), col='red')
> 

This is definitely an improvement: you can see the differences between the two stocks, and the lines are easy to tell apart. Nonetheless, the blue of the left axis is pretty faint, the red right axis is less intuitive than I would have liked, and the tick marks are not only on different scales but also occur with very different frequency. The last is a side effect of how pretty(), the R function used to pick "nice" tick values, works: each call chooses the tick spacing for its own range independently.
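To see the mismatch without drawing anything, here is a small sketch that calls pretty() on the two Y ranges from the previous plot. It plugs in the series maxima printed further down (685.33 for Google, 29.98 for Yahoo) rather than reading the data, and keeps only the ticks that fall inside each Y range, since those are the ones that end up on the plot.

```r
# Y-axis upper limits used in the previous plot: 1.1 times each series' maximum
goog_top  <- 1.1 * 685.33
yahoo_top <- 1.1 * 29.98

# pretty() picks "nice" tick values for each range independently
goog_ticks  <- pretty(c(0, goog_top))
yahoo_ticks <- pretty(c(0, yahoo_top))

# keep only ticks inside the Y limits (the ones that appear on the plot here)
goog_ticks  <- goog_ticks[goog_ticks <= goog_top]
yahoo_ticks <- yahoo_ticks[yahoo_ticks <= yahoo_top]

# each tick's height as a fraction of the Y range: the two sets do not line up
round(goog_ticks / goog_top, 3)
round(yahoo_ticks / yahoo_top, 3)
```

The Google axis gets a coarse step of 200 while the Yahoo axis gets a step of 5 over a much smaller range, so the two sets of ticks land at unrelated heights.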

So for our final plot, I decided to create nicer tick values by hand. If you look at the maxima of the two series, you'll note that you can round them up a little (to 700 for Google and 35 for Yahoo) and get a nice ratio between them. So I'll adjust the two scales so that each Google tick is 20 times the corresponding Yahoo tick.

> max(goog2$close)
[1] 685.33
> max(yahoo2$close)
[1] 29.98
> 700/35
[1] 20
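The rounding above was done by eye. One way to automate the same choice, offered only as a sketch, is to fix the top of the Yahoo scale and then pick the smallest "nice" multiplier (1, 2 or 5 times a power of ten) that stretches it over the Google maximum. The helper name `nice_ratio` is made up for this example.

```r
# Smallest multiplier of the form {1, 2, 5} * 10^k with ratio * small_top >= big_max
nice_ratio <- function(big_max, small_top) {
  raw   <- big_max / small_top            # the exact ratio needed
  pow   <- 10^floor(log10(raw))           # its order of magnitude
  cands <- c(1, 2, 5, 10) * pow           # nice candidates at that magnitude
  cands[which(cands >= raw)[1]]           # first candidate that is large enough
}

yahoo_top <- 35                           # top of the Yahoo scale, as chosen above
ratio     <- nice_ratio(685.33, yahoo_top)
goog_top  <- ratio * yahoo_top            # top of the Google scale
yat       <- pretty(c(0, yahoo_top))      # Yahoo tick values
gat       <- ratio * yat                  # matching Google tick values
```

Here the exact ratio is about 19.6, so the helper settles on 20 and a Google scale topping out at 700, the same numbers as the hand calculation.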

I'm also going to put both Y axes on the left to really help distinguish between the two stocks. Doing so means the Y axis label has to move out farther to make room: the oma (outer margin) parameter to par adds the extra space on the left, and mtext places the label by hand. I'll also bring back the fancy X axis labels from part 6 of this series.

> # create label locations for the yahoo data -- pretty values for the range [0, 35]
> yat <- pretty(c(0, 35))
> 
> # add extra room to the left of the plot, saving the old setting so it can be restored
> op <- par(oma=c(0,2,0,0))
>
> # plot, but don't label any of the axes
> plot(x=goog2$date, y=goog2$close, ylim=c(0,700),
+     col='blue', type='l',
+     main='Google (GOOG) vs Yahoo (YHOO) stock close', xlab='date', ylab='',
+     xaxt='n', yaxt='n', lwd=0.75)
> 
> points(x=goog2$date, y=goog2$close30, col='blue', type='l', lwd=2.5)
> # manually label axis 2, left, with the ratio calculated above (700/35 = 20) times our label locations
> axis(2, col='blue', at=20*yat, labels=20*yat)
> 
> # tell R to draw over the current plot with a new one
> par(new=T)
> plot(x=yahoo2$date, y=yahoo2$close, ylim=c(0,35),
+     col='red', type='l', lwd=0.75,
+     xaxt='n', axes=F, ylab='')
> 
> points(x=yahoo2$date, y=yahoo2$close30, col='red', type='l', lwd=2.5)
> 
> # label the yahoo data on a second left-hand axis, two lines farther out
> axis(side=2, at=yat, labels=yat, col='red', line=2)
>
> # manually label, farther out than normal, the Y axis
> mtext(side=2, line=4, 'close ($)')
> 
> # this code proceeds as in part 6 to neatly label the X axis
> # put X axis labels on first date present in each quarter
> # tapply() returns plain numbers here, so turn them back into Dates before matching
> locs <- as.Date(tapply(X=yahoo2$date, FUN=min, INDEX=format(yahoo2$date, '%Y%m')),
+     origin='1970-01-01')
> 
> at <- yahoo2$date %in% locs
> at <- at & format(yahoo2$date, '%m') %in% c('01', '04', '07', '10')
> axis(side=1, at=yahoo2$date[at], labels=format(yahoo2$date[at], '%b-%y'))
> abline(v=yahoo2$date[at], col='grey', lwd=0.5)
> 
> legend(x=as.Date('2009-01-01'), y=35, 
+     legend=c('GOOG daily close', 'GOOG 30 day MA', 'YHOO daily close', 'YHOO 30 day MA'), 
+     col=c(rep('blue',2), rep('red', 2)), lwd=c(1.5, 3.5, 1.5, 3.5))
> 
> # restore the original outer margins
> par(op)
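The X axis labelling step assumes the data are sorted by date, which we did during prep. Under that assumption, the same selection (the first trading day of January, April, July and October) can be written without tapply by keeping the first row of each month with duplicated(). This is an alternative sketch rather than the part 6 code; `quarter_starts` and the sample dates are made up for illustration.

```r
# TRUE for the first date in each quarter-start month (Jan, Apr, Jul, Oct);
# assumes `dates` is sorted in increasing order
quarter_starts <- function(dates) {
  month_key   <- format(dates, '%Y%m')
  first_in_mo <- !duplicated(month_key)                  # first row of each month
  first_in_mo & format(dates, '%m') %in% c('01', '04', '07', '10')
}

# a small made-up set of trading days
d  <- as.Date(c('2008-01-02', '2008-01-03', '2008-02-01', '2008-04-01',
                '2008-04-02', '2008-07-01', '2008-10-01', '2008-10-02'))
at <- quarter_starts(d)
d[at]
```

In the plot code, `at <- quarter_starts(yahoo2$date)` would stand in for the tapply and %in% lines, and the axis() and abline() calls stay the same.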

I think this plot came out much better. On the left-hand side, the contrast between the two scales is clear, and the tick marks line up neatly. Because both scales start at zero and differ by a constant factor of 20, the same percentage move looks the same size for either stock, so you can also clearly see that the percentage changes in the two stocks mirrored each other. My last nitpick is that an inch could be reclaimed from the bottom of the plot by not showing a price range the stocks never venture into, but I kind of like that you get a good sense of the range of the data relative to the lower bound, zero.
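If you make this kind of chart often, the final recipe can be folded into a function. The sketch below is hypothetical: the name `plot_two_left_axes` and its arguments are invented here, the usage example runs on simulated data rather than real prices, and it leaves out the moving averages, X axis labels and legend to stay short. It assumes the caller has already chosen the shared tick values and ratio, as we did above.

```r
# Draw two series against two Y scales, both labelled on the left.
# ratio: units of y1 per unit of y2; yat: tick values for the y2 scale
plot_two_left_axes <- function(x, y1, y2, yat, ratio,
                               cols = c('blue', 'red'), ylab = '', main = '') {
  op <- par(oma = c(0, 2, 0, 0))        # extra room on the left for two axes
  on.exit(par(op))                      # restore the outer margins on exit

  # first series on the larger scale, with no Y axis yet
  plot(x, y1, type = 'l', col = cols[1], ylim = c(0, ratio * max(yat)),
       yaxt = 'n', xlab = '', ylab = '', main = main)
  axis(2, at = ratio * yat, labels = ratio * yat, col = cols[1])

  # second series drawn on top, on a scale 1/ratio as large
  par(new = TRUE)
  plot(x, y2, type = 'l', col = cols[2], ylim = c(0, max(yat)),
       axes = FALSE, xlab = '', ylab = '')
  axis(2, at = yat, labels = yat, col = cols[2], line = 2)  # second axis farther out
  mtext(ylab, side = 2, line = 4)

  invisible(par('usr'))                 # user coordinates of the second plot
}

# usage with simulated data (not real prices)
set.seed(42)
days <- seq(as.Date('2008-01-01'), by = 'day', length.out = 200)
a    <- 500 + cumsum(rnorm(200, sd = 5))
b    <- 20  + cumsum(rnorm(200, sd = 0.2))
usr  <- plot_two_left_axes(days, a, b, yat = seq(0, 35, 5), ratio = 20,
                           ylab = 'close ($)')
```

Using on.exit() to restore the margins means a later plot in the same session starts from the usual layout even if one of the drawing calls fails partway through.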