Stochastic Nonsense


Plotting Multiple Series in R -- Part 4 in a Series

This is post #04 in a running series about plotting in R.

You often want to plot several series on the same axes. Let’s plot daily observations along with a 30-day moving average.

To start, I have observations for YHOO stock from 12 April 1996 through 2 July 2009.

First, the data needs cleaning. I convert the column names to lower case with the tolower function for convenience. read.csv has read the yyyy-mm-dd text dates in as factors, so I convert them to Date objects with as.Date:

yahoo <- read.csv(file='~/stuff/blog/YHOO stock prices [19960412, 20090702].csv', header=TRUE, sep=',')
str(yahoo)
# 'data.frame': 3329 obs. of  7 variables:
# $ Date     : Factor w/ 3329 levels "1996-04-12","1996-04-15",..: 3329 3328 3327 3326 3325 3324 3323 3322 3321 3320 ...
# $ Open     : num  15.2 15.5 15.8 15.9 15.6 ...
# $ High     : num  15.3 15.7 15.9 16 15.8 ...
# $ Low      : num  14.9 15.3 15.3 15.6 15.5 ...
# $ Close    : num  15 15.4 15.7 15.9 15.7 ...
# $ Volume   : int  16919900 12716100 16033900 12312100 26449100 19827800 30979700 15866300 26488700 20323100 ...
# $ Adj.Close: num  15 15.4 15.7 15.9 15.7 ...

colnames(yahoo) <- tolower( colnames(yahoo) )
yahoo$date <- as.Date( as.character( yahoo$date ) )

# sort by date, oldest first, which is the order we want to display it in
yahoo <- yahoo[ order(yahoo$date), ]
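
If you want to try the cleaning steps without the original CSV, here is a minimal sketch on a made-up three-row sample. The values, the csv_text string, and the sample_prices name are my own stand-ins, not data from the real file. Passing stringsAsFactors=FALSE (the default from R 4.0 on) reads the dates as plain character strings, so the as.character step isn't needed:

```r
# Hypothetical three-row sample standing in for the real YHOO file,
# newest row first like the original download
csv_text <- "Date,Open,Close
2009-07-02,15.2,15.0
2009-07-01,15.5,15.4
2009-06-30,15.8,15.7"

# stringsAsFactors=FALSE keeps the dates as character, not factor
sample_prices <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

colnames(sample_prices) <- tolower(colnames(sample_prices))
sample_prices$date <- as.Date(sample_prices$date, format = "%Y-%m-%d")

# sort oldest first
sample_prices <- sample_prices[order(sample_prices$date), ]
str(sample_prices)
```

Spelling out format="%Y-%m-%d" is optional for this layout, but it documents what the input is expected to look like.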

Now, let’s take a first pass at plotting:

plot(x=yahoo$date, y=yahoo$close,
  main='YHOO stock close', xlab='date', ylab='close ($)')

That isn’t very pretty, not least because we’re displaying too much data to be useful. Let’s cut it down to data from 1 January 2008 onward:

yahoo2 <- yahoo[ yahoo$date >= as.Date('2008-01-01'), ]
plot(x=yahoo2$date, y=yahoo2$close,
  main='YHOO stock close', xlab='date', ylab='close ($)')

It’s worth pointing out that R’s plotting code sets the upper and lower y bounds to something reasonable based on the data you give it. Sometimes, though, particularly to get a sense of scale, you want the axis to start at zero. You can do this by setting the y axis limits explicitly with ylim. I also switch to a line plot (type='l'), which is easier to read than a cloud of points:

plot(x=yahoo2$date, y=yahoo2$close, ylim=c(0,1.1*max(yahoo2$close)),
  col='black', type='l',
  main='YHOO stock close', xlab='date', ylab='close ($)')
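
To see what the ylim expression works out to, here is a small sketch with made-up closing prices (the close_demo values are invented for illustration). By default R bases the axis on the range of the data, padded slightly; the zero-based version pins the bottom at 0 and leaves 10% headroom above the maximum:

```r
# Made-up closing prices, only to show the arithmetic
close_demo <- c(12.5, 14.1, 13.2, 11.8)

# what the default axis is based on: the data range
data_range <- range(close_demo)

# what we pass as ylim: zero at the bottom, 10% headroom on top
zero_based <- c(0, 1.1 * max(close_demo))

print(data_range)
print(zero_based)
```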

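I also want to plot the moving average, so I create the function ma30 to calculate it over a trailing window of 30 observations (trading days, not calendar days). For the first 29 rows it averages over however many observations are available. I store the result as a new column, close30, computed over the whole date range so that the moving average is already correct at the beginning of our subset: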

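# trailing moving average over the current value and the 29 before it
ma30 <- function( x, na.rm=FALSE ){
  val <- rep( 0, length( x ) )
  for( j in seq_along( x ) ){
      # mean() also drops NAs from the divisor when na.rm=TRUE
      val[ j ] <- mean( x[ max( j - 29, 1 ):j ], na.rm=na.rm )
  }
  val
}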

yahoo$close30 <- ma30(yahoo$close)
yahoo2 <- yahoo[ yahoo$date >= as.Date('2008-01-01'), ]
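
The loop is perfectly fine at 3,329 rows. For much longer series, a cumulative-sum version avoids re-adding every window. The following is a sketch of an alternative, not the function used in this post: ma_trailing and its n argument are names I made up, and it has no na.rm option, so a single NA propagates to every later value.

```r
# Trailing moving average via cumulative sums (hypothetical helper).
# Like the loop version, early values average over the observations
# available so far.
ma_trailing <- function(x, n = 30) {
  len <- length(x)
  cs <- cumsum(x)
  # cumulative sum n positions back; 0 while the window is still filling
  lagged <- c(rep(0, min(n, len)), cs[seq_len(max(len - n, 0))])
  # window width: 1, 2, ..., n, n, n, ...
  width <- pmin(seq_along(x), n)
  (cs - lagged) / width
}

# small check with a window of 3
ma_trailing(c(1, 2, 3, 4, 5), n = 3)
```

On clean data this should agree with a loop-based moving average up to floating-point rounding.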

And finally, I replot the data, adding the moving average as a second series and making it slightly bolder (lwd=2) to emphasize the moving average over the daily observations:

plot(x=yahoo2$date, y=yahoo2$close, ylim=c(0,1.1*max(yahoo2$close)),
  col='black', type='l',
  main='YHOO stock close', xlab='date', ylab='close ($)')
points(x=yahoo2$date, y=yahoo2$close30, col='red', type='l', lwd=2)
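
With two series on one plot, a legend helps the reader tell them apart. Here is a self-contained sketch on simulated prices (the random walk, the demo_* names, and the legend placement are all my own choices, not the YHOO data). It also uses lines(), which draws the same thing as points(..., type='l'):

```r
# Simulated daily closes standing in for the real data
set.seed(1)
demo_date  <- seq(as.Date("2008-01-01"), by = "day", length.out = 100)
demo_close <- 20 + cumsum(rnorm(100))

# trailing 30-observation average, shorter window at the start
demo_ma <- sapply(seq_along(demo_close),
                  function(j) mean(demo_close[max(j - 29, 1):j]))

plot(x = demo_date, y = demo_close, ylim = c(0, 1.1 * max(demo_close)),
     col = "black", type = "l",
     main = "simulated close", xlab = "date", ylab = "close ($)")
lines(x = demo_date, y = demo_ma, col = "red", lwd = 2)

# label both series; lwd matches the line widths drawn above
legend("bottomright", legend = c("daily close", "30 day moving average"),
       col = c("black", "red"), lwd = c(1, 2))
```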