By Steve Miller.
I just completed the annual “maintenance” on my little stock market index returns “app”. I’ve supported a variant of the script for seven years, changing it pretty significantly from year to year. The 2016 version was half python and half R, but this year I opted for R entirely. Who knows, maybe next year’s will be all-in python.
The script serves two purposes: the first is to keep me apprised of daily stock market performance; the second, to test the latest language and visualization goodies in R and python. Goodness knows there’s no shortage in either language.
My starting point is the file download page for Russell Indexes, certainly among the most revered sources in the industry. Each evening after market close I run this notebook, which first downloads and munges the latest Russell index data, then computes and graphs portfolio performance, measured as the growth of a hypothetical initial $1 investment. Typically I look at year-to-date and two-year-to-date performance, but the functions can handle any start date for which return data’s available. The 82 files accessed cover 41 separate portfolios, each sourced from a current-year and a historical data file.
The remainder of this script reads, wrangles, computes and graphs Russell index returns, one notebook cell at a time.
Define the ubiquitous frequenciesdyn function.
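The author’s actual frequenciesdyn isn’t shown, so the signature and behavior below are assumptions — a minimal sketch of a data.table frequency tabulator that takes grouping columns by name, which is how it’s used later for de-dup checks.

```r
library(data.table)

# Hypothetical sketch of frequenciesdyn: tabulate counts and percents
# for an arbitrary character vector of grouping columns.
frequenciesdyn <- function(dt, groupvars) {
  freqs <- dt[, .N, by = groupvars][order(-N)]
  freqs[, pct := 100 * N / sum(N)]   # share of all records
  freqs[]
}

# Toy usage:
dtest <- data.table(name = c("r2000", "r2000", "r3000"))
frequenciesdyn(dtest, "name")
```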
Set the url strings for historical and year-to-date daily values files for selected Russell portfolios. In the end, I work primarily with the U.S. market Top 200, 1000, 3000, Midcap, 2500, and 2000 Russell indexes.
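The actual Russell download host and file names aren’t reproduced in the text, so everything below is a hypothetical placeholder illustrating the shape of the setup — two parallel url vectors, one historical and one year-to-date, per portfolio.

```r
# Placeholder host and file names only -- not the real Russell urls.
baseurl <- "https://example.com/russell/"
ports <- c("top200", "1000", "3000", "midcap", "2500", "2000")
histurls <- paste0(baseurl, ports, "_hist.csv")  # historical daily values
ytdurls  <- paste0(baseurl, ports, "_ytd.csv")   # year-to-date daily values
```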
Read the individual files and consolidate into the wdta data.table with attributes name, pdate, idxwodiv, and idxwdiv. Write the data.table to a text file.
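A minimal sketch of the read-and-consolidate step, assuming each raw file maps onto the four attributes named above; the column layout of the downloaded files and the function names here are assumptions.

```r
library(data.table)

# Standardize one raw file to the four downstream attributes.
readidx <- function(fname) {
  dt <- fread(fname)
  setnames(dt, c("name", "pdate", "idxwodiv", "idxwdiv"))
  dt
}

# Stack all files into one data.table and persist it as delimited text.
consolidate <- function(files, outfile) {
  wdta <- rbindlist(lapply(files, readidx))
  fwrite(wdta, outfile, sep = "\t")
  wdta
}
```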
Start cleaning the dirty data by eliminating garbage characters and shortening the value names. Convert the date string to an R date variable.
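The cleanup might look like the sketch below; the particular garbage characters (commas, stray symbols) and the raw date format ("%m/%d/%Y") are assumptions about the downloaded files.

```r
library(data.table)

# Strip non-numeric garbage from the index values, shorten the names,
# and convert the date string to an R Date.
cleandta <- function(wdta) {
  dta <- copy(wdta)
  dta[, `:=`(
    idxwodiv = as.numeric(gsub("[^0-9.]", "", idxwodiv)),
    idxwdiv  = as.numeric(gsub("[^0-9.]", "", idxwdiv)),
    name     = gsub("Russell | Index", "", name),   # shorten value names
    pdate    = as.Date(pdate, format = "%m/%d/%Y") # assumed raw format
  )]
  dta
}
```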
One data problem observed several years ago is a duplication of last year’s final index record in both the historical and year-to-date files for each index. Identify and eliminate one of the duplicates for each index, then check that the de-duping succeeded.
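Since the year-end record appears verbatim in both the historical and year-to-date files, a single unique() keyed on index name and date is enough to drop the extra copy — a sketch, assuming those two columns identify a record:

```r
library(data.table)

# Drop the second copy of any (name, pdate) pair -- the year-end record
# that appears in both the historical and year-to-date files.
dedupyearend <- function(dta) unique(dta, by = c("name", "pdate"))
```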
Alas, there remain problem duplicate records across name, idxwodiv, and idxwdiv, generally surrounding holidays. Eliminate these duplicates, keeping the final record of each repeated group. The frequencies invocation at the end of the cell confirms that the dups are indeed deleted.
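Keeping the final record of each repeated group can be done with data.table’s unique() and fromLast after sorting by date — a sketch under the assumption that the table is keyed as above:

```r
library(data.table)

# Holiday records repeat the same index values on consecutive dates;
# keep only the last record of each (name, idxwodiv, idxwdiv) group.
dedupholiday <- function(dta) {
  setorder(dta, name, pdate)
  unique(dta, by = c("name", "idxwodiv", "idxwdiv"), fromLast = TRUE)
}
```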
Now “melt” the nndta data.table around idxwodiv/idxwdiv to the new ndtamelt, which is longer and “skinnier” than nndta. Compute daily percent change for each portfolio.
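The melt plus daily percent change might be sketched as below; the variable/value column names and the use of shift() for the lagged value are assumptions about the notebook’s actual code.

```r
library(data.table)

# Melt wide idxwodiv/idxwdiv columns into a long table, then compute
# the day-over-day percent change within each (name, divtype) series.
mkmelt <- function(nndta) {
  ndtamelt <- melt(nndta, id.vars = c("name", "pdate"),
                   measure.vars = c("idxwodiv", "idxwdiv"),
                   variable.name = "divtype", value.name = "value")
  setorder(ndtamelt, name, divtype, pdate)
  ndtamelt[, pctchg := value / shift(value) - 1, by = .(name, divtype)]
  ndtamelt[]
}
```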
Write out data sets for the final data.tables.
Now define a function to subset the “tall” data.table for a given start date. Include the major Russell U.S. size portfolios Top200, 1000, 3000, Midcap, 2500, and 2000. For each size, include “Balanced”, “Growth”, and “Value” portfolios. Calculate the growth of $1 over the timeframe from “sdte”.
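A sketch of what such an mkdata function could look like: subset the melted table from the start date and rebase each series so the first value represents $1. Rebasing by the first index value (rather than compounding pctchg) is a simplifying assumption here.

```r
library(data.table)

# Subset from start date sdte and compute growth of $1 per series.
mkdata <- function(ndtamelt, sdte) {
  gdta <- ndtamelt[pdate >= as.Date(sdte)]
  setorder(gdta, name, divtype, pdate)
  gdta[, growth := value / value[1], by = .(name, divtype)]
  gdta[]
}
```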
Graph the data computed in mkdata using ggplot2. The facets are ordered by portfolio company size from left to right and top to bottom — with Top200 representing the 200 largest companies in the Russell 3000, and the 2000 denoting the smallest 2000 firms in the 3000.
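The chart could be sketched as below; the factor-level ordering that lays the facets out from largest to smallest companies, and the aesthetic choices, are assumptions rather than the author’s exact ggplot2 code.

```r
library(data.table)
library(ggplot2)

# One facet per portfolio, one colored line per balanced/growth/value
# series, y axis showing growth of $1.
plotgrowth <- function(gdta, title = "Growth of $1, Russell portfolios") {
  ggplot(gdta, aes(x = pdate, y = growth, color = divtype)) +
    geom_line() +
    facet_wrap(~ name, ncol = 3) +
    labs(x = NULL, y = "growth of $1", title = title) +
    theme(legend.position = "bottom")
}
```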
Graph the growth of $1 in 2017 through June 5 for eighteen Russell portfolios. Note how “larger” company portfolios, reading from top left toward bottom right, are better-performing so far this year, as are growth portfolios compared to balanced and value. The Top200 growth index is up over 30%, while 2000 value is in the red for the year.
Small and value reigned in 2016, however, demonstrating a year-to-year change in styles. An individual who invested $1,000 in a Russell 2000 value index portfolio at the beginning of 2016 would be smiling at the more than $1,600 she’d be counting now.
Finally, generate growth of $1 return graphs for all downloaded portfolios to a pdf file.
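The pdf step might look like this sketch: one growth-of-$1 page per portfolio, all written to a single file. The function name and the per-page plotting call are placeholders for whatever the notebook actually uses.

```r
library(data.table)
library(ggplot2)

# Write one growth-of-$1 page per portfolio to a single pdf file.
allgraphs <- function(gdta, fname = "russellgrowth.pdf") {
  pdf(fname, width = 11, height = 8.5)
  for (nm in unique(gdta$name)) {
    p <- ggplot(gdta[name == nm],
                aes(x = pdate, y = growth, color = divtype)) +
      geom_line() + ggtitle(nm)
    print(p)   # each print() emits a new pdf page
  }
  dev.off()
  invisible(fname)
}
```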
That’s it till next month!