shemz ~

Local Quantification in a Global Space

Download FX tick data using R

“The most valuable commodity I know of is information.” – Gordon Gekko in Wall Street, 1987.

And getting reliable information is the key to it all. Plenty of sources these days provide OHLC data at various time resolutions, but testing short-term strategies often requires true tick data. For the OTC foreign exchange market, I found some sources on the internet that provide tick data for free. I chose Gain Capital's data because it is a respectable company in the business (they provide futures brokerage services as Open-e-cry) and their data appear to be continuous, without many blackouts. The bid-ask spread is also variable, which might be interesting for testing spread trading strategies or simply for understanding how price evolves in the time domain (yes, it may follow some established statistical processes, which might often be consistent across different geographies and markets).

I wanted to use R to analyze the data, so I decided to download monthly data using an R script (code attached below). The script requires three inputs: the instrument name as a string, and the year and month as numerics. For example, to get Euro vs Japanese Yen data for January 2013, the input can be ("eurjpy", 2013, 01). The output is a list of two xts objects: "buy" holds the ask prices and "sell" holds the bid prices. The output format can be, and should be, modified to fit your purpose. The presented one fits mine.

While you can download the data for multiple months and merge them to backtest over a longer period, I recommend first testing your strategy on a single month to see whether tick data really affects its profitability. Tick data is very large (over 5 million rows per month for the majors), so a backtest can take much longer than one on equivalent one-minute OHLC data (though vectorized operations can significantly reduce the time required). Also keep in mind that the ticks are asynchronous: they arrive at irregular intervals, so the granularity of the data in the time domain is uneven.
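One common way to deal with asynchronous ticks is to sample the last known price on a regular time grid before comparing against bar data. The following is only a sketch in base R under stated assumptions: the prices and timestamps are made up, and the helper name sample_last_tick is hypothetical (it is not part of the script below).

```r
# Synthetic, irregularly spaced ticks (made-up prices, for illustration only)
start     <- as.POSIXct("2013-01-02 00:00:00", tz = "UTC")
tick_time <- start + c(0.2, 0.9, 4.1, 4.3, 9.8)
tick_bid  <- c(114.50, 114.51, 114.49, 114.52, 114.55)

# Hypothetical helper: for every grid point, take the last tick at or
# before it. 'time' must be sorted in increasing order.
sample_last_tick <- function(time, price, grid) {
  idx <- findInterval(as.numeric(grid), as.numeric(time))
  idx[idx == 0] <- NA   # no tick has arrived yet at this grid point
  data.frame(time = grid, price = price[idx])
}

# Regular 5-second grid covering the same period
grid    <- start + seq(0, 10, by = 5)
regular <- sample_last_tick(tick_time, tick_bid, grid)
print(regular)
```

The first grid point has no price because no tick had arrived yet; how to treat such gaps (drop them, or carry a price over from the previous file) depends on the strategy being tested.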

One caveat with this script is that on Linux, R has no default download utility defined. So this option must be set explicitly with the command

options(download.file.method = "wget") # or any other downloader

Or set it in your .Rprofile so that it is applied automatically when R starts.
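If the same .Rprofile is shared between machines, the setting can be made conditional on which downloader is actually installed. This is a minimal sketch, not a required step: the helper name pick_download_method is hypothetical, while Sys.which() and options() are base R.

```r
# Hypothetical helper: return the first downloader found on the PATH,
# or NA if none of the candidates is installed.
pick_download_method <- function(candidates = c("wget", "curl")) {
  found <- candidates[nzchar(Sys.which(candidates))]
  if (length(found) > 0) found[1] else NA_character_
}

# Only touch the option when a downloader was actually found
method <- pick_download_method()
if (!is.na(method)) {
  options(download.file.method = method)
}
```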

Code: (Released under BSD or compatible license)

# This function downloads historical tick data from Gain Capital.
# Weekly CSV files for the input pair, month and year are downloaded
# from the Gain Capital archives, combined and parsed into a monthly
# zoo object, and finally returned as a list of buy and sell
# XTS objects. The code can be modified for the desired output object.

# Please visit Gain Capital's website for the list of instruments.
# pair = instrument, eg. "eurusd", "EURUSD", "EuRUsD", "EUR_USD"
# year and month are numeric, eg. year = 2012; month = 1 or 01

# Linux .Rprofile: options(download.file.method = "wget")

gaindata <- function(pair, year, month)
{
# Required packages, loads both zoo and xts
require("xts")

# Parsing the arguments; non-letters are dropped so "EUR_USD" also works
pair = gsub("[^A-Z]", "", toupper(pair))
pair = paste(substr(pair, 1, 3), "_", substr(pair,4, 6), sep="")
year = as.numeric(year)
month = as.numeric(month)
mon = c('January', 'February', 'March', 'April', 'May', 'June', 'July',
'August', 'September', 'October', 'November', 'December') [month]
slash = ifelse((month<10), ("/0"), ("/"))

# Download html directory listing; on Linux this needs download.file.method set
tmpname = paste("gaindatatempfile", year, month, sep="")
htmlurl = paste("http://ratedata.gaincapital.com/", year, slash, month,
" ", mon, "/", sep="")
download.file(htmlurl, tmpname, quiet=TRUE)

# Parse the html file for pair records
txtl = readLines(tmpname, warn=FALSE)
txtlist = substr(txtl, 145, 157)
pairlist = grep(pair, txtlist)

# Delete the html file and stop early if the listing has no matching files
unlink(tmpname)
if (length(pairlist) == 0) stop("No ", pair, " files found for ", mon, " ", year)

# Preparing to download data files
dlurl = paste("http://ratedata.gaincapital.com/", year, slash, month,
" ", mon, "/", txtlist[pairlist], ".zip", sep="")

for (i in seq_along(pairlist))
{
# Temporary string variables
zipfile = paste(txtlist[pairlist[i]], ".zip", sep="")
csvfile = paste(txtlist[pairlist[i]], ".csv", sep="")

# Download zip file (binary mode) and extract the csv
download.file(dlurl[i], zipfile, mode="wb")
tmpdata = read.csv(unz(zipfile, csvfile), header=TRUE, sep=",")

# Create zoo objects and append in series
tempzoo = zoo(tmpdata[, 5:6], as.POSIXct(strptime((tmpdata[,4]),
"%Y-%m-%d %H:%M:%OS")))
if (i == 1) {data = tempzoo}
else {data = rbind(data, tempzoo)}

# Delete the zip file; read.csv() has already closed the connection it opened
unlink(zipfile)
}

# Separate into buy (ask price) and sell (bid price) XTS series
data.buy = as.xts(data[, colnames(data) != "RateBid"])
data.sell = as.xts(data[, colnames(data) != "RateAsk"])

# Group the buy and sell series and return the list object
retlist = list("buy" = data.buy, "sell" = data.sell)
return (retlist)
}
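To make the string handling easier to follow in isolation, here is a small sketch of how the pair name and the monthly archive URL can be built. The helper names normalise_pair and gain_month_url are hypothetical, and the URL layout is simply the one the script above relies on; the archive may be organised differently today.

```r
# Hypothetical helper: turn "eurjpy", "EurJpy" or "EUR_JPY" into "EUR_JPY"
normalise_pair <- function(pair) {
  letters_only <- gsub("[^A-Z]", "", toupper(pair))
  paste(substr(letters_only, 1, 3), substr(letters_only, 4, 6), sep = "_")
}

# Hypothetical helper: build the monthly directory URL used by the script.
# month.name is a base R constant holding "January", ..., "December".
gain_month_url <- function(year, month) {
  sprintf("http://ratedata.gaincapital.com/%d/%02d %s/",
          as.integer(year), as.integer(month), month.name[as.integer(month)])
}

normalise_pair("eurjpy")   # "EUR_JPY"
gain_month_url(2013, 1)    # "http://ratedata.gaincapital.com/2013/01 January/"
```

Using sprintf with %02d handles the zero padding of the month directly, so no separate "slash" variable is needed.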

Dukascopy is another good broker that provides tick data for free. Unfortunately, downloading data from Dukascopy is a little tricky, as it requires parsing multiple php files. But other applications have already been written for this; the one I use is called Tickstory, which downloads the data from Dukascopy and exports it as a CSV file.

 

Written by shemz

February 15, 2013 at 11:54 am
