Crypto Backtester 🧪🔍⚖️📈

In this guide we build a cryptocurrency backtester to compare different rebalancing and trading strategies. It covers retrieving data from crypto price APIs and using pandas and numpy to shape that data into something usable in time series computations that implement different trading strategies. Interactive time series plotting is covered using plotly. The guide serves as a reference for creating your own crypto backtester, as well as a set of general best practices for dealing with time series data in Python.

Context and Background

Portfolio rebalancing refers to the practice of keeping the original proportions of your investments intact regardless of individual asset price fluctuations. There are many reasons why rebalancing your portfolio is a good idea, most of which revolve around diversifying risk by holding assets that have low correlation with each other.

The way this works in practice is simple. Let's suppose you decide to invest $100 in BTC, ETH and NEO. Your portfolio begins with three assets in equal proportions. A few days later, NEO takes off and outperforms the other assets in your portfolio. Now, your basket is disproportionately weighted in NEO's favor.

Rebalancing your portfolio, in this case, would entail selling some NEO and redistributing the profits across the rest of your portfolio to bring back the original proportions. Portfolio rebalancing spreads the wealth from your stronger assets, selling them at higher prices, and rebuys other assets in your portfolio at lower prices.
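
To make the arithmetic concrete, here is a minimal sketch of a single rebalance. The equal three-way split of the $100 follows the example above; the assumption that NEO doubles while BTC and ETH stay flat is purely illustrative.

```python
# Illustrative numbers only: $100 split equally, then NEO doubles (assumed).
target_weights = {"BTC": 1 / 3, "ETH": 1 / 3, "NEO": 1 / 3}
values = {"BTC": 100 / 3, "ETH": 100 / 3, "NEO": 100 / 3}

values["NEO"] *= 2  # hypothetical price move

total = sum(values.values())
weights_before = {coin: value / total for coin, value in values.items()}

# Rebalancing resets every position to its target share of the new total.
rebalanced = {coin: total * weight for coin, weight in target_weights.items()}
trades = {coin: rebalanced[coin] - values[coin] for coin in values}

print(round(total, 2))                                # value before trading
print({c: round(w, 2) for c, w in weights_before.items()})
print({c: round(t, 2) for c, t in trades.items()})    # negative = sell
```

In this scenario NEO has grown to half of the portfolio, so the rebalance sells NEO and splits the proceeds between BTC and ETH. The trades sum to zero because no fees are modelled.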

This guide will focus less on the economic reasoning and justification and more on the technical implementation details.

Tokens Selected and Models Tested

There are many kinds of rebalancing strategies that could be implemented. This guide focuses on two popular methods: time based and threshold based rebalancing. Time based rebalancing only considers performing a rebalance once enough time has passed. Threshold rebalancing only considers performing a rebalance once the allocations within the fund have deviated past a predefined threshold. There are tons of other options and flavours out there to explore, some of which can be found here. However, the processes and principles explored with simple time and threshold rebalancing can be extended to any generic rebalancing strategy you can think of, so these two are sufficient for this guide.
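
The two triggers differ only in the condition that is checked each day. The following is a minimal sketch; the helper names `time_trigger` and `threshold_trigger` are hypothetical and do not appear in the notebook.

```python
def time_trigger(days_since_rebalance, period_days):
    """Rebalance once at least `period_days` days have passed."""
    return days_since_rebalance >= period_days


def threshold_trigger(current_weights, target_weights, threshold):
    """Rebalance once any weight drifts more than `threshold` from target."""
    return any(
        abs(current - target) > threshold
        for current, target in zip(current_weights, target_weights)
    )


print(time_trigger(days_since_rebalance=3, period_days=7))            # False
print(threshold_trigger([0.5, 0.25, 0.25], [1 / 3, 1 / 3, 1 / 3], 0.1))  # True
```

Everything else in a backtest (valuing positions, computing the new allocation) can stay the same whichever trigger is used.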

From a token perspective there are literally thousands that could be included in the models. For simplicity's sake we will limit our exploration to a select few. More complex portfolios could include more tokens, but the basic ideas explored here remain the same.

The tokens explored are: ["btc", "eth", "xrp", "ltc", "dash", "xmr", "doge"] as well as USD to represent a stable asset in the fund.

Local Setup

It is recommended that all packages are installed within a Python virtual environment if you want to run this notebook locally. This ensures that packages are not installed into your system-wide environment and makes execution consistent between machines. This can be done as follows:

Install virtualenv if you don't already have it

pip install virtualenv

Create the virtual environment

virtualenv venv -p python3

Activate the virtual environment. Your shell should now say (venv) on the left side.

source venv/bin/activate

Install packages

pip install -r requirments.txt

Ensure kernel can be accessed by Jupyter

python -m ipykernel install --user --name=CryptoBacktester

Start the notebook.

jupyter notebook

Now navigate to http://localhost:8888 where you can find the live notebook. From within the notebook your Kernel should be set to CryptoBacktester in the top right. If it's not, click the Kernel tab and change it to CryptoBacktester.

Initial Setup

We begin by loading all the packages that we need. Nothing too out of the ordinary here except for Cryptory, which provides a collection of API endpoints for retrieving historic price information for crypto (and other markets). We also import plotly.offline so that we can plot unlimited plot.ly figures without running into their limits. Finally, we turn off the checking of SSL certificates, because this is required to pull the crypto prices from within the Python virtualenv, which does not contain SSL certificates.

In [1]:
# load package
from cryptory import Cryptory
from plotly.offline import init_notebook_mode, iplot #import offline mode

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.offline as pyo
import plotly.graph_objs as go
import ssl

#init notebook for plotly
init_notebook_mode()

#disable ssl for cryptory API & virtualenv
ssl._create_default_https_context = ssl._create_unverified_context
np.set_printoptions(suppress=True)

Pulling Data from Crypto API and Extracting to Pandas Dataframe

First we set up the Cryptory object to pull data from 2017-01-01 to the present date, then extract and print some sample data. Note that the data therefore always runs up to the most recent time the script was run, which is a strong advantage over using static csv files.

In [2]:
# initialise object 
# pull data from start of 2017 to present day
my_cryptory = Cryptory(from_date = "2017-01-01")

# get historical bitcoin prices from coinmarketcap
my_cryptory.extract_coinmarketcap("bitcoin").head()
Out[2]:
date open high low close volume marketcap
0 2019-05-05 5831.07 5833.86 5708.04 5795.71 14808830723 102494420158
1 2019-05-04 5769.20 5886.89 5645.47 5831.17 17567780766 103112021259
2 2019-05-03 5505.55 5865.88 5490.20 5768.29 18720780006 101986240859
3 2019-05-02 5402.42 5522.26 5394.22 5505.28 14644460907 97330112147
4 2019-05-01 5350.91 5418.00 5347.65 5402.70 13679528236 95501110091

The next step is to define all the tokens we want prices for and capture them in a single dataframe. We use these tokens specifically because they all have a high market cap and price history going back to 2017-01-01.

Reading in the data involves iterating over all tokens and calling extract_bitinfocharts on each symbol. This calls the Cryptory package API to get the price and date information for that token in USD. The result is then merged onto the existing price records on the common date column. Because the merge uses how="left", it behaves like a SQL left join: every date in the btc frame is kept, and a coin with no price for a given date gets nan in that row.

In [3]:
all_coins_df = my_cryptory.extract_bitinfocharts("btc") #start by filling the dataframe(df) with btc
# coins of interest
bitinfocoins = ["btc", "eth", "xrp", "ltc", "dash", "xmr", "doge"] #then fill it with all the others
for coin in bitinfocoins[1:]: # [1:] skips the first item(btc)
    all_coins_df = all_coins_df.merge(my_cryptory.extract_bitinfocharts(coin), on="date", how="left")
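
The effect of the `how` argument can be checked on two tiny hand-made frames. This is only a sketch: the frames below stand in for what extract_bitinfocharts returns, and the eth row for 2017-01-01 is dropped on purpose to show the difference between the two join types.

```python
import pandas as pd

# Hand-made stand-ins for two per-coin price frames (eth is missing one date).
btc = pd.DataFrame({"date": ["2017-01-03", "2017-01-02", "2017-01-01"],
                    "btc_price": [1017.0, 1010.0, 970.988]})
eth = pd.DataFrame({"date": ["2017-01-03", "2017-01-02"],
                    "eth_price": [8.811, 8.182]})

left = btc.merge(eth, on="date", how="left")    # keeps all three btc dates
inner = btc.merge(eth, on="date", how="inner")  # keeps only shared dates

print(left)
print(len(left), len(inner))
```

With how="left" the unmatched date survives with a nan in eth_price, which is why a nan-handling step is needed afterwards.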

We then replace all nan values with zero, make the date column the index so that we can iterate over rows by date later, and insert a new column named usd_price with a value of 1 for all rows. This is used later to represent holding usd in the portfolio alongside the cryptos. Finally we reverse the row order so that the top row (position 0) holds the oldest data. This makes sense because when iterating it is more natural to work forward in time, from the oldest row (position 0) to the newest (the last position).

In [4]:
all_coins_df = all_coins_df.fillna(0) #remove nans
all_coins_df.set_index('date',inplace=True) # make date index
all_coins_df.insert(0,"usd_price",1) #add usd column
all_coins_df = all_coins_df.reindex(index=all_coins_df.index[::-1]) #re-order data
all_coins_df.head() #print sample rows
Out[4]:
usd_price btc_price eth_price xrp_price ltc_price dash_price xmr_price doge_price
date
2017-01-01 1 970.988 8.233 0.00651 4.389 11.356 13.532 0.000224
2017-01-02 1 1010.000 8.182 0.00640 4.539 11.593 14.671 0.000222
2017-01-03 1 1017.000 8.811 0.00632 4.525 12.383 16.125 0.000220
2017-01-04 1 1075.000 10.440 0.00642 4.585 14.748 16.807 0.000226
2017-01-05 1 1045.000 10.479 0.00650 4.404 14.815 16.713 0.000225
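
The same cleaning steps can be tried on a three-row frame to see what each one does. This is a sketch with an invented missing value; `iloc[::-1]` is used here as an assumed equivalent of the reindex call for reversing the rows.

```python
import numpy as np
import pandas as pd

# Tiny newest-first frame with one missing price (the nan is invented).
df = pd.DataFrame({"date": ["2017-01-03", "2017-01-02", "2017-01-01"],
                   "btc_price": [1017.0, 1010.0, 970.988],
                   "eth_price": [8.811, 8.182, np.nan]})

df = df.fillna(0)                # missing prices become 0
df = df.set_index("date")        # index rows by date
df.insert(0, "usd_price", 1)     # constant column representing USD
df = df.iloc[::-1]               # reverse so the oldest row comes first

print(df)
```

Note that a filled-in price of 0 is a placeholder rather than a real quote, which is why later cells guard against zero prices before rebalancing.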

Next, we extract the names of all the tokens from the dataframe's columns. We can iterate over these later.

In [5]:
tokens = all_coins_df.columns.tolist()
tokens
Out[5]:
['usd_price',
 'btc_price',
 'eth_price',
 'xrp_price',
 'ltc_price',
 'dash_price',
 'xmr_price',
 'doge_price']

Basic Time Series Data Visualization and Plotting

At this point we have created a pandas dataframe that contains daily price data for 7 different crypto currencies against the USD. Next we will visualize this data in a simple time series plot.

This process involves iterating over all token names and using this as the key to extract column data from the pandas dataframe. This information is stored in a go.Scatter object which is appended to a plot_data array which is used in the plotting of the price information. Other information is also appended such as the name of the crypto as well as the type of plot to generate.

In [6]:
plot_data = []

for index, coin in enumerate(tokens):
    coin_chart_info = go.Scatter(
        x = all_coins_df.index, #set all the x's to the index from the dataframe. This was the date
        y = all_coins_df[coin].tolist(), #set all the y's to the current coins magnitude in the dataframe
        mode = 'lines',
        name = coin[:-6]) # the [:-6] is to remove the `_price` part of the name from the dataframe
    plot_data.append(coin_chart_info) #add this coins chart info to the plot_data array
    tokens[index] = coin[:-6]
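
The `[:-6]` slice works because every column name ends in the six characters `_price`. A small sketch of that slice, next to a more defensive variant (an assumption, not what the notebook uses) that leaves names without the suffix untouched:

```python
columns = ["usd_price", "btc_price", "doge_price"]

# Slicing off the last six characters removes the "_price" suffix.
sliced = [name[:-6] for name in columns]

# A more explicit variant that only strips the suffix when it is present.
suffix = "_price"
explicit = [name[:-len(suffix)] if name.endswith(suffix) else name
            for name in columns + ["date"]]

print(sliced)    # ['usd', 'btc', 'doge']
print(explicit)  # ['usd', 'btc', 'doge', 'date']
```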

The iplot function can also take a layout parameter which we use to define the heading for the plot as well as moving the legend to the top left. We then generate the iplot.

And now we have a beautiful interactive plot! You can zoom in on it by dragging your mouse. Double click to zoom out again. You can also toggle coins from the legend. Try turning btc off by clicking it.

In [7]:
layout = go.Layout(
    title = "Historic Price Over Time",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.05, y=0.95))

fig = dict(data=plot_data, layout = layout)
iplot(fig)

Time based Rebalancing

Next we jump right into building a rebalancing strategy. This initial implementation will not consider fees or rebalancing intervals; it simply takes whatever the distribution within the portfolio is at the end of the day and compares this to what it should be. It then “executes the trades” to bring the portfolio back to equilibrium. After this, we will consider how to define a collection of different portfolio strategies and compare and contrast them on the same plot.

The first thing we do is grab the top row from the all_coins_df dataframe, which stores all the historic information. This row is then converted to an np.array where each position in the array corresponds to a different column of the dataframe. We need this information to calculate the initial allocation within the fund.

In [8]:
initial_price = all_coins_df.head(1)
initial_price = np.array(initial_price)[0]
initial_price
Out[8]:
array([  1.      , 970.988   ,   8.233   ,   0.00651 ,   4.389   ,
        11.356   ,  13.532   ,   0.000224])

For the simplest case we will set all tokens in our dataset to be part of the fund. Each token will get 1/8 of the total value.

In [9]:
initial_weights = np.ones(len(tokens)) / len(tokens)
initial_weights
Out[9]:
array([0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125])

Next, the amount of capital the fund starts off with is set. We set this to 1000 usd as an arbitrary number.

In [10]:
INVEST = 1000

The initial allocations are now calculated. These allocations are expressed in units of each respective currency. For example, the initial allocation of bitcoin is 0.12873486 btc: 1/8 of the 1000 usd, divided by bitcoin's starting price of 970.988. Any allocation that comes out infinite because of a zero starting price is set to 0.

In [11]:
initial_allocation = INVEST * initial_weights / initial_price
initial_allocation[np.isinf(initial_allocation)] = 0
initial_allocation
Out[11]:
array([   125.        ,      0.12873486,     15.18280092,  19201.22887865,
           28.48029164,     11.00739697,      9.23736329, 558035.71428571])
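
The `np.isinf` line only matters when a starting price is 0, which does not happen in the data above. A minimal sketch with invented weights and a deliberately zeroed price shows what it does in that case:

```python
import numpy as np

invest = 1000
weights = np.array([0.5, 0.25, 0.25])
prices = np.array([1.0, 970.988, 0.0])   # last price is a filled-in zero

# Dividing by a zero price yields inf; silence the warning and zero it out.
with np.errstate(divide="ignore"):
    allocation = invest * weights / prices
allocation[np.isinf(allocation)] = 0

positions = allocation * prices
print(allocation)
print(positions.sum())
```

In this hypothetical case the share meant for the zero-priced asset is simply not invested, so the positions sum to 750 rather than 1000.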

We can also work out the initial position. This is the value of each set of tokens purchased, based off the allocations we defined. This should logically be the total investment $\times$ the fraction per token and works out to 125 per token.

In [12]:
initial_positions = initial_allocation * initial_price
initial_positions
Out[12]:
array([125., 125., 125., 125., 125., 125., 125., 125.])

At any point in time the value of the portfolio is the sum of the position values for that day. Therefore the starting value initial_portfolio_value is simply the sum of the initial_positions array.

In [13]:
initial_portfolio_value = initial_positions.sum()
initial_portfolio_value
Out[13]:
1000.0

We now know all the information we need to start doing some rebalancing simulations. This process will happen as follows:

  • We iterate over each row in the all_coins_df dataframe and do a series of calculations. Each row corresponds to the prices of all listed tokens on a given day. These prices are stored in the current_price variable on each loop, which amounts to performing some logic for every day in the data. For each day the following is calculated and stored:
    • hodl as the value of not trading at all, based on $InitialAllocation \times CurrentPrice$. For each day this is stored in a new slot within the hodl variable.
    • rb as the value of the rebalanced portfolio, based on $CurrentAllocation \times CurrentPrice$. For each day the current value of the rebalancing portfolio is stored in rb.
    • At the end of each day, provided every price is greater than zero, current_allocation is updated to $CurrentPortfolioValue \times \frac{InitialWeights}{CurrentPrice}$. This is the "trade" that restores the target weights, and the new allocation is used for the next day's positions.

Note that at no point are we considering trading fees.

In [14]:
hodl = {}
rb = {}

current_allocation = initial_allocation

for i, current_price in all_coins_df.iterrows():
    # hodl positions
    hodl[i] = initial_allocation * current_price
    
    # current positions
    current_positions = current_allocation * current_price
    rb[i] = current_positions
    
    # rebalance
    current_portfolio_value = current_positions.sum()
    if(current_price.min() > 0):
        current_allocation = current_portfolio_value * initial_weights / current_price
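
To check the loop by hand, it helps to run the same logic on a toy series. The sketch below uses two assets over three days with invented prices; the variable names mirror the cell above but the data is not from the notebook.

```python
import numpy as np

# Two assets over three days; all prices are invented for illustration.
prices = np.array([[1.0, 10.0],
                   [1.0, 20.0],
                   [1.0, 10.0]])
weights = np.array([0.5, 0.5])
invest = 1000

initial_allocation = invest * weights / prices[0]
current_allocation = initial_allocation.copy()

hodl_values, rb_values = [], []
for current_price in prices:
    hodl_values.append((initial_allocation * current_price).sum())
    current_positions = current_allocation * current_price
    rb_values.append(current_positions.sum())
    # Rebalance at the close: reset every position to its target weight.
    current_allocation = current_positions.sum() * weights / current_price

print(hodl_values)
print(rb_values)
```

Here the hodl portfolio ends where it started (1000) while the rebalanced one ends at 1125, because half of the gain was moved out of the second asset before its price fell back. The price path was chosen to mean-revert, which favours rebalancing; a steadily trending asset would produce the opposite result.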

Next we convert the generated dictionaries to pandas DataFrames so we can do computation over them and use them in plotting more easily.

In [15]:
hodl = pd.DataFrame(hodl).T
rb = pd.DataFrame(rb).T

At this point we have two dataframes (hodl and rb) which hold the value of each crypto position on each day over the whole period. We will print out the top 10 rows of one of these frames to see what it looks like.

In the table below each row shows the value of each crypto position on a given day, all starting from the same $\frac{1}{8}\times1000$. Every day the value of each position changes with the change in price. As this is the hodl strategy, no rebalancing is done at all, as can be seen from the constant usd_price column.

In [16]:
hodl.head(10)
Out[16]:
usd_price btc_price eth_price xrp_price ltc_price dash_price xmr_price doge_price
2017-01-01 125.0 125.000000 125.000000 125.000000 125.000000 125.000000 125.000000 125.000000
2017-01-02 125.0 130.022204 124.225677 122.887865 129.272044 127.608753 135.521357 123.883929
2017-01-03 125.0 130.923348 133.775659 121.351767 128.873320 136.304597 148.952483 122.767857
2017-01-04 125.0 138.389970 158.508442 123.271889 130.582137 162.337091 155.252365 126.116071
2017-01-05 125.0 134.527924 159.100571 124.807988 125.427204 163.074586 154.384053 125.558036
2017-01-06 125.0 119.463886 151.934289 119.623656 114.205969 145.176559 136.916199 123.883929
2017-01-07 125.0 111.405599 148.427062 120.775730 108.282069 131.527386 118.884866 122.767857
2017-01-08 125.0 116.992177 151.600267 122.119816 112.183869 139.078461 123.448123 124.441964
2017-01-09 125.0 114.667740 158.265517 119.815668 113.778765 136.733885 120.787762 123.325893
2017-01-10 125.0 115.744093 157.855581 119.047619 126.480975 136.260567 125.203222 123.883929

We can also plot the hodl portfolio over time. Remember that at the beginning we had 12.5% of each asset (125 usd). We can see how these have evolved up to the end of the period. We also define a helper function fundSort, which we use to sort the list of plot traces so that the tokens with the highest value at the end of the period come first.

This figure is really interesting, as we can see how much 125 usd invested in each crypto at the beginning of the period could have grown to. There are some absolutely crazy numbers here: XRP grew from 125 USD in Jan 2017 to a maximum of ~70k USD about a year later! That's growth of ~583x in about a year.

In [17]:
def fundSort(fund):
    return fund['y'][-1]
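
The sort key simply reads the last y value of each trace. A small sketch, using plain dicts with invented values as stand-ins for go.Scatter objects:

```python
def fundSort(fund):
    return fund['y'][-1]

# Plain dicts stand in for go.Scatter traces (values are invented).
traces = [
    {"name": "a", "y": [125.0, 90.0]},
    {"name": "b", "y": [125.0, 400.0]},
    {"name": "c", "y": [125.0, 150.0]},
]
traces.sort(key=fundSort, reverse=True)
print([t["name"] for t in traces])   # ['b', 'c', 'a']
```

Sorting this way puts the best performer first in the legend.
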
In [18]:
plot_data = []

for index, coin in enumerate(hodl):
    coin_chart_info = go.Scatter(
        x = hodl.index, #set all the x's to the index from the dataframe. This was the date
        y = hodl[coin].tolist(), #set all the y's to the current coins magnitude in the dataframe
        mode = 'lines',
        # the [:-6] is to remove the `_price` part of the name from the dataframe
        name = coin[:-6] + "->\t\t$" + str(round(hodl[coin].iloc[-1],3)))
    plot_data.append(coin_chart_info) #add this coins chart info to the plot_data array
    

#sort the data to be plotted based on the total value of the holdings after the period
plot_data.sort(key = fundSort, reverse = True) 

#specify the layout for the plot
layout = go.Layout(
    title = "Hodl Portfolio asset value over time",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.05, y=0.95))

#display the plot inline
fig = dict(data=plot_data, layout = layout)
iplot(fig)

Next, we can generate a similar plot but for the rebalanced fund values over time. These should all follow a similar value as we've enforced that at all periods in time the fund maintains a $\frac{1}{8}$ of its value in each respective crypto.

In [19]:
plot_data = []

for index, coin in enumerate(rb):
    coin_chart_info = go.Scatter(
        x = rb.index,
        y = rb[coin].tolist(),
        mode = 'lines',
        name = coin[:-6] + "->\t\t$" + str(round(rb[coin].iloc[-1],3)))
    plot_data.append(coin_chart_info)

plot_data.sort(key = fundSort, reverse = True) 

layout = go.Layout(
    title = "Rebalanced Portfolio asset value over time",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.05, y=0.95))

fig = dict(data=plot_data, layout = layout)
iplot(fig)

Next it would be useful to compare the final values for hodl vs rebalance to get a relative comparison of the two methods. Remember that we are comparing two portfolios that start out with exactly the same cryptos in them.

We no longer want to look at the value of each individual crypto within the fund, but rather the summed value of all cryptos on each day. To this end we sum with axis=1, which sums across the columns of each row.

Then, the performance of each fund is computed by taking the final value/initial value to get a simple relative change in value for the portfolio.

In [20]:
hodl_value = hodl.sum(axis=1)
rb_value = rb.sum(axis=1)

hodl_perf = hodl_value.iloc[-1] / hodl_value.iloc[0]
rb_perf = rb_value.iloc[-1] / rb_value.iloc[0]
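
Note that this ratio is the final value relative to the initial value, not the net gain. When it is multiplied by 100 for the plot labels, 100% therefore means break-even. A short sketch with a hypothetical final value:

```python
initial_value = 1000.0
final_value = 2500.0   # hypothetical final portfolio value

perf = final_value / initial_value          # ratio, as computed above
label = "hodl {:.1f}%".format(perf * 100)   # final value as % of initial
net_return = (perf - 1) * 100               # percentage gain

print(label)        # hodl 250.0%
print(net_return)   # 150.0
```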

Next we create the data objects for the two plots and sort.

In [21]:
hodlInfo = go.Scatter(
        x = hodl_value.index,
        y = hodl_value.tolist(),
        mode = 'lines',
        name = 'hodl {:.1f}%'.format(hodl_perf * 100))

rebalanceInfo = go.Scatter(
        x = rb_value.index,
        y = rb_value.tolist(),
        mode = 'lines',
        name = 'rebalance {:.1f}%'.format(rb_perf * 100))

plot_data = [hodlInfo, rebalanceInfo]
plot_data.sort(key = fundSort, reverse = True) 
In [22]:
layout = go.Layout(
    title = "Portfolio value over time",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.8, y=0.9))

fig = dict(data=plot_data, layout = layout)
iplot(fig)

Custom Portfolio Backtesting

At this point we have looked at implementing a simple portfolio backtesting setup. However it is not well suited to comparing multiple strategies against each other and as a result it makes it hard to find an ideal model. Next we will define a more general process for backtesting funds by defining a JSON object structure that can be used to specify all portfolio characteristics that are to be tested.

Primarily we are concerned with:

  • what tokens go into the portfolio
  • what weighting each token takes on
  • what frequency the rebalancing must occur at
  • the name of the portfolio so we can track it later.

All this information for a bunch of different portfolios is encoded as objects within an array below for 6 different funds. The process of backtesting is very similar to the specific process outlined before except we now iterate over all elements within the funds array and create a backtest for each one sequentially.

In [23]:
funds = [
    {
    'coins': ['eth_price', 'btc_price', 'usd_price'],
    'ratio': [0.4, 0.4, 0.2],
    'rebalance': 7,
    'name': 'ETH40, BTC40, USD20 @ 7',
    },
    {
    'coins': ["btc_price", "eth_price", "xrp_price", "ltc_price", "dash_price", "xmr_price", "doge_price"],
    'ratio': [0.25, 0.25, 0.10, 0.10, 0.10, 0.10, 0.10],
    'rebalance': 2,
    'name': 'ETH25, BTC25, OTHERS10 @ 2',
    },
    {
    'coins': ['eth_price', 'btc_price'],
    'ratio': [0.5, 0.5],
    'rebalance': 15,
    'name': 'ETH50, BTC50 @ 15',
    },
    {
    'coins': ['eth_price', 'btc_price'],
    'ratio': [0.5, 0.5],
    'rebalance': 0,
    'name': 'ETH50, BTC50 @ HODL',
    },
    {
    'coins': ['eth_price'],
    'ratio': [1],
    'rebalance': 0,
    'name': 'ETH100 @ HODL',
    },
    {
    'coins': ['btc_price'],
    'ratio': [1],
    'rebalance': 0,
    'name': 'BTC100 @ HODL',
    }
]
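
Because the fund definitions are written by hand, it can be worth checking them before running a backtest. The helper below is a hypothetical sketch (validate_fund is not part of the notebook) that checks that each coin has a weight and that the weights sum to 1.

```python
import math

def validate_fund(fund):
    """Hypothetical helper: sanity-check one fund definition."""
    if len(fund["coins"]) != len(fund["ratio"]):
        raise ValueError(fund["name"] + ": coins and ratio differ in length")
    if not math.isclose(sum(fund["ratio"]), 1.0):
        raise ValueError(fund["name"] + ": ratios do not sum to 1")
    return True

example = {
    "coins": ["eth_price", "btc_price", "usd_price"],
    "ratio": [0.4, 0.4, 0.2],
    "rebalance": 7,
    "name": "ETH40, BTC40, USD20 @ 7",
}
print(validate_fund(example))   # True
```

`math.isclose` is used instead of `==` so that floating point rounding in sums such as 0.25 + 0.25 + 5 × 0.10 does not trigger a false error.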

We now use two nested loops:

  1. First we iterate over all the funds specified in the JSON object and for each one calculate the key metrics for that specific fund, such as the initial allocations, initial weightings and initial prices. The logic and justification for each fund is the same as before, except now generalized.

  2. Next, we iterate over all rows in the coin_prices dataframe, which stores the pricing information. For each day we calculate the value of the specific fund and store the results.

At the end of this process we have one dictionary fund_returns that has time series information for all 6 funds specified.

In [24]:
fund_returns = {}
for index, fund in enumerate(funds): # for all funds in the list of funds we run a backtest
    
    # grab only the coin prices for the coins in the fund
    coin_prices = all_coins_df.loc[:,fund['coins']]
    
    # the initial weights are specified by the fund JSON. read this in and store it
    fund_initial_weights = np.array(fund['ratio'])
    
    #calculate the initial prices of the coins in the fund
    fund_initial_price = coin_prices.head(1)
    fund_initial_price = np.array(fund_initial_price)[0]
    
    #calculate the initial allocation based on the weights and starting prices
    fund_initial_allocation = INVEST * fund_initial_weights / fund_initial_price
    fund_initial_allocation[np.isinf(fund_initial_allocation)] = 0
    
    #the initial position is defined by the initial allocation and initial price
    fund_initial_positions = fund_initial_allocation * fund_initial_price
    
    #lastly we define the current allocation as the initial allocation as we start at time = 0
    fund_current_allocation = fund_initial_allocation
    
    day_count = 1
    fund_returns[fund['name']] = {} #create the object to store all the results for this specific fund
    for i, current_price in coin_prices.iterrows(): #for each day of pricing data generate a backtest result
        fund_current_positions = fund_current_allocation * current_price
        fund_returns[fund['name']][i] = fund_current_positions

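        # rebalance (a period of 0 marks a HODL fund, which never rebalances)
        current_portfolio_value = fund_current_positions.sum()
        if fund['rebalance'] > 0 and day_count >= fund['rebalance'] and current_price.min() > 0:
            #update the allocation for each token (perform the rebalance)
            fund_current_allocation = current_portfolio_value * fund_initial_weights / current_price
            #restart counting for next rebalance; the increment below also runs on this
            #day, so later rebalances come every (rebalance - 1) days
            day_count = 1
        #increment time
        day_count += 1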
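
The counter logic decides on which days a rebalance is triggered, and it is easy to misjudge by one day. The sketch below isolates just the counter for a positive period; rebalance_days is a hypothetical helper, not part of the notebook.

```python
def rebalance_days(period, n_days):
    """Return the 1-based days on which the counter logic triggers."""
    triggered = []
    day_count = 1
    for day in range(1, n_days + 1):
        if day_count >= period:
            triggered.append(day)
            day_count = 1      # reset, as in the loop above
        day_count += 1         # runs on every day, including reset days
    return triggered

print(rebalance_days(period=7, n_days=20))   # [7, 13, 19]
```

With a period of 7 the first rebalance happens at the close of day 7, so its effect is first visible in the day 8 row. Because the counter is reset to 1 and then incremented on the same day, later rebalances follow every 6 days. Resetting to 0 instead would give a strict 7 day cycle.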

Let's print the top 15 rows from one of the funds to check that they make sense given the portfolio strategy we specified. Printing ETH40, BTC40, USD20 @ 7 shows 3 different coins (eth, btc & usd) in the specified starting ratio of eth: $40\%\times 1000 = \$400$, btc: $40\%\times 1000 = \$400$ and usd: $20\%\times 1000 = \$200$. The first rebalance is executed at the end of the first 7 day period and shows up in the 2017-01-08 row, where some ether has been sold to buy btc and usd. The 2017-01-14 row shows the next trade, which re-establishes the ratios again.

In [25]:
pd.DataFrame(fund_returns['ETH40, BTC40, USD20 @ 7']).T.head(15)
Out[25]:
eth_price btc_price usd_price
2017-01-01 400.000000 400.000000 200.000000
2017-01-02 397.522167 416.071053 200.000000
2017-01-03 428.082109 418.954714 200.000000
2017-01-04 507.227013 442.847903 200.000000
2017-01-05 509.121827 430.489357 200.000000
2017-01-06 486.189724 382.284436 200.000000
2017-01-07 474.966598 356.497918 200.000000
2017-01-08 421.406432 433.275455 206.292903
2017-01-09 439.933965 424.667005 206.292903
2017-01-10 438.794459 428.653228 206.292903
2017-01-11 423.516629 402.082696 206.292903
2017-01-12 401.064128 372.491865 206.292903
2017-01-13 404.398240 383.888907 206.292903
2017-01-14 398.911506 407.903354 198.916010
2017-01-15 403.935266 403.239730 198.916010

Let's first plot the internal workings of one of the funds to see the evolution of its positions over time. We want to see the act of rebalancing and the effect that it has on the fund's components over time.

The plot that is generated shows the values of eth and btc following each other closely, which is to be expected as each has 40% of the fund allocated to it. USD has 20% allocated to it, and we can see its value following at $\approx \frac{1}{2}$ that of the other two. The jagged steps in the usd plot are the periodic rebalances.

In [26]:
#extract the dataframe for this specific portfolio and plot its components over time
df = pd.DataFrame(fund_returns['ETH40, BTC40, USD20 @ 7']).T    

plot_data = []
for coin in df:
    coin_chart_info = go.Scatter(
        x = df.index,
        y = df[coin].tolist(),
        mode = 'lines',
        name = coin[:-6] + "->\t\t$" + str(round(df[coin].iloc[-1],3)))
    plot_data.append(coin_chart_info)

plot_data.sort(key = fundSort, reverse = True) 

layout = go.Layout(
    title = "ETH40, BTC40, USD20 @ 7 Components Value Over Time",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.05, y=0.95))

fig = dict(data=plot_data, layout = layout)
iplot(fig)

Next we will compare all 6 models defined before in one plot. For each model we calculate the performance metrics in the same way we did before, except we now do it for each and every fund and store the results in an array called fund_computed_plot which is used later on in plotting.

In [27]:
fund_computed_plot = []
for fund in fund_returns:
    fund_df = pd.DataFrame(fund_returns[fund]).T
    
    fund_value = fund_df.sum(axis = 1)
    fund_performance = (fund_value.iloc[-1] / fund_value.iloc[0]) * 100
    fund_performance = round(fund_performance, 3)
    fund_plot = go.Scatter(
        x = fund_value.index,
        y = fund_value.tolist(),
        mode = 'lines',
        name = '{0} -> {1}%'.format(fund, fund_performance))
    fund_computed_plot.append(fund_plot)

Sort the funds to get the best at the top of the index.

In [28]:
fund_computed_plot.sort(key = fundSort, reverse = True)

And lastly plot them all on one figure

In [29]:
layout = go.Layout(
    title = "Time Based Rebalancing Strategies",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.7, y=0.9))

fig = dict(data=fund_computed_plot, layout = layout)
iplot(fig)

Threshold Rebalancing Strategies

What if, rather than rebalancing based on a time period, we rebalanced based on a deviation from the desired ratios? This rebalancing method is very similar in practice and only slightly modifies the rebalancing condition.

We start by defining a threshold that quantifies the deviation required to perform a rebalance. To begin with we set this to 0.2. This means that if the weight of any asset differs from its desired weight by more than 0.2 (20 percentage points) on any day, the portfolio performs a rebalance.

In [30]:
THR = 0.2
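
To get a feel for what a threshold of 0.2 means with eight equally weighted assets, the sketch below applies an absolute weight-difference check to two invented sets of position values. needs_rebalance is a hypothetical helper.

```python
import numpy as np

THR = 0.2
target = np.ones(8) / 8

def needs_rebalance(positions):
    weights = positions / positions.sum()
    return bool(np.any(np.abs(weights - target) > THR))

mild = np.array([125.0] * 7 + [400.0])     # largest weight ~0.314
strong = np.array([125.0] * 7 + [500.0])   # largest weight ~0.364

print(needs_rebalance(mild))     # False
print(needs_rebalance(strong))   # True
```

With a target weight of 0.125, a single asset has to reach more than 0.325 of the portfolio before the threshold is crossed, so this setting rebalances only after large moves.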

The initial_allocation calculated earlier, which gives each asset $\frac{1}{8}$ of the 1000 usd starting value, is used again here.

In [31]:
initial_allocation
Out[31]:
array([   125.        ,      0.12873486,     15.18280092,  19201.22887865,
           28.48029164,     11.00739697,      9.23736329, 558035.71428571])

As before we loop through all price information in all_coins_df; for each day we calculate the current positions and then make a decision about rebalancing. The key logic for choosing whether to rebalance is the statement if any(weights_diff > THR), where weights_diff is the absolute difference between the desired weight and the actual weight of each asset at the end of the day. If any asset in the portfolio exceeds the threshold, the whole fund rebalances.

In [32]:
hodl = {}
rb = {}

current_allocation = initial_allocation

for i, current_price in all_coins_df.iterrows(): #for all days price information, we calculate positions
    # hodl positions
    hodl[i] = initial_allocation * current_price
    
    # current positions
    current_positions = current_allocation * current_price
    rb[i] = current_positions
    
    # rebalance
    current_portfolio_value = current_positions.sum()
    current_weights = current_positions / current_portfolio_value
    weights_diff = np.abs(current_weights - initial_weights)
    if any(weights_diff > THR) and current_price.min() > 0:
        current_allocation = current_portfolio_value * initial_weights / current_price

hodl = pd.DataFrame(hodl).T
rb = pd.DataFrame(rb).T

As before we find some performance metrics to compare the portfolios

In [33]:
hodl_value = hodl.sum(axis=1)
rb_value = rb.sum(axis=1)

hodl_perf = hodl_value.iloc[-1] / hodl_value.iloc[0]
rb_perf = rb_value.iloc[-1] / rb_value.iloc[0]

And then plot in the same way as before. We can see that this simple threshold rebalancing strategy outperforms just hodling.

In [34]:
hodlInfo = go.Scatter(
        x = hodl_value.index,
        y = hodl_value.tolist(),
        mode = 'lines',
        name = 'hodl all tokens at {:.1f}%'.format(hodl_perf * 100))

rebalanceInfo = go.Scatter(
        x = rb_value.index,
        y = rb_value.tolist(),
        mode = 'lines',
        name = 'Threshold rebalance {:.1f}%'.format(rb_perf * 100))

plot_data = [hodlInfo, rebalanceInfo]
plot_data.sort(key = fundSort, reverse = True) 

layout = go.Layout(
    title = "Threshold rebalancing",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.8, y=0.9))

fig = dict(data=plot_data, layout = layout)
iplot(fig)

As with the time based rebalancing strategies, it would be ideal if we could predefine a number of portfolios, backtest them as a batch and compare them all at the end. To achieve this we use a similar approach to before, defining a JSON object to store all the funds we want to test. The only difference is that instead of defining a rebalance period we specify the threshold that is required to trigger a rebalance.

In [35]:
funds = [
    {
    'coins': ['eth_price', 'btc_price', 'usd_price'],
    'ratio': [0.4, 0.4, 0.2],
    'threshold': 0.05,
    'name': 'ETH40, BTC40, USD20 @ 0.05'
    },
    {
    'coins': ["btc_price", "eth_price", "xrp_price", "ltc_price", "dash_price", "xmr_price", "doge_price"],
    'ratio': [0.25, 0.25, 0.10, 0.10, 0.10, 0.10, 0.10],
    'threshold': 0.05,
    'name': 'ETH20, BTC20, allOthers10 @ 0.05',
    },
    {
    'coins': ['eth_price', 'btc_price'],
    'ratio': [0.5, 0.5],
    'threshold': 0.10,
    'name': 'ETH50, BTC50 @ 0.10'
    },
    {
    'coins': ['eth_price', 'btc_price'],
    'ratio': [0.5, 0.5],
    'threshold': 0.01,
    'name': 'ETH50, BTC50 @ 0.01'
    },
    {
    'coins': ['eth_price'],
    'ratio': [1],
    'threshold': 0,
    'name': 'ETH100 @ HODL',
    },
    {
    'coins': ['btc_price'],
    'ratio': [1],
    'threshold': 0.,
    'name': 'BTC100 @ HODL',
    }
]
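
Because every fund is just a dictionary, it is easy to mistype a ratio or leave a coin out. The following is a small sanity check one could run before backtesting; it is a sketch only, and `validate_fund`, `good_fund` and `bad_fund` are hypothetical names introduced here for illustration:

```python
def validate_fund(fund, tol=1e-9):
    """Return a list of human-readable problems found in one fund definition."""
    problems = []
    if len(fund['coins']) != len(fund['ratio']):
        problems.append('coins and ratio have different lengths')
    ratio_sum = sum(fund['ratio'])
    if abs(ratio_sum - 1.0) > tol:
        problems.append('ratios sum to {:.4f} instead of 1'.format(ratio_sum))
    if fund['threshold'] < 0:
        problems.append('threshold is negative')
    return problems

# One well-formed fund (copied from the list above) and one deliberately broken one
good_fund = {
    'coins': ['eth_price', 'btc_price', 'usd_price'],
    'ratio': [0.4, 0.4, 0.2],
    'threshold': 0.05,
    'name': 'ETH40, BTC40, USD20 @ 0.05',
}
bad_fund = {
    'coins': ['eth_price', 'btc_price'],
    'ratio': [0.5, 0.4],  # hypothetical typo: only sums to 0.9
    'threshold': 0.05,
    'name': 'hypothetical broken fund',
}

good_problems = validate_fund(good_fund)
bad_problems = validate_fund(bad_fund)
```

Running the same check over the whole list, for example `[validate_fund(f) for f in funds]`, should give an empty list for each fund when the definitions are consistent.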

As before, we now iterate over all funds and calculate the key fund values at the beginning of the period. We then use the price information of each time step to backtest the fund over the whole period, using the rebalancing threshold to decide when to trade.

In [36]:
fund_returns = {}
for fund in funds:
    
    coin_prices = all_coins_df.loc[:,fund['coins']]
    
    fund_initial_weights = np.array(fund['ratio'])
    
    fund_initial_price = coin_prices.head(1)
    fund_initial_price = np.array(fund_initial_price)[0]
    
    fund_initial_allocation = INVEST * fund_initial_weights / fund_initial_price
    fund_initial_allocation[np.isinf(fund_initial_allocation)] = 0
    
    fund_initial_positions = fund_initial_allocation * fund_initial_price
        
    fund_current_allocation = fund_initial_allocation
    
    fund_returns[fund['name']] = {}
    for i, current_price in coin_prices.iterrows():
        fund_current_positions = fund_current_allocation * current_price
        fund_returns[fund['name']][i] = fund_current_positions

        # rebalance
        current_portfolio_value = fund_current_positions.sum()
        fund_current_weights = fund_current_positions / current_portfolio_value
        weights_diff = np.abs(fund_current_weights - fund_initial_weights)
        if any(weights_diff > fund['threshold']) and current_price.min() > 0:
            fund_current_allocation = current_portfolio_value * fund_initial_weights / current_price
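
The loop above can also be wrapped in a reusable function, which makes it easier to test on small inputs. The version below is a sketch of the same logic, not the notebook's own code: the function name `backtest_threshold`, the default `invest=1000.0` and the toy price table are assumptions made for illustration, and it additionally records the dates on which a rebalance was triggered.

```python
import numpy as np
import pandas as pd

def backtest_threshold(prices, weights, threshold, invest=1000.0):
    """Threshold rebalancing over a price table (rows = dates, columns = coins).

    Returns the daily position values and the dates that triggered a rebalance.
    """
    weights = np.asarray(weights, dtype=float)
    with np.errstate(divide='ignore'):
        allocation = invest * weights / prices.iloc[0].to_numpy(dtype=float)
    allocation[~np.isfinite(allocation)] = 0  # coins with no initial price

    positions = {}
    rebalance_dates = []
    for date, price in prices.iterrows():
        price = price.to_numpy(dtype=float)
        current = allocation * price
        positions[date] = current

        total = current.sum()
        drift = np.abs(current / total - weights)
        if (drift > threshold).any() and price.min() > 0:
            # trade back to the target weights at today's prices
            allocation = total * weights / price
            rebalance_dates.append(date)

    frame = pd.DataFrame.from_dict(
        positions, orient='index', columns=list(prices.columns))
    return frame, rebalance_dates

# Hypothetical toy prices: one coin rallies, the other stays flat
toy_prices = pd.DataFrame(
    {'a_price': [1.0, 2.0, 4.0, 4.0], 'b_price': [1.0, 1.0, 1.0, 1.0]},
    index=pd.date_range('2017-01-01', periods=4),
)
toy_positions, toy_rebalances = backtest_threshold(
    toy_prices, weights=[0.5, 0.5], threshold=0.1, invest=100.0)
```

In this toy run the rally pushes the first coin past the threshold on the second and third day, so two rebalances are recorded and the portfolio ends with both positions equal.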
        

It is again useful to look at the portfolio information for one of the funds and track how it evolves with the trades performed. Consider the ETH40, BTC40, USD20 @ 0.05 fund, which starts with the same allocations as the fund examined previously but now only trades when a weight deviates from its target allocation by more than 5 percentage points. In the table below we can see that the prices on 2017-01-06 moved the weights far enough to trigger a rebalance, which shows up as the change in the USD value between 2017-01-06 and 2017-01-07.

In [37]:
pd.DataFrame(fund_returns['ETH40, BTC40, USD20 @ 0.05']).T.head(10)
Out[37]:
            eth_price   btc_price   usd_price
2017-01-01  400.000000  400.000000  200.000000
2017-01-02  397.522167  416.071053  200.000000
2017-01-03  428.082109  418.954714  200.000000
2017-01-04  507.227013  442.847903  200.000000
2017-01-05  509.121827  430.489357  200.000000
2017-01-06  486.189724  382.284436  200.000000
2017-01-07  417.523869  398.560629  213.694832
2017-01-08  426.450065  418.546967  213.694832
2017-01-09  445.199346  410.231147  213.694832
2017-01-10  444.046201  414.081865  213.694832
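
The rebalance can be checked by hand from the table. The sketch below recomputes the weights for the 2017-01-05 and 2017-01-06 rows (values copied from the output above) and compares the largest deviation from the 40/40/20 target with the 0.05 threshold; the helper name `max_drift` is introduced here purely for illustration.

```python
import numpy as np

target = np.array([0.4, 0.4, 0.2])   # ETH, BTC, USD target weights
threshold = 0.05

# Position values copied from the table above (eth, btc, usd)
row_0105 = np.array([509.121827, 430.489357, 200.0])
row_0106 = np.array([486.189724, 382.284436, 200.0])

def max_drift(positions):
    """Largest absolute gap between current and target weights."""
    return np.abs(positions / positions.sum() - target).max()

drift_0105 = max_drift(row_0105)   # roughly 0.047: below the threshold, no trade
drift_0106 = max_drift(row_0106)   # roughly 0.055: above the threshold, rebalance

# USD position after rebalancing on the 2017-01-06 prices
usd_after = target[2] * row_0106.sum()
```

`usd_after` comes out at about 213.6948, which matches the USD value shown from 2017-01-07 onwards.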

Plotting this portfolio's value over time is done in the same way as before. This plot showcases one of the advantages of a threshold-based rebalancing strategy: when there is a lot of market movement the algorithm trades at a much higher frequency, whereas during sideways movement little to no trading is done.

In [38]:
# extract the DataFrame for this specific portfolio and plot its components over time
df = pd.DataFrame(fund_returns['ETH40, BTC40, USD20 @ 0.05']).T    

plot_data = []
for coin in df:
    coin_chart_info = go.Scatter(
        x = df.index,
        y = df[coin].tolist(),
        mode = 'lines',
        name = coin[:-6] + "->\t\t$" + str(round(df[coin].iloc[-1],3)))
    plot_data.append(coin_chart_info)

plot_data.sort(key = fundSort, reverse = True) 

layout = go.Layout(
    title = "ETH40, BTC40, USD20 @ 0.05 Components Value Over Time",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.05, y=0.95))

fig = dict(data=plot_data, layout = layout)
iplot(fig)
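
To put a number on how often the strategy trades, one option is to recover the implied coin holdings (position value divided by price) and flag the days on which they change. This is a sketch under the assumption that all prices are strictly positive; `rebalance_days` and the toy tables are hypothetical and not part of the notebook. Note that a rebalance decided on one day's prices shows up as changed holdings on the following day.

```python
import numpy as np
import pandas as pd

def rebalance_days(positions, prices, rtol=1e-9):
    """Flag days whose implied coin holdings differ from the previous day's."""
    holdings = (positions / prices).to_numpy()
    changed = ~np.isclose(holdings[1:], holdings[:-1], rtol=rtol, atol=0.0)
    # the first day has no previous day to compare against
    flags = np.concatenate([[False], changed.any(axis=1)])
    return pd.Series(flags, index=positions.index)

# Hypothetical toy data: prices and the position values a backtest produced
dates = pd.date_range('2017-01-01', periods=4)
toy_prices = pd.DataFrame(
    {'a_price': [1.0, 2.0, 4.0, 4.0], 'b_price': [1.0, 1.0, 1.0, 1.0]},
    index=dates)
toy_positions = pd.DataFrame(
    {'a_price': [50.0, 100.0, 150.0, 112.5],
     'b_price': [50.0, 50.0, 75.0, 112.5]},
    index=dates)

flags = rebalance_days(toy_positions, toy_prices)
# Number of days with changed holdings per calendar month
per_month = flags.groupby(flags.index.to_period('M')).sum()
```

Applied to a real fund, the monthly counts give a rough view of when trading clusters, which can then be compared against the volatile stretches in the plot.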

We will now combine all the daily information into one plot that represents each fund, and then compare the funds against each other over the whole period.

This plot shows the best performing portfolio of those tested so far: the fund labelled ETH20, BTC20, allOthers10 with threshold-based rebalancing set to 5%. (Note that the ratios in its definition are 25% each for BTC and ETH and 10% for each of the five other coins.) It generated a return of almost 3000% from start to end, which is considerably higher than any of the other strategies. Importantly, it beats both the BTC and ETH hodl strategies, which points to the value of spreading the allocation across other, less correlated assets.

In [39]:
fund_computed_plot = []
for fund in fund_returns:
    fund_df = pd.DataFrame(fund_returns[fund]).T
    
    fund_value = fund_df.sum(axis=1)
    fund_performance = (fund_value.iloc[-1] / fund_value.iloc[0]) * 100
    fund_performance = round(fund_performance, 3)
    fund_plot = go.Scatter(
        x = fund_value.index,
        y = fund_value.tolist(),
        mode = 'lines',
        name = '{0} -> {1}%'.format(fund, fund_performance))
    fund_computed_plot.append(fund_plot)
    
fund_computed_plot.sort(key = fundSort, reverse = True)

layout = go.Layout(
    title = "Threshold Based Rebalancing Strategies",
    autosize=True,
    showlegend=True,
    legend=dict(x=0.7, y=0.9))

fig = dict(data=fund_computed_plot, layout = layout)
iplot(fig)

Extending this Process To Other Models

The process outlined above for each model can be extended to numerous different trading strategies to create a more generic back tester. Introducing fees is important for a final model, but they were excluded here because different strategies require different fee considerations based on traded volume and on market maker and taker fees. All these considerations will fundamentally change the way the models are designed.
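
As one illustration of how fees might enter the rebalancing step, the sketch below charges a flat proportional fee on the value traded. This is a deliberately simplified assumption: real exchanges distinguish maker and taker fees and volume tiers, and the function name `rebalance_with_fee` and the fee rate used are hypothetical. The fee is estimated from the pre-fee trade sizes on both the sell and buy legs and then deducted from the portfolio before it is split at the target weights.

```python
import numpy as np

def rebalance_with_fee(positions, target_weights, fee_rate):
    """Rebalance to target weights, paying fee_rate on the value traded.

    Approximation: the fee is computed from the pre-fee trade sizes and
    deducted from the total before the target weights are applied.
    """
    positions = np.asarray(positions, dtype=float)
    target_weights = np.asarray(target_weights, dtype=float)

    total = positions.sum()
    target = total * target_weights
    traded = np.abs(target - positions).sum()   # sells plus buys
    fee = fee_rate * traded
    new_positions = (total - fee) * target_weights
    return new_positions, fee

# Hypothetical example: a 150/50 portfolio rebalanced to 50/50 with a 1% fee
new_positions, fee = rebalance_with_fee([150.0, 50.0], [0.5, 0.5], fee_rate=0.01)
```

With fees in place, a very small threshold stops being free: every extra rebalance costs a slice of the portfolio, so the threshold becomes a trade-off between tracking the target weights and paying for trades.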

Package Selection:

Why use Cryptory?

It's easy to work with and has lots of data sources. It also comes with tons of code examples, which makes processing data very easy. Its API also exposes other market information such as Google search trends and other awesome things; see here.

An alternative, and also awesome, package is cryptocompy (from here), which lets you access a whole bunch more APIs from CryptoCompare.

Why use Plotly?

Interactive plots are really nice and look good! It also has a really simple API. It is a bit of a pain to get working offline, but I think it's worth it.

Why use pandas & numpy?

There are no better packages for manipulating and working with data and maths!

Conclusion

The returns here are not meant to indicate trading strategies that should be followed, as numerous other complexities have been ignored, such as trading fees, price slippage, exchange front running and tax implications. All of these would need to be taken into account before a portfolio can be considered. This tutorial was more about working with Python, Jupyter, numpy, pandas and plotly on time series data. It also covered processes for reading and processing data.

Further readings

  1. What is rebalancing
  2. Types of strategies
  3. Case Against Rebalancing Your Portfolio
  4. How and when to rebalance your portfolio
  5. Portfolio Rebalancing for Cryptocurrency
  6. The Whitepaper for Portfolio Rebalancing in Crypto. This one is really good.