In [1]:
import yfinance as yp
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import pandas as pd
THE FOLLOWING INFORMATION IS NOT FINANCIAL ADVICE
First we are going to start by gathering data using yahoo finance's API from corn futures, represented with the ticker "ZC=f" and Soybean futures represented with the ticker "ZS=F". We are also going to create a list called farmEuipmentTickers which is composed of tickers from a variety of companies that produce farming equitment used to produce corn and soybeans. Deere & Company (DE), AGCO corporation (AGCO), Lindsay Corporation (LNN), VanEck Agribusiness ETF (MOO), iShares MSCI Agriculture Producers ETF (VEGI).
In [2]:
corn = yp.Ticker("ZC=F")
soybean = yp.Ticker("ZS=F")
cornHistory = corn.history(period="1y")
soybeanHistory = soybean.history(period="1y")
farmEquipmentTickers = ["DE", "AGCO", "LNN", "MOO", "VEGI"]
timeFrames = ['6mo', '1y', '2y', '5y', '10y', 'ytd', 'max']
performanceTracker_Corn = {}
performanceTracker_Soy = {}
corn = yp.Ticker("ZC=F")
soybean = yp.Ticker("ZS=F")
Next up, we're going to determine what tickers have the strongest correlation coeffiencent (Insert equation for Pearson's correlation coefiicent below) using Pearson's method demonstrated below. This will give us an idea of what equities to look further into when attempting to find a pair for soybean/corn futures.
In [3]:
for ticker in farmEquipmentTickers:
tempCornPerformance = []
tempSoyPerformance = []
stock = yp.Ticker(ticker)
for tf in timeFrames:
cornHistory = corn.history(period=tf)
soybeanHistory = soybean.history(period=tf)
stockHistory = stock.history(period=tf)
cornCorr = cornHistory['Close'].corr(stockHistory['Close'])
soyCorr = soybeanHistory['Close'].corr(stockHistory['Close'])
tempCornPerformance.append(cornCorr)
tempSoyPerformance.append(soyCorr)
avgCornCorr = np.mean(tempCornPerformance)
avgSoyCorr = np.mean(tempSoyPerformance)
performanceTracker_Corn[ticker] = avgCornCorr
performanceTracker_Soy[ticker] = avgSoyCorr
print(f"Correlation Coefficient between a ticker and Corn Futures: {performanceTracker_Corn}\nCorrelation Coefficient between a stock and Soybean Futures: {performanceTracker_Soy}")
Correlation Coefficient between a ticker and Corn Futures: {'DE': 0.5122989259702883, 'AGCO': 0.6276846292915437, 'LNN': 0.3988388243725085, 'MOO': 0.6345065838719115, 'VEGI': 0.6398776590590355} Correlation Coefficient between a stock and Soybean Futures: {'DE': 0.5502785947144005, 'AGCO': 0.7033431173285274, 'LNN': 0.41742724875689596, 'MOO': 0.6667577928988615, 'VEGI': 0.6614483940117848}
As you can see from the dicitonaries above, the equity/future pair with the strongest correlation coefficient was Soybean futures and AGCO. This surprised me due to the fact that corn makes up a larger portion of agriculture grown in America yet the correlation between the tickers and futures was in every case stronger for soybeans.
Next up, we will move on to actually building out the strategy after finding the correlation and pair. To start, we are just simply going to graph the price of Soybean futures along with the price of AGCO in the past year.
In [4]:
agco = yp.Ticker("AGCO")
agcoHistory = corn.history(period="1y")
soybeanHistory = soybean.history(period="1y")
plt.plot(agcoHistory.index, agcoHistory['Close'], label="AGCO")
plt.plot(soybeanHistory.index, soybeanHistory['Close'], label="Soybeans")
plt.xlabel("Time")
plt.legend()
plt.show()
Well, since the price is so different it's hard to see if there is actually a correlation or not, along with this it is hard to determine if there is actually a valid pair. So we will standardize the graph using z-score (Insert z-score equation below), which is demonstrated below. And I know you may be asking "Is it a legal pair even if the pairing isn't shown without first standardization. However, this process can simply help in identifying true relationships between the stocks by eliminating differences in scale and ensuring that both stocks contribute equally to the analysis.
In [5]:
standardAgco = stats.zscore(agcoHistory['Close'])
standardSoybean = stats.zscore(soybeanHistory['Close'])
plt.plot(agcoHistory.index, standardAgco, label="AGCO")
plt.plot(soybeanHistory.index, standardSoybean, label="Soybeans")
plt.xlabel("Time")
plt.legend()
plt.show()
Ok, thats better, now as you can see, there is a relativey strong pattern within the movement. Specifically, since we will be using meanreversion, we can see that the equitites cross, then diverge, then cross again multiple times which shows promise in our strategy. Next, we will build out our dataframe. This will include what we've looked at before such as the Soybean futures and AGCO prices, along wih the standardized versions of these prices which we found in the step before for comparison. We will then also find the standardized spread which is the spread between the standardized prices which we will graph, along with the spread of the original prices which we will use later when we convert the standardized standard deviations back to the original.
Along with this, we will also create the column "Spread Z-score" which we will use later. This may cause some confusion as we already have the column "Standardized Spread". However, there is a difference. I know, you may be saying but the stock spread is already standardized with the column "Standardized Spread". However, see the chart below, it's not neccesarily standardized, it's simply the spread of the standardized price. Yes there is a difference... I believe.
In [6]:
df = pd.DataFrame({
'Soybean Price': soybeanHistory['Close'],
'AGCO Price': agcoHistory['Close'],
'Standardized AGCO': standardAgco,
'Standardized Soybean': standardSoybean
})
df['Original Spread'] = df['Soybean Price'] - df['AGCO Price']
df['Standardized Spread'] = df['Standardized Soybean'] - df['Standardized AGCO']
spreadZscore = stats.zscore(df['Standardized Spread'])
df['Spread Z-score'] = spreadZscore
print(df)
Soybean Price AGCO Price Standardized AGCO \ Date 2023-08-14 00:00:00-04:00 1391.00 475.75 0.992376 2023-08-15 00:00:00-04:00 1323.25 464.00 0.583110 2023-08-16 00:00:00-04:00 1334.75 469.50 0.774681 2023-08-17 00:00:00-04:00 1336.75 473.00 0.896590 2023-08-18 00:00:00-04:00 1362.75 479.50 1.122992 ... ... ... ... 2024-08-07 00:00:00-04:00 1020.25 383.25 -2.229502 2024-08-08 00:00:00-04:00 1009.75 379.25 -2.368827 2024-08-09 00:00:00-04:00 1028.00 376.75 -2.455904 2024-08-12 00:00:00-04:00 1012.00 383.25 -2.229502 2024-08-13 00:00:00-04:00 959.75 396.50 -1.767990 Standardized Soybean Original Spread \ Date 2023-08-14 00:00:00-04:00 1.822068 915.25 2023-08-15 00:00:00-04:00 1.064243 859.25 2023-08-16 00:00:00-04:00 1.192877 865.25 2023-08-17 00:00:00-04:00 1.215248 863.75 2023-08-18 00:00:00-04:00 1.506074 883.25 ... ... ... 2024-08-07 00:00:00-04:00 -2.324999 637.00 2024-08-08 00:00:00-04:00 -2.442448 630.50 2024-08-09 00:00:00-04:00 -2.238311 651.25 2024-08-12 00:00:00-04:00 -2.417280 628.75 2024-08-13 00:00:00-04:00 -3.001729 563.25 Standardized Spread Spread Z-score Date 2023-08-14 00:00:00-04:00 0.829692 1.635256 2023-08-15 00:00:00-04:00 0.481132 0.948273 2023-08-16 00:00:00-04:00 0.418196 0.824230 2023-08-17 00:00:00-04:00 0.318658 0.628050 2023-08-18 00:00:00-04:00 0.383082 0.755024 ... ... ... 2024-08-07 00:00:00-04:00 -0.095497 -0.188217 2024-08-08 00:00:00-04:00 -0.073621 -0.145102 2024-08-09 00:00:00-04:00 0.217594 0.428860 2024-08-12 00:00:00-04:00 -0.187778 -0.370096 2024-08-13 00:00:00-04:00 -1.233739 -2.431599 [252 rows x 7 columns]
Ok, the dataframe looks good, now we will graph the standardized spread, ideally we want this to cross the x-axis or y = 0 multiple times going back and fourth idicating multiple price convergences of the standardized price along with patterns that we can predict amongst the original prices.
In [7]:
plt.plot(df.index, df['Standardized Spread'])
plt.axhline(0, color='black')
plt.show()
Ok, graph looks promising, Next up we're going to graph the Spread Z-score which I mentioned earlier. This will specifically indicate when we should enter/exit positions along with when to go long/short.
In [8]:
plt.plot(df.index, df['Spread Z-score'])
plt.axhline(0, color='black')
plt.axhline(-1.0, color='red', linestyle='--')
plt.axhline(1.0, color='orange', linestyle='--')
plt.show()
Ok, so we know the strategy. When the spread above falls below -1 we go long on Soybean futures while shorting AGCO. if the spread is above 1, we will go long on AGCO while shorting Soybean futures. For further understanding as to why we do this, look at the next graph where I have overlayed the standardized prices of AGCO and Soybean Futures onto the z-score spread.
In [9]:
plt.plot(df.index, df['Spread Z-score'])
plt.axhline(0, color='black')
plt.axhline(-1.0, color='red', linestyle='--')
plt.axhline(1.0, color='orange', linestyle='--')
# New lines
plt.plot(df.index, df['Standardized AGCO'], color='green', label='AGCO Price')
plt.plot(df.index, df['Standardized Soybean'], color='purple', label='Soybean Price')
plt.legend()
plt.show()
For further explanation, when the z-score spread (blue line) drops below -1, it means the price of AGCO is overvalued compared to Soybean futures and they will converge in the future (standardized pricing), therefore we short AGCO and go long on Soybean futures. When the z-score spread (blue line) goes above 1, it means the price of soybean futures is overvalued compared to AGCO and they will converge in the future (standardized pricing), therefore we short soybean futures and go long on AGCO.
So we know when to short/ go long on AGCO and Soybean futures in terms of 1 and -1, however what does this
In [127]:
spreadMean = np.mean(df['Original Spread'])
spreadSTD = np.std(df['Original Spread'])
lowerBound = spreadMean - spreadSTD
upperBound = spreadMean + spreadSTD
print(f"1 on z-score graph equals: {upperBound}\n0 on z-score graph equals: {spreadMean}\n-1 on z-score graph equals: {lowerBound}")
# soybeanMean = np.mean(df['Soybean Price'])
# soybeanSTD = np.std(df['Soybean Price'])
# lowerBound = soybeanMean - soybeanSTD
# upperBound = soybeanMean + soybeanSTD
plt.plot(df.index, df['Original Spread'])
plt.axhline(785, color='black')
plt.axhline(851, color='red')
plt.axhline(719, color='green')
plt.show()
1 on z-score graph equals: 851.3186753553349 0 on z-score graph equals: 785.2628458498024 -1 on z-score graph equals: 719.2070163442698
In [11]:
soybeanMean = np.mean(df['Soybean Price'])
soybeanSTD = np.std(df['Soybean Price'])
lowerBound = soybeanMean - soybeanSTD
upperBound = soybeanMean + soybeanSTD
print(f"1 on z-score graph equals: {upperBound}\n0 on z-score graph equals: {soybeanMean}\n-1 on z-score graph equals: {lowerBound}")
1 on z-score graph equals: 1317.5066807352257 0 on z-score graph equals: 1228.1061507936508 -1 on z-score graph equals: 1138.705620852076
Now we know that we have combined pairs trading and mean reversion into the following strategy, Spread > 1: Long AGCO, Short Soybean Spread < -1: Long Soybean, Short AGCO