In [1]:
import yfinance as yp
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import pandas as pd
THE FOLLOWING INFORMATION IS NOT FINANCIAL ADVICE

First we are going to start by gathering data using yahoo finance's API from corn futures, represented with the ticker "ZC=f" and Soybean futures represented with the ticker "ZS=F". We are also going to create a list called farmEuipmentTickers which is composed of tickers from a variety of companies that produce farming equitment used to produce corn and soybeans. Deere & Company (DE), AGCO corporation (AGCO), Lindsay Corporation (LNN), VanEck Agribusiness ETF (MOO), iShares MSCI Agriculture Producers ETF (VEGI).
In [2]:
corn = yp.Ticker("ZC=F")
soybean = yp.Ticker("ZS=F")

cornHistory = corn.history(period="1y")
soybeanHistory = soybean.history(period="1y")



farmEquipmentTickers = ["DE", "AGCO", "LNN", "MOO", "VEGI"]

timeFrames = ['6mo', '1y', '2y', '5y', '10y', 'ytd', 'max']

performanceTracker_Corn = {}
performanceTracker_Soy = {}

corn = yp.Ticker("ZC=F")
soybean = yp.Ticker("ZS=F")
Next up, we're going to determine what tickers have the strongest correlation coeffiencent (Insert equation for Pearson's correlation coefiicent below) using Pearson's method demonstrated below. This will give us an idea of what equities to look further into when attempting to find a pair for soybean/corn futures.
In [3]:
for ticker in farmEquipmentTickers:
    tempCornPerformance = []
    tempSoyPerformance = []
    stock = yp.Ticker(ticker)

    for tf in timeFrames:

        cornHistory = corn.history(period=tf)

        soybeanHistory = soybean.history(period=tf)

        stockHistory = stock.history(period=tf)

        cornCorr = cornHistory['Close'].corr(stockHistory['Close'])
        soyCorr = soybeanHistory['Close'].corr(stockHistory['Close'])

        tempCornPerformance.append(cornCorr)
        tempSoyPerformance.append(soyCorr)
        
    avgCornCorr = np.mean(tempCornPerformance)
    avgSoyCorr = np.mean(tempSoyPerformance)

    performanceTracker_Corn[ticker] = avgCornCorr
    performanceTracker_Soy[ticker] = avgSoyCorr

print(f"Correlation Coefficient between a ticker and Corn Futures: {performanceTracker_Corn}\nCorrelation Coefficient between a stock and Soybean Futures: {performanceTracker_Soy}")
Correlation Coefficient between a ticker and Corn Futures: {'DE': 0.5122989259702883, 'AGCO': 0.6276846292915437, 'LNN': 0.3988388243725085, 'MOO': 0.6345065838719115, 'VEGI': 0.6398776590590355}
Correlation Coefficient between a stock and Soybean Futures: {'DE': 0.5502785947144005, 'AGCO': 0.7033431173285274, 'LNN': 0.41742724875689596, 'MOO': 0.6667577928988615, 'VEGI': 0.6614483940117848}
As you can see from the dicitonaries above, the equity/future pair with the strongest correlation coefficient was Soybean futures and AGCO. This surprised me due to the fact that corn makes up a larger portion of agriculture grown in America yet the correlation between the tickers and futures was in every case stronger for soybeans.

Next up, we will move on to actually building out the strategy after finding the correlation and pair. To start, we are just simply going to graph the price of Soybean futures along with the price of AGCO in the past year.
In [4]:
agco = yp.Ticker("AGCO")

agcoHistory = corn.history(period="1y")
soybeanHistory = soybean.history(period="1y")

plt.plot(agcoHistory.index, agcoHistory['Close'], label="AGCO")
plt.plot(soybeanHistory.index, soybeanHistory['Close'], label="Soybeans")
plt.xlabel("Time")
plt.legend()
plt.show()
No description has been provided for this image
Well, since the price is so different it's hard to see if there is actually a correlation or not, along with this it is hard to determine if there is actually a valid pair. So we will standardize the graph using z-score (Insert z-score equation below), which is demonstrated below. And I know you may be asking "Is it a legal pair even if the pairing isn't shown without first standardization. However, this process can simply help in identifying true relationships between the stocks by eliminating differences in scale and ensuring that both stocks contribute equally to the analysis.
In [5]:
standardAgco = stats.zscore(agcoHistory['Close'])
standardSoybean = stats.zscore(soybeanHistory['Close'])

plt.plot(agcoHistory.index, standardAgco, label="AGCO")
plt.plot(soybeanHistory.index, standardSoybean, label="Soybeans")
plt.xlabel("Time")
plt.legend()
plt.show()
No description has been provided for this image
Ok, thats better, now as you can see, there is a relativey strong pattern within the movement. Specifically, since we will be using meanreversion, we can see that the equitites cross, then diverge, then cross again multiple times which shows promise in our strategy. Next, we will build out our dataframe. This will include what we've looked at before such as the Soybean futures and AGCO prices, along wih the standardized versions of these prices which we found in the step before for comparison. We will then also find the standardized spread which is the spread between the standardized prices which we will graph, along with the spread of the original prices which we will use later when we convert the standardized standard deviations back to the original.

Along with this, we will also create the column "Spread Z-score" which we will use later. This may cause some confusion as we already have the column "Standardized Spread". However, there is a difference. I know, you may be saying but the stock spread is already standardized with the column "Standardized Spread". However, see the chart below, it's not neccesarily standardized, it's simply the spread of the standardized price. Yes there is a difference... I believe.
In [6]:
df = pd.DataFrame({
    'Soybean Price': soybeanHistory['Close'],
    'AGCO Price': agcoHistory['Close'],
    'Standardized AGCO': standardAgco,
    'Standardized Soybean': standardSoybean
})

df['Original Spread'] = df['Soybean Price'] - df['AGCO Price']
df['Standardized Spread'] = df['Standardized Soybean'] - df['Standardized AGCO']

spreadZscore = stats.zscore(df['Standardized Spread'])
df['Spread Z-score'] = spreadZscore

print(df)
                           Soybean Price  AGCO Price  Standardized AGCO  \
Date                                                                      
2023-08-14 00:00:00-04:00        1391.00      475.75           0.992376   
2023-08-15 00:00:00-04:00        1323.25      464.00           0.583110   
2023-08-16 00:00:00-04:00        1334.75      469.50           0.774681   
2023-08-17 00:00:00-04:00        1336.75      473.00           0.896590   
2023-08-18 00:00:00-04:00        1362.75      479.50           1.122992   
...                                  ...         ...                ...   
2024-08-07 00:00:00-04:00        1020.25      383.25          -2.229502   
2024-08-08 00:00:00-04:00        1009.75      379.25          -2.368827   
2024-08-09 00:00:00-04:00        1028.00      376.75          -2.455904   
2024-08-12 00:00:00-04:00        1012.00      383.25          -2.229502   
2024-08-13 00:00:00-04:00         959.75      396.50          -1.767990   

                           Standardized Soybean  Original Spread  \
Date                                                               
2023-08-14 00:00:00-04:00              1.822068           915.25   
2023-08-15 00:00:00-04:00              1.064243           859.25   
2023-08-16 00:00:00-04:00              1.192877           865.25   
2023-08-17 00:00:00-04:00              1.215248           863.75   
2023-08-18 00:00:00-04:00              1.506074           883.25   
...                                         ...              ...   
2024-08-07 00:00:00-04:00             -2.324999           637.00   
2024-08-08 00:00:00-04:00             -2.442448           630.50   
2024-08-09 00:00:00-04:00             -2.238311           651.25   
2024-08-12 00:00:00-04:00             -2.417280           628.75   
2024-08-13 00:00:00-04:00             -3.001729           563.25   

                           Standardized Spread  Spread Z-score  
Date                                                            
2023-08-14 00:00:00-04:00             0.829692        1.635256  
2023-08-15 00:00:00-04:00             0.481132        0.948273  
2023-08-16 00:00:00-04:00             0.418196        0.824230  
2023-08-17 00:00:00-04:00             0.318658        0.628050  
2023-08-18 00:00:00-04:00             0.383082        0.755024  
...                                        ...             ...  
2024-08-07 00:00:00-04:00            -0.095497       -0.188217  
2024-08-08 00:00:00-04:00            -0.073621       -0.145102  
2024-08-09 00:00:00-04:00             0.217594        0.428860  
2024-08-12 00:00:00-04:00            -0.187778       -0.370096  
2024-08-13 00:00:00-04:00            -1.233739       -2.431599  

[252 rows x 7 columns]
Ok, the dataframe looks good, now we will graph the standardized spread, ideally we want this to cross the x-axis or y = 0 multiple times going back and fourth idicating multiple price convergences of the standardized price along with patterns that we can predict amongst the original prices.
In [7]:
plt.plot(df.index, df['Standardized Spread'])
plt.axhline(0, color='black')
plt.show()
No description has been provided for this image
Ok, graph looks promising, Next up we're going to graph the Spread Z-score which I mentioned earlier. This will specifically indicate when we should enter/exit positions along with when to go long/short.
In [8]:
plt.plot(df.index, df['Spread Z-score'])
plt.axhline(0, color='black')
plt.axhline(-1.0, color='red', linestyle='--')
plt.axhline(1.0, color='orange', linestyle='--')

plt.show()
No description has been provided for this image
Ok, so we know the strategy. When the spread above falls below -1 we go long on Soybean futures while shorting AGCO. if the spread is above 1, we will go long on AGCO while shorting Soybean futures. For further understanding as to why we do this, look at the next graph where I have overlayed the standardized prices of AGCO and Soybean Futures onto the z-score spread.
In [9]:
plt.plot(df.index, df['Spread Z-score'])
plt.axhline(0, color='black')
plt.axhline(-1.0, color='red', linestyle='--')
plt.axhline(1.0, color='orange', linestyle='--')

# New lines
plt.plot(df.index, df['Standardized AGCO'], color='green', label='AGCO Price')
plt.plot(df.index, df['Standardized Soybean'], color='purple', label='Soybean Price')
plt.legend()

plt.show()
No description has been provided for this image
For further explanation, when the z-score spread (blue line) drops below -1, it means the price of AGCO is overvalued compared to Soybean futures and they will converge in the future (standardized pricing), therefore we short AGCO and go long on Soybean futures. When the z-score spread (blue line) goes above 1, it means the price of soybean futures is overvalued compared to AGCO and they will converge in the future (standardized pricing), therefore we short soybean futures and go long on AGCO.
So we know when to short/ go long on AGCO and Soybean futures in terms of 1 and -1, however what does this 
In [127]:
spreadMean = np.mean(df['Original Spread'])
spreadSTD = np.std(df['Original Spread'])
lowerBound = spreadMean - spreadSTD
upperBound = spreadMean + spreadSTD

print(f"1 on z-score graph equals: {upperBound}\n0 on z-score graph equals: {spreadMean}\n-1 on z-score graph equals: {lowerBound}")

# soybeanMean = np.mean(df['Soybean Price'])
# soybeanSTD = np.std(df['Soybean Price'])
# lowerBound = soybeanMean - soybeanSTD
# upperBound = soybeanMean + soybeanSTD

plt.plot(df.index, df['Original Spread'])
plt.axhline(785, color='black')
plt.axhline(851, color='red')
plt.axhline(719, color='green')
plt.show()
1 on z-score graph equals: 851.3186753553349
0 on z-score graph equals: 785.2628458498024
-1 on z-score graph equals: 719.2070163442698
No description has been provided for this image
In [11]:
soybeanMean = np.mean(df['Soybean Price'])
soybeanSTD = np.std(df['Soybean Price'])
lowerBound = soybeanMean - soybeanSTD
upperBound = soybeanMean + soybeanSTD

print(f"1 on z-score graph equals: {upperBound}\n0 on z-score graph equals: {soybeanMean}\n-1 on z-score graph equals: {lowerBound}")
1 on z-score graph equals: 1317.5066807352257
0 on z-score graph equals: 1228.1061507936508
-1 on z-score graph equals: 1138.705620852076

Now we know that we have combined pairs trading and mean reversion into the following strategy, Spread > 1: Long AGCO, Short Soybean Spread < -1: Long Soybean, Short AGCO