Hurst exponent with OHLC
lsbardel committed Dec 30, 2024
1 parent 5bb1148 commit 94afa73
Showing 14 changed files with 172 additions and 32 deletions.
1 change: 1 addition & 0 deletions notebooks/api/ta/index.rst
@@ -8,3 +8,4 @@ Timeseries Analysis
:maxdepth: 1

ohlc
paths
@@ -2,7 +2,7 @@
Paths
===========

.. module:: quantflow.utils.paths
.. module:: quantflow.ta.paths

.. autoclass:: Paths
:members:
1 change: 0 additions & 1 deletion notebooks/api/utils/index.rst
@@ -7,5 +7,4 @@ Utils
.. toctree::
:maxdepth: 1

paths
marginal1d
158 changes: 141 additions & 17 deletions notebooks/applications/hurst.md
@@ -6,38 +6,55 @@ jupytext:
    format_version: 0.13
    jupytext_version: 1.16.6
kernelspec:
  display_name: .venv
  language: python
  name: python3
---

# Hurst Exponent

The [Hurst exponent](https://en.wikipedia.org/wiki/Hurst_exponent) is used as a measure of long-term memory of time series. It relates to the auto-correlations of the time series, and the rate at which these decrease as the lag between pairs of values increases.
It is a statistic that can be used to test whether a time series is mean reverting or trending.

The idea behind the Hurst exponent is that if the time series $x_t$ follows a Brownian motion (aka Wiener process), then the variance between two time points will increase linearly with the time difference, that is to say

\begin{align}
\text{Var}(x_{t_2} - x_{t_1}) &\propto t_2 - t_1 \\
&\propto \Delta t^{2H}\\
H &= 0.5
\end{align}

where $H$ is the Hurst exponent.

Trending time series have a Hurst exponent H > 0.5, while mean reverting time series have H < 0.5. Knowing which regime a time series is in can be useful for trading strategies.
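
To make the scaling relation concrete, here is a minimal sketch of the variance-of-increments estimate using plain numpy, independent of the quantflow API (the path length, lag range, and seed are illustrative assumptions):

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(100_000).cumsum()  # Brownian-like path, so H should be close to 0.5

# the variance of increments grows as lag^(2H): fit the slope in log-log space
lags = np.arange(2, 100)
tau = [np.var(x[lag:] - x[:-lag]) for lag in lags]
float(np.polyfit(np.log(lags), np.log(tau), 1)[0]) / 2.0
```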

These are some references to understand the Hurst exponent and its applications:

* [Hurst Exponent for Algorithmic Trading](https://robotwealth.com/demystifying-the-hurst-exponent-part-1/)
* [Basics of Statistical Mean Reversion Testing](https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing/)

## Study with the Wiener Process

We want to construct a mechanism to estimate the Hurst exponent via OHLC data because it is widely available from data providers and easily constructed as an online signal during trading.

In order to evaluate results against known solutions, we consider the Wiener process as the generator of time series.

The Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often also called Brownian motion due to its historical connection with the physical model of Brownian motion of particles in water, named after the botanist Robert Brown.

We use the **WeinerProcess** from the stochastic process library and sample one path over a time horizon of 1 (day) with a time step every second.

```{code-cell} ipython3
from quantflow.sp.weiner import WeinerProcess
p = WeinerProcess(sigma=2.0)
paths = p.sample(n=1, time_horizon=1, time_steps=24*60*60)
paths.plot(title="A path of Weiner process with sigma=2.0")
```

In order to down-sample the timeseries, we need to convert it into a dataframe with dates as the index.

```{code-cell} ipython3
from quantflow.utils.dates import start_of_day
df = paths.as_datetime_df(start=start_of_day(), unit="d").reset_index()
df
```

@@ -50,9 +67,19 @@ The value should be close to the **sigma** of the WeinerProcess defined above.
float(paths.paths_std(scaled=True)[0])
```

The Hurst exponent is evaluated by calculating the variance for several time windows and fitting a line to the log-log plot of the variance versus the time window.

```{code-cell} ipython3
paths.hurst_exponent()
```

As expected, the Hurst exponent is close to 0.5, since we calculated it from the paths of a Wiener process.

+++

### Range-based Variance Estimators

We now turn our attention to range-based variance estimators. These estimators depend on OHLC timeseries, which are widely available from data providers such as [FMP](https://site.financialmodelingprep.com/).
To analyze range-based variance estimators, we use the **quantflow.ta.OHLC** tool, which allows us to down-sample a timeseries to an OHLC series and estimate variance with three different estimators

* **Parkinson** (1980)
@@ -65,14 +92,111 @@ For this we build an OHLC estimator as template and use it to create OHLC estima

```{code-cell} ipython3
import pandas as pd
import polars as pl
import math
from quantflow.ta.ohlc import OHLC

template = OHLC(serie="0", period="10m", rogers_satchell_variance=True, parkinson_variance=True, garman_klass_variance=True)
seconds_in_day = 24*60*60

def rstd(pdf: pl.Series, range_seconds: float) -> float:
    """Calculate the standard deviation from a range-based variance estimator"""
    variance = pdf.mean()
    # scale the variance by the number of seconds in the period
    variance = seconds_in_day * variance / range_seconds
    return math.sqrt(variance)

results = []
for period in ("10s", "20s", "30s", "1m", "2m", "3m", "5m", "10m", "30m"):
    ohlc = template.model_copy(update=dict(period=period))
    rf = ohlc(df)
    ts = pd.to_timedelta(period).to_pytimedelta().total_seconds()
    data = dict(period=period)
    for name in ("pk", "gk", "rs"):
        estimator = rf[f"0_{name}"]
        data[name] = rstd(estimator, ts)
    results.append(data)
vdf = pd.DataFrame(results).set_index("period")
vdf
```
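
A quick sanity check on the scaling in `rstd`: the mean per-bar variance is multiplied by the number of bars in a day before taking the square root, for example by $86400 / 600 = 144$ for the 10m period, so every row of the table is a daily standard deviation directly comparable with the sigma of the process.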

These numbers are different from the realized variance because they are based on the range of the prices, not on the actual prices. The realized variance is a more direct measure of the volatility of the process, while the range-based estimators are more robust to market microstructure noise.

The Parkinson estimator is always higher than both the Garman-Klass and Rogers-Satchell estimators because it uses only the high and low prices, which are at least as far apart as the open and close prices. The GK and RS estimators are similar to each other and more accurate than the Parkinson estimator, especially for longer periods.
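
For reference, these are the standard per-bar variance estimates from the literature (the scaling conventions used by quantflow's implementation may differ), with $o$, $h$, $l$, $c$ denoting log open, high, low and close prices:

\begin{align}
\hat\sigma^2_{pk} &= \frac{(h - l)^2}{4\ln 2} \\
\hat\sigma^2_{gk} &= \frac{1}{2}(h - l)^2 - (2\ln 2 - 1)(c - o)^2 \\
\hat\sigma^2_{rs} &= (h - c)(h - o) + (l - c)(l - o)
\end{align}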

```{code-cell} ipython3
pd.options.plotting.backend = "plotly"
fig = vdf.plot(markers=True, title="Weiner Standard Deviation from Range-based estimators - correct value is 2.0")
fig.show()
```

To estimate the Hurst exponent with the range-based estimators, we calculate the mean range-based variance over several sampling periods and fit a line to the log-log plot of the variance versus the period.

```{code-cell} ipython3
from typing import Sequence
from collections import defaultdict

import numpy as np

from quantflow.ta.ohlc import OHLC
from quantflow.ta.base import DataFrame

default_periods = ("10s", "20s", "30s", "1m", "2m", "3m", "5m", "10m", "30m")

def ohlc_hurst_exponent(
    df: DataFrame,
    series: Sequence[str],
    periods: Sequence[str] = default_periods,
) -> DataFrame:
    results = {}
    estimator_names = ("pk", "gk", "rs")
    for serie in series:
        template = OHLC(
            serie=serie,
            period="10m",
            rogers_satchell_variance=True,
            parkinson_variance=True,
            garman_klass_variance=True,
        )
        time_range = []
        estimators = defaultdict(list)
        for period in periods:
            ohlc = template.model_copy(update=dict(period=period))
            rf = ohlc(df)
            ts = pd.to_timedelta(period).to_pytimedelta().total_seconds()
            time_range.append(ts)
            for name in estimator_names:
                estimators[name].append(rf[f"{serie}_{name}"].mean())
        results[serie] = [
            float(np.polyfit(np.log(time_range), np.log(estimators[name]), 1)[0]) / 2.0
            for name in estimator_names
        ]
    return pd.DataFrame(results, index=estimator_names)
```

```{code-cell} ipython3
ohlc_hurst_exponent(df, series=["0"])
```

The Hurst exponent should again be close to 0.5, since we calculated it from the paths of a Wiener process, but it is not exactly 0.5 because the range-based estimators are not the same as the realized variance. Interestingly, the Parkinson estimator gives a Hurst exponent closer to 0.5 than the Garman-Klass and Rogers-Satchell estimators.

## Mean Reverting Time Series

We now turn our attention to mean reverting time series, where the Hurst exponent is less than 0.5.
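
As a reminder, the Vasicek model follows the standard Ornstein-Uhlenbeck dynamics (quantflow's **Vasicek** exposes the mean-reversion speed as **kappa**; the long-run mean and volatility symbols below are the usual textbook notation):

\begin{align}
dx_t &= \kappa\left(\theta - x_t\right) dt + \sigma dW_t
\end{align}

The larger $\kappa$ is, the faster the path is pulled back towards $\theta$, which is what pushes the Hurst exponent below 0.5.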

```{code-cell} ipython3
from quantflow.sp.ou import Vasicek
import pandas as pd
pd.options.plotting.backend = "plotly"
paths = {f"kappa={k}": Vasicek(kappa=k).sample(n=1, time_horizon=1, time_steps=24*60*6) for k in (1.0, 10.0, 50.0, 100.0, 500.0)}
pdf = pd.DataFrame({k: p.path(0) for k, p in paths.items()}, index=paths["kappa=1.0"].dates(start=start_of_day()))
pdf.plot()
```

We can now estimate the Hurst exponent from the realized variance. As we can see, the Hurst exponent decreases as we increase the mean reversion parameter.

```{code-cell} ipython3
pd.DataFrame({k: [p.hurst_exponent()] for k, p in paths.items()})
```

And we can also estimate the Hurst exponent from the range-based estimators. As we can see, the Hurst exponent decreases as we increase the mean reversion parameter, along the same lines as the realized variance.

```{code-cell} ipython3
ohlc_hurst_exponent(pdf.reset_index(), list(paths), periods=("10m", "20m", "30m", "1h"))
```
2 changes: 1 addition & 1 deletion quantflow/sp/base.py
@@ -7,9 +7,9 @@
from pydantic import BaseModel, ConfigDict, Field
from scipy.optimize import Bounds

from quantflow.ta.paths import Paths
from quantflow.utils.marginal import Marginal1D, default_bounds
from quantflow.utils.numbers import sigfig
from quantflow.utils.paths import Paths
from quantflow.utils.transforms import lower_bound, upper_bound
from quantflow.utils.types import FloatArray, FloatArrayLike, Vector

2 changes: 1 addition & 1 deletion quantflow/sp/bns.py
@@ -4,7 +4,7 @@
from pydantic import Field
from scipy.special import xlogy

from ..utils.paths import Paths
from ..ta.paths import Paths
from ..utils.types import FloatArrayLike, Vector
from .base import Im, StochasticProcess1D
from .ou import GammaOU
2 changes: 1 addition & 1 deletion quantflow/sp/cir.py
@@ -7,7 +7,7 @@

from quantflow.utils.types import FloatArrayLike, Vector

from ..utils.paths import Paths
from ..ta.paths import Paths
from .base import Im, IntensityProcess


2 changes: 1 addition & 1 deletion quantflow/sp/heston.py
@@ -3,8 +3,8 @@
import numpy as np
from pydantic import Field

from ..ta.paths import Paths
from ..utils.distributions import DoubleExponential, Exponential
from ..utils.paths import Paths
from ..utils.types import FloatArrayLike, Vector
from .base import StochasticProcess1D
from .cir import CIR
2 changes: 1 addition & 1 deletion quantflow/sp/jump_diffusion.py
@@ -5,8 +5,8 @@
import numpy as np
from pydantic import Field

from ..ta.paths import Paths
from ..utils.distributions import Normal
from ..utils.paths import Paths
from ..utils.types import FloatArrayLike, Vector
from .base import StochasticProcess1D
from .poisson import CompoundPoissonProcess, D
2 changes: 1 addition & 1 deletion quantflow/sp/ou.py
@@ -7,8 +7,8 @@
from scipy.optimize import Bounds
from scipy.stats import gamma, norm

from ..ta.paths import Paths
from ..utils.distributions import Exponential
from ..utils.paths import Paths
from ..utils.types import Float, FloatArrayLike, Vector
from .base import Im, IntensityProcess
from .poisson import CompoundPoissonProcess, D
2 changes: 1 addition & 1 deletion quantflow/sp/poisson.py
@@ -7,9 +7,9 @@
from scipy.optimize import Bounds
from scipy.stats import poisson

from ..ta.paths import Paths
from ..utils.distributions import Distribution1D
from ..utils.functions import factorial
from ..utils.paths import Paths
from ..utils.transforms import TransformResult
from ..utils.types import FloatArray, FloatArrayLike, Vector
from .base import Im, StochasticProcess1D, StochasticProcess1DMarginal
2 changes: 1 addition & 1 deletion quantflow/sp/weiner.py
@@ -4,7 +4,7 @@
from pydantic import Field
from scipy.stats import norm

from ..utils.paths import Paths
from ..ta.paths import Paths
from ..utils.types import FloatArrayLike, Vector
from .base import StochasticProcess1D

24 changes: 20 additions & 4 deletions quantflow/utils/paths.py → quantflow/ta/paths.py
@@ -9,10 +9,10 @@
from pydantic import BaseModel, Field
from scipy.integrate import cumulative_trapezoid

from . import plot
from .bins import pdf as bins_pdf
from .dates import utcnow
from .types import FloatArray
from quantflow.utils import plot
from quantflow.utils.bins import pdf as bins_pdf
from quantflow.utils.dates import utcnow
from quantflow.utils.types import FloatArray


class Paths(BaseModel, arbitrary_types_allowed=True):
@@ -58,6 +58,10 @@ def ys(self) -> list[list[float]]:
"""Paths as list of list (for visualization tools)"""
return self.data.transpose().tolist() # type: ignore

    def path(self, i: int) -> FloatArray:
        """Path i"""
        return self.data[:, i]

    def dates(
        self, *, start: datetime | None = None, unit: str = "d"
    ) -> pd.DatetimeIndex:
@@ -116,6 +120,18 @@ def integrate(self) -> Paths:
            data=cumulative_trapezoid(self.data, dx=self.dt, axis=0, initial=0),
        )

    def hurst_exponent(self, lags: int | None = None) -> float:
        """Estimate the Hurst exponent of the paths"""
        ts = self.time_steps // 2
        n = min(lags or ts, 100)
        lags = []
        tau = []
        for lag in range(2, n):
            variances = np.var(self.data[lag:, :] - self.data[:-lag, :], axis=0)
            tau.extend(variances)
            lags.extend([lag] * self.samples)
        return float(np.polyfit(np.log(lags), np.log(tau), 1)[0]) / 2.0

    def cross_section(self, t: float | None = None) -> FloatArray:
        """Cross section of paths at time t"""
        index = self.time_steps
2 changes: 1 addition & 1 deletion quantflow_tests/test_utils.py
@@ -1,7 +1,7 @@
import numpy as np

from quantflow.ta.paths import Paths
from quantflow.utils.numbers import round_to_step, to_decimal
from quantflow.utils.paths import Paths


def test_round_to_step():
