Merge pull request #1 from x512/dev
- enhancement: ``drop_rf()`` calls ``get_factors()`` itself if it is called before ``get_factors()``.
- added: ``.drop_mkt()``, ``--nomkt``
- fix: hml_devil using a persistent cache.
- enhancement: cleanup README.md
- chore: isort imports, some error messages.
- todo: ``hml_devil_factors`` returns HML Devil. Rename HML_Devil...
x512 authored Dec 23, 2023
2 parents bb2ec14 + 4202c44 commit b856900
Showing 11 changed files with 302 additions and 242 deletions.
8 changes: 2 additions & 6 deletions .gitignore
Expand Up @@ -28,16 +28,12 @@ lib64/
MANIFEST
sdist/
var/
venv.bak/
venv/
*venv.bak/
*venv/
wheels/

.nox/
.tox/
.vscode/
.ruff_cache
.cache/
*_venv/

**/*.csv
**/*.xlsx
211 changes: 114 additions & 97 deletions README.md
Expand Up @@ -3,6 +3,7 @@
# getfactormodels

![Python 3.11](https://img.shields.io/badge/Python-3.7+-306998.svg?logo=python&logoColor=ffde57&style=flat-square) ![PyPI - Version](https://img.shields.io/pypi/v/getfactormodels?style=flat-square&label=PyPI)
![PyPI - Status](https://img.shields.io/pypi/status/getfactormodels?style=flat-square)


Reliably retrieve data for various multi-factor asset pricing models.
Expand All @@ -27,141 +28,145 @@ _Thanks to: Kenneth French, Robert Stambaugh, Lin Sun, Zhiguo He, AQR Capital Ma

`getfactormodels` requires Python ``>=3.7``

* Install with pip:
* The easiest way to install getfactormodels is via pip:

```shell
pip install getfactormodels
$ pip install getfactormodels
```

## Usage

>[!WARNING]
>Please be aware that `getfactormodels` was recently released (Dec 20, 2023) and is not stable while this message is displayed.
>[!IMPORTANT]
>![PyPI - Status](https://img.shields.io/pypi/status/getfactormodels?style=flat-square)
>
#### Python
>``getfactormodels`` is new. It was released on December 20, 2023. Don't rely on it for anything.
After installing, import ``getfactormodels`` and call ``get_factors()`` with the ``model`` and ``frequency`` parameters. Optionally, specify a ``start_date`` and ``end_date``
* For example, to retrieve the daily q-factor model data:
After installation, import and call the ``get_factors()`` function with the ``model`` and ``frequency`` params:

```py
import getfactormodels

getfactormodels.get_factors(model='q', frequency='d')
```
> _Trimmed output:_
```txt
> df
Mkt-RF R_ME R_IA R_ROE R_EG RF
date
1967-01-03 0.000778 0.004944 0.001437 -0.007118 -0.008563 0.000187
1967-01-04 0.001667 -0.003487 -0.000631 -0.002044 -0.000295 0.000187
1967-01-05 0.012990 0.004412 -0.005688 0.000838 -0.003075 0.000187
1967-01-06 0.007230 0.006669 0.008897 0.003603 0.002669 0.000187
1967-01-09 0.008439 0.006315 0.000331 0.004949 0.002979 0.000187
... ... ... ... ... ... ...
2022-12-23 0.005113 -0.001045 0.004000 0.010484 0.003852 0.000161
2022-12-27 -0.005076 -0.001407 0.010190 0.009206 0.003908 0.000161
2022-12-28 -0.012344 -0.004354 0.000133 -0.010457 -0.004953 0.000161
2022-12-29 0.018699 0.008568 -0.008801 -0.012686 -0.002162 0.000161
2022-12-30 -0.002169 0.001840 0.001011 -0.004151 -0.003282 0.000161

[14096 rows x 6 columns]
```

* or, retrieve the monthly liquidity factors of Pastor and Stambaugh for the 1990s:

```py
import getfactormodels

df = getfactormodels.get_factors(model='liquidity', frequency='m', start_date='1990-01-01', end_date='1999-12-31')
```
> If you don't have time to type `liquidity`, type `liq`, or `ps`--there's a handy regex.
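The alias matching mentioned above could work roughly as follows. This is a minimal sketch, not the package's actual code: the `MODEL_PATTERNS` table and the `match_model` helper are hypothetical names, and only the `liquidity`, `liq` and `ps` spellings come from this README.

```python
import re

# Hypothetical alias table: canonical model name -> regex of accepted spellings.
# Only the liquidity aliases are taken from the README; "q" is illustrative.
MODEL_PATTERNS = {
    "liquidity": r"liq(uidity)?|ps",
    "q": r"q",
}


def match_model(name: str) -> str:
    """Return the canonical model name for a user-supplied alias."""
    cleaned = name.strip().lower()
    for canonical, pattern in MODEL_PATTERNS.items():
        # fullmatch avoids accepting partial hits such as "psx".
        if re.fullmatch(pattern, cleaned):
            return canonical
    raise ValueError(f"Unknown model: {name!r}")


print(match_model("liq"))
print(match_model("PS"))
```

Normalising the input once (strip, lowercase) keeps the patterns themselves short and readable.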
* For example, retrieving the monthly ${q}^{5}$ factor model:

```python
import getfactormodels

data = getfactormodels.get_factors(model='q', frequency='m')
```

* or, saving the monthly 3-factor model of Fama & French to a file:
> _Trimmed output:_
```txt
> print(data)
Mkt-RF R_ME R_IA R_ROE R_EG RF
date
1967-01-03 0.000778 0.004944 0.001437 -0.007118 -0.008563 0.000187
1967-01-04 0.001667 -0.003487 -0.000631 -0.002044 -0.000295 0.000187
1967-01-05 0.012990 0.004412 -0.005688 0.000838 -0.003075 0.000187
1967-01-06 0.007230 0.006669 0.008897 0.003603 0.002669 0.000187
1967-01-09 0.008439 0.006315 0.000331 0.004949 0.002979 0.000187
... ... ... ... ... ... ...
2022-12-23 0.005113 -0.001045 0.004000 0.010484 0.003852 0.000161
2022-12-27 -0.005076 -0.001407 0.010190 0.009206 0.003908 0.000161
2022-12-28 -0.012344 -0.004354 0.000133 -0.010457 -0.004953 0.000161
2022-12-29 0.018699 0.008568 -0.008801 -0.012686 -0.002162 0.000161
2022-12-30 -0.002169 0.001840 0.001011 -0.004151 -0.003282 0.000161
[14096 rows x 6 columns]
```

```py
import getfactormodels as gfm
* Retrieving the daily data for the Fama-French 3-factor model, since `start_date`:

df = gfm.get_factors(model='ff3', frequency='m', output="ff3_data.csv")
```
>The output parameter accepts a filename, path or directory, and can be one of csv, md, txt, xlsx, pkl.
```python
import getfactormodels as gfm

* You can also import just the models that you need:
df = gfm.get_factors(model='ff3', frequency='d', start_date='2006-01-01')
```

* For example, to import only the *ICR* and *q*-factor models:
* Retrieving data for Stambaugh and Yuan's monthly *Mispricing* factors, between `start_date` and `end_date`, and saving the data to a file:

```py
from getfactormodels import icr_factors, q_factors
```python
import getfactormodels as gfm

df = gfm.get_factors(model='mispricing', start_date='1970-01-01', end_date='1999-12-31', output='mispricing_factors.csv')
```

# Passing a model function with no params defaults to monthly.
df = icr_factors()
>``output`` can be a filename, directory, or path. If no extension is given, it defaults to `.csv`. Supported extensions: `.xlsx`, `.csv`, `.txt`, `.pkl`, `.md`.
# The 'q' models, and the 3-factor model of Fama-French also have weekly data.
df = q_factors(frequency="W", start_date="1992-01-01")
```
You can import only the models that you need:

* If using ``ff_factors()``, then an additional ``model`` parameter should be specified:
* For example, to import only the *ICR* and *q-factor* models:

```py
from getfactormodels import ff_factors
```python
from getfactormodels import icr_factors, q_factors

# To get annual data for the 5-factor model:
data = ff_factors(model="5", frequency="Y", output=".xlsx")
# Passing a model function without params defaults to monthly data.
df = icr_factors()

# Daily 3-factor model data, since 1970 (not specifying an end date
# will return data up until today):
data = ff_factors(model="3", frequency="D", start_date="1970-01-01")
```
> Output allows just an extension to be specified.
# The 'q' models, and the 3-factor model of Fama-French have weekly data available:
df = q_factors(frequency="W", start_date="1992-01-01", output='.xlsx')
```

* or import all the models:
>``output`` also accepts just a file extension (include the leading `.`, otherwise it is treated as a filename).

```py
from getfactormodels.models import models
* When using `ff_factors()`, specify an additional `model` parameter (**this might be changed**):

df = models.barillas_shanken_factors('m')
```python
# To get annual data for the 5-factor model:
data = ff_factors(model="5", frequency="Y", output=".xlsx")

# Daily 3-factor model data, since 1970 (not specifying an end date
# will return data up until today):
data = ff_factors(model="3", frequency="D", start_date="1970-01-01")
```

* There's also the `FactorExtractor` class that the CLI uses (it doesn't really do a whole lot yet):
There's also a ``FactorExtractor`` class. It doesn't do much yet and is mainly used by the CLI (lots to do):

```python
from getfactormodels import FactorExtractor
from getfactormodels import FactorExtractor

fe = FactorExtractor(model='carhart', frequency='m', start_date='1980-01-01', end_date='1980-05-01')
fe.get_factors()
fe.to_file('carhart_factors.md')
```
fe = FactorExtractor(model='carhart', start_date='1980-01-01', end_date='1980-05-01')
fe.get_factors()
fe.drop_rf()
fe.to_file('~/carhart_factors.md')
```

* _The resulting ``carhart_factors.md`` file will look like this:_
* _The resulting ``carhart_factors.md`` file will look like this:_

| date | Mkt-RF | SMB | HML | MOM | RF |
|:--------------------|---------:|--------:|--------:|--------:|-------:|
| 1980-01-31 00:00:00 | 0.0551 | 0.0162 | 0.0175 | 0.0755 | 0.008 |
| 1980-02-29 00:00:00 | -0.0122 | -0.0185 | 0.0061 | 0.0788 | 0.0089 |
| 1980-03-31 00:00:00 | -0.129 | -0.0664 | -0.0101 | -0.0955 | 0.0121 |
| 1980-04-30 00:00:00 | 0.0397 | 0.0105 | 0.0106 | -0.0043 | 0.0126 |
| date | Mkt-RF | SMB | HML | MOM |
|:--------------------|---------:|--------:|--------:|--------:|
| 1980-01-31 00:00:00 | 0.0551 | 0.0162 | 0.0175 | 0.0755 |
| 1980-02-29 00:00:00 | -0.0122 | -0.0185 | 0.0061 | 0.0788 |
| 1980-03-31 00:00:00 | -0.129 | -0.0664 | -0.0101 | -0.0955 |
| 1980-04-30 00:00:00 | 0.0397 | 0.0105 | 0.0106 | -0.0043 |

>``.drop_rf()`` will return the DataFrame without the `RF` column. You can also drop the `Mkt-RF` column with ``.drop_mkt()``.

### CLI

#### Using the CLI
* You can also use getfactormodels from the command line.
``bash >=4.2``

```bash
$ getfactormodels -h
* You can also use getfactormodels from the command line. It's very barebones; here's the `-h` output:

usage: getfactormodels [-h] -m MODEL [-f FREQ] [-s START] [-e END] [-o OUTPUT] [--no_rf]
```
```shell
$ getfactormodels -h

usage: getfactormodels [-h] -m MODEL [-f FREQ] [-s START] [-e END] [-o OUTPUT] [--no_rf] [--no_mkt]
```
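For readers curious how a usage line like the one above comes about, here is a minimal `argparse` sketch that accepts the same options. It is an illustration only and not the package's actual parser: the `build_parser` name, the `dest` names and the monthly default for `-f` are assumptions. The long option names are the ones used in the README's CLI examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options mirror the usage line shown above."""
    parser = argparse.ArgumentParser(prog="getfactormodels")
    # -m is the only required option in the usage line.
    parser.add_argument("-m", "--model", required=True, metavar="MODEL")
    # Assumption: monthly is the default frequency.
    parser.add_argument("-f", "--frequency", default="M", metavar="FREQ")
    parser.add_argument("-s", "--start-date", dest="start", metavar="START")
    parser.add_argument("-e", "--end-date", dest="end", metavar="END")
    parser.add_argument("-o", "--output", metavar="OUTPUT")
    # Boolean switches: present means "drop that column".
    parser.add_argument("--no_rf", action="store_true", help="drop the RF column")
    parser.add_argument("--no_mkt", action="store_true", help="drop the Mkt-RF column")
    return parser


args = build_parser().parse_args(["-m", "ff3", "-f", "M", "--no_rf"])
print(args.model, args.frequency, args.no_rf, args.no_mkt)
```

Using `store_true` for `--no_rf` and `--no_mkt` means both default to `False`, so the full factor table is returned unless a switch is given.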

* An example of how to use the CLI to retrieve the Fama-French 3-factor model data:
```bash
getfactormodels --model ff3 --frequency M --start-date 1960-01-01 --end-date 2020-12-31 --output "filename.csv"
```
> Accepted file extensions are .csv, .txt, .xlsx, and .md. If no extension is given, the output file will be .csv. The --output flag accepts a filename, a file path or a directory. If only an extension is provided, a name will be generated (include the leading `.`, otherwise the value is treated as a filename).
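The rules in the note above can be sketched as a small path-resolution helper. This is one possible reading, not the package's implementation: `resolve_output`, `ALLOWED_EXTENSIONS` and the generated `factors` file name are hypothetical.

```python
from pathlib import Path

# Extensions listed in the CLI note above.
ALLOWED_EXTENSIONS = {".csv", ".txt", ".xlsx", ".md"}


def resolve_output(output: str, default_stem: str = "factors") -> Path:
    """Turn an --output value into a concrete file path (hypothetical helper)."""
    # Extension only, e.g. ".xlsx": generate a file name in the current directory.
    if output in ALLOWED_EXTENSIONS:
        return Path(default_stem + output)
    path = Path(output).expanduser()
    # Existing directory: place a generated .csv file inside it.
    if path.is_dir():
        return path / f"{default_stem}.csv"
    # No extension: default to .csv.
    if path.suffix == "":
        return path.with_suffix(".csv")
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported extension: {path.suffix}")
    return path


print(resolve_output(".xlsx"))
print(resolve_output("ff3_data"))
print(resolve_output("filename.md"))
```

The extension-only case is checked on the raw string first because `pathlib` treats a name such as `.xlsx` as a hidden file with no suffix.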

* Here's another example that retrieves the annual Fama-French 5-factor data without the RF column:

```sh
getfactormodels -m 5 -f Y -s 1960-01-01 -e 2020-12-31 --no_rf -o ~/some_dir/filename.xlsx
```shell
$ getfactormodels --model ff3 --frequency M --start-date 1960-01-01 --end-date 2020-12-31 --output ".csv"
```
> `--no_rf` will return the factor model without an RF column.

* Here's another example that retrieves the annual Fama-French 5-factor data without the RF column (using ``--no_rf``):

```shell
$ getfactormodels -m ff5 -f Y -s 1960-01-01 -e 2020-12-31 --no_rf -o ~/some_dir/filename.xlsx
```
* To return the factors without the risk-free rate `RF`, or the excess market return `Mkt-RF`, columns:

## Data Availability

>[TODO]

## References
1. <a id="1"></a> E. F. Fama and K. R. French, ‘Common risk factors in the returns on stocks and bonds’, *Journal of Financial Economics*, vol. 33, no. 1, pp. 3–56, 1993. [PDF](https://people.duke.edu/~charvey/Teaching/BA453_2006/FF_Common_risk.pdf)
Expand Down Expand Up @@ -200,3 +205,15 @@ After installing, import ``getfactormodels`` and call ``get_factors()`` with the

[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat-square&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Ruff](https://img.shields.io/badge/-ruff-%23261230?style=flat-square&logo=ruff&logoColor=d7ff64)](https://simpleicons.org/?q=ruff)
---

#### Known issues

* The first `hml_devil_factors()` retrieval is slow, because the download from aqr.com is slow. It's the only model, so far, implementing a cache: daily data expires at the end of the day and is only re-downloaded when the requested `end_date` exceeds the file's last index date. Monthly data is handled similarly, expiring at the end of the month and re-downloaded when needed.
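The expiry rule described above can be sketched with the standard library. This is one reading of that description, not the package's cache code: `cache_expiry` and `needs_download` are hypothetical names, and the sketch assumes a re-download happens only when the cached file has expired and the requested end date lies beyond the cached data.

```python
import calendar
from datetime import date


def cache_expiry(downloaded_on: date, frequency: str) -> date:
    """Return the last day on which a cached file still counts as fresh."""
    if frequency.lower() == "d":
        # Daily data expires at the end of the day it was downloaded.
        return downloaded_on
    # Monthly data expires at the end of the download month.
    last_day = calendar.monthrange(downloaded_on.year, downloaded_on.month)[1]
    return downloaded_on.replace(day=last_day)


def needs_download(downloaded_on: date, last_index_date: date,
                   requested_end: date, today: date, frequency: str = "d") -> bool:
    """Re-download only if the cache expired AND newer data is requested."""
    expired = today > cache_expiry(downloaded_on, frequency)
    return expired and requested_end > last_index_date


downloaded = date(2023, 12, 20)
print(cache_expiry(downloaded, "m"))
print(needs_download(downloaded, date(2023, 12, 19), date(2023, 12, 22),
                     today=date(2023, 12, 21), frequency="d"))
```

Requests that fall entirely within the cached date range are served from the cache even after expiry, which avoids the slow download whenever possible.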

#### Todo

- [ ] Docs
- [ ] Examples
- [ ] Tests
- [ ] Error handling
8 changes: 4 additions & 4 deletions getfactormodels/__init__.py
Expand Up @@ -10,8 +10,8 @@
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
Expand All @@ -20,10 +20,10 @@
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
__version__ = "0.0.3"
__version__ = "0.0.4"

from .__main__ import FactorExtractor, get_factors
from .models import models # noqa: F401
from .models import models # noqa: F401, RUF100 (silent flake8 in VScode)
from .models.models import (barillas_shanken_factors, carhart_factors,
dhs_factors, ff_factors, hml_devil_factors,
icr_factors, liquidity_factors, mispricing_factors,