Commit

feat: switch from edi-energy.de and scraping to bdew-mako.de and a real(ly bad) API; drop support for Python 3.9 (#261)

---------

Co-authored-by: Konstantin <konstantin.klein+github@hochfrequenz.de>
hf-kklein and Konstantin authored Jan 11, 2025
1 parent 987db13 commit f1321a2
Showing 31 changed files with 32,019 additions and 22,914 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/unittests.yml
@@ -6,7 +6,7 @@ jobs:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
        os: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v4
2 changes: 1 addition & 1 deletion .gitignore
@@ -2,7 +2,7 @@
__pycache__/
*.py[cod]
*$py.class
-
+foo
# C extensions
*.so

24 changes: 16 additions & 8 deletions README.md
@@ -1,18 +1,18 @@
# edi-energy.de scraper

<!--- you need to replace the `organization/repo_name` in the status badge URLs --->

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
![Unittests status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Unittests/badge.svg)
![Coverage status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Coverage/badge.svg)
![Linting status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Linting/badge.svg)
![Black status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Black/badge.svg)
![PyPi Status Badge](https://img.shields.io/pypi/v/edi_energy_scraper)
![Python Versions (officially) supported](https://img.shields.io/pypi/pyversions/edi_energy_scraper.svg)

-The Python package `edi_energy_scraper` provides easy to use methods to mirror the website edi-energy.de.
+The Python package `edi_energy_scraper` provides easy to use methods to mirror the free documents on bdew-mako.de.

### Rationale / Why?

-If you'd like to be informed about new regulations or data formats being published on edi-energy.de you can either
+If you'd like to be informed about new regulations or data formats being published on bdew-mako.de you can either

- visit the site every day and hope that you see the changes if this is your favourite hobby,
- or automate the task.
@@ -21,7 +21,6 @@ This repository helps you with the latter. It allows you to create an up-to-date
computer. Other than if you mirrored the files using `wget` or `curl`, you'll get a clean and intuitive directory
structure.


From there you can e.g. commit the files into a VCS (like e.g. our [edi_energy_mirror](https://github.com/Hochfrequenz/edi_energy_mirror)), scrape the PDF/Word files for later use...

We're all hoping for the day of true digitization on which this repository will become obsolete.
@@ -46,6 +45,7 @@ Then import it and start the download:
import asyncio
from edi_energy_scraper import EdiEnergyScraper


# add the following lines to enable debug logging to stdout (CLI)
# import logging
# import sys
@@ -68,20 +68,28 @@ This creates a directory structure:
```
-|-your_script_cwd.py
|-edi_energy_de
-    |- past (contains archived files)
+    |- FV2310 (contains files valid since 2023-10-01)
        |- ahb.pdf
        |- ahb.docx
        |- ...
-    |- current (contains files valid as of today)
+    |- FV2404 (contains files valid since 2024-04-03)
        |- mig.pdf
        |- mig.docx
        |- ...
-    |- future (contains files valid in the future)
+    |- FV2504 (contains files valid since 2025-06-06)
        |- allgemeine_festlegungen.pdf
        |- schema.xsd
        |- ...
```
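
The per-format-version layout shown in the tree can be walked with the standard library alone. The following is a minimal sketch, not part of `edi_energy_scraper`: the helper names `nominal_year_month` and `files_by_format_version` are hypothetical, and it assumes that folder names always follow the `FV` + two-digit year + two-digit month pattern seen in the tree. The decoded month is only nominal; as the tree itself shows (`FV2404` valid since 2024-04-03, `FV2504` valid since 2025-06-06), the actual validity start can differ.

```python
import re
from pathlib import Path

# Assumption: format-version folders are named "FV" + two-digit year + two-digit month.
FV_PATTERN = re.compile(r"^FV(\d{2})(\d{2})$")


def nominal_year_month(folder_name: str) -> tuple[int, int]:
    """Decode e.g. 'FV2310' into (2023, 10); this is the nominal, not the legal, start."""
    match = FV_PATTERN.match(folder_name)
    if match is None:
        raise ValueError(f"not a format-version folder: {folder_name!r}")
    return 2000 + int(match.group(1)), int(match.group(2))


def files_by_format_version(mirror_root: Path) -> dict[str, list[str]]:
    """Map each FVyymm folder below mirror_root to the sorted names of its files."""
    result: dict[str, list[str]] = {}
    for child in sorted(mirror_root.iterdir()):
        # skip anything that is not a folder matching the assumed naming pattern
        if child.is_dir() and FV_PATTERN.match(child.name):
            result[child.name] = sorted(p.name for p in child.iterdir() if p.is_file())
    return result
```

After a mirror run, `files_by_format_version(Path("edi_energy_de"))` would give a quick overview of what was downloaded per format version, provided the mirror directory exists at that location.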

> [!TIP]
> You can extract the information encoded into the filenames:
> ```python
> from edi_energy_scraper import DocumentMetadata
> structured_information = DocumentMetadata.from_filename("AHB_COMDIS_1.0f_99991231_20250605_20250605_8872.pdf")
> # DocumentMetadata(kind='MIG', edifact_format=<EdifactFormat.REQOTE: 'REQOTE'>, valid_from=datetime.date(2023, 9, 30), valid_unt...traordinary_publication=True, is_error_correction=False, is_informational_reading_version=True, additional_text=None, id=10071)
```
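
For readers without the package at hand, the underscore-separated layout of such a filename can be approximated with the standard library. This sketch is not how `DocumentMetadata.from_filename` works internally: `split_document_filename` and its return keys are hypothetical names, and because the meaning of the individual date fields is not spelled out here, the dates are only collected in order of appearance.

```python
from datetime import date
from pathlib import Path


def split_document_filename(filename: str) -> dict:
    """Split a mirrored filename into date-like and other tokens (illustrative only)."""
    tokens = Path(filename).stem.split("_")
    dates, other_tokens = [], []
    for token in tokens:
        if len(token) == 8 and token.isdigit():
            try:
                # Assumption: 8-digit tokens are dates in YYYYMMDD notation.
                dates.append(date(int(token[:4]), int(token[4:6]), int(token[6:])))
                continue
            except ValueError:
                pass  # not a valid calendar date; keep it as an ordinary token
        other_tokens.append(token)
    return {"dates": dates, "other_tokens": other_tokens}


parts = split_document_filename("AHB_COMDIS_1.0f_99991231_20250605_20250605_8872.pdf")
print(parts["other_tokens"])
print(parts["dates"])
```

For anything beyond a quick inspection, the typed `DocumentMetadata` model from the package is the better choice, since it knows what each field means.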
## How to use this Repository on Your Machine (for development)

Please follow the instructions in
7 changes: 5 additions & 2 deletions mwe.py
@@ -3,12 +3,15 @@
"""

import asyncio
+from pathlib import Path

-from edi_energy_scraper import EdiEnergyScraper
+from edi_energy_scraper.scraper import EdiEnergyScraper
+
+my_target_dir = Path(__file__).parent / "foo"


async def mirror():
-    scraper = EdiEnergyScraper(path_to_mirror_directory="edi_energy_de")
+    scraper = EdiEnergyScraper(path_to_mirror_directory=my_target_dir)
    await scraper.mirror()


16 changes: 8 additions & 8 deletions pyproject.toml
@@ -14,7 +14,7 @@ build-backend = "hatchling.build"
name = "edi_energy_scraper"
description = "a scraper to mirror edi-energy.de"
license = { text = "MIT" }
-requires-python = ">=3.9"
+requires-python = ">=3.10"
authors = [
{ name = "Hochfrequenz Unternehmensberatung GmbH", email = "info+github@hochfrequenz.de" },
]
@@ -27,19 +27,19 @@ classifiers = [
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3 :: Only",
-"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
]
dependencies = [
"beautifulsoup4>=4.11.1",
"aiohttp>=3.8.4",
-"aiohttp-requests>=0.2.2",
"pypdf>=3.4.1",
"efoli>=1.4.0",
-"pytz>=2022.7.1",
+"pydantic>=2",
+"pytz>=2024.2",
+"more_itertools"
]
dynamic = ["readme", "version"]

@@ -66,8 +66,8 @@ tests = [
"freezegun==1.5.1",
"pytest==8.3.4",
"pytest-asyncio==0.25.0",
-"pytest-datafiles==3.0.0",
-"pytest-mock==3.14.0"
+"pytest-mock==3.14.0",
+"syrupy==4.8.0"
]
type_check = [
"mypy==1.14.1",
@@ -115,9 +115,9 @@ max-line-length = 120
# even if they have no @pytest.mark.asyncio marker.
# https://github.com/pytest-dev/pytest-asyncio#auto-mode
asyncio_mode = "auto"
-markers = ["datafiles"]

-# the following lines are needed if you would like to build a python package
+markers = ["snapshot: mark a test as a snapshot test"]
+# the following lines are needed if you would like to build a python package,
# and you want to use semantic versioning
# [build-system]
# requires = ["setuptools>=41.0", "wheel", "setuptools_scm[toml]>=3.4"]
28 changes: 15 additions & 13 deletions requirements.txt
@@ -4,22 +4,14 @@
#
# pip-compile pyproject.toml
#
aiohttp[speedups]==3.9.5
# via
# aiohttp-requests
# edi_energy_scraper (pyproject.toml)
aiohttp-requests==0.2.4
aiohttp==3.9.5
# via edi_energy_scraper (pyproject.toml)
aiosignal==1.3.1
# via aiohttp
annotated-types==0.7.0
# via pydantic
attrs==24.2.0
# via aiohttp
beautifulsoup4==4.12.3
# via edi_energy_scraper (pyproject.toml)
brotli==1.1.0
# via aiohttp
coworker==2.0.1
# via aiohttp-requests
efoli==1.4.0
# via edi_energy_scraper (pyproject.toml)
frozenlist==1.5.0
@@ -28,17 +20,27 @@ frozenlist==1.5.0
# aiosignal
idna==3.10
# via yarl
more-itertools==10.5.0
# via edi_energy_scraper (pyproject.toml)
multidict==6.1.0
# via
# aiohttp
# yarl
propcache==0.2.1
# via yarl
pydantic==2.10.4
# via edi_energy_scraper (pyproject.toml)
pydantic-core==2.27.2
# via pydantic
pypdf==5.1.0
# via edi_energy_scraper (pyproject.toml)
pytz==2024.2
# via
# edi_energy_scraper (pyproject.toml)
# efoli
soupsieve==2.6
# via beautifulsoup4
typing-extensions==4.12.2
# via
# pydantic
# pydantic-core
yarl==1.18.3
# via aiohttp