From 22932385df3f5df216dcc510b4cdecad343f7642 Mon Sep 17 00:00:00 2001
From: Dongjoon Hyun
Date: Mon, 20 Nov 2023 21:41:35 -0800
Subject: [PATCH] [SPARK-46020][INFRA] Add `Python 3.12` to Infra docker image

### What changes were proposed in this pull request?

This PR aims to add `Python 3.12` to Infra docker images.

Note that `Python 3.12` has a breaking change in the installation.
- The `distutils` module itself is removed in Python 3.12 via [PEP-632](https://peps.python.org/pep-0632) in favor of the `packaging` package.
- Apache Spark 4.0.0 is ready for Python 3.12 via SPARK-45390, which removed the `distutils` usages
  - https://github.com/apache/spark/pull/43192
- However, some 3rd party packages are not ready for Python 3.12, so this PR skips those packages.

### Why are the changes needed?

This PR is a preparation to add a daily `Python 3.12` GitHub Action job later for Apache Spark 4.0.0.

As of today, Apache Spark 4.0.0 has Python 3.8 ~ Python 3.11 test coverage.
- Python 3.9 (Main)
  - https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml
- PyPy3.8, Python 3.10, Python 3.11 (Daily)
  - https://github.com/apache/spark/actions/workflows/build_python.yml

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6939290578 python3.12 --version
Python 3.12.0

$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6939290578 python3.12 -m pip freeze
alembic==1.12.1
blinker==1.7.0
certifi==2019.11.28
chardet==3.0.4
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
contourpy==1.2.0
coverage==7.3.2
cycler==0.12.1
databricks-cli==0.18.0
dbus-python==1.2.16
distro-info==0.23+ubuntu1.1
docker==6.1.3
entrypoints==0.4
et-xmlfile==1.1.0
Flask==3.0.0
fonttools==4.45.0
gitdb==4.0.11
GitPython==3.1.40
googleapis-common-protos==1.56.4
greenlet==3.0.1
gunicorn==21.2.0
idna==2.8
importlib-metadata==6.8.0
itsdangerous==2.1.2
Jinja2==3.1.2
joblib==1.3.2
kiwisolver==1.4.5
lxml==4.9.3
Mako==1.3.0
Markdown==3.5.1
MarkupSafe==2.1.3
matplotlib==3.8.2
mlflow==2.8.1
numpy==1.26.2
oauthlib==3.2.2
openpyxl==3.1.2
packaging==23.2
pandas==2.1.3
Pillow==10.1.0
plotly==5.18.0
protobuf==4.25.1
pyarrow==14.0.1
PyGObject==3.36.0
PyJWT==2.8.0
pyparsing==3.1.1
python-apt==2.0.1+ubuntu0.20.4.1
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
querystring-parser==1.2.4
requests==2.31.0
requests-unixsocket==0.2.0
scikit-learn==1.3.2
scipy==1.11.4
setuptools==45.2.0
six==1.14.0
smmap==5.0.1
SQLAlchemy==2.0.23
sqlparse==0.4.4
tabulate==0.9.0
tenacity==8.2.3
threadpoolctl==3.2.0
typing_extensions==4.8.0
tzdata==2023.3
unattended-upgrades==0.1
unittest-xml-reporting==3.2.0
urllib3==2.1.0
websocket-client==1.6.4
Werkzeug==3.0.1
wheel==0.34.2
zipp==3.17.0
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43922 from dongjoon-hyun/SPARK-46020.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 dev/infra/Dockerfile | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 141c079f3938e..5cf492ad86330 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -127,3 +127,12 @@ RUN python3.11 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' '
 RUN python3.11 -m pip install 'torch<=2.0.1' torchvision --index-url https://download.pytorch.org/whl/cpu
 RUN python3.11 -m pip install torcheval
 RUN python3.11 -m pip install deepspeed
+
+# Install Python 3.12 at the last stage to avoid breaking the existing Python installations
+RUN add-apt-repository ppa:deadsnakes/ppa
+RUN apt-get update && apt-get install -y \
+    python3.12 python3.12-distutils \
+    && rm -rf /var/lib/apt/lists/*
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
+RUN python3.12 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting 'plotly>=4.8' 'mlflow>=2.8.1' coverage matplotlib openpyxl 'scikit-learn>=1.3.2'
+RUN python3.12 -m pip install 'protobuf==4.25.1' 'googleapis-common-protos==1.56.4'