[SPARK-50477][INFRA][FOLLOW-UP] Python 3.9 testing image clean up
### What changes were proposed in this pull request?
1. Remove the duplicate installation of `Python 3.9` (it was installed twice);
2. Add `apt-get autoremove` and `apt-get clean`;
3. Explicitly install `tzdata`, which is needed for timezone-related tests (it was previously installed together with Python 3.9).
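
As a rough illustration of item 3, the sketch below checks whether a zoneinfo database is available. `has_zone` is a hypothetical helper, not part of the Spark build scripts, and it assumes the usual Debian/Ubuntu location `/usr/share/zoneinfo`, where the `tzdata` package places its zone files.

```shell
#!/bin/sh
# Hypothetical helper (not from the Spark repo): succeed if the zone file
# for the given zone name exists under the zoneinfo directory.
# $1 = zone name, $2 = zoneinfo directory (assumed default: /usr/share/zoneinfo)
has_zone() {
    zone="$1"
    dir="${2:-/usr/share/zoneinfo}"
    [ -f "$dir/$zone" ]
}

# Report whether a commonly used zone is available in this environment.
if has_zone "America/Los_Angeles"; then
    echo "tzdata present"
else
    echo "tzdata missing"
fi
```

Running a check like this inside the built image is one way to confirm that the explicit `tzdata` install took effect.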

### Why are the changes needed?
To clean up the Python 3.9 testing image.

### Does this PR introduce _any_ user-facing change?
No, infra-only.

### How was this patch tested?
Tested in the PR builder with the following default:
```
default: '{"PYSPARK_IMAGE_TO_TEST": "python-309", "PYTHON_TO_TEST": "python3.9"}'
```
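
To show how the two keys in that default relate to the files in this commit, here is a minimal sketch that pulls both values out of the JSON string and derives the Dockerfile path. `json_get` is a hypothetical helper that only handles flat objects with string values; how the actual workflow consumes these variables is an assumption here, not something the commit shows.

```shell
#!/bin/sh
# The default env map used for the test run, copied from the PR description.
envs='{"PYSPARK_IMAGE_TO_TEST": "python-309", "PYTHON_TO_TEST": "python3.9"}'

# Hypothetical helper: print the string value of key $1 in the flat JSON
# object $2. This is a sed-based sketch, not a general JSON parser.
json_get() {
    printf '%s\n' "$2" |
        sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

image=$(json_get PYSPARK_IMAGE_TO_TEST "$envs")
python=$(json_get PYTHON_TO_TEST "$envs")

# The image name matches the directory of the Dockerfile changed here.
echo "dev/spark-test-image/$image/Dockerfile"
echo "$python"
```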

https://github.com/zhengruifeng/spark/runs/34168664848

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49123 from zhengruifeng/py_image_309_followup.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
zhengruifeng committed Dec 10, 2024
1 parent a89fcfc commit 5f34af9
Showing 1 changed file with 8 additions and 10 deletions.
18 changes: 8 additions & 10 deletions dev/spark-test-image/python-309/Dockerfile
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE 20241119
+ENV FULL_REFRESH_DATE 20241205

 ENV DEBIAN_FRONTEND noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN true
@@ -51,29 +51,27 @@ RUN apt-get update && apt-get install -y \
 libtiff5-dev \
 libxml2-dev \
 openjdk-17-jdk-headless \
 pandoc \
 pkg-config \
 qpdf \
+tzdata \
 software-properties-common \
 wget \
-zlib1g-dev \
-&& rm -rf /var/lib/apt/lists/*
+zlib1g-dev

 # Install Python 3.9
 RUN add-apt-repository ppa:deadsnakes/ppa
 RUN apt-get update && apt-get install -y \
-python3.9 python3.9-distutils \
+python3.9 \
+python3.9-distutils \
+&& apt-get autoremove --purge -y \
+&& apt-get clean \
 && rm -rf /var/lib/apt/lists/*

 ARG BASIC_PIP_PKGS="numpy pyarrow>=18.0.0 six==1.16.0 pandas==2.2.3 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.67.0 grpcio-status==1.67.0 protobuf==5.28.3 googleapis-common-protos==1.65.0 graphviz==0.20.3"

-# Install Python 3.9
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
-python3.9 python3.9-distutils \
-&& rm -rf /var/lib/apt/lists/*
+# Install Python 3.9 packages
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
 RUN python3.9 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this
 RUN python3.9 -m pip install --force $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \
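
The duplicated install block removed in this diff is the kind of repetition a simple line count can flag. The following is a minimal sketch under stated assumptions: `count_line` is a hypothetical helper, and the temporary file is a reduced stand-in for the pre-cleanup Dockerfile, not its real contents.

```shell
#!/bin/sh
# Hypothetical helper (not from the Spark repo): count how many lines of
# file $2 are exactly equal to the string $1.
count_line() {
    # -c: print the count, -x: match whole lines, -F: fixed string
    grep -cxF -- "$1" "$2"
}

# Reduced stand-in for the Dockerfile before the cleanup.
dockerfile=$(mktemp)
cat > "$dockerfile" <<'EOF'
RUN add-apt-repository ppa:deadsnakes/ppa
ARG BASIC_PIP_PKGS="numpy"
RUN add-apt-repository ppa:deadsnakes/ppa
EOF

n=$(count_line "RUN add-apt-repository ppa:deadsnakes/ppa" "$dockerfile")
if [ "$n" -gt 1 ]; then
    echo "duplicate install step found ($n occurrences)"
fi
rm -f "$dockerfile"
```

Pointing the same check at the real `dev/spark-test-image/python-309/Dockerfile` after this change should report a single occurrence of the PPA line.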