Dockerization of sge #25

Open
punowo opened this issue Feb 15, 2023 · 8 comments

Comments


punowo commented Feb 15, 2023

Thanks for maintaining this.

This isn't an issue per se (feel free to close it), but more of an inquiry: I've been trying, unsuccessfully, to dockerize this software for testing purposes, building on both https://github.com/kgutwin/simple-sge and https://github.com/gawbul/docker-sge.

Ideally, simple-sge is exactly what I need, with a few small modifications (some added packages). But since I can't build a Docker image from the simple-sge files, I've been trying both to dockerize this version of SGE and to adapt the gawbul/docker-sge files to what I need.

I stopped my initial attempt when I couldn't get past yes "" | ./install_qmaster:

#FROM ubuntu:latest
FROM jrei/systemd-ubuntu:latest
ENV TERM=xterm SGE_ROOT=/opt/sge
WORKDIR /tmp
COPY . .
RUN apt update
RUN apt install -y dos2unix sudo git build-essential libhwloc-dev libssl-dev libtirpc-dev libmotif-dev libxext-dev libncurses-dev libdb5.3-dev libpam0g-dev pkgconf libsystemd-dev cmake
RUN find . -type f -exec dos2unix {} \;
RUN cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/opt/sge
RUN cmake --build build -j
RUN sudo cmake --install build

USER root
RUN chmod 777 /opt/sge
RUN useradd -r -d /opt/sge sge
RUN chown -R sge /opt/sge
WORKDIR /opt/sge
RUN echo ${SGE_ROOT}
RUN echo | ./install_qmaster
CMD ["echo", "done"]
#RUN yes "" | ./install_execd
#RUN source /opt/sge/default/common/settings.sh
#RUN qhost -q #you should be able to see five lines of output
#RUN qconf -as $HOSTNAME #add this node as submit host

Some time later I found out about inst_sge, because it is used in the Dockerfile of gawbul/docker-sge. I'm posting the whole file for reference (please ignore the Docker installation part; in theory, I'm trying to expose docker commands to SGE):

FROM phusion/baseimage:bionic-1.0.0

# expose ports
EXPOSE 6444
EXPOSE 6445
EXPOSE 6446

# set environment variables
ARG HOME=/root

# regenerate host ssh keys
RUN /etc/my_init.d/00_regen_ssh_host_keys.sh

# install required software
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y \
    && apt-get install -y wget sudo bsd-mailx tcsh db5.3-util libhwloc5 libmunge2 libxm4 libjemalloc1 xterm openjdk-8-jre-headless \
    && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# turn off password requirement for sudo groups users
RUN sed -i "s/^\%sudo\tALL=(ALL:ALL)\sALL/%sudo ALL=(ALL) NOPASSWD:ALL/" /etc/sudoers

# Download and install debian packages from Son of Grid Engine
RUN wget --no-check-certificate https://master.dl.sourceforge.net/project/gridengine/SGE/releases/8.1.9/sge-common_8.1.9_all.deb -P /root/
RUN wget --no-check-certificate https://master.dl.sourceforge.net/project/gridengine/SGE/releases/8.1.9/sge-doc_8.1.9_all.deb -P /root/
RUN wget --no-check-certificate https://master.dl.sourceforge.net/project/gridengine/SGE/releases/8.1.9/sge_8.1.9_amd64.deb -P /root/
RUN dpkg -i /root/*.deb

# Refresh local apt keys & update
RUN apt-key adv --refresh-keys --keyserver keyserver.ubuntu.com \
    && apt-get update -qq

# Install docker
RUN apt-get remove docker docker-engine docker.io containerd runc -y
RUN apt-get update -y
RUN apt-get install ca-certificates curl gnupg lsb-release -y
RUN mkdir -m 0755 -p /etc/apt/keyrings
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
RUN echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
RUN apt-get update -y
RUN apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y

# add files to container from local directory
ADD sge_auto_install.conf /root/sge_auto_install.conf
ADD docker_sge_init.sh /etc/my_init.d/01_docker_sge_init.sh
ADD sge_exec_host.conf /root/sge_hostgrp.conf
ADD sge_exec_host.conf /root/sge_exec_host.conf
ADD sge_queue.conf /root/sge_queue.conf
RUN chmod ug+x /etc/my_init.d/01_docker_sge_init.sh

# setup SGE env
ARG SGE_ROOT=/opt/sge
ARG SGE_CELL=default
RUN ln -s $SGE_ROOT/$SGE_CELL/common/settings.sh /etc/profile.d/sge_settings.sh

# install SGE
RUN useradd -r -m -U -G sudo -d /home/sgeuser -s /bin/bash -c "Docker SGE user" sgeuser
RUN cd $SGE_ROOT && ./inst_sge -m -x -s -auto $HOME/sge_auto_install.conf \
    && sleep 10 \
    && /etc/init.d/sgemaster.docker-sge restart \
    && /etc/init.d/sgeexecd.docker-sge restart \
    && sed -i "s/HOSTNAME/`hostname`/" $HOME/sge_exec_host.conf \
    && sed -i "s/HOSTNAME/`hostname`/" $HOME/sge_hostgrp.conf \
    && /opt/sge/bin/lx-amd64/qconf -au sgeuser arusers \
    && /opt/sge/bin/lx-amd64/qconf -Me $HOME/sge_exec_host.conf \
    && /opt/sge/bin/lx-amd64/qconf -Aq $HOME/sge_queue.conf

# clean up
RUN rm /root/*.deb

ADD ./entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

USER sgeuser
ENTRYPOINT ["/entrypoint.sh"]

I can't seem to add a non-root user, or start as a non-root user, in a container started from an image based on phusion/baseimage (phusion/baseimage-docker#617). I don't know whether the documentation they have on GitHub also applies to ubuntu:bionic, and I can't work out what this command from the simple-sge files (which I am not able to build) does:

# allow root to qsub (yes, it's a security hole but it simplifies the container)
update_conf "/min_/s/100/0/; s/posix_compliant/unix_behavior/" conf

Since I can't use qsub as root, I'm thinking about trying to dockerize daimh/sge again, this time using inst_sge some.conf instead of yes "" | ./install_qmaster and similar scripts (gawbul/docker-sge@06145c5).
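
For readers puzzling over the same line, here is a minimal sketch of what the sed expression itself does. It assumes update_conf simply passes the named configuration through sed; the helper is defined in the simple-sge install script, which is not shown in this thread, so `rewrite_conf` and the sample settings below are hypothetical stand-ins. On lines containing min_ the first 100 becomes 0, and posix_compliant becomes unix_behavior wherever it appears.

```shell
# Hypothetical stand-in for simple-sge's update_conf helper: it only applies
# the same sed expression to whatever configuration text arrives on stdin.
rewrite_conf() {
    sed '/min_/s/100/0/; s/posix_compliant/unix_behavior/'
}

# Sample settings (illustrative values, not copied from a real cluster).
sample_conf() {
    printf '%s\n' \
        'min_uid                      100' \
        'min_gid                      100' \
        'shell_start_mode             posix_compliant'
}

# Print the configuration after the rewrite.
sample_conf | rewrite_conf
```

In an SGE global configuration the min_ lines would presumably be min_uid and min_gid; lowering them to 0 is what the original comment describes as allowing root to qsub.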

Do you know of anyone else who has attempted this? What other problems might I encounter? Am I doing something wrong? Again, I know SGE isn't supposed to run in a container; this is just for testing purposes.

Thank you, have a nice day.

EDIT: added a few links + fixes

Author

punowo commented Feb 15, 2023

I've managed to build simple-sge. ( https://github.com/kgutwin/simple-sge )

FROM ubuntu:16.04

# SET UP THE REPOSITORY
# Update the apt package index and install packages to allow apt to use a repository over HTTPS:
RUN apt-get update -y 
RUN apt-get remove -y docker docker-engine docker.io containerd runc
RUN apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common

# Add Docker’s official GPG key:
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

# Verify that you now have the key with the fingerprint 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88, 
# by searching for the last 8 characters of the fingerprint.
RUN  apt-key fingerprint 0EBFCD88

# Use the following command to set up the stable repository. 
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"

# Update the apt package index, and install the latest version of Docker Engine and containerd, 
# or go to the next step to install a specific version:
RUN apt-get update -y
RUN apt-get install -y docker-ce docker-ce-cli containerd.io

ADD install-sge.sh /root/install-sge.sh
ADD boot-sge.sh /root/boot-sge.sh

RUN apt-get update -y && apt-get upgrade -y && bash /root/install-sge.sh
ENV PATH="${PATH}:/opt/sge/bin/lx-amd64"

ENTRYPOINT ["/root/boot-sge.sh"]

but I get this error:

sgeadmin@08d1087ab8af:/opt/sge/bin/lx-amd64$ ./qstat -f
error: cell directory "/opt/sge/default" doesn't exist

Owner

daimh commented Feb 17, 2023

Thanks for trying this repo!

I noticed that your code is

RUN echo | ./install_qmaster

can you please change it to the line below and try it again?

RUN yes "" | ./install_qmaster

Author

punowo commented Feb 20, 2023

> Thanks for trying this repo!
>
> I noticed that your code is
>
> RUN echo | ./install_qmaster
>
> can you please change it to the line below and try it again?
>
> RUN yes "" | ./install_qmaster

RUN yes "" didn't work; that is why I was looking for something else, hence the echo, which acts like pressing ENTER because of the newline it outputs. Looking back, I probably should have run an instance outside Docker for some time before attempting this, but I needed a weird environment that made qsub and qstat available. I can't keep looking at this right now, but I'll post a zip of my current working effort, based on material found on GitHub, in case anyone stumbles upon this.
sge.zip (please be aware that this was for testing purposes and is probably done badly; there's also some redundant ssh stuff)
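
To make the difference between the two pipelines concrete, here is a minimal sketch. It does not reproduce install_qmaster: `fake_installer` is a hypothetical stand-in that waits for three answers on stdin. `echo` supplies a single newline, so only the first prompt gets an answer before end of input, while `yes ""` keeps supplying empty lines for as long as the reader wants them.

```shell
# Hypothetical stand-in for an interactive installer: it waits for three
# answers on stdin and reports how many it actually received before EOF.
fake_installer() {
    answered=0
    for _prompt in 1 2 3; do
        if IFS= read -r _reply; then
            answered=$((answered + 1))
        fi
    done
    echo "$answered"
}

with_echo=$(echo | fake_installer)
# yes is stopped by a broken pipe once the reader exits, hence the "|| true".
with_yes=$(yes "" | fake_installer) || true

echo "echo   answered $with_echo of 3 prompts"
echo "yes \"\" answered $with_yes of 3 prompts"
```

Whether the real installer tolerates end of input after the first prompt is not something this sketch can show.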

Owner

daimh commented Mar 5, 2023

'yes "" | ./install_qmaster' in Docker failed at the last step. The error message is:

System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
sgeqmaster start problem

This is both a weird and an interesting use case, and I am not sure you can achieve your goal. For example, if you only need to run 'qsub/qstat', you don't need to run qmaster inside Docker.
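
The systemd error above appears whenever systemd is not PID 1, which is the usual situation in a container. A minimal sketch of a check a start-up script could make first, assuming the convention documented for sd_booted(3) that the directory /run/systemd/system exists only when systemd is the running init; the function name and the root-prefix argument are hypothetical and exist only to make the check testable.

```shell
# Returns 0 when systemd appears to be the running init system.
# The optional argument is a root prefix, used only to make the check testable.
has_systemd() {
    root_prefix=${1:-}
    [ -d "$root_prefix/run/systemd/system" ]
}

if has_systemd; then
    echo "systemd is running: a systemctl-based service start should work"
else
    echo "no systemd here: start the daemon script directly instead"
fi
```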

Owner

daimh commented Mar 5, 2023

Further, after 'cmake install..', you can run 'qsub/qstat' as long as you have copied /opt/sge/default/common from the master node and set the SGE_ROOT environment variable.

Please post an update. I am interested as I do see the value of your 'weird environment'. :D
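
A minimal sketch of a pre-flight check for that client-only setup, assuming the layout named above (SGE_ROOT plus a cell directory, "default" unless SGE_CELL says otherwise). The function name and messages are hypothetical. A missing cell directory is also what the earlier error `cell directory "/opt/sge/default" doesn't exist` points at.

```shell
# Verify that a client-only container has what qsub/qstat need to find the
# master: SGE_ROOT set and the cell's common directory copied in.
check_sge_client_env() {
    sge_root=${1:-${SGE_ROOT:-}}
    sge_cell=${2:-${SGE_CELL:-default}}
    if [ -z "$sge_root" ]; then
        echo "SGE_ROOT is not set"
        return 1
    fi
    if [ ! -d "$sge_root/$sge_cell/common" ]; then
        echo "missing $sge_root/$sge_cell/common (copy it from the master node)"
        return 1
    fi
    echo "ok: $sge_root/$sge_cell/common is present"
}

# Report on the current environment without aborting the calling script.
check_sge_client_env || true
```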

Author

punowo commented Mar 13, 2023

Sorry I wasn't able to get back to this; I've been moved to something else and I'm no longer able to touch this setup.

Owner

daimh commented Mar 19, 2023

Thanks for the update!

daimh closed this as completed Mar 19, 2023

ml-evs commented Sep 20, 2024

I just wanted to necro this issue (please feel free to tell me to open a new one!) as I've been going through a similar process of containerising this version of SGE for our test runner at https://github.com/Matgenix/jobflow-remote. I have a multi-stage build that swaps in either Slurm or SGE (and probably others in the future). I hit the same problem described above, i.e., sgemaster fails to launch.

Some differences:

  • I'm using an Ubuntu 24.04 base without systemd, so I run sgemaster manually

  • I have to run ./install_qmaster in the container entrypoint, not at build time, so that the docker hostname is recognised. Ideally I would just use localhost and configure things at build time but this seemed to be non-trivial! (I was hoping to get it working at container runtime then try to manipulate the configs without using the helper scripts)

  • yes "" | ./install_qmaster works as root (via sudo), with the only error being that it cannot launch directly with systemd. When I then manually run sgemaster start I hit the message above:

    jobflow@993181bbdd7b:~$ /opt/sge/default/common/sgemaster start
       Starting Grid Engine qmaster
    
    sge_qmaster start problem
    
    sge_qmaster didn't start!
    

    This is after sourcing the default settings. Any ideas on where to begin debugging this? If I can get this working, I'd be happy to contribute a dockerized build back to the CI of this project.
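
The runtime flow described in the list above can be sketched as a dry run. This only prints the steps an entrypoint would take and executes nothing; it assumes /opt/sge as the default root and treats an existing cell directory as "already installed", which is a guess rather than documented installer behaviour. `plan_sge_startup` is a hypothetical name. It does not address the `sge_qmaster start problem` itself.

```shell
# Prints the startup steps an entrypoint would take; nothing is executed.
# Arguments default to the SGE_ROOT/SGE_CELL conventions used in this thread.
plan_sge_startup() {
    sge_root=${1:-${SGE_ROOT:-/opt/sge}}
    sge_cell=${2:-${SGE_CELL:-default}}
    if [ ! -d "$sge_root/$sge_cell/common" ]; then
        # No cell yet: the installer has to run now, with the runtime hostname.
        echo "cd $sge_root && yes \"\" | ./install_qmaster"
    fi
    # Without systemd, the generated start script is called directly.
    echo "$sge_root/$sge_cell/common/sgemaster start"
}

plan_sge_startup
```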

EDIT: This warned me off trying to run ./install_qmaster during the build step:

0.296 Unsupported local hostname
0.296 --------------------------
0.296
0.296 The current hostname is resolved as follows:
0.296
0.298 Hostname: localhost
0.298 Aliases: buildkitsandbox
0.298 Host Address(es): 127.0.0.1
0.299
0.299 It is not supported for a Grid Engine installation that the local hostname
0.299 contains the hostname "localhost" and/or the IP address "127.0.x.x" of the
0.299 loopback interface.
0.299 The "localhost" hostname should be reserved for the loopback interface
0.299 ("127.0.0.1") and the real hostname should be assigned to one of the
0.299 physical or logical network interfaces of this machine.
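
A minimal sketch of a check that anticipates this message before the installer runs. It is not the installer's own logic: `is_loopback` is a hypothetical helper that mirrors the wording of the message (the name "localhost" or an address in 127.0.x.x), and the lookup assumes `getent` is available in the image.

```shell
# True when a hostname or address is one the installer message rejects:
# the name "localhost" or anything in 127.0.x.x.
is_loopback() {
    case "$1" in
        localhost|localhost.*|127.0.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Resolve the current hostname and report whether it lands on loopback.
host=$(hostname 2>/dev/null || echo unknown)
addr=$(getent hosts "$host" 2>/dev/null | awk '{print $1; exit}') || true
if is_loopback "$host" || is_loopback "${addr:-}"; then
    echo "$host resolves to loopback: expect 'Unsupported local hostname'"
else
    echo "$host resolves to ${addr:-an unknown address}"
fi
```

In the build log above the sandbox hostname resolves to 127.0.0.1, so a check like this would flag it at build time, whereas a running container normally gets a non-loopback address.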

Thanks for the hard work packaging up SGE in this way, outside of Docker this works nicely for us!

daimh reopened this Sep 21, 2024