Add CI for Singularity Image #7

Closed
arpita0911patel opened this issue Feb 13, 2024 · 18 comments · Fixed by #13
Labels
enhancement New feature or request

Comments

@arpita0911patel
Member

No description provided.

@arpita0911patel arpita0911patel added the enhancement New feature or request label Feb 13, 2024
@benlee0423
Contributor

benlee0423 commented Feb 28, 2024

  1. Merged the singularity directory into the main branch.
  2. Apply the Docker changes to the singularity directory in the docker_changes branch (TODO).

@benlee0423
Contributor

benlee0423 commented Feb 29, 2024

In line 32 of singularity_ngen.def:
cp /opt/ohpc/admin/modulefiles/spack /apps/modulesfiles/all

This gives permission for all users to access modules.

@benlee0423
Contributor

In install_netcdf_cxx.sh
https://api.github.com/repos/Unidata/netcdf-cxx4/releases/latest

Verify
BOOST_VERSION=1.79.0

@benlee0423
Contributor

Singularity> cat /usr/include/boost/version.hpp | grep "BOOST_LIB_VERSION"
// BOOST_LIB_VERSION must be defined to be the same as BOOST_VERSION
#define BOOST_LIB_VERSION "1_75"
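
The header check above can be turned into a scripted comparison against the required version. The following is only a sketch: the function names are hypothetical, the header path is the one shown in the shell session, and it assumes a `sort` that supports `-V` (GNU coreutils).

```shell
# Hypothetical helpers; only BOOST_LIB_VERSION, 1_75 and 1.79.0 come from the thread.

# Convert a BOOST_LIB_VERSION value such as "1_75" to dotted form "1.75.0".
# Boost omits the patch component when it is zero, so it is added back here.
boost_lib_to_dotted() {
    local v
    v="$(printf '%s' "$1" | tr '_' '.')"
    case "$v" in
        *.*.*) ;;            # already has a patch component
        *) v="$v.0" ;;
    esac
    printf '%s\n' "$v"
}

# Succeed when version $1 is at least version $2 (relies on sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the raw BOOST_LIB_VERSION value from a Boost header, if readable.
installed_boost_version() {
    local header="${1:-/usr/include/boost/version.hpp}"
    [ -r "$header" ] || return 1
    sed -n 's/^#define BOOST_LIB_VERSION "\(.*\)"/\1/p' "$header"
}

required="1.79.0"
found="$(boost_lib_to_dotted "1_75")"   # value reported in the session above
if version_at_least "$found" "$required"; then
    echo "Boost $found satisfies >= $required"
else
    echo "Boost $found is older than required $required"
fi
```

Inside the image, `installed_boost_version` could supply the value instead of the hard-coded `1_75`, so a CI step can fail before the ngen build starts.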

@benlee0423
Contributor

benlee0423 commented Mar 5, 2024

singularity run --bind /home/ubuntu/workspace/input/AWI_004:/ngen/ngen/data ngen.sif /ngen/ngen/data
Select an option (type a number): 
1) Run NextGen model framework in serial mode	 3) Run Bash shell
2) Run NextGen model framework in parallel mode	 4) Exit
#? 2


Selected files:
Catchment: ./config/datastream.gpkg
Nexus: ./config/datastream.gpkg
Realization: ./config/realization.json

/ngen/HelloNGEN.sh: line 56: /dmod/bin/partitionGenerator: No such file or directory
Singularity> ls -lh /dmod/bin
total 0
lrwxrwxrwx 1 root root 24 Mar  1 05:18 ngen-parallel -> /ngen/parallelbuild/ngen
lrwxrwxrwx 1 root root 22 Mar  1 05:18 ngen-serial -> /ngen/serialbuild/ngen
lrwxrwxrwx 1 root root 38 Mar  1 05:18 partitionGenerator -> /ngen/parallelbuild/partitionGenerator
Singularity> ls -lh /ngen/parallelbuild/partitionGenerator
ls: cannot access '/ngen/parallelbuild/partitionGenerator': No such file or directory
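
The listing above shows a symlink in /dmod/bin whose target was never built. A CI job could catch this class of problem right after the image build with a check along these lines. This is a sketch, not part of the repository: `check_links` is a hypothetical name, and the demonstration uses a temporary directory in place of /dmod/bin.

```shell
# Report symlinks in a directory whose target does not exist.
# Returns non-zero if any dangling link is found, so a CI step can fail on it.
check_links() {
    local dir="$1" status=0 entry
    for entry in "$dir"/*; do
        # -L: entry is a symlink; ! -e: its target does not resolve
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            echo "dangling: $entry -> $(readlink "$entry")"
            status=1
        fi
    done
    return "$status"
}

# Self-contained demonstration in a temporary directory.
demo="$(mktemp -d)"
touch "$demo/real_binary"
ln -s "$demo/real_binary" "$demo/ngen-serial"
ln -s "$demo/missing_binary" "$demo/partitionGenerator"
check_links "$demo" || echo "image check failed"
rm -rf "$demo"
```

Run inside the container, the call would be `check_links /dmod/bin`; how it is wired into the CI workflow is left open here.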

@benlee0423
Contributor

Getting a build error because of the Boost version.
Unpacking objects: 100% (14/14), 3.71 KiB | 1.24 MiB/s, done.
From https://github.com/csdms/bmi-example-c

 * branch 93e9ef960d2e4d5132fba638e445379a20ac0259 -> FETCH_HEAD

Currently Loaded Modules:
  1) mpi/openmpi-x86_64

CMake Error at CMakeLists.txt:44 (find_path):
Could not find NETCDF_MODULE_DIR using the following files: netcdf.mod

gmake: Makefile: No such file or directory
gmake: *** No rule to make target 'Makefile'. Stop.
chmod: cannot access '/dmod/bin/*': No such file or directory
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find Boost: Found unsuitable version "1.75.0", but required is at
least "1.79.0" (found /usr/include, )
Call Stack (most recent call first):
/usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:592 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake/Modules/FindBoost.cmake:2344 (find_package_handle_standard_args)
CMakeLists.txt:168 (find_package)

gmake: Makefile: No such file or directory
gmake: *** No rule to make target 'Makefile'. Stop.
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find Boost: Found unsuitable version "1.75.0", but required is at
least "1.79.0" (found /usr/include, )
Call Stack (most recent call first):
/usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:592 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake/Modules/FindBoost.cmake:2344 (find_package_handle_standard_args)
CMakeLists.txt:168 (find_package)

gmake: Makefile: No such file or directory
gmake: *** No rule to make target 'Makefile'. Stop.
make: *** No rule to make target 'partitionGenerator'. Stop.

+ rm -rf /tmp/ngen /tmp/t-route /tmp/netcdf /tmp/extern /tmp/guide
INFO: Adding runscript
INFO: Creating SIF file...
INFO: Build complete: ciroh-ngen-singularity_latest.sif
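
The log above reports "Build complete" even though the CMake steps failed, which is why the missing partitionGenerator only surfaced at run time. One possible pattern for the build script is to stop at the first failing step. The sketch below uses a hypothetical `build` wrapper and stand-in commands (`true`, `false`) rather than the real cmake and make calls.

```shell
# Run each build step in order and stop at the first failure
# instead of continuing to the next step.
build() {
    local step
    for step in "$@"; do
        echo "running: $step"
        if ! sh -c "$step"; then
            echo "build step failed: $step" >&2
            return 1
        fi
    done
    echo "all steps succeeded"
}

# Demonstration: `false` plays the role of a failing cmake invocation,
# so the third step is never executed.
build "true" "false" "echo never reached" || echo "image build aborted"
```

A simpler alternative with a similar effect is `set -e` at the top of the script that runs the build steps, assuming no step is expected to return non-zero.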

@benlee0423
Contributor

benlee0423 commented Mar 5, 2024

Getting an error with the 004 input when using the newly built Singularity image.

Initializing formulations
[ {
name : bmi_c++,
params : {
allow_exceed_end_time : true,
fixed_time_step : false,
init_config : /dev/null,
library_file : /dmod/shared_libs/libslothmodel.so,
main_output_variable : z,
model_params : {
EVAPOTRANS : 0,
sloth_ice_fraction_schaake(1,double,m,node) : 0,
sloth_ice_fraction_xinanjiang(1,double,1,node) : 0,
sloth_smp(1,double,1,node) : 0,
},
model_type_name : SLOTH,
name : bmi_c++,
registration_function : none,
uses_forcing_file : false,
},
},
{
name : bmi_c,
params : {
allow_exceed_end_time : true,
fixed_time_step : false,
init_config : ./config/config.ini,
library_file : /dmod/shared_libs/libcfebmi.so.1.0.0,
main_output_variable : Q_OUT,
model_params : {
Cgw : 0.000460921,
Klf : 0.16817,
Kn : 0.401787,
b : 8.66053,
expon : 7.30882,
max_gw_storage : 0.0402199,
maxsmc : 0.543673,
refkdt : 3.66134,
satdk : 0.000117609,
slope : 0.815479,
},
model_type_name : CFE,
name : bmi_c,
registration_function : register_bmi_cfe,
uses_forcing_file : false,
variables_names_map : {
atmosphere_water__liquid_equivalent_precipitation_rate : precip_rate,
ice_fraction_schaake : sloth_ice_fraction_schaake,
ice_fraction_xinanjiang : sloth_ice_fraction_xinanjiang,
soil_moisture_profile : sloth_smp,
water_potential_evaporation_flux : EVAPOTRANS,
},
},
},
]
Not Using Routing
Building Feature Index
Catchment topology is dendritic.
Running Models
Running timestep 0
Too many open files
Couldn't open file "/usr/share/udunits/udunits2.xml"

@benlee0423
Contributor

In a serial run, I am getting the following error.

Schaake Magic Constant calculated
All CFE config params present
GIUH ordinates string value found in config ('1.00,0.00')
Counted number of GIUH ordinates (2)
Finished function parsing CFE config
At declaration of smc_profile size, soil_reservoir.n_soil_layers = 0
terminate called after throwing an instance of 'std::runtime_error'
  what():  Errno 24 (Too many open files) opening ./forcings/cat-1490610.csv
/ngen/HelloNGEN.sh: line 134:  1006 Aborted                 (core dumped) $run_command

real	5m7.275s
user	4m37.435s
sys	0m20.575s

@benlee0423
Contributor

The "Too many open files" error is due to the following setting in HelloNGEN.sh:

# Increasing `ulimit` to Open files
ulimit -n 10000

@benlee0423
Contributor

The parallel run fails with the following error when ulimit is set to unlimited:

mpirun noticed that process rank 1 with PID 0 on node ip-172-31-71-136 exited on signal 6 (Aborted).

@benlee0423
Contributor

benlee0423 commented Mar 6, 2024

~/workspace/Ngen-Singularity/singularity$ singularity run --bind /home/ubuntu/workspace/input/AWI_004:/ngen/ngen/data ciroh-ngen-singularity_latest.sif /ngen/ngen/data

/ngen/HelloNGEN.sh: line 12: ulimit: open files: cannot modify limit: Operation not permitted

The maximum value accepted by ulimit -n is 1000000.
Set it in HelloNGEN.sh:

ulimit -n 1000000

@benlee0423
Contributor

Can we use the same HelloNGEN.sh in both Docker and Singularity?
The two versions do the same thing except for ulimit -n 1000000.
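
One way a single script could serve both runtimes is to request the open-files limit defensively rather than hard-coding a value that one environment may reject. The sketch below assumes the "Operation not permitted" error came from requesting more than the hard limit of the calling environment; `raise_nofile_limit` is a hypothetical name, not something in HelloNGEN.sh.

```shell
# Set the soft open-files limit to the requested value, capped at the hard
# limit, and keep going with a warning if the change is not permitted.
raise_nofile_limit() {
    local want="$1" hard target
    hard="$(ulimit -Hn)"
    if [ "$hard" = "unlimited" ] || [ "$want" -le "$hard" ]; then
        target="$want"
    else
        target="$hard"       # cannot exceed the hard limit without privileges
    fi
    ulimit -Sn "$target" 2>/dev/null \
        || echo "warning: could not set open-files limit to $target" >&2
    echo "open files (soft): $(ulimit -Sn)"
}

# Must be called directly (not in a subshell) so the limit applies to the
# processes started later by the same script.
raise_nofile_limit 10000
```

With this approach the script asks for what it needs and accepts whatever the host allows, so the same file would not need a runtime-specific value.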

@benlee0423
Contributor

PR #10 merged

@benlee0423
Contributor

benlee0423 commented Mar 20, 2024

Getting the following error:

NGen Framework 0.1.0
NGen Framework 0.1.0
terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ModuleNotFoundError: No module named 'numpy'
terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ModuleNotFoundError: No module named 'numpy'

And this is what I got from the shell:

Singularity> pip3 install numpy
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: numpy in /usr/local/lib64/python3.9/site-packages (1.26.4)
WARNING: Value for scheme.platlib does not match. Please report this to <https://github.com/pypa/pip/issues/10151>
distutils: /home/ubuntu/.local/lib/python3.9/site-packages
sysconfig: /home/ubuntu/.local/lib64/python3.9/site-packages
WARNING: Additional context:
user = True
home = None
root = None
prefix = None
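
pip reports numpy as installed under /usr/local/lib64/python3.9/site-packages, yet the interpreter embedded in ngen cannot import it, which suggests the two may be looking at different search paths. The sketch below is a diagnostic only: `check_python_module` is a hypothetical helper, and it inspects the `python3` found on PATH, which is not necessarily the interpreter that ngen embeds through pybind11.

```shell
# Report whether a top-level Python module is importable by a given interpreter.
# Exit status is 0 when the module is found, 1 otherwise.
check_python_module() {
    local module="$1" py="${2:-python3}"
    "$py" - "$module" <<'PYEOF'
import importlib.util
import sys

name = sys.argv[1]
spec = importlib.util.find_spec(name)
print("interpreter:", sys.executable)
print("module     :", name)
print("location   :", spec.origin if spec else "NOT FOUND")
for entry in sys.path:
    print("sys.path   :", entry)
sys.exit(0 if spec else 1)
PYEOF
}

if command -v python3 >/dev/null 2>&1; then
    check_python_module numpy || echo "numpy is not visible to python3"
fi
```

Comparing the printed `sys.path` entries with the directory pip reported can show whether an environment variable such as PYTHONPATH needs to be set inside the image.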

@benlee0423
Contributor

module show mpi

-------------------------------------------------------------------------------------------------------------------------------------------------------------
  /usr/share/modulefiles/mpi/openmpi-x86_64:
-------------------------------------------------------------------------------------------------------------------------------------------------------------
conflict("mpi")
prepend_path("PATH","/usr/lib64/openmpi/bin")
prepend_path("LD_LIBRARY_PATH","/usr/lib64/openmpi/lib")
prepend_path("PKG_CONFIG_PATH","/usr/lib64/openmpi/lib/pkgconfig")
prepend_path("MANPATH",":/usr/share/man/openmpi-x86_64")
setenv("MPI_BIN","/usr/lib64/openmpi/bin")
setenv("MPI_SYSCONFIG","/etc/openmpi-x86_64")
setenv("MPI_FORTRAN_MOD_DIR","/usr/lib64/gfortran/modules/openmpi")
setenv("MPI_INCLUDE","/usr/include/openmpi-x86_64")
setenv("MPI_LIB","/usr/lib64/openmpi/lib")
setenv("MPI_MAN","/usr/share/man/openmpi-x86_64")
setenv("MPI_PYTHON3_SITEARCH","/usr/lib64/python3.9/site-packages/openmpi")
setenv("MPI_COMPILER","openmpi-x86_64")
setenv("MPI_SUFFIX","_openmpi")
setenv("MPI_HOME","/usr/lib64/openmpi")

@benlee0423
Contributor

Run command in terminal

singularity run --bind /home/ubuntu/workspace/AWI_09_004:/ngen/ngen/data ciroh-ngen-singularity.sif "/ngen/ngen/data auto"

Run command inside running image

mpirun --allow-run-as-root -n 2 /dmod/bin/ngen-parallel ./config/datastream.gpkg all ./config/datastream.gpkg all ./config/realization.json ./partitions_2.json 

@benlee0423
Contributor

benlee0423 commented Mar 20, 2024

This task is blocked by issue #12.

@benlee0423
Contributor

Unblocked by using ngen commit ID f91e2ea.

@benlee0423 benlee0423 linked a pull request Apr 6, 2024 that will close this issue