Question about the dgl package under the cuda version #94
-
I encountered the following problem while training:
Current run is terminating due to exception: [13:45:09] /opt/dgl/src/runtime/c_runtime_api.cc:82: Check failed: allow_missing: Device API cuda is not enabled. Please install the cuda version of dgl.
Stack trace:
[bt] (0) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN4dmlc15LogMessageFatalD2Ev+0x75) [0x7feb10fed8f5]
[bt] (1) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime16DeviceAPIManager6GetAPIESsb+0x202) [0x7feb1135ca92]
[bt] (2) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime9DeviceAPI3GetE10DGLContextb+0x1e1) [0x7feb11359071]
[bt] (3) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime7NDArray5EmptyESt6vectorIlSaIlEE11DGLDataType10DGLContext+0x13b) [0x7feb1137454b]
[bt] (4) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZNK3dgl7runtime7NDArray6CopyToERK10DGLContext+0xc3) [0x7feb113aed53]
[bt] (5) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl9UnitGraph6CopyToESt10shared_ptrINS_15BaseHeteroGraphEERK10DGLContext+0x3ff) [0x7feb114bc24f]
[bt] (6) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl11HeteroGraph6CopyToESt10shared_ptrINS_15BaseHeteroGraphEERK10DGLContext+0xf6) [0x7feb113bb5d6]
[bt] (7) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(+0x51b396) [0x7feb113ca396]
[bt] (8) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7feb113582a8]
Engine run is terminating due to exception: [13:45:09] /opt/dgl/src/runtime/c_runtime_api.cc:82: Check failed: allow_missing: Device API cuda is not enabled. Please install the cuda version of dgl.
Stack trace:
[bt] (0) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN4dmlc15LogMessageFatalD2Ev+0x75) [0x7feb10fed8f5]
[bt] (1) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime16DeviceAPIManager6GetAPIESsb+0x202) [0x7feb1135ca92]
[bt] (2) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime9DeviceAPI3GetE10DGLContextb+0x1e1) [0x7feb11359071]
[bt] (3) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime7NDArray5EmptyESt6vectorIlSaIlEE11DGLDataType10DGLContext+0x13b) [0x7feb1137454b]
[bt] (4) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZNK3dgl7runtime7NDArray6CopyToERK10DGLContext+0xc3) [0x7feb113aed53]
[bt] (5) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl9UnitGraph6CopyToESt10shared_ptrINS_15BaseHeteroGraphEERK10DGLContext+0x3ff) [0x7feb114bc24f]
[bt] (6) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl11HeteroGraph6CopyToESt10shared_ptrINS_15BaseHeteroGraphEERK10DGLContext+0xf6) [0x7feb113bb5d6]
[bt] (7) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(+0x51b396) [0x7feb113ca396]
[bt] (8) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7feb113582a8]
Traceback (most recent call last):
File "/home/awp_Admin_ZixCQwyA/gxy/matbench-discovery-1.1.1/models/alignn/train_alignn2.py", line 178, in <module>
train_hist = train_dgl(
^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/alignn/train.py", line 1128, in train_dgl
trainer.run(train_loader, max_epochs=config.epochs)
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/engine.py", line 906, in run
return self._internal_run()
^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/engine.py", line 949, in _internal_run
return next(self._internal_run_generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/engine.py", line 1007, in _internal_run_as_gen
self._handle_exception(e)
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/engine.py", line 645, in _handle_exception
raise e
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/engine.py", line 973, in _internal_run_as_gen
epoch_time_taken += yield from self._run_once_on_dataset_as_gen()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/engine.py", line 1112, in _run_once_on_dataset_as_gen
self._handle_exception(e)
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/engine.py", line 645, in _handle_exception
raise e
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/engine.py", line 1089, in _run_once_on_dataset_as_gen
self.state.output = self._process_function(self, self.state.batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/ignite/engine/__init__.py", line 113, in update
x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/alignn/graphs.py", line 678, in prepare_line_graph_batch
g.to(device, non_blocking=non_blocking),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/heterograph.py", line 5709, in to
ret._graph = self._graph.copy_to(utils.to_dgl_context(device))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/heterograph_index.py", line 255, in copy_to
return _CAPI_DGLHeteroCopyTo(self, ctx.device_type, ctx.device_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "dgl/_ffi/_cython/./function.pxi", line 295, in dgl._ffi._cy3.core.FunctionBase.__call__
File "dgl/_ffi/_cython/./function.pxi", line 227, in dgl._ffi._cy3.core.FuncCall
File "dgl/_ffi/_cython/./function.pxi", line 217, in dgl._ffi._cy3.core.FuncCall3
dgl._ffi.base.DGLError: [13:45:09] /opt/dgl/src/runtime/c_runtime_api.cc:82: Check failed: allow_missing: Device API cuda is not enabled. Please install the cuda version of dgl.
Stack trace:
[bt] (0) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN4dmlc15LogMessageFatalD2Ev+0x75) [0x7feb10fed8f5]
[bt] (1) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime16DeviceAPIManager6GetAPIESsb+0x202) [0x7feb1135ca92]
[bt] (2) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime9DeviceAPI3GetE10DGLContextb+0x1e1) [0x7feb11359071]
[bt] (3) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl7runtime7NDArray5EmptyESt6vectorIlSaIlEE11DGLDataType10DGLContext+0x13b) [0x7feb1137454b]
[bt] (4) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZNK3dgl7runtime7NDArray6CopyToERK10DGLContext+0xc3) [0x7feb113aed53]
[bt] (5) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl9UnitGraph6CopyToESt10shared_ptrINS_15BaseHeteroGraphEERK10DGLContext+0x3ff) [0x7feb114bc24f]
[bt] (6) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(_ZN3dgl11HeteroGraph6CopyToESt10shared_ptrINS_15BaseHeteroGraphEERK10DGLContext+0xf6) [0x7feb113bb5d6]
[bt] (7) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(+0x51b396) [0x7feb113ca396]
[bt] (8) /home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7feb113582a8]
Then I tried installing the DGL build for CUDA 12.1, but the CUDA version on my server is 12.0, and a different error appeared:
Traceback (most recent call last):
File "/home/awp_Admin_ZixCQwyA/gxy/matbench-discovery-1.1.1/models/alignn/train_alignn2.py", line 11, in <module>
from alignn.config import TrainingConfig
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/alignn/config.py", line 9, in <module>
from alignn.models.modified_cgcnn import CGCNNConfig
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/alignn/models/modified_cgcnn.py", line 5, in <module>
import dgl
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/__init__.py", line 14, in <module>
from .backend import backend_name, load_backend # usort: skip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/backend/__init__.py", line 122, in <module>
load_backend(get_preferred_backend())
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/backend/__init__.py", line 51, in load_backend
from .._ffi.base import load_tensor_adapter # imports DGL C library
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/_ffi/base.py", line 50, in <module>
_LIB, _LIB_NAME, _DIR_NAME = _load_lib()
^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/site-packages/dgl/_ffi/base.py", line 39, in _load_lib
lib = ctypes.CDLL(lib_path[0])
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/awp_Admin_ZixCQwyA/miniconda3/envs/mat/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcudart.so.12: cannot open shared object file: No such file or directory
Have you encountered these problems before, and if so, how did you resolve them? As far as I can tell, DGL only supports CUDA 11.6, 11.7, 11.8, and 12.1. Thank you!
Replies: 2 comments 2 replies
-
I've seen similar errors in the past and, as you said, they usually point to a version mismatch between CUDA and PyTorch (or, in this case, DGL). I think you can either update your CUDA to 12.1 or try building DGL from source against your current CUDA. Maybe @pbenner has some better advice.
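To narrow down which side of the mismatch is at fault, a small diagnostic along these lines might help. It is only a sketch: the function names `can_load_shared_library` and `describe_cuda_setup` are made up for illustration, and it assumes a Linux-style library name (`libcudart.so.12`, taken from the `OSError` in the question). It reports whether the dynamic loader can open the CUDA runtime, which CUDA version PyTorch was built against, and whether DGL imports at all.

```python
import ctypes
import importlib


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def describe_cuda_setup() -> dict:
    """Collect hints about the CUDA runtime as seen by Python, PyTorch and DGL."""
    report = {
        # The second traceback in the question fails on exactly this library.
        "libcudart.so.12 loadable": can_load_shared_library("libcudart.so.12"),
    }
    try:
        torch = importlib.import_module("torch")
        report["torch CUDA build"] = torch.version.cuda  # None for CPU-only builds
        report["torch sees a GPU"] = torch.cuda.is_available()
    except Exception as exc:  # diagnostic script: report any import failure
        report["torch"] = f"unavailable: {exc}"
    try:
        dgl = importlib.import_module("dgl")
        report["dgl version"] = dgl.__version__
    except Exception as exc:  # includes the OSError raised when libcudart is missing
        report["dgl"] = f"unavailable: {exc}"
    return report


if __name__ == "__main__":
    for key, value in describe_cuda_setup().items():
        print(f"{key}: {value}")
```

If `libcudart.so.12` is not loadable, the DGL wheel cannot find a CUDA 12 runtime on the library search path, regardless of what the driver reports.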
-
DGL is sometimes quite tricky. There is a good description on the alignn repo:
https://github.com/usnistgov/alignn
But I also have the requirements file for training ALIGNN-FF here:
https://github.com/pbenner/matbench-discovery/blob/alignn/models/alignn/alignn-ff-requirements.txt
Hope this helps!
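One way to use such a requirements file for debugging is to compare its pinned versions with what is actually installed. The sketch below assumes the file uses plain `name==version` pins. The helper names `parse_pins` and `compare_with_environment` are hypothetical, and the package names in the usage comment are placeholders, not the contents of the linked file.

```python
from importlib import metadata


def parse_pins(requirements_text: str) -> dict:
    """Extract `name==version` pins, skipping comments, blanks and other specifiers."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and environment markers such as `; python_version < "3.9"`.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def compare_with_environment(pins: dict) -> dict:
    """Map each pinned package to (pinned version, installed version or None)."""
    result = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        result[name] = (pinned, installed)
    return result


if __name__ == "__main__":
    # Placeholder pins for illustration only; paste the real file contents instead.
    example = "some-package==1.0.0\nanother-package==2.3.4+cu118\n"
    for name, (pinned, installed) in compare_with_environment(parse_pins(example)).items():
        status = "ok" if pinned == installed else "differs"
        print(f"{name}: pinned {pinned}, installed {installed} ({status})")
```

Any local version suffix in a pin (the part after `+`) is compared literally, so a CPU-only build installed in place of a CUDA build would show up as a difference.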