RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-jh50bw28/aten/src/THC/THCBlas.cu:259 #25

Open
Jiaqi-Chen-00 opened this issue Jul 21, 2024 · 0 comments


RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-jh50bw28/aten/src/THC/THCBlas.cu:259

I get this error when I run:

CUDA_VISIBLE_DEVICES=0 xvfb-run -a -s "-screen 0 800x600x24" python main.py out/total3d/20110611514267/out_config.yaml --mode demo --demo_path demo/inputs/1

My environment:

(Im3D) xxx@viscam4:~/projects/ig_llm/rtx_3090/Implicit3DUnderstanding$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

(Im3D) xxx@viscam4:~/projects/ig_llm/rtx_3090/Implicit3DUnderstanding$ conda list torch
# packages in environment at /viscam/u/xxx/anaconda3/envs/Im3D:
#
# Name                    Version                   Build  Channel
pytorch                   1.1.0           cuda100py36he554f03_0  
torchvision               0.3.0           cuda100py36h72fc40a_0

(Im3D) xxx@viscam4:~/projects/ig_llm/rtx_3090/Implicit3DUnderstanding$ nvidia-smi
Sun Jul 21 14:15:57 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |

(Im3D) xxx@viscam4:~/projects/ig_llm/rtx_3090/Implicit3DUnderstanding$ gpustat
viscam4.stanford.edu        Sun Jul 21 14:16:15 2024  515.43.04
[0] NVIDIA GeForce RTX 3090 | 37'C,   0 % |   308 / 24576 MB |
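
A possible cause, not confirmed anywhere in this thread: the environment above pairs an RTX 3090 with a PyTorch 1.1.0 build for CUDA 10.0 (the `cuda100` build string). According to NVIDIA's published tables, the RTX 3090 has compute capability 8.6, which CUDA toolkits only target from release 11.1 onward. The sketch below encodes that check. The table and the helper names (`MIN_CUDA_FOR_CAPABILITY`, `parse_version`, `toolkit_supports`) are illustrative assumptions, not part of this repository or of PyTorch.

```python
# Minimum CUDA toolkit release that can target each GPU compute capability.
# Illustrative subset only; extend it for other architectures as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (GeForce RTX 30 series, e.g. RTX 3090)
}


def parse_version(text):
    """Turn a version string such as '10.0' or '10.0.130' into (major, minor)."""
    return tuple(int(part) for part in text.split(".")[:2])


def toolkit_supports(capability, cuda_version):
    """Return True if the given CUDA release can target the given capability.

    Raises KeyError for capabilities missing from the illustrative table.
    """
    required = MIN_CUDA_FOR_CAPABILITY[tuple(capability)]
    return parse_version(cuda_version) >= required


# Values taken from the environment reported above:
# RTX 3090 (assumed capability 8.6) with a CUDA 10.0 build of PyTorch.
print(toolkit_supports((8, 6), "10.0"))  # False
```

On a machine with PyTorch installed, the two inputs can be read from `torch.cuda.get_device_capability(0)` and `torch.version.cuda` instead of being hard-coded. Note that `nvcc -V` and the `CUDA Version` field of `nvidia-smi` report the system toolkit and the driver's maximum supported version, not the CUDA version the PyTorch binary was built against.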

Error (appears after a long loading time):

Begin to resume from the last checkpoint.
Loading checkpoint from out/total3d/20110611514267/model_best.pth.
Warning: Could not find epoch in checkpoint!
Warning: Could not find min_loss in checkpoint!
Warning: Could not find step in checkpoint!
set() subnet missed.
Checkpoint out/total3d/20110611514267/model_best.pth resumed.

Loading data.
Traceback (most recent call last):
  File "main.py", line 42, in <module>
    demo.run(cfg)
  File "/viscam/projects/inv_engine/xxx/ig_llm/rtx_3090/Implicit3DUnderstanding/demo.py", line 147, in run
    est_data = net(data)
  File "/viscam/u/xxx/anaconda3/envs/Im3D/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/viscam/projects/inv_engine/xxx/ig_llm/rtx_3090/Implicit3DUnderstanding/models/total3d/modules/network.py", line 112, in forward
    data['split'], data['rel_pair_counts'])
  File "/viscam/u/xxx/anaconda3/envs/Im3D/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/viscam/projects/inv_engine/xxx/ig_llm/rtx_3090/Implicit3DUnderstanding/models/total3d/modules/object_detection.py", line 103, in forward
    r_features = self.relnet(a_features, g_features, split, rel_pair_counts)
  File "/viscam/u/xxx/anaconda3/envs/Im3D/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/viscam/projects/inv_engine/xxx/ig_llm/rtx_3090/Implicit3DUnderstanding/models/total3d/modules/relation_net.py", line 54, in forward
    g_weights = self.fc_g(g_features)
  File "/viscam/u/xxx/anaconda3/envs/Im3D/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/viscam/u/xxx/anaconda3/envs/Im3D/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 92, in forward
    return F.linear(input, self.weight, self.bias)
  File "/viscam/u/xxx/anaconda3/envs/Im3D/lib/python3.6/site-packages/torch/nn/functional.py", line 1406, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-jh50bw28/aten/src/THC/THCBlas.cu:259
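
The traceback ends in `torch.addmm`, called from `F.linear`, so one way to tell whether the failure comes from the model or from the PyTorch/GPU setup is to run the same operation on small tensors outside the project. This is a minimal sketch under that assumption; `try_gpu_addmm` is a hypothetical helper name, not something from this repository.

```python
def try_gpu_addmm(n=4):
    """Run a tiny addmm on the GPU and describe the outcome as a string."""
    try:
        import torch
    except (ImportError, OSError):
        return "skipped: torch could not be imported"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible"
    try:
        # Same call shape as F.linear in the traceback: bias + input @ weight.t()
        bias = torch.zeros(n, device="cuda")
        x = torch.ones(n, n, device="cuda")
        weight = torch.ones(n, n, device="cuda")
        out = torch.addmm(bias, x, weight.t())
        # Force pending GPU work to finish so errors surface here.
        torch.cuda.synchronize()
        # Each entry is a sum of n ones, so it should equal n.
        if float(out[0, 0]) == float(n):
            return "ok"
        return "wrong result"
    except RuntimeError as exc:
        return "failed: " + str(exc)


print(try_gpu_addmm())
```

If this also fails with the cublas error inside the `Im3D` environment, the problem is likely independent of `main.py` and the checkpoint.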