-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
!!! Exception during processing !!! All input tensors need to be on the same GPU, but found some tensors to not be on a GPU: #6
Comments
I have the same problem, I have noticed this reply from here kijai/ComfyUI-HunyuanVideoWrapper#130
I tried running ComfyUI on CPU only and the model loaded perfectly, so there is something wrong with memory management. I'm investigating... |
Ok, maybe I understand what is going on, unfortunately not good news: it's clearly a memory issue, as the error specifies that tensors are not all loaded on the same memory of the same device; "tensor crossover" between multiple devices is not supported. The devices in question are two, CPU and GPU, each of which obviously with its own memory, RAM and VRAM respectively. The problem is exactly this: all tensors must be loaded either all only on RAM or all only on VRAM. By making a change to the code, I think I managed to force the loading of all tensors on VRAM only, however now my ComfyUI goes into torch.OutOfMemoryError, since my video card unfortunately only has 6 GB of VRAM. The file I modified is I changed the whole function functional_linear_4bits(x, weight, bias) in line 14 into this:
For the sake of fairness, I would like to point out that ChatGPT helped me modify the code I hope this could help |
thank you for your reply def functional_linear_4bits(x, weight, bias): |
This fixed it for me! Thank you so much. |
This is what ended up working for me:
Differences between original
|
Did not work for me, the model loads now and starts to generate but runs on CPU, and I get: Allocation on device ,works fine on Forgeui I think there is an issue with the swap part, managed to reproduce the 'Allocation on device" when switching swap method from queue to async, may be same problem here? This is logs from forge when works correctly: |
Hi the same issue i am having i copied the below code now new error out of memory for bnbnf4v2 I have 6gb GPU and 16 gb vram I am able to run on forge but on comfui failing is there any way to limit GPU weights like forge ? def functional_linear_4bits(x, weight, bias):
|
here's my workflow:
here's the logs:
C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
[START] Security scan
[DONE] Security scan
ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-12-25 22:57:57.250645
** Platform: Windows
** Python version: 3.12.7 (tags/v3.12.7:0b05ead, Oct 1 2024, 03:06:41) [MSC v.1941 64 bit (AMD64)]
** Python executable: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\python.exe
** ComfyUI Path: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI
** Log path: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\comfyui.log
Prestartup times for custom nodes:
10.3 seconds: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager
Total VRAM 4096 MB, total RAM 16054 MB
pytorch version: 2.5.1+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3050 Laptop GPU : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\web
Loading: ComfyUI-Manager (V2.55.4)
ComfyUI Revision: 2890 [9a616b81] *DETACHED | Released on '2024-12-04'
Import times for custom nodes:
0.0 seconds: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
0.2 seconds: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_bnb_nf4_fp4_Loaders-master
2.0 seconds: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager
Starting server
To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Using pytorch attention in VAE
Using pytorch attention in VAE
Requested to load FluxClipModel_
loaded partially 1884.2000005722045 1883.88671875 0
Requested to load Flux
loaded completely 1745.7710005722047 1745.7708854675293 False
0%| | 0/20 [00:00<?, ?it/s]
!!! Exception during processing !!! All input tensors need to be on the same GPU, but found some tensors to not be on a GPU:
[(torch.Size([4718592, 1]), device(type='cpu')), (torch.Size([1, 3072]), device(type='cuda', index=0)), (torch.Size([1, 3072]), device(type='cuda', index=0)), (torch.Size([147456]), device(type='cpu')), (torch.Size([16]), device(type='cpu'))]
Traceback (most recent call last):
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\execution.py", line 324, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\execution.py", line 199, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\execution.py", line 170, in _map_node_over_list
process_inputs(input_dict, i)
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\execution.py", line 159, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 633, in sample
samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 904, in sample
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 110, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 873, in outer_sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 857, in inner_sample
samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 110, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 714, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\sampling.py", line 155, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 384, in call
out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 839, in call
return self.predict_noise(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 842, in predict_noise
return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 364, in sampling_function
out = calc_cond_batch(model, conds, x, timestep, model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 200, in calc_cond_batch
return executor.execute(model, conds, x_in, timestep, model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 110, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 313, in calc_cond_batch
output = model.apply_model(input_x, timestep, **c).chunk(batch_chunks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\model_base.py", line 128, in apply_model
return comfy.patcher_extension.WrapperExecutor.new_class_executor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 110, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\model_base.py", line 157, in _apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\ldm\flux\model.py", line 184, in forward
out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\ldm\flux\model.py", line 110, in forward_orig
vec = self.time_in(timestep_embedding(timesteps, 256).to(img.dtype))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\comfy\ldm\flux\layers.py", line 58, in forward
return self.out_layer(self.silu(self.in_layer(x)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self.call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_bnb_nf4_fp4_Loaders-master_init.py", line 161, in forward
return functional_linear_4bits(x, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_bnb_nf4_fp4_Loaders-master_init.py", line 15, in functional_linear_4bits
out = bnb.matmul_4bit(x, weight.t(), bias=bias, quant_state=weight.quant_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\bitsandbytes\autograd_functions.py", line 528, in matmul_4bit
out = F.gemv_4bit(A, B.t(), out, state=quant_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\bitsandbytes\functional.py", line 1989, in gemv_4bit
is_on_gpu([B, A, out, absmax, state.code])
File "C:\Users\alienware\Desktop\very important shit\ai (32 gb)\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\Lib\site-packages\bitsandbytes\functional.py", line 464, in is_on_gpu
raise RuntimeError(
RuntimeError: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU:
[(torch.Size([4718592, 1]), device(type='cpu')), (torch.Size([1, 3072]), device(type='cuda', index=0)), (torch.Size([1, 3072]), device(type='cuda', index=0)), (torch.Size([147456]), device(type='cpu')), (torch.Size([16]), device(type='cpu'))]
Prompt executed in 32.31 seconds
I have two gpus:
Iris intel xe graphics (power saving)
nvidia rtx 3050 laptop (high performance)
this is the error I get when trying to generate a normal image using flux_dev_bnb_nf4_v2
any ideas on how to fix this? thanks.
The text was updated successfully, but these errors were encountered: