
Unable to load model if n_gpu_layers > 0 #98

Open
XEmAX32 opened this issue Dec 14, 2024 · 0 comments
XEmAX32 commented Dec 14, 2024

Loading models with n_gpu_layers: 0 works fine, but any value greater than 0 fails.
I'm using an iPhone 13 with iOS 18.1.

The minimal reproduction is almost the same as the example code, but for the record:

      const context = await initLlama({
        model: modelPath,
        use_mlock: true,
        n_ctx: 4096,
        n_gpu_layers: 1,
      }, (progress) => console.log(progress));
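As a temporary workaround, here is a sketch of a hypothetical retry helper (the name `initWithFallback` and the candidate-list parameter are my own invention, not part of llama.rn) that falls back to smaller n_gpu_layers values when loading fails. Note this only helps when the failure surfaces as a rejected promise (scenario B below), not when the app crashes outright:

```javascript
// Hypothetical helper: try each n_gpu_layers candidate in order until one
// loads successfully. `load` stands in for llama.rn's initLlama; it must
// reject (not crash the process) on failure for the fallback to work.
async function initWithFallback(load, params, gpuLayerCandidates = [99, 0]) {
  let lastError;
  for (const n_gpu_layers of gpuLayerCandidates) {
    try {
      // Spread the caller's params, overriding only the GPU layer count.
      return await load({ ...params, n_gpu_layers });
    } catch (err) {
      lastError = err; // remember why GPU offload failed, then retry
    }
  }
  // Every candidate failed, including (presumably) the CPU-only one.
  throw lastError;
}
```

Usage would mirror the snippet above: `initWithFallback(initLlama, { model: modelPath, use_mlock: true, n_ctx: 4096 }, [1, 0])`.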

At the moment I've seen three different scenarios:
A) the app crashes (e.g. Llama-3.2-1B-Instruct-Q4_0_4_4)
B) initLlama fails with the error below (e.g. Llama-3.2-3B-Instruct-Q6_K)
C) the app is killed by the system for using too much memory, even with the Extended Virtual Addressing capability enabled (e.g. Llama-3.2-3B-Instruct-Q4_0)

'err', { [Error: Failed to load the model]
  code: 'llama_cpp_error',
  nativeStackIOS: 
   [ '0   LlamaTest                           0x000000010243eec4 RCTJSErrorFromCodeMessageAndNSError + 112',
     '1   LlamaTest                           0x00000001027aac40 ___ZZN8facebook5react15ObjCTurboModule13createPromiseERNS_3jsi7RuntimeENSt3__112basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEEU13block_pointerFvU13block_pointerFvP11objc_objectEU13block_pointerFvP8NSStringSH_P7NSErrorEEENK3$_0clES4_RKNS2_5ValueEPSQ_m_block_invoke.109 + 332',
     '2   LlamaTest                           0x0000000102d7fce4 -[RNLlama initContext:withContextParams:withResolver:withRejecter:] + 572',
     '3   CoreFoundation                      0x0000000197a4a374 1532D3D8-9B3B-3F2F-B35F-55A20DDF411B + 131956',
     '4   CoreFoundation                      0x0000000197a493c4 1532D3D8-9B3B-3F2F-B35F-55A20DDF411B + 127940',
     '5   CoreFoundation                      0x0000000197abecb8 1532D3D8-9B3B-3F2F-B35F-55A20DDF411B + 609464',
     '6   LlamaTest                           0x000000010279b56c ___ZN8facebook5react15ObjCTurboModule23performMethodInvocationERNS_3jsi7RuntimeEbPKcP12NSInvocationP14NSMutableArray_block_invoke + 240',
     '7   LlamaTest                           0x00000001027b1efc _ZZN8facebook5react15ObjCTurboModule23performMethodInvocationERNS_3jsi7RuntimeEbPKcP12NSInvocationP14NSMutableArrayENK3$_2clEv + 96',
     '8   LlamaTest                           0x00000001027b1e90 _ZNSt3__18__invokeB8ue170006IRZN8facebook5react15ObjCTurboModule23performMethodInvocationERNS1_3jsi7RuntimeEbPKcP12NSInvocationP14NSMutableArrayE3$_2JEEEDTclclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT0_EEEEOSF_DpOSG_ + 24',
     '9   LlamaTest                           0x00000001027b1e48 _ZNSt3__128__invoke_void_return_wrapperIvLb1EE6__callB8ue170006IJRZN8facebook5react15ObjCTurboModule23performMethodInvocationERNS3_3jsi7RuntimeEbPKcP12NSInvocationP14NSMutableArrayE3$_2EEEvDpOT_ + 24',
     '10  LlamaTest                           0x00000001027b1e24 _ZNSt3__110__function12__alloc_funcIZN8facebook5react15ObjCTurboModule23performMethodInvocationERNS2_3jsi7RuntimeEbPKcP12NSInvocationP14NSMutableArrayE3$_2NS_9allocatorISE_EEFvvEEclB8ue170006Ev + 28',
     '11  LlamaTest                           0x00000001027b0b88 _ZNSt3__110__function6__funcIZN8facebook5react15ObjCTurboModule23performMethodInvocationERNS2_3jsi7RuntimeEbPKcP12NSInvocationP14NSMutableArrayE3$_2NS_9allocatorISE_EEFvvEEclEv + 28',
     '12  LlamaTest                           0x000000010234f8c8 _ZNKSt3__110__function12__value_funcIFvvEEclB8ue170006Ev + 68',
     '13  LlamaTest                           0x000000010234f7e8 _ZNKSt3__18functionIFvvEEclEv + 24',
     '14  LlamaTest                           0x00000001027c19c8 ___ZN12_GLOBAL__N_129ModuleNativeMethodCallInvoker11invokeAsyncERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEONS1_8functionIFvvEEE_block_invoke + 44',
     '15  libdispatch.dylib                   0x00000001058a8a30 _dispatch_call_block_and_release + 32',
     '16  libdispatch.dylib                   0x00000001058aa71c _dispatch_client_callout + 20',
     '17  libdispatch.dylib                   0x00000001058b25e8 _dispatch_lane_serial_drain + 828',
     '18  libdispatch.dylib                   0x00000001058b3360 _dispatch_lane_invoke + 408',
     '19  libdispatch.dylib                   0x00000001058c05f0 _dispatch_root_queue_drain_deferred_wlh + 328',
     '20  libdispatch.dylib                   0x00000001058bfc00 _dispatch_workloop_worker_thread + 580',
     '21  libsystem_pthread.dylib             0x000000021ff8bc7c _pthread_wqthread + 288',
     '22  libsystem_pthread.dylib             0x000000021ff88488 start_wqthread + 8' ],
  domain: 'RCTErrorDomain',
  userInfo: null }

I've investigated scenario A): ggml-metal aborts via LM_GGML_ABORT with the error MUL MAT-MAT not implemented, which, as far as I understand, means there is no Metal matrix-multiplication kernel for the quantization type of the matrices involved.

This scenario doesn't seem to be handled by llama.rn; in my opinion a better behavior would be a more verbose error message instead of a crash. I will try to fix this over the weekend.

As for scenarios B) and C), I'm still trying to understand what's going on; I'm posting here in case anyone else is hitting the same problems.
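On scenario C) specifically, for anyone reproducing this: the Extended Virtual Addressing capability corresponds to the following entry in the app's .entitlements file. The Increased Memory Limit entitlement (second key below) may also be worth trying alongside it; both are standard Apple entitlement keys, though I haven't confirmed the second one changes the outcome here:

```xml
<!-- App .entitlements file (sketch). First key is the capability already
     enabled in scenario C; the second is an additional one to try. -->
<key>com.apple.developer.kernel.extended-virtual-addressing</key>
<true/>
<key>com.apple.developer.kernel.increased-memory-limit</key>
<true/>
```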
