Eval bug: GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) failed #24967

Description

@Lyaaaaaaaaaaaaaaa

Name and Version

version: 9776 (ac4105d)
built with GNU 11.4.0 for Linux x86_64

Operating systems

Linux

GGML backends

Vulkan

Hardware

System:
  Kernel: 6.17.0-35-generic arch: x86_64 bits: 64 compiler: gcc v: 13.3.0 clocksource: tsc
  Desktop: Cinnamon v: 6.6.7 tk: GTK v: 3.24.41 wm: Muffin v: 6.6.3 vt: 7 dm: LightDM v: 1.30.0
    Distro: Linux Mint 22.3 Zena base: Ubuntu 24.04 noble
CPU:
  Info: dual core model: Intel Core i3-10110U bits: 64 type: MT MCP smt: enabled
    arch: Comet/Whiskey Lake note: check rev: C cache: L1: 128 KiB L2: 512 KiB L3: 4 MiB
  Speed (MHz): avg: 3100 min/max: 400/4100 cores: 1: 3100 2: 3100 3: 3100 4: 3100 bogomips: 20799
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel CometLake-U GT2 [UHD Graphics] vendor: Dell driver: i915 v: kernel arch: Gen-9.5
    ports: active: HDMI-A-1 off: eDP-1 empty: DP-1,HDMI-A-2 bus-ID: 00:02.0 chip-ID: 8086:9b41
    class-ID: 0300
  Display: x11 server: X.Org v: 21.1.11 with: Xwayland v: 23.2.6 driver: X: loaded: modesetting
    unloaded: fbdev,vesa dri: iris gpu: i915 display-ID: :0 screens: 1
  API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris device: 1 drv: swrast gbm:
    drv: iris surfaceless: drv: iris x11: drv: iris inactive: wayland
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 25.2.8-0ubuntu0.24.04.2 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel UHD Graphics (CML GT2) device-ID: 8086:9b41
  API: Vulkan v: 1.3.275 layers: 7 surfaces: xcb,xlib device: 0 type: integrated-gpu driver: N/A
    device-ID: 8086:9b41 device: 1 type: cpu driver: N/A device-ID: 10005:0000

Models

https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-melinna-Q4_K_M-GGUF
https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-base-Q4_K_M-GGUF
https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-Q4_K_M-GGUF
https://huggingface.co/mradermacher/youtube-sentiment-v2-GGUF

I also tried other models, and they all fail with the same error.

Problem description & steps to reproduce

I can't run any of the models listed above.

CLI error

  1. Run: ./llamacpp/llama-cli -m youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is"
  2. Notice the error.
    You can substitute any of the models listed above.

I used GGUF-my-repo to create my models.
I tried 9 models and they all produce the same errors.

  1. Use GGUF-my-repo on https://huggingface.co/Linna/emotion-english-distilroberta-melinna
  2. Run the converted model.
  3. Notice the issue.

Server issue

I tried to run the same models using the server instead of the CLI. The model loads, but when I try to generate I get a different error.

Example with https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-base-Q4_K_M-GGUF:

  1. Run https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-base-Q4_K_M-GGUF with llama-server
  2. Connect to web interface
  3. Try to generate anything
  4. Notice error
    send_error: task id = 0, error: the current context does not logits computation. skipping

First Bad Commit

No response

Relevant log output

First command

./llamacpp/llama-cli -m youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is"

Loading model... -/home/runner/work/llama.cpp/llama.cpp/src/llama-context.cpp:2219: GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) failed
[New LWP 24796]
[New LWP 24795]
[New LWP 24794]
[New LWP 24793]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x0000752821d10813 in __GI___wait4 (pid=24797, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0  0x0000752821d10813 in __GI___wait4 (pid=24797, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000752821f51c3b in ggml_print_backtrace () from /path/to/llama/libggml-base.so.0
#2  0x0000752821f51dd2 in ggml_abort () from /path/to/llama/libggml-base.so.0
#3  0x0000752821558f16 in llama_context::output_reserve(int) () from /path/to/llama/libllama.so.0
#4  0x0000752821559dad in llama_context::encode(llama_batch const&) () from /path/to/llama/libllama.so.0
#5  0x000075282155eb60 in llama_decode () from /path/to/llama/libllama.so.0
#6  0x00007528219919b4 in common_init_from_params(common_params&, bool) () from /path/to/llama/libllama-common.so.0
#7  0x00007528225629a3 in server_context_impl::load_model(common_params&) () from /path/to/llama/libllama-cli-impl.so
#8  0x00007528224a7674 in llama_cli(int, char**) () from /path/to/llama/libllama-cli-impl.so
#9  0x0000752821c2a1ca in __libc_start_call_main (main=main@entry=0x5b34ef475260 <main>, argc=argc@entry=5, argv=argv@entry=0x7ffe93bbfb28) at ../sysdeps/nptl/libc_start_call_main.h:58
warning: 58	../sysdeps/nptl/libc_start_call_main.h: No such file or directory
#10 0x0000752821c2a28b in __libc_start_main_impl (main=0x5b34ef475260 <main>, argc=5, argv=0x7ffe93bbfb28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe93bbfb18) at ../csu/libc-start.c:360
warning: 360	../csu/libc-start.c: No such file or directory
#11 0x00005b34ef475295 in _start ()
[Inferior 1 (process 24790) detached]
Aborted (core dumped)
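
To make the backtrace above easier to read, here is a minimal sketch that pulls the function names out of gdb-style frames and lists them from outermost caller to innermost frame. The helper name `call_chain` and the regular expression are my own assumptions, written against the frame format shown in this log. They are not part of llama.cpp or gdb.

```python
import re

# Assumed frame format, taken from the log above:
#   "#3  0x0000... in symbol(args) () from /path/to/lib.so"
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+ in (.+?) \(")

# A few frames copied verbatim from the first backtrace.
SAMPLE_BACKTRACE = """
#2  0x0000752821f51dd2 in ggml_abort () from /path/to/llama/libggml-base.so.0
#3  0x0000752821558f16 in llama_context::output_reserve(int) () from /path/to/llama/libllama.so.0
#4  0x0000752821559dad in llama_context::encode(llama_batch const&) () from /path/to/llama/libllama.so.0
#5  0x000075282155eb60 in llama_decode () from /path/to/llama/libllama.so.0
"""


def call_chain(backtrace: str) -> list[str]:
    """Return frame symbols ordered from outermost caller to innermost frame."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    # gdb numbers the innermost frame 0, so descending order gives caller -> callee.
    return [name for _, name in sorted(frames, reverse=True)]


if __name__ == "__main__":
    print(" -> ".join(call_chain(SAMPLE_BACKTRACE)))
```

Run on the full backtrace, this shows the abort being reached from llama_decode through llama_context::encode and llama_context::output_reserve during model loading.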

Second command

./llamacpp/llama-completion -fit off -m models/emotions/youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is" --verbose

0.00.055.533 I llama_completion: llama backend init
0.00.055.542 I llama_completion: load the model and apply lora adapter, if any
0.00.062.445 I llama_model_loader: loaded meta data with 38 key-value pairs and 104 tensors from models/emotions/youtube-sentiment-v2.Q4_K_M.gguf (version GGUF V3 (latest))
0.00.062.514 I llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
0.00.062.534 I llama_model_loader: - kv   0:                       general.architecture str              = bert
0.00.062.536 I llama_model_loader: - kv   1:                               general.type str              = model
0.00.062.538 I llama_model_loader: - kv   2:                               general.name str              = Youtube Sentiment v2
0.00.062.540 I llama_model_loader: - kv   3:                            general.version str              = v2
0.00.062.541 I llama_model_loader: - kv   4:                           general.basename str              = youtube-sentiment
0.00.062.541 I llama_model_loader: - kv   5:                         general.size_label str              = 67M
0.00.062.594 I llama_model_loader: - kv   6:                               general.tags arr[str,5]       = ["distilbert", "emotion", "youtube", ...
0.00.062.624 I llama_model_loader: - kv   7:          bert.attention.layer_norm_epsilon f32              = 0,000000
0.00.062.630 I llama_model_loader: - kv   8:                           bert.block_count u32              = 6
0.00.062.633 I llama_model_loader: - kv   9:                        bert.context_length u32              = 512
0.00.062.634 I llama_model_loader: - kv  10:                      bert.embedding_length u32              = 768
0.00.062.635 I llama_model_loader: - kv  11:                   bert.feed_forward_length u32              = 3072
0.00.062.636 I llama_model_loader: - kv  12:                  bert.attention.head_count u32              = 12
0.00.062.637 I llama_model_loader: - kv  13:                      bert.attention.causal bool             = false
0.00.062.648 I llama_model_loader: - kv  14:              bert.classifier.output_labels arr[str,7]       = ["LABEL_0", "LABEL_1", "LABEL_2", "LA...
0.00.062.649 I llama_model_loader: - kv  15:            tokenizer.ggml.token_type_count u32              = 1
0.00.062.650 I llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = bert
0.00.062.652 I llama_model_loader: - kv  17:                         tokenizer.ggml.pre str              = jina-v2-en
0.00.067.708 I llama_model_loader: - kv  18:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
0.00.068.881 I llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
0.00.068.885 I llama_model_loader: - kv  20:                tokenizer.ggml.bos_token_id u32              = 101
0.00.068.886 I llama_model_loader: - kv  21:                tokenizer.ggml.eos_token_id u32              = 102
0.00.068.887 I llama_model_loader: - kv  22:            tokenizer.ggml.unknown_token_id u32              = 100
0.00.068.888 I llama_model_loader: - kv  23:          tokenizer.ggml.seperator_token_id u32              = 102
0.00.068.890 I llama_model_loader: - kv  24:            tokenizer.ggml.padding_token_id u32              = 0
0.00.068.891 I llama_model_loader: - kv  25:               tokenizer.ggml.mask_token_id u32              = 103
0.00.068.892 I llama_model_loader: - kv  26:               tokenizer.ggml.add_bos_token bool             = true
0.00.068.893 I llama_model_loader: - kv  27:               tokenizer.ggml.add_eos_token bool             = true
0.00.068.894 I llama_model_loader: - kv  28:               tokenizer.ggml.add_sep_token bool             = false
0.00.068.895 I llama_model_loader: - kv  29:               general.quantization_version u32              = 2
0.00.068.896 I llama_model_loader: - kv  30:                          general.file_type u32              = 15
0.00.068.898 I llama_model_loader: - kv  31:                                general.url str              = https://huggingface.co/mradermacher/y...
0.00.068.899 I llama_model_loader: - kv  32:              mradermacher.quantize_version str              = 2
0.00.068.900 I llama_model_loader: - kv  33:                  mradermacher.quantized_by str              = mradermacher
0.00.068.901 I llama_model_loader: - kv  34:                  mradermacher.quantized_at str              = 2025-06-22T20:09:52+02:00
0.00.068.902 I llama_model_loader: - kv  35:                  mradermacher.quantized_on str              = leia
0.00.068.904 I llama_model_loader: - kv  36:                         general.source.url str              = https://huggingface.co/Anuj5504/youtu...
0.00.068.905 I llama_model_loader: - kv  37:                  mradermacher.convert_type str              = hf
0.00.068.907 I llama_model_loader: - type  f32:   65 tensors
0.00.068.909 I llama_model_loader: - type q4_K:   34 tensors
0.00.068.909 I llama_model_loader: - type q6_K:    5 tensors
0.00.068.911 I print_info: file format = GGUF V3 (latest)
0.00.068.912 I print_info: file type   = Q4_K - Medium
0.00.068.916 I print_info: file size   = 44,63 MiB (5,59 BPW) 
0.00.069.159 I llama_prepare_model_devices: using device Vulkan0 (Intel(R) UHD Graphics (CML GT2)) (0000:00:02.0) - 9112 MiB free
0.00.074.469 D init_tokenizer: initializing tokenizer for type 3
0.00.088.063 I load: 0 unused tokens
0.00.091.849 D load: control token:    101 '[CLS]' is not marked as EOG
0.00.092.343 D load: control token:    103 '[MASK]' is not marked as EOG
0.00.092.949 D load: control token:    100 '[UNK]' is not marked as EOG
0.00.093.183 D load: control token:    102 '[SEP]' is not marked as EOG
0.00.098.820 W load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
0.00.098.822 I load: printing all EOG tokens:
0.00.098.823 I load:   - 0 ('[PAD]')
0.00.098.824 I load:   - 102 ('[SEP]')
0.00.098.902 I load: special tokens cache size = 5
0.00.102.131 I load: token to piece cache size = 0,2032 MB
0.00.102.146 I print_info: arch                  = bert
0.00.102.146 I print_info: vocab_only            = 0
0.00.102.147 I print_info: no_alloc              = 0
0.00.102.147 I print_info: n_ctx_train           = 512
0.00.102.148 I print_info: n_embd_inp            = 768
0.00.102.149 I print_info: n_embd                = 768
0.00.102.149 I print_info: n_embd_out            = 768
0.00.102.150 I print_info: n_layer               = 6
0.00.102.150 I print_info: n_layer_all           = 6
0.00.102.160 I print_info: n_head                = 12
0.00.102.161 I print_info: n_head_kv             = 12
0.00.102.161 I print_info: n_rot                 = 64
0.00.102.162 I print_info: n_swa                 = 0
0.00.102.162 I print_info: is_swa_any            = 0
0.00.102.163 I print_info: n_embd_head_k         = 64
0.00.102.163 I print_info: n_embd_head_v         = 64
0.00.102.164 I print_info: n_gqa                 = 1
0.00.102.167 I print_info: n_embd_k_gqa          = 768
0.00.102.169 I print_info: n_embd_v_gqa          = 768
0.00.102.171 I print_info: f_norm_eps            = 1,0e-12
0.00.102.172 I print_info: f_norm_rms_eps        = 0,0e+00
0.00.102.173 I print_info: f_clamp_kqv           = 0,0e+00
0.00.102.173 I print_info: f_max_alibi_bias      = 0,0e+00
0.00.102.174 I print_info: f_logit_scale         = 0,0e+00
0.00.102.174 I print_info: f_attn_scale          = 0,0e+00
0.00.102.174 I print_info: f_attn_value_scale    = 0,0000
0.00.102.176 I print_info: n_ff                  = 3072
0.00.102.176 I print_info: n_expert              = 0
0.00.102.176 I print_info: n_expert_used         = 0
0.00.102.176 I print_info: n_expert_groups       = 0
0.00.102.177 I print_info: n_group_used          = 0
0.00.102.177 I print_info: causal attn           = 0
0.00.102.177 I print_info: pooling type          = -1
0.00.102.177 I print_info: rope type             = 2
0.00.102.178 I print_info: rope scaling          = linear
0.00.102.179 I print_info: freq_base_train       = 10000,0
0.00.102.180 I print_info: freq_scale_train      = 1
0.00.102.180 I print_info: n_ctx_orig_yarn       = 512
0.00.102.181 I print_info: rope_yarn_log_mul     = 0,0000
0.00.102.181 I print_info: rope_finetuned        = unknown
0.00.102.182 I print_info: n_cls_out             = 7
0.00.102.182 I print_info: cls_label[ 0]         = LABEL_0
0.00.102.183 I print_info: cls_label[ 1]         = LABEL_1
0.00.102.183 I print_info: cls_label[ 2]         = LABEL_2
0.00.102.184 I print_info: cls_label[ 3]         = LABEL_3
0.00.102.184 I print_info: cls_label[ 4]         = LABEL_4
0.00.102.185 I print_info: cls_label[ 5]         = LABEL_5
0.00.102.185 I print_info: cls_label[ 6]         = LABEL_6
0.00.102.186 I print_info: model type            = 22M
0.00.102.188 I print_info: model params          = 66,96 M
0.00.102.188 I print_info: general.name          = Youtube Sentiment v2
0.00.102.191 I print_info: vocab type            = WPM
0.00.102.193 I print_info: n_vocab               = 30522
0.00.102.193 I print_info: n_merges              = 0
0.00.102.193 I print_info: BOS token             = 101 '[CLS]'
0.00.102.194 I print_info: EOS token             = 102 '[SEP]'
0.00.102.195 I print_info: UNK token             = 100 '[UNK]'
0.00.102.195 I print_info: SEP token             = 102 '[SEP]'
0.00.102.195 I print_info: PAD token             = 0 '[PAD]'
0.00.102.196 I print_info: MASK token            = 103 '[MASK]'
0.00.102.196 I print_info: LF token              = 0 '[PAD]'
0.00.102.197 I print_info: FIM PAD token         = 0 '[PAD]'
0.00.102.197 I print_info: EOG token             = 0 '[PAD]'
0.00.102.198 I print_info: EOG token             = 102 '[SEP]'
0.00.102.198 I print_info: max token length      = 21
0.00.102.200 I load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
0.00.106.806 D load_tensors: layer   0 assigned to device Vulkan0, is_swa = 0
0.00.106.814 D load_tensors: layer   1 assigned to device Vulkan0, is_swa = 0
0.00.106.815 D load_tensors: layer   2 assigned to device Vulkan0, is_swa = 0
0.00.106.815 D load_tensors: layer   3 assigned to device Vulkan0, is_swa = 0
0.00.106.816 D load_tensors: layer   4 assigned to device Vulkan0, is_swa = 0
0.00.106.816 D load_tensors: layer   5 assigned to device Vulkan0, is_swa = 0
0.00.106.817 D load_tensors: layer   6 assigned to device Vulkan0, is_swa = 0
0.00.106.886 D create_tensor: loading tensor token_embd.weight
0.00.106.916 D create_tensor: loading tensor position_embd.weight
0.00.106.938 D create_tensor: loading tensor cls.weight
0.00.106.947 D create_tensor: loading tensor cls.bias
0.00.106.952 D create_tensor: loading tensor cls.output.weight
0.00.106.956 D create_tensor: loading tensor cls.output.bias
0.00.106.963 D create_tensor: loading tensor token_embd_norm.weight
0.00.106.968 D create_tensor: loading tensor token_embd_norm.bias
0.00.106.981 D create_tensor: loading tensor blk.0.attn_q.weight
0.00.106.988 D create_tensor: loading tensor blk.0.attn_k.weight
0.00.106.996 D create_tensor: loading tensor blk.0.attn_v.weight
0.00.107.005 D create_tensor: loading tensor blk.0.attn_q.bias
0.00.107.013 D create_tensor: loading tensor blk.0.attn_k.bias
0.00.107.023 D create_tensor: loading tensor blk.0.attn_v.bias
0.00.107.033 D create_tensor: loading tensor blk.0.attn_output.weight
0.00.107.043 D create_tensor: loading tensor blk.0.attn_output.bias
0.00.107.054 D create_tensor: loading tensor blk.0.attn_output_norm.weight
0.00.107.064 D create_tensor: loading tensor blk.0.attn_output_norm.bias
0.00.107.075 D create_tensor: loading tensor blk.0.ffn_up.weight
0.00.107.085 D create_tensor: loading tensor blk.0.ffn_up.bias
0.00.107.095 D create_tensor: loading tensor blk.0.ffn_down.weight
0.00.107.105 D create_tensor: loading tensor blk.0.ffn_down.bias
0.00.107.113 D create_tensor: loading tensor blk.0.layer_output_norm.weight
0.00.107.121 D create_tensor: loading tensor blk.0.layer_output_norm.bias
0.00.107.139 D create_tensor: loading tensor blk.1.attn_q.weight
0.00.107.151 D create_tensor: loading tensor blk.1.attn_k.weight
0.00.107.160 D create_tensor: loading tensor blk.1.attn_v.weight
0.00.107.168 D create_tensor: loading tensor blk.1.attn_q.bias
0.00.107.177 D create_tensor: loading tensor blk.1.attn_k.bias
0.00.107.186 D create_tensor: loading tensor blk.1.attn_v.bias
0.00.107.196 D create_tensor: loading tensor blk.1.attn_output.weight
0.00.107.206 D create_tensor: loading tensor blk.1.attn_output.bias
0.00.107.217 D create_tensor: loading tensor blk.1.attn_output_norm.weight
0.00.107.227 D create_tensor: loading tensor blk.1.attn_output_norm.bias
0.00.107.237 D create_tensor: loading tensor blk.1.ffn_up.weight
0.00.107.247 D create_tensor: loading tensor blk.1.ffn_up.bias
0.00.107.260 D create_tensor: loading tensor blk.1.ffn_down.weight
0.00.107.270 D create_tensor: loading tensor blk.1.ffn_down.bias
0.00.107.280 D create_tensor: loading tensor blk.1.layer_output_norm.weight
0.00.107.290 D create_tensor: loading tensor blk.1.layer_output_norm.bias
0.00.107.305 D create_tensor: loading tensor blk.2.attn_q.weight
0.00.107.315 D create_tensor: loading tensor blk.2.attn_k.weight
0.00.107.325 D create_tensor: loading tensor blk.2.attn_v.weight
0.00.107.335 D create_tensor: loading tensor blk.2.attn_q.bias
0.00.107.345 D create_tensor: loading tensor blk.2.attn_k.bias
0.00.107.356 D create_tensor: loading tensor blk.2.attn_v.bias
0.00.107.366 D create_tensor: loading tensor blk.2.attn_output.weight
0.00.107.396 D create_tensor: loading tensor blk.2.attn_output.bias
0.00.107.408 D create_tensor: loading tensor blk.2.attn_output_norm.weight
0.00.107.418 D create_tensor: loading tensor blk.2.attn_output_norm.bias
0.00.107.429 D create_tensor: loading tensor blk.2.ffn_up.weight
0.00.107.440 D create_tensor: loading tensor blk.2.ffn_up.bias
0.00.107.451 D create_tensor: loading tensor blk.2.ffn_down.weight
0.00.107.461 D create_tensor: loading tensor blk.2.ffn_down.bias
0.00.107.472 D create_tensor: loading tensor blk.2.layer_output_norm.weight
0.00.107.482 D create_tensor: loading tensor blk.2.layer_output_norm.bias
0.00.107.498 D create_tensor: loading tensor blk.3.attn_q.weight
0.00.107.508 D create_tensor: loading tensor blk.3.attn_k.weight
0.00.107.523 D create_tensor: loading tensor blk.3.attn_v.weight
0.00.107.534 D create_tensor: loading tensor blk.3.attn_q.bias
0.00.107.544 D create_tensor: loading tensor blk.3.attn_k.bias
0.00.107.555 D create_tensor: loading tensor blk.3.attn_v.bias
0.00.107.566 D create_tensor: loading tensor blk.3.attn_output.weight
0.00.107.578 D create_tensor: loading tensor blk.3.attn_output.bias
0.00.107.592 D create_tensor: loading tensor blk.3.attn_output_norm.weight
0.00.107.605 D create_tensor: loading tensor blk.3.attn_output_norm.bias
0.00.107.616 D create_tensor: loading tensor blk.3.ffn_up.weight
0.00.107.628 D create_tensor: loading tensor blk.3.ffn_up.bias
0.00.107.639 D create_tensor: loading tensor blk.3.ffn_down.weight
0.00.107.653 D create_tensor: loading tensor blk.3.ffn_down.bias
0.00.107.666 D create_tensor: loading tensor blk.3.layer_output_norm.weight
0.00.107.677 D create_tensor: loading tensor blk.3.layer_output_norm.bias
0.00.107.693 D create_tensor: loading tensor blk.4.attn_q.weight
0.00.107.705 D create_tensor: loading tensor blk.4.attn_k.weight
0.00.107.717 D create_tensor: loading tensor blk.4.attn_v.weight
0.00.107.729 D create_tensor: loading tensor blk.4.attn_q.bias
0.00.107.740 D create_tensor: loading tensor blk.4.attn_k.bias
0.00.107.752 D create_tensor: loading tensor blk.4.attn_v.bias
0.00.107.764 D create_tensor: loading tensor blk.4.attn_output.weight
0.00.107.777 D create_tensor: loading tensor blk.4.attn_output.bias
0.00.107.792 D create_tensor: loading tensor blk.4.attn_output_norm.weight
0.00.107.804 D create_tensor: loading tensor blk.4.attn_output_norm.bias
0.00.107.816 D create_tensor: loading tensor blk.4.ffn_up.weight
0.00.107.828 D create_tensor: loading tensor blk.4.ffn_up.bias
0.00.107.841 D create_tensor: loading tensor blk.4.ffn_down.weight
0.00.107.852 D create_tensor: loading tensor blk.4.ffn_down.bias
0.00.107.864 D create_tensor: loading tensor blk.4.layer_output_norm.weight
0.00.107.875 D create_tensor: loading tensor blk.4.layer_output_norm.bias
0.00.107.894 D create_tensor: loading tensor blk.5.attn_q.weight
0.00.107.906 D create_tensor: loading tensor blk.5.attn_k.weight
0.00.107.919 D create_tensor: loading tensor blk.5.attn_v.weight
0.00.107.935 D create_tensor: loading tensor blk.5.attn_q.bias
0.00.107.947 D create_tensor: loading tensor blk.5.attn_k.bias
0.00.107.959 D create_tensor: loading tensor blk.5.attn_v.bias
0.00.107.971 D create_tensor: loading tensor blk.5.attn_output.weight
0.00.107.984 D create_tensor: loading tensor blk.5.attn_output.bias
0.00.107.996 D create_tensor: loading tensor blk.5.attn_output_norm.weight
0.00.108.008 D create_tensor: loading tensor blk.5.attn_output_norm.bias
0.00.108.021 D create_tensor: loading tensor blk.5.ffn_up.weight
0.00.108.033 D create_tensor: loading tensor blk.5.ffn_up.bias
0.00.108.047 D create_tensor: loading tensor blk.5.ffn_down.weight
0.00.108.059 D create_tensor: loading tensor blk.5.ffn_down.bias
0.00.108.076 D create_tensor: loading tensor blk.5.layer_output_norm.weight
0.00.108.090 D create_tensor: loading tensor blk.5.layer_output_norm.bias
0.00.108.366 D done_getting_tensors: tensor 'token_embd.weight' (q6_K) (and 1 others) cannot be used with preferred buffer type Vulkan_Host, using CPU instead
0.00.112.302 I load_tensors: offloading output layer to GPU
0.00.112.307 I load_tensors: offloading 5 repeating layers to GPU
0.00.112.307 I load_tensors: offloaded 7/7 layers to GPU
0.00.112.313 I load_tensors:   CPU_Mapped model buffer size =    19,84 MiB
0.00.112.315 I load_tensors:      Vulkan0 model buffer size =    24,79 MiB
..................................
0.00.124.292 I common_init_result: added [PAD] logit bias = -inf
0.00.124.297 I common_init_result: added [SEP] logit bias = -inf
0.00.124.841 I llama_context: constructing llama_context
0.00.124.868 I llama_context: n_seq_max     = 1
0.00.124.868 I llama_context: n_ctx         = 512
0.00.124.869 I llama_context: n_ctx_seq     = 512
0.00.124.869 I llama_context: n_batch       = 2048
0.00.124.869 I llama_context: n_ubatch      = 512
0.00.124.870 I llama_context: causal_attn   = 0
0.00.124.872 I llama_context: flash_attn    = auto
0.00.124.873 I llama_context: kv_unified    = false
0.00.124.879 I llama_context: freq_base     = 10000,0
0.00.124.880 I llama_context: freq_scale    = 1
0.00.124.881 I llama_context: n_rs_seq      = 0
0.00.124.881 I llama_context: n_outputs_max = 2048
0.00.124.936 D set_abort_callback: call
0.00.125.129 I llama_context: Vulkan_Host  output buffer size =     0,12 MiB
0.00.125.133 D llama_context: enumerating backends
0.00.125.146 D llama_context: backend_ptrs.size() = 2
0.00.125.148 I sched_reserve: reserving ...
0.00.125.153 D sched_reserve: max_nodes = 1024
0.00.126.029 D sched_reserve: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
0.00.126.040 D graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
0.00.126.662 I sched_reserve: Flash Attention was auto, set to enabled
0.00.126.667 I sched_reserve: resolving fused Gated Delta Net support:
0.00.126.669 D graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
0.00.127.124 I sched_reserve: fused Gated Delta Net (autoregressive) enabled
0.00.127.128 D graph_reserve: reserving a graph for ubatch with n_tokens =   16, n_seqs =  1, n_outputs =   16
0.00.127.514 I sched_reserve: fused Gated Delta Net (chunked) enabled
0.00.127.520 D graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
0.00.128.261 D graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
0.00.129.005 D graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
0.00.129.416 I sched_reserve:    Vulkan0 compute buffer size =     9,50 MiB
0.00.129.419 I sched_reserve: Vulkan_Host compute buffer size =     5,01 MiB
0.00.129.419 I sched_reserve: graph nodes  = 200
0.00.129.420 I sched_reserve: graph splits = 2
0.00.129.422 I sched_reserve: reserve took 4,27 ms, sched copies = 1
0.00.129.499 D set_adapters_lora: adapters = (nil)
0.00.129.501 D adapters_lora_are_same: adapters = (nil)
0.00.129.502 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
0.00.129.511 D decode: cannot decode batches with this context (calling encode() instead)
0.00.160.311 I llama_completion: llama threadpool init, n_threads = 2
0.00.160.330 D attach_threadpool: call
0.00.160.334 I 
/home/runner/work/llama.cpp/llama.cpp/tools/completion/completion.cpp:271: GGML_ASSERT(!llama_vocab_get_add_eos(vocab)) failed
0.00.160.390 I system_info: n_threads = 2 (n_threads_batch = 2) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
0.00.160.392 I 
[New LWP 25382]
[New LWP 25381]
[New LWP 25380]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000078969fd10813 in __GI___wait4 (pid=25383, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0  0x000078969fd10813 in __GI___wait4 (pid=25383, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x000078969fe3cc3b in ggml_print_backtrace () from /path/to/llama/libggml-base.so.0
#2  0x000078969fe3cdd2 in ggml_abort () from /path/to/llama/libggml-base.so.0
#3  0x000078969ff21726 in llama_completion(int, char**) () from /path/to/llama/libllama-completion-impl.so
#4  0x000078969fc2a1ca in __libc_start_call_main (main=main@entry=0x561f63059060 <main>, argc=argc@entry=8, argv=argv@entry=0x7ffdf61a06b8) at ../sysdeps/nptl/libc_start_call_main.h:58
warning: 58	../sysdeps/nptl/libc_start_call_main.h: No such file or directory
#5  0x000078969fc2a28b in __libc_start_main_impl (main=0x561f63059060 <main>, argc=8, argv=0x7ffdf61a06b8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdf61a06a8) at ../csu/libc-start.c:360
warning: 360	../csu/libc-start.c: No such file or directory
#6  0x0000561f63059095 in _start ()
[Inferior 1 (process 25378) detached]
Aborted (core dumped)
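
The verbose log above reports general.architecture = bert, bert.attention.causal = false and seven classifier labels. It also prints "cannot decode batches with this context (calling encode() instead)". This suggests the model is an encoder-style classifier rather than a generative model. Whether that explains the assertions is not confirmed here. Below is a minimal sketch for extracting those metadata fields from a saved log. The function names and the "non-causal" heuristic are my own assumptions based on the line format shown above. This is not llama.cpp's own check.

```python
import re

# Assumed line format, taken from the log above:
#   "... llama_model_loader: - kv   0:   general.architecture str   = bert"
KV_RE = re.compile(r"- kv\s+\d+:\s+(\S+)\s+(\S+)\s+=\s+(.*)$")

# Three metadata lines copied verbatim from the second log.
SAMPLE_LOG = """
0.00.062.534 I llama_model_loader: - kv   0:                       general.architecture str              = bert
0.00.062.630 I llama_model_loader: - kv   8:                           bert.block_count u32              = 6
0.00.062.637 I llama_model_loader: - kv  13:                      bert.attention.causal bool             = false
"""


def parse_kv(log: str) -> dict[str, str]:
    """Collect 'key -> value' pairs from llama_model_loader kv lines."""
    meta = {}
    for line in log.splitlines():
        match = KV_RE.search(line)
        if match:
            key, _value_type, value = match.groups()
            meta[key] = value.strip()
    return meta


def looks_non_causal(meta: dict[str, str]) -> bool:
    """Heuristic (an assumption): '<arch>.attention.causal' is logged as false."""
    arch = meta.get("general.architecture", "")
    return meta.get(f"{arch}.attention.causal") == "false"


if __name__ == "__main__":
    metadata = parse_kv(SAMPLE_LOG)
    print(metadata["general.architecture"], looks_non_causal(metadata))
```

The same parser can be pointed at the logs of the other models listed above to check whether they report the same architecture and causal setting.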

Server issue

See file: server_issue.txt
