Eval bug: GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) failed #24967

### Name and Version

version: 9776 (ac4105d68)
built with GNU 11.4.0 for Linux x86_64

### Operating systems

Linux

### GGML backends

Vulkan

### Hardware

```
System:
  Kernel: 6.17.0-35-generic arch: x86_64 bits: 64 compiler: gcc v: 13.3.0 clocksource: tsc
  Desktop: Cinnamon v: 6.6.7 tk: GTK v: 3.24.41 wm: Muffin v: 6.6.3 vt: 7 dm: LightDM v: 1.30.0
    Distro: Linux Mint 22.3 Zena base: Ubuntu 24.04 noble
CPU:
  Info: dual core model: Intel Core i3-10110U bits: 64 type: MT MCP smt: enabled
    arch: Comet/Whiskey Lake note: check rev: C cache: L1: 128 KiB L2: 512 KiB L3: 4 MiB
  Speed (MHz): avg: 3100 min/max: 400/4100 cores: 1: 3100 2: 3100 3: 3100 4: 3100 bogomips: 20799
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel CometLake-U GT2 [UHD Graphics] vendor: Dell driver: i915 v: kernel arch: Gen-9.5
    ports: active: HDMI-A-1 off: eDP-1 empty: DP-1,HDMI-A-2 bus-ID: 00:02.0 chip-ID: 8086:9b41
    class-ID: 0300
  Display: x11 server: X.Org v: 21.1.11 with: Xwayland v: 23.2.6 driver: X: loaded: modesetting
    unloaded: fbdev,vesa dri: iris gpu: i915 display-ID: :0 screens: 1
  API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris device: 1 drv: swrast gbm:
    drv: iris surfaceless: drv: iris x11: drv: iris inactive: wayland
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 25.2.8-0ubuntu0.24.04.2 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel UHD Graphics (CML GT2) device-ID: 8086:9b41
  API: Vulkan v: 1.3.275 layers: 7 surfaces: xcb,xlib device: 0 type: integrated-gpu driver: N/A
    device-ID: 8086:9b41 device: 1 type: cpu driver: N/A device-ID: 10005:0000
```

### Models

https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-melinna-Q4_K_M-GGUF
https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-base-Q4_K_M-GGUF
https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-Q4_K_M-GGUF
https://huggingface.co/mradermacher/youtube-sentiment-v2-GGUF

I tried more models, and they all fail with the same error.

### Problem description & steps to reproduce

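None of the models listed above run. `llama-cli` aborts with an assertion while loading the model, and `llama-server` loads the model but returns an error as soon as I try to generate.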

### CLI error

1. Run: `./llamacpp/llama-cli -m youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is"`
2. Notice the error: the process aborts while loading the model (see "First command" under "Relevant log output").

The same happens with any of the models listed above.
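
To check several model files in one go, the command from step 1 can be wrapped in a small loop. This is only a sketch: it assumes the binary lives at `./llamacpp/llama-cli` as in this report, and `try_model`, `LLAMA_CLI`, and the per-model `.log` files are hypothetical names introduced here, not part of llama.cpp. Standard input is redirected from `/dev/null` on the assumption that a model which does load would otherwise wait for interactive input.

```shell
#!/usr/bin/env bash
# Hypothetical repro helper: run the same llama-cli command against several
# GGUF files and report which ones exit with an error.
# LLAMA_CLI defaults to the path used in this report; override it if yours differs.
LLAMA_CLI="${LLAMA_CLI:-./llamacpp/llama-cli}"
PROMPT="The meaning to life and the universe is"

# Run one model; all output goes to <model file name>.log in the current directory.
# Prints "FAIL <model> (exit N)" on a non-zero exit status, "OK <model>" otherwise.
try_model() {
    local model="$1" status=0
    "$LLAMA_CLI" -m "$model" -p "$PROMPT" > "${model##*/}.log" 2>&1 < /dev/null || status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAIL $model (exit $status)"
    else
        echo "OK $model"
    fi
}

# Usage: ./repro.sh model1.gguf model2.gguf ...
for model in "$@"; do
    try_model "$model"
done
```

A process that dies on a failed assertion normally ends with SIGABRT, which bash reports as exit status 134.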

I used GGUF my repo to create my models.
I tried 9 models and they all fail with the same error.

1. Use GGUF my repo on https://huggingface.co/Linna/emotion-english-distilroberta-melinna
2. Run the resulting model
3. Notice the issue


### Server issue

I tried to run the same models using the server instead of the CLI. The model loads, but when I try to generate I get a different error.

Example with https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-base-Q4_K_M-GGUF:

1. Run the model with llama-server
2. Connect to the web interface
3. Try to generate anything
4. Notice the error:
`send_error: task id = 0, error: the current context does not logits computation. skipping`

### First Bad Commit

_No response_

### Relevant log output

### First command
`./llamacpp/llama-cli -m youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is"`
```
Loading model... -/home/runner/work/llama.cpp/llama.cpp/src/llama-context.cpp:2219: GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) failed
[New LWP 24796]
[New LWP 24795]
[New LWP 24794]
[New LWP 24793]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x0000752821d10813 in __GI___wait4 (pid=24797, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: Aucun fichier ou dossier de ce nom
#0  0x0000752821d10813 in __GI___wait4 (pid=24797, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000752821f51c3b in ggml_print_backtrace () from /path/to/llama/libggml-base.so.0
#2  0x0000752821f51dd2 in ggml_abort () from /path/to/llama/libggml-base.so.0
#3  0x0000752821558f16 in llama_context::output_reserve(int) () from /path/to/llama/libllama.so.0
#4  0x0000752821559dad in llama_context::encode(llama_batch const&) () from /path/to/llama/libllama.so.0
#5  0x000075282155eb60 in llama_decode () from /path/to/llama/libllama.so.0
#6  0x00007528219919b4 in common_init_from_params(common_params&, bool) () from /path/to/llama/libllama-common.so.0
#7  0x00007528225629a3 in server_context_impl::load_model(common_params&) () from /path/to/llama/libllama-cli-impl.so
#8  0x00007528224a7674 in llama_cli(int, char**) () from /path/to/llama/libllama-cli-impl.so
#9  0x0000752821c2a1ca in __libc_start_call_main (main=main@entry=0x5b34ef475260 <main>, argc=argc@entry=5, argv=argv@entry=0x7ffe93bbfb28) at ../sysdeps/nptl/libc_start_call_main.h:58
warning: 58	../sysdeps/nptl/libc_start_call_main.h: Aucun fichier ou dossier de ce nom
#10 0x0000752821c2a28b in __libc_start_main_impl (main=0x5b34ef475260 <main>, argc=5, argv=0x7ffe93bbfb28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe93bbfb18) at ../csu/libc-start.c:360
warning: 360	../csu/libc-start.c: Aucun fichier ou dossier de ce nom
#11 0x00005b34ef475295 in _start ()
[Inferior 1 (process 24790) detached]
Abandon (core dumped)
```


### Second command
`./llamacpp/llama-completion -fit off -m models/emotions/youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is" --verbose`
```
0.00.055.533 I llama_completion: llama backend init
0.00.055.542 I llama_completion: load the model and apply lora adapter, if any
0.00.062.445 I llama_model_loader: loaded meta data with 38 key-value pairs and 104 tensors from models/emotions/youtube-sentiment-v2.Q4_K_M.gguf (version GGUF V3 (latest))
0.00.062.514 I llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
0.00.062.534 I llama_model_loader: - kv   0:                       general.architecture str              = bert
0.00.062.536 I llama_model_loader: - kv   1:                               general.type str              = model
0.00.062.538 I llama_model_loader: - kv   2:                               general.name str              = Youtube Sentiment v2
0.00.062.540 I llama_model_loader: - kv   3:                            general.version str              = v2
0.00.062.541 I llama_model_loader: - kv   4:                           general.basename str              = youtube-sentiment
0.00.062.541 I llama_model_loader: - kv   5:                         general.size_label str              = 67M
0.00.062.594 I llama_model_loader: - kv   6:                               general.tags arr[str,5]       = ["distilbert", "emotion", "youtube", ...
0.00.062.624 I llama_model_loader: - kv   7:          bert.attention.layer_norm_epsilon f32              = 0,000000
0.00.062.630 I llama_model_loader: - kv   8:                           bert.block_count u32              = 6
0.00.062.633 I llama_model_loader: - kv   9:                        bert.context_length u32              = 512
0.00.062.634 I llama_model_loader: - kv  10:                      bert.embedding_length u32              = 768
0.00.062.635 I llama_model_loader: - kv  11:                   bert.feed_forward_length u32              = 3072
0.00.062.636 I llama_model_loader: - kv  12:                  bert.attention.head_count u32              = 12
0.00.062.637 I llama_model_loader: - kv  13:                      bert.attention.causal bool             = false
0.00.062.648 I llama_model_loader: - kv  14:              bert.classifier.output_labels arr[str,7]       = ["LABEL_0", "LABEL_1", "LABEL_2", "LA...
0.00.062.649 I llama_model_loader: - kv  15:            tokenizer.ggml.token_type_count u32              = 1
0.00.062.650 I llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = bert
0.00.062.652 I llama_model_loader: - kv  17:                         tokenizer.ggml.pre str              = jina-v2-en
0.00.067.708 I llama_model_loader: - kv  18:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
0.00.068.881 I llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
0.00.068.885 I llama_model_loader: - kv  20:                tokenizer.ggml.bos_token_id u32              = 101
0.00.068.886 I llama_model_loader: - kv  21:                tokenizer.ggml.eos_token_id u32              = 102
0.00.068.887 I llama_model_loader: - kv  22:            tokenizer.ggml.unknown_token_id u32              = 100
0.00.068.888 I llama_model_loader: - kv  23:          tokenizer.ggml.seperator_token_id u32              = 102
0.00.068.890 I llama_model_loader: - kv  24:            tokenizer.ggml.padding_token_id u32              = 0
0.00.068.891 I llama_model_loader: - kv  25:               tokenizer.ggml.mask_token_id u32              = 103
0.00.068.892 I llama_model_loader: - kv  26:               tokenizer.ggml.add_bos_token bool             = true
0.00.068.893 I llama_model_loader: - kv  27:               tokenizer.ggml.add_eos_token bool             = true
0.00.068.894 I llama_model_loader: - kv  28:               tokenizer.ggml.add_sep_token bool             = false
0.00.068.895 I llama_model_loader: - kv  29:               general.quantization_version u32              = 2
0.00.068.896 I llama_model_loader: - kv  30:                          general.file_type u32              = 15
0.00.068.898 I llama_model_loader: - kv  31:                                general.url str              = https://huggingface.co/mradermacher/y...
0.00.068.899 I llama_model_loader: - kv  32:              mradermacher.quantize_version str              = 2
0.00.068.900 I llama_model_loader: - kv  33:                  mradermacher.quantized_by str              = mradermacher
0.00.068.901 I llama_model_loader: - kv  34:                  mradermacher.quantized_at str              = 2025-06-22T20:09:52+02:00
0.00.068.902 I llama_model_loader: - kv  35:                  mradermacher.quantized_on str              = leia
0.00.068.904 I llama_model_loader: - kv  36:                         general.source.url str              = https://huggingface.co/Anuj5504/youtu...
0.00.068.905 I llama_model_loader: - kv  37:                  mradermacher.convert_type str              = hf
0.00.068.907 I llama_model_loader: - type  f32:   65 tensors
0.00.068.909 I llama_model_loader: - type q4_K:   34 tensors
0.00.068.909 I llama_model_loader: - type q6_K:    5 tensors
0.00.068.911 I print_info: file format = GGUF V3 (latest)
0.00.068.912 I print_info: file type   = Q4_K - Medium
0.00.068.916 I print_info: file size   = 44,63 MiB (5,59 BPW) 
0.00.069.159 I llama_prepare_model_devices: using device Vulkan0 (Intel(R) UHD Graphics (CML GT2)) (0000:00:02.0) - 9112 MiB free
0.00.074.469 D init_tokenizer: initializing tokenizer for type 3
0.00.088.063 I load: 0 unused tokens
0.00.091.849 D load: control token:    101 '[CLS]' is not marked as EOG
0.00.092.343 D load: control token:    103 '[MASK]' is not marked as EOG
0.00.092.949 D load: control token:    100 '[UNK]' is not marked as EOG
0.00.093.183 D load: control token:    102 '[SEP]' is not marked as EOG
0.00.098.820 W load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
0.00.098.822 I load: printing all EOG tokens:
0.00.098.823 I load:   - 0 ('[PAD]')
0.00.098.824 I load:   - 102 ('[SEP]')
0.00.098.902 I load: special tokens cache size = 5
0.00.102.131 I load: token to piece cache size = 0,2032 MB
0.00.102.146 I print_info: arch                  = bert
0.00.102.146 I print_info: vocab_only            = 0
0.00.102.147 I print_info: no_alloc              = 0
0.00.102.147 I print_info: n_ctx_train           = 512
0.00.102.148 I print_info: n_embd_inp            = 768
0.00.102.149 I print_info: n_embd                = 768
0.00.102.149 I print_info: n_embd_out            = 768
0.00.102.150 I print_info: n_layer               = 6
0.00.102.150 I print_info: n_layer_all           = 6
0.00.102.160 I print_info: n_head                = 12
0.00.102.161 I print_info: n_head_kv             = 12
0.00.102.161 I print_info: n_rot                 = 64
0.00.102.162 I print_info: n_swa                 = 0
0.00.102.162 I print_info: is_swa_any            = 0
0.00.102.163 I print_info: n_embd_head_k         = 64
0.00.102.163 I print_info: n_embd_head_v         = 64
0.00.102.164 I print_info: n_gqa                 = 1
0.00.102.167 I print_info: n_embd_k_gqa          = 768
0.00.102.169 I print_info: n_embd_v_gqa          = 768
0.00.102.171 I print_info: f_norm_eps            = 1,0e-12
0.00.102.172 I print_info: f_norm_rms_eps        = 0,0e+00
0.00.102.173 I print_info: f_clamp_kqv           = 0,0e+00
0.00.102.173 I print_info: f_max_alibi_bias      = 0,0e+00
0.00.102.174 I print_info: f_logit_scale         = 0,0e+00
0.00.102.174 I print_info: f_attn_scale          = 0,0e+00
0.00.102.174 I print_info: f_attn_value_scale    = 0,0000
0.00.102.176 I print_info: n_ff                  = 3072
0.00.102.176 I print_info: n_expert              = 0
0.00.102.176 I print_info: n_expert_used         = 0
0.00.102.176 I print_info: n_expert_groups       = 0
0.00.102.177 I print_info: n_group_used          = 0
0.00.102.177 I print_info: causal attn           = 0
0.00.102.177 I print_info: pooling type          = -1
0.00.102.177 I print_info: rope type             = 2
0.00.102.178 I print_info: rope scaling          = linear
0.00.102.179 I print_info: freq_base_train       = 10000,0
0.00.102.180 I print_info: freq_scale_train      = 1
0.00.102.180 I print_info: n_ctx_orig_yarn       = 512
0.00.102.181 I print_info: rope_yarn_log_mul     = 0,0000
0.00.102.181 I print_info: rope_finetuned        = unknown
0.00.102.182 I print_info: n_cls_out             = 7
0.00.102.182 I print_info: cls_label[ 0]         = LABEL_0
0.00.102.183 I print_info: cls_label[ 1]         = LABEL_1
0.00.102.183 I print_info: cls_label[ 2]         = LABEL_2
0.00.102.184 I print_info: cls_label[ 3]         = LABEL_3
0.00.102.184 I print_info: cls_label[ 4]         = LABEL_4
0.00.102.185 I print_info: cls_label[ 5]         = LABEL_5
0.00.102.185 I print_info: cls_label[ 6]         = LABEL_6
0.00.102.186 I print_info: model type            = 22M
0.00.102.188 I print_info: model params          = 66,96 M
0.00.102.188 I print_info: general.name          = Youtube Sentiment v2
0.00.102.191 I print_info: vocab type            = WPM
0.00.102.193 I print_info: n_vocab               = 30522
0.00.102.193 I print_info: n_merges              = 0
0.00.102.193 I print_info: BOS token             = 101 '[CLS]'
0.00.102.194 I print_info: EOS token             = 102 '[SEP]'
0.00.102.195 I print_info: UNK token             = 100 '[UNK]'
0.00.102.195 I print_info: SEP token             = 102 '[SEP]'
0.00.102.195 I print_info: PAD token             = 0 '[PAD]'
0.00.102.196 I print_info: MASK token            = 103 '[MASK]'
0.00.102.196 I print_info: LF token              = 0 '[PAD]'
0.00.102.197 I print_info: FIM PAD token         = 0 '[PAD]'
0.00.102.197 I print_info: EOG token             = 0 '[PAD]'
0.00.102.198 I print_info: EOG token             = 102 '[SEP]'
0.00.102.198 I print_info: max token length      = 21
0.00.102.200 I load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
0.00.106.806 D load_tensors: layer   0 assigned to device Vulkan0, is_swa = 0
0.00.106.814 D load_tensors: layer   1 assigned to device Vulkan0, is_swa = 0
0.00.106.815 D load_tensors: layer   2 assigned to device Vulkan0, is_swa = 0
0.00.106.815 D load_tensors: layer   3 assigned to device Vulkan0, is_swa = 0
0.00.106.816 D load_tensors: layer   4 assigned to device Vulkan0, is_swa = 0
0.00.106.816 D load_tensors: layer   5 assigned to device Vulkan0, is_swa = 0
0.00.106.817 D load_tensors: layer   6 assigned to device Vulkan0, is_swa = 0
0.00.106.886 D create_tensor: loading tensor token_embd.weight
0.00.106.916 D create_tensor: loading tensor position_embd.weight
0.00.106.938 D create_tensor: loading tensor cls.weight
0.00.106.947 D create_tensor: loading tensor cls.bias
0.00.106.952 D create_tensor: loading tensor cls.output.weight
0.00.106.956 D create_tensor: loading tensor cls.output.bias
0.00.106.963 D create_tensor: loading tensor token_embd_norm.weight
0.00.106.968 D create_tensor: loading tensor token_embd_norm.bias
0.00.106.981 D create_tensor: loading tensor blk.0.attn_q.weight
0.00.106.988 D create_tensor: loading tensor blk.0.attn_k.weight
0.00.106.996 D create_tensor: loading tensor blk.0.attn_v.weight
0.00.107.005 D create_tensor: loading tensor blk.0.attn_q.bias
0.00.107.013 D create_tensor: loading tensor blk.0.attn_k.bias
0.00.107.023 D create_tensor: loading tensor blk.0.attn_v.bias
0.00.107.033 D create_tensor: loading tensor blk.0.attn_output.weight
0.00.107.043 D create_tensor: loading tensor blk.0.attn_output.bias
0.00.107.054 D create_tensor: loading tensor blk.0.attn_output_norm.weight
0.00.107.064 D create_tensor: loading tensor blk.0.attn_output_norm.bias
0.00.107.075 D create_tensor: loading tensor blk.0.ffn_up.weight
0.00.107.085 D create_tensor: loading tensor blk.0.ffn_up.bias
0.00.107.095 D create_tensor: loading tensor blk.0.ffn_down.weight
0.00.107.105 D create_tensor: loading tensor blk.0.ffn_down.bias
0.00.107.113 D create_tensor: loading tensor blk.0.layer_output_norm.weight
0.00.107.121 D create_tensor: loading tensor blk.0.layer_output_norm.bias
0.00.107.139 D create_tensor: loading tensor blk.1.attn_q.weight
0.00.107.151 D create_tensor: loading tensor blk.1.attn_k.weight
0.00.107.160 D create_tensor: loading tensor blk.1.attn_v.weight
0.00.107.168 D create_tensor: loading tensor blk.1.attn_q.bias
0.00.107.177 D create_tensor: loading tensor blk.1.attn_k.bias
0.00.107.186 D create_tensor: loading tensor blk.1.attn_v.bias
0.00.107.196 D create_tensor: loading tensor blk.1.attn_output.weight
0.00.107.206 D create_tensor: loading tensor blk.1.attn_output.bias
0.00.107.217 D create_tensor: loading tensor blk.1.attn_output_norm.weight
0.00.107.227 D create_tensor: loading tensor blk.1.attn_output_norm.bias
0.00.107.237 D create_tensor: loading tensor blk.1.ffn_up.weight
0.00.107.247 D create_tensor: loading tensor blk.1.ffn_up.bias
0.00.107.260 D create_tensor: loading tensor blk.1.ffn_down.weight
0.00.107.270 D create_tensor: loading tensor blk.1.ffn_down.bias
0.00.107.280 D create_tensor: loading tensor blk.1.layer_output_norm.weight
0.00.107.290 D create_tensor: loading tensor blk.1.layer_output_norm.bias
0.00.107.305 D create_tensor: loading tensor blk.2.attn_q.weight
0.00.107.315 D create_tensor: loading tensor blk.2.attn_k.weight
0.00.107.325 D create_tensor: loading tensor blk.2.attn_v.weight
0.00.107.335 D create_tensor: loading tensor blk.2.attn_q.bias
0.00.107.345 D create_tensor: loading tensor blk.2.attn_k.bias
0.00.107.356 D create_tensor: loading tensor blk.2.attn_v.bias
0.00.107.366 D create_tensor: loading tensor blk.2.attn_output.weight
0.00.107.396 D create_tensor: loading tensor blk.2.attn_output.bias
0.00.107.408 D create_tensor: loading tensor blk.2.attn_output_norm.weight
0.00.107.418 D create_tensor: loading tensor blk.2.attn_output_norm.bias
0.00.107.429 D create_tensor: loading tensor blk.2.ffn_up.weight
0.00.107.440 D create_tensor: loading tensor blk.2.ffn_up.bias
0.00.107.451 D create_tensor: loading tensor blk.2.ffn_down.weight
0.00.107.461 D create_tensor: loading tensor blk.2.ffn_down.bias
0.00.107.472 D create_tensor: loading tensor blk.2.layer_output_norm.weight
0.00.107.482 D create_tensor: loading tensor blk.2.layer_output_norm.bias
0.00.107.498 D create_tensor: loading tensor blk.3.attn_q.weight
0.00.107.508 D create_tensor: loading tensor blk.3.attn_k.weight
0.00.107.523 D create_tensor: loading tensor blk.3.attn_v.weight
0.00.107.534 D create_tensor: loading tensor blk.3.attn_q.bias
0.00.107.544 D create_tensor: loading tensor blk.3.attn_k.bias
0.00.107.555 D create_tensor: loading tensor blk.3.attn_v.bias
0.00.107.566 D create_tensor: loading tensor blk.3.attn_output.weight
0.00.107.578 D create_tensor: loading tensor blk.3.attn_output.bias
0.00.107.592 D create_tensor: loading tensor blk.3.attn_output_norm.weight
0.00.107.605 D create_tensor: loading tensor blk.3.attn_output_norm.bias
0.00.107.616 D create_tensor: loading tensor blk.3.ffn_up.weight
0.00.107.628 D create_tensor: loading tensor blk.3.ffn_up.bias
0.00.107.639 D create_tensor: loading tensor blk.3.ffn_down.weight
0.00.107.653 D create_tensor: loading tensor blk.3.ffn_down.bias
0.00.107.666 D create_tensor: loading tensor blk.3.layer_output_norm.weight
0.00.107.677 D create_tensor: loading tensor blk.3.layer_output_norm.bias
0.00.107.693 D create_tensor: loading tensor blk.4.attn_q.weight
0.00.107.705 D create_tensor: loading tensor blk.4.attn_k.weight
0.00.107.717 D create_tensor: loading tensor blk.4.attn_v.weight
0.00.107.729 D create_tensor: loading tensor blk.4.attn_q.bias
0.00.107.740 D create_tensor: loading tensor blk.4.attn_k.bias
0.00.107.752 D create_tensor: loading tensor blk.4.attn_v.bias
0.00.107.764 D create_tensor: loading tensor blk.4.attn_output.weight
0.00.107.777 D create_tensor: loading tensor blk.4.attn_output.bias
0.00.107.792 D create_tensor: loading tensor blk.4.attn_output_norm.weight
0.00.107.804 D create_tensor: loading tensor blk.4.attn_output_norm.bias
0.00.107.816 D create_tensor: loading tensor blk.4.ffn_up.weight
0.00.107.828 D create_tensor: loading tensor blk.4.ffn_up.bias
0.00.107.841 D create_tensor: loading tensor blk.4.ffn_down.weight
0.00.107.852 D create_tensor: loading tensor blk.4.ffn_down.bias
0.00.107.864 D create_tensor: loading tensor blk.4.layer_output_norm.weight
0.00.107.875 D create_tensor: loading tensor blk.4.layer_output_norm.bias
0.00.107.894 D create_tensor: loading tensor blk.5.attn_q.weight
0.00.107.906 D create_tensor: loading tensor blk.5.attn_k.weight
0.00.107.919 D create_tensor: loading tensor blk.5.attn_v.weight
0.00.107.935 D create_tensor: loading tensor blk.5.attn_q.bias
0.00.107.947 D create_tensor: loading tensor blk.5.attn_k.bias
0.00.107.959 D create_tensor: loading tensor blk.5.attn_v.bias
0.00.107.971 D create_tensor: loading tensor blk.5.attn_output.weight
0.00.107.984 D create_tensor: loading tensor blk.5.attn_output.bias
0.00.107.996 D create_tensor: loading tensor blk.5.attn_output_norm.weight
0.00.108.008 D create_tensor: loading tensor blk.5.attn_output_norm.bias
0.00.108.021 D create_tensor: loading tensor blk.5.ffn_up.weight
0.00.108.033 D create_tensor: loading tensor blk.5.ffn_up.bias
0.00.108.047 D create_tensor: loading tensor blk.5.ffn_down.weight
0.00.108.059 D create_tensor: loading tensor blk.5.ffn_down.bias
0.00.108.076 D create_tensor: loading tensor blk.5.layer_output_norm.weight
0.00.108.090 D create_tensor: loading tensor blk.5.layer_output_norm.bias
0.00.108.366 D done_getting_tensors: tensor 'token_embd.weight' (q6_K) (and 1 others) cannot be used with preferred buffer type Vulkan_Host, using CPU instead
0.00.112.302 I load_tensors: offloading output layer to GPU
0.00.112.307 I load_tensors: offloading 5 repeating layers to GPU
0.00.112.307 I load_tensors: offloaded 7/7 layers to GPU
0.00.112.313 I load_tensors:   CPU_Mapped model buffer size =    19,84 MiB
0.00.112.315 I load_tensors:      Vulkan0 model buffer size =    24,79 MiB
..................................
0.00.124.292 I common_init_result: added [PAD] logit bias = -inf
0.00.124.297 I common_init_result: added [SEP] logit bias = -inf
0.00.124.841 I llama_context: constructing llama_context
0.00.124.868 I llama_context: n_seq_max     = 1
0.00.124.868 I llama_context: n_ctx         = 512
0.00.124.869 I llama_context: n_ctx_seq     = 512
0.00.124.869 I llama_context: n_batch       = 2048
0.00.124.869 I llama_context: n_ubatch      = 512
0.00.124.870 I llama_context: causal_attn   = 0
0.00.124.872 I llama_context: flash_attn    = auto
0.00.124.873 I llama_context: kv_unified    = false
0.00.124.879 I llama_context: freq_base     = 10000,0
0.00.124.880 I llama_context: freq_scale    = 1
0.00.124.881 I llama_context: n_rs_seq      = 0
0.00.124.881 I llama_context: n_outputs_max = 2048
0.00.124.936 D set_abort_callback: call
0.00.125.129 I llama_context: Vulkan_Host  output buffer size =     0,12 MiB
0.00.125.133 D llama_context: enumerating backends
0.00.125.146 D llama_context: backend_ptrs.size() = 2
0.00.125.148 I sched_reserve: reserving ...
0.00.125.153 D sched_reserve: max_nodes = 1024
0.00.126.029 D sched_reserve: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
0.00.126.040 D graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
0.00.126.662 I sched_reserve: Flash Attention was auto, set to enabled
0.00.126.667 I sched_reserve: resolving fused Gated Delta Net support:
0.00.126.669 D graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
0.00.127.124 I sched_reserve: fused Gated Delta Net (autoregressive) enabled
0.00.127.128 D graph_reserve: reserving a graph for ubatch with n_tokens =   16, n_seqs =  1, n_outputs =   16
0.00.127.514 I sched_reserve: fused Gated Delta Net (chunked) enabled
0.00.127.520 D graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
0.00.128.261 D graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
0.00.129.005 D graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
0.00.129.416 I sched_reserve:    Vulkan0 compute buffer size =     9,50 MiB
0.00.129.419 I sched_reserve: Vulkan_Host compute buffer size =     5,01 MiB
0.00.129.419 I sched_reserve: graph nodes  = 200
0.00.129.420 I sched_reserve: graph splits = 2
0.00.129.422 I sched_reserve: reserve took 4,27 ms, sched copies = 1
0.00.129.499 D set_adapters_lora: adapters = (nil)
0.00.129.501 D adapters_lora_are_same: adapters = (nil)
0.00.129.502 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
0.00.129.511 D decode: cannot decode batches with this context (calling encode() instead)
0.00.160.311 I llama_completion: llama threadpool init, n_threads = 2
0.00.160.330 D attach_threadpool: call
0.00.160.334 I 
/home/runner/work/llama.cpp/llama.cpp/tools/completion/completion.cpp:271: GGML_ASSERT(!llama_vocab_get_add_eos(vocab)) failed
0.00.160.390 I system_info: n_threads = 2 (n_threads_batch = 2) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
0.00.160.392 I 
[New LWP 25382]
[New LWP 25381]
[New LWP 25380]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000078969fd10813 in __GI___wait4 (pid=25383, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: Aucun fichier ou dossier de ce nom
#0  0x000078969fd10813 in __GI___wait4 (pid=25383, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x000078969fe3cc3b in ggml_print_backtrace () from /path/to/llama/libggml-base.so.0
#2  0x000078969fe3cdd2 in ggml_abort () from /path/to/llama/libggml-base.so.0
#3  0x000078969ff21726 in llama_completion(int, char**) () from /path/to/llama/libllama-completion-impl.so
#4  0x000078969fc2a1ca in __libc_start_call_main (main=main@entry=0x561f63059060 <main>, argc=argc@entry=8, argv=argv@entry=0x7ffdf61a06b8) at ../sysdeps/nptl/libc_start_call_main.h:58
warning: 58	../sysdeps/nptl/libc_start_call_main.h: Aucun fichier ou dossier de ce nom
#5  0x000078969fc2a28b in __libc_start_main_impl (main=0x561f63059060 <main>, argc=8, argv=0x7ffdf61a06b8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdf61a06a8) at ../csu/libc-start.c:360
warning: 360	../csu/libc-start.c: Aucun fichier ou dossier de ce nom
#6  0x0000561f63059095 in _start ()
[Inferior 1 (process 25378) detached]
Abandon (core dumped)
```
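
The verbose log is long, so here is a sketch for pulling out the lines that describe the model and the failure. In the log above these are `general.architecture = bert`, `causal attn = 0`, `pooling type = -1`, `n_cls_out = 7`, and the `GGML_ASSERT` line. Note that the two commands hit different assertions: `llama-context.cpp:2219` for `llama-cli` and `completion.cpp:271` for `llama-completion`. The function name `summarize_log` and the file name `completion.log` are assumptions made for this example.

```shell
# Hypothetical helper: print only the lines of a saved llama.cpp log that
# describe the model type (architecture, causal flag, pooling, classifier
# outputs) and any failed assertion.
summarize_log() {
    grep -E 'general\.architecture|causal|pooling type|n_cls_out|GGML_ASSERT' "$1"
}

# Usage (completion.log is an assumed file name):
#   ./llamacpp/llama-completion -fit off -m models/emotions/youtube-sentiment-v2.Q4_K_M.gguf \
#       -p "The meaning to life and the universe is" --verbose > completion.log 2>&1
#   summarize_log completion.log
```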

### Server's issue
See file: [server_issue.txt](https://github.com/user-attachments/files/29284228/server_issue.txt)
