System:
Kernel: 6.17.0-35-generic arch: x86_64 bits: 64 compiler: gcc v: 13.3.0 clocksource: tsc
Desktop: Cinnamon v: 6.6.7 tk: GTK v: 3.24.41 wm: Muffin v: 6.6.3 vt: 7 dm: LightDM v: 1.30.0
Distro: Linux Mint 22.3 Zena base: Ubuntu 24.04 noble
CPU:
Info: dual core model: Intel Core i3-10110U bits: 64 type: MT MCP smt: enabled
arch: Comet/Whiskey Lake note: check rev: C cache: L1: 128 KiB L2: 512 KiB L3: 4 MiB
Speed (MHz): avg: 3100 min/max: 400/4100 cores: 1: 3100 2: 3100 3: 3100 4: 3100 bogomips: 20799
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
Device-1: Intel CometLake-U GT2 [UHD Graphics] vendor: Dell driver: i915 v: kernel arch: Gen-9.5
ports: active: HDMI-A-1 off: eDP-1 empty: DP-1,HDMI-A-2 bus-ID: 00:02.0 chip-ID: 8086:9b41
class-ID: 0300
Display: x11 server: X.Org v: 21.1.11 with: Xwayland v: 23.2.6 driver: X: loaded: modesetting
unloaded: fbdev,vesa dri: iris gpu: i915 display-ID: :0 screens: 1
API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris device: 1 drv: swrast gbm:
drv: iris surfaceless: drv: iris x11: drv: iris inactive: wayland
API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 25.2.8-0ubuntu0.24.04.2 glx-v: 1.4
direct-render: yes renderer: Mesa Intel UHD Graphics (CML GT2) device-ID: 8086:9b41
API: Vulkan v: 1.3.275 layers: 7 surfaces: xcb,xlib device: 0 type: integrated-gpu driver: N/A
device-ID: 8086:9b41 device: 1 type: cpu driver: N/A device-ID: 10005:0000
Loading model... -/home/runner/work/llama.cpp/llama.cpp/src/llama-context.cpp:2219: GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) failed
[New LWP 24796]
[New LWP 24795]
[New LWP 24794]
[New LWP 24793]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x0000752821d10813 in __GI___wait4 (pid=24797, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0 0x0000752821d10813 in __GI___wait4 (pid=24797, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x0000752821f51c3b in ggml_print_backtrace () from /path/to/llama/libggml-base.so.0
#2 0x0000752821f51dd2 in ggml_abort () from /path/to/llama/libggml-base.so.0
#3 0x0000752821558f16 in llama_context::output_reserve(int) () from /path/to/llama/libllama.so.0
#4 0x0000752821559dad in llama_context::encode(llama_batch const&) () from /path/to/llama/libllama.so.0
#5 0x000075282155eb60 in llama_decode () from /path/to/llama/libllama.so.0
#6 0x00007528219919b4 in common_init_from_params(common_params&, bool) () from /path/to/llama/libllama-common.so.0
#7 0x00007528225629a3 in server_context_impl::load_model(common_params&) () from /path/to/llama/libllama-cli-impl.so
#8 0x00007528224a7674 in llama_cli(int, char**) () from /path/to/llama/libllama-cli-impl.so
#9 0x0000752821c2a1ca in __libc_start_call_main (main=main@entry=0x5b34ef475260 <main>, argc=argc@entry=5, argv=argv@entry=0x7ffe93bbfb28) at ../sysdeps/nptl/libc_start_call_main.h:58
warning: 58 ../sysdeps/nptl/libc_start_call_main.h: No such file or directory
#10 0x0000752821c2a28b in __libc_start_main_impl (main=0x5b34ef475260 <main>, argc=5, argv=0x7ffe93bbfb28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe93bbfb18) at ../csu/libc-start.c:360
warning: 360 ../csu/libc-start.c: No such file or directory
#11 0x00005b34ef475295 in _start ()
[Inferior 1 (process 24790) detached]
Aborted (core dumped)
0.00.055.533 I llama_completion: llama backend init
0.00.055.542 I llama_completion: load the model and apply lora adapter, if any
0.00.062.445 I llama_model_loader: loaded meta data with 38 key-value pairs and 104 tensors from models/emotions/youtube-sentiment-v2.Q4_K_M.gguf (version GGUF V3 (latest))
0.00.062.514 I llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
0.00.062.534 I llama_model_loader: - kv 0: general.architecture str = bert
0.00.062.536 I llama_model_loader: - kv 1: general.type str = model
0.00.062.538 I llama_model_loader: - kv 2: general.name str = Youtube Sentiment v2
0.00.062.540 I llama_model_loader: - kv 3: general.version str = v2
0.00.062.541 I llama_model_loader: - kv 4: general.basename str = youtube-sentiment
0.00.062.541 I llama_model_loader: - kv 5: general.size_label str = 67M
0.00.062.594 I llama_model_loader: - kv 6: general.tags arr[str,5] = ["distilbert", "emotion", "youtube", ...
0.00.062.624 I llama_model_loader: - kv 7: bert.attention.layer_norm_epsilon f32 = 0,000000
0.00.062.630 I llama_model_loader: - kv 8: bert.block_count u32 = 6
0.00.062.633 I llama_model_loader: - kv 9: bert.context_length u32 = 512
0.00.062.634 I llama_model_loader: - kv 10: bert.embedding_length u32 = 768
0.00.062.635 I llama_model_loader: - kv 11: bert.feed_forward_length u32 = 3072
0.00.062.636 I llama_model_loader: - kv 12: bert.attention.head_count u32 = 12
0.00.062.637 I llama_model_loader: - kv 13: bert.attention.causal bool = false
0.00.062.648 I llama_model_loader: - kv 14: bert.classifier.output_labels arr[str,7] = ["LABEL_0", "LABEL_1", "LABEL_2", "LA...
0.00.062.649 I llama_model_loader: - kv 15: tokenizer.ggml.token_type_count u32 = 1
0.00.062.650 I llama_model_loader: - kv 16: tokenizer.ggml.model str = bert
0.00.062.652 I llama_model_loader: - kv 17: tokenizer.ggml.pre str = jina-v2-en
0.00.067.708 I llama_model_loader: - kv 18: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
0.00.068.881 I llama_model_loader: - kv 19: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
0.00.068.885 I llama_model_loader: - kv 20: tokenizer.ggml.bos_token_id u32 = 101
0.00.068.886 I llama_model_loader: - kv 21: tokenizer.ggml.eos_token_id u32 = 102
0.00.068.887 I llama_model_loader: - kv 22: tokenizer.ggml.unknown_token_id u32 = 100
0.00.068.888 I llama_model_loader: - kv 23: tokenizer.ggml.seperator_token_id u32 = 102
0.00.068.890 I llama_model_loader: - kv 24: tokenizer.ggml.padding_token_id u32 = 0
0.00.068.891 I llama_model_loader: - kv 25: tokenizer.ggml.mask_token_id u32 = 103
0.00.068.892 I llama_model_loader: - kv 26: tokenizer.ggml.add_bos_token bool = true
0.00.068.893 I llama_model_loader: - kv 27: tokenizer.ggml.add_eos_token bool = true
0.00.068.894 I llama_model_loader: - kv 28: tokenizer.ggml.add_sep_token bool = false
0.00.068.895 I llama_model_loader: - kv 29: general.quantization_version u32 = 2
0.00.068.896 I llama_model_loader: - kv 30: general.file_type u32 = 15
0.00.068.898 I llama_model_loader: - kv 31: general.url str = https://huggingface.co/mradermacher/y...
0.00.068.899 I llama_model_loader: - kv 32: mradermacher.quantize_version str = 2
0.00.068.900 I llama_model_loader: - kv 33: mradermacher.quantized_by str = mradermacher
0.00.068.901 I llama_model_loader: - kv 34: mradermacher.quantized_at str = 2025-06-22T20:09:52+02:00
0.00.068.902 I llama_model_loader: - kv 35: mradermacher.quantized_on str = leia
0.00.068.904 I llama_model_loader: - kv 36: general.source.url str = https://huggingface.co/Anuj5504/youtu...
0.00.068.905 I llama_model_loader: - kv 37: mradermacher.convert_type str = hf
0.00.068.907 I llama_model_loader: - type f32: 65 tensors
0.00.068.909 I llama_model_loader: - type q4_K: 34 tensors
0.00.068.909 I llama_model_loader: - type q6_K: 5 tensors
0.00.068.911 I print_info: file format = GGUF V3 (latest)
0.00.068.912 I print_info: file type = Q4_K - Medium
0.00.068.916 I print_info: file size = 44,63 MiB (5,59 BPW)
0.00.069.159 I llama_prepare_model_devices: using device Vulkan0 (Intel(R) UHD Graphics (CML GT2)) (0000:00:02.0) - 9112 MiB free
0.00.074.469 D init_tokenizer: initializing tokenizer for type 3
0.00.088.063 I load: 0 unused tokens
0.00.091.849 D load: control token: 101 '[CLS]' is not marked as EOG
0.00.092.343 D load: control token: 103 '[MASK]' is not marked as EOG
0.00.092.949 D load: control token: 100 '[UNK]' is not marked as EOG
0.00.093.183 D load: control token: 102 '[SEP]' is not marked as EOG
0.00.098.820 W load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
0.00.098.822 I load: printing all EOG tokens:
0.00.098.823 I load: - 0 ('[PAD]')
0.00.098.824 I load: - 102 ('[SEP]')
0.00.098.902 I load: special tokens cache size = 5
0.00.102.131 I load: token to piece cache size = 0,2032 MB
0.00.102.146 I print_info: arch = bert
0.00.102.146 I print_info: vocab_only = 0
0.00.102.147 I print_info: no_alloc = 0
0.00.102.147 I print_info: n_ctx_train = 512
0.00.102.148 I print_info: n_embd_inp = 768
0.00.102.149 I print_info: n_embd = 768
0.00.102.149 I print_info: n_embd_out = 768
0.00.102.150 I print_info: n_layer = 6
0.00.102.150 I print_info: n_layer_all = 6
0.00.102.160 I print_info: n_head = 12
0.00.102.161 I print_info: n_head_kv = 12
0.00.102.161 I print_info: n_rot = 64
0.00.102.162 I print_info: n_swa = 0
0.00.102.162 I print_info: is_swa_any = 0
0.00.102.163 I print_info: n_embd_head_k = 64
0.00.102.163 I print_info: n_embd_head_v = 64
0.00.102.164 I print_info: n_gqa = 1
0.00.102.167 I print_info: n_embd_k_gqa = 768
0.00.102.169 I print_info: n_embd_v_gqa = 768
0.00.102.171 I print_info: f_norm_eps = 1,0e-12
0.00.102.172 I print_info: f_norm_rms_eps = 0,0e+00
0.00.102.173 I print_info: f_clamp_kqv = 0,0e+00
0.00.102.173 I print_info: f_max_alibi_bias = 0,0e+00
0.00.102.174 I print_info: f_logit_scale = 0,0e+00
0.00.102.174 I print_info: f_attn_scale = 0,0e+00
0.00.102.174 I print_info: f_attn_value_scale = 0,0000
0.00.102.176 I print_info: n_ff = 3072
0.00.102.176 I print_info: n_expert = 0
0.00.102.176 I print_info: n_expert_used = 0
0.00.102.176 I print_info: n_expert_groups = 0
0.00.102.177 I print_info: n_group_used = 0
0.00.102.177 I print_info: causal attn = 0
0.00.102.177 I print_info: pooling type = -1
0.00.102.177 I print_info: rope type = 2
0.00.102.178 I print_info: rope scaling = linear
0.00.102.179 I print_info: freq_base_train = 10000,0
0.00.102.180 I print_info: freq_scale_train = 1
0.00.102.180 I print_info: n_ctx_orig_yarn = 512
0.00.102.181 I print_info: rope_yarn_log_mul = 0,0000
0.00.102.181 I print_info: rope_finetuned = unknown
0.00.102.182 I print_info: n_cls_out = 7
0.00.102.182 I print_info: cls_label[ 0] = LABEL_0
0.00.102.183 I print_info: cls_label[ 1] = LABEL_1
0.00.102.183 I print_info: cls_label[ 2] = LABEL_2
0.00.102.184 I print_info: cls_label[ 3] = LABEL_3
0.00.102.184 I print_info: cls_label[ 4] = LABEL_4
0.00.102.185 I print_info: cls_label[ 5] = LABEL_5
0.00.102.185 I print_info: cls_label[ 6] = LABEL_6
0.00.102.186 I print_info: model type = 22M
0.00.102.188 I print_info: model params = 66,96 M
0.00.102.188 I print_info: general.name = Youtube Sentiment v2
0.00.102.191 I print_info: vocab type = WPM
0.00.102.193 I print_info: n_vocab = 30522
0.00.102.193 I print_info: n_merges = 0
0.00.102.193 I print_info: BOS token = 101 '[CLS]'
0.00.102.194 I print_info: EOS token = 102 '[SEP]'
0.00.102.195 I print_info: UNK token = 100 '[UNK]'
0.00.102.195 I print_info: SEP token = 102 '[SEP]'
0.00.102.195 I print_info: PAD token = 0 '[PAD]'
0.00.102.196 I print_info: MASK token = 103 '[MASK]'
0.00.102.196 I print_info: LF token = 0 '[PAD]'
0.00.102.197 I print_info: FIM PAD token = 0 '[PAD]'
0.00.102.197 I print_info: EOG token = 0 '[PAD]'
0.00.102.198 I print_info: EOG token = 102 '[SEP]'
0.00.102.198 I print_info: max token length = 21
0.00.102.200 I load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
0.00.106.806 D load_tensors: layer 0 assigned to device Vulkan0, is_swa = 0
0.00.106.814 D load_tensors: layer 1 assigned to device Vulkan0, is_swa = 0
0.00.106.815 D load_tensors: layer 2 assigned to device Vulkan0, is_swa = 0
0.00.106.815 D load_tensors: layer 3 assigned to device Vulkan0, is_swa = 0
0.00.106.816 D load_tensors: layer 4 assigned to device Vulkan0, is_swa = 0
0.00.106.816 D load_tensors: layer 5 assigned to device Vulkan0, is_swa = 0
0.00.106.817 D load_tensors: layer 6 assigned to device Vulkan0, is_swa = 0
0.00.106.886 D create_tensor: loading tensor token_embd.weight
0.00.106.916 D create_tensor: loading tensor position_embd.weight
0.00.106.938 D create_tensor: loading tensor cls.weight
0.00.106.947 D create_tensor: loading tensor cls.bias
0.00.106.952 D create_tensor: loading tensor cls.output.weight
0.00.106.956 D create_tensor: loading tensor cls.output.bias
0.00.106.963 D create_tensor: loading tensor token_embd_norm.weight
0.00.106.968 D create_tensor: loading tensor token_embd_norm.bias
0.00.106.981 D create_tensor: loading tensor blk.0.attn_q.weight
0.00.106.988 D create_tensor: loading tensor blk.0.attn_k.weight
0.00.106.996 D create_tensor: loading tensor blk.0.attn_v.weight
0.00.107.005 D create_tensor: loading tensor blk.0.attn_q.bias
0.00.107.013 D create_tensor: loading tensor blk.0.attn_k.bias
0.00.107.023 D create_tensor: loading tensor blk.0.attn_v.bias
0.00.107.033 D create_tensor: loading tensor blk.0.attn_output.weight
0.00.107.043 D create_tensor: loading tensor blk.0.attn_output.bias
0.00.107.054 D create_tensor: loading tensor blk.0.attn_output_norm.weight
0.00.107.064 D create_tensor: loading tensor blk.0.attn_output_norm.bias
0.00.107.075 D create_tensor: loading tensor blk.0.ffn_up.weight
0.00.107.085 D create_tensor: loading tensor blk.0.ffn_up.bias
0.00.107.095 D create_tensor: loading tensor blk.0.ffn_down.weight
0.00.107.105 D create_tensor: loading tensor blk.0.ffn_down.bias
0.00.107.113 D create_tensor: loading tensor blk.0.layer_output_norm.weight
0.00.107.121 D create_tensor: loading tensor blk.0.layer_output_norm.bias
0.00.107.139 D create_tensor: loading tensor blk.1.attn_q.weight
0.00.107.151 D create_tensor: loading tensor blk.1.attn_k.weight
0.00.107.160 D create_tensor: loading tensor blk.1.attn_v.weight
0.00.107.168 D create_tensor: loading tensor blk.1.attn_q.bias
0.00.107.177 D create_tensor: loading tensor blk.1.attn_k.bias
0.00.107.186 D create_tensor: loading tensor blk.1.attn_v.bias
0.00.107.196 D create_tensor: loading tensor blk.1.attn_output.weight
0.00.107.206 D create_tensor: loading tensor blk.1.attn_output.bias
0.00.107.217 D create_tensor: loading tensor blk.1.attn_output_norm.weight
0.00.107.227 D create_tensor: loading tensor blk.1.attn_output_norm.bias
0.00.107.237 D create_tensor: loading tensor blk.1.ffn_up.weight
0.00.107.247 D create_tensor: loading tensor blk.1.ffn_up.bias
0.00.107.260 D create_tensor: loading tensor blk.1.ffn_down.weight
0.00.107.270 D create_tensor: loading tensor blk.1.ffn_down.bias
0.00.107.280 D create_tensor: loading tensor blk.1.layer_output_norm.weight
0.00.107.290 D create_tensor: loading tensor blk.1.layer_output_norm.bias
0.00.107.305 D create_tensor: loading tensor blk.2.attn_q.weight
0.00.107.315 D create_tensor: loading tensor blk.2.attn_k.weight
0.00.107.325 D create_tensor: loading tensor blk.2.attn_v.weight
0.00.107.335 D create_tensor: loading tensor blk.2.attn_q.bias
0.00.107.345 D create_tensor: loading tensor blk.2.attn_k.bias
0.00.107.356 D create_tensor: loading tensor blk.2.attn_v.bias
0.00.107.366 D create_tensor: loading tensor blk.2.attn_output.weight
0.00.107.396 D create_tensor: loading tensor blk.2.attn_output.bias
0.00.107.408 D create_tensor: loading tensor blk.2.attn_output_norm.weight
0.00.107.418 D create_tensor: loading tensor blk.2.attn_output_norm.bias
0.00.107.429 D create_tensor: loading tensor blk.2.ffn_up.weight
0.00.107.440 D create_tensor: loading tensor blk.2.ffn_up.bias
0.00.107.451 D create_tensor: loading tensor blk.2.ffn_down.weight
0.00.107.461 D create_tensor: loading tensor blk.2.ffn_down.bias
0.00.107.472 D create_tensor: loading tensor blk.2.layer_output_norm.weight
0.00.107.482 D create_tensor: loading tensor blk.2.layer_output_norm.bias
0.00.107.498 D create_tensor: loading tensor blk.3.attn_q.weight
0.00.107.508 D create_tensor: loading tensor blk.3.attn_k.weight
0.00.107.523 D create_tensor: loading tensor blk.3.attn_v.weight
0.00.107.534 D create_tensor: loading tensor blk.3.attn_q.bias
0.00.107.544 D create_tensor: loading tensor blk.3.attn_k.bias
0.00.107.555 D create_tensor: loading tensor blk.3.attn_v.bias
0.00.107.566 D create_tensor: loading tensor blk.3.attn_output.weight
0.00.107.578 D create_tensor: loading tensor blk.3.attn_output.bias
0.00.107.592 D create_tensor: loading tensor blk.3.attn_output_norm.weight
0.00.107.605 D create_tensor: loading tensor blk.3.attn_output_norm.bias
0.00.107.616 D create_tensor: loading tensor blk.3.ffn_up.weight
0.00.107.628 D create_tensor: loading tensor blk.3.ffn_up.bias
0.00.107.639 D create_tensor: loading tensor blk.3.ffn_down.weight
0.00.107.653 D create_tensor: loading tensor blk.3.ffn_down.bias
0.00.107.666 D create_tensor: loading tensor blk.3.layer_output_norm.weight
0.00.107.677 D create_tensor: loading tensor blk.3.layer_output_norm.bias
0.00.107.693 D create_tensor: loading tensor blk.4.attn_q.weight
0.00.107.705 D create_tensor: loading tensor blk.4.attn_k.weight
0.00.107.717 D create_tensor: loading tensor blk.4.attn_v.weight
0.00.107.729 D create_tensor: loading tensor blk.4.attn_q.bias
0.00.107.740 D create_tensor: loading tensor blk.4.attn_k.bias
0.00.107.752 D create_tensor: loading tensor blk.4.attn_v.bias
0.00.107.764 D create_tensor: loading tensor blk.4.attn_output.weight
0.00.107.777 D create_tensor: loading tensor blk.4.attn_output.bias
0.00.107.792 D create_tensor: loading tensor blk.4.attn_output_norm.weight
0.00.107.804 D create_tensor: loading tensor blk.4.attn_output_norm.bias
0.00.107.816 D create_tensor: loading tensor blk.4.ffn_up.weight
0.00.107.828 D create_tensor: loading tensor blk.4.ffn_up.bias
0.00.107.841 D create_tensor: loading tensor blk.4.ffn_down.weight
0.00.107.852 D create_tensor: loading tensor blk.4.ffn_down.bias
0.00.107.864 D create_tensor: loading tensor blk.4.layer_output_norm.weight
0.00.107.875 D create_tensor: loading tensor blk.4.layer_output_norm.bias
0.00.107.894 D create_tensor: loading tensor blk.5.attn_q.weight
0.00.107.906 D create_tensor: loading tensor blk.5.attn_k.weight
0.00.107.919 D create_tensor: loading tensor blk.5.attn_v.weight
0.00.107.935 D create_tensor: loading tensor blk.5.attn_q.bias
0.00.107.947 D create_tensor: loading tensor blk.5.attn_k.bias
0.00.107.959 D create_tensor: loading tensor blk.5.attn_v.bias
0.00.107.971 D create_tensor: loading tensor blk.5.attn_output.weight
0.00.107.984 D create_tensor: loading tensor blk.5.attn_output.bias
0.00.107.996 D create_tensor: loading tensor blk.5.attn_output_norm.weight
0.00.108.008 D create_tensor: loading tensor blk.5.attn_output_norm.bias
0.00.108.021 D create_tensor: loading tensor blk.5.ffn_up.weight
0.00.108.033 D create_tensor: loading tensor blk.5.ffn_up.bias
0.00.108.047 D create_tensor: loading tensor blk.5.ffn_down.weight
0.00.108.059 D create_tensor: loading tensor blk.5.ffn_down.bias
0.00.108.076 D create_tensor: loading tensor blk.5.layer_output_norm.weight
0.00.108.090 D create_tensor: loading tensor blk.5.layer_output_norm.bias
0.00.108.366 D done_getting_tensors: tensor 'token_embd.weight' (q6_K) (and 1 others) cannot be used with preferred buffer type Vulkan_Host, using CPU instead
0.00.112.302 I load_tensors: offloading output layer to GPU
0.00.112.307 I load_tensors: offloading 5 repeating layers to GPU
0.00.112.307 I load_tensors: offloaded 7/7 layers to GPU
0.00.112.313 I load_tensors: CPU_Mapped model buffer size = 19,84 MiB
0.00.112.315 I load_tensors: Vulkan0 model buffer size = 24,79 MiB
..................................
0.00.124.292 I common_init_result: added [PAD] logit bias = -inf
0.00.124.297 I common_init_result: added [SEP] logit bias = -inf
0.00.124.841 I llama_context: constructing llama_context
0.00.124.868 I llama_context: n_seq_max = 1
0.00.124.868 I llama_context: n_ctx = 512
0.00.124.869 I llama_context: n_ctx_seq = 512
0.00.124.869 I llama_context: n_batch = 2048
0.00.124.869 I llama_context: n_ubatch = 512
0.00.124.870 I llama_context: causal_attn = 0
0.00.124.872 I llama_context: flash_attn = auto
0.00.124.873 I llama_context: kv_unified = false
0.00.124.879 I llama_context: freq_base = 10000,0
0.00.124.880 I llama_context: freq_scale = 1
0.00.124.881 I llama_context: n_rs_seq = 0
0.00.124.881 I llama_context: n_outputs_max = 2048
0.00.124.936 D set_abort_callback: call
0.00.125.129 I llama_context: Vulkan_Host output buffer size = 0,12 MiB
0.00.125.133 D llama_context: enumerating backends
0.00.125.146 D llama_context: backend_ptrs.size() = 2
0.00.125.148 I sched_reserve: reserving ...
0.00.125.153 D sched_reserve: max_nodes = 1024
0.00.126.029 D sched_reserve: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
0.00.126.040 D graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
0.00.126.662 I sched_reserve: Flash Attention was auto, set to enabled
0.00.126.667 I sched_reserve: resolving fused Gated Delta Net support:
0.00.126.669 D graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
0.00.127.124 I sched_reserve: fused Gated Delta Net (autoregressive) enabled
0.00.127.128 D graph_reserve: reserving a graph for ubatch with n_tokens = 16, n_seqs = 1, n_outputs = 16
0.00.127.514 I sched_reserve: fused Gated Delta Net (chunked) enabled
0.00.127.520 D graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
0.00.128.261 D graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
0.00.129.005 D graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
0.00.129.416 I sched_reserve: Vulkan0 compute buffer size = 9,50 MiB
0.00.129.419 I sched_reserve: Vulkan_Host compute buffer size = 5,01 MiB
0.00.129.419 I sched_reserve: graph nodes = 200
0.00.129.420 I sched_reserve: graph splits = 2
0.00.129.422 I sched_reserve: reserve took 4,27 ms, sched copies = 1
0.00.129.499 D set_adapters_lora: adapters = (nil)
0.00.129.501 D adapters_lora_are_same: adapters = (nil)
0.00.129.502 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
0.00.129.511 D decode: cannot decode batches with this context (calling encode() instead)
0.00.160.311 I llama_completion: llama threadpool init, n_threads = 2
0.00.160.330 D attach_threadpool: call
0.00.160.334 I
/home/runner/work/llama.cpp/llama.cpp/tools/completion/completion.cpp:271: GGML_ASSERT(!llama_vocab_get_add_eos(vocab)) failed
0.00.160.390 I system_info: n_threads = 2 (n_threads_batch = 2) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
0.00.160.392 I
[New LWP 25382]
[New LWP 25381]
[New LWP 25380]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000078969fd10813 in __GI___wait4 (pid=25383, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0 0x000078969fd10813 in __GI___wait4 (pid=25383, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x000078969fe3cc3b in ggml_print_backtrace () from /path/to/llama/libggml-base.so.0
#2 0x000078969fe3cdd2 in ggml_abort () from /path/to/llama/libggml-base.so.0
#3 0x000078969ff21726 in llama_completion(int, char**) () from /path/to/llama/libllama-completion-impl.so
#4 0x000078969fc2a1ca in __libc_start_call_main (main=main@entry=0x561f63059060 <main>, argc=argc@entry=8, argv=argv@entry=0x7ffdf61a06b8) at ../sysdeps/nptl/libc_start_call_main.h:58
warning: 58 ../sysdeps/nptl/libc_start_call_main.h: No such file or directory
#5 0x000078969fc2a28b in __libc_start_main_impl (main=0x561f63059060 <main>, argc=8, argv=0x7ffdf61a06b8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdf61a06a8) at ../csu/libc-start.c:360
warning: 360 ../csu/libc-start.c: No such file or directory
#6 0x0000561f63059095 in _start ()
[Inferior 1 (process 25378) detached]
Aborted (core dumped)
Name and Version
version: 9776 (ac4105d)
built with GNU 11.4.0 for Linux x86_64
Operating systems
Linux
GGML backends
Vulkan
Hardware
Models
https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-melinna-Q4_K_M-GGUF
https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-base-Q4_K_M-GGUF
https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-Q4_K_M-GGUF
https://huggingface.co/mradermacher/youtube-sentiment-v2-GGUF
I tried more models, but they all fail with the same error.
Problem description & steps to reproduce
I can't run several models: each one crashes with the same errors.
CLI error
./llamacpp/llama-cli -m youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is"
You can replace the model with any of the models I listed above.
I used GGUF-my-repo to create my models.
I tried 9 models and they all produce the same errors.
Server issue
I tried to run the same models using the server instead of the CLI. The model loads, but when I try to generate, I get a different error.
Example: https://huggingface.co/Lyaaaaaaaaaaaaaaa/emotion-english-distilroberta-base-Q4_K_M-GGUF
send_error: task id = 0, error: the current context does not logits computation. skipping
First Bad Commit
No response
Relevant log output
First command
./llamacpp/llama-cli -m youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is"
Second command
./llamacpp/llama-completion -fit off -m models/emotions/youtube-sentiment-v2.Q4_K_M.gguf -p "The meaning to life and the universe is" --verbose
Server's issue
See file: server_issue.txt