I presume there is a minimum CPU requirement, such as AVX2, AVX-512, or F16C?
Could you document the minimum instruction set and extensions required?
root@1d1c4289f303:/llm-api# python app/main.py
2023-10-26 23:31:19,237 - INFO - llama - found an existing model /models/llama_601507219781/ggml-model-q4_0.bin
2023-10-26 23:31:19,237 - INFO - llama - setup done successfully for /models/llama_601507219781/ggml-model-q4_0.bin
Illegal instruction (core dumped)
root@1d1c4289f303:/llm-api#
--- modulename: llama, funcname: init
llama.py(289): self.verbose = verbose
llama.py(291): self.numa = numa
llama.py(292): if not Llama.__backend_initialized:
llama.py(293): if self.verbose:
llama.py(294): llama_cpp.llama_backend_init(self.numa)
--- modulename: llama_cpp, funcname: llama_backend_init
llama_cpp.py(475): return _lib.llama_backend_init(numa)
Illegal instruction (core dumped)
I assume this has CPU requirements.
ENV CMAKE_ARGS "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
OpenBLAS can be built for multiple targets with runtime detection of the target CPU by specifying DYNAMIC_ARCH=1 in Makefile.rule, on the gmake command line, or as -DDYNAMIC_ARCH=TRUE in cmake.
https://github.com/OpenMathLib/OpenBLAS/blob/develop/README.md
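To narrow down the "Illegal instruction" crash above, it may help to check which SIMD extensions the host CPU actually advertises. The following is a minimal sketch, assuming a Linux host or container where /proc/cpuinfo is readable. The flag list in `ASSUMED_FLAGS` is my guess at what a typical x86 build might use, not a documented requirement of the library, and the helper names `parse_cpu_flags` and `missing_flags` are made up for this example.

```python
from pathlib import Path

# Assumption for illustration only: the project does not document a minimum
# instruction set, so this list is a guess at what a typical x86 build uses.
ASSUMED_FLAGS = ("avx", "avx2", "fma", "f16c")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels use "Features".
        if sep and key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=ASSUMED_FLAGS):
    """List the wanted flags the CPU does not advertise, in the order given."""
    present = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("missing:", missing_flags(cpuinfo.read_text()) or "none")
    else:
        print("/proc/cpuinfo not available on this platform")
```

If any of these flags show up as missing, a likely (but unconfirmed) explanation is that the shared library was compiled on, or for, a newer CPU than the one running the container.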