12 changes: 10 additions & 2 deletions README.md
@@ -71,7 +71,7 @@ The configuration `configs/config.lite.yaml` does not require any expert models

 ### Quick Start

-First replace `openai.key` and `huggingface.token` in `server/configs/config.default.yaml` with **your personal OpenAI Key** and **your Hugging Face Token**, or put them in the environment variables `OPENAI_API_KEY` and `HUGGINGFACE_ACCESS_TOKEN` respectively. Then run the following commands:
+First replace `openai.key` and `huggingface.token` in `server/configs/config.default.yaml` with **your personal OpenAI Key** and **your Hugging Face Token**, or put them in the environment variables `OPENAI_API_KEY` and `HUGGINGFACE_ACCESS_TOKEN` respectively. Alternatively, you can use [MiniMax](https://www.minimaxi.com/) as the LLM provider by running with `--config configs/config.minimax.yaml` (set your MiniMax API key first). Then run the following commands:

 <span id="Server"></span>

@@ -182,7 +182,7 @@ Welcome to Jarvis! A collaborative system that consists of an LLM as the control

 The server-side configuration file is `server/configs/config.default.yaml`, and some parameters are presented as follows:

-+ `model`: LLM, currently supports `text-davinci-003`. We are working on integrating more open-source LLMs.
++ `model`: LLM, currently supports `text-davinci-003`, `gpt-4`, and [MiniMax](https://www.minimaxi.com/) models (`MiniMax-M2.7`, `MiniMax-M2.7-highspeed`, `MiniMax-M2.5`, `MiniMax-M2.5-highspeed`). We are working on integrating more open-source LLMs.
 + `inference_mode`: mode of inference endpoints
   + `local`: only use the local inference endpoints
   + `huggingface`: only use the Hugging Face Inference Endpoints **(free of local inference endpoints)**
@@ -192,6 +192,14 @@ The server-side configuration file is `server/configs/config.default.yaml`, and
   + `standard` (RAM>16GB, ControlNet + Standard Pipelines)
   + `full` (RAM>42GB, All registered models)

+#### LLM Provider
+
+Jarvis supports multiple LLM providers as the backbone controller. Configure the provider in the YAML config file:
+
++ **OpenAI** (default): Set `openai.api_key` in config or the `OPENAI_API_KEY` environment variable.
++ **Azure OpenAI**: Set `azure.api_key`, `azure.base_url`, `azure.deployment_name`, and `azure.api_version` in config.
++ **MiniMax**: Set `minimax.api_key` in config or the `MINIMAX_API_KEY` environment variable. Use `model: MiniMax-M2.7` and `use_completion: false`. A ready-to-use config is provided at `server/configs/config.minimax.yaml`. MiniMax models offer a 204K token context window. Get your API key at [MiniMax Platform](https://www.minimaxi.com/).
+
 On a personal laptop, we recommend the configuration of `inference_mode: hybrid `and `local_deployment: minimal`. But the available models under this setting may be limited due to the instability of remote Hugging Face Inference Endpoints.

 ### NVIDIA Jetson Embedded Device Support
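As a rough illustration of the provider settings listed in the new "LLM Provider" section, the sketch below reports which required fields are missing from a config dict. The helper `missing_provider_fields` and the `REQUIRED_FIELDS` table are hypothetical and not part of this PR; only the field names come from the README text. It checks the config alone and ignores the environment-variable fallbacks that the README also allows.

```python
# Hypothetical helper, not part of the PR. Field names follow the README text.
REQUIRED_FIELDS = {
    "openai": ["api_key"],
    "azure": ["api_key", "base_url", "deployment_name", "api_version"],
    "minimax": ["api_key"],
}


def missing_provider_fields(config, provider):
    """Return the required fields that are absent or empty for a provider."""
    section = config.get(provider) or {}
    return [name for name in REQUIRED_FIELDS[provider] if not section.get(name)]


if __name__ == "__main__":
    example = {"azure": {"api_key": "k", "base_url": "https://example.invalid"}}
    print(missing_provider_fields(example, "azure"))
```

A check like this could run right after the YAML file is loaded, so that a missing Azure field is reported before the first request is sent.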
17 changes: 15 additions & 2 deletions hugginggpt/server/awesome_chat.py
@@ -80,11 +80,13 @@
     api_name = "chat/completions"

 API_TYPE = None
-# priority: local > azure > openai
+# priority: local > azure > minimax > openai
 if "dev" in config and config["dev"]:
     API_TYPE = "local"
 elif "azure" in config:
     API_TYPE = "azure"
+elif "minimax" in config:
+    API_TYPE = "minimax"
 elif "openai" in config:
     API_TYPE = "openai"
 else:
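The priority order in the hunk above can be sketched as a standalone function. This is a minimal illustration, assuming `config` is the dict loaded from the YAML file; the name `select_api_type` is hypothetical and does not appear in the PR.

```python
def select_api_type(config):
    """Mirror the priority in the hunk above: local > azure > minimax > openai."""
    if config.get("dev"):
        return "local"
    for provider in ("azure", "minimax", "openai"):
        if provider in config:
            return provider
    return None  # no endpoint configured; the caller decides what to do


if __name__ == "__main__":
    print(select_api_type({"dev": False, "minimax": {}, "openai": {}}))
```

One consequence worth noting: a config that contains both a `minimax` and an `openai` section resolves to `minimax`, so the OpenAI section is ignored unless the MiniMax one is removed or commented out.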
@@ -100,6 +102,14 @@
 elif API_TYPE == "azure":
     API_ENDPOINT = f"{config['azure']['base_url']}/openai/deployments/{config['azure']['deployment_name']}/{api_name}?api-version={config['azure']['api_version']}"
     API_KEY = config["azure"]["api_key"]
+elif API_TYPE == "minimax":
+    API_ENDPOINT = f"{config['minimax'].get('base_url', 'https://api.minimax.io/v1')}/{api_name}"
+    if config["minimax"]["api_key"] and config["minimax"]["api_key"] != "REPLACE_WITH_YOUR_MINIMAX_API_KEY_HERE":
+        API_KEY = config["minimax"]["api_key"]
+    elif "MINIMAX_API_KEY" in os.environ:
+        API_KEY = os.getenv("MINIMAX_API_KEY")
+    else:
+        raise ValueError(f"Incorrect MiniMax key. Please check your {args.config} file or set the MINIMAX_API_KEY environment variable.")
 elif API_TYPE == "openai":
     API_ENDPOINT = f"https://api.openai.com/v1/{api_name}"
     if config["openai"]["api_key"].startswith("sk-"): # Check for valid OpenAI key in config file
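The key lookup added for MiniMax in the hunk above (config value first, then the `MINIMAX_API_KEY` environment variable, otherwise an error) can be sketched in isolation. The function name `resolve_minimax_key` and the injectable `environ` argument are assumptions made for testability and are not in the PR. The sketch also deviates in one respect: it reads `api_key` with `.get`, so a config that omits the field falls through to the environment variable rather than raising `KeyError`.

```python
import os

PLACEHOLDER = "REPLACE_WITH_YOUR_MINIMAX_API_KEY_HERE"


def resolve_minimax_key(config, environ=None):
    """Prefer a real key in the config, then fall back to MINIMAX_API_KEY."""
    environ = os.environ if environ is None else environ
    key = config.get("minimax", {}).get("api_key")
    if key and key != PLACEHOLDER:
        return key
    if "MINIMAX_API_KEY" in environ:
        return environ["MINIMAX_API_KEY"]
    raise ValueError("No MiniMax key in the config or in MINIMAX_API_KEY.")
```

Passing a plain dict as `environ` makes the fallback easy to exercise without touching the real process environment.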
@@ -190,9 +200,12 @@ def send_request(data):
     api_key = data.pop("api_key")
     api_type = data.pop("api_type")
     api_endpoint = data.pop("api_endpoint")
+    # MiniMax requires temperature in (0.0, 1.0]; adjust zero values
+    if api_type == "minimax" and data.get("temperature", 1) == 0:
+        data["temperature"] = 0.01
     if use_completion:
         data = convert_chat_to_completion(data)
-    if api_type == "openai":
+    if api_type in ("openai", "minimax"):
         HEADER = {
             "Authorization": f"Bearer {api_key}"
         }
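The temperature adjustment in the hunk above can be sketched as a small pure function. This is an illustration only: the name `adjust_temperature` and the `floor` parameter are hypothetical, and unlike the PR, which mutates `data` in place, the sketch returns a shallow copy.

```python
def adjust_temperature(data, api_type, floor=0.01):
    """Return a copy of the payload with a zero temperature raised for MiniMax."""
    adjusted = dict(data)
    if api_type == "minimax" and adjusted.get("temperature", 1) == 0:
        adjusted["temperature"] = floor
    return adjusted


if __name__ == "__main__":
    print(adjust_temperature({"temperature": 0}, "minimax"))
    print(adjust_temperature({"temperature": 0}, "openai"))
```

Like the PR, this only handles an exact zero. Negative values or values above 1.0 pass through unchanged and would presumably be rejected by the provider.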
45 changes: 45 additions & 0 deletions hugginggpt/server/configs/config.minimax.yaml
@@ -0,0 +1,45 @@
+minimax:
+  api_key: REPLACE_WITH_YOUR_MINIMAX_API_KEY_HERE
+  base_url: https://api.minimax.io/v1
+# openai:
+#   api_key: REPLACE_WITH_YOUR_OPENAI_API_KEY_HERE
+# azure:
+#   api_key: REPLACE_WITH_YOUR_AZURE_API_KEY_HERE
+#   base_url: REPLACE_WITH_YOUR_ENDPOINT_HERE
+#   deployment_name: REPLACE_WITH_YOUR_DEPLOYMENT_NAME_HERE
+#   api_version: "2022-12-01"
+huggingface:
+  token: REPLACE_WITH_YOUR_HUGGINGFACE_TOKEN_HERE # required: huggingface token @ https://huggingface.co/settings/tokens
+dev: false
+debug: false
+log_file: logs/debug.log
+model: MiniMax-M2.7 # MiniMax models: MiniMax-M2.7, MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.5-highspeed (204K context)
+use_completion: false # MiniMax uses chat/completions endpoint
+inference_mode: huggingface # local, huggingface or hybrid, prefer hybrid
+local_deployment: minimal # minimal, standard or full, prefer full
+num_candidate_models: 5
+max_description_length: 100
+proxy: # optional: your proxy server "http://ip:port"
+http_listen:
+  host: 0.0.0.0
+  port: 8004
+logit_bias:
+  parse_task: 0.1
+  choose_model: 5
+tprompt:
+  parse_task: >-
+    #1 Task Planning Stage: The AI assistant can parse user input to several tasks: [{"task": task, "id": task_id, "dep": dependency_task_id, "args": {"text": text or <GENERATED>-dep_id, "image": image_url or <GENERATED>-dep_id, "audio": audio_url or <GENERATED>-dep_id}}]. The special tag "<GENERATED>-dep_id" refer to the one generated text/image/audio in the dependency task (Please consider whether the dependency task generates resources of this type.) and "dep_id" must be in "dep" list. The "dep" field denotes the ids of the previous prerequisite tasks which generate a new resource that the current task relies on. The "args" field must in ["text", "image", "audio"], nothing else. The task MUST be selected from the following options: "token-classification", "text2text-generation", "summarization", "translation", "question-answering", "conversational", "text-generation", "sentence-similarity", "tabular-classification", "object-detection", "image-classification", "image-to-image", "image-to-text", "text-to-image", "text-to-video", "visual-question-answering", "document-question-answering", "image-segmentation", "depth-estimation", "text-to-speech", "automatic-speech-recognition", "audio-to-audio", "audio-classification", "canny-control", "hed-control", "mlsd-control", "normal-control", "openpose-control", "canny-text-to-image", "depth-text-to-image", "hed-text-to-image", "mlsd-text-to-image", "normal-text-to-image", "openpose-text-to-image", "seg-text-to-image". There may be multiple tasks of the same type. Think step by step about all the tasks needed to resolve the user's request. Parse out as few tasks as possible while ensuring that the user request can be resolved. Pay attention to the dependencies and order among tasks. If the user input can't be parsed, you need to reply empty JSON [].
+  choose_model: >-
+    #2 Model Selection Stage: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability.
+  response_results: >-
+    #4 Response Generation Stage: With the task execution logs, the AI assistant needs to describe the process and inference results.
+demos_or_presteps:
+  parse_task: demos/demo_parse_task.json
+  choose_model: demos/demo_choose_model.json
+  response_results: demos/demo_response_results.json
+prompt:
+  parse_task: The chat log [ {{context}} ] may contain the resources I mentioned. Now I input { {{input}} }. Pay attention to the input and output types of tasks and the dependencies between tasks.
+  choose_model: >-
+    Please choose the most suitable model from {{metas}} for the task {{task}}. The output must be in a strict JSON format: {"id": "id", "reason": "your detail reasons for the choice"}.
+  response_results: >-
+    Yes. Please first think carefully and directly answer my request based on the inference results. Some of the inferences may not always turn out to be correct and require you to make careful consideration in making decisions. Then please detail your workflow including the used models and inference results for my request in your friendly tone. Please filter out information that is not relevant to my request. Tell me the complete path or urls of files in inference results. If there is nothing in the results, please tell me you can't make it. }
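The `prompt` entries in the config above contain double-brace slots such as `{{context}}`, `{{input}}`, `{{metas}}`, and `{{task}}`. The code that fills them is not shown in this diff, so the following is only a sketch of how such slots might be substituted, assuming plain string replacement; the helper name `fill_slots` is hypothetical.

```python
def fill_slots(template, values):
    """Replace each {{name}} slot in the template with its value."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", str(value))
    return template


if __name__ == "__main__":
    prompt = (
        "The chat log [ {{context}} ] may contain the resources I mentioned. "
        "Now I input { {{input}} }."
    )
    print(fill_slots(prompt, {"context": "[]", "input": "describe this image"}))
```

Plain replacement is used here rather than `str.format`, because the templates also contain literal single braces (for example the JSON shape in `choose_model`) that `format` would try to interpret.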
10 changes: 9 additions & 1 deletion hugginggpt/server/get_token_ids.py
@@ -15,6 +15,10 @@
     "curie": tiktoken.get_encoding("r50k_base"),
     "babbage": tiktoken.get_encoding("r50k_base"),
     "ada": tiktoken.get_encoding("r50k_base"),
+    "MiniMax-M2.7": tiktoken.get_encoding("cl100k_base"),
+    "MiniMax-M2.7-highspeed": tiktoken.get_encoding("cl100k_base"),
+    "MiniMax-M2.5": tiktoken.get_encoding("cl100k_base"),
+    "MiniMax-M2.5-highspeed": tiktoken.get_encoding("cl100k_base"),
 }

 max_length = {
@@ -31,7 +35,11 @@
     "davinci": 2049,
     "curie": 2049,
     "babbage": 2049,
-    "ada": 2049
+    "ada": 2049,
+    "MiniMax-M2.7": 204800,
+    "MiniMax-M2.7-highspeed": 204800,
+    "MiniMax-M2.5": 204800,
+    "MiniMax-M2.5-highspeed": 204800,
 }

 def count_tokens(model_name, text):
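To show how the `max_length` entries added above might be used, the sketch below computes the tokens left in a model's context window after a prompt. It is an assumption-laden illustration: `remaining_tokens` is a hypothetical helper, the table is a two-entry excerpt, and a whitespace split stands in for the tokenizer so the example runs without `tiktoken`. In the repository one would presumably pass the `encode` method of the encoding registered for the model. Since the diff maps MiniMax models to `cl100k_base`, counts are probably best treated as estimates and not as the provider's exact token counts.

```python
# Excerpt of the limits from the diff above; the full table lives in get_token_ids.py.
MAX_LENGTH = {"MiniMax-M2.7": 204800, "ada": 2049}


def remaining_tokens(model_name, text, encode=str.split):
    """Tokens left in the context window after the prompt, never below zero.

    `encode` is any callable that turns text into a token list; the default
    whitespace split is only a stand-in for a real tokenizer.
    """
    used = len(encode(text))
    return max(MAX_LENGTH[model_name] - used, 0)


if __name__ == "__main__":
    print(remaining_tokens("MiniMax-M2.7", "summarize this picture for me"))
```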