Endpoint (single-model `/v1` server)
abstractcore-endpoint runs a single-model OpenAI-compatible server.
Table of Contents
When to use this vs the gateway Install Run Use with an OpenAI-compatible client Prompt cache control plane (optional)Unlike the multi-provider gateway (Server), this endpoint loads one provider+model once per worker process and reuses it across requests. It’s useful when you want to host a local backend (for example HF GGUF or MLX) as a stable /v1 endpoint.
Source: abstractcore/endpoint/app.py (entrypoint: abstractcore-endpoint).
When to use this vs the gateway
- Use Server when you want
model="provider/model"routing across many providers/models from one gateway process. - Use Endpoint when you want a dedicated “one worker = one model” process (simpler performance characteristics; fewer per-request initialization costs).
Install
pip install "abstractcore[server]"
Then install the provider extra you need:
pip install "abstractcore[mlx]" # Apple Silicon local inference
pip install "abstractcore[huggingface]" # Transformers / torch / llama-cpp-python (heavy)
Run
# CLI flags
abstractcore-endpoint --provider mlx --model mlx-community/Qwen3-4B --host 0.0.0.0 --port 8001
# Or via env vars
export ABSTRACTENDPOINT_PROVIDER=mlx
export ABSTRACTENDPOINT_MODEL=mlx-community/Qwen3-4B
export ABSTRACTENDPOINT_HOST=0.0.0.0
export ABSTRACTENDPOINT_PORT=8001
abstractcore-endpoint
Health check:
curl http://localhost:8001/health
Use with an OpenAI-compatible client
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8001/v1", api_key="unused")
resp = client.chat.completions.create(
model="anything", # ignored/validated in single-model mode
messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
Prompt cache control plane (optional)
If the underlying provider exposes prompt-cache controls, the endpoint also exposes a small control plane under /acore/prompt_cache/* (see abstractcore/endpoint/app.py):
GET /acore/prompt_cache/statsPOST /acore/prompt_cache/setPOST /acore/prompt_cache/updatePOST /acore/prompt_cache/forkPOST /acore/prompt_cache/clearPOST /acore/prompt_cache/prepare_modules
For caching concepts, see Session Management and Architecture.