Async/Await Guide
Use agenerate(), async streaming, and asyncio.gather() to run concurrent LLM calls with the same unified API across providers.
What you get
- Async generation: await llm.agenerate(...)
- Async streaming: await llm.agenerate(..., stream=True) → async for over chunks
- Async sessions: await session.agenerate(...) preserves history
Single async request
```python
import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("openai", model="gpt-5-mini")
    response = await llm.agenerate("Explain Python in one paragraph.")
    print(response.content)

asyncio.run(main())
```
Concurrent requests with asyncio.gather()
Run multiple calls concurrently (especially useful for local gateways or when doing many small prompts).
```python
import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("ollama", model="qwen3:4b-instruct")
    tasks = [
        llm.agenerate("Summarize Python in 1 sentence."),
        llm.agenerate("Summarize JavaScript in 1 sentence."),
        llm.agenerate("Summarize Rust in 1 sentence."),
    ]
    responses = await asyncio.gather(*tasks)
    for r in responses:
        print("-", r.content.strip())

asyncio.run(main())
```
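If you fan out many small prompts at once, you may also want to cap how many requests are in flight at a time (for example, to avoid overloading a local gateway). A minimal sketch using a plain asyncio.Semaphore; the semaphore and the limit of 3 are ordinary asyncio usage, not an abstractcore feature.

```python
import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("ollama", model="qwen3:4b-instruct")
    semaphore = asyncio.Semaphore(3)  # at most 3 requests in flight at once

    async def bounded(prompt: str):
        # Wait for a free slot before issuing the request.
        async with semaphore:
            return await llm.agenerate(prompt)

    prompts = [f"Summarize topic {i} in one sentence." for i in range(10)]
    responses = await asyncio.gather(*(bounded(p) for p in prompts))
    for r in responses:
        print("-", r.content.strip())

asyncio.run(main())
```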
Async streaming
When stream=True, agenerate() returns an async iterator of chunks.
```python
import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("anthropic", model="claude-haiku-4-5")
    stream = await llm.agenerate("Write a haiku about coding.", stream=True)
    async for chunk in stream:
        if chunk.content:
            print(chunk.content, end="", flush=True)
    print()

asyncio.run(main())
```
Async sessions
BasicSession supports async generation while preserving conversation history.
```python
import asyncio
from abstractcore import create_llm, BasicSession

async def main():
    llm = create_llm("openai", model="gpt-5-mini")
    session = BasicSession(provider=llm, system_prompt="You are a helpful assistant.")
    r1 = await session.agenerate("My name is Alice.")   # first turn is stored in the session history
    r2 = await session.agenerate("What is my name?")    # second turn can see the earlier exchange
    print(r2.content)

asyncio.run(main())
```
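Because a single session accumulates shared history, await its turns one after another (as above) rather than scheduling them together with asyncio.gather(); concurrent turns on the same session could interleave the history in an unhelpful order.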
Provider support notes
- Native async HTTP is used where available (OpenAI, Anthropic, Ollama, LMStudio, OpenRouter, openai-compatible, vLLM).
- For local in-process providers (e.g., MLX, some HuggingFace backends), the async call may fall back to running the blocking generation in a thread to keep the event loop responsive (the general pattern is sketched below).
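That thread-based fallback follows the standard asyncio pattern of pushing a blocking call onto a worker thread. A minimal sketch of the general pattern with asyncio.to_thread, for illustration only: the provider name, model, and the synchronous generate() call are assumptions here, and in practice you simply call agenerate() and let the library choose the strategy.

```python
import asyncio
from abstractcore import create_llm

async def main():
    # Hypothetical local in-process provider/model; adjust to whatever you have installed.
    llm = create_llm("mlx", model="mlx-community/Qwen2.5-0.5B-Instruct-4bit")
    # Run a blocking, synchronous generate() call (assumed sync counterpart of agenerate())
    # in a worker thread so the event loop stays responsive while tokens are produced.
    response = await asyncio.to_thread(llm.generate, "Explain Python in one sentence.")
    print(response.content)

asyncio.run(main())
```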