Async/Await Guide

Use agenerate(), async streaming, and asyncio.gather() to run concurrent LLM calls with the same unified API across providers.

What you get

  • Async generation: await llm.agenerate(...)
  • Async streaming: await llm.agenerate(..., stream=True), then iterate with async for
  • Async sessions: await session.agenerate(...) preserves history

Single async request

import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("openai", model="gpt-5-mini")
    response = await llm.agenerate("Explain Python in one paragraph.")
    print(response.content)

asyncio.run(main())
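
If a provider hangs or is slow, you can bound the call with a timeout. A minimal sketch using asyncio.wait_for; the 30-second limit is an illustrative choice, not a library default.

import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("openai", model="gpt-5-mini")
    try:
        # Cancel the request if it takes longer than 30 seconds.
        response = await asyncio.wait_for(
            llm.agenerate("Explain Python in one paragraph."),
            timeout=30,
        )
        print(response.content)
    except asyncio.TimeoutError:
        print("Request timed out")

asyncio.run(main())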

Concurrent requests with asyncio.gather()

Run multiple calls concurrently; this is especially useful against local gateways or when issuing many small prompts.

import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("ollama", model="qwen3:4b-instruct")

    tasks = [
        llm.agenerate("Summarize Python in 1 sentence."),
        llm.agenerate("Summarize JavaScript in 1 sentence."),
        llm.agenerate("Summarize Rust in 1 sentence."),
    ]

    responses = await asyncio.gather(*tasks)
    for r in responses:
        print("-", r.content.strip())

asyncio.run(main())
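
To avoid overwhelming a local gateway or tripping provider rate limits when fanning out many prompts, cap the number of in-flight requests. A minimal sketch using asyncio.Semaphore; the limit of 3 and the prompts are illustrative.

import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("ollama", model="qwen3:4b-instruct")
    semaphore = asyncio.Semaphore(3)  # at most 3 requests in flight

    async def bounded(prompt: str):
        # Wait for a free slot before sending the request.
        async with semaphore:
            return await llm.agenerate(prompt)

    prompts = [f"Summarize topic {i} in 1 sentence." for i in range(10)]
    responses = await asyncio.gather(*(bounded(p) for p in prompts))
    for r in responses:
        print("-", r.content.strip())

asyncio.run(main())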

Async streaming

When stream=True, agenerate() returns an async iterator of chunks.

import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("anthropic", model="claude-haiku-4-5")

    stream = await llm.agenerate("Write a haiku about coding.", stream=True)
    async for chunk in stream:
        if chunk.content:
            print(chunk.content, end="", flush=True)
    print()

asyncio.run(main())
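
If you also need the complete text once streaming finishes, accumulate the chunks as they arrive. A small sketch, assuming chunk.content carries each text delta as in the example above.

import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("anthropic", model="claude-haiku-4-5")

    parts = []
    stream = await llm.agenerate("Write a haiku about coding.", stream=True)
    async for chunk in stream:
        if chunk.content:
            print(chunk.content, end="", flush=True)
            parts.append(chunk.content)  # keep each delta for later
    print()

    full_text = "".join(parts)  # complete response after the stream ends
    print(f"({len(full_text)} characters streamed)")

asyncio.run(main())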

Async sessions

BasicSession supports async generation while preserving conversation history across turns.

import asyncio
from abstractcore import create_llm, BasicSession

async def main():
    llm = create_llm("openai", model="gpt-5-mini")
    session = BasicSession(provider=llm, system_prompt="You are a helpful assistant.")

    await session.agenerate("My name is Alice.")      # first turn, stored in history
    r2 = await session.agenerate("What is my name?")  # second turn can reference it
    print(r2.content)

asyncio.run(main())
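
Independent sessions can also run concurrently, and each keeps its own history. A sketch combining BasicSession with asyncio.gather; the per-session helper is illustrative, not part of the library.

import asyncio
from abstractcore import create_llm, BasicSession

async def main():
    llm = create_llm("openai", model="gpt-5-mini")

    async def chat(name: str) -> str:
        # Each session tracks its own conversation history.
        session = BasicSession(provider=llm, system_prompt="You are a helpful assistant.")
        await session.agenerate(f"My name is {name}.")
        reply = await session.agenerate("What is my name?")
        return reply.content

    answers = await asyncio.gather(chat("Alice"), chat("Bob"))
    for a in answers:
        print("-", a.strip())

asyncio.run(main())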

Provider support notes

  • Native async HTTP is used where available (OpenAI, Anthropic, Ollama, LMStudio, OpenRouter, openai-compatible, vLLM).
  • Fallback async may run the call in a worker thread for local in-process providers (e.g., MLX, some HuggingFace backends) to keep the event loop responsive, as sketched below.
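
The library applies this fallback internally, but the pattern is easy to replicate if you need to wrap other blocking code yourself. A minimal sketch using asyncio.to_thread, assuming a synchronous generate() method as in the sync API; the provider and model names are illustrative.

import asyncio
from abstractcore import create_llm

async def main():
    llm = create_llm("mlx", model="mlx-community/Qwen3-4B-4bit")  # illustrative local provider

    # Run the blocking call in a worker thread so the event loop stays free.
    response = await asyncio.to_thread(llm.generate, "Explain Python in one paragraph.")
    print(response.content)

asyncio.run(main())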

Related Documentation

  • API Reference: sync/async API surface
  • Tool Calling System: tools + streaming patterns
  • HTTP Server Guide: OpenAI-compatible gateway usage