About — ThinHarness

A minimal, opinionated agent harness — focused scope, straightforward code, easy to fork.

CI: passing · License: MIT · PyPI: thinharness

Why this exists

Production agents rarely stop at framework configuration. Orchestration, permissions, user/session storage, and deployment end up specific to the application and its users.

ThinHarness exists for the gap between building the agent loop yourself and adopting a large agent runtime where the loop comes bundled with assumptions you don’t need and can’t easily change.

It owns a focused set of agent-loop primitives that generalize well and are tedious to rebuild, leaving the rest of the application stack for you to own.

I started building ThinHarness after running into this gap in practice. Filesystem-enabled agents are simple yet powerful, but you mostly get them by adopting a large framework with layers of abstraction. I usually needed only a small slice of the functionality, but that slice came with coupled assumptions that didn't match my application. Making it fit meant writing enough wrappers, adapters, and fixes that I ended up owning framework-shaped code anyway.

How small, exactly

Framework-only LOC. Each row strips non-framework code (platform/deployment, voice/realtime, eval suites, UI/CLI, wire protocols) from the upstream package. Provider implementations stay in.

● yes ◑ partial ○ no

Library	LOC¹	Tool retries²	Subagents	Skills	FS tools	OTel tracing
ThinHarness	7,985	●	●	●	●	●
Claude Agent SDK	8,263³	○	●	●	●	◑
smolagents	9,840	○	●	○	○	◑
deepagents	17,664⁴	○	●	●	●	○
AWS Strands	32,526	◑	●	●	○	●
Microsoft Agent Framework	41,331	○	●	●	●	●
Pydantic AI	59,087	●	○	○	○	●
Google ADK	65,799	◑	●	●	●	●
OpenAI Agents SDK	73,796	○	●	●	◑	◑
Agno	113,477	◑	●	●	●	◑

The table focuses on harness-level features that differentiate the libraries. All listed libraries also support MCP, lifecycle hooks, multi-turn conversations, structured output, and human-in-the-loop. It intentionally does not compare framework or platform features such as vector DB integrations, hosted deployment, memory/session stores, or broad SaaS connectors.

1. LOC excludes anything that is not the core agent harness framework. The exact measurement commands are in comments in the raw README source.

2. Tool retries: a documented primitive (e.g. Pydantic AI's ModelRetry) that lets tools signal "model passed bad args — retry with this feedback," distinct from generic exception propagation.

3. Claude Agent SDK shells out to the Claude Code CLI binary, which is 200k+ LOC.

4. deepagents is a thin wrapper over LangChain/LangGraph; effective import surface is ≈112k LOC.

See docs/table.md for per-cell rationale and how the LOC numbers are measured.

Opinions

ThinHarness has opinions. They are the reason it stays small.

Purpose-built agents, not universal agents

ThinHarness is for bounded agent loops inside software you control, not open-ended interactive assistants. For business use cases, focused agent loops orchestrated by deterministic code are usually a better fit than sprawling multi-agent systems with broad authority.

No bash by default

Purpose-built business agents usually don't need a shell. Bash is a broad security and reliability surface: it gives the model open-ended authority instead of typed, bounded actions. ThinHarness keeps bash out of the default and built-in tool sets, but exposes an opt-in BashTool for exploratory runs before the workflow is hardened with typed tools.

Search is a top priority

The search tool exposes ripgrep as compact grouped path/line results, tuned for document and business-workflow agents rather than code navigation. There's also a jsonl_search variant, because JSONL is the right shape when you're replacing RAG with agent-driven search over structured data. It adds ripgrep row prefiltering, jq-style field projection, where filters, range filters, and snippets from large multiline fields.

Parallel LLM calls, built in

Fan out from inside the harness when a workflow needs efficient parallel processing or majority vote for reliability. Set builtin_parallel_llm_model to enable the default parallel_llm tool for plain-text batches; for validated structured output per call, instantiate ParallelLlmTool yourself with output_type (a Pydantic model). Each call is stateless, and large batches can write JSON to output_file.
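The fan-out and majority-vote pattern can be sketched with plain asyncio. This is not the ParallelLlmTool implementation: fake_llm is a hypothetical stand-in for a stateless one-shot provider call, and the concurrency cap of 8 is an assumed default.

```python
import asyncio
from collections import Counter


async def fake_llm(prompt: str, seed: int) -> str:
    """Hypothetical stateless one-shot call; replace with a real provider client."""
    await asyncio.sleep(0)  # yield to the event loop like a network call would
    # Deterministic toy behaviour: one of the samples disagrees.
    return "negative" if seed == 3 else "positive"


async def fan_out(prompts: list[str], max_concurrency: int = 8) -> list[str]:
    """Run independent prompts concurrently, bounded by a semaphore."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(i: int, prompt: str) -> str:
        async with sem:
            return await fake_llm(prompt, seed=i)

    # gather preserves input order, so results line up with prompts.
    return await asyncio.gather(*(one(i, p) for i, p in enumerate(prompts)))


async def majority_vote(prompt: str, n: int = 5) -> str:
    """Ask the same question n times and return the most common answer."""
    answers = await fan_out([prompt] * n)
    return Counter(answers).most_common(1)[0][0]


label = asyncio.run(majority_vote("Classify the sentiment: 'Great service, fast refund.'"))
print(label)  # positive (4 of 5 votes)
```

Because each call is stateless, the same fan_out helper serves both plain batches of different prompts and repeated samples of one prompt for voting.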

No token streaming

Streaming is for workflow progress, not live chatbot text. ThinHarness emits run, model-turn, tool, retry, limit, and subagent events, but it does not stream provider token deltas. Token streaming would add provider-specific plumbing, event merging, cancellation edge cases, and more surface area to keep stable. For workflow-style agents, step-level updates are usually the useful signal.

Three providers, no matrix

ThinHarness ships small provider classes for OpenAI, Anthropic, and OpenRouter. If your gateway speaks one of those protocols, you swap a base URL and move on. If not, the provider classes are small enough to fork or replace, and ignoring the bundled ones costs you nothing.

No compaction

Compaction is a workaround for context windows filling up across long, accumulating runs — useful for interactive coding sessions that sprawl over hours. For SDK-based business agents, the right answer to "context is getting big" is almost always better task decomposition: shorter runs, separate harness instances, narrower subagents.

No deployment layer

Agents still need serving, auth, durable jobs, user/session storage, and deployment in production. ThinHarness does not try to own that stack. A bundled deployment layer might work for some teams, but it will miss plenty of real production shapes; instead of adding more code and more options, ThinHarness leaves that application stack for you to own.

Install

$ uv add thinharness # or pip install thinharness

Requires Python 3.11+.

Use

import asyncio
from thinharness import Harness, HarnessConfig

async def main():
    async with Harness(HarnessConfig(root=".", model="openai:gpt-5.5")) as harness:
        result = await harness.run("Read README.md and summarize it.")
        print(result.text)

asyncio.run(main())

There's a synchronous wrapper too: Harness(...).run_sync(...).
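A synchronous wrapper like this is commonly built by driving the async API with asyncio.run. The sketch below shows that pattern on a hypothetical ToyHarness class; ThinHarness's own run_sync may manage setup, cleanup, and event loops differently.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    text: str


class ToyHarness:
    """Hypothetical async harness used only to illustrate a sync wrapper."""

    async def __aenter__(self) -> "ToyHarness":
        return self  # a real harness would open provider/MCP clients here

    async def __aexit__(self, *exc: object) -> None:
        return None  # ...and close them here

    async def run(self, prompt: str) -> Result:
        await asyncio.sleep(0)
        return Result(text=f"echo: {prompt}")

    def run_sync(self, prompt: str) -> Result:
        """Blocking wrapper: enter the async context and run one prompt."""

        async def _go() -> Result:
            async with self:
                return await self.run(prompt)

        # asyncio.run creates and closes its own event loop, so this must not be
        # called from code that is already running inside an event loop.
        return asyncio.run(_go())


print(ToyHarness().run_sync("hello").text)  # echo: hello
```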

Features

Filesystem tools

read, write, batched exact-replacement edit, search, list, and glob with root-scoped path policies.

JSONL search

Opt-in jsonl_search for structured line-delimited data, with ripgrep prefiltering, field projection, equality/contains/regex/range where filters, and field-level snippets from large multiline string values.

Bash prototype tool

Opt-in BashTool for exploratory shell commands. It is lightweight, must be registered explicitly as a custom tool, and is not included in the default or built-in tool set.

Provider adapters

Built-in OpenAI, Anthropic, and OpenRouter adapters, plus public model/session protocols for implementing another provider.

Custom typed tools

Define sync or async ToolSpec handlers with Pydantic argument models, normalized ToolResult envelopes, sequential/approval flags, and per-tool retry settings.

Structured output

Pydantic-validated results with native, tool, prompted, and text modes.

Hooks

Lifecycle and tool-call interception for prompt submission, tool calls, subagents, limits, and run boundaries.

Subagents

Opt-in delegation through a built-in subagent tool and explicit SubAgentConfig.

Parallel LLM

Opt-in parallel_llm fan-out for batches of independent one-shot prompts, plus ParallelLlmTool(...).spec() for renameable tools with explicit model, path, prompt, and retry settings.

Skills

Explicit skill_read and skill_run tools for selected skill directories, with Python, shell, JavaScript, and Go script runners.
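Dispatching a skill script to the right runner can be as simple as mapping file extensions to argv prefixes. The RUNNERS table and build_command helper below are hypothetical, and the interpreter commands (python3, bash, node, go run) are assumptions about the host; the real skill_run tool may choose runners differently.

```python
from pathlib import Path

# Hypothetical extension -> command prefix table (assumes these are on PATH).
RUNNERS: dict[str, list[str]] = {
    ".py": ["python3"],
    ".sh": ["bash"],
    ".js": ["node"],
    ".go": ["go", "run"],
}


def build_command(skill_dir: str, script: str, args: list[str]) -> list[str]:
    """Build an argv list for a script inside a selected skill directory."""
    path = Path(skill_dir) / script
    runner = RUNNERS.get(path.suffix)
    if runner is None:
        raise ValueError(f"no runner registered for {script!r}")
    # An argv list (not a shell string) means args are never shell-interpreted.
    return [*runner, str(path), *args]


cmd = build_command("skills/invoices", "scripts/extract.py", ["--pages", "1-3"])
print(cmd)
```

The resulting list can be handed to subprocess.run(cmd, ...) or asyncio.create_subprocess_exec(*cmd, ...) without going through a shell.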

Resume

Continue a run with a clean new turn from self-contained transcript state. The transcript can replay across built-in providers and models: native reasoning is preserved when resuming on the same provider and degraded to text when resuming on a different one.
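The cross-provider rule can be sketched as a pass over the stored transcript before replay. The Block type and prepare_for_replay function are hypothetical, and real providers attach extra data to native reasoning (for example signatures or encrypted payloads) that a same-provider resume must pass through unchanged; this sketch only tracks a provider tag.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    kind: str      # "text" or "reasoning" in this sketch
    content: str
    provider: str  # provider that produced the block


def prepare_for_replay(transcript: list[Block], target_provider: str) -> list[Block]:
    """Keep native reasoning for the same provider; degrade it to text otherwise."""
    replay = []
    for block in transcript:
        if block.kind == "reasoning" and block.provider != target_provider:
            # Another provider cannot accept this native block, so carry it as plain text.
            replay.append(replace(block, kind="text", content=f"[prior reasoning] {block.content}"))
        else:
            replay.append(block)
    return replay


transcript = [
    Block("reasoning", "Check the totals before answering.", "anthropic"),
    Block("text", "The invoice total is 420.", "anthropic"),
]
same = prepare_for_replay(transcript, "anthropic")
cross = prepare_for_replay(transcript, "openai")
print(cross[0])
```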

MCP

Optional MCP client support with lazy tool discovery and collision checks.
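Lazy discovery and collision checks can be sketched together: tools from MCP servers are fetched on first use, then merged into the local registry, refusing any duplicate name. LazyMcpTools, merge_tools, and the discover callback are hypothetical; a real client would ask each server for its tools via the MCP tools/list request.

```python
from typing import Callable

ToolMap = dict[str, str]  # tool name -> description


def merge_tools(local: ToolMap, discovered: dict[str, ToolMap]) -> ToolMap:
    """Merge per-server MCP tools into the local registry, rejecting name collisions."""
    merged = dict(local)
    owner = {name: "local" for name in local}
    for server, tools in discovered.items():
        for name, description in tools.items():
            if name in merged:
                raise ValueError(
                    f"tool {name!r} from MCP server {server!r} collides with {owner[name]!r}"
                )
            merged[name] = description
            owner[name] = server
    return merged


class LazyMcpTools:
    """Defer MCP discovery until the tool list is first needed, then cache it."""

    def __init__(self, local: ToolMap, discover: Callable[[], dict[str, ToolMap]]):
        self._local = local
        self._discover = discover
        self._merged: ToolMap | None = None

    def tools(self) -> ToolMap:
        if self._merged is None:
            self._merged = merge_tools(self._local, self._discover())
        return self._merged


calls: list[str] = []


def discover() -> dict[str, ToolMap]:
    calls.append("tools/list")  # stands in for asking each server for its tools
    return {"docs": {"fetch_page": "Fetch a documentation page"}}


registry = LazyMcpTools({"read": "Read a file"}, discover)
print(len(calls))  # 0: nothing discovered yet
print(sorted(registry.tools()))
registry.tools()
print(len(calls))  # 1: discovery ran once and was cached

try:
    merge_tools({"read": "Read a file"}, {"fs": {"read": "Read via MCP"}})
except ValueError as err:
    collision = str(err)
```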

Parallel tool calls

Same-turn tool batches run concurrently when every called tool is parallel-safe.
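The scheduling rule reads naturally as code: gather the batch concurrently only when every called tool is marked parallel-safe, and otherwise run the calls in order. Tool, run_batch, and the toy read/write tools are hypothetical, not ThinHarness's internals.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Tool:
    fn: Callable[[str], Awaitable[str]]
    parallel_safe: bool


async def run_batch(tools: dict[str, Tool], calls: list[tuple[str, str]]) -> list[str]:
    """Run one turn's tool calls; concurrent only if every called tool is parallel-safe."""
    if all(tools[name].parallel_safe for name, _ in calls):
        # gather returns results in call order even though the calls interleave.
        return list(await asyncio.gather(*(tools[name].fn(arg) for name, arg in calls)))
    # One unsafe tool in the batch forces strict, in-order execution.
    return [await tools[name].fn(arg) for name, arg in calls]


log: list[str] = []


async def read(path: str) -> str:
    log.append(f"start read {path}")
    await asyncio.sleep(0)  # simulated I/O: yields to other tasks
    log.append(f"end read {path}")
    return f"contents of {path}"


async def write(path: str) -> str:
    log.append(f"write {path}")
    return "ok"


tools = {"read": Tool(read, parallel_safe=True), "write": Tool(write, parallel_safe=False)}

results = asyncio.run(run_batch(tools, [("read", "a.md"), ("read", "b.md")]))
parallel_log = list(log)
log.clear()
asyncio.run(run_batch(tools, [("read", "a.md"), ("write", "b.md")]))
sequential_log = list(log)
print(parallel_log)    # both reads start before either ends
print(sequential_log)  # ['start read a.md', 'end read a.md', 'write b.md']
```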

Human approvals

Mark custom tools as approval-required so a run pauses before side effects, returns pending call details plus resume state, then continues after an approve/reject decision.

Event streaming

Async coarse-grained run, model, tool, retry, limit, and subagent events for workflow visibility.
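Consuming step-level events amounts to iterating an async stream and updating job status per event. fake_run and the Event type below are hypothetical stand-ins; ThinHarness's actual event classes and streaming method are not shown.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "run_start", "model_turn", "tool_call", "run_end"
    detail: str


async def fake_run(prompt: str) -> AsyncIterator[Event]:
    """Hypothetical run that yields step-level events, never token deltas."""
    yield Event("run_start", prompt)
    yield Event("model_turn", "requested tool: search")
    await asyncio.sleep(0)  # the tool would execute here
    yield Event("tool_call", "search -> 3 matches")
    yield Event("model_turn", "final answer")
    yield Event("run_end", "ok")


async def main() -> list[str]:
    kinds = []
    async for event in fake_run("Find overdue invoices"):
        kinds.append(event.kind)  # e.g. update a progress bar or job status row
    return kinds


kinds = asyncio.run(main())
print(kinds)
```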

Tool retries

Tools raise ModelRetry to send structured feedback back to the model and trigger a retry within a per-tool budget.

Limits and notices

Configured request, tool-call, output-retry, and tool-retry budgets bound each run; near-limit guidance can warn the model before request or tool-call budgets are exhausted.

Tracing

Local plaintext JSONL traces plus OpenTelemetry-compatible spans for runs, provider calls, tools, and subagents.

Status

Pre-1.0. APIs may shift, but I don't expect dramatic changes. Forking is a real option, not just a theoretical one: the codebase is small enough that pulling upstream changes into your fork by hand stays cheap. Each major feature (MCP, subagents, jsonl_search, parallel_llm, skills) lives in its own file with no hidden dependencies. If you don't use one, that's even less code to worry about. If you want to delete it entirely, that's a one-shot 10-word prompt to a coding agent.

License

MIT. See LICENSE.