SelfCritiqueAgent

The SelfCritiqueAgent is a Crowe-native, reflection-driven agent that improves outputs through drafting, structured critique, self-reflection, and targeted revision. It iterates over a task multiple times, learning from its own feedback and retained memories to deliver higher-quality responses.

Overview

SelfCritiqueAgent employs three internal roles:

  • Author – produces the initial draft/output

  • Critic – evaluates the draft against explicit quality criteria

  • Reflector – distills lessons and patterns to guide revisions and future outputs
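Conceptually, one Author → Critic → Reflector pass can be pictured with a minimal sketch. The three roles below are plain stub functions standing in for model calls; the scoring rule and prompt shapes are illustrative assumptions, not crowe's implementation.

```python
# Minimal sketch of one Author -> Critic -> Reflector pass.
# Each "role" is a stub function standing in for a model call.

def author(task: str) -> str:
    # Author: produce an initial draft.
    return f"Draft answer for: {task}"

def critic(task: str, draft: str) -> tuple[str, float]:
    # Critic: a real critic scores against quality criteria;
    # this stub only checks that the draft mentions the task.
    ok = task in draft
    feedback = "Covers the task." if ok else "Misses the task."
    return feedback, 1.0 if ok else 0.3

def reflector(feedback: str) -> str:
    # Reflector: distill the critique into a reusable lesson.
    return f"Lesson: {feedback}"

task = "Explain recursion."
draft = author(task)
feedback, score = critic(task, draft)
reflection = reflector(feedback)
```

In the real agent, the reflection and feedback then feed a revision step, closing the loop described above.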

Initialization

from crowe.agents import SelfCritiqueAgent

agent = SelfCritiqueAgent(
    agent_name="self-critique-agent",
    system_prompt="...",          # Optional: custom system prompt
    model_name="openai/gpt-4o",
    max_cycles=3,
    memory_capacity=128
)

Parameters

  • agent_name (str, default "self-critique-agent") – Agent identifier for logs and routing

  • system_prompt (str, default SC_PROMPT_DEFAULT) – Base system prompt for all sub-roles

  • model_name (str, default "openai/gpt-4o") – Backbone model used by the roles

  • max_cycles (int, default 3) – Maximum iterative improvement cycles

  • memory_capacity (int, default 128) – Long-term memory capacity (items)

  • return_list (bool, default False) – If True, return the conversation as a list

  • return_dict (bool, default False) – If True, return the conversation as a dict

Methods

draft

Generate an initial draft for a task (Author role).

response = agent.draft(
    task: str,
    context: dict | None = None
) -> str

Args

  • task: The problem or instruction.

  • context (optional): Extra hints, constraints, or retrieved facts.


critique

Evaluate a response against objective criteria (Critic role).

feedback, score = agent.critique(
    task: str,
    response: str,
    rubric: dict | None = None
) -> tuple[str, float]

Returns

  • feedback: Structured critique text.

  • score: A float in [0, 1] indicating quality.
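A rubric can be any criterion-to-weight mapping. One plausible way such a rubric might reduce per-criterion marks to a single score in [0, 1] is a weighted sum; the criterion names, weights, and aggregation rule below are assumptions for illustration only.

```python
# Sketch: reducing per-criterion marks to one score in [0, 1].
# Criterion names and weights are illustrative, not crowe's defaults.

rubric = {"accuracy": 0.5, "clarity": 0.3, "completeness": 0.2}
marks = {"accuracy": 1.0, "clarity": 0.5, "completeness": 1.0}  # from the Critic

score = sum(rubric[c] * marks[c] for c in rubric)  # ~0.85
```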


self_reflect

Produce self-reflection statements that generalize beyond a single attempt (Reflector role).

reflection = agent.self_reflect(
    task: str,
    response: str,
    feedback: str
) -> str

revise

Revise a response using the latest critique and reflection.

revised = agent.revise(
    task: str,
    original: str,
    feedback: str,
    reflection: str
) -> str

cycle

Run one improvement cycle: draft/critique/reflect/revise.

result = agent.cycle(
    task: str,
    iteration: int = 0,
    previous: str | None = None,
    rubric: dict | None = None
) -> dict

Returns a dictionary containing:

  • task, draft, feedback, reflection, revised, score, iteration
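Cycles are typically chained until the score clears a quality bar or max_cycles is reached. The control flow can be sketched as below; `fake_cycle` stands in for agent.cycle (returning the documented keys), and the 0.9 threshold and improving-score schedule are assumptions.

```python
# Sketch: chaining cycles until quality clears a bar or max_cycles is hit.
# `fake_cycle` stands in for agent.cycle and returns the documented keys.

def fake_cycle(task, iteration=0, previous=None):
    score = 0.6 + 0.2 * iteration  # pretend each revision improves
    return {
        "task": task, "draft": previous or "draft", "feedback": "...",
        "reflection": "...", "revised": f"revision {iteration}",
        "score": score, "iteration": iteration,
    }

max_cycles, threshold = 3, 0.9
result, previous = None, None
for i in range(max_cycles):
    result = fake_cycle("Explain BFS.", iteration=i, previous=previous)
    previous = result["revised"]      # feed the revision into the next cycle
    if result["score"] >= threshold:  # stop early once quality is sufficient
        break
```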


run_batch

Execute the iterative process for a list of tasks.

results = agent.run_batch(
    tasks: list[str],
    include_intermediates: bool = False,
    rubric: dict | None = None
) -> list[dict] | list[str]

Returns

  • If include_intermediates=False: List of final revised responses

  • If include_intermediates=True: Full histories per task

Example Usage

from crowe.agents import SelfCritiqueAgent

agent = SelfCritiqueAgent(
    agent_name="self-critique-agent",
    model_name="openai/gpt-4o",
    max_cycles=3
)

tasks = [
    "Explain quantum computing to a beginner.",
    "Write a Python function to sort a list of dictionaries by a key."
]

results = agent.run_batch(tasks)

for i, (task, result) in enumerate(zip(tasks, results), start=1):
    print(f"\nTask {i}: {task}")
    print(f"Response: {result}")

Memory System

SelfCritiqueAgent ships with InsightMemory, maintaining both short-term and long-term records of drafts, critiques, and reflections to guide future work.

Memory Features

  • Short-term traces for current session context

  • Long-term insights for durable lessons and reusable patterns

  • Capacity limits with automatic pruning

  • Relevance retrieval for similar future tasks

  • Similarity deduplication to avoid memory bloat
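The capacity-pruning and deduplication behavior can be pictured with a toy sketch. This is not InsightMemory's implementation: the dedup rule here is exact-match only, whereas InsightMemory also uses similarity-based dedup and relevance retrieval.

```python
from collections import deque

# Toy capacity-limited memory with exact-duplicate rejection.
# Shows only the pruning/dedup shape, not InsightMemory itself.

class TinyMemory:
    def __init__(self, capacity: int):
        self.items = deque(maxlen=capacity)  # oldest entries pruned first

    def add(self, insight: str) -> bool:
        if insight in self.items:            # reject exact duplicates
            return False
        self.items.append(insight)
        return True

mem = TinyMemory(capacity=2)
mem.add("prefer concrete examples")
mem.add("prefer concrete examples")          # rejected as a duplicate
mem.add("state assumptions explicitly")
mem.add("cite sources")                      # capacity hit: evicts the oldest
```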

Best Practices

  • Define clear tasks: Supply constraints, audience, tone, and format.

  • Tune cycles: Increase max_cycles for complex or high-stakes outputs.

  • Curate memory: Adjust memory_capacity and periodically purge.

  • Choose the right model: Favor models with strong reasoning and instruction-following.

  • Harden for production: Add timeouts, retries, and validation if outputs feed downstream systems.
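For the production-hardening point, a retry-with-validation wrapper around any generation call might look like the sketch below. The retry policy, validator, and `generate` callable are application choices, not part of crowe's API.

```python
import time

# Sketch: retry a flaky generation call and validate its output before
# it reaches downstream systems. `generate` stands in for an agent call.

def with_retries(generate, validate, attempts=3, delay=0.0):
    last_err = None
    for _ in range(attempts):
        try:
            out = generate()
            if validate(out):
                return out
            last_err = ValueError("validation failed")
        except Exception as err:
            last_err = err
        time.sleep(delay)  # back off before the next attempt
    raise RuntimeError("all attempts failed") from last_err

calls = iter(["", "a valid answer"])  # first output fails validation
result = with_retries(lambda: next(calls), validate=lambda s: bool(s.strip()))
```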
