Builders Unscripted: Alchemy's Agentic Development Loop

OpenAI talks with Alchemy product leader Matias Castello about Codex app workflows, automated code review, shared skills, Linear automation, and personal voice-to-code loops.

Processed May 30, 2026
Infographic mapping Alchemy agent workflows from shared skills to Linear planning, feature flags, and voice-to-code updates.

Executive Summary

Alchemy product leader Matias Castello describes how his engineering teams and his own personal workflows use the Codex app server, automated code review, and modular multi-agent feature generation to remove deployment bottlenecks.

Automated code review loops using generative models successfully detect complex multi-migration bugs and race conditions prior to production deployment.

Software development is shifting toward treating autonomous AI agents as direct, primary consumers of developer infrastructure and platforms.

By establishing structured upfront preference profiles, builders can orchestrate autonomous AI agents to conduct competitive research and deploy feature-flagged experiments overnight.
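One common way to gate an experiment behind a feature flag is deterministic percentage bucketing, sketched below. This is an illustration only, not Alchemy's actual setup: the function names `bucket` and `is_enabled`, the flag name, and the rollout fraction are all hypothetical.

```python
import hashlib


def bucket(flag: str, user_id: str) -> float:
    """Map a (flag, user) pair to a stable value in [0, 1).

    Hypothetical helper: hashing keeps the assignment deterministic,
    so the same user always lands in the same bucket for a given flag.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def is_enabled(flag: str, user_id: str, rollout: float) -> bool:
    """Return True when the user falls inside the rollout fraction."""
    if not 0.0 <= rollout <= 1.0:
        raise ValueError("rollout must be between 0 and 1")
    return bucket(flag, user_id) < rollout


if __name__ == "__main__":
    # Hypothetical flag guarding an agent-generated variant at 10% exposure.
    print(is_enabled("overnight-variant", "user-123", rollout=0.10))
```

Keeping the rollout fraction outside the code path means an experiment shipped overnight can start at zero exposure and be widened only after a human has reviewed it.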

Key Takeaways

  • Alchemy's initial AI implementation began with automated, Slack-integrated documentation edits that skip complex local site generation.
  • Retroactive testing against the post-mortem of a massive migration verified that Codex identified the tricky, high-impact race conditions involved.
  • Engineering team workflows evolved into interactive, iterative back-and-forth debugging sessions directly inside GitHub pull request comments.
  • A shared repository of company-wide AI skills, accessible across organizational functions, scales internal productivity.
  • A personal macOS and iOS writing assistant application was constructed using the Codex app server backed by an active ChatGPT subscription.
  • The Linear project management pipeline was fully automated, delegating backlog generation, task breakdown, and execution tracking to an LLM.
  • A custom Apple Watch complication was built to capture brief voice dictations, transcribing and routing precise code repository updates on the go.
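
The routing step in the voice-to-code takeaway can be sketched as a small parser that turns a transcript into a repository-scoped instruction. Everything here is an assumption for illustration: the "in <alias>, <instruction>" phrasing, the `REPO_ALIASES` table, and the `RepoTask` type are hypothetical, and transcription and the agent call itself are out of scope.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RepoTask:
    """A dictated instruction bound to one target repository."""
    repo: str
    instruction: str


# Hypothetical spoken aliases mapped to placeholder repository names.
REPO_ALIASES: Dict[str, str] = {
    "docs": "example-org/docs",
    "dashboard": "example-org/dashboard",
}


def route_dictation(
    transcript: str, aliases: Dict[str, str] = REPO_ALIASES
) -> Optional[RepoTask]:
    """Parse 'in <alias>, <instruction>' and return the routed task.

    Returns None when the transcript does not follow the assumed
    phrasing or names an unknown alias, so nothing is sent to an
    agent on a bad match.
    """
    words = transcript.strip().split()
    if len(words) < 3 or words[0].lower() != "in":
        return None
    alias = words[1].lower().strip(",.:")
    repo = aliases.get(alias)
    if repo is None:
        return None
    return RepoTask(repo=repo, instruction=" ".join(words[2:]))


if __name__ == "__main__":
    print(route_dictation("In docs, fix the typo in the quickstart"))
```

Returning None on anything ambiguous is a deliberate choice in this sketch: a short dictation from a watch is easy to mis-transcribe, so an unmatched repository is safer to drop than to guess.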

Builder Implications

  • Assume a zero-to-one engineering task is fully viable with AI rather than defaulting to hiring a multi-person prototype team.
  • Consolidate personal and team workflows into a single configuration file like agents.md to force explicit architectural boundaries for agents.
  • Shift engineering paradigms to build infrastructure and APIs optimized explicitly for consumption by fully autonomous machine agents.
  • Use multi-model prompting loops to generate user interface variants automatically through cross-modal image-to-code iteration.
  • When an LLM produces unexpected or substandard software output, treat the error as an optimization problem in human-to-agent prompt communication.

Things to Verify

  • Confirm the structural consistency and error rates of local text expansion hooks running over the live Codex app server.
  • Evaluate the latency overhead introduced when using the Codex CLI and Codex harness to process high-frequency multi-file iterations.
  • Verify the exact pricing and credit consumption structures when pointing third-party client sessions directly to a ChatGPT API backend.
  • Assess the potential for model hallucination when instructing a research skill to scrape live competitive features from the open web.
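
The error-rate and latency checks above could start from a small measurement harness like the following. It is a minimal sketch under stated assumptions: `call` is a hypothetical zero-argument function standing in for whatever triggers one request, and nothing here uses or implies a real Codex API.

```python
import statistics
import time
from typing import Callable, Dict


def measure(call: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Run `call` repeatedly and summarize latency and error rate.

    `call` is a placeholder for one request to the system under test.
    Any exception counts as a failure; its duration is still recorded.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    latencies = []
    failures = 0
    for _ in range(runs):
        start = time.perf_counter()
        try:
            call()
        except Exception:
            failures += 1
        finally:
            latencies.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "error_rate": failures / runs,
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real request when verifying.
    print(measure(lambda: time.sleep(0.01), runs=5))
```

Reporting the median alongside the maximum helps separate steady overhead from occasional slow runs, which matters for high-frequency multi-file iterations.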