Stay up to date with our latest developments, insights, and announcements.
Building integrations used to mean weeks of tedious data plumbing: implementing OAuth flows, writing custom code for each API endpoint, and wrestling with SDKs. We found a faster way using AI agents, curl, and a security-conscious architecture that keeps user credentials out of the LLM context window.
Onboarding to a new project is one of the most painful parts of software engineering. If you don't get it right, you probably won't last long. And odds are, no one is there to help you.
The real challenge isn't writing code. It's understanding the problem.
Traditional analytics fail when every user session is a unique conversation. This guide shows how to use HDBSCAN to automatically discover user cohorts, identify failure patterns, and extract product insights from millions of chat transcripts—including the hyperparameter tuning, memory optimizations, and incremental clustering strategies needed to make it work in production.
You're collecting logs, tracking latency, maybe even running evals. But can you answer: which users are struggling? What patterns are emerging? Is quality actually improving? Here's what real LLM monitoring looks like.
Every day, thousands of engineers open log files and start reading conversations between users and their LLMs, one by one, searching for problems. This is how we monitor AI systems that serve millions of users and burn millions in compute. We've built the most sophisticated technology of our era, and we're debugging it like it's 1999.
Production data is messy. Hospital names become "Genral Hospital," product codes get mangled, and your downstream reports break. We needed a tool that could catch these issues without false positives, which led us down a rabbit hole of DAFSA compression experiments, BK-tree implementations, and ultimately discovering that 200 lines of Cython can save you from rewriting everything in Rust.
Most developers spend hours tweaking AI agent prompts and tools based on guesswork about what might improve performance. But advanced AI models have remarkable self-awareness about their own limitations: they know exactly what's frustrating them and often have specific ideas for improvement. This post explores a surprisingly effective approach: having direct conversations with your agents about their experience, and building systems that let them contribute to their own development through structured feedback loops.
We've open-sourced a new approach to AI-assisted development that wraps Claude Code with o3 to create agents that actually understand your codebase. Instead of struggling with generic responses, you get AI that knows your patterns, dependencies, and conventions—with full control over prompts and behavior. The system uses o3 for strategic planning and Claude Code for implementation, creating a powerful workflow that adapts to how you actually work.
The promise of AI agents in CI is compelling, but the reality is more complex. When they work, they feel like magic. When they don't, the black box nature makes debugging nearly impossible. The path forward requires more transparency, not more prompts.
AI coding tools fail, and you get zero visibility into why. You retry and hope. What if you could actually see, debug, and control your agents?
Third-party AI developer tools often feel like playing the lottery: input your prompt and hope it works. Building your own tools gives you the control and visibility to transform unpredictable gambling into reliable engineering.
A transformative shift is underway in software development: AI coding agents are emerging as fundamental building blocks that will reshape how we conceive, build, and maintain software, redefining the entire Software Development Lifecycle (SDLC).
By 2028, the highest-performing engineering teams will distinguish themselves through custom AI systems that embody their unique engineering ethos. As AI increasingly writes code, the strategic advantage will shift from who can write the best code to who can best direct and customize AI to reflect their team's values. Commercial AI coding tools, designed to serve everyone rather than excel for anyone, create dangerous dependencies that limit engineering potential.
While building AI coding tools, I discovered some clever tricks that popular coding agents use to understand your codebase instantly. It turns out they're doing some interesting things with system prompts and context windows behind the scenes. Here's what I learned about how they work and why it matters for building better developer tools.
Bug reproduction remains a major pain point for software companies, often involving manual, slow, and error-prone methods. This post explores current industry approaches, the risks of ignoring "unreproducible" bugs, and how AI-driven automation offers a powerful solution.
AI-driven coding tools promise effortless bug fixes—but can they deliver without first reproducing the bug? Discover why bug reproduction remains essential to effectively leveraging AI in software development, preventing regressions, and ensuring reliable, trustworthy fixes.
Tracking down performance regressions can be tedious. This post shows how to automate the process using Playwright, Chrome DevTools Protocol (CDP), and git bisect to quickly pinpoint the commit responsible for slowdowns.
Every engineer knows the pain of a growing bug backlog. While there's no silver bullet for managing bugs, successful teams have developed battle-tested strategies that evolve with their growth. From radical "fix-or-delete" approaches to structured triage systems, here's how different organizations keep their bug trackers under control – and what you can learn from them.
Discover how using `git bisect` can quickly pinpoint the commit that introduced a bug. In this post, we explore five real-world scenarios—from batched merges to configuration changes—and explain how designing backwards-compatible tests can streamline your debugging process. Click to learn how to leverage `git bisect` for faster, more efficient problem-solving.
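The automation that post builds toward can be sketched end to end with `git bisect run`. This toy example (assuming `git` is on your PATH; the repo, commit count, and `check.py` script are invented for illustration) creates ten commits, plants a regression in commit 7, and lets bisect find it with a backwards-compatible check script:

```python
import os
import subprocess
import sys
import tempfile

def git(*args, cwd):
    """Run a git command in the given repo and return its stdout."""
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
git("init", cwd=repo)
git("config", "user.email", "dev@example.com", cwd=repo)
git("config", "user.name", "Dev", cwd=repo)

# Ten commits; the regression lands in commit 7.
for i in range(1, 11):
    with open(os.path.join(repo, "app.txt"), "w") as f:
        f.write("buggy" if i >= 7 else "ok")
    git("add", "app.txt", cwd=repo)
    git("commit", "-m", f"commit {i}", cwd=repo)

# Backwards-compatible check: exits 0 on good commits, 1 on bad ones,
# so it works no matter which commit bisect checks out.
check = os.path.join(repo, "check.py")
with open(check, "w") as f:
    f.write("import sys; sys.exit(open('app.txt').read() == 'buggy')\n")

root = git("rev-list", "--max-parents=0", "HEAD", cwd=repo).strip()
git("bisect", "start", "HEAD", root, cwd=repo)  # HEAD is bad, root is good
out = git("bisect", "run", sys.executable, check, cwd=repo)
print("found commit 7:", "commit 7" in out)
```

The exit-code convention is the whole contract: `git bisect run` treats exit 0 as good and 1-127 (except 125, which means "skip") as bad, so any script honoring that convention can drive the search over thousands of commits in O(log n) steps.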
Discover how startups and mid-sized firms are transforming their approach to bug backlogs. This post dives into real-world case studies and practical strategies—from ruthless triage to dedicated "bug smash" sprints—that keep development teams agile and focused. Learn how a commitment to immediate bug fixes and a quality-first mindset can streamline your process and enhance product stability.
Regression tests are more than just bug fixes—they're a competitive advantage. Every test you add locks in a hard-won lesson, preventing your team from fighting the same fires twice. Over time, this builds a technical moat that protects your product’s stability while competitors struggle with recurring issues. Scaling companies that invest in regression testing today will move faster, ship more confidently, and outpace those who don’t.
Discover how AI-powered self-healing CI pipelines can revolutionize your development workflow. Learn how modern AI coding agents automatically detect test failures, propose fixes, and validate changes—keeping your team focused on building features, not debugging. Explore real-world scenarios for pull requests and merge queues, and see how this cutting-edge approach improves velocity while ensuring stability.
Discover how predictive test selection and automated test generation can help you scale your test suite without sacrificing speed, so you can ship faster with confidence and lower costs.
Learn how to use AI-powered Stagehand to write robust, natural-language tests that pinpoint bugs in your code—then combine them with `git bisect` to track down the exact commit where things went wrong, all while keeping your tests backward-compatible and easy to maintain.
Discover which dev tasks are prime for AI-driven automation—like squashing ‘no-repro’ bugs or maintaining documentation—so your team can spend less time on repetitive chores and more time solving real user problems.
Explore how focusing on real bug reports, feature flags, and automated bug-to-test conversion can help you balance test coverage and speed, ensuring you catch real-world issues without slowing down development.
Learn how to supercharge your Git bisect workflow by combining it with ephemeral environments for more accurate and efficient debugging.