Powering concierge customer experiences for the most impactful companies in the world.
Backed by @a16z, @accel, @baincapVC, @coatuemgmt, @indexventures
decagon.ai, San Francisco. Joined January 2024
One of the hardest parts of building self-improving agents is proving they are actually improving.
That’s why, alongside Duet Autopilot, we built DuetBench: the first benchmark designed specifically for CX agents that learn and improve over time.
To evaluate Duet Autopilot, we compared its performance against certified human agent builders, grading both on outcome and methodology across 90 diagnostic investigations, from simple metric lookups to root-causing CSAT drops.
We also evaluated Autopilot on enterprise agent-building tasks. Starting from messy design documents, it had to build AOPs and tools from scratch, generate simulations, and pass every associated test before a task was considered complete.
Autopilot demonstrated an iterative approach to agent building. Rather than solving problems in a single pass, it ran simulations, identified broken branches, repaired the AOP or underlying tool, and repeated the process until the workflow passed.
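The simulate, repair, repeat loop described above can be sketched roughly as follows. This is a toy illustration only, not Decagon's implementation: `Workflow`, `run_simulations`, `repair`, and `iterate_until_passing` are hypothetical names, and a "branch" is modeled as a simple pass/fail flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Workflow:
    # Hypothetical stand-in for an AOP and its tools: branch name -> passes?
    branches: Dict[str, bool] = field(default_factory=dict)


def run_simulations(wf: Workflow) -> List[str]:
    """Return the names of failing branches (assumed simulation step)."""
    return sorted(name for name, ok in wf.branches.items() if not ok)


def repair(wf: Workflow, branch: str) -> None:
    """Mark one broken branch as fixed (assumed repair step)."""
    wf.branches[branch] = True


def iterate_until_passing(wf: Workflow, max_rounds: int = 10) -> int:
    """Simulate, repair one failure per round, and repeat.

    Returns the number of repairs made, or raises if the workflow
    still fails once the round budget is used up.
    """
    repairs = 0
    for _ in range(max_rounds):
        failures = run_simulations(wf)
        if not failures:
            return repairs
        repair(wf, failures[0])
        repairs += 1
    if run_simulations(wf):
        raise RuntimeError("workflow still failing after max_rounds")
    return repairs


wf = Workflow({"refund": False, "cancel": True, "escalate": False})
print(iterate_until_passing(wf))
```

The round budget is an assumption added so the loop always terminates; the post itself only says the process repeats until the workflow passes.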
Another notable result was that Autopilot improved the quality of its own test set through self-critique, increasing simulation accuracy from 58% to 88% across 520 benchmark runs.
As self-improving systems become more common, verified evaluation will matter just as much as model capability.
Excited to share the research behind it. Full writeup below. ↓
Yesterday, we introduced Duet Autopilot as the first self-improving agent for CX. It’s a big claim, which is why we decided to build a benchmark to back it up.
DuetBench is the first benchmark designed specifically for CX agents that learn and improve over time.
Our AI at @DecagonAI is now doing more agent-building than we (humans) are.
Duet wrote 81% of our test simulations, and made 54% of the edits to our customers' agents.
AI products naturally evolve toward AI doing more of the work, and we're pushing that to the frontier.
Introducing Duet Autopilot, the first verified self-improving AI agent for CX.
It automates agent improvement by turning conversation signals into validated improvements ready for human review, helping agents get better with every conversation. 🧵
.@DecagonAI's ability to push the frontier of agent development is truly impressive. Duet Autopilot is the first verified self-improving AI agent for CX – learn more below!
Today, we’re launching Duet Autopilot, the first verified self-improving AI agent for CX!
It automatically analyzes conversations, identifies opportunities for improvement, validates updates, and surfaces them for human review, improving itself with each cycle. 👇
a big part of onboarding @DecagonAI is to become a 'master' at the platform (goated education team)
it's been interesting to be a user of the platform i work for (rare if you've spent your life in b2b as an engineer) because i've just become so impressed with our product and engineering team organically through the experience
Duet AP is another feature i'm extremely impressed with. the right interface for llms + data warehousing is proactive suggestions/automations