Decagon @DecagonAI

Powering concierge customer experiences for the most impactful companies in the world. Backed by @a16z, @accel, @baincapVC, @coatuemgmt, @indexventures.
decagon.ai · San Francisco · Joined January 2024

Tweets: 516 · Followers: 5K · Following: 17 · Likes: 482

Ashwin Sreenivas @AshwinSreenivas

5 hours ago

One of the hardest parts of building self-improving agents is proving they are actually improving. That’s why, alongside Duet Autopilot, we built DuetBench: the first benchmark designed specifically for CX agents that learn and improve over time.

To evaluate Duet Autopilot, we compared its performance against certified human agent builders and graded both on outcome and methodology across 90 diagnostic investigations, from simple metric lookups to root-causing CSAT drops.

We also evaluated Autopilot on enterprise agent-building tasks. Starting from messy design documents, it had to build AOPs and tools from scratch, generate simulations, and pass every associated test before a task was considered complete.

Autopilot demonstrated an iterative approach to agent building. Rather than solving problems in a single pass, it ran simulations, identified broken branches, repaired the AOP or underlying tool, and repeated the process until the workflow passed.

Another notable result was that Autopilot improved the quality of its own test set through self-critique, increasing simulation accuracy from 58% to 88% across 520 benchmark runs.

As self-improving systems become more common, verified evaluation will matter just as much as model capability. Excited to share the research behind it. Full writeup below. ↓

1 1 6 145 0


Decagon @DecagonAI

6 hours ago

Full research and methodology: decagon.ai/blog/duetbench

0 0 3 53 0


Decagon @DecagonAI

6 hours ago

As self-improving systems become more common, verified evaluation will matter just as much as model capability.

1 0 2 52 0


Decagon @DecagonAI

6 hours ago

Yesterday, we introduced Duet Autopilot as the first self-improving agent for CX. It’s a big claim, which is why we decided to build a benchmark to back it up. DuetBench is the first benchmark designed specifically for CX agents that learn and improve over time.

1 4 11 350 0


Daniel C. Liem @damndanielliem

24 hours ago

Our AI at @DecagonAI is now doing more agent-building than we (humans) are. Duet wrote 81% of our test simulations and made 54% of the edits to our customers' agents. AI products naturally evolve toward AI doing more of the work, and we're pushing that to the frontier.

Decagon @DecagonAI

a day ago

Introducing Duet Autopilot, the first verified self-improving AI agent for CX. It automates agent improvement by turning conversation signals into validated improvements ready for human review, helping agents get better with every conversation. 🧵

2 6 32 13K 8

3 1 7 4K 2


Kimberly Tan @kimberlywtan

a day ago

.@DecagonAI's ability to push the frontier of agent development is truly impressive. Duet Autopilot is the first verified self-improving AI agent for CX – learn more below!

Jesse Zhang @thejessezhang

a day ago

Today, we’re launching Duet Autopilot, the first verified self-improving AI agent for CX! It automatically analyzes conversations, identifies opportunities for improvement, validates updates, and surfaces them for human review, improving itself with each cycle. 👇

3 9 41 9K 7

0 1 6 1K 1


Decagon @DecagonAI

a day ago

@shiraviii Very exciting times!!

0 0 0 476 0


Decagon @DecagonAI

a day ago

@gramliu 🙌

0 0 0 16 0


collin @collinhqian

a day ago

Turn on Autopilot and go touch grass. More gamechanging than Pokemon Go 🤯

2 1 12 3K 2


Nick Lam @nick_lam_93

a day ago

A big part of onboarding at @DecagonAI is becoming a 'master' of the platform (goated education team). It's been interesting to be a user of the platform I work for (rare if you've spent your life in B2B as an engineer), because I've become so impressed with our product and engineering teams organically through the experience. Duet AP is another feature I'm extremely impressed with: the right interface for LLMs + data warehousing is proactive suggestions/automations.