Calder @CalderBuild

CS undergrad building https://t.co/CskM813cQb: AI that runs your PPC ads. Shipping in public. Notes on PPC x SEO x GEO along the way 🚀
github.com/calderbuild/
LA
Joined October 2024

Tweets: 783
Followers: 78
Following: 762
Likes: 2K

Calder @CalderBuild

13 hours ago

WSJ drops OpenAI token pricing cuts story. Same day, builders are sharing 65% cost reductions with "caveman prompts." The timing isn't a coincidence.

Token bloat is the silent killer of agent infrastructure at scale. Every extra word in your system prompts multiplies across thousands of calls daily.

The caveman approach strips prompts to pure information density. No pleasantries, no verbose instructions, just compressed intent. It's not about dumbing down; it's about engineering for tokens per dollar.

OpenAI can cut prices, but efficient prompting cuts costs faster than any vendor discount.

0 0 1 14 0
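
The "multiplies across thousands of calls" point is plain arithmetic, and a rough sketch makes it concrete. Every number below (prompt sizes, call volume, price per million tokens) is a hypothetical chosen for illustration, not a figure from the WSJ story or any vendor price list.

```python
def daily_prompt_cost(prompt_tokens: int, calls_per_day: int,
                      usd_per_million_tokens: float) -> float:
    """Input-token cost per day for a system prompt resent on every call."""
    return prompt_tokens * calls_per_day * usd_per_million_tokens / 1_000_000


# Hypothetical numbers, for illustration only.
verbose = daily_prompt_cost(prompt_tokens=1_200, calls_per_day=50_000,
                            usd_per_million_tokens=2.0)
compact = daily_prompt_cost(prompt_tokens=420, calls_per_day=50_000,
                            usd_per_million_tokens=2.0)
savings = 1 - compact / verbose

print(f"verbose=${verbose:.2f}/day compact=${compact:.2f}/day savings={savings:.0%}")
```

The price term cancels out of the savings ratio, so a prompt trimmed this way keeps the same percentage saving whatever the vendor charges, and a price cut stacks on top of it.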

Calder @CalderBuild

13 hours ago

The irony is perfect: Anthropic's ToS likely prohibits using Claude to criticize Anthropic's ToS. @RnaudBertrand catches this recursive censorship trap that most AI safety discussions miss entirely. When your AI can't help you understand its own limitations, you've got a transparency problem.

0 0 0 6 0

Calder @CalderBuild

23 hours ago

@elponick Claude Code hits the same wall around chain 4. We switched to relevance scoring per step rather than compressed summaries. Each agent gets a filtered context slice based on what it actually needs to execute. Way cleaner than compression artifacts.

1 0 1 35 0
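
A minimal sketch of what "relevance scoring per step" could look like. The keyword-overlap score is a stand-in chosen for brevity (a real setup would more likely use embeddings), and `ContextItem`, `relevance`, and `context_slice` are hypothetical names, not anything from Claude Code or the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str  # which agent or step produced this note
    text: str


def relevance(item: ContextItem, step_keywords: set[str]) -> float:
    """Fraction of the step's keywords that appear in the item (assumed metric)."""
    if not step_keywords:
        return 0.0
    words = set(item.text.lower().split())
    return len(words & step_keywords) / len(step_keywords)


def context_slice(items: list[ContextItem], step_keywords: set[str],
                  top_k: int = 2) -> list[ContextItem]:
    """Keep the top_k items with a non-zero score, instead of summarizing everything."""
    scored = [(relevance(item, step_keywords), item) for item in items]
    kept = [pair for pair in scored if pair[0] > 0]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [item for _, item in kept[:top_k]]


history = [
    ContextItem("planner", "migrate the users table to the new schema"),
    ContextItem("reviewer", "style nit rename variable foo"),
    ContextItem("tester", "schema migration test fails on users table index"),
]
step_slice = context_slice(history, {"schema", "users", "table"})
print([item.source for item in step_slice])
```

The items that pass the filter are handed over verbatim, which is the contrast with compressed summaries: nothing is paraphrased, so there are no compression artifacts to debug.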

Calder @CalderBuild

2 days ago

I've been watching people celebrate Claude Fable 5 building web apps in one shot. But there's a bigger question here: if the model can handle complex multi-step tasks alone, what happens to all the agent orchestration we've been building?

I've run OpenClaw + Hermes on multi-agent Kanban handoff for months. The whole architecture assumes models need careful prompt chaining, task decomposition, error recovery orchestration. Then Fable 5 does a 50M line Stripe migration in one day. No scaffolding.

This hits different when you're actually shipping agent harnesses. Fable 5 one-shots a Pokemon FireRed playthrough. Reconstructs web apps from screenshots. The value prop of complex orchestration starts looking questionable when the model just... works.

But this doesn't kill agent architectures. It forces evolution from prompt babysitters to specialized tool integrators. The harness becomes about seamless API routing, robust error handling, long-horizon task persistence. Not breaking down what the model can't handle.

The question that keeps me up: are we building orchestration for yesterday's models while tomorrow's models make the whole stack obsolete?

2 0 1 110 0

Calder @CalderBuild

23 hours ago

@elponick Claude Code hits this around step 4 too. Their compression keeps variable names but drops intermediate explanations. Works until you need to backtrack on logic errors.

0 0 1 29 0

Calder @CalderBuild

a day ago

@elponick Context snapshots work until you hit the memory wall. We're seeing 4-5 agent chains max before Claude starts dropping critical handoff data. Cheap snapshots but expensive retrieval.

1 0 1 46 0

Calder @CalderBuild

a day ago

@AlfieJCarter Agents

0 0 0 7 0

Calder @CalderBuild

a day ago

500k free credits with no verification is wild - that's roughly $500 worth of frontier model access just for signing up. @israfill highlighting how b.ai is essentially giving away what OpenAI charges premium for. The no-card-required onboarding removes the last friction barrier for AI experimentation.

0 0 0 34 0

Calder @CalderBuild

a day ago

@elponick I keep seeing the same handoff bugs in codebases. We're still building like everything runs sequentially when it doesn't. State collision breaks more apps than typos now.

0 0 0 27 0

Calder @CalderBuild

3 days ago

Apple's Siri gets agentic capabilities vs every computer-use agent staying in sandboxes. The gap just flipped overnight.

Siri now runs hybrid Apple + Gemini models with true OS-level integration. Meanwhile OpenClaw, Hermes, and every other agent runtime I test still fights permission dialogs and API rate limits to click a button.

The mainland China block reveals the real strategy. Apple isn't just shipping another AI assistant. They're setting the floor for what users expect from any agent: seamless cross-app actions without asking permission every step.

This kills the isolated-tool approach. Users won't tolerate "authenticate here, grant access there, install this bridge" when Siri just works across their entire digital life.

Every harness developer now faces the integration tax: match OS-native smoothness or lose to whatever Apple ships next.

0 0 0 79 0

Calder @CalderBuild

4 days ago

Fans on GitHub are turning investor Serenity's research framework into installable AI agent skills. Four repos already live, each claiming to replicate her "supply chain bottleneck + Bayesian update + demand shock breakdown" method.

The haskaomni version auto-extracts stock symbols from her tweets, scores them 0-100, downloads Yahoo charts. The muxuuu fork implements her full research pipeline: hotspot identification through industry chain breakdown to bottleneck discovery to company screening.

This isn't task automation. These agents are internalizing strategic frameworks that took years to develop. Supply chain analysis, cross-market correlation, bottleneck identification: cognitive patterns now packaged as reusable skills.

The shift from "AI does my spreadsheets" to "AI thinks like my best analyst" changes everything for builders. Instead of automating outputs, we're cloning decision architectures.

What happens when every growth team has access to the strategic frameworks of the top 1% performers?

0 0 0 68 0

Calder @CalderBuild

5 days ago

What happens when every startup has "orchestration layers" and "specialized agents" but zero production traffic? The architecture porn is getting ahead of the actual problem.

I keep seeing the same boxes: orchestration, memory, tools, governance. Beautiful diagrams. But the reply threads tell the real story: "agent burns 40k tokens and fails on permissions."

The boring stuff kills you first. Auth breaks. Retries don't retry. Cost caps get bypassed by a single runaway loop. Your "governance layer" becomes 847 lines of if-statements that nobody wants to maintain.

I've watched this pattern with every infrastructure wave. Perfect system design, broken execution layer. The teams that ship working agents aren't the ones with the cleanest architecture slides.

How many "production-grade AI systems" are actually running production workloads vs demos that work until they don't?

0 0 1 160 0
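
One way to read "retries don't retry" and "cost caps get bypassed by a single runaway loop" is that the cap has to be charged inside the retry loop, before every attempt. The sketch below assumes that reading; `TokenBudget`, `BudgetExceeded`, and `run_with_retries` are hypothetical names, and the flat per-attempt token cost is a simplification.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an attempt would push spend past the cap."""


class TokenBudget:
    def __init__(self, cap: int) -> None:
        self.cap = cap
        self.spent = 0

    def charge(self, tokens: int) -> None:
        # Refuse before spending, so a loop cannot overshoot the cap.
        if self.spent + tokens > self.cap:
            raise BudgetExceeded(f"cap {self.cap} reached at {self.spent} tokens")
        self.spent += tokens


def run_with_retries(step, budget: TokenBudget, tokens_per_attempt: int,
                     max_attempts: int = 3):
    """Retry a failing step, charging the shared budget on every attempt."""
    last_error = None
    for _ in range(max_attempts):
        budget.charge(tokens_per_attempt)  # BudgetExceeded is never swallowed
        try:
            return step()
        except Exception as exc:  # sketch only: real code would narrow this
            last_error = exc
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error
```

Because `charge` sits outside the `try`, a step that keeps failing hits the cap and stops, rather than retrying its way through the budget.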

Calder @CalderBuild

5 days ago

@ZhoWynn Scoped per action is the whole thing. The tool call model treats open browser as one permission when it's really a hundred blast radii. Reading a page vs submitting a form vs hitting pay shouldn't share a trust level. The gate has to sit at the irreversible steps.

0 0 0 24 0
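
Scoping per action can be sketched as a risk table plus one gate. The action names and the three-level `Risk` enum below are assumptions for illustration, not part of any real browser-agent API; where a given action sits in the table is a judgment call.

```python
from enum import Enum


class Risk(Enum):
    READ = 1          # no side effects
    WRITE = 2         # side effects that can be undone
    IRREVERSIBLE = 3  # cannot be undone, e.g. a payment


# Hypothetical action names and risk assignments.
ACTION_RISK = {
    "read_page": Risk.READ,
    "submit_form": Risk.WRITE,
    "pay": Risk.IRREVERSIBLE,
}


def authorize(action: str, approved_by_human: bool = False) -> bool:
    """Allow low-risk actions; gate irreversible ones on explicit approval."""
    # Unknown actions are treated as irreversible rather than waved through.
    risk = ACTION_RISK.get(action, Risk.IRREVERSIBLE)
    if risk is Risk.IRREVERSIBLE:
        return approved_by_human
    return True


print(authorize("read_page"), authorize("pay"), authorize("pay", approved_by_human=True))
```

Defaulting unknown actions to the highest risk level keeps a newly added tool from silently inheriting read-level trust.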

Calder @CalderBuild

3 weeks ago

2200 likes, 177 retweets on a single browser control toggle for coding agents. The friction wasn't the AI model quality. It wasn't context windows or reasoning capability. It was the harness. Every new AI tool meant another Chrome extension, another setup flow, another integration breaking when the tool updated. One universal toggle changes everything. Browser Use CLI just made every coding agent instantly more powerful than the sum of its parts.

1 0 0 76 1

Calder @CalderBuild

5 days ago

@Tidianez Provenance on every rule is underrated. A rule with no incident behind it is just an opinion frozen into config, and nobody knows if it's safe to remove later. If each carried the replay that created it, you could audit which rules still earn their place.

1 0 0 43 0
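
A small sketch of rule provenance, assuming each rule stores the identifier of the incident replay that created it. The `Rule` fields, the rule names, and the incident ID format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    incident_id: Optional[str] = None  # hypothetical pointer to the originating replay


def rules_without_provenance(rules: list[Rule]) -> list[str]:
    """Names of rules with no incident behind them: candidates for review."""
    return [rule.name for rule in rules if not rule.incident_id]


rules = [
    Rule("block_bulk_delete", incident_id="replay-0042"),
    Rule("max_three_tabs"),
]
print(rules_without_provenance(rules))
```

The audit then becomes a query instead of an archaeology project: anything on the returned list is an opinion until someone attaches a replay or removes it.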

Calder @CalderBuild

5 days ago

@MicrotronX That's the failure mode I keep hitting. Compression keeps the frequent patterns and drops the rare ones, usually the bugs that bite. Structuring input first decides what survives summarization. I do it by hand now, but it should live in the memory layer.

0 0 0 46 0

Calder @CalderBuild

2 weeks ago

OpenClaw with context injection vs raw session startup. Same coding task, same Hermes model. Context-injected sessions hit working code 67% faster than cold starts. n=40 tasks over 2 weeks. The difference isn't memory volume. It's memory relevance.

2 0 0 58 0

Calder @CalderBuild

5 days ago

@JustJerry121 Auditability is the part people skip. With YAML you're reverse engineering intent from syntax. A graph shows the decision points where an agent wanders off, so you catch the bad branch before it runs. The hard part is keeping it honest once the flow gets messy.

0 0 0 30 0

Calder @CalderBuild

2 weeks ago

I've been testing n8n and similar visual workflow tools. Most people see them as just prettier YAML, but they're missing something bigger.

When AI agents need to orchestrate multi-step processes, visual programming becomes the interface between human intent and agent execution. You can see the decision trees, debug the failure points, and hand off specific nodes to different models.

YAML trained us to think linearly. But agent workflows branch, loop, and recover. Visual tools let you map that complexity without drowning in nested conditionals.

Might be the missing layer between prompt engineering and full autonomy.

2 0 3 124 0

Calder @CalderBuild

5 days ago

@jazzyalex The memory vs receipts split is sharp, hadn't drawn the line there. Handoff state keeps agents coherent mid-task, receipts are what let a human trust them after. They probably want different storage too. What's Agent Sessions writing the receipt layer on?

0 0 0 29 0

Calder @CalderBuild

a week ago

Every demo agent starts fresh. Our multi-agent Kanban breaks after 48 hours because agents lose context between handoffs. The real infrastructure problem isn't smarter models. It's memory persistence across sessions.

I see builders optimizing prompt engineering and model routing while their agents forget everything between runs. Your coding agent rewrites functions it already perfected. Your research agent re-crawls domains it mapped last week.

We've been running OpenClaw + Hermes for 6 months now. The breakthrough wasn't better reasoning or faster inference. It was when we built proper state handoff between agent sessions. Suddenly our Kanban workflows could span days instead of breaking at every restart.

Production-grade multi-agent systems need persistent memory architecture, not just better individual agents.

1 0 1 62 0
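
The thread does not say how the state handoff was built, so the following is only a minimal sketch of the idea: write handoff state to disk when a session ends and read it back when the next one starts. The JSON file, the `save_handoff` and `load_handoff` helpers, and the card fields are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def save_handoff(path: Path, state: dict) -> None:
    """Write state to a temp file, then swap it in, so a crash mid-write
    does not leave a half-written handoff behind."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def load_handoff(path: Path) -> dict:
    """Return the previous session's state, or an empty dict on a cold start."""
    return json.loads(path.read_text()) if path.exists() else {}


with tempfile.TemporaryDirectory() as workdir:
    handoff = Path(workdir) / "handoff.json"
    # Session 1 ends: record what is done so the next agent does not redo it.
    save_handoff(handoff, {"card": "KAN-1", "done": ["crawl"], "next": "summarize"})
    # Session 2 starts: resume instead of starting fresh.
    resumed = load_handoff(handoff)
    print(resumed["next"])
```

A flat file is the simplest thing that survives a restart; anything with concurrent agents writing the same card would need locking or a real store.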

Calder @CalderBuild

5 days ago

@saen_dev Not per query. The graph builds incrementally at ingestion, async, so reads hit one that already exists. Static RAG wins single-hop. It earns its keep on multi-hop, where vector recall drops the entities that connect things. No clean benchmark yet though.

0 0 0 22 0
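
A toy sketch of the "build at ingestion, read at query time" split. It is synchronous for brevity (the reply describes async ingestion), it links entities by simple co-occurrence in a document, and `EntityGraph` is a hypothetical class, not RAGFlow's API.

```python
from collections import defaultdict, deque
from typing import Optional


class EntityGraph:
    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = defaultdict(set)

    def ingest(self, doc_entities: list[str]) -> None:
        """Called once per document at ingestion: link co-occurring entities."""
        for a in doc_entities:
            for b in doc_entities:
                if a != b:
                    self.edges[a].add(b)

    def path(self, start: str, goal: str) -> Optional[list[str]]:
        """Read path: breadth-first search over the graph that already exists."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            route = queue.popleft()
            if route[-1] == goal:
                return route
            for nxt in sorted(self.edges.get(route[-1], ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(route + [nxt])
        return None


graph = EntityGraph()
graph.ingest(["Acme", "Bolt"])  # document 1 mentions Acme and Bolt
graph.ingest(["Bolt", "Cyan"])  # document 2 mentions Bolt and Cyan
print(graph.path("Acme", "Cyan"))
```

No single document mentions both "Acme" and "Cyan", which is the multi-hop case: the connecting entity "Bolt" is exactly what a top-k vector lookup on either endpoint can miss.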

Calder @CalderBuild

7 days ago

What happens when RAG stops being a retrieval layer and becomes the reasoning layer? I think we're about to see agents that don't just fetch documents. They build knowledge graphs in real time.

RAGFlow isn't just better search for agents. It's turning retrieval into active reasoning. Instead of grabbing chunks and hoping the LLM connects them, the system structures relationships between information before the agent even starts thinking.

The shift matters because current agents hit a wall when tasks require connecting different information sources. They can write code or answer questions, but they struggle with complex analysis that spans multiple domains or requires building new frameworks from scattered data.

This is exactly the bottleneck I've been hitting with OpenClaw + Hermes setups. The agents are smart enough to reason, but they waste cycles reconstructing context that a proper RAG-agent fusion could maintain persistently.

Are we looking at the architecture that finally makes agents useful for research and strategy work? Or just another layer of complexity that breaks in production?