Thanks to SSD streaming I was able to try DwarfStart on my M3 Max 64GB. I let DeepSeek Flash analyze a ~50K LOC codebase. Then I asked Opus to double-check and review the findings. I was impressed that 5 of the 6 findings reported by DeepSeek were accurate.
Thanks to SSD streaming I was able to try DwarfStart on my M3 Max 64GB. I let DeepSeek Flash analyze a ~50K LOC codebase. Then I asked Opus to double-check and review the findings. I was impressed that 5 of the 6 findings reported by DeepSeek were accurate.
@leogrease Useful datapoint. The next thing I’d want is a replay pack around the 6 findings: finding IDs, repro tests, prompt/model hashes, false-positive/triage time, and which issues survive a clean checkout. That’s what turns “LLM code review worked” into an eval harness.
@eddy_mane Good points. If I find the time to perform a detailed evaluation I will share. Possibly on an open source code base.


