# Part 1: I RAGged every conversation I ever had with AI
By Sero (@0xSero)
I extracted 727MB of conversations from Cursor, Claude, and Codex. Then I ran a privacy-preserving analysis on 809 of the 100,000 conversations, spanning 4 months. What I found changed how I think about working with AI.
## The Setup
I built a pipeline that:
- Extracted conversations from local AI tools (code on GitHub)
- Analyzed behavioral signals without storing raw text
- Removed all environment variables and sensitive information
- Generated composite indices from keyword patterns
Total: 19,539 user messages, 11,726 assistant messages, 16,484 tool uses.
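To make the redaction and keyword-index steps concrete, here is a minimal sketch of the kind of pass the pipeline performs; the patterns and signal names are illustrative, not the pipeline's actual code:

```python
# Minimal sketch of the privacy-preserving pass (patterns and names are
# illustrative, not the actual pipeline): redact secret-looking strings,
# then keep only boolean keyword signals instead of raw text.
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),        # API-key-shaped tokens
    re.compile(r"\b[A-Z][A-Z0-9_]{2,}=\S+"),   # FOO_BAR=value env assignments
    re.compile(r"0x[a-fA-F0-9]{40,64}"),       # long hex strings (addresses, keys)
]

def redact(text: str) -> tuple[str, bool]:
    """Replace secret-looking spans and report whether anything was flagged."""
    flagged = False
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            flagged = True
            text = pattern.sub("[REDACTED]", text)
    return text, flagged

def signals(text: str) -> dict[str, bool]:
    """Keyword-derived behavioral signals; only these are stored, never the text."""
    lowered = text.lower()
    return {
        "error_sharing": "error" in lowered or "traceback" in lowered,
        "mentions_tests": "test" in lowered,
        "mentions_security": any(w in lowered for w in ("secret", "api key", "token")),
    }

# Hypothetical message: only the flag and the signals dict would be persisted.
clean, risky = redact("Fix the deploy script, OPENAI_KEY=sk-abc123 keeps failing")
print(risky, signals(clean))
```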
## The Strengths, According to the Data
### 1. Bias toward actionability
My spec completeness score averaged around 25% across all my conversations. I try to frame problems with clear next steps rather than abstract discussions.
For example, my most common conversations are debugging-related loops:
"Fix the TypeScript error in file X. Error: Property 'toString' does not exist on type 'never'."
Not "I have a problem." Just the problem, the file, the error, the ask.
### 2. Debug loops over single-shot asks
40.3% of my conversations included error sharing. I iterate: share the error, get a fix, hit the next error, share that, repeat. This is incredibly wasteful, because I am in the loop when I shouldn't be. I jump straight to the errors instead of having the model lint, typecheck, and fix them autonomously.
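For contrast, here is a rough sketch of what staying out of that loop could look like: run the checks programmatically and only hand the model the combined errors. The commands assume a TypeScript project with `npx` available, and `ask_model_for_fix` is a hypothetical helper, not a real API:

```python
# Sketch of an autonomous fix loop (an assumed workflow, not my actual tooling):
# run typecheck and lint, hand the errors to the model, apply its fix, repeat.
import subprocess

def run_checks() -> str:
    """Run typecheck and lint; return combined error output ('' means clean)."""
    output = ""
    for cmd in (["npx", "tsc", "--noEmit"], ["npx", "eslint", "."]):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            output += proc.stdout + proc.stderr
    return output

def ask_model_for_fix(errors: str) -> None:
    """Hypothetical helper: send errors to the coding agent and apply its edits."""
    raise NotImplementedError

for _ in range(5):                 # bounded number of repair attempts
    errors = run_checks()
    if not errors:
        break                      # clean build: nothing left for me to relay
    ask_model_for_fix(errors)      # the model iterates; I stay out of the loop
```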
### 3. Multi-system orchestration
Top languages by conversation:
- TypeScript: 335
- Bash: 175
- Go: 156
- Rust: 133
- SQL: 95
- Solidity: 55
I move between frontend (React, Next.js), backend (Go, TypeScript), infrastructure (Docker, AWS), and automation (n8n, Slack integrations).
### 4. Testing awareness
43.6% of conversations mentioned testing. I do not think of myself as test-disciplined, but I mention tests frequently even if I am bad at writing them. I wish I were more disciplined about testing: it saves a lot of time, specifically because it catches bugs early and reduces the need for manual, in-the-loop debugging.
## The Improvement Opportunities
The report was honest. Here are the gaps.
### 1. Minimal reproduction
User conversations with reproduction language: 0.4%. That means 99.6% of the time I ask for help without providing a minimal reproducible example. I just dump the error and expect the AI to figure it out, and that is not always enough. I need to get better at building minimal reproductions.
### 2. Test discipline
In repositories with tests, I did much less debugging and much more feature development. In repositories without tests, the reverse. This is obvious in hindsight, but easy to forget in the moment. I need to improve my test discipline.
### 3. Security hygiene
Risky disclosure signals: 32.5%
32.5% of my conversations contained signals that could indicate sensitive data exposure: API keys, environment variables, addresses. The heuristics do not store the tokens themselves; they only flag the patterns. Having the AI flag them makes life easier, but if I had spent more time configuring my tools and environment, I would not need to rely on it. I need to improve my security hygiene.
### 4. Acceptance criteria
When delegating multi-file changes, I rarely specify acceptance criteria upfront. This shows up in the spec completeness index sitting barely above 25%. The data caught what I already knew: I start vague and iterate instead of being clear and shipping. This habit is not sustainable and has made me actively less productive. I need to improve my acceptance criteria.
## The Time-Series Trends
The weekly breakdown told a story.
| Week | n | Spec | Debug | Test | Security | Risky% | Tool Uses |
|---|---|---|---|---|---|---|---|
| 2025-09-22 | 19 | 0.487 | 0.049 | 0.357 | 0.113 | 42.1% | 0 |
| 2025-10-13 | 156 | 0.323 | 0.073 | 0.078 | 0.036 | 20.5% | 0 |
| 2025-12-22 | 117 | 0.237 | 0.073 | 0.110 | 0.093 | 41.9% | 6,261 |
| 2026-01-05 | 70 | 0.213 | 0.050 | 0.098 | 0.118 | 38.6% | 3,044 |
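The rows above are just per-conversation signals grouped by ISO week. A minimal sketch of that rollup with pandas, using illustrative column names and made-up rows rather than the report's actual data:

```python
# Sketch of the weekly rollup behind the table (column names and rows are made up).
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2025-09-22", "2025-09-24", "2025-10-14"]),
    "spec_completeness": [0.51, 0.46, 0.33],
    "risky_disclosure": [True, False, False],
})

# Label each conversation with the Monday that starts its week, then aggregate.
df["week"] = df["timestamp"].dt.to_period("W").dt.start_time
weekly = df.groupby("week").agg(
    n=("spec_completeness", "size"),
    spec=("spec_completeness", "mean"),
    risky_pct=("risky_disclosure", "mean"),   # fraction of risky conversations
)
print(weekly)
```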
### Spec completeness degrades with volume
As conversation volume increased, spec completeness dropped. More volume, less care. This is the classic trade-off: speed over quality. It is easy to give in to the temptation to ship quickly, but quality has to come first.
Specification, planning, and tests all improve my mental map of the system. Taking the time to write clear acceptance criteria keeps my work up to standard and reduces the risk of errors and security vulnerabilities.
### Security awareness is cyclical
Security mentions spiked in weeks 50 (0.135), 47 (0.133), and 46 (0.118). These spikes correlate with integration work: adding new tools, connecting new systems.
## The Composite Indices
The report included heuristic proxies (0-1 scale):
| Index | User (Me) | Assistant |
|---|---|---|
| Spec completeness | 0.275 | 0.277 |
| Debug maturity | 0.056 | 0.040 |
| Testing discipline | 0.096 | 0.173 |
| Security awareness | 0.058 | 0.064 |
| Architecture-thinking | 0.168 | 0.129 |
The assistant beats me on testing (0.173 vs 0.096). It beats me on security (0.064 vs 0.058). But I beat it on architecture (0.168 vs 0.129).
This is telling. I think about structure more than execution. The AI executes better than I do.
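Worth noting: these indices are nothing more than keyword hit rates. A minimal sketch of how such a proxy can be computed, with keyword lists of my own invention rather than the report's actual ones:

```python
# Sketch of a composite index on a 0-1 scale: the share of messages in a
# conversation that mention any keyword from a category. Keyword lists are assumed.
KEYWORDS = {
    "testing_discipline": ("test", "assert", "coverage", "regression"),
    "security_awareness": ("secret", "token", "permission", "vulnerability"),
    "architecture_thinking": ("interface", "module", "boundary", "trade-off"),
}

def composite_index(messages: list[str], category: str) -> float:
    """Fraction of messages containing at least one keyword for the category."""
    words = KEYWORDS[category]
    hits = sum(1 for msg in messages if any(w in msg.lower() for w in words))
    return hits / len(messages) if messages else 0.0

# Two of three messages mention testing, so the index lands around 0.67.
print(composite_index(
    ["add a regression test", "fix the lint error", "run the coverage report"],
    "testing_discipline",
))
```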
## What This Taught Me
### 1. I am an action-oriented, high-volume developer
100,000+ conversations in 8 months. I use AI for practically everything: code generation, debugging, testing, security, architecture. I also overuse it for running commands, setting up Docker, even committing and pushing. This is telling. I need to be more intentional about reducing my token usage and optimizing my workflows.
### 2. I iterate more than I plan
40% error sharing but only 0.4% repro language. I debug in public. This is efficient for me but exhausting for collaborators.
### 3. I delegate testing, rarely doing it myself
Talking about tests (43.6%) is not writing tests. The assistant has higher testing discipline than I do.
### 4. I leak too much sensitive data
32.5% of conversations contained risky disclosure signals. This is a concrete, measurable hygiene problem.
### 5. The assistant complements my weaknesses
The AI has higher testing discipline and security awareness. It catches what I miss. This is the right mental model. AI as amplifier, not replacement.
## The Action Plan
Based on the data, here is what I am changing.
### 1. Add a repro template
```
## Error
[exact error message]
## Expected
[what should happen]
## Actual
[what actually happens]
## Minimal repro
[shortest code that demonstrates the issue]
```
### 2. Test discipline checklist
- Write test before fix
- Run tests after fix
- Add regression check
- Document test coverage
### 3. Security scan before commit
```bash
# Pre-commit hook
grep -rE "sk-|pk-|0x[a-fA-F0-9]{64}" --exclude-dir=node_modules .
```
### 4. Acceptance criteria for delegation
```
## Deliverables
- [ ] File A modified
- [ ] File B created
- [ ] Tests pass
## Acceptance
- [ ] Compiles without errors
- [ ] Handles edge case X
- [ ] Matches style of existing code
```
## The Bigger Picture
The question is not whether I use AI. I clearly do, aggressively.
The question is: am I using it to grow, or to avoid growth?
The data suggests both. I ship faster (actionability is high). I think less (spec completeness is low). I delegate testing (assistant beats me).
This is a trade-off. Every speedup has a cost. Every delegation has a gap.
## What I Would Tell Someone Else
If you are analyzing your own AI usage:
1. Extract your data. It is easier than it sounds, and worth it (a minimal extraction sketch follows after this list)
2. Run a privacy-preserving analysis. Do not store raw conversations
3. Look for patterns, not scores. The indices are heuristics; trends are truth
4. Find gaps. Where are you delegating what you should own?
5. Set concrete changes. Vague improvement goals produce vague results
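For step 1, a minimal extraction sketch, assuming the tool stores transcripts as JSONL files on disk; the directory and field layout here are placeholders, so check where your own tools keep their history:

```python
# Sketch of step 1 (extraction). Assumes transcripts are JSONL files on disk;
# the directory is a placeholder, and record fields vary by tool.
import json
from pathlib import Path

def load_messages(root: str) -> list[dict]:
    """Collect every message record from *.jsonl files under root."""
    messages = []
    for path in Path(root).expanduser().rglob("*.jsonl"):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    messages.append(json.loads(line))
                except json.JSONDecodeError:
                    continue              # skip partial or corrupt lines
    return messages

msgs = load_messages("~/ai-conversations")   # placeholder directory
print(f"{len(msgs)} messages extracted")
```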
## The Verdict
The workstyle report was humbling. It confirmed suspicions I had and revealed blind spots I did not.
I am not a great debugger (low debug maturity). I do not write tests (low testing discipline). I leak sensitive data (high risky disclosure).
But I am action-oriented (high constraint framing), multi-system (broad language distribution), and iterative (high error sharing).
The profile is not good or bad. It is just true.
And now that it is true, I can work with it.
---
Continue to [Part 2: Training My Own Coding Model](https://www.sybilsolutions.ai/blog/02-training-my-coding-model-the-pipeline), where I take these insights and my actual conversations and train sero-nouscoder on all my chats.