SEO directories with 9 AIs in parallel: how we ship 200 listings a day without writing one
How we orchestrate 9 language models in parallel to research, write and validate SEO directory listings. Real case study, architecture and metrics.
Summary
- The case: 8 directory sites (USA + Spain), each between 5,000 and 50,000 listings. Zero human writers. Pure orchestrated AI.
- The stack: 9 LLM providers in a rotating pool (Cerebras Qwen3, Groq GPT-OSS, SiliconFlow GLM-4.7, Mistral, Gemini Flash, Cohere, Cloudflare Workers AI + 2 local Ollama models).
- Throughput: ~200 listings/day per site at an average quality of 7.6/10, scored listing-by-listing by an LLM-as-judge evaluator.
- The trick: adaptive concurrency + fail-over chain — if one provider saturates, the system rotates to the next without losing the listing.
- The SEO upside: automatic JSON-LD schema on each listing + rewrite loop wired to Search Console + IndexNow for Bing.
- What we learned: quality doesn't come from the most expensive model. It comes from the pipeline (research → first draft → evaluation → fix).
We’ve been running 8 directory sites for 14 months with zero human writers on the team. Together they hold over 45,000 published listings, and the average quality, graded listing by listing, is 7.6/10. This post walks through the architecture, the LLM providers (the “AIs” behind the curtain) we use, and why we run nine of them in parallel instead of one.
If you came expecting “we plugged in GPT-4 and called it a day”, spoiler: the model is the least important variable. What moves the needle is the pipeline and the orchestration.
The number that matters
200 listings per day, per site. Scale that mentally: 8 sites × 200 listings/day × 30 days = 48,000 listings per month. To match a single site with humans you’d need roughly 20 full-time writers (at 10 listings/day per person); covering all 8 sites would take about 160. In London, one 20-writer team costs around £45,000/month in salaries alone.
Our monthly LLM bill: between €40 and €90, depending on the month. Everything else runs on Cloudflare’s free plan, shared hosting, and an NVIDIA RTX 5060 Ti GPU we bought in April for €573.
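The back-of-envelope numbers above are easy to re-run. A small Python sketch, using the post’s own assumptions (30-day months, 10 listings per writer per day) and spreading the top of the LLM bill over the full 48,000-listing capacity, which assumes every site runs at full capacity:

```python
import math


def monthly_listings(sites: int, per_site_per_day: int, days: int = 30) -> int:
    """Listings produced per month across all sites."""
    return sites * per_site_per_day * days


def writers_needed(listings_per_day: int, listings_per_writer_per_day: int = 10) -> int:
    """Full-time writers needed to match a daily output, rounded up."""
    return math.ceil(listings_per_day / listings_per_writer_per_day)


def llm_cost_per_listing(monthly_bill: float, listings_per_month: int) -> float:
    """Average LLM spend per listing if the whole capacity is used."""
    return monthly_bill / listings_per_month


total = monthly_listings(8, 200)               # all eight sites
per_site = writers_needed(200)                 # one site at 200 listings/day
all_sites = writers_needed(8 * 200)            # all eight sites combined
worst_case = llm_cost_per_listing(90, total)   # top of the €40-90 range
print(total, per_site, all_sites, worst_case)
```

Even at the top of the bill, the LLM cost per listing stays a fraction of a euro cent.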
Why 9 models, not 1
When we started 14 months ago, we used GPT-4 Turbo and nothing else. Three problems hit us in the first week:
- Aggressive rate limits. Tier 1 gives you ~3,500 tokens/minute, and a single directory listing (1,500 words ≈ 2,000 tokens) that goes through research + writing + correction burns ~8,000 tokens, so every listing eats more than two minutes of quota. In practice we were capped at about 25 listings/day on a good day.
- Cost. GPT-4 Turbo costs $10/MTok input + $30/MTok output, so a full listing comes to ~$0.15. At 200 listings/day that’s $30/day, or $900/month for a single site. Not sustainable.
- Zero resilience. Whenever OpenAI’s API has an incident (every 3-4 weeks), the pipeline halts. Nothing to do but wait.
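The cost problem is plain token arithmetic. A minimal sketch; the 5,500-input / 2,500-output split of the ~8,000 tokens per listing is our assumption, not a figure measured in the pipeline:

```python
def listing_cost_usd(input_tokens: int, output_tokens: int,
                     usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Cost of one listing from token counts and per-million-token prices."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000


def monthly_cost_usd(cost_per_listing: float, listings_per_day: int, days: int = 30) -> float:
    """Monthly spend for one site at a steady daily output."""
    return cost_per_listing * listings_per_day * days


# Hypothetical split of ~8,000 tokens at GPT-4 Turbo prices ($10 in / $30 out per MTok).
per_listing = listing_cost_usd(5_500, 2_500, usd_per_mtok_in=10, usd_per_mtok_out=30)

# The post's rounded ~$0.15 per listing, one site at 200 listings/day.
per_month = monthly_cost_usd(0.15, 200)
```

With that split a listing comes to about $0.13, in the same ballpark as the ~$0.15 quoted above; output tokens dominate because they cost three times as much.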
The fix was obvious but painful to implement: a rotating provider pool. The app keeps a priority-sorted list of models; if the first one fails or hits a rate limit, it jumps to the next without losing the listing.
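A minimal sketch of the rotating pool in Python. The `Provider` wrapper, the `RateLimited`/`ProviderDown` exceptions and the adapter functions are hypothetical; in a real system each adapter would wrap one vendor’s SDK or HTTP API and translate its errors into these two types.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class RateLimited(Exception):
    """Raised by a provider adapter on HTTP 429 (hypothetical adapter contract)."""


class ProviderDown(Exception):
    """Raised by an adapter on timeouts, 5xx responses and similar transient failures."""


class AllProvidersFailed(Exception):
    """Every provider in the chain failed for this prompt."""


@dataclass
class Provider:
    name: str
    priority: int                   # lower number = tried earlier
    generate: Callable[[str], str]  # hypothetical adapter: prompt -> completion


def generate_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order; return (provider_name, text) from the first success."""
    errors: List[str] = []
    for provider in sorted(providers, key=lambda p: p.priority):
        try:
            return provider.name, provider.generate(prompt)
        except (RateLimited, ProviderDown) as exc:
            # Record why this provider was skipped, then move on to the next one.
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Usage: the head of the chain is saturated, so the listing falls through to the next provider.
def saturated(prompt: str) -> str:
    raise RateLimited("429 Too Many Requests")


def healthy(prompt: str) -> str:
    return f"draft for {prompt}"


chain = [Provider("cerebras", 1, saturated), Provider("groq", 2, healthy)]
name, text = generate_with_failover("Café Sol", chain)
```

Only the two expected failure types trigger a rotation; anything else (a bug in an adapter, say) propagates instead of being silently swallowed.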
The 9-provider stack
As of June 2026, our default chain is:
- Cerebras Qwen3-235B-A22B — the quality/speed king. Inference at 1,500 tokens/second (yes, 1,500). Cerebras Dev Tier is $10 prepaid.
- Groq GPT-OSS-120B — OpenAI’s open-weight model, strong at reasoning. Generous free tier.
- SiliconFlow GLM-4.7 — Zhipu’s Chinese model, exceptional for research and data extraction. Free starter credit.
- Mistral Medium — fallback only. Strict free tier.
- Google Gemini 2.5 Flash — solid for creative tasks.
- Cohere Command R+ — great at classification and summarisation.
- Cloudflare Workers AI — small edge models (Llama 3.1 8B), dirt cheap.
- Ollama Qwen3:14b Q6_K (local) — runs on our GPU. 8-12 listings/hour, free.
- Ollama Qwen2.5:7b (local) — for quick extraction tasks (research).
The local models are the safety net (Ollama is a runtime that runs LLMs on your own machine). If every cloud provider goes down at once (it’s happened twice), the local stack keeps producing. Cost per listing: €0, electricity aside.
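Talking to the local models needs nothing beyond the standard library. A minimal sketch against Ollama’s HTTP API (`POST /api/generate` on the default port 11434, with streaming off so the reply arrives as one JSON object whose text is in the `response` field); the model tag and timeout are our assumptions:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Non-streaming /api/generate call: the whole reply comes back as one JSON object."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_ollama_request(model, prompt), timeout=timeout) as resp:
        return json.load(resp)["response"]


# Usage (requires a running Ollama server with the model pulled):
# text = ollama_generate("qwen3:14b", "Summarise this page: ...")
```

Wrapped in an adapter that raises the fail-over chain’s error types, this becomes the last link in the provider pool.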
The pipeline: 4 phases per listing
A listing isn’t written in a single LLM call. It’s composed:
Phase 1 — Research
The system gathers real information about the entity (business, place, product) through services like the Brave Search API and Firecrawl. It pulls 3-5 sources, cross-references them, extracts the relevant fields with an extraction-oriented model (GLM-4.7 or NuExtract) and returns structured JSON: name, address, hours, services, reviews.
Time: ~6 seconds. Tokens: ~3,000 input + ~500 output.
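The cross-referencing step can be as simple as a per-field vote. A sketch, assuming each source has already been extracted into a dict with the five fields above; the majority-vote rule and the threshold of two agreeing sources are our assumptions, not the production logic:

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple

FIELDS = ("name", "address", "hours", "services", "reviews")


def cross_reference(extractions: List[Dict[str, Any]],
                    min_agreement: int = 2) -> Tuple[Dict[str, Any], List[str]]:
    """Keep a field only if at least `min_agreement` sources extracted the same value."""
    verified: Dict[str, Any] = {}
    unverified: List[str] = []
    for field in FIELDS:
        values = [e[field] for e in extractions if e.get(field) not in (None, "", [], {})]
        # Serialise values so lists/dicts can be counted; sort_keys makes equal dicts match.
        counts = Counter(json.dumps(v, sort_keys=True) for v in values)
        if counts:
            best, n = counts.most_common(1)[0]
            if n >= min_agreement:
                verified[field] = json.loads(best)
                continue
        unverified.append(field)  # missing or contradictory: don't let the writer invent it
    return verified, unverified


sources = [
    {"name": "Café Sol", "address": "Calle Mayor 1", "hours": "09:00-18:00"},
    {"name": "Café Sol", "address": "Calle Mayor 1"},
    {"name": "Cafe Sol", "hours": "10:00-20:00"},
]
verified, unverified = cross_reference(sources)
```

Exact matching is strict (“Cafe Sol” and “Café Sol” count as different values), so a real version would normalise strings before voting.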
Phase 2 — First draft
A large model (Qwen3-235B via Cerebras, usually) takes the research JSON + an SEO template prompt + the niche guidelines, and writes the full listing with H1, H2, H3 (the HTML headings that structure the document), bullets, short paragraphs and a CTA.
Time: ~3 seconds (Cerebras is very fast). Tokens: ~4,000 input + ~2,500 output.
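How the three inputs might be stitched into one prompt. A sketch only: the section labels, the instruction wording and the optional `feedback` argument (used when phase 3 sends a listing back) are our assumptions:

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_draft_prompt(research: Dict[str, Any], seo_template: str,
                       niche_guidelines: str,
                       feedback: Optional[Sequence[str]] = None) -> str:
    """Assemble the phase-2 prompt from the research JSON, SEO template and niche rules."""
    parts = [
        "Write one directory listing. Use only facts present in the research JSON.",
        "SEO TEMPLATE:\n" + seo_template.strip(),
        "NICHE GUIDELINES:\n" + niche_guidelines.strip(),
        "RESEARCH JSON:\n" + json.dumps(research, ensure_ascii=False, indent=2),
    ]
    if feedback:
        # On a retry from phase 3, the judge's issues are appended verbatim.
        parts.append("FIX THESE ISSUES FROM THE PREVIOUS REVIEW:\n"
                     + "\n".join(f"- {issue}" for issue in feedback))
    parts.append("Output HTML: one <h1>, <h2>/<h3> sections, bullet lists, "
                 "short paragraphs and a closing call to action.")
    return "\n\n".join(parts)
```

Keeping the research as raw JSON in the prompt makes it easy for the judge in phase 3 to check the listing against exactly the same facts.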
Phase 3 — Evaluation
A different LLM acts as judge. It reads the listing, cross-checks it against the research sources, checks H1 length, use of bold, internal links, image alt text and total length, and returns a 0-10 score plus a list of issues.
If score < 6.5 → back to phase 2 with feedback. If score ≥ 6.5 → move to phase 4.
Time: ~2 seconds. Tokens: ~5,000 input + ~300 output.
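The loop between phases 2 and 3 fits in one function. A sketch with the 6.5 threshold from above; `draft_fn` and `judge_fn` stand in for the drafting and judging model calls, and the cap of three rounds is our assumption, since the post doesn’t say what happens when a listing keeps failing:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Verdict:
    score: float                                     # 0-10 from the judge model
    issues: List[str] = field(default_factory=list)  # e.g. "H1 too long"


@dataclass
class Outcome:
    draft: str
    verdict: Verdict
    rounds: int
    passed: bool


DraftFn = Callable[[dict, List[str]], str]  # (research, feedback) -> listing HTML
JudgeFn = Callable[[str, dict], Verdict]    # (listing HTML, research) -> verdict


def draft_until_approved(research: dict, draft_fn: DraftFn, judge_fn: JudgeFn,
                         threshold: float = 6.5, max_rounds: int = 3) -> Outcome:
    """Redraft with the judge's issues until the score reaches the threshold."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[str] = []
    best: Optional[Outcome] = None
    for rounds in range(1, max_rounds + 1):
        draft = draft_fn(research, feedback)
        verdict = judge_fn(draft, research)
        if verdict.score >= threshold:
            return Outcome(draft, verdict, rounds, passed=True)  # on to phase 4
        if best is None or verdict.score > best.verdict.score:
            best = Outcome(draft, verdict, rounds, passed=False)
        feedback = verdict.issues  # the issue list becomes the next round's feedback
    return best  # never passed: the caller decides what to do with it
```

Returning the best failed draft, rather than raising, lets the caller park it for review instead of losing the research already paid for.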
Phase 4 — Publishing
The listing is published via the WordPress REST API, authenticated with Application Passwords (guide here). JSON-LD schemas (the structured data Google reads to understand the page) are generated, and Bing gets pinged via IndexNow (explanation here). Google doesn’t support IndexNow.
Total time end-to-end: ~15 seconds.
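Both publishing calls are plain HTTP. A sketch that only builds the requests: it assumes listings go to the standard `/wp-json/wp/v2/posts` endpoint (a directory using a custom post type would target that type’s route instead) and uses the shared `api.indexnow.org` endpoint; the site, user, password and key values in the usage are placeholders:

```python
import base64
import json
import urllib.request
from typing import Iterable, Optional
from urllib.parse import urlparse


def wp_create_post_request(site_url: str, user: str, app_password: str,
                           title: str, html: str,
                           status: str = "publish") -> urllib.request.Request:
    """POST /wp-json/wp/v2/posts with HTTP Basic auth built from an Application Password."""
    token = base64.b64encode(f"{user}:{app_password}".encode("utf-8")).decode("ascii")
    body = {"title": title, "content": html, "status": status}
    return urllib.request.Request(
        site_url.rstrip("/") + "/wp-json/wp/v2/posts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


def indexnow_request(urls: Iterable[str], key: str, key_location: Optional[str] = None,
                     endpoint: str = "https://api.indexnow.org/IndexNow") -> urllib.request.Request:
    """Batch IndexNow submission; every URL in one batch must share the same host."""
    url_list = list(urls)
    if not url_list:
        raise ValueError("no URLs to submit")
    host = urlparse(url_list[0]).netloc
    if any(urlparse(u).netloc != host for u in url_list):
        raise ValueError("IndexNow batches must contain URLs from a single host")
    body = {"host": host, "key": key, "urlList": url_list}
    if key_location:
        body["keyLocation"] = key_location
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# Send either request with urllib.request.urlopen(req, timeout=30).
```

Building the requests separately from sending them keeps the publishing step easy to test and to retry.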
The secret: adaptive concurrency
The real problem is that providers saturate. Fire 50 listings at Cerebras Dev Tier in parallel and you’ll start getting 429 errors (Too Many Requests).
Our scheduler (the component that decides what runs and when) adjusts concurrency live:
- Starts with 4 parallel workers.
- If the 429 rate climbs above 5%, scales down to 2.
- If everything runs smoothly for 5 minutes, ramps up to 6.
- If the local Ollama instance starts swapping (the GPU runs out of memory), local workers drop to 1.
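The four rules translate into a small controller. A sketch that follows the numbers above; how the 429 rate is measured (over the last 100 requests, judged only after 20 samples) and what counts as “smooth” (no 429 for 5 minutes) are our assumptions, and detecting GPU swapping is left to the caller as a boolean:

```python
from collections import deque
from typing import Deque


class AdaptiveConcurrency:
    """Cloud worker limit: start at 4, drop to 2 above a 5% 429 rate, climb to 6 after 5 calm minutes."""

    def __init__(self, started_at: float, start: int = 4, low: int = 2, high: int = 6,
                 max_429_rate: float = 0.05, calm_seconds: float = 300.0,
                 window: int = 100, min_samples: int = 20) -> None:
        self.workers = start
        self.low, self.high = low, high
        self.max_429_rate = max_429_rate
        self.calm_seconds = calm_seconds
        self.min_samples = min_samples
        self.recent: Deque[bool] = deque(maxlen=window)  # True = that request got a 429
        self.last_trouble = started_at                   # time of the most recent 429

    def record(self, got_429: bool, now: float) -> int:
        """Register one finished request and return the (possibly updated) worker limit."""
        self.recent.append(got_429)
        if got_429:
            self.last_trouble = now
        if len(self.recent) < self.min_samples:
            return self.workers  # not enough data to judge the 429 rate yet
        rate = sum(self.recent) / len(self.recent)
        if rate > self.max_429_rate:
            self.workers = self.low
        elif now - self.last_trouble >= self.calm_seconds:
            self.workers = self.high
        return self.workers


def local_worker_limit(gpu_swapping: bool, normal_limit: int) -> int:
    """Local Ollama pool: drop to a single worker while the GPU is swapping."""
    return 1 if gpu_swapping else normal_limit


c = AdaptiveConcurrency(started_at=0.0)
for i in range(20):                    # 4 of the first 20 requests hit a 429
    c.record(got_429=(i % 5 == 0), now=float(i))
after_burst = c.workers
for t in range(20, 400):               # then six-plus minutes without a single 429
    c.record(got_429=False, now=float(t))
after_calm = c.workers
```

Between the two thresholds the limit holds steady, which keeps the controller from flapping on a single stray 429.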
That’s the line between a hobby script and a production system.
Why JSON-LD schema matters
A listing without JSON-LD schema (the Schema.org structured data that Google parses to understand the content) is much harder for AI Overviews, ChatGPT search, Perplexity and Bing’s generative answers to interpret. Schema tells the AI: “this is a local business, here is its address, these are its hours, this is its rating”.
Our system emits schema automatically per directory type:
- Business directory → LocalBusiness + PostalAddress + OpeningHoursSpecification + AggregateRating
- Product directory → Product + Offer + AggregateRating
- Event directory → Event + Place + Offer
Good schema is what lands your listing in AI answers before you have brand authority.
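For the business-directory case, the schema is just nested dictionaries serialised into a script tag. A sketch covering only LocalBusiness; the function names and the shape of the `hours` argument are our own, while the `@type` values and property names are standard Schema.org vocabulary. Product and Event listings would follow the same pattern with their own types:

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Hours = List[Tuple[List[str], str, str]]  # (days, opens, closes), e.g. (["Monday"], "09:00", "18:00")


def local_business_jsonld(name: str, street: str, city: str, postal_code: str, country: str,
                          hours: Hours, rating: Optional[float] = None,
                          review_count: Optional[int] = None) -> Dict[str, Any]:
    """LocalBusiness + PostalAddress + OpeningHoursSpecification (+ AggregateRating if known)."""
    data: Dict[str, Any] = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "postalCode": postal_code,
            "addressCountry": country,
        },
        "openingHoursSpecification": [
            {"@type": "OpeningHoursSpecification", "dayOfWeek": days, "opens": opens, "closes": closes}
            for days, opens, closes in hours
        ],
    }
    # Only emit a rating when real review data exists; never fabricate one.
    if rating is not None and review_count:
        data["aggregateRating"] = {"@type": "AggregateRating",
                                   "ratingValue": rating, "reviewCount": review_count}
    return data


def to_script_tag(data: Dict[str, Any]) -> str:
    """Serialise for embedding in HTML; escaping '</' stops text from closing the tag early."""
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

Skipping `aggregateRating` when there are no reviews matters: a rating block without real reviews behind it is exactly the kind of markup search engines treat as misleading.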
What we learned the hard way
Three lessons that cost us:
1. Quality doesn’t come from the most expensive model. We tested GPT-4o, Claude 3.7 Sonnet, Qwen3-235B, GLM-4.7. The gap in average output quality was 0.3 points out of 10. The cost gap was 20× to 50×. Conclusion: the pipeline weighs more than the model.
2. Fail-over must be silent. If a listing takes 12 seconds instead of 5 because the first provider died, fine. If the system crashes and returns a 500 to the user, not fine. The model chain must be invisible.
3. Free tier ≠ production. We tried Mistral free + Cloudflare free as primaries for a week, and throughput dropped to 30% of expected. We reverted to Cerebras/SiliconFlow as primaries. A free tier is a great fallback, never the head of the chain.
What’s coming in SeoNova
Everything you just read is what we’re packaging inside SeoNova. The idea: anyone with a WordPress install plugs in their site, picks a niche, and starts generating listings with the same pipeline we’ve been tuning for 14 months.
No Ollama setup, no juggling 9 API keys, no writing the scheduler. Just point it at your WordPress site, pick the directory type and wait.
If that sounds useful, join the waitlist for 50% off the first 3 months. Launching autumn 2026.
Frequently asked questions
The questions we hear the most about this topic
Why 9 models and not just one?
Doesn't this produce worse content than a human writer?
How do you avoid Google flagging this as 'scaled AI spam'?
Why bother with local Ollama when the cloud is fast?
What does it cost to build something similar?