
SEO directories with 9 AIs in parallel: how we ship 200 listings a day without writing one

How we orchestrate 9 language models in parallel to research, write and validate SEO directory listings. Real case study, architecture and metrics.

By SeoNova · Published · 7 min read
9 nodes labeled with LLM model names (GPT-OSS, Qwen3, GLM-4.7, Llama, Mistral, Gemini, Cohere, Cloudflare, Ollama) connected with lines toward a generated listing tagged SCHEMA OK.

We’ve been running 8 directory sites for 14 months with zero human writers on the team. Together they hold over 45,000 published listings, and the average quality, graded listing by listing, is 7.6/10. This post walks through the architecture, which LLM providers (the “AIs” behind the curtain) we use, and why we run nine of them in parallel instead of one.

If you came expecting “we plugged in GPT-4 and called it a day”, spoiler: the model is the least important variable. What moves the needle is the pipeline and the orchestration.

The number that matters

200 listings per day, per site. Scale that up: 8 sites × 200 listings/day × 30 days = 48,000 listings per month. At 10 listings/day per person, a human team would need roughly 20 full-time writers per site, or 160 across all eight. In London, 20 writers alone come to around £45,000/month in salaries.

Our monthly LLM bill: between €40 and €90 depending on the month. Everything else is Cloudflare’s free tier, shared hosting, and an NVIDIA RTX 5060 Ti GPU we bought in April for €573.

Why 9 models, not 1

When we started 14 months ago, we tried GPT-4 Turbo and nothing else. Three problems hit us in the first week:

  1. Aggressive rate limits. Tier 1 gives you ~3,500 tokens/minute. A single directory listing (1,500 words ≈ 2,000 tokens) that includes research + writing + correction burns ~8,000 tokens. In practice we were capped at 25 listings/day on a good day.
  2. Cost. GPT-4 Turbo costs $10/MTok input + $30/MTok output, so a full listing costs ~$0.15. At 200 listings/day × $0.15 = $30/day ≈ $900/month. Not sustainable.
  3. Zero resilience. Whenever OpenAI’s API has an incident (every 3-4 weeks), the pipeline halts. Nothing to do but wait.

The fix was obvious but painful to implement: a rotating provider pool. The app keeps a priority-sorted list of models. If the first one fails or gets rate-limited, it jumps to the next without losing the listing.

The 9-provider stack

As of June 2026, our default chain is:

  1. Cerebras Qwen3-235B-A22B: the quality/speed king. Inference runs at about 1,500 tokens/second. Cerebras Dev Tier is $10 prepaid.
  2. Groq GPT-OSS-120B: OpenAI’s open-weight model, strong at reasoning. Generous free tier.
  3. SiliconFlow GLM-4.7: Zhipu’s Chinese model, exceptional for research and data extraction. Free starting credit.
  4. Mistral Medium: fallback only. Strict free tier.
  5. Google Gemini 2.5 Flash: solid for creative tasks.
  6. Cohere Command R+: great at classification and summarisation.
  7. Cloudflare Workers AI: small edge models (Llama 3.1 8B), dirt cheap.
  8. Ollama Qwen3:14b Q6_K (local): runs on our GPU. 8-12 listings/hour, free.
  9. Ollama Qwen2.5:7b (local): for quick extraction tasks (research).

The local models (Ollama is a runtime that runs LLMs on your own machine) are the safety net. If every cloud provider goes down at once (it’s happened twice), the local stack keeps producing. Cost per listing: €0 (electricity aside).

The pipeline: 4 phases per listing

A listing isn’t written in a single LLM call. It’s composed:

Phase 1 — Research

The system queries real information about the entity (business, place, product) through services such as the Brave Search API and Firecrawl. It pulls 3-5 sources to cross-reference, extracts them with a small model (GLM-4.7 or NuExtract) and returns structured JSON: name, address, hours, services, reviews.

Time: ~6 seconds. Tokens: ~3,000 input + ~500 output.
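As an illustration of what the structured output of this phase might look like, the sketch below parses and validates a research record. The field names follow the list above; the `sources` field, the `ResearchRecord` class and the `parse_research` helper are assumptions made for the example, not the actual schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ResearchRecord:
    name: str
    address: str
    hours: dict = field(default_factory=dict)
    services: list = field(default_factory=list)
    reviews: list = field(default_factory=list)
    sources: list = field(default_factory=list)  # assumption: URLs the data came from


def parse_research(raw: str, min_sources: int = 3) -> ResearchRecord:
    """Parse the extractor's JSON and reject records that are too thin to write from."""
    data = json.loads(raw)
    missing = [key for key in ("name", "address") if not data.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    record = ResearchRecord(
        name=data["name"],
        address=data["address"],
        hours=data.get("hours", {}),
        services=data.get("services", []),
        reviews=data.get("reviews", []),
        sources=data.get("sources", []),
    )
    # The text mentions 3-5 sources per listing; 3 is used here as the floor.
    if len(record.sources) < min_sources:
        raise ValueError(f"need at least {min_sources} sources, got {len(record.sources)}")
    return record


sample = json.dumps({
    "name": "Cafe Example",
    "address": "1 Example Street, London",
    "hours": {"Mon-Fri": "08:00-18:00"},
    "services": ["coffee", "brunch"],
    "reviews": [{"rating": 4.5, "text": "Friendly staff"}],
    "sources": ["https://a.example", "https://b.example", "https://c.example"],
})
record = parse_research(sample)
print(record.name, len(record.sources))
```

Validating at this boundary means the drafting model never receives a record with a missing name or address.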

Phase 2 — First draft

A large model (Qwen3-235B via Cerebras, usually) takes the research JSON + an SEO template prompt + the niche guidelines, and writes the full listing with H1, H2, H3 (the HTML headings that structure the document), bullets, short paragraphs and CTA.

Time: ~3 seconds (Cerebras is very fast). Tokens: ~4,000 input + ~2,500 output.

Phase 3 — Evaluation

A different LLM acts as judge. It reads the listing, cross-checks it against the research sources, inspects H1 length, use of bold, internal linking, image alt text and total length, and returns a 0-10 score plus a list of issues.

If score < 6.5 → back to phase 2 with feedback. If score ≥ 6.5 → move to phase 4.

Time: ~2 seconds. Tokens: ~5,000 input + ~300 output.
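The draft-and-judge loop can be sketched as follows. The 6.5 threshold comes from the text; the retry cap of 3, the function names and the stub writer and judge are assumptions for illustration, since the post does not say how many rounds are allowed.

```python
from typing import Callable

PASS_SCORE = 6.5   # threshold stated in the text
MAX_ATTEMPTS = 3   # assumption: the post does not state a retry cap

Writer = Callable[[dict, list], str]              # (research, feedback) -> draft
Judge = Callable[[str, dict], tuple]              # (draft, research) -> (score, issues)


def draft_until_accepted(research: dict, write: Writer, judge: Judge,
                         max_attempts: int = MAX_ATTEMPTS) -> tuple:
    """Alternate phase 2 and phase 3 until the judge's score reaches PASS_SCORE."""
    feedback: list = []
    for attempt in range(1, max_attempts + 1):
        draft = write(research, feedback)
        score, issues = judge(draft, research)
        if score >= PASS_SCORE:
            return draft, score, attempt
        feedback = issues  # the next draft sees what the judge complained about
    raise RuntimeError(f"no draft reached {PASS_SCORE} after {max_attempts} attempts")


def stub_write(research: dict, feedback: list) -> str:
    """Stub writer: only adds the H1 once it has received feedback."""
    title = f"<h1>{research['name']}</h1>" if feedback else research["name"]
    return title + " Opening hours and services."


def stub_judge(draft: str, research: dict) -> tuple:
    """Stub judge: fails any draft without an H1 heading."""
    if "<h1>" not in draft:
        return 5.0, ["missing H1 heading"]
    return 8.0, []


draft, score, attempts = draft_until_accepted({"name": "Cafe Example"}, stub_write, stub_judge)
print(score, attempts)
```

Capping the attempts matters in practice: without it, a listing the judge can never accept would loop and burn tokens indefinitely.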

Phase 4 — Publishing

The listing is published via the WordPress REST API, authenticated with Application Passwords. JSON-LD schemas (the structured data Google reads to understand the page) are generated, and Bing and the other participating search engines get pinged via IndexNow.

Total time end-to-end: ~15 seconds.
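For the publishing step, a minimal sketch of the two HTTP requests is shown below. It only builds the requests and does not send them. It assumes listings are stored as standard WordPress posts (`/wp-json/wp/v2/posts`), which may not match a real directory setup that uses a custom post type; the site URL, username, password and IndexNow key are placeholders.

```python
import base64
import json
from urllib.request import Request


def build_wordpress_post(site_url: str, username: str, app_password: str,
                         title: str, html: str, status: str = "draft") -> Request:
    """Build a POST to the WordPress REST API using Application Password basic auth."""
    token = base64.b64encode(f"{username}:{app_password}".encode()).decode()
    body = json.dumps({"title": title, "content": html, "status": status}).encode()
    return Request(
        f"{site_url.rstrip('/')}/wp-json/wp/v2/posts",
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )


def build_indexnow_ping(host: str, key: str, urls: list) -> Request:
    """Build an IndexNow submission for a batch of freshly published URLs."""
    body = json.dumps({"host": host, "key": key, "urlList": urls}).encode()
    return Request(
        "https://api.indexnow.org/indexnow",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


# Placeholder values; pass the result to urllib.request.urlopen() to actually send.
wp_req = build_wordpress_post("https://example.com/", "editor", "abcd efgh",
                              "Cafe Example", "<h1>Cafe Example</h1>")
ping = build_indexnow_ping("example.com", "hypothetical-key",
                           ["https://example.com/listing/cafe-example"])
print(wp_req.full_url, ping.full_url)
```

Defaulting `status` to `"draft"` is a deliberate choice for the sketch: it lets a listing be inspected before it goes live, and can be switched to `"publish"` once the pipeline is trusted.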

The secret: adaptive concurrency

The real problem is that providers saturate. Fire 50 listings at Cerebras Dev Tier in parallel and you’ll start getting 429 (Too Many Requests) responses.

Our scheduler (the component that decides what runs and when) adjusts concurrency live:

  • Starts with 4 parallel workers.
  • If the 429 rate climbs above 5%, scales down to 2.
  • If everything runs smoothly for 5 minutes, ramps up to 6.
  • If the local Ollama instance starts swapping (the GPU runs out of memory), local workers drop to 1.

That’s the line between a hobby script and a production system.
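The scheduler rules above can be sketched as a small controller. The numbers (4, 2, 6, 5%, 5 minutes) come from the text; the class name, the per-window `observe` interface and the reading of "smooth" as "zero 429s" are assumptions, and detecting GPU swapping is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdaptiveConcurrency:
    workers: int = 4             # starting concurrency
    low: int = 2                 # level after a burst of 429s
    high: int = 6                # level after a calm period
    max_429_rate: float = 0.05   # 5% threshold
    calm_seconds: float = 300.0  # 5 minutes
    _calm_since: Optional[float] = field(default=None, repr=False)

    def observe(self, now: float, requests: int, rate_limited: int) -> int:
        """Feed one measurement window; returns the worker count to use next."""
        rate = rate_limited / requests if requests else 0.0
        if rate > self.max_429_rate:
            self.workers = self.low
            self._calm_since = None
        elif rate_limited > 0:
            # Some 429s but under the threshold: not calm, so reset the timer.
            self._calm_since = None
        elif self._calm_since is None:
            self._calm_since = now
        elif now - self._calm_since >= self.calm_seconds:
            self.workers = self.high
        return self.workers

    def local_workers(self, gpu_swapping: bool) -> int:
        """Assumption: local models share the global limit unless the GPU is swapping."""
        return 1 if gpu_swapping else self.workers


ctl = AdaptiveConcurrency()
print(ctl.observe(now=0, requests=100, rate_limited=10))    # 10% of requests hit 429
print(ctl.observe(now=60, requests=100, rate_limited=0))    # calm period starts
print(ctl.observe(now=360, requests=100, rate_limited=0))   # 5 calm minutes elapsed
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the controller deterministic and easy to test.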

Why JSON-LD schema matters

A listing without JSON-LD schema (the structured data from Schema.org that Google parses to understand the content) is invisible to AI Overviews, ChatGPT search, Perplexity and Bing Generative. Schema tells the AI: “this is a local business, here is its address, these are its hours, this is its rating”.

Our system emits schema automatically per directory type:

  • Business directory → LocalBusiness + PostalAddress + OpeningHoursSpecification + AggregateRating
  • Product directory → Product + Offer + AggregateRating
  • Event directory → Event + Place + Offer

Good schema is what lands your listing in AI answers before you have brand authority.
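As an example of the business-directory case, the sketch below builds a `LocalBusiness` JSON-LD object with the nested Schema.org types listed above and wraps it in a script tag. The function names and the sample business are hypothetical; the property names are standard Schema.org vocabulary.

```python
import json


def local_business_jsonld(name: str, street: str, city: str, postal_code: str,
                          country: str, days: list, opens: str, closes: str,
                          rating: float, review_count: int) -> dict:
    """Build a LocalBusiness object with address, hours and rating."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "postalCode": postal_code,
            "addressCountry": country,
        },
        "openingHoursSpecification": [{
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": days,
            "opens": opens,
            "closes": closes,
        }],
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }


def as_script_tag(data: dict) -> str:
    """Serialise for embedding in HTML; escape '</' so the JSON cannot close the tag."""
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


schema = local_business_jsonld(
    "Cafe Example", "1 Example Street", "London", "E1 1AA", "GB",
    ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"], "08:00", "18:00",
    rating=4.5, review_count=12,
)
print(as_script_tag(schema))
```

The rating and review count should only be emitted when they come from real research data, since the schema is a claim about the business rather than decoration.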

What we learned the hard way

Three lessons that cost us:

1. Quality doesn’t come from the most expensive model. We tested GPT-4o, Claude 3.7 Sonnet, Qwen3-235B and GLM-4.7. The gap in average output quality was 0.3 points out of 10, while the cost gap was 20× to 50×. Conclusion: the pipeline matters more than the model.

2. Fail-over must be silent. If a listing takes 12 seconds instead of 5 because the first provider died, fine. If the system crashes and returns a 500 to the user, not fine. The model chain must be invisible.

3. Free tier ≠ production. We tried Mistral free + Cloudflare free as primaries for a week, and throughput dropped to 30% of expected. We reverted to Cerebras/SiliconFlow as primaries. Free tiers are a great fallback, never the head of the chain.

What’s coming in SeoNova

Everything you just read is what we’re packaging inside SeoNova. The idea: anyone with a WordPress install plugs in their site, picks a niche, and starts generating listings with the same pipeline we’ve been tuning for 14 months.

No Ollama setup, no juggling 9 API keys, no writing the scheduler. Just point at your WordPress, mark the directory type and wait.

If that sounds useful, join the waitlist for 50% off the first 3 months. Launching autumn 2026.

Frequently asked questions

The questions we hear the most about this topic

Why 9 models and not just one?
Two reasons. First, no single cloud provider handles 200 listings/day without hitting aggressive rate limits on its free or cheap tier. Second, each model is best at a different sub-task: Qwen3-235B via Cerebras is unbeatable for long-form writing, GLM-4.7 wins at research/extraction, and local Ollama models are great for cheap tasks (classification, formatting). Mixing them lowers cost and adds resilience.
Doesn't this produce worse content than a human writer?
For an advanced technical niche, yes — a human expert wins every time. For a mass directory (verifiable data, standard format, thousands of listings), AI with a good pipeline matches or beats the average freelancer. We ran a blind test of 30 listings rated by an independent evaluator: AI scored 7.4 vs human 7.1. The difference is that AI costs 80% less and runs 50× faster.
How do you avoid Google flagging this as 'scaled AI spam'?
Google penalises *scaled content without value* — not AI itself. Three safety nets: (1) each listing carries unique verifiable data (NAP, hours, prices), (2) every niche has real research with citations, (3) JSON-LD schema declares the proper `LocalBusiness` type with real `AggregateRating`. Google confirmed in March 2024 that it penalises the outcome, not the method.
Why bother with local Ollama when the cloud is fast?
Cost and *fail-open* safety. If every cloud provider saturates at once (it happens), the local stack keeps producing. In our setup, local handles ~30% of listings during off-peak hours. Marginal cost is €0 (electricity aside). With an RTX 5060 Ti 16GB running Qwen3:14b Q6_K we ship 8-12 listings/hour at 6.5/10 quality.
What does it cost to build something similar?
Pure free tier from each provider + local Ollama on your PC: **€0/month**. For production throughput (≥200 listings/day steady), Cerebras Dev Tier is $10 prepaid (lasts ~2 weeks), SiliconFlow gives $5 starting credit. Total marginal cost: $20-40/month for 6,000 listings/month at ≥7.5 quality.
