When Crawlers Create Data: Fixing Ghost Inserts in a Next.js + Supabase App
I recently discovered that web crawlers were quietly flooding my Dota 2 Explorer app's database with "ghost" rows: unintended inserts created by bots that crawl and pre-render pages. If your pages perform inserts during SSR, you can run into the same thing. Here's how I found it and fixed it.
Symptoms
- You notice unexpected rows in Supabase without any real user sessions behind them.
- Spikes line up with deploys, social shares, and search engine crawl windows.
- You see duplicate "first-time" inserts for the same resource.
Root Cause
- You are executing database inserts during GET-rendered page requests (SSR/Route Handlers).
- Crawlers (Google, social scrapers, Vercel/Next.js bots) visit those same URLs.
- GET requests aren't supposed to mutate state. But if your SSR logic does, bots will cause real inserts.
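To make the root cause concrete, here is a minimal sketch of the anti-pattern, assuming a Next.js App Router page and the matches/match_id names used later in this post (the file path, env var names, and markup are placeholders, not my actual code):

// app/matches/[id]/page.tsx (the problematic "insert-on-view" shape)
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export default async function MatchPage({ params }: { params: { id: string } }) {
  // This runs on every GET render of the page, so Googlebot, link-preview
  // scrapers, and Next.js prefetches all create rows too.
  await supabase.from('matches').insert({ match_id: Number(params.id) });

  const { data: match } = await supabase
    .from('matches')
    .select('*')
    .eq('match_id', Number(params.id))
    .maybeSingle();

  return <pre>{JSON.stringify(match, null, 2)}</pre>;
}

Every visit, human or not, runs the insert; that is the whole bug.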
The Fix: Defense in Depth
Apply three layers. Each one helps, but together they eliminate the problem and future-proof the app.
- Avoid side effects in GET – Move inserts to explicit POST endpoints or server actions guarded by user intent.
- Idempotency and uniqueness – Ensure multiple requests can't create duplicates.
- Bot/crawler detection – No-op early if a bot somehow hits a mutating path.
1. Move side effects out of GET
Refactor any "insert-on-view" code into a dedicated POST API route or a server action triggered by a user event. Now, pages only read data on SSR; writes only happen from explicit actions.
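As a rough sketch of the refactor (the file path and the saveMatch name are hypothetical; the matches table and match_id column match the schema below), the write becomes a server action that only runs on explicit user intent:

// app/matches/actions.ts
'use server';

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Wired to a form submit or button click, never called during GET rendering,
// so crawlers that merely view the page can't reach it.
export async function saveMatch(matchId: number) {
  const { error } = await supabase
    .from('matches')
    .upsert({ match_id: matchId }, { onConflict: 'match_id' });

  if (error) throw error;
}

The page still reads data during SSR, but the only thing that triggers this action is a button or form a human submits.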
2. Idempotency and unique constraints
Add a unique index on match_id so duplicates can't land:
create unique index if not exists matches_match_id_key
  on public.matches (match_id);
In Supabase (TypeScript), upsert instead of insert:
await supabase
  .from('matches')
  .upsert([payload], { onConflict: 'match_id' });
Also consider adding an idempotency key: the client sends an Idempotency-Key header, and your server stores it to short-circuit duplicates within a set window.
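Here is one way that could look. This is a sketch only: the idempotency_keys table (with a unique key column) and the claimIdempotencyKey helper are assumptions, not something the app already has.

// lib/idempotency.ts
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Returns true the first time a key is seen. The unique constraint on
// idempotency_keys.key makes the second insert fail, so duplicate
// submissions short-circuit instead of writing twice.
export async function claimIdempotencyKey(key: string): Promise<boolean> {
  const { error } = await supabase.from('idempotency_keys').insert({ key });

  if (!error) return true;
  if (error.code === '23505') return false; // unique_violation: already processed
  throw new Error(error.message);
}

A POST handler would call this with the Idempotency-Key header value before doing the upsert, and a periodic delete of old rows keeps the dedup window bounded.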
3. Detect and short-circuit bots
At render time or API boundaries, skip mutation work when the user-agent looks like a crawler. This is a guardrail, not the main fix.
Example in a Next.js server component/route handler:
import { headers } from 'next/headers';

const ua = headers().get('user-agent') ?? '';
const isBot =
  /bot|crawler|spider|crawling/i.test(ua) ||
  ua.includes('Vercel') ||
  ua.includes('Next.js');

if (isBot) {
  // Do not perform inserts/mutations here
}
Also add UA checks in middleware.ts or specific route.ts handlers for POST so that even accidental writes from bots are ignored.
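A minimal middleware.ts sketch, assuming your mutating routes live under /api (adjust the matcher to your own paths):

// middleware.ts
import { NextRequest, NextResponse } from 'next/server';

const BOT_RE = /bot|crawler|spider|crawling/i;

export function middleware(req: NextRequest) {
  const ua = req.headers.get('user-agent') ?? '';
  const isMutation = req.method !== 'GET' && req.method !== 'HEAD';

  if (isMutation && (BOT_RE.test(ua) || ua.includes('Vercel') || ua.includes('Next.js'))) {
    // Answer with an empty success so nothing downstream touches the database.
    return new NextResponse(null, { status: 204 });
  }

  return NextResponse.next();
}

export const config = {
  matcher: ['/api/:path*'],
};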
A Safer Architecture for Data Fetching
- Pages (GET) are read-only.
- Mutations happen via POST/server actions only after:
  - Auth and/or CSRF checks
  - Human-triggered events (e.g., button clicks)
  - Idempotency keys
- Database uniqueness rules remove race-condition duplicates.
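Put together, a mutating route handler can check those gates in order before it touches the database. This sketch assumes bearer-token auth and uses the standard Origin header as a cheap CSRF signal; the route path is hypothetical:

// app/api/matches/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export async function POST(req: NextRequest) {
  // 1. Same-origin check: a cheap CSRF guard for browser-triggered writes.
  const origin = req.headers.get('origin');
  if (origin && new URL(origin).host !== req.nextUrl.host) {
    return NextResponse.json({ error: 'Bad origin' }, { status: 403 });
  }

  // 2. Auth: crawlers never carry a user token, so this alone stops them.
  const token = req.headers.get('authorization')?.replace('Bearer ', '');
  const { data: { user } } = await supabase.auth.getUser(token);
  if (!user) {
    return NextResponse.json({ error: 'Not authenticated' }, { status: 401 });
  }

  // 3. Idempotent write: the unique index makes retries and races harmless.
  const { match_id } = await req.json();
  const { error } = await supabase
    .from('matches')
    .upsert({ match_id }, { onConflict: 'match_id' });
  if (error) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }

  return NextResponse.json({ ok: true });
}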
What About Prefetch and Social Cards?
Next.js prefetching and social scrapers (for OG/Twitter cards) will request and render your pages. Avoid doing inserts in any code path needed to generate previews or index pages.
If you generate dynamic OG images, make sure those routes never mutate state.
Monitoring
- Log user agent, method, and path for all mutating requests.
- Track deduped vs. attempted inserts to confirm idempotency works.
- Alert on spikes of blocked-bot mutations to catch regressions.
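A lightweight way to get that signal is a small helper called from middleware or from each mutating handler; the logMutation name, the outcome values, and the field names here are just one example shape:

// lib/request-log.ts
import type { NextRequest } from 'next/server';

const BOT_RE = /bot|crawler|spider|crawling/i;

export function logMutation(
  req: NextRequest,
  outcome: 'written' | 'deduped' | 'blocked-bot'
) {
  const ua = req.headers.get('user-agent') ?? '';
  // One JSON line per mutating request; ship these to a log drain and alert
  // when the blocked-bot count spikes.
  console.log(JSON.stringify({
    at: new Date().toISOString(),
    method: req.method,
    path: req.nextUrl.pathname,
    userAgent: ua,
    botSuspect: BOT_RE.test(ua),
    outcome,
  }));
}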
Key Takeaways
- Treat GET as read-only. If it mutates state, crawlers will mutate your state.
- Use unique constraints and upserts to make writes idempotent.
- Add bot detection as a backstop, not your only defense.
If you see unexpected rows in Supabase, the root cause may be "insert-on-view" logic in SSR paths. Once you move that logic behind explicit user intent, enforce uniqueness, and add bot checks, the "ghost inserts" disappear.