When Crawlers Create Data: Fixing Ghost Inserts in a Next.js + Supabase App
I recently discovered that web crawlers were quietly flooding my Dota 2 Explorer app's database with "ghost" rows: unintended inserts created by bots that crawl and pre-render pages. If your pages perform inserts during SSR, you can run into the same thing. Here's how I found it and fixed it.
Symptoms
- You notice unexpected rows in Supabase without any real user sessions behind them.
- Spikes line up with deploys, social shares, and search engine crawl windows.
- You see duplicate "first-time" inserts for the same resource.
Root Cause
- You are executing database inserts during GET-rendered page requests (SSR/Route Handlers).
- Crawlers (Google, social scrapers, Vercel/Next.js bots) visit those same URLs.
- GET requests aren't supposed to mutate state. But if your SSR logic does, bots will cause real inserts.
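To make the root cause concrete, here is a minimal sketch of the anti-pattern, assuming a Next.js App Router page and the matches/match_id names used later in this post (the file path, env var names, and markup are placeholders, not my actual code):

// app/matches/[id]/page.tsx (the problematic "insert-on-view" shape)
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export default async function MatchPage({ params }: { params: { id: string } }) {
  // This runs on every GET render of the page, so Googlebot, link-preview
  // scrapers, and Next.js prefetches all create rows too.
  await supabase.from('matches').insert({ match_id: Number(params.id) });

  const { data: match } = await supabase
    .from('matches')
    .select('*')
    .eq('match_id', Number(params.id))
    .maybeSingle();

  return <pre>{JSON.stringify(match, null, 2)}</pre>;
}

Every visit, human or not, runs the insert; that is the whole bug.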
The Fix: Defense in Depth
Apply three layers. Each one helps, but together they eliminate the problem and future-proof the app.
- Avoid side effects in GET – Move inserts to explicit POST endpoints or server actions guarded by user intent.
- Idempotency and uniqueness – Ensure multiple requests can't create duplicates.
- Bot/crawler detection – No-op early if a bot somehow hits a mutating path.
1. Move side effects out of GET
Refactor any "insert-on-view" code into a dedicated POST API route or a server action triggered by a user event. Now, pages only read data on SSR; writes only happen from explicit actions.
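As a rough sketch of the refactor (the file path and the saveMatch name are hypothetical; the matches table and match_id column match the schema below), the write becomes a server action that only runs on explicit user intent:

// app/matches/actions.ts
'use server';

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Wired to a form submit or button click, never called during GET rendering,
// so crawlers that merely view the page can't reach it.
export async function saveMatch(matchId: number) {
  const { error } = await supabase
    .from('matches')
    .upsert({ match_id: matchId }, { onConflict: 'match_id' });

  if (error) throw error;
}

The page still reads data during SSR, but the only thing that triggers this action is a button or form a human submits.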
2. Idempotency and unique constraints
Add a unique index on match_id so duplicates can't land:
create unique index if not exists matches_match_id_key
  on public.matches (match_id);
In Supabase (TypeScript), upsert instead of insert:
await supabase
  .from('matches')
  .upsert([payload], { onConflict: 'match_id' });
Also consider adding an idempotency key: the client sends an Idempotency-Key header, and your server stores it to short-circuit duplicates within a set window.
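Here is one way that could look. This is a sketch only: the idempotency_keys table (with a unique key column) and the claimIdempotencyKey helper are assumptions, not something the app already has.

// lib/idempotency.ts
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Returns true the first time a key is seen. The unique constraint on
// idempotency_keys.key makes the second insert fail, so duplicate
// submissions short-circuit instead of writing twice.
export async function claimIdempotencyKey(key: string): Promise<boolean> {
  const { error } = await supabase.from('idempotency_keys').insert({ key });

  if (!error) return true;
  if (error.code === '23505') return false; // unique_violation: already processed
  throw new Error(error.message);
}

A POST handler would call this with the Idempotency-Key header value before doing the upsert, and a periodic delete of old rows keeps the dedup window bounded.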
3. Detect and short-circuit bots
At render time or API boundaries, skip mutation work when the user-agent looks like a crawler. This is a guardrail, not the main fix.
Example in a Next.js server component/route handler:
import { headers } from 'next/headers';

const ua = headers().get('user-agent') ?? '';
const isBot =
  /bot|crawler|spider|crawling/i.test(ua) ||
  ua.includes('Vercel') ||
  ua.includes('Next.js');

if (isBot) {
  // Do not perform inserts/mutations here
}
Also add UA checks in middleware.ts or specific route.ts handlers for POST so that even accidental writes from bots are ignored.
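A minimal middleware.ts sketch, assuming your mutating routes live under /api (adjust the matcher to your own paths):

// middleware.ts
import { NextRequest, NextResponse } from 'next/server';

const BOT_RE = /bot|crawler|spider|crawling/i;

export function middleware(req: NextRequest) {
  const ua = req.headers.get('user-agent') ?? '';
  const isMutation = req.method !== 'GET' && req.method !== 'HEAD';

  if (isMutation && (BOT_RE.test(ua) || ua.includes('Vercel') || ua.includes('Next.js'))) {
    // Answer with an empty success so nothing downstream touches the database.
    return new NextResponse(null, { status: 204 });
  }

  return NextResponse.next();
}

export const config = {
  matcher: ['/api/:path*'],
};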
A Safer Architecture for Data Fetching
- Pages (GET) are read-only.
- Mutations happen via POST/server actions only after:
  - Auth and/or CSRF checks
  - Human-triggered events (e.g., button clicks)
  - Idempotency keys
- Database uniqueness rules remove race-condition duplicates.
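Put together, a mutating route handler can check those gates in order before it touches the database. This sketch assumes bearer-token auth and uses the standard Origin header as a cheap CSRF signal; the route path is hypothetical:

// app/api/matches/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export async function POST(req: NextRequest) {
  // 1. Same-origin check: a cheap CSRF guard for browser-triggered writes.
  const origin = req.headers.get('origin');
  if (origin && new URL(origin).host !== req.nextUrl.host) {
    return NextResponse.json({ error: 'Bad origin' }, { status: 403 });
  }

  // 2. Auth: crawlers never carry a user token, so this alone stops them.
  const token = req.headers.get('authorization')?.replace('Bearer ', '');
  const { data: { user } } = await supabase.auth.getUser(token);
  if (!user) {
    return NextResponse.json({ error: 'Not authenticated' }, { status: 401 });
  }

  // 3. Idempotent write: the unique index makes retries and races harmless.
  const { match_id } = await req.json();
  const { error } = await supabase
    .from('matches')
    .upsert({ match_id }, { onConflict: 'match_id' });
  if (error) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }

  return NextResponse.json({ ok: true });
}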
What About Prefetch and Social Cards?
Next.js prefetching and social scrapers (for OG/Twitter cards) will request and render your pages. Avoid doing inserts in any code path needed to generate previews or index pages.
If you generate dynamic OG images, make sure those routes never mutate state.
Monitoring
- Log user agent, method, and path for all mutating requests.
- Track deduped vs. attempted inserts to confirm idempotency works.
- Alert on spikes of blocked-bot mutations to catch regressions.
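A lightweight way to get that signal is a small helper called from middleware or from each mutating handler; the logMutation name, the outcome values, and the field names here are just one example shape:

// lib/request-log.ts
import type { NextRequest } from 'next/server';

const BOT_RE = /bot|crawler|spider|crawling/i;

export function logMutation(
  req: NextRequest,
  outcome: 'written' | 'deduped' | 'blocked-bot'
) {
  const ua = req.headers.get('user-agent') ?? '';
  // One JSON line per mutating request; ship these to a log drain and alert
  // when the blocked-bot count spikes.
  console.log(JSON.stringify({
    at: new Date().toISOString(),
    method: req.method,
    path: req.nextUrl.pathname,
    userAgent: ua,
    botSuspect: BOT_RE.test(ua),
    outcome,
  }));
}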
Key Takeaways
- Treat GET as read-only. If it mutates state, crawlers will mutate your state.
- Use unique constraints and upserts to make writes idempotent.
- Add bot detection as a backstop, not your only defense.
If you see unexpected rows in Supabase, the root cause may be "insert-on-view" logic in SSR paths. Once you move that logic behind explicit user intent, enforce uniqueness, and add bot checks, the "ghost inserts" disappear.