Full Setup Guide

Build a LinkedIn Lead Scraper

Track keywords, profiles, and posts on LinkedIn. Collect every person who likes or comments. 1,000 leads for about $6. Built with Claude Code.

What It Does
4 Ways to Collect Leads
Each mode solves a different use case. Run them on a schedule or trigger them manually.
1

Track Keywords

Pick a topic like "outbound", "AI", or "GTM". The tool searches LinkedIn for posts mentioning that keyword and collects every person liking and commenting on them.

2

Track Profiles

Add a LinkedIn profile URL. The tool checks their new posts and collects the people interacting with them. Track competitors, creators, or your own team.

3

Track Posts Over Time

Add a post link. The tool keeps checking that post and collects new people who like or comment as they come in. Great for Thought Leader Ads.

4

Instant Post Scrape

Paste a post link and get the full list right now. No waiting, no schedule. One-time pull of everyone who engaged.

Output: Every lead comes back with LinkedIn profile data. Send the list to Clay, export a CSV, push to a webhook, or pipe it anywhere you want.
How It Works

Create Monitor (keyword, profile, or post URL) -> Queue Job (schedule or trigger manually) -> Scrape (fetch posts, get engagers) -> Deduplicate (skip people you already have) -> Deliver (webhook, CSV, or Clay)

Tech Stack
React + Vite: frontend dashboard hosted on Netlify
Supabase: Postgres database + auth + realtime
Python Worker: background processor on Railway (Docker)
Limadata: LinkedIn post search, profile posts, metadata
Apify: extracts commenters + likers from posts
Webhook Delivery: sends leads to Clay, a CRM, or anywhere

Why this stack: React + Vite for a fast dashboard. Supabase handles auth, database, and realtime subscriptions (job status updates live). Python worker on Railway runs 24/7 for about $5/month. Limadata handles LinkedIn post search and profile data. Apify handles the actual engagement scraping (commenters + likers). Total infra cost: under $10/month before API usage.
Architecture
How the System Connects
The frontend creates monitors and queues jobs. The Python worker polls Supabase every 5 seconds, picks up jobs, calls the LinkedIn APIs, deduplicates results, and delivers leads via webhook.
Dashboard (React)           Database (Supabase)           Worker (Python on Railway)
┌─────────────────┐       ┌─────────────────────┐       ┌─────────────────────┐
│                 │       │                     │       │                     │
│  Create monitor │──────>│  monitors table     │       │  Polls every 5s     │
│  Queue job      │──────>│  queue_jobs table   │<──────│  Picks up "queued"  │
│  View results   │<──────│  runs table         │<──────│  Writes results     │
│  Approve large  │──────>│  seen_engagers      │<──────│  Deduplicates       │
│                 │       │                     │       │                     │
└─────────────────┘       └─────────────────────┘       └──────────┬──────────┘
                                                                   │
                                                        ┌──────────┴──────────┐
                                                        │  External APIs       │
                                                        │  Limadata           │
                                                        │  Apify              │
                                                        │  Webhook delivery   │
                                                        └─────────────────────┘
Realtime Updates
Supabase realtime subscriptions push job status changes to the dashboard. You see "queued" -> "running" -> "completed" live.
Deduplication
A seen_engagers table tracks every person you've already collected per monitor. Repeat runs only return new leads.
Approval Flow
Jobs with 1,000+ estimated leads pause and ask for approval before running. Prevents surprise API costs.
Phase 1

Project Setup

Create the frontend, set up the database, configure the worker, and get API access for LinkedIn data extraction.

01

Create the Frontend

React + Vite + Tailwind + Supabase Auth
# Create the project
npm create vite@latest social-engager -- --template react-ts
cd social-engager

# Install dependencies
npm install @supabase/supabase-js tailwindcss postcss autoprefixer
npm install lucide-react react-router-dom

# Init Tailwind
npx tailwindcss init -p
02

Set Up the Supabase Database

Create a Supabase project and run the schema
Six core tables. monitors holds the config, queue_jobs holds pending work, runs holds results, seen_engagers handles dedup, and projects + profiles handle multi-tenant auth.
-- Core tables

CREATE TABLE profiles (
  id UUID REFERENCES auth.users PRIMARY KEY,
  email TEXT,
  full_name TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE projects (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  name TEXT NOT NULL,
  created_by UUID REFERENCES profiles(id),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE monitors (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  project_id UUID REFERENCES projects(id),
  name TEXT NOT NULL,
  mode TEXT NOT NULL,          -- 'keyword' | 'profile' | 'posts' | 'direct'
  input TEXT NOT NULL,         -- keyword string, profile URL, or post URLs
  webhook_url TEXT,             -- where to send leads
  schedule TEXT,                -- 'daily' | 'weekly' | 'monthly' | null
  is_active BOOLEAN DEFAULT true,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE queue_jobs (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  monitor_id UUID REFERENCES monitors(id),
  project_id UUID REFERENCES projects(id),
  status TEXT DEFAULT 'queued',  -- queued | running | completed | failed | awaiting_approval
  payload JSONB,
  result JSONB,
  error TEXT,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE runs (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  monitor_id UUID REFERENCES monitors(id),
  project_id UUID REFERENCES projects(id),
  job_id UUID REFERENCES queue_jobs(id),
  leads_found INTEGER DEFAULT 0,
  new_leads INTEGER DEFAULT 0,
  cost DECIMAL(10,4) DEFAULT 0,
  posts_scraped INTEGER DEFAULT 0,
  details JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE seen_engagers (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  monitor_id UUID REFERENCES monitors(id),
  post_url TEXT NOT NULL,
  engager_profile_url TEXT NOT NULL,
  first_seen_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(monitor_id, post_url, engager_profile_url)
);
Enable Row-Level Security on all tables. Each table should have RLS policies scoped to the user's project. The backend worker uses the Supabase service role key to bypass RLS.
03

Get API Access

You need two external APIs for LinkedIn data
Limadata
For searching LinkedIn posts by keyword, fetching a profile's recent posts, and getting post metadata (engagement counts). Limadata provides LinkedIn data endpoints for post search and profile listing. Cost is per API call (about $0.04/call).
Apify
For extracting the actual people who liked and commented on a post. Apify runs scraping actors that pull LinkedIn post commenters and reactions. Cost is about $2 per 1,000 engagers extracted.
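A minimal sketch of the Apify side of the worker, using the requests library already in requirements.txt and Apify's run-sync-get-dataset-items endpoint. The actor ID and its input shape are placeholders - swap in whichever LinkedIn comments/reactions actors you pick from the Apify store.
import os
import requests

APIFY_API_KEY = os.environ["APIFY_API_KEY"]

def scrape_commenters(post_url):
    # Placeholder actor ID - substitute the LinkedIn comments actor you choose.
    # run-sync-get-dataset-items starts the actor and returns its dataset rows
    # in a single request.
    actor_id = "YOUR_COMMENTS_ACTOR_ID"
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items",
        params={"token": APIFY_API_KEY},
        json={"postUrls": [post_url]},  # input shape depends on the actor
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()  # list of engager records
scrape_likers() is the same call pointed at a reactions actor.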
04

Set Up Environment Variables

Frontend .env and backend .env
Frontend .env (Vite exposes VITE_-prefixed vars to the browser - only put public keys here):
  • VITE_SUPABASE_URL
  • VITE_SUPABASE_ANON_KEY
Backend .env (all secret - never expose these):
  • SUPABASE_URL
  • SUPABASE_KEY (service role key)
  • LIMADATA_API_KEY
  • APIFY_API_KEY
05

Organize the File Structure

Frontend pages + hooks, backend worker
social-engager/
│
├── src/                             ← React frontend
│   ├── App.tsx                      ← Routes + auth gating
│   ├── pages/
│   │   ├── Dashboard.tsx            ← Monitor management
│   │   ├── Queue.tsx                ← Job queue + approval flow
│   │   ├── History.tsx              ← Run execution history
│   │   ├── Trigger.tsx              ← Manual job triggering
│   │   └── Login.tsx                ← Supabase auth
│   ├── components/
│   │   ├── MonitorCard.tsx          ← Display monitor config + stats
│   │   ├── MonitorForm.tsx          ← Create/edit monitors
│   │   ├── UsageBanner.tsx          ← Cost + usage tracking
│   │   └── Layout.tsx               ← Navigation shell
│   ├── hooks/
│   │   ├── useMonitors.ts           ← CRUD for monitors
│   │   ├── useQueue.ts              ← Job enqueueing + approval
│   │   ├── useRuns.ts               ← Execution history queries
│   │   ├── useUsageStats.ts         ← Cost tracking per project
│   │   └── useJobNotifications.ts   ← Realtime status via Supabase
│   └── lib/
│       ├── supabase.ts              ← Supabase client init
│       ├── types.ts                 ← TypeScript interfaces
│       └── costs.ts                 ← Cost calculation helpers
│
├── backend/                         ← Python worker
│   ├── worker.py                    ← Main job processor (runs 24/7)
│   ├── Dockerfile                   ← Python 3.11-slim for Railway
│   └── requirements.txt             ← supabase, requests, python-dotenv
│
└── supabase/                        ← Database schema + migrations
    ├── setup.sql                    ← Full production schema
    └── migrations/                  ← Incremental changes
Phase 2

Build the Dashboard

A React frontend with monitor management, a job queue with approval flow, execution history, and realtime status updates via Supabase.

06

Build the Monitor Management Page

Create, edit, and manage tracking monitors
  1. Monitor creation form

    Select mode (keyword, profile, posts, direct). Enter the input (keyword string, LinkedIn profile URL, or post URLs). Optionally set a webhook URL and schedule (daily, weekly, monthly).

  2. Monitor cards

    Display each monitor with its mode, input, schedule, and stats (total leads found, last run date, cost to date). Toggle active/inactive.

  3. Manual trigger button

    Run any monitor on demand. Creates a queue_job with status "queued" and the monitor's config as payload.

07

Build the Job Queue + Approval Flow

See pending, running, and completed jobs with realtime updates
  1. Job queue page

    List all jobs with status badges: queued (waiting), running (in progress), completed (done), failed (error), awaiting_approval (needs confirmation).

  2. Approval flow

    When estimated leads exceed 1,000, the worker sets status to "awaiting_approval". The dashboard shows the estimated count and cost. User clicks "Approve" to re-queue the job.

  3. Realtime status updates

    Subscribe to Supabase realtime on the queue_jobs table. Status changes appear live without refreshing.

// Supabase realtime subscription for job status
const channel = supabase
  .channel('job-updates')
  .on('postgres_changes', {
    event: 'UPDATE',
    schema: 'public',
    table: 'queue_jobs',
    filter: `project_id=eq.${projectId}`,
  }, (payload) => {
    updateJobInState(payload.new);
  })
  .subscribe();
08

Build the History + Usage Pages

Track every run and monitor costs
Run History
Show every execution with: monitor name, leads found vs new leads (after dedup), posts scraped, cost, and timestamp. Filter by monitor or date range.
Usage Tracking
Show total leads collected, total cost, and monthly usage against your limit. Display a warning banner when approaching 80% of monthly cap. Break down cost per monitor.
Phase 3

Build the Worker

A Python script that runs 24/7 on Railway. It polls Supabase for queued jobs, calls the LinkedIn APIs, deduplicates results, and delivers leads.

09

Build the Job Polling Loop

The worker polls Supabase every 5 seconds for new jobs
# worker.py - main loop

import os
import time

from supabase import create_client

# Service-role credentials come from the backend .env
SUPABASE_URL = os.environ["SUPABASE_URL"]
SUPABASE_KEY = os.environ["SUPABASE_KEY"]

supabase = create_client(SUPABASE_URL, SUPABASE_KEY)

while True:
    # Pick up the oldest queued job
    job = supabase.table("queue_jobs") \
        .select("*") \
        .eq("status", "queued") \
        .order("created_at") \
        .limit(1) \
        .execute()

    if job.data:
        process_job(job.data[0])
    else:
        time.sleep(5)
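The rest of the worker snippets call an update_status() helper that isn't shown above. A minimal version, assuming it just writes the transition (plus timestamps and any error message) onto the queue_jobs row:
from datetime import datetime, timezone

def update_status(job_id, status, error=None):
    # Write the status transition so the dashboard (via Supabase realtime)
    # sees queued -> running -> completed live
    fields = {"status": status}
    now = datetime.now(timezone.utc).isoformat()
    if status == "running":
        fields["started_at"] = now
    if status in ("completed", "failed"):
        fields["completed_at"] = now
    if error:
        fields["error"] = error
    supabase.table("queue_jobs").update(fields).eq("id", job_id).execute()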
10

Implement Each Scraping Mode

Different logic for keywords, profiles, posts, and direct scrapes
  1. Keyword mode

    Search LinkedIn for posts matching the keyword. Run two parallel searches (by relevance and by recency) to get a broader set. Take the top posts by engagement count. Scrape commenters + likers from each post. (A sketch of this search-and-rank step follows this list.)

  2. Profile mode

    Fetch the profile's recent posts via Limadata. For each post, scrape commenters + likers with Apify. Useful for tracking competitors or your own team's content.

  3. Posts mode (recurring)

    Take the post URLs from the monitor config. Check if engagement count has changed since last run (smart skip). If new engagement detected, scrape and deduplicate against seen_engagers. Only return NEW people.

  4. Direct mode (one-time)

    Take the post URL, scrape all commenters + likers in one shot. No dedup tracking, no repeat logic. One-time full pull.
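A sketch of the keyword search step referenced in mode 1, assuming a hypothetical limadata_search_posts() wrapper around Limadata's post-search endpoint (the real parameter names depend on their API):
def search_posts_by_keyword(keyword, top_n=20):
    # Two searches (relevance + recency) widen the candidate set
    by_relevance = limadata_search_posts(keyword, sort="relevance")
    by_recency = limadata_search_posts(keyword, sort="recent")

    # Merge, dedupe by post URL, and keep the most-engaged posts
    candidates = {p["url"]: p for p in by_relevance + by_recency}
    ranked = sorted(
        candidates.values(),
        key=lambda p: p.get("likes", 0) + p.get("comments", 0),
        reverse=True,
    )
    return ranked[:top_n]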

11

Build the Scraping Pipeline

Fetch posts, extract engagers, deduplicate, deliver
def process_job(job):
    # 1. Mark job as running
    update_status(job["id"], "running")

    # The monitor's config (mode, input, webhook_url, flags) lives in the
    # job's JSONB payload - see the queue_jobs schema above
    payload = job["payload"]

    # 2. Get post URLs based on mode
    if payload["mode"] == "keyword":
        posts = search_posts_by_keyword(payload["input"])
    elif payload["mode"] == "profile":
        posts = get_profile_posts(payload["input"])
    else:
        # posts / direct mode: the input is the post URLs themselves
        # (assuming they are stored as a comma-separated string)
        posts = [{"url": u.strip()} for u in payload["input"].split(",") if u.strip()]

    # 3. Check approval threshold
    estimated = estimate_engagers(posts)
    if estimated > 1000 and not payload.get("large_scrape_approved"):
        update_status(job["id"], "awaiting_approval")
        return

    # 4. Scrape engagers from each post
    #    (shown sequentially here; in production cap Apify calls at 3 concurrent)
    all_engagers = []
    for post in posts:
        commenters = scrape_commenters(post["url"])
        likers = scrape_likers(post["url"])
        all_engagers.extend(commenters + likers)

    # 5. Deduplicate against seen_engagers
    new_leads = deduplicate(job["monitor_id"], all_engagers)

    # 6. Deliver via webhook
    if payload.get("webhook_url") and new_leads:
        send_to_webhook(payload["webhook_url"], new_leads)

    # 7. Record results
    create_run(job, leads_found=len(all_engagers), new_leads=len(new_leads))
    update_status(job["id"], "completed")
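The send_to_webhook() call in step 6 can be a batched POST with requests. The {"leads": [...]} payload shape is an assumption - match whatever your Clay table or CRM endpoint expects:
import requests

def send_to_webhook(webhook_url, leads, batch_size=100):
    # Post leads in batches so large runs don't blow past request-size limits
    for i in range(0, len(leads), batch_size):
        resp = requests.post(
            webhook_url,
            json={"leads": leads[i:i + batch_size]},
            timeout=30,
        )
        resp.raise_for_status()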
12

Build the Deduplication System

Track every engager per monitor so repeat runs only return new leads
The seen_engagers table tracks every (monitor_id, post_url, engager_profile_url) combination. Before delivering leads, check which ones are new.
def deduplicate(monitor_id, engagers):
    new_leads = []
    for engager in engagers:
        # Check if we've already collected this person for this monitor (any post)
        existing = supabase.table("seen_engagers") \
            .select("id") \
            .eq("monitor_id", monitor_id) \
            .eq("engager_profile_url", engager["profile_url"]) \
            .execute()

        if not existing.data:
            # New lead - record it and add to output
            supabase.table("seen_engagers").insert({
                "monitor_id": monitor_id,
                "post_url": engager["post_url"],
                "engager_profile_url": engager["profile_url"],
            }).execute()
            new_leads.append(engager)

    return new_leads
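The per-engager lookup above is easy to read but costs one round trip per person. A batched variant with the same behavior (assuming the same engager dict shape) does it in two queries:
def deduplicate_batched(monitor_id, engagers):
    # One round trip: fetch everything already seen for this monitor
    seen = supabase.table("seen_engagers") \
        .select("engager_profile_url") \
        .eq("monitor_id", monitor_id) \
        .execute()
    seen_urls = {row["engager_profile_url"] for row in seen.data}

    new_leads = []
    for e in engagers:
        if e["profile_url"] not in seen_urls:
            seen_urls.add(e["profile_url"])  # also dedupe within this batch
            new_leads.append(e)

    if new_leads:
        supabase.table("seen_engagers").insert([
            {
                "monitor_id": monitor_id,
                "post_url": e["post_url"],
                "engager_profile_url": e["profile_url"],
            }
            for e in new_leads
        ]).execute()

    return new_leads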
Phase 4

Safety + Cost Controls

Prevent runaway costs, handle stuck jobs, and protect against edge cases. These are the things that break in production.

13

Implement Cost Tracking + Limits

Track spend per job, per monitor, and per month
  1. Calculate cost per job

    Track API calls and leads scraped. Cost = (engagers x cost per engager) + (API calls x cost per call). Store on each run record. (See the sketch after this list.)

  2. Monthly lead limit

    Set a monthly cap (e.g., 5,000 leads). Query the runs table for the current billing cycle. Block new jobs when approaching the limit. Show a warning banner at 80%.

  3. Per-job lead cap

    Hard limit per individual job (e.g., 5,000 leads). Anything above 1,000 requires the approval flow before running.
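A sketch of both checks, using the rough unit costs quoted earlier (adjust to your real rates) and the runs table from the schema:
COST_PER_ENGAGER = 0.002    # ~$2 per 1,000 engagers (Apify)
COST_PER_API_CALL = 0.04    # ~$0.04 per Limadata call

def job_cost(engagers_scraped, api_calls):
    return engagers_scraped * COST_PER_ENGAGER + api_calls * COST_PER_API_CALL

def leads_this_cycle(project_id, cycle_start_iso):
    # Sum new leads collected since the start of the billing cycle
    rows = supabase.table("runs") \
        .select("new_leads") \
        .eq("project_id", project_id) \
        .gte("created_at", cycle_start_iso) \
        .execute()
    return sum(r["new_leads"] for r in rows.data)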

14

Handle Failures + Edge Cases

Timeouts, stale jobs, zero results, and API errors

Job timeout

Set a 10-minute hard timeout per job. If a job is still running after 10 minutes, mark it as failed and log the error.
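One way to enforce the hard timeout: run each job in a worker thread and stop waiting after 10 minutes. A sketch (note the scrape thread itself keeps running; the job is just marked failed so the queue doesn't stall):
from concurrent.futures import ThreadPoolExecutor, TimeoutError

JOB_TIMEOUT_SECONDS = 10 * 60
pool = ThreadPoolExecutor(max_workers=1)

def run_with_timeout(job):
    future = pool.submit(process_job, job)
    try:
        future.result(timeout=JOB_TIMEOUT_SECONDS)
    except TimeoutError:
        update_status(job["id"], "failed", error="timed out after 10 minutes")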

Stale job recovery

On startup, scan for jobs stuck in "running" for more than 15 minutes. Reset them to "queued" so they get retried.
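A minimal recovery pass to run once at worker startup, assuming started_at is set when a job flips to "running":
from datetime import datetime, timedelta, timezone

def recover_stale_jobs(max_age_minutes=15):
    # Re-queue anything stuck in "running" longer than the cutoff
    cutoff = (datetime.now(timezone.utc) - timedelta(minutes=max_age_minutes)).isoformat()
    supabase.table("queue_jobs") \
        .update({"status": "queued", "started_at": None}) \
        .eq("status", "running") \
        .lt("started_at", cutoff) \
        .execute()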

Zero engagers

If Apify returns 0 results, mark the job as failed instead of completed. Something went wrong - don't record it as a successful run.

Parallel limits

Cap concurrent Apify actor calls to 3. More than that and you hit rate limits or timeouts. Use a semaphore or simple counter.
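A thread pool with three workers is the simplest cap. scrape_post() here is a hypothetical helper that combines the commenter and liker scrapes for one post:
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_SCRAPES = 3

def scrape_post(post):
    return scrape_commenters(post["url"]) + scrape_likers(post["url"])

def scrape_all(posts):
    all_engagers = []
    # Never more than 3 Apify actor calls in flight at once
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_SCRAPES) as pool:
        for engagers in pool.map(scrape_post, posts):
            all_engagers.extend(engagers)
    return all_engagers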

Smart skip (posts mode)

Before scraping a tracked post again, check if the engagement count changed since last run. If not, skip it - no new people to find.
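A sketch of the skip check, assuming you persist the last engagement count per tracked post (for example in the run's details JSONB) and a hypothetical get_post_metadata() wrapper around Limadata's post-metadata endpoint:
def post_needs_scraping(post_url, last_known_count):
    # Compare current engagement with what we saw on the previous run
    meta = get_post_metadata(post_url)  # hypothetical Limadata wrapper
    current = meta.get("likes", 0) + meta.get("comments", 0)
    return current > last_known_count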

Worker restart

Configure Railway (or your host) to auto-restart on failure. Set retry limits (e.g., 10 restarts) before alerting you.

Phase 5

Deploy Everything

Frontend to Netlify, worker to Railway, database already on Supabase. Three services, each deployed independently.

15

Deploy the Frontend to Netlify

Build and drag-drop, or connect to GitHub for auto-deploy
  1. Build the project

    Run npm run build. This creates a dist/ folder with the static site.

  2. Deploy to Netlify

    Drag the dist/ folder to Netlify, or connect your GitHub repo for automatic deploys on every push.

  3. Add environment variables

    Set VITE_SUPABASE_URL and VITE_SUPABASE_ANON_KEY in Netlify's environment variable settings.

16

Deploy the Worker to Railway

Docker container running 24/7
  1. Create a Dockerfile

    Python 3.11-slim base image. Copy requirements.txt and worker.py. Install dependencies. Set the entrypoint to run worker.py.

  2. Push to Railway

    Connect your GitHub repo to Railway. It detects the Dockerfile and deploys automatically. Set environment variables in Railway's dashboard.

  3. Configure restart policy

    Set the service to auto-restart on failure with a retry limit. The worker should run continuously.

# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY worker.py .
CMD ["python", "worker.py"]
The Build Order
Start to Finish
Follow this sequence. Start with the database and worker, then build the frontend around it.
  1. Supabase setup - create project, run schema, enable RLS
  2. Get API access - Limadata + Apify API credentials
  3. Worker skeleton - polling loop + job status updates
  4. Direct mode - simplest: paste URL, scrape, return results
  5. Keyword mode - search posts, scrape top results
  6. Profile mode - fetch profile posts, scrape engagers
  7. Deduplication - seen_engagers table + dedup logic
  8. Posts mode - recurring scrape with smart skip
  9. Webhook delivery - send leads to Clay or any endpoint
  10. Approval flow - threshold check + awaiting_approval status
  11. Cost tracking - per-job and monthly limits
  12. Frontend dashboard - monitors, queue, history, usage
  13. Realtime updates - Supabase subscriptions for live status
  14. Deploy worker - Railway with Docker + auto-restart
  15. Deploy frontend - Netlify with env vars
Cost breakdown: About $6 per 1,000 leads ($2 from Apify scraping + $4 from Limadata calls). Infrastructure (Supabase + Railway + Netlify) runs under $10/month on free/starter tiers. The whole system is easy to change: when you want to add a new mode, tweak the scraping logic, or adjust cost limits, describe the change to Claude Code. Done in minutes.
More GTM Engineering Tools
Claude Code GTM Framework
The master framework for structuring Claude Code projects for go-to-market operations. Folder architecture, governance, AI agents, and knowledge hierarchies for B2B SaaS.
AI-Powered Revenue Engine
Replace your $10K/year lead routing tool. Capture, qualify, route, book, enrich, and recover leads - serverless, under $50/month. Full step-by-step SOP.