Full Setup Guide

Build a LinkedIn Lead Scraper

Track keywords, profiles, and posts on LinkedIn. Collect every person who likes or comments. 1,000 leads for about $6. Built with Claude Code.

What It Does
4 Ways to Collect Leads
Each mode solves a different use case. Run them on a schedule or trigger them manually.
1

Track Keywords

Pick a topic like "outbound", "AI", or "GTM". The tool searches LinkedIn for posts mentioning that keyword and collects every person liking and commenting on them.

2

Track Profiles

Add a LinkedIn profile URL. The tool checks their new posts and collects the people interacting with them. Track competitors, creators, or your own team.

3

Track Posts Over Time

Add a post link. The tool keeps checking that post and collects new people who like or comment as they come in. Great for Thought Leader Ads.

4

Instant Post Scrape

Paste a post link and get the full list right now. No waiting, no schedule. One-time pull of everyone who engaged.

Output: Every lead comes back with LinkedIn profile data. Send the list to Clay, export a CSV, push to a webhook, or pipe it anywhere you want.
How It Works

Create Monitor (keyword, profile, or post URL) -> Queue Job (schedule or trigger manually) -> Scrape (fetch posts, get engagers) -> Deduplicate (skip people you already have) -> Deliver (webhook, CSV, or Clay)

Tech Stack
React + Vite: frontend dashboard hosted on Netlify
Supabase: Postgres database + auth + realtime
Python Worker: background processor on Railway (Docker)
Limadata: LinkedIn post search, profile posts, metadata
Apify: extracts commenters + likers from posts
Webhook Delivery: sends leads to Clay, a CRM, or anywhere

Why this stack: React + Vite for a fast dashboard. Supabase handles auth, database, and realtime subscriptions (job status updates live). Python worker on Railway runs 24/7 for about $5/month. Limadata handles LinkedIn post search and profile data. Apify handles the actual engagement scraping (commenters + likers). Total infra cost: under $10/month before API usage.
Architecture
How the System Connects
The frontend creates monitors and queues jobs. The Python worker polls Supabase every 5 seconds, picks up jobs, calls the LinkedIn APIs, deduplicates results, and delivers leads via webhook.
Dashboard (React)           Database (Supabase)           Worker (Python on Railway)
┌─────────────────┐       ┌─────────────────────┐       ┌─────────────────────┐
│                 │       │                     │       │                     │
│  Create monitor │──────>│  monitors table     │       │  Polls every 5s     │
│  Queue job      │──────>│  queue_jobs table   │<──────│  Picks up "queued"  │
│  View results   │<──────│  runs table         │<──────│  Writes results     │
│  Approve large  │──────>│  seen_engagers      │<──────│  Deduplicates       │
│                 │       │                     │       │                     │
└─────────────────┘       └─────────────────────┘       └──────────┬──────────┘
                                                                   │
                                                        ┌──────────┴──────────┐
                                                        │  External APIs       │
                                                        │  Limadata           │
                                                        │  Apify              │
                                                        │  Webhook delivery   │
                                                        └─────────────────────┘
Realtime Updates
Supabase realtime subscriptions push job status changes to the dashboard. You see "queued" -> "running" -> "completed" live.
Deduplication
A seen_engagers table tracks every person you've already collected per monitor. Repeat runs only return new leads.
Approval Flow
Jobs with 1,000+ estimated leads pause and ask for approval before running. Prevents surprise API costs.
Phase 1

Project Setup

Create the frontend, set up the database, configure the worker, and get API access for LinkedIn data extraction.

01

Create the Frontend

React + Vite + Tailwind + Supabase Auth
# Create the project
npm create vite@latest social-engager -- --template react-ts
cd social-engager

# Install dependencies
npm install @supabase/supabase-js tailwindcss postcss autoprefixer
npm install lucide-react react-router-dom

# Init Tailwind
npx tailwindcss init -p
02

Set Up the Supabase Database

Create a Supabase project and run the schema
Six core tables. monitors holds the config, queue_jobs holds pending work, runs holds results, seen_engagers handles dedup, and projects + profiles handle multi-tenant auth.
-- Core tables

CREATE TABLE profiles (
  id UUID REFERENCES auth.users PRIMARY KEY,
  email TEXT,
  full_name TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE projects (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  name TEXT NOT NULL,
  created_by UUID REFERENCES profiles(id),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE monitors (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  project_id UUID REFERENCES projects(id),
  name TEXT NOT NULL,
  mode TEXT NOT NULL,          -- 'keyword' | 'profile' | 'posts' | 'direct'
  input TEXT NOT NULL,         -- keyword string, profile URL, or post URLs
  webhook_url TEXT,             -- where to send leads
  schedule TEXT,                -- 'daily' | 'weekly' | 'monthly' | null
  is_active BOOLEAN DEFAULT true,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE queue_jobs (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  monitor_id UUID REFERENCES monitors(id),
  project_id UUID REFERENCES projects(id),
  status TEXT DEFAULT 'queued',  -- queued | running | completed | failed | awaiting_approval
  payload JSONB,
  result JSONB,
  error TEXT,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE runs (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  monitor_id UUID REFERENCES monitors(id),
  project_id UUID REFERENCES projects(id),
  job_id UUID REFERENCES queue_jobs(id),
  leads_found INTEGER DEFAULT 0,
  new_leads INTEGER DEFAULT 0,
  cost DECIMAL(10,4) DEFAULT 0,
  posts_scraped INTEGER DEFAULT 0,
  details JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE seen_engagers (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  monitor_id UUID REFERENCES monitors(id),
  post_url TEXT NOT NULL,
  engager_profile_url TEXT NOT NULL,
  first_seen_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(monitor_id, post_url, engager_profile_url)
);
Enable Row-Level Security on all tables. Each table should have RLS policies scoped to the user's project. The backend worker uses the Supabase service role key to bypass RLS.
03

Get API Access

You need two external APIs for LinkedIn data
Limadata
For searching LinkedIn posts by keyword, fetching a profile's recent posts, and getting post metadata (engagement counts). Limadata provides LinkedIn data endpoints for post search and profile listing. Cost is per API call (about $0.04/call).
Apify
For extracting the actual people who liked and commented on a post. Apify runs scraping actors that pull LinkedIn post commenters and reactions. Cost is about $2 per 1,000 engagers extracted.
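A minimal sketch of the Apify side of the worker, using the requests library already in requirements.txt and Apify's run-sync-get-dataset-items endpoint. The actor ID and its input shape are placeholders - swap in whichever LinkedIn comments/reactions actors you pick from the Apify store.
import os
import requests

APIFY_API_KEY = os.environ["APIFY_API_KEY"]

def scrape_commenters(post_url):
    # Placeholder actor ID - substitute the LinkedIn comments actor you choose.
    # run-sync-get-dataset-items starts the actor and returns its dataset rows
    # in a single request.
    actor_id = "YOUR_COMMENTS_ACTOR_ID"
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items",
        params={"token": APIFY_API_KEY},
        json={"postUrls": [post_url]},  # input shape depends on the actor
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()  # list of engager records
scrape_likers() is the same call pointed at a reactions actor.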
04

Set Up Environment Variables

Frontend .env and backend .env
Frontend .env (Vite exposes VITE_-prefixed vars to the browser - only put public keys here):
  • VITE_SUPABASE_URL
  • VITE_SUPABASE_ANON_KEY
Backend .env (all secret - never expose these):
  • SUPABASE_URL
  • SUPABASE_KEY (service role key)
  • LIMADATA_API_KEY
  • APIFY_API_KEY
05

Organize the File Structure

Frontend pages + hooks, backend worker
social-engager/
│
├── src/                             ← React frontend
│   ├── App.tsx                      ← Routes + auth gating
│   ├── pages/
│   │   ├── Dashboard.tsx            ← Monitor management
│   │   ├── Queue.tsx                ← Job queue + approval flow
│   │   ├── History.tsx              ← Run execution history
│   │   ├── Trigger.tsx              ← Manual job triggering
│   │   └── Login.tsx                ← Supabase auth
│   ├── components/
│   │   ├── MonitorCard.tsx          ← Display monitor config + stats
│   │   ├── MonitorForm.tsx          ← Create/edit monitors
│   │   ├── UsageBanner.tsx          ← Cost + usage tracking
│   │   └── Layout.tsx               ← Navigation shell
│   ├── hooks/
│   │   ├── useMonitors.ts           ← CRUD for monitors
│   │   ├── useQueue.ts              ← Job enqueueing + approval
│   │   ├── useRuns.ts               ← Execution history queries
│   │   ├── useUsageStats.ts         ← Cost tracking per project
│   │   └── useJobNotifications.ts   ← Realtime status via Supabase
│   └── lib/
│       ├── supabase.ts              ← Supabase client init
│       ├── types.ts                 ← TypeScript interfaces
│       └── costs.ts                 ← Cost calculation helpers
│
├── backend/                         ← Python worker
│   ├── worker.py                    ← Main job processor (runs 24/7)
│   ├── Dockerfile                   ← Python 3.11-slim for Railway
│   └── requirements.txt             ← supabase, requests, python-dotenv
│
└── supabase/                        ← Database schema + migrations
    ├── setup.sql                    ← Full production schema
    └── migrations/                  ← Incremental changes
Phase 2

Build the Dashboard

A React frontend with monitor management, a job queue with approval flow, execution history, and realtime status updates via Supabase.

06

Build the Monitor Management Page

Create, edit, and manage tracking monitors
  1. Monitor creation form

    Select mode (keyword, profile, posts, direct). Enter the input (keyword string, LinkedIn profile URL, or post URLs). Optionally set a webhook URL and schedule (daily, weekly, monthly).

  2. Monitor cards

    Display each monitor with its mode, input, schedule, and stats (total leads found, last run date, cost to date). Toggle active/inactive.

  3. Manual trigger button

    Run any monitor on demand. Creates a queue_job with status "queued" and the monitor's config as payload.

07

Build the Job Queue + Approval Flow

See pending, running, and completed jobs with realtime updates
  1. Job queue page

    List all jobs with status badges: queued (waiting), running (in progress), completed (done), failed (error), awaiting_approval (needs confirmation).

  2. Approval flow

    When estimated leads exceed 1,000, the worker sets status to "awaiting_approval". The dashboard shows the estimated count and cost. User clicks "Approve" to re-queue the job.

  3. Realtime status updates

    Subscribe to Supabase realtime on the queue_jobs table. Status changes appear live without refreshing.

// Supabase realtime subscription for job status
const channel = supabase
  .channel('job-updates')
  .on('postgres_changes', {
    event: 'UPDATE',
    schema: 'public',
    table: 'queue_jobs',
    filter: `project_id=eq.${projectId}`,
  }, (payload) => {
    updateJobInState(payload.new);
  })
  .subscribe();
08

Build the History + Usage Pages

Track every run and monitor costs
Run History
Show every execution with: monitor name, leads found vs new leads (after dedup), posts scraped, cost, and timestamp. Filter by monitor or date range.
Usage Tracking
Show total leads collected, total cost, and monthly usage against your limit. Display a warning banner when approaching 80% of monthly cap. Break down cost per monitor.
Phase 3

Build the Worker

A Python script that runs 24/7 on Railway. It polls Supabase for queued jobs, calls the LinkedIn APIs, deduplicates results, and delivers leads.

09

Build the Job Polling Loop

The worker polls Supabase every 5 seconds for new jobs
# worker.py - main loop

import os
import time

from supabase import create_client

# Service-role credentials come from the backend .env
SUPABASE_URL = os.environ["SUPABASE_URL"]
SUPABASE_KEY = os.environ["SUPABASE_KEY"]

supabase = create_client(SUPABASE_URL, SUPABASE_KEY)

while True:
    # Pick up the oldest queued job
    job = supabase.table("queue_jobs") \
        .select("*") \
        .eq("status", "queued") \
        .order("created_at") \
        .limit(1) \
        .execute()

    if job.data:
        process_job(job.data[0])
    else:
        time.sleep(5)
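The rest of the worker snippets call an update_status() helper that isn't shown above. A minimal version, assuming it just writes the transition (plus timestamps and any error message) onto the queue_jobs row:
from datetime import datetime, timezone

def update_status(job_id, status, error=None):
    # Write the status transition so the dashboard (via Supabase realtime)
    # sees queued -> running -> completed live
    fields = {"status": status}
    now = datetime.now(timezone.utc).isoformat()
    if status == "running":
        fields["started_at"] = now
    if status in ("completed", "failed"):
        fields["completed_at"] = now
    if error:
        fields["error"] = error
    supabase.table("queue_jobs").update(fields).eq("id", job_id).execute()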
10

Implement Each Scraping Mode

Different logic for keywords, profiles, posts, and direct scrapes
  1. Keyword mode

    Search LinkedIn for posts matching the keyword. Run two parallel searches (by relevance and by recency) to get a broader set. Take the top posts by engagement count. Scrape commenters + likers from each post. (A sketch of this search-and-rank step follows this list.)

  2. Profile mode

    Fetch the profile's recent posts via Limadata. For each post, scrape commenters + likers with Apify. Useful for tracking competitors or your own team's content.

  3. Posts mode (recurring)

    Take the post URLs from the monitor config. Check if engagement count has changed since last run (smart skip). If new engagement detected, scrape and deduplicate against seen_engagers. Only return NEW people.

  4. Direct mode (one-time)

    Take the post URL, scrape all commenters + likers in one shot. No dedup tracking, no repeat logic. One-time full pull.
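A sketch of the keyword search step referenced in mode 1, assuming a hypothetical limadata_search_posts() wrapper around Limadata's post-search endpoint (the real parameter names depend on their API):
def search_posts_by_keyword(keyword, top_n=20):
    # Two searches (relevance + recency) widen the candidate set
    by_relevance = limadata_search_posts(keyword, sort="relevance")
    by_recency = limadata_search_posts(keyword, sort="recent")

    # Merge, dedupe by post URL, and keep the most-engaged posts
    candidates = {p["url"]: p for p in by_relevance + by_recency}
    ranked = sorted(
        candidates.values(),
        key=lambda p: p.get("likes", 0) + p.get("comments", 0),
        reverse=True,
    )
    return ranked[:top_n]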

11

Build the Scraping Pipeline

Fetch posts, extract engagers, deduplicate, deliver
def process_job(job):
    # 1. Mark job as running
    update_status(job["id"], "running")

    # The monitor's config (mode, input, webhook_url, flags) lives in the
    # job's JSONB payload - see the queue_jobs schema above
    payload = job["payload"]

    # 2. Get post URLs based on mode
    if payload["mode"] == "keyword":
        posts = search_posts_by_keyword(payload["input"])
    elif payload["mode"] == "profile":
        posts = get_profile_posts(payload["input"])
    else:
        # posts / direct mode: the input is the post URLs themselves
        # (assuming they are stored as a comma-separated string)
        posts = [{"url": u.strip()} for u in payload["input"].split(",") if u.strip()]

    # 3. Check approval threshold
    estimated = estimate_engagers(posts)
    if estimated > 1000 and not payload.get("large_scrape_approved"):
        update_status(job["id"], "awaiting_approval")
        return

    # 4. Scrape engagers from each post
    #    (shown sequentially here; in production cap Apify calls at 3 concurrent)
    all_engagers = []
    for post in posts:
        commenters = scrape_commenters(post["url"])
        likers = scrape_likers(post["url"])
        all_engagers.extend(commenters + likers)

    # 5. Deduplicate against seen_engagers
    new_leads = deduplicate(job["monitor_id"], all_engagers)

    # 6. Deliver via webhook
    if payload.get("webhook_url") and new_leads:
        send_to_webhook(payload["webhook_url"], new_leads)

    # 7. Record results
    create_run(job, leads_found=len(all_engagers), new_leads=len(new_leads))
    update_status(job["id"], "completed")
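The send_to_webhook() call in step 6 can be a batched POST with requests. The {"leads": [...]} payload shape is an assumption - match whatever your Clay table or CRM endpoint expects:
import requests

def send_to_webhook(webhook_url, leads, batch_size=100):
    # Post leads in batches so large runs don't blow past request-size limits
    for i in range(0, len(leads), batch_size):
        resp = requests.post(
            webhook_url,
            json={"leads": leads[i:i + batch_size]},
            timeout=30,
        )
        resp.raise_for_status()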
12

Build the Deduplication System

Track every engager per monitor so repeat runs only return new leads
The seen_engagers table tracks every (monitor_id, post_url, engager_profile_url) combination. Before delivering leads, check which ones are new.
def deduplicate(monitor_id, engagers):
    new_leads = []
    for engager in engagers:
        # Check if we've already collected this person for this monitor (any post)
        existing = supabase.table("seen_engagers") \
            .select("id") \
            .eq("monitor_id", monitor_id) \
            .eq("engager_profile_url", engager["profile_url"]) \
            .execute()

        if not existing.data:
            # New lead - record it and add to output
            supabase.table("seen_engagers").insert({
                "monitor_id": monitor_id,
                "post_url": engager["post_url"],
                "engager_profile_url": engager["profile_url"],
            }).execute()
            new_leads.append(engager)

    return new_leads
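The per-engager lookup above is easy to read but costs one round trip per person. A batched variant with the same behavior (assuming the same engager dict shape) does it in two queries:
def deduplicate_batched(monitor_id, engagers):
    # One round trip: fetch everything already seen for this monitor
    seen = supabase.table("seen_engagers") \
        .select("engager_profile_url") \
        .eq("monitor_id", monitor_id) \
        .execute()
    seen_urls = {row["engager_profile_url"] for row in seen.data}

    new_leads = []
    for e in engagers:
        if e["profile_url"] not in seen_urls:
            seen_urls.add(e["profile_url"])  # also dedupe within this batch
            new_leads.append(e)

    if new_leads:
        supabase.table("seen_engagers").insert([
            {
                "monitor_id": monitor_id,
                "post_url": e["post_url"],
                "engager_profile_url": e["profile_url"],
            }
            for e in new_leads
        ]).execute()

    return new_leads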
Phase 4

Safety + Cost Controls

Prevent runaway costs, handle stuck jobs, and protect against edge cases. These are the things that break in production.

13

Implement Cost Tracking + Limits

Track spend per job, per monitor, and per month
  1. Calculate cost per job

    Track API calls and leads scraped. Cost = (engagers x cost per engager) + (API calls x cost per call). Store on each run record. (See the sketch after this list.)

  2. Monthly lead limit

    Set a monthly cap (e.g., 5,000 leads). Query the runs table for the current billing cycle. Block new jobs when approaching the limit. Show a warning banner at 80%.

  3. Per-job lead cap

    Hard limit per individual job (e.g., 5,000 leads). Anything above 1,000 requires the approval flow before running.
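A sketch of both checks, using the rough unit costs quoted earlier (adjust to your real rates) and the runs table from the schema:
COST_PER_ENGAGER = 0.002    # ~$2 per 1,000 engagers (Apify)
COST_PER_API_CALL = 0.04    # ~$0.04 per Limadata call

def job_cost(engagers_scraped, api_calls):
    return engagers_scraped * COST_PER_ENGAGER + api_calls * COST_PER_API_CALL

def leads_this_cycle(project_id, cycle_start_iso):
    # Sum new leads collected since the start of the billing cycle
    rows = supabase.table("runs") \
        .select("new_leads") \
        .eq("project_id", project_id) \
        .gte("created_at", cycle_start_iso) \
        .execute()
    return sum(r["new_leads"] for r in rows.data)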

14

Handle Failures + Edge Cases

Timeouts, stale jobs, zero results, and API errors

Job timeout

Set a 10-minute hard timeout per job. If a job is still running after 10 minutes, mark it as failed and log the error.
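One way to enforce the hard timeout: run each job in a worker thread and stop waiting after 10 minutes. A sketch (note the scrape thread itself keeps running; the job is just marked failed so the queue doesn't stall):
from concurrent.futures import ThreadPoolExecutor, TimeoutError

JOB_TIMEOUT_SECONDS = 10 * 60
pool = ThreadPoolExecutor(max_workers=1)

def run_with_timeout(job):
    future = pool.submit(process_job, job)
    try:
        future.result(timeout=JOB_TIMEOUT_SECONDS)
    except TimeoutError:
        update_status(job["id"], "failed", error="timed out after 10 minutes")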

Stale job recovery

On startup, scan for jobs stuck in "running" for more than 15 minutes. Reset them to "queued" so they get retried.
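A minimal recovery pass to run once at worker startup, assuming started_at is set when a job flips to "running":
from datetime import datetime, timedelta, timezone

def recover_stale_jobs(max_age_minutes=15):
    # Re-queue anything stuck in "running" longer than the cutoff
    cutoff = (datetime.now(timezone.utc) - timedelta(minutes=max_age_minutes)).isoformat()
    supabase.table("queue_jobs") \
        .update({"status": "queued", "started_at": None}) \
        .eq("status", "running") \
        .lt("started_at", cutoff) \
        .execute()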

Zero engagers

If Apify returns 0 results, mark the job as failed instead of completed. Something went wrong - don't record it as a successful run.

Parallel limits

Cap concurrent Apify actor calls to 3. More than that and you hit rate limits or timeouts. Use a semaphore or simple counter.
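A thread pool with three workers is the simplest cap. scrape_post() here is a hypothetical helper that combines the commenter and liker scrapes for one post:
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_SCRAPES = 3

def scrape_post(post):
    return scrape_commenters(post["url"]) + scrape_likers(post["url"])

def scrape_all(posts):
    all_engagers = []
    # Never more than 3 Apify actor calls in flight at once
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_SCRAPES) as pool:
        for engagers in pool.map(scrape_post, posts):
            all_engagers.extend(engagers)
    return all_engagers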

Smart skip (posts mode)

Before scraping a tracked post again, check if the engagement count changed since last run. If not, skip it - no new people to find.
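A sketch of the skip check, assuming you persist the last engagement count per tracked post (for example in the run's details JSONB) and a hypothetical get_post_metadata() wrapper around Limadata's post-metadata endpoint:
def post_needs_scraping(post_url, last_known_count):
    # Compare current engagement with what we saw on the previous run
    meta = get_post_metadata(post_url)  # hypothetical Limadata wrapper
    current = meta.get("likes", 0) + meta.get("comments", 0)
    return current > last_known_count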

Worker restart

Configure Railway (or your host) to auto-restart on failure. Set retry limits (e.g., 10 restarts) before alerting you.

Phase 5

Deploy Everything

Frontend to Netlify, worker to Railway, database already on Supabase. Three services, each deployed independently.

15

Deploy the Frontend to Netlify

Build and drag-drop, or connect to GitHub for auto-deploy
  1. Build the project

    Run npm run build. This creates a dist/ folder with the static site.

  2. Deploy to Netlify

    Drag the dist/ folder to Netlify, or connect your GitHub repo for automatic deploys on every push.

  3. Add environment variables

    Set VITE_SUPABASE_URL and VITE_SUPABASE_ANON_KEY in Netlify's environment variable settings.

16

Deploy the Worker to Railway

Docker container running 24/7
  1. Create a Dockerfile

    Python 3.11-slim base image. Copy requirements.txt and worker.py. Install dependencies. Set the entrypoint to run worker.py.

  2. Push to Railway

    Connect your GitHub repo to Railway. It detects the Dockerfile and deploys automatically. Set environment variables in Railway's dashboard.

  3. Configure restart policy

    Set the service to auto-restart on failure with a retry limit. The worker should run continuously.

# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY worker.py .
CMD ["python", "worker.py"]
The Build Order
Start to Finish
Follow this sequence. Start with the database and worker, then build the frontend around it.
  1. Supabase setup - create project, run schema, enable RLS
  2. Get API access - Limadata + Apify API credentials
  3. Worker skeleton - polling loop + job status updates
  4. Direct mode - simplest: paste URL, scrape, return results
  5. Keyword mode - search posts, scrape top results
  6. Profile mode - fetch profile posts, scrape engagers
  7. Deduplication - seen_engagers table + dedup logic
  8. Posts mode - recurring scrape with smart skip
  9. Webhook delivery - send leads to Clay or any endpoint
  10. Approval flow - threshold check + awaiting_approval status
  11. Cost tracking - per-job and monthly limits
  12. Frontend dashboard - monitors, queue, history, usage
  13. Realtime updates - Supabase subscriptions for live status
  14. Deploy worker - Railway with Docker + auto-restart
  15. Deploy frontend - Netlify with env vars
Cost breakdown: About $6 per 1,000 leads ($2 from Apify scraping + $4 from Limadata calls). Infrastructure (Supabase + Railway + Netlify) runs under $10/month on free/starter tiers. The whole system is easy to change: when you want to add a new mode, tweak the scraping logic, or adjust cost limits, describe the change to Claude Code. Done in minutes.
More GTM Engineering Tools
Claude Code GTM Framework
The master framework for structuring Claude Code projects for go-to-market operations. Folder architecture, governance, AI agents, and knowledge hierarchies for B2B SaaS.
AI-Powered Revenue Engine
Replace your $10K/year lead routing tool. Capture, qualify, route, book, enrich, and recover leads - serverless, under $50/month. Full step-by-step SOP.