# Job Board System

The JSON Resume job board is an AI-powered system that matches your resume with relevant job opportunities from Hacker News “Who is Hiring?” threads.
## Overview

The job board system consists of three main stages:

1. **Fetch**: Scrape job postings from HN “Who is Hiring?” threads
2. **Process**: Use AI to extract structured data from job posts
3. **Vectorize**: Create semantic embeddings for intelligent matching
## How It Works

### Data Pipeline

```text
┌──────────────────────────────────────────────────────────────┐
│                      Data Flow Diagram                       │
└──────────────────────────────────────────────────────────────┘

1. Fetch Jobs
   │
   ├─> Hacker News API
   ├─> Find "Ask HN: Who is hiring?" threads
   └─> Extract job comments → Supabase jobs table
        │
        ├─ uuid (unique identifier)
        ├─ hn_id (Hacker News comment ID)
        ├─ content (raw job description text)
        └─ posted_at (timestamp)

2. AI Processing (GPT-5-mini)
   │
   ├─> For each job without gpt_content
   ├─> OpenAI API extracts:
   │    ├─ Company name
   │    ├─ Job title
   │    ├─ Location (city, state, country, remote status)
   │    ├─ Job type (full-time, part-time, contract, etc.)
   │    ├─ Salary range
   │    ├─ Required skills
   │    ├─ Description
   │    └─ Application URL
   └─> Store as JSON in gpt_content column

3. Vectorization
   │
   ├─> For each processed job
   ├─> Generate embedding using OpenAI ada-002
   ├─> Store vector in Pinecone for semantic search
   └─> Enable resume-job matching based on skills/experience
```

### Semantic Matching
When you upload your resume, the system:
- Extracts your skills, experience, and preferences
- Generates a semantic vector from your resume
- Queries Pinecone for jobs with similar vectors
- Ranks jobs by relevance to your background
- Displays matched jobs with compatibility scores
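The ranking step above can be sketched with plain cosine similarity. This is a simplified illustration, not the production code: in the real pipeline the embeddings come from OpenAI ada-002 and the similarity search happens inside Pinecone, and the three-dimensional vectors here are toy values.

```javascript
// Score each job against a resume embedding using cosine similarity,
// then sort highest-similarity first (what Pinecone does at scale).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankJobs(resumeVector, jobs) {
  return jobs
    .map((job) => ({ ...job, score: cosineSimilarity(resumeVector, job.vector) }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankJobs([1, 0, 0], [
  { title: 'Backend Engineer', vector: [0.9, 0.1, 0] },
  { title: 'Designer', vector: [0, 1, 0] },
]);
console.log(ranked[0].title); // → Backend Engineer
```

The `score` field doubles as the compatibility score shown next to each matched job.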
## Automated Processing

The job board runs automatically via GitHub Actions:

- **Schedule**: Daily at 9 AM UTC (1 AM PST / 4 AM EST)
- **Manual Trigger**: Available from the GitHub Actions UI
- **Monitoring**: Discord notifications on failures
- **Auto-Recovery**: Creates GitHub issues for repeated failures
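The failure notification can be sketched as a simple POST to a Discord webhook. This is an illustrative sketch, not the repository's actual code; the `DISCORD_WEBHOOK_URL` variable name and `notifyFailure` helper are assumptions.

```javascript
// Build the JSON payload Discord webhooks accept ({ content: "..." }).
function buildFailurePayload(stage, error) {
  return {
    content: `❌ Job board stage "${stage}" failed: ${error.message}`,
  };
}

// Post the payload to the webhook; skipped when no URL is configured
// (e.g. local runs). Uses the global fetch available in Node 18+.
async function notifyFailure(stage, error) {
  const url = process.env.DISCORD_WEBHOOK_URL; // assumed env var name
  if (!url) return;
  await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildFailurePayload(stage, error)),
  });
}
```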
### Workflow Stages

Each run processes jobs through three stages:

| Stage | Description | Duration |
|---|---|---|
| 🔍 Fetch | Scrape latest HN job posts | ~30 seconds |
| 🤖 AI Process | Extract structured data with GPT-5-mini | ~45-60 minutes |
| 🔢 Vectorize | Generate embeddings for search | ~15-30 minutes |
## Local Development

### Prerequisites

```bash
# Required environment variables
OPENAI_API_KEY=sk-...       # OpenAI API key
SUPABASE_KEY=eyJh...        # Supabase service role key
PINECONE_API_KEY=xxx-...    # Pinecone API key
PINECONE_ENVIRONMENT=us-... # Pinecone environment
```

### Running Scripts
```bash
# Navigate to registry app
cd apps/registry

# 1. Fetch latest jobs from HN
node scripts/jobs/getLatestWhoIsHiring.js

# 2. Process jobs with AI
node scripts/jobs/gpted.js

# 3. Vectorize processed jobs
node scripts/jobs/vectorize.js
```

### Monitoring Progress
```sql
-- Check processing status
SELECT
  COUNT(*) FILTER (WHERE gpt_content IS NULL) AS unprocessed,
  COUNT(*) FILTER (WHERE gpt_content IS NOT NULL AND gpt_content != 'FAILED') AS processed,
  COUNT(*) FILTER (WHERE gpt_content = 'FAILED') AS failed,
  COUNT(*) AS total
FROM jobs
WHERE posted_at >= NOW() - INTERVAL '4 months';

-- Find recent failures
SELECT uuid, content, gpt_content, posted_at
FROM jobs
WHERE gpt_content = 'FAILED'
ORDER BY posted_at DESC
LIMIT 10;

-- Check vectorization status
SELECT COUNT(*)
FROM jobs
WHERE gpt_content IS NOT NULL
  AND gpt_content != 'FAILED'
  AND uuid IN (SELECT metadata->>'uuid' FROM pinecone_vectors);
```

## Performance & Costs
### OpenAI API Usage

- **Model**: GPT-5-mini (cost-effective for extraction tasks)
- **Input**: ~500-1000 tokens per job post
- **Output**: ~200-300 tokens of structured JSON
- **Cost**: ~$0.0002 per job processed
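As a back-of-envelope check of the per-job figure above, the token counts can be multiplied by per-token prices. The prices below are illustrative assumptions chosen to reproduce the ~$0.0002 estimate, not official GPT-5-mini pricing.

```javascript
// Illustrative per-1K-token prices (assumptions, not published rates).
const INPUT_PRICE_PER_1K = 0.0001;  // USD per 1K input tokens
const OUTPUT_PRICE_PER_1K = 0.0004; // USD per 1K output tokens

// Cost of one job = input cost + output cost.
function jobCost(inputTokens, outputTokens) {
  return (inputTokens / 1000) * INPUT_PRICE_PER_1K +
         (outputTokens / 1000) * OUTPUT_PRICE_PER_1K;
}

// ~1000 input + ~300 output tokens per job lands near $0.0002
console.log(jobCost(1000, 300).toFixed(5)); // → 0.00022
```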
### Expected Monthly Costs

| Component | Cost |
|---|---|
| OpenAI API (500-1000 jobs/month) | $0.10 - $0.20 |
| Pinecone Free Tier | $0.00 |
| Supabase Free Tier | $0.00 |
| **Total** | ~$0.10 - $0.20/month |
## Troubleshooting

### Jobs Not Processing

- Check that the OpenAI API key is valid and has credits
- Verify the Supabase connection
- Check GitHub Actions logs for errors
- Look for rate limiting (429 errors)
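When 429s do show up, the usual fix is to retry with exponential backoff. A minimal sketch, assuming the caller passes in any async API call and that the thrown error carries a `status` field (as OpenAI SDK errors do):

```javascript
// Delay grows 1s, 2s, 4s, ... per attempt.
const backoffDelay = (attempt) => 1000 * 2 ** attempt;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry callFn on 429 responses up to maxRetries times; rethrow anything else.
async function withBackoff(callFn, maxRetries = 5) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await callFn();
    } catch (err) {
      const rateLimited = err.status === 429;
      if (!rateLimited || attempt === maxRetries) throw err;
      await sleep(backoffDelay(attempt));
    }
  }
}
```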
### Vectorization Failures

- Ensure the Pinecone API key is correct
- Verify the Pinecone index exists
- Check network connectivity
- Look for quota limits
### No New Jobs

- HN “Who is Hiring?” threads are posted monthly (usually on the first of the month)
- The script only fetches from recent threads (last 30 days)
- Check that the HN API is accessible
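A quick way to check HN API access is to look up the latest hiring thread directly. This sketch uses the public Hacker News Firebase API and the `whoishiring` account that posts the monthly threads; it is an illustration of the lookup, not the code in `getLatestWhoIsHiring.js`.

```javascript
const HN_API = 'https://hacker-news.firebaseio.com/v0';

// Matches titles like "Ask HN: Who is hiring? (June 2025)".
const isWhoIsHiring = (title) => /who is hiring\?/i.test(title || '');

// The whoishiring account also posts "Who wants to be hired?" and
// "Freelancer?" threads, so filter by title among its recent submissions.
async function findLatestHiringThread() {
  const res = await fetch(`${HN_API}/user/whoishiring.json`);
  const { submitted } = await res.json();
  for (const id of submitted.slice(0, 10)) {
    const item = await (await fetch(`${HN_API}/item/${id}.json`)).json();
    if (isWhoIsHiring(item.title)) return item; // item.kids = job comment IDs
  }
  return null;
}
```

If this returns `null` or the fetches fail, the problem is upstream of the pipeline.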
## Advanced Configuration

### Customize AI Extraction

Edit `apps/registry/scripts/jobs/gpted.js` to modify the extraction prompt:

```javascript
const prompt = `
Extract job details from this posting.
Return JSON with: company, title, location, type, salary, skills, description, url
`;
```

### Adjust Processing Rate
Modify the batch size and delay in `gpted.js`:

```javascript
const BATCH_SIZE = 10; // Process 10 jobs at a time
const DELAY_MS = 2000; // 2 second delay between batches
```

## Schema
### Jobs Table

```sql
CREATE TABLE jobs (
  uuid UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  hn_id TEXT UNIQUE NOT NULL,
  content TEXT NOT NULL,
  gpt_content JSONB,
  posted_at TIMESTAMP WITH TIME ZONE,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

### GPT Content Format
```json
{
  "company": "Acme Corp",
  "title": "Senior Software Engineer",
  "location": {
    "city": "San Francisco",
    "state": "CA",
    "country": "USA",
    "remote": "hybrid"
  },
  "type": "full-time",
  "salary": "$150k-$200k",
  "skills": ["JavaScript", "React", "Node.js"],
  "description": "We're looking for...",
  "url": "https://acme.com/jobs/123"
}
```

## Related Documentation
- **Architecture** - System design and technical details
- **API Reference** - Endpoints for job search and matching
- **Contributing** - Help improve the job board

For detailed implementation information, see the comprehensive README in `apps/registry/scripts/jobs/README.md`.