# Job Board System
The JSON Resume job board is an AI-powered system that matches your resume with relevant job opportunities from Hacker News “Who is Hiring?” threads.
## Overview
The job board system consists of three main stages:
- Fetch: Scrape job postings from HN “Who is Hiring?” threads
- Process: Use AI to extract structured data from job posts
- Vectorize: Create semantic embeddings for intelligent matching
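The three stages above run in sequence; a minimal sketch of the orchestration (function names and data shapes are illustrative stubs, not the actual script exports):

```javascript
// Illustrative pipeline skeleton -- the real scripts live in
// apps/registry/scripts/jobs/; these stubs only show the ordering.
async function fetchJobs() {
  // would scrape HN "Who is Hiring?" comments into the jobs table
  return [{ hn_id: '1', content: 'Acme Corp | Senior Engineer | SF' }];
}

async function processJobs(jobs) {
  // would call the OpenAI API to extract structured fields
  return jobs.map((job) => ({ ...job, gpt_content: { company: 'Acme Corp' } }));
}

async function vectorizeJobs(jobs) {
  // would generate embeddings and upsert them into Pinecone
  return jobs.map((job) => ({ ...job, vectorized: true }));
}

async function runPipeline() {
  const raw = await fetchJobs();
  const processed = await processJobs(raw);
  return vectorizeJobs(processed);
}

runPipeline().then((jobs) => console.log(jobs[0].vectorized)); // true
```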
## How It Works

### Data Pipeline
```text
┌──────────────────────────────────────────────────────────────┐
│                      Data Flow Diagram                        │
└──────────────────────────────────────────────────────────────┘
1. Fetch Jobs
   │
   ├─> Hacker News API
   ├─> Find "Ask HN: Who is hiring?" threads
   └─> Extract job comments → Supabase jobs table
       │
       ├─ uuid (unique identifier)
       ├─ hn_id (Hacker News comment ID)
       ├─ content (raw job description text)
       └─ posted_at (timestamp)
2. AI Processing (GPT-5-mini)
   │
   ├─> For each job without gpt_content
   ├─> OpenAI API extracts:
   │   ├─ Company name
   │   ├─ Job title
   │   ├─ Location (city, state, country, remote status)
   │   ├─ Job type (full-time, part-time, contract, etc.)
   │   ├─ Salary range
   │   ├─ Required skills
   │   ├─ Description
   │   └─ Application URL
   └─> Store as JSON in gpt_content column
3. Vectorization
   │
   ├─> For each processed job
   ├─> Generate embedding using OpenAI ada-002
   ├─> Store vector in Pinecone for semantic search
   └─> Enable resume-job matching based on skills/experience
```

### Semantic Matching
When you upload your resume, the system:
- Extracts your skills, experience, and preferences
- Generates a semantic vector from your resume
- Queries Pinecone for jobs with similar vectors
- Ranks jobs by relevance to your background
- Displays matched jobs with compatibility scores
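Conceptually, the ranking step reduces to cosine similarity between the resume vector and each job vector. A self-contained sketch (in production Pinecone performs this comparison server-side, and the tiny 3-dimensional vectors here stand in for real embedding vectors):

```javascript
// Cosine similarity: dot product of two vectors divided by
// the product of their magnitudes. 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every job against the resume vector and sort by relevance.
function rankJobs(resumeVector, jobs) {
  return jobs
    .map((job) => ({ ...job, score: cosineSimilarity(resumeVector, job.vector) }))
    .sort((x, y) => y.score - x.score);
}

const resume = [1, 0, 1];
const jobs = [
  { title: 'Frontend Engineer', vector: [1, 0, 1] }, // same direction as resume
  { title: 'Data Analyst', vector: [0, 1, 0] },      // orthogonal to resume
];
const ranked = rankJobs(resume, jobs);
console.log(ranked[0].title); // "Frontend Engineer"
```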
## Automated Processing
The job board runs automatically via GitHub Actions:
- Schedule: Daily at 9 AM UTC (1 AM PST / 4 AM EST)
- Manual Trigger: Available from GitHub Actions UI
- Monitoring: Discord notifications on failures
- Auto-Recovery: Creates GitHub issues for repeated failures
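The schedule and manual trigger above correspond to a GitHub Actions configuration along these lines (the workflow file name and steps are illustrative, not copied from the repository):

```yaml
name: job-board-pipeline
on:
  schedule:
    - cron: '0 9 * * *'   # daily at 9 AM UTC
  workflow_dispatch: {}    # manual trigger from the Actions UI
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: node apps/registry/scripts/jobs/getLatestWhoIsHiring.js
      - run: node apps/registry/scripts/jobs/gpted.js
      - run: node apps/registry/scripts/jobs/vectorize.js
```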
### Workflow Stages
Each run processes jobs through three stages:
| Stage | Description | Duration |
|---|---|---|
| 🔍 Fetch | Scrape latest HN job posts | ~30 seconds |
| 🤖 AI Process | Extract structured data with GPT-5-mini | ~45-60 minutes |
| 🔢 Vectorize | Generate embeddings for search | ~15-30 minutes |
## Local Development
### Prerequisites

```bash
# Required environment variables
OPENAI_API_KEY=sk-...        # OpenAI API key
SUPABASE_KEY=eyJh...         # Supabase service role key
PINECONE_API_KEY=xxx-...     # Pinecone API key
PINECONE_ENVIRONMENT=us-...  # Pinecone environment
```

### Running Scripts
```bash
# Navigate to registry app
cd apps/registry

# 1. Fetch latest jobs from HN
node scripts/jobs/getLatestWhoIsHiring.js

# 2. Process jobs with AI
node scripts/jobs/gpted.js

# 3. Vectorize processed jobs
node scripts/jobs/vectorize.js
```

### Monitoring Progress
```sql
-- Check processing status
SELECT
  COUNT(*) FILTER (WHERE gpt_content IS NULL) AS unprocessed,
  COUNT(*) FILTER (WHERE gpt_content IS NOT NULL AND gpt_content != 'FAILED') AS processed,
  COUNT(*) FILTER (WHERE gpt_content = 'FAILED') AS failed,
  COUNT(*) AS total
FROM jobs
WHERE posted_at >= NOW() - INTERVAL '4 months';

-- Find recent failures
SELECT uuid, content, gpt_content, posted_at
FROM jobs
WHERE gpt_content = 'FAILED'
ORDER BY posted_at DESC
LIMIT 10;

-- Check vectorization status
SELECT COUNT(*)
FROM jobs
WHERE gpt_content IS NOT NULL
  AND gpt_content != 'FAILED'
  AND uuid IN (SELECT metadata->>'uuid' FROM pinecone_vectors);
```

## Performance & Costs
### OpenAI API Usage
- Model: GPT-5-mini (cost-effective for extraction tasks)
- Input: ~500-1000 tokens per job post
- Output: ~200-300 tokens structured JSON
- Cost: ~$0.0002 per job processed
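A quick sanity check of the monthly figure implied by the per-job cost above (the rate and job volume are taken from this document, not measured):

```javascript
// Back-of-the-envelope check of the monthly OpenAI cost estimate.
const costPerJob = 0.0002;      // ~$0.0002 per job, per the figures above
const jobsPerMonthLow = 500;    // low end of expected monthly volume
const jobsPerMonthHigh = 1000;  // high end of expected monthly volume

const monthlyLow = costPerJob * jobsPerMonthLow;
const monthlyHigh = costPerJob * jobsPerMonthHigh;

console.log(`$${monthlyLow.toFixed(2)} - $${monthlyHigh.toFixed(2)}`); // "$0.10 - $0.20"
```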
### Expected Monthly Costs
| Component | Cost |
|---|---|
| OpenAI API (500-1000 jobs/month) | $0.10 - $0.20 |
| Pinecone Free Tier | $0.00 |
| Supabase Free Tier | $0.00 |
| Total | ~$0.10 - $0.20/month |
## Troubleshooting

### Jobs Not Processing
- Check OpenAI API key is valid and has credits
- Verify Supabase connection
- Check GitHub Actions logs for errors
- Look for rate limiting (429 errors)
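When 429s appear, retrying with exponential backoff is one common remedy; a sketch (the actual scripts may handle retries differently):

```javascript
// Illustrative retry helper: retry a call that fails with HTTP 429,
// doubling the delay after each attempt.
async function withBackoff(fn, retries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || attempt >= retries) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Mock API call that rate-limits twice, then succeeds.
let calls = 0;
async function flakyApi() {
  calls++;
  if (calls < 3) throw Object.assign(new Error('rate limited'), { status: 429 });
  return 'ok';
}

withBackoff(flakyApi, 3, 10).then((result) => console.log(result)); // "ok"
```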
### Vectorization Failures
- Ensure Pinecone API key is correct
- Verify Pinecone index exists
- Check network connectivity
- Look for quota limits
### No New Jobs
- HN “Who is Hiring?” posts monthly (usually first of month)
- Script only fetches from recent threads (last 30 days)
- Check HN API is accessible
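The 30-day thread filter can be sketched as a predicate over HN items (the `title` and `time` fields mirror the HN API's item schema; the helper itself is illustrative):

```javascript
// Keep only "Who is hiring?" threads posted within the last 30 days.
// HN item timestamps are Unix seconds, so convert to milliseconds.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function findRecentHiringThreads(items, now = Date.now()) {
  return items.filter(
    (item) =>
      /who is hiring\?/i.test(item.title || '') &&
      now - item.time * 1000 <= THIRTY_DAYS_MS
  );
}

const now = Date.now();
const items = [
  { title: 'Ask HN: Who is hiring? (June 2025)', time: (now - 5 * 24 * 3600 * 1000) / 1000 },
  { title: 'Ask HN: Who is hiring? (January 2025)', time: (now - 150 * 24 * 3600 * 1000) / 1000 },
  { title: 'Show HN: My new project', time: now / 1000 },
];
console.log(findRecentHiringThreads(items, now).length); // 1
```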
## Advanced Configuration

### Customize AI Extraction

Edit `apps/registry/scripts/jobs/gpted.js` to modify the extraction prompt:

```javascript
const prompt = `
Extract job details from this posting.
Return JSON with: company, title, location, type, salary, skills, description, url
`;
```

### Adjust Processing Rate
Modify the batch size and delay in `gpted.js`:

```javascript
const BATCH_SIZE = 10; // Process 10 jobs at a time
const DELAY_MS = 2000; // 2 second delay between batches
```

## Schema
### Jobs Table

```sql
CREATE TABLE jobs (
  uuid UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  hn_id TEXT UNIQUE NOT NULL,
  content TEXT NOT NULL,
  gpt_content JSONB,
  posted_at TIMESTAMP WITH TIME ZONE,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

### GPT Content Format
```json
{
  "company": "Acme Corp",
  "title": "Senior Software Engineer",
  "location": {
    "city": "San Francisco",
    "state": "CA",
    "country": "USA",
    "remote": "hybrid"
  },
  "type": "full-time",
  "salary": "$150k-$200k",
  "skills": ["JavaScript", "React", "Node.js"],
  "description": "We're looking for...",
  "url": "https://acme.com/jobs/123"
}
```

## Related Documentation
- Architecture - System design and technical details
- API Reference - Endpoints for job search and matching
- Contributing - Help improve the job board
For detailed implementation information, see the comprehensive README in `apps/registry/scripts/jobs/README.md`.