Recruit AI is an AI-driven recruitment platform that performs bidirectional matching between job descriptions and talent profiles. The system uses vector embeddings and multi-factor scoring to identify the best fits — recruiters find candidates for positions, and talent profiles get matched to relevant opportunities.
Built with a hexagonal architecture powered by Effect.ts, the platform separates core business logic from infrastructure concerns, making it fully testable and adapter-swappable.
Intelligent Matching System
The matching engine operates in two phases: semantic vector retrieval followed by multi-factor scoring. The current formula is a starting baseline — the real strategy is eval-driven optimization with recruiter-labeled ground truth data.
Matching Pipeline
- Hard Constraint Filtering: Work mode, location, and relocation preferences applied before vector search
- Vector Retrieval: Semantic similarity search via Qdrant with 3072-dimensional Gemini embeddings
- Multi-Factor Scoring: Weighted combination of similarity, keywords, experience, and constraints
- Bidirectional Matching: Jobs to talents and talents to jobs with the same scoring engine
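The multi-factor scoring step above can be sketched as a weighted linear combination of per-factor scores. This is an illustrative baseline only: the factor names follow the list above, but the weights and normalization are assumptions, not the production formula (which the README notes is meant to be tuned via evals).

```typescript
// Hypothetical multi-factor scoring sketch: each factor is scored in [0, 1]
// and combined with weights. Weights here are placeholders, not tuned values.
type FactorScores = {
  similarity: number; // cosine similarity from vector retrieval
  keywords: number; // keyword overlap
  experience: number; // seniority / years-of-experience fit
  constraints: number; // soft-constraint satisfaction
};

type Weights = FactorScores;

// Baseline weights -- explicitly a rough starting point, per the text above.
const baselineWeights: Weights = {
  similarity: 0.5,
  keywords: 0.2,
  experience: 0.2,
  constraints: 0.1,
};

export function matchScore(
  scores: FactorScores,
  w: Weights = baselineWeights
): number {
  const total = w.similarity + w.keywords + w.experience + w.constraints;
  const raw =
    w.similarity * scores.similarity +
    w.keywords * scores.keywords +
    w.experience * scores.experience +
    w.constraints * scores.constraints;
  return raw / total; // normalize so the score stays in [0, 1]
}
```

Because both directions of matching (jobs-to-talents and talents-to-jobs) share one scoring engine, a single pure function like this can serve both query paths.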
Evaluation-Driven Development
The scoring formula weights are intentionally a rough starting point. The architecture is designed so that the formula can be systematically improved through evaluations once a proper labeled dataset is available. The project defines five evaluation groups that progressively validate and optimize each layer of the system:
Component Evals
- LLM Extraction Evals: Field-level accuracy and keyword F1 scores for resume and job description parsing across multiple LLM providers
- Embedding Model Comparison: Recall@10, MRR, and separation gap between good/poor matches across embedding providers
- Retrieval Quality: Filter correctness and eligible candidate recall with latency benchmarks across vector DB options
Scoring & End-to-End Evals
- Weight Optimization: Grid and Bayesian search across weight combinations, measured by rank correlation with recruiter preferences
- Factor Ablation: NDCG@10 impact analysis when removing individual scoring factors
- Agent-Driven Discovery: Agentic coding loop that iteratively modifies the scoring function and optimizes toward target metrics
- Full Pipeline Eval: End-to-end comparison of system rankings against recruiter-preferred orderings
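For reference, the NDCG@10 metric named in the factor-ablation eval can be computed as below. This is the standard formula, not code from the project; `relevance` stands for a recruiter-assigned grade per ranked item.

```typescript
// Standard NDCG@k: discounted cumulative gain of the system's ranking,
// normalized by the gain of the ideal (sorted-by-relevance) ranking.
function dcg(relevance: number[], k: number): number {
  let sum = 0;
  for (let i = 0; i < Math.min(k, relevance.length); i++) {
    sum += (2 ** relevance[i] - 1) / Math.log2(i + 2);
  }
  return sum;
}

export function ndcgAtK(ranked: number[], k = 10): number {
  const ideal = [...ranked].sort((a, b) => b - a);
  const idcg = dcg(ideal, k);
  return idcg === 0 ? 0 : dcg(ranked, k) / idcg;
}
```

Running the ablation means recomputing NDCG@10 with one scoring factor zeroed out and comparing against the full formula's score.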
The eval framework follows Anthropic's methodology — tasks, trials, and graders — with strict separation between dev sets (60%) for iteration and held-out test sets (40%) for final comparison. Both deterministic code-based graders and LLM rubric graders are used to capture objective accuracy and subjective quality.
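One way to keep the 60/40 dev/test split stable is to partition by a hash of each labeled example's ID, so an example never migrates between sets as the dataset grows. The hash choice and bucket scheme below are assumptions for illustration, not the project's actual implementation.

```typescript
// Deterministic dev/test split: hash the stable ID into 100 buckets and send
// buckets 0-59 to dev, 60-99 to the held-out test set. FNV-1a is used here
// purely as a simple, dependency-free hash.
function bucket(id: string, buckets = 100): number {
  let h = 2166136261; // FNV-1a 32-bit offset basis
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 16777619); // FNV prime
  }
  return (h >>> 0) % buckets;
}

export function splitExamples<T extends { id: string }>(examples: T[]) {
  const dev: T[] = [];
  const test: T[] = [];
  for (const ex of examples) {
    (bucket(ex.id) < 60 ? dev : test).push(ex); // 60% dev, 40% held-out
  }
  return { dev, test };
}
```

The key property is determinism: rerunning the split on an expanded dataset leaves every previously assigned example in its original set, which protects the held-out set from contamination during iteration.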
AI-Powered Data Processing
The platform uses LLMs for structured field extraction from resumes and job descriptions, enriching raw text into queryable structured data.
Structured Extraction
- Resume Parsing: LLM extracts skills, experience, preferences, and constraints from uploaded resumes
- Job Description Analysis: Automatic extraction of requirements, responsibilities, and qualifications
- Interactive Clarification: AI asks follow-up questions to fill gaps in submitted data
- Embedding Generation: Automatic vector indexing for all processed entities
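A gap check like the one below could drive the interactive clarification step: after extraction, any field the LLM could not fill becomes a follow-up question. The schema and field names are hypothetical; the real extraction schema lives in the core package.

```typescript
// Hypothetical extraction target: nullable fields represent gaps the LLM
// could not fill from the raw resume text.
type ExtractedProfile = {
  skills: string[];
  yearsExperience: number | null;
  workMode: "remote" | "hybrid" | "onsite" | null;
  location: string | null;
};

// Returns the fields still missing after extraction -- i.e. the gaps the
// clarification step would ask the user about.
export function missingFields(p: ExtractedProfile): string[] {
  const gaps: string[] = [];
  if (p.skills.length === 0) gaps.push("skills");
  if (p.yearsExperience === null) gaps.push("yearsExperience");
  if (p.workMode === null) gaps.push("workMode");
  if (p.location === null) gaps.push("location");
  return gaps;
}
```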
Status-Gated Visibility
- Processing Pipeline: Entities marked "extracting" remain invisible until fully processed
- Idempotent Upserts: Failed vector writes are safely retryable without data corruption
- Consistency Model: No cross-system transactions — status fields prevent partial data visibility
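The status-gating and idempotent-upsert ideas can be shown in a few lines. This is a minimal sketch under assumed status names: reads filter to fully processed entities, and writes are keyed by entity ID so retries overwrite rather than duplicate.

```typescript
// Assumed processing states; only "ready" entities are ever visible to reads.
type Status = "extracting" | "embedding" | "ready" | "failed";

type Entity = { id: string; status: Status };

// Every read path applies this gate, so partially processed rows never
// surface even though there is no cross-system transaction.
export function visible<T extends Entity>(rows: T[]): T[] {
  return rows.filter((r) => r.status === "ready");
}

// Idempotent upsert: retrying a failed write hits the same key, so the store
// ends up in the same state no matter how many times the write is repeated.
export function upsert<T extends Entity>(store: Map<string, T>, row: T): void {
  store.set(row.id, row);
}
```

Together these two properties give eventual consistency without distributed transactions: a crash between the relational write and the vector write leaves the entity invisible, and the retry converges to the correct state.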
Hexagonal Architecture
The system follows a ports & adapters pattern with zero infrastructure dependencies in the core domain layer.
Core Domain
- Pure Scoring Functions: Matching logic isolated as testable pure functions with no Effect dependencies
- Domain Models: Type-safe entity definitions for talents, jobs, and matches
- Port Interfaces: Abstract service boundaries defined as Effect Context Tags
Adapters & Infrastructure
- PostgreSQL Adapter: Drizzle ORM for relational data storage with type-safe migrations
- Qdrant Adapter: Vector search with payload pre-filtering on hard constraints
- AI Adapter: Gemini 2.0 via Vercel AI SDK for embeddings and structured extraction
Why Qdrant Over pgvector
- Pre-Filtering: Native payload index filtering before ANN search eliminates wasted similarity comparisons
- High-Dimensional Support: 3072-dimensional Gemini embeddings exceed pgvector's 2,000-dimension limit for indexed vectors
- Batch Constraint Filtering: Efficient hard constraint application before expensive vector operations
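The pre-filtering argument can be illustrated with a toy search: apply the hard constraint first, then rank only the survivors by cosine similarity. Qdrant does this natively with payload indexes before ANN search; the brute-force scan here is purely didactic.

```typescript
// Toy "filter before similarity" search. In production this is Qdrant payload
// pre-filtering + ANN; here it is an exhaustive scan for illustration.
type Candidate = {
  id: string;
  vector: number[];
  workMode: "remote" | "hybrid" | "onsite";
};

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export function search(
  query: number[],
  pool: Candidate[],
  workMode: Candidate["workMode"],
  k: number
) {
  return pool
    .filter((c) => c.workMode === workMode) // hard constraint first
    .map((c) => ({ id: c.id, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score) // best match first
    .slice(0, k);
}
```

Filtering first means similarity is never computed for ineligible candidates, which is exactly the waste the post-filtering approach (search, then discard) incurs.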
Effect.ts Throughout
The entire backend is built with Effect.ts, providing typed dependency injection, errors as values, and streaming support.
- Dependency Injection: Services defined as `Context.Tag` for compile-time verified wiring
- Typed Errors: All failure modes represented as discriminated unions in the type system
- Port-Based Abstraction: Swap adapter implementations without touching core business logic
- Effect HTTP API: Backend API layer built with `@effect/platform`
Monorepo Structure
Built as a Turborepo monorepo with Bun, the codebase is split into focused packages:
- apps/web — Next.js frontend with shadcn/ui
- packages/core — Domain models, business logic ports, scoring algorithms
- packages/db — Drizzle schema, migrations, PostgreSQL adapters
- packages/vector — Qdrant integration layer
- packages/ai — LLM and embedding providers via Vercel AI SDK
- packages/api — Effect HTTP API layer
- packages/ui — Shared component library
- packages/env — Environment validation
Technology Stack
Frontend
- Next.js — App Router with React Server Components
- shadcn/ui — Component library with Tailwind CSS
- TypeScript — Strict mode across all packages
Backend & Data
- Effect.ts — Typed dependency injection and error handling
- Drizzle ORM — Type-safe PostgreSQL operations with migrations
- Qdrant — Vector database with payload filtering
- PostgreSQL — Relational storage for structured entity data
AI & Processing
- Gemini 2.0 Flash — LLM for structured extraction and clarification
- Gemini 2.0 Embedding — 3072-dimensional vector embeddings
- Vercel AI SDK — Unified AI provider interface
Tooling
- Turborepo — Build caching and task parallelization
- Bun — Package management and runtime
- Ultracite — Biome-based linting configuration