AI Lead Qualification Schema

Structured data model for autonomous lead qualification including scoring rules, enrichment patterns, and handoff criteria for AI agents.

Explanation

This schema defines the data structure and decision logic for AI-driven lead qualification systems. It enables autonomous agents to evaluate, score, and route leads without routine human intervention, escalating to human review only when the handoff criteria are met, while maintaining quality standards and compliance requirements.

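The schema is designed for integration with CRM systems (HubSpot, Salesforce, Pipedrive) and marketing automation platforms. Selected fields carry validation rules, enrichment sources, or confidence scores to protect data integrity throughout the qualification pipeline. Keywords such as validation, enrichment_sources, auto_generate, and auto_update are custom annotations: standard JSON Schema validators ignore them, so the application must enforce them itself.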

Core Lead Object Schema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "QualifiedLead",
  "type": "object",
  "required": ["id", "email", "qualification_score", "status"],
  "properties": {
    "id": {
      "type": "string",
      "format": "uuid",
      "description": "Unique lead identifier"
    },
    "email": {
      "type": "string",
      "format": "email",
      "description": "Primary contact email",
      "validation": {
        "mx_record_check": true,
        "disposable_email_block": true,
        "role_account_flag": true
      }
    },
    "full_name": {
      "type": "string",
      "enrichment_sources": ["LinkedIn", "Clearbit", "Apollo"]
    },
    "company": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "domain": { "type": "string", "format": "hostname" },
        "industry": { "type": "string" },
        "employee_count": { "type": "integer" },
        "annual_revenue": { "type": "number" }
      }
    },
    "qualification_score": {
      "type": "number",
      "minimum": 0,
      "maximum": 100,
      "description": "Composite qualification score (0-100)"
    },
    "status": {
      "type": "string",
      "enum": ["new", "qualified", "unqualified", "contacted", "converted", "rejected"],
      "default": "new"
    },
    "lead_source": {
      "type": "string",
      "enum": ["organic", "paid_ads", "referral", "outbound", "partnership", "event"]
    },
    "intent_signals": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "signal_type": {
            "type": "string",
            "enum": ["page_view", "download", "demo_request", "pricing_view", "competitor_search"]
          },
          "score_weight": { "type": "number", "minimum": 0, "maximum": 1 },
          "timestamp": { "type": "string", "format": "date-time" },
          "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
        }
      }
    },
    "engagement_metrics": {
      "type": "object",
      "properties": {
        "email_open_rate": { "type": "number", "minimum": 0, "maximum": 1 },
        "email_click_rate": { "type": "number", "minimum": 0, "maximum": 1 },
        "website_visits": { "type": "integer", "minimum": 0 },
        "content_downloads": { "type": "integer", "minimum": 0 },
        "last_engagement": { "type": "string", "format": "date-time" }
      }
    },
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Custom classification tags"
    },
    "enrichment_status": {
      "type": "string",
      "enum": ["pending", "enriched", "failed", "manual_review"],
      "default": "pending"
    },
    "created_at": {
      "type": "string",
      "format": "date-time",
      "auto_generate": true
    },
    "updated_at": {
      "type": "string",
      "format": "date-time",
      "auto_update": true
    },
    "owner_id": {
      "type": "string",
      "description": "Assigned sales rep or agent ID"
    }
  }
}
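
As an illustration of the required fields and ranges above, here is a minimal sketch of a record that satisfies them. The sample values are hypothetical, and check_lead is an illustrative helper rather than part of the schema; it covers only the required fields, the score range, and the status enum, so a full JSON Schema validator would still be needed in practice.

```python
import uuid

# Values copied from the schema above.
REQUIRED_FIELDS = ["id", "email", "qualification_score", "status"]
STATUS_VALUES = {"new", "qualified", "unqualified", "contacted", "converted", "rejected"}

def check_lead(lead: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if name not in lead]
    score = lead.get("qualification_score")
    if isinstance(score, (int, float)) and not 0 <= score <= 100:
        problems.append("qualification_score must be between 0 and 100")
    if "status" in lead and lead["status"] not in STATUS_VALUES:
        problems.append(f"unknown status: {lead['status']}")
    return problems

# Hypothetical lead used only for illustration.
sample_lead = {
    "id": str(uuid.uuid4()),
    "email": "jane.doe@example.com",
    "qualification_score": 72.5,
    "status": "new",
    "lead_source": "referral",
    "intent_signals": [
        {"signal_type": "pricing_view", "score_weight": 0.5,
         "timestamp": "2024-01-15T10:30:00Z", "confidence": 0.9}
    ],
}

print(check_lead(sample_lead))  # []
```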

Qualification Scoring Algorithm

# Composite Score Calculation (0-100 scale)
# Assumes HIGH_FIT_INDUSTRIES, MEDIUM_FIT_INDUSTRIES, and has_duplicate_contacts()
# are defined elsewhere in the deployment.
from datetime import datetime, timezone

def calculate_qualification_score(lead):
    score = 0
    company = lead.company  # optional in the schema, so it may be None

    # Firmographics (max 30 points)
    firmographic = 0
    if company and company.employee_count:
        if company.employee_count >= 1000: firmographic += 25
        elif company.employee_count >= 100: firmographic += 20
        elif company.employee_count >= 50: firmographic += 15
        elif company.employee_count >= 10: firmographic += 10
        else: firmographic += 5

    if company and company.annual_revenue:
        if company.annual_revenue >= 10_000_000: firmographic += 25
        elif company.annual_revenue >= 1_000_000: firmographic += 20
        elif company.annual_revenue >= 100_000: firmographic += 15
        elif company.annual_revenue >= 10_000: firmographic += 10
        else: firmographic += 5

    # The two tiers can total 50, so cap at the stated maximum
    score += min(firmographic, 30)

    # Intent Signals (max 40 points)
    intent_weights = {
        "demo_request": 15,
        "pricing_view": 12,
        "competitor_search": 10,
        "download": 8,
        "page_view": 3
    }
    intent = 0
    for signal in lead.intent_signals or []:
        intent += intent_weights.get(signal.signal_type, 0) * signal.score_weight
    score += min(intent, 40)  # cap so repeated signals cannot exceed the stated maximum

    # Engagement Metrics (max 20 points)
    metrics = lead.engagement_metrics
    if metrics:
        score += min((metrics.email_open_rate or 0) * 10, 10)
        score += min((metrics.email_click_rate or 0) * 10, 10)

    # Fit Factors (max 10 points)
    if company and company.industry in HIGH_FIT_INDUSTRIES:
        score += 10
    elif company and company.industry in MEDIUM_FIT_INDUSTRIES:
        score += 5

    # Recency penalty (up to -20 points); skipped when no engagement is recorded
    if metrics and metrics.last_engagement:
        # last_engagement is assumed to be parsed into a timezone-aware datetime
        days_since_engagement = (datetime.now(timezone.utc) - metrics.last_engagement).days
        if days_since_engagement > 30:
            score -= min(days_since_engagement // 10, 20)

    # Duplicate penalty
    if has_duplicate_contacts(lead.email):
        score -= 30

    return max(0, min(100, score))  # Clamp to 0-100
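
To make the recency rule above concrete, the following sketch isolates it in a standalone helper and prints the penalty for a few ages. The name recency_penalty is hypothetical and introduced only for this illustration.

```python
def recency_penalty(days_since_engagement: int) -> int:
    """Points subtracted for stale leads, mirroring the recency rule above."""
    if days_since_engagement <= 30:
        return 0
    return min(days_since_engagement // 10, 20)

for days in (10, 31, 45, 120, 400):
    print(days, recency_penalty(days))
# 10 0
# 31 3
# 45 4
# 120 12
# 400 20
```

Note that the penalty jumps from 0 to 3 points on day 31 rather than growing from zero, and it reaches its 20-point ceiling at 200 days.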

Decision Matrix

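# Automated Routing Logic
# Scores are not restricted to integers, so match tiers on min_score (highest
# first) or round the score before comparing; otherwise a value such as 79.5
# falls between the max_score and min_score bounds below.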
QUALIFICATION_THRESHOLDS = {
    "hot_lead": {
        "min_score": 80,
        "status": "qualified",
        "auto_route": "senior_sales",
        "response_time": "immediate",
        "follow_up": "within_15_minutes"
    },
    "warm_lead": {
        "min_score": 60,
        "max_score": 79,
        "status": "qualified",
        "auto_route": "junior_sales",
        "response_time": "same_day",
        "follow_up": "within_4_hours",
        "nurture_sequence": "engagement_based"
    },
    "cold_lead": {
        "min_score": 40,
        "max_score": 59,
        "status": "contacted",
        "auto_route": "marketing_automation",
        "response_time": "within_week",
        "nurture_sequence": "drip_campaign"
    },
    "unqualified": {
        "max_score": 39,
        "status": "unqualified",
        "auto_route": "recycle",
        "response_time": "none",
        "reason": "insufficient_score"
    }
}

# Handoff Criteria (Human Review Required)
HANDOFF_RULES = {
    "score_above": 85,
    "high_value_account": { "min_revenue": 1_000_000, "min_employees": 500 },
    "enterprise_company": { "employee_count": 10000 },
    "c_suite_contact": { "title_keywords": ["CEO", "CTO", "CFO", "VP", "Director"] },
    "competitor_alert": { "current_vendor": True },
    "urgent_intent": { "multiple_signals": 3, "time_window": "24h" }
}
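
One way to apply the thresholds and a subset of the handoff rules above is sketched below. The function names route_lead and needs_human_review are hypothetical, and the rule semantics are assumptions: tiers are matched on min_score only, any single handoff rule triggers review, and high_value_account requires both the revenue and the employee minimum.

```python
# Tiers ordered from highest to lowest minimum score (values from the matrix above).
TIERS = [
    ("hot_lead", 80, "senior_sales"),
    ("warm_lead", 60, "junior_sales"),
    ("cold_lead", 40, "marketing_automation"),
    ("unqualified", 0, "recycle"),
]

def route_lead(score: float) -> tuple[str, str]:
    """Return (tier, destination) for a score, matching the highest tier first."""
    for name, min_score, destination in TIERS:
        if score >= min_score:
            return name, destination
    return "unqualified", "recycle"  # defensive fallback for negative scores

def needs_human_review(score: float, revenue=None, employees=None) -> bool:
    """Assumed semantics: any one of these handoff rules triggers review."""
    if score > 85:  # score_above
        return True
    if revenue is not None and employees is not None:
        if revenue >= 1_000_000 and employees >= 500:  # high_value_account
            return True
    if employees is not None and employees >= 10_000:  # enterprise_company
        return True
    return False

print(route_lead(79.5))        # ('warm_lead', 'junior_sales')
print(needs_human_review(86))  # True
```

Matching on min_score alone means fractional scores always land in exactly one tier.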

Data Enrichment Pipeline

# External API Integration for Data Enrichment
ENRICHMENT_SOURCES = {
    "clearbit": {
        "endpoint": "https://person.clearbit.com/v2/combined/find",
        "fields": ["name", "email", "company", "location", "avatar"],
        "confidence_threshold": 0.8,
        "rate_limit": "1000/hour"
    },
    "apollo": {
        "endpoint": "https://api.apollo.io/v1/mixed_people/search",
        "fields": ["name", "email", "company", "title", "linkedin_url"],
        "confidence_threshold": 0.7,
        "rate_limit": "10000/day"
    },
    "linkedin": {
        "method": "profile_scrape",
        "fields": ["name", "title", "company", "experience"],
        "confidence_threshold": 0.6,
        "rate_limit": "requires_auth"
    },
    "zoominfo": {
        "endpoint": "https://api.zoominfo.com/contact",
        "fields": ["phone", "company_revenue", "employee_count"],
        "confidence_threshold": 0.85,
        "rate_limit": "tiered"
    }
}

# Enrichment Workflow
1. Extract known data points (email, company domain, name)
2. Query enrichment APIs in parallel with timeout limits
3. Merge results using confidence-weighted algorithm
4. Validate against existing CRM to prevent duplicates
5. Store enriched data with source attribution
6. Update qualification score based on new data
7. Route to appropriate workflow based on final score
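
Step 3 does not specify the merge algorithm. The sketch below shows one plausible reading: for each field, keep the value from the most confident source that meets its own threshold, and record where it came from (step 5). EnrichmentResult and merge_results are hypothetical names, and the sample values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class EnrichmentResult:
    source: str        # key in ENRICHMENT_SOURCES, e.g. "clearbit"
    confidence: float  # 0.0 to 1.0, as reported for this response
    fields: dict       # field name -> value returned by the source

# Per-source minimum confidence, taken from ENRICHMENT_SOURCES above.
THRESHOLDS = {"clearbit": 0.8, "apollo": 0.7, "linkedin": 0.6, "zoominfo": 0.85}

def merge_results(results):
    """Pick, per field, the value from the most confident qualifying source."""
    merged = {}
    for result in results:
        # Unknown sources default to a threshold of 1.0 (assumption).
        if result.confidence < THRESHOLDS.get(result.source, 1.0):
            continue
        for name, value in result.fields.items():
            if value is None:
                continue
            current = merged.get(name)
            if current is None or result.confidence > current["confidence"]:
                merged[name] = {"value": value, "source": result.source,
                                "confidence": result.confidence}
    return merged

results = [
    EnrichmentResult("clearbit", 0.9, {"name": "Jane Doe", "company": "Example Inc"}),
    EnrichmentResult("apollo", 0.75, {"name": "J. Doe", "title": "Head of Ops"}),
    EnrichmentResult("linkedin", 0.5, {"title": "COO"}),  # below its 0.6 threshold
]
merged = merge_results(results)
print(merged["name"]["value"], merged["title"]["source"])  # Jane Doe apollo
```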

Duplicate Prevention & Merge Rules

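# Deduplication Algorithm
# phone and linkedin_url are not part of the core schema above; they are
# assumed to be populated by enrichment.
def detect_duplicate(lead_a, lead_b):
    match_score = 0

    # Email match (highest weight)
    if normalize_email(lead_a.email) == normalize_email(lead_b.email):
        match_score += 100

    # Domain + name similarity (only when both names are known)
    if same_domain(lead_a.email, lead_b.email) and lead_a.full_name and lead_b.full_name:
        if levenshtein_distance(lead_a.full_name, lead_b.full_name) < 3:
            match_score += 60

    # Phone match
    if lead_a.phone and lead_b.phone:
        if normalize_phone(lead_a.phone) == normalize_phone(lead_b.phone):
            match_score += 80

    # Company domain + name variation (company is optional, so guard against None)
    company_a, company_b = lead_a.company, lead_b.company
    if company_a and company_b and company_a.domain and company_a.domain == company_b.domain:
        if fuzzy_match(company_a.name, company_b.name):
            match_score += 40

    # LinkedIn URL match (two missing URLs must not count as a match)
    if lead_a.linkedin_url and lead_a.linkedin_url == lead_b.linkedin_url:
        match_score += 90

    return match_score >= 85  # Threshold for duplicate detection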

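# Merge Strategy (Keep Most Complete)
# Leads are treated as dicts throughout.
def merge_leads(primary, secondary):
    merged = primary.copy()

    for field in secondary:
        if not merged.get(field) or merged[field] == "unknown":
            merged[field] = secondary[field]
        elif isinstance(merged[field], list):
            # Order-preserving union that also works for unhashable items
            # (intent_signals holds dicts, which set() cannot handle)
            combined = list(merged[field])
            for item in secondary[field] or []:
                if item not in combined:
                    combined.append(item)
            merged[field] = combined

    # Preserve highest qualification score
    merged["qualification_score"] = max(
        primary["qualification_score"],
        secondary["qualification_score"]
    )

    # Combine engagement metrics (combine_metrics is assumed to be defined elsewhere)
    merged["engagement_metrics"] = combine_metrics(
        primary.get("engagement_metrics"),
        secondary.get("engagement_metrics")
    )

    # "merged" is not in the status enum of the core schema; add it there
    # before persisting this value
    merged["status"] = "merged"
    merged["duplicate_of"] = primary["id"]

    return merged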
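
detect_duplicate relies on helpers such as normalize_email, same_domain, and levenshtein_distance that are not defined in this document. The sketch below shows one possible implementation using only the standard library. Stripping plus-addressing tags is an assumption about what normalization should cover, and the helpers assume syntactically valid addresses.

```python
def normalize_email(email: str) -> str:
    """Lowercase, trim, and drop any "+tag" suffix from the local part."""
    local, _, domain = email.strip().lower().rpartition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"

def same_domain(email_a: str, email_b: str) -> bool:
    """Compare the parts after the last "@", ignoring case and whitespace."""
    domain_a = email_a.strip().lower().rpartition("@")[2]
    domain_b = email_b.strip().lower().rpartition("@")[2]
    return domain_a == domain_b

def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(normalize_email(" Jane.Doe+news@Example.com "))  # jane.doe@example.com
print(levenshtein_distance("Jon Smith", "John Smith"))  # 1
```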

Production Deployment Notes

Schema Evolution: Store schema version with each record to handle schema migrations gracefully.
Data Retention: Implement GDPR-compliant data retention policies (e.g., delete unqualified leads after 90 days).
API Rate Limits: Queue enrichment requests and implement exponential backoff for rate-limited APIs.
Validation Layers: Validate emails via MX record checks and verify phone numbers via SMS verification when available.
Consent Tracking: Record GDPR/CCPA consent status for each lead and enforce communication preferences.
Audit Trail: Log all automated decisions (score changes, status updates, routing decisions) for compliance.
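
The exponential backoff mentioned under API Rate Limits could look roughly like the sketch below. RateLimitError and call_with_backoff are hypothetical names; a real client would raise its own exception type (for example on an HTTP 429 response), and the delay values are illustrative defaults.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical error raised when an enrichment API rejects a call for rate limiting."""

def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call request(), doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Jitter spreads out retries from concurrent workers
            sleep(delay + random.uniform(0, delay / 2))

# Usage with a fake request that is rate limited twice, then succeeds.
calls = {"count": 0}

def fake_request():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimitError()
    return {"status": "enriched"}

# sleep is stubbed out so the example runs instantly
print(call_with_backoff(fake_request, sleep=lambda seconds: None))  # {'status': 'enriched'}
```

Passing the sleep function as a parameter keeps the retry logic easy to test without real delays.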
