Domain-Aware ML (Mixture of Experts)

Specialized ML models for different market categories.

Overview

Instead of one generic model, Predicta uses domain-specific specialist models for better accuracy:
| Domain | Model | Domain Features |
| --- | --- | --- |
| crypto | crypto_model.keras | BTC correlation, gas, funding rate |
| sports | sports_model.keras | Team ELO, injuries, weather |
| politics | politics_model.keras | Polling, incumbency, news sentiment |
| entertainment | entertainment_model.keras | Social mentions, trending score |
| science | science_model.keras | Historical success, regulatory stage |
| generic | generic_model.keras | Fallback for rare categories |

Architecture

Market → [Domain Router] → Category → [Specialist Model] → P(YES)
              │                             │
         Keywords/Tags              10-dim domain features
         Polymarket tags            + GRU + Attention encoder
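
The routing step in the diagram can be illustrated with a minimal keyword-based sketch. The keyword lists, the `route_market` name, and the confidence formula (share of keyword hits that belong to the winning domain) are assumptions made for illustration; the actual logic in cortex/domain_router.py may differ.

```python
from typing import Iterable, Tuple

# Hypothetical keyword lists; the real router's vocabulary is not documented here.
DOMAIN_KEYWORDS = {
    "crypto": {"bitcoin", "btc", "ethereum", "eth", "crypto"},
    "sports": {"nba", "nfl", "match", "championship", "sports"},
    "politics": {"election", "president", "senate", "politics"},
}


def route_market(title: str, tags: Iterable[str] = ()) -> Tuple[str, float]:
    """Return (domain, confidence) based on keyword and tag hits."""
    # Normalize title words and tags into one lowercase token set.
    words = {w.strip("?.,!$").lower() for w in title.split()}
    words |= {t.lower() for t in tags}

    # Count how many tokens match each domain's keyword set.
    hits = {d: len(words & kws) for d, kws in DOMAIN_KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        # Nothing matched: hand the market to the generic fallback.
        return "generic", 0.0

    best = max(hits, key=hits.get)
    return best, hits[best] / total
```

With these assumed keywords, a title mentioning Bitcoin plus a `crypto` tag routes to the crypto specialist, while a market that matches nothing is sent to the generic model with zero confidence.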

Files

| File | Purpose |
| --- | --- |
| cortex/domain_router.py | Classify markets into domains |
| cortex/domain_features.py | Extract domain-specific features |
| cortex/training/bootstrap_by_domain.py | Create domain-separated datasets |
| cortex/training/train_domain_model.py | Train specialist models |
| cortex/inference_server.py | /predict_domain endpoint |

Usage

1. Generate Datasets

cd cortex/training
python bootstrap_by_domain.py

# Output:
# data/crypto_dataset.pkl
# data/sports_dataset.pkl
# data/politics_dataset.pkl
# data/generic_dataset.pkl

2. Train Models

python train_domain_model.py --domain crypto
python train_domain_model.py --domain sports
python train_domain_model.py --domain all

# Output:
# models/crypto_model.keras
# models/sports_model.keras

3. Inference

Models auto-load on server startup. Use the new endpoint:
POST /predict_domain
{
  "token_id": "...",
  "title": "Will Bitcoin reach $100k?",
  "volume": 1500000,
  "tags": ["crypto"]
}
Response includes domain routing info:
{
  "probability": 0.72,
  "uncertainty": 0.04,
  "domain": "crypto",
  "domain_confidence": 0.85,
  "model_used": "crypto_model"
}
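
A client call to this endpoint might look like the sketch below, which uses only the Python standard library. The base URL is an assumption (the server's host and port are not stated here), and `build_request` and `predict_domain` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed base URL; adjust to wherever the inference server is actually running.
BASE_URL = "http://localhost:8000"


def build_request(market: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /predict_domain endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predict_domain",
        data=json.dumps(market).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_domain(market: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(market, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Passing the request body shown above as `market` should return a dictionary with the `probability`, `domain`, and `model_used` fields from the sample response, provided the server is reachable at the assumed address.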

Fallback Behavior

  1. If router confidence < 50% → Use generic_model
  2. If domain model missing → Use generic_model
  3. If generic missing → Use legacy model
  4. If all fail → HTTP 503
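
The four rules above can be sketched as a single selection function. This is an illustration under stated assumptions, not the server's actual code: the `select_model` name, the `"generic"` and `"legacy"` dictionary keys, and the `ModelUnavailable` exception are hypothetical.

```python
from typing import Dict

CONFIDENCE_THRESHOLD = 0.5  # rule 1: router confidence below 50% falls back


class ModelUnavailable(Exception):
    """Raised when no model can serve the request; a server would map this to HTTP 503."""


def select_model(domain: str, confidence: float, loaded: Dict[str, object]) -> str:
    """Return the key of the model to use, following the fallback order.

    `loaded` maps model keys (assumed: domain names, "generic", "legacy")
    to whatever model objects were successfully loaded at startup.
    """
    # Use the specialist only if the router is confident and the model exists.
    if confidence >= CONFIDENCE_THRESHOLD and domain in loaded:
        return domain
    # Rules 1 and 2: low confidence or missing specialist -> generic model.
    if "generic" in loaded:
        return "generic"
    # Rule 3: generic missing -> legacy model.
    if "legacy" in loaded:
        return "legacy"
    # Rule 4: nothing available.
    raise ModelUnavailable("no model available for prediction")
```

Keeping the chain in one function makes the order of the fallbacks explicit and easy to test without loading any real models.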