
🤖 Volfefe Machine

Automated Market Volatility Trading System - Detecting and trading on political market signals in real-time

Elixir | Phoenix | PostgreSQL | License: MIT


🎯 What is Volfefe Machine?

Volfefe Machine is an intelligent, event-driven trading system that monitors real-time political and economic content, analyzes market impact using ML-based sentiment analysis, and executes automated trading strategies based on detected volatility signals.

The name? A playful nod to Trump's infamous "covfefe" tweet + volatility (vol) = Volfefe ⚡

Core Concept

Political/Economic Event → Sentiment Analysis → Market Impact Assessment → Automated Trade

Starting with Truth Social posts (particularly Trump's tariff announcements), the system will expand to monitor news APIs, social media, and financial feeds to identify market-moving events before they cause significant price action.


πŸ—οΈ Architecture

┌─────────────────────────────────────────────────────────┐
│  SOURCES (Modular Adapters)                             │
│  • Truth Social (via Apify)   • NewsAPI   • RSS   • More│
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  INGESTION PIPELINE                                     │
│  Fetch → Normalize → Store (Postgres) → Broadcast       │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  MULTI-MODEL CLASSIFICATION (ML Analysis) ✅            │
│  • 3 Sentiment Models: DistilBERT, Twitter-RoBERTa,     │
│    FinBERT (weighted consensus)                         │
│  • 1 NER Model: BERT-base-NER (entity extraction)       │
│  Output: Sentiment + Confidence + Entities (ORG/LOC/PER)│
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  ASSET LINKING (Phase 2 - In Progress)                  │
│  Match entities → Assets database → ContentTargets      │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  STRATEGY ENGINE (Rule-Based Decisions)                 │
│  Sector Mapping → Company Selection → Trade Type        │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│  EXECUTION (Alpaca API)                                 │
│  Paper Trading → Live Trading (Options, Stocks)         │
└─────────────────────────────────────────────────────────┘
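
The source layer is modular: each adapter fetches raw items from one external source and normalizes them into the shared contents shape before they are stored and broadcast. A minimal sketch of what such an adapter contract could look like in Elixir follows; the behaviour module and callback names are illustrative assumptions, not the project's actual interface.

# Hypothetical adapter contract -- module and callback names are assumptions,
# not the project's actual interface.
defmodule VolfefeMachine.Sources.Adapter do
  @moduledoc "Behaviour a content source adapter would implement."

  # Fetch raw items from the external source (e.g. Apify for Truth Social).
  @callback fetch(opts :: keyword()) :: {:ok, [map()]} | {:error, term()}

  # Normalize one raw item into the attributes stored in the contents table.
  @callback normalize(raw :: map()) :: %{
              external_id: String.t(),
              author: String.t(),
              text: String.t(),
              url: String.t(),
              published_at: DateTime.t()
            }
end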

🚀 Quick Start

Prerequisites

  • Elixir 1.15+ and Erlang/OTP 26+
  • PostgreSQL 14+
  • Node.js 18+ (for Phoenix LiveView assets)

Installation

# Clone the repository
git clone https://github.com/razrfly/volfefe.git
cd volfefe

# Install dependencies
mix deps.get
cd assets && npm install && cd ..

# Set up environment variables
cp .env.example .env
# Edit .env with your actual credentials (database password, API tokens, etc.)

# Set up database
mix ecto.setup

# (Optional) Install Python dependencies for ML scripts
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Start Phoenix server
mix phx.server

Visit localhost:4002 to see the live dashboard.

Running Classification with Entity Extraction

Once content is ingested (see Content Ingestion below), you can run multi-model classification:

# Classify first 10 unclassified items with all models (sentiment + NER)
mix classify.contents --limit 10 --multi-model

# Classify all unclassified content
mix classify.contents --all --multi-model

# Classify specific content IDs
mix classify.contents --ids 1,2,3 --multi-model

# Preview what would be classified (dry run)
mix classify.contents --limit 10 --dry-run

Output includes:

  • Sentiment consensus from 3 models (positive/negative/neutral)
  • Confidence scores and model agreement rates
  • Extracted entities: Organizations (ORG), Locations (LOC), Persons (PER), Miscellaneous (MISC)
  • Entity confidence scores and context
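
For reference, the consensus sentiment, confidence, and agreement rate above can be produced by a simple confidence-weighted vote across the three models. A minimal sketch of one such scheme is below; the actual weighting the project uses may differ.

# Illustrative weighted-consensus sketch -- the weighting scheme is an assumption.
defmodule ConsensusSketch do
  # Combine per-model results into a single sentiment + confidence.
  def consensus(model_results) do
    # Sum each model's confidence per sentiment label.
    totals =
      Enum.reduce(model_results, %{}, fn %{sentiment: s, confidence: c}, acc ->
        Map.update(acc, s, c, &(&1 + c))
      end)

    {sentiment, weight} = Enum.max_by(totals, fn {_label, w} -> w end)
    winners = Enum.count(model_results, &(&1.sentiment == sentiment))

    %{
      sentiment: sentiment,
      # Average confidence of the models that voted for the winning label.
      confidence: weight / winners,
      agreement_rate: winners / length(model_results)
    }
  end
end

# ConsensusSketch.consensus([
#   %{sentiment: "negative", confidence: 0.9812},
#   %{sentiment: "negative", confidence: 0.9654},
#   %{sentiment: "negative", confidence: 0.9201}
# ])
# => sentiment "negative", confidence ≈ 0.9556, agreement_rate 1.0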

Market Impact Analysis

After classifying content, analyze market reactions by capturing price/volume snapshots around each Trump post.

1. Initial Setup

First, fetch asset information and establish baseline statistics:

# Fetch starter universe of assets (SPY, QQQ, DIA, IWM, GLD, TLT)
mix fetch.assets --symbols SPY,QQQ,DIA,IWM,GLD,TLT

# Calculate 60-day baseline statistics (mean, std dev, percentiles)
# This fetches historical data and computes rolling returns for 1hr, 4hr, 24hr windows
mix calculate.baselines --all
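
Each baseline row boils down to summary statistics over rolling returns for one window. A simplified sketch of that computation (mean and standard deviation over a list of closing prices) is below; the real task also fetches the history, computes percentiles, and persists the results, and the function shape here is an assumption.

# Simplified baseline computation -- illustrative only.
# closes: chronological closing prices for one asset; window_bars: bars per window.
defmodule BaselineSketch do
  def stats(closes, window_bars) do
    returns =
      closes
      |> Enum.chunk_every(window_bars + 1, 1, :discard)
      |> Enum.map(fn chunk ->
        first = List.first(chunk)
        last = List.last(chunk)
        (last - first) / first
      end)

    mean = Enum.sum(returns) / length(returns)

    variance =
      Enum.reduce(returns, 0.0, fn r, acc -> acc + :math.pow(r - mean, 2) end) /
        length(returns)

    %{mean_return: mean, std_dev: :math.sqrt(variance), sample_size: length(returns)}
  end
end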

2. Capture Market Snapshots

Capture 4 time-windowed snapshots (before, 1hr, 4hr, 24hr after) for classified content:

# Single content item
mix snapshot.market --content-id 165

# Multiple specific items
mix snapshot.market --ids 165,166,167

# All content published on a date
mix snapshot.market --date 2025-10-28

# Date range
mix snapshot.market --date-range 2025-10-01 2025-10-31

# All classified content
mix snapshot.market --all

# Only content missing complete snapshots
mix snapshot.market --missing

# Preview without capturing (dry run)
mix snapshot.market --date 2025-10-28 --dry-run

Each snapshot captures:

  • Open, High, Low, Close (OHLC) prices
  • Volume and volume deviation from baseline
  • Volume z-score (standard deviations from mean)
  • Market state (pre-market, regular, after-hours, closed)
  • Data validity flags
  • Isolation score (contamination from nearby content)
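
The volume z-score is a plain standardization against the asset's baseline for the matching window. A small sketch, assuming the baseline exposes a volume mean and standard deviation (field names are assumptions):

# Volume z-score sketch -- baseline field names are assumptions.
defmodule SnapshotMathSketch do
  # How many standard deviations the observed volume sits from the baseline mean.
  def volume_z_score(observed_volume, %{mean_volume: mean, std_dev_volume: std})
      when std > 0 do
    (observed_volume - mean) / std
  end

  def volume_z_score(_observed, _baseline), do: nil
end

# A z-score of +2.0 means volume ran two standard deviations above the
# baseline norm for that window -- an unusually strong reaction.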

3. Update Baselines

Keep baseline statistics fresh as new market data accumulates:

# Update only stale baselines (older than 24 hours)
mix calculate.baselines --all --check-freshness

# Update specific assets
mix calculate.baselines --symbols SPY,QQQ

# Force recalculation (ignore freshness)
mix calculate.baselines --all --force

4. Data Validation

Monitor data quality and coverage:

# Check which content is missing snapshots
mix snapshot.market --missing --dry-run

# View baseline statistics
iex -S mix
> alias VolfefeMachine.{Repo, MarketData.BaselineStats}
> Repo.all(BaselineStats) |> Enum.map(& {&1.asset_id, &1.window_minutes, &1.mean_return})

Common Workflows

Daily Update Workflow:

# 1. Update baselines for all assets (skips fresh ones)
mix calculate.baselines --all --check-freshness

# 2. Capture snapshots for newly classified content
mix snapshot.market --missing

# 3. Verify coverage
mix snapshot.market --missing --dry-run

Backfill Historical Data:

# Capture snapshots for all content in October 2025
mix snapshot.market --date-range 2025-10-01 2025-10-31

# Force recalculate all baselines with latest 60 days
mix calculate.baselines --all --force

Troubleshooting:

  • "No data available": TwelveData may not have data for that timestamp (market closed, weekend, holiday)
  • "Rate limit exceeded": Free tier allows 8 calls/minute, 800/day - add delays between operations
  • Incomplete snapshots: Use --missing flag to find and fill gaps
  • Stale baselines: Use --check-freshness to update only old statistics

Environment Variables

The project uses environment variables for sensitive configuration. Copy .env.example to .env and update with your credentials:

# PostgreSQL Database
PGHOST=localhost
PGDATABASE=volfefe_machine_dev
PGUSER=postgres
PGPASSWORD=your_postgres_password

# Apify API (for Truth Social scraping)
APIFY_USER_ID=your_user_id_here
APIFY_PERSONAL_API_TOKEN=your_api_token_here

⚠️ Never commit your .env file to version control! It's already in .gitignore.

Content Ingestion

Status: ✅ Complete - Unified ingestion pipeline ready

Fetch and import content from Truth Social using a single command:

# Fetch 100 posts from a specific user
mix ingest.content --source truth_social --username realDonaldTrump --limit 100

# Include replies in results
mix ingest.content --source truth_social --username realDonaldTrump --limit 50 --include-replies

# Preview what would be fetched (dry run)
mix ingest.content --source truth_social --username realDonaldTrump --limit 10 --dry-run

Available Options:

  • --source, -s - Content source (currently: truth_social)
  • --username, -u - Username/profile to fetch (required)
  • --limit, -l - Maximum posts to fetch (default: 100)
  • --include-replies - Include replies in results (default: false)
  • --dry-run - Preview configuration without fetching (default: false)

🧩 Key Components

| Component | Purpose | Status |
| --- | --- | --- |
| Database Schema | Assets, Contents, Classifications, ContentTargets | ✅ Complete |
| Multi-Model Classification | 3 sentiment models + weighted consensus | ✅ Complete |
| NER Entity Extraction | Extract organizations, locations, persons | ✅ Complete |
| Apify Integration | Fetch Truth Social posts via API | ✅ Complete |
| Ingestion Pipeline | Unified fetch + import workflow | ✅ Complete |
| Asset Linking | Match extracted entities to assets database | 📋 Phase 2 |
| Strategy Engine | Rule-based trade decision logic | 📋 Phase 3 |
| Trade Executor | Alpaca API integration | 📋 Phase 4 |
| Dashboard | Real-time monitoring UI | 📋 Future |

Legend: ✅ Complete | 🚧 In Progress | 📋 Planned


🗄️ Data Model

Core Tables

sources - External data sources (Truth Social, NewsAPI, etc.)

%Source{
  name: "truth_social",
  adapter: "TruthSocialAdapter",
  base_url: "https://api.example.com",
  last_fetched_at: ~U[2025-01-26 10:00:00Z]
}

contents - Normalized posts/articles

%Content{
  source_id: uuid,
  external_id: "12345",
  author: "realDonaldTrump",
  text: "Big tariffs on steel coming soon!",
  url: "https://truthsocial.com/@realDonaldTrump/12345",
  published_at: ~U[2025-01-26 09:45:00Z],
  classified: false
}

classifications - ML analysis results with sentiment consensus

%Classification{
  content_id: uuid,
  sentiment: "negative",
  confidence: 0.9556,
  meta: %{
    "agreement_rate" => 1.0,
    "model_results" => [
      %{"model_id" => "distilbert", "sentiment" => "negative", "confidence" => 0.9812},
      %{"model_id" => "twitter_roberta", "sentiment" => "negative", "confidence" => 0.9654},
      %{"model_id" => "finbert", "sentiment" => "negative", "confidence" => 0.9201}
    ],
    "entities" => [
      %{"text" => "Tesla", "type" => "ORG", "confidence" => 0.9531},
      %{"text" => "United States", "type" => "LOC", "confidence" => 0.9912}
    ]
  }
}

assets - Tradable securities (9,000+ loaded)

%Asset{
  symbol: "TSLA",
  name: "Tesla Inc",
  exchange: "NASDAQ",
  asset_class: "us_equity"
}

content_targets - Extracted entities linked to assets (Phase 2)

%ContentTarget{
  content_id: uuid,
  asset_id: uuid,
  extraction_method: "ner_bert",
  confidence: 0.9531,
  context: "Tesla stock tumbled 12% today..."
}

🛠️ Tech Stack

Backend

  • Language: Elixir 1.15+ on Erlang/OTP 26+
  • Web framework: Phoenix with LiveView (dashboard)
  • Database: PostgreSQL 14+ via Ecto
  • Background jobs: Oban (planned, for scheduled polling)

Machine Learning

  • Python: Python 3.9+ with virtual environment
  • ML Framework: Transformers (Hugging Face)
  • Models:
    • Sentiment: DistilBERT, Twitter-RoBERTa, FinBERT
    • NER: BERT-base-NER (dslim/bert-base-NER)
  • Elixir Integration: Python interop via System.cmd/3
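
The Elixir side shells out to the Python classifier rather than embedding it. A minimal sketch of that interop path is below, assuming a hypothetical scripts/classify.py that prints JSON to stdout and that Jason is available for decoding; the actual script name and argument format may differ.

# Hedged sketch of Python interop via System.cmd/3.
# "scripts/classify.py" and its CLI contract are assumptions for illustration.
defmodule PythonInteropSketch do
  def classify(text) do
    case System.cmd("python3", ["scripts/classify.py", "--text", text], stderr_to_stdout: true) do
      {json, 0} -> Jason.decode(json)          # script prints a JSON result
      {output, code} -> {:error, {code, output}}
    end
  end
end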

External Services

  • Apify - Truth Social scraping
  • TwelveData - market data and historical prices
  • Alpaca - trade execution (paper first, then live)
  • Polymarket Subgraph / Gamma / CLOB - prediction market data
  • ProtonVPN (WireGuard) - proxy for geo-blocked Polymarket APIs

📅 Roadmap

Phase 1: Foundation & ML Pipeline ✅ (Complete)

  • Project setup and architecture
  • Database schemas (contents, sources, classifications, assets, content_targets)
  • Assets database loaded (9,000+ securities)
  • Multi-model sentiment classification (DistilBERT, Twitter-RoBERTa, FinBERT)
  • Weighted consensus algorithm
  • NER entity extraction (BERT-base-NER)
  • Classification mix task with batch processing
  • Content ingestion - Unified mix task (Issue #46)
  • Content backup/seeding system (Issue #45) - Next Step

Phase 2: Asset Linking (In Progress)

  • Entity β†’ Asset matching logic (Issue #42)
  • ContentTargets creation
  • Fuzzy name matching
  • Confidence scoring
  • Manual validation tools

Phase 3: Strategy Engine

  • Sector-to-ticker mapping
  • Rule-based trade logic
  • Backtesting framework
  • Signal generation

Phase 4: Trade Execution

  • Alpaca API integration
  • Paper trading
  • Risk management
  • Live trading (manual approval)

Phase 5: Multi-Source Expansion

  • NewsAPI adapter
  • Reddit adapter
  • RSS feeds
  • Source weighting

See: Issue #43 (Phase 1 NER) | Issue #42 (Phase 2 Asset Linking)


🎲 Polymarket Insider Detection

The system includes tools for detecting potential insider trading on Polymarket prediction markets by analyzing blockchain trade data.

Architecture

Reference Cases (Seed Data)      Blockchain (Subgraph)
├─ Event date                    ├─ ALL trades in date range
├─ Description                   ├─ Wallet addresses
└─ Context from news             └─ Trade amounts & timing
        │                                   │
        └───────────┬───────────────────────┘
                    ▼
            Pattern Discovery
            ├─ Scan by date range (not keyword)
            ├─ Group trades by market
            ├─ Score: volume, whales, timing
            └─ Output candidate markets
                    ▼
            Human Confirmation
            ├─ Review candidate list
            ├─ Confirm correct market
            └─ Update reference case
                    ▼
            Trade Ingestion
            └─ Ingest trades for analysis

Key Commands

1. Scan Mode (Discovery Without Reference Case)

Scan blockchain trades in a date range to discover market activity patterns:

# Scan last 7 days, show top 10 markets
mix polymarket.ingest --subgraph --days 7 --scan

# Scan specific date range around a known event
mix polymarket.ingest --subgraph --from 2025-10-08 --to 2025-10-12 --scan --top 20

# Full ingestion (without --scan flag)
mix polymarket.ingest --subgraph --from 2025-10-01 --to 2025-10-15

Scan output includes:

  • Total trades and volume in date range
  • Markets grouped by trading activity
  • Whale trade counts (>$1K positions)
  • Unique wallet counts per market
  • Trading period timestamps

2. Reference Case Discovery (Find Market for Known Event)

Discover which Polymarket market corresponds to a reference case:

# Discover markets for a specific reference case
mix polymarket.discover --reference-case "Nobel Peace Prize 2025"

# Custom window: 10 days before event, 2 days after
mix polymarket.discover --reference-case "Nobel Peace Prize 2025" --window 10 --after 2

# Show top 20 candidates
mix polymarket.discover --reference-case "Nobel Peace Prize 2025" --top 20

# Discover for ALL reference cases missing condition_ids
mix polymarket.discover --all-references

Discovery output includes:

  • Ranked candidate markets by score (0.0-1.0)
  • Condition ID for each candidate
  • Volume and pre-event volume percentage
  • Whale count and unique wallet count
  • Trading period around the event
  • Suspicious wallets with timing and volume data (Phase 3)

3. Confirm Market Match

After reviewing discover output, confirm which market is correct:

# Confirm market match for reference case
mix polymarket.confirm --reference-case "Nobel Peace Prize 2025" --condition 0x14a3dfeba8...

# With optional slug (auto-fetched if omitted)
mix polymarket.confirm --reference-case "Case Name" --condition 0xabc... --slug "market-slug"

4. Ingest Trades

After confirming, ingest trades for analysis:

# Ingest trades for a specific reference case
mix polymarket.ingest --subgraph --reference-case "Nobel Peace Prize 2025"

# Ingest trades for all reference cases with condition_ids
mix polymarket.ingest --subgraph --reference-cases

# Ingest for specific condition
mix polymarket.ingest --subgraph --condition 0x14a3dfeba8... --from 2025-10-01

Complete Workflow Example

# 1. You have a reference case (from news: "Nobel Peace Prize 2025", event Oct 11)
#    Check it exists in the database
mix polymarket.references

# 2. Discover which market matches this event
mix polymarket.discover --reference-case "Nobel Peace Prize 2025"
# Output shows candidate markets ranked by score:
#   1. "Will MarΓ­a Corina Machado win the Nobel Peace Prize?"
#      Condition: 0x14a3...  Score: 0.85  Volume: $45K (62% pre-event)

# 3. Review candidates and confirm the correct match
mix polymarket.confirm --reference-case "Nobel Peace Prize 2025" --condition 0x14a3dfeba8...

# 4. Promote discovered wallets to investigation candidates
mix polymarket.promote --reference-case "Nobel Peace Prize 2025"
# Output: Created 12 investigation candidate(s)
#         Batch ID: refcase-nobel-peace-1234567890

# 5. Ingest trades for detailed analysis
mix polymarket.ingest --subgraph --reference-case "Nobel Peace Prize 2025"

# 6. Review and investigate candidates
mix polymarket.candidates --batch refcase-nobel-peace-1234567890
mix polymarket.investigate --id 1

Scoring Algorithms

Market Scoring

Reference case discovery scores markets based on:

| Factor | Weight | Description |
| --- | --- | --- |
| Whale Activity | 30% | Trades >$1K indicate informed trading |
| Pre-Event Volume | 30% | Volume concentration before the event |
| Total Volume | 25% | Log-scaled trading volume |
| Wallet Diversity | 15% | Unique wallets (avoid wash trading) |
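
Taken together, a candidate market's score is a weighted sum of those four normalized factors. A sketch of that combination, using the weights from the table above (the normalization details below are assumptions):

# Market score sketch: weighted sum of normalized factors (each 0.0-1.0).
# Normalization choices are assumptions; only the weights come from the table.
defmodule MarketScoreSketch do
  @weights %{whale_activity: 0.30, pre_event_volume: 0.30, total_volume: 0.25, wallet_diversity: 0.15}

  def score(%{whale_trades: whales, pre_event_ratio: pre, volume_usd: vol, unique_wallets: wallets}) do
    factors = %{
      whale_activity: min(whales / 20, 1.0),              # saturates after ~20 whale trades
      pre_event_volume: pre,                              # already a 0..1 ratio
      total_volume: min(:math.log10(vol + 1) / 6, 1.0),   # log-scaled; ~$1M caps at 1.0
      wallet_diversity: min(wallets / 50, 1.0)
    }

    factors
    |> Enum.map(fn {factor, value} -> value * @weights[factor] end)
    |> Enum.sum()
    |> Float.round(2)
  end
end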

Wallet Suspicion Scoring (Phase 3)

Individual wallets are scored for suspicious activity:

| Factor | Weight | Description |
| --- | --- | --- |
| Volume | 25% | Log-scaled position size |
| Whale Trades | 25% | Number of trades >$1K |
| Pre-Event Concentration | 30% | % of volume placed before the event |
| Timing Precision | 20% | Closer to the event = more suspicious |

Timing Precision Breakdown:

  • Within 24h of event: 20%
  • Within 48h: 15%
  • Within 72h: 10%
  • Beyond 72h: 5%
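
The timing factor can be read directly off that breakdown; a small sketch of how a wallet's timing-precision contribution might be derived from its last trade before the event is below (the other three factors would be combined the same way as in the market score above).

# Timing-precision sketch: contribution to the wallet suspicion score (max 20%).
defmodule WalletTimingSketch do
  def timing_component(last_trade_at, event_at) do
    hours_before = DateTime.diff(event_at, last_trade_at, :second) / 3600

    cond do
      hours_before <= 24 -> 0.20
      hours_before <= 48 -> 0.15
      hours_before <= 72 -> 0.10
      true -> 0.05
    end
  end
end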

Discovery results (wallets + candidate markets) are automatically saved to the reference case for later analysis.

5. Promote Wallets to Investigation Candidates (Phase 4)

After discovery, convert suspicious wallets into investigation candidates:

# Preview promotion (dry run)
mix polymarket.promote --reference-case "Nobel Peace Prize 2025" --dry-run

# Promote wallets (creates investigation candidates)
mix polymarket.promote --reference-case "Nobel Peace Prize 2025"

# Custom thresholds
mix polymarket.promote --reference-case "Case Name" --min-score 0.6 --limit 10

# Force priority level
mix polymarket.promote --reference-case "Case Name" --priority critical

Promotion workflow:

  1. Reads discovered wallets from reference case
  2. Filters by minimum suspicion score (default: 0.4)
  3. Creates InvestigationCandidate records with:
    • Anomaly breakdown (volume, timing, whale trades)
    • Matched patterns (reference case linkage)
    • Priority based on suspicion score
  4. Assigns batch ID for tracking
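
A hedged sketch of steps 2 and 3 in that workflow: filter by the minimum suspicion score and map each score to a priority (the exact priority thresholds here are assumptions).

# Promotion sketch -- priority thresholds are assumptions.
defmodule PromotionSketch do
  def promote(wallets, min_score \\ 0.4) do
    wallets
    |> Enum.filter(&(&1.suspicion_score >= min_score))
    |> Enum.map(fn wallet ->
      %{
        wallet: wallet.address,
        suspicion_score: wallet.suspicion_score,
        priority: priority_for(wallet.suspicion_score)
      }
    end)
  end

  defp priority_for(score) when score >= 0.8, do: :critical
  defp priority_for(score) when score >= 0.6, do: :high
  defp priority_for(_score), do: :normal
end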

After promotion:

# List candidates from this batch
mix polymarket.candidates --batch refcase-nobel-peace-1234567890

# Investigate specific candidate
mix polymarket.investigate --id 1

# Confirm as insider
mix polymarket.confirm --id 1 --notes "Pre-event timing matches"

Data Sources

| Source | Endpoint | Data Available |
| --- | --- | --- |
| Polymarket Subgraph | api.goldsky.com/.../orderbook-subgraph | All trades since Nov 2022 |
| Polymarket API | data-api.polymarket.com | Recent trades (geo-blocked in US) |
| Gamma API | gamma-api.polymarket.com | Market metadata (geo-blocked in US) |
| CLOB API | clob.polymarket.com | Active market data (geo-blocked in US) |

Note: The subgraph bypasses geo-blocking and provides complete historical data. For CLOB and Gamma APIs, see VPN Setup below.

VPN Setup (Required for US Users)

Polymarket geo-blocks US IP addresses for their CLOB and Gamma APIs (regulatory compliance). The application uses a Docker-based VPN proxy to route only Polymarket API calls through VPN, without affecting the rest of your network traffic.

Prerequisites

  • Docker Desktop installed and running
  • ProtonVPN account with WireGuard access

Setup Steps

  1. Get WireGuard credentials from ProtonVPN:

    • Go to Proton Account → VPN → WireGuard
    • Click "Generate Key" (or use existing)
    • Copy the PrivateKey value
  2. Add credentials to .env:

    # VPN Proxy for Polymarket API Access
    PROTONVPN_WIREGUARD_PRIVATE_KEY=your_wireguard_private_key_here
    VPN_PROXY_ENABLED=true
    VPN_PROXY_HOST=localhost
    VPN_PROXY_PORT=8888
  3. Start the VPN proxy container:

    docker compose -f docker-compose.vpn.yml up -d
  4. Verify connection:

    # Check container is running
    docker logs vpn-proxy | grep "Public IP"
    # Should show: Public IP address is X.X.X.X (Netherlands, ...)
    
    # Test API access through proxy
    curl -x http://localhost:8888 "https://gamma-api.polymarket.com/markets?limit=1"

Usage

When the VPN proxy is running and VPN_PROXY_ENABLED=true, the application automatically routes Polymarket API calls (CLOB, Gamma) through the VPN tunnel. The subgraph API does not require VPN.

# Start VPN proxy
docker compose -f docker-compose.vpn.yml up -d

# Run with VPN enabled
export VPN_PROXY_ENABLED=true
mix phx.server

# Or for one-off commands
export VPN_PROXY_ENABLED=true
mix polymarket.health
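
Under the hood this amounts to attaching the proxy only to requests bound for the geo-blocked hosts. A rough sketch, assuming an HTTPoison/hackney-based client (the project's actual HTTP client and module names may differ):

# Hedged sketch of selective proxy routing -- assumes HTTPoison; the real client may differ.
defmodule PolymarketClientSketch do
  @proxied_hosts ["clob.polymarket.com", "gamma-api.polymarket.com"]

  def get(url) do
    HTTPoison.get(url, [], proxy_opts(url))
  end

  # Only add the proxy when it is enabled and the host is geo-blocked.
  defp proxy_opts(url) do
    with true <- System.get_env("VPN_PROXY_ENABLED") == "true",
         %URI{host: host} when host in @proxied_hosts <- URI.parse(url) do
      proxy_host = System.get_env("VPN_PROXY_HOST", "localhost")
      proxy_port = System.get_env("VPN_PROXY_PORT", "8888")
      [proxy: "http://#{proxy_host}:#{proxy_port}"]
    else
      _ -> []
    end
  end
end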

Troubleshooting

| Issue | Solution |
| --- | --- |
| Container won't start | Ensure Docker Desktop is running |
| "Connection refused" on port 8888 | Check docker logs vpn-proxy for errors |
| API still returning 403/blocked | Verify the VPN is connected: check docker logs vpn-proxy for "Public IP" |
| Rate limiting from Polymarket | The VPN is working; reduce request frequency |
| Enrichment/metadata fetch failing | Ensure VPN_PROXY_ENABLED=true is set |

Stopping the VPN

docker compose -f docker-compose.vpn.yml down

Note: When the VPN proxy is stopped or VPN_PROXY_ENABLED=false, CLOB and Gamma API calls will fail for US users. The subgraph-based trade ingestion will continue to work.


🧪 Testing

# Run all tests
mix test

# Run with coverage
mix test --cover

# Run specific test file
mix test test/volfefe/pipeline_test.exs

📊 Entity Extraction Output Example

Input Text:

"Tesla stock tumbled 12% today as Elon Musk's controversial tweet sparked
concerns about the company's future. Analysts in the United States and
Europe are worried about automotive sector stability."

Multi-Model Classification Output:

%{
  # Sentiment Consensus (3 models)
  consensus: %{
    sentiment: "negative",
    confidence: 0.9556,
    agreement_rate: 1.0
  },

  # Individual Model Results
  model_results: [
    %{model_id: "distilbert", sentiment: "negative", confidence: 0.9812},
    %{model_id: "twitter_roberta", sentiment: "negative", confidence: 0.9654},
    %{model_id: "finbert", sentiment: "negative", confidence: 0.9201}
  ],

  # Extracted Entities (NER)
  entities: [
    %{text: "Tesla", type: "ORG", confidence: 0.9531,
      context: "Tesla stock tumbled 12% today..."},
    %{text: "Elon Musk", type: "PER", confidence: 0.9802,
      context: "...12% today as Elon Musk's controversial..."},
    %{text: "United States", type: "LOC", confidence: 0.9912,
      context: "...Analysts in the United States and Europe..."},
    %{text: "Europe", type: "LOC", confidence: 0.9845,
      context: "...United States and Europe are worried..."}
  ],

  # Entity Statistics
  entity_stats: %{
    total_entities: 4,
    by_type: %{"ORG" => 1, "LOC" => 2, "PER" => 1, "MISC" => 0}
  },

  # Performance
  total_latency_ms: 663,
  successful_models: 4
}

Phase 2 Preview (not yet implemented):

  • "Tesla" β†’ Match to Asset{symbol: "TSLA", name: "Tesla Inc"}
  • Create ContentTarget{content_id: X, asset_id: Y, confidence: 0.95}
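
A hedged sketch of what that matching step could look like: an exact symbol/name lookup first, then a simple fuzzy fallback. Module names, the query shape, and the 0.9 similarity threshold are assumptions, not the planned implementation.

# Phase 2 sketch -- entity-to-asset matching; names and threshold are assumptions.
defmodule AssetLinkingSketch do
  import Ecto.Query
  alias VolfefeMachine.Repo

  # Try an exact symbol/name hit first, then fall back to fuzzy name matching.
  def match_entity(entity_text) do
    exact_match(entity_text) || fuzzy_match(entity_text)
  end

  defp exact_match(text) do
    Repo.one(
      from a in "assets",
        where: a.symbol == ^String.upcase(text) or ilike(a.name, ^"#{text}%"),
        select: %{id: a.id, symbol: a.symbol, name: a.name},
        limit: 1
    )
  end

  # Jaro similarity over asset names; assumes the assets table is populated.
  defp fuzzy_match(text) do
    assets = Repo.all(from a in "assets", select: %{id: a.id, symbol: a.symbol, name: a.name})

    {best, score} =
      assets
      |> Enum.map(&{&1, String.jaro_distance(String.downcase(text), String.downcase(&1.name))})
      |> Enum.max_by(fn {_asset, score} -> score end)

    if score >= 0.9, do: best, else: nil
  end
end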

📊 Example Pipeline Flow

Current Workflow (Manual)

  1. Fetch & Import - mix ingest.content --source truth_social --username USER --limit 100
  2. Content Storage - Posts stored in PostgreSQL contents table
  3. Multi-Model Classification - Run mix classify.contents --all --multi-model
    • 3 sentiment models analyze text (DistilBERT, Twitter-RoBERTa, FinBERT)
    • Weighted consensus calculates final sentiment + confidence
    • NER model extracts entities (ORG, LOC, PER, MISC)
  4. Results Storage - Classifications saved to classifications table
  5. Entity Analysis - Entities stored in classification metadata

Future Automated Workflow

  1. Scheduler (Oban) - Poll Truth Social every 60 seconds
  2. Adapter - Fetch and normalize new posts
  3. PubSub - Broadcast {:new_content, content} events
  4. Auto-Classification - Trigger multi-model analysis on new content
  5. Asset Linking (Phase 2) - Match entities to assets, create ContentTargets
  6. Strategy Engine (Phase 3) - Generate trade recommendations
  7. Executor (Phase 4) - Place orders via Alpaca API
  8. Dashboard - Real-time monitoring via LiveView
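
The PubSub step in that flow is standard Phoenix.PubSub fan-out. A hedged sketch of the broadcast and a subscribing classification worker is below; the topic name, PubSub server name, and classification entry point are assumptions.

# Hedged PubSub sketch -- topic and server names are assumptions.
defmodule AutoClassifierSketch do
  use GenServer

  @pubsub VolfefeMachine.PubSub
  @topic "content:new"

  # The ingestion pipeline would broadcast after storing a post:
  def broadcast_new(content),
    do: Phoenix.PubSub.broadcast(@pubsub, @topic, {:new_content, content})

  # ...and this worker reacts by kicking off classification.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def init(state) do
    Phoenix.PubSub.subscribe(@pubsub, @topic)
    {:ok, state}
  end

  def handle_info({:new_content, content}, state) do
    # Illustrative entry point; the real classification API may differ.
    VolfefeMachine.Classification.classify(content)
    {:noreply, state}
  end
end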

🤝 Contributing

This is currently a personal project, but contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

⚖️ Legal & Risk Disclaimer

This software is for educational and research purposes only.

  • Automated trading carries significant financial risk
  • Past performance does not guarantee future results
  • This system is not financial advice
  • Use at your own risk
  • Always start with paper trading
  • Understand all risks before deploying real capital

By using this software, you acknowledge that you are solely responsible for any trading decisions and outcomes.


📄 License

MIT License - see LICENSE for details


🔗 Resources

Framework & Platform

Machine Learning Models

APIs & Services

GitHub Issues


Built with ❤️ and Elixir
