About
Symdex-100 is a semantic code search and indexing tool for Python codebases that generates compact metadata "Cyphers" for rapid, intent-based code discovery. It enables sub-second natural language searches across large codebases while dramatically reducing token consumption for AI agents.

Key capabilities:
- Indexes Python functions into 20-byte semantic fingerprints (Cyphers) for ultra-fast lookups and structured code analysis
- Natural language search that understands intent, returning precise file locations and code context
- Security and domain-specific auditing to identify code patterns, potential vulnerabilities, and quality issues
- High-level codebase overviews for understanding structure and patterns without reading thousands of lines
- 10-50x token reduction for AI agents compared to grep or full-text search methods
- Sub-second query performance even on large, complex codebases
README
Symdex-100
*Symdex-100 — your AI companion for code exploration*
---
Semantic fingerprints for intent-based Python code search — 50–100x faster index lookups, 10–50x fewer tokens for AI agents.
Symdex-100 generates compact, structured metadata ("Cyphers") for every function in your Python codebase. Each Cypher is typically 20 bytes — a semantic fingerprint that enables sub-second, intent-based code search for developers and AI agents without reading thousands of lines of code.
# Your Python function → Indexed automatically
async def validate_user_token(token: str, user_id: int) -> bool:
"""Verify JWT token for a specific user."""
# ... implementation ...
# Natural language search → Sub-second results
$ symdex search "where do we validate user tokens"
──────────────────────────────────────────────────────────────────────────────
SYMDEX — 1 result in 0.0823 seconds
──────────────────────────────────────────────────────────────────────────────
#1 validate_user_token (Python)
────────────────────────────────────────────────────────────────────────────
File : /project/auth/tokens.py
Lines : 42–67
Cypher : SEC:VAL_TOKEN--ASY
Score : 24.5
42 │ async def validate_user_token(token: str, user_id: int) -> bool:
43 │ """Verify JWT token for a specific user."""
44 │ if not token:
45 │ return False
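The output above maps directly onto the per-function metadata Symdex stores. As a rough sketch of the idea (not Symdex's actual internals; the `IndexEntry` name and fields are assumptions for illustration), each indexed function only needs its Cypher plus a location, so a query can be answered without ever reading function bodies:

# Illustrative sketch only: field names are assumptions, not Symdex's real schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    name: str        # e.g. "validate_user_token"
    cypher: str      # e.g. "SEC:VAL_TOKEN--ASY" (~20 bytes)
    path: str        # e.g. "/project/auth/tokens.py"
    start_line: int  # e.g. 42
    end_line: int    # e.g. 67

# An entire codebase's index is a list of these tiny records, so it fits
# in memory and can be scanned in milliseconds.
index = [
    IndexEntry("validate_user_token", "SEC:VAL_TOKEN--ASY",
               "/project/auth/tokens.py", 42, 67),
]

# Filtering on Cypher fields rather than source text keeps results precise:
# a token-validation query narrows to SEC (domain), VAL (action), TOKEN (object).
hits = [e for e in index if e.cypher.startswith("SEC:VAL_TOKEN")]
for e in hits:
    print(f"{e.name}  {e.path}:{e.start_line}-{e.end_line}  {e.cypher}")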
---
The Problem
Traditional code search methods scale poorly on large codebases:
| Approach | Limitation | Token Cost (AI agents) |
|----------|------------|------------------------|
| grep | Keyword noise: finds "token" in comments, strings, variable names | 3,000+ tokens (read all matches, many false positives) |
| Full-text search | No semantic understanding: can't distinguish intent | 5,000+ tokens (read 10 files, variable success) |
| Embeddings | Opaque, expensive, query-time overhead | 2,000+ tokens (re-rank results, embedding index size) |
| AST/LSP | Limited to structural queries (class/function names) | N/A (doesn't understand "what validates X") |
| Symdex | Requires indexing step (one-time per codebase) | ~100–300 tokens (1–5 precise results with context) |
Result: Developers waste time reading irrelevant code. AI agents burn tokens on noise. Symdex reduces token usage by 10–50x for intent-based queries (vs reading multiple files) while providing sub-second index lookups.
---
The Solution: Semantic Fingerprints
Symdex-100 solves this with Cypher-100, a structured metadata format that encodes function semantics in 20 bytes:
Anatomy of a Cypher-100 String
Each Cypher follows a strict four-slot hierarchy designed for both machine filtering and human readability:
┌────────────────────────────────────────────────────────────┐
│                                                            │
│      DOM    :    ACT    _    OBJ    --    PAT              │
│       │           │           │            │               │
│    Domain      Action      Object       Pattern            │
│                                                            │
│  Where does   What does   What is      How does            │
│  this live?   it do?      the target?  it run?             │
│                                                            │
└────────────────────────────────────────────────────────────┘
Formal specification:
$$ \text{Cypher} = \text{DOM} : \text{ACT} \_ \text{OBJ} \text{--} \text{PAT} $$
Where:
- DOM (Domain): SEC (Security), NET (Network), DAT (Data), SYS (System), LOG (Logging), UI (Interface), BIZ (Business), TST (Testing)
- ACT (Action): VAL (Validate), FET (Fetch), TRN (Transform), CRT (Create), SND (Send), SCR (Scrub), UPD (Update), AGG (Aggregate), FLT (Filter), DEL (Delete)
- OBJ (Object): USER, TOKEN, DATASET, CONFIG, LOGS, REQUEST, JSON, EMAIL, DIR. Can be compound (primary+secondary+tertiary, max 3 parts) when a function involves multiple objects: RELATIONSHIPS+AUDIT, RECORD+INDEX, FILE+CACHE
- PAT (Pattern): ASY (Async), SYN (Synchronous), REC (Recursive), GEN (Generator), DEC (Decorator), CTX (Context manager)

Example:
SEC:SCR_EMAIL--ASY
Translation: A security function that scrubs email data asynchronously.
Breakdown:
- SEC = Security domain
- SCR = Scrub action (sanitize/remove)
- EMAIL = Email object
- ASY = Asynchronous pattern

This 18-character string (or 30–40 characters with a compound OBJ like RELATIONSHIPS+AUDIT) replaces 2,000+ characters of function body for search purposes: a 50–100:1 compression ratio with zero semantic loss. Compound OBJ improves ranking for multi-concept queries (e.g. "audit relations" matches functions whose Cypher contains `RELATIONSHIPS+AUDIT`).
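Because the format is strictly positional, a Cypher can be unpacked with a few lines of string handling. The sketch below is illustrative only (the helper names, the toy scoring, and the assumption that a query has already been mapped to slot codes are not part of Symdex's documented API); it shows how the four slots support both exact filtering and multi-concept ranking:

# Illustrative parser and toy ranker for the DOM:ACT_OBJ--PAT layout.
# Names and scoring are assumptions for this sketch, not Symdex internals.
from typing import NamedTuple

class Cypher(NamedTuple):
    dom: str              # domain, e.g. "SEC"
    act: str              # action, e.g. "SCR"
    obj: tuple[str, ...]  # object(s); compound OBJ splits on "+"
    pat: str              # execution pattern, e.g. "ASY"

def parse_cypher(raw: str) -> Cypher:
    dom, rest = raw.split(":", 1)    # "SEC" | "SCR_EMAIL--ASY"
    act, rest = rest.split("_", 1)   # "SCR" | "EMAIL--ASY"
    obj, pat = rest.split("--", 1)   # "EMAIL" | "ASY"
    return Cypher(dom, act, tuple(obj.split("+")), pat)

def score(query_codes: set[str], c: Cypher) -> int:
    # Toy ranking: one point per slot value the query mentions.  Mapping
    # natural language to codes (e.g. "audit relations" -> {"AUDIT",
    # "RELATIONSHIPS"}) is assumed to happen upstream.
    return len(query_codes & {c.dom, c.act, c.pat, *c.obj})

print(parse_cypher("SEC:SCR_EMAIL--ASY"))
# Cypher(dom='SEC', act='SCR', obj=('EMAIL',), pat='ASY')
print(score({"RELATIONSHIPS", "AUDIT"},
            parse_cypher("DAT:AGG_RELATIONSHIPS+AUDIT--SYN")))
# 2: both query concepts hit the compound OBJ, so this function ranks higher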