Gemini Vision

by artin0123


About

Gemini Vision is an AI-powered media analysis tool that uses Google's Gemini models to extract insights from images and videos. It provides fast visual understanding capabilities through a simple API interface.

Key features of Gemini Vision:

  • Analyze images from URLs with automatic MIME type detection and a 16 MB size limit per image
  • Process YouTube videos directly via streaming without any file size restrictions
  • Scene summarization, object identification, and key detail extraction for reports
  • Configurable model selection, defaulting to `gemini-flash-lite-latest`
  • Batch analysis of multiple images in a single tool call
  • Support for Google AI Studio API keys via environment variables

README

image-mcp-server-gemini


> This is the remote server. Use the local version to analyze local images and videos.

Features

  • Analyze one or more image URLs with a single tool call.
  • Analyze YouTube videos without downloading files locally.
  • Supply an API key and optionally override the Gemini model via environment variables.
  • File size limit: Images are limited to 16 MB to ensure fast processing.
  • YouTube videos: No size limit, as they are streamed directly by the Gemini API.

    Installation

    Installing via Smithery

    Install the server in Claude Desktop:

    npx -y @smithery/cli install @Artin0123/gemini-image-mcp-server --client claude
    

    Manual Installation

    # Clone the repository
    git clone https://github.com/Artin0123/gemini-vision-mcp.git
    cd gemini-vision-mcp

    # Install dependencies
    npm install

    # Compile TypeScript to dist/
    npm run build

    Configuration

    Create a Gemini API key in Google AI Studio and provide GEMINI_API_KEY to the server.

    {
      "mcpServers": {
        "gemini-media": {
          "command": "node",
          "args": ["/absolute/path/to/gemini-vision-mcp/dist/index.js"],
          "env": {
            "GEMINI_API_KEY": "your_api_key_here",
            "GEMINI_MODEL": "models/gemini-flash-lite-latest"
          }
        }
      }
    }
    

    If no key is supplied, the server can still start (handy for automated scans), but any tool invocation will return a configuration error until a valid API key is configured.

    Model override

    The server defaults to models/gemini-flash-lite-latest. Override it by either:

  • Setting the GEMINI_MODEL environment variable, or
  • Providing modelName in the Smithery/SDK configuration schema.
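
    The fallback behaviour described in this section can be sketched roughly as follows. This is an illustration only, not the server's actual source: the names resolveConfig, requireApiKey, and ServerConfig are hypothetical, and the precedence between modelName and GEMINI_MODEL is an assumption, since the README does not state which one wins.

```typescript
// Hypothetical sketch; not taken from the server's source code.
const DEFAULT_MODEL = "models/gemini-flash-lite-latest";

type Env = Record<string, string | undefined>;

interface ServerConfig {
  apiKey?: string;
  modelName: string;
}

// Resolve settings from an optional config object and the environment.
// Assumed precedence: modelName from config, then GEMINI_MODEL, then the default.
function resolveConfig(env: Env, config: { modelName?: string } = {}): ServerConfig {
  const fromConfig = config.modelName?.trim();
  const fromEnv = env["GEMINI_MODEL"]?.trim();
  const apiKey = env["GEMINI_API_KEY"]?.trim();
  return {
    apiKey: apiKey ? apiKey : undefined,
    modelName: fromConfig || fromEnv || DEFAULT_MODEL,
  };
}

// Checked when a tool is invoked rather than at startup,
// so the server can still start without a key.
function requireApiKey(config: ServerConfig): string {
  if (!config.apiKey) {
    throw new Error("Configuration error: GEMINI_API_KEY is not set.");
  }
  return config.apiKey;
}
```

    In a Node process the first argument would typically be process.env. Deferring the key check to requireApiKey mirrors the documented behaviour: startup succeeds, and only tool invocations fail until a key is configured.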

    Available tools

  • analyze_image: Analyze one or more image URLs. Maximum file size: 16 MB per image.
  • analyze_youtube_video: Analyze a YouTube video from URL. No size limit.

    Image URLs are downloaded and processed with a 16 MB size limit to ensure fast response times. Files exceeding this limit produce an error message that reports the actual file size.

    YouTube videos are streamed directly by the Gemini API without downloading, so there is no size restriction.
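
    A size check of the kind described for images might look like the sketch below. The function name, the error wording, and the reading of "16 MB" as 16 × 1024 × 1024 bytes are all assumptions made for illustration; the server's real implementation may differ.

```typescript
// Hypothetical sketch; names and error wording are assumptions.
const MAX_IMAGE_BYTES = 16 * 1024 * 1024; // 16 MB, assuming binary megabytes

// Throw if a downloaded image exceeds the limit, reporting its actual size.
function assertImageWithinLimit(url: string, byteLength: number): void {
  if (byteLength > MAX_IMAGE_BYTES) {
    const actualMb = (byteLength / (1024 * 1024)).toFixed(2);
    throw new Error(
      `Image at ${url} is ${actualMb} MB, which exceeds the 16 MB limit.`
    );
  }
}
```

    The check would run on the byte length of each downloaded image before it is forwarded to the model, so an oversized file fails fast with its real size in the message.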

    Prompt examples

    Please analyze this product photo: https://teimg-bgr.pages.dev/file/mvYT6KeF.webp
    

    Extract the main talking points from this clip: https://www.youtube.com/watch?v=dQw4w9WgXcQ
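
    For a prompt like the YouTube one above, the video can be passed to the Gemini API by URL rather than as uploaded bytes. The sketch below builds a REST-style generateContent request body using the file_data / file_uri part that the Gemini API accepts for YouTube links. Whether this server constructs its request exactly this way is an assumption, and buildYouTubeRequest is a hypothetical helper name.

```typescript
// Hypothetical sketch of a request body; not taken from the server's source.
interface GeminiPart {
  text?: string;
  file_data?: { file_uri: string };
}

interface GenerateContentBody {
  contents: { parts: GeminiPart[] }[];
}

// Simplified host check for illustration; it does not cover every YouTube URL form.
const YOUTUBE_URL = /^https?:\/\/(www\.)?(youtube\.com|youtu\.be)\//;

// Reference the video by URL so the caller never downloads the video itself.
function buildYouTubeRequest(videoUrl: string, prompt: string): GenerateContentBody {
  if (!YOUTUBE_URL.test(videoUrl)) {
    throw new Error(`Not a YouTube URL: ${videoUrl}`);
  }
  return {
    contents: [
      { parts: [{ text: prompt }, { file_data: { file_uri: videoUrl } }] },
    ],
  };
}
```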
    

    Development

    npm install
    npm test
    npm run build
    

    The test suite exercises URL forwarding, MIME handling, and configuration fallbacks.

    License

    MIT
