Screenshot and Analysis API

Overview

The Screenshot and Analysis endpoints provide tools for capturing screenshots of the Windows desktop and analyzing the system state. Together they let agent systems see what is currently displayed on screen and gather detailed information about running applications, UI elements, and browser content.

These tools are essential for agents that need to visually interpret the screen and understand the current state of the Windows environment before taking actions.

Available Endpoints

Get System Overview

POST /tools-api/system/overview

Returns an overview of the computer's current state, including open applications, focused UI elements, window structures, and Chrome browser details (if available).

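A minimal Python sketch of calling this endpoint using only the standard library. The base URL `http://localhost:8000` is a placeholder assumption, and so is sending an empty JSON object as the request body. The response schema is not specified here, so the sketch simply returns the parsed JSON.

```python
import json
import urllib.request

# Hypothetical server address: the Tools API host and port are not documented
# here, so adjust this to your deployment.
BASE_URL = "http://localhost:8000"


def build_overview_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the system overview endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/tools-api/system/overview",
        data=b"{}",  # assumption: an empty JSON object is an acceptable body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_overview(base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and parse the JSON response (schema not specified here)."""
    with urllib.request.urlopen(build_overview_request(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the request easy to inspect or log before it hits the server.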

Take Screenshot

GET /tools-api/screenshot

Captures a screenshot of the current desktop and returns it as a base64-encoded image.

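Because the screenshot arrives as base64 text, a client has to decode it before saving or inspecting it. The sketch below assumes a JSON response that carries the image in a field named `image` (a hypothetical name), and it tolerates an optional `data:` URI prefix. `decode_screenshot` and `image_format` are illustrative helpers, not part of the API.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical server address


def decode_screenshot(b64_image: str) -> bytes:
    """Decode a base64 image string, tolerating an optional data-URI prefix."""
    # e.g. "data:image/png;base64,iVBORw0..." keeps only the part after the comma
    if b64_image.startswith("data:") and "," in b64_image:
        b64_image = b64_image.split(",", 1)[1]
    # Drop any whitespace or line breaks, then decode strictly.
    return base64.b64decode("".join(b64_image.split()), validate=True)


def image_format(data: bytes) -> str:
    """Guess the image format from its leading magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return "unknown"


def take_screenshot(base_url: str = BASE_URL, timeout: float = 30.0) -> bytes:
    """Fetch a screenshot and return the raw image bytes."""
    req = urllib.request.Request(f"{base_url}/tools-api/screenshot", method="GET")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # Assumption: the base64 string lives in a field named "image"; check the
    # endpoint's detailed reference for the real field name.
    return decode_screenshot(payload["image"])
```

The returned bytes can be written straight to a file, using `image_format` to pick an extension.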

Find UI Element

POST /tools-api/screenshot/find-ui-element

Analyzes a screenshot to find UI elements based on a text description and returns their coordinates.

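A sketch of a client for this endpoint, again using only the standard library. The request field `description`, the base URL, and the `Box` type are hypothetical. The actual request and response formats belong to the endpoint's detailed reference. `Box.center` shows one common way an agent turns a returned bounding box into a single click point.

```python
import json
import urllib.request
from dataclasses import dataclass

BASE_URL = "http://localhost:8000"  # hypothetical server address


def build_find_request(description: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for elements matching `description`."""
    # Hypothetical body: the field name "description" is an assumption.
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tools-api/screenshot/find-ui-element",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def find_ui_element(description: str, base_url: str = BASE_URL, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON response (schema not specified here)."""
    with urllib.request.urlopen(build_find_request(description, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box in screen pixels (x, y = top-left corner)."""
    x: int
    y: int
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        # Integer pixel coordinates of the box's midpoint, e.g. as a click target.
        return (self.x + self.width // 2, self.y + self.height // 2)
```

For example, `Box(100, 200, 50, 20).center()` gives `(125, 210)`.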