The Screenshot and Analysis endpoints provide tools for capturing screenshots of the Windows desktop and analyzing the system state. These endpoints form the foundation for visual analysis and system overview, allowing agent systems to understand what's currently displayed on the screen and gather detailed information about running applications, UI elements, and browser content.
These tools are essential for agents that need to visually interpret the screen and understand the current state of the Windows environment before taking actions.
Gets an overview of the computer, including open applications, focused UI elements, window structures, and Chrome browser details (if available).
Captures a screenshot of the current desktop and returns it as a base64-encoded image.
Analyzes a screenshot to find UI elements based on a text description and returns their coordinates.
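Because the screenshot endpoint returns the image as base64 text, a client usually has to decode it before saving or inspecting it. The following is a minimal sketch of that step, not the API's documented client code. The payload key `image` and the helper names `decode_screenshot` and `looks_like_png` are hypothetical, and the image format is assumed (not confirmed by this page) to be PNG.

```python
import base64
import binascii

# First eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(payload: dict, field: str = "image") -> bytes:
    """Decode a base64-encoded screenshot from a response payload.

    `field` is a hypothetical key name; check the endpoint's actual
    response schema before relying on it.
    """
    encoded = payload[field]
    # Some APIs wrap the data in a data URI ("data:image/png;base64,...").
    # Strip that prefix if it is present.
    if encoded.startswith("data:"):
        _, _, encoded = encoded.partition(",")
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("screenshot field is not valid base64") from exc


def looks_like_png(data: bytes) -> bool:
    """Return True if the decoded bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)
```

The decoded bytes can then be written to disk, for example with `pathlib.Path("screenshot.png").write_bytes(data)`, or passed to an image library for further analysis.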