Take Screenshot - Smooth Operator Tools Server

Description

Takes a screenshot of the entire screen and returns it as a Base64-encoded image. This endpoint is the basic way to observe the current state of the computer, and it gives AI-powered agents the visual context they need before taking actions.

This endpoint can be used for:

Capturing the current screen state for analysis
Providing visual context to an AI agent
Preparing image data for UI element detection
Debugging automation sequences
Documenting the state of the system during operation

The screenshot is taken using Windows native APIs and is returned in PNG format, encoded as a Base64 string. This allows for easy transmission and use in web applications, AI models, and other tools.

Request

Send a GET request with no parameters to the endpoint URL. The code examples below use http://localhost:54321/tools-api/screenshot and pass an API key in an "Authorization: Bearer" header.

Response Format

The response contains the screenshot as a Base64-encoded string along with metadata.

{
    "Success": boolean,      // Whether the operation was successful
    "ImageBase64": "string", // Base64-encoded PNG image
    "Timestamp": "string",   // ISO timestamp of when the screenshot was taken
    "Message": "string"      // Status or error message
}

Response Fields

Field	Type	Description
Success	boolean	Indicates whether the screenshot was successfully captured (true) or not (false).
ImageBase64	string	The Base64-encoded PNG image of the screenshot. Can be used directly in an HTML img tag with prefix "data:image/png;base64,"
Timestamp	string	The time when the screenshot was taken (ISO format).
Message	string	A status message, usually "Screenshot captured successfully" or an error message if the operation failed.
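
The ImageBase64 field can be checked before it is handed to other tools. The following is a minimal sketch using only the Python standard library: it decodes the string, confirms that the bytes start with the standard PNG signature, reads the image dimensions from the IHDR chunk, and builds the data URI described in the table. The helper names (decode_png, png_dimensions, to_data_uri) are hypothetical and are not part of the Tools Server API.

```python
import base64
import struct
from typing import Tuple

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(image_base64: str) -> bytes:
    """Decode the ImageBase64 field and check that the result looks like a PNG."""
    data = base64.b64decode(image_base64, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return data


def png_dimensions(png_bytes: bytes) -> Tuple[int, int]:
    """Return (width, height) as stored in the IHDR chunk.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type (IHDR),
    then big-endian 32-bit width and height.
    """
    if len(png_bytes) < 24 or png_bytes[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found where expected")
    width, height = struct.unpack(">II", png_bytes[16:24])
    return width, height


def to_data_uri(image_base64: str) -> str:
    """Build a value suitable for the src attribute of an HTML img tag."""
    return "data:image/png;base64," + image_base64
```

Reading the dimensions this way avoids loading the whole image with an imaging library when only the screen size is needed.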

Example Response

{
  "Success": true,
  "ImageBase64": "iVBORw0KGgoAAAANSUhEUgAA...[truncated for brevity]...",
  "Timestamp": "2023-10-15T14:30:22.1234567+01:00",
  "Message": "Screenshot captured successfully"
}
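
The Timestamp in the example response carries seven fractional-second digits, which some ISO parsers reject (Python's datetime.fromisoformat before version 3.11, for instance). The sketch below trims the fraction to microsecond precision before parsing. It assumes the server always emits timestamps shaped like the example above; parse_timestamp is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches the fractional-seconds part, e.g. ".1234567"
_FRACTION = re.compile(r"\.(\d+)")


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp whose fraction may have more than six digits."""
    # Truncate (or pad) the fraction to exactly six digits.
    trimmed = _FRACTION.sub(
        lambda m: "." + m.group(1)[:6].ljust(6, "0"), value, count=1
    )
    # Older Python versions do not accept a trailing "Z" for UTC.
    if trimmed.endswith("Z"):
        trimmed = trimmed[:-1] + "+00:00"
    return datetime.fromisoformat(trimmed)
```

The seventh digit (100-nanosecond resolution) is discarded, which is normally irrelevant for screenshot timing.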

Code Examples

Python

import requests
import base64
from PIL import Image
import io

def take_screenshot(api_key):
    url = "http://localhost:54321/tools-api/screenshot"
    headers = {
        "Authorization": f"Bearer {api_key}"
    }
    
    # A timeout keeps the call from hanging if the server is not running
    response = requests.get(url, headers=headers, timeout=30)
    
    if response.status_code == 200:
        result = response.json()
        return result
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

# Example usage
api_key = "your_api_key_here"
screenshot_response = take_screenshot(api_key)

if screenshot_response and screenshot_response.get("Success"):
    # Get the Base64 image
    image_base64 = screenshot_response.get("ImageBase64")
    
    # Save the image to a file
    if image_base64:
        image_data = base64.b64decode(image_base64)
        image = Image.open(io.BytesIO(image_data))
        image.save("screenshot.png")
        print("Screenshot saved to screenshot.png")
        
        # You can also display the image if needed
        # image.show()
else:
    print("Failed to capture screenshot")

TypeScript

interface ScreenshotResponse {
  Success: boolean;
  ImageBase64: string;
  Timestamp: string;
  Message: string;
}

async function takeScreenshot(apiKey: string): Promise<ScreenshotResponse | null> {
  const url = "http://localhost:54321/tools-api/screenshot";
  
  try {
    const response = await fetch(url, {
      method: "GET",
      headers: {
        "Authorization": `Bearer ${apiKey}`
      }
    });
    
    if (!response.ok) {
      console.error(`Error: ${response.status}`);
      console.error(await response.text());
      return null;
    }
    
    return await response.json() as ScreenshotResponse;
  } catch (error) {
    console.error("Failed to take screenshot:", error);
    return null;
  }
}

// Example usage
async function example() {
  const apiKey = "your_api_key_here";
  const screenshot = await takeScreenshot(apiKey);
  
  if (screenshot && screenshot.Success) {
    console.log("Screenshot captured at:", screenshot.Timestamp);
    
    // Display the image in an HTML img element
    const imgElement = document.createElement('img');
    imgElement.src = `data:image/png;base64,${screenshot.ImageBase64}`;
    imgElement.alt = "Screenshot";
    imgElement.style.maxWidth = "100%";
    
    // Add the image to the page
    document.body.appendChild(imgElement);
    
    // You could also save the image using the File System Access API
    // or another method appropriate for your application
  } else {
    console.error("Failed to capture screenshot");
  }
}

C#

using System;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public class ScreenshotResponse
{
    public bool Success { get; set; }
    public string ImageBase64 { get; set; }
    public DateTime Timestamp { get; set; }
    public string Message { get; set; }
}

public class ToolsServerClient
{
    private readonly HttpClient _httpClient;
    private readonly string _apiKey;
    
    public ToolsServerClient(string apiKey)
    {
        _httpClient = new HttpClient { BaseAddress = new Uri("http://localhost:54321") };
        _apiKey = apiKey;
        _httpClient.DefaultRequestHeaders.Add("Authorization", $"Bearer {apiKey}");
    }
    
    public async Task<ScreenshotResponse> TakeScreenshotAsync()
    {
        var response = await _httpClient.GetAsync("/tools-api/screenshot");
        response.EnsureSuccessStatusCode();
        
        var jsonResponse = await response.Content.ReadAsStringAsync();
        return JsonSerializer.Deserialize<ScreenshotResponse>(jsonResponse, new JsonSerializerOptions
        {
            PropertyNameCaseInsensitive = true
        });
    }
}

// Example usage
public class Example
{
    public static async Task Main()
    {
        var client = new ToolsServerClient("your_api_key_here");
        var screenshot = await client.TakeScreenshotAsync();
        
        if (screenshot.Success)
        {
            Console.WriteLine($"Screenshot captured at: {screenshot.Timestamp}");
            
            // Save the image to a file
            byte[] imageBytes = Convert.FromBase64String(screenshot.ImageBase64);
            File.WriteAllBytes("screenshot.png", imageBytes);
            Console.WriteLine("Screenshot saved to screenshot.png");
            
            // You could also display the image using a Windows Forms application
            // or WPF if you're building a GUI application
        }
        else
        {
            Console.WriteLine($"Failed to capture screenshot: {screenshot.Message}");
        }
    }
}

Usage Notes

This endpoint captures the entire screen, including all monitors in a multi-monitor setup
The Base64-encoded image can be large, especially for high-resolution displays
For optimal performance, avoid taking screenshots too frequently (e.g., take no more than one per second)
The screenshot is returned as a PNG image, which provides good quality while maintaining reasonable file size
To use the Base64 string directly in HTML, prefix it with: data:image/png;base64,
This endpoint is often used in conjunction with the "Find UI Element" endpoint to provide visual context for element detection
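
One way to respect the "no more than one per second" guideline is to wrap the screenshot call in a small throttle that reuses the last result when called again too soon. The class below is a sketch under that assumption, not part of the Tools Server API: ScreenshotThrottle, its parameters, and the injected clock are hypothetical names, and fetch stands for any zero-argument function that returns the parsed response (for example, a lambda around the take_screenshot function from the Python example).

```python
import time
from typing import Callable, Optional


class ScreenshotThrottle:
    """Call fetch at most once per min_interval seconds; otherwise reuse the last result."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without sleeping
        self._last_time: Optional[float] = None
        self._last_result: Optional[dict] = None

    def get(self) -> dict:
        now = self._clock()
        # Fetch on the first call, or when enough time has passed since the last fetch.
        if self._last_time is None or now - self._last_time >= self._min_interval:
            self._last_result = self._fetch()
            self._last_time = now
        return self._last_result
```

Usage might look like throttle = ScreenshotThrottle(lambda: take_screenshot(api_key)) followed by throttle.get() wherever a screenshot is needed. Returning a cached screenshot trades freshness for lower load, so callers that must see the result of an action they just performed should wait for the interval instead.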