Add coordinate_system parameter for DPI-aware coordinate conversion #81
ikoskela wants to merge 1 commit into CursorTouch:main
On systems with DPI scaling > 100%, coordinates from Win32 APIs like
GetWindowRect and Cursor.Position are in logical space, but these tools
expect physical coordinates. This mismatch causes clicks to land in the
wrong place on ~300-500M Windows machines with HiDPI displays.
Adds coordinate_system parameter ("physical" default, or "logical") to
Click-Tool, Type-Tool, Scroll-Tool, and Move-Tool. When set to "logical",
coordinates are auto-converted to physical using the existing
get_dpi_scaling() method. No new dependencies — uses the DPI detection
already in desktop/service.py.
Fully backward-compatible: default is "physical" (no conversion).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
What is the benefit of this approach? We already work in logical coordinates, which solves almost everything. I use the DPI scaling only when annotating the screenshot; everything else is at the same level.
Great point: if windows-mcp is standardizing on logical coordinates, that covers most cases. The remaining use case is mixed-tool workflows where windows-mcp receives coordinates from external tools, some providing physical, some logical. Being able to specify which coordinate system is being passed eliminates ambiguity and lets the tool handle conversion, rather than requiring the caller to know and convert every time. If the team sees that as a worthwhile addition, I'm happy to adjust the PR to align with whatever direction you're heading. If not, no worries; I just wanted to contribute.
Summary
- Adds `coordinate_system` parameter to Click-Tool, Type-Tool, Scroll-Tool, and Move-Tool
- `"physical"` (default, no conversion) or `"logical"` (auto-converts to physical using DPI scale factor)
- Uses the existing `desktop.get_dpi_scaling()` method; no new dependencies

The Problem
Windows has two coordinate systems: physical (actual pixels) and logical (DPI-scaled). On systems with DPI scaling above 100% — which includes ~300-500 million Windows machines (most modern laptops, 1440p/4K monitors) — different Windows APIs return coordinates in different systems with no indication of which one.
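To make the mismatch concrete, here is a small arithmetic sketch. The 150% scale factor and the point (400, 300) are illustrative numbers chosen for this example, not values taken from the PR.

```python
# Illustrative numbers only: a display set to 150% scaling (scale factor 1.5).
scale = 1.5

# A point as a DPI-scaled (logical) API might report it.
logical = (400, 300)

# The same point expressed in physical pixels.
physical = (round(logical[0] * scale), round(logical[1] * scale))

# How far off a click lands if the logical value is passed unchanged
# to a tool that expects physical pixels.
miss = (physical[0] - logical[0], physical[1] - logical[1])

print("physical:", physical)
print("miss (px):", miss)
```

The error grows with distance from the origin, so targets near the bottom-right of the screen are missed by the widest margin.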
These tools accept physical coordinates, but many Win32 APIs (`GetWindowRect`, `Cursor.Position`, .NET methods) return logical coordinates. Without conversion, clicks land in the wrong place.

Solution
The fix is a simple `coordinate_system` parameter that defaults to `"physical"` (backward-compatible) but can be set to `"logical"` when coordinates come from APIs that return logical values.

The conversion uses `desktop.get_dpi_scaling()`, which already exists in `desktop/service.py` via `ctypes.windll.user32.GetDpiForSystem()`.

Scope
- Click-Tool: `coordinate_system: "physical" | "logical"`
- Type-Tool: `coordinate_system: "physical" | "logical"`
- Scroll-Tool: `coordinate_system: "physical" | "logical"`
- Move-Tool: `coordinate_system: "physical" | "logical"`

Test plan
- `coordinate_system="physical"` leaves coordinates unchanged (default)
- `coordinate_system="logical"` correctly converts coordinates
- `desktop.get_dpi_scaling()` returns correct scale factor
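The first two test-plan items could be checked against a helper along these lines. This is a minimal sketch, not the code in the PR: `to_physical` and its `scale` argument are hypothetical names, and it assumes `desktop.get_dpi_scaling()` returns a multiplier such as `1.5` for 150% scaling, which the description does not spell out.

```python
from typing import Literal, Tuple

CoordinateSystem = Literal["physical", "logical"]


def to_physical(
    x: int,
    y: int,
    coordinate_system: CoordinateSystem = "physical",
    scale: float = 1.0,
) -> Tuple[int, int]:
    """Hypothetical helper: return (x, y) in physical pixels.

    `scale` stands in for the value of desktop.get_dpi_scaling(),
    assumed here to be a multiplier (1.0 = 100%, 1.5 = 150%).
    """
    if coordinate_system == "physical":
        # Default path: no conversion, so existing callers are unaffected.
        return x, y
    if coordinate_system == "logical":
        # Scale logical coordinates up to physical pixels.
        return round(x * scale), round(y * scale)
    raise ValueError(
        f"coordinate_system must be 'physical' or 'logical', got {coordinate_system!r}"
    )


# Default behaviour is a pass-through, whatever the scale factor is.
print(to_physical(100, 200, scale=1.5))
# Logical input is scaled by the DPI factor.
print(to_physical(100, 200, "logical", scale=1.5))
```

Rejecting unknown values with an error, rather than silently treating them as `"physical"`, is a design choice of this sketch; it keeps a typo in the parameter from producing a misplaced click.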