This repository keeps a reusable skill for UI parsing and desktop operations: `skills/ui-element-ops`.
- Parse UI screenshots into structured elements (`type`, `bbox`, `text`, `clickable`)
- Find / wait for elements
- Click, type, key, hotkey
- Take screenshots
- Calibrate coordinates for DPI / multi-display / window offsets
- On headless systems, screenshot parsing still works if you already have image files.
- On headless systems, `list`/`find`/`wait`/`calibrate` can still run on existing `*.elements.json` files.
- On headless systems, interactive actions do not work: `click`, `click-xy`, `type`, `key`, `hotkey`, `screenshot`, `screen-info`.
- Interactive actions require an active GUI desktop session and OS permissions (Accessibility / Screen Recording where applicable).
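On a headless machine, finding an element in a cached parse result amounts to filtering the parsed JSON. A minimal sketch, assuming each `*.elements.json` file holds a list of objects with the `type`/`bbox`/`text`/`clickable` fields listed above (the exact schema may differ; check a real output file, and note the sample data below is invented for illustration):

```python
import json
from pathlib import Path

def find_elements(elements_path, text=None, clickable=None):
    """Filter parsed UI elements by substring match on text and/or clickability."""
    elements = json.loads(Path(elements_path).read_text())
    hits = []
    for el in elements:
        if text is not None and text.lower() not in el.get("text", "").lower():
            continue
        if clickable is not None and el.get("clickable") != clickable:
            continue
        hits.append(el)
    return hits

# Inline stand-in data (schema assumed, not taken from a real parse):
sample = [
    {"type": "button", "bbox": [10, 10, 90, 40], "text": "OK", "clickable": True},
    {"type": "text", "bbox": [10, 60, 200, 80], "text": "Status: OK", "clickable": False},
]
Path("sample.elements.json").write_text(json.dumps(sample))
buttons = find_elements("sample.elements.json", text="ok", clickable=True)
```

In an interactive session, the center of a hit's `bbox` would presumably be the coordinate to pass to `click-xy`, after calibration.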
- Bootstrap environment: `skills/ui-element-ops/scripts/bootstrap_omniparser_env.sh "$PWD"`
- Parse an image: `skills/ui-element-ops/scripts/run_parse_ui.sh /abs/path/to/screen.png`
- Operate UI: `python3 skills/ui-element-ops/scripts/operate_ui.py --help`
- Capture + parse with randomized names: `skills/ui-element-ops/scripts/capture_and_parse.sh`
- `run_parse_ui.sh` and `capture_and_parse.sh` are compute-heavy (OCR + detection + captioning).
- On CPU-only machines, one run can take tens of seconds with high CPU/RAM usage.
- If you already have a screenshot file, parse it directly instead of capturing again: `skills/ui-element-ops/scripts/run_parse_ui.sh /abs/path/to/screen.png`
- Avoid tight loops; increase polling intervals for repeated tasks.
- Prefer lower-frequency parsing and reuse the latest `*.elements.json` when possible.
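Since `capture_and_parse.sh` writes outputs under randomized names, reusing the latest result means picking the newest `*.elements.json` by modification time and only re-parsing when it is stale. A sketch of that policy (the staleness threshold and directory layout are assumptions, not part of the skill):

```python
import os
import time
from pathlib import Path

def newest_elements_json(directory):
    """Return the most recently modified *.elements.json, or None if absent."""
    candidates = sorted(
        Path(directory).glob("*.elements.json"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None

def fresh_or_none(directory, max_age_s=30.0):
    """Reuse the newest parse result unless it is older than max_age_s."""
    latest = newest_elements_json(directory)
    if latest is None or time.time() - latest.stat().st_mtime > max_age_s:
        return None  # caller should run capture_and_parse.sh, then retry
    return latest

# Demo with stand-in files instead of real parse outputs:
demo = Path("demo_elements")
demo.mkdir(exist_ok=True)
(demo / "aaa.elements.json").write_text("[]")
(demo / "zzz.elements.json").write_text("[]")
os.utime(demo / "aaa.elements.json", (1_000, 1_000))  # force an old mtime
latest = newest_elements_json(demo)
```

A polling loop built on `fresh_or_none` can then sleep for a few seconds between checks instead of re-running the heavy parser back-to-back.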
- `skills/ui-element-ops/SKILL.md`
- `skills/ui-element-ops/agents/openai.yaml`
- `skills/ui-element-ops/scripts/parse_ui.py`
- `skills/ui-element-ops/scripts/operate_ui.py`