AutoControl is a cross-platform Python GUI automation framework providing mouse control, keyboard input, image recognition, screen capture, action scripting, and report generation — all through a unified API that works on Windows, macOS, and Linux (X11).
- Features
- Architecture
- Installation
- Requirements
- Quick Start
- API Reference
- Mouse Control
- Keyboard Control
- Image Recognition
- Accessibility Element Finder
- AI Element Locator (VLM)
- OCR (Text on Screen)
- Clipboard
- Screen Operations
- Action Recording & Playback
- Action Scripting (JSON Executor)
- Scheduler (Interval & Cron)
- Global Hotkey Daemon
- Event Triggers
- Run History
- Report Generation
- Remote Automation (Socket / REST)
- Plugin Loader
- Shell Command Execution
- Screen Recording
- Callback Executor
- Package Manager
- Project Management
- Window Management
- GUI Application
- Command-Line Interface
- Platform Support
- Development
- License
- Mouse Automation — move, click, press, release, drag, and scroll with precise coordinate control
- Keyboard Automation — press/release individual keys, type strings, hotkey combinations, key state detection
- Image Recognition — locate UI elements on screen using OpenCV template matching with configurable threshold
- Accessibility Element Finder — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role
- AI Element Locator (VLM) — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates
- OCR — extract text from screen regions using Tesseract; wait for, click, or locate rendered text
- Clipboard — read/write system clipboard text on Windows, macOS, and Linux
- Screenshot & Screen Recording — capture full screen or regions as images, record screen to video (AVI/MP4)
- Action Recording & Playback — record mouse/keyboard events and replay them
- JSON-Based Action Scripting — define and execute automation flows using JSON action files (dry-run + step debug)
- Scheduler — run scripts on an interval or cron expression; jobs persist across restarts
- Global Hotkey Daemon — bind OS-level hotkeys to action scripts (Windows today; macOS/Linux stubs in place)
- Event Triggers — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
- Run History — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
- Report Generation — export test records as HTML, JSON, or XML reports with success/failure status
- Remote Automation — TCP socket server and REST API server to receive automation commands
- Plugin Loader — drop
.pyfiles exposingAC_*callables into a directory and register them as executor commands at runtime - Shell Integration — execute shell commands within automation workflows with async output capture
- Callback Executor — trigger automation functions with callback hooks for chaining operations
- Dynamic Package Loading — extend the executor at runtime by importing external Python packages
- Project & Template Management — scaffold automation projects with keyword/executor directory structure
- Window Management — send keyboard/mouse events directly to specific windows (Windows/Linux)
- GUI Application — built-in PySide6 graphical interface with live language switching (English / 繁體中文 / 简体中文 / 日本語)
- CLI Runner —
python -m je_auto_control.cli run|list-jobs|start-server|start-rest - Cross-Platform — unified API across Windows, macOS, and Linux (X11)
je_auto_control/
├── wrapper/ # Platform-agnostic API layer
│ ├── platform_wrapper.py # Auto-detects OS and loads the correct backend
│ ├── auto_control_mouse.py # Mouse operations
│ ├── auto_control_keyboard.py# Keyboard operations
│ ├── auto_control_image.py # Image recognition (OpenCV template matching)
│ ├── auto_control_screen.py # Screenshot, screen size, pixel color
│ └── auto_control_record.py # Action recording/playback
├── windows/ # Windows-specific backend (Win32 API / ctypes)
├── osx/ # macOS-specific backend (pyobjc / Quartz)
├── linux_with_x11/ # Linux-specific backend (python-Xlib)
├── gui/ # PySide6 GUI application
└── utils/
├── executor/ # JSON action executor engine
├── callback/ # Callback function executor
├── cv2_utils/ # OpenCV screenshot, template matching, video recording
├── accessibility/ # UIA (Windows) / AX (macOS) element finder
├── vision/ # VLM-based locator (Anthropic / OpenAI backends)
├── ocr/ # Tesseract-backed text locator
├── clipboard/ # Cross-platform clipboard
├── scheduler/ # Interval + cron scheduler
├── hotkey/ # Global hotkey daemon
├── triggers/ # Image/window/pixel/file triggers
├── run_history/ # SQLite run log + error-screenshot artifacts
├── rest_api/ # Stdlib HTTP/REST server
├── plugin_loader/ # Dynamic AC_* plugin discovery
├── socket_server/ # TCP socket server for remote automation
├── shell_process/ # Shell command manager
├── generate_report/ # HTML / JSON / XML report generators
├── test_record/ # Test action recording
├── script_vars/ # Script variable interpolation
├── watcher/ # Mouse / pixel / log watchers (Live HUD)
├── recording_edit/ # Trim, filter, re-scale recorded actions
├── json/ # JSON action file read/write
├── project/ # Project scaffolding & templates
├── package_manager/ # Dynamic package loading
├── logging/ # Logging
└── exception/ # Custom exception classes
The platform_wrapper.py module automatically detects the current operating system and imports the corresponding backend, so all wrapper functions work identically regardless of platform.
pip install je_auto_controlpip install je_auto_control[gui]On Linux, install the following system packages before installing:
sudo apt-get install cmake libssl-dev- Python >= 3.10
- pip >= 19.3
| Package | Purpose |
|---|---|
je_open_cv |
Image recognition (OpenCV template matching) |
pillow |
Screenshot capture |
mss |
Fast multi-monitor screenshot |
pyobjc |
macOS backend (auto-installed on macOS) |
python-Xlib |
Linux X11 backend (auto-installed on Linux) |
PySide6 |
GUI application (optional, install with [gui]) |
qt-material |
GUI theme (optional, install with [gui]) |
uiautomation |
Windows accessibility backend (optional, loaded on demand) |
pytesseract + Tesseract |
OCR engine (optional, loaded on demand) |
anthropic |
VLM locator — Anthropic backend (optional, loaded on demand) |
openai |
VLM locator — OpenAI backend (optional, loaded on demand) |
See Third_Party_License.md for a full list of third-party components and their licenses.
import je_auto_control
# Get current mouse position
x, y = je_auto_control.get_mouse_position()
print(f"Mouse at: ({x}, {y})")
# Move mouse to coordinates
je_auto_control.set_mouse_position(500, 300)
# Left click at current position (use key name)
je_auto_control.click_mouse("mouse_left")
# Right click at specific coordinates
je_auto_control.click_mouse("mouse_right", x=800, y=400)
# Scroll down
je_auto_control.mouse_scroll(scroll_value=5)import je_auto_control
# Press and release a single key
je_auto_control.type_keyboard("a")
# Type a whole string character by character
je_auto_control.write("Hello World")
# Hotkey combination (e.g., Ctrl+C)
je_auto_control.hotkey(["ctrl_l", "c"])
# Check if a key is currently pressed
is_pressed = je_auto_control.check_key_is_press("shift_l")import je_auto_control
# Find all occurrences of an image on screen
positions = je_auto_control.locate_all_image("button.png", detect_threshold=0.9)
# Returns: [[x1, y1, x2, y2], ...]
# Find a single image and get its center coordinates
cx, cy = je_auto_control.locate_image_center("icon.png", detect_threshold=0.85)
print(f"Found at: ({cx}, {cy})")
# Find an image and automatically click it
je_auto_control.locate_and_click("submit_button.png", mouse_keycode="mouse_left")Query the OS accessibility tree to locate controls by name, role, or app.
Works on Windows (UIA, via uiautomation) and macOS (AX).
import je_auto_control
# List all visible buttons in the Calculator app
elements = je_auto_control.list_accessibility_elements(app_name="Calculator")
# Find a specific element
ok = je_auto_control.find_accessibility_element(name="OK", role="Button")
if ok is not None:
print(ok.bounds, ok.center)
# Click it directly
je_auto_control.click_accessibility_element(name="OK", app_name="Calculator")Raises AccessibilityNotAvailableError if no accessibility backend is
installed for the current platform.
When template matching and accessibility both fail, describe the element in plain language and let a vision-language model find its coordinates.
import je_auto_control
# Uses Anthropic by default if ANTHROPIC_API_KEY is set, else OpenAI.
x, y = je_auto_control.locate_by_description("the green Submit button")
# Or click it in one shot
je_auto_control.click_by_description(
"the cookie-banner 'Accept all' button",
screen_region=[0, 800, 1920, 1080], # optional crop
)Configuration (environment variables only — keys are never persisted or logged):
| Variable | Effect |
|---|---|
ANTHROPIC_API_KEY |
Enables the Anthropic backend |
OPENAI_API_KEY |
Enables the OpenAI backend |
AUTOCONTROL_VLM_BACKEND |
anthropic or openai to force a backend |
AUTOCONTROL_VLM_MODEL |
Override the default model (e.g. claude-opus-4-7, gpt-4o-mini) |
Raises VLMNotAvailableError if neither SDK is installed or no API key
is set.
import je_auto_control as ac
# Locate all matches of a piece of text
matches = ac.find_text_matches("Submit")
# Center of the first match, or None
cx, cy = ac.locate_text_center("Submit")
# Click text in one call
ac.click_text("Submit")
# Block until text appears (or timeout)
ac.wait_for_text("Loading complete", timeout=15.0)If Tesseract is not on PATH, point at it explicitly:
ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")import je_auto_control as ac
ac.set_clipboard("hello")
text = ac.get_clipboard()Backends: Windows (Win32 via ctypes), macOS (pbcopy/pbpaste),
Linux (xclip or xsel).
import je_auto_control
# Take a full-screen screenshot and save to file
je_auto_control.pil_screenshot("screenshot.png")
# Take a screenshot of a specific region [x1, y1, x2, y2]
je_auto_control.pil_screenshot("region.png", screen_region=[100, 100, 500, 400])
# Get screen resolution
width, height = je_auto_control.screen_size()
# Get pixel color at coordinates
color = je_auto_control.get_pixel(500, 300)import je_auto_control
import time
# Start recording mouse and keyboard events
je_auto_control.record()
time.sleep(10) # Record for 10 seconds
# Stop recording and get the action list
actions = je_auto_control.stop_record()
# Replay the recorded actions
je_auto_control.execute_action(actions)Create a JSON action file (actions.json):
[
["AC_set_mouse_position", {"x": 500, "y": 300}],
["AC_click_mouse", {"mouse_keycode": "mouse_left"}],
["AC_write", {"write_string": "Hello from AutoControl"}],
["AC_screenshot", {"file_path": "result.png"}],
["AC_hotkey", {"key_code_list": ["ctrl_l", "s"]}]
]Execute it:
import je_auto_control
# Execute from file
je_auto_control.execute_action(je_auto_control.read_action_json("actions.json"))
# Or execute from a list directly
je_auto_control.execute_action([
["AC_set_mouse_position", {"x": 100, "y": 200}],
["AC_click_mouse", {"mouse_keycode": "mouse_left"}]
])Available action commands:
| Category | Commands |
|---|---|
| Mouse | AC_click_mouse, AC_set_mouse_position, AC_get_mouse_position, AC_press_mouse, AC_release_mouse, AC_mouse_scroll, AC_mouse_left, AC_mouse_right, AC_mouse_middle |
| Keyboard | AC_type_keyboard, AC_press_keyboard_key, AC_release_keyboard_key, AC_write, AC_hotkey, AC_check_key_is_press |
| Image | AC_locate_all_image, AC_locate_image_center, AC_locate_and_click |
| Screen | AC_screen_size, AC_screenshot |
| Accessibility | AC_a11y_list, AC_a11y_find, AC_a11y_click |
| VLM (AI Locator) | AC_vlm_locate, AC_vlm_click |
| OCR | AC_locate_text, AC_click_text, AC_wait_text |
| Clipboard | AC_clipboard_get, AC_clipboard_set |
| Record | AC_record, AC_stop_record |
| Report | AC_generate_html, AC_generate_json, AC_generate_xml, AC_generate_html_report, AC_generate_json_report, AC_generate_xml_report |
| Project | AC_create_project |
| Shell | AC_shell_command |
| Process | AC_execute_process |
| Executor | AC_execute_action, AC_execute_files |
import je_auto_control as ac
# Interval job — run every 30 seconds
job = ac.default_scheduler.add_job(
script_path="scripts/poll.json", interval_seconds=30, repeat=True,
)
# Cron job — 09:00 on weekdays (minute hour dom month dow)
cron_job = ac.default_scheduler.add_cron_job(
script_path="scripts/daily.json", cron_expression="0 9 * * 1-5",
)
ac.default_scheduler.start()Both flavours coexist; job.is_cron tells them apart.
Bind OS-level hotkeys to action JSON scripts (Windows backend today;
macOS / Linux raise NotImplementedError on start() with Strategy-
pattern seams in place).
from je_auto_control import default_hotkey_daemon
default_hotkey_daemon.bind("ctrl+alt+1", "scripts/greet.json")
default_hotkey_daemon.start()Poll-based triggers that fire a script when a condition becomes true:
from je_auto_control import (
default_trigger_engine, ImageAppearsTrigger,
WindowAppearsTrigger, PixelColorTrigger, FilePathTrigger,
)
default_trigger_engine.add(ImageAppearsTrigger(
trigger_id="", script_path="scripts/click_ok.json",
image_path="templates/ok_button.png", threshold=0.85, repeat=True,
))
default_trigger_engine.start()Every run from the scheduler, trigger engine, hotkey daemon, REST API,
and manual GUI replay is recorded to ~/.je_auto_control/history.db.
Errors automatically attach a screenshot under
~/.je_auto_control/artifacts/run_{id}_{ms}.png for post-mortem.
from je_auto_control import default_history_store
for run in default_history_store.list_runs(limit=20):
print(run.id, run.source, run.status, run.artifact_path)The GUI Run History tab exposes filter/refresh/clear and double-click-to-open on the artifact column.
import je_auto_control
# Enable test recording first
je_auto_control.test_record_instance.set_record_enable(True)
# ... perform automation actions ...
je_auto_control.set_mouse_position(100, 200)
je_auto_control.click_mouse("mouse_left")
# Generate reports
je_auto_control.generate_html_report("test_report") # -> test_report.html
je_auto_control.generate_json_report("test_report") # -> test_report.json
je_auto_control.generate_xml_report("test_report") # -> test_report.xml
# Or get report content as string
html_string = je_auto_control.generate_html()
json_string = je_auto_control.generate_json()
xml_string = je_auto_control.generate_xml()Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red.
Two servers are available — a raw TCP socket and a stdlib HTTP/REST
server. Both default to 127.0.0.1; binding to 0.0.0.0 is an explicit,
documented opt-in.
import je_auto_control as ac
# TCP socket server (default: 127.0.0.1:9938)
ac.start_autocontrol_socket_server(host="127.0.0.1", port=9938)
# REST API server (default: 127.0.0.1:9939)
ac.start_rest_api_server(host="127.0.0.1", port=9939)
# Endpoints:
# GET /health liveness probe
# GET /jobs scheduler job list
# POST /execute body: {"actions": [...]}Client example:
import socket
import json
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("localhost", 9938))
# Send an automation command
command = json.dumps([
["AC_set_mouse_position", {"x": 500, "y": 300}],
["AC_click_mouse", {"mouse_keycode": "mouse_left"}]
])
sock.sendall(command.encode("utf-8"))
# Receive response
response = sock.recv(8192).decode("utf-8")
print(response)
sock.close()Drop .py files defining top-level AC_* callables into a directory,
then register them as executor commands at runtime:
from je_auto_control import (
load_plugin_directory, register_plugin_commands,
)
commands = load_plugin_directory("./my_plugins")
register_plugin_commands(commands)
# Now usable from any JSON action script:
# [["AC_greet", {"name": "world"}]]Warning: Plugin files execute arbitrary Python on load. Only load from directories you control.
import je_auto_control
# Using the default shell manager
je_auto_control.default_shell_manager.exec_shell("echo Hello")
je_auto_control.default_shell_manager.pull_text() # Print captured output
# Or create a custom ShellManager
shell = je_auto_control.ShellManager(shell_encoding="utf-8")
shell.exec_shell("ls -la")
shell.pull_text()
shell.exit_program()import je_auto_control
import time
# Method 1: ScreenRecorder (manages multiple recordings)
recorder = je_auto_control.ScreenRecorder()
recorder.start_new_record(
recorder_name="my_recording",
path_and_filename="output.avi",
codec="XVID",
frame_per_sec=30,
resolution=(1920, 1080)
)
time.sleep(10)
recorder.stop_record("my_recording")
# Method 2: RecordingThread (simple single recording, outputs MP4)
recording = je_auto_control.RecordingThread(video_name="my_video", fps=20)
recording.start()
time.sleep(10)
recording.stop()Execute an automation function and trigger a callback upon completion:
import je_auto_control
def my_callback():
print("Action completed!")
# Execute set_mouse_position then call my_callback
je_auto_control.callback_executor.callback_function(
trigger_function_name="AC_set_mouse_position",
callback_function=my_callback,
x=500, y=300
)
# With callback parameters
def on_done(message):
print(f"Done: {message}")
je_auto_control.callback_executor.callback_function(
trigger_function_name="AC_click_mouse",
callback_function=on_done,
callback_function_param={"message": "Click finished"},
callback_param_method="kwargs",
mouse_keycode="mouse_left"
)Dynamically load external Python packages into the executor at runtime:
import je_auto_control
# Add all functions/classes from a package to the executor
je_auto_control.package_manager.add_package_to_executor("os")
# Now you can use os functions in JSON action scripts:
# ["os_getcwd", {}]
# ["os_listdir", {"path": "."}]Scaffold a project directory structure with template files:
import je_auto_control
# Create a project structure
je_auto_control.create_project_dir(project_path="./my_project", parent_name="AutoControl")
# This creates:
# my_project/
# └── AutoControl/
# ├── keyword/
# │ ├── keyword1.json # Template action file
# │ ├── keyword2.json # Template action file
# │ └── bad_keyword_1.json # Error handling template
# └── executor/
# ├── executor_one_file.py # Execute single file example
# ├── executor_folder.py # Execute folder example
# └── executor_bad_file.py # Error handling exampleSend events directly to specific windows (Windows and Linux only):
import je_auto_control
# Send keyboard event to a window by title
je_auto_control.send_key_event_to_window("Notepad", keycode="a")
# Send mouse event to a window handle
je_auto_control.send_mouse_event_to_window(window_handle, mouse_keycode="mouse_left", x=100, y=50)Launch the built-in graphical interface (requires [gui] extra):
import je_auto_control
je_auto_control.start_autocontrol_gui()Or from the command line:
python -m je_auto_controlAutoControl can be used directly from the command line:
# Execute a single action file
python -m je_auto_control -e actions.json
# Execute all action files in a directory
python -m je_auto_control -d ./action_files/
# Execute a JSON string directly
python -m je_auto_control --execute_str '[["AC_screenshot", {"file_path": "test.png"}]]'
# Create a project template
python -m je_auto_control -c ./my_projectA richer subcommand CLI built on the headless APIs:
# Run a script, optionally with variables, and/or a dry-run
python -m je_auto_control.cli run script.json
python -m je_auto_control.cli run script.json --var name=alice --dry-run
# List scheduler jobs
python -m je_auto_control.cli list-jobs
# Start the socket or REST server
python -m je_auto_control.cli start-server --port 9938
python -m je_auto_control.cli start-rest --port 9939--var name=value is parsed as JSON when possible (so count=10 becomes
an int), otherwise treated as a string.
| Platform | Status | Backend | Notes |
|---|---|---|---|
| Windows 10 / 11 | Supported | Win32 API (ctypes) | Full feature support |
| macOS 10.15+ | Supported | pyobjc / Quartz | Action recording not available; send_key_event_to_window / send_mouse_event_to_window not supported |
| Linux (X11) | Supported | python-Xlib | Full feature support |
| Linux (Wayland) | Not supported | — | May be added in a future release |
| Raspberry Pi 3B / 4B | Supported | python-Xlib | Runs on X11 |
git clone https://github.com/Intergration-Automation-Testing/AutoControl.git
cd AutoControl
pip install -r dev_requirements.txt# Unit tests
python -m pytest test/unit_test/
# Integration tests
python -m pytest test/integrated_test/- Homepage: https://github.com/Intergration-Automation-Testing/AutoControl
- Documentation: https://autocontrol.readthedocs.io/en/latest/
- PyPI: https://pypi.org/project/je_auto_control/
MIT License © JE-Chen. See Third_Party_License.md for the licenses of bundled and optional third-party dependencies.