Browser Automation¶
OmniAgent includes a built-in browser tool powered by Rod for web automation and scraping.
Overview¶
The browser tool enables agents to:
- Navigate to URLs and capture screenshots
- Click elements and fill forms
- Execute JavaScript code
- Handle browser dialogs (alerts, confirms, prompts)
- Extract text content from pages
Basic Usage¶
import "github.com/plexusone/omniagent/tools/browser"
// Create the browser tool
tool, err := browser.New(browser.Config{
Headless: true,
Logger: logger,
})
if err != nil {
return err
}
defer tool.Close()
// The agent can now use browser actions
Configuration¶
| Field | Type | Default | Description |
|---|---|---|---|
Headless |
bool | true |
Run browser without UI |
UserData |
string | - | Chrome user data directory |
Logger |
*slog.Logger | - | Logger instance |
EvaluateTimeout |
duration | - | JavaScript evaluation timeout |
DialogCallback |
func(Dialog) | - | Callback for observed dialogs |
tool, err := browser.New(browser.Config{
Headless: true,
UserData: "/tmp/chrome-data",
EvaluateTimeout: 30 * time.Second,
DialogCallback: func(d browser.Dialog) {
log.Printf("Dialog observed: %s - %s", d.Type, d.Message)
},
})
Actions¶
The browser tool supports these actions:
| Action | Description |
|---|---|
navigate |
Go to a URL |
click |
Click an element |
type |
Enter text in an input |
screenshot |
Capture page screenshot |
get_text |
Extract text from element |
wait |
Wait for element or timeout |
evaluate |
Execute JavaScript |
get_dialogs |
Get observed dialog history |
dismiss_dialog |
Dismiss an active dialog |
Navigate¶
Click¶
Type¶
Screenshot¶
Returns the screenshot as a base64-encoded image.
Get Text¶
Wait¶
JavaScript Evaluation¶
Execute arbitrary JavaScript in the page context:
Complex Scripts¶
{
"action": "evaluate",
"script": "return Array.from(document.querySelectorAll('a')).map(a => a.href)"
}
Async Scripts¶
Dialog Handling¶
The browser tool automatically tracks JavaScript dialogs (alert, confirm, prompt).
Dialog Types¶
| Type | Description |
|---|---|
alert |
Information message, OK button only |
confirm |
Yes/No choice |
prompt |
Text input |
beforeunload |
Page leave confirmation |
Dialog Structure¶
type Dialog struct {
Type string // "alert", "confirm", "prompt", "beforeunload"
Message string // Dialog message text
DefaultValue string // Default value for prompt dialogs
URL string // Page URL where dialog appeared
Timestamp time.Time // When observed
Handled bool // If automatically handled
Response string // Response given
}
Get Dialog History¶
Returns all observed dialogs:
[
{
"type": "alert",
"message": "Form submitted successfully!",
"url": "https://example.com/form",
"timestamp": "2025-01-15T10:30:00Z",
"handled": true
},
{
"type": "confirm",
"message": "Are you sure you want to delete?",
"url": "https://example.com/settings",
"timestamp": "2025-01-15T10:31:00Z",
"handled": true,
"response": "true"
}
]
Dismiss Active Dialog¶
For confirm dialogs, use "true" or "false". For prompt dialogs, provide the text to enter.
Dialog Callback¶
Handle dialogs programmatically:
tool, err := browser.New(browser.Config{
Headless: true,
DialogCallback: func(d browser.Dialog) {
switch d.Type {
case "alert":
log.Printf("Alert: %s", d.Message)
case "confirm":
log.Printf("Confirm: %s (answered: %s)", d.Message, d.Response)
case "prompt":
log.Printf("Prompt: %s (input: %s)", d.Message, d.Response)
}
},
})
Selectors¶
The browser tool supports CSS selectors and XPath:
CSS Selectors¶
{"selector": "#submit-button"}
{"selector": ".form-input"}
{"selector": "button[type='submit']"}
{"selector": "div.container > p:first-child"}
XPath¶
Error Handling¶
The browser tool returns errors for common failure cases:
| Error | Cause |
|---|---|
element not found |
Selector didn't match any element |
timeout waiting for element |
Element didn't appear within timeout |
navigation failed |
URL couldn't be loaded |
script evaluation failed |
JavaScript error |
Best Practices¶
Wait for Elements¶
Always wait for elements before interacting:
[
{"action": "navigate", "url": "https://example.com"},
{"action": "wait", "selector": ".page-loaded"},
{"action": "click", "selector": "button#action"}
]
Handle Dynamic Content¶
Use JavaScript evaluation for dynamic pages:
{
"action": "evaluate",
"script": "return new Promise(resolve => { const check = () => { const el = document.querySelector('.data-loaded'); if (el) resolve(el.textContent); else setTimeout(check, 100); }; check(); })"
}
Clean Up¶
Always close the browser when done:
Headless Mode¶
Use headless mode in production for better performance and resource usage:
Disable headless for debugging: