Browser Automation¶

OmniAgent includes a built-in browser tool powered by Rod for web automation and scraping.

Overview¶

The browser tool enables agents to:

Navigate to URLs and capture screenshots
Click elements and fill forms
Execute JavaScript code
Handle browser dialogs (alerts, confirms, prompts)
Extract text content from pages

Basic Usage¶

package main

import (
    "log"
    "log/slog"
    "os"

    "github.com/plexusone/omniagent/tools/browser"
)

func main() {
    logger := slog.New(slog.NewTextHandler(os.Stderr, nil))

    // Create the browser tool
    tool, err := browser.New(browser.Config{
        Headless: true,
        Logger:   logger,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer tool.Close()

    // The agent can now use browser actions
}

Configuration¶

Field	Type	Default	Description
`Headless`	bool	`true`	Run browser without UI
`UserData`	string	-	Chrome user data directory
`Logger`	*slog.Logger	-	Logger instance
`EvaluateTimeout`	time.Duration	-	JavaScript evaluation timeout
`DialogCallback`	func(Dialog)	-	Callback for observed dialogs

tool, err := browser.New(browser.Config{
    Headless:        true,
    UserData:        "/tmp/chrome-data",
    EvaluateTimeout: 30 * time.Second,
    DialogCallback: func(d browser.Dialog) {
        log.Printf("Dialog observed: %s - %s", d.Type, d.Message)
    },
})

Actions¶

The browser tool supports these actions:

Action	Description
`navigate`	Go to a URL
`click`	Click an element
`type`	Enter text in an input
`screenshot`	Capture page screenshot
`get_text`	Extract text from element
`wait`	Wait for element or timeout
`evaluate`	Execute JavaScript
`get_dialogs`	Get observed dialog history
`dismiss_dialog`	Dismiss an active dialog
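
When actions are assembled in Go rather than written by hand, a small struct keeps the payloads consistent. The following is a minimal sketch, assuming a hypothetical `Action` type whose JSON field names are copied from the payload examples on this page; the type itself is not part of the documented browser package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Action is a hypothetical helper type, not part of the browser package.
// Its JSON field names mirror the payload examples on this page.
type Action struct {
	Action   string `json:"action"`
	URL      string `json:"url,omitempty"`
	Selector string `json:"selector,omitempty"`
	Text     string `json:"text,omitempty"`
	Script   string `json:"script,omitempty"`
	Timeout  int    `json:"timeout,omitempty"`
	Response string `json:"response,omitempty"`
}

// encode renders an action as compact JSON, omitting unset fields.
func encode(a Action) (string, error) {
	b, err := json.Marshal(a)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	steps := []Action{
		{Action: "navigate", URL: "https://example.com"},
		{Action: "type", Selector: "input[name='email']", Text: "user@example.com"},
		{Action: "click", Selector: "button#submit"},
	}
	for _, s := range steps {
		out, err := encode(s)
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		fmt.Println(out)
	}
}
```

Using `omitempty` means each payload carries only the fields relevant to its action, matching the shape of the examples.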

Navigate¶

{
  "action": "navigate",
  "url": "https://example.com"
}

Click¶

{
  "action": "click",
  "selector": "button#submit"
}

Type¶

{
  "action": "type",
  "selector": "input[name='email']",
  "text": "user@example.com"
}

Screenshot¶

{
  "action": "screenshot"
}

Returns the screenshot as a base64-encoded image.
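
To save or inspect the image, the base64 payload has to be decoded first. The sketch below assumes standard base64 encoding and tolerates an optional `data:...;base64,` prefix, since the exact result format is not spelled out here; `decodeScreenshot` is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeScreenshot converts a base64 payload into raw image bytes.
// Assumption: the payload uses standard base64. A "data:...;base64," prefix
// is stripped if present.
func decodeScreenshot(payload string) ([]byte, error) {
	if i := strings.Index(payload, "base64,"); strings.HasPrefix(payload, "data:") && i >= 0 {
		payload = payload[i+len("base64,"):]
	}
	return base64.StdEncoding.DecodeString(strings.TrimSpace(payload))
}

func main() {
	// Stand-in payload: the first four bytes of the PNG file signature.
	payload := base64.StdEncoding.EncodeToString([]byte{0x89, 'P', 'N', 'G'})
	img, err := decodeScreenshot(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("decoded %d bytes\n", len(img))
	// os.WriteFile("page.png", img, 0o644) would persist a real screenshot.
}
```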

Get Text¶

{
  "action": "get_text",
  "selector": "h1.title"
}

Wait¶

{
  "action": "wait",
  "selector": ".loading-complete",
  "timeout": 10000
}

JavaScript Evaluation¶

Execute arbitrary JavaScript in the page context:

{
  "action": "evaluate",
  "script": "return document.title"
}

Complex Scripts¶

{
  "action": "evaluate",
  "script": "return Array.from(document.querySelectorAll('a')).map(a => a.href)"
}

Async Scripts¶

{
  "action": "evaluate",
  "script": "return await fetch('/api/data').then(r => r.json())"
}
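
When a script needs a value computed in Go, such as a selector, concatenating it into the script text breaks as soon as the value contains a quote. One way around this, sketched below, is to encode the value with `encoding/json`, because a JSON string is also a valid JavaScript string literal. The helper names `jsString` and `textScript` are hypothetical, and the sketch assumes the script is treated as a function body, as the `return` statements in the examples suggest.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsString returns s as a quoted JavaScript string literal.
// A JSON-encoded string is also a valid JavaScript string literal.
func jsString(s string) string {
	b, err := json.Marshal(s)
	if err != nil {
		// Marshalling a plain string is not expected to fail.
		return `""`
	}
	return string(b)
}

// textScript builds an evaluate script that returns the text of the first
// element matching selector, or null when nothing matches.
func textScript(selector string) string {
	return fmt.Sprintf(
		"const el = document.querySelector(%s); return el ? el.textContent : null",
		jsString(selector),
	)
}

func main() {
	fmt.Println(textScript(`a[title="Docs"]`))
}
```

The resulting string would go in the `script` field of an `evaluate` action.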

Dialog Handling¶

The browser tool automatically tracks JavaScript dialogs (alert, confirm, prompt, and beforeunload).

Dialog Types¶

Type	Description
`alert`	Information message, OK button only
`confirm`	OK/Cancel choice
`prompt`	Text input with OK/Cancel
`beforeunload`	Page leave confirmation

Dialog Structure¶

type Dialog struct {
    Type         string    // "alert", "confirm", "prompt", "beforeunload"
    Message      string    // Dialog message text
    DefaultValue string    // Default value for prompt dialogs
    URL          string    // Page URL where dialog appeared
    Timestamp    time.Time // When observed
    Handled      bool      // Whether the dialog was handled automatically
    Response     string    // Response given
}

Get Dialog History¶

{
  "action": "get_dialogs"
}

Returns all observed dialogs:

[
  {
    "type": "alert",
    "message": "Form submitted successfully!",
    "url": "https://example.com/form",
    "timestamp": "2025-01-15T10:30:00Z",
    "handled": true
  },
  {
    "type": "confirm",
    "message": "Are you sure you want to delete?",
    "url": "https://example.com/settings",
    "timestamp": "2025-01-15T10:31:00Z",
    "handled": true,
    "response": "true"
  }
]
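
A caller that receives this history as JSON can decode it into typed records. The sketch below assumes the JSON keys shown in the sample response; `DialogRecord` is a local illustration type rather than the package's own `Dialog`, and it omits the default value field because its JSON key does not appear in the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// DialogRecord mirrors the JSON keys in the sample response.
// It is a local illustration type, not the package's Dialog struct.
type DialogRecord struct {
	Type      string    `json:"type"`
	Message   string    `json:"message"`
	URL       string    `json:"url"`
	Timestamp time.Time `json:"timestamp"`
	Handled   bool      `json:"handled"`
	Response  string    `json:"response"`
}

// parseDialogs decodes a get_dialogs result into typed records.
func parseDialogs(raw string) ([]DialogRecord, error) {
	var out []DialogRecord
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, fmt.Errorf("decode dialog history: %w", err)
	}
	return out, nil
}

// countByType tallies records per dialog type.
func countByType(ds []DialogRecord) map[string]int {
	counts := make(map[string]int)
	for _, d := range ds {
		counts[d.Type]++
	}
	return counts
}

func main() {
	raw := `[
	  {"type":"alert","message":"Form submitted successfully!","url":"https://example.com/form","timestamp":"2025-01-15T10:30:00Z","handled":true},
	  {"type":"confirm","message":"Are you sure you want to delete?","url":"https://example.com/settings","timestamp":"2025-01-15T10:31:00Z","handled":true,"response":"true"}
	]`
	ds, err := parseDialogs(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(ds), countByType(ds)["confirm"], ds[1].Timestamp.Sub(ds[0].Timestamp))
}
```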

Dismiss Active Dialog¶

{
  "action": "dismiss_dialog",
  "response": "false"
}

For confirm dialogs, use "true" to accept or "false" to cancel. For prompt dialogs, provide the text to enter.

Dialog Callback¶

Observe dialogs programmatically by registering a callback:

tool, err := browser.New(browser.Config{
    Headless: true,
    DialogCallback: func(d browser.Dialog) {
        switch d.Type {
        case "alert":
            log.Printf("Alert: %s", d.Message)
        case "confirm":
            log.Printf("Confirm: %s (answered: %s)", d.Message, d.Response)
        case "prompt":
            log.Printf("Prompt: %s (input: %s)", d.Message, d.Response)
        }
    },
})
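
If the callback needs to keep dialogs around instead of only logging them, a mutex-guarded collector is one option. The sketch below assumes the callback may be invoked from a goroutine other than the caller's, which this page does not confirm either way. `DialogLog` is a hypothetical type, and `Dialog` is a trimmed local copy so the example compiles on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// Dialog is a trimmed local copy of the fields used below, so the sketch
// compiles on its own. Real code would use browser.Dialog.
type Dialog struct {
	Type    string
	Message string
}

// DialogLog collects dialogs and is safe for concurrent use.
type DialogLog struct {
	mu      sync.Mutex
	dialogs []Dialog
}

// Record has the shape of a dialog callback: func(Dialog).
func (l *DialogLog) Record(d Dialog) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.dialogs = append(l.dialogs, d)
}

// OfType returns a copy of the recorded dialogs with the given type.
func (l *DialogLog) OfType(t string) []Dialog {
	l.mu.Lock()
	defer l.mu.Unlock()
	var out []Dialog
	for _, d := range l.dialogs {
		if d.Type == t {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	var dl DialogLog
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			dl.Record(Dialog{Type: "alert", Message: fmt.Sprintf("alert %d", n)})
		}(i)
	}
	wg.Wait()
	fmt.Println(len(dl.OfType("alert")))
}
```

With the real type, `Record` would accept `browser.Dialog` and could be passed directly as `DialogCallback: dl.Record`.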

Selectors¶

The browser tool supports CSS selectors and XPath:

CSS Selectors¶

{"selector": "#submit-button"}
{"selector": ".form-input"}
{"selector": "button[type='submit']"}
{"selector": "div.container > p:first-child"}

XPath¶

{"selector": "//button[contains(text(), 'Submit')]"}
{"selector": "//div[@class='results']//a"}
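
XPath 1.0 has no escape sequence inside string literals, so text containing both quote characters has to be split and joined with `concat()`. The sketch below shows that technique, plus a simple prefix heuristic for telling the two selector syntaxes apart. Both helpers are hypothetical, and the heuristic is an assumption: how the tool itself distinguishes CSS from XPath is not documented on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeXPath guesses whether a selector is XPath.
// Assumption: XPath expressions start with "/", "./", "../" or "(".
func looksLikeXPath(selector string) bool {
	s := strings.TrimSpace(selector)
	for _, p := range []string{"/", "./", "../", "("} {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

// xpathLiteral quotes s as an XPath 1.0 string literal. When s contains both
// quote characters, it falls back to concat() with the single quotes split out.
func xpathLiteral(s string) string {
	if !strings.Contains(s, "'") {
		return "'" + s + "'"
	}
	if !strings.Contains(s, `"`) {
		return `"` + s + `"`
	}
	parts := strings.Split(s, "'")
	quoted := make([]string, len(parts))
	for i, p := range parts {
		quoted[i] = "'" + p + "'"
	}
	return "concat(" + strings.Join(quoted, `, "'", `) + ")"
}

func main() {
	label := `Save "draft" now`
	selector := "//button[contains(text(), " + xpathLiteral(label) + ")]"
	fmt.Println(selector, looksLikeXPath(selector))
}
```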

Error Handling¶

The browser tool returns errors for common failure cases:

Error	Cause
`element not found`	Selector didn't match any element
`timeout waiting for element`	Element didn't appear within timeout
`navigation failed`	URL couldn't be loaded
`script evaluation failed`	JavaScript error
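
Some of these failures are worth retrying and others are not. The sketch below retries only timeouts and navigation failures, which is a policy choice for illustration. It matches on the message text from the table, an assumption made because this page does not say whether the package exports sentinel error values; `retryableMessage` and `retry` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// retryableMessage reports whether an error message describes a condition
// that may clear up on a retry.
// Assumption: errors are identified by the message text listed in the table.
func retryableMessage(msg string) bool {
	switch {
	case strings.Contains(msg, "timeout waiting for element"),
		strings.Contains(msg, "navigation failed"):
		return true
	default:
		return false
	}
}

// retry runs op up to attempts times, stopping early on success or on an
// error that is not considered retryable.
func retry(attempts int, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !retryableMessage(err.Error()) {
			return err
		}
	}
	return err
}

func main() {
	calls := 0
	err := retry(3, func() error {
		calls++
		if calls < 3 {
			return errors.New("timeout waiting for element: .page-loaded")
		}
		return nil
	})
	fmt.Println(calls, err)
}
```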

Best Practices¶

Wait for Elements¶

Always wait for elements before interacting:

[
  {"action": "navigate", "url": "https://example.com"},
  {"action": "wait", "selector": ".page-loaded"},
  {"action": "click", "selector": "button#action"}
]
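
When action sequences are generated in code, the wait steps can be inserted mechanically instead of by hand. The following sketch assumes a hypothetical, trimmed `Step` type and treats `click`, `type`, and `get_text` as the actions that target an element.

```go
package main

import "fmt"

// Step is a hypothetical, trimmed action type for this sketch.
type Step struct {
	Action   string
	Selector string
}

// needsElement reports whether an action targets an element by selector.
func needsElement(action string) bool {
	switch action {
	case "click", "type", "get_text":
		return true
	}
	return false
}

// withWaits returns steps with a wait inserted before each element
// interaction that is not already preceded by a wait on the same selector.
func withWaits(steps []Step) []Step {
	out := make([]Step, 0, len(steps)*2)
	for _, s := range steps {
		if needsElement(s.Action) {
			n := len(out)
			guarded := n > 0 && out[n-1].Action == "wait" && out[n-1].Selector == s.Selector
			if !guarded {
				out = append(out, Step{Action: "wait", Selector: s.Selector})
			}
		}
		out = append(out, s)
	}
	return out
}

func main() {
	plan := withWaits([]Step{
		{Action: "navigate"},
		{Action: "click", Selector: "button#action"},
		{Action: "wait", Selector: "#result"},
		{Action: "get_text", Selector: "#result"},
	})
	for _, s := range plan {
		fmt.Println(s.Action, s.Selector)
	}
}
```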

Handle Dynamic Content¶

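For content that loads asynchronously, poll for it inside an evaluated script. The Promise below resolves once an element matching `.data-loaded` appears; it has no deadline of its own, so set `EvaluateTimeout` to bound the wait: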

{
  "action": "evaluate",
  "script": "return new Promise(resolve => { const check = () => { const el = document.querySelector('.data-loaded'); if (el) resolve(el.textContent); else setTimeout(check, 100); }; check(); })"
}

Clean Up¶

Always close the browser when done:

tool, err := browser.New(config)
if err != nil {
    return err
}
defer tool.Close()

Headless Mode¶

Use headless mode in production. It avoids rendering a browser window and reduces resource usage:

tool, err := browser.New(browser.Config{
    Headless: true,
})

Disable headless for debugging:

tool, err := browser.New(browser.Config{
    Headless: false,  // Shows browser window
})
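
Switching between the two modes without editing code can be done with an environment variable. The sketch below is one way to do that; `BROWSER_HEADLESS` is a hypothetical variable name chosen for this example and is not read by the browser tool itself.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseHeadless interprets a flag value, defaulting to headless (true)
// when the value is empty or not a recognizable boolean.
func parseHeadless(v string) bool {
	b, err := strconv.ParseBool(v)
	if err != nil {
		return true
	}
	return b
}

func main() {
	// BROWSER_HEADLESS is a hypothetical variable name chosen for this sketch.
	headless := parseHeadless(os.Getenv("BROWSER_HEADLESS"))
	fmt.Println("headless:", headless)
	// The result would then be passed as browser.Config{Headless: headless}.
}
```

Defaulting to `true` on missing or malformed input keeps production runs headless unless someone opts out explicitly.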