The Next Frontier of Web Automation: Supercharging AI Agents with Web Domain Insights
In today’s digital world, so much of enterprise functionality lives on the web — yet much of that remains under-automated. To build truly autonomous AI systems, agents need more than just access: they need insight. By embedding deep understanding of web domains — their structure, context, and quirks — we make it possible for agents to navigate, reason, and act with far greater precision. This post explores how we’re pushing the frontier of web automation by giving our agents the power to extract domain-specific knowledge from sites and use that to execute robust, context-aware workflows — a key step toward autonomous, intelligent systems that can operate across real-world web environments.
AI agents have shown that they can navigate pages, fill out forms, and summarize content in controlled or simulated environments (e.g., web bench). However, web automation remains one of the most promising yet most challenging frontiers for AI agents. The real web changes constantly, which makes it difficult for web agents to generalize across websites.
In contrast, humans extract actionable insights while browsing the internet. Those insights guide their information seeking and transfer across websites. This presents an opportunity to enhance web agents by using human navigation as a reference. Humans parse the web through text snippets, links to pages, and metadata that hint at valuable content, naturally optimizing the trade-off between information gained and effort spent. Anchoring on this behavior, we developed a framework that automatically extracts browsing and navigational insights from websites. AI agents can then use these insights to navigate the web more intelligently and efficiently when completing tasks online.
To generate these web-domain insights, we auto-generated 315 distinct web tasks from five websites across the domains of finance, biomedical research, and public health. Of these, 46 (14.6%) are information-seeking tasks (e.g., “Go to www.sec.gov Retrieve filings made during the previous five business days using the "Current Events Analysis" feature”). Next, our framework used each of the 46 information-seeking tasks to navigate the corresponding website, learning navigational heuristics and how information is structured on each site. This process resulted in a total of 1,383 web-domain insights. Below are some examples of these insights:
- The 'Search for Daily Filings by Type (Current Events)' section contains a form with a dropdown menu that allows users to select days prior to the current date, including an option labeled 'Five business days prior (t-5)', which can be used to retrieve filings from the previous week
- The CIK Lookup page features an input field where users can type a partial name to search for a Central Index Key. This field is crucial for initiating a search query
- The 'Latest Filings' link is crucial for accessing the most recent SEC filings. Clicking this link directs to the same page, indicating that it is the primary navigation element for viewing updated filings
- If a search returns zero results, a hyperlink labeled 'Perform another Company-CIK Lookup' allows users to easily return to the search page to try a different query
- The input field for the company name is pre-filled with 'Example Company,' serving as a placeholder to guide users on the type of input expected
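To make the idea concrete, here is a minimal sketch of how insights like the ones above might be stored and matched to a task. It is an illustration under stated assumptions, not our framework's implementation: the `Insight` schema, the `relevant_insights` helper, the `example.org` entry, and the simple word-overlap ranking are all hypothetical stand-ins.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    """One web-domain insight (hypothetical schema, for illustration only)."""
    domain: str  # site the insight was learned on
    text: str    # the insight in natural language


def tokenize(s: str) -> set[str]:
    """Lowercase the string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def relevant_insights(task: str, domain: str,
                      insights: list[Insight], k: int = 2) -> list[Insight]:
    """Return up to k insights for `domain`, ranked by word overlap with the task."""
    task_tokens = tokenize(task)
    scored = [
        (len(task_tokens & tokenize(i.text)), i)
        for i in insights
        if i.domain == domain
    ]
    # Keep only insights that share at least one word; highest overlap first.
    scored = [(score, i) for score, i in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


INSIGHTS = [
    Insight("www.sec.gov",
            "The 'Search for Daily Filings by Type (Current Events)' section has a "
            "dropdown with an option labeled 'Five business days prior (t-5)'."),
    Insight("www.sec.gov",
            "The CIK Lookup page features an input field where users can type a "
            "partial name to search for a Central Index Key."),
    Insight("example.org",  # made-up entry to show domain filtering
            "An unrelated insight about filings on another site."),
]

task = "Retrieve filings made during the previous five business days"
top = relevant_insights(task, "www.sec.gov", INSIGHTS)
for insight in top:
    print(insight.text)
```

Word overlap is the simplest ranking that keeps the example self-contained. A production system would more plausibly rank by semantic similarity, but the flow is the same: filter by site, rank against the task, and hand the top few insights to the agent.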
To assess how much these auto-generated insights improve an AI agent’s performance on web tasks, we benchmarked our enterprise web agent on the 46 tasks, both with and without the 1,383 web-domain insights. Our evaluation shows that incorporating navigational information, including how information is structured on websites (beyond parsing a page’s DOM structure), results in:
- Higher task completion rate (up by 7%) for our enterprise agent with domain insights (see Figure 1).
- Average reduction of 45 seconds in task completion time per task.
- Substantial reduction in the number of LLM calls (50.3% fewer) that our enterprise agent makes when reasoning about next steps (see Figure 2).


Figure 1: Our enterprise web agent equipped with web-domain insights achieved higher task completion rates and completed tasks an average of 45 seconds faster than the same agent without such insights.

Figure 2: The enterprise web agent with web-domain insights initiated approximately 50.3% fewer LLM calls for reasoning and planning compared to the same agent without such insights.
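For readers who want to run a similar comparison, the sketch below shows one way the three metrics could be computed from paired per-task run logs. The `Run` record and the `summarize` function are hypothetical, and the numbers are toy values chosen for easy arithmetic, not our evaluation data. The sketch reports the completion gain in percentage points, which is an assumption about how to express that metric.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """Outcome of one task attempt (hypothetical log format)."""
    completed: bool
    seconds: float
    llm_calls: int


def completion_rate(runs: list[Run]) -> float:
    """Fraction of runs that completed their task."""
    return sum(r.completed for r in runs) / len(runs)


def summarize(baseline: list[Run], with_insights: list[Run]) -> dict[str, float]:
    """Compare two runs of the same task list, paired by position."""
    if not baseline or len(baseline) != len(with_insights):
        raise ValueError("need two non-empty run lists of equal length")
    base_calls = sum(r.llm_calls for r in baseline)
    new_calls = sum(r.llm_calls for r in with_insights)
    if base_calls == 0:
        raise ValueError("baseline made no LLM calls; reduction is undefined")
    return {
        # Difference of completion rates, in percentage points.
        "completion_gain_points": 100 * (completion_rate(with_insights)
                                         - completion_rate(baseline)),
        # Mean per-task time saved (positive means the insight run was faster).
        "mean_seconds_saved": mean(b.seconds - w.seconds
                                   for b, w in zip(baseline, with_insights)),
        # Relative drop in total LLM calls.
        "llm_call_reduction_pct": 100 * (base_calls - new_calls) / base_calls,
    }


# Toy data only: four tasks, each run once per condition.
baseline = [Run(True, 120.0, 10), Run(False, 200.0, 20),
            Run(True, 100.0, 10), Run(False, 180.0, 20)]
with_insights = [Run(True, 90.0, 5), Run(True, 150.0, 10),
                 Run(True, 80.0, 5), Run(False, 160.0, 10)]

summary = summarize(baseline, with_insights)
print(summary)
```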
In addition, we qualitatively evaluated the final responses generated by our enterprise web agent, with and without web-domain insights, after completing each web task across four dimensions: preference, granularity, relevance, and plausibility. Human annotators showed an overall 81% preference for responses generated by our web agent with insights. Moreover, the responses generated by our enterprise agent with web-domain insights were perceived as sufficiently detailed, relevant to the assigned tasks, and plausible given the context of each task. Below are some testimonials from the annotators:
- “Agent 2 [with insights] is clear, accurate, and well-structured, providing a complete list of filings from the previous week with company names, filing types, CIKs, and direct links for easy access and further analysis.”
- “Agent 2 [with insights] is more comprehensive, covering both the search methodology and concrete examples, whereas Agent 1 [without] provides fewer results and less contextual information about Clinical Queries filters and categories.”
- “Agent 1 [without insights] is like a general google search, but the response from Agent 2 [with insights] is more intelligent as it inferred that excipients are inactive ingredients and framed the response in that direction.”
- “Agent 2 [with insights] offers a more comprehensive coverage than Agent 1 [without insights], including the dedicated NTD overview, control strategies, and disease-specific pages, making it highly informative and actionable.”
Why does this matter for enterprises?
For organizations relying on web-based data collection, compliance checks, or research automation, these performance gains in agentic capability translate into measurable impact such as:
- Faster Decision Cycles – Agents extract and summarize data from regulatory or research websites more quickly (45 seconds faster per task on average in our evaluation).
- Lower Operational Costs – Fewer LLM calls mean lower compute bills.
- Higher Accuracy – Insights reduce agent drift, improving precision in complex information tasks.
What did we learn?
- An agent’s long-term memory (LTM) matters: Without a way to embed and retrieve domain insights as they navigate the internet, web agents have to re-learn too much information each time they attempt a task online.
- Verification overhead: Agents sometimes spend too long reasoning and double-checking information at each step of execution when they could simply act. Integrating domain insights helps agents prioritize the best next action based on their knowledge of the task and the website.
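The memory point can be illustrated with a very small persistent store. This is a sketch under loud assumptions: the `InsightMemory` class and its JSON file layout are hypothetical, and it uses exact lookup by domain rather than the embedding-based retrieval mentioned above. It only demonstrates that insights saved after one task are available to the next without re-learning.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class InsightMemory:
    """Tiny JSON-backed store mapping a domain to its insights (hypothetical)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._data: dict[str, list[str]] = {}
        if path.exists():
            # Reload whatever earlier sessions saved.
            self._data = json.loads(path.read_text(encoding="utf-8"))

    def add(self, domain: str, insight: str) -> None:
        """Record an insight for a domain, skipping exact duplicates."""
        bucket = self._data.setdefault(domain, [])
        if insight not in bucket:
            bucket.append(insight)

    def recall(self, domain: str) -> list[str]:
        """Return a copy of the insights known for a domain (empty if none)."""
        return list(self._data.get(domain, []))

    def save(self) -> None:
        """Persist all insights so a later session can reuse them."""
        self.path.write_text(json.dumps(self._data, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "insights.json"

    # First task: the agent learns something about the site and saves it.
    first_session = InsightMemory(store_path)
    first_session.add("www.sec.gov",
                      "The CIK Lookup page accepts a partial company name.")
    first_session.save()

    # Later task: a fresh agent instance starts with that knowledge.
    second_session = InsightMemory(store_path)
    recalled = second_session.recall("www.sec.gov")
    unknown = second_session.recall("example.org")

print(recalled)
```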
What is next?
The web is dynamic, domain-rich and constantly evolving. By equipping agents with the ability to forage, reason, and learn the domain structure, purpose, and affordances behind the pages they visit, we’re shifting from brittle web automation to agents that complete tasks with an understanding of the web. Accordingly, as part of next steps, we plan to evaluate the performance of web agents developed by other providers on the 46 web tasks and further compare their results to our enterprise web agent, both with and without insights.