At Emergence, we’ve always believed that the next significant advancement in workflow automation will come from the planning, selection, and use of multiple *external tools* by artificial intelligence. While large language models (LLMs) have ushered in significant advancements in machine understanding of natural language, there is still much work to be done in enabling an intelligent agent to actually influence its environment (digital or physical).

Appropriate use of tools requires goal-oriented, chain-of-thought reasoning and the decomposition of tasks into efficiently executable steps. At last year’s NeurIPS conference, we presented a paper detailing our work on an agent that controls tools such as math solvers (like Wolfram Alpha) and graphing calculators (like Desmos) to create complex mathematical visualizations from natural language instructions. The significance of this work goes beyond external tool control: mathematics is among the subjects AI finds most difficult to handle precisely.

LLMs are fundamentally different from traditional computing systems. AI models are trained on vast datasets of text to identify patterns and make predictions. When you ask an LLM to identify a prime number, it is drawing on text in which similarly phrased questions were asked and answered. It was never built to answer your question with perfect correctness; it was built to answer it naturally, approximating over whatever training data it treats as linguistically relevant. So the more data we use to train LLMs, the further they may drift from the correct answers to certain questions.

However, this limitation can be mitigated or outright circumvented by using specialized tools in conjunction with LLM chains of thought. Emergence’s paper, “An Automated Graphing System for Mathematical Pedagogy,” showcases an agent, *MathViz-E*, that accurately coordinates the use of multiple math tools. This automated graphing system takes in utterances, uses the LLM to translate the speaker’s intent into computational steps, invokes the solvers as needed at each step, converts the solutions into mathematical expressions, and graphs them with the Desmos graphing calculator.
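The pipeline above can be sketched as a simple orchestration loop. This is a hypothetical illustration, not the paper’s code: the LLM, solver, and Desmos calls are stubbed out with canned responses, and only the overall shape (utterance → intent decomposition → solver calls → graphable expressions) is shown.

```python
# Hypothetical sketch of a MathViz-E-style pipeline. All three stages
# are stubs standing in for real LLM / solver / Desmos integrations.

def parse_intent(utterance: str) -> list[str]:
    """Stub for the LLM step that decomposes an utterance into
    computational sub-tasks (a real system would call an LLM here)."""
    return ["plot y = x^3 - 6x^2 + 9x + 4", "find relative extrema"]

def run_solver(task: str) -> str:
    """Stub for a solver call (e.g. Wolfram Alpha) on one sub-task."""
    if "extrema" in task:
        return "relative max at (1, 8); relative min at (3, 4)"
    return "y = x^3 - 6x^2 + 9x + 4"

def to_desmos_expressions(solutions: list[str]) -> list[str]:
    """Stub for the LLM step that converts solver output into
    expressions the Desmos calculator can graph."""
    return ["y = x^3 - 6x^2 + 9x + 4", "(1, 8)", "(3, 4)"]

def graphing_agent(utterance: str) -> list[str]:
    steps = parse_intent(utterance)
    solutions = [run_solver(step) for step in steps]
    return to_desmos_expressions(solutions)

print(graphing_agent("Graph y equals x cubed minus 6 x squared "
                     "plus 9 x plus 4 and find the relative extrema"))
```

The key design point is the separation of concerns: the LLM handles language and planning, while exact computation is delegated to the solver at each step.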

Emergence builds scalable multi-agent solutions for enterprises across industries. This agent in particular was built to solve a pain point in the education sector. Graphs are an essential tool in the classroom, allowing students to visualize and interact with mathematical concepts. Though classroom graphing is far from its only use case, MathViz-E lets teachers visualize mathematical concepts described entirely by voice.

The flowchart above, featured in the paper, gives an overview of the automated graphing system. A user speaks an unformatted prompt in natural language, such as, “Graph y equals x cubed minus 6 x squared plus 9 x plus 4 and find the relative extrema.” The LLM rephrases the question into a query understandable by a solver like Wolfram Alpha, which then provides the solution. The LLM then takes that solution, writes a detailed explanation for it, and uses it to create expressions that the Desmos calculator can understand and graph. The detailed explanations and visualizations assist the teacher in helping students interactively understand the concept being taught. On a wide variety of learning objectives taken from the math Common Core standards, our system significantly outperformed a conventional LLM-only solution (86%+ vs. 64%+ accuracy).
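As an independent check of the worked example (this is not the paper’s code, just the underlying math), SymPy can compute the relative extrema of y = x³ − 6x² + 9x + 4 directly, which is the kind of exact result a solver contributes to the pipeline:

```python
# Find the relative extrema of f(x) = x^3 - 6x^2 + 9x + 4 with SymPy.
import sympy as sp

x = sp.symbols("x")
f = x**3 - 6*x**2 + 9*x + 4

critical = sp.solve(sp.diff(f, x), x)            # roots of f'(x) = 3x^2 - 12x + 9
extrema = [(c, f.subs(x, c)) for c in critical]
print(extrema)  # [(1, 8), (3, 4)]

# Second-derivative test classifies them:
# f''(1) = -6 < 0  -> relative maximum at (1, 8)
# f''(3) =  6 > 0  -> relative minimum at (3, 4)
```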

A hurdle which had to be overcome was evaluating the accuracy of the output mathematical statements. AI systems may be able to evaluate equivalence for simple expressions, but their judgment becomes inconsistent for more complex ones. Our paper notes that lexical similarity metrics used in natural language processing might score “5=2+3” as more similar to “5=2+4” (which is false) than to “5=4+1” (which is true), simply because it shares more characters with the former. Therefore, evaluating the correctness of the outputs of the above graphing system required a technical solution of its own.

We used a combination of SymPy, a computer algebra system, and a critique-LLM to evaluate the correctness of the output expressions generated by MathViz-E. SymPy accurately isolates variables and checks mathematical statements exactly, while the critique-LLM helps the evaluator handle variations in expression formatting. The combined evaluator was significantly more accurate than a critique-LLM alone.
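A minimal sketch of the symbolic half of such an evaluator (the exact checks in the paper may differ): SymPy can verify that two differently formatted expressions are mathematically equal by simplifying their difference to zero, a judgment that string comparison or an LLM alone can get wrong.

```python
# Symbolic equivalence check: two expression strings are equal iff
# their difference simplifies to zero.
import sympy as sp

def symbolically_equal(a: str, b: str) -> bool:
    return sp.simplify(sp.sympify(a) - sp.sympify(b)) == 0

print(symbolically_equal("(x - 1)*(x + 1)", "x**2 - 1"))   # True
print(symbolically_equal("(x - 1)*(x + 1)", "x**2 + 1"))   # False
```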

The progress in agents’ abilities to control external tools holds much promise for the integration of multi-agent systems into mainstream workflows. Emergence will continue to develop intelligent agents whose capabilities extend beyond natural language alone.

Emergence is dedicated to the democratization of the powerful tools that AI presents, as well as to ensuring safe and responsible development of AI across the board. As such, MathViz-E has been released, open source, on GitHub—we look forward to collaborating with the open source community and seeing your contributions.