Microsoft quietly launches Fara-7B, a new 'agentic' small language model that lives on your PC — and it’s more powerful than GPT-4o

The new Fara-7B model is designed to take over your mouse and keyboard

Microsoft has unveiled Fara-7B, a new 'agentic' small language model that can live on your computer and control it for you.

The small language model (SLM) isn't there to answer your searches or other text-based queries, however. Instead, it's a computer use agent (CUA) that can complete tasks for users by taking over the mouse and keyboard.

"Fara-7B operates by visually perceiving a webpage and takes actions like scrolling, typing, and clicking on directly predicted coordinates," the company explained in a blog post.

"It does not rely on separate models to parse the screen, nor on any additional information like accessibility trees, and thus uses the same modalities as humans to interact with the computer."

Notably, the model does this with only seven billion parameters, compared to the hundreds of billions reportedly used by models from OpenAI.
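
The perceive-then-act loop described in Microsoft's blog post (look at the screen, predict an action with coordinates, carry it out, repeat) can be sketched roughly as follows. This is a minimal illustration, not Fara-7B's actual interface: the `Action` type, the `run_agent` function, and the three callables it takes are hypothetical names, and the model call is whatever function the caller supplies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One predicted step (hypothetical schema, not Fara-7B's real output format)."""
    kind: str                              # "click", "type", "scroll" or "done"
    xy: Optional[Tuple[int, int]] = None   # pixel coordinates for a click
    text: Optional[str] = None             # text to type, if any


def run_agent(
    task: str,
    take_screenshot: Callable[[], bytes],
    predict_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe-act loop: one screenshot in, one action out, until done or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()      # pixels only, no accessibility tree
        action = predict_action(task, screenshot, history)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                     # e.g. move the mouse and click at action.xy
    return history
```

The `max_steps` cap is an assumption added for safety in the sketch, so that a policy which never returns "done" cannot loop indefinitely.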

Small language models have been touted as one solution to some of the energy and complexity challenges of large language models when dealing with specific tasks. Fara-7B's smaller size means it can run as a CUA directly on devices.

"This results in reduced latency and improved privacy, as user data remains local," the researchers said.

How Fara-7B works

As noted, Fara-7B interacts with a website or other interface visually – it looks at them just as a human user would. One major challenge was finding enough data for training, researchers noted.

"A key bottleneck for building CUA models is a lack of large-scale, high-quality computer interaction data," they said. "Collecting such data with human annotators is prohibitively expensive as a single CUA task can involve dozens of steps, each of which needs to be annotated."

To address that, the system has been trained using a synthetic data generation pipeline that showed it multi-step web tasks, with data drawn from real web pages and users.

The system then tries to complete those synthetic tasks, with attempts refined along the way. Each attempt's course of action, called a trajectory, is verified for success, and any failed trajectories are removed.
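
The verify-and-filter step can be pictured with a short sketch. It assumes a trajectory is simply a task plus a list of recorded steps, and that the verifier is a function returning True for a successful attempt; the names `Trajectory`, `keep_verified` and `total_steps` are hypothetical and are not taken from Microsoft's pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Trajectory:
    """A recorded attempt at one task (hypothetical structure)."""
    task: str
    steps: List[str] = field(default_factory=list)


def keep_verified(
    trajectories: Iterable[Trajectory],
    verifier: Callable[[Trajectory], bool],
) -> List[Trajectory]:
    """Keep only the trajectories the verifier judges successful."""
    return [t for t in trajectories if verifier(t)]


def total_steps(trajectories: Iterable[Trajectory]) -> int:
    """Count the steps across a set of trajectories."""
    return sum(len(t.steps) for t in trajectories)
```

With a toy verifier such as `lambda t: t.steps[-1] == "confirm"`, any attempt that never reached a confirmation step would be dropped before training. A real verifier would need to judge whether the task was actually accomplished.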

"We ultimately train this version of Fara-7B on a dataset of 145,000 trajectories consisting of one million steps covering diverse websites, task types, and difficulty levels," researchers explained.

"Additionally, we include training data for several auxiliary tasks, including grounding for accurate UI element localization, captioning, and visual question answering."

On existing benchmarks, researchers said, Fara-7B generally outperformed larger models, including GPT-4o.

Agent for experimentation

Fara-7B doesn't mean we'll never need to click or type again, as this is still very much a work in progress. Researchers noted the new model is an "experimental release designed to invite hands-on exploration" and to draw feedback from the community.

"Users can build and test agentic experiences beyond pure research — automating everyday web tasks like filling out forms, searching for information, booking travel, or managing accounts."

For now, researchers advised running Fara-7B in a sandboxed environment, keeping a close eye on how it works, and avoiding any sensitive data or high-risk tasks.

"Responsible use is essential as the model continues to evolve," they said.

Beyond that, Fara-7B has built-in controls based on Microsoft's Responsible AI Policy, with the company's own benchmarks suggesting a "high refusal rate of 82%". The model is also designed to spot and stop at any critical point where user consent or data is required.
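
The idea of stopping at a critical point can be illustrated with a small consent gate. This is only a sketch of the general pattern, implemented here in the surrounding harness rather than in the model: the action labels in `CRITICAL_KINDS`, the `ask_user` callback and the `run_with_consent` function are all assumptions made for illustration and say nothing about how Fara-7B decides what counts as critical.

```python
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical labels for actions treated as critical points.
CRITICAL_KINDS: FrozenSet[str] = frozenset(
    {"submit_payment", "enter_credentials", "send_message"}
)


def run_with_consent(
    actions: Iterable[str],
    ask_user: Callable[[str], bool],
    critical: FrozenSet[str] = CRITICAL_KINDS,
) -> List[str]:
    """Perform actions in order, pausing for approval at critical ones."""
    performed: List[str] = []
    for action in actions:
        if action in critical and not ask_user(action):
            break  # user declined: stop rather than carry on with the task
        performed.append(action)
    return performed
```

Here a declined request ends the run at that point, so nothing after the critical action is performed without the user's approval.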

Fara-7B is available now via Microsoft Foundry and Hugging Face, as well as via Microsoft Research's Magentic-UI.

"We are also sharing a quantized and silicon-optimized version of Fara-7B, which will be available to install and run on Copilot+ PCs powered by Windows 11, for turnkey experimentation," Microsoft added.

"The community can simply download the pre-optimized model and run it in their environment."

The system is also being made openly available, including weights, to make it easier to play around with and improve.

"By making Fara-7B open-weight, we aim to lower the barrier to experimenting with and improving CUA technology for automating routine web tasks, such as searching for information, shopping, and booking reservations," the tech giant said.

Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.

Nicole is the author of a book about the history of technology, The Long History of the Future.