What is AI alignment?

An abstract image showing data points connected by neuron-like strands of glowing energy, representing large language models (LLMs) and generative AI. Decorative: the strands are shades of blue, yellow, purple, red, and pink set against a dark background.

(Image credit: Getty Images)

published 14 October 2025

AI alignment refers to the process of ensuring AI systems operate in line with human goals, values and behavior, and is becoming more important than ever as advanced models gain autonomy and are integrated into decision-making processes.

For example, as agentic AI is now the focus for many big tech firms AI alignment is a key consideration for ensuring agents don’t expose businesses to risks.

It’s critical, not just for safety and reliability, but also trust.

Misaligned systems can produce outputs that may be technically correct but ethically problematic – or even harmful, notes Rachel Reid, head of artificial intelligence at law firm Eversheds Sutherland. This makes it a foundational concern for AI governance and ethical AI development.

AI alignment vs AI grounding

AI alignment is distinct from AI grounding, as the latter focuses on improving the factual accuracy and relevance of AI output and reducing hallucinations while AI alignment focuses on intent and behavior. In short, AI alignment is all about developing the right weighting to decide what the AI should do or say in a given situation.

“In the real world, an aligned AI should refuse to answer a harmful query even if it has the correct information, whereas hallucination-free AI might still act unethically if not aligned,” notes Arun Chandrasekaran, distinguished VP analyst, AI at Gartner.

An example of how AI alignment can go wrong comes from Grok – the xAI-developed AI chatbot integrated into X that was aligned to be more ‘spicy’ in its responses than alternatives. By loosening the guardrails of what Grok could output, the chatbot began to post biased, hateful content such as antisemitic messages and instructions for assault. Grok even went so far as referring to itself as ‘MechaHitler’ before it was reigned in.

While this is an extreme example, the Grok debacle does highlight the importance of responsible value alignment. This is something every IT decision maker looking to deploy AI in their enterprise environment will have to consider.

The potential reputational impact of a misaligned chatbot can be seen in past examples such as Microsoft Tay – and unwanted model outputs could also cause financial harm to businesses, too.

Today, alignment is largely shaped by the AI builders accountable for the base model values, with “different labs in different geographies typically training their models with local values and ideas,” notes Tom Martin, associate director at Boston Consulting Group (BCG).

What values should AI reflect?

How should producers decide which values an AI system is aligned with, and how can they account for cultural or ethical differences? This is one of the thorniest challenges in AI governance, says Reid, as values are not monolithic and vary across cultures, industries and individuals.

“Conservative estimates put the count of human values in the hundreds but it’s more likely in the thousands, with each of those values having a continuum,” notes Noah Broestl, partner and associate director at BCG. Martin adds that the industry continues to research approaches to understand, interpret, and control values – a field known as mechanistic interpretability.

In practice, alignment decisions are today made by a combination of developers, product managers, legal teams and sometimes regulators. To account for diversity, organizations must adopt inclusive design practices, engage with stakeholders across geographies and build in mechanisms for feedback and correction, says Reid. “Legal frameworks like the EU AI Act and emerging US guidance are beginning to formalize these expectations, but the real work lies in operationalizing them at scale,” she says.

Experts agree that AI system builders need to be transparent and provide better tools to understand and change the alignment of a system, however “given the breadth of human values and contexts, I don’t see how we can expect this from producers,” says Broestl. “My perspective is that the deployer must account for and tackle cultural or ethical differences, and that producers need to do a better job supporting them.”

Aligning AI with your organization’s values

As a buyer or user of an AI model you won’t be able to change the system’s base values, so what steps can you take to ensure the system you’re using is aligned to your company’s values and geographic regulations?

Firstly, make model card reviews a core part of the buying process. These are documents that essentially act like a data sheet, providing detailed information on the model, including its target audience, data used to train it, and ethical considerations such as potential biases. “This puts pressure on model builders to focus resources on building transparent and aligned models,” notes Martin.

Once you’ve chosen a model, there are other steps you can take to influence its values. This includes blocking inputs or outputs that are misaligned, and tuning prompts, which according to Broestl is “the weapon of choice” as these can change the system’s behavior.

“This is a complex and deep topic, and techniques vary by model, but the trick is to have an evaluation that drives outcomes,” advises Martin. “This approach has limitations – you will only be able to affect the model’s behavior to the point that its internal values stop you. For example, asking a model to break the law is actually quite difficult, asking it to behave as a marketer is relatively easy. Understanding these guardrails is crucial for the successful deployment of AI.”

Another approach, which is considered the most challenging and carries significant risks, is fine-tuning. This entails changing part of the model to more fundamentally alter its behavior.

“Open source models can be fine-tuned with the right infrastructure, and some proprietary models have an API for fine-tuning. In both cases, this is a complex and technical topic. Whilst it can yield strong results, research shows it has unintended consequences in other model behavior,” Martin warns.

AI alignment’s a journey, not a destination

Al alignment is an area that’s still evolving. Reporting and monitoring standards are in their early days, with regulators thinking through how to control AI systems at national levels, but within an organization, evaluation criteria is key, says Martin.

“Have a clear governance programme which covers the full lifecycle of AI solutions from procurement, model fine-tuning, use case build, testing and release. At each step there should be safeguards built into the process to check that the outcome delivered aligns with the values of the organization.”

Finally, the experts agree that it’s important to remember alignment isn’t something you do once and move on, as there will always be another value to look at, and additional contexts to challenge those value alignments.

“We haven’t ‘solved’ ethics for humans so I think it would be a bit naïve to think we will ‘solve’ AI alignment,” Broestl says. “It will require continuous, thoughtful work to ensure it keeps up with the developing capabilities of the technologies, as well as the social and cultural context into which the systems are deployed.”

Keri Allan is a freelancer with 20 years of experience writing about technology and has written for publications including the Guardian, the Sunday Times, CIO, E&T and Arabian Computer News. She specialises in areas including the cloud, IoT, AI, machine learning and digital transformation.