Red teaming comes to the fore as devs tackle AI application flaws
Red teaming can play a crucial role in identifying flaws and cutting risky behaviors


Only a third of organizations employ adequate testing practices in AI application development, according to new research, prompting calls for increased red teaming to reduce risks.
Analysis from Applause found 70% of developers are currently developing AI applications and features, with over half (55%) highlighting chatbots and customer support tools as their primary focus at present.
Yet despite an acceleration in AI application development, a concerning number of organizations are overlooking quality assurance (QA) efforts during the software development lifecycle.
The study warned this trend is having an adverse impact on both quality and long-term return on investment (ROI).
“The results of our annual AI survey underscore the need to raise the bar on how we test and roll out new generative AI models and applications,” said Chris Sheehan, EVP of high tech & AI at Applause.
AI application development needs a human touch
A key talking point of the Applause study centered around human involvement in the development lifecycle. With developers ramping up the use of generative AI tools in their daily workflows, the need for a ‘human touch’ has become critical to identify and remediate a range of issues.
These include issues such as inaccuracy, bias, and toxicity, the study noted.
Get the ITPro daily newsletter
Sign up today and you will receive a free copy of our Future Focus 2025 report - the leading guidance on AI, cybersecurity and other IT challenges as per 700+ senior executives
Researchers found the top QA-related activities that involve human testing include prompt and response grading (61%), accessibility testing (54%), and UX testing (57%).
Applause added that humans are also crucial in training industry-specific or ‘niche’ models, particularly with the rise of agentic AI applications that interact directly with end-users.
Notably, the study found that only one-third (33%) of organizations currently employ red team testing in application development processes. Red teaming refers to adversarial testing practices - commonly used in cybersecurity - to identify potential weak points in platforms or applications.
Researchers called for a heightened focus on red teaming in AI application development, noting that this could play a key role in highlighting the aforementioned issues such as model bias or inaccuracy.
Application flaws persist
The study from Applause found that customer-related issues are becoming a frequent problem for enterprises. Nearly two-thirds of customers using generative Ai in 2025 reported encountering some sort of issue.
Over a third (35%) encountered biased responses, hallucinations (32%), and offensive responses (17%).
Hallucinations have been a persistent problem in AI development for some time now.
While the situation has improved markedly since the early days of the generative AI boom, the issue is still causing a degree of uncertainty among enterprise IT leaders.
In a study by KPMG in August 2024, six-in-ten tech leaders specifically highlighted hallucinations as a key concern with adopting or building generative AI tools and applications.
Sheehan noted that positive changes are being made by development teams, however. Many enterprises surveyed by the firm are “already ahead of the curve” and are integrating AI testing measures into the development lifecycle at an earlier stage.
This includes more robust model training methods which employ “diverse, high quality” datasets. Some enterprises are also warming to red teaming practices, he added.
“While every generative AI use case requires a custom approach to quality, human intelligence can be applied to many parts of the development process including model data, model evaluation and comprehensive testing in the real world.
“As AI seeps into every part of our existence, we need to ensure these solutions provide the exceptional experiences users demand while mitigating the risks that are inherent to the technology.”
MORE FROM ITPRO
- Developers spend 17 hours a week on security — but don't consider it a top priority
- Java developers are facing serious productivity issues
- Want developers to build secure software? You need to ditch these two programming languages

Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.
He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.
-
What is polymorphic malware?
Explainer Polymorphic malware constantly changes its code to avoid detection, making it a top cybersecurity threat that demands advanced, behavior-based defenses
-
Outgoing Kaseya CEO teases "this is just the beginning" for the company
Opinion We spoke to Fred Voccola who remains a key figurehead at the firm as it enters its next chapter...
-
‘Frontier models are still unable to solve the majority of tasks’: AI might not replace software engineers just yet – OpenAI researchers found leading models and coding tools still lag behind humans on basic tasks
News AI might not replace software engineers just yet as new research from OpenAI reveals ongoing weaknesses in the technology.
-
Java developers are facing serious productivity issues: Staff turnover, lengthy redeploy times, and a lack of resources are hampering efficiency – but firms are banking on AI tools to plug the gaps
News Java developers are encountering significant productivity barriers, according to new research, prompting businesses to take drastic measures to boost efficiency.
-
Software security debt is spiraling out of control – remediation times have surged 47% in the last five years, and it’s pushing teams to breaking point
News Software security flaws are taking longer to fix than ever, with remediation times having grown by 47% in the last five years.
-
Why the CrowdStrike outage was a wakeup call for developer teams
News The CrowdStrike outage in 2024 has prompted wholesale changes to software testing and development lifecycle practices, according to new research.
-
The ultimate guide to getting your killer app off the ground
Industry Insight When building software, the process of designing, testing, prototyping, and perfecting your project is never ending
-
The best Python test frameworks
Best Make your Python code shine with these testing tools
-
IT Pro Panel: The road to Windows 11
IT Pro Panel As the new OS gears up for rollout, we talk to our panellists about their upgrade plans
-
Huawei to launch HarmonyOS for smartphones next week
News The Chinese tech giant will switch to its homegrown OS as it looks to fully abandon Android by October