In the age of the internet, more text is generated every day than a person could hope to read in a lifetime. The Radicati Group estimated that 319.6 billion emails would be sent every day in 2021. Add to that social media posts, articles, reports, records and the endless other forms of written documents that are constantly created, and you’ll begin to understand the incredible amount of text out there.
Even on a smaller scale, an organisation may find that it is generating and capturing more textual information than it can handle, and as documents are digitised and more types of information are captured, this will only increase. This can lead to situations in which you know that valuable insights are contained in your data, but do not have the capacity to extract them.
Text mining helps us make the most of large amounts of unstructured data, extracting insights from textual information that we would never have the time or capacity to extract manually.
What is text mining?
Text mining – sometimes known as text analysis or text analytics – is an aspect of data mining that, through machine learning (ML) and natural language processing (NLP), can extract actionable insights from large, unmanageable amounts of data.
Conventional search tools rely on rule-based matching of keywords or phrases, which often fails to account for context. Google goes beyond this by incorporating synonyms, query intent, page relevance, quality and domain authority, but a person must still read and interpret the results, which is too time-consuming to do effectively for large amounts of data.
Text mining goes considerably further than a conventional search tool. It can produce summaries, extract entities, analyse sentiment, establish relationships between items, and structure data so that it can be loaded into a database or a data warehouse.
A key branch of AI used in text mining is NLP, which allows text mining systems to parse written information much as a human reader would. These systems are able to “read” documents and to interpret the concepts, ambiguities and contextual clues they contain, so they can locate and analyse information in a way far beyond the standard search function (this is known as natural language understanding). They can also structure the information and produce summaries that help us better understand the data as a whole (natural language generation).
If your team doesn’t have the AI/ML skills for text mining internally, companies like Merit can help build bespoke and cost-effective solutions without you having to develop them from the ground up. Text mining services can be personalised to different organisations using ontologies, vocabularies and custom dictionaries that incorporate the particular language and terminology used in a specific field or industry. This helps the text mining tool to understand the target text properly and to provide the insights most appropriate for the business.
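To make the idea of a custom dictionary concrete, here is a minimal Python sketch. It assumes a small, hand-written vocabulary that maps the terms a field actually uses (abbreviations, colloquial names) to one canonical concept. The names `DOMAIN_TERMS` and `tag_concepts`, and the medical terms themselves, are illustrative assumptions and not part of any particular product.

```python
import re

# Hypothetical domain vocabulary: surface form -> canonical concept.
# A real ontology would be far larger and maintained by subject experts.
DOMAIN_TERMS = {
    "mi": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "bp": "blood pressure",
    "blood pressure": "blood pressure",
}

def tag_concepts(text, vocabulary=DOMAIN_TERMS):
    """Return the sorted list of canonical concepts mentioned in text."""
    found = set()
    lowered = text.lower()
    for surface, concept in vocabulary.items():
        # Word boundaries stop "mi" from matching inside a word like "admit".
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(concept)
    return sorted(found)

note = "Patient had a heart attack in 2019; BP is now stable."
print(tag_concepts(note))
```

Because different surface forms resolve to the same concept, documents that say "heart attack" and documents that say "MI" end up grouped together, which plain keyword search would not do.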
There are many examples of how text mining can be applied to your operations.
It can be very useful when it comes to assessing customer feedback or Net Promoter Score (NPS). A large organisation might receive thousands of responses to a CSAT survey. It would be incredibly time-consuming to read them all, and splitting the task between several employees might mean that they miss larger patterns in the feedback. Text mining can similarly be applied to social media or online reviews to analyse sentiment related to your business or field. Using these tools, you can identify the trends and pull out insights that allow you to make material improvements to your products and services based on customers’ needs and desires.
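As a rough illustration of how survey responses can be turned into a trend, the Python sketch below labels each response using two tiny word lists and counts the most common complaint words. The word lists, the sample responses and the function names are all assumptions made for the example. Production tools generally rely on trained language models instead of word lists, since a simple lexicon misses negation ("not helpful") and sarcasm.

```python
import re
from collections import Counter

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"great", "helpful", "fast", "easy", "love"}
NEGATIVE = {"slow", "broken", "confusing", "rude", "expensive"}

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def label(response):
    """Label one response by comparing positive and negative word counts."""
    words = tokenize(response)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def summarise(responses):
    """Return (label counts, counts of negative words) across all responses."""
    labels = Counter(label(r) for r in responses)
    complaints = Counter(
        w for r in responses for w in tokenize(r) if w in NEGATIVE
    )
    return labels, complaints

# Made-up survey responses for demonstration only.
responses = [
    "Support was fast and helpful",
    "Checkout is slow and confusing",
    "The app is slow",
    "It arrived on Tuesday",
]
labels, complaints = summarise(responses)
print(labels)
print(complaints.most_common(3))
```

The value comes from the aggregate view: a single reader might not notice that "slow" recurs across thousands of responses, whereas the counts surface it immediately.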
For educational institutions, text mining can offer a more robust way of detecting plagiarism. Traditional tools look for similarities between documents, but not all the written sources will be available, so some cases will slip through the net. Using NLP, text mining tools are able to detect when the writing style changes in a document, indicating segments that were not written by the supposed author. In this way, it doesn’t matter whether the original text is immediately available to your plagiarism detection system.
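The style-change idea can be sketched in a few lines of Python. This example assumes a single, deliberately crude style feature (average sentence length in words) and flags any segment that differs from the document's median by more than a chosen tolerance. The feature, the `tolerance` value and the function names are assumptions for illustration; real stylometric tools combine many features, such as vocabulary richness and function-word frequencies.

```python
import re
from statistics import median

def avg_sentence_length(segment):
    """Average number of words per sentence in a segment."""
    sentences = [s for s in re.split(r"[.!?]+", segment) if s.strip()]
    words = re.findall(r"[A-Za-z']+", segment)
    return len(words) / len(sentences) if sentences else 0.0

def flag_style_shifts(segments, tolerance=0.5):
    """Return indexes of segments whose average sentence length differs
    from the document median by more than `tolerance` (as a fraction)."""
    lengths = [avg_sentence_length(s) for s in segments]
    typical = median(lengths)
    if not typical:
        return []
    return [
        i for i, value in enumerate(lengths)
        if abs(value - typical) / typical > tolerance
    ]

# Made-up essay segments; the third is written in a very different style.
segments = [
    "I like cats. They are fun. Dogs are fine too.",
    "We went out. It was cold. We came back.",
    "Notwithstanding the aforementioned considerations regarding domestic "
    "animals, the prevailing scholarly consensus holds that feline "
    "companionship confers measurable benefits.",
    "My cat is old. She sleeps a lot.",
]
print(flag_style_shifts(segments))
```

A flagged segment is only a prompt for human review, not proof of plagiarism, since authors legitimately vary their style when quoting or summarising.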
Cyber security is constantly evolving in an uphill battle against the latest threats. One of the big challenges is to identify and eliminate vulnerabilities in the various security systems we rely on to protect our networks. But reports on these vulnerabilities registered in various databases are difficult to keep track of manually. Text mining tools can locate and report on the mentions of these vulnerabilities so that cyber security firms and government agencies can clamp down on security risks as soon as they are identified.
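The simplest form of this kind of tracking is pattern matching on vulnerability identifiers. The Python sketch below assumes the reports refer to vulnerabilities by CVE identifier (the form `CVE-` followed by a four-digit year and a sequence of four or more digits) and counts how many reports mention each one. The report snippets and the function name are made up for the example, and a full text mining tool would also need to handle vulnerabilities described in prose without an identifier.

```python
import re
from collections import Counter

# CVE identifiers look like CVE-2021-44228: a year, then 4 or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

def count_cve_mentions(documents):
    """Count, for each CVE identifier, how many documents mention it."""
    counts = Counter()
    for doc in documents:
        # A set ensures an identifier repeated within one report counts once.
        counts.update({match.upper() for match in CVE_PATTERN.findall(doc)})
    return counts

# Made-up report snippets for demonstration only.
reports = [
    "Patch for CVE-2021-44228 released; cve-2021-44228 is being exploited.",
    "Advisory covers CVE-2021-44228 and CVE-2014-0160.",
    "No identifiers in this report.",
]
mentions = count_cve_mentions(reports)
print(mentions.most_common())
```

Sorting by the number of reports that mention an identifier gives a quick view of which vulnerabilities are attracting the most attention across sources.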
The healthcare industry stands to be a key beneficiary of text mining. It would be impossible to read every scientific paper published every year related to medicine and health, but these tools are ideal for assessing these documents and pulling out insights that could contribute to healthcare breakthroughs. They can also assist in the development and testing of drugs by highlighting potential safety issues and unsafe doses that have been identified by other studies and during the development of other treatments, leading to quicker and safer refinement.
The applications of text mining are almost endless. Wherever you have large amounts of written information, it offers the opportunity to draw insights you would be unlikely to reach on your own, ones that can significantly improve your systems, products, services and the very future of your organisation.