“Treasure trove” of 66m records suggests LinkedIn data scraping
Users warned to guard against unsolicited job adverts via email


A publicly exposed cache of 66 million records containing employment information, such as career history and skills, has been found in the wild.
Researchers have stumbled on a "treasure trove" of 66,147,856 unique records across three large databases, including a MongoDB database, that contained a wealth of scraped business-centric information.
The exposed records include full names, personal or professional emails, location details, skills, and work histories, all information that would typically be found on a LinkedIn profile.
Although the entries do not contain any sensitive personal information, such as financial details or passwords, they do contain private information, such as a person's background and their IP address.
Malwarebytes' lead malware intelligence analyst, Christopher Boyd, said that such databases will typically be used to power large-scale phishing campaigns, although there's no way of knowing if the information is being exploited in the wild.
"It could also prove useful to anyone wanting a ready-made marketing list. The big problem is that even if the ones doing the data scraping had no harmful intentions, that may not apply to anybody finding the treasure trove.
"Given how this information was stumbled upon in the first place, there's no real way to know how many bad actors got their hands on it first."
This data-scraping incident is also connected with a host of complaints by many LinkedIn and GitHub users that they have been receiving breach notifications lately, Boyd added. The users affected can also expect to be targeted with phishing attacks alongside a string of fake job offers and "poor quality" job offer spam.
Data scraping, the method by which the data was obtained, is considered a legal grey area, given that it does not involve hacking or breaching an organisation's defences. Rather, the data is gathered very efficiently by threat actors from information that users themselves choose to make publicly available.
But HackenProof, which first discovered the three exposed databases, argues data scraping "without first obtaining the prior individual's written consent" is illegal, especially in light of the European Union's General Data Protection Regulation. The platform conceded, however, that this can vary from case to case depending on how the extracted data is used.
HackenProof also suggested several steps employees and bosses can take to minimise the chances of their personal information being reaped in such a way in future.
These include providing only the bare minimum of data when creating online profiles, using different email addresses for bank accounts and social media, and assessing whether any personal data volunteered could be used maliciously.

Keumars Afifi-Sabet is a writer and editor who specialises in public sector, cyber security, and cloud computing. He first joined ITPro as a staff writer in April 2018 and eventually became its Features Editor. Although a regular contributor to other tech sites in the past, these days you will find Keumars on LiveScience, where he runs its Technology section.