Is it allowed to collect and use human‑generated training data scraped from public social‑media posts for AI research in the European Union?

Last updated on October 1, 2025

Under the EU General Data Protection Regulation (GDPR), processing personal data scraped from social media for AI training requires a lawful basis (e.g., consent, or legitimate interests subject to a balancing test) and must respect data-subject rights; stricter conditions apply to sensitive categories of data and to profiling. The European Data Protection Board (EDPB) and national data protection authorities (DPAs) have issued guidance emphasising data minimisation, purpose limitation, transparency, data protection impact assessments (DPIAs) for high-risk AI systems, and legal assessments before large-scale scraping. Researchers must establish a lawful basis, anonymise data robustly where possible, put contractual safeguards in place, and implement mechanisms for honouring data-subject rights; failure to comply may result in substantial administrative fines and enforcement actions by EU DPAs.

 

https://edpb.europa.eu/our-work-tools/our-documents_en and https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/13752-regulatory-framework-ai_en

2024‑05‑30
