CLEAR – Contextual Legal Entity Anonymization and Recognition
For many purposes, including the training of AI systems, research or training purposes, and the publication of court decisions or parliamentary materials, texts containing personal data may be used only if they have been anonymized or pseudonymized beforehand. This requires reliable and traceable identification of personal references.
The CLEAR research project develops and investigates generic, transparent, trustworthy, and sustainable solutions for recognizing entities in German-language running text, with a focus on the identification of personal data. CLEAR combines the advantages of rule-based and machine learning methods:
- Using human-in-the-loop approaches, rules for named entity recognition (NER) are learned and made configurable for specialist users.
- Deep learning models generate candidates for entities, which are selected based on trained, application-specific rule sets.
This creates a flexible and verifiable architecture that attempts to avoid the weaknesses of current “black box” solutions while reducing environmental costs and training effort.
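The candidate-then-rule architecture described above can be illustrated with a minimal sketch. It is not the CLEAR implementation: a simple regular expression stands in for the deep learning candidate generator, the rules are written by hand rather than learned, and every name (`Candidate`, `Rule`, `generate_candidates`, `select`, `pseudonymize`, `RULES`) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    text: str
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class Rule:
    # A named, inspectable predicate; the name makes each decision traceable.
    name: str
    accepts: Callable[[Candidate], bool]


def generate_candidates(text: str) -> List[Candidate]:
    """Assumption: stand-in for a deep learning model.
    Proposes every pair of capitalized words as a PERSON candidate."""
    pattern = re.compile(r"\b[A-ZÄÖÜ][a-zäöüß]+ [A-ZÄÖÜ][a-zäöüß]+\b")
    return [Candidate(m.group(), m.start(), m.end(), "PERSON")
            for m in pattern.finditer(text)]


# Hypothetical application-specific rule set (hand-written for illustration).
RULES = [
    Rule("no_leading_article",
         lambda c: c.text.split()[0] not in {"Der", "Die", "Das"}),
    Rule("no_institution_keyword",
         lambda c: not any(k in c.text.lower() for k in ("gericht", "ministerium"))),
]


def select(candidates: List[Candidate], rules: List[Rule]
           ) -> Tuple[List[Candidate], List[Tuple[Candidate, List[str]]]]:
    """Keep candidates that pass all rules; record which rules rejected the rest."""
    accepted, rejected = [], []
    for cand in candidates:
        failed = [r.name for r in rules if not r.accepts(cand)]
        if failed:
            rejected.append((cand, failed))
        else:
            accepted.append(cand)
    return accepted, rejected


def pseudonymize(text: str, accepted: List[Candidate]) -> str:
    """Replace accepted spans with numbered placeholders, working from the
    end of the text so that earlier character offsets stay valid."""
    ordered = sorted(accepted, key=lambda c: c.start)
    for index, cand in reversed(list(enumerate(ordered, start=1))):
        text = text[:cand.start] + f"[{cand.label}_{index}]" + text[cand.end:]
    return text


if __name__ == "__main__":
    sample = "Der Kläger Max Mustermann erschien vor dem Landesgericht Wien."
    kept, dropped = select(generate_candidates(sample), RULES)
    print(pseudonymize(sample, kept))
    for cand, reasons in dropped:
        print(f"rejected {cand.text!r}: {reasons}")
```

For the sample sentence, the stand-in generator proposes three candidates, the rules keep only "Max Mustermann", and the list of rejected candidates records which rule excluded each of the others. Keeping the rules as named, separate objects is what makes each decision inspectable in this sketch.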
In addition to technical development, the focus is on the legal issues surrounding anonymization. As part of the project, the Institute for Innovation and Digitalization in Law is conducting research on the following questions in particular:
- How can anonymization be legally distinguished from pseudonymization?
- What is the significance of unclear definitions in the GDPR and in new EU legislation such as the Data Act and the Data Governance Act?
- What requirements does the AI Act impose, for example with regard to research exemptions, obligations for developers and providers, or the risk classification of AI systems?
- What copyright issues arise when using training data?
Through this interdisciplinary combination of technology and law, CLEAR aims to develop practical and legally compliant anonymization strategies that are also of central importance for sensitive areas of application such as the judiciary, administration, or parliament.
Further information about the project can be found here and in u:cris.
Experts of the Department working on this project:
- Forgó, Nikolaus (Project Lead)
- Wimmer, Martina (Admin)
- Kandov, Boris (Scientific Project Staff)
- Hafenscher, Hannah (Scientific Project Staff)
