[cross-project-issues-dev] Anonymisation of public data
Hello good people,
In the context of the Crossminer research project [1], we plan to
publish a number of datasets for the public and for the research
community. These include public data from the Eclipse forge (i.e. data
fetched from public data sources and APIs only), and we want to set up
an anonymisation process that would:
* Efficiently and safely remove all personally identifiable information;
we don't want to help spammers or malicious harvesters, and
* Still provide valuable information and datasets for the research
community, e.g. the ability to recognise identical IDs across sources
without knowing their actual values.
The basic idea is to replace every identifier with an asymmetrically
encrypted string, so that identical IDs always yield the same ciphered
result. RSA is used for the encryption, and the private key is thrown
away once the encoding is done, which, according to common encryption
standards, should make it infeasible to retrieve the original string.
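A minimal sketch of the mechanism described above, assuming identifiers are UTF-8 strings and that unpadded ("textbook") RSA is used so the mapping is deterministic (padded RSA such as OAEP is randomised and would give different ciphertexts for the same ID). The function names and the 2048-bit default key size are illustrative choices, not taken from the data-anonymiser prototype. In this sketch the private exponent is never computed, and the primes p and q are dropped as soon as key generation returns.

```python
import math
import secrets

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True

def random_prime(bits: int) -> int:
    """Random odd prime with exactly `bits` bits."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def generate_public_key(bits: int = 2048, e: int = 65537) -> tuple[int, int]:
    """Return only the public key (n, e).

    p and q go out of scope when this returns and the private exponent
    d = e^-1 mod (p-1)(q-1) is never computed, mimicking the
    "throw the private key away" step (hypothetical helper).
    """
    while True:
        p = random_prime(bits // 2)
        q = random_prime(bits // 2)
        if p != q and math.gcd(e, (p - 1) * (q - 1)) == 1:
            return p * q, e

def pseudonymise(identifier: str, n: int, e: int) -> str:
    """Deterministic textbook-RSA encryption of an identifier, as hex."""
    m = int.from_bytes(identifier.encode("utf-8"), "big")
    if m >= n:
        raise ValueError("identifier too long for this key size")
    return format(pow(m, e, n), "x")

if __name__ == "__main__":
    n, e = generate_public_key()
    # The same address seen in two different sources maps to the same token.
    token_forum = pseudonymise("alice@example.org", n, e)
    token_scm = pseudonymise("alice@example.org", n, e)
    print(token_forum == token_scm)
```

Note that in this sketch anyone who holds the public key (n, e) can encrypt a guessed address and compare the result with a published token to confirm it. Throwing away the private key alone therefore does not rule out this kind of dictionary check; the public key would have to be withheld or discarded as well.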
A prototype has already been published [2, 3], and we would like to ask
people to review it to make sure that our privacy-preserving
mechanism is safe.
Any feedback, concern or contribution is warmly welcome.
[1] https://www.crossminer.org/
[2] https://github.com/borisbaldassari/data-anonymiser
[3] https://borisbaldassari.github.io/data-anonymiser/
Thanks in advance, have a wonderful week!
--
boris