Re: [cross-project-issues-dev] Anonymisation of public data

Hi Boris,

I was one of the people asking off-list, because I have a concern about using encryption as a technique for anonymizing data. It immediately raises a red flag for me because encryption is reversible by design: whoever holds the key can de-anonymize the data. I would therefore like to see data masking techniques such as hashing used instead of encryption. To be clear, I do not understand why reversible anonymization would be needed in the first place.
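As a minimal sketch of the hashing approach suggested above (not the Crossminer prototype), identifiers could be replaced with a keyed one-way hash. The function name `make_pseudonymizer`, the choice of HMAC-SHA256, and the lower-casing of identifiers are all assumptions made for illustration. A key is used because a plain unkeyed hash of a guessable identifier, such as an email address, can be matched by hashing candidate values; the key would have to be discarded once the dataset is produced.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Optional


def make_pseudonymizer(key: Optional[bytes] = None) -> Callable[[str], str]:
    """Return a function that maps an identifier to a stable pseudonym.

    Hypothetical helper: if no key is given, a random 32-byte key is
    generated. The key must be the same for every data source in one
    dataset, and should be discarded after the dataset is produced.
    """
    secret = key if key is not None else secrets.token_bytes(32)

    def pseudonymize(identifier: str) -> str:
        # Assumption: identifiers are compared case-insensitively and
        # without surrounding whitespace.
        normalized = identifier.strip().lower()
        # HMAC-SHA256 is one-way: there is no decryption operation.
        digest = hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    return pseudonymize


pseudonymize = make_pseudonymizer()
same_a = pseudonymize("alice@example.org")
same_b = pseudonymize(" Alice@example.org")
other = pseudonymize("bob@example.org")
```

With this sketch, `same_a` and `same_b` are equal, so the same person can still be matched across sources, while `other` differs and none of the values can be decrypted back to an address.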

Can you also be more specific about what public data and which API endpoints you are going to use?

I assume it's anything that is already public in Git, which would make this discussion moot. But I want to confirm that none of the API endpoints require authentication, i.e. that you are not fetching data you could not get without authenticating.


Gunnar Wagenknecht

> On Apr 26, 2018, at 07:18, Boris Baldassari <boris@xxxxxxxxxxxxxx> wrote:
> Hello good people,
> In the context of the Crossminer research project [1], we plan to publish a number of datasets to the public and for the research community. This includes public data from the Eclipse forge (i.e. data is fetched from public data sources and APIs only), and we want to set up an anonymisation process that would:
> * Efficiently and safely remove all personally identifiable data -- we don't want to help spammers or malicious harvesters, and
> * Still provide valuable information and datasets for the research community -- e.g. ability to identify identical IDs across sources without specifically knowing them.
> The basic idea is to simply replace all identifiers with asymmetrically encrypted strings, so that identical IDs always yield the same ciphertext. RSA is used for the encryption, and the private key is thrown away once the encoding is done, making it impossible (according to common encryption standards) to retrieve the original string.
> A prototype has already been published [2, 3] and we would like to ask people to review it so as to make sure that our privacy-preserving mechanism is safe.
> Any feedback, concern or contribution is warmly welcome.
> [1]
> [2]
> [3]
> Thanks in advance, have a wonderful week!
> --
> boris
