Re: [cross-project-issues-dev] Anonymisation of public data
Hiho good people,
There have been some off-list discussions going on, and I'd like to
follow up on them.
As Mike nailed it, keys will not be stored.
And since there was no counter-reaction, we'll go with that for now. Any
input or feedback is still appreciated, of course, and I'll let you
know when things move forward.
Thanks, have a lovely end of week! :-)
--
boris
On 26/04/2018 10:15, Mike Milinkovich wrote:
The Eclipse Foundation would prefer to *not* be responsible for securely
retaining such keys. If an interesting pattern were ever uncovered, I
think that we could go back to the original available data to discover
the relevant authors. And I cannot really imagine
providing researchers direct access to interesting contributors
discovered by analyzing anonymized data, as I am certain that would
violate our privacy policies.
In short, IMO the privacy risk of maintaining those keys outweighs any
potential advantages of retaining them. My 2c :)
On 2018-04-26 3:02 AM, Mickael Istria wrote:
Hi Boris,
The basic idea is to simply replace all identifiers with
asymmetrically encrypted strings, so that a given ID always maps to
the same ciphered result. RSA is used for the encryption, and the
private key is thrown away once the encoding is done, making it
impossible (according to common encryption standards) to retrieve the
original string.
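To make the quoted idea concrete, here is a minimal sketch, assuming unpadded ("textbook") RSA so that equal identifiers produce equal ciphertexts. The names `P`, `Q`, `E` and `pseudonymise` are hypothetical and do not come from the thread, and the primes are publicly known values chosen only so the example runs without a crypto library. It illustrates the principle and is not the actual tooling discussed here.

```python
# Toy sketch only: textbook RSA with publicly known primes is NOT secure.
# P, Q, E and pseudonymise() are hypothetical names, not from the thread.

P = 2**521 - 1  # a known Mersenne prime (a real setup would generate secret primes)
Q = 2**607 - 1  # another known Mersenne prime
N = P * Q       # public modulus
E = 65537       # commonly used public exponent

# The private exponent is deliberately never computed here, which stands in
# for "throwing the private key away" once the data has been processed.


def pseudonymise(identifier: str) -> str:
    """Return the unpadded RSA encryption of an identifier as a hex string."""
    m = int.from_bytes(identifier.encode("utf-8"), "big")
    if m >= N:
        raise ValueError("identifier too long for this toy modulus")
    # No random padding, so the same input always yields the same output.
    return format(pow(m, E, N), "x")


authors = ["alice@example.org", "bob@example.org", "alice@example.org"]
pseudonyms = [pseudonymise(a) for a in authors]
```

Here the first and third pseudonyms are identical, so correlations per author survive in the dataset. One caveat of any deterministic scheme: whoever holds the public key can encrypt a guessed identifier and compare the result, so the public key would presumably need to be discarded as well.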
Is it a requirement, at this point, to make it impossible for
anyone to retrieve the original string?
I understand that providing an anonymous dataset is interesting, as
you explained, but why couldn't you or the Eclipse Foundation keep the
private RSA key safely, to decode the IDs if you find some unexpected
patterns? If you make the IDs anonymous and find a set of IDs that have
a strange correlation that you'd like to explain, wouldn't it be
helpful to decode the IDs and find out who the individuals behind
them are, to better understand the cause of the correlation or even set
up chats with selected contributors to better understand their practices?
I have the impression there could be value in keeping the ability to
decode strings, and I don't think fully discarding the key is much
safer than keeping it in a safe place (like an EF server with strong
restrictions on who can access the key).
My 2c (or maybe even less ;)
--
Mickael Istria
Eclipse IDE <https://www.eclipse.org/downloads/eclipse-packages/>
developer, for Red Hat Developers <https://developers.redhat.com/>
_______________________________________________
cross-project-issues-dev mailing list
cross-project-issues-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/cross-project-issues-dev
--
Mike Milinkovich
mike.milinkovich@xxxxxxxxxxxxxxxxxxxxxx
(m) +1.613.220.3223