When I used the term "bandwidth", I
was referring to time in the human
resources sense. The Eclipse webmaster
*may* be able to set you up with a virtual
server, but the EF has limited resources
to provide support and maintenance on that
server.
On 01/11/2012 04:16 AM, Marcel Bruch
wrote:
I think that there are several
things that need discussion.
1. Which data users (explicitly or
implicitly) provide
2. Under which terms of use this
data is used by us and others
3. Who stores the data
4. Who can access the data and in
which format (degree of
anonymization).
1.
The term 'data' covers quite a
large range of information.
For SnipMatch this includes code
snippets and maybe usage statistics
(what has been used when, in order to
update the ranking strategies).
For Extdoc this may include
information like comments, editorial
actions, or user ratings.
For Call Completion this includes
the models that have to be delivered
to the clients and information about
the jars they use (e.g., file
fingerprints).
For Chain Completion this may
include usage statistics (as for
SnipMatch, to improve ranking
strategies) and code snippets.
You can think of other information
too.
2.
I'd like to say that this is an
important topic that needs solid
research. It will probably require us
to get in contact with lawyers to
clarify what's possible/required.
It should be clear that everyone
who shares data (code snippets etc.)
must be in the position to actually be
allowed to share it. For me, it's
basically the same as with the Eclipse
Wiki. All users that contribute to it
must agree to its terms of use. Is there
a difference? Are these terms of use
reusable for our use case? I guess I
should prepare a detailed description
of what gets collected and provided by
whom to enable a lawyer to help here?
3.
If I understood correctly, the
Foundation has no bandwidth to host
these services. In that case, I have to
go back to my university and ask for
permission to host these services
somewhere close to our backbone or
raise some funding to put a server
elsewhere. One question that comes
to mind: If the Foundation is not
hosting these services, can we deliver
Code Recommenders with preconfigured
URLs that point to external project
servers? For instance, something like
"code.recommenders.org"?
4.
What is needed - and technically
feasible? The raw data may eventually
exceed terabytes (not in the
first years I guess :)). Honestly,
I have no clue yet how much data will be
collected and what information others
may be interested in. What we have in
mind is to create reference data sets
for machine learning and software
engineering researchers to enable
research that creates new tools and
improves algorithms for code search, code
recommendations etc. But these data
sets will, for practical reasons, only
include a subset of the (anonymized) data
needed for research purposes. Would
this be satisfactory? Do you think some
kind of agreement is needed?
Is there anything I'm currently not
aware of?
Thanks,
Marcel
On 11.01.2012, at 00:08, Wayne
Beaton wrote:
FWIW, the Eclipse Foundation
has a single lawyer on staff,
though we do retain the services
of other lawyers. So I guess
"lawyers" is generally accurate
:-)
The project needs to make a case
to the Eclipse Foundation for
capturing and maintaining this
data. We are very concerned about
privacy, and so are many people in
the community. There are actual
laws in some countries that need
to be considered as well.
Since we are a transparent and
open organization, there needs to
be consideration for disseminating
the collected data to other
parties. With the usage data, we
tried publishing filtered data
(which excluded anything that
could potentially expose/identify
specific users) with limited
success. We failed in this regard,
which is a big reason why we shut
down the UDC.
Unfortunately, the Eclipse
Foundation lacks the bandwidth to
maintain this data on your behalf.
Wayne
On 01/10/2012 05:36 PM, Marcel
Bruch wrote:
Sounds good to me. But let's see what the Foundation's lawyers say about this... I'll keep you posted.
On 10.01.2012, at 23:26, Doug Wightman wrote:
Hi Marcel,
I think that's a great idea. For SnipMatch, it would probably make the
most sense to have wording to the effect that the contributor is
verifying that they own the code and is giving a royalty-free license
to use it for any purpose. This would be associated with a checkbox
that must be checked when the code is to be shared publicly. We
currently have something to this effect already built, but the wording
hasn't been run by lawyers.
Doug
On Tue, Jan 10, 2012 at 3:00 PM, Marcel Bruch <bruch@xxxxxxxxxxxxxxxxxx> wrote:
Hi PMC,
Code Recommenders is making good progress and we are confident that we'll
satisfy all major criteria for M5. The extended documentation platform, code
completion engines, and local code search engine are maturing quickly and
the SnipMatch guys will start at the end of January. The Java, RCP/RAP, and
Scout packages have expressed some interest in integrating Code Recommenders
into their packages and we are working at full blast to make this happen.
One thing that hasn't been discussed in detail is how we deal with the
data users provide, for instance to SnipMatch's community code templates
store or to the extended documentation platform. Are special
wiki-like 'terms of usage' needed? Where does this data go? Also, for
stacktrace search or model generation and model download some data needs to
be delivered to the client and submitted. We started this discussion a while
ago but postponed it.
I'd like to pick up the discussion again - early enough before Juno
arrives. I'm not sure whether this is a discussion for the PMC mailing list
since ultimately it's a decision of the Foundation. But Wayne will know, I
guess.
Thanks,
Marcel
_______________________________________________
recommenders-dev mailing list
recommenders-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/recommenders-dev
Thanks,
Marcel