When I used the term "bandwidth", I
                              was referring to time in the human
                              resources sense. The Eclipse webmaster
                              *may* be able to set you up with a virtual
                              server, but the EF has limited resources
                              to provide support and maintenance on that
                              server. 
                              
                              On 01/11/2012 04:16 AM, Marcel Bruch
                              wrote:
                              
                                I think that there are several
                                  things that need discussion.
                                
                                
                                1. Which data users (explicitly or
                                  implicitly) provide
                                2. Under which terms of use this
                                  data is used by us and others
                                3. Who stores the data
                                4. Who can access the data and in
                                  which format (degree of
                                  anonymization).
                                
                                
                                
                                
                                1.
                                The term 'data' covers quite a
                                  large range of information.
                                
                                
                                For SnipMatch this includes code
                                  snippets and maybe usage statistics
                                  (what has been used and when, in order
                                  to update the ranking strategies).
                                For Extdoc this may include
                                  information like comments, editorial
                                  actions, or user ratings.
                                For Call Completion this includes
                                  the models that have to be delivered
                                  to the clients and information about
                                  the JARs they use (e.g., file
                                  fingerprints).
                                For Chain Completion this may
                                  include usage statistics (as for
                                  SnipMatch, to improve ranking
                                  strategies) and code snippets.
                                You can think of other information
                                  too.
                                
                                
                                2.
                                I'd like to say that this is an
                                  important topic that needs solid
                                  research. It will probably require us
                                  to get in contact with lawyers to
                                  clarify what's possible/required.
                                It should be clear that everyone
                                  who shares data (code snippets etc.)
                                  must actually be allowed to share
                                  it. For me, it's
                                  basically the same as with the Eclipse
                                  Wiki. All users that contribute to it
                                  must agree to its terms of use. Is there
                                  a difference? Are these terms of use
                                  reusable for our use case? I guess I
                                  should prepare a detailed description
                                  of what gets collected and provided by
                                  whom, so that a lawyer can help here?
                                
                                
                                3.
                                If I understood correctly, the
                                  Foundation has no bandwidth to host
                                  these services. In that case, I have to
                                  go back to my university and ask for
                                  permission to host these services
                                  somewhere close to our backbone, or
                                  raise some funding to put a server
                                  elsewhere. One question that comes
                                  to mind: if the Foundation is not
                                  hosting these services, can we deliver
                                  Code Recommenders with preconfigured
                                  URLs that point to external project
                                  servers? For instance, something like
                                  "code.recommenders.org"?
                                
                                4.
                                What is needed, and what is technically
                                  feasible? The raw data may eventually
                                  exceed terabytes (not in the
                                  first years, I guess :)). Honestly,
                                  I have no idea yet how much data will be
                                  collected and what information others
                                  may be interested in. What we have in
                                  mind is to create reference data sets
                                  for machine learning and software
                                  engineering researchers, to enable
                                  research into new tools and improved
                                  algorithms for code search, code
                                  recommendations etc. But these data
                                  sets will, for practical reasons, only
                                  include a subset of (anonymized) data
                                  needed for research purposes. Would
                                  this be satisfactory? Do you think some
                                  kind of agreement is needed?
                                
                                
                                Is there anything I'm currently not
                                  aware of?
                                
                                
                                Thanks,
                                Marcel
                                
                                  On 11.01.2012, at 00:08, Wayne
                                    Beaton wrote:
                                  
                                  
                                     FWIW, the Eclipse Foundation
                                      has a single lawyer on staff,
                                      though we do retain the services
                                      of other lawyers. So I guess
                                      "lawyers" is generally accurate
                                      :-)
                                      
                                      The project needs to make a case
                                      to the Eclipse Foundation for
                                      capturing and maintaining this
                                      data. We are very concerned about
                                      privacy, and so are many people in
                                      the community. There are actual
                                      laws in some countries that need
                                      to be considered as well.
                                      
                                      Since we are a transparent and
                                      open organization, there needs to
                                      be consideration for disseminating
                                      the collected data to other
                                      parties. With the usage data, we
                                      tried publishing filtered data
                                      (which excluded anything that
                                      could potentially expose/identify
                                      specific users) with limited
                                      success. We failed in this regard,
                                      which is a big reason why we shut
                                      down the UDC.
                                      
                                      Unfortunately, the Eclipse
                                      Foundation lacks the bandwidth to
                                      maintain this data on your behalf.
                                      
                                      
                                      Wayne
                                      
                                      On 01/10/2012 05:36 PM, Marcel
                                      Bruch wrote:
                                      
                                        Sounds good to me. But let's see what the Foundation's lawyers say about this... I'll keep you posted.
On 10.01.2012, at 23:26, Doug Wightman wrote:
                                        
                                          Hi Marcel,
I think that's a great idea. For SnipMatch, it would probably make the
most sense to have wording to the effect that the contributor is
verifying that they own the code and is giving a royalty-free license
to use it for any purpose. This would be associated with a checkbox
that must be checked when the code is to be shared publicly. We
currently have something to this effect already built, but the wording
hasn't been run by lawyers.
Doug
On Tue, Jan 10, 2012 at 3:00 PM, Marcel Bruch <bruch@xxxxxxxxxxxxxxxxxx> wrote:
                                          
                                            Hi PMC,
Code Recommenders is making good progress and we are confident that we'll
satisfy all major criteria for M5. The extended documentation platform, code
completion engines, and local code search engine are maturing quickly, and the
SnipMatch guys will start at the end of January. The Java, RCP/RAP, and Scout
packages expressed some interest in integrating Code Recommenders, and we are
working at full blast to make this happen.
One thing that hasn't been discussed in detail is how we deal with the
data users provide, for instance to SnipMatch's community code template
store or to the extended documentation platform. Are special
wiki-like 'terms of use' needed? Where does this data go? Also, for
stacktrace search or model generation and model download, some data needs to
be delivered to the client and submitted. We started this discussion a while
ago but postponed it.
I'd like to pick up the discussion again, early enough before Juno
arrives. I'm not sure whether this is a discussion for the PMC mailing list,
since ultimately it's the Foundation's decision. But Wayne will know, I
guess.
Thanks,
Marcel
_______________________________________________
recommenders-dev mailing list
recommenders-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/recommenders-dev
                                          
                                        
                                        Thanks,
Marcel
                                      
                                      
                                      
                                    
                                  
                                 
                                
                                
                                
                                  