Re: [technology-pmc] CodeMatch code snippet search engine considers moving to Eclipse Recommenders

On 23.08.2011, at 14:47, Gunnar Wagenknecht wrote:

> Am 23.08.2011 08:29, schrieb Marcel Bruch:
>> 1. Hosting all servers at Eclipse Foundation
> 
> Well, "all" sounds like many. My assumption was that you would run a
> single virtual server.

Yes. I meant a single machine running several services.

>> 2. Create a public sharing API, add support for multiple knowledge-bases/connectors (such as community, in-house etc.). Then the server doesn't have to be hosted at Eclipse.
> 
> Eclipse should still host a server. That would be useful for all
> knowledge about usage patterns of org.eclipse.* packages.

I would very much like to see Code Recommenders serving Eclipse APIs! The question is: is there a server? Computing the models and storing the data place considerable demands on the machine. In addition, the server side is in constant flux, and we can't yet properly assess what a crowd-sourcing approach will demand. For the moment, I prefer to use a cloud server such as Amazon EC2 for many reasons (memory, CPU, network, and disk scaling as needed, data backups, minimal maintenance effort, etc.). I'm in contact with Amazon and have applied for research funding to make this happen (some kind of FoE funding :)).


I'm not sure whether there are other concerns that haven't been mentioned yet. So, just to be sure:
In case you are wondering whether the crowd-sourcing approach will be open source: it was already contributed three weeks ago.
CQ 5453 (https://dev.eclipse.org/ipzilla/show_bug.cgi?id=5453) contains the crowd-sourcing client (for Eclipse) and the server. It has been waiting for PMC approval since then. If the PMC's requirements on data collection (see below) and data hosting are met, can we proceed with this contribution?


>> Regarding data privacy/sharing, you say that it's a matter of trust and we (Code Recommenders) have to be very careful which data we use. But everyone can decide to contribute or not, and thus, there is no severe problem.
> 
> Yes. It should be an opt-in process and you should not submit code that
> is confidential.

Agreed. No doubt.

> However, that last statement made me wonder. What if you pre-process the
> code on the consumer's machine so that you only submit usage
> patterns to the remote collector? Is that possible? That would avoid
> sending confidential source code across the wire as well as save
> processing resources on the server.

Actually, it works just as you say. Only usage data is submitted, no source code. The code is analyzed inside your IDE, and only the usage information for the libraries you choose to contribute to is submitted. Everything else stays on your local drive.
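To illustrate the idea only: below is a minimal, hypothetical sketch of "submit usage data, not source code". It is not the Code Recommenders client. The `Call` record, the `usage_summary` function, and the package-prefix opt-in list are assumptions made for this example. It keeps only method calls on types from packages the user opted in to and reduces them to counts, so no source text appears in what would be uploaded.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One observed method invocation (hypothetical record type)."""
    receiver_type: str  # fully qualified declaring type
    method: str         # method name plus parameter types


def usage_summary(calls, contributed_prefixes):
    """Count (type, method) pairs for opted-in packages only.

    Calls on types outside `contributed_prefixes` are dropped, and the
    result holds no source text, only type/method names and counts.
    """
    counts = Counter()
    for call in calls:
        if any(call.receiver_type.startswith(p) for p in contributed_prefixes):
            counts[(call.receiver_type, call.method)] += 1
    return counts


# Example: the user opted in to sharing usage of org.eclipse.* only.
observed = [
    Call("org.eclipse.swt.widgets.Button", "setText(String)"),
    Call("org.eclipse.swt.widgets.Button", "setText(String)"),
    Call("com.acme.internal.Billing", "charge(long)"),  # in-house type, stays local
]
summary = usage_summary(observed, {"org.eclipse."})
print(dict(summary))
```

An empty opt-in set yields an empty summary, which matches the opt-in behaviour agreed on above: nothing is shared unless the user selects it.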

Andreas Frankenberger (a student working on the crowd-sourcing approach) has assembled a preliminary set of screenshots and descriptions of the upload wizard at http://wiki.eclipse.org/Recommenders/New_and_Noteworthy/0.4 (please don't take offense at the URL; we chose it before we had this discussion).



Can you please comment on this point from your previous email?

> 
> I'm not sure I got your last point:
> 
>> Of course, you should not enable a public connector with a bunch of packages listed per
>> default.
> 
> What do you mean by "with a bunch of packages listed per default"? Will any connector (public or not) be enabled by default?



Thanks,
Marcel

-- 
Eclipse Code Recommenders:
 w www.eclipse.org/recommenders
 tw www.twitter.com/marcelbruch
 g+ www.gplus.to/marcelbruch


