Jay Jay Billings, Ph.D.
AI Policy Discussion
What are the key concerns about the use of AI to generate source code and other artifacts for Eclipse projects?
Jeff Johnston: What's the source of the AI? Are there blessed LLMs and AI code generators that we must use, or that should be blessed?
Wayne: Ultimately, it is the people who adopt the software who get sued. Eclipse is in the middle.
- De minimis: No one cares about accessors, loops, and basic constructs. Algorithms are the breaking point at which IP concerns are raised.
Wayne: I draw a bunch of pictures; I own the copyright. You take my pictures and make a pretty picture book for the coffee table. You own the copyright of the collected work, but I still own the copyright for my pictures.
Jeff: Is there a line count limit?
Wayne: No one cares about boilerplate content.
Jeff: Assuming all code that is added has to be annotated, would determining its complexity be part of a legal review?
Wayne: You can only get legal advice from your lawyer, not "EF Legal."
Jeff, clarifying: Not legal, but Eclipse IP.
Christoph: If committers use AI code generators, what about third-party contributors? What does it mean if we allow AI content? Using AI for trivial content doesn't really make sense. Do we need something like an "AI ECA"?
Wayne: Right - none of these are new problems. Think back to working for older organizations that would not even let you look at code from other organizations.
Jay: Something to ask legal is whether expert content is governed differently across jurisdictions. For example, in the US, two experts producing very similar code after looking at the same reference (possibly code written by one of them) is not necessarily a copyright violation.
Martin: What's the role of the EF in this? It is the responsibility of the committer to commit code they wrote and for which they have the copyright (or can act on behalf of the owner as a delegate). Are we really talking about how we, as individual committers, can be safer, since we all use references anyway? Is there any difference from the EF perspective?
Christoph: Reading a book doesn't explicitly allow you to use its content in a professional, commercial work. Suppose, for example, we find a really nice piece of GPL code in a library.
Hannes: If we have external contributions, we can only rely on the assertions provided by the contributor.
Christoph: There is a difference. If Apple sues Eclipse for some violation, Eclipse can pass the liability on to the contributor. Can Eclipse pass the liability on to an LLM?
Jesse: We're really talking about deterministic versus nondeterministic behavior. The problem is that LLMs typically generate content nondeterministically, so there is no reproducibility in what the contributor provided. How do we track that?
Wayne: "Guidelines, really."
Martin: {Deterministic, Nondeterministic} -> {?, ?}. What do these map to?
Wayne & Jesse: {Deterministic, Nondeterministic} -> {Rote, Novel}
Hannes: Maybe the distinction should be the amount of direction given to the LLM by the operator?
Christoph: The idea of "taking responsibility" seems to be an important point. Who is in control of the change?
Wayne: Committers should not accept contributions that they don't understand or that the contributor doesn't understand.
Martin: In the next few years, the role of IDEs and the amount of AI used to generate code will go far beyond the "little things."
Christoph: If you say, "As a committer, I am responsible for trusting the contribution," but we are all using AI, what does that mean?
Jesse: The onus is on the committer. These are open source projects; the care that goes into authorship, management, and maintenance means that the onus is on the committer to understand the contribution. You are essentially saying that you have faith that the underlying tool is a trustworthy partner.
Jeff: The point is that there has to be trust and trustworthy LLMs. Also, should we be asking contributors to mark their code with info about the AI contributions?
Wayne: We have this already.
Possible policy recommendations:
1. We are not worried about "little things" that are not complex, that we could write by hand, or that are already covered by other rote code generators.
2. We should have attestation and attribution for AI contributions (see the sketch after this list).
3. Committers are responsible for the code they accept into their projects and should always follow best practices, whether the code is AI-generated or not.
4. The Eclipse Foundation can let us know which LLMs are problematic and have been shown in court cases to produce questionable content.
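As a concrete illustration of recommendation 2, one lightweight option would be standardized Git commit trailers, in the same spirit as the established Signed-off-by: and Co-authored-by: trailers. This is a hypothetical sketch for discussion; the "Assisted-by:" trailer name is an assumption, not an existing Eclipse convention:

    Fix parser edge case in tokenizer

    Portions of this change were generated with an AI assistant
    and reviewed by the committer.

    Assisted-by: <AI tool name and version>  (hypothetical trailer)
    Signed-off-by: <committer name and email>

Tooling or an IP review could then search the project history for such trailers when questions about a particular change arise.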