Re: [open-regulatory-compliance] Non-CRA: Open Data in the Digital Omnibus proposal

Hi Felix,

Thank you for your interest. You are right, in my opinion: ORC might well be about more than the CRA.

That said, I am involved here because it is clear to me that ORC exists to support open source communities, stewards, and individual maintainers, which in my opinion means first and foremost civil society.

Therefore, I want to point out that the needs and the rights of natural persons (the public) are by nature not the same as the needs and rights of legal persons (enterprises).

Correct me if I'm wrong, but it seems to me that you are lobbying for your employer, Microsoft (and other VLEs), to enjoy the same rights and privileges that are accorded to natural persons and civil society.

Let's put this to the test:

The New Legislative Framework defines distinct categories of economic operators: manufacturers, importers, distributors, and authorized representatives. Each has different obligations calibrated to their role in the supply chain and their capacity to bear responsibility. The CRA has now added a new category: the "open source software steward," which for the first time in EU law recognizes that civil society organizations play a fundamentally different role than commercial manufacturers.

This distinction matters. The steward exists precisely because the CRA drafters understood that foundations, non-profits, and community-driven projects are not equivalent to Platinum members of the Linux Foundation paying $500K annually, or Strategic members of Eclipse paying similar amounts. It is those large corporate members that should bear proportional costs for access to public resources.

Article 27 of the Universal Declaration of Human Rights states that "everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits." Open government data is precisely such a benefit. The question is not whether VLEs should have access, but whether they should enjoy the same terms as natural persons exercising their human rights.

The open source foundation model already provides an answer: tiered membership. The Linux Foundation charges individuals $99, small companies $5,000, and Platinum members $500,000+. Eclipse follows similar patterns. This is not discrimination. It is proportionality, based on ability to pay and the benefits extracted from the commons.

If the Commission truly wishes to allow public sector bodies to charge VLEs while preserving open licensing for everyone else, the solution is straightforward:

  1. Maintain CC-0 or CC-BY licensing for all public sector data for natural persons, civil society organizations, open source stewards, and SMEs.
  2. Create a separate commercial licensing regime for VLEs, where fees scale with the economic capacity of the entity, similar to foundation membership tiers.
  3. Prohibit VLEs from circumventing this regime by obtaining openly-licensed data through intermediaries.

This approach preserves license interoperability for the open ecosystem, respects the human right to share in scientific advancement, and allows public sector bodies to generate appropriate returns from those with the greatest ability to pay.

On the license incompatibility concern: I think you're overstating the risk. Facts are not copyrightable. When Wikipedia incorporates government statistics or demographic data, it's incorporating facts, not copyrighted expression. The EU Database Directive protects against substantial extraction of databases, but this protection doesn't attach to individual data points once they enter public discourse. Wikipedia routinely cites factual information from sources that aren't CC-BY-SA licensed because the facts themselves carry no license.

The real-world license compatibility problems in open data, as documented in OpenStreetMap's transition from CC-BY-SA to ODbL, concern conflicts between share-alike licenses themselves. The UK's OS OpenData Licence was incompatible with ODbL because of conflicting attribution and downstream licensing requirements, not because of differential pricing.

The Commission's proposal allows public sector bodies to charge VLEs differently while the underlying data remains available to everyone else. The license and the commercial relationship are separate, just as the Apache License applies equally to everyone while Platinum members of the Apache Foundation pay $125,000 annually for governance rights.

I don't see this as the end of open government data. It could strengthen sustainability of public data infrastructure while preserving openness for civil society.

Cheers, 

Daniel



On Sat, 29 Nov 2025 at 15:40, Luis Villa via open-regulatory-compliance <open-regulatory-compliance@xxxxxxxxxxx> wrote:
Hi, Felix-
Thanks for sharing this; it’s an interesting problem. 

Without speaking for CC, I will note that in essentially every public discussion CC has had about AI and data in the past 18 months we’ve loudly heard many variations of “yes, I want to share, but not with trillion dollar companies”. So this idea is heavily in the air and pushing back on it will be difficult. (I would imagine even more so in the EU, since so many of those biggest enterprises are American.)

Couple thoughts:

Would you mind if I shared your email with the CC staff?

- at the scale of modern datasets, distribution cost can be non-trivial. We have a real problem at one of my non-profits that a single complete download of our dataset from Google Cloud (where it is hosted) could cost us ~1% of our annual budget.

- related to the previous point, for many of the biggest and most interesting datasets, API terms are as important as, or more important than, the license of the underlying data, because many (most?) users will only want to fetch some rows via an API, rather than obtaining the entire dataset. This is particularly true for datasets that are updated regularly (which I would imagine is the case for many of the best/most important government datasets).

- what is the relevant definition of VLE? One suggestion I have heard in the AI context is that we should rely on the definition of VLOPs; don’t think I’ve heard use of VLEs. But that may be an artifact of the policy circles I run in.

- the license complexity/interoperability problem is a very real one; but not one that in my experience resonates with anyone unless they have personally tried to build a compliant dataset. (In other words, very few people care!) If you want to push back, I suspect you will need to find some academics who have used government datasets that would be 

Luis

On Fri, Nov 28, 2025 at 11:14 AM Felix Reda via open-regulatory-compliance <open-regulatory-compliance@xxxxxxxxxxx> wrote:
Hi everyone,

In the understanding that the scope of ORC WG is in principle broader than the CRA, and that it may deal with other policy issues that affect the open source community, I am sharing the following message, which relates to the open licensing of government data. If you feel this is out of scope for ORC WG, please let me know.

I have identified a potential problem for open licensing in the recently introduced European Commission digital omnibus proposal that I would love to hear your views about (there are two digital omnibus proposals, one on AI and one on other digital legislation, I am referring to the non-AI one: proposal on simplification of the digital legislation). This is about the Open Data Directive, which is supposed to make government data available and re-usable to the public, a piece of legislation which I helped negotiate in a previous life, so the topic is dear to me.

The Digital Omnibus aims to reduce the number of data-related legislative acts by repealing the Open Data Directive and instead incorporating its contents in the Data Act, which is a regulation. That is not a bad idea, because regulations are directly applicable law across the EU and do not need to be transposed into national law by the Member States, a process which can often be a cause for confusion and national differences. So I like the basic idea.

However, the European Commission proposes to make two consequential changes to Articles 6 and 8 of the Open Data Directive in the process, which I am including below in track changes (they’re Articles 32q and 32r in the new proposal on simplification of the digital legislation, see pp. 41-42). For simplicity’s sake, I will use data and documents interchangeably in the following analysis; the legislation applies to both.

My reading of these changes is the following. Up until now, for those public sector bodies not mentioned in Art. 32q (2), the re-use of public sector documents had to be free of charge and subject to non-discriminatory license conditions, ideally standard open data licenses such as CC-0, CC-BY, or national open data licenses like Datenlizenz Deutschland. While public sector bodies could charge a marginal price for costs that arose in the context of providing the data (for example the work needed to anonymise a document or dataset that contained personal data), they couldn’t charge for the access to the data as such. This was great news for open data, because the use of non-discriminatory standard licenses that meet the open definition would allow the general public to combine different data sources without risking the kinds of license conflicts that we know all too well from the open source world.

The proposed changes below, as I read them, alter this regime as follows:

If the entity requesting re-use of the documents is a very large enterprise, the public sector body can charge them a higher fee. That alone is defensible, given their greater economic power. However, I am concerned that the specific way that Art. 32q (6) is drafted does not just allow the charging of higher fees from very large enterprises compared to other data users, it also allows the public sector bodies to charge very large enterprises in situations where the same data was previously made available free of charge. I come to this conclusion because Art 32q (6) states that such charges may cover a range of different costs, together with a reasonable return on investment, *in addition* to any of the charges mentioned in paragraph 1. In other words: There can be charges that are not marginal costs related to making the data available in the first place. That means that the basic principle of paragraph 1, that the re-use of government documents must be free of charge, does not apply to very large enterprises at all.
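The reading above can be put into a few lines of arithmetic. This is only a minimal sketch under stated assumptions: the split into "marginal" and "upstream" costs, the names `DatasetCosts` and `max_charge`, and the treatment of the "reasonable return on investment" as a simple rate are all hypothetical simplifications, not anything the proposal itself defines. It also assumes a public sector body that does not fall under the exceptions in paragraph 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCosts:
    # Costs of making the data available (paragraph 1): reproduction,
    # provision, dissemination, anonymisation, confidentiality measures.
    marginal: float
    # Further costs that paragraph 6 allows to be charged to very large
    # enterprises: collection, production, data storage (hypothetical split).
    upstream: float


def max_charge(costs: DatasetCosts, is_vle: bool, return_rate: float = 0.0) -> float:
    """Upper bound on the charge under the reading described above."""
    if not is_vle:
        # Paragraph 1: re-use is free of charge, apart from marginal cost recovery.
        return costs.marginal
    # Paragraph 6: marginal costs plus upstream costs, plus a return on
    # investment (modelled as a simple rate, which is an assumption).
    return (costs.marginal + costs.upstream) * (1.0 + return_rate)


# A dataset that costs nothing to hand over but was expensive to produce.
costs = DatasetCosts(marginal=0.0, upstream=1000.0)
charge_public = max_charge(costs, is_vle=False)
charge_vle = max_charge(costs, is_vle=True, return_rate=0.05)
print(charge_public, charge_vle)
```

Even with zero marginal cost, the very large enterprise can be charged in this model, which is the point made above: the free-of-charge principle of paragraph 1 no longer limits what such an enterprise can be asked to pay.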


Article 6 32q

Principles governing charging for open government data


1.   The re-use of documents within the scope of this Section shall be free of charge. However, the recovery by the public sector body holding the data of the marginal costs incurred for the reproduction, provision and dissemination of such data or documents as well as for anonymisation of personal data and measures taken to protect commercially confidential information may be allowed.

2.   By way of exception, paragraph 1 shall not apply to the following:

  (a) public sector bodies that are required to generate revenue to cover a substantial part of their costs relating to the performance of their public tasks;
  (b) libraries, including university libraries, museums and archives;
  (c) public undertakings.

[…]

5.   Where charges are made by the public sector bodies referred to in point (b) of paragraph 2, point (b), the total income from supplying and allowing the re-use of data or documents over the appropriate accounting period shall not exceed the cost of collection, production, reproduction, dissemination, data storage, preservation and rights clearance and, where applicable, the anonymisation of personal data and measures taken to protect commercially confidential information, together with a reasonable return on investment. Charges shall be calculated in accordance with the accounting principles applicable to the public sector bodies involved.

6. Public sector bodies may set out higher charges for the re-use of data and documents by very large enterprises than the charges provided for in paragraphs 1, 4 and 5. Any such charges shall be proportionate and based on objective criteria, taking into account the economic power, or the ability of the entity to acquire data, including in particular a designation as a gatekeeper under Regulation (EU) 2022/1925. In addition to the elements listed in paragraph 1 of this Article, such charges may cover the cost of collection, production, reproduction dissemination and data storage and where applicable the cost of anonymisation or measures to protect the confidentiality of the data or documents, together with a reasonable return on investment.

6. 7.   The re-use of the following shall be free of charge for the user:
  (a) subject to Article 14 32v paragraph (3), (4) and (5), the high-value datasets, as listed in accordance with paragraph 1 of that Article;
  (b) research data referred to in point (c) of Article 1(1)32i.

You may think that’s fair - after all, very large enterprises tend to be very profitable, right? The problem becomes apparent in Art. 32r, which deals with open licenses. Previously, public sector bodies were categorically forbidden from using licenses for public sector data that included discriminatory conditions, which would violate conditions 2.1.6 (non-discrimination) or 2.1.8 (application for any purpose) of the open definition. The use of standard licenses, such as CC-0, was explicitly encouraged.

The objective of allowing public sector bodies to always be able to charge very large enterprises for public sector data conflicts with this open licensing approach. If the same data were provided free of charge under an open license to the general public, but were subject to a fee for very large enterprises, even if there were no costs incurred by the public sector body associated with making the data available, nothing would stop the very large enterprise from simply copying the open data from a third-party source, which would be able to reproduce the data legally. So in order to be able to charge very large enterprises for the data itself (not costs for the provision of the data, such as bandwidth, anonymisation etc.), the Commission has to abandon the encouragement of standard open licenses and explicitly allow for non-open license conditions, as becomes evident from the proposed changes to Article 32r:

Article 8 32r
Standard licences
(1) The re-use of data or documents shall not be subject to conditions, unless such conditions are objective, proportionate, non-discriminatory and justified on grounds of a public interest objective.
(2) When re-use is subject to conditions, those conditions shall not unnecessarily restrict possibilities for re-use and shall not be used to restrict competition.
(3) In Member States where licences are used, public sector bodies shall ensure that the standard licences for the re-use of public sector data or documents, which can be adapted to meet particular licence applications, are available in digital format and able to be processed electronically. Member States shall encourage the use of such standard licences.
(4) Public sector bodies may establish special conditions for the re-use of data and documents by very large enterprises. Such conditions shall be proportionate and should be based on objective criteria. They shall be established taking into consideration the economic power, or the ability of the entity to acquire data, including in particular a designation as a gatekeeper under Regulation (EU) 2022/1925.

What does this mean in practice? All but the most open-data-friendly public sector bodies will abandon the use of open licenses such as CC-0 in favor of either offering no standard licenses at all (every entity/person that requests re-use of the public sector data must negotiate its own license), or in favor of new, custom-made non-open standard licenses that distinguish between re-use by very large enterprises and re-use by everybody else.

The Commission may very well believe that this will have no adverse impact on anyone but very large enterprises, but this couldn’t be further from the truth. As we very well know from the open source context, license incompatibilities are a huge problem. Some open projects, such as Wikipedia, use share-alike provisions. It would be impossible to combine such non-openly licensed public sector documents with openly licensed projects under a share-alike clause. By doing so, one would either violate the special conditions restricting the re-use by very large enterprises from the government license, or one would violate the share-alike requirement from the open project. If this new option to establish special conditions for the re-use of public sector data by very large enterprises were used widely by public sector bodies (and I think there would be a lot of economic pressure on them to use this option in the hopes of generating new revenue streams), that could very well be the end of open government data in the EU.
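The conflict described above can be illustrated with a toy compatibility check. This is a deliberately simplified sketch and not a legal analysis: the `License` fields, the `can_combine` rule, and the "VLE-restricted licence" are hypothetical, and the model only captures the single clash between a share-alike requirement and special conditions for very large enterprises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    share_alike: bool    # derivatives must be offered to everyone on the same terms
    restricts_vle: bool  # special conditions apply to very large enterprises


def can_combine(a: License, b: License) -> bool:
    """Toy rule: share-alike material cannot absorb VLE-restricted material."""
    for first, second in ((a, b), (b, a)):
        if first.share_alike and second.restricts_vle:
            # The combined work would have to be open to everyone (share-alike)
            # and closed to very large enterprises at the same time.
            return False
    return True


cc0 = License("CC0", share_alike=False, restricts_vle=False)
cc_by_sa = License("CC BY-SA 4.0", share_alike=True, restricts_vle=False)
vle_restricted = License("VLE-restricted licence (hypothetical)", share_alike=False, restricts_vle=True)

print(can_combine(cc_by_sa, cc0))             # share-alike project plus CC0 data
print(can_combine(cc_by_sa, vle_restricted))  # share-alike project plus restricted data
```

In this model, CC0 data can flow into a share-alike project, while data under the hypothetical restricted license cannot, whichever way round the two inputs are passed.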

Am I missing something here? Please let me know what you think!

Best,
Felix


_______________________________________________
open-regulatory-compliance mailing list
open-regulatory-compliance@xxxxxxxxxxx
To unsubscribe from this list, visit https://accounts.eclipse.org