mark nottingham

On Opting Out of Copyright

Wednesday, 18 September 2024

Tech Regulation, Web and Internet

The EU AI Act and emerging practice flip copyright’s default opt-in regime to an opt-out one. What effects is this likely to have on the balance of power between rights holders and those who reuse their content?

Copyright is a default opt-in regime, from the standpoint of the rights holder. If I publish something on this blog, the presumption is that I retain rights unless I specifically license them – for example, by attaching a Creative Commons license. If I don’t do that, you can’t legally reuse my content (unless your use falls within certain exemptions).

You can think about this arrangement in terms of protocol design: it’s an agreement between parties whose nature creates certain incentives and barriers to behaviour. Someone who wants to reuse my content has the burden of getting a license from me, and proving that they have one if I challenge them. I have the burden of finding misuse of my content and pursuing it.

Technical systems can assist both parties in these tasks. I can use search engines of various sorts to find potential abuses; a licensee can prove that a particular license was available by showing its existence in the cache of a disinterested third party (often, one of the same search engines).
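To make the third-party caching point concrete, here is a minimal sketch in Python, using the Internet Archive’s public Wayback Machine availability endpoint as one such disinterested third party; the URL and timestamp below are only placeholders:

```python
# A minimal sketch: ask a disinterested third party (the Internet Archive's
# Wayback Machine availability endpoint) what it holds for a page near a
# given date, e.g. to show what was published at the time.
import json
import urllib.parse
import urllib.request

def closest_snapshot(url: str, timestamp: str) -> dict:
    """Return the archived snapshot closest to `timestamp` (YYYYMMDD), if any."""
    query = urllib.parse.urlencode({"url": url, "timestamp": timestamp})
    with urllib.request.urlopen(f"https://archive.org/wayback/available?{query}") as resp:
        return json.load(resp).get("archived_snapshots", {})

# Placeholder URL and date; the response includes the snapshot's own URL and timestamp.
print(closest_snapshot("https://www.mnot.net/blog/", "20240918"))
```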

This creates an equilibrium: the burdens are balanced to favour certain behaviours. You might argue that the balance is unjust (and many do), but it is known and stable.

As discussed previously, the EU AI Act and emerging practice flip copyright’s default opt-in regime to an opt-out one. A rights holder now has to take positive action if they want to reserve their rights. While on the face of it they still have the same capability, this ends up being a significant practical shift in power.

That’s partly because of the nature of opt-out itself. The burden shifts: now, the rights holder must find misuse of their content, and prove that they opted out.

Proving that you consistently opted out at every opportunity is difficult, because it’s effectively proving a negative – that you never failed to opt out. Search engines don’t see every request made on the Internet; they just crawl it periodically, sampling what they see. An AI crawler can plausibly claim that the opt-out wasn’t present when it crawled, and the rights holder is reduced to proving that Russell’s teapot isn’t in orbit.

Notably, this is the case whether the opt-out is attached to the content by a mechanism like robots.txt or embedded in the content itself as metadata. In the former case, content without the opt-out might be obtained at a different location or at a different time; in the latter, the opt-out might be stripped from the content or a copy of it, either intentionally or unintentionally (e.g., it is common to strip metadata from images to optimise performance and improve privacy).
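To illustrate that second failure mode, here is a minimal sketch in Python, assuming the Pillow imaging library and a hypothetical photo.jpg that carries EXIF metadata: an everyday re-encoding step produces a copy without that metadata, and with it any rights reservation it carried, unless the metadata is explicitly copied across.

```python
# A minimal sketch, assuming Pillow is installed and photo.jpg carries EXIF
# metadata: re-encoding an image for performance silently produces a copy
# without that metadata unless it is explicitly passed back in.
from PIL import Image

original = Image.open("photo.jpg")
print("EXIF present before:", bool(original.info.get("exif")))

# Typical optimisation step: re-encode at a lower quality. Pillow only writes
# EXIF if you pass it explicitly, e.g. save(..., exif=original.info["exif"]).
original.save("photo-optimised.jpg", quality=80)

print("EXIF present after:", bool(Image.open("photo-optimised.jpg").info.get("exif")))
```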

On top of that, using this regime for AI makes finding misuse difficult too. There’s no easy way to query an LLM to find out whether a particular piece of content was in the corpus used to train it; instead, you have to trust the vendor to tell you what they used. While transparency measures are being discussed as a policy solution to this issue, they don’t have the same properties as third-party or technical verification: they require trusting assertions from the vendor.

In this manner, changing copyright’s default opt-in to an opt-out for AI dramatically shifts the burden of enforcement onto rights holders, and the lack of support for managing those burdens brings the practical enforceability of the regime into question. It could be argued that this is appropriate for policy reasons – in particular, to enable innovation. However, it is a mistake to claim that it doesn’t represent a change in the balance of power as compared to opt-in.