Information on AI and copyright for UCL Press authors
We understand that authors are keen to know more about how their works might be used by AI companies for the training of Large Language Models (LLMs). This page contains information about the current AI landscape with regard to author copyright and licensing, and will be updated regularly as new developments emerge in AI legislation and technology.
Current legislation
Works in copyright are covered by current legislation that prevents their use without permission. However, AI companies argue that training LLMs on copyright works doesn’t constitute infringement of copyright because the works are used for training purposes rather than being reproduced. Such use is also potentially covered by the text and data mining exception in the UK and ‘fair use’ in the US. The UK government undertook a consultation in February 2025 on changing the law regarding copyright and AI to introduce an ‘opt out’ option for copyright owners, which would place the onus on them to ensure that their works cannot be accessed by AI companies. The outcome of this consultation is awaited. There are also a number of legal cases under way, brought by media companies and artists, challenging AI companies’ use of copyright works without permission, acknowledgement or compensation. The outcome of these legal cases and any changes in legislation could affect author rights, and this information will therefore be updated regularly.
Creative Commons licensing and attribution
Most Open Access works are published under a Creative Commons licence that allows copyright owners to specify how their works can be accessed, shared and reused. All Creative Commons licences specify that attribution must be made to the copyright owner and publisher of the work if direct extracts of the work are cited. Gen AI outputs don’t generally cite passages from works; rather they provide a synthesis of information ingested from huge datasets. Therefore, direct attribution isn’t always possible. However, links to original source material are increasingly being included in Gen AI outputs as AI companies realise the importance of transparency, factual accuracy and adherence to copyright laws.
Creative Commons plans to develop ‘author preference signalling’ tools that indicate to AI companies how authors would like their works to be used, but this will be a statement of preference and won’t be legally enforceable. When these tools are made available, this information will be updated. While the application of a CC BY-NC (non-commercial) licence should prevent the use of works for commercial purposes, text and data mining exceptions may take precedence in some regions.
Scholarly publishing agreements with AI companies
Some large scholarly publishers have recently signed licensing agreements with AI companies to allow them to use the works they’ve published. In some cases, these allow authors to opt out and they aim to ensure that authors will be attributed and paid royalties. AI companies are unlikely to undertake such agreements with OA publishers, as OA content is already freely available and the reuse of OA works is covered by the Creative Commons licence.
Prevention of AI bots
Some technical measures can be installed on websites to signal to AI bots that website owners don’t want their content to be crawled (e.g. Robots.txt); however, these signal a preference rather than enforcing a block. Newer solutions, such as Cloudflare, claim to be able to block AI bots. UCL Press content is mainly hosted on third-party platforms in order to ensure widespread dissemination. These platforms have their own mechanisms in place, which UCL Press doesn’t control. They may feel that a complete block is neither necessary nor desirable, and that the content should remain open to use under the terms of the licence.
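As an illustration of why Robots.txt expresses a preference rather than a block, the sketch below uses Python's standard urllib.robotparser module to read a small set of rules and report what a well-behaved crawler would conclude. The bot name ExampleAIBot, the rules themselves and the URL are hypothetical and are not taken from any UCL Press or partner platform; real AI crawlers publish their own user-agent names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "ExampleAIBot" is an invented name used for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what a crawler that honours robots.txt would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL, used only to exercise the rules above.
URL = "https://press.example.org/books/some-title"

ai_allowed = may_crawl(ROBOTS_TXT, "ExampleAIBot", URL)
other_allowed = may_crawl(ROBOTS_TXT, "SomeOtherBot", URL)

print("ExampleAIBot may crawl:", ai_allowed)
print("SomeOtherBot may crawl:", other_allowed)
```

The check runs on the crawler's side: nothing in the file stops a bot that chooses not to consult it, which is why a separate blocking service is needed where a firm block is wanted.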
FAQs
How can my work be used by AI companies for training LLMs?
Open Access works are freely available for access, sharing and reuse under the terms of the Creative Commons licence. See Creative Commons licensing and attribution above.
Can I prevent my work being used by AI companies for training LLMs?
Under the terms of Creative Commons licences, reuse is permitted, provided that attribution is made for direct citation, though there are limitations to the ways in which works can be attributed in Gen AI outputs (see below). This will vary depending on which licence is chosen. A CC BY-NC (non-commercial) licence should prevent an AI company using data for any commercial purpose; however, text and data mining exceptions may apply in some regions.
How can I ensure that my work will be attributed under the terms of the Creative Commons licence assigned to my work?
Increasingly, AI tools include a link to a source when generating responses to prompts, although this often requires a specific request. We’ve seen examples of links to UCL Press books on the UCL Press website, author websites and on distributor platforms. While direct attribution would be required for text extracts, Gen AI outputs don’t generally reproduce texts verbatim; rather they use works to train LLMs, which then provide new information and responses to prompts by synthesising huge datasets.
How can I identify whether my work has been used for training LLMs?
As above, this can be challenging when answers to prompts provide a synthesis of large amounts of information rather than direct quotes.
Can UCL Press come to an agreement with the large AI companies to license UCL Press publications and ensure greater control over how they are used?
Since UCL Press publications are already Open Access, which means that access, sharing and reuse are permitted under the terms of a Creative Commons licence, it’s unlikely that AI companies will seek such arrangements. Their main interest is gaining access to large volumes of copyright materials that are otherwise inaccessible.
What does the UK government say about the use of copyright works by AI companies?
A government consultation on AI and copyright was undertaken in early 2025 and the outcome of this is awaited. The government proposals recommend the implementation of an ‘opt out’ system in which copyright owners would need to restrict the use of their works by AI companies. Current copyright legislation requires permission to be granted by copyright owners before their works may be used and this is felt to be entirely adequate as a protection. The proposals would represent a significant change if implemented and would put the onus on the copyright owner to opt out. There are widespread concerns about the implications of such a change in the legislation among the creative industries.
What is the legal position on the use of copyright works by AI companies in other regions such as the USA and Europe?
EU: The EU directive on copyright and related rights in the Digital Single Market (Directive (EU) 2019/790) includes two exceptions that allow Text and Data Mining (TDM). Article 3 of the Directive allows research organisations to carry out text and data mining for the purposes of scientific research, provided that they have lawful access (which includes open access) to the source materials. Article 4 allows text and data mining of lawfully accessed works for any purpose, unless rights holders have expressly reserved their rights (‘opted out’ of the exception): in this case, the works cannot be reproduced under the exception.
While some questions remain as to whether the TDM exceptions apply to training AI models, the EU AI Act (Article 53) mentions TDM in the context of general-purpose AI, indicating that AI providers must respect rights reservations when relying on the exception. The Kneschke vs. LAION case in Germany is also the first case to clarify the application of the TDM exceptions to AI training (see an extensive analysis of this case on the European Law Blog).
USA: ‘Fair use’ is a US legal doctrine that allows the use of copyright-protected works without permission if certain criteria apply. The factors considered are: the purpose and character of the use, the nature of the work being used, the amount and substantiality of the work being used, and the effect of the use on the market for the work. Whether use of copyright-protected works is ‘fair use’ will be informed by the outcomes of cases that are still in progress. However, existing precedents involving machine learning (if not AI training) lend support to the argument that such activities should be considered ‘fair use’. For a relevant discussion see the Association of Research Libraries blog.
Can UCL Press prevent AI bots from crawling UCL Press publications?
The UCL Press website currently has Robots.txt in place, which signals to AI bots that we don’t want our website to be crawled. This signals a preference but doesn’t prevent AI bots from crawling the website. UCL Press publications are hosted on third-party platforms and the measures they have in place to prevent AI bots will vary – see below.
Do the OA platforms where my work is distributed have any technical protections in place such as Robots.txt or Cloudflare?
Solutions such as Robots.txt can be installed on websites to signal to AI bots that website owners don’t want their content to be crawled; however, this signals a preference rather than enforcing a block. Newer solutions, such as Cloudflare, claim to be able to block AI bots. UCL Press content is mainly hosted on third-party platforms in order to ensure widespread dissemination. Those platforms have their own mechanisms in place, which UCL Press doesn’t control. The platforms hosting Open Access content may feel that a complete block is neither necessary nor desirable, and that the content should remain open to use under the terms of the licence.
Last updated 23/09/2025