Information on AI and copyright for UCL Press authors
We understand that authors are keen to know more about how their works might be used by AI companies for the training of Large Language Models (LLMs). This page contains information about the current AI landscape with regard to author copyright and licensing, and will be updated regularly as new developments emerge in AI legislation and technology.
Table of contents
- Current legislation
- Creative Commons licensing and attribution
- Scholarly publishing agreements with AI companies
- Prevention of AI bots
- FAQs
Current legislation
Works in copyright are covered by current legislation that prevents their use without permission. However, AI companies argue that training LLMs on copyright works doesn’t constitute infringement of copyright because the works are used for training purposes rather than being reproduced. Such use is also potentially covered by the text and data mining exception in the UK and ‘fair use’ in the US. The UK government undertook a consultation in February 2025 on changing the law regarding copyright and AI to introduce an ‘opt out’ option for copyright owners, which would place the onus on them to ensure that their works cannot be accessed by AI companies. The outcome of this consultation is awaited. There are also a number of legal cases underway, brought by media companies and artists, challenging AI companies’ use of copyright works without permission, acknowledgement or compensation. The outcome of these legal cases and any changes in legislation could affect author rights, and this information will therefore be updated regularly.
Creative Commons licensing and attribution
Most Open Access works are published under a Creative Commons licence that allows copyright owners to specify how their works can be accessed, shared and reused. All Creative Commons licences specify that attribution must be made to the copyright owner and publisher of the work if direct extracts of the work are cited. Gen AI outputs don’t generally cite passages from works; rather they provide a synthesis of information ingested from huge datasets. Therefore, direct attribution isn’t always possible. However, links to original source material are increasingly being included in Gen AI outputs as AI companies realise the importance of transparency, factual accuracy and adherence to copyright laws.
Creative Commons plans to develop ‘author preference signalling’ tools that indicate to AI companies how authors would like their works to be used, but these will be statements of preference and won’t be legally enforceable. When these tools are made available, this information will be updated. While the application of a CC BY-NC (non-commercial) licence should prevent commercial use of works, text and data mining exceptions may take precedence in some regions.
Scholarly publishing agreements with AI companies
Some large scholarly publishers have recently signed licensing agreements that allow AI companies to use the works those publishers have published. In some cases, these agreements allow authors to opt out, and they aim to ensure that authors are attributed and paid royalties. AI companies are unlikely to enter into such agreements with OA publishers, as OA content is already freely available and the reuse of OA works is covered by the Creative Commons licence.
Prevention of AI bots
There are some technical solutions that can be installed on websites to signal to AI bots that website owners don’t want their content to be crawled (e.g. robots.txt); however, this signals a preference rather than providing a block. Newer solutions, such as Cloudflare, claim to be able to block AI bots. UCL Press content is mainly hosted on third-party platforms in order to ensure widespread dissemination. These platforms have their own mechanisms in place, which UCL Press doesn’t control. These OA hosting platforms may feel that a complete block is neither necessary nor desirable, and that the content should be open to use under the terms of the licence.
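As an illustration of how a robots.txt preference is read, the following is a minimal sketch in Python, using only the standard library. The robots.txt content, the example URL and the choice of crawler names (GPTBot, CCBot, Googlebot) are assumptions made for the example; they do not describe the configuration of UCL Press or of any hosting platform. The sketch only shows what a well-behaved crawler would conclude from the file. As noted above, a crawler that ignores robots.txt is not blocked by it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (an assumption, not any platform's real file).
# It asks two AI-related crawlers to stay away and leaves all other bots unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt: str, user_agents: list[str], url: str) -> dict[str, bool]:
    """Report, for each user agent, whether the robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


# Hypothetical book URL used purely for demonstration.
permissions = crawler_permissions(
    ROBOTS_TXT,
    ["GPTBot", "CCBot", "Googlebot"],
    "https://example.org/books/sample-title",
)

for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'asked not to crawl'}")
```

To check a live site rather than a sample string, the same parser can be pointed at a real robots.txt address with its set_url() and read() methods.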
FAQs
How can my work be used by AI companies for training LLMs?
Can I prevent my work being used by AI companies for training LLMs?
How can I ensure that my work will be attributed under the terms of the Creative Commons licence assigned to my work?
How can I identify whether my work has been used for training LLMs?
Can UCL Press come to an agreement with the large AI companies to license UCL Press publications and ensure greater control over how they are used?
What does the UK government say about the use of copyright works by AI companies?
What is the legal position on the use of copyright works by AI companies in other regions such as the USA and Europe?
Can UCL Press prevent AI bots from crawling UCL Press publications?
Do the OA platforms where my work is distributed have any technical protections in place, such as robots.txt or Cloudflare?
Last updated 23/09/2025