How to train your AI Dragon

The rapid increase in AI tools that can create works of art, prose, music and film has raised concern in the creative industries, and a number of lawsuits are being brought by artists who claim that AI companies have infringed their copyright by training their image generators to produce derivative works. Whilst open-source databases are made available for training AI, AI systems can also be trained on data that exists online.

Training an AI system involves feeding data into the AI algorithm. Text and data mining (“TDM”) is used for training AI systems. The datasets or databases that are mined may be protected by copyright or by database right.

Some rights holders license their works to allow TDM, but others do not. This has financial costs for businesses using data mining software to train their AI. 

Although factual data, trends and concepts are not protected by copyright, they are often embedded in copyright works. In a similar way, data and information in databases may be protected by database right.

Data mining systems often copy works to extract and analyse the data they contain. They may also use the information that they have gathered as a basis to create new works. Unless permitted under licence or an exception, making such copies or derivative works will constitute copyright infringement.  

Such systems may also extract or re-use images, data or other information contained within a database. Extracting or re-using all or a substantial part of the contents of the database will be an infringement of database right unless the activity is done with the owner’s permission or an exception applies.

There are exceptions to copyright for research and private study and a specific exception to copyright which allows TDM for research for non-commercial purposes.  

Under UK database laws, extraction or re-use of insubstantial parts of a publicly available database is permitted where it is for non-commercial research.

However, as many AI applications are trained for commercial purposes, these exceptions cannot be relied upon. This can significantly increase the cost and effort of training AI systems.

In July 2022 the UK Government consulted on AI and IP and, following the consultation, decided to introduce a new copyright and database exception which would allow TDM for any purpose. The TDM exception was proposed with no right to opt out of it and without charge, although protecting content would be allowed and rights owners could charge for access to their platform. The UK Government saw this exception as being the most supportive of AI and wider innovation.

Concerns were raised by the music industry and other creative sectors that a blanket exemption from intellectual property rights (IPRs) for work to be used in AI TDM would allow AI works to be created using content protected by IPRs that AI developers do not own and without compensation to the artists and rights holders who had created or invested in it. In a letter that UK Music sent to the Department for Digital, Culture, Media and Sport minister in July 2022 UK Music said about the proposed TDM exception:

“Our innovation has value. Our creative IP is not raw material for others to freely monetise, yet this is what these proposals open the door to.”

In January 2023, after taking evidence from a range of witnesses in the creative industries, the Lords Communications and Digital Committee said:

“The Intellectual Property Office’s proposed changes to intellectual property law are misguided. They take insufficient account of the potential harm to the creative industries.”

As a result of these concerns, in a dramatic U-turn earlier this month, the Minister for Science, Research and Innovation stated that the proposed general TDM exception to copyright and database right would not be proceeding.

Whilst this may come as a disappointment to SMEs wishing to train their AI on data and information available online, it is likely to come as a relief to rights owners who are concerned at the speed at which AI applications can scrape their music, literary works and digital images online and use them to create new works.

For now, existing copyright and database laws remain in place for TDM, and SMEs that are developing AI applications should ensure that they have appropriate rights to use data for training their AI.