In an era where artificial intelligence (AI) is becoming increasingly sophisticated, the methods used to train these systems are coming under scrutiny. A recent controversy involving Meta, the tech giant behind the social media platforms Facebook and Instagram, has left Australian authors furious.
The issue at hand is Meta's alleged use of a pirated dataset of books to train its AI without the authors' consent.
At the heart of the matter is LibGen (Library Genesis), an online archive of books that Meta allegedly used for AI training despite internal warnings that the dataset contained ‘pirated’ content.
The Atlantic has published a searchable database that allows authors to check whether their works are part of the LibGen dataset.
The revelation that many Australian authors, including former prime ministers Malcolm Turnbull, Kevin Rudd, Julia Gillard, and John Howard, have had their books included in this dataset has sparked outrage.
Holden Sheppard, author of the popular young adult novel Invisible Boys, which has been adapted into a series, reportedly discovered that two of his books and two short stories were potentially used to train Meta’s AI.
‘I am furious to learn my books have been again pirated and used without my consent to train a generative AI system which is not only unethical and illegal in its current form, but something I am vehemently opposed to,’ he said.
Sheppard’s call to action is clear: Australia needs ‘AI-specific legislation’ to ensure that generative AI developers comply with existing copyright laws.
Journalist and author Tracey Spicer said her books The Good Girl Stripped Bare and Man-Made are also included in the dataset. She said she felt violated on learning that her works had been used without her permission.
She also highlighted the financial struggles authors face, especially in a small market like Australia, and condemned what she called ‘peak technocapitalism.’ Spicer is advocating for a class action in Australia and is urging authors to contact their local federal MPs.
‘It’s a bit rich for big tech to cry poor. These companies can afford to pay for content, or they can create synthetic datasets,’ she said.
Likewise, Alexandra Heller-Nicholas, an award-winning film critic and author, said she found eight of her books on cult movies in the dataset. She described the situation as upsetting, angering and exhausting, given that it is her life’s work being used without her consent.
The Australian Society of Authors has taken to Facebook to rally authors against the unauthorised use of their works. Sophie Cunningham, the society’s chair, said there was widespread ill feeling among authors about how their work had been treated, with massive corporations allegedly profiting at writers’ expense.
Meta has reportedly declined to comment, citing the ongoing litigation, but has previously lobbied for AI training on copyrighted data to be considered fair use. It is facing a lawsuit in the United States, with high-profile authors like Ta-Nehisi Coates and comedian Sarah Silverman among the plaintiffs accusing the company of copyright infringement.
Meanwhile, the debate over AI and copyright continues to evolve, with some AI companies, such as OpenAI, entering into agreements with publishers to use their content.
This controversy raises important questions about the ethics of AI development, the protection of intellectual property, and the rights of creators in the digital age. Australian authors are now at the forefront of this battle, demanding respect for their work and a voice in how it is used.
What are your thoughts on this issue? Have you ever considered how the books you love might be used to train AI without the authors’ consent? Share your opinions with the YourLifeChoices community in the comments below.