Generative AI and the Law

“Disgorgement” is a legal concept in which a court orders the return of any benefits gained from wrongful conduct. It is a form of equitable relief used to prevent unjust enrichment, restore the status quo, and deter further wrongdoing. It is akin to the “fruit of the poisonous tree” doctrine, under which evidence obtained through an illegal search may not be used in legal proceedings; that doctrine is intended to deter illegal searches and seizures by law enforcement.

Now, our friends over at the Federal Trade Commission are using algorithmic disgorgement to keep AI practitioners honest.

Algorithmic disgorgement holds that any algorithms built using illegally obtained data must be destroyed. This raises a question: if a system is trained on copyrighted material, is that a copyright violation, whether or not the system’s output is identical to the copyrighted material?

The counterargument, of course, is that generative AI systems are trained much as humans are, on lots of examples, and we don’t examine the source material humans were trained on when deciding whether their output violates copyright.

Algorithmic disgorgement implies that a record of all training data for a particular algorithm must be preserved: because it is essentially impossible to analyze a trained system and reconstruct its original training data, the record is the only way to show what went in. If at any time that material is found to be in violation, any algorithms derived from it must be destroyed. Presumably, failing to preserve an accurate record of the training data would be a form of fraud, although that would be difficult to prove.
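What might such a record look like in practice? Here is a minimal sketch, not a prescribed mechanism: a manifest that fingerprints each training file and ties it to a model identifier. The function names (fingerprint, record_provenance) and the JSONL manifest format are illustrative assumptions, not anything a regulator has specified.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Content hash of one training file, so later edits are detectable."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_provenance(model_id: str, data_dir: str, manifest_path: str) -> None:
    """Append one record linking a trained model to its exact inputs."""
    entry = {
        "model_id": model_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "sources": {
            str(p): fingerprint(p)
            for p in sorted(Path(data_dir).rglob("*"))
            if p.is_file()
        },
    }
    with open(manifest_path, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Hypothetical usage, run once per training job:
# record_provenance("recommender-v3", "data/train", "provenance.jsonl")
```

Appending a record at training time is cheap; reconstructing the same information after a complaint arrives is, as noted above, essentially impossible.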

Discovering the provenance of training data is difficult because of “data entanglement”: combining data sources to create a single, more complex dataset. Entanglement can improve the accuracy of deep learning models, but it makes maintaining a clear chain of custody difficult, if not impossible. It is akin to mixing different vintages of wine into a single vat: once the mixing has occurred, retrieving the separate vintages is impossible.
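One partial defense, sketched below, is to tag every record with its source before the datasets are merged. The field names and source labels here are invented for illustration; the point is only that an untagged merge is the irreversible vat, while a tagged merge keeps the vintages separable.

```python
# Two hypothetical data sources about to be combined into one dataset.
licensed = [{"text": "example A"}, {"text": "example B"}]
scraped = [{"text": "example C"}]

def tag(records, source):
    """Attach an origin label to every record before it enters the vat."""
    return [{**r, "source": source} for r in records]

combined = tag(licensed, "licensed-corpus") + tag(scraped, "web-scrape")

# If the scraped source is later found to be tainted, the affected
# records can still be identified and excluded from retraining:
clean = [r for r in combined if r["source"] != "web-scrape"]
```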

Some Examples

The Federal Trade Commission (FTC) and Department of Justice (DOJ) recently announced a settlement agreement with WW International after charging the company with violating the Children’s Online Privacy Protection Act by improperly collecting health information from children. The complaint claims that WW’s social media app Kurbo encouraged children to falsely claim they were over the age of thirteen, and that it failed to provide safeguards to ensure that those who chose the parent signup option were actually parents. The company must delete any data collected from children and destroy any algorithms derived from that data.

In another example, the FTC recently settled with Everalbum, Inc., the developer of a now-defunct photo storage service, over its retention of photos and videos from deactivated users without proper consent. Everalbum had to destroy all face embeddings, models, and algorithms derived from users’ biometric information.

Algorithmic systems used in AI and machine learning can involve large models driven by complex logic expressed in code. Companies often do not track their AI projects for effectiveness and bias. Given the complexity of these systems, decoupling the data from them is not always straightforward, and the data used to train them hardly ever sits in one place. Lawmakers should consider the implications of algorithmic disgorgement in this context. How can organizations implement algorithmic disgorgement if they lack full knowledge of their data assets?
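To make the question concrete: if a provenance manifest like the one sketched earlier exists, identifying what must be disgorged becomes a query rather than a forensic exercise. The JSONL format below is the same hypothetical one used above, not an established standard.

```python
import json

def models_trained_on(manifest_path: str, tainted_hash: str) -> set[str]:
    """Return every model whose recorded inputs include the tainted file."""
    flagged = set()
    with open(manifest_path) as f:
        for line in f:
            entry = json.loads(line)
            if tainted_hash in entry["sources"].values():
                flagged.add(entry["model_id"])
    return flagged

# Every model in the returned set would be a candidate for destruction
# under a disgorgement order covering that data source.
```

Without records of this kind, an organization facing a disgorgement order has no principled way to decide which models are tainted, and may have to destroy far more than the order strictly requires.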
