
Risks of AI training datasets

Nearly 80% of Training Datasets May Be a Legal Hazard for Enterprise AI
A recent paper from LG AI Research suggests that supposedly ‘open’ datasets used for training AI models may offer a false sense of security, finding that nearly four out of five AI datasets labeled as ‘commercially usable’ actually contain hidden legal risks. Such risks range from the inclusion of undisclosed copyrighted material to…

Monetizing Research for AI Training: The Risks and Best Practices
As demand for generative AI grows, so does the hunger for high-quality data to train these systems. Scholarly publishers have started to monetize their research content as training data for large language models (LLMs). While this development creates a new revenue stream for publishers and supports the use of generative AI for scientific discovery, it…