NVIDIA has been accused of using copyrighted content without permission to train its AI models. According to 404 Media, the tech giant asked employees to download videos from YouTube, Netflix and other platforms for use in AI projects, including the Omniverse 3D world generator, self-driving car systems and digital human products.
The practice stands out in an industry-wide AI race that has already raised ethical concerns. In an email to Engadget, NVIDIA said its practices fully comply with copyright law. According to the company, copyright protects particular expressions, but not elements such as ideas, facts and data. It therefore argues that it is legal to learn and use information and ideas from other sources.
YouTube disagrees. YouTube spokesperson Jack Malon told Engadget that using YouTube content to train AI models is a clear violation of the platform’s terms, pointing to an April statement by YouTube CEO Neal Mohan.
Some NVIDIA employees raised concerns that the practice could create ethical and legal problems. Managers reportedly responded that it had been approved at the highest levels of the company. Ming-Yu Liu, NVIDIA’s vice president of research, reportedly said the decision had been made by management and that there was blanket approval for all the data. Other employees described the data collection as a legal question to be dealt with later.
NVIDIA’s approach is reminiscent of the “move fast and break things” philosophy of Facebook (now Meta), an ethos that once contributed to privacy breaches affecting many users.
NVIDIA reportedly instructed employees to train on YouTube and Netflix videos, as well as the MovieNet movie trailer database, internal libraries of video game footage, and the GitHub-hosted video datasets WebVid (now defunct) and InternVid-10M. InternVid-10M consists of 10 million YouTube video IDs.
Some of the data NVIDIA used is allegedly licensed for academic or non-commercial use only. For example, the HD-VG-130M dataset contains 130 million YouTube videos and is licensed for academic research only. NVIDIA nevertheless reportedly argued that this data could be used for commercial AI products, disregarding those academic-only terms.
NVIDIA’s copyright policies and industry backlash
NVIDIA allegedly used virtual machines (VMs) and rotating IP addresses to download content from YouTube without being blocked. When one employee suggested a third-party IP rotation tool, another NVIDIA employee explained that restarting their Amazon Web Services (AWS) VM instances assigns them a new public IP address, and that they had not run into any problems so far.
404 Media’s full report covers NVIDIA’s practices in more detail and is worth reading. It highlights the legal and ethical questions raised by NVIDIA’s AI training methods. The copyright controversy in particular brings to the forefront how far tech giants are willing to go in the AI race.