On April 11, 2023, the Cyberspace Administration of China released draft measures on “Generative AI Services”, which are now open for public comment through May 10.
The draft, an apparent response to the rapid rise of new AI tools such as ChatGPT and Dall-E, builds on earlier provisions regulating “Deep Synthesis Internet Information Services”, which were jointly released by the CAC, the Ministry of Public Security, and the Ministry of Industry and Information Technology, and took effect in January 2023.
The scope of the earlier document differs slightly from that of the new draft: it applies to all machine content-generation services, but only those provided through the Internet, while the new draft could apply online and off. Generative AI services also arguably include only a subset of the content-generation tools covered by the Deep Synthesis provisions. Still, it is fair to assume that both documents will apply to many tools for computer generation of text, images, audio, video, and other media, and it’s unsurprising that their content is overlapping and complementary in many areas.
The Draft Generative AI rules
The stated goal of the draft rules is to support the healthy development and regulated application of generative AI tools. This includes encouraging independent innovation, increased popularization, and international cooperation on basic technologies.
The draft highlights several issues, initially laid out in article 4, that are familiar from the global debate surrounding artificial intelligence:
- Content Controls / Censorship
- Preventing Discrimination
- Protection of Intellectual Property Rights
- Curbing Misinformation
- Privacy and Data Protection
Each of these core concerns is considered in more detail below, presented in the order in which they are listed. An additional list of service provider obligations and liability follows the discussions, and this can serve as an overall outline of the rules’ content as well.
China’s strict censorship regime is infamous and is bound to be an immediate focus of overseas audiences looking to see how China regulates new information technologies. The draft, however, largely attempts to apply existing regulations to enforce content restrictions.
Article 4(1) addresses the general content requirements for uses of generative AI:
“(1) Content generated using generative AI shall embody the Core Socialist Values and must not incite subversion of national sovereignty or the overturn of the socialist system, incite separatism, undermine national unity, advocate terrorism or extremism, propagate ethnic hatred and ethnic discrimination, or have information that is violent, obscene, or fake, as well as content that might disrupt the economic or social order.”
First, the “Core Socialist Values” refers to a propaganda campaign focusing on civic duties and morality that now permeates Chinese law and society. Its twelve ‘values’ are broken into 3 groups: the national goals of prosperity, democracy, civility, and harmony; the societal goals of freedom, equality, justice and rule of law; and the individual goals of patriotism, dedication, integrity, and friendship.
References to the Core Socialist Values are already commonplace in Chinese legal authority, with documents in diverse areas regularly including at least a passing reference to the guidance, practice, promotion, or embodiment of the Core Socialist Values. While the frequency with which the campaign is cited shows its importance to party-state ideological rhetoric, the individual ‘values’ are vague enough that their inclusion rarely adds enforceable substance. That is particularly true here considering that there are much clearer censorship requirements. Put another way, understanding the Core Socialist Values might help explain why certain content is regulated, but they probably shouldn’t be viewed as an additional restriction in and of themselves.
The remainder of Article 4(1)’s language on more specific prohibitions is taken from article 12 of the Cybersecurity Law, concerning prohibited online conduct. The prohibitions also closely track the most recent general online content regulations’ list of “illegal content” that online platforms, users, and content producers are prohibited from producing, publishing, or disseminating online.
It’s worth noting that the current online content regulations also include a separate list of ‘negative’ online content that is discouraged and restricted if not strictly ‘illegal’, but the draft generative AI rules make no mention of this ‘negative content’. The earlier Deep Synthesis rules, which expressly apply to online content generation, however, impose equal requirements for service providers to identify and report both illegal and negative content.
Finally, as will be discussed in more detail in the section on Curbing Misinformation, restrictions on ‘false’ or ‘fake’ content amount to a different type of content limitation, outside the normal censorship regime.
In order to address algorithmic bias, the draft requires that active measures be taken to prevent discrimination based on traits such as race, ethnicity, country, faith, sex, age, or occupation during the selection of training data, model generation and fine-tuning, and provision of services. Such factors must also not result in discriminatory content generation based on users’ characteristics.
While these principles are admirable, putting them into practice will be significantly harder. The draft does not explore the standards for identifying discriminatory outputs but does emphasize transparency and user feedback as mechanisms to prevent them. The authorities are empowered to request that service providers describe key factors used in their generative tools, including information on training data sources, rules for manual data tagging, and foundational algorithms and systems. The service providers themselves are required to review and address user complaints and to ensure the diversity and objectivity of all training data. Descriptions of the algorithms used in providing the services must also be filed in accordance with China’s general provisions on algorithmic recommendation services.
The works created by generative AI tools are inherently based on an analysis of training data, and artists around the world have already begun to fear that their intellectual property is being infringed when AI is trained using their online works.
The draft rules hold that providers are responsible for ensuring that their training data does not include content that infringes on copyright. At first glance, this appears to be a firm ban on using any copyrighted materials in training data without authorization from the rights holder, but the circular phrasing leaves many questions. By defining the rule in terms of content that ‘violates’ copyright, the draft fails to explain what uses of training material by AI tools are considered an infringement. A work is necessarily reproduced when it is included in a database, and this may be a copyright violation, but it may also be considered to fall within one of the ‘fair use’ exceptions in China’s Copyright Law, such as for using others’ published works for personal research.
The draft also holds that intellectual property rights must not be violated in the provision of generative AI services, presumably meaning that the generated content must also not violate IP rights. This is a distinct but related question to whether training datasets violate IP protections. Consideration of the extent to which a generated work is derivative of an original work, or whether it is being used to directly compete with the original work, may ultimately become the deciding factors in determining both whether the generated work violates copyright and whether the use of copyrighted training data was permissible. In their current form, however, the draft rules leave many questions unanswered.
The draft is further concerned about the use of generative tools to gain an unfair business advantage. This issue was touched on in November 2022 draft revisions to the Unfair Competition Law emphasizing the use of technology in unfair competition, as well as in the provisions on algorithmic recommendation services that took effect in March 2022. These include language prohibiting business operators from using data, algorithms, and technology to disrupt competition such as by using technology to influence consumer choices or using a position of relative technical advantage to coerce partners or game e-commerce platforms’ user reviews, ratings, and product displays. The related content in the draft generative AI rules is not detailed, but these earlier documents provide some indication of the aggressive marketing tactics and monopolistic uses of technology that the draft is touching on.
The draft rules baldly state that the content created by generative AI must be truthful and accurate. Service providers are required to take measures to prevent the generation of ‘fake’ content and to ensure that all training data is true and accurate. The motivation here is clearly to avoid fraud and deception, but focusing on the content, rather than its use and impact, is the wrong tactic.
Consider these images of unicorns. The first, from a fresco by Domenichino c. 1604 (via Wikipedia) is ‘fake’ in that its content is fictional, but it is ‘true and accurate’ as a representation of a detail from the historical work. Can it be used as training data? The next two are both AI-generated ‘photo-realistic’ unicorns (via Dall-E), and are also fake because they are computer-generated and unicorns aren’t real. Neither creates problems of misinformation, however, until someone makes claims about what they show. It’s only when someone claims, for example, that the final image was taken by their trail-cam one night, that we start to touch on ‘deception’ (albeit likely harmless in this example).
The earlier provisions on deep synthesis services provided a more focused approach, specifically targeting false rumors and news information. China already has strict regulations on who can create even ‘true’ online news, including any investigation and reporting on “politics, economics, military affairs, and foreign affairs, as well as reporting and commentary on relevant societal emergency incidents”, and these effectively prohibit the use of AI to generate fake stories. China also has harsh penalties (including jail time) for spreading rumors that one knows or should know to be false (recently emphasized during the pandemic).
In prohibiting the generation of fake content, both the Deep Synthesis and Generative AI rules seek to stop the problem at the source rather than wait for an offense to occur, and they place this responsibility on the service provider. In addition to requirements of identifying and reporting prohibited content, the Deep Synthesis rules also require service providers to actively dispel rumors that are furthered using their content generation services. The generative AI rules hold that service providers are responsible for content created using their products as if they were the producers, which may even include the criminal liability for fake news and rumor mongering mentioned above.
It’s easy to see how the unclear standards for permissible content and harsh penalties could lead service providers to over-censor, or hobble their products, as they try to avoid liability. Unfortunately, this type of chilling effect is common in Chinese speech regulation.
In a more practical measure aimed directly at stopping misinformation, the Deep Synthesis provisions have two types of labeling requirement for generated content, and these are expressly incorporated in the generative AI draft:
- The first is technical labeling, requiring a potentially invisible tag, such as in code or file metadata, that indicates that the content was created or edited using a machine content-generation service. These labels should not impact users’ applications of the tools (i.e. the output), but are to be logged and stored should problems with the content be discovered later.
- The second is conspicuous, visible labeling of content that is likely to mislead or confuse the public. Listed examples include smart text generation or editing that simulates a human, speech generation or manipulation, face generation or alteration, the creation of interactive realistic scenes and environments, etc.
Removing or concealing these labels is prohibited.
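The two labeling duties described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a method taken from either set of rules: the function names, the record fields, the notice wording, and the `demo-service` name are all hypothetical, and the technical label is kept in a simple in-memory log rather than embedded in file metadata.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical wording; the rules do not prescribe this text.
VISIBLE_NOTICE = "[AI-generated content]"


def technical_label(content: str, service: str) -> dict:
    """Build a machine-readable record marking content as machine-generated."""
    return {
        "generated_by_machine": True,
        "service": service,  # hypothetical service identifier
        "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def label_output(content: str, service: str, may_mislead: bool, log: list) -> str:
    """Log a technical label for every output; add a visible notice only if needed."""
    # The technical label is stored separately, so the output itself is unchanged.
    log.append(technical_label(content, service))
    if may_mislead:
        # Conspicuous labeling for content likely to mislead or confuse the public.
        return f"{VISIBLE_NOTICE}\n{content}"
    return content


audit_log = []
plain = label_output("A poem about spring.", "demo-service", False, audit_log)
marked = label_output("Hello, this is your bank calling.", "demo-service", True, audit_log)

print(plain)
print(marked)
print(json.dumps(audit_log[0], indent=2))
```

In this sketch every output gets a logged record, while only content flagged as potentially misleading carries the visible notice. How a provider would decide that flag is exactly the kind of standard the rules leave open.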
All users of generative AI services are required to verify their identities in accordance with the Cybersecurity Law. This type of “real name system” is standard in China, effectively prohibiting anonymity by linking online activity to the verified ID. This allows platforms and the authorities to more easily identify those responsible for misconduct, and is also used to identify minors, whose access to online content is further restricted. This identity information is generally not available to other users, and accounts may usually adopt public-facing aliases or display names of their own creation.
Protection of personal information
China’s Personal Information Protection Law (PIPL) creates a framework for the protection of personal information, meaning information from which individuals can be identified. At the risk of oversimplifying, the law emphasizes the requirement of informed consent in the collection, use, and processing of personal information, and follows a ‘minimum necessity’ principle to limit the scope of data involved.
Informed consent requires letting the information subject know what data will be collected, how it will be used, who will use it, and for what purposes it will be used before they give consent. The minimum necessity principle limits the collection and use of personal information to that needed to achieve the collectors’ primary functions and goals, with additional requirements applying to any further information.
The draft rules provide that where generative AI services involve personal information, the service providers are considered ‘personal information handlers’ under the PIPL and related laws. Their obligations include ensuring that consent is given for the use of personal information in any training data.
User Input Data
Generative AI service providers bear a duty to protect information on users’ use of the service, including the data they input. The input information must not be used by the service providers to create user profiles or be provided to other parties. If users’ identities might be deduced from their input information, the information must not be improperly retained.
This does not mean that users are guaranteed absolute privacy as to the data they input. Under article 10 of the Deep Synthesis provisions, service providers are required to monitor user input for illegal and negative information. Where such information is found, they must take actions ranging from warnings to termination of services to prevent it, in addition to reporting it to the relevant authorities.
Service Provider Obligations
- Security Assessment to be conducted before making services publicly available in accordance with Provisions on the Security Assessment of Internet Information Services that have Public Opinion Properties or the Capacity for Social Mobilization (Art. 6)
- Filing Algorithms in accordance with Provisions on the Management of Algorithmic Recommendations in Internet Information Services (Art. 6)
- Ensuring the sources of training data are lawful (Art. 7)
- Developing detailed rules for manual data tagging (Art. 8)
- Implementing Real-name verification system (Art. 9)
- Specifying intended users and uses (Art. 10)
- Taking measures to prevent over-reliance and addiction (Art. 10) and guide proper use (Art. 18)
- Protecting information entered by users (Art. 11)
- Non-discriminatory output (Art. 12)
- Accept user complaints and correct information that infringes user rights (Art. 13)
- Provide safe and stable services (Art. 14)
- Prevent illegal content through screening and retraining of the model (Art. 15)
- Label generated content (Art. 16)
- Suspend services to stop user violations (Art. 19)
Service Provider Liability
- Responsible for generated content as if a producer of the content (Art. 5)
- Responsible for handling of personal information (Art. 5)
- Punishment in accordance with other law, or where laws are silent, penalties of warnings, public criticism, ordering corrections, stopping services, or fines of between 10,000 and 100,000 RMB