From Guard Rails to Epic Fails: Can Generative AI Police Its Own Capacity for Offence?

Authors

  • Tony Veale, University College Dublin

DOI:

https://doi.org/10.6092/issn.1974-4382/20544

Keywords:

generative AI, Twitterbots, automation, weaponization, regulation

Abstract

Social media platforms have become the outlets of choice for many provocateurs in the digital age. Not only do they afford egregious behaviours from their human users, but this misbehaviour can also serve to magnify, and even weaponize, the least desirable outputs of the generative AI systems (often called “bots”) that also operate upon them. In this paper we consider the responsibilities that AI system builders bear for the offences caused by their online creations, and explore what they can do to prevent, or mitigate, the worst excesses, whether explicit or implicit. As the term implies, explicit offence is overt and relatively easy to detect and root out, either in the final edit (in what we call “outer regulation”) or from the generative space itself (in what we call “inner regulation”). Conversely, implicit offence is subtle, mischievous and emergent, and is often crafted to bypass a censor’s built-in guardrails and filters. In line with recent developments in the technology of Large Language Models (LLMs), we argue that generative systems must approach the mitigation of offence as a dialogue, both with their own internal monitors and with their users. Here we explore, through worked examples from simple generators, whether LLMs are sufficient to provide AI systems with the moral imagination they need to understand the implicit offences that emerge from superficially innocent uses of words.
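
The distinction the abstract draws between outer and inner regulation can be illustrated with a minimal, hypothetical sketch (the names BLOCKLIST, outer_regulation and inner_regulation are assumptions for illustration, not the paper's own code): outer regulation vets a finished output in the final edit, while inner regulation prunes the generative space before anything is produced.

```python
import re

# Hypothetical block-list of terms a system builder wishes to suppress.
BLOCKLIST = {"offensive_word"}

def outer_regulation(generated_text):
    """Outer regulation: inspect the finished output in the 'final edit'
    and veto it entirely if any blocked term appears."""
    tokens = re.findall(r"[a-z']+", generated_text.lower())
    if any(tok in BLOCKLIST for tok in tokens):
        return None  # the output is suppressed after generation
    return generated_text

def inner_regulation(vocabulary):
    """Inner regulation: remove blocked terms from the lexicon before
    generation begins, so they can never surface in an output."""
    return vocabulary - BLOCKLIST

# Outer regulation vets finished outputs; inner regulation shrinks the
# space of possible outputs in advance.
print(outer_regulation("a perfectly innocent sentence"))         # passes the filter
print(inner_regulation({"innocent", "offensive_word", "word"}))  # pruned lexicon
```

As the abstract notes, word-level filtering of either kind is ill-suited to implicit offence, which emerges from superficially innocent uses of words; this is why the paper argues for treating mitigation as a dialogue with internal monitors and with users.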


Published

2024-10-24

How to cite

Veale, T. (2024). From Guard Rails to Epic Fails: Can Generative AI Police Its Own Capacity for Offence? MediAzioni, 43, A177-A194. https://doi.org/10.6092/issn.1974-4382/20544