ASCII art elicits harmful responses from 5 major AI chatbots

Researchers have uncovered a novel hacking method targeting AI assistants by employing ASCII art, a technique that dates back to the 1970s. This method, dubbed ArtPrompt, involves using ASCII representations to mask a single word in a user prompt, tricking large language models (LLMs) like GPT-4 into providing responses they are typically programmed to reject, such as instructions for illegal activities.

The study revealed that when ASCII art is used to represent a word related to prohibited content, the AI fails to recognize the word and proceeds to generate a response that would normally be blocked. For instance, ASCII art depicting the word “counterfeit” led an AI to provide detailed steps on creating and distributing counterfeit money.

This vulnerability stems from LLMs prioritizing the recognition of ASCII art over adhering to safety protocols. The findings highlight a broader issue with AI’s understanding of context, as they are trained to interpret text semantically but can be misled by non-standard representations of words.

ArtPrompt represents a type of ‘jailbreak’ attack, which induces AI to perform actions against their alignment, such as engaging in illegal or unethical behavior. This discovery adds to the growing list of prompt injection attacks that exploit AI vulnerabilities, underscoring the need for more robust AI safety measures.
Read more at Ars Technica…

ASCII art elicits harmful responses from 5 major AI chatbots

Related

The Day 7,000 Robot Vacuums Almost Became a Remote-Controlled Army

When Trust Is Breached: What PayPal’s Account Compromise Reveals About Financial Security

How to Erase an AI’s Conscience in 45 Minutes

Qwen3.5-397B-A17B: A Serious Look at Alibaba’s New Open-Weight Giant

gog: One Binary to Rule Your Google Workspace from the Terminal

PicoClaw: A Leaner AI Assistant That Actually Fits on Cheap Hardware

When AI Benchmarks Turn Into Memory Tests

Why Andromeda Is Racing Toward Us While the Rest of the Universe Pulls Away

When the World Becomes a Prompt: How Text in the Environment Can Hijack Embodied AI