Researchers in the UK have found that the safety measures designed to stop AI chatbots from giving illegal, toxic, or explicit responses can be easily bypassed. The AI Safety Institute (AISI) tested five large language models (LLMs) and discovered that all of them were highly susceptible to “jailbreaks” – specially crafted text prompts that trick the AI into producing harmful responses.
The AISI said it was able to evade the defences of every model tested with relatively little effort. For instance, simply prompting a model to begin its reply with phrases like “Sure, I’m happy to help” was enough to elicit harmful responses.
The researchers used their own set of harmful prompts as well as harmful questions drawn from a 2024 academic paper. All of the models tested gave harmful responses when deceived with these questions.
OpenAI, Anthropic, Meta and Google all say they are working hard to prevent harmful outputs from their AI models. Even so, simple tricks can still make chatbots say dangerous things: last year, for example, GPT-4 was found to give instructions for making napalm when asked in a specific way.
The UK government did not name the five models tested but said they are all publicly available. The research also showed that while these models know a great deal about chemistry and biology, planning and executing complex tasks on their own is not their strong suit.
The research was released ahead of an international AI summit in Seoul, where safety and regulation of artificial intelligence technology will be discussed. The AISI also announced that it will open its first overseas office in San Francisco, home base for many tech firms including OpenAI, Meta and Anthropic.
Source: The Guardian