Researchers in the UK have found that the safety measures designed to stop AI chatbots from giving illegal, toxic, or explicit responses can be easily bypassed. The AI Safety Institute (AISI) tested five large language models (LLMs) and discovered that all of them were highly susceptible to “jailbreaks” – specially crafted text prompts that trick the AI into producing harmful responses.
The AISI said it was able to evade the defences of every model tested with relatively little effort. For instance, simply prompting a model to begin its reply with phrases like “Sure, I’m happy to help” was enough to elicit harmful responses.
The researchers used their own set of harmful prompts as well as harmful questions drawn from a 2024 academic paper. All of the models tested gave harmful responses when deceived with these questions.
OpenAI, Anthropic, Meta and Google all say they are working hard to prevent harmful outputs from their AI models. Even so, simple tricks can still make chatbots say dangerous things: last year, for example, GPT-4 was found to give instructions for making napalm when asked in a specific way.
The UK government did not name the five models tested but said they are all publicly available. The research also showed that while these models know a great deal about chemistry and biology, planning and executing complex tasks on their own is not their strong suit.
The research was released ahead of an international AI summit in Seoul, where safety and regulation of artificial intelligence technology will be discussed. The AISI also announced that it will open its first overseas office in San Francisco, home base for many tech firms including OpenAI, Meta and Anthropic.
Source: The Guardian