OpenAI has launched CriticGPT, a new AI model based on GPT-4 that helps find errors in code written by ChatGPT. In tests, reviewers assisted by CriticGPT outperformed unassisted reviewers 60% of the time.
CriticGPT will be integrated into OpenAI's Reinforcement Learning from Human Feedback (RLHF) pipeline, giving AI trainers better tools for judging complex AI outputs.
The GPT-4 models that power ChatGPT are tuned to be helpful and interactive through RLHF: AI trainers compare different responses and rate their quality. As ChatGPT's reasoning improves, its mistakes become subtler and harder for trainers to spot. This exposes a key limitation of RLHF: advanced models can eventually outpace the humans who are supposed to give them useful feedback.
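To make the comparison step concrete, here is a minimal, illustrative Python sketch (not OpenAI's code) of how pairwise preference labels can train a reward model. The records, the toy `reward` function, and the loss wiring are invented stand-ins for illustration only.

```python
import math

# Hypothetical pairwise-preference records: the kind of data RLHF
# collects when a trainer sees two candidate responses and picks one.
comparisons = [
    {
        "prompt": "Sort a list in Python",
        "chosen": "Use sorted(xs); it returns a new sorted list.",
        "rejected": "Loop over the list and swap items by hand.",
    },
]

def reward(text: str) -> float:
    """Stand-in scorer; in real RLHF this is a trained reward model."""
    return 0.05 * len(text)  # toy heuristic, illustration only

def pairwise_loss(chosen: str, rejected: str) -> float:
    # Bradley-Terry style objective commonly used for reward modeling:
    # drive reward(chosen) above reward(rejected).
    margin = reward(chosen) - reward(rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

for c in comparisons:
    print(f"{c['prompt']}: loss = {pairwise_loss(c['chosen'], c['rejected']):.3f}")
```

Pairwise labels are typically used because trainers are more consistent at choosing between two answers than at assigning absolute scores, which is exactly the step that gets harder as model mistakes become subtler.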
CriticGPT has been trained to write critiques that point out mistakes in ChatGPT's answers. Its suggestions aren't always correct, but they help trainers find more issues than they do alone. In tests, Human+CriticGPT teams wrote more comprehensive critiques and flagged fewer false positives than unassisted reviewers, and a second trainer preferred the team's critiques more than 60% of the time.
CriticGPT was trained much like ChatGPT, but with a focus on spotting mistakes: AI trainers inserted bugs into code that ChatGPT had written and then wrote example feedback as if they had just found the bug. The trainers then compared several critiques of the tampered code to judge CriticGPT's performance. CriticGPT's critiques were preferred in 63% of cases involving real, naturally occurring bugs, partly because it produced fewer unhelpful nitpicks and hallucinated fewer problems.
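As a rough illustration of what one such "tampered code" record might look like, here is a hypothetical sketch; the field names, the example bug, and the selection step are all invented for this sketch, since OpenAI has not published its data format, and in practice a human trainer (not a string match) picks the preferred critique.

```python
from dataclasses import dataclass

@dataclass
class TamperingExample:
    """One illustrative training record, following the article's
    description: a trainer inserts a bug into model-written code,
    then writes reference feedback as if they had just caught it."""
    original_code: str
    tampered_code: str
    reference_critique: str

example = TamperingExample(
    original_code="def mean(xs):\n    return sum(xs) / len(xs)\n",
    tampered_code="def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n",
    reference_critique=(
        "The function divides by len(xs) - 1, so it no longer computes "
        "the mean, and it raises ZeroDivisionError for one-element lists."
    ),
)

# Trainers then compare several candidate critiques of the tampered code;
# the preference labels (which critique best matches the known bug) are
# what score the critic model, much as in standard RLHF.
candidates = [
    "The variable name xs could be more descriptive.",        # nitpick
    "Dividing by len(xs) - 1 breaks the mean calculation.",   # catches the bug
]
preferred = max(candidates, key=lambda c: "len(xs) - 1" in c)
print(f"Preferred critique: {preferred}")
```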
Despite these results, CriticGPT has limitations. It was trained on short ChatGPT answers and will need further work to handle longer, more complex tasks. Models still hallucinate, and trainers sometimes make labeling mistakes after seeing those hallucinations. CriticGPT also focuses on errors that can be pointed to in a single place, while real-world mistakes can be spread across many parts of an answer.
Source: Business Today