
AI tool to detect hate speech in Southeast Asian languages


Social media and the internet have grown rapidly over the years, allowing people to share all kinds of content, including harmful speech. One serious issue is hate speech—language that attacks or threatens people based on their ethnicity, religion, sexual orientation, and other characteristics.

Hate speech detection models are computer systems designed to find and flag hate speech online. “These models are essential for managing online content and stopping harmful speech from spreading, especially on social media,” said Assistant Professor Roy Lee from the Singapore University of Technology and Design (SUTD). However, traditional evaluation methods often fail because of biases in the datasets used for testing these models.

To solve this problem, tools like HateCheck and Multilingual HateCheck (MHC) were developed to create realistic tests that better capture the variety and complexity of hate speech. Building on these frameworks, Asst Prof Lee and his team created SGHateCheck—an AI-powered tool for evaluating how well hate speech detection models handle the languages and cultural contexts specific to Singapore and Southeast Asia.

Most current hate speech detection tools are based on Western contexts, which don’t always apply to Southeast Asia’s unique social dynamics. SGHateCheck addresses this by providing tests specifically designed for the region’s needs, making hate speech detection more accurate and culturally sensitive.

SGHateCheck uses large language models (LLMs) to translate and adapt test cases into Singapore's four official languages: English, Mandarin, Tamil, and Malay. These test cases are then refined by native speakers to ensure cultural relevance. The result is over 11,000 detailed test cases that help evaluate hate speech detection models more precisely.
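To illustrate how such a test suite is used, here is a minimal sketch of HateCheck-style functional testing. The test cases, labels, and the `always_hateful` baseline below are illustrative placeholders, not actual SGHateCheck data or results.

```python
from collections import defaultdict

# Illustrative test cases (placeholder texts, not real SGHateCheck data).
# Each case pairs a text with a gold label and the functionality it probes.
test_cases = [
    {"text": "example hateful slur case", "label": "hateful", "functionality": "slurs"},
    {"text": "example reclaimed slur case", "label": "non-hateful", "functionality": "slurs"},
    {"text": "example negated hate case", "label": "non-hateful", "functionality": "negation"},
]

def evaluate(predict, cases):
    """Per-functionality accuracy, the core metric in HateCheck-style suites."""
    correct, total = defaultdict(int), defaultdict(int)
    for case in cases:
        total[case["functionality"]] += 1
        correct[case["functionality"]] += predict(case["text"]) == case["label"]
    return {f: correct[f] / total[f] for f in total}

# A naive baseline that flags everything as hateful: it catches the hateful
# slur case but fails every non-hateful case, exposing over-flagging.
always_hateful = lambda text: "hateful"
print(evaluate(always_hateful, test_cases))  # → {'slurs': 0.5, 'negation': 0.0}
```

Breaking accuracy down by functionality, rather than reporting one overall score, is what lets this kind of suite pinpoint exactly which categories of hate speech a model mishandles.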

While MHC covers many languages, it lacks the regional focus that SGHateCheck offers. This regional focus allows SGHateCheck to more effectively identify hate speech that broader models might miss.

The team also found that LLMs trained on monolingual datasets tend to classify content as non-hateful more often than they should. On the other hand, LLMs trained on multilingual datasets perform better because they understand a wider range of language expressions and cultural contexts. This highlights the importance of using diverse and multilingual data for AI tools in regions with many languages.
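The under-flagging tendency described above can be quantified as a false-negative rate on gold-hateful test cases. The sketch below shows one way to compute it; the prediction lists are made-up illustrations, not results from the SGHateCheck study.

```python
def false_negative_rate(gold, pred):
    """Fraction of gold-hateful cases a model misses (labels non-hateful)."""
    hateful = [(g, p) for g, p in zip(gold, pred) if g == "hateful"]
    misses = sum(1 for _, p in hateful if p == "non-hateful")
    return misses / len(hateful)

# Hypothetical labels for five test cases (three of them hateful).
gold = ["hateful", "hateful", "non-hateful", "hateful", "non-hateful"]
model_a = ["non-hateful", "hateful", "non-hateful", "non-hateful", "non-hateful"]
model_b = ["hateful", "hateful", "non-hateful", "hateful", "non-hateful"]

print(false_negative_rate(gold, model_a))  # misses 2 of 3 hateful cases
print(false_negative_rate(gold, model_b))  # misses none
```

A consistently higher false-negative rate for one model, as in this toy comparison, is the kind of signal the team reports for monolingually trained LLMs.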

SGHateCheck was created to tackle a real issue in Southeast Asia and aims to make online spaces more respectful and inclusive. It will be useful on social media, online forums, community platforms, and news websites. The team also plans to expand SGHateCheck to include other Southeast Asian languages like Thai and Vietnamese.

SGHateCheck shows how combining advanced technology with thoughtful design can solve real-world problems. By focusing on creating a tool that is both technologically advanced and culturally aware, this project emphasizes the importance of a human-centered approach in developing new technologies.

Source: Asia Research News