# An Effort to Fight Illegal Online Gambling (Judol) Promotions in Indonesia using AI

Amid all the other nonsensical things happening in Indonesia recently, our YouTube comment sections are being flooded with promotions for illegal online gambling, known as Judol (short for *judi online*). From creators with a few hundred views to those with millions of subscribers, if you watch their videos regularly, chances are you'll run into these Judol comments. What makes them distinctive is that the posters clearly know YouTube's moderation system would easily block plain-text keywords, so they write the site names in fancy Unicode characters drawn from blocks such as Mathematical Alphanumeric Symbols, Enclosed Alphanumerics, and Cyrillic. Thanks to ChatGPT for helping me identify these character blocks.

Because of this, traditional keyword matching cannot tell that a word written in fancy Unicode is the same as its plain-text keyword. That's exactly what happens with YouTube's moderation settings: in the creator dashboard, you can only block exact keywords or block the commenters themselves. After reading some creator feedback, it's clear that manually blocking these fancy Unicode words one by one is tedious and time-consuming. In this article, I bring out the big guns, generative AI in the form of a large language model (LLM), to extract these Judol keywords, save creators time blocking them, and shrink the space for Judol promotions across YouTube.
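To make the mismatch concrete, here is a minimal sketch of why an exact-match filter misses these names. It is only an illustration, not the approach this project uses. `String.prototype.normalize("NFKC")` folds Mathematical Alphanumeric Symbols and Enclosed Alphanumerics back to plain letters and digits, but it leaves Cyrillic look-alikes untouched, so the sketch adds a small hand-written map for those. The function name `foldFancyText` and the map are hypothetical, and the map is deliberately incomplete.

```typescript
// Hypothetical, partial map of Cyrillic capitals that look like Latin ones.
// A real list of confusable characters would be far longer.
const CYRILLIC_LOOKALIKES: Record<string, string> = {
  "\u0410": "A", // А
  "\u0412": "B", // В
  "\u0415": "E", // Е
  "\u041A": "K", // К
  "\u041C": "M", // М
  "\u041D": "H", // Н
  "\u041E": "O", // О
  "\u0420": "P", // Р
  "\u0421": "C", // С
  "\u0422": "T", // Т
  "\u0425": "X", // Х
};

function foldFancyText(input: string): string {
  // NFKC replaces compatibility characters (styled math letters and digits,
  // circled letters) with their plain equivalents.
  const compat = input.normalize("NFKC");
  // Cyrillic letters are distinct letters, not compatibility variants,
  // so they have to be mapped by hand.
  const latin = compat.replace(
    /[\u0400-\u04FF]/g,
    (ch) => CYRILLIC_LOOKALIKES[ch] ?? ch,
  );
  return latin.toUpperCase();
}
```

A name that mixes bold math letters with a Cyrillic Е compares unequal to its plain spelling as raw text, and equal only after this kind of folding. Any look-alike missing from the map slips through, which is part of why a fixed rule set struggles here.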

# Dear YouTube Creators

You can visit the website directly at [https://judol-watchdog.mbts.dev](https://judol-watchdog.mbts.dev). There, you can see the Judol keywords I gathered by sampling comments from the [@WilliamJakfar](https://www.youtube.com/@williamjakfar) channel.

As instructed on the website, you can copy all these keywords and paste them into the **Blocked Keywords** section under Settings > Community Moderation.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1742525826708/df0438d9-8115-428b-b45d-4fd0c4eab209.png align="center")

You can also block the users who promote those Judol websites on the same settings page, under the **Hidden Users** section.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1742526027135/20ec6a2f-a15e-4e92-b90a-24be1019e423.png align="center")

Doing this blocks the keywords that have already been used to promote Judol. According to my calculations, there are over 1 billion possible ways to write a Judol website name using the fancy Unicode I mentioned earlier, so no list can cover everything. Still, I hope that this small action, combined with many creators joining forces to block all the known keywords on their channels, can reduce Judol promotions across YouTube.
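As a rough sanity check on that order of magnitude (an illustration with assumed numbers, not the calculation itself): if every character of a name can be swapped independently, the number of spellings is the product of the variants available at each position. The helper name `countSpellings`, the 8-character name length, and the 14 variants per character are all assumptions made for the example.

```typescript
// Number of distinct spellings when each position is chosen independently.
function countSpellings(variantsPerPosition: number[]): number {
  return variantsPerPosition.reduce((total, variants) => total * variants, 1);
}

// Assumption: an 8-character name with 14 look-alike forms per character
// (the plain form plus 13 styled ones).
const assumedSpellings = countSpellings(new Array<number>(8).fill(14));

console.log(assumedSpellings);       // 14^8 = 1475789056
console.log(assumedSpellings > 1e9); // true
```

Even with these modest assumptions the count passes one billion, so blocking known spellings helps but can never be exhaustive.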

# Dear Engineers

I hope you're curious about how I built this website from the ground up. If so, let's break it down. Or jump straight to the code on my GitHub below.

%[https://github.com/IqbalLx/judol-watchdog/] 

Every project I create needs a purpose. Besides being annoyed by these Judol comments, I wanted to learn a new tech stack, and for this project I wanted hands-on experience with [Bun](https://bun.sh/). I aimed to use Bun as natively as possible to test its claim of being batteries-included, with no need for a third-party HTTP server, HTTP client, or SQL connector. And that's exactly what I did: I used only Bun's native functionality. For the client side, I didn't want to use React, so I chose HTMX. I then dockerized everything and deployed it to [Fly.io](https://fly.io).

Now for the LLM part. As a broke college student, I used Meta Llama 3 70B hosted on [Groq](https://groq.com/) for its affordable pricing. To make it even cheaper, I used their [batch processing](https://console.groq.com/docs/batch), which offers a 25% discount. Below is the general flow I use.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1742527412838/d93b34eb-3b3a-40b6-b455-0db0f247aab3.png align="center")

During development, I found that smaller models often produce inaccurate answers when extracting Judol keywords, so I chose Meta Llama 3 70B, which is said to be comparable to GPT-4 on some benchmarks.

I also noticed that when a request contains more than 50 comments, Llama tends to repeat words and hallucinate, so I limit each request to 50 comments. Below is the system prompt I use, with the temperature set to 1 and max tokens set to 1024.

```plaintext
You are an assistant to help reduce illegal online-gamble promotion in youtube comments.
You will be provided an array of youtube comments inside <comment> tag.
You need to extract exact word from given comments that are highly possible to be the online-gambling name.
Do not hallucinate, only response with text within provided comments.

Examples:

<comment>
Buat yang belum coba, kalian harus coba sekarang juga di 𝘼𝘌𝐑𝘖𝟴𝟪!
Gacir banget tiap main di АHMA𝘿𝑇O𝙏𝐎,nggak pernah bikin kecewa!
Nggak salah pilih main di 𝐴𝐆U𝐒𝑇O𝘛О,rezekinya ngalir terus. Top banget!
Gak ada yang tau kapan rezeki datang, tapi di A𝐆𝑈𝑆T𝐎𝘛О,semuanya bisa terjadi!
Hasil gacir bikin aku makin puas main di 𝐀𝙀𝙍𝙊𝟴𝟾,makasih banyak!
АGU𝑆𝑇𝑂𝑇Omenawarkan berbagai fitur yang menarik bagi sebagian pemain.
Main bentar langsung gacir. Rezeki nggak bisa diprediksi di D𝑂𝙍A7𝟩!
𝐦𝐚𝐢𝐧 𝐝𝐢 sini 𝐠𝐚𝐜𝐨𝐫 𝐡𝐚𝐛𝐢𝐬 𝐛𝐚𝐫𝐮 𝐬𝐚𝐣𝐚 𝐦𝐚𝐢𝐧 𝐬𝐮𝐝𝐚𝐡 𝐝𝐢 𝐤𝐚𝐬𝐢 𝐦𝐚𝐱𝐰𝐢𝐧 𝐢 𝐥𝐨𝐯𝐞 𝐲𝐨𝐮 sawer4d 𝐞𝐦𝐦𝐦𝐦𝐮𝐚𝐚𝐚𝐡𝐡.
<comment/>

𝘼𝘌𝐑𝘖𝟴𝟪,АHMA𝘿𝑇O𝙏𝐎,𝐴𝐆U𝐒𝑇O𝘛О,A𝐆𝑈𝑆T𝐎𝘛О,𝐀𝙀𝙍𝙊𝟴𝟾,АGU𝑆𝑇𝑂𝑇O,D𝑂𝙍A7𝟩,sawer4d
```
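The request and response handling around this prompt can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the repository. It splits comments into groups of 50, writes one batch line per group with the temperature and max-token settings mentioned above, and then keeps only returned keywords that literally appear in the submitted comments, as a cheap guard against hallucinated names. The helper names, the `custom_id` scheme, the `</comment>` closing tag, and the JSONL line layout (an OpenAI-compatible shape that Groq's batch endpoint is assumed to accept) are all assumptions; the model id is left as a parameter.

```typescript
const COMMENTS_PER_REQUEST = 50; // per-request limit described above

interface BatchLine {
  custom_id: string; // assumed naming scheme
  method: "POST";
  url: string;
  body: {
    model: string;
    temperature: number;
    max_tokens: number;
    messages: { role: "system" | "user"; content: string }[];
  };
}

// Split the sampled comments into groups of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Build the JSONL batch file: one chat-completion request per group.
function buildBatchFile(
  comments: string[],
  systemPrompt: string,
  model: string,
): string {
  return chunk(comments, COMMENTS_PER_REQUEST)
    .map((group, index): BatchLine => ({
      custom_id: `comments-${index}`,
      method: "POST",
      url: "/v1/chat/completions", // assumed endpoint path
      body: {
        model,
        temperature: 1,
        max_tokens: 1024,
        messages: [
          { role: "system", content: systemPrompt },
          // Assumed wrapper: one comment per line inside a <comment> tag.
          { role: "user", content: `<comment>\n${group.join("\n")}\n</comment>` },
        ],
      },
    }))
    .map((line) => JSON.stringify(line))
    .join("\n");
}

// Parse the comma-separated answer and drop anything not found verbatim
// in the comments that were sent, removing duplicates along the way.
function parseKeywords(modelOutput: string, comments: string[]): string[] {
  const haystack = comments.join("\n");
  const kept = new Set<string>();
  modelOutput.split(",").forEach((raw) => {
    const keyword = raw.trim();
    if (keyword.length > 0 && haystack.includes(keyword)) kept.add(keyword);
  });
  return Array.from(kept);
}
```

The verbatim check matters because the whole point is to collect the exact fancy Unicode spelling: a keyword the model "cleaned up" into plain text would be useless in an exact-match block list, so it is safer to discard it.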

# Closing Statement

I hope this article provides valuable insights for both YouTube creators and software engineers. Please give it a try and share your feedback. For YouTube creators especially: if you notice Judol comments on your channel whose keywords aren't listed on the website, you can email me at iqbal@mbts.dev to have your channel automatically scanned by this app as well. Thank you!
