A few days ago I was taking part in a Bluesky thread, and a post by
Yves Venedey reminded me
of a project idea I had wanted to try: fact checking on bsky! You can extend
the atproto ecosystem with a labeler, which publishes labels and makes them
available to everyone who wants to see them.
Fast forward: today I am publishing atproto-fact-labeler, and a live instance is running at @facts.kiesel.app as a BETA for you to try!
All it takes is pressing "subscribe".

Similar to TopicWatchdog (which
tried to extract claims from public YouTube videos), I do this in my spare time as part of my
personal learning journey, exploring machine learning and federated social networks.
In atproto, bluesky, claimreview, embeddings, fact-checking, labeler, llm, open-source, research by DracoBlue @ 2026-06-26 | 475 Words
Currently, I run Rancher Desktop locally for my Docker CLI needs. But I've found a viable
alternative that I wanted to try.
It's called container, and Apple released it at https://github.com/apple/container for
Apple Silicon Mac devices.
In apple, container, docker, rancher-desktop, silicon, socktainer by DracoBlue @ 2026-03-24 | 337 Words
In Week 1 (extraction), Week 2 (embeddings + KMeans), and Week 3 (stable topics with BERTopic) I built the foundations. This week applies the same idea to claims — using BERTopic to cluster claim snippets and keep stable claim_ids via a registry + dim table.
This week we explore BERTopic + stable claim IDs:
- Use pre-computed embeddings from BigQuery (same pipeline as before).
- Fit/Load a BERTopic model (UMAP + HDBSCAN) in Python.
- Assign internal cluster IDs per batch, then map them to stable claim_ids.
- Persist to video_claims, claim_registry, and dim_claims tables for analysis.
- Inspect behavior in Looker Studio and reflect on limitations.
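One way to picture the "map them to stable claim_ids" step is the sketch below. It is only an illustration under assumptions, not the method used in the post: it assumes each batch cluster has a centroid vector, that the registry stores one reference vector per claim_id, and that a cosine similarity threshold (here a hypothetical 0.9) decides whether a cluster is reused or registered as new. The function names and the `claim_00001` ID format are made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_stable_claim_ids(batch_centroids, registry, threshold=0.9):
    """Map per-batch internal cluster IDs to stable claim IDs.

    batch_centroids: {internal_cluster_id: centroid_vector}
    registry: {claim_id: reference_vector}, updated in place
              (an in-memory stand-in for a registry table).
    """
    mapping = {}
    for internal_id, centroid in batch_centroids.items():
        best_id, best_sim = None, threshold
        for claim_id, known in registry.items():
            sim = cosine(centroid, known)
            if sim >= best_sim:
                best_id, best_sim = claim_id, sim
        if best_id is None:
            # No sufficiently similar claim yet: register a new stable ID.
            best_id = f"claim_{len(registry) + 1:05d}"
            registry[best_id] = centroid
        mapping[internal_id] = best_id
    return mapping


registry = {"claim_00001": [1.0, 0.0]}
batch = {0: [0.99, 0.05], 1: [0.0, 1.0]}
print(assign_stable_claim_ids(batch, registry))
```

With these toy vectors, cluster 0 is close to the registered claim and reuses its ID, while cluster 1 is unrelated and gets a fresh one. In a real pipeline the registry would live in a table rather than a dict.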
In bertopic, bigquery, clustering, embeddings, gcp, gemini, llm, looker-studio, machine-learning, python, research, topicwatchdog by DracoBlue @ 2025-10-15 | 2864 Words
In Week 1 (extraction) and Week 2 (embeddings + KMeans in BigQuery ML) we laid the groundwork. This week I built a Python BERTopic stage whose IDs stay stable across runs by mapping BERTopic’s internal clusters to stable topic IDs in BigQuery. I use Google Gemini again to generate nice labels for the extracted topic clusters.
This week we explore BERTopic + stable topic IDs (via an ID registry):
- Train a BERTopic model in Python (UMAP + HDBSCAN).
- Map BERTopic’s internal clusters (model_version, internal_topic_id) to stable topic IDs via the registry.
- Ensure topic IDs remain consistent across retraining (no more ID jumps).
- Join human-readable labels and persist results into video_topics for analysis.
- Inspect results in Looker Studio and reflect on limitations.
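The registry idea from the list above can be sketched roughly as follows. This is a minimal in-memory illustration, assuming the registry is keyed by (model_version, internal_topic_id) as the post describes. The class name `TopicIdRegistry` and its methods are hypothetical, a plain dict stands in for the BigQuery table, and deciding which clusters of a retrained model correspond to which existing topic is left out entirely.

```python
from itertools import count


class TopicIdRegistry:
    """Maps (model_version, internal_topic_id) pairs to stable topic IDs."""

    def __init__(self):
        self._ids = {}          # (model_version, internal_topic_id) -> stable ID
        self._next = count(1)   # source of fresh stable IDs

    def stable_id(self, model_version, internal_topic_id):
        """Return the stable ID for a cluster, registering a new one if unseen."""
        key = (model_version, internal_topic_id)
        if key not in self._ids:
            self._ids[key] = next(self._next)
        return self._ids[key]

    def alias(self, model_version, internal_topic_id, stable_topic_id):
        """Point a cluster of a retrained model at an already known stable ID."""
        self._ids[(model_version, internal_topic_id)] = stable_topic_id


registry = TopicIdRegistry()
migration = registry.stable_id("v1", 7)   # first sighting gets a new stable ID
registry.alias("v2", 3, migration)        # after retraining, cluster 3 is the same topic
print(registry.stable_id("v2", 3) == migration)
```

Because downstream tables only ever store the stable ID, retraining can renumber the internal clusters without the IDs in reports jumping around.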
In bertopic, bigquery, clustering, embeddings, gcp, gemini, llm, looker-studio, machine-learning, python, research, topicwatchdog by DracoBlue @ 2025-09-10 | 3187 Words
This post documents Week 2 of the TopicWatchdog project.
Last week we successfully extracted topics and claims from German political short videos and persisted them in BigQuery.
However, topics often appeared under slightly different names — making aggregation unreliable.
This week we explore embeddings + clustering:
- Generate embeddings of canonical topics and claims with BigQuery ML.
- Train a KMeans model on those embeddings to group semantically similar entries.
- Assign clusters back to each topic/claim.
- Inspect first results in Looker Studio and reflect on limitations.
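To make the clustering step more concrete, here is a small plain-Python k-means sketch. It is not the BigQuery ML model used in the post. It only illustrates the same idea (alternate between assigning each embedding to its nearest centroid and moving each centroid to the mean of its members) on made-up two-dimensional "embeddings". The function name, the toy vectors, and the fixed iteration count are assumptions for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over a list of equal-length vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Toy vectors: the first two and the last two stand for near-duplicate topic names.
embeddings = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
assignments, _ = kmeans(embeddings, k=2)
print(assignments)
```

The cluster numbers themselves are arbitrary and may change between runs or seeds. Only the grouping is meaningful.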
In bigquery, clustering, embeddings, gcp, gemini, kmeans, llm, looker-studio, machine-learning, research, topicwatchdog by DracoBlue @ 2025-09-03 | 1599 Words