I currently run Rancher Desktop locally for my Docker CLI needs, but I've found a viable alternative that I want to try.
It's called container, and Apple publishes it at https://github.com/apple/container for
Apple Silicon Macs.
In Week 1 (extraction), Week 2 (embeddings + KMeans), and Week 3 (stable topics with BERTopic) I built the foundations. This week applies the same idea to claims — using BERTopic to cluster claim snippets and keep stable claim_ids via a registry + dim table.
This week we explore BERTopic + stable claim IDs:
claim_ids.
video_claims, claim_registry, and dim_claims tables for analysis.
In Week 1 (extraction) and Week 2 (embeddings + KMeans in BigQuery ML) we laid the groundwork. This week I built a Python BERTopic stage whose IDs stay stable across runs by mapping BERTopic’s internal clusters to stable topic IDs in BigQuery. I use Google Gemini again to generate nice labels for the extracted topic clusters.
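To make the idea of a stable ID registry concrete, here is a minimal sketch in plain Python. It is not the pipeline's actual code: the matching rule (Jaccard overlap of each cluster's top keywords), the 0.5 threshold, the `T0001`-style ID format, and the names `TopicRegistry` and `assign_stable_ids` are all assumptions made for illustration, and an in-memory dict stands in for the BigQuery registry table. The only BERTopic-specific fact it relies on is that cluster `-1` is the outlier bucket.

```python
from dataclasses import dataclass, field


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two keyword sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class TopicRegistry:
    """In-memory stand-in for a registry table (hypothetical schema)."""

    threshold: float = 0.5  # assumed cut-off for "this is the same topic"
    entries: dict[str, set[str]] = field(default_factory=dict)  # stable ID -> keywords
    next_number: int = 1

    def resolve(self, keywords: set[str]) -> str:
        """Return the best-matching stable ID, or mint a new one."""
        best_id, best_score = None, 0.0
        for stable_id, known in self.entries.items():
            score = jaccard(keywords, known)
            if score > best_score:
                best_id, best_score = stable_id, score
        if best_id is not None and best_score >= self.threshold:
            return best_id
        # No sufficiently similar topic is known yet: register a new stable ID.
        stable_id = f"T{self.next_number:04d}"
        self.next_number += 1
        self.entries[stable_id] = set(keywords)
        return stable_id


def assign_stable_ids(
    registry: TopicRegistry, run_clusters: dict[int, set[str]]
) -> dict[int, str]:
    """Map one run's cluster numbers to stable IDs, skipping outliers (-1)."""
    return {
        cluster: registry.resolve(keywords)
        for cluster, keywords in sorted(run_clusters.items())
        if cluster != -1
    }
```

Cluster numbers can change from one run to the next, so the sketch matches on cluster content instead. It does not stop two clusters in the same run from resolving to the same stable ID; a real registry would need a rule for that case.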
This week we explore BERTopic + stable topic IDs (via an ID registry):
video_topics for analysis.
This post documents Week 2 of the TopicWatchdog project.
Last week we successfully extracted topics and claims from German political short videos and persisted them in BigQuery.
However, topics often appeared under slightly different names — making aggregation unreliable.
This week we explore embeddings + clustering:
This post documents Week 1 of a research project I call TopicWatchdog: an end‑to‑end, reproducible pipeline that (a) collects German political short videos, (b) transcribes them, (c) extracts topics and claims with timestamps, and (d) persists everything in BigQuery for transparent, long‑term analysis.
The focus is on methods and reproducibility, not on polished production code. The snippets below are meant as scaffolding rather than finished code, but they are already enough to build a similar pipeline.