dracoblue.net

Docker CLI with Apple Container

Currently, I am running Rancher Desktop for my local Docker CLI needs, but I've found an alternative I wanted to give a try.

It's called "container", and Apple releases it at https://github.com/apple/container for Apple Silicon Macs.

Continue reading ...

In apple, container, docker, rancher-desktop, silicon, socktainer by DracoBlue @ 2026-03-24 | 337 Words

Week 4: Stable Claims with BERTopic

In Week 1 (extraction), Week 2 (embeddings + KMeans), and Week 3 (stable topics with BERTopic), I built the foundations. This week applies the same idea to claims: BERTopic clusters claim snippets, and a registry plus a dim table keep the claim_ids stable.

This week we explore BERTopic + stable claim IDs:

  • Use pre-computed embeddings from BigQuery (same pipeline as before).
  • Fit or load a BERTopic model (UMAP + HDBSCAN) in Python.
  • Assign internal cluster IDs per batch, then map them to stable claim_ids.
  • Persist to video_claims, claim_registry, and dim_claims tables for analysis.
  • Inspect behavior in Looker Studio and reflect on limitations.
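The registry idea can be sketched in a few lines of Python. This is a minimal sketch, not the code from the post: it assumes the registry is keyed on (model_version, internal_cluster_id), that HDBSCAN outliers (label -1) get no claim_id, and that unseen clusters receive sequential IDs. The names ClaimRegistry and assign_claim_ids and the claim_00001 ID format are hypothetical, and an in-memory dict stands in for the claim_registry table in BigQuery.

```python
from __future__ import annotations

from dataclasses import dataclass, field

OUTLIER = -1  # HDBSCAN (and BERTopic) label noise points as -1


@dataclass
class ClaimRegistry:
    """Hypothetical in-memory stand-in for the claim_registry table.

    Maps (model_version, internal_cluster_id) -> stable claim_id.
    """

    mapping: dict[tuple[str, int], str] = field(default_factory=dict)
    next_number: int = 1

    def resolve(self, model_version: str, internal_id: int) -> str | None:
        """Return the stable claim_id, minting a new one for unseen clusters."""
        if internal_id == OUTLIER:
            return None  # outliers are not assigned to any claim
        key = (model_version, internal_id)
        if key not in self.mapping:
            self.mapping[key] = f"claim_{self.next_number:05d}"
            self.next_number += 1
        return self.mapping[key]


def assign_claim_ids(registry: ClaimRegistry, model_version: str,
                     batch: list[tuple[str, str, int]]) -> list[dict]:
    """Turn (video_id, snippet, internal_cluster_id) tuples into video_claims-style rows."""
    return [
        {"video_id": video_id, "snippet": snippet,
         "claim_id": registry.resolve(model_version, internal_id)}
        for video_id, snippet, internal_id in batch
    ]
```

Because the lookup is keyed per model version, a later batch processed with the same model reuses the claim_ids already minted; carrying IDs across a retrained model needs an additional matching step.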
Continue reading ...

In bertopic, bigquery, clustering, embeddings, gcp, gemini, llm, looker-studio, machine-learning, python, research, topicwatchdog by DracoBlue @ 2025-10-15 | 2864 Words

Week 3: Stable Topics with BERTopic

In Week 1 (extraction) and Week 2 (embeddings + KMeans in BigQuery ML), we laid the groundwork. This week I built a Python BERTopic stage whose topic IDs stay stable across runs, by mapping BERTopic’s internal clusters to stable topic IDs in BigQuery. Google Gemini is used again to generate human-readable labels for the extracted topic clusters.

This week we explore BERTopic + stable topic IDs (via an ID registry):

  • Train a BERTopic model in Python (UMAP + HDBSCAN).
  • Map BERTopic’s internal clusters (model_version, internal_topic_id) to stable topic IDs via an ID registry in BigQuery.
  • Ensure topic IDs remain consistent across retraining (no more ID jumps).
  • Join human-readable labels and persist results into video_topics for analysis.
  • Inspect results in Looker Studio and reflect on limitations.
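The teaser does not spell out how a retrained model's clusters are matched to existing IDs. One common approach, assumed here rather than taken from the post, is to compare topic centroids (for example, the mean embedding of each topic's documents) by cosine similarity and reuse the stable ID of the closest known topic when the similarity clears a threshold. The function name match_topics, the 0.85 threshold and the topic_0001 ID format are hypothetical.

```python
import math
from itertools import count


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_topics(new_centroids, known_centroids, threshold=0.85, next_id=1):
    """Map internal topic IDs of a freshly trained model to stable topic IDs.

    new_centroids:   {internal_topic_id: centroid} from the new model
    known_centroids: {stable_topic_id: centroid} from the registry
    next_id:         first number to use when minting new stable IDs
    """
    # BERTopic's -1 topic collects outliers; it never gets a stable ID.
    new = {iid: v for iid, v in new_centroids.items() if iid != -1}
    # Score every (new, known) pair, most similar first.
    pairs = sorted(
        ((cosine(v, kv), iid, sid)
         for iid, v in new.items()
         for sid, kv in known_centroids.items()),
        reverse=True,
    )
    mapping, used = {}, set()
    for sim, iid, sid in pairs:
        if sim < threshold:
            break  # all remaining pairs are even less similar
        if iid in mapping or sid in used:
            continue  # greedy one-to-one: each side is matched at most once
        mapping[iid] = sid
        used.add(sid)
    # Topics without a close predecessor get a fresh stable ID.
    fresh = count(next_id)
    for iid in sorted(new):
        if iid not in mapping:
            mapping[iid] = f"topic_{next(fresh):04d}"
    return mapping
```

Greedy one-to-one matching keeps two new clusters from claiming the same stable ID; a stricter variant could solve the assignment optimally, for example with the Hungarian algorithm.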
Continue reading ...

In bertopic, bigquery, clustering, embeddings, gcp, gemini, llm, looker-studio, machine-learning, python, research, topicwatchdog by DracoBlue @ 2025-09-10 | 3187 Words

Week 2: Embeddings & KMeans Clustering of Topics/Claims

This post documents Week 2 of the TopicWatchdog project.
Last week we extracted topics and claims from German political short videos and persisted them in BigQuery.
However, the same topic often appeared under slightly different names, which made aggregation unreliable.

This week we explore embeddings + clustering:

  • Generate embeddings of canonical topics and claims with BigQuery ML.
  • Train a KMeans model on those embeddings to group semantically similar entries.
  • Assign clusters back to each topic/claim.
  • Inspect first results in Looker Studio and reflect on limitations.
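BigQuery ML trains the clustering model itself (a CREATE MODEL statement with model_type='kmeans'), so this step needs no Python. To show what KMeans actually does with the embedding vectors, here is a minimal pure-Python Lloyd's algorithm. It is an illustration only: it uses Euclidean distance and takes the first k points as initial centroids for determinism, which is not necessarily how BigQuery ML initializes or measures distance.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    # Deterministic init: the first k points become the starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return labels, centroids


labels, centroids = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
# labels → [0, 0, 1, 1]
```

The returned labels play the role of the cluster assignment that gets written back to each topic or claim.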
Continue reading ...

In bigquery, clustering, embeddings, gcp, gemini, kmeans, llm, looker-studio, machine-learning, research, topicwatchdog by DracoBlue @ 2025-09-03 | 1599 Words

Kickoff (Week 1): Extracting Topics & Claims from German Politics Videos

This post documents Week 1 of a research project I call TopicWatchdog: an end-to-end, reproducible pipeline that (a) collects German political short videos, (b) transcribes them, (c) extracts topics and claims with timestamps, and (d) persists everything in BigQuery for transparent, long-term analysis.

The focus is on methods and reproducibility, not on polished production code. The snippets in this post are meant as scaffolding, but they are already enough to build a similar pipeline.
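A central step in such a pipeline is turning the LLM's extraction answer into timestamped rows. The sketch below assumes Gemini is asked to reply with JSON shaped like {"claims": [{"topic", "claim", "start", "end"}]}; that schema, the Claim dataclass and parse_extraction are hypothetical, not the post's actual code. Each item is validated, and malformed ones are dropped before anything reaches BigQuery.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class Claim:
    video_id: str
    topic: str
    claim: str
    start_s: float  # start of the claim in the video, in seconds
    end_s: float    # end of the claim in the video, in seconds


def parse_extraction(video_id: str, raw: str) -> list[dict]:
    """Turn an LLM JSON answer (hypothetical schema) into flat, validated rows."""
    rows = []
    for item in json.loads(raw).get("claims", []):
        try:
            claim = Claim(
                video_id=video_id,
                topic=str(item["topic"]).strip(),
                claim=str(item["claim"]).strip(),
                start_s=float(item["start"]),
                end_s=float(item["end"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # skip items with missing fields or unparsable timestamps
        # Keep only non-empty claims with a plausible time range.
        if claim.claim and 0 <= claim.start_s <= claim.end_s:
            rows.append(asdict(claim))
    return rows
```

The resulting list of dicts could then be streamed into a table, for example with Client.insert_rows_json from the google-cloud-bigquery package.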

Continue reading ...

In bigquery, gcp, gemini, llm, looker-studio, machine-learning, research, topicwatchdog, youtube by DracoBlue @ 2025-08-27 | 2842 Words


Give something back

Were my blog posts useful to you? If you want to give back, support one of these charities, too!

  • Report hate in social media
  • Campact e.V.
  • With our technology and your help, we protect the oceans from plastic waste.
  • Gesellschaft für Freiheitsrechte e. V.
  • The civil eye in the Mediterranean
