One question about the reproducibility #2428

powerhorse1986 · 2025-09-27T13:55:01Z

Hi Maarten,

We are doing a new project using BERTopic, which is really an awesome tool!

But I noticed that the reproducibility of BERTopic might be a problem.

For this new project, we ran topic modeling with BERTopic multiple times on more than 3,000 abstracts. The first ten runs each produced 4 topics, including one outlier topic. On the 11th run, however, 24 topics were generated, even though all UMAP and HDBSCAN parameters were the same. I then adjusted the HDBSCAN parameter "min_cluster_size" and got 4 topics again.

I have no idea why this happened. Would you mind giving some hints? Thank you :)

Best,
Li

amitca71 · 2025-09-27T15:31:06Z

HDBSCAN is not deterministic, which is very frustrating. If determinism is what you need, try replacing it with k-means.

carobs9 · 2025-10-15T13:28:53Z

Hello @powerhorse1986 ,

What has worked best for me so far is always setting the same seed in the UMAP step, so that the reduced embeddings are calculated the same way on every run.

After that, you can even save the embeddings as a .npy file, so you can be sure you always use the same input for HDBSCAN. Keeping the same parameters and using the exact same input across iterations should give the exact same HDBSCAN results.

MaartenGr (Maintainer) · 2025-10-23T08:11:06Z

As @carobs9 already mentioned, HDBSCAN is reproducible as long as you also fix the UMAP random_state. You can find more information about that here: https://maartengr.github.io/BERTopic/faq.html#why-are-the-results-not-consistent-between-runs
