[Tutorial] K-Means in Plain Python: Turn a Messy Spend List into 3 Clear Buckets | by Alen George

A gentle walk-through with ~30 lines of NumPy, a scikit-learn version, and a tiny dataset you generate in code.

Your bank feed is a blur. Let’s turn it into three buckets you can scan at a glance: small, medium, and large spends. We’ll keep things simple, write a tiny K-Means from scratch, then use scikit-learn. No downloads. The data is created inside the code so anyone can run it.

A minimal K-Means in about 30 lines
The scikit-learn version you’ll use day to day
A quick “size bucket” labeler for synthetic transactions
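
To make the first and third items concrete before any setup, here is a rough sketch of K-Means on one-dimensional amounts, followed by a labeler that names the clusters by size. It is an illustration under stated assumptions rather than the listing this tutorial builds: the names `kmeans_1d` and `size_buckets` and the sample amounts are made up, and the starting centers come from evenly spaced quantiles (an assumption chosen so the run is repeatable) instead of random picks.

```python
import numpy as np

def kmeans_1d(values, k=3, n_iter=100):
    """Illustrative Lloyd's-style K-Means for 1-D data (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    # Assumption: start the centers at evenly spaced quantiles of the data
    # so the result does not depend on a random draw.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the final centers.
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return labels, centers

def size_buckets(values, names=("small", "medium", "large")):
    """Name each value by the rank of its cluster center (hypothetical helper)."""
    labels, centers = kmeans_1d(values, k=len(names))
    order = np.argsort(centers)           # cluster ids from lowest to highest center
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))   # rank[cluster_id] -> 0, 1, 2
    return [names[rank[label]] for label in labels]

# Made-up amounts, for illustration only.
amounts = [4.5, 6.0, 5.2, 48.0, 52.5, 45.0, 410.0, 395.0, 430.0]
buckets = size_buckets(amounts)
print(list(zip(amounts, buckets)))
```

Ranking the clusters by their centers matters because K-Means cluster ids are arbitrary: cluster 0 is not guaranteed to be the cheapest group, so the names are attached after sorting.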

pip install numpy pandas scikit-learn

We’ll create 60 fake transactions with realistic ranges and small merchant quirks. It’s deterministic, so your results match mine.
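
The determinism comes from seeding the random generator. As a small sketch of that idea (the helper `sample_amounts` and its value range are hypothetical, not part of this tutorial’s generator), two generators built from the same seed produce the same draws:

```python
import numpy as np

def sample_amounts(n=5, seed=0):
    """Draw n pseudo-random amounts from a seeded generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: a uniform range of 1 to 500 stands in for real spend amounts.
    return rng.uniform(1.0, 500.0, size=n).round(2)

a = sample_amounts(seed=0)
b = sample_amounts(seed=0)
print(np.array_equal(a, b))  # same seed gives the same sequence of draws
```

One caveat: NumPy does not promise identical random streams across all versions, so an exact match is most reliable when the NumPy version is also the same.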

import numpy as np
import pandas as pd

def make_transactions(n=60, seed=0):
    rng = np.random.default_rng(seed)
    begin = np.datetime64("2025-06-01")
    dates = begin + rng.integers(0, 30…