A gentle walk-through with ~30 lines of NumPy, a scikit-learn version, and a tiny dataset you generate in code.
Your bank feed is a blur. Let’s turn it into three buckets you can scan at a glance: small, medium, and large spends. We’ll keep things simple, write a tiny K-Means from scratch, then use scikit-learn. No downloads. The data is created inside the code so anyone can run it.
- A minimal K-Means in about 30 lines
- The scikit-learn version you’ll use day to day
- A quick “size bucket” labeler for synthetic transactions
pip install numpy pandas scikit-learn
We’ll create 60 fake transactions with realistic ranges and small merchant quirks. The random generator is seeded, so the data is deterministic and your results match mine.
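As a quick aside on why the results are reproducible: a NumPy generator built from a fixed seed always yields the same stream of draws. The snippet below is a minimal sketch of that idea only. The variable names and the draw size of 5 are illustrative assumptions, not part of the article's generator.

```python
import numpy as np

# Two generators built from the same seed (0 here, matching the default used in this post)
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)

# Draw five day offsets in the range [0, 30) from each generator.
# The size of 5 is an arbitrary choice for this illustration.
draws_a = rng_a.integers(0, 30, size=5)
draws_b = rng_b.integers(0, 30, size=5)

# Same seed, same sequence of calls: the draws are identical.
same_seed_matches = bool(np.array_equal(draws_a, draws_b))
print(same_seed_matches)
```

Change the seed and the stream changes too, which is handy if you want a second synthetic dataset to test against.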
import numpy as np
import pandas as pd

def make_transactions(n=60, seed=0):
    rng = np.random.default_rng(seed)
    begin = np.datetime64("2025-06-01")
    dates = begin + rng.integers(0, 30…