{"id":12592,"date":"2026-03-10T18:27:59","date_gmt":"2026-03-10T18:27:59","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=12592"},"modified":"2026-03-10T18:28:00","modified_gmt":"2026-03-10T18:28:00","slug":"hybrid-neuro-symbolic-fraud-detection-guiding-neural-networks-with-area-guidelines","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=12592","title":{"rendered":"Hybrid Neuro-Symbolic Fraud Detection: Guiding Neural Networks with Area Guidelines"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n<p class=\"wp-block-paragraph\"> datasets are extraordinarily imbalanced, with optimistic charges beneath 0.2%. Commonplace neural networks educated with weighted binary cross-entropy usually obtain excessive ROC-AUC however battle to establish suspicious transactions below threshold-sensitive metrics. I suggest a Hybrid Neuro-Symbolic (HNS) method that comes with area information instantly into the coaching goal as a differentiable rule loss \u2014 encouraging the mannequin to assign excessive fraud likelihood to transactions with unusually giant quantities and atypical PCA signatures. On the Kaggle Credit score Card Fraud dataset, the hybrid achieves ROC-AUC of 0.970 \u00b1 0.005 throughout 5 random seeds, in comparison with 0.967 \u00b1 0.003 for the pure neural baseline below symmetric analysis. A key sensible discovering: on imbalanced knowledge, threshold choice technique impacts F1 as a lot as mannequin structure \u2014 each fashions have to be evaluated with the identical method for any comparability to be significant. Code and reproducibility supplies can be found at <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/Emmimal\/neuro-symbolic-fraud-pytorch\/\">GitHub<\/a>.<\/p>\n<h2 class=\"wp-block-heading\">The Drawback: When ROC-AUC Lies<\/h2>\n<p class=\"wp-block-paragraph\">I had a fraud dataset at 0.17% optimistic charge. 
Trained a weighted BCE network, got an ROC-AUC of 0.96, and someone said \u201cgood\u201d. Then I pulled up the score distributions and the threshold-dependent metrics. The model had quietly learned that predicting \u201cnot fraud\u201d on anything ambiguous was the path of least resistance, and nothing in the loss function disagreed with that call.<\/p>\n<p class=\"wp-block-paragraph\">What bothered me wasn't the math. It was that the model had no idea what fraud <em>looks like<\/em>. A junior analyst on day one could tell you: large transactions are suspicious, transactions with unusual PCA signatures are suspicious, and when both happen together, you should definitely be paying attention. That knowledge just\u2026 never makes it into the training loop. So I ran an experiment. What if I encoded that analyst intuition as a soft constraint directly in the loss function, something the network has to satisfy while also fitting the labels? The result was a <strong>Hybrid Neuro-Symbolic (HNS)<\/strong> setup. This article walks through the full experiment: the model, the rule loss, the lambda sweep, and, critically, what a proper multi-seed variance analysis with symmetric threshold evaluation actually shows.<\/p>\n<h2 class=\"wp-block-heading\">The Setup<\/h2>\n<p class=\"wp-block-paragraph\">I used the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.kaggle.com\/datasets\/mlg-ulb\/creditcardfraud\">Kaggle Credit Card Fraud dataset<\/a>: 284,807 transactions, 492 of which are fraud (0.172%). The V1\u2013V28 features are PCA components from an anonymized original feature space. Amount and Time are raw. The severe imbalance is the whole point; this is where standard approaches start to struggle [1].<\/p>\n<p class=\"wp-block-paragraph\">The split was 70\/15\/15 train\/val\/test, stratified. 
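<\/p>\n<p class=\"wp-block-paragraph\">For readers who want the split mechanics spelled out, a 70\/15\/15 stratified split can be sketched as follows (the toy data and variable names are illustrative, not taken from the repo):<\/p>

```python
# Illustrative stratified 70/15/15 split that preserves a rare-positive rate.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 30))
y = (rng.random(10_000) < 0.01).astype(int)  # toy rare-positive labels, ~1%

# Carve off 70% for train, then split the remaining 30% evenly into val/test.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)

print(len(X_tr), len(X_val), len(X_te))  # 7000 1500 1500
```

With stratify set, each part keeps roughly the same positive rate as the full dataset, which is what makes val-tuned thresholds transferable to the test set.
<p class=\"wp-block-paragraph\">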
I trained four models and compared them head-to-head:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>Isolation Forest<\/strong>: contamination=0.001, fit on the full training set<\/li>\n<li class=\"wp-block-list-item\"><strong>One-Class SVM<\/strong>: nu=0.001, fit only on the non-fraud training samples<\/li>\n<li class=\"wp-block-list-item\"><strong>Pure Neural<\/strong>: three-layer MLP with BCE + class weighting, no domain knowledge<\/li>\n<li class=\"wp-block-list-item\"><strong>Hybrid Neuro-Symbolic<\/strong>: the same MLP, with a differentiable rule penalty added to the loss<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Isolation Forest and One-Class SVM serve as a gut check. If a supervised network with 199k training samples can't clear the bar set by an unsupervised method, that's worth knowing before you write up results. A tuned gradient boosting model would likely outperform both neural approaches; this comparison is meant to isolate the effect of the rule loss, not to benchmark against all possible methods. Full code for all four is on <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/Emmimal\/neuro-symbolic-fraud-pytorch\/\">GitHub<\/a>.<\/p>\n<h2 class=\"wp-block-heading\">The Model<\/h2>\n<p class=\"wp-block-paragraph\">Nothing exotic. A three-layer MLP with batch normalization after each hidden layer. 
The batch norm matters more than you might expect: under heavy class imbalance, activations can drift badly without it [3].<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">class MLP(nn.Module):\n    def __init__(self, input_dim):\n        super().__init__()\n        self.net = nn.Sequential(\n            nn.Linear(input_dim, 128),\n            nn.ReLU(),\n            nn.BatchNorm1d(128),\n            nn.Linear(128, 64),\n            nn.ReLU(),\n            nn.BatchNorm1d(64),\n            nn.Linear(64, 1)\n        )\n\n    def forward(self, x):\n        return self.net(x)\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">For the loss, BCEWithLogitsLoss with pos_weight, computed as the ratio of non-fraud to fraud counts in the training set. On this dataset that's 577 [4]. A single fraud sample in a batch generates 577 times the gradient of a non-fraud one.<\/p>\n<p class=\"wp-block-paragraph\"><em>pos_weight = count(y=0) \/ count(y=1) \u2248 577<\/em><\/p>\n<p class=\"wp-block-paragraph\">That weight provides a directional signal when labeled fraud does appear. But the model still has no concept of what \u201csuspicious\u201d looks like in feature space; it only knows that fraud examples, when they do show up, should be heavily weighted. That's different from knowing where to look on batches that happen to contain no labeled fraud at all.<\/p>\n<h2 class=\"wp-block-heading\">The Rule Loss<\/h2>\n<p class=\"wp-block-paragraph\">Here is the core idea. Fraud analysts know two things empirically: unusually high transaction amounts are suspicious, and transactions that sit far from normal behavior in PCA space are suspicious. 
I want the model to assign high fraud probabilities to transactions that match both signals, even when a batch contains no labeled fraud examples.<\/p>\n<p class=\"wp-block-paragraph\">The trick is making the rule <em>differentiable<\/em>. An if\/else threshold, such as flagging any transaction where amount &gt; 1000, is a hard step function. Its gradient is zero everywhere except at the threshold itself, where it's undefined. That means backpropagation has nothing to work with; the rule produces no useful gradient signal and the optimizer ignores it. Instead, I use a steep sigmoid centered at the batch mean. It approximates the same threshold behavior but stays smooth and differentiable everywhere: the gradient is small far from the boundary and peaks near it, which is exactly where you want the optimizer paying attention. The result is a smooth suspicion score between 0 and 1:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def rule_loss(x, probs):\n    # x[:, -1]   = Amount  (last column in creditcard.csv after dropping Class)\n    # x[:, 1:29] = V1\u2013V28  (PCA components, columns 1\u201328)\n    amount   = x[:, -1]\n    pca_norm = torch.norm(x[:, 1:29], dim=1)\n\n    suspicious = (\n        torch.sigmoid(5 * (amount   - amount.mean())) +\n        torch.sigmoid(5 * (pca_norm - pca_norm.mean()))\n    ) \/ 2.0\n\n    penalty = suspicious * torch.relu(0.6 - probs.squeeze())\n    return penalty.mean()\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">A note on why PCA norm specifically: the V1\u2013V28 features are the result of a PCA transform applied to the original anonymized transaction data. A transaction that sits far from the origin in this compressed space has unusual variance across multiple original features simultaneously; it is an outlier in the latent representation. 
The Euclidean norm of the PCA vector captures that distance in a single scalar. This isn't a Kaggle-specific trick. On any dataset where PCA components represent normal behavioral variance, the norm of those components is a reasonable proxy for atypicality. If your features are not PCA-transformed, you'd replace this with a domain-appropriate signal: Mahalanobis distance, isolation score, or a feature-specific z-score.<\/p>\n<p class=\"wp-block-paragraph\">The relu(0.6 \u2212 probs) term is the constraint: it fires only when the model's predicted fraud probability is below 0.6 for a suspicious transaction. If the model is already confident (prob &gt; 0.6), the penalty is zero. This is intentional: I'm not penalizing the model for being too aggressive on suspicious transactions, only for being too conservative. The asymmetry means the rule can never fight against a correct high-confidence prediction.<\/p>\n<p class=\"wp-block-paragraph\">Formally, the combined objective is:<\/p>\n<p class=\"wp-block-paragraph\"><em>L_total = L_BCE + \u03bb \u00b7 L_rule<\/em><\/p>\n<p class=\"wp-block-paragraph\"><em>L_rule = E[ \u03c3_susp(x) \u00b7 ReLU(0.6 \u2212 p) ]<\/em><\/p>\n<p class=\"wp-block-paragraph\"><em>\u03c3_susp(x) = \u00bd \u00b7 [ \u03c3(5\u00b7(amount \u2212 \u0101)) + \u03c3(5\u00b7(\u2016V\u2081\u208b\u2082\u2088\u2016 \u2212 mean\u2016V\u2016)) ]<\/em><\/p>\n<p class=\"wp-block-paragraph\">The \u03bb hyperparameter controls how hard the rule pushes. At \u03bb=0 you get the pure neural baseline. 
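<\/p>\n<p class=\"wp-block-paragraph\">As a quick numerical sanity check on that asymmetry, the penalty should be exactly zero when every predicted probability already clears 0.6, and positive when predictions are timid. A self-contained sketch (restating the rule loss so the snippet runs on its own, with a toy random batch):<\/p>

```python
# Self-contained check: the rule penalty vanishes once the model is confident.
import torch

def rule_loss(x, probs):
    # Same rule as above: Amount in the last column, V1-V28 in columns 1-28.
    amount   = x[:, -1]
    pca_norm = torch.norm(x[:, 1:29], dim=1)
    suspicious = (
        torch.sigmoid(5 * (amount - amount.mean())) +
        torch.sigmoid(5 * (pca_norm - pca_norm.mean()))
    ) / 2.0
    penalty = suspicious * torch.relu(0.6 - probs.squeeze())
    return penalty.mean()

torch.manual_seed(0)
x = torch.randn(8, 30)  # toy batch: 8 transactions, 30 features

confident = rule_loss(x, torch.full((8, 1), 0.95))  # all probs above 0.6
timid     = rule_loss(x, torch.full((8, 1), 0.05))  # all probs below 0.6

print(confident.item(), timid.item())  # 0.0 and a strictly positive penalty
```

The confident batch incurs exactly zero penalty because relu(0.6 \u2212 0.95) clips to zero for every row; the timid batch is penalized in proportion to how suspicious each row looks.
<p class=\"wp-block-paragraph\">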
The full training loop:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">for xb, yb in train_loader:\n    xb, yb = xb.to(DEVICE), yb.to(DEVICE)\n\n    logits = model(xb)\n    bce    = criterion(logits.squeeze(), yb)\n    probs  = torch.sigmoid(logits)\n    rl     = rule_loss(xb, probs)\n    loss   = bce + lambda_rule * rl\n\n    optimizer.zero_grad()\n    loss.backward()\n    optimizer.step()\n<\/code><\/pre>\n<h2 class=\"wp-block-heading\">Tuning Lambda<\/h2>\n<p class=\"wp-block-paragraph\">Five values tested: 0.0, 0.1, 0.5, 1.0, 2.0. Each model was trained to its best validation PR-AUC with early stopping at patience=7, seed=42:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-markup\">Lambda 0.0  \u2192  Val PR-AUC: 0.7580\nLambda 0.1  \u2192  Val PR-AUC: 0.7595\nLambda 0.5  \u2192  Val PR-AUC: 0.7620   \u2190 best\nLambda 1.0  \u2192  Val PR-AUC: 0.7452\nLambda 2.0  \u2192  Val PR-AUC: 0.7504\n\nBest Lambda: 0.5\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">\u03bb=0.5 wins narrowly on validation PR-AUC. The gap between \u03bb=0.0, 0.1, and 0.5 is small, within the range of seed variance as the multi-seed analysis below shows. The meaningful drop at \u03bb=1.0 and 2.0 suggests that aggressive rule weighting can override the BCE signal rather than complement it. On new data, treat \u03bb=0 as the default and verify that any improvement holds across seeds before trusting it.<\/p>\n<p class=\"wp-block-paragraph\">One thing to be careful about with threshold selection: I computed the optimal F1 threshold on the <strong>validation set<\/strong> and applied it to the test set, for <strong>both<\/strong> models symmetrically. On a 0.17% positive-rate dataset, the optimal decision boundary is nowhere near 0.5. Applying different thresholding strategies to different models means measuring the threshold gap, not the model gap. 
Both models must use the same method:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import numpy as np\nfrom sklearn.metrics import precision_recall_curve\n\ndef find_best_threshold(y_true, probs):\n    precision, recall, thresholds = precision_recall_curve(y_true, probs)\n    f1_scores = 2 * (precision * recall) \/ (precision + recall + 1e-8)\n    # The last precision\/recall point has no matching threshold, so drop it.\n    best = np.argmax(f1_scores[:-1])\n    return thresholds[best], f1_scores[best]\n\n# Applied symmetrically to BOTH models, using the val set only\nhybrid_thresh, _ = find_best_threshold(y_val, hybrid_val_probs)\npure_thresh,   _ = find_best_threshold(y_val, pure_val_probs)\n<\/code><\/pre>\n<h2 class=\"wp-block-heading\">Results<\/h2>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<tbody>\n<tr>\n<td>Model<\/td>\n<td>F1<\/td>\n<td>PR-AUC<\/td>\n<td>ROC-AUC<\/td>\n<td>Recall@1%FPR<\/td>\n<\/tr>\n<tr>\n<td><em>Isolation Forest<\/em><\/td>\n<td><em>0.121<\/em><\/td>\n<td><em>0.172<\/em><\/td>\n<td><em>0.941<\/em><\/td>\n<td><em>0.581<\/em><\/td>\n<\/tr>\n<tr>\n<td><em>One-Class SVM<\/em><\/td>\n<td><em>0.029<\/em><\/td>\n<td><em>0.391<\/em><\/td>\n<td><em>0.930<\/em><\/td>\n<td><em>0.797<\/em><\/td>\n<\/tr>\n<tr>\n<td><em>Pure Neural (\u03bb=0)<\/em><\/td>\n<td><em>0.776<\/em><\/td>\n<td><em>0.806<\/em><\/td>\n<td><em>0.969<\/em><\/td>\n<td><em>0.878<\/em><\/td>\n<\/tr>\n<tr>\n<td><em>Hybrid (\u03bb=0.5)<\/em><\/td>\n<td><em>0.767<\/em><\/td>\n<td><em>0.745<\/em><\/td>\n<td><strong><em>0.970<\/em><\/strong><\/td>\n<td><em>0.878<\/em><\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption class=\"wp-element-caption\"><em>Table 1. Test-set results, seed=42, both supervised models using val-tuned thresholds. The pure neural baseline is a single retrained run; seed variance is quantified in Table 2 below.<\/em><\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">On this seed, the hybrid and the pure baseline are competitive on F1 (0.767 vs 0.776) and identical on Recall@1%FPR. The hybrid's PR-AUC is lower on this particular seed (0.745 vs 0.806). 
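<\/p>\n<p class=\"wp-block-paragraph\">Recall@1%FPR in the table is not a one-line sklearn metric; one way to read it off the ROC curve (an illustrative helper with toy scores, not code from the repo) is:<\/p>

```python
# Recall (TPR) at a fixed 1% false-positive budget, read off the ROC curve.
import numpy as np
from sklearn.metrics import roc_curve

def recall_at_fpr(y_true, scores, max_fpr=0.01):
    fpr, tpr, _ = roc_curve(y_true, scores)
    # Best TPR among operating points whose FPR stays within the budget.
    return tpr[fpr <= max_fpr].max()

rng = np.random.default_rng(0)
y_toy = np.r_[np.zeros(5000), np.ones(50)]                   # ~1% positives
s_toy = np.r_[rng.normal(0, 1, 5000), rng.normal(3, 1, 50)]  # separated scores
print(recall_at_fpr(y_toy, s_toy))
```

The metric answers an operational question: if the review team can only tolerate flagging 1% of legitimate traffic, what fraction of fraud still gets caught?
<p class=\"wp-block-paragraph\">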
The cleanest signal is ROC-AUC: 0.970 for the hybrid vs 0.969 for the pure baseline. ROC-AUC is threshold-independent, measuring ranking quality across all possible cutoffs. That edge is where the rule loss shows up most consistently.<\/p>\n<h3 class=\"wp-block-heading\">Precision-Recall Curve<\/h3>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2026\/03\/image-136.png\" alt=\"Precision-Recall curve for the Hybrid Neuro-Symbolic model (seed=42) showing PR-AUC of 0.745\" class=\"wp-image-649223\"\/><figcaption class=\"wp-element-caption\">Figure 1. Precision-Recall curve for the Hybrid model (seed=42). PR-AUC = 0.745. Image by Author.<br \/><\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Strong early precision is what you want in a fraud system. The curve holds steady before dropping, meaning the model's top-ranked transactions are genuinely fraud-heavy, not just a lucky threshold. In production you'd tune the threshold to your actual cost ratio: the cost of a missed fraud versus the cost of a false alarm. The val-optimized F1 threshold used here is a reasonable middle ground for reporting, not the only valid choice.<\/p>\n<h3 class=\"wp-block-heading\">Confusion Matrix<\/h3>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2026\/03\/image-137.png\" alt=\"Confusion matrix for the Hybrid model (seed=42) at validation-tuned threshold\" class=\"wp-image-649224\"\/><figcaption class=\"wp-element-caption\">Figure 2. Confusion matrix at the validation-tuned threshold (seed=42). 
Image by Author.<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\">Score Distributions<\/h3>\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2026\/03\/image-138.png\" alt=\"Histogram of predicted probabilities for non-fraud (blue) and fraud (orange) classes using the Hybrid model (seed=42)\" class=\"wp-image-649225\"\/><figcaption class=\"wp-element-caption\">Figure 3. Predicted probability distributions (seed=42). Non-fraud (blue) clusters near 0; fraud (orange) is pushed higher by the rule penalty. Image by Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">This histogram is what I look at first after training any classifier on imbalanced data. The non-fraud distribution should spike near zero; the fraud distribution should spread toward 1. The overlap region in the middle is where the model is genuinely uncertain, and that is where your threshold lives.<\/p>\n<h2 class=\"wp-block-heading\">Variance Analysis: 5 Random Seeds<\/h2>\n<p class=\"wp-block-paragraph\">A single-seed result on a dataset this imbalanced is not enough to trust. 
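<\/p>\n<p class=\"wp-block-paragraph\">The mean \u00b1 std aggregation is nothing fancy; hard-coding the per-seed F1 values reported in this section, it reduces to:<\/p>

```python
# Aggregate per-seed F1 into mean +/- std (population std, as in the summary table).
import numpy as np

hybrid_f1 = np.array([0.767, 0.733, 0.809, 0.797, 0.764])  # seeds 42, 0, 7, 123, 2024
pure_f1   = np.array([0.776, 0.788, 0.767, 0.757, 0.826])

for name, f1 in [("hybrid", hybrid_f1), ("pure", pure_f1)]:
    print(f"{name}: {f1.mean():.3f} +/- {f1.std():.3f}")
# hybrid: 0.774 +/- 0.027
# pure: 0.783 +/- 0.024
```

Note that numpy's default std is the population standard deviation (ddof=0), which is what reproduces the table values; with ddof=1 the hybrid spread would come out slightly larger.
<p class=\"wp-block-paragraph\">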
I ran both models across seeds [42, 0, 7, 123, 2024], applying val-optimized thresholds symmetrically to both in every run:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-markup\">Seed   42 | Hybrid F1: 0.767  PR-AUC: 0.745 | Pure F1: 0.776  PR-AUC: 0.806\nSeed    0 | Hybrid F1: 0.733  PR-AUC: 0.636 | Pure F1: 0.788  PR-AUC: 0.743\nSeed    7 | Hybrid F1: 0.809  PR-AUC: 0.817 | Pure F1: 0.767  PR-AUC: 0.755\nSeed  123 | Hybrid F1: 0.797  PR-AUC: 0.756 | Pure F1: 0.757  PR-AUC: 0.731\nSeed 2024 | Hybrid F1: 0.764  PR-AUC: 0.745 | Pure F1: 0.826  PR-AUC: 0.763\n<\/code><\/pre>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<tbody>\n<tr>\n<td>Model<\/td>\n<td>F1 (mean \u00b1 std)<\/td>\n<td>PR-AUC (mean \u00b1 std)<\/td>\n<td>ROC-AUC (mean \u00b1 std)<\/td>\n<\/tr>\n<tr>\n<td>Pure Neural<\/td>\n<td>0.783 \u00b1 0.024<\/td>\n<td>0.760 \u00b1 0.026<\/td>\n<td>0.967 \u00b1 0.003<\/td>\n<\/tr>\n<tr>\n<td>Hybrid (\u03bb=0.5)<\/td>\n<td>0.774 \u00b1 0.027<\/td>\n<td>0.740 \u00b1 0.058<\/td>\n<td><strong>0.970 \u00b1 0.005<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption class=\"wp-element-caption\"><em>Table 2. Multi-seed variance across 5 seeds. The hybrid and the pure baseline are statistically indistinguishable on F1 and PR-AUC. The hybrid shows a consistent ROC-AUC advantage across all 5 seeds.<\/em><\/figcaption><\/figure>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2026\/03\/image-139-1024x410.png\" alt=\"Bar chart showing mean and standard deviation of F1 and PR-AUC across 5 random seeds for pure neural and hybrid models\" class=\"wp-image-649226\"\/><figcaption class=\"wp-element-caption\">Figure 4. F1 and PR-AUC mean \u00b1 std across 5 seeds. Differences on threshold-dependent metrics are within the noise range. 
Image by Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Three observations from the variance data. The hybrid wins on F1 in 2 of 5 seeds; the pure baseline wins in 3 of 5. Neither dominates on threshold-dependent metrics. The hybrid's PR-AUC variance is notably higher (\u00b10.058 vs \u00b10.026), meaning the rule loss makes some initializations better and some worse; it is a sensitivity, not a guaranteed improvement. The one result that holds without exception: ROC-AUC is higher for the hybrid across all 5 seeds. That is the cleanest signal from this experiment.<\/p>\n<h2 class=\"wp-block-heading\">Why Does the Rule Loss Help ROC-AUC?<\/h2>\n<p class=\"wp-block-paragraph\">ROC-AUC is threshold-independent: it measures how well the model ranks fraud above non-fraud across all possible cutoffs. A consistent improvement across 5 seeds is a real signal. Here's what I think is happening.<\/p>\n<p class=\"wp-block-paragraph\">With 0.172% fraud prevalence, most 2048-sample batches contain only 3\u20134 labeled fraud examples. The BCE loss receives almost no fraud-relevant gradient on the majority of batches. The rule loss fires on every suspicious transaction regardless of label; it generates gradient signals on batches that would otherwise tell the optimizer almost nothing about fraud. This gives the model consistent direction throughout training, not just on the rare batches where labeled fraud happens to appear.<\/p>\n<p class=\"wp-block-paragraph\">The penalty is also feature-selective. By pointing the model specifically toward amount and PCA norm, the rule reduces the chance that the model latches onto irrelevant correlations in the other 28 dimensions. It functions as soft regularization over the feature space, not just the output space.<\/p>\n<p class=\"wp-block-paragraph\">The one-sided relu matters too. 
I'm not penalizing the model for being too aggressive on suspicious transactions, only for being too conservative. The rule can't fight against a correct high-confidence prediction; it can only push up underconfident ones. That asymmetry is deliberate.<\/p>\n<p class=\"wp-block-paragraph\"><em>The lesson is not that rules replace learning. It is that rules can guide it, especially when labeled examples are scarce and you already know something about what you are looking for.<\/em><\/p>\n<h2 class=\"wp-block-heading\">On Threshold Evaluation in Imbalanced Classification<\/h2>\n<p class=\"wp-block-paragraph\">One finding from this experiment is worth its own section because it applies to any imbalanced classification problem, not just fraud.<\/p>\n<p class=\"wp-block-paragraph\">On a dataset with a 0.17% positive rate, the optimal F1 threshold is nowhere near 0.5. A model can rank fraud almost perfectly and still score poorly on F1 at a default threshold, simply because the decision boundary needs to be calibrated to the class imbalance. That means that if two models are evaluated with different thresholding strategies, one at a fixed cutoff and the other with a val-optimized cutoff, you are not comparing models. 
You are measuring the threshold gap.<\/p>\n<p class=\"wp-block-paragraph\">The practical checklist for a clean comparison on imbalanced data:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Both models evaluated with the <strong>same<\/strong> thresholding strategy<\/li>\n<li class=\"wp-block-list-item\">Threshold chosen on validation data, <strong>never<\/strong> on test data<\/li>\n<li class=\"wp-block-list-item\">PR-AUC and ROC-AUC reported alongside F1; both are threshold-independent<\/li>\n<li class=\"wp-block-list-item\">Variance across multiple seeds to separate real differences from lucky initialization<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\">Things to Watch Out For<\/h2>\n<p class=\"wp-block-paragraph\"><strong>Batch-relative statistics.<\/strong> The rule computes \u201chigh amount\u201d and \u201chigh PCA norm\u201d relative to the batch mean, not to a fixed population statistic. During training with large batches (2048) and stratified sampling, batch means are stable enough. In online inference scoring individual transactions, freeze these statistics to training-set values. Otherwise the \u201csuspicious\u201d boundary shifts with every call.<\/p>\n<p class=\"wp-block-paragraph\"><strong>PR-AUC variance increases with the rule loss.<\/strong> Hybrid PR-AUC ranges from 0.636 to 0.817 across seeds versus 0.731 to 0.806 for the pure baseline. A rule that helps on some initializations and hurts on others requires multi-seed validation before drawing conclusions. Single-seed results are not enough.<\/p>\n<p class=\"wp-block-paragraph\"><strong>High \u03bb degrades performance.<\/strong> \u03bb=1.0 and 2.0 show a meaningful drop in validation PR-AUC. Aggressive rule weighting can override the BCE signal rather than complement it. 
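<\/p>\n<p class=\"wp-block-paragraph\">For the batch-relative statistics caveat above, freezing the rule statistics for online scoring can be sketched like this (a hypothetical refactor of the suspicion score, not the repo's code):<\/p>

```python
# Suspicion score against frozen training-set means, so a batch of one scores
# the same no matter what other transactions arrive alongside it.
import torch

def rule_suspicion(x, amount_mean, pca_norm_mean):
    amount   = x[:, -1]
    pca_norm = torch.norm(x[:, 1:29], dim=1)
    return (
        torch.sigmoid(5 * (amount - amount_mean)) +
        torch.sigmoid(5 * (pca_norm - pca_norm_mean))
    ) / 2.0

# Freeze the statistics once, from the training set.
torch.manual_seed(0)
X_train = torch.randn(1000, 30)  # stand-in for the real training matrix
amount_mean   = X_train[:, -1].mean()
pca_norm_mean = torch.norm(X_train[:, 1:29], dim=1).mean()

# The same transaction scores identically alone or inside a larger batch.
alone    = rule_suspicion(X_train[:1], amount_mean, pca_norm_mean)
in_batch = rule_suspicion(X_train[:10], amount_mean, pca_norm_mean)[:1]
print(torch.allclose(alone, in_batch))  # True
```

With batch means instead of frozen means, that equality would not hold: the score of one transaction would depend on whatever else happened to be in the request.
<p class=\"wp-block-paragraph\">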
Start at \u03bb=0.5 and verify on your own data before going higher.<\/p>\n<p class=\"wp-block-paragraph\">A natural extension would make the rule weights learnable rather than fixed at 0.5\/0.5:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Learnable mixture weights\nself.rule_w = nn.Parameter(torch.tensor([0.5, 0.5]))\n\nw = torch.softmax(self.rule_w, dim=0)\nsuspicious = (\n    w[0] * torch.sigmoid(5 * (amount   - amount.mean())) +\n    w[1] * torch.sigmoid(5 * (pca_norm - pca_norm.mean()))\n)\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">This lets the model decide whether amount or PCA norm is more predictive for the specific data, rather than hard-coding equal weights. This variant has not been run yet; it is the next thing on the list.<\/p>\n<h2 class=\"wp-block-heading\">Final Thoughts<\/h2>\n<p class=\"wp-block-paragraph\">The rule loss does something real: the ROC-AUC improvement is consistent and threshold-independent across all 5 seeds. The improvement on threshold-dependent metrics like F1 and PR-AUC is within the noise range and depends on initialization. The honest summary: domain rules injected into the loss function can improve a model's underlying score distributions on rare-event data, but the magnitude depends heavily on how you measure it and how stable the improvement is across seeds.<\/p>\n<p class=\"wp-block-paragraph\">If you work in fraud detection, anomaly detection, or any domain where labeled positives are rare and domain knowledge is rich, this pattern is worth experimenting with. The implementation is simple: a handful of lines on top of a standard training loop. 
The more important discipline is measurement: use symmetric threshold evaluation, report threshold-independent metrics, and always run multiple seeds before trusting a result.<\/p>\n<p class=\"wp-block-paragraph\">The <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/Emmimal\/neuro-symbolic-fraud-pytorch\/\">repo<\/a> has the full training loop, lambda sweep, variance analysis, and eval code. Download the CSV from Kaggle, drop it in the same directory, and run app.py. The numbers above should reproduce; if they don't on your machine, open an issue and I'll take a look.<\/p>\n<h2 class=\"wp-block-heading\">References<\/h2>\n<p class=\"wp-block-paragraph\">[1] A. Dal Pozzolo, O. Caelen, R. A. Johnson and G. Bontempi, <em>Calibrating Probability with Undersampling for Unbalanced Classification<\/em> (2015), IEEE SSCI. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/dalpozz.github.io\/static\/pdf\/SSCI_calib_final_noCC.pdf\">https:\/\/dalpozz.github.io\/static\/pdf\/SSCI_calib_final_noCC.pdf<\/a><\/p>\n<p class=\"wp-block-paragraph\">[2] ULB Machine Learning Group, <em>Credit Card Fraud Detection Dataset<\/em> (Kaggle). <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.kaggle.com\/datasets\/mlg-ulb\/creditcardfraud\">https:\/\/www.kaggle.com\/datasets\/mlg-ulb\/creditcardfraud<\/a> (Open Database license)<\/p>\n<p class=\"wp-block-paragraph\">[3] S. Ioffe and C. Szegedy, <em>Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift<\/em> (2015), arXiv:1502.03167. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1502.03167\">https:\/\/arxiv.org\/abs\/1502.03167<\/a><\/p>\n<p class=\"wp-block-paragraph\">[4] PyTorch Documentation, BCEWithLogitsLoss. 
<a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/pytorch.org\/docs\/stable\/generated\/torch.nn.BCEWithLogitsLoss.html\">https:\/\/pytorch.org\/docs\/secure\/generated\/torch.nn.BCEWithLogitsLoss.html<\/a><\/p>\n<p class=\"wp-block-paragraph\">[5] Experiment code and reproducibility supplies. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/Emmimal\/neuro-symbolic-fraud-pytorch\/\">https:\/\/github.com\/Emmimal\/neuro-symbolic-fraud-pytorch\/<\/a><\/p>\n<h2 class=\"wp-block-heading\">Disclosure<\/h2>\n<p class=\"wp-block-paragraph\">This text is predicated on unbiased experiments utilizing publicly accessible knowledge (Kaggle Credit score Card Fraud dataset) and open-source instruments (PyTorch). No proprietary datasets, firm assets, or confidential data had been used. The outcomes and code are absolutely reproducible as described, and the GitHub repository accommodates the entire implementation. The views and conclusions expressed listed here are my very own and don&#8217;t symbolize any employer or group.<\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Summary datasets are extraordinarily imbalanced, with optimistic charges beneath 0.2%. Commonplace neural networks educated with weighted binary cross-entropy usually obtain excessive ROC-AUC however battle to establish suspicious transactions below threshold-sensitive metrics. 
I propose a Hybrid Neuro-Symbolic (HNS) approach that incorporates domain knowledge directly into the training objective as a differentiable rule loss, encouraging [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":12594,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[703,1187,968,8173,1524,667,298,8172,4015],"class_list":["post-12592","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-detection","tag-domain","tag-fraud","tag-guiding","tag-hybrid","tag-networks","tag-neural","tag-neurosymbolic","tag-rules"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/12592","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12592"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/12592\/revisions"}],"predecessor-version":[{"id":12593,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/12592\/revisions\/12593"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/12594"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12592"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12592"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags
&post=12592"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}