{"id":9585,"date":"2025-12-10T02:39:28","date_gmt":"2025-12-10T02:39:28","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=9585"},"modified":"2025-12-10T02:39:28","modified_gmt":"2025-12-10T02:39:28","slug":"the-machine-studying-creation-calendar-day-9-lof-in-excel","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=9585","title":{"rendered":"The Machine Learning \u201cAdvent Calendar\u201d Day 9: LOF in Excel"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p class=\"wp-block-paragraph\">Yesterday, we worked with <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/towardsdatascience.com\/the-machine-learning-advent-calendar-day-8-isolation-forest-in-excel\/\">Isolation Forest<\/a>, which is an anomaly detection method.<\/p>\n<p class=\"wp-block-paragraph\">Today, we look at another algorithm that has the same objective. But unlike Isolation Forest, it does <em>not<\/em> build trees.<\/p>\n<p class=\"wp-block-paragraph\">It is called LOF, or Local Outlier Factor.<\/p>\n<p class=\"wp-block-paragraph\">People often summarize LOF with one sentence: <strong>Does this point live in a region with a lower density than its neighbors?<\/strong><\/p>\n<p class=\"wp-block-paragraph\">This sentence is actually <strong>tricky to understand<\/strong>. 
I struggled with it for a long time.<\/p>\n<p class=\"wp-block-paragraph\">However, there is one part that is immediately easy to understand,<br \/>and we will see that it becomes the key point:<br \/><strong>there is a notion of neighbors.<\/strong><\/p>\n<p class=\"wp-block-paragraph\">And as soon as we talk about neighbors,<br \/>we naturally go back to <strong>distance-based models<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">We will explain this algorithm in 3 steps.<\/p>\n<p class=\"wp-block-paragraph\">To keep things very simple, we will use this dataset again:<\/p>\n<p class=\"wp-block-paragraph\">1, 2, 3, 9<\/p>\n<p class=\"wp-block-paragraph\">Do you remember that I have the copyright on this dataset? We did Isolation Forest with it, and we will now do LOF with it. And we can also compare the two results.<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/12\/image-129-1024x445.png\" alt=\"\" class=\"wp-image-635526\"\/><figcaption class=\"wp-element-caption\">LOF in Excel in 3 steps \u2013 all images by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">All the Excel files are available through this <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/ko-fi.com\/s\/4ddca6dff1\">Kofi link<\/a>. Your support means a lot to me. 
The price will increase during the month, so early supporters get the best value.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/11\/image-205-1024x348.png\" alt=\"\" class=\"wp-image-631458\"\/><figcaption class=\"wp-element-caption\">All Excel\/Google Sheets files for ML and DL<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Step 1 \u2013 k Neighbors and k-distance<\/h2>\n<p class=\"wp-block-paragraph\">LOF starts with something very simple:<\/p>\n<p class=\"wp-block-paragraph\"><strong>Look at the distances between points.<br \/>Then find the k nearest neighbors of each point.<\/strong><\/p>\n<p class=\"wp-block-paragraph\">Let us take <strong>k = 2<\/strong>, just to keep things minimal.<\/p>\n<h3 class=\"wp-block-heading\">Nearest neighbors for each point<\/h3>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Point <strong>1<\/strong> \u2192 neighbors: 2 and 3<\/li>\n<li class=\"wp-block-list-item\">Point <strong>2<\/strong> \u2192 neighbors: 1 and 3<\/li>\n<li class=\"wp-block-list-item\">Point <strong>3<\/strong> \u2192 neighbors: 2 and 1<\/li>\n<li class=\"wp-block-list-item\">Point <strong>9<\/strong> \u2192 neighbors: 3 and 2<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Already, we see a clear structure emerging:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">1, 2, and 3 form a tight cluster<\/li>\n<li class=\"wp-block-list-item\">9 lives alone, far from the others<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\">The k-distance: a local radius<\/h3>\n<p class=\"wp-block-paragraph\">The k-distance is simply the largest distance among the k nearest neighbors.<\/p>\n<p class=\"wp-block-paragraph\">And this is actually <strong>the key point<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">Because this single number tells 
you something very concrete:<br \/><em>the local radius around the point.<\/em><\/p>\n<p class=\"wp-block-paragraph\">If the k-distance is small, the point is in a dense area.<br \/>If the k-distance is large, the point is in a sparse area.<\/p>\n<p class=\"wp-block-paragraph\">With just this one measure, you already have a first signal of \u201cisolation\u201d.<\/p>\n<p class=\"wp-block-paragraph\">Here, we use the idea of \u201ck nearest neighbors\u201d, which of course reminds us of <strong data-start=\"737\" data-end=\"745\">k-NN<\/strong> (the classifier or regressor).<br \/>The context here is different, but the calculation is exactly the same.<\/p>\n<p class=\"wp-block-paragraph\">And if you think of <strong data-start=\"872\" data-end=\"883\">k-means<\/strong>, do not mix them up:<br \/>the \u201ck\u201d in k-means has nothing to do with the \u201ck\u201d here.<\/p>\n<h3 class=\"wp-block-heading\">The k-distance calculation<\/h3>\n<p class=\"wp-block-paragraph\">For point <strong>1<\/strong>, the two nearest neighbors are <strong>2<\/strong> and <strong>3<\/strong> (distances 1 and 2), so <strong>k-distance(1) = 2<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">For point <strong>2<\/strong>, the neighbors are <strong>1<\/strong> and <strong>3<\/strong> (both at distance 1), so <strong>k-distance(2) = 1<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">For point <strong>3<\/strong>, the two nearest neighbors are <strong>1<\/strong> and <strong>2<\/strong> (distances 2 and 1), so <strong>k-distance(3) = 2<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">For point <strong>9<\/strong>, the neighbors are <strong>3<\/strong> and <strong>2<\/strong> (distances 6 and 7), so <strong>k-distance(9) = 7<\/strong>. 
This is huge compared to all the others.<\/p>\n<p class=\"wp-block-paragraph\">In Excel, we can build a pairwise distance matrix to get the k-distance for each point.<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/12\/image-131-1024x647.png\" alt=\"\" class=\"wp-image-635668\"\/><figcaption class=\"wp-element-caption\">LOF in Excel \u2013 image by author<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Step 2 \u2013 Reachability Distances<\/h2>\n<p class=\"wp-block-paragraph\">For this step, I will just define the calculations here and apply the formulas in Excel. Because, to be honest, I never succeeded in finding a truly intuitive way to explain the results.<\/p>\n<p class=\"wp-block-paragraph\">So, what is \u201creachability distance\u201d?<\/p>\n<p class=\"wp-block-paragraph\">For a point p and a neighbor o, we define the reachability distance as:<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>reach-dist(p, o) = max(k-dist(o), distance(p, o))<\/strong><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">Why take the maximum?<\/p>\n<p class=\"wp-block-paragraph\">The purpose of the reachability distance is <strong>to stabilize the density comparison<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">If the neighbor o lives in a very dense region (small k-dist), then we do not want to allow an unrealistically small distance, so the distance is never allowed to drop below k-dist(o).<\/p>\n<p class=\"wp-block-paragraph\">In particular, for point 2:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Distance to 1 = 1, but k-distance(1) = 2 \u2192 reach-dist(2, 1) = 2<\/li>\n<li class=\"wp-block-list-item\">Distance to 3 = 1, but k-distance(3) = 2 \u2192 reach-dist(2, 3) = 2<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Both neighbors push the 
reachability distance upward.<\/p>\n<p class=\"wp-block-paragraph\">In Excel, we will keep a matrix format to display the reachability distances: one point compared to all the others.<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/12\/image-134-1024x586.png\" alt=\"\" class=\"wp-image-635676\"\/><figcaption class=\"wp-element-caption\">LOF in Excel \u2013 image by author<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\">Average reachability distance<\/h3>\n<p class=\"wp-block-paragraph\">For each point, we can now compute the average value, which tells us: <strong>on average, how far do I need to travel to reach my local neighborhood?<\/strong><\/p>\n<p class=\"wp-block-paragraph\">And now, do you notice something? Point 2 has a larger average reachability distance (2) than points 1 and 3 (1.5 each).<\/p>\n<p class=\"wp-block-paragraph\">This is not that intuitive to me!<\/p>\n<h2 class=\"wp-block-heading\">Step 3 \u2013 LRD and the LOF Score<\/h2>\n<p class=\"wp-block-paragraph\">The final step is a kind of \u201cnormalization\u201d to obtain an anomaly score.<\/p>\n<p class=\"wp-block-paragraph\">First, we define the LRD, Local Reachability Density, which is simply the inverse of the average reachability distance.<\/p>\n<p class=\"wp-block-paragraph\">And the final LOF score is calculated as the average LRD of the neighbors of p divided by the LRD of p:<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/12\/image-124-1024x113.png\" alt=\"\" class=\"wp-image-635396\"\/><\/figure>\n<p class=\"wp-block-paragraph\">So, LOF compares the density of a point to the density of its neighbors.<\/p>\n<p class=\"wp-block-paragraph\">Interpretation:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">If LRD(p) \u2248 LRD(neighbors), then LOF \u2248 1<\/li>\n<li 
class=\"wp-block-list-item\">If LRD(p) is much <strong>smaller<\/strong>, then LOF &gt;&gt; 1. So p is in a sparse region.<\/li>\n<li class=\"wp-block-list-item\">If LRD(p) is much <strong>larger<\/strong>, then LOF &lt; 1. So p is in a very dense pocket.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">I also did a version with more intermediate steps and shorter formulas.<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/12\/image-138-1024x182.png\" alt=\"\" class=\"wp-image-635752\"\/><\/figure>\n<h2 class=\"wp-block-heading\">Understanding What \u201cAnomaly\u201d Means in Unsupervised Models<\/h2>\n<p class=\"wp-block-paragraph\">In <strong>unsupervised learning<\/strong>, there is no ground truth. And this is exactly where things can become tricky.<\/p>\n<p class=\"wp-block-paragraph\">We do not have labels.<br \/>We do not have the \u201ccorrect answer\u201d.<br \/>We only have the structure of the data.<\/p>\n<p class=\"wp-block-paragraph\">Take this tiny sample:<\/p>\n<p class=\"wp-block-paragraph\"><strong>1, 2, 3, 7, 8, 12<\/strong><br \/>(I also have the copyright on it.)<\/p>\n<p class=\"wp-block-paragraph\">If you look at it intuitively, which one feels like an anomaly?<\/p>\n<p class=\"wp-block-paragraph\">Personally, I would say <strong>12<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">Now let us look at the results. 
LOF says the outlier is <strong>7<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">(And you can notice that with the k-distance, we would say that it is <strong>12<\/strong>.)<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/12\/image-135-1024x340.png\" alt=\"\" class=\"wp-image-635677\"\/><figcaption class=\"wp-element-caption\">LOF in Excel \u2013 image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Now, we can compare <strong>Isolation Forest<\/strong> and <strong>LOF<\/strong> side by side.<\/p>\n<p class=\"wp-block-paragraph\">On the left, with the dataset <strong>1, 2, 3, 9<\/strong>, both methods agree:<br \/><strong>9<\/strong> is the clear outlier.<br \/>Isolation Forest gives it the lowest score,<br \/>and LOF gives it the highest LOF value.<\/p>\n<p class=\"wp-block-paragraph\">If we look closer, for Isolation Forest, 1, 2 and 3 have no difference in score. And LOF gives a higher score for 2. 
This is what we already noticed.<\/p>\n<p class=\"wp-block-paragraph\">With the dataset <strong>1, 2, 3, 7, 8, 12<\/strong>, the story changes.<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>Isolation Forest<\/strong> points to <strong>12<\/strong> as the most isolated point.<br \/>This matches the intuition: 12 is far from everyone.<\/li>\n<li class=\"wp-block-list-item\"><strong>LOF<\/strong>, however, highlights <strong>7<\/strong> instead.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/12\/image-137-1024x446.png\" alt=\"\" class=\"wp-image-635680\"\/><figcaption class=\"wp-element-caption\">LOF in Excel \u2013 image by author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">So who is right?<\/p>\n<p class=\"wp-block-paragraph\">It is difficult to say.<\/p>\n<p class=\"wp-block-paragraph\">In practice, we first need to agree with business teams on <strong>what \u201canomaly\u201d actually means<\/strong> in the context of our data.<\/p>\n<p class=\"wp-block-paragraph\">Because in unsupervised learning, there is no single truth.<\/p>\n<p class=\"wp-block-paragraph\">There is only the definition of \u201canomaly\u201d that each algorithm uses.<\/p>\n<p class=\"wp-block-paragraph\">This is why it is extremely important to understand<br \/><strong>how the algorithm works<\/strong>, and what kind of anomalies it is designed to detect.<\/p>\n<p class=\"wp-block-paragraph\">Only then can you decide whether LOF, or k-distance, or Isolation Forest is the right choice for your specific situation.<\/p>\n<p class=\"wp-block-paragraph\">And this is the whole message of unsupervised learning:<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><em>Completely 
different algorithms look at the data differently.<br \/>There is no \u201ctrue\u201d outlier.<br \/>Only the definition of what an outlier means for each model.<\/em><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">This is why understanding how the algorithm works<br \/>is more important than the final score it produces.<\/p>\n<h2 class=\"wp-block-heading\"><strong>LOF Is Not Really a Model<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">There is one more point to clarify about LOF.<\/p>\n<p class=\"wp-block-paragraph\">LOF does not learn a model in the usual sense.<\/p>\n<p class=\"wp-block-paragraph\">For example:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">k-means learns and stores centroids (means)<\/li>\n<li class=\"wp-block-list-item\">GMM learns and stores means and variances<\/li>\n<li class=\"wp-block-list-item\">decision trees learn and store rules<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">All of these produce a function that you can apply to new data.<\/p>\n<p class=\"wp-block-paragraph\">LOF does not produce such a function. It depends entirely on the neighborhood structure inside the dataset. If you add or remove a point, the neighborhood changes, the densities change, and the LOF values must be recalculated.<\/p>\n<p class=\"wp-block-paragraph\">Even if you keep the whole dataset, like k-NN does, you still cannot apply LOF safely to new inputs. 
The definition itself does not generalize.<\/p>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\">LOF and Isolation Forest both detect anomalies, but they look at the data through completely different lenses.<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>k-distance<\/strong> captures how far a point must travel to find its neighbors.<\/li>\n<li class=\"wp-block-list-item\"><strong>LOF<\/strong> compares local densities.<\/li>\n<li class=\"wp-block-list-item\"><strong>Isolation Forest<\/strong> isolates points using random splits.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">And even on <strong>very simple datasets<\/strong>, these methods can disagree.<br \/>One algorithm may flag a point as an outlier, while another highlights a completely different one.<\/p>\n<p class=\"wp-block-paragraph\">And this is the key message:<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">In unsupervised learning, there is no \u201ctrue\u201d outlier.<br \/>Each algorithm defines anomalies according to its own logic.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">This is why understanding <em>how<\/em> a method works is more important than the number it produces.<br \/>Only then can you choose the right algorithm for the right situation, and interpret the results with confidence.<\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Yesterday, we worked with Isolation Forest, which is an anomaly detection method. Today, we look at another algorithm that has the same objective. But unlike Isolation Forest, it does not build trees. It is called LOF, or Local Outlier Factor. 
People often summarize LOF with one sentence: Does this point [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":9587,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[6842,6839,697,2187,136,6843,113],"class_list":["post-9585","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-advent","tag-calendar","tag-day","tag-excel","tag-learning","tag-lof","tag-machine"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/9585","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9585"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/9585\/revisions"}],"predecessor-version":[{"id":9586,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/9585\/revisions\/9586"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/9587"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9585"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9585"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9585"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}