{"id":4172,"date":"2025-07-03T12:16:01","date_gmt":"2025-07-03T12:16:01","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=4172"},"modified":"2025-07-03T12:16:03","modified_gmt":"2025-07-03T12:16:03","slug":"taking-resnet-to-the-subsequent-degree","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=4172","title":{"rendered":"Taking ResNet to the Next\u00a0Level"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<h2 class=\"wp-block-heading\"\/>\n<p class=\"wp-block-paragraph\">If you read the title of this article, you probably assume that ResNeXt is directly derived from ResNet. Well, that\u2019s true, but I think it\u2019s not entirely accurate. In fact, to me ResNeXt is more like a combination of ResNet, VGG, and Inception all at once; I\u2019ll show you why in a moment. In this article we are going to talk about the ResNeXt architecture, covering its history, the details of the architecture itself, and, last but not least, an implementation from scratch in PyTorch.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\"\/>\n<h2 class=\"wp-block-heading\">The History of\u00a0ResNeXt<\/h2>\n<p class=\"wp-block-paragraph\">The hyperparameters we usually focus on when tuning a neural network model are depth and width, which correspond to the number of layers and the number of channels, respectively. We see this in VGG and ResNet, where the authors of the two models proposed small kernels and skip-connections so that the depth of the model can be increased easily. In theory, this simple approach is indeed capable of expanding model capacity. 
However, these two hyperparameter dimensions are always associated with a significant change in the number of parameters, which is definitely a problem since at some point the model becomes too large just to gain a slight improvement in accuracy. On the other hand, we know that Inception is in theory computationally cheaper, yet it has a complex architectural design, which requires more effort when tuning the depth and width of the network. If you have ever learned about Inception, it essentially works by passing a tensor through several convolution layers of different kernel sizes and letting the network decide which one best represents the features of a particular task.<\/p>\n<p class=\"wp-block-paragraph\">Xie <em>et al.<\/em> asked whether they could extract the best part of each of the three models so that model tuning could be as easy as in VGG and ResNet while still maintaining the efficiency of Inception. All their ideas are wrapped up in a paper titled \u201c<em>Aggregated Residual Transformations for Deep Neural Networks<\/em>\u201d [1], where they named the network <em>ResNeXt<\/em>. This is essentially where the new concept called <em>cardinality<\/em> came from: it adopts the idea of Inception, i.e., passing a tensor through multiple branches, but in a simpler, more scalable way. We can view cardinality as a new parameter that can be tuned in addition to depth and width. 
By doing so, we now essentially have the <em>next<\/em> hyperparameter dimension, hence the name <em>ResNeXt<\/em>, which gives us a higher degree of freedom when performing parameter tuning.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\"\/>\n<h2 class=\"wp-block-heading\">ResNeXt Module<\/h2>\n<p class=\"wp-block-paragraph\">According to the paper, there are three ways we can implement cardinality, which you can see in Figure 1 below. The paper also mentions that setting cardinality to 32 is best practice since it generally gives a good balance between accuracy and computational complexity, so I\u2019ll use this number in the following example.<\/p>\n<figure class=\"wp-block-image alignwide\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/06\/1SW3gCOVyZgT9OEIkdQoNbQ.png\" alt=\"\" class=\"wp-image-607230\"\/><figcaption class=\"wp-element-caption\">Figure 1. The three ResNeXt module variants\u00a0[1].<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The input of the three modules above is exactly the same, i.e., an image tensor having 256 channels. In variant (a), the input tensor is duplicated 32 times, where each copy is processed independently to represent the 32 paths. The first convolution layer in each path is responsible for projecting the 256-channel image down to 4 channels using a 1\u00d71 kernel, which is followed by two more layers: a 3\u00d73 convolution that preserves the number of channels, and a 1\u00d71 convolution that expands the channels back to 256. 
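<\/p>\n<p class=\"wp-block-paragraph\">To make the shapes concrete, a single path of variant (a) can be sketched in PyTorch roughly as follows (this is just my own illustration with plain convolutions; the layer arrangement and the 56\u00d756 input size are assumptions for the demo, not from the paper):<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import torch\nimport torch.nn as nn\n\n# one of the 32 paths in variant (a): 256 -> 4 -> 4 -> 256 channels\npath = nn.Sequential(\n    nn.Conv2d(256, 4, kernel_size=1),           # project 256 channels down to 4\n    nn.Conv2d(4, 4, kernel_size=3, padding=1),  # 3x3 conv preserving the channel count\n    nn.Conv2d(4, 256, kernel_size=1),           # expand back to 256 channels\n)\n\nx = torch.randn(1, 256, 56, 56)  # dummy 256-channel input\nprint(path(x).shape)             # torch.Size([1, 256, 56, 56])<\/code><\/pre>\n<p class=\"wp-block-paragraph\">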
The tensors from the 32 branches are then aggregated by element-wise summation before eventually being summed again, through a skip-connection, with the original input tensor from the very beginning of the module.<\/p>\n<p class=\"wp-block-paragraph\">Remember that Inception uses the idea of <em>split-transform-merge<\/em>. This is exactly what I just explained for ResNeXt block variant (a), where the <em>split<\/em> is done before the first 1\u00d71 convolution layer, the <em>transform<\/em> is performed inside each branch, and the <em>merge<\/em> is the element-wise summation. The same idea applies to ResNeXt module variant (b), in which case the <em>merge<\/em> operation is performed by channel-wise concatenation, resulting in a 128-channel image (which comes from 4 channels \u00d7 32 paths). The resulting tensor is then projected back to the original dimension by a 1\u00d71 convolution layer before eventually being summed with the original input tensor.<\/p>\n<p class=\"wp-block-paragraph\">Notice the word <em>equivalent<\/em> in the top-left corner of the figure above. It means that these three ResNeXt block variants are essentially the same in terms of the number of parameters, FLOPs, and the resulting accuracy scores. This makes sense since they are all derived from the same mathematical formulation, which I\u2019ll talk more about in the next section. Despite this equivalency, I\u2019ll go with option (c) in the implementation part, because this variant employs the so-called <em>group convolution<\/em>, which is much easier to implement than (a) and (b). 
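<\/p>\n<p class=\"wp-block-paragraph\">One way to convince yourself of this equivalence is to compare parameter counts: the 3\u00d73 grouped convolution used in (c) has exactly as many weights as the 32 separate 3\u00d73 convolutions in (a) and (b). A quick check (my own sketch, not from the paper):<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import torch.nn as nn\n\n# 3x3 grouped convolution as in variant (c): 128 channels split into 32 groups\ngrouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32, bias=False)\n\n# 32 independent 3x3 convolutions as in variants (a) and (b): 4 channels each\nbranches = [nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False) for _ in range(32)]\n\nn_grouped  = grouped.weight.numel()                   # 128 * 4 * 3 * 3 = 4608\nn_branches = sum(b.weight.numel() for b in branches)  # 32 * (4 * 4 * 3 * 3) = 4608\nprint(n_grouped, n_branches)  # 4608 4608<\/code><\/pre>\n<p class=\"wp-block-paragraph\">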
If you\u2019re not yet familiar with the term, group convolution is a technique in which we divide the input channels into several groups, where each group of kernels processes only the channels belonging to its own group before the results are eventually concatenated. In the case of (c), we reduce the number of channels from 256 to 128 before the split is done, which gives us 32 convolution kernel groups, each responsible for processing 4 channels. We then project the tensor back to the original number of channels so that we can sum it with the original input tensor.<\/p>\n<h3 class=\"wp-block-heading\">Mathematical Definition<\/h3>\n<p class=\"wp-block-paragraph\">As I mentioned earlier, here\u2019s what the formal mathematical definition of a ResNeXt module looks like.<\/p>\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/06\/1Utud9OlCdbKLpS33FzWT8A.png\" alt=\"\" class=\"wp-image-607228\"\/><figcaption class=\"wp-element-caption\">Figure 2. The mathematical expression of a ResNeXt module\u00a0[1].<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The above equation encapsulates the entire <em>split-transform-merge<\/em> operation, where <em>x<\/em> is the original input tensor, <em>y<\/em> is the output tensor, <em>C<\/em> is the cardinality parameter determining the number of parallel paths used, <em>T<\/em> is the transformation function applied to each path, and <em>\u2211<\/em> signifies that we merge the information from all the transformed tensors. However, it is important to note that even though sigma usually denotes summation, only variant (a) actually sums the tensors. 
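<\/p>\n<p class=\"wp-block-paragraph\">For reference, the aggregated transformation with the residual connection described above can be written out as follows (my own LaTeX rendering of the equation shown in Figure 2):<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-latex\">y = x + \\sum_{i=1}^{C} \\mathcal{T}_i(x)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">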
Meanwhile, both (b) and (c) do the merging through concatenation followed by a 1\u00d71 convolution instead, which is in fact still equivalent to (a).<\/p>\n<h3 class=\"wp-block-heading\">The Complete ResNeXt Architecture<\/h3>\n<p class=\"wp-block-paragraph\">The structure displayed in Figure 1 and the equation in Figure 2 only correspond to a single ResNeXt block. In order to construct the entire architecture, we need to stack the block multiple times following the structure shown in Figure 3 below.<\/p>\n<figure class=\"wp-block-image aligncenter is-resized\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/06\/1HR7IFUz9K3vBmeGf8BLyoQ.png\" alt=\"\" class=\"wp-image-607229\" style=\"width:490px;height:auto\"\/><figcaption class=\"wp-element-caption\">Figure 3. The ResNet-50 architecture and the ResNeXt-50 (32\u00d74d) counterpart [1].<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Here you can see that the structure of ResNeXt is quite similar to that of ResNet, so I believe you will find the ResNeXt implementation very straightforward, especially if you have ever implemented ResNet before. The first difference you might notice is the number of kernels of the first two convolution layers in each block, where the ResNeXt block generally has twice as many kernels as the corresponding ResNet block, from the <em>conv2<\/em> stage all the way to the <em>conv5<\/em> stage. Secondly, it is also clearly visible that we have the cardinality parameter applied to the second convolution layer in each ResNeXt block.<\/p>\n<p class=\"wp-block-paragraph\">The ResNeXt variant shown above, which is equivalent to ResNet-50, is the one called <em>ResNeXt-50 (32\u00d74d)<\/em>. 
This naming convention indicates that the variant consists of 50 layers in the main branch, with a cardinality of 32 and 4 channels in each path within the <em>conv2<\/em> stage. As of this writing, there are three ResNeXt variants already implemented in PyTorch, namely <em>resnext50_32x4d<\/em>, <em>resnext101_32x8d<\/em>, and <em>resnext101_64x4d<\/em> [2]. You can definitely import them easily along with their pretrained weights if you want to. However, in this article we are going to implement the architecture from scratch instead.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\"\/>\n<h2 class=\"wp-block-heading\">ResNeXt Implementation<\/h2>\n<p class=\"wp-block-paragraph\">Now that we understand the underlying concepts behind ResNeXt, let\u2019s get our hands dirty with the code! The first thing to do is import the required modules as shown in Codeblock 1 below.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 1\nimport torch\nimport torch.nn as nn\nfrom torchinfo import summary<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Here I am going to implement the <em>ResNeXt-50 (32\u00d74d)<\/em> variant, so I need to set the parameters in Codeblock 2 according to the architectural details shown back in Figure 3.\u00a0<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 2\nCARDINALITY  = 32              #(1)\nNUM_CHANNELS = [3, 64, 256, 512, 1024, 2048]  #(2)\nNUM_BLOCKS   = [3, 4, 6, 3]    #(3)\nNUM_CLASSES  = 1000            #(4)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The <code>CARDINALITY<\/code> variable at line <code>#(1)<\/code> is self-explanatory, so I don\u2019t think it needs any further explanation. 
Next, the <code>NUM_CHANNELS<\/code> variable stores the number of output channels of each stage, except for index 0, which corresponds to the number of input channels (<code>#(2)<\/code>). At line <code>#(3)<\/code>, <code>NUM_BLOCKS<\/code> determines how many times we repeat the corresponding block. Note that we don\u2019t specify any number for the <em>conv1<\/em> stage since that stage only consists of a single block. Lastly, we set the <code>NUM_CLASSES<\/code> parameter to 1000 since ResNeXt is originally pretrained on the ImageNet-1K dataset (<code>#(4)<\/code>).<\/p>\n<h3 class=\"wp-block-heading\">The ResNeXt\u00a0Module<\/h3>\n<p class=\"wp-block-paragraph\">Since the entire ResNeXt architecture is basically just a stack of ResNeXt modules, we can create a single class to define the module and then use it repeatedly in the main class. In this case, I refer to the module as <code>Block<\/code>. The implementation of this class is quite long, though, so I decided to break it down into several codeblocks. Just make sure that all codeblocks with the same number are placed within the same notebook cell if you want to run the code.<\/p>\n<p class=\"wp-block-paragraph\">You can see in Codeblock 3a below that the <code>__init__()<\/code> method of this class accepts several parameters. The <code>in_channels<\/code> parameter (<code>#(1)<\/code>) sets the number of channels of the tensor to be passed into the block. I made it adjustable because blocks in different stages work with different input shapes. Secondly, the <code>add_channel<\/code> and <code>downsample<\/code> parameters (<code>#(2,4)<\/code>) are flags that control whether the block performs downsampling. 
If you take a closer look at Figure 3, you\u2019ll notice that every time we move from one stage to another, the number of output channels of the block becomes twice as large as that of the previous stage, while at the same time the spatial dimension is reduced by half. We need to set both <code>add_channel<\/code> and <code>downsample<\/code> to <code>True<\/code> whenever we move from one stage to the next. Otherwise, we set the two parameters to <code>False<\/code> when we only move from one block to another within the same stage. The <code>channel_multiplier<\/code> parameter (<code>#(3)<\/code>), on the other hand, determines the number of output channels relative to the number of input channels by changing the multiplication factor. This parameter is important because there is one special case where the number of output channels needs to be four times larger instead of two, i.e., when we move from the <em>conv1<\/em> stage (64) to the <em>conv2<\/em> stage (256).<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 3a\nclass Block(nn.Module):\n    def __init__(self, \n                 in_channels,            #(1)\n                 add_channel=False,      #(2)\n                 channel_multiplier=2,   #(3)\n                 downsample=False):      #(4)\n        super().__init__()\n        \n        self.add_channel = add_channel\n        self.channel_multiplier = channel_multiplier\n        self.downsample = downsample\n        \n        if self.add_channel:             #(5)\n            out_channels = in_channels*self.channel_multiplier  #(6)\n        else:\n            out_channels = in_channels   #(7)\n        \n        mid_channels = out_channels\/\/2   #(8)\n        \n        if self.downsample:      #(9)\n            stride = 2           #(10)\n        else:\n            stride = 1<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The parameters we just discussed directly control the <code>if<\/code> statements at lines <code>#(5)<\/code> and <code>#(9)<\/code>. The former is executed whenever <code>add_channel<\/code> is <code>True<\/code>, in which case the number of input channels is multiplied by <code>channel_multiplier<\/code> to obtain the number of output channels (<code>#(6)<\/code>). Meanwhile, if it is <code>False<\/code>, we make the input and output tensor dimensions the same (<code>#(7)<\/code>). Here we set <code>mid_channels<\/code> to half the size of <code>out_channels<\/code> (<code>#(8)<\/code>), because according to Figure 3 the number of channels in the output of the first two convolution layers within each block is half that of the third convolution layer. Next, the <code>downsample<\/code> flag we defined earlier controls the <code>if<\/code> statement at line <code>#(9)<\/code>. Whenever it is set to <code>True<\/code>, the <code>stride<\/code> variable is assigned the value 2 (<code>#(10)<\/code>), which will later cause the convolution layer to reduce the spatial dimension of the image by half.<\/p>\n<p class=\"wp-block-paragraph\">Still inside the <code>__init__()<\/code> method, let\u2019s now define the layers within the ResNeXt block. 
See Codeblock 3b below for the details.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 3b\n        if self.add_channel or self.downsample:               #(1)\n            self.projection = nn.Conv2d(in_channels=in_channels,    #(2) \n                                        out_channels=out_channels, \n                                        kernel_size=1, \n                                        stride=stride, \n                                        padding=0, \n                                        bias=False)\n            nn.init.kaiming_normal_(self.projection.weight, nonlinearity='relu')\n            self.bn_proj = nn.BatchNorm2d(num_features=out_channels)\n        \n        self.conv0 = nn.Conv2d(in_channels=in_channels,       #(3)\n                               out_channels=mid_channels,     #(4)\n                               kernel_size=1, \n                               stride=1, \n                               padding=0, \n                               bias=False)\n        nn.init.kaiming_normal_(self.conv0.weight, nonlinearity='relu')\n        self.bn0 = nn.BatchNorm2d(num_features=mid_channels)\n        \n        self.conv1 = nn.Conv2d(in_channels=mid_channels,      #(5)\n                               out_channels=mid_channels, \n                               kernel_size=3, \n                               stride=stride,                 #(6)\n                               padding=1, \n                               bias=False, \n                               groups=CARDINALITY)            #(7)\n        nn.init.kaiming_normal_(self.conv1.weight, nonlinearity='relu')\n        self.bn1 = nn.BatchNorm2d(num_features=mid_channels)\n        \n        self.conv2 = nn.Conv2d(in_channels=mid_channels,      #(8)\n                               out_channels=out_channels,     #(9)\n                               kernel_size=1, \n                               stride=1, \n                               padding=0, \n                               bias=False)\n        nn.init.kaiming_normal_(self.conv2.weight, nonlinearity='relu')\n        self.bn2 = nn.BatchNorm2d(num_features=out_channels)\n        \n        self.relu = nn.ReLU()<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Remember that there are cases where the output dimension of a ResNeXt block differs from the input. In such a case, the element-wise summation at the last step cannot be performed (refer to Figure 1). This is why we need to initialize a <code>projection<\/code> layer whenever either the <code>add_channel<\/code> or <code>downsample<\/code> flag is <code>True<\/code> (<code>#(1)<\/code>). This <code>projection<\/code> layer (<code>#(2)<\/code>), which is a 1\u00d71 convolution, processes the tensor in the skip-connection so that its output shape matches the tensor processed by the main flow, allowing the two to be summed. Otherwise, if we want the ResNeXt module to preserve the tensor dimension, we set both flags to <code>False<\/code> so that the projection layer is not initialized, since we can then directly sum the skip-connection with the tensor from the main flow.<\/p>\n<p class=\"wp-block-paragraph\">The main flow of the ResNeXt module itself comprises three convolution layers, which I refer to as <code>conv0<\/code>, <code>conv1<\/code> and <code>conv2<\/code>, as written at lines <code>#(3)<\/code>, <code>#(5)<\/code> and <code>#(8)<\/code> respectively. If we take a closer look at these layers, we can see that both <code>conv0<\/code> and <code>conv2<\/code> are responsible for manipulating the number of channels. 
At lines <code>#(3)<\/code> and <code>#(4)<\/code>, we can see that <code>conv0<\/code> changes the number of image channels from <code>in_channels<\/code> to <code>mid_channels<\/code>, whereas <code>conv2<\/code> changes it from <code>mid_channels<\/code> to <code>out_channels<\/code> (<code>#(8-9)<\/code>). On the other hand, the <code>conv1<\/code> layer is responsible for controlling the spatial dimension through the <code>stride<\/code> parameter (<code>#(6)<\/code>), whose value is determined by the <code>downsample<\/code> flag we discussed earlier. Additionally, this <code>conv1<\/code> layer performs the entire <em>split-transform-merge<\/em> process through group convolution (<code>#(7)<\/code>), where the number of groups corresponds to the cardinality.<\/p>\n<p class=\"wp-block-paragraph\">Furthermore, here we initialize batch normalization layers named <code>bn_proj<\/code>, <code>bn0<\/code>, <code>bn1<\/code>, and <code>bn2<\/code>. Later in the <code>forward()<\/code> method, we are going to place them right after the corresponding convolution layers following the <em>Conv-BN-ReLU<\/em> structure, which is standard practice when constructing a CNN-based model. Not only that, notice that we also call <code>nn.init.kaiming_normal_()<\/code> after initializing each convolution layer. This is done so that the initial layer weights follow the Kaiming normal distribution, as mentioned in the paper.<\/p>\n<p class=\"wp-block-paragraph\">That was everything about the <code>__init__()<\/code> method; now we are going to move on to the <code>forward()<\/code> method to actually define the flow of the ResNeXt module. 
See Codeblock 3c below.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 3c\n    def forward(self, x):\n        print(f'original\t\t: {x.size()}')\n        \n        if self.add_channel or self.downsample:              #(1)\n            residual = self.bn_proj(self.projection(x))      #(2)\n            print(f'after projection\t: {residual.size()}')\n        else:\n            residual = x                                     #(3)\n            print(f'no projection\t\t: {residual.size()}')\n        \n        x = self.conv0(x)    #(4)\n        x = self.bn0(x)\n        x = self.relu(x)\n        print(f'after conv0-bn0-relu\t: {x.size()}')\n\n        x = self.conv1(x)\n        x = self.bn1(x)\n        x = self.relu(x)\n        print(f'after conv1-bn1-relu\t: {x.size()}')\n        \n        x = self.conv2(x)    #(5)\n        x = self.bn2(x)\n        print(f'after conv2-bn2\t\t: {x.size()}')\n        \n        x = x + residual\n        x = self.relu(x)     #(6)\n        print(f'after summation\t\t: {x.size()}')\n        \n        return x<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Here you can see that this method accepts <code>x<\/code> as the only input, which is basically a tensor produced by the previous ResNeXt block. The <code>if<\/code> statement at line <code>#(1)<\/code> checks whether we are about to perform downsampling. If so, the tensor in the skip-connection is passed through the <code>projection<\/code> layer and the corresponding batch normalization layer before eventually being stored in the <code>residual<\/code> variable (<code>#(2)<\/code>). But if downsampling is not performed, we simply set <code>residual<\/code> to be exactly the same as <code>x<\/code> (<code>#(3)<\/code>). 
Next, we process the main tensor <code>x<\/code> using the stack of convolution layers, starting from <code>conv0<\/code> (<code>#(4)<\/code>) all the way to <code>conv2<\/code> (<code>#(5)<\/code>). It is important to note that the <em>Conv-BN-ReLU<\/em> structure of the <code>conv2<\/code> layer is slightly different, where the ReLU activation function is applied after the element-wise summation is performed (<code>#(6)<\/code>).<\/p>\n<p class=\"wp-block-paragraph\">Now let\u2019s test the ResNeXt block we just created to find out whether we have implemented it correctly. There are three scenarios I am going to test here, namely when we move from one stage to another (setting both <code>add_channel<\/code> and <code>downsample<\/code> to <code>True<\/code>), when we move from one block to another within the same stage (both <code>add_channel<\/code> and <code>downsample<\/code> are <code>False<\/code>), and when we move from the <em>conv1<\/em> stage to the <em>conv2<\/em> stage (setting <code>downsample<\/code> to <code>False<\/code> and <code>add_channel<\/code> to <code>True<\/code> with a channel multiplier of 4).<\/p>\n<h3 class=\"wp-block-heading\">Test Case\u00a01<\/h3>\n<p class=\"wp-block-paragraph\">Codeblock 4 below demonstrates the first test case, where I simulate the first block of the <em>conv3<\/em> stage. If you go back to Figure 3, you will see that the output of the previous stage is a 256-channel image, so we need to set the <code>in_channels<\/code> parameter according to this number. Meanwhile, the output of the ResNeXt block in this stage has 512 channels with a 28\u00d728 spatial dimension. This tensor shape transformation is the reason we set both flags to <code>True<\/code>. 
Here we assume that the <code>x<\/code> tensor passed through the network is a dummy image produced by the <em>conv2<\/em> stage.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 4\nblock = Block(in_channels=256, add_channel=True, downsample=True)\nx = torch.randn(1, 256, 56, 56)\n\nout = block(x)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">And below is what the output looks like. It is visible at line <code>#(1)<\/code> that our <code>projection<\/code> layer successfully projected the tensor to 512\u00d728\u00d728, exactly matching the shape of the output tensor from the main flow (<code>#(4)<\/code>). The <code>conv0<\/code> layer at line <code>#(2)<\/code> does not alter the tensor dimension at all, since in this case our <code>in_channels<\/code> and <code>mid_channels<\/code> are the same. The actual spatial downsampling is performed by the <code>conv1<\/code> layer, where the image resolution is reduced from 56\u00d756 to 28\u00d728 (<code>#(3)<\/code>) thanks to the stride, which is set to 2 in this case. The process then continues with the <code>conv2<\/code> layer, which doubles the number of channels from 256 to 512 (<code>#(4)<\/code>). Finally, this tensor is element-wise summed with the projected skip-connection tensor (<code>#(5)<\/code>). 
And with that, we successfully transformed our tensor from 256\u00d756\u00d756 to 512\u00d728\u00d728.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\"># Codeblock 4 Output\noriginal             : torch.Size([1, 256, 56, 56])\nafter projection     : torch.Size([1, 512, 28, 28])    #(1)\nafter conv0-bn0-relu : torch.Size([1, 256, 56, 56])    #(2)\nafter conv1-bn1-relu : torch.Size([1, 256, 28, 28])    #(3)\nafter conv2-bn2      : torch.Size([1, 512, 28, 28])    #(4)\nafter summation      : torch.Size([1, 512, 28, 28])    #(5)<\/code><\/pre>\n<h3 class=\"wp-block-heading\">Test Case\u00a02<\/h3>\n<p class=\"wp-block-paragraph\">To demonstrate the second test case, here I simulate a block inside the <em>conv3<\/em> stage whose input is a tensor produced by the previous block within the same stage. In such a case, we want the input and output dimensions of this ResNeXt module to be the same, hence we need to set both <code>add_channel<\/code> and <code>downsample<\/code> to <code>False<\/code>. 
See Codeblock 5 and the resulting output below for the details.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 5\nblock = Block(in_channels=512, add_channel=False, downsample=False)\nx = torch.randn(1, 512, 28, 28)\n\nout = block(x)<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\"># Codeblock 5 Output\noriginal             : torch.Size([1, 512, 28, 28])\nno projection        : torch.Size([1, 512, 28, 28])    #(1)\nafter conv0-bn0-relu : torch.Size([1, 256, 28, 28])    #(2)\nafter conv1-bn1-relu : torch.Size([1, 256, 28, 28])\nafter conv2-bn2      : torch.Size([1, 512, 28, 28])    #(3)\nafter summation      : torch.Size([1, 512, 28, 28])<\/code><\/pre>\n<p class=\"wp-block-paragraph\">As I mentioned earlier, the projection layer is not used if the input tensor is not downsampled. This is the reason that at line <code>#(1)<\/code> the shape of our skip-connection tensor is unchanged. Next, our channel count is reduced to 256 by the <code>conv0<\/code> layer, since in this case <code>mid_channels<\/code> is half the size of <code>out_channels<\/code> (<code>#(2)<\/code>). We eventually expand this number of channels back to 512 using the <code>conv2<\/code> layer (<code>#(3)<\/code>). Additionally, this kind of structure is commonly known as a <em>bottleneck<\/em>, since it follows a <em>wide-narrow-wide<\/em> pattern, which was first introduced in the original ResNet paper [3].<\/p>\n<h3 class=\"wp-block-heading\">Test Case\u00a03<\/h3>\n<p class=\"wp-block-paragraph\">The third test is a special case, since we are about to simulate the first block in the <em>conv2<\/em> stage, where we need to set the <code>add_channel<\/code> flag to <code>True<\/code> and <code>downsample<\/code> to <code>False<\/code>. 
Right here we don\u2019t wish to carry out spatial downsampling within the convolution layer as a result of it&#8217;s already completed by a maxpooling layer. Moreover, you may also see in Determine 3 that the <em>conv1<\/em> stage returns a picture of 64 channels. Because of this purpose, we have to set the <code>channel_multiplier<\/code> parameter to 4 since we wish the following <em>conv2<\/em> stage to return 256 channels. See the main points within the Codeblock 6 beneath.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 6\nblock = Block(in_channels=64, add_channel=True, channel_multiplier=4, downsample=False)\nx = torch.randn(1, 64, 56, 56)\n\nout = block(x)<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\"># Codeblock 6 Output\nauthentic             : torch.Dimension([1, 64, 56, 56])\nafter projection     : torch.Dimension([1, 256, 56, 56])    #(1)\nafter conv0-bn0-relu : torch.Dimension([1, 128, 56, 56])    #(2)\nafter conv1-bn1-relu : torch.Dimension([1, 128, 56, 56])\nafter conv2-bn2      : torch.Dimension([1, 256, 56, 56])    #(3)\nafter summation      : torch.Dimension([1, 256, 56, 56])<\/code><\/pre>\n<p class=\"wp-block-paragraph\">It&#8217;s seen within the ensuing output above that the ResNeXt module routinely make the most of the <code>projection<\/code> layer, which on this case it efficiently transformed the 64\u00d756\u00d756 tensor into 256\u00d756\u00d756 (<code>#(1)<\/code>). Right here you may see that the variety of channels expanded to be 4 instances bigger whereas the spatial dimension remained the identical. Afterwards, we shrink the channel depend to 128 (<code>#(2)<\/code>) and develop it again to 256 (<code>#(3)<\/code>) to simulate the <em>bottleneck<\/em> mechanism. 
Thus, we can now perform summation between the tensor from the main flow and the one produced by the <code>projection<\/code> layer.<\/p>\n<p class=\"wp-block-paragraph\">At this point we have got our ResNeXt module working properly for all three cases. So, I believe this module is now ready to be assembled to actually construct the entire ResNeXt architecture.<\/p>\n<h3 class=\"wp-block-heading\">The Entire ResNeXt Architecture<\/h3>\n<p class=\"wp-block-paragraph\">Since the following ResNeXt class is pretty long, I break it down into two codeblocks to make things easier to follow. What we basically need to do in the <code>__init__()<\/code> method in Codeblock 7a is to initialize the ResNeXt modules using the <code>Block<\/code> class we created earlier. The way to implement the <em>conv3<\/em> (<code>#(9)<\/code>), <em>conv4<\/em> (<code>#(12)<\/code>) and <em>conv5<\/em> (<code>#(15)<\/code>) stages is pretty straightforward since all we need to do is initialize the blocks inside <code>nn.ModuleList<\/code>. Remember that the first block inside each stage is a downsampling block, while the rest of them aren&#8217;t meant to perform downsampling. 
For this reason, we need to initialize the first block manually by setting both the <code>add_channel<\/code> and <code>downsample<\/code> flags to <code>True<\/code> (<code>#(10,13,16)<\/code>), while the remaining blocks are initialized using loops which iterate according to the numbers stored in the <code>NUM_BLOCKS<\/code> list (<code>#(11,14,17)<\/code>).<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 7a\nclass ResNeXt(nn.Module):\n    def __init__(self):\n        super().__init__()\n        \n\n        # conv1 stage  #(1)\n        self.resnext_conv1 = nn.Conv2d(in_channels=NUM_CHANNELS[0],\n                                       out_channels=NUM_CHANNELS[1],\n                                       kernel_size=7,    #(2) \n                                       stride=2,         #(3)\n                                       padding=3, \n                                       bias=False)\n        nn.init.kaiming_normal_(self.resnext_conv1.weight, \n                                nonlinearity='relu')\n        self.resnext_bn1 = nn.BatchNorm2d(num_features=NUM_CHANNELS[1])\n        self.relu = nn.ReLU()\n        self.resnext_maxpool1 = nn.MaxPool2d(kernel_size=3,    #(4)\n                                             stride=2, \n                                             padding=1)\n        \n\n        # conv2 stage  #(5)\n        self.resnext_conv2 = nn.ModuleList([\n            Block(in_channels=NUM_CHANNELS[1],\n                  add_channel=True,       #(6)\n                  channel_multiplier=4,\n                  downsample=False)       #(7)\n        ])\n        for _ in range(NUM_BLOCKS[0]-1):  #(8)\n            self.resnext_conv2.append(Block(in_channels=NUM_CHANNELS[2]))\n            \n\n        # conv3 stage  #(9)\n        self.resnext_conv3 = nn.ModuleList([Block(in_channels=NUM_CHANNELS[2],  #(10)\n                                                  add_channel=True, \n                                                  downsample=True)])\n        for _ in range(NUM_BLOCKS[1]-1):    #(11)\n            self.resnext_conv3.append(Block(in_channels=NUM_CHANNELS[3]))\n            \n            \n        # conv4 stage  #(12)\n        self.resnext_conv4 = nn.ModuleList([Block(in_channels=NUM_CHANNELS[3],  #(13)\n                                                  add_channel=True, \n                                                  downsample=True)])\n        \n        for _ in range(NUM_BLOCKS[2]-1):    #(14)\n            self.resnext_conv4.append(Block(in_channels=NUM_CHANNELS[4]))\n            \n            \n        # conv5 stage  #(15)\n        self.resnext_conv5 = nn.ModuleList([Block(in_channels=NUM_CHANNELS[4],  #(16)\n                                                  add_channel=True, \n                                                  downsample=True)])\n        \n        for _ in range(NUM_BLOCKS[3]-1):    #(17)\n            self.resnext_conv5.append(Block(in_channels=NUM_CHANNELS[5]))\n \n       \n        self.avgpool = nn.AdaptiveAvgPool2d(output_size=(1,1))  #(18)\n\n        self.fc = nn.Linear(in_features=NUM_CHANNELS[5],        #(19)\n                            out_features=NUM_CLASSES)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">As we discussed earlier, the <em>conv2<\/em> stage (<code>#(5)<\/code>) is a bit special since the first block inside this stage does increase the number of channels, yet it doesn&#8217;t reduce the spatial dimension. This is essentially the reason that I set the <code>add_channel<\/code> parameter to <code>True<\/code> (<code>#(6)<\/code>) while the <code>downsample<\/code> parameter is set to <code>False<\/code> (<code>#(7)<\/code>). 
The initialization of the remaining blocks is the same as in the other stages we discussed earlier, where we can just do it with a simple loop (<code>#(8)<\/code>).<\/p>\n<p class=\"wp-block-paragraph\">The <em>conv1<\/em> stage (<code>#(1)<\/code>), on the other hand, doesn&#8217;t utilize the <code>Block<\/code> class since its structure is completely different from the other stages. According to Figure 3, this stage only comprises a single 7\u00d77 convolution layer (<code>#(2)<\/code>), which allows us to capture a larger context from the input image. The tensor produced by this layer will have half the spatial dimensions of the input thanks to the <code>stride<\/code> parameter, which is set to 2 (<code>#(3)<\/code>). Further downsampling is performed using a maxpooling layer with the same stride, which, again, reduces the spatial dimension by half (<code>#(4)<\/code>). In fact, this maxpooling layer should belong to the <em>conv2<\/em> stage instead, but in this implementation I put it outside the <code>nn.ModuleList<\/code> of that stage for the sake of simplicity.<\/p>\n<p class=\"wp-block-paragraph\">Lastly, we need to initialize a global average pooling layer (<code>#(18)<\/code>), which works by taking the average value of each channel in the tensor produced by the last convolution layer. By doing this, we are going to have a single number representing each channel. This tensor will then be connected to the output layer that produces <code>NUM_CLASSES<\/code> (1000) neurons (<code>#(19)<\/code>), in which every single one of them corresponds to a class in the dataset.<\/p>\n<p class=\"wp-block-paragraph\">Now take a look at Codeblock 7b below to see how I define the <code>forward()<\/code> method. 
I think there is not much I need to explain here since all we basically do is pass the tensor from one layer to the next sequentially.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 7b\n    def forward(self, x):\n        print(f'original\\t\\t: {x.size()}')\n        \n        x = self.relu(self.resnext_bn1(self.resnext_conv1(x)))\n        print(f'after resnext_conv1\\t: {x.size()}')\n        \n        x = self.resnext_maxpool1(x)\n        print(f'after resnext_maxpool1\\t: {x.size()}')\n        \n        for i, block in enumerate(self.resnext_conv2):\n            x = block(x)\n            print(f'after resnext_conv2 #{i}\\t: {x.size()}')\n            \n        for i, block in enumerate(self.resnext_conv3):\n            x = block(x)\n            print(f'after resnext_conv3 #{i}\\t: {x.size()}')\n            \n        for i, block in enumerate(self.resnext_conv4):\n            x = block(x)\n            print(f'after resnext_conv4 #{i}\\t: {x.size()}')\n            \n        for i, block in enumerate(self.resnext_conv5):\n            x = block(x)\n            print(f'after resnext_conv5 #{i}\\t: {x.size()}')\n        \n        x = self.avgpool(x)\n        print(f'after avgpool\\t\\t: {x.size()}')\n        \n        x = torch.flatten(x, start_dim=1)\n        print(f'after flatten\\t\\t: {x.size()}')\n        \n        x = self.fc(x)\n        print(f'after fc\\t\\t: {x.size()}')\n        \n        return x<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Next, let&#8217;s test our ResNeXt class using the following code. 
Here I am going to test it by passing a dummy tensor of size 3\u00d7224\u00d7224, which simulates a single RGB image of size 224\u00d7224.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 8\nresnext = ResNeXt()\nx = torch.randn(1, 3, 224, 224)\n\nout = resnext(x)<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\"># Codeblock 8 Output\noriginal               : torch.Size([1, 3, 224, 224])\nafter resnext_conv1    : torch.Size([1, 64, 112, 112])  #(1)\nafter resnext_maxpool1 : torch.Size([1, 64, 56, 56])    #(2)\nafter resnext_conv2 #0 : torch.Size([1, 256, 56, 56])   #(3)\nafter resnext_conv2 #1 : torch.Size([1, 256, 56, 56])   #(4)\nafter resnext_conv2 #2 : torch.Size([1, 256, 56, 56])   #(5)\nafter resnext_conv3 #0 : torch.Size([1, 512, 28, 28])\nafter resnext_conv3 #1 : torch.Size([1, 512, 28, 28])\nafter resnext_conv3 #2 : torch.Size([1, 512, 28, 28])\nafter resnext_conv3 #3 : torch.Size([1, 512, 28, 28])\nafter resnext_conv4 #0 : torch.Size([1, 1024, 14, 14])\nafter resnext_conv4 #1 : torch.Size([1, 1024, 14, 14])\nafter resnext_conv4 #2 : torch.Size([1, 1024, 14, 14])\nafter resnext_conv4 #3 : torch.Size([1, 1024, 14, 14])\nafter resnext_conv4 #4 : torch.Size([1, 1024, 14, 14])\nafter resnext_conv4 #5 : torch.Size([1, 1024, 14, 14])\nafter resnext_conv5 #0 : torch.Size([1, 2048, 7, 7])\nafter resnext_conv5 #1 : torch.Size([1, 2048, 7, 7])\nafter resnext_conv5 #2 : torch.Size([1, 2048, 7, 7])\nafter avgpool          : torch.Size([1, 2048, 1, 1])    #(6)\nafter flatten          : torch.Size([1, 2048])          #(7)\nafter fc               : torch.Size([1, 1000])          #(8)<\/code><\/pre>\n<p class=\"wp-block-paragraph\">We can see in the above output that our <em>conv1<\/em> stage correctly reduces the spatial dimension from 224\u00d7224 to 
112\u00d7112, while at the same time also increasing the number of channels to 64 (<code>#(1)<\/code>). The downsampling continues in the maxpooling layer, which reduces the spatial dimension of the image to 56\u00d756 (<code>#(2)<\/code>). Moving on to the <em>conv2<\/em> stage, we can see that the first block in the stage successfully converted the 64-channel image into 256 channels (<code>#(3)<\/code>), while the subsequent blocks in the same stage preserve the dimensions of this tensor (<code>#(4\u20135)<\/code>). The same thing is done by the following stages until we reach the global average pooling layer (<code>#(6)<\/code>). It is important to note that we need to perform tensor flattening (<code>#(7)<\/code>) to drop the empty axes before eventually connecting it to the output layer (<code>#(8)<\/code>). And that concludes how a tensor flows through the ResNeXt architecture.<\/p>\n<p class=\"wp-block-paragraph\">Additionally, you can use the <code>summary()<\/code> function that we previously loaded from <code>torchinfo<\/code> if you want to dig even deeper into the architectural details. You can see at the end of the output below that we got 25,028,904 parameters in total. In fact, this number of params matches exactly the one belonging to the <em>ResNeXt-50 32x4d<\/em> model from PyTorch, so I believe our implementation here is correct. 
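If <code>torchvision<\/code> is installed, this check can also be scripted; the snippet below is a standalone sketch that builds the reference model with random weights (no download needed) and counts its parameters:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import torchvision.models as models\n\n# Build torchvision's reference ResNeXt-50 32x4d (random weights)\n# and count its parameters for comparison with our implementation.\nreference = models.resnext50_32x4d(weights=None)\ntotal = sum(p.numel() for p in reference.parameters())\nprint(total)  # 25028904<\/code><\/pre>\n<p class=\"wp-block-paragraph\">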
You can verify this in the link at reference number [4].<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Codeblock 9\nresnext = ResNeXt()\nsummary(resnext, input_size=(1, 3, 224, 224))<\/code><\/pre>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-\"># Codeblock 9 Output\n==========================================================================================\nLayer (type:depth-idx)                   Output Shape              Param #\n==========================================================================================\nResNeXt                                  [1000]                    --\n\u251c\u2500Conv2d: 1-1                            [1, 64, 112, 112]         9,408\n\u251c\u2500BatchNorm2d: 1-2                       [1, 64, 112, 112]         128\n\u251c\u2500ReLU: 1-3                              [1, 64, 112, 112]         --\n\u251c\u2500MaxPool2d: 1-4                         [1, 64, 56, 56]           --\n\u251c\u2500ModuleList: 1-5                        --                        --\n\u2502    \u2514\u2500Block: 2-1                        [1, 256, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-1                  [1, 256, 56, 56]          16,384\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-2             [1, 256, 56, 56]          512\n\u2502    \u2502    \u2514\u2500Conv2d: 3-3                  [1, 128, 56, 56]          8,192\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-4             [1, 128, 56, 56]          256\n\u2502    \u2502    \u2514\u2500ReLU: 3-5                    [1, 128, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-6                  [1, 128, 56, 56]          4,608\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-7             [1, 128, 56, 56]          256\n\u2502    \u2502    \u2514\u2500ReLU: 3-8                    [1, 128, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-9                  [1, 256, 56, 56]          
32,768\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-10            [1, 256, 56, 56]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-11                   [1, 256, 56, 56]          --\n\u2502    \u2514\u2500Block: 2-2                        [1, 256, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-12                 [1, 128, 56, 56]          32,768\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-13            [1, 128, 56, 56]          256\n\u2502    \u2502    \u2514\u2500ReLU: 3-14                   [1, 128, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-15                 [1, 128, 56, 56]          4,608\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-16            [1, 128, 56, 56]          256\n\u2502    \u2502    \u2514\u2500ReLU: 3-17                   [1, 128, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-18                 [1, 256, 56, 56]          32,768\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-19            [1, 256, 56, 56]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-20                   [1, 256, 56, 56]          --\n\u2502    \u2514\u2500Block: 2-3                        [1, 256, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-21                 [1, 128, 56, 56]          32,768\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-22            [1, 128, 56, 56]          256\n\u2502    \u2502    \u2514\u2500ReLU: 3-23                   [1, 128, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-24                 [1, 128, 56, 56]          4,608\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-25            [1, 128, 56, 56]          256\n\u2502    \u2502    \u2514\u2500ReLU: 3-26                   [1, 128, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-27                 [1, 256, 56, 56]          32,768\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-28            [1, 256, 56, 56]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-29                   
[1, 256, 56, 56]          --\n\u251c\u2500ModuleList: 1-6                        --                        --\n\u2502    \u2514\u2500Block: 2-4                        [1, 512, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-30                 [1, 512, 28, 28]          131,072\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-31            [1, 512, 28, 28]          1,024\n\u2502    \u2502    \u2514\u2500Conv2d: 3-32                 [1, 256, 56, 56]          65,536\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-33            [1, 256, 56, 56]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-34                   [1, 256, 56, 56]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-35                 [1, 256, 28, 28]          18,432\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-36            [1, 256, 28, 28]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-37                   [1, 256, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-38                 [1, 512, 28, 28]          131,072\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-39            [1, 512, 28, 28]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-40                   [1, 512, 28, 28]          --\n\u2502    \u2514\u2500Block: 2-5                        [1, 512, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-41                 [1, 256, 28, 28]          131,072\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-42            [1, 256, 28, 28]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-43                   [1, 256, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-44                 [1, 256, 28, 28]          18,432\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-45            [1, 256, 28, 28]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-46                   [1, 256, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-47                 [1, 512, 28, 28]          131,072\n\u2502    \u2502    
\u2514\u2500BatchNorm2d: 3-48            [1, 512, 28, 28]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-49                   [1, 512, 28, 28]          --\n\u2502    \u2514\u2500Block: 2-6                        [1, 512, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-50                 [1, 256, 28, 28]          131,072\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-51            [1, 256, 28, 28]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-52                   [1, 256, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-53                 [1, 256, 28, 28]          18,432\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-54            [1, 256, 28, 28]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-55                   [1, 256, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-56                 [1, 512, 28, 28]          131,072\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-57            [1, 512, 28, 28]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-58                   [1, 512, 28, 28]          --\n\u2502    \u2514\u2500Block: 2-7                        [1, 512, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-59                 [1, 256, 28, 28]          131,072\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-60            [1, 256, 28, 28]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-61                   [1, 256, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-62                 [1, 256, 28, 28]          18,432\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-63            [1, 256, 28, 28]          512\n\u2502    \u2502    \u2514\u2500ReLU: 3-64                   [1, 256, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-65                 [1, 512, 28, 28]          131,072\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-66            [1, 512, 28, 28]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-67                   [1, 512, 28, 28] 
         --\n\u251c\u2500ModuleList: 1-7                        --                        --\n\u2502    \u2514\u2500Block: 2-8                        [1, 1024, 14, 14]         --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-68                 [1, 1024, 14, 14]         524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-69            [1, 1024, 14, 14]         2,048\n\u2502    \u2502    \u2514\u2500Conv2d: 3-70                 [1, 512, 28, 28]          262,144\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-71            [1, 512, 28, 28]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-72                   [1, 512, 28, 28]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-73                 [1, 512, 14, 14]          73,728\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-74            [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-75                   [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-76                 [1, 1024, 14, 14]         524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-77            [1, 1024, 14, 14]         2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-78                   [1, 1024, 14, 14]         --\n\u2502    \u2514\u2500Block: 2-9                        [1, 1024, 14, 14]         --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-79                 [1, 512, 14, 14]          524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-80            [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-81                   [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-82                 [1, 512, 14, 14]          73,728\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-83            [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-84                   [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-85                 [1, 1024, 14, 14]         524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 
3-86            [1, 1024, 14, 14]         2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-87                   [1, 1024, 14, 14]         --\n\u2502    \u2514\u2500Block: 2-10                       [1, 1024, 14, 14]         --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-88                 [1, 512, 14, 14]          524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-89            [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-90                   [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-91                 [1, 512, 14, 14]          73,728\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-92            [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-93                   [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-94                 [1, 1024, 14, 14]         524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-95            [1, 1024, 14, 14]         2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-96                   [1, 1024, 14, 14]         --\n\u2502    \u2514\u2500Block: 2-11                       [1, 1024, 14, 14]         --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-97                 [1, 512, 14, 14]          524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-98            [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-99                   [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-100                [1, 512, 14, 14]          73,728\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-101           [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-102                  [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-103                [1, 1024, 14, 14]         524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-104           [1, 1024, 14, 14]         2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-105                  [1, 1024, 14, 14]         
--\n\u2502    \u2514\u2500Block: 2-12                       [1, 1024, 14, 14]         --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-106                [1, 512, 14, 14]          524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-107           [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-108                  [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-109                [1, 512, 14, 14]          73,728\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-110           [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-111                  [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-112                [1, 1024, 14, 14]         524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-113           [1, 1024, 14, 14]         2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-114                  [1, 1024, 14, 14]         --\n\u2502    \u2514\u2500Block: 2-13                       [1, 1024, 14, 14]         --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-115                [1, 512, 14, 14]          524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-116           [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-117                  [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-118                [1, 512, 14, 14]          73,728\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-119           [1, 512, 14, 14]          1,024\n\u2502    \u2502    \u2514\u2500ReLU: 3-120                  [1, 512, 14, 14]          --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-121                [1, 1024, 14, 14]         524,288\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-122           [1, 1024, 14, 14]         2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-123                  [1, 1024, 14, 14]         --\n\u251c\u2500ModuleList: 1-8                        --                        --\n\u2502    \u2514\u2500Block: 2-14                       
[1, 2048, 7, 7]           --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-124                [1, 2048, 7, 7]           2,097,152\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-125           [1, 2048, 7, 7]           4,096\n\u2502    \u2502    \u2514\u2500Conv2d: 3-126                [1, 1024, 14, 14]         1,048,576\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-127           [1, 1024, 14, 14]         2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-128                  [1, 1024, 14, 14]         --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-129                [1, 1024, 7, 7]           294,912\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-130           [1, 1024, 7, 7]           2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-131                  [1, 1024, 7, 7]           --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-132                [1, 2048, 7, 7]           2,097,152\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-133           [1, 2048, 7, 7]           4,096\n\u2502    \u2502    \u2514\u2500ReLU: 3-134                  [1, 2048, 7, 7]           --\n\u2502    \u2514\u2500Block: 2-15                       [1, 2048, 7, 7]           --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-135                [1, 1024, 7, 7]           2,097,152\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-136           [1, 1024, 7, 7]           2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-137                  [1, 1024, 7, 7]           --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-138                [1, 1024, 7, 7]           294,912\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-139           [1, 1024, 7, 7]           2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-140                  [1, 1024, 7, 7]           --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-141                [1, 2048, 7, 7]           2,097,152\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-142           [1, 2048, 7, 7]           4,096\n\u2502    \u2502    \u2514\u2500ReLU: 3-143                  [1, 2048, 7, 7]           
--\n\u2502    \u2514\u2500Block: 2-16                       [1, 2048, 7, 7]           --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-144                [1, 1024, 7, 7]           2,097,152\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-145           [1, 1024, 7, 7]           2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-146                  [1, 1024, 7, 7]           --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-147                [1, 1024, 7, 7]           294,912\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-148           [1, 1024, 7, 7]           2,048\n\u2502    \u2502    \u2514\u2500ReLU: 3-149                  [1, 1024, 7, 7]           --\n\u2502    \u2502    \u2514\u2500Conv2d: 3-150                [1, 2048, 7, 7]           2,097,152\n\u2502    \u2502    \u2514\u2500BatchNorm2d: 3-151           [1, 2048, 7, 7]           4,096\n\u2502    \u2502    \u2514\u2500ReLU: 3-152                  [1, 2048, 7, 7]           --\n\u251c\u2500AdaptiveAvgPool2d: 1-9                 [1, 2048, 1, 1]           --\n\u251c\u2500Linear: 1-10                           [1, 1000]                 2,049,000\n==========================================================================================\nTotal params: 25,028,904\nTrainable params: 25,028,904\nNon-trainable params: 0\nTotal mult-adds (Units.GIGABYTES): 6.28\n==========================================================================================\nInput size (MB): 0.60\nForward\/backward pass size (MB): 230.42\nParams size (MB): 100.12\nEstimated Total Size (MB): 331.13\n==========================================================================================<\/code><\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\"\/>\n<h2 class=\"wp-block-heading\">Ending<\/h2>\n<p class=\"wp-block-paragraph\">I think that&#8217;s everything about ResNeXt and its implementation. 
You can also find the entire code used in this article in my GitHub repo [5].\u00a0<\/p>\n<p class=\"wp-block-paragraph\">I hope you learned something new today, and thank you very much for reading! See you in my next article.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\"\/>\n<h2 class=\"wp-block-heading\">References<\/h2>\n<p class=\"wp-block-paragraph\">[1] Saining Xie <em>et al.<\/em> Aggregated Residual Transformations for Deep Neural Networks. Arxiv. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1611.05431\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/arxiv.org\/abs\/1611.05431<\/a> [Accessed March 1, 2025].<\/p>\n<p class=\"wp-block-paragraph\">[2] ResNeXt. PyTorch. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/pytorch.org\/vision\/main\/models\/resnext.html\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/pytorch.org\/vision\/main\/models\/resnext.html<\/a> [Accessed March 1, 2025].<\/p>\n<p class=\"wp-block-paragraph\">[3] Kaiming He <em>et al.<\/em> Deep Residual Learning for Image Recognition. Arxiv. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1512.03385\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/arxiv.org\/abs\/1512.03385<\/a> [Accessed March 1, 2025].<\/p>\n<p class=\"wp-block-paragraph\">[4] resnext50_32x4d. PyTorch. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/pytorch.org\/vision\/main\/models\/generated\/torchvision.models.resnext50_32x4d.html#torchvision.models.resnext50_32x4d\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/pytorch.org\/vision\/main\/models\/generated\/torchvision.models.resnext50_32x4d.html#torchvision.models.resnext50_32x4d<\/a> [Accessed March 1, 2025].<\/p>\n<p class=\"wp-block-paragraph\">[5] MuhammadArdiPutra. Taking ResNet to the NeXt Level\u200a\u2014\u200aResNeXt. GitHub. 
<a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/MuhammadArdiPutra\/medium_articles\/blob\/main\/Taking%20ResNet%20to%20the%20NeXt%20Level%20-%20ResNeXt.ipynb\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/github.com\/MuhammadArdiPutra\/medium_articles\/blob\/principal\/Takingpercent20ResNetpercent20topercent20thepercent20NeXtpercent20Levelpercent20-%20ResNeXt.ipynb<\/a> [Accessed April 7, 2025].<\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>In case you learn the title of this text, you would possibly most likely assume that ResNeXt is immediately derived from ResNet. Properly, that\u2019s true, however I believe it\u2019s not completely correct. In truth, to me ResNeXt is type of like the mixture of ResNet, VGG, and Inception on the similar time\u200a\u2014\u200aI\u2019ll present you the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":4174,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[3776,3775],"class_list":["post-4172","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-nextlevel","tag-resnet"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/4172","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4172"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/4172\/revisions"}],"predecessor-version":[{"id":4173,"href":"https:\/\/techtrendfeed.com\/index.php?res
t_route=\/wp\/v2\/posts\/4172\/revisions\/4173"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/4174"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}