{"id":6264,"date":"2025-09-03T02:37:15","date_gmt":"2025-09-03T02:37:15","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=6264"},"modified":"2025-09-03T02:37:15","modified_gmt":"2025-09-03T02:37:15","slug":"practice-and-deploy-fashions-on-amazon-sagemaker-hyperpod-utilizing-the-brand-new-hyperpod-cli-and-sdk","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=6264","title":{"rendered":"Train and deploy models on Amazon SageMaker HyperPod using the new HyperPod CLI and SDK"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div id=\"\">\n<p>Training and deploying large AI models requires advanced distributed computing capabilities, but managing these distributed systems shouldn\u2019t be complex for data scientists and machine learning (ML) practitioners. The newly released command line interface (CLI) and software development kit (SDK) for <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/sagemaker\/ai\/hyperpod\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker HyperPod<\/a> simplify how you can use the service\u2019s distributed training and inference capabilities.<\/p>\n<p>The SageMaker HyperPod CLI provides data scientists with an intuitive command-line experience, abstracting away the underlying complexity of distributed systems. Built on top of the SageMaker HyperPod SDK, the CLI offers straightforward commands for common workflows like launching training or fine-tuning jobs, deploying inference endpoints, and monitoring cluster performance. This makes it ideal for quick experimentation and iteration.<\/p>\n<p>For more advanced use cases requiring fine-grained control, the SageMaker HyperPod SDK enables programmatic access to customize your ML workflows. 
Developers can use the SDK\u2019s Python interface to precisely configure training and deployment parameters while maintaining the simplicity of working with familiar Python objects.<\/p>\n<p>In this post, we demonstrate how to use both the CLI and SDK to train and deploy large language models (LLMs) on SageMaker HyperPod. We walk through practical examples of distributed training using Fully Sharded Data Parallel (FSDP) and model deployment for inference, showcasing how these tools streamline the development of production-ready generative AI applications.<\/p>\n<h2>Prerequisites<\/h2>\n<p>To follow the examples in this post, you must have the following prerequisites:<\/p>\n<p>Because the use cases that we demonstrate are about training and deploying LLMs with the SageMaker HyperPod CLI and SDK, you must also install the following Kubernetes operators in the cluster:<\/p>\n<h2>Install the SageMaker HyperPod CLI<\/h2>\n<p>First, you must install the latest version of the SageMaker HyperPod CLI and SDK (the examples in this post are based on version 3.1.0). From the local environment, run the following command (you can also install in a Python virtual environment):<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\"># Install the HyperPod CLI and SDK\npip install sagemaker-hyperpod<\/code><\/pre>\n<\/p><\/div>\n<p>This command sets up the tools needed to interact with SageMaker HyperPod clusters. For an existing installation, make sure you have the latest version of the package installed (<code>sagemaker-hyperpod&gt;=3.1.0<\/code>) to be able to use the relevant set of features. 
To verify that the CLI is installed correctly, you can run the <code>hyp<\/code> command and check the output:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\"># Check if the HyperPod CLI is correctly installed\nhyp<\/code><\/pre>\n<\/p><\/div>\n<p>The output will be similar to the following, and includes instructions on how to use the CLI:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-powershell\">Usage: hyp [OPTIONS] COMMAND [ARGS]...\n\nOptions:\n  --help  Show this message and exit.\n\nCommands:\n  create               Create endpoints or pytorch jobs.\n  delete               Delete endpoints or pytorch jobs.\n  describe             Describe endpoints or pytorch jobs.\n  get-cluster-context  Get context related to the current set cluster.\n  get-logs             Get pod logs for endpoints or pytorch jobs.\n  get-monitoring       Get monitoring configurations for Hyperpod cluster.\n  get-operator-logs    Get operator logs for endpoints.\n  invoke               Invoke model endpoints.\n  list                 List endpoints or pytorch jobs.\n  list-cluster         List SageMaker Hyperpod Clusters with metadata.\n  list-pods            List pods for endpoints or pytorch jobs.\n  set-cluster-context  Connect to a HyperPod EKS cluster.<\/code><\/pre>\n<\/p><\/div>\n<p>For more information on CLI usage and the available commands and respective parameters, refer to the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/sagemaker-hyperpod-cli.readthedocs.io\/en\/latest\/cli\/cli_index.html\" target=\"_blank\" rel=\"noopener noreferrer\">CLI reference documentation<\/a>.<\/p>\n<h3>Set the cluster context<\/h3>\n<p>The SageMaker HyperPod CLI and SDK use the Kubernetes API to interact with the cluster. Therefore, make sure the underlying Kubernetes Python client is configured to execute API calls against your cluster by setting the cluster context.<\/p>\n<p>Use the CLI to list the clusters available in your AWS account:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-css\"># List all HyperPod clusters in your AWS account\nhyp list-cluster\n[\n    {\n        \"Cluster\": \"ml-cluster\",\n        \"Instances\": [\n            {\n                \"InstanceType\": \"ml.g5.8xlarge\",\n                \"TotalNodes\": 8,\n                \"AcceleratorDevicesAvailable\": 8,\n                \"NodeHealthStatus=Schedulable\": 8,\n                \"DeepHealthCheckStatus=Passed\": \"N\/A\"\n            },\n            {\n                \"InstanceType\": \"ml.m5.12xlarge\",\n                \"TotalNodes\": 1,\n                \"AcceleratorDevicesAvailable\": \"N\/A\",\n                \"NodeHealthStatus=Schedulable\": 1,\n                \"DeepHealthCheckStatus=Passed\": \"N\/A\"\n           
 }\n        ]\n    }\n]<\/code><\/pre>\n<\/p><\/div>\n<p>Set the cluster context, specifying the cluster name as input (in our case, we use <code>ml-cluster<\/code> as &lt;cluster_name&gt;):<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-powershell\"># Set the cluster context for subsequent commands\nhyp set-cluster-context --cluster-name &lt;cluster_name&gt;<\/code><\/pre>\n<\/p><\/div>\n<h2>Train models with the SageMaker HyperPod CLI and SDK<\/h2>\n<p>The SageMaker HyperPod CLI provides a straightforward way to submit PyTorch model training and fine-tuning jobs to a SageMaker HyperPod cluster. In the following example, we schedule a Meta Llama 3.1 8B model training job with FSDP.<\/p>\n<p>The CLI executes training using the <code>HyperPodPyTorchJob<\/code> Kubernetes custom resource, which is implemented by the HyperPod training operator. The operator must be installed in the cluster, as discussed in the prerequisites section.<\/p>\n<p>First, clone the <code>awsome-distributed-training<\/code> repository and create the Docker image that you will use for the training job:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">cd ~\ngit clone https:\/\/github.com\/aws-samples\/awsome-distributed-training\/\ncd awsome-distributed-training\/3.test_cases\/pytorch\/FSDP<\/code><\/pre>\n<\/p><\/div>\n<p>Then, log in to the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/ecr\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Elastic Container Registry<\/a> (Amazon ECR) to pull the base image and build the new container:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-css\">export AWS_REGION=$(aws ec2 describe-availability-zones --output text --query 'AvailabilityZones[0].[RegionName]')\nexport ACCOUNT=$(aws sts get-caller-identity --query Account --output text)\nexport REGISTRY=${ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com\/\ndocker build -f Dockerfile -t ${REGISTRY}fsdp:pytorch2.7.1 .<\/code><\/pre>\n<\/p><\/div>\n<p>The Dockerfile in the <code>awsome-distributed-training<\/code> repository referenced in the preceding code already contains the HyperPod elastic agent, which orchestrates the lifecycles of training workers on each container and communicates with the HyperPod training operator. If you\u2019re using a different Dockerfile, install the HyperPod elastic agent following the instructions in <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-eks-operator-install.html#sagemaker-eks-operator-elastic-agent\" target=\"_blank\" rel=\"noopener noreferrer\">HyperPod elastic agent<\/a>.<\/p>\n<p>Next, create a new repository for your training image if needed and push the built image to it:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-css\"># Create repository if needed\nREGISTRY_COUNT=$(aws ecr describe-repositories | grep \"fsdp\" | wc -l)\nif [ \"$REGISTRY_COUNT\" -eq 0 ]; then\n    aws ecr create-repository --repository-name fsdp\nfi\n\n# Login to registry\necho \"Logging in to $REGISTRY ...\"\naws ecr get-login-password | docker login --username AWS --password-stdin $REGISTRY\n\n# Push image to registry\ndocker image push ${REGISTRY}fsdp:pytorch2.7.1<\/code><\/pre>\n<\/p><\/div>\n<p>After you have successfully created the Docker image, you can submit the training job using the SageMaker HyperPod CLI.<\/p>\n<p>Internally, the SageMaker HyperPod CLI uses the Kubernetes Python client to build a <code>HyperPodPyTorchJob<\/code> custom resource and then create it on the Kubernetes cluster.<\/p>\n<p>You can modify the CLI command for other Meta Llama configurations by changing the <code>--args<\/code> to the desired 
arguments and values; examples can be found in the Kubernetes manifests in the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/aws-samples\/awsome-distributed-training\/tree\/a7a5f628b50a020d99001897440d568d33ab742f\/3.test_cases\/pytorch\/FSDP\/kubernetes\" target=\"_blank\" rel=\"noopener noreferrer\">awsome-distributed-training repository.<\/a><\/p>\n<p>In the given configuration, the training job will write checkpoints to <code>\/fsx\/checkpoints<\/code> on the FSx for Lustre PVC.<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">hyp create hyp-pytorch-job \\\n    --job-name fsdp-llama3-1-8b \\\n    --image ${REGISTRY}fsdp:pytorch2.7.1 \\\n    --command '[\n        hyperpodrun,\n        --tee=3,\n        --log_dir=\/tmp\/hyperpod,\n        --nproc_per_node=1,\n        --nnodes=8,\n        \/fsdp\/train.py\n    ]' \\\n    --args '[\n        --max_context_width=8192,\n        --num_key_value_heads=8,\n        --intermediate_size=14336,\n        --hidden_width=4096,\n        --num_layers=32,\n        --num_heads=32,\n        --model_type=llama_v3,\n        --tokenizer=hf-internal-testing\/llama-tokenizer,\n        --checkpoint_freq=50,\n        --validation_freq=25,\n        --max_steps=50,\n        --checkpoint_dir=\/fsx\/checkpoints,\n        --dataset=allenai\/c4,\n        --dataset_config_name=en,\n        --resume_from_checkpoint=\/fsx\/checkpoints,\n        --train_batch_size=1,\n        --val_batch_size=1,\n        --sharding_strategy=full,\n        --offload_activations=1\n    ]' \\\n    --environment '{\"PYTORCH_CUDA_ALLOC_CONF\": \"max_split_size_mb:32\"}' \\\n    --pull-policy \"IfNotPresent\" \\\n    --instance-type ml.g5.8xlarge \\\n    --node-count 8 \\\n    --tasks-per-node 1 \\\n    --deep-health-check-passed-nodes-only false \\\n    --max-retry 3 \\\n    --volume name=shmem,type=hostPath,mount_path=\/dev\/shm,path=\/dev\/shm,read_only=false \\\n    --volume name=fsx,type=pvc,mount_path=\/fsx,claim_name=fsx-claim,read_only=false<\/code><\/pre>\n<\/p><\/div>\n<p>The <code>hyp create hyp-pytorch-job<\/code> command supports additional arguments, which can be discovered by running the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">hyp create hyp-pytorch-job --help<\/code><\/pre>\n<\/p><\/div>\n<p>The preceding example code contains the following relevant arguments:<\/p>\n<ul>\n<li><code>--command<\/code> and <code>--args<\/code> offer flexibility in setting the command to be executed in the container. The command executed is <code>hyperpodrun<\/code>, implemented by the HyperPod elastic agent that is installed in the training container. The HyperPod elastic agent extends PyTorch\u2019s ElasticAgent and manages the communication of the various workers with the HyperPod training operator. 
For more information, refer to <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-eks-operator-install.html#sagemaker-eks-operator-elastic-agent\" target=\"_blank\" rel=\"noopener noreferrer\">HyperPod elastic agent<\/a>.<\/li>\n<li><code>--environment<\/code> defines environment variables and customizes the training execution.<\/li>\n<li><code>--max-retry<\/code> indicates the maximum number of restarts at the process level that will be attempted by the HyperPod training operator. For more information, refer to <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-eks-operator-usage.html\" target=\"_blank\" rel=\"noopener noreferrer\">Using the training operator to run jobs<\/a>.<\/li>\n<li><code>--volume<\/code> is used to map persistent or ephemeral volumes to the container.<\/li>\n<\/ul>\n<p>If successful, the command will output the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">Using version: 1.0\n2025-08-12 10:03:03,270 - sagemaker.hyperpod.training.hyperpod_pytorch_job - INFO - Successfully submitted HyperPodPytorchJob 'fsdp-llama3-1-8b'!<\/code><\/pre>\n<\/p><\/div>\n<p>You can observe the status of the training job through the CLI. 
Running <code>hyp list hyp-pytorch-job<\/code> will show the <code>status<\/code> first as <code>Created<\/code> and then as <code>Running<\/code> after the containers have been started:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">NAME                          NAMESPACE           STATUS         AGE\n--------------------------------------------------------------------------------\nfsdp-llama3-1-8b              default             Running        6m<\/code><\/pre>\n<\/p><\/div>\n<p>To list the pods that are created by this training job, run the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">hyp list-pods hyp-pytorch-job --job-name fsdp-llama3-1-8b\nPods for job: fsdp-llama3-1-8b\n\nPOD NAME                                          NAMESPACE\n----------------------------------------------------------------------\nfsdp-llama3-1-8b-pod-0                            default\nfsdp-llama3-1-8b-pod-1                            default\nfsdp-llama3-1-8b-pod-2                            default\nfsdp-llama3-1-8b-pod-3                            default
\nfsdp-llama3-1-8b-pod-4                            default\nfsdp-llama3-1-8b-pod-5                            default\nfsdp-llama3-1-8b-pod-6                            default\nfsdp-llama3-1-8b-pod-7                            default<\/code><\/pre>\n<\/p><\/div>\n<p>You can observe the logs of one of the training pods that get spawned by running the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">hyp get-logs hyp-pytorch-job --pod-name fsdp-llama3-1-8b-pod-0 \\\n    --job-name fsdp-llama3-1-8b\n...\n2025-08-12T14:59:25.069208138Z [HyperPodElasticAgent] 2025-08-12 14:59:25,069 [INFO] [rank0-restart0] \/usr\/local\/lib\/python3.10\/dist-packages\/torch\/distributed\/elastic\/agent\/server\/api.py:685: [default] Starting worker group \n2025-08-12T14:59:25.069301320Z [HyperPodElasticAgent] 2025-08-12 14:59:25,069 [INFO] [rank0-restart0] \/usr\/local\/lib\/python3.10\/dist-packages\/hyperpod_elastic_agent\/hyperpod_elastic_agent.py:221: Starting workers with worker spec worker_group.spec=WorkerSpec(role=\"default\", local_world_size=1, rdzv_handler=&lt;hyperpod_elastic_agent.rendezvous.hyperpod_rendezvous_backend.hyperpodrendezvousbackend object at ...&gt;, fn=None, entrypoint=\"\/usr\/bin\/python3\", args=('-u', '\/fsdp\/train.py', '--max_context_width=8192', '--num_key_value_heads=8', '--intermediate_size=14336', '--hidden_width=4096', '--num_layers=32', '--num_heads=32', '--model_type=llama_v3', '--tokenizer=hf-internal-testing\/llama-tokenizer', '--checkpoint_freq=50', '--validation_freq=50', '--max_steps=100', '--checkpoint_dir=\/fsx\/checkpoints', '--dataset=allenai\/c4', '--dataset_config_name=en', '--resume_from_checkpoint=\/fsx\/checkpoints', '--train_batch_size=1', '--val_batch_size=1', '--sharding_strategy=full', '--offload_activations=1'), max_restarts=3, monitor_interval=0.1, master_port=None, master_addr=None, local_addr=None)... \n2025-08-12T14:59:30.264195963Z [default0]:2025-08-12 14:59:29,968 [INFO] __main__: Creating Model \n2025-08-12T15:00:51.203541576Z [default0]:2025-08-12 15:00:50,781 [INFO] __main__: Created model with total parameters: 7392727040 (7.39 B) \n2025-08-12T15:01:18.139531830Z [default0]:2025-08-12 15:01:18 I [checkpoint.py:79] Loading checkpoint from \/fsx\/checkpoints\/llama_v3-24steps ... \n2025-08-12T15:01:18.833252603Z [default0]:2025-08-12 15:01:18,081 [INFO] __main__: Wrapped model with FSDP \n2025-08-12T15:01:18.833290793Z [default0]:2025-08-12 15:01:18,093 [INFO] __main__: Created optimizer<\/code><\/pre>\n<\/p><\/div>\n<p>We elaborate on more advanced debugging and observability features at the end of this section.<\/p>\n<p>Alternatively, if you prefer a programmatic experience and more advanced customization options, you can submit the training job using the SageMaker HyperPod Python SDK. For more information, refer to the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/sagemaker-hyperpod-cli.readthedocs.io\/en\/latest\/sdk\/sdk_index.html\" target=\"_blank\" rel=\"noopener noreferrer\">SDK reference documentation<\/a>. 
The next code will yield the equal coaching job submission to the previous CLI instance:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-css\">import\u00a0os\nfrom\u00a0sagemaker.hyperpod.coaching\u00a0import\u00a0HyperPodPytorchJob\nfrom\u00a0sagemaker.hyperpod.coaching\u00a0import\u00a0ReplicaSpec, Template, VolumeMounts, Spec, Containers, Sources, RunPolicy, Volumes, HostPath, PersistentVolumeClaim\nfrom\u00a0sagemaker.hyperpod.frequent.config\u00a0import\u00a0Metadata\n\nREGISTRY\u00a0=\u00a0os.environ['REGISTRY']\n\n# Outline job specs\nnproc_per_node\u00a0=\u00a0\"1\"\u00a0\u00a0# Variety of processes per node\nreplica_specs\u00a0=\u00a0[\n\u00a0\u00a0 \u00a0ReplicaSpec(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0name\u00a0=\u00a0\"pod\", \u00a0# Replica name\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0replicas\u00a0=\u00a08,\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0template\u00a0=\u00a0Template(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0spec\u00a0=\u00a0Spec(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0containers\u00a0=\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0[\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0Containers(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0# Container name\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0name=\"fsdp-training-container\", \u00a0\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0# Training image\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0image=f\"{REGISTRY}fsdp:pytorch2.7.1\", \u00a0\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0# Volume mounts\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
\u00a0volume_mounts=[\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0VolumeMounts(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0name=\"fsx\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0mount_path=\"\/fsx\"\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0),\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0VolumeMounts(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0name=\"shmem\", \n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0mount_path=\"\/dev\/shm\"\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0)\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0],\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0env=[\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0{\"name\": \"PYTORCH_CUDA_ALLOC_CONF\", \"value\": \"max_split_size_mb:32\"},\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0],\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0# Picture pull coverage\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0image_pull_policy=\"IfNotPresent\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0assets=Sources(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0requests={\"nvidia.com\/gpu\": \"1\"}, \u00a0\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0limits={\"nvidia.com\/gpu\": \"1\"}, \u00a0 \n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0),\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0# Command to run\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0command=[\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"hyperpodrun\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"--tee=3\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"--log_dir=\/tmp\/hyperpod\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"--nproc_per_node=1\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"--nnodes=8\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"\/fsdp\/train.py\"\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0], \u00a0\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0# Script arguments\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0args\u00a0=\u00a0[\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--max_context_width=8192',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--num_key_value_heads=8',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
\u00a0'--intermediate_size=14336',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--hidden_width=4096',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--num_layers=32',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--num_heads=32',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--model_type=llama_v3',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--tokenizer=hf-internal-testing\/llama-tokenizer',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--checkpoint_freq=2',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--validation_freq=25',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--max_steps=50',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--checkpoint_dir=\/fsx\/checkpoints',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--dataset=allenai\/c4',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--dataset_config_name=en',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--resume_from_checkpoint=\/fsx\/checkpoints',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--train_batch_size=1',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--val_batch_size=1',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
\u00a0'--sharding_strategy=full',\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0'--offload_activations=1'\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0]\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0)\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0],\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0volumes\u00a0=\u00a0[\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0Volumes(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0name=\"fsx\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0persistent_volume_claim=PersistentVolumeClaim(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0claim_name=\"fsx-claim\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0read_only=False\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0),\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0),\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0Volumes(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0name=\"shmem\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0host_path=HostPath(path=\"\/dev\/shm\"),\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0)\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0],\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0node_selector={\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\"node.kubernetes.io\/instance-type\": \"ml.g5.8xlarge\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0},\n\u00a0\u00a0 \u00a0 
\u00a0 \u00a0 \u00a0 \u00a0)\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0),\n\u00a0\u00a0 \u00a0)\n]\nrun_policy\u00a0=\u00a0RunPolicy(clean_pod_policy=\"None\", job_max_retry_count=3)\n# Create and start the PyTorch job\npytorch_job\u00a0=\u00a0HyperPodPytorchJob(\n\u00a0\u00a0 \u00a0# Job name\n\u00a0\u00a0 \u00a0metadata\u00a0=\u00a0Metadata(\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0name=\"fsdp-llama3-1-8b\",\n\u00a0\u00a0 \u00a0 \u00a0 \u00a0namespace=\"default\",\n\u00a0\u00a0 \u00a0),\n\u00a0\u00a0 \u00a0# Processes per node\n\u00a0\u00a0 \u00a0nproc_per_node\u00a0=\u00a0nproc_per_node,\n\u00a0\u00a0 \u00a0# Replica specs\n\u00a0\u00a0 \u00a0replica_specs\u00a0=\u00a0replica_specs,\n)\n# Launch the job\npytorch_job.create()<\/code><\/pre>\n<\/p><\/div>\n<h3>Debugging training jobs<\/h3>\n<p>In addition to monitoring the training pod logs as described earlier, there are several other useful ways of debugging training jobs:<\/p>\n<ul>\n<li>You can submit training jobs with an additional <code>--debug True<\/code> flag, which prints the Kubernetes YAML to the console when the job starts so that you can inspect it.<\/li>\n<li>You can view a list of current training jobs by running <code>hyp list hyp-pytorch-job<\/code>.<\/li>\n<li>You can view the status and corresponding events of the job by running <code>hyp describe hyp-pytorch-job --job-name fsdp-llama3-1-8b<\/code>.<\/li>\n<li>If the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-hyperpod-eks-cluster-observability.html\" target=\"_blank\" rel=\"noopener noreferrer\">HyperPod observability stack<\/a> is deployed to the cluster, run <code>hyp get-monitoring --grafana<\/code> and <code>hyp get-monitoring --prometheus<\/code> to get the Grafana dashboard and Prometheus workspace URLs, respectively, to view cluster and job 
metrics.<\/li>\n<li>To monitor GPU utilization or view directory contents, it can be useful to execute commands or open an interactive shell in the pods. You can run commands in a pod by running, for example, <code>kubectl exec -it &lt;pod-name&gt; -- nvtop<\/code> to run <code>nvtop<\/code> for visibility into GPU utilization. You can open an interactive shell by running <code>kubectl exec -it &lt;pod-name&gt; -- \/bin\/bash<\/code>.<\/li>\n<li>The logs of the HyperPod training operator controller pod can contain valuable information about scheduling. To view them, run <code>kubectl get pods -n aws-hyperpod | grep hp-training-controller-manager<\/code> to find the controller pod name, and then run <code>kubectl logs -n aws-hyperpod &lt;controller-pod-name&gt;<\/code> to view the corresponding logs.<\/li>\n<\/ul>\n<h2>Deploy models with the SageMaker HyperPod CLI and SDK<\/h2>\n<p>The SageMaker HyperPod CLI provides commands to quickly deploy models to your SageMaker HyperPod cluster for inference. You can deploy both foundation models (FMs) available on <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/de\/sagemaker-ai\/jumpstart\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker JumpStart<\/a> and custom models with artifacts that are stored on Amazon S3 or FSx for Lustre file systems.<\/p>\n<p>This functionality automatically deploys the chosen model to the SageMaker HyperPod cluster through Kubernetes custom resources, which are implemented by the HyperPod inference operator. The operator must be installed in the cluster, as discussed in the prerequisites section. 
You can optionally have the deployment automatically create a SageMaker inference endpoint and an <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/elasticloadbalancing\/application-load-balancer\/\" target=\"_blank\" rel=\"noopener noreferrer\">Application Load Balancer<\/a> (ALB), which you can call directly over HTTPS with a generated TLS certificate to invoke the model.<\/p>\n<h3>Deploy SageMaker JumpStart models<\/h3>\n<p>You can deploy an FM that&#8217;s available on SageMaker JumpStart with the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">hyp create hyp-jumpstart-endpoint \\\n  --model-id deepseek-llm-r1-distill-qwen-1-5b \\\n  --instance-type ml.g5.8xlarge \\\n  --endpoint-name deepseek-distill-qwen-endpoint-cli \\\n  --tls-certificate-output-s3-uri s3:\/\/&lt;certificate-bucket&gt;\/ \\\n  --namespace default<\/code><\/pre>\n<\/p><\/div>\n<p>The preceding code includes the following parameters:<\/p>\n<ul>\n<li><code>--model-id<\/code> is the model ID in the SageMaker JumpStart model hub. In this example, we deploy a <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/deepseek-ai\/DeepSeek-R1-Distill-Qwen-1.5B\" target=\"_blank\" rel=\"noopener noreferrer\">DeepSeek R1-distilled version of Qwen 1.5B<\/a>, which is available on SageMaker JumpStart.<\/li>\n<li><code>--instance-type<\/code> is the target instance type in your SageMaker HyperPod cluster where you want to deploy the model. This instance type must be supported by the chosen model.<\/li>\n<li><code>--endpoint-name<\/code> is the name that the SageMaker inference endpoint will have. This name must be unique. SageMaker inference endpoint creation is optional.<\/li>\n<li><code>--tls-certificate-output-s3-uri<\/code> is the S3 bucket location where the TLS certificate for the ALB will be stored. 
This can be used to directly invoke the model through HTTPS. You can use S3 buckets that are accessible by the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-hyperpod-model-deployment-setup.html\" target=\"_blank\" rel=\"noopener noreferrer\">HyperPod inference operator IAM role<\/a>.<\/li>\n<li><code>--namespace<\/code> is the Kubernetes namespace the model will be deployed to. The default value is <code>default<\/code>.<\/li>\n<\/ul>\n<p>The CLI supports more advanced deployment configurations, including auto scaling, through additional parameters, which you can view by running the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">hyp create hyp-jumpstart-endpoint --help<\/code><\/pre>\n<\/p><\/div>\n<p>If successful, the command will output the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">Creating JumpStart model and sagemaker endpoint. Endpoint name: deepseek-distill-qwen-endpoint-cli.\n The process may take a few minutes...<\/code><\/pre>\n<\/p><\/div>\n<p>After a few minutes, both the ALB and the SageMaker inference endpoint will be available, which you can observe through the CLI. 
Running <code>hyp list hyp-jumpstart-endpoint<\/code> will show the <code>status<\/code> first as <code>DeploymentInProgress<\/code> and then as <code>DeploymentComplete<\/code> when the endpoint is ready to be used:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">| name                               | namespace   | labels   | status             |\n|------------------------------------|-------------|----------|--------------------|\n| deepseek-distill-qwen-endpoint-cli | default     |          | DeploymentComplete |<\/code><\/pre>\n<\/p><\/div>\n<p>To get more visibility into the deployment pod, run the following commands to find the pod name and view the corresponding logs:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">hyp list-pods hyp-jumpstart-endpoint --namespace &lt;namespace&gt;\nhyp get-logs hyp-jumpstart-endpoint --namespace &lt;namespace&gt; --pod-name &lt;model-pod-name&gt;<\/code><\/pre>\n<\/p><\/div>\n<p>The output will look similar to the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">2025-08-12T15:53:14.042031963Z WARN  PyProcess W-195-model-stderr: Capturing CUDA graph shapes: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 35\/35 [00:18&lt;00:00,  1.63it\/s]\n2025-08-12T15:53:14.042257357Z WARN  PyProcess W-195-model-stderr: Capturing CUDA graph shapes: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 35\/35 [00:18&lt;00:00,  1.94it\/s]\n2025-08-12T15:53:14.042297298Z INFO  PyProcess W-195-model-stdout: INFO 08-12 15:53:14 llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 26.18 seconds\n2025-08-12T15:53:15.215357997Z INFO  PyProcess Model [model] initialized.\n2025-08-12T15:53:15.219205375Z INFO  WorkerThread Starting worker thread WT-0001 for model model (M-0001, READY) on device gpu(0)\n2025-08-12T15:53:15.221591827Z INFO  ModelServer Initialize BOTH server with: EpollServerSocketChannel.\n2025-08-12T15:53:15.231404670Z INFO  ModelServer BOTH API bind to: http:\/\/0.0.0.0:8080<\/code><\/pre>\n<\/p><\/div>\n<p>You can invoke the SageMaker inference endpoint you created through the CLI by running the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-css\">hyp invoke hyp-jumpstart-endpoint \\\n    --endpoint-name deepseek-distill-qwen-endpoint-cli \\\n    --body '{\"inputs\":\"What is the capital of USA?\"}'<\/code><\/pre>\n<\/p><\/div>\n<p>You will get an output similar to the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-css\">{\"generated_text\": \" What is the capital of France? What is the capital of Japan? What is the capital of China? What is the capital of Germany? What is\"}<\/code><\/pre>\n<\/p><\/div>\n<p>Alternatively, if you prefer a programmatic experience and advanced customization options, you can use the SageMaker HyperPod Python SDK. 
The following code yields a deployment equivalent to the preceding CLI example:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">from sagemaker.hyperpod.inference.config.hp_jumpstart_endpoint_config import Model, Server, SageMakerEndpoint, TlsConfig\nfrom sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint\n\n# Model ID in the SageMaker JumpStart model hub\nmodel = Model(\n    model_id=\"deepseek-llm-r1-distill-qwen-1-5b\",\n)\n\n# Target instance type in the HyperPod cluster\nserver = Server(\n    instance_type=\"ml.g5.8xlarge\",\n)\n\n# Name of the SageMaker inference endpoint\nendpoint_name = SageMakerEndpoint(name=\"deepseek-distill-qwen-endpoint-cli\")\n\n# S3 location where the TLS certificate for the ALB is stored\ntls_config = TlsConfig(tls_certificate_output_s3_uri=\"s3:\/\/&lt;certificate-bucket&gt;\")\n\njs_endpoint = HPJumpStartEndpoint(\n    model=model,\n    server=server,\n    sage_maker_endpoint=endpoint_name,\n    tls_config=tls_config,\n    namespace=\"default\",\n)\n\n# Create the deployment\njs_endpoint.create()<\/code><\/pre>\n<\/p><\/div>\n<h3>Deploy custom models<\/h3>\n<p>You can also use the CLI to deploy custom models with model artifacts stored on either Amazon S3 or FSx for Lustre. This is useful for models that have been fine-tuned on custom data. You must provide the storage location of the model artifacts as well as a container image for inference that is compatible with the model artifacts and SageMaker inference endpoints. 
In the following example, we deploy a <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/TinyLlama\/TinyLlama-1.1B-Chat-v1.0\" target=\"_blank\" rel=\"noopener noreferrer\">TinyLlama 1.1B model<\/a> from Amazon S3 using the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.djl.ai\/master\/docs\/serving\/serving\/docs\/lmi\/index.html#overview---large-model-inference-lmi-containers\" target=\"_blank\" rel=\"noopener noreferrer\">DJL Large Model Inference container image<\/a>.<\/p>\n<p>In preparation, download the model artifacts locally and push them to an S3 bucket:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\"># Install huggingface-hub if not present on your machine\npip install huggingface-hub\n\n# Download model\nhf download TinyLlama\/TinyLlama-1.1B-Chat-v1.0 --local-dir .\/tinyllama-1.1b-chat\n\n# Upload to S3\naws s3 cp .\/tinyllama-1.1b-chat s3:\/\/&lt;model-bucket&gt;\/models\/tinyllama-1.1b-chat\/ --recursive<\/code><\/pre>\n<\/p><\/div>\n<p>Now you can deploy the model with the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">hyp create hyp-custom-endpoint \\\n    --endpoint-name my-custom-tinyllama-endpoint \\\n    --model-name tinyllama \\\n    --model-source-type s3 \\\n    --model-location models\/tinyllama-1.1b-chat\/ \\\n    --s3-bucket-name &lt;model-bucket&gt; \\\n    --s3-region &lt;model-bucket-region&gt; \\\n    --instance-type ml.g5.8xlarge \\\n    --image-uri 763104351884.dkr.ecr.us-west-2.amazonaws.com\/djl-inference:0.33.0-lmi15.0.0-cu128 \\\n    --container-port 8080 \\\n    --model-volume-mount-name modelmount \\\n    --tls-certificate-output-s3-uri s3:\/\/&lt;certificate-bucket&gt;\/ \\\n    --namespace default<\/code><\/pre>\n<\/p><\/div>\n<p>The preceding code contains the following key parameters:<\/p>\n<ul>\n<li><code>--model-name<\/code> is the name of the model that will be created in SageMaker<\/li>\n<li><code>--model-source-type<\/code> specifies either <code>fsx<\/code> or <code>s3<\/code> for the location of the model artifacts<\/li>\n<li><code>--model-location<\/code> specifies the prefix or folder where the model artifacts are located<\/li>\n<li><code>--s3-bucket-name<\/code> and <code>--s3-region<\/code> specify the S3 bucket name and AWS Region, respectively<\/li>\n<li><code>--instance-type<\/code>, <code>--endpoint-name<\/code>, <code>--namespace<\/code>, and <code>--tls-certificate-output-s3-uri<\/code> behave the same as for the deployment of SageMaker JumpStart models<\/li>\n<\/ul>\n<p>Similar to SageMaker JumpStart model deployment, the CLI supports more advanced deployment configurations, including auto scaling, through additional parameters, which you can view by running the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">hyp create hyp-custom-endpoint --help<\/code><\/pre>\n<\/p><\/div>\n<p>If successful, the command will output the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">Creating sagemaker model and endpoint. Endpoint name: my-custom-tinyllama-endpoint.\n The process may take a few minutes...<\/code><\/pre>\n<\/p><\/div>\n<p>After a few minutes, both the ALB and the SageMaker inference endpoint will be available, which you can observe through the CLI. 
Running <code>hyp list hyp-custom-endpoint<\/code> will show the <code>status<\/code> first as <code>DeploymentInProgress<\/code> and then as <code>DeploymentComplete<\/code> when the endpoint is ready to be used:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">| name                         | namespace   | labels   | status               |\n|------------------------------|-------------|----------|----------------------|\n| my-custom-tinyllama-endpoint | default     |          | DeploymentComplete   |<\/code><\/pre>\n<\/p><\/div>\n<p>To get more visibility into the deployment pod, run the following commands to find the pod name and view the corresponding logs:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-typescript\">hyp list-pods hyp-custom-endpoint --namespace &lt;namespace&gt;\nhyp get-logs hyp-custom-endpoint --namespace &lt;namespace&gt; --pod-name &lt;model-pod-name&gt;<\/code><\/pre>\n<\/p><\/div>\n<p>The output will look similar to the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">INFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:36 [monitor.py:33] torch.compile takes 29.18 s in total\nINFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:37 [kv_cache_utils.py:634] GPU KV cache size: 809,792 tokens\nINFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:37 [kv_cache_utils.py:637] Maximum concurrency for 2,048 tokens per request: 395.41x\nINFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:59 [gpu_model_runner.py:1626] Graph capturing finished in 22 secs, took 0.37 GiB\nINFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:59 [core.py:163] init engine (profile, create kv cache, warmup model) took 59.39 seconds\nINFO  PyProcess W-196-model-stdout: INFO 08-12 16:00:59 [core_client.py:435] Core engine process 0 ready.\nINFO  PyProcess Model [model] initialized.\nINFO  WorkerThread Starting worker thread WT-0001 for model model (M-0001, READY) on device gpu(0)\nINFO  ModelServer Initialize BOTH server with: EpollServerSocketChannel.\nINFO  ModelServer BOTH API bind to: http:\/\/0.0.0.0:8080<\/code><\/pre>\n<\/p><\/div>\n<p>You can invoke the SageMaker inference endpoint you created through the CLI by running the following command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-css\">hyp invoke hyp-custom-endpoint \\\n    --endpoint-name my-custom-tinyllama-endpoint \\\n    --body '{\"inputs\":\"What is the capital of USA?\"}'<\/code><\/pre>\n<\/p><\/div>\n<p>You will get an output similar to the following:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-css\">{\"generated_text\": \" What is the capital of France? What is the capital of Japan? What is the capital of China? What is the capital of Germany? What is\"}<\/code><\/pre>\n<\/p><\/div>\n<p>Alternatively, you can deploy using the SageMaker HyperPod Python SDK. 
The following code yields a deployment equivalent to the preceding CLI example:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-python\">from sagemaker.hyperpod.inference.config.hp_endpoint_config import S3Storage, ModelSourceConfig, TlsConfig, ModelInvocationPort, ModelVolumeMount, Resources, Worker\nfrom sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint\n\n# Location of the model artifacts on Amazon S3\nmodel_source_config = ModelSourceConfig(\n    model_source_type=\"s3\",\n    model_location=\"models\/tinyllama-1.1b-chat\/\",\n    s3_storage=S3Storage(\n        bucket_name=\"&lt;model-bucket&gt;\",\n        region=\"&lt;model-bucket-region&gt;\",\n    ),\n)\n\n# Inference container, port, and resource requests\nworker = Worker(\n    image=\"763104351884.dkr.ecr.us-west-2.amazonaws.com\/djl-inference:0.33.0-lmi15.0.0-cu128\",\n    model_volume_mount=ModelVolumeMount(\n        name=\"modelmount\",\n    ),\n    model_invocation_port=ModelInvocationPort(container_port=8080),\n    resources=Resources(\n        requests={\"cpu\": \"30000m\", \"nvidia.com\/gpu\": 1, \"memory\": \"100Gi\"},\n        limits={\"nvidia.com\/gpu\": 1},\n    ),\n)\n\n# S3 location where the TLS certificate for the ALB is stored\ntls_config = TlsConfig(tls_certificate_output_s3_uri=\"s3:\/\/&lt;certificate-bucket&gt;\/\")\n\ncustom_endpoint = HPEndpoint(\n    endpoint_name=\"my-custom-tinyllama-endpoint\",\n    instance_type=\"ml.g5.8xlarge\",\n    model_name=\"tinyllama\",\n    tls_config=tls_config,\n    model_source_config=model_source_config,\n    worker=worker,\n)\n\n# Create the deployment\ncustom_endpoint.create()<\/code><\/pre>\n<\/p><\/div>\n<h3>Debugging inference 
deployments<\/h3>\n<p>In addition to monitoring the inference pod logs, there are several other useful ways of debugging inference deployments:<\/p>\n<ul>\n<li>You can access the HyperPod inference operator controller logs through the SageMaker HyperPod CLI. Run <code>hyp get-operator-logs &lt;hyp-custom-endpoint\/hyp-jumpstart-endpoint&gt; --since-hours 0.5<\/code> to access the operator logs for custom and SageMaker JumpStart deployments, respectively.<\/li>\n<li>You can view a list of inference deployments by running <code>hyp list &lt;hyp-custom-endpoint\/hyp-jumpstart-endpoint&gt;<\/code>.<\/li>\n<li>You can view the status and corresponding events of deployments by running <code>hyp describe &lt;hyp-custom-endpoint\/hyp-jumpstart-endpoint&gt; --name &lt;deployment-name&gt;<\/code> for custom and SageMaker JumpStart deployments, respectively.<\/li>\n<li>If the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-hyperpod-eks-cluster-observability.html\" target=\"_blank\" rel=\"noopener noreferrer\">HyperPod observability stack<\/a> is deployed to the cluster, run <code>hyp get-monitoring --grafana<\/code> and <code>hyp get-monitoring --prometheus<\/code> to get the Grafana dashboard and Prometheus workspace URLs, respectively, to view inference metrics as well.<\/li>\n<li>To monitor GPU utilization or view directory contents, it can be useful to execute commands or open an interactive shell in the pods. You can run commands in a pod by running, for example, <code>kubectl exec -it &lt;pod-name&gt; -- nvtop<\/code> to run <code>nvtop<\/code> for visibility into GPU utilization. You can open an interactive shell by running <code>kubectl exec -it &lt;pod-name&gt; -- \/bin\/bash<\/code>.<\/li>\n<\/ul>\n<p>For more information on the inference deployment features in SageMaker HyperPod, see <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/amazon-sagemaker-hyperpod-launches-model-deployments-to-accelerate-the-generative-ai-model-development-lifecycle\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon SageMaker HyperPod launches model deployments to accelerate the generative AI model development lifecycle<\/a> and <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-hyperpod-model-deployment.html\" target=\"_blank\" rel=\"noopener noreferrer\">Deploying models on Amazon SageMaker HyperPod<\/a>.<\/p>\n<h2>Clean up<\/h2>\n<p>To delete the training job from the corresponding example, use the following CLI command:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">hyp delete hyp-pytorch-job --job-name fsdp-llama3-1-8b<\/code><\/pre>\n<\/p><\/div>\n<p>To delete the model deployments from the inference example, use the following CLI commands for SageMaker JumpStart and custom model deployments, respectively:<\/p>\n<div class=\"hide-language\">\n<pre><code class=\"lang-code\">hyp delete hyp-jumpstart-endpoint --name deepseek-distill-qwen-endpoint-cli\nhyp delete hyp-custom-endpoint --name my-custom-tinyllama-endpoint<\/code><\/pre>\n<\/p><\/div>\n<p>To avoid incurring ongoing costs for the instances running in your cluster, you can <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/smcluster-scale-down.html\" target=\"_blank\" rel=\"noopener noreferrer\">scale down<\/a> the instances or <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/catalog.workshops.aws\/sagemaker-hyperpod-eks\/en-US\/13-cleanup\" 
target=\"_blank\" rel=\"noopener noreferrer\">delete instances<\/a>.<\/p>\n<h2>Conclusion<\/h2>\n<p>The new SageMaker HyperPod CLI and SDK can significantly streamline the process of training and deploying large-scale AI models. Through the examples in this post, we\u2019ve demonstrated how these tools provide the following benefits:<\/p>\n<ul>\n<li><strong>Simplified workflows<\/strong> \u2013 The CLI provides straightforward commands for common tasks like distributed training and model deployment, making powerful capabilities of SageMaker HyperPod accessible to data scientists without requiring deep infrastructure knowledge.<\/li>\n<li><strong>Flexible development options<\/strong> \u2013 Although the CLI handles common scenarios, the SDK enables fine-grained control and customization for more complex requirements, so developers can programmatically configure every aspect of their distributed ML workloads.<\/li>\n<li><strong>Comprehensive observability<\/strong> \u2013 Both interfaces provide robust monitoring and debugging capabilities through system logs and integration with the SageMaker HyperPod observability stack, helping you quickly identify and resolve issues during development.<\/li>\n<li><strong>Production-ready deployment<\/strong> \u2013 The tools support end-to-end workflows from experimentation to production, including features like automatic TLS certificate generation for secure model endpoints and integration with SageMaker inference endpoints.<\/li>\n<\/ul>\n<p>Getting started with these tools is as simple as installing the <code>sagemaker-hyperpod<\/code> package. 
The SageMaker HyperPod CLI and SDK provide the right level of abstraction for both data scientists looking to quickly experiment with distributed training and ML engineers building production systems.<\/p>\n<p>For more information about SageMaker HyperPod and these development tools, refer to the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/sagemaker-hyperpod-cli.readthedocs.io\/en\/documentation-with-new-changes\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">SageMaker HyperPod CLI and SDK documentation<\/a> or explore the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/aws\/sagemaker-hyperpod-cli\" target=\"_blank\" rel=\"noopener noreferrer\">example notebooks<\/a>.<\/p>\n<hr\/>\n<h3>About the authors<\/h3>\n<p style=\"clear: both\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-33438\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2022\/02\/25\/Giuseppe-Angelo-Porcelli.jpeg\" alt=\"\" width=\"100\" height=\"134\"\/><strong>Giuseppe Angelo Porcelli<\/strong>\u00a0is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. 
In his free time, Giuseppe enjoys playing soccer.<\/p>\n<p style=\"clear: both\"><img decoding=\"async\" loading=\"lazy\" class=\"alignleft wp-image-97117 size-full\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2025\/01\/12\/sishwe.png\" alt=\"\" width=\"100\" height=\"105\"\/><strong>Shweta Singh<\/strong> is a Senior Product Manager in the Amazon SageMaker Machine Learning platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.<\/p>\n<p style=\"clear: both\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-115553 size-thumbnail alignleft\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2025\/09\/02\/njourdan-100x133.jpg\" alt=\"\" width=\"100\" height=\"133\"\/><strong>Nicolas Jourdan<\/strong> is a Specialist Solutions Architect at AWS, where he helps customers unlock the full potential of AI and ML in the cloud. He holds a PhD in Engineering from TU Darmstadt in Germany, where his research focused on the reliability, concept drift detection, and MLOps of industrial ML applications. Nicolas has extensive hands-on experience across industries, including autonomous driving, drones, and manufacturing, having worked in roles ranging from research scientist to engineering manager. 
He has contributed to award-winning research, holds patents in object detection and anomaly detection, and is passionate about applying cutting-edge AI to solve complex real-world problems.<\/p>\n<p>       \n      <\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Training and deploying large AI models requires advanced distributed computing capabilities, but managing these distributed systems shouldn\u2019t be complex for data scientists and machine learning (ML) practitioners. The newly launched command line interface (CLI) and software development kit (SDK) for Amazon SageMaker HyperPod simplify how you can use the service\u2019s distributed training and inference [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":6266,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[387,1355,2309,738,266,388,721,2547],"class_list":["post-6264","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-amazon","tag-cli","tag-deploy","tag-hyperpod","tag-models","tag-sagemaker","tag-sdk","tag-train"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/6264","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6264"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/6264\/revisions"}],"predecessor-version":[{"id":6265,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/6264\/revisi
ons\/6265"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/6266"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6264"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6264"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6264"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}