Getting Started with the Claude API in Python

# Introduction

You want to add Claude to a Python application. Creating an account and making your first API call is straightforward. The official documentation can get you from zero to a working request in a few minutes. The questions that come next are usually more practical:

What does the response object contain?
How do you stream responses so users can see output as it's generated?
How do you structure prompts and handle responses in a production application?

The Claude Python SDK takes care of much of the underlying API interaction. It provides typed response objects, built-in retry handling, and a simple interface for working with the Messages API.

This article walks you through setup, your first API call, reading the response, system prompts, and streaming. By the end, you will have a working foundation.

# Prerequisites and Installation

You need Python 3.9 or higher, a free Claude Console account, and an API key from the Console's Settings > API Keys page. You can add $5 in credits and work through everything in this article.

With these in place, install the SDK:

pip install anthropic

Never hardcode your API key in source files. Store it as an environment variable instead:

export ANTHROPIC_API_KEY="YOUR-API-KEY-HERE"

Or add it to a .env file at the project root if you're using python-dotenv. The SDK reads ANTHROPIC_API_KEY from your environment, so you don't need to pass it anywhere in your code.
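A missing or empty variable is easy to overlook, so it can help to check for it once at startup. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper name for illustration, not part of the SDK.

```python
import os


def require_api_key(env=None):
    """Return the API key from the environment or raise a clear error.

    Illustrative helper only; the SDK does not provide this function.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it or add it to your .env file."
        )
    return key
```

Call `require_api_key()` before creating the client. If the key lives in a .env file, python-dotenv's `load_dotenv()` has to run first so the variable is present in `os.environ`.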

# Making Your First API Call

The entry point for every interaction is client.messages.create(). Let's ask Claude to explain what a context window is, something you'll actually need to know as you use the API.

You pass three things: the model ID, a max_tokens limit, and a messages list. The messages list is always a list of dicts, each with a "role" and "content" key.

import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=256,
    messages=[
        {
            "role": "user",
            "content": "In one sentence, what is a context window?"
        }
    ]
)

# The reply text lives in the first content block.
print(response.content[0].text)

The model field takes the exact model ID string. max_tokens is a hard ceiling on how many output tokens Claude will produce; the response stops there even if the thought isn't complete, so set it high enough for open-ended requests. The messages list must always start with a "user" turn.

Sample output:

A context window is the maximum amount of text (measured in tokens) that a language
model can process and consider at one time, encompassing both your input and its output.
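The shape rules described above (a list of dicts with "role" and "content" keys, starting with a "user" turn) are easy to check before sending a request. This is a rough sketch; `validate_messages` is a hypothetical helper, and it covers only the rules stated in this article, not everything the API itself enforces.

```python
def validate_messages(messages):
    """Raise ValueError if `messages` breaks the rules described in this article."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"messages[{i}] must be a dict")
        # Every entry needs both keys.
        missing = {"role", "content"} - message.keys()
        if missing:
            raise ValueError(f"messages[{i}] is missing keys: {sorted(missing)}")
    if messages[0]["role"] != "user":
        raise ValueError('the first message must have role "user"')
    return messages
```

Running a check like this locally turns a malformed request into an immediate, readable error instead of a failed API call.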

# Understanding the Response Object

The response from messages.create() is a typed Message object. It's worth inspecting the full structure before building anything on top of it.

Replace the print line in the previous example with:

print(response)

Running that gives you the full object:

Message(
  id='msg_01XFDUDYJgAACzvnptvVoYEL',
  type="message",
  role="assistant",
  content=[TextBlock(text="A context window is...", type="text")],
  model="claude-sonnet-5",
  stop_reason='end_turn',
  stop_sequence=None,
  usage=Usage(input_tokens=19, output_tokens=42)
)

Several fields here matter more than they first appear. stop_reason tells you why Claude stopped generating. end_turn means Claude finished on its own terms. If you see max_tokens, the response was cut off by your limit, and you may need to raise it or rethink the prompt.
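A check on stop_reason can be sketched in a few lines. The helper name `was_truncated` is hypothetical, and the objects below are stand-ins that carry only the one field being read, so the example runs without calling the API.

```python
from types import SimpleNamespace


def was_truncated(message):
    """True if the reply stopped because it hit the max_tokens limit."""
    return message.stop_reason == "max_tokens"


# Stand-in objects with only the field this check reads; the Message
# returned by client.messages.create() exposes stop_reason the same way.
finished = SimpleNamespace(stop_reason="end_turn")
cut_off = SimpleNamespace(stop_reason="max_tokens")

print(was_truncated(finished))  # False
print(was_truncated(cut_off))   # True
```

In an application you might log a warning or retry with a larger max_tokens when this returns True.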

The usage field tracks both input and output tokens for the request. This is how Anthropic calculates billing, and it's also how you detect when a prompt is creeping too close to the model's context limit. content is a list. In standard text responses it has one item, a TextBlock, so response.content[0].text is the idiomatic way to pull the text out.
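Indexing the first block is fine for plain text replies. A slightly more defensive variant, sketched here with hypothetical helper names, joins every text block and sums the two usage counters. The object below is a stand-in shaped like the Message shown above, not a real API response.

```python
from types import SimpleNamespace


def collect_text(message):
    """Join the text of every content block whose type is "text"."""
    return "".join(block.text for block in message.content if block.type == "text")


def total_tokens(message):
    """Input plus output tokens reported for a single request."""
    return message.usage.input_tokens + message.usage.output_tokens


# Stand-in with the same field names as the Message printed earlier.
message = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="A context window is...")],
    usage=SimpleNamespace(input_tokens=19, output_tokens=42),
)

print(collect_text(message))  # A context window is...
print(total_tokens(message))  # 61
```

Filtering on the block type keeps the helper from failing if a response ever carries a block without a text attribute.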

# Using System Prompts

A system prompt lets you give Claude a persistent role, set constraints, or provide context that should apply across the entire conversation. You pass it as a top-level system parameter, separate from the messages list, not as a message itself.

Here we configure Claude to act as a code reviewer who responds only with Python code and avoids general explanations:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=512,
    # The system prompt is a top-level parameter, not a message.
    system=(
        "You are a Python code reviewer. "
        "Respond only with corrected or improved Python code. "
        "Do not explain changes unless the user explicitly asks."
    ),
    messages=[
        {
            "role": "user",
            # The code to review, sent as a multi-line string.
            "content": (
                "def get_user(id):\n"
                "    db = connect()\n"
                "    return db.query('SELECT * FROM users WHERE id=' + id)"
            )
        }
    ]
)

print(response.content[0].text)

The system prompt sits above the conversation in Claude's context. It carries the same authority throughout all turns, so role instructions, formatting rules, and domain constraints you set here persist without you repeating them in every message.
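One way to keep the system prompt and the conversation visibly separate is to assemble the request arguments in a small function. This is only a sketch: `build_request` is a hypothetical helper, and its default model ID is simply copied from the examples in this article.

```python
def build_request(system_prompt, user_text, model="claude-sonnet-5", max_tokens=512):
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # The system prompt is a top-level parameter...
        "system": system_prompt,
        # ...while the conversation itself lives in the messages list.
        "messages": [{"role": "user", "content": user_text}],
    }


request = build_request(
    "You are a Python code reviewer.",
    "def add(a, b): return a+b",
)
print(sorted(request))  # ['max_tokens', 'messages', 'model', 'system']
```

The resulting dict can be unpacked into the call as `client.messages.create(**request)`, which makes it easy to reuse one system prompt across many user inputs.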

# Streaming Responses

For requests where Claude may take a few seconds to respond, streaming lets you display text as it arrives instead of waiting for the full response. The SDK exposes this through client.messages.stream(), used as a context manager.

The text_stream iterator yields individual text chunks in real time. Each chunk is a string fragment, not a full sentence. You pass end="" and flush=True to print() so output appears continuously rather than buffering:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Walk me through what happens when a Python list grows beyond its initial capacity."
        }
    ]
) as stream:
    # Print each fragment as soon as it arrives.
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)

print()  # newline after stream ends

The context manager ensures the HTTP connection is closed cleanly when the block exits, even if an exception is raised mid-stream. If you need the complete Message object after streaming, including token usage counts, call stream.get_final_message() before the block closes.

Sample output:

Python lists are dynamic arrays. When you append an element and the list has no
room, Python allocates a new, larger block of memory (typically 1.125x the current
size), copies all existing elements into it, and releases the old block. This
operation is O(n) in the worst case, but because it happens infrequently relative to
the number of appends, the amortized cost per append stays O(1). You can pre-allocate
capacity with a list comprehension or by passing an iterable to the list constructor
if you know the final size upfront.
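The streaming loop and the get_final_message() call described above can be combined in one function. This is a sketch under stated assumptions: `stream_to_terminal` is a hypothetical name, and the client is passed in as an argument so the function itself imports nothing from the SDK.

```python
import sys


def stream_to_terminal(client, **request):
    """Stream a reply to stdout and return (full_text, final_message)."""
    pieces = []
    with client.messages.stream(**request) as stream:
        for chunk in stream.text_stream:
            # Chunks are string fragments; write each one as it arrives.
            sys.stdout.write(chunk)
            sys.stdout.flush()
            pieces.append(chunk)
        # Fetch the complete Message while the stream is still open.
        final_message = stream.get_final_message()
    sys.stdout.write("\n")
    return "".join(pieces), final_message
```

With a real client you would call it as `text, message = stream_to_terminal(client, model=..., max_tokens=..., messages=[...])` and then read `message.usage` for the token counts.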

# Next Steps

You now have the core building blocks: requests, structured responses, system prompts, and streaming.

Next, you can learn about error handling, token usage, and multi-turn conversations. Because the API is stateless, you must send the conversation history with each request. The SDK documentation shows the recommended approach.
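To make the stateless point concrete, here is one possible way to keep a running history on the client side. It is a minimal sketch, not the approach from the SDK documentation; the `Conversation` class and its method names are hypothetical.

```python
class Conversation:
    """Keeps the running message history for a stateless chat API."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})


chat = Conversation()
chat.add_user("What is a context window?")
chat.add_assistant("The amount of text a model can consider at once.")
chat.add_user("How is it measured?")

print(len(chat.messages))         # 3
print(chat.messages[-1]["role"])  # user
```

On each turn you would pass `chat.messages` as the messages argument, then record the reply with `chat.add_assistant(response.content[0].text)` before adding the next user message.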

The API reference also covers features like structured outputs and tool use. Happy exploring!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.