261 changes: 261 additions & 0 deletions docs/sessions-vs-sessionless-decision.md
@@ -0,0 +1,261 @@
# **Decision Document: To App Session, or Not to App Session?**

## **Background**

### **Sessions in MCP**

Currently, the Model Context Protocol (MCP) utilizes sessions to manage
client-server connections, but this concept blurs the line between two very
distinct use cases:

* **Transport-Level Use Cases:** Using sessions to track protocol versioning and
capability negotiation (e.g., does this server support sampling?).
* **Application-Level Use Cases:** Using sessions to track logical state, such
as a specific user context, a continuous conversation thread, or stateful tool
operations.

### **Why Sessions Need to Change**
Contributor

I suspect (but correct me if I'm wrong) that sessions were originally added as a way of essentially shoehorning the existing stdio transport into HTTP. If so, it may be worth pointing out as background information that the mechanism just flat-out doesn't actually work as originally intended, because the idea was that all requests on a session would go to the same server instance, and that can't actually be guaranteed with HTTP (except perhaps in certain special cases that are far from the norm).

Collaborator

I'm not sure this is quite correct. It was a very intentional decision for MCP to be a stateful protocol and the session ID was supposed to be the key for that state. The deployment difficulty was accepted as a trade-off for making stateful interactions more natural. Whether or not that was the right trade-off (or necessary trade-off) is another story.

See modelcontextprotocol/modelcontextprotocol#102 for some background (note: Justin was one of the co-creators of MCP).

Contributor

I get that it was designed as a stateful protocol, but wasn't it initially designed for stdio, where that statefulness is a more natural fit? Or was the HTTP transport part of the initial design?


The current implementation of sessions is highly ambiguous, leading to the
following problems:

* **Inconsistent Lifecycles:** Some clients create a new session for *every
single tool call*, others create one per *conversation*, some use one for *all
conversations*, and others manage them without any clear boundaries. There are
no strict guarantees around what sessions provide or how long they persist.
* **Transport Divergences:** On STDIO, sessions are implicitly tied to the
process lifecycle. On HTTP, sessions are optional, and their absence could
mean the server is stateless *or* that the previous state was simply lost.
* **Coupling of State to Connection:** Application state, client capabilities,
and protocol versions are heavily coupled to the connection. This leads to
operational hazards like the "Rolling Upgrade" problem (where updating a
load-balanced server drops the connection and wipes the user's logical state)
and multiplexing failures.

### **Resolving Transport-Level Sessions**

To resolve the ambiguities at the transport level, the working group is moving
toward a stateless transport architecture. The need for transport-level sessions
is being removed via two core SEPs:

* [**SEP-1442**](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1442)**:**
Moves all data to a "per request" basis, eliminating the need to store
transport state between calls.
* [**SEP-2322**](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2322)**:**
Allows for elicitation and sampling requests without relying on sessions to
align them.

## **The Problem Statement**

With transport-level sessions resolved by SEP-1442 and SEP-2322, we are left
with a critical architectural decision regarding **Application-Level Sessions**.

The open question the working group must decide on is: **"Should the protocol
support application sessions (or not)?"**

Developers building agents and tools frequently need a way to track logical
state across multiple turns of a conversation. However, it is currently unclear
whose responsibility it is to maintain that state. We must decide whether to
standardize a formal session concept at the data/application layer, or
completely remove the concept of sessions from the protocol and push the
responsibility of tracking state references entirely to the client via explicit
state handles.

## **Use Cases for Application Sessions**
Contributor

I think we still need to take a step back and start by coming to consensus on what set of use-cases we actually need to support. These four examples are interesting as hypotheticals, but if no one is actually doing any of this today, then the examples aren't useful.

I think that in order to support the complexity of sessions in the protocol, we should require evidence of concrete real-world use-cases that are common enough to justify the complexity we'd be taking on.

Collaborator Author

I don't think that's a fair callout. Sessions today don't work. As described above, the client and server can't agree on them, which makes implementing something like this impossible.

Contributor

I agree that they don't work today in the general case, in the sense that you can't use any arbitrary client with any arbitrary server and expect it to work properly. But I think we've heard that there are specific cases where people control both the clients and servers and are leveraging sessions for some functionality where those particular clients and servers do agree on things like the scope of the session. It seems like it would be useful to get a list of those use-cases along with some idea of how many people are using them.

But even setting aside examples of what people are actually doing today, I think there's still an important element of what things people want to do. We currently have no way of knowing how many people will actually want to do anything like these hypothetical use-cases. If no one (or very few people) want to implement one of these use-cases, then it's not worth the complexity of supporting it, and we should stop considering it.

My overall point here is that before we commit to supporting a given use-case, we should first have confidence that enough people will actually use it to make it worth supporting it. Right now, I don't see a strong signal of that -- we've had a lot of hypothetical discussion, but very little real-world input on use-cases.

In the absence of concrete use-cases that we have evidence that enough people are interested in, I would lean heavily toward option B.

@jeffyaw, Apr 7, 2026

For real-world context: we built an MCP proxy and hosting platform, and the servers that developers are most interested in building are inherently stateful: database connections with cursors, multi-step deployment workflows, auth flows that unlock additional tools. The simple stateless servers (search, lookup, linting) are less likely to need this. The decision about sessions disproportionately affects the servers most likely to need to run remotely, the MCP servers people will actually build businesses around.


To understand the need for application-level state, we can look at a few
progressive examples of how tools currently rely on state across multiple turns
of a conversation. Below are the **solution-neutral logical flows** for these
interactions.

### **Simple Counter / Accumulator**

The most basic form of application state is a simple accumulator where a server
remembers previous interactions. For example, a `count()` tool that increments
every time it is called by the same user or within the same conversational
thread.

**Logical Flow:**

```
User: "Start counting"
tool/call: count() -> Returns 0
tool/call: count() -> Returns 1
tool/call: count() -> Returns 2
```

### **E-Commerce / Shopping Cart**

In more complex workflows, tools require a specific sequence of operations where
state is built up over time. An e-commerce agent, for instance, needs to add
multiple items to a shopping cart across several distinct tool calls before
finally executing a checkout operation.

**Logical Flow:**

```
User: "I want to buy shoes and socks"
tool/call: add_item("shoes")
tool/call: add_item("socks")
tool/call: checkout() -> Processes order for [shoes, socks]
```

### **Progressive Discovery of Tools**

Some tools are deliberately hidden to prevent overwhelming the LLM's context
window or to enforce security gates. State allows an agent to call a tool that
"unlocks" deeper capabilities or different toolsets dynamically for the
remainder of the interaction.

**Logical Flow:**

```
User: "Query the production database"
tool/call: list_tools() -> Returns: [connect]
tool/call: execute_query("SELECT 1") -> ERROR: Tool not found
tool/call: connect($DATABASE_URI) -> Success
tool/call: list_tools() -> Returns: [execute_query, list_tables, ...]

Separate from this, but what would be the trigger for the client to call list_tools() again here?

Contributor

I was going to point this out as well. This is going to be a problem, because IIUC the SDKs handle caching the tool list and won't refresh until they have some indication that the tool list has changed. While it's true that we are going to retain the ability for the client to subscribe to tool-list-changed notifications as an optional optimization, there won't be a way to tie that notification to the client that called connect($DATABASE_URI) once we remove sessions. So clients will not actually see the new tool list until the TTL expires.

I think this would be a problem only if we go with option A below. And under that option, if we do decide that this particular use-case is important, there are other ways we could consider handling it. For example, we could put a bit in the tool call response that tells the client to invalidate its tool list cache.

Collaborator Author

I think we've previously discussed a combination of things like TTLs and notification-style changes (e.g. returning an indicator on a tool result that info needs to change). I agree there's future work here, but I think it's solvable (and likely has some value outside of these specific use-cases).


It's an advantage of handles, because then you don't need to hide the execute_query tool. If only the connect tool is given, then the model may not believe that using it will help it achieve its goals without the other tools being present in the context to use that connection as well.

tool/call: list_tables() -> Success
```
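The review comments on this flow ask what would prompt the client to call `list_tools()` again, and mention TTLs and an indicator on the tool result as possible mechanisms. The following is a minimal Python sketch of a client-side cache that combines both. The `tools_changed` field, the `ToolListCache` class, and the `FakeServer` stand-in are all hypothetical names chosen for illustration; none of them is part of MCP or any SDK.

```python
import time
from typing import Callable, Optional


class ToolListCache:
    """Client-side cache of a server's tool list (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], list],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # performs the real list_tools() request
        self._ttl = ttl_seconds      # how long a cached list stays fresh
        self._clock = clock
        self._tools: Optional[list] = None
        self._fetched_at = 0.0

    def get(self) -> list:
        """Return the cached list, refetching if it is missing or expired."""
        expired = self._clock() - self._fetched_at >= self._ttl
        if self._tools is None or expired:
            self._tools = self._fetch()
            self._fetched_at = self._clock()
        return self._tools

    def observe_result(self, result: dict) -> None:
        """Drop the cache if a tool result carries the assumed change flag."""
        # "tools_changed" is an assumed field name, not an existing MCP field.
        if result.get("tools_changed"):
            self._tools = None


class FakeServer:
    """Stand-in for a server whose tool list grows after connect()."""

    def __init__(self) -> None:
        self.connected = False
        self.list_calls = 0

    def list_tools(self) -> list:
        self.list_calls += 1
        if self.connected:
            return ["connect", "execute_query", "list_tables"]
        return ["connect"]

    def connect(self, uri: str) -> dict:
        self.connected = True
        return {"ok": True, "tools_changed": True}
```

Without the flag, the client would keep serving the stale list until the TTL expires, which is the gap the reviewers describe.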

### **Object Creation / Multi-step Provisioning**

When creating complex resources, an agent may need to iteratively build a
configuration before executing the final creation step. This requires holding a
"draft" state across multiple actions.

**Logical Flow:**

```
User: "Provision a new VM with 16GB RAM"
tool/call: init_vm("web-server") -> Success
tool/call: set_ram(16) -> Success
tool/call: set_cpu(4) -> Success
tool/call: deploy_vm() -> Success
```

## **Proposed Solutions**

The working group is split between two primary proposals for handling these
application-level use cases.

### **Option A: Adding Data Layer Sessions**

**Link:** [Transports-WG PR
\#20](https://github.com/modelcontextprotocol/transports-wg/pull/20)

**Outline:** Instead of relying on the transport layer (HTTP/STDIO) to manage
sessions, this proposal introduces explicit, standardized session constructs at
the *Data Layer*. The protocol would explicitly define how to initialize,
maintain, and terminate application contexts independent of the underlying
transport connection.

**Implementation Example (Implicit Tool State):** With data layer sessions, the
session ID is negotiated once and passed in the protocol envelope. Because the
server inherently knows which session is making the request, the tool signatures
themselves remain clean and rely on implicit state.

```
# The session is explicitly negotiated at the data layer
session_create(context="user_123")

# 1. Simple Counter
count() # Returns 0
count() # Returns 1

# 2. Shopping Cart
add_item("shoes") # Adds to the session's cart
add_item("socks")
checkout()

# 3. Capability Unlocking (Database)
list_tools() # Returns: [connect]
connect($DATABASE_URI) # State mutates silently for this session
list_tools() # Returns: [connect, execute_query, list_tables, ...]

# 4. Object Creation (VM Provisioning)
init_vm("web-server") # Context stored in session
set_ram(16)
set_cpu(4)
deploy_vm()
```
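To make the "implicit state" idea concrete, here is a minimal server-side sketch in Python, assuming the session ID arrives alongside each request rather than in the tool arguments. The `SessionServer` class, its `handle` method, and the shape of `SessionState` are hypothetical and are not taken from the linked proposal; the sketch also omits the `context` argument shown above.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Per-session application state (illustrative fields only)."""
    counter: int = 0
    cart: list = field(default_factory=list)


class SessionServer:
    """Keeps all tool state in a map keyed by session ID."""

    def __init__(self) -> None:
        self._sessions: dict = {}

    def session_create(self) -> str:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = SessionState()
        return session_id

    def handle(self, session_id: str, tool: str, *args: str):
        """Dispatch a tool call; session_id comes from the envelope, not args."""
        state = self._sessions[session_id]
        if tool == "count":
            value = state.counter
            state.counter += 1
            return value
        if tool == "add_item":
            state.cart.append(args[0])
            return None
        if tool == "checkout":
            # Hand back the accumulated cart and reset it for this session.
            order, state.cart = state.cart, []
            return order
        raise ValueError(f"unknown tool: {tool}")
```

The tool signatures carry no state identifier, so two sessions calling `count` get independent counters purely because of the envelope value.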

#### Advantages:

* **Lower implementation burden:** Removes the need for the agent to manage
state, so state tracking is handled programmatically (deterministically) rather
than depending on the model to pass the right values.

#### Disadvantages:

* **Protocol Complexity:** Retains the concept of state within the protocol
Contributor

I think another important aspect of this is that this option requires us to define how each protocol mechanism interacts with sessions, and that will increase complexity in both the client and the server. For example, in the "capability unlocking" example, we'd need to define the tool list as being a function of the session, and both clients and servers would need to understand that.

definition, requiring servers to manage state lifecycles, TTLs (Time to Live),

nit: manage state lifecycles, TTLs (Time to Live), and garbage collection. is a function of whether the server wants to maintain state so is the same in both proposals. In Option A it's linked to the session id if the server chooses to support sessions. In Option B it's just linked to the handle.

The bigger issue with this option is the extra complexity of the session handshake and when it should happen.

Collaborator Author

I was trying to capture the complexity that @markdroth mentioned here -- essentially protocol-level TTLs and changes between interactions of requests in a session -- but didn't quite nail it.

and garbage collection.

#### Open Questions

As we weigh the advantages and disadvantages of both proposals, multi-agent
orchestration presents a significant unresolved challenge:
Comment on lines +196 to +197
Collaborator

nit: should this be after both proposals?

Collaborator Author

I moved it here because it seemed like the open questions only applied to Option A.


* **Sub-Agent Orchestration:** It is currently unclear how sessions or state
@gjz22, Mar 21, 2026

Isn't this just up to the client application to determine? It can either use the session from the parent or isolate depending on its use case.

In that way it is somewhat similar to option B where option B could initialize the subagent with some context including required handles.

Collaborator Author

I think the answer to that is yes, but in this case the session data is handled by the application and not the agent. So it's likely going to need to be standardized in some fashion -- if half the servers assume that a subagent should use a new session and half use the existing session, servers will have a hard time matching both.

should be handled when a primary agent delegates work to a sub-agent. Should
a sub-agent instantiate a completely **new session**? Should it receive a
**copy/fork** of the parent agent's session so it has the necessary context
but cannot mutate the parent's state? Or should they share the exact same
`session_id` (which risks the sub-agent polluting the parent's context with
unintended operations)?
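
The three choices in this open question can be sketched in a few lines of Python. This is only an illustration of the trade-off, assuming session state is a plain dictionary held in an in-memory store; the `sessions` store and the `new_session`, `fork_session`, and `delegate` helpers are hypothetical names.

```python
import copy
import uuid

# In-memory session store: session_id -> state (illustrative only).
sessions: dict = {}


def new_session() -> str:
    """Create an empty session with no inherited context."""
    session_id = uuid.uuid4().hex
    sessions[session_id] = {"cart": []}
    return session_id


def fork_session(parent_id: str) -> str:
    """Copy the parent's state so the child can read it without mutating it."""
    session_id = uuid.uuid4().hex
    sessions[session_id] = copy.deepcopy(sessions[parent_id])
    return session_id


def delegate(parent_id: str, strategy: str) -> str:
    """Pick the session a sub-agent should use under a given strategy."""
    if strategy == "new":
        return new_session()
    if strategy == "fork":
        return fork_session(parent_id)
    if strategy == "share":
        return parent_id  # sub-agent writes land in the parent's state
    raise ValueError(f"unknown strategy: {strategy}")
```

Under "fork", items the sub-agent adds never reach the parent; under "share", they do, which is the pollution risk described above.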

### **Option B: Sessionless MCP via Explicit State Handles**

**Link:** [Transports-WG PR \#25](https://github.com/modelcontextprotocol/transports-wg/pull/25)

**Outline:** Do not add sessions to the protocol at all. Instead, encourage a
completely stateless protocol where servers return "explicit state handles"
(e.g., tokens, cursors, or context IDs) in their tool responses. The client is
strictly responsible for storing this handle and passing it back as an argument
in subsequent related requests.

**Implementation Example (Explicit State Handles):** With explicit handles,
there are no sessions. The client explicitly requests a state handle (like a
basket ID or access token) and must manually inject it back into subsequent tool
arguments.

```
# 1. Simple Counter
counter = create_counter() # returns { "counter_id": "cnt_123" }
count(counter) # Returns 0
count(counter) # Returns 1

# 2. Shopping Cart
basket = create_basket() # returns { "basket_id": "bsk_a1b2c3" }
add_item(basket, "shoes")
add_item(basket, "socks")
checkout(basket)

# 3. Capability Unlocking (Database)
db = connect($DATABASE_URI) # returns { "connection_id": "db_prod_1" }
list_tables(db)
execute_query(db, "SELECT 1")

# 4. Object Creation (VM Provisioning)
vm = init_vm("web-server") # returns { "draft_id": "vm_draft_99" }
set_ram(vm, 16)
set_cpu(vm, 4)
deploy_vm(vm)
```
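For comparison, here is a minimal server-side sketch of the shopping cart under explicit handles, written in Python. It assumes an in-memory map and a made-up handle format; the `BasketServer` class is hypothetical and the `bsk_` prefix simply mirrors the example above.

```python
import uuid


class BasketServer:
    """Keeps basket state keyed by a handle the client must pass back."""

    def __init__(self) -> None:
        self._baskets: dict = {}

    def create_basket(self) -> dict:
        # The handle format is arbitrary; only the server needs to interpret it.
        basket_id = f"bsk_{uuid.uuid4().hex[:8]}"
        self._baskets[basket_id] = []
        return {"basket_id": basket_id}

    def add_item(self, basket_id: str, item: str) -> None:
        self._baskets[basket_id].append(item)

    def checkout(self, basket_id: str) -> list:
        # Removing the entry ends the basket's lifetime; a stale handle
        # raises KeyError on any later call.
        return self._baskets.pop(basket_id)
```

The state map looks much like a session map, except that the key is an ordinary tool argument the caller can see, copy, or hand to another agent.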

#### Advantages:
Collaborator @pja-ant, Mar 20, 2026

Suggested change
**Advantages:**
#### Advantages:


* **Minimal Complexity:** Protocol complexity is minimal, requiring no
additional changes.
Comment on lines +248 to +249
Member

Does that mean we're pushing all of the responsibility for creating and maintaining "handles" and their associated state onto the developer?

Collaborator

For the server developer, yes. If you were, say, building a session-backed shopping cart, then today you'd have tools:

add_item(item_id)
checkout()

and in the proposal you'd have

create_basket() -> basket_id
add_item(basket_id, item_id)
checkout(basket_id)

Before, the server developer would have to map Mcp-Session-Id to the basket state. After, the map is keyed on basket_id.


We built an MCP hosting and proxy platform and this is the exact problem we've been designing around. The stateful servers (database connections, auth flows, multi-step provisioning) are the ones that need the most infrastructure support. Option B means every server author independently implements handle management with varying conventions: inconsistent handle formats, no standard TTL behavior, and no way for infrastructure to provide session management transparently. A standardized session primitive lets the platform layer handle this once instead of every server reinventing it.


#### Disadvantages:
Collaborator

Suggested change
**Disadvantages:**
#### Disadvantages:


The main downside I see of this is that it adds a burden to the client's orchestration to be able to pass the correct handles back and forth. I'm not saying this is overriding but just to make sure we understand this, it would mean one of two things on the client side:

  1. The model would need to be given the handles in its context when making the decision to call a tool in such a way that it would correctly pass the handles back as parameters.
  2. The client would need to maintain hooks that intercept / augment tool calls to ensure handles are passed correctly. Let's take the example of create_basket() and assume the client only would want to create one basket per session and was worried (1) above would be flaky. For the client's definition of session length, every time create_basket() is called, it might return the original response to that tool call rather than calling the underlying MCP.

The advantage of a session id is it would remove the need for (1) and (2) and would do so in a general way. The counterargument is that, as we expect models and architectures to get better at supplying the right context, (1) above should become less and less of an issue.

Contributor

One disadvantage that we may want to call out here is that this approach would work for tools only, not (e.g.) resources. In theory, there could be cases where reading a particular resource mutates session state, and that wouldn't be covered by this approach.

That having been said, I'm not sure this is actually a real case. See my comment above about stepping back to agree on use-cases.


* **Breaking Change:** If applications are using sessions today for application

Another thing to consider in this vein is that today, clients likely have the infrastructure to maintain session ids for the MCP servers they are interacting with in some form. If we remove sessions completely, it will be harder to add back later, because those clients likely will have removed that infrastructure.

Contributor

I think that because the semantics of sessions are changing, we'd probably need a breaking API change in the SDKs in either option, so I'm not sure this is really a disadvantage of option B specifically.

Collaborator Author

If today there are use cases that use sessions, those tools would need to be re-written to use explicit state handles instead. I think that the use-cases are limited to use-cases where clients and servers are both under control of the developer, but I don't think that means they don't exist.

The feedback from the SDK owners has been "lots of people are asking about this" including things for resuming sessions and associating with the session_id.

Contributor

In option A, I think any existing tools using the current session mechanism would need to be changed to use the new mechanism anyway, because the semantics of the current session mechanism are not well defined, so we can't guarantee that things will continue to work right for existing code anyway. I believe you were making this same argument in 1442 in the context of removing initialize and adding sessions/create: the existing mechanism isn't well defined and therefore not useful, and we don't want to surprise people with behavior changes.

state, it would require developers to update their tools and clients to use
explicit state handles instead.
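
The review discussion on these disadvantages also notes that Option B asks the client to route handles back to the server, either by placing them in the model's context or through hooks that intercept tool calls. A rough Python sketch of the hook approach follows, assuming one basket per client-defined session. The `HandleInjectingClient` class and the `call_tool` callback signature are hypothetical, not part of any SDK.

```python
from typing import Callable, Optional


class HandleInjectingClient:
    """Wraps a tool-calling function and manages one basket handle."""

    def __init__(self, call_tool: Callable[[str, dict], dict]) -> None:
        self._call_tool = call_tool           # sends the real request
        self._basket: Optional[dict] = None   # cached create_basket() result

    def call(self, tool: str, args: dict) -> dict:
        if tool == "create_basket":
            # Reuse the first response instead of creating a second basket.
            if self._basket is None:
                self._basket = self._call_tool(tool, args)
            return self._basket
        needs_handle = tool in ("add_item", "checkout")
        if needs_handle and "basket_id" not in args and self._basket:
            # Inject the stored handle when the model left it out.
            args = {**args, "basket_id": self._basket["basket_id"]}
        result = self._call_tool(tool, args)
        if tool == "checkout":
            self._basket = None  # the basket is finished after checkout
        return result
```

A hook like this has to know which tools produce and consume which handles, so it is specific to each server rather than general across servers.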

## **Decision**

This decision was discussed at the [Core Maintainer's Meeting on
4/1](https://github.com/modelcontextprotocol/modelcontextprotocol/discussions/2536),
where the group decided to move forward with the "no-session" proposal (Option B).