Clarified the text in several places.
Cleaned up text and examples, and added material about the HTTP bindings (currently only BOSH, with WebSocket to be added in a future revision).
Initial published version, incorporating improvements based on list discussion and removing the concept of stream management tickets.
Rough draft.
Establishing an XMPP session can require a fairly large number of round trips between the initiating entity and the receiving entity. However, in many deployment scenarios it would be helpful to reduce the number of round trips and therefore the time needed to establish a session. This document describes protocol optimizations and best practices to do just that.
In accordance with RFC 6120
XMPP applications SHOULD cache whatever information they can about the peer, especially stream features data and Service Discovery (XEP-0030).
XMPP clients SHOULD cache roster information, and servers SHOULD make such caching possible, using Roster Versioning (XEP-0237).
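As an illustration of the roster caching recommended here, the following sketch shows a roster request that names the version the client last cached, and the empty result a server can return when the roster has not changed. The 'ver' attribute follows roster versioning as incorporated into RFC 6121; the JIDs, the 'id' value, and the version string 'ver14' are hypothetical. The comments are annotations for the reader and are not sent on the wire.

```xml
<!-- client: request the roster, naming the cached version -->
<iq from='juliet@im.example.com/balcony' id='roster-1' type='get'>
  <query xmlns='jabber:iq:roster' ver='ver14'/>
</iq>

<!-- server: roster unchanged, so no roster items are returned -->
<iq id='roster-1' to='juliet@im.example.com/balcony' type='result'/>
```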
The primary method of speeding the connection process is pipelining of requests, along the lines of RFC 2920.
In essence, pipelining relies on two assumptions:
Together, these assumptions enable the parties to reduce the number of round trips needed to complete the stream negotiation process by "pipelining" XMPP-related commands over the stream.
Note well that pipelining at the XMPP layer is not to be confused with HTTP pipelining, which was added to HTTP in version 1.1 and which is not encouraged when using the HTTP bindings for XMPP.
If an XMPP server supports pipelining, it MUST advertise a stream feature of <pipelining xmlns='urn:xmpp:features:pipelining'/>.
As noted, a server SHOULD also include its entity capabilities data in stream features as shown in Section 6.3 of XEP-0115.
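A stream features advertisement along these lines might look as follows. This is a sketch only: the <pipelining/> element is the one defined above, the <c/> element follows the format of XEP-0115, and the 'node' and 'ver' values are placeholders rather than real entity capabilities data.

```xml
<stream:features>
  <starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'>
    <required/>
  </starttls>
  <pipelining xmlns='urn:xmpp:features:pipelining'/>
  <!-- placeholder capabilities data; a real 'ver' is a hash computed per XEP-0115 -->
  <c xmlns='http://jabber.org/protocol/caps'
     hash='sha-1'
     node='http://example.com/server'
     ver='PLACEHOLDER-VERIFICATION-STRING'/>
</stream:features>
```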
If both parties support pipelining, they can proceed as follows over the TCP binding (the examples use the XML from Section 9.1 of RFC 6120 for the client-server stream establishment, but the same principles apply to server-to-server streams).
In the client-to-server half of the first exchange, the client assumes that the server supports the XMPP STARTTLS extension, so it pipelines its initial stream header, the <starttls/> command, and the TLS ClientHello message.
In the server-to-client half of the first exchange, the server pipelines its response stream header, stream features advertisement, STARTTLS <proceed/> response, and TLS ServerHello messages (which might include ServerHello, Certificate, ServerKeyExchange, CertificateRequest, and ServerHelloDone; see RFC 5246).
Without pipelining, the foregoing exchange would require 3 round trips; with pipelining it requires 1 round trip.
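The first exchange described above can be sketched as follows. The addresses follow the style of Section 9.1 of RFC 6120; the stream 'id' value is a placeholder, and the comments are annotations for the reader that are not sent on the wire. The TLS handshake messages are binary TLS records, so they appear only as comments. As in any XMPP stream excerpt, the <stream:stream> element is left open.

```xml
<!-- client to server: one flight, sent without waiting for any reply -->
<stream:stream
    from='juliet@im.example.com'
    to='im.example.com'
    version='1.0'
    xml:lang='en'
    xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams'>
<starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>
<!-- TLS ClientHello follows immediately (binary, not XML) -->

<!-- server to client: one flight in response -->
<stream:stream
    from='im.example.com'
    id='PLACEHOLDER-STREAM-ID-1'
    to='juliet@im.example.com'
    version='1.0'
    xml:lang='en'
    xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams'>
<stream:features>
  <starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'>
    <required/>
  </starttls>
  <pipelining xmlns='urn:xmpp:features:pipelining'/>
</stream:features>
<proceed xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>
<!-- TLS ServerHello and related handshake messages follow (binary, not XML) -->
```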
Now the parties complete the TLS negotiation (i.e., some combination of the TLS messages specified in RFC 5246); for our purposes we don't count these round trips because they are the same no matter whether we use pipelining or not.
At the end of the TLS negotiation, the server knows that the client will need to restart the stream, so it proactively attaches its response stream header and stream features in the same TCP packet as the TLS Finished message, thus starting the next exchange.
In response, the client pipelines its initial stream header with the command for initiating the SASL authentication process (including, if appropriate for the SASL mechanism used, the "initial response" data as explained in Section 6.3.10 of RFC 6120).
Without pipelining, the second exchange would require another 2 round trips; with pipelining it requires only 1.
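The second exchange can be sketched as follows, assuming the SCRAM-SHA-1 mechanism. The stream 'id' value and the base64 payload are placeholders, and the list of mechanisms is illustrative. The comments are annotations for the reader and are not sent on the wire.

```xml
<!-- server to client: sent together with the TLS Finished message -->
<stream:stream
    from='im.example.com'
    id='PLACEHOLDER-STREAM-ID-2'
    to='juliet@im.example.com'
    version='1.0'
    xml:lang='en'
    xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams'>
<stream:features>
  <mechanisms xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>
    <mechanism>SCRAM-SHA-1</mechanism>
    <mechanism>PLAIN</mechanism>
  </mechanisms>
</stream:features>

<!-- client to server: stream header and SASL initial response in one flight -->
<stream:stream
    from='juliet@im.example.com'
    to='im.example.com'
    version='1.0'
    xml:lang='en'
    xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams'>
<auth xmlns='urn:ietf:params:xml:ns:xmpp-sasl'
      mechanism='SCRAM-SHA-1'>PLACEHOLDER-BASE64-INITIAL-RESPONSE</auth>
```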
At this point the client and server might exchange multiple SASL-related messages, depending on the SASL mechanism in use. Because this specification does not attempt to reduce the number of round trips involved in the challenge-response sequence, we do not describe these exchanges here.
When the client suspects that it is sending its final SASL response, with pipelining it appends an initial stream header and resource binding request.
The server then informs the client of SASL success (including "additional data with success" as explained in Section 6.3.10 of RFC 6120), sends a response stream header and stream features, and informs the client of successful resource binding.
Without pipelining, this exchange would require another 3 round trips; with pipelining it requires only 1.
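The final exchange can be sketched as follows. The base64 payloads, the stream 'id', the IQ 'id', and the resource 'balcony' are placeholders chosen for illustration; the comments are annotations for the reader and are not sent on the wire.

```xml
<!-- client to server: final SASL response, stream restart, and bind request -->
<response xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>PLACEHOLDER-BASE64-FINAL-RESPONSE</response>
<stream:stream
    from='juliet@im.example.com'
    to='im.example.com'
    version='1.0'
    xml:lang='en'
    xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams'>
<iq id='bind-1' type='set'>
  <bind xmlns='urn:ietf:params:xml:ns:xmpp-bind'>
    <resource>balcony</resource>
  </bind>
</iq>

<!-- server to client: SASL success, new stream header, features, bind result -->
<success xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>PLACEHOLDER-BASE64-ADDITIONAL-DATA</success>
<stream:stream
    from='im.example.com'
    id='PLACEHOLDER-STREAM-ID-3'
    to='juliet@im.example.com'
    version='1.0'
    xml:lang='en'
    xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams'>
<stream:features>
  <bind xmlns='urn:ietf:params:xml:ns:xmpp-bind'/>
</stream:features>
<iq id='bind-1' type='result'>
  <bind xmlns='urn:ietf:params:xml:ns:xmpp-bind'>
    <jid>juliet@im.example.com/balcony</jid>
  </bind>
</iq>
```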
Therefore, without pipelining the XMPP exchanges for stream establishment require at least 8 round trips by the counts given above (3 + 2 + 3, and perhaps more depending on the SASL mechanism used); with pipelining the minimum number of round trips is 3.
Naturally, for typical client-to-server sessions, additional round trips are needed so that the client can gather service discovery information, retrieve the roster, etc. As noted, these steps can be reduced or eliminated by using entity capabilities and roster versioning.
In the HTTP bindings (BOSH and WebSocket) channel encryption occurs at the HTTP layer and therefore the first exchange shown above for the TCP binding is not used.
For now, this section focuses on BOSH. A future version of this document will discuss WebSocket (once draft-moffitt-xmpp-over-websocket has been updated to include examples).
When pipelining is used, a BOSH client can include its XMPP authentication (SASL) request in the BOSH session creation request, as shown in the following example.
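A session creation request of this kind might look like the following sketch. The <body/> attributes follow the session creation requests of XEP-0124 and XEP-0206; the 'rid' value, the choice of SCRAM-SHA-1, and the base64 payload are placeholders chosen for illustration.

```xml
<body content='text/xml; charset=utf-8'
      hold='1'
      rid='1573741820'
      to='im.example.com'
      ver='1.6'
      wait='60'
      xml:lang='en'
      xmpp:version='1.0'
      xmlns='http://jabber.org/protocol/httpbind'
      xmlns:xmpp='urn:xmpp:xbosh'>
  <!-- SASL request pipelined into the session creation request -->
  <auth xmlns='urn:ietf:params:xml:ns:xmpp-sasl'
        mechanism='SCRAM-SHA-1'>PLACEHOLDER-BASE64-INITIAL-RESPONSE</auth>
</body>
```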