XEP-0301: In-Band Real Time Text

WARNING: This Standards-Track document is Experimental. Publication as an XMPP Extension Protocol does not imply approval of this proposal by the XMPP Standards Foundation. Implementation of the protocol described herein is encouraged in exploratory implementations, but production systems are advised to carefully consider whether it is appropriate to deploy implementations of this protocol before it advances to a status of Draft.

1. Introduction
2. Requirements
    2.1. Fluid Real-Time Text
    2.2. In-Band Transmission
    2.3. Flexible and Interoperable
    2.4. Accessible
3. Glossary
4. Protocol
    4.1. RTT Element
    4.2. RTT Attributes
       4.2.1. seq
       4.2.2. event
       4.2.3. id
    4.3. Body Element
       4.3.1. Backwards Compatible
    4.4. Transmission Interval
    4.5. Real-Time Text Actions
       4.5.1. Action Elements
       4.5.2. Summary of Attribute Values
       4.5.3. List of Action Elements
         4.5.3.1. Element <t/> – Insert Text
         4.5.3.2. Element <e/> – Backspace
         4.5.3.3. Element <d/> – Forward Delete
         4.5.3.4. Element <w/> – Wait Interval
       4.5.4. Accurate Processing of Action Elements
         4.5.4.1. Guidelines for Senders
         4.5.4.2. Guidelines for Recipients
         4.5.4.3. Unicode Character Counting
    4.6. Keeping Real-Time Text Synchronized
       4.6.1. Staying In Sync
       4.6.2. Recovery From Loss of Sync
       4.6.3. Message Reset
5. Determining Support
6. Implementation Notes
    6.1. Text Presentation
       6.1.1. Avoid Bursty Text Presentation
       6.1.2. Preserving Key Press Intervals
       6.1.3. Time Critical and Low Latency Methods
       6.1.4. Low-Bandwidth and Low-Precision Text Smoothing
    6.2. Activating and Deactivating Real-Time Text
       6.2.1. Activation Methods
       6.2.2. Deactivation Methods
    6.3. Optional Remote Cursor
       6.3.1. Calculating Cursor Position
    6.4. Sending Real-Time Text
       6.4.1. Monitoring Message Changes Instead Of Key Presses
       6.4.2. Monitoring Key Presses Directly
       6.4.3. Basic Real-Time Text
       6.4.4. Append-Only Real-Time Text
    6.5. Receiving Real-Time Text
    6.6. Other Guidelines
       6.6.1. Message Length
       6.6.2. Usage with Chat States
       6.6.3. Usage with Multi-User Chat and Simultaneous Logins
         6.6.3.1. Multi-User Chat
         6.6.3.2. Simultaneous Logins
       6.6.4. Stale Messages
       6.6.5. Performance & Efficiency
       6.6.6. Total Conversation – Combination with Audio and Video
7. Use Cases
    7.1. Example of Simple Real Time Text
    7.2. Example of Multiple Messages
    7.3. Examples of Message Edits
       7.3.1. Deleting Text From Message
       7.3.2. Inserting Text Into Message
       7.3.3. Deleting and Replacing Text In Message
       7.3.4. Multiple Message Edits
    7.4. Examples of Key Press Intervals
       7.4.1. Comparison With and Without Intervals
       7.4.2. Full Message Including Key Press Intervals
8. Interoperability Considerations
    8.1. RFC 4103 and T.140
9. Internationalization Considerations
10. Security Considerations
    10.1. Privacy
    10.2. Encryption
    10.3. Congestion Considerations
11. IANA Considerations
12. XMPP Registrar Considerations
    12.1. Protocol Namespaces
    12.2. Namespace Versioning
13. XML Schema
14. Acknowledgments

Appendices
    A: Document Information
    B: Author Information
    C: Legal Notices
    D: Relation to XMPP
    E: Discussion Venue
    F: Requirements Conformance
    G: Notes
    H: Revision History

1. Introduction

This document defines a specification for real-time text transmitted in-band over an XMPP network.

Real-time text is text transmitted instantly while it is being typed or created. The recipient can immediately read the sender's text as it is written, without waiting. Text can be used conversationally, similar to a telephone conversation, where one listens while the other is speaking. It eliminates waiting times found in messaging, and is favored by deaf and hard of hearing individuals who prefer text conversation. For a visual animation of real-time text, see RealJabber.org [1].

Real-time text is suitable for smooth and rapid mainstream communication in text, as an all-inclusive technology to complement instant messaging. It can also allow immediate conversation in situations where speech cannot be used (e.g. quiet environments, privacy, deaf and hard of hearing). Real-time text is also beneficial in emergency situations, due to its immediacy.

2. Requirements

2.1 Fluid Real-Time Text

Allow transmission of real-time text with a low latency.
Balance low latencies versus system, network and server limitations.
Support message editing in real-time, including text insertions and deletions.
Support transmission and reproduction of the original intervals between key presses, to preserve look-and-feel of typing independently of transmission intervals.

2.2 In-Band Transmission

Reliable real-time text delivery.
Be backwards compatible with XMPP clients that do not support real-time text.
Minimize reliance on network traversal mechanisms and/or out-of-band transmission protocols.
Compatible with multi-user chat (MUC) and simultaneous logins.

2.3 Flexible and Interoperable

Allow use within existing instant-messaging user interfaces, with minimal UI modifications.
Allow alternate optional presentations of real-time text, including split screen and/or other layouts.
Protocol design ensures integrity of real-time text, and allows extensions for new features.
Be interoperable with other real-time text protocols via gateways, including RFC 4103 and ITU-T T.140.

2.4 Accessible

Allow XMPP to follow the ITU-T Rec. F.703 [6] Total Conversation standard for simultaneous voice, video, and real-time text.
Be a candidate technology for use with Next Generation 9-1-1 / 1-1-2 emergency services.
Be suitable for transcription services and (when coupled with voice at user's choice) for TTY/text telephone alternatives, relay services, and captioned telephone systems.
Be an accessible enhancement for mobile phone text messaging and mainstream instant messaging.

3. Glossary

real-time – A conversational latency of less than 1 second, as defined by ITU-T Rec. F.700 [7], section 2.1.2.1.

real-time text – Text transmitted instantly while it is being typed or created.

real-time message – Recipient's real-time view of the sender's message still being typed or created.

action element – An XML element that represents a single real-time message edit, such as text insertion or deletion.

4. Protocol

4.1 RTT Element

Real-time text is transmitted via an <rtt/> child element of a <message/> stanza. The <rtt/> element is transmitted at regular intervals by the sender client while a message is being composed, to allow the recipient to see the sender type the message, without waiting for the full message sent in a <body/> element.

This is a basic example of a real-time message "Hello, my Juliet!" transmitted in real-time while it is being typed, before a final message delivery in a <body/> element (to remain Backwards Compatible):

Example 1: Introductory Example

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello, </t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <t>my </t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2'>
    <t>Juliet!</t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a04'>
  <body>Hello, my Juliet!</body>
</message>

The <rtt/> element contains a series of one or more child elements called action elements that represent Real-Time Text Actions such as text being appended, inserted, or deleted. Example 1 illustrates only the <t/> action element, which appends text to the end of a message.

Transmission of the <rtt/> element occurs at a regular Transmission Interval whenever the sender is actively composing a message. If there are no changes to the message since the last transmission, no transmission occurs.

There MUST NOT be more than one <rtt/> element per <message/> stanza.

The namespace of the <rtt/> element is “urn:xmpp:rtt:0”.
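As a rough illustration, a recipient might extract the <rtt/> element from an incoming stanza as sketched below in Python, using the standard library's xml.etree.ElementTree. The function name parse_rtt is hypothetical, and returning None for a stanza with zero or several <rtt/> elements (or a non-numeric seq) is an assumption of this sketch, not behavior defined by this specification.

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"


def parse_rtt(message_xml):
    """Return (seq, event, actions) for a <message/> stanza, or None.

    actions is a list of (element name, attributes, text) tuples in
    document order. Returning None for unusable stanzas is an assumption
    of this sketch.
    """
    message = ET.fromstring(message_xml)
    rtts = message.findall(f"{{{RTT_NS}}}rtt")
    if len(rtts) != 1:
        return None  # no <rtt/>, or more than one (which senders must not send)
    rtt = rtts[0]
    seq = rtt.get("seq")
    if seq is None or not seq.isdigit():
        return None  # seq is REQUIRED
    actions = [
        (child.tag.split("}", 1)[-1], dict(child.attrib), child.text or "")
        for child in rtt
    ]
    return int(seq), rtt.get("event"), actions


first_stanza = (
    "<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' "
    "type='chat' id='a01'>"
    "<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'><t>Hello, </t></rtt>"
    "</message>"
)
parsed = parse_rtt(first_stanza)
```

Here parsed holds the seq value, the event value, and one <t/> action for the first stanza of Example 1.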

4.2 RTT Attributes

4.2.1 seq

This REQUIRED attribute is a counter to maintain the integrity of real-time text. (The seq value is limited to 31 bits, the range of non-negative values of a signed 32-bit integer.)

Senders MUST increment the seq attribute by 1 in each subsequent <rtt/> transmitted without an event attribute. When an <rtt/> element has an event attribute, senders MAY instead use any value as the new starting value for seq. A random starting seq value is RECOMMENDED for best integrity during Usage with Multi-User Chat and Simultaneous Logins. Senders MAY limit the size of the new starting seq value, to keep <rtt/> compact while allowing plenty of incrementing room without overflow.

Recipients MUST monitor the seq value to verify the integrity of real-time text. See Keeping Real-Time Text Synchronized.
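One possible way for a sender to manage seq is sketched below in Python. The class name SeqCounter, the start_limit parameter and its value of 100000, and the overflow handling are illustrative assumptions; the text above only requires an increment of 1 and recommends a random starting value.

```python
import random

MAX_SEQ = 2**31 - 1  # upper bound of the 31-bit seq range


class SeqCounter:
    """Sender-side seq bookkeeping (hypothetical helper)."""

    def __init__(self, start_limit=100_000, rng=None):
        # start_limit keeps the starting value small so <rtt/> stays compact.
        self._start_limit = start_limit
        self._rng = rng or random.Random()
        self._value = 0

    def restart(self):
        """Pick a fresh random seq for an <rtt/> that carries an event attribute."""
        self._value = self._rng.randrange(self._start_limit)
        return self._value

    def advance(self):
        """Return the seq for the next <rtt/> sent without an event attribute."""
        if self._value >= MAX_SEQ:
            # Assumption of this sketch: the caller restarts with a message reset.
            raise OverflowError("seq range exhausted")
        self._value += 1
        return self._value
```

A sender would call restart() for each <rtt/> with event='new' or event='reset', and advance() for every <rtt/> after it.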

4.2.2 event

This attribute signals events for real-time text, such as the start of a new real-time message. The event attribute MAY be omitted from the <rtt/> element during regular real-time text transmission. Recipients MUST ignore <rtt/> containing unsupported event values.

Table 1:

event	Description	Sender Support	Recipient Support
new	Begin a new real-time message.	REQUIRED	REQUIRED
reset	Reset the current real-time message.	RECOMMENDED	REQUIRED
init	Initiate a real-time text session.	OPTIONAL	OPTIONAL
cancel	End a real-time text session.	OPTIONAL	OPTIONAL

event='new'
Senders MUST use this value when transmitting the first <rtt/> element containing Action Elements (i.e. the first character(s) of a new message). Recipient clients MUST initialize a new real-time message for display, and then process action elements within the <rtt/> element. If a real-time message already exists in the same chat session, its content MUST be replaced (i.e. cleared prior to processing action elements). Senders MAY send subsequent <rtt/> elements that do not contain an event attribute.

event='reset'
Recipients MUST treat 'reset' the same as 'new'. Senders MUST use 'new' only when the sender has started composing a new message, and use 'reset' when re-transmitting a real-time message. See Message Reset, used for Keeping Real-Time Text Synchronized and Basic Real-Time Text.

event='init'
Clients MAY use this value to signal the other end that real-time text is being activated. If used, this <rtt/> element MUST be empty with no action elements. See Activating and Deactivating Real-Time Text.

event='cancel'
Clients MAY use this value to signal the other end to stop transmitting real-time text. If used, this <rtt/> element MUST be empty with no action elements. Recipients SHOULD discontinue sending back <rtt/> elements for the remainder of the same chat session (or unless 'init' is used again). See Activating and Deactivating Real-Time Text.

4.2.3 id

This OPTIONAL attribute is used only if Last Message Correction [8] (XEP-0308) is implemented. Sender clients MAY use this attribute to allow recipient clients to have improved presentation of real-time text during message correction (e.g. shown as in-place editing of previous message).

This id attribute refers to the <message/> stanza containing the <body/> that is being edited (See 'Business Rules' in XEP-0308). If used at all, then id MUST be included in all <rtt/> elements transmitted during message correction of the previous message. When switching messages being edited (i.e. editing the current message versus editing the previous message), the first <rtt/> element MUST contain an event attribute value, such as 'reset' (See Message Reset).

4.3 Body Element

The real-time message is considered complete upon receipt of a <body/> element in a message stanza. The delivered message in the <body/> element is displayed instead of the real-time message. In the ideal case, the message from <body/> is redundant since this delivered message is identical to the final contents of the real-time message.

Senders MUST include an event attribute in the next <rtt/> element that is transmitted after a message stanza containing a <body/> element.

4.3.1 Backwards Compatible

The real-time text standard simply provides early delivery of text before the <body/> element. The <body/> element continues to follow the XMPP Core [9] specification. In particular, XMPP implementations need to ignore XML elements they do not understand. Clients that do not support real-time text will continue to behave normally, displaying complete lines of messages as they are delivered.

4.4 Transmission Interval

For the best balance between interoperability and usability, the default transmission interval of <rtt/> elements for a continuously-changing message SHOULD be approximately 0.7 seconds. This interval meets ITU-T Rec. F.700 Section A.3.2.1 for good quality real-time conversation. If a different transmission interval needs to be used, the interval SHOULD be between 0.3 seconds and 1 second.

A longer interval will lead to a less optimal user experience in most network conditions. Conversely, a much shorter interval may more frequently trigger throttling or flooding protection algorithms in public XMPP servers, leading to dropped <message/> elements and/or congestion (see Congestion Considerations).

To provide fluid real-time text, one or more of the following methods can be used:

Preserving Key Press Intervals for natural typing display, independently of the transmission interval.
Use of Time Critical and Low Latency Methods, for real-time captioning/transcription.
For other options or reduced-precision options, see Low-Bandwidth and Low-Precision Text Smoothing.
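A minimal sketch of interval-based batching follows, in Python. It assumes a sender that queues action elements as they occur and polls a clock; the class name RttBatcher, the tuple representation of actions, and the decision to send the first batch immediately are assumptions of this sketch rather than requirements of this specification.

```python
class RttBatcher:
    """Collects pending actions and releases them once per interval."""

    def __init__(self, interval=0.7):
        self.interval = interval  # seconds; the default suggested above
        self.pending = []
        self.last_sent = None

    def add(self, action):
        """Queue one action, e.g. ("t", "H")."""
        self.pending.append(action)

    def poll(self, now):
        """Return the batch to put into one <rtt/>, or None if nothing is due."""
        if not self.pending:
            return None  # no changes since the last transmission
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return None  # interval has not elapsed yet
        batch, self.pending = self.pending, []
        self.last_sent = now
        return batch
```

The caller supplies the current time in seconds (for example from time.monotonic()), which keeps the sketch independent of any particular timer mechanism.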

4.5 Real-Time Text Actions

The <rtt/> element MAY contain one or more action elements representing real-time text operations, including text being appended, inserted, or deleted.

Many chat clients allow a sender to edit their message before sending (via a Send button, or pressing Enter). The inclusion of real-time text functionality, in existing client software, needs to preserve the sender's existing expectation of being able to edit their messages. In a chat session with real-time text, the recipient can watch the sender compose and edit their message before it is completed.

4.5.1 Action Elements

This is a short summary of action elements that operate on a real-time message. For detailed information, see List of Action Elements.

Table 2:

Action	Element	Description	Sender Support	Recipient Support
Insert Text	<t p='#'>text</t>	Insert specified text at position p in message.	REQUIRED	REQUIRED
Backspace	<e p='#' n='#'/>	Remove n characters before position p in message.	RECOMMENDED	REQUIRED
Forward Delete	<d p='#' n='#'/>	Remove n characters starting at position p in message.	RECOMMENDED	REQUIRED
Wait Interval	<w n='#'/>	Wait n thousandths of a second.	RECOMMENDED	RECOMMENDED

4.5.2 Summary of Attribute Values

The n attribute is a length value.
If n is omitted, the default value of n MUST be 1.
The p attribute is an absolute position value, as a 0-based index (0 represents beginning of message).
If p is omitted, the default value of p MUST be the current message length (p defaults to end of message).
For text modifications, length and position (n and p) are based on Unicode Character Counting.
Also see Accurate Processing of Action Elements.
Senders MUST NOT use negative values for any attribute, nor use p values beyond the current message length. However, recipients receiving such values MUST clip negative values to 0, and clip excessively high p values to the current length of the real-time message. Modifications only occur within the boundaries of the current real-time message, and not in other delivered messages.
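The defaults and clipping rules above can be sketched in Python as follows. The function name resolve_n_p is hypothetical, and treating an unparsable attribute value as if it were omitted is an assumption of this sketch; this section does not say how such values are to be handled.

```python
def resolve_n_p(attrs, message_length):
    """Return (n, p) for an action element with defaults and clipping applied.

    attrs is the element's attribute dictionary with string values;
    message_length is the current length of the real-time message
    in Unicode code points.
    """
    def parse(name, default):
        raw = attrs.get(name)
        if raw is None:
            return default
        try:
            return int(raw)
        except ValueError:
            return default  # assumption: unparsable values count as omitted

    n = max(0, parse("n", 1))                    # n defaults to 1; negatives clip to 0
    p = parse("p", message_length)               # p defaults to end of message
    p = min(max(0, p), message_length)           # clip p into [0, message_length]
    return n, p
```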

4.5.3 List of Action Elements

Recipients MUST be able to process all <t/>, <e/> and <d/> action elements for incoming <rtt/> transmissions, even if senders do not use all of these for outgoing <rtt/> transmissions (e.g. Basic Real-Time Text). Support for <w/> is RECOMMENDED for both senders and recipients, in order to accommodate Preserving Key Press Intervals. Recipients MUST ignore unexpected or unsupported elements within <rtt/>, while continuing to process subsequent action elements (Compatibility is ensured via Namespace Versioning). Action elements are immediate child elements of the <rtt/> element, and are never nested. See examples in Use Cases.

4.5.3.1 Element <t/> – Insert Text

Support the transmission of text, including key presses, and text block inserts.
Note: Text can be any subset of text allowed in the <body/> element of a <message/>. If <t/> is empty, no text modification takes place.

<t>text</t>

Append specified text at the end of message. (p defaults to message length).
Note: This action element is the minimum sender support REQUIRED for Basic Real-Time Text.

<t p='#'>text</t>

Inserts specified text at position p in the message text.

4.5.3.2 Element <e/> – Backspace

Support the behavior of Backspace key presses. Text is removed towards beginning of the message.
Note: Excess backspaces MUST be ignored, with text being backspaced only to the beginning of the message in this case.

<e/>

Remove 1 character from end of message. (Both n and p at default values)

<e p='#'/>

Remove 1 character before position p in message. (n defaults to 1)

<e n='#'/>

Remove n characters from end of message. (p defaults to message length)

<e n='#' p='#'/>

Remove n characters before position p in message.

4.5.3.3 Element <d/> – Forward Delete

Support the behavior of Delete key presses, and text block deletes. Text is removed towards end of the message.
Note: Excess deletes MUST be ignored, with text being deleted only to the end of the message in this case.

<d p='#'/>

Remove 1 character beginning at position p in message. (n defaults to 1)

<d p='#' n='#'/>

Remove n characters beginning at position p in message.

4.5.3.4 Element <w/> – Wait Interval

Allow the transmission of intervals, between real-time text actions, to support the pauses between key presses. See Preserving Key Press Intervals.

<w n='#'/>

Wait n thousandths of a second before processing the next action element. This pause MAY be approximate, and not necessarily be of millisecond precision. The n value SHOULD NOT exceed the Transmission Interval. Also, if a Body Element arrives, pauses SHOULD be interrupted to prevent a delay in message delivery.
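The <t/>, <e/> and <d/> actions described in this section can be sketched in Python as operations on a list of Unicode code points. The function name apply_action and its signature are assumptions of this sketch. The <w/> element is left out because it affects only timing, and unknown element names are ignored.

```python
def apply_action(chars, name, p=None, n=1, text=""):
    """Apply one action element to chars, a list of single code points.

    p=None stands for an omitted p attribute (end of message).
    The list is modified in place and also returned.
    """
    length = len(chars)
    p = length if p is None else min(max(p, 0), length)
    n = max(n, 0)
    if name == "t":
        chars[p:p] = list(text)        # insert text at position p
    elif name == "e":
        start = max(p - n, 0)          # excess backspaces stop at the beginning
        del chars[start:p]
    elif name == "d":
        del chars[p:p + n]             # excess deletes stop at the end
    # Any other element name is ignored.
    return chars


message = list("Hello Bob, this is Alice!")
apply_action(message, "e", p=9, n=4)   # backspace over " Bob"
after_backspace = "".join(message)
apply_action(message, "t", p=5, text=" Bob")
after_insert = "".join(message)
```

Python strings iterate by code point, so list(text) already yields the units that p and n count.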

4.5.4 Accurate Processing of Action Elements

Real-time text is generated based on text normally allowed to be transmitted within the <body/> element.

Incorrectly generated action elements may lead to inconsistencies between the sender and recipient during real-time editing. The Unicode characters of the real-time text need to pass transparently from the sender to the recipient, without further Unicode character modifications. This applies to the whole chain from the sender's creation of real-time text to the recipient's processing of real-time text. Transparent transmission of Unicode characters is possible with sender pre-processing, as long as the transmission from the sender to the recipient remains standards-compliant, including compliant XML processors and compliant XMPP servers.

Any inconsistencies that occur during real-time message editing (e.g. a non-compliant XMPP server that modifies messages) will recover during the next Message Reset, and also via Basic Real-Time Text.

4.5.4.1 Guidelines for Senders

Senders MUST generate real-time text based on the plain text version of the sender's message with all processing already completed. Processing includes Unicode normalization, conversion of emoticon graphics to text, removal of illegal characters, line-break conversion, and all other Unicode character modifications. This is done concurrently to the displayed version of the same message (which may continue to have formatting, emoticon graphics, XHTML-IM [10], etc.).

For the purpose of calculating n and p values, line breaks MUST be treated as a single character, if line breaks are used within real-time text. Conversion of line breaks into a single LINE FEED U+000A is REQUIRED for XML processors, according to section 2.11 of XML [11].

4.5.4.2 Guidelines for Recipients

For recipients, p and n are calculated relative to real-time text obtained from a compliant XML processor, before any further Unicode character modifications. Recipients MUST NOT do Unicode normalization (or any other code point modifications) on their copy of the real-time message. This is to allow accurate processing of subsequent action elements (For display purposes, the recipient client can separately process/normalize a copy of the same real-time message text).

Note that Element <t/> – Insert Text is allowed to contain any subset sequence of Unicode characters from the real-time message. This may result in certain situations where the text transmitted in <t/> elements is allowed to be temporarily an incorrectly-formed Unicode string (e.g. incompletely formed glyphs, non-spacing characters, orphaned diacritic character, standalone control character including direction-change character for bidirectional Unicode) but becomes correct when inserted into the middle of the recipient's real-time message, and consequently passes recipient validation/normalization with no character modifications. Note that a compliant XML processor does not modify or fix Unicode errors caused by taking only a subset of characters from correctly-formed Unicode text. One alternative way for implementers to visualize this, is to visualize the Unicode text as an array of individual code points, and treat the p and n values accordingly.

4.5.4.3 Unicode Character Counting

For platform-independent interoperability, calculations of length and position values (p and n) MUST be based on Unicode code points. A single UTF-8 encoded character equals one code point. However, many platforms use internal encodings (i.e. string formats) that differ from the transmission encoding (UTF-8). Consider these factors:

Multiple Unicode code points (e.g. combining marks) may combine into one displayable character.
    Action elements operate on individual Unicode code points, not on displayable characters.
Unicode code points for characters U+10000 through U+10FFFF are represented as a surrogate pair in some Unicode encodings (e.g. UTF-16).
    Action elements operate on individual Unicode code points, not on the separate components of a surrogate pair.
Some Unicode encodings use a variable number of bytes per Unicode code point (e.g. UTF-8).
    Action elements operate on individual Unicode code points, not on individual bytes.

Incorrectly calculated length and position values (p and n) can result in inconsistencies in the real-time message, such as scrambled text. If this happens, the situation can recover during the next Message Reset.

Length and position values (p and n) are relative to the internal Unicode text of the real-time message, independently of the directionality of actual displayed text. As a result, any valid Unicode text direction can be used with real-time text (right-to-left, left-to-right, and bidirectional).
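The difference between code points, UTF-16 code units, and UTF-8 bytes can be illustrated with a short Python sketch. Python strings are indexed by code point, so len() already matches the counting rule above; the helper utf16_offset is a hypothetical name showing the conversion a platform with UTF-16 strings would need.

```python
def utf16_offset(text, index):
    """Number of UTF-16 code units that precede code point `index` in text."""
    # Code points above U+FFFF occupy a surrogate pair (two code units).
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:index])


sample = "a\U0001F600b"                              # 'a', U+1F600, 'b'
code_points = len(sample)                            # what p and n count
utf16_units = len(sample.encode("utf-16-le")) // 2   # surrogate pair counts twice
utf8_bytes = len(sample.encode("utf-8"))             # variable bytes per code point
position_of_b = utf16_offset(sample, 2)              # code point 2 in UTF-16 units

combining = "e\u0301"                                # 'e' plus COMBINING ACUTE ACCENT
combining_length = len(combining)                    # two code points, one displayed character
```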

4.6 Keeping Real-Time Text Synchronized

In a chat session, it is important that real-time text stays identical on both the sender and recipient ends. The loss of a single <rtt/> transmission could represent missing text or missing edits. Also, recipients can connect after the sender has already started composing a message. Recovery of in-progress real-time message via Message Reset is useful in several situations:

Resuming after connecting (e.g. wireless reception, recipient restarted software, participants joining).
Resuming after recipient discarded Stale Messages (e.g. sender resumes composing hours later).
XMPP servers may drop <message/> elements (e.g. flooding protection).
Multiple Simultaneous Logins (e.g. switching systems, switching windows, simultaneous typing).

4.6.1 Staying In Sync

For <rtt/> elements that do not contain an event attribute:

Senders MUST increment the seq attribute for consecutively transmitted <rtt/> elements.
Recipients MUST monitor the seq attribute value of received <rtt/> elements, to verify that it is incrementing.

Recipients MUST keep track of separate real-time messages per sender, including maintaining independent seq values. Recipients MAY also use additional methods to distinguish Simultaneous Logins, including using the full JID and/or <thread/>.

4.6.2 Recovery From Loss of Sync

Loss of sync occurs if the seq attribute does not increment as expected when Staying In Sync. In this case:

Recipients MUST freeze the current real-time message; and
Recipients MUST ignore action elements within the current and subsequent <rtt/> elements; and
An indication can be used to show the loss of sync (e.g. color coding, modified chat state message).

Recovery occurs when the recipient receives the following:

A <body/> element. The Body Element supersedes the real-time message.
An <rtt/> element containing an event attribute (e.g. new message, or Message Reset).

4.6.3 Message Reset

A message reset is a retransmission of the sender's partially composed text. The recipient can redisplay the real-time message as a result. It allows real-time text conversation to resume quickly, without waiting for senders to start a new message.

Retransmission SHOULD be done at an average interval of 10 seconds during active typing or composing. This interval is frequent enough to minimize user waiting time, while being infrequent enough to reduce bandwidth overhead. This interval MAY vary in order to reduce average bandwidth requirements for minor message changes and/or for long messages. For quicker recovery, senders MAY adjust the timing of the message retransmissions to occur right after any of the following additional events:

When the recipient starts sending messages from a different full JID (e.g. switched systems);
When the recipient sends a presence update (e.g. from offline to online);
When the sender resumes composing after an extended pause (e.g. recovery from Stale Messages handling);
When the conversation is unlocked (e.g. section 5.1 of XMPP IM [12]);

A message reset is done using the <rtt/> attribute event value of 'reset' (see RTT Attributes).

<rtt event='reset' seq='#' xmlns='urn:xmpp:rtt:0'>
  <t>This is a retransmission of the entire real-time message.</t>
</rtt>

Note: There are no restrictions on using multiple Action Elements during a message reset (e.g. typing or backspacing occurring at the end of a retransmitted message).
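As a hedged illustration, a sender could build the message reset element with Python's xml.etree.ElementTree, which also takes care of XML escaping. The function name build_reset is hypothetical, and the sketch sends the whole message in a single <t/> element, which is only one of the permitted forms.

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"
# Ask ElementTree to serialize this namespace as a default xmlns
# rather than with a generated prefix.
ET.register_namespace("", RTT_NS)


def build_reset(full_text, seq):
    """Return an <rtt event='reset'/> element carrying the whole message."""
    rtt = ET.Element(f"{{{RTT_NS}}}rtt", {"event": "reset", "seq": str(seq)})
    ET.SubElement(rtt, f"{{{RTT_NS}}}t").text = full_text
    return ET.tostring(rtt, encoding="unicode")


reset_xml = build_reset("Hello <3", 12)
```

The resulting string would be placed inside a <message/> stanza addressed to the recipient.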

5. Determining Support

If a client supports this real-time text protocol, it MUST advertise that fact in its responses to Service Discovery [13] information requests ("disco#info") by returning a feature of "urn:xmpp:rtt:0", as in the following exchange:

<iq from='romeo@montague.lit/orchard'
    id='disco1'
    to='juliet@capulet.lit/balcony'
    type='get'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>

<iq from='juliet@capulet.lit/balcony'
    id='disco1'
    to='romeo@montague.lit/orchard'
    type='result'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <feature var='urn:xmpp:rtt:0'/>
  </query>
</iq>

In order for an application to determine whether an entity supports this protocol, where possible it SHOULD use the dynamic, presence-based profile of service discovery defined in Entity Capabilities [14]. However, if an application has not received entity capabilities information from an entity, it SHOULD use explicit service discovery instead.
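Checking a disco#info result for the real-time text feature might look like the following Python sketch, which uses only the standard library. The function name supports_rtt is hypothetical, and the sketch inspects a single serialized <iq/> stanza; it does not cover Entity Capabilities caching.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
RTT_NS = "urn:xmpp:rtt:0"


def supports_rtt(iq_xml):
    """Return True if a disco#info result advertises urn:xmpp:rtt:0."""
    iq = ET.fromstring(iq_xml)
    if iq.get("type") != "result":
        return False
    query = iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    features = query.findall(f"{{{DISCO_INFO_NS}}}feature")
    return any(feature.get("var") == RTT_NS for feature in features)


result_iq = (
    "<iq from='juliet@capulet.lit/balcony' id='disco1' "
    "to='romeo@montague.lit/orchard' type='result'>"
    "<query xmlns='http://jabber.org/protocol/disco#info'>"
    "<feature var='urn:xmpp:rtt:0'/>"
    "</query></iq>"
)
juliet_supports_rtt = supports_rtt(result_iq)
```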

6. Implementation Notes

6.1 Text Presentation

6.1.1 Avoid Bursty Text Presentation

If a long Transmission Interval is used without Preserving Key Press Intervals, then incoming text will appear in intermittent bursts if the display of text is not smoothed. This hurts user experience of real-time text.

6.1.2 Preserving Key Press Intervals

For high quality presentation of real-time text, the original look-and-feel of typing can be preserved independently of the transmission interval. This is achieved using Element <w/> – Wait Interval between other Action Elements. Sender clients can transmit the length of pauses between key presses, and send multiple key presses in a single <message/> stanza. Recipient clients that process <w/> elements are able to display the sender's typing smoothly, without sudden bursts of text. See Examples of Key Press Intervals.

When key press intervals are preserved at high precision, all subtleties of typing are preserved, including the 'mood' (calm typing versus panicked or emphatic typing, etc.). Much as VoIP allows accurate packet transmission of sound, this spec allows accurate packet transmission of original typing look-and-feel. This enables the real-time feel of typing over virtually any network connection, without requiring frequent transmission intervals. Look and feel of typing is also preserved over variable latency connections including XMPP Over BOSH [15], mobile phone, satellite and long international connections with heavy packet-bursting tendencies.
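A minimal sketch of how a sender might turn timestamped key presses into action elements with wait intervals follows, in Python. The function name encode_key_presses, the (time, text) input format, and the tuple output format are assumptions of this sketch; a real client would serialize the tuples as <w/> and <t/> elements.

```python
def encode_key_presses(events, start_ms):
    """Convert key presses into a list of ("w", ms) and ("t", text) actions.

    events is a list of (time_ms, text) pairs sorted by time; start_ms is
    the time of the previous transmission, so the first pause is measured
    from there.
    """
    actions = []
    previous = start_ms
    for time_ms, text in events:
        gap = time_ms - previous
        if gap > 0:
            actions.append(("w", gap))  # pause before this key press
        actions.append(("t", text))
        previous = time_ms
    return actions


batch = encode_key_presses([(100, "H"), (250, "i"), (250, "!")], start_ms=0)
```

Key presses with identical timestamps produce no <w/> between them, which keeps the batch compact.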

6.1.3 Time Critical and Low Latency Methods

There are specialized situations such as live transcriptions and captioning (e.g. transcription service, closed captioning provider, captioned telephone, Communication Access Realtime Translation (CART), relay services) that demand low-latency transmission. Such systems typically use voice recognition and/or stenotype machines, which output text in word bursts rather than a character at a time. It is acceptable for senders with bursty output to immediately transmit word bursts of text without buffering. This eliminates any lag caused by the Transmission Interval. It is not necessary to transmit Element <w/> – Wait Interval for real-time transcription.

6.1.4 Low-Bandwidth and Low-Precision Text Smoothing

Some software platforms (e.g. JavaScript, BOSH, mobile devices) may have low-precision timers that impact Transmission Interval and/or Preserving Key Press Intervals. Clients can optimize for bandwidth, performance and/or screen repaints by eliminating, merging, or ignoring Element <w/> – Wait Interval selectively, especially those containing shorter intervals. In addition, it is acceptable for the transmission interval of <rtt/> to vary, either intentionally for optimizations, or due to precision limitation. Stream Compression [16] can also be used.

Clients can choose to implement alternate text-smoothing methods, such as adaptive-rate character-at-a-time output, and/or word buffering for incoming real-time text. Word buffering prevents most typing mistakes from being displayed, which can be a useful mode of operation for certain recipients who may dislike watching the sender's typing mistakes.
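One way to merge or drop short wait intervals is sketched below in Python. The function name coalesce_waits and the 50 millisecond threshold are hypothetical choices for illustration only. Short pauses are carried forward and added to the next pause, so the overall timing stays roughly the same while fewer <w/> elements are produced or processed.

```python
def coalesce_waits(actions, min_wait_ms=50):
    """Merge ("w", ms) actions shorter than min_wait_ms into later waits.

    actions is a list of (name, value) tuples such as ("w", 20) or
    ("t", "a"). A trailing pause below the threshold is dropped.
    """
    result = []
    pending = 0  # accumulated pause not yet emitted
    for name, value in actions:
        if name == "w":
            pending += value
            continue
        if pending >= min_wait_ms:
            result.append(("w", pending))
            pending = 0
        result.append((name, value))
    if pending >= min_wait_ms:
        result.append(("w", pending))
    return result


smoothed = coalesce_waits(
    [("w", 10), ("t", "a"), ("w", 20), ("t", "b"), ("w", 30), ("t", "c")]
)
```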

6.2 Activating and Deactivating Real-Time Text

Implementers can choose a preferred activation method for real-time text. For example, clients in the assistive market can choose to do immediate activation of real-time text. Popular mainstream clients might do user-initiated activation/confirmation of real-time text. The confirmation process could be similar to common activation methods used for audio/video. It is also beneficial for senders and recipients to easily synchronize the enabling/disabling of real-time text.

6.2.1 Activation Methods

Activation of real-time text in a chat session (immediate or user-initiated) can be done by:

Immediately transmitting real-time text (if allowed after Determining Support); or
Signaling first (by transmitting a single <rtt event='init'/> and waiting for response).

Recipient clients can respond to incoming real-time text with an appropriate response, such as:

Accepting immediately (by activating in response); or
Accepting after recipient confirmation (by also activating in response, after a user confirmation prompt); or
Denying (by transmitting <rtt event='cancel'/> which is used in Deactivation Methods); or
Ignoring (by discarding incoming <rtt/> as a last resort, without using Deactivation Methods); or
Other appropriate responses (e.g. only display incoming real-time text during Multi-User Chat).

If Determining Support allows, then it is not necessary for senders or recipients to transmit <rtt event='init'/> first, as any incoming RTT Element (other than <rtt event='cancel'/>) signals the start of incoming real-time text. However, transmitting <rtt event='init'/> allows signaling of activation before the sender begins typing.

In the absence of Determining Support, sender clients can send a single <rtt event='init'/> element to attempt to activate real-time text. It is inappropriate for sender clients to send any further <rtt/> elements unless support is confirmed by discovery, or when the recipient client responds with incoming <rtt/> elements during the same chat session.

It can be acceptable for software to display incoming real-time text without activating outgoing real-time text. Displaying incoming real-time text while waiting for user confirmation can be educational to users unfamiliar with real-time text. Care needs to be taken to prevent this situation from becoming confusing to the user. Implementers can add other behaviors that are appropriate for Privacy considerations, such as an introductory note upon first activation.

6.2.2 Deactivation Methods

Real-time text can be deactivated by any of:

Sending a signal (using <rtt event='cancel'/> upon deactivation, deny, or end of chat session); or
Simply ending the chat session (without transmitting any further <rtt/> elements).

Recipient clients can respond to deactivation with appropriate responses, such as:

Discontinue transmission of <rtt/> elements as well (not applicable to Multi-User Chat); and
Handle the sender's unfinished incoming real-time message (such as clearing it and/or saving it); and
Inform the recipient user that sender ended real-time text (or denied/cancelled, if no real-time text was received).

Sending an <rtt event='cancel'/> is useful in situations where the user closes a chat window and ends the chat session. It is also useful when the user wants to deactivate real-time text while still continuing the chat session. After deactivation, any client can reactivate real-time text in accordance with Activation Methods.

6.3 Optional Remote Cursor

Recipient clients might choose to display a cursor (or caret) within incoming real-time messages. This enhances usability of real-time text further, since it becomes easier for a recipient to observe the sender's real-time message edits.

If a remote cursor is not used, then clients can simply ignore calculating a cursor position and skip this section. All action elements only have absolute positioning, and positioning does not depend on previous action elements, so clients do not need to remember the previous cursor position.

6.3.1 Calculating Cursor Position

When <t/>, <e/>, or <d/> action elements are processed in incoming real-time text, the beginning value for the cursor position calculation is the absolute position value of the p attribute, according to Summary of Attribute Values. The recipient can calculate the cursor position as follows:

After Element <t/> – Insert Text, the cursor position is the p attribute plus the length of the text being inserted. The cursor position is put at the end of inserted text.
This is the normal forward cursor movement during text insertion.
After Element <e/> – Backspace, the cursor position is the p attribute minus the n attribute.
This is the normal backwards cursor movement in response to a Backspace key.
After Element <d/> – Forward Delete, the cursor position is the p attribute, unaffected by the n attribute.
This is the normal stationary cursor response to a Delete key.
After an empty Element <t/> – Insert Text (in the format of <t p='#'/> with no text to insert), the cursor position is the p attribute, and no text modification is done.
This allows cursor response to arrow keys and/or mouse repositioning the cursor.
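The rules above can be sketched as a small Python function. This is an illustration only: the function name and signature are hypothetical, and the caller is assumed to have already resolved a missing p or n attribute to its default according to Summary of Attribute Values. Python 3 len() counts code points, which is assumed here to agree with Unicode Character Counting.

```python
def cursor_after(action: str, p: int, n: int = 1, text: str = "") -> int:
    """Return the remote cursor position after one action element.

    action is the element name: "t", "e" or "d".
    p and n are the already-resolved attribute values.
    """
    if action == "t":
        # Insert Text: cursor lands at the end of the inserted text.
        # An empty <t p='#'/> therefore just moves the cursor to p.
        return p + len(text)
    if action == "e":
        # Backspace: cursor moves back by the number of erased characters.
        return p - n
    if action == "d":
        # Forward Delete: cursor stays where it is.
        return p
    raise ValueError("not a cursor-affecting action element: " + action)
```

For example, cursor_after("t", 5, text=" there,") gives 12, and cursor_after("e", 14, n=6) gives 8.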

The remote cursor needs to be clearly distinguishable from the sender's real local cursor. One example is to use a non-blinking cursor, easily emulated with a Unicode character or the vertical bar character '|'.

It is acceptable for the sender to transmit Element <t/> – Insert Text as empty elements (with the cursor position in the p attribute) whenever the cursor position is changing without any text modifications (e.g. via arrow keys or mouse). This allows recipients supporting a remote cursor to show the cursor movements. These extra elements are ignored by recipients that do not support a remote cursor.

6.4 Sending Real-Time Text

This section lists several possible methods of generating real-time text for transmission. For most situations, the preferred methodology is Monitoring Message Changes Instead Of Key Presses.

6.4.1 Monitoring Message Changes Instead Of Key Presses

Experience has found that the most reliable method for generating real-time text is to monitor for text changes to the sender’s message entry field, instead of key press events. Text change events have the following advantages:

They capture all typing, including edits and deletes.
They capture copy & paste operations, as well as edits made via a pointing device.
They capture all automatic text changes (e.g. spell checker, auto-correct, macros, transcription, assistive devices).
They capture characters requiring multiple key presses to compose (e.g. accents, combining marks).
They make no assumptions about different keyboards or input method editors (e.g. Chinese).
They are more portable across platforms, including on mobile phones.

During a text change event, the sender’s current message text can be compared to the old message text (from before the text change event). In order to calculate what text changes took place, the first changed character and the last changed character are determined. From this, it is possible to generate Action Elements for any text insertions and deletions. In addition, if Preserving Key Press Intervals is supported, then Element <w/> – Wait Interval records the time elapsed between text change events.
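One possible way to perform this comparison is to find the longest common prefix and suffix of the old and new text; whatever lies between them is the changed region. The Python sketch below illustrates that approach. The function name and the tuple format of the result are hypothetical, and it assumes that Python 3 string indexing (by code point) agrees with Unicode Character Counting.

```python
from typing import List, Tuple


def diff_to_actions(old: str, new: str) -> List[Tuple]:
    """Return action tuples that turn old into new.

    ("e", p, n) stands for <e p='p' n='n'/> and
    ("t", p, text) stands for <t p='p'>text</t>.
    """
    limit = min(len(old), len(new))
    # First changed character: length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    # Last changed character: common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    removed = len(old) - prefix - suffix
    inserted = new[prefix:len(new) - suffix]
    actions: List[Tuple] = []
    if removed:
        # Backspace from just after the removed run.
        actions.append(("e", prefix + removed, removed))
    if inserted:
        actions.append(("t", prefix, inserted))
    return actions
```

For instance, correcting "Hello tehre!" to "Hello there!" yields a Backspace of 2 characters at position 9 followed by an insertion of "he" at position 7.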

Sender software can do the following:

Monitor for text changes in the sender’s message. Whenever a text change event occurs, compute action elements and append these action elements to a buffer. This is equivalent to recording a small sequence of typing.
During every Transmission Interval, all buffered action elements are transmitted in an <rtt/> element in a <message/> stanza. This is equivalent to transmitting a small sequence of typing at a time.
If there are no message changes occurring, no unnecessary transmission takes place.
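The buffering steps above might look like the following Python sketch. The class name RttSender and its methods are hypothetical, the starting seq value is supplied by the caller, and only the <rtt/> element is built (wrapping it in a <message/> stanza and scheduling flush() once per Transmission Interval are left to the surrounding client).

```python
import xml.etree.ElementTree as ET


class RttSender:
    """Buffers action elements and emits one <rtt/> per flush (a sketch)."""

    def __init__(self, first_seq: int):
        self.seq = first_seq
        self.buffer = []
        self.started = False

    def record(self, tag, text=None, **attrs):
        """Append one action element, e.g. record("e", n=2)."""
        self.buffer.append((tag, text, attrs))

    def flush(self):
        """Return serialized <rtt/> XML, or None if nothing changed."""
        if not self.buffer:
            return None  # no unnecessary transmission
        # Writing xmlns as a plain attribute is a shortcut for this sketch.
        rtt = ET.Element("rtt", {"xmlns": "urn:xmpp:rtt:0",
                                 "seq": str(self.seq)})
        if not self.started:
            rtt.set("event", "new")  # first <rtt/> of a new message
            self.started = True
        for tag, text, attrs in self.buffer:
            child = ET.SubElement(rtt, tag,
                                  {k: str(v) for k, v in attrs.items()})
            child.text = text
        self.buffer.clear()
        self.seq += 1  # seq increments within the same message
        return ET.tostring(rtt, encoding="unicode")
```

A client would call record() from its text change handler and flush() from a periodic timer, sending the result only when it is not None.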

6.4.2 Monitoring Key Presses Directly

Real-time text can be generated via monitoring key presses. However, this does not have the advantages of Monitoring Message Changes Instead Of Key Presses. Care needs to be taken with automatic changes to the message, generated by means other than key presses. This includes spell check auto-correct, copy and pastes, transcription, input method editors, and multiple key presses required to compose a character (e.g. accents). Key press events do not capture these text changes, and this can cause real-time text to go out of sync between the sender and the recipient.

6.4.3 Basic Real-Time Text

Sender clients may choose to implement Message Reset as the only method of transmitting changes to the real-time message. The entire message is simply retransmitted every Transmission Interval whenever there are any text changes. Below is a transmission of the real-time message “Hello there!” at regular intervals while the sender is typing.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>Hel</t>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='456002' event='reset'>
    <t>Hello th</t>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='789003' event='reset'>
    <t>Hello there!</t>
  </rtt>
</message>

The advantage is very simple implementation. However, disadvantages include the lack of Preserving Key Press Intervals, and extra bandwidth consumption that can occur with longer messages, unless stream compression is used.

6.4.4 Append-Only Real-Time Text

The use of Element <t/> – Insert Text without any attributes simply appends text to the end of a message, while the use of Element <e/> – Backspace without any attributes simply erases text from the end of the message. These simple action elements are useful if mid-message editing capabilities are not used (e.g. simple transcription, news tickers, relay services, captioned telephone).

If editing is needed in the middle of a message, without adding sender support for other Action Elements, the use of Message Reset supports situations where mid-message editing takes place. In this situation, the disadvantages include the lack of Preserving Key Press Intervals, and extra bandwidth consumption that can occur with longer messages, unless stream compression is used.

6.5 Receiving Real-Time Text

In order to allow Preserving Key Press Intervals in incoming real-time text, recipient clients can do the following:

Upon receiving Action Elements in incoming <rtt/> elements, they are added to a queue in the order they are received. This provides immunity to variable network conditions, since the queueing action smooths out the latency fluctuations of incoming transmission.
The recipient client processes action elements in the queue in sequential order, including pauses from Element <w/> – Wait Interval. This is equivalent to playing back the sender's original typing.
Upon receiving a Body Element indicating a completed message, the full message text from <body/> can be displayed immediately in place of the real-time message, and unprocessed action elements can be cleared from the queue. This ensures final message delivery is not delayed by late processing of action elements.

In processing Element <w/> – Wait Interval, excess lag in incoming real-time text might occur if multiple delayed <rtt/> elements suddenly get delivered (e.g. congestion, intermittent wireless reception). Recipients can avoid excess lag by monitoring the queue for excess <w/> action elements (e.g. unprocessed <w/> elements from two <rtt/> elements ago) and then ignoring or shortening the intervals in <w/> elements. This allows lagged real-time text to "catch up" more quickly. In addition, it is best to process <w/> elements using non-blocking programming techniques.
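The queueing and catch-up behavior described in this section could be sketched as follows in Python. Everything named here is an assumption for illustration: the class PlaybackQueue, the tuple form of action elements such as ("w", 100) or ("t", "H"), and the 1400 ms backlog threshold (roughly two typical Transmission Intervals). The sketch never sleeps; it only reports how long the caller should wait.

```python
from collections import deque


class PlaybackQueue:
    """Queue of incoming action elements, played back in arrival order."""

    def __init__(self, max_backlog_ms=1400):
        self.queue = deque()
        self.max_backlog_ms = max_backlog_ms  # assumed catch-up threshold

    def enqueue(self, actions):
        """Append the action elements of one incoming <rtt/> element."""
        self.queue.extend(actions)

    def backlog_ms(self):
        """Total of all <w/> intervals still waiting in the queue."""
        return sum(a[1] for a in self.queue if a[0] == "w")

    def next_step(self):
        """Return (actions to apply now, delay in ms before the next call)."""
        ready = []
        while self.queue:
            action = self.queue.popleft()
            if action[0] != "w":
                ready.append(action)
                continue
            # Skip this pause when too much waiting is queued up (catch-up).
            backlog = action[1] + self.backlog_ms()
            delay = 0 if backlog > self.max_backlog_ms else action[1]
            return ready, delay
        return ready, 0

    def clear(self):
        """Drop unprocessed actions, e.g. when a Body Element arrives."""
        self.queue.clear()
```

A client would apply the returned actions to the displayed message and then schedule the next call to next_step() with a non-blocking timer (for example an event loop timer) using the returned delay.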

6.6 Other Guidelines

Implementers also need to take the following considerations for real-time message transmission into account.

6.6.1 Message Length

A large sequence of action elements can result in an <rtt/> larger than the size of a message <body/>. This can occur normally during fast typing when Preserving Key Press Intervals during small messages. However, if the <rtt/> element becomes unusually huge (e.g. macros, multiple copy and pastes, leading to an <rtt/> exceeding one kilobyte) a Message Reset can instead be used, in order to save bandwidth. (Stream compression is another approach.)

Clients can limit the length of the text input for the sender's message, in order to keep the size of <message/> stanzas reasonable, including during Message Reset. Also, large <rtt/> elements may occur in situations such as large copy and pastes. To keep message stanza sizes reasonable, <rtt/> can be transmitted in a separate <message/> from the one containing <body/>.

For specialized clients that send continuous real-time text (e.g. news ticker, captioning, transcription, TTY gateway), a Body Element can be automatically sent when messages reach a certain length. This allows continuous real-time text without real-time messages becoming excessively large.

6.6.2 Usage with Chat States

Real-time text can be used in conjunction with XEP-0085 Chat State Notifications [17]. These are simple guidelines for <message/> stanzas that include an <rtt/> element:

For <rtt/> transmitted without an accompanying <body/>, include the <composing/> chat state.
For <rtt/> transmitted with an accompanying <body/>, include the <active/> chat state.
Other chat states are handled as specified by XEP-0085 Chat States.

6.6.3 Usage with Multi-User Chat and Simultaneous Logins

The in-band nature of this real-time text standard allows one-to-many situations. Thus, real-time text is appropriate for use with Multi-User Chat [18] (MUC), as well as simultaneous logins.

6.6.3.1 Multi-User Chat

For simplicity, clients can implement real-time text only for one-on-one chat, and not for MUC. However, it can be appropriate to support <rtt/> elements in MUC, even if not all participants support real-time text. Participants that enable real-time text during group chat need to keep track of multiple concurrent real-time messages on a per-participant basis. Participants with real-time text enabled will see real-time text coming from each participant that has real-time text enabled. Participant clients without real-time text (whether unsupported or turned off) will simply see group chat function normally on a line-by-line basis, since it is Backwards Compatible.

Participants that turn off real-time text for themselves can simply ignore incoming <rtt/> and not transmit outgoing <rtt/>. Participant clients in MUC receiving an incoming <rtt event='cancel'/> need to keep outgoing transmission unaffected during Deactivation Methods (otherwise, one participant could deny real-time text between other willing participants).

To minimize on-screen clutter of multiple idle real-time messages, clients can hide idle messages, clear old Stale Messages, and/or prioritize the display of the most useful real-time messages. Prominent visibility of real-time text can be assigned to recent typists and/or moderators (e.g. classroom teacher, convention speaker). For the same participant logged in multiple times in the same room, see Simultaneous Logins. In situations of simultaneous typing by a large number of participants, see Congestion Considerations.

6.6.3.2 Simultaneous Logins

In simultaneous login situations, transmitting of <rtt/> works in one-to-many situations without any special software support. For many-to-one situations where there is incoming <rtt/> from more than one simultaneous login, Keeping Real-Time Text Synchronized will pause the real-time message upon conflicting <rtt/>, and resume during the next Message Reset, presumably from the active login. This provides a seamless system-switching experience. A good implementation of Message Reset will improve user experience, regardless of whether or not the client follows Best Practices For Resource Locking (XEP-0296). Clients can choose to distinguish the <rtt/> streams (via full JID and/or via <thread/>) and keep multiple concurrent real-time messages similar in manner to Multi-User Chat, with the Stale Messages being timed-out.

6.6.4 Stale Messages

There are situations where senders pause typing indefinitely. This can result in recipients displaying a real-time message for an extended time period. It may also be a screen clutter concern during Multi-User Chat. In addition, it may be a resource-consumption concern, as part of Congestion Considerations.

It is acceptable for recipients to clear (and/or save) incoming real-time messages that have been idle for an extended time period. There is no specific time-out period defined by this specification. For Multi-User Chat, the time-out period might be shorter because of the need to reduce screen clutter. For normal chat sessions, the time-out period might need to be longer to allow reasonable interruptions (e.g. sender pausing during a long phone call).

Senders that resume composing a message (e.g. continuing a partially-composed message hours later) can transmit a Message Reset, which allows recipients to redisplay the real-time message.

6.6.5 Performance & Efficiency

With real-time text, frequent screen updates can occur. Screen updates are a potential performance bottleneck, since fast typists type many key presses per second. Optimizing screen updates becomes especially important for slower platforms. Real-time messages need to be redisplayed efficiently in a flicker-free manner. The real-time message might be implemented as a separate window or separate display element.

Battery life considerations are closely related to performance, as the addition of real-time text may impact battery life. If Preserving Key Press Intervals is supported, then Element <w/> – Wait Interval needs to be implemented in a battery-efficient manner. The Transmission Interval may vary dynamically to optimize for battery life and wireless reception. For devices where screen updates are an unavoidable bottleneck, see Low-Bandwidth and Low-Precision Text Smoothing to reduce the number of screen updates per second.

6.6.6 Total Conversation – Combination with Audio and Video

According to ITU-T Rec. F.703, the “Total Conversation” accessibility standard defines the simultaneous use of audio, video, and real-time text. For convenience, messaging applications may be designed to have automatic negotiation of as many as possible of the three media preferred by the users.

In the XMPP session environment, the Jingle protocol (Jingle [19]) is available for negotiation and transport of the more time-critical, real-time audio and video media. Any combination of audio, video, and real-time text can be used together simultaneously.

7. Use Cases

Most of these examples are deliberately kept simple. In complete software implementations supporting key press intervals, transmissions will most resemble the last example, Full Message Including Key Press Intervals.

7.1 Example of Simple Real Time Text

All three examples shown below result in the same real-time message "HELLO" created by writing "HLL", backspacing two times, and then "ELLO". The action elements are Element <t/> – Insert Text and Element <e/> – Backspace.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>HLL</t>
    <e/><e/>
    <t>ELLO</t>
  </rtt>
</message>

The example above sends the misspelled "HLL", then <e/><e/> backspaces 2 times, then sends "ELLO".

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>HLL</t>
    <e n='2'/>
    <t>ELLO</t>
  </rtt>
</message>

The example above shows that <e n='2'/> does the same thing as <e/><e/>.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>HLL</t>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123002'>
    <e n='2'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123003'>
    <t>ELLO</t>
  </rtt>
</message>

The example above splits the same real-time text over multiple <message/> stanzas, which would occur if the typing was occurring more slowly, over several Transmission Interval cycles.

7.2 Example of Multiple Messages

The example below represents a short chat session of three separate messages:
Bob says: "Hello Alice"
Bob says: "This is Bob"
Bob says: "How are you?"

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>Hello</t>
  </rtt>
  <composing/>
</message>
 
<message to='alice@example.com' from='bob@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123002'>
    <t> Alice</t>
  </rtt>
  <body>Hello Alice</body>
  <active/>
</message>
 
 
<message to='alice@example.com' from='bob@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='456001' event='new'>
    <t>This i</t>
  </rtt>
  <composing/>
</message>
 
<message to='alice@example.com' from='bob@example.com/home' type='chat' id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='456002'>
    <t>s Bob</t>
  </rtt>
  <body>This is Bob</body>
  <active/>
</message>
 
 
<message to='alice@example.com' from='bob@example.com/home' type='chat' id='e05'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='789001' event='new'>
    <t>How a</t>
  </rtt>
  <composing/>
</message>
 
<message to='alice@example.com' from='bob@example.com/home' type='chat' id='f06'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='789002'>
    <t>re yo</t>
  </rtt>
  <composing/>
</message>
 
<message to='alice@example.com' from='bob@example.com/home' type='chat' id='g07'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='789003'>
    <t>u?</t>
  </rtt>
  <body>How are you?</body>
  <active/>
</message>

The example above represents moderate typing speed during a normal Transmission Interval, such as 0.7 seconds between <message/> stanzas for continuous typing. It illustrates the following:

The event attribute equals 'new' for the start of every new message.
The seq attribute increments within the same message.
The seq attribute randomizes at the beginning of a new message.
This shows Usage with Chat States.

7.3 Examples of Message Edits

These examples illustrate real-time message editing via Action Elements.
Note: In most situations, during normal human typing speeds at a normal Transmission Interval, the text will be spread over multiple <rtt/> elements in smaller fragments than in the demonstration <rtt/> examples below. See Sending Real-Time Text.

7.3.1 Deleting Text From Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>Hello Bob, this is Alice!</t>
    <d n='4' p='5'/>
  </rtt>
</message>

Final result of real-time message: "Hello, this is Alice!"
This example outputs "Hello Bob, this is Alice!" then <d n='4' p='5'/> deletes 4 characters from position 5. The Element <d/> – Forward Delete removes the text " Bob" including the preceding space character.

7.3.2 Inserting Text Into Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>Hello, this is Alice!</t>
    <t p='5'> Bob</t>
  </rtt>
</message>

Final result of real-time message: "Hello Bob, this is Alice!"
This is because this example outputs "Hello, this is Alice!" then the <t p='5'> inserts the specified text " Bob" at position 5, using Element <t/> – Insert Text.

7.3.3 Deleting and Replacing Text In Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>Hello Bob, tihsd is Alice!</t>
    <d p='11' n='5'/>
    <t p='11'>this</t>
  </rtt>
</message>

Final result of real-time message: "Hello Bob, this is Alice!"
This example outputs "Hello Bob, tihsd is Alice!", then <d p='11' n='5'/> deletes 5 characters at position 11 in the string of text (which erases the mistyped word "tihsd"). Finally, <t p='11'>this</t> inserts the text "this" in place of the original misspelled word.

7.3.4 Multiple Message Edits

This is an example message containing multiple consecutive real-time message edits. This illustrates valid use of the <rtt/> element.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>Helo</t>
    <e/>
    <t>lo...planet</t>
    <e n='6'/>
    <t> World</t>
    <d n='3' p='5'/>
    <t p='5'> there,</t>
  </rtt>
</message>

Resulting real-time message: "Hello there, World", completed in the following series of action elements:

Table 3:

Element	Action	Real-Time Message	Cursor Position*
<t>Helo</t>	Output "Helo"	Helo	4
<e/>	Backspace 1 character from end of line.	Hel	3
<t>lo...planet</t>	Output "lo...planet" at end of line.	Hello...planet	14
<e n='6'/>	Backspace 6 characters from end of line	Hello...	8
<t> World</t>	Output " World" at end of line.	Hello... World	14
<d n='3' p='5'/>	Delete 3 characters at position 5	Hello World	5
<t p='5'> there,</t>	Output " there," at position 5	Hello there, World	12

*The Cursor Position column is only relevant if the Optional Remote Cursor is implemented.

This example does not illustrate Preserving Key Press Intervals. Also, note that in most situations, during normal typing speeds at a normal Transmission Interval, the series of Action Elements will typically be spread over multiple separate <rtt/> elements.
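To make the step-by-step processing concrete, the Python sketch below applies the action elements of one <rtt/> element to a message string. It is a minimal illustration, not a complete recipient: the function name apply_rtt is hypothetical, out-of-range attribute values are not validated, and the defaults (p is the end of the message, n is 1) are assumed to follow Summary of Attribute Values.

```python
import xml.etree.ElementTree as ET

RTT_NS = "{urn:xmpp:rtt:0}"


def apply_rtt(text: str, rtt_xml: str) -> str:
    """Apply the action elements in one <rtt/> element to text."""
    rtt = ET.fromstring(rtt_xml)
    for element in rtt:
        tag = element.tag.replace(RTT_NS, "")
        p = int(element.get("p", len(text)))  # default: end of message
        n = int(element.get("n", 1))          # default: one character
        if tag == "t":
            inserted = element.text or ""
            text = text[:p] + inserted + text[p:]
        elif tag == "e":
            # Backspace: remove n characters before position p.
            text = text[:max(p - n, 0)] + text[p:]
        elif tag == "d":
            # Forward Delete: remove n characters starting at position p.
            text = text[:p] + text[p + n:]
        # <w/> and unrecognized elements do not modify the text here.
    return text
```

Feeding it the <rtt/> element from the example above, starting from an empty string, produces "Hello there, World".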

7.4 Examples of Key Press Intervals

7.4.1 Comparison With and Without Intervals

All examples shown below result in the same real-time message “HELLO”. Only the last example follows Preserving Key Press Intervals.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>HELLO</t>
  </rtt>
</message>

The above example outputs “HELLO” in a single action element (Element <t/> – Insert Text).

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>H</t>
    <t>E</t>
    <t>L</t>
    <t>L</t>
    <t>O</t>
  </rtt>
</message>

The above example outputs “HELLO” in separate action elements for each key press.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>H</t><w n='101'/>
    <t>E</t><w n='110'/>
    <t>L</t><w n='125'/>
    <t>L</t><w n='103'/>
    <t>O</t><w n='110'/>
  </rtt>
</message>

The above example outputs “HELLO” in separate action elements for each key press, while also Preserving Key Press Intervals. The Element <w/> – Wait Interval specifies the number of milliseconds between key presses, to allow smooth presentation in recipient clients that support <w/> action elements.

7.4.2 Full Message Including Key Press Intervals

This example is a transmission of “Hello there!” while Preserving Key Press Intervals. It illustrates a four-second typing sequence:

The misspelled phrase “Hello tehre!” is typed;
Optional transmission of cursor movements towards the typing mistake;
Two backspaces to delete the typing mistake;
Two correct key presses to correctly spell the word “there”.

The use of Element <w/> – Wait Interval between key presses allows the receiving client to execute a small pause between action elements. This allows recipient clients to play back the sender's typing fluidly.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
    <t>H</t>
    <w n='115'/><t>e</t>
    <w n='154'/><t>l</t>
    <w n='151'/><t>l</t>
    <w n='115'/><t>o</t>
    <w n='165'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123002'>
    <w n='40'/><t> </t>
    <w n='161'/><t>t</t>
    <w n='137'/><t>e</t>
    <w n='135'/><t>h</t>
    <w n='134'/><t>r</t>
    <w n='93'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123003'>
    <w n='109'/><t>e</t>
    <w n='115'/><t>!</t>
    <w n='330'/><t p='11'/>
    <w n='108'/><t p='10'/>
    <w n='38'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123004'>
    <w n='109'/><t p='9'/>
    <w n='111'/><e p='9'/>
    <w n='106'/><e p='8'/>
    <w n='138'/><t p='7'>h</t>
    <w n='209'/><t p='8'>e</t>
    <w n='27'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='e05'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='123005'>
    <w n='445'/><t p='12'/>
  </rtt>
  <body>Hello there!</body>
</message>

This example also illustrates the following:

Typing is done via Element <t/> – Insert Text.
Backspaces are done via Element <e/> – Backspace.
There is a final transmission with a Body Element, when the message is finished.
Intervals between key presses are done via Element <w/> – Wait Interval.
Each <message/> is delivered at a regular Transmission Interval, typically 0.7 seconds.
Cursor movements are transmitted via empty <t/> elements. Sender transmission is not essential, but can be desirable for recipient clients supporting an Optional Remote Cursor.
Recipient clients that do not support Preserving Key Press Intervals and/or Optional Remote Cursor, will still display this message normally.
The total sum of all values in Element <w/> – Wait Interval in one <message/> equals the Transmission Interval during periods of continuous typing. This also results in some <w/> interval elements being split between consecutive messages. Although not critical, it can further improve the fluidity of Receiving Real-Time Text.
See Monitoring Message Changes Instead Of Key Presses for more info.
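The relationship between the wait intervals and the Transmission Interval can be checked with a few lines of Python. The helper name total_wait_ms is hypothetical; the sketch simply sums the n attributes of every <w/> element inside one <rtt/> element.

```python
import xml.etree.ElementTree as ET


def total_wait_ms(rtt_xml: str) -> int:
    """Sum the n attributes of all <w/> elements in one <rtt/> element."""
    rtt = ET.fromstring(rtt_xml)
    return sum(int(w.get("n", 0)) for w in rtt.iter("{urn:xmpp:rtt:0}w"))


# The first <rtt/> element of the example above.
FIRST_RTT = """
<rtt xmlns='urn:xmpp:rtt:0' seq='123001' event='new'>
  <t>H</t>
  <w n='115'/><t>e</t>
  <w n='154'/><t>l</t>
  <w n='151'/><t>l</t>
  <w n='115'/><t>o</t>
  <w n='165'/>
</rtt>
"""
```

For this element the intervals are 115 + 154 + 151 + 115 + 165 = 700 milliseconds, matching the 0.7 second Transmission Interval mentioned above.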

8. Interoperability Considerations

There are other real-time text formats with interoperability considerations relating to the session setup level, the media transport level, and presentation level. Interoperability specifications between multiple real-time text formats can be found in the Interoperability section of Real-Time Text Taskforce [20].

It is appropriate for implementers to choose the most appropriate real-time text standard for the session control standard in use during a particular session. Clients that use XMPP (e.g. Google Talk) would utilize this XEP-0301 specification. Clients that use SIP can utilize IETF RFC 4103, IETF RFC 5194 [21] and ITU-T T.140. Clients that run on multiple networks may need to utilize multiple real-time text standards. To interoperate between incompatible real-time text standards, gateway servers can transcode between different real-time text standards, along with other media such as audio and video. This can include TTY and textphones. Also, see Total Conversation – Combination with Audio and Video.

8.1 RFC 4103 and T.140

In the SIP environment, real-time text is specified in IETF RFC 4103 and ITU-T T.140. SIP is a popular real-time session control protocol, and there are many implementations of real-time text controlled by SIP. This includes some emergency service organizations (e.g. Reach 112).

Interoperability considerations include addressing translation, media negotiation and translation, and media transcoding. Transcoding is straightforward between this specification and T.140/RFC4103, except for editing in the middle of messages. Text insertions or deletions, occurring far back in the message, can cause a large number of erase operations in T.140 that consume time and bandwidth. T.140 specifies the use of ISO 6429 control codes for presentation characteristics, such as text color, that are not supported by this specification. During transcoding, these control codes need to be filtered out in order to not disturb the presentation of text.

9. Internationalization Considerations

The primary internationalization consideration involves real-time message editing via Action Elements, where text is inserted and deleted using index and position values. In particular, correct Unicode Character Counting needs to be followed, due to the existence of variable-length encodings and right-to-left text. Also, Accurate Processing of Action Elements will ensure that all possible valid Unicode text can be used via this protocol. This can include text containing multiple scripts/languages, ideographic symbols (e.g. Chinese), right-to-left text (e.g. Arabic), and bidirectional text.

10. Security Considerations

10.1 Privacy

It is important for implementers of real-time text to educate users about real-time text. Users of real-time text need to be aware that their typing is visible in real time to everyone in the current chat conversation. This has security implications if users copy and paste private information into their chat entry buffer (e.g. a shopping invoice) and then edit out the private parts of the pasted text (e.g. a credit card number) before they send the message. With real-time message editing, recipients can watch all text changes that occur in the sender's text, before the sender sends the final message. Implementation behaviors and improved education can reduce privacy issues. Examples include showing an introduction upon first activation of the feature, special handling for copy and paste operations (i.e. preventing them, or prompting for confirmation), and recipient confirmation of real-time text via Activating and Deactivating Real-Time Text.

10.2 Encryption

Real-time text (<rtt/> elements) transmits the content contained within messages. Therefore, a client that encrypts <body/> also needs to encrypt <rtt/>:

Encryption at the stream level (e.g. TLS) can be used normally with this specification. Stream-level encryption is the most common form of encryption.
Encryption at the <message/> stanza level (e.g. deferred XEP-0200) can be used for all stanzas containing either <rtt/> or <body/>. It is worth noting that stanza-level encryption produces significantly more overhead, due to the increased number of stanzas that real-time text causes (see Congestion Considerations).
Encryption at the <body/> level (e.g. deprecated XEP-0027) does not encrypt <rtt/>. In this case, <rtt/> needs to be encrypted separately. It is preferable to use a broader level of encryption, where possible.
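The body-level case can be checked mechanically. The following Python sketch flags a <message/> stanza that carries a body-level encrypted payload alongside a plaintext <rtt/> element. It assumes the encrypted payload uses the 'jabber:x:encrypted' namespace of XEP-0027. The function name leaks_plaintext_rtt is hypothetical.

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"
ENCRYPTED_NS = "jabber:x:encrypted"  # payload namespace used by XEP-0027


def leaks_plaintext_rtt(stanza_xml: str) -> bool:
    """Return True if a message has an encrypted payload and a plaintext <rtt/>."""
    message = ET.fromstring(stanza_xml)
    # Only direct children of <message/> are inspected.
    has_encrypted = message.find(f"{{{ENCRYPTED_NS}}}x") is not None
    has_rtt = message.find(f"{{{RTT_NS}}}rtt") is not None
    return has_encrypted and has_rtt
```

A sending client could run such a check before transmission and either suppress <rtt/> or switch to a broader level of encryption when it returns True.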

10.3 Congestion Considerations

The nature of real-time text results in more frequent transmission of <message/> stanzas than would otherwise happen in a non-real-time text conversation. This can lead to increased network and server loading of XMPP networks.

Transmission of real-time text can be throttled temporarily during poor network conditions. It is appropriate to use latency monitoring mechanisms (e.g. Message Delivery Receipts [23] or Stream Management [24]) in order to temporarily adjust the Transmission Interval of real-time text beyond the recommended range. This results in lagged text (less real-time) but is better than failure during poor network conditions. The use of Message Reset can also retransmit real-time text lost due to poor network conditions, including stanzas dropped by an overloaded server. This is also useful for mission-critical applications such as Next Generation 9-1-1 emergency services.
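One possible throttling policy is sketched below in Python. It assumes the client already measures a round-trip latency through one of the mechanisms above. The function name adjusted_interval_ms, the proportional scaling rule, and every numeric default are illustrative assumptions rather than values taken from this specification.

```python
def adjusted_interval_ms(
    base_ms: int,
    latency_ms: float,
    threshold_ms: float = 1000.0,  # assumed latency above which to throttle
    max_ms: int = 5000,            # assumed upper bound on the interval
) -> int:
    """Return a transmission interval stretched in proportion to latency.

    At or below the threshold the base interval is used unchanged. Above it,
    the interval grows with the measured latency, capped at max_ms.
    """
    if latency_ms <= threshold_ms:
        return base_ms
    scaled = base_ms * (latency_ms / threshold_ms)
    return min(max_ms, int(round(scaled)))
```

With a base interval of 700 ms, a measured latency of 2000 ms doubles the interval to 1400 ms in this sketch, and the interval returns to the base value once latency falls back under the threshold.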

Excess numbers of real-time messages (e.g. during a DoS scenario in Multi-User Chat) might cause local resource-consumption issues, which can be mitigated by accelerated time-out of Stale Messages.
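An accelerated time-out could look like the following Python sketch. It assumes the client keeps a table that maps each sender to the time of its last real-time update. The function name prune_stale, the load limit, and both time-out values are hypothetical.

```python
from typing import Dict


def prune_stale(
    last_update: Dict[str, float],
    now: float,
    load_limit: int = 50,               # assumed number of messages considered "excess"
    normal_timeout: float = 120.0,      # assumed time-out in seconds under normal load
    accelerated_timeout: float = 10.0,  # assumed time-out in seconds under excess load
) -> Dict[str, float]:
    """Drop real-time messages that have not been updated recently.

    When more real-time messages are tracked than load_limit, the shorter
    accelerated time-out is applied so that resources are freed sooner.
    """
    overloaded = len(last_update) > load_limit
    timeout = accelerated_timeout if overloaded else normal_timeout
    return {
        sender: updated
        for sender, updated in last_update.items()
        if now - updated <= timeout
    }
```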

Use of this specification in the recommended way will cause a load that is only marginally higher than that of a user communicating without this specification. The bandwidth overhead of real-time text is very low compared to many other activities possible on XMPP networks, including in-band file transfers and audio. Bandwidth usage can be further reduced using stream compression, to benefit bandwidth-constrained networks (e.g. GPRS, 3G, satellite).

11. IANA Considerations

This document requires no interaction with the Internet Assigned Numbers Authority (IANA).

12. XMPP Registrar Considerations

12.1 Protocol Namespaces

The XMPP Registrar should include "urn:xmpp:rtt:0" in its registry of protocol namespaces (see <http://xmpp.org/registrar/namespaces.html>).

12.2 Namespace Versioning

If the protocol defined in this specification undergoes a revision that is not fully backwards-compatible with an older version, the XMPP Registrar shall increment the protocol version number found at the end of the XML namespaces defined herein, as described in Section 4 of XEP-0053.
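The version increment can be illustrated with a minimal Python sketch. It assumes that the version is the decimal number after the final colon of the namespace. The function name bump_namespace_version is hypothetical.

```python
import re


def bump_namespace_version(namespace: str) -> str:
    """Return the namespace with its trailing version number incremented."""
    match = re.fullmatch(r"(.+):([0-9]+)", namespace)
    if match is None:
        raise ValueError(f"no trailing version number in {namespace!r}")
    prefix, version = match.groups()
    return f"{prefix}:{int(version) + 1}"
```

Under this assumption, a backwards-incompatible revision of "urn:xmpp:rtt:0" would be published as "urn:xmpp:rtt:1".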

13. XML Schema

<?xml version='1.0' encoding='UTF-8'?>

<xs:schema
    xmlns:xs='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:xmpp:rtt:0'
    xmlns='urn:xmpp:rtt:0'
    elementFormDefault='qualified'>

  <xs:annotation>
    <xs:documentation>
      The protocol documented by this schema is defined in
      XEP-0301: http://www.xmpp.org/extensions/xep-0301.html
    </xs:documentation>
  </xs:annotation>

  <xs:element name='rtt'>
    <xs:complexType>
      <!-- The content model precedes the attribute declarations. -->
      <xs:sequence>
        <xs:element ref='t' minOccurs='0' maxOccurs='unbounded'/>
        <xs:element ref='e' minOccurs='0' maxOccurs='unbounded'/>
        <xs:element ref='d' minOccurs='0' maxOccurs='unbounded'/>
        <xs:element ref='w' minOccurs='0' maxOccurs='unbounded'/>
      </xs:sequence>
      <xs:attribute name='seq' type='xs:unsignedInt' use='required'/>
      <xs:attribute name='event' use='optional'>
        <xs:simpleType>
          <xs:restriction base='xs:string'>
            <xs:enumeration value='new'/>
            <xs:enumeration value='reset'/>
            <xs:enumeration value='cancel'/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name='id' type='xs:string' use='optional'/>
    </xs:complexType>
  </xs:element>

  <!-- Each action element extends its content type with attributes. -->
  <xs:element name='t'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='xs:string'>
          <xs:attribute name='p' type='xs:nonNegativeInteger' use='optional'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:element name='e'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='empty'>
          <xs:attribute name='p' type='xs:nonNegativeInteger' use='optional'/>
          <xs:attribute name='n' type='xs:nonNegativeInteger' use='optional' default='1'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:element name='d'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='empty'>
          <xs:attribute name='p' type='xs:nonNegativeInteger' use='required'/>
          <xs:attribute name='n' type='xs:nonNegativeInteger' use='optional' default='1'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:element name='w'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='empty'>
          <xs:attribute name='n' type='xs:nonNegativeInteger' use='required'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:simpleType name='empty'>
    <xs:restriction base='xs:string'>
      <xs:enumeration value=''/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
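For clients that do not carry a full XML Schema validator, the attribute constraints in the schema above can be approximated by hand. The following Python sketch uses only the standard library and mirrors the required and optional attributes, the 'event' enumeration, and the empty-content rule. The function name check_rtt is hypothetical. The sketch is a simplification: it accepts only plain ASCII digits as integers and does not check element order.

```python
import xml.etree.ElementTree as ET
from typing import List

RTT_NS = "urn:xmpp:rtt:0"
EVENTS = {"new", "reset", "cancel"}
# Action element name -> (allowed attributes, required attributes).
ACTIONS = {
    "t": ({"p"}, set()),
    "e": ({"p", "n"}, set()),
    "d": ({"p", "n"}, {"p"}),
    "w": ({"n"}, {"n"}),
}


def _is_non_negative_int(value: str) -> bool:
    # Simplification: plain ASCII digits only, no sign or whitespace.
    return value.isascii() and value.isdigit()


def check_rtt(xml_text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    rtt = ET.fromstring(xml_text)
    if rtt.tag != f"{{{RTT_NS}}}rtt":
        return [f"unexpected root element {rtt.tag}"]
    problems = []
    seq = rtt.get("seq")
    if seq is None:
        problems.append("missing required attribute 'seq'")
    elif not (_is_non_negative_int(seq) and int(seq) <= 0xFFFFFFFF):
        problems.append(f"'seq' is not an unsigned 32-bit integer: {seq!r}")
    event = rtt.get("event")
    if event is not None and event not in EVENTS:
        problems.append(f"unknown 'event' value: {event!r}")
    for child in rtt:
        name = child.tag.replace(f"{{{RTT_NS}}}", "", 1)
        if name not in ACTIONS:
            problems.append(f"unknown action element <{name}/>")
            continue
        allowed, required = ACTIONS[name]
        for attr in sorted(required - set(child.keys())):
            problems.append(f"<{name}/> is missing required attribute '{attr}'")
        for attr, value in child.attrib.items():
            if attr not in allowed:
                problems.append(f"<{name}/> has unexpected attribute '{attr}'")
            elif not _is_non_negative_int(value):
                problems.append(f"<{name}/> attribute '{attr}' is not a non-negative integer")
        if name != "t" and child.text:
            problems.append(f"<{name}/> is expected to be empty")
    return problems
```

Calling check_rtt on "<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'><t>Hello</t></rtt>" returns an empty list, while an element without 'seq' yields a problem entry that names the missing attribute.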

14. Acknowledgments

The author would like to thank the Real-Time Text Taskforce (R3TF) at www.realtimetext.org for their contribution to the technology documented in this specification. Mark Rejhon leads the Jabber/XMPP Taskgroup at R3TF. Members of R3TF who have contributed to this document include Gunnar Hellstrom (Omnitor), Paul E. Jones (Cisco), Gregg Vanderheiden (Trace R&D Center, University of Wisconsin), Barry Dingle (Interoperability Leader, R3TF), and Arnoud van Wijk (Founder, R3TF). Other contributors include Bernard Aboba (Microsoft), Darren Sturman (Teligent Telecom), Christian Vogler (Gallaudet University), Norm Williams (Gallaudet University), and several members from the XMPP Standards Mailing List, including Kevin Smith (XSF), Peter Saint-Andre (XSF), and many others.

“Natural Typing”, the technique of Preserving Key Press Intervals, is acknowledged as an invention by Mark Rejhon, who is deaf. This technology is provided to XMPP.org as part of this specification in compliance with the XSF's Intellectual Property Rights Policy at http://xmpp.org/extensions/ipr-policy.shtml.

Appendices

Appendix A: Document Information

Series: XEP
Number: 0301
Publisher: XMPP Standards Foundation
Status: Experimental
Type: Standards Track
Version: 0.5
Last Updated: 2012-07-22
Approving Body: XMPP Council
Dependencies: XMPP Core, XEP-0020
Supersedes: None
Superseded By: None
Short Name: NOT_YET_ASSIGNED

Appendix B: Author Information

Mark Rejhon

Organization: RealJabber.org and Rejhon Technologies Inc.
Email: mark@realjabber.org
JabberID: markybox@gmail.com
URI: http://www.realjabber.com

Appendix C: Legal Notices

Copyright

Permissions

Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.

Disclaimer of Warranty

## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. ##

Limitation of Liability

In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.

IPR Conformance

This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <http://xmpp.org/about-xmpp/xsf/xsf-ipr-policy/> or obtained by writing to XMPP Standards Foundation, 1899 Wynkoop Street, Suite 600, Denver, CO 80202 USA).

Appendix D: Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.

Appendix E: Discussion Venue

The primary venue for discussion of XMPP Extension Protocols is the <standards@xmpp.org> discussion list.

Appendix F: Requirements Conformance

The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".

Appendix G: Notes

1. RealJabber.org, the author's website covering this specification, including animation examples of what real-time text looks like <http://www.realjabber.org>.

2. IETF RFC 4103: RTP Payload for Text Conversation <http://tools.ietf.org/html/rfc4103>.

3. ITU-T T.140: Protocol for multimedia application text conversation <http://www.itu.int/rec/T-REC-T.140>.

4. AOL AIM Real Time Text <http://help.aol.com/help/microsites/microsite.do?cmd=displayKC&externalId=223568>.

5. Reach112: European emergency service with real-time text <http://www.reach112.eu>.

6. ITU-T Rec. F.703: Multimedia conversational services <http://www.itu.int/rec/T-REC-F.703>.

7. ITU-T Rec. F.700: Framework Recommendation for multimedia services <http://www.itu.int/rec/T-REC-F.700>.

8. XEP-0308: Last Message Correction <http://xmpp.org/extensions/xep-0308.html>.

9. RFC 6120: Extensible Messaging and Presence Protocol (XMPP): Core <http://tools.ietf.org/html/rfc6120>.

10. XEP-0071: XHTML-IM <http://xmpp.org/extensions/xep-0071.html>.

11. XML: Extensible Markup Language 1.0 (Fifth Edition) <http://www.w3.org/TR/xml/>.

12. RFC 6121: Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence <http://tools.ietf.org/html/rfc6121>.

13. XEP-0030: Service Discovery <http://xmpp.org/extensions/xep-0030.html>.

14. XEP-0115: Entity Capabilities <http://xmpp.org/extensions/xep-0115.html>.

15. XEP-0206: XMPP Over BOSH <http://xmpp.org/extensions/xep-0206.html>.

16. XEP-0138: Stream Compression <http://xmpp.org/extensions/xep-0138.html>.

17. XEP-0085: Chat State Notifications <http://xmpp.org/extensions/xep-0085.html>.

18. XEP-0045: Multi-User Chat <http://xmpp.org/extensions/xep-0045.html>.

19. XEP-0166: Jingle <http://xmpp.org/extensions/xep-0166.html>.

20. Real-Time Text Taskforce, a foundation for real-time text standardization <http://www.realtimetext.org>.

21. IETF RFC 5194: Framework for Real-Time Text over IP Using the Session Initiation Protocol (SIP) <http://tools.ietf.org/html/rfc5194>.

22. The International Symbol of Real-Time Text <http://www.fasttext.org>.

23. XEP-0184: Message Delivery Receipts <http://xmpp.org/extensions/xep-0184.html>.

24. XEP-0198: Stream Management <http://xmpp.org/extensions/xep-0198.html>.

Abstract:	This is a specification for real-time text transmitted in-band over an XMPP session.
Author:	Mark Rejhon
Copyright:	© 1999 - 2012 XMPP Standards Foundation. SEE LEGAL NOTICES.