XEP-0301: In-Band Real Time Text

Abstract: This is a specification for real-time text transmitted in-band over an XMPP session.
Author: Mark Rejhon
Copyright: © 1999 - 2012 XMPP Standards Foundation. SEE LEGAL NOTICES.
Status: Experimental
Type: Standards Track
Version: 0.2
Last Updated: 2012-03-19

WARNING: This Standards-Track document is Experimental. Publication as an XMPP Extension Protocol does not imply approval of this proposal by the XMPP Standards Foundation. Implementation of the protocol described herein is encouraged in exploratory implementations, but production systems are advised to carefully consider whether it is appropriate to deploy implementations of this protocol before it advances to a status of Draft.


Table of Contents


1. Introduction
2. Requirements
    2.1. Fluid Real-Time Text
    2.2. In-Band Transmission
    2.3. Flexible and Interoperable
    2.4. Accessible
3. Glossary
4. Protocol
    4.1. RTT Element
    4.2. RTT Attributes
       4.2.1. seq
       4.2.2. event
    4.3. Body Element
       4.3.1. Backwards Compatible
    4.4. Transmission Interval
    4.5. Real-Time Message Editing
       4.5.1. Summary of Action Elements
       4.5.2. Rules for Attribute Values
       4.5.3. Action Elements
         4.5.3.1. Element <t/> – Insert Text
         4.5.3.2. Element <e/> – Backspace
         4.5.3.3. Element <d/> – Forward Delete
         4.5.3.4. Element <w/> – Interval
       4.5.4. Ensuring Accuracy Of Attribute Values
       4.5.5. Unicode Character Counting
    4.6. Automatic Recovery of Real-Time Text
       4.6.1. Staying In Sync
       4.6.2. Detecting Loss of Sync
       4.6.3. Recovery From Loss of Sync
       4.6.4. Message Retransmission
5. Determining Support
6. Implementation Notes
    6.1. Text Presentation
       6.1.1. Avoid Bursty Text Presentation
       6.1.2. Preserving Key Press Intervals
       6.1.3. Time Critical And Low Latency Methods
       6.1.4. Reduced Precision Text Smoothing Methods
    6.2. Real-Time Transmission
       6.2.1. Monitoring Message Edits
       6.2.2. Guidelines for Senders
       6.2.3. Guidelines for Receivers
    6.3. Optional Remote Cursor
       6.3.1. Calculating Cursor Position
       6.3.2. Guidelines for Senders
    6.4. Other Guidelines
       6.4.1. Message Length Limit
       6.4.2. Usage With Chat States
       6.4.3. Usage With Multi-User Chat and Simultaneous Logins
         6.4.3.1. Multi-User Chat
         6.4.3.2. Simultaneous Logins
       6.4.4. Performance & Efficiency
       6.4.5. Total Conversation – Combination With Audio And Video
7. Use Cases
    7.1. Three Backspaces
    7.2. Three Backspaces In One Action Element
    7.3. Message Edits Split Into Multiple Transmissions
    7.4. Deleting Text From Message
    7.5. Inserting Text Into Message
    7.6. Deleting And Replacing Text In Message
    7.7. Multiple Message Edits
    7.8. Three Consecutive Messages
    7.9. Full Message Including Key Press Intervals
8. Interoperability Considerations
    8.1. Other Real-Time Text Standards
    8.2. RFC 4103 and T.140
9. Internationalization Considerations
10. Security Considerations
    10.1. Privacy
    10.2. Congestion Considerations
11. IANA Considerations
12. XMPP Registrar Considerations
    12.1. Protocol Namespaces
    12.2. Namespace Versioning
13. XML Schema

Appendices
    A: Document Information
    B: Author Information
    C: Legal Notices
    D: Relation to XMPP
    E: Discussion Venue
    F: Requirements Conformance
    G: Notes
    H: Revision History


1. Introduction

Real-time text is text transmitted live while it is being typed or created. The recipient can immediately read the sender's typing, without waiting before reading. This is similar to a telephone conversation where one listens "as words are spoken". This allows text to be used conversationally, provides a sense of contact, eliminates waiting times found in messaging, and is favored by deaf individuals who prefer text conversation. For a visual animation of real-time text, see RealJabber.org [1].

Real-time text has been around for decades in various implementations:

Real-time text is suitable for smooth and rapid mainstream communication in text, as an all-inclusive technology to complement instant messaging. At the same time, real-time text has special usefulness to many audiences including the deaf and other people who cannot use speech on the telephone. This document defines a specification for real-time text transmitted in-band over an XMPP network.

2. Requirements

2.1 Fluid Real-Time Text

  1. Allow transmission of real-time text with a low latency.
  2. Balance low latencies versus system, network and server limitations.
  3. Support message editing in real-time, including text insertions and deletions.
  4. Support transmission of the original intervals between key presses, to preserve look-and-feel of typing independently of transmission intervals.

2.2 In-Band Transmission

  1. Reliable real-time text delivery.
  2. Be backwards compatible with XMPP clients that do not support real-time text.
  3. Minimize reliance on knowledge of network traversal protocols and/or out-of-band transmission protocols.
  4. Compatible with multi-user chat (MUC) and simultaneous logins.

2.3 Flexible and Interoperable

  1. Allow use within existing instant-messaging user interfaces, with minimal UI modifications.
  2. Allow alternate optional presentations of real-time text, including split screen and/or other layouts.
  3. Protocol design allows error recovery, and allows extensions for new features.
  4. Be interoperable with other real-time text protocols via gateways, including RFC 4103 and other standards.

2.4 Accessible

  1. Allow XMPP to follow the ITU-T Rec. F.703 [6] Total Conversation accessibility standard for simultaneous voice, video, and real-time text.
  2. Be a candidate technology for use with Next Generation 9-1-1 / 1-1-2 emergency services.
  3. Be suitable for transcription services and (when coupled with voice at user's choice) for TTY/text telephone alternatives, relay services, and captioned telephone systems.
  4. Be an accessible enhancement for mobile phone text messaging and mainstream instant messaging.

3. Glossary

real-time text – Text transmitted live while it is being typed or created.

real-time message – Recipient's real-time live view of the sender's message still being typed or created.

real-time message edit – An edit operation done by the remote sender, that is transmitted in real-time to the recipient.

action element – An XML element that represents a single real-time message edit, such as text insertion or deletion.

RTT – Acronym for real-time text.

4. Protocol

4.1 RTT Element

Real-time text is transmitted via an <rtt/> child element of a <message/> stanza. The <rtt/> element is transmitted at regular intervals by the sender while a chat message is being composed, to allow the recipient to watch the sender type (and edit) the message before the full message is sent in a <body/> element.

This is a basic example of a real-time message "Hello, my Juliet!", transmitted live while it is being typed, before a final message delivery:

Example 1: Introductory Example

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello, </t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <t>my </t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2'>
    <t>Juliet!</t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a04'>
  <body>Hello, my Juliet!</body>
</message>

The <rtt/> element contains a series of one or more child elements called action elements that represent real-time message edits such as text being appended, inserted, or deleted. Example 1 illustrates only the <t/> action element, which appends text to the end of a message. For more information, see Real-Time Message Editing.

Transmission of <rtt/> occurs at regular intervals whenever the sender is actively composing a message. If there are no changes to the message since the last transmission, no transmission occurs. For more information, see Transmission Interval.

The namespace of the <rtt/> element is “urn:xmpp:rtt:0”.
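As an illustration only, the stanzas in Example 1 could be produced by a small helper such as the one below. The function name build_rtt_message and its parameters are hypothetical and not part of this specification. The sketch handles only a single <t/> append per <rtt/> and uses the Python standard library to escape text and attribute values.

```python
from xml.sax.saxutils import escape, quoteattr

RTT_NS = "urn:xmpp:rtt:0"


def build_rtt_message(to, sender, seq, text, event=None):
    """Return a <message/> stanza carrying one <rtt/> with a single <t/> append.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    rtt_attrs = f"xmlns={quoteattr(RTT_NS)} seq={quoteattr(str(seq))}"
    if event is not None:
        # 'new' on the first <rtt/> of a message, 'reset' for a retransmission
        rtt_attrs += f" event={quoteattr(event)}"
    return (
        f'<message to={quoteattr(to)} from={quoteattr(sender)} type="chat">'
        f"<rtt {rtt_attrs}><t>{escape(text)}</t></rtt>"
        f"</message>"
    )
```

Calling build_rtt_message('juliet@capulet.lit', 'romeo@montague.lit/orchard', 0, 'Hello, ', event='new') yields a stanza equivalent to the first message of Example 1, apart from the omitted id attribute and the quoting style.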

4.2 RTT Attributes

4.2.1 seq

This REQUIRED attribute is a counter to maintain the integrity of a real-time message. Senders MUST increment the seq attribute by 1 for each subsequent <rtt/> transmitted. Recipients MUST monitor the seq value to verify that it is incrementing. For more information, see Automatic Recovery of Real-Time Text.

The seq value is bounded to 31 bits, the range of non-negative values of a signed 32-bit integer. The exception to the incrementing rule is an <rtt/> element with an 'event' attribute. In this case, senders MAY use any seq value as the new starting value. For best integrity, the new starting value SHOULD be randomized. It SHOULD also be less than 1 million, to leave plenty of room for incrementing and to keep <rtt/> compact.
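The following is a minimal sketch of these seq rules, not a normative algorithm. The class name SeqCounter is hypothetical. Wrapping back to 0 at the 31-bit bound is an assumption, since this specification does not state what happens when the bound is reached.

```python
import random

MAX_SEQ = 2**31 - 1  # largest value that fits in 31 bits


class SeqCounter:
    """Hypothetical helper that hands out seq values for outgoing <rtt/> elements."""

    def __init__(self, rng=random):
        self._rng = rng
        self._value = 0

    def start(self):
        """Pick a randomized starting value below 1 million (for event='new' or 'reset')."""
        self._value = self._rng.randrange(1_000_000)
        return self._value

    def next(self):
        """Increment by 1 for each subsequent <rtt/>.

        Wrapping to 0 at the bound is an assumption, not taken from the specification.
        """
        self._value = 0 if self._value >= MAX_SEQ else self._value + 1
        return self._value
```

A sender would call start() when transmitting an <rtt/> with an 'event' attribute, and next() for every <rtt/> after it.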

4.2.2 event

This attribute signals events for real-time messages, such as the start of a new real-time message. The event attribute is omitted from the <rtt/> element unless one of the following values is needed:

  1. event='new'
    Senders MUST use this value on the first <rtt/> element of a new message, which also delivers the first character(s) being typed in a message. Recipients MUST initialize a new real-time message for display, and then process action elements within this <rtt/>. A new seq value MAY be used.

  2. event='reset'
    Identical to event='new', except it replaces the existing real-time message. Senders MAY use this attribute during Automatic Recovery of Real-Time Text. Recipients MUST support this attribute, and process action elements within this <rtt/> to replace the existing real-time message.

  3. event='cancel'
    Senders MAY use this value to signal the recipient to stop transmitting real-time text. Recipients SHOULD clear the real-time message, and discontinue sending back <rtt/> for the remainder of the current chat session until the sender sends another <rtt/> to resume real-time text. No action elements should be included within <rtt/>.

The first <rtt/> element in a chat session signals the start of real-time text. The <rtt event='cancel'/> signals the end of real-time text in a chat session. There MUST NOT be more than one <rtt/> element per <message/>.

4.3 Body Element

Upon receipt of <body/>, the message becomes permanent and cannot be edited any further. The delivered message is displayed instead of the real-time message. In the ideal case, the message from <body/> is redundant since this delivered message is identical to the final contents of the real-time message. When the sender begins composing a new message after a <body/> is sent, the next <rtt/> transmitted by the sender MUST contain the event='new' attribute.

4.3.1 Backwards Compatible

The real-time text standard simply provides early delivery of text before the <body/> element. The <body/> element continues to follow the XMPP Core [7] standard. Clients that do not support real-time text will continue to behave normally, displaying complete lines of messages as they are delivered.

4.4 Transmission Interval

For the best balance between interoperability and usability, the transmission interval of <rtt/> for a continuously-changing message SHOULD be approximately 0.7 seconds. This interval meets ITU-T Rec. F.700 [8] for good real-time text. If a different transmission interval needs to be used, the interval SHOULD be between 0.3 seconds and 1 second.

A longer interval will lead to a less optimal user experience. Conversely, a much shorter interval may more frequently trigger throttling or flooding protection algorithms in public XMPP servers, leading to dropped <message/> elements and/or congestion (see Congestion Considerations).
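One possible way to honor the transmission interval is to buffer action elements and flush them at most once per interval, and only when something has changed. The sketch below assumes a hypothetical RttOutbox class driven by a periodic timer. The class and its callback are illustrative and not defined by this specification.

```python
import time

TRANSMISSION_INTERVAL = 0.7  # seconds, the value suggested in this section


class RttOutbox:
    """Hypothetical buffer that batches action elements between transmissions."""

    def __init__(self, send, interval=TRANSMISSION_INTERVAL, clock=time.monotonic):
        self._send = send          # callable taking a list of action-element strings
        self._interval = interval
        self._clock = clock
        self._pending = []
        self._last_sent = None

    def queue(self, action_xml):
        """Record one action element produced by a local message edit."""
        self._pending.append(action_xml)

    def tick(self):
        """Call frequently (for example from a UI timer); returns True if it transmitted."""
        if not self._pending:
            return False           # no changes since the last transmission
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._interval:
            return False           # interval has not elapsed yet
        self._send(self._pending)
        self._pending = []
        self._last_sent = now
        return True
```

In this sketch the first edit is transmitted immediately and later edits are held until the interval has elapsed. That is a design choice of the example, not a requirement.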

To provide fluid real-time text, one or more of the following methods can be used:

4.5 Real-Time Message Editing

The <rtt/> element MAY contain one or more action elements representing real-time message editing operations, including text being appended, inserted, or deleted.

Most chat clients allow a sender to edit their message before sending (i.e. via a Send button, or hitting Enter). The inclusion of real-time functionality to existing chat client software must not degrade the sender's existing expectation of being able to edit their messages before sending. Thus, in a chat session with real-time text, the recipient can watch the sender compose and edit their message before it is delivered.

4.5.1 Summary of Action Elements

This is a short summary of action elements that operate on a real-time message. For detailed information, see Action Elements.

Table 1: Summary of Action Elements

Action          Element             Description
Insert Text     <t p='#'>text</t>   REQUIRED. Insert specified text at position p in message.
Backspace       <e p='#' n='#'/>    REQUIRED. Remove n characters before position p in message.
Forward Delete  <d p='#' n='#'/>    REQUIRED. Remove n characters starting at position p in message.
Interval        <w n='#'/>          RECOMMENDED. Execute a pause of n thousandths of a second.

4.5.2 Rules for Attribute Values

4.5.3 Action Elements

Recipients are REQUIRED to support <t/>, <e/> and <d/> action elements for incoming <rtt/> transmissions, even if not all elements are used for outgoing <rtt/> transmissions. Support for <w/> is RECOMMENDED for both senders and recipients in order to accommodate Preserving Key Press Intervals. Recipients MUST ignore unexpected or unsupported elements within <rtt/>, while continuing to process subsequent action elements. Action elements are immediate child elements of the <rtt/> element, and are never nested. Examples can be found in Use Cases.

4.5.3.1 Element <t/> – Insert Text

REQUIRED. Supports the transmission of key presses, text block inserts, and text being pasted.
Note: Any text normally used in the <body/> element of a <message/> may be used. If the <t/> element is empty, no text modification takes place.

<t p='#'>text</t>

Inserts specified text at position p in the message text.

<t>text</t>

Appends specified text at the end of message. (p defaults to message length)

4.5.3.2 Element <e/> – Backspace

REQUIRED. Supports the behavior of Backspace key presses.
Note: Excess backspaces, at the start of the message, MUST be ignored.

<e n='#' p='#'/>

Remove n characters before position p in message.

<e p='#'/>

Remove 1 character before position p in message. (n defaults to 1)

<e n='#'/>

Remove n characters from end of message. (p defaults to message length)

<e/>

Remove 1 character from end of message. (Both n and p at default values)

4.5.3.3 Element <d/> – Forward Delete

REQUIRED. Supports the behavior of Delete key presses, text block deletes, and text being cut.
Note: Excess deletes, beyond end of message, MUST be ignored.

<d p='#' n='#'/>

Remove n characters beginning at position p in message.

<d p='#'/>

Remove 1 character beginning at position p in message. (n defaults to 1)

4.5.3.4 Element <w/> – Interval

RECOMMENDED. Allows the transmission of intervals between real-time message edits, such as the pauses between key presses. For more information, see Preserving Key Press Intervals.

<w n='#'/>

Executes a pause of n thousandths of a second. This pause may be approximate, and need not be of millisecond precision. The n value SHOULD NOT exceed the Transmission Interval. Also, if a Body Element arrives, pauses SHOULD be interrupted to prevent a delay in message delivery.

4.5.4 Ensuring Accuracy Of Attribute Values

Real-time message edits work only within the boundaries of the current real-time message, and do not affect previous messages. Senders MUST NOT use negative values for any attribute, nor use p values beyond the current message length. However, recipients receiving such values MUST clip negative values to 0, and clip excessively high p values to the current message length.
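As a non-normative sketch, a recipient could apply the <t/>, <e/> and <d/> action elements of one <rtt/> element as shown below, including the clipping described above. The function names are hypothetical. <w/> pauses and unknown elements are simply skipped here. Treating a missing p on <d/> as the message length is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

RTT_NS = "{urn:xmpp:rtt:0}"


def _attr(element, name, default, upper=None):
    """Read an integer attribute, clipping negative and excessively high values."""
    raw = element.get(name)
    value = default if raw is None else int(raw)
    value = max(value, 0)               # clip negative values to 0
    if upper is not None:
        value = min(value, upper)       # clip p to the current message length
    return value


def apply_rtt(text, rtt_xml):
    """Apply the action elements of one <rtt/> element to text and return the result."""
    for action in ET.fromstring(rtt_xml):
        length = len(text)              # Python str length counts code points
        if action.tag == RTT_NS + "t":
            p = _attr(action, "p", length, upper=length)
            text = text[:p] + (action.text or "") + text[p:]
        elif action.tag == RTT_NS + "e":
            p = _attr(action, "p", length, upper=length)
            n = _attr(action, "n", 1)
            text = text[:max(p - n, 0)] + text[p:]   # excess backspaces are ignored
        elif action.tag == RTT_NS + "d":
            # Assumption: a missing p defaults to the message length (deletes nothing).
            p = _attr(action, "p", length, upper=length)
            n = _attr(action, "n", 1)
            text = text[:p] + text[p + n:]           # excess deletes are ignored
        # <w/> and unsupported elements are skipped; processing continues.
    return text
```

Applied to an empty message, the <rtt/> element from the Three Backspaces use case produces "Hello back".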

For senders, p and n values are calculated relative to the plain text version of the message. This is the message otherwise normally transmitted in a <body/> element after all processing is complete, including emoticon graphics as plain text. For recipients, p and n are calculated relative to the message text immediately after XML processing, and before any further processing.

Regardless of the original format of line breaks during XMPP transmission, line breaks are treated as a single code point (LINE FEED U+000A). Conversion of line breaks into a single line feed is REQUIRED for XML processors, according to section 2.11 of XML [9], so a compliant XML processor already does this automatically, and already provides the correct original Unicode text for interoperability.

4.5.5 Unicode Character Counting

For platform-independent interoperability, calculations of p and n values MUST be based on Unicode code points. Different platforms use different internal Unicode encodings, which may be different from the transmission encoding (UTF-8) for XMPP. Consider these factors:

Incorrectly calculated p and n values may cause scrambled text during real-time message editing for many languages. This scrambled text persists until full message delivery, or Message Retransmission. From the perspective of p and n values, a real-time message is treated equivalent to an editable array of Unicode code points, even if not necessarily stored as such.
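To illustrate why the counting unit matters, the sketch below compares three ways of measuring the same string and converts an index counted in UTF-16 code units (as reported by some platforms) into a code point index suitable for p. The function names are hypothetical. An index that falls in the middle of a surrogate pair is rounded up to the next code point, which is a choice made for this example.

```python
def code_point_length(text):
    """Length in Unicode code points (what p and n are based on)."""
    return len(text)  # Python str is indexed by code point


def utf16_length(text):
    """Length in UTF-16 code units, as used internally by some platforms."""
    return len(text.encode("utf-16-le")) // 2


def utf8_length(text):
    """Length in bytes of the UTF-8 transmission encoding."""
    return len(text.encode("utf-8"))


def utf16_index_to_code_point_index(text, utf16_index):
    """Convert an index counted in UTF-16 code units into a code point index."""
    units = 0
    for i, ch in enumerate(text):
        if units >= utf16_index:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1  # astral characters take two units
    return len(text)


sample = "a\U0001F600b"  # 'a', an emoji outside the Basic Multilingual Plane, 'b'
print(code_point_length(sample), utf16_length(sample), utf8_length(sample))
```

For this sample the three lengths differ, so a p value computed from UTF-16 code units or UTF-8 bytes would point at the wrong character.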

4.6 Automatic Recovery of Real-Time Text

In a chat session with real-time text, it is critical that the real-time message is identical on both the sender and recipient ends. The loss of a single <rtt/> transmission can represent missing text, or a missing edit. This leads to the real-time message getting out of sync. Recovery of an in-progress real-time message is useful in several situations:

4.6.1 Staying In Sync

To stay synchronized, for <rtt/> elements that do not contain an 'event' attribute:

  1. The sender MUST increment the seq attribute for each consecutive <rtt/> element.
  2. The recipient MUST monitor the seq attribute value of received <rtt/> elements, to verify that it is incrementing.
  3. The seq values for incoming messages, versus outgoing messages, are independent and kept track of separately.

4.6.2 Detecting Loss of Sync

The sync is considered lost if the seq attribute of the <rtt/> element does not increment as expected. Trying to process certain action elements, after loss of sync, can result in scrambled text. Therefore, to avoid this situation:

  1. The recipient MUST stop processing all subsequent action elements, and pause the current real-time message.
  2. An indicator MAY be used by the recipient to indicate the loss of sync (e.g. reception bars, color code, missing text indicator, chat state message).

4.6.3 Recovery From Loss of Sync

Recovery occurs when any of the following happens:

  1. A message <body/> is delivered. The Body Element replaces the real-time message.
  2. The event attribute of <rtt/> has a value of new or reset. Processing of real-time text MUST restart, with the new starting seq value obtained from this <rtt/> element.
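The detection and recovery rules above can be sketched as a small state tracker kept by the recipient for each sender. The class RttReceiverState and its method names are hypothetical. The handling of event='cancel' in this sketch (treat it as leaving real-time text) is an illustrative simplification.

```python
class RttReceiverState:
    """Hypothetical tracker for one sender's incoming <rtt/> sequence."""

    def __init__(self):
        self.expected_seq = None
        self.in_sync = False

    def on_rtt(self, seq, event=None):
        """Return True if the action elements of this <rtt/> should be processed."""
        if event in ("new", "reset"):
            self.in_sync = True            # recovery: restart from this seq value
        elif event == "cancel":
            self.in_sync = False           # simplification: real-time text stopped
            self.expected_seq = None
            return False
        elif not self.in_sync or seq != self.expected_seq:
            self.in_sync = False           # loss of sync: stop processing actions
            return False
        self.expected_seq = seq + 1
        return True

    def on_body(self):
        """A delivered <body/> replaces the real-time message; the next <rtt/> must be 'new'."""
        self.in_sync = False
        self.expected_seq = None
```

While on_rtt() returns False, the recipient keeps the real-time message paused until a <body/> arrives or an <rtt/> with event 'new' or 'reset' is received.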

4.6.4 Message Retransmission

In order to prevent recipients from waiting for Recovery From Loss of Sync, senders SHOULD retransmit the contents of a partially-composed message, in the following situations:

A message retransmit is done using the <rtt/> attribute event='reset' (see RTT Attributes).

<rtt event='reset' seq='#' xmlns='urn:xmpp:rtt:0'>
  <t>This is a retransmission of the entire real-time message.</t>
</rtt>

Retransmission SHOULD be done at a regular interval of 10 seconds, unless there are no message changes. This interval is frequent enough to minimize user waiting time, while being infrequent enough to reduce bandwidth overhead. This interval MAY vary in order to reduce average bandwidth requirements for minor message changes and/or for long messages.
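A sender-side check for this retransmission rule might look like the following sketch. The function maybe_retransmit and its parameters are hypothetical. The fixed 10-second constant reflects the interval suggested above, although the specification allows it to vary.

```python
from xml.sax.saxutils import escape

RETRANSMIT_INTERVAL = 10.0  # seconds, as suggested in this section


def maybe_retransmit(text, seq, last_reset_time, now, changed):
    """Return an <rtt event='reset'/> string when a retransmission is due, else None.

    Hypothetical helper: 'changed' says whether the message was edited since the
    last retransmission, and the times are in seconds from any monotonic clock.
    """
    if not changed or now - last_reset_time < RETRANSMIT_INTERVAL:
        return None
    return (
        f"<rtt event='reset' seq='{seq}' xmlns='urn:xmpp:rtt:0'>"
        f"<t>{escape(text)}</t>"
        f"</rtt>"
    )
```

The returned element carries the entire current message in a single <t/>, which lets a recipient that lost sync replace its copy in one step.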

5. Determining Support

If a client supports this real-time text protocol, it MUST advertise that fact in its responses to Service Discovery [10] information ("disco#info") requests by returning a feature of urn:xmpp:rtt:0.

Example 2: A disco#info query

<iq from='romeo@montague.lit/orchard'
    id='disco1'
    to='juliet@capulet.lit/balcony'
    type='get'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>

Example 3: A disco#info response

<iq from='juliet@capulet.lit/balcony'
    id='disco1'
    to='romeo@montague.lit/orchard'
    type='result'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <feature var='urn:xmpp:rtt:0'/>
  </query>
</iq>

If this successful response of <feature var='urn:xmpp:rtt:0'/> is not received, the client SHOULD NOT transmit any outgoing <rtt/> elements in <message/> transmissions. This avoids unnecessary consumption of bandwidth to clients that do not support this protocol.
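A client could check a disco#info response for the feature along the following lines. This is a sketch using the Python standard library XML parser. The function name supports_rtt is hypothetical, and a real client would normally rely on its XMPP library's service discovery support instead of parsing raw stanzas.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
RTT_FEATURE = "urn:xmpp:rtt:0"


def supports_rtt(iq_xml):
    """Return True if a disco#info result advertises the real-time text feature."""
    iq = ET.fromstring(iq_xml)
    if iq.get("type") != "result":
        return False                     # errors and other types do not count
    query = iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    return any(
        feature.get("var") == RTT_FEATURE
        for feature in query.findall(f"{{{DISCO_INFO_NS}}}feature")
    )
```

If supports_rtt() returns False for a contact, the client would refrain from sending <rtt/> elements to that contact.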

6. Implementation Notes

6.1 Text Presentation

6.1.1 Avoid Bursty Text Presentation

If a long Transmission Interval is used without Preserving Key Press Intervals, then text will appear in intermittent bursts unless the display of text is smoothed. This hurts the user experience of real-time text.

6.1.2 Preserving Key Press Intervals

For the highest quality display of text being typed, using Element <w/> – Interval allows the original look-and-feel of typing to be preserved, independently of the transmission interval. Using the <w/> element, the sender can record multiple key presses including key press intervals, and transmit them over the XMPP network in a single <message/>. The recipient can then play back the sender's typing in real-time at original typing speed including the intervals between key presses.

Much like VoIP is a packetization of sound, this spec enables packetization of typing including the original key press intervals. This enables the real-time feel of typing over virtually any network connection, without requiring frequent transmission intervals. Look and feel of typing is also preserved over variable latency connections including XMPP Over BOSH [11], mobile phone, satellite and long international connections with heavy packet-bursting tendencies.

The recipient can watch the sender fluidly compose/edit their message in real-time without any “bursting” effects. This is “Natural Typing”, and appears indistinguishable from local typing. When key press intervals are preserved at high precision, all subtleties of typing are preserved, including the 'mood' (calm typing versus panicked or emphatic typing, etc). For an example transmission of key intervals, see Full Message Including Key Press Intervals.
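On the receiving side, playback of key press intervals can be sketched as turning the <w/> elements of one <rtt/> into a schedule of delays for the remaining action elements. The function playback_schedule is hypothetical. How the client actually waits (timers, an event loop) is left out of this sketch.

```python
import xml.etree.ElementTree as ET

RTT_NS = "{urn:xmpp:rtt:0}"


def playback_schedule(rtt_xml):
    """Return (delay_ms, element) pairs for the edits in one <rtt/> element.

    delay_ms is the accumulated <w/> pause before each edit, measured from the
    moment playback of this <rtt/> starts.
    """
    schedule = []
    elapsed = 0
    for action in ET.fromstring(rtt_xml):
        if action.tag == RTT_NS + "w":
            # Accumulate the pause; negative values are clipped to 0.
            elapsed += max(int(action.get("n", "0")), 0)
        else:
            schedule.append((elapsed, action))
    return schedule
```

For an <rtt/> containing <t>H</t><w n='120'/><t>i</t><w n='80'/><e/>, the three edits are scheduled at 0, 120 and 200 milliseconds.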

6.1.3 Time Critical And Low Latency Methods

There are specialized situations such as live transcription and captioning (e.g. transcription services, closed captioning providers, captioned telephone, relay services, Remote CART) that demand low-latency transmission. Such systems typically use voice recognition and/or stenotype machines, which output text in word bursts rather than a character at a time. Senders with bursty output MAY immediately transmit word bursts of text without buffering. This eliminates any lag caused by the Transmission Interval. It is NOT REQUIRED to monitor or transmit Element <w/> – Interval for transcription. If additional accuracy is required, it is also possible to timecode the <rtt/> elements.

6.1.4 Reduced Precision Text Smoothing Methods

Some software platforms (e.g. JavaScript, BOSH, mobile devices) may have low-precision timers that impact Transmission Interval and/or Preserving Key Press Intervals. Clients MAY optimize for bandwidth, performance and/or screen repaints by eliminating, merging, or ignoring Element <w/> – Interval selectively, especially those containing shorter intervals. The transmission interval of <rtt/> MAY also vary, either intentionally for optimizations, or due to precision limitations.

Clients MAY choose to implement alternate text-smoothing methods, such as adaptive-rate character-at-a-time output, and/or word buffering for incoming real-time text. Word buffering prevents most typing mistakes from being displayed, which can be a useful mode of operation for certain recipients who may dislike watching the sender's typing mistakes.

6.2 Real-Time Transmission

6.2.1 Monitoring Message Edits

For sending clients, there are several potential methods of capturing typing and message edits, in order to generate action elements for an <rtt/> transmission. However, instead of monitoring key presses directly, the most reliable and practical method is to monitor the text changes to the local message text field:

In a text change event, the current message string can be compared to the previous message string in order to calculate what text changes took place. The appropriate action elements are then generated, to represent text insertions and deletions. If Preserving Key Press Intervals is supported, the interval is the time elapsed between text change events. For additional information, see Action Elements and Rules for Attribute Values. The following guidelines are for clients that use keyboard input.
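One simple way to compare the previous and current message strings is to strip their common prefix and suffix, and describe the remaining difference as at most one <e/> and one <t/>. The sketch below uses that approach. It is an assumption of this example and not an algorithm prescribed by this specification, and the function name diff_to_actions is hypothetical.

```python
from xml.sax.saxutils import escape


def diff_to_actions(old, new):
    """Return a list of action-element strings that turn old into new."""
    limit = min(len(old), len(new))
    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    # Length of the common suffix that does not overlap the prefix.
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    removed = len(old) - prefix - suffix          # code points deleted from old
    inserted = new[prefix:len(new) - suffix]      # text added in new
    actions = []
    if removed:
        attrs = ""
        if suffix:                                # not at end of message: p is needed
            attrs += f" p='{prefix + removed}'"
        if removed != 1:                          # n defaults to 1
            attrs += f" n='{removed}'"
        actions.append(f"<e{attrs}/>")
    if inserted:
        if suffix:                                # insertion inside the message
            actions.append(f"<t p='{prefix}'>{escape(inserted)}</t>")
        else:                                     # append at end: p may be omitted
            actions.append(f"<t>{escape(inserted)}</t>")
    return actions
```

For example, changing "Hello World" to "Hello, World" yields a single <t p='5'>,</t>, and changing "Hello bcak" to "Hello b" yields <e n='3'/>. Python strings are indexed by code point, so the resulting p and n values follow Unicode Character Counting.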

6.2.2 Guidelines for Senders

6.2.3 Guidelines for Receivers

6.3 Optional Remote Cursor

Recipient clients MAY choose to display a cursor (or caret) within incoming real-time messages. This enhances usability of real-time text further, since it becomes easier for a recipient to observe the sender's real-time message edits. Recipient clients that do not support a remote cursor, can simply ignore calculating a cursor position, and skip this section.

6.3.1 Calculating Cursor Position

All action elements always have absolute cursor positioning. When <t/>, <e/>, or <d/> action elements are processed in incoming real-time text, the beginning value for the cursor position calculation is the absolute position value of the p attribute, according to Rules for Attribute Values. The cursor position immediately after an action element, is calculated as follows:

The remote cursor SHOULD be clearly distinguishable from the sender's real local cursor. One example is to use a non-blinking cursor, easily emulated with a Unicode character or the vertical bar character '|'.

6.3.2 Guidelines for Senders

Whenever the cursor is moving without any text modifications (via arrow keys or mouse), the sender MAY transmit extra Element <t/> – Insert Text with an empty string to update the remote cursor position via attribute p. This maintains accurate positioning for the remote cursor in recipients that support a remote cursor. These extra elements are ignored by recipients that do not support a remote cursor.

Monitoring the actual cursor position may need to be done via a “selection changed” event of a text box field in many programming platforms. This event typically monitors text marking/selection operations, and doubles as the event for monitoring the cursor position.

6.4 Other Guidelines

Implementors need to take several other basic considerations into account for real-time message transmissions.

6.4.1 Message Length Limit

A large sequence of rapid message changes may generate a large series of action elements in an <rtt/> element, resulting in the <message/> exceeding the XMPP server's maximum allowed length of a <message/> stanza. This may result in dropped messages. It is acceptable to simply retransmit the whole real-time message using <rtt event='reset'/> if the length of the <rtt/> element would otherwise exceed the application's maximum chat message length.

For long messages, the final <rtt/> transmission may be made in a separate <message/> from the one containing the <body/>. For example, consider this single message:

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='dda'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='95'>
    <t>Hello World...In a Super Long Message! [etc]</t>
  </rtt>
  <body>Hello World...In a Super Long Message! [etc]</body>
</message>

The message MAY be split into two separate message transmissions:

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='dda'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='95'>
    <t>Hello World...In a Super Long Message! [etc]</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='ddb'>
  <body>Hello World...In a Super Long Message! [etc]</body>
</message>

6.4.2 Usage With Chat States

Real-time text MAY be accompanied with XEP-0085 Chat State Notifications [12]. These are simple guidelines for <message/> stanzas that include an <rtt/> element:

6.4.3 Usage With Multi-User Chat and Simultaneous Logins

The in-band nature of this real-time text standard allows one-to-many situations. Thus, real-time text is compatible with Multi-User Chat [13] (MUC), as well as concurrent simultaneous logins. Support for real-time text in MUC is OPTIONAL, and is fully Backwards Compatible with group chat participants that do not support real-time text.

6.4.3.1 Multi-User Chat

For MUC, the RTT Element event attribute value of 'cancel' SHOULD NOT be used. This prevents one participant from suppressing real-time text for all participants in a group chat. Participants that turn off real-time text for themselves can simply ignore incoming <rtt/> and not transmit outgoing <rtt/>. Participant clients without real-time text (whether unsupported or turned off) will simply see group chat function normally on a line-by-line basis. Participants that enable real-time text during group chat need to keep track of separate real-time messages on a per-participant basis, via full JID. As a result, participants with real-time text will see real-time text coming from each participant that has real-time text enabled. Software MAY hide idle real-time messages to minimize on-screen clutter when more than one person is typing. Congestion control MAY also be used, via automatic adjustment of Transmission Interval; see Congestion Considerations.

6.4.3.2 Simultaneous Logins

In simultaneous login situations, transmitting of <rtt/> works in one-to-many situations without any special software support. For many-to-one situations where there is incoming <rtt/> from more than one simultaneous login, the existing Automatic Recovery of Real-Time Text already catches this situation until there is only one typist. A good implementation of Message Retransmission will improve user experience, regardless of whether or not XEP-0296 is used (Best Practices for Resource Locking [14]). Alternatively, clients MAY choose to improve on this behavior, by keeping track of multiple separate real-time messages per full JID, similar to Multi-User Chat.

6.4.4 Performance & Efficiency

With real-time text, frequent screen updates may occur. Screen updates are a potential performance bottleneck, because fast typists type many key presses per second. Optimizing screen updates becomes especially important for slower platforms. Real-time messages should be updated efficiently in a flicker-free manner. Alternatively, to improve performance, the display of real-time messages may be implemented as a separate window or separate display element.

Battery life considerations are closely related to performance, as the addition of real-time text may impact battery life. If Preserving Key Press Intervals is supported, Element <w/> – Interval should be implemented in a battery-efficient manner. The Transmission Interval may vary dynamically to optimize for battery life and wireless reception. For devices where screen updates are an unavoidable bottleneck, see Reduced Precision Text Smoothing Methods to reduce the number of screen updates per second. Also see XMPP on Mobile Devices [15].

6.4.5 Total Conversation – Combination With Audio And Video

According to ITU-T Rec. F.703, the “Total Conversation” accessibility standard defines the simultaneous use of audio, video, and real-time text. For convenience, chat applications may be designed to have automatic negotiation of as many as possible of the three media preferred by the users.

In the XMPP session environment, the Jingle protocol (Jingle [16]) is available for negotiation and transport of the more time-critical, real-time audio and video media. Any combination of audio, video, and real-time text MAY be used together simultaneously.

7. Use Cases

Most of these examples are deliberately kept simple. In software implementations supporting key press intervals, transmissions will most resemble the last example, Full Message Including Key Press Intervals.

7.1 Three Backspaces

<message to='bob@example.com' from='alice@example.com/home' id='a01' type='chat'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello bcak</t><e/><e/><e/><t>ack</t>
  </rtt>
</message>

Resulting real-time message: "Hello back"
This code sends the misspelled "Hello bcak", then <e/><e/><e/> backspaces 3 times, then sends "ack".

7.2 Three Backspaces In One Action Element

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello bcak</t><e n='3'/><t>ack</t>
  </rtt>
</message>

Resulting real-time message: "Hello back"
This code is the same as the previous example, demonstrating that <e n='3'/> does the same thing as <e/><e/><e/>.
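
The equivalence shown in the two examples above can be checked with a minimal Python sketch. It handles only <t/> without a position (append) and <e/> without a position (backspace from the end); the function name apply_simple is hypothetical and not part of this specification.

```python
import xml.etree.ElementTree as ET

NS = "{urn:xmpp:rtt:0}"

def apply_simple(rtt_xml: str) -> str:
    """Apply <t/> (append) and <e/> (backspace from the end) in document order."""
    text = ""
    for el in ET.fromstring(rtt_xml):
        if el.tag == NS + "t":
            text += el.text or ""
        elif el.tag == NS + "e":
            n = int(el.get("n", "1"))  # a bare <e/> erases one character
            text = text[:max(0, len(text) - n)]
    return text

one_by_one = ("<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>"
              "<t>Hello bcak</t><e/><e/><e/><t>ack</t></rtt>")
counted = ("<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>"
           "<t>Hello bcak</t><e n='3'/><t>ack</t></rtt>")

print(apply_simple(one_by_one))
print(apply_simple(counted))
```

Both calls produce the same real-time message, which is the point of the n attribute: it shortens the transmission without changing the result.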

7.3 Message Edits Split Into Multiple Transmissions

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello</t>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <t> bcak</t>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2'>
    <e n='3'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='3'>
    <t>ack</t>
  </rtt>
</message>

Resulting real-time message: "Hello back"
This code results in the same final text as the previous two examples, segmented into four separate messages.
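
A receiver has to carry the real-time message across transmissions and notice when one is missing. The following Python sketch shows one way to do that. It is a simplification under stated assumptions: the class RttReceiver is hypothetical, action elements are passed as already-parsed tuples, and after a gap in seq the receiver simply stops applying edits until the next event='new' (the full behavior is described in Automatic Recovery of Real-Time Text).

```python
class RttReceiver:
    """Tracks one sender's real-time message across <rtt/> transmissions."""

    def __init__(self):
        self.text = ""
        self.next_seq = None
        self.in_sync = False

    def receive(self, seq, actions, event=None):
        """Apply one transmission; return False if it was ignored (out of sync)."""
        if event == "new":
            self.text, self.in_sync = "", True   # start a fresh real-time message
        elif not self.in_sync or seq != self.next_seq:
            self.in_sync = False                 # a transmission was missed
            return False
        self.next_seq = seq + 1
        for kind, value in actions:
            if kind == "t":                      # append text
                self.text += value
            elif kind == "e":                    # backspace `value` characters
                self.text = self.text[:max(0, len(self.text) - value)]
        return True

# The four transmissions of this example, delivered in order.
ok = RttReceiver()
ok.receive(0, [("t", "Hello")], event="new")
ok.receive(1, [("t", " bcak")])
ok.receive(2, [("e", 3)])
ok.receive(3, [("t", "ack")])

# The same sequence with seq='2' lost in transit.
lossy = RttReceiver()
lossy.receive(0, [("t", "Hello")], event="new")
lossy.receive(1, [("t", " bcak")])
applied = lossy.receive(3, [("t", "ack")])
```

In the lossy case the receiver refuses to apply seq='3', because appending "ack" to text that still ends in "bcak" would display a message the sender never typed.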

7.4 Deleting Text From Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello Bob, this is Alice!</t><d n='4' p='5'/>
  </rtt>
</message>

Resulting real-time message: "Hello, this is Alice!"
This code outputs "Hello Bob, this is Alice!" then <d n='4' p='5'/> deletes 4 characters from position 5.
(This erases the text " Bob" including the preceding space character).

7.5 Inserting Text Into Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello, this is Alice!</t><t p='5'> Bob</t>
  </rtt>
</message>

Resulting real-time message: "Hello Bob, this is Alice!"
This code outputs "Hello, this is Alice!", then <t p='5'> inserts the text " Bob" at position 5.

7.6 Deleting And Replacing Text In Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello Bob, tihsd is Alice!</t>
    <d p='11' n='5'/>
    <t p='11'>this</t>
  </rtt>
</message>

Resulting real-time message: "Hello Bob, this is Alice!"
This code outputs "Hello Bob, tihsd is Alice!", then <d p='11' n='5'/> deletes 5 characters at position 11 in the string of text (erasing the mistyped word "tihsd"). Finally, <t p='11'>this</t> inserts the text "this" in place of the original misspelled word.
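
The positional edits in the three examples above (forward delete, insert, and delete followed by insert) reduce to two string operations. A minimal Python sketch follows; the function names insert_at and delete_forward are hypothetical, and positions are zero-based character offsets as used in the examples.

```python
def insert_at(text: str, new: str, p: int) -> str:
    """<t p='P'>new</t>: insert `new` so that it starts at position p."""
    return text[:p] + new + text[p:]

def delete_forward(text: str, p: int, n: int = 1) -> str:
    """<d p='P' n='N'/>: remove n characters starting at position p."""
    return text[:p] + text[p + n:]

# Deleting Text From Message
deleted = delete_forward("Hello Bob, this is Alice!", p=5, n=4)

# Inserting Text Into Message
inserted = insert_at("Hello, this is Alice!", " Bob", p=5)

# Deleting And Replacing Text In Message
step = delete_forward("Hello Bob, tihsd is Alice!", p=11, n=5)
replaced = insert_at(step, "this", p=11)

print(deleted, inserted, replaced, sep="\n")
```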

7.7 Multiple Message Edits

This is an example message containing multiple consecutive real-time message edits.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Helo</t>
    <e/>
    <t>lo...planet</t>
    <e n='6'/>
    <t> World</t>
    <d n='3' p='5'/>
    <t p='5'> there,</t>
  </rtt>
</message>

Resulting real-time message: "Hello there, World", completed in the following series of steps:

Table 2:

Element              | Action                                   | Real-Time Message  | Cursor Pos
<t>Helo</t>          | Output "Helo"                            | Helo               | 4
<e/>                 | Backspace 1 character from end of line.  | Hel                | 3
<t>lo...planet</t>   | Output "lo...planet" at end of line.     | Hello...planet     | 14
<e n='6'/>           | Backspace 6 characters from end of line  | Hello...           | 8
<t> World</t>        | Output " World" at end of line.          | Hello... World     | 14
<d n='3' p='5'/>     | Delete 3 characters at position 5        | Hello World        | 5
<t p='5'> there,</t> | Output " there," at position 5           | Hello there, World | 12

Normally, the action elements are split into multiple separate transmissions. This example also does not illustrate Preserving Key Press Intervals. The Cursor Pos column is only relevant if the Optional Remote Cursor is implemented.
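
The step-by-step table above can be reproduced mechanically. The following Python sketch parses the <rtt/> element from this example and records the real-time message and cursor position after each action element. It is an illustration only: the function name trace is hypothetical, and the cursor rule (the cursor follows the most recent edit) is inferred from the Cursor Pos column.

```python
import xml.etree.ElementTree as ET

NS = "{urn:xmpp:rtt:0}"

def trace(rtt_xml: str):
    """Return a list of (text, cursor) pairs, one per <t/>, <e/> or <d/> element."""
    text, steps = "", []
    for el in ET.fromstring(rtt_xml):
        tag = el.tag.replace(NS, "")
        # A missing p means "end of the message"; a missing n means 1.
        p = int(el.get("p")) if el.get("p") is not None else len(text)
        n = int(el.get("n", "1"))
        if tag == "t":                       # insert text at p
            new = el.text or ""
            text = text[:p] + new + text[p:]
            cursor = p + len(new)
        elif tag == "e":                     # backspace n characters before p
            start = max(0, p - n)
            text = text[:start] + text[p:]
            cursor = start
        elif tag == "d":                     # delete n characters from p onward
            text = text[:p] + text[p + n:]
            cursor = p
        else:
            continue                         # <w/> pauses do not change the text
        steps.append((text, cursor))
    return steps

example = """
<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
  <t>Helo</t>
  <e/>
  <t>lo...planet</t>
  <e n='6'/>
  <t> World</t>
  <d n='3' p='5'/>
  <t p='5'> there,</t>
</rtt>
"""

for text, cursor in trace(example):
    print(f"{text!r:24} cursor={cursor}")
```

The whitespace between the action elements is ignored, because only the character data inside each <t/> is treated as inserted text.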

7.8 Three Consecutive Messages

Representing a short chat session of three separate messages:
Bob says: "Hello Alice"
Bob says: "This is Bob"
Bob says: "How are you?"

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <t> Alice</t>
  </rtt>
  <body>Hello Alice</body>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2' event='new'>
    <t>This i</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='3'>
    <t>s Bob</t>
  </rtt>
  <body>This is Bob</body>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='e05'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='4' event='new'>
    <t>How a</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='f06'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='5'>
    <t>re yo</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='g07'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='6'>
    <t>u?</t>
  </rtt>
  <body>How are you?</body>
</message>
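
The life cycle of the three messages above can be followed with a short Python sketch. It assumes that event='new' starts a fresh real-time message and that a <body/> delivers the final text and ends the real-time message. The helper names stanza and process are hypothetical, only appended <t/> text is handled, and the stanzas are parsed without the jabber:client default namespace that a real stream would carry.

```python
import xml.etree.ElementTree as ET

RTT_NS = "{urn:xmpp:rtt:0}"

def stanza(seq, typed, event=None, body=None):
    """Build a <message/> shaped like those in the example (sketch helper only)."""
    event_attr = f" event='{event}'" if event else ""
    body_el = f"<body>{body}</body>" if body is not None else ""
    return ("<message type='chat'>"
            f"<rtt xmlns='urn:xmpp:rtt:0' seq='{seq}'{event_attr}><t>{typed}</t></rtt>"
            f"{body_el}</message>")

def process(stanzas):
    """Return (completed messages, text of the message still being typed)."""
    completed, live = [], ""
    for raw in stanzas:
        message = ET.fromstring(raw)
        rtt = message.find(RTT_NS + "rtt")
        if rtt is not None:
            if rtt.get("event") == "new":
                live = ""                       # a new real-time message begins
            for action in rtt.findall(RTT_NS + "t"):
                live += action.text or ""       # appended text only in this sketch
        body = message.find("body")
        if body is not None:
            completed.append(body.text or "")   # the body is the final message
            live = ""
    return completed, live

session = [
    stanza(0, "Hello", event="new"),
    stanza(1, " Alice", body="Hello Alice"),
    stanza(2, "This i", event="new"),
    stanza(3, "s Bob", body="This is Bob"),
    stanza(4, "How a", event="new"),
    stanza(5, "re yo"),
    stanza(6, "u?", body="How are you?"),
]

print(process(session[:6]))
print(process(session))
```

Stopping after the sixth stanza shows the state a recipient sees mid-message: two completed messages plus the partially typed third one.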

This example also illustrates the following:

7.9 Full Message Including Key Press Intervals

This example is a transmission of “Hello there!” while Preserving Key Press Intervals. It illustrates a four-second typing sequence:

Between each key press is an Element <w/> – Interval, which lets the receiving client execute a small pause between action elements. This allows the typing to be played back with its original look and feel.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>H</t>
    <w n='115'/><t>e</t>
    <w n='154'/><t>l</t>
    <w n='151'/><t>l</t>
    <w n='115'/><t>o</t>
    <w n='165'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <w n='40'/><t> </t>
    <w n='161'/><t>t</t>
    <w n='137'/><t>e</t>
    <w n='135'/><t>h</t>
    <w n='134'/><t>r</t>
    <w n='93'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2'>
    <w n='109'/><t>e</t>
    <w n='115'/><t>!</t>
    <w n='330'/><t p='11'/>
    <w n='108'/><t p='10'/>
    <w n='38'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='3'>
    <w n='109'/><t p='9'/>
    <w n='111'/><e p='9'/>
    <w n='106'/><e p='8'/>
    <w n='138'/><t p='7'>h</t>
    <w n='209'/><t p='8'>e</t>
    <w n='27'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='e05'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='4'>
    <w n='445'/><t p='12'/>
  </rtt>
  <body>Hello there!</body>
</message>
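
A receiving client can turn the <w/> elements into a playback schedule. The following Python sketch computes, for one transmission, the offset at which each action element should be applied. It assumes that the n attribute of <w/> is a duration in milliseconds and that pauses accumulate in document order. The function name schedule is hypothetical, and actual playback (timers, screen updates) is left out.

```python
import xml.etree.ElementTree as ET

NS = "{urn:xmpp:rtt:0}"

def schedule(rtt_xml: str):
    """Return ([(offset_ms, tag, text), ...], total_ms) for one <rtt/> element."""
    offset, plan = 0, []
    for el in ET.fromstring(rtt_xml):
        tag = el.tag.replace(NS, "")
        if tag == "w":
            offset += int(el.get("n", "0"))   # a pause delays everything after it
        else:
            plan.append((offset, tag, el.text or ""))
    return plan, offset

# The first transmission of this example.
first = """
<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
  <t>H</t>
  <w n='115'/><t>e</t>
  <w n='154'/><t>l</t>
  <w n='151'/><t>l</t>
  <w n='115'/><t>o</t>
  <w n='165'/>
</rtt>
"""

plan, total_ms = schedule(first)
for offset_ms, tag, text in plan:
    print(f"{offset_ms:4d} ms  <{tag}> {text!r}")
print("transmission spans", total_ms, "ms")
```

The trailing <w/> carries the pause that occurred after the last key press in the transmission, so the total reflects the full time span covered by the transmission rather than the time of its last key press.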

This example also illustrates the following:

8. Interoperability Considerations

There are other real-time text formats with interoperability considerations relating to the session setup level, the media transport level, and presentation level. For each environment where interoperability is supported, an interoperability specification should be documented that covers addressing, session control, media negotiation and media transcoding.

8.1 Other Real-Time Text Standards

Another real-time text standard also exists (RFC 4103, IETF RFC 5194 [17]), used for SIP sessions with real-time text. When an implementor needs to decide which real-time text standard to use, it is generally recommended to use the real-time text specification of the session control standard in use for that particular session. This varies from implementation to implementation. For example, the Google Talk network uses XMPP messaging for instant messages sent during audio/video conversations; in this situation, it is therefore recommended to use this XEP-0301 specification to add real-time text functionality. However, there are other situations where it is necessary to support multiple real-time text standards and to interoperate between them.

8.2 RFC 4103 and T.140

One environment for such interoperability considerations is SIP with real-time text (also called Text over IP, or ToIP) as specified in ITU-T T.140 and IETF RFC 4103. This protocol combination is specified by IETF, and by regional emergency service organizations, to be one of the protocols supported for IP-based real-time emergency calls that support real-time text. In addition, SIP is currently the dominant peering protocol between services, and many implementations of real-time text in SIP exist.

Interoperability implies addressing translation, media negotiation and translation, and media transcoding. For media transcoding between this specification and T.140/RFC 4103, real-time text transcoding is straightforward, except for the editing feature of this specification. Backwards positioning, and insertion or deletion far back in the message, can cause a large number of erase operations in T.140, which take time and bandwidth to convey.
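
The cost of transcoding a mid-message edit can be illustrated with a small Python sketch. It assumes a gateway that can only erase the last character and append text, and it assumes that the T.140 erase operation is the backspace control character U+0008. The function name to_erase_and_append is hypothetical; a real gateway would also need to handle the control codes and timing issues discussed in this section.

```python
BS = "\u0008"  # assumed "erase last character" code on the T.140 side

def to_erase_and_append(old: str, new: str) -> str:
    """Express the change old -> new using only backspaces and appended text."""
    prefix = 0
    while prefix < min(len(old), len(new)) and old[prefix] == new[prefix]:
        prefix += 1
    # Everything after the first differing character is erased and retyped.
    return BS * (len(old) - prefix) + new[prefix:]

# <t p='5'> Bob</t> applied to "Hello, this is Alice!"
out = to_erase_and_append("Hello, this is Alice!", "Hello Bob, this is Alice!")
print(out.count(BS), "erase operations, then", repr(out.lstrip(BS)))

# Appending at the end needs no erase operations at all.
tail = to_erase_and_append("Hello", "Hello!")
```

Inserting four characters near the start of the message costs sixteen erase operations plus twenty retyped characters, whereas the same edit is a single <t/> element in this specification.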

It should be noted that T.140 specifies the use of ISO 6429 control codes for presentation characteristics, such as text color, that are not covered in this version of this specification. All control codes from either side that cannot be presented on the other side of the conversion must be filtered out so that they do not disturb the presentation of text.

Also, see Total Conversation – Combination With Audio And Video.

9. Internationalization Considerations

The main internationalization consideration involves real-time message editing of international and mixed-language text. Correct calculation of attribute values for Action Elements, based on Unicode Character Counting, is necessary to avoid scrambled text in many languages.
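
The following Python sketch shows how a position can go wrong when the platform's string indexing does not match the counting rule. It assumes that the Unicode Character Counting section requires p and n to count Unicode code points, and it models a platform whose native strings are indexed in UTF-16 code units. The function names are hypothetical.

```python
def utf16_length(s: str) -> int:
    """Length of s in UTF-16 code units (what some platforms report as length)."""
    return len(s.encode("utf-16-le")) // 2

def utf16_index_to_code_point_index(s: str, unit_index: int) -> int:
    """Convert an index in UTF-16 code units to an index in code points."""
    units = 0
    for i, ch in enumerate(s):
        if units >= unit_index:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1   # astral characters use a surrogate pair
    return len(s)

text = "a\U0001F600b"            # 'a', one character outside the BMP, 'b'
code_points = len(text)          # Python strings are indexed by code point
code_units = utf16_length(text)

# A cursor sitting just before 'b' is at UTF-16 index 3 but code point index 2.
p = utf16_index_to_code_point_index(text, 3)
print(code_points, code_units, p)
```

A sender that transmitted the UTF-16 index unchanged would place the edit one character too far to the right for every preceding character outside the Basic Multilingual Plane.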

10. Security Considerations

10.1 Privacy

It is important for implementors of real-time text to educate users about real-time text. Users of real-time text should be aware that their typing in the local input buffer is now visible to everyone in the current chat conversation. This may have security implications if users copy & paste private information into their chat entry buffer (e.g. a shopping invoice) and then edit out the private parts of the pasted text (e.g. a credit card number) before they send the message. With real-time message editing, recipients can watch all text changes that occur in the sender's text before the sender sends the final message.

10.2 Congestion Considerations

The nature of real-time text results in more frequent transmission of <message/> elements than would otherwise happen in a non-real-time text conversation. This may lead to increased network and server loading of XMPP networks. Care SHOULD be taken to use a reasonable Transmission Interval, and to avoid transmitting messages at an excessive rate, in order to avoid creating unnecessary congestion on public XMPP networks. Also, see Best Practices to Discourage Denial of Service Attacks [18].
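
One way to keep the transmission rate bounded is to buffer action elements and send them at most once per Transmission Interval. The following Python sketch illustrates the idea. The class name TransmitBuffer, the injectable clock, and the interval value of 700 milliseconds used in the demonstration are all assumptions made for this sketch; the appropriate interval is discussed in Transmission Interval.

```python
class TransmitBuffer:
    """Collects action elements and releases them at most once per interval."""

    def __init__(self, interval_ms, clock):
        self.interval_ms = interval_ms
        self.clock = clock                 # callable returning the time in ms
        self.pending = []
        self.last_sent = None              # nothing has been sent yet

    def add(self, action):
        self.pending.append(action)

    def poll(self):
        """Return the batch to put into one <rtt/> element, or None to keep waiting."""
        if not self.pending:
            return None                    # never send an empty transmission
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.interval_ms:
            return None                    # too soon since the last transmission
        batch, self.pending = self.pending, []
        self.last_sent = now
        return batch

# Demonstration with a fake clock so the timing is deterministic.
now_ms = [0]
buf = TransmitBuffer(interval_ms=700, clock=lambda: now_ms[0])

buf.add("<t>H</t>")
first = buf.poll()          # sent immediately: nothing was sent before

buf.add("<t>e</t>")
now_ms[0] = 300
early = buf.poll()          # still inside the interval, keep buffering

buf.add("<t>l</t>")
now_ms[0] = 700
second = buf.poll()         # interval elapsed: both buffered elements go out together
```

Because several key presses are combined into one <message/>, the number of stanzas depends on the interval rather than on typing speed.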

Network monitoring mechanisms (e.g. Message Delivery Receipts [19] and/or XMPP Ping [20]) MAY be used to monitor reliability and latency, in order to temporarily adjust the interval to prevent failure of real-time text transmissions during extreme network conditions. This is also useful for mission-critical applications such as Next Generation 9-1-1 emergency services.

Participants using this specification in the recommended way will cause a load that is only marginally higher than that of users communicating without this specification. The bandwidth overhead of real-time text is very low compared to many other activities possible on XMPP networks, including VoIP and file transfers.

11. IANA Considerations

This document requires no interaction with the Internet Assigned Numbers Authority (IANA).

12. XMPP Registrar Considerations

12.1 Protocol Namespaces

The XMPP Registrar should include "urn:xmpp:rtt:0" in its registry of protocol namespaces (see <http://xmpp.org/registrar/namespaces.html>).

12.2 Namespace Versioning

If the protocol defined in this specification undergoes a revision that is not fully backwards-compatible with an older version, the XMPP Registrar shall increment the protocol version number found at the end of the XML namespaces defined herein, as described in Section 4 of XEP-0053.

13. XML Schema

<?xml version='1.0' encoding='UTF-8'?>

<xs:schema
    xmlns:xs='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:xmpp:rtt:0'
    xmlns='urn:xmpp:rtt:0'
    elementFormDefault='qualified'>

  <xs:annotation>
    <xs:documentation>
      The protocol documented by this schema is defined in
      XEP-0301: http://www.xmpp.org/extensions/xep-0301.html
    </xs:documentation>
  </xs:annotation>

  <xs:element name='rtt'>
    <xs:complexType>
      <!-- Action elements may repeat and may appear in any order. -->
      <xs:choice minOccurs='0' maxOccurs='unbounded'>
        <xs:element ref='t'/>
        <xs:element ref='e'/>
        <xs:element ref='d'/>
        <xs:element ref='w'/>
      </xs:choice>
      <xs:attribute name='seq' type='xs:unsignedInt' use='required'/>
      <xs:attribute name='event' type='xs:string' use='optional'/>
    </xs:complexType>
  </xs:element>

  <xs:element name='t'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='xs:string'>
          <xs:attribute name='p' type='xs:unsignedInt' use='optional'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:element name='e'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='empty'>
          <xs:attribute name='p' type='xs:unsignedInt' use='optional'/>
          <xs:attribute name='n' type='xs:unsignedInt' use='optional'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:element name='d'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='empty'>
          <xs:attribute name='p' type='xs:unsignedInt' use='required'/>
          <xs:attribute name='n' type='xs:unsignedInt' use='optional'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:element name='w'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='empty'>
          <xs:attribute name='n' type='xs:unsignedInt' use='required'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:simpleType name='empty'>
    <xs:restriction base='xs:string'>
      <xs:enumeration value=''/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>


Appendices


Appendix A: Document Information

Series: XEP
Number: 0301
Publisher: XMPP Standards Foundation
Status: Experimental
Type: Standards Track
Version: 0.2
Last Updated: 2012-03-19
Approving Body: XMPP Council
Dependencies: XMPP Core, XEP-0020
Supersedes: None
Superseded By: None
Short Name: NOT_YET_ASSIGNED


Appendix B: Author Information

Mark Rejhon

Organization: RealJabber.org and Rejhon Technologies Inc.
Email: mark@realjabber.org
JabberID: markybox@gmail.com
URI: http://www.realjabber.com


Appendix C: Legal Notices

Copyright

This XMPP Extension Protocol is copyright © 1999 - 2012 by the XMPP Standards Foundation (XSF).

Permissions

Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.

Disclaimer of Warranty

## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. ##

Limitation of Liability

In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.

IPR Conformance

This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <http://xmpp.org/about-xmpp/xsf/xsf-ipr-policy/> or obtained by writing to XMPP Standards Foundation, 1899 Wynkoop Street, Suite 600, Denver, CO 80202 USA).

Appendix D: Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.


Appendix E: Discussion Venue

The primary venue for discussion of XMPP Extension Protocols is the <standards@xmpp.org> discussion list.

Discussion on other xmpp.org discussion lists might also be appropriate; see <http://xmpp.org/about/discuss.shtml> for a complete list.

Errata can be sent to <editor@xmpp.org>.


Appendix F: Requirements Conformance

The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".


Appendix G: Notes

1. RealJabber.org is the author's web site containing work related to this specification, including animation examples of what real time text looks like. <http://www.realjabber.org>.

2. IETF RFC 4103: RTP Payload for Text Conversation. <http://tools.ietf.org/html/rfc4103>.

3. ITU-T T.140: Protocol for multimedia application text conversation. <http://www.itu.int/rec/T-REC-T.140>.

4. AOL AIM Real Time Text: <http://help.aol.com/help/microsites/microsite.do?cmd=displayKC&externalId=223568>.

5. Reach112: European emergency service with real-time text. <http://www.reach112.eu>.

6. ITU-T Rec. F.703: Multimedia conversational services. <http://www.itu.int/rec/T-REC-F.703>.

7. RFC 6120: Extensible Messaging and Presence Protocol (XMPP): Core <http://tools.ietf.org/html/rfc6120>.

8. ITU-T Rec. F.700: Framework Recommendation for multimedia services <http://www.itu.int/rec/T-REC-F.700>.

9. XML: Extensible Markup Language 1.0 (Fifth Edition). <http://www.w3.org/TR/xml/>.

10. XEP-0030: Service Discovery <http://xmpp.org/extensions/xep-0030.html>.

11. XEP-0206: XMPP Over BOSH <http://xmpp.org/extensions/xep-0206.html>.

12. XEP-0085: Chat State Notifications <http://xmpp.org/extensions/xep-0085.html>.

13. XEP-0045: Multi-User Chat <http://xmpp.org/extensions/xep-0045.html>.

14. XEP-0296: Best Practices for Resource Locking <http://xmpp.org/extensions/xep-0296.html>.

15. XEP-0286: XMPP on Mobile Devices <http://xmpp.org/extensions/xep-0286.html>.

16. XEP-0166: Jingle <http://xmpp.org/extensions/xep-0166.html>.

17. IETF RFC 5194: Framework for Real-Time Text over IP Using the Session Initiation Protocol (SIP). <http://tools.ietf.org/html/rfc5194>.

18. XEP-0205: Best Practices to Discourage Denial of Service Attacks <http://xmpp.org/extensions/xep-0205.html>.

19. XEP-0184: Message Delivery Receipts <http://xmpp.org/extensions/xep-0184.html>.

20. XEP-0199: XMPP Ping <http://xmpp.org/extensions/xep-0199.html>.


Appendix H: Revision History

Note: Older versions of this specification might be available at http://xmpp.org/extensions/attic/

Version 0.2 (2012-03-19)

Lots of edits. Simplifications, improvements and corrections. Forward and backward compatible with version 0.1.

(MDR)

Version 0.1 (2011-06-29)

Initial published version.

(psa)

Version 0.0.3 (2011-06-25)

Third draft, recommended edits.

(MDR)

Version 0.0.2 (2011-06-15)

Second draft.

(MDR)

Version 0.0.1 (2011-02-21)

First draft.

(MDR)
