XEP-0301: In-Band Real Time Text

Abstract: This is a specification for real-time text transmitted in-band over an XMPP session.
Author: Mark Rejhon
Copyright: © 1999 - 2011 XMPP Standards Foundation. SEE LEGAL NOTICES.
Status: Experimental
Type: Standards Track
Version: 0.1
Last Updated: 2011-06-29

WARNING: This Standards-Track document is Experimental. Publication as an XMPP Extension Protocol does not imply approval of this proposal by the XMPP Standards Foundation. Implementation of the protocol described herein is encouraged in exploratory implementations, but production systems are advised to carefully consider whether it is appropriate to deploy implementations of this protocol before it advances to a status of Draft.


Table of Contents


1. Introduction
2. Requirements
    2.1. Fluid Real-Time Text
    2.2. Interoperable
    2.3. In-Band Transmission
    2.4. Flexible
3. Glossary
4. Protocol
    4.1. RTT Element
    4.2. RTT Attributes
       4.2.1. xmlns
       4.2.2. seq
       4.2.3. event
    4.3. Body Element
       4.3.1. Backwards Compatible
       4.3.2. Behavior in Clients Supporting This Specification
    4.4. Transmission Interval
    4.5. Real-Time Actions
       4.5.1. Summary of Action Elements
         4.5.1.1. Edit Actions (Tier 1)
         4.5.1.2. Presentation Actions (Tier 2)
         4.5.1.3. Rules for Attribute Values
       4.5.2. Action Elements
         4.5.2.1. Element <t> – Insert Text
         4.5.2.2. Element <e> – Backspace
         4.5.2.3. Element <d> – Forward Delete
         4.5.2.4. Element <w> – Interval
         4.5.2.5. Element <c> – Cursor Position
         4.5.2.6. Element <g> – Flash
       4.5.3. Processing Rules
    4.6. Error Recovery of Real-Time Text
       4.6.1. Staying In Sync
       4.6.2. Detecting Loss of Sync
       4.6.3. Recovery From Loss of Sync
       4.6.4. Helping The Recipient Stay In Sync
5. Determining Support
6. Implementation Notes
    6.1. Key Press Intervals
       6.1.1. Avoid Bursty Text Presentation
       6.1.2. Preserving Key Press Intervals
    6.2. Real-time Transmission
       6.2.1. Monitoring Message Edits
       6.2.2. Guidelines for Senders
       6.2.3. Guidelines for Receivers
    6.3. Remote Cursor
       6.3.1. Calculating Cursor Position
       6.3.2. Guidelines for Senders
       6.3.3. Guidelines for Receivers
    6.4. Other Guidelines
       6.4.1. Message Length Limit
       6.4.2. Performance
       6.4.3. Battery Life
7. Use Cases
    7.1. Three Backspaces
    7.2. Three Backspaces In One Action Element
    7.3. Message Edits Split Into Multiple Transmissions
    7.4. Deleting Text From Message
    7.5. Inserting Text Into Message
    7.6. Deleting And Replacing Text In Message
    7.7. Multiple Message Edits
    7.8. Three Consecutive Messages
    7.9. Real World Message With Key Press Intervals
8. Interoperability Considerations
    8.1. RFC 4103 and T.140
    8.2. Combination With Other Real-Time Media
       8.2.1. Total Conversation
9. Internationalization Considerations
10. Security Considerations
    10.1. Privacy
    10.2. Congestion Considerations
11. IANA Considerations
12. XMPP Registrar Considerations
    12.1. Protocol Namespaces
    12.2. Namespace Versioning
13. XML Schema

Appendices
    A: Document Information
    B: Author Information
    C: Legal Notices
    D: Relation to XMPP
    E: Discussion Venue
    F: Requirements Conformance
    G: Notes
    H: Revision History


1. Introduction

This document defines a specification for real-time text transmitted in-band over an XMPP session.

Real-time text is text that is sent as it is created. The recipient can watch the sender type "as written words are typed" – similar to a telephone conversation where one listens to a conversation "as words are spoken". It provides a sense of contact in conversation, eliminates waiting times found in messaging, and is also favored by the deaf who prefer text conversation. For a visual animation of real-time text, see RealJabber.org [1].

Real-time text has been around for decades in various implementations:

2. Requirements

2.1 Fluid Real-Time Text

  1. Allow transmission of real-time text with a low latency.
  2. Support real-time message editing, including text insertions, deletions and cursor movements.
  3. Support transmission of the original intervals between key presses, to preserve look-and-feel of typing independently of transmission intervals.

2.2 Interoperable

  1. Balance low latencies versus system, network and server limitations.
  2. Be backwards compatible with XMPP clients that do not support real-time text.
  3. Be interoperable with other real-time text protocols via gateways, including RFC 4103 and other standards.

2.3 In-Band Transmission

  1. Reliable real-time text delivery.
  2. Provide a high level XML mechanism of transmitting real-time text.
  3. Minimize reliance on knowledge of network traversal protocols and/or out-of-band transmission protocols.

2.4 Flexible

  1. Allow use within existing instant-messaging user interfaces, with minimal modifications.
  2. Allow alternate presentations of real-time text, including split screen and/or other layouts.
  3. Protocol design extensible for new features.
  4. Protocol recovery from lost/missing messages.

3. Glossary

real-time text – Text transmitted and displayed in real-time as it is typed or entered.

real-time message – A chat message that changes in real-time, via real-time text, and as it is edited by the remote sender.

real-time action – An action done to a real-time message, such as an edit action or a presentation action.

real-time chat session – A chat session that supports real-time messages.

RTT – Acronym for real-time text. This is also the name of the main XML element used by this standard.

action element – An XML element that indicates a single edit action, or presentation action.

edit action – A text modification of any kind, including text insertion or deletion. This may be as small as a single key press.

presentation action – A presentation behavior such as the movement of a visible cursor, a pause, or a flash.

4. Protocol

4.1 RTT Element

Real-time text is transmitted via an <rtt> child element of a <message> stanza. The <rtt> element is transmitted at regular intervals by the sender while a chat message is being composed, to allow the recipient to watch the sender type (and edit) the message before the full message is sent.

This is a basic example of a real-time message "Hello, my Juliet!", transmitted in three real-time text fragments, followed by a final message delivery:

Example 1: Introductory Example

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello, </t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <t>my </t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2'>
    <t>Juliet!</t>
  </rtt>
</message>

<message to='juliet@capulet.lit' from='romeo@montague.lit/orchard' type='chat' id='a04'>
  <body>Hello, my Juliet!</body>
</message>

The <rtt> element contains a series of one or more child elements representing real-time actions including edit actions and/or presentation actions. Example 1 illustrates only a single edit action, the <t> action element, which simply adds text to the end of a message. For more information about action elements, see Real-Time Actions.

If the recipient client does not support this real-time text standard, the sender SHOULD NOT transmit the <rtt> element. For more information, see Determining Support.
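
As an illustration only, the stanzas in Example 1 could be produced with a small helper such as the one below. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name build_rtt_message and its parameters are hypothetical and not part of this specification. For brevity the namespace is written as a plain xmlns attribute on the <rtt> element.

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"  # namespace defined by this specification


def build_rtt_message(to, sender, msg_id, seq, text, event=None):
    """Return a <message> stanza carrying one <rtt> element with one <t> action.

    Hypothetical helper: the name and signature are illustrative only.
    """
    message = ET.Element(
        "message", {"to": to, "from": sender, "type": "chat", "id": msg_id}
    )
    attrs = {"xmlns": RTT_NS, "seq": str(seq)}
    if event is not None:
        # The event attribute is only present when needed (e.g. 'new').
        attrs["event"] = event
    rtt = ET.SubElement(message, "rtt", attrs)
    ET.SubElement(rtt, "t").text = text
    return ET.tostring(message, encoding="unicode")


# First fragment of Example 1.
first = build_rtt_message(
    "juliet@capulet.lit", "romeo@montague.lit/orchard", "a01", 0, "Hello, ", event="new"
)
```

A real client would normally hand the element to its XMPP library rather than serialize it by hand.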

4.2 RTT Attributes

4.2.1 xmlns

This REQUIRED attribute MUST be urn:xmpp:rtt:0

4.2.2 seq

This REQUIRED attribute indicates the sequence number, and MUST begin at 0 in the first <rtt> element sent at the start of a real-time chat session. This attribute MUST increment by 1 for every <rtt> element sent until the end of the session. The receiver may use this value to detect lost <message> elements, since any such loss affects the integrity of real-time text. For more information, see Error Recovery of Real-Time Text.

4.2.3 event

This attribute signals session events for real-time messages, such as the start of a new real-time message. The event attribute is omitted from the <rtt> element when it is not needed.

  1. event='new'
    The sender MUST use this value on the first <rtt> element of a new real-time message. The recipient MUST initialize a blank real-time message for display, before processing the <rtt> payload, if any is provided.

  2. event='reset'
    The sender MAY use this value to retransmit a real-time message. The recipient MUST clear the existing real-time message, before processing the <rtt> payload. One use case is for error recovery. Another use case is where the recipient logs off and on, all while the sender is still composing a message. This allows the recipient to re-display the real-time message.

  3. event='start'
    The sender MAY use this value to indicate the start of a real-time session. The <rtt> payload MUST be empty, with no real-time actions.

  4. event='cancel'
    The sender MAY use this value to indicate the end of a real-time session. The recipient SHOULD clear or close the current real-time message, if any is still displayed. An example case is closing a chat window before a message is delivered. The <rtt> payload MUST be empty, with no real-time actions.

Only one <rtt> element is allowed per <message>. Therefore, to transmit multiple events, use multiple consecutive <message> stanzas.
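
The recipient-side rules for the event attribute can be sketched as follows. This is an illustrative outline only: the class RealTimeMessage and its fields are hypothetical, and the handling of event='start' as a no-op is an assumption, since the text above only requires that its payload be empty.

```python
class RealTimeMessage:
    """Recipient-side buffer for one incoming real-time message (hypothetical)."""

    def __init__(self):
        self.text = ""
        self.active = False

    def handle_event(self, event):
        """Apply the event attribute before any <rtt> payload is processed."""
        if event in ("new", "reset"):
            # Both values start from a blank message before the payload is applied.
            self.text = ""
            self.active = True
        elif event == "cancel":
            # Clear or close the real-time message that is still displayed.
            self.text = ""
            self.active = False
        elif event == "start":
            # Assumption: nothing to change, since the payload is empty.
            pass
        # event is None: the payload continues editing the current message.
```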

4.3 Body Element

To turn a real-time message into a permanent delivered message, the sender MUST transmit the whole message as a standard <body> child element within the <message> stanza.

4.3.1 Backwards Compatible

The <body> element continues to follow the XMPP Core [5] standard. This keeps backwards compatibility with XMPP clients that do not support this specification. Such clients will continue to behave normally, displaying complete lines of messages as they are delivered.

4.3.2 Behavior in Clients Supporting This Specification

Upon receipt of <body>, the recipient MUST replace the real-time message with the final delivered message from <body>. The message thus becomes permanent and cannot be edited any further.

In the ideal case, the message in a <body> is redundant since it simply repeats the entire contents of the real-time message. In the event that <message> elements have been lost, the delivery of the <body> permits Error Recovery of Real-Time Text.

After sending a completed message as a <body>, the sender may begin a new real-time message, using the event='new' attribute.

4.4 Transmission Interval

For the best balance between interoperability and usability, the interval of transmission of <rtt> for a continuously-changing real-time message SHOULD be once every 1 second. If there have been no changes to the real-time message, no transmission should take place.

A much shorter interval may more frequently trigger the flooding protection algorithms in XMPP servers, leading to dropped <message> elements and/or congestion (see Congestion Considerations). A longer interval will lead to a less optimal user experience. One second is a balance that meets the requirements of real-time text. This interval is mentioned in other real-time text standards, including section 5.4 of IETF RFC 4103 and section 5.2.2 of IETF RFC 5194 [6], used for SIP.

To smooth the output of text, this specification supports transmission of the sender's original Key Press Intervals. This allows the recipient software to display the sender's typing at the original speed, regardless of the transmission interval.
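
One possible way to honor the transmission interval on the sending side is to queue action elements and flush them at most once per interval. The sketch below is only an outline under stated assumptions: the class RttSendBuffer and its method names are hypothetical, and the current time is passed in by the caller so the logic stays independent of any particular timer or event loop.

```python
class RttSendBuffer:
    """Collects serialized action elements between transmissions (hypothetical)."""

    INTERVAL_SECONDS = 1.0  # recommended transmission interval

    def __init__(self):
        self.pending = []
        self.last_sent = None

    def queue(self, action_xml):
        """Remember one serialized action element, e.g. "<t>H</t>"."""
        self.pending.append(action_xml)

    def flush_if_due(self, now):
        """Return the <rtt> payload to send now, or None if nothing should be sent."""
        if not self.pending:
            return None  # no changes to the message: no transmission
        if self.last_sent is not None and now - self.last_sent < self.INTERVAL_SECONDS:
            return None  # too soon since the previous transmission
        payload = "".join(self.pending)
        self.pending.clear()
        self.last_sent = now
        return payload
```

A client would call flush_if_due from a periodic timer and wrap any returned payload in an <rtt> element with the next seq value.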

4.5 Real-Time Actions

The <rtt> element is used to transmit a series of one or more real-time actions, including edit actions and presentation actions.

Most chat clients allow a sender to edit their message before sending (e.g. via a Send button, or hitting Enter). Adding real-time functionality to existing chat client software must not degrade the sender's existing expectation of being able to edit their messages before sending. Thus, in a real-time chat session, the recipient can watch the sender compose and edit their message before it is delivered.

Edit actions include typing of text, backspacing, and blocks of text being inserted or deleted. In addition, a real-time chat session may also include presentation actions, including:

Each real-time action is represented by an action element. Examples can be found in Use Cases.

4.5.1 Summary of Action Elements

The following is a short summary. For more detailed information, see Action Elements.

4.5.1.1 Edit Actions (Tier 1)

Table 1:

Action         | Element           | Description
Insert Text    | <t p='#'>text</t> | REQUIRED. Insert specified text at position p in message.
Backspace      | <e p='#' n='#'/>  | REQUIRED. Remove n characters to the left of position p in message.
Forward Delete | <d p='#' n='#'/>  | REQUIRED. Remove n characters starting at position p in message.

4.5.1.2 Presentation Actions (Tier 2)

Table 2:

Action          | Element    | Description
Interval        | <w n='#'/> | RECOMMENDED. Execute a pause of n thousandths of a second.
Cursor Position | <c p='#'/> | OPTIONAL. Move cursor to position p in message.
Flash           | <g/>       | OPTIONAL. Execute a visual flash, beep, or buzz.

4.5.1.3 Rules for Attribute Values

  • The n and p attributes are unsigned 32-bit integers, represented as a string.
  • If the n attribute is omitted, the default value for n is 1.
  • If the p attribute is omitted, the default value for p is the length of the current real-time message.
  • A p value of 0 represents the start of the message.
  • n and p values are counts of individual Unicode code points.

For interoperability of p and n values, processing MUST be done on the original Unicode real-time message. For both senders and receivers, this is the version of the Unicode message text without Unicode normalization, emoticon graphics images, display text formatting, processing of Unicode combining marks, etc. For recipients obtaining text from the <t> element, this is the Unicode text immediately after XML processing, and before any further processing. From the perspective of p and n values, a real-time message is treated as an editable array of Unicode code points.

Regardless of the original format of line breaks during XMPP transmission, line breaks are treated as a single code point (LINE FEED U+000A) for the purposes of real-time message processing. Conversion of line breaks into a single U+000A character is REQUIRED for XML processors, according to section 2.11 of XML [7], so a compliant XML processor already does this automatically, and already provides the correct original Unicode text for interoperability.

NOTE WELL: Extreme care MUST be taken to correctly calculate n and p values based on Unicode code points, to avoid corruption of the real-time message during real-time editing. For more information, see Internationalization Considerations.
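
The difference between counting code points and counting other units can be seen in a short sketch. Python 3 strings are indexed by code point, so len() already gives the count that n and p require; platforms whose strings are indexed by UTF-16 code units would need a conversion step. The helper name utf16_length below is hypothetical and is shown only for comparison.

```python
import unicodedata


def utf16_length(text):
    """Number of UTF-16 code units, as reported by UTF-16 based string types."""
    return len(text.encode("utf-16-le")) // 2


# U+1F600 lies outside the Basic Multilingual Plane: one code point,
# but two UTF-16 code units (a surrogate pair).
message = "Hi \U0001F600!"
code_points = len(message)           # 5, the count used for n and p
utf16_units = utf16_length(message)  # 6, which would corrupt n and p if used

# Combining marks are counted individually, and normalization changes the count,
# which is why processing is done on the text before any normalization.
decomposed = "e\u0301"                                  # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)     # single code point U+00E9
```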

4.5.2 Action Elements

4.5.2.1 Element <t> – Insert Text

(Tier 1) REQUIRED. Supports the transmission of key presses, text block inserts, and text being pasted.
Note: Any text permitted in the <body> element of a <message> may be used, subject to the rules in XMPP Core. More examples can be found in Use Cases.

<t p='#'>text</t>

Inserts specified text at position p in the message text.

<t>text</t>

Appends specified text at the end of message. (p defaults to message length)
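
Treating the real-time message as an array of code points, the <t> action might be applied as in this minimal sketch. The function name insert_text is hypothetical, and the handling of an out-of-range p is left out because it is not defined in the text above.

```python
def insert_text(message, text, p=None):
    """Apply <t p='#'>text</t> to message and return the result.

    When p is omitted it defaults to the length of the current message,
    so the text is appended.
    """
    if p is None:
        p = len(message)
    return message[:p] + text + message[p:]


appended = insert_text("Hello, ", "my ")
inserted = insert_text("Hello, this is Alice!", " Bob", p=5)
```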

4.5.2.2 Element <e> – Backspace

(Tier 1) REQUIRED. Supports the behavior of Backspace key presses.
Note: Direction 'left' represents the numeric direction. Thus, for right-to-left text (e.g. Arabic), numeric 'left' represents visible 'right'.

<e n='#' p='#'/>

Remove n characters to the left of position p in message.

<e p='#'/>

Remove 1 character to the left of position p in message. (n defaults to 1)

<e n='#'/>

Remove n characters from end of message. (p defaults to message length)

<e/>

Remove 1 character from end of message. (Both n and p at default values)
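
The four forms of <e> reduce to one operation with two defaults, as this minimal sketch suggests. The function name backspace is hypothetical, and clamping at the start of the message when n exceeds p is an assumption rather than a rule stated above.

```python
def backspace(message, n=1, p=None):
    """Apply <e n='#' p='#'/>: remove n code points to the left of position p."""
    if p is None:
        p = len(message)      # p defaults to the message length
    start = max(p - n, 0)     # assumption: never delete past the start
    return message[:start] + message[p:]


fixed = backspace("Hello bcak", n=3) + "ack"
```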

4.5.2.3 Element <d> – Forward Delete

(Tier 1) REQUIRED. Supports the behavior of Delete key presses, text block deletes, and text being cut.
Note: Direction 'right' represents the numeric direction. Thus, for right-to-left text (e.g. Arabic), numeric 'right' represents visible 'left'.

<d p='#' n='#'/>

Remove n characters to the right of position p in message.

<d p='#'/>

Remove 1 character to the right of position p in message. (n defaults to 1)
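
A corresponding sketch for <d> follows. The function name forward_delete is hypothetical; p is kept as a required argument here because both forms listed above supply it.

```python
def forward_delete(message, p, n=1):
    """Apply <d p='#' n='#'/>: remove n code points starting at position p."""
    return message[:p] + message[p + n:]


shortened = forward_delete("Hello Bob, this is Alice!", p=5, n=4)
```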

4.5.2.4 Element <w> – Interval

(Tier 2) RECOMMENDED. Allows the transmission of the original intervals between real-time actions, including the pauses between key presses. For more information, see Key Press Intervals.

<w n='#'/>

Executes a pause of n thousandths of a second. The n value SHOULD NOT exceed the Transmission Interval. Also, if a Body Element arrives, pauses SHOULD be interrupted to prevent message delivery delay.
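
To illustrate how <w> elements translate into playback timing, the following sketch turns one <rtt> payload into a list of time offsets rather than actually sleeping. The function name playback_schedule and the tuple layout are hypothetical; a real client would feed such a schedule to its own timer mechanism and cut it short when a <body> arrives.

```python
import xml.etree.ElementTree as ET

RTT_NS = "{urn:xmpp:rtt:0}"


def playback_schedule(rtt_xml):
    """Return [(offset_ms, tag, text), ...] for the non-interval actions.

    offset_ms is the sum of all <w n='#'/> pauses preceding the action.
    """
    elapsed_ms = 0
    schedule = []
    for action in ET.fromstring(rtt_xml):
        tag = action.tag.replace(RTT_NS, "")
        if tag == "w":
            elapsed_ms += int(action.get("n", "1"))  # n defaults to 1
        else:
            schedule.append((elapsed_ms, tag, action.text or ""))
    return schedule


sample = (
    "<rtt xmlns='urn:xmpp:rtt:0' seq='1'>"
    "<t>H</t><w n='120'/><t>i</t><w n='80'/><e/>"
    "</rtt>"
)
timeline = playback_schedule(sample)
```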

4.5.2.5 Element <c> – Cursor Position

(Tier 2) OPTIONAL. Allows the transmission of cursor positions. This allows the recipient to see the sender's cursor in their real-time message, and makes it easier to track the sender's message edits. For more information, see Remote Cursor.

<c p='#'/>

Moves cursor (caret) to the character position p in message.

4.5.2.6 Element <g> – Flash

(Tier 2) OPTIONAL. Allows a flash/beep/buzz feature. This feature is the real-time version of Attention [8], and MAY execute the same alerting method.
Note: This supports real-time text interoperability with similar features in text telephones for the deaf (TTY / TDD), ITU-T T.140 implementations, and Control-G beep at consoles.

<g/>

Executes a brief flash, sound, vibration, etc.

4.5.3 Processing Rules

4.6 Error Recovery of Real-Time Text

In a real-time chat session, it is critical that the real-time message is identical on both the sender and recipient ends. The loss of a single <rtt> transmission can represent missing text, or a missing edit. This leads to the real-time message getting out of sync, the message becoming different on the sender versus the recipient ends.

Transmissions of <message> elements may be lost for several reasons. One reason is that a recipient may disconnect and reconnect while a sender is still typing a message. Another reason is that some XMPP servers may drop <message> elements automatically (e.g. flooding protection).

4.6.1 Staying In Sync

To stay synchronized:

  1. The sender MUST increment the seq attribute for each consecutive <rtt> element sent.
  2. The recipient MUST monitor the seq attribute value of received <rtt> elements, to verify that it is incrementing.
  3. The seq values for the sender, and for the recipient, are independent and kept track of separately.

4.6.2 Detecting Loss of Sync

The sync is considered lost if the seq attribute of the <rtt> element does not increment as expected. Trying to process certain real-time edit actions after loss of sync will result in scrambled text. Therefore, to avoid this situation:

  1. The recipient MUST stop processing all subsequent real-time action elements, and freeze the current real-time message.
  2. An indicator (e.g. reception bars, color code, missing text indicator) or a chat state message (e.g. “Typing Frozen...”) MAY be used by the recipient to indicate the loss of sync.

4.6.3 Recovery From Loss of Sync

Recovery occurs when any of the following happens:

  1. A message <body> is delivered. The frozen real-time message MUST be replaced with this delivered message.
  2. The event attribute of <rtt> has a value of new or reset. Processing of real-time text MUST resume, with the new correct seq value obtained from this <rtt> element.
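
The detection and recovery rules above can be sketched as a small state tracker on the recipient side. The class SyncTracker and its method names are hypothetical. One assumption is made beyond the text: the very first <rtt> element seen is accepted without a check, because there is no earlier seq value to compare against.

```python
class SyncTracker:
    """Tracks the seq attribute of incoming <rtt> elements (hypothetical)."""

    def __init__(self):
        self.expected_seq = None  # unknown until the first <rtt> arrives
        self.frozen = False       # True while sync is considered lost

    def on_rtt(self, seq, event=None):
        """Return True if the action elements of this <rtt> may be processed."""
        gap = self.expected_seq is not None and seq != self.expected_seq
        self.expected_seq = seq + 1
        if event in ("new", "reset"):
            # Recovery: resume with the seq value taken from this element.
            self.frozen = False
        elif gap:
            # Loss of sync: freeze the message and stop processing actions.
            self.frozen = True
        return not self.frozen

    def on_body(self):
        """A delivered <body> replaces the frozen message, ending the freeze."""
        self.frozen = False
```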

4.6.4 Helping The Recipient Stay In Sync

The sender MAY help the recipient stay in sync by automatically retransmitting the real-time message whenever the recipient status changes from offline to online. The entire contents of the real-time message may be retransmitted using the <rtt> attribute event='reset' with a single Insert Text action.

<rtt event='reset' seq='#'>
  <t>This is a retransmission of the entire real-time message.</t>
</rtt>

5. Determining Support

If a client supports the Real Time Text protocol, it MUST advertise that fact in its responses to Service Discovery [9] information ("disco#info") requests by returning a feature of urn:xmpp:rtt:0

Example 2: A disco#info query

<iq from='romeo@montague.lit/orchard'
    id='disco1'
    to='juliet@capulet.lit/balcony'
    type='get'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>

Example 3: A disco#info response

<iq from='juliet@capulet.lit/balcony'
    id='disco1'
    to='romeo@montague.lit/orchard'
    type='result'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <feature var='urn:xmpp:rtt:0'/>
  </query>
</iq>

If this successful response of <feature var='urn:xmpp:rtt:0'/> is not received, the client SHOULD NOT transmit any outgoing <rtt> elements in <message> transmissions. This avoids unnecessary consumption of bandwidth to clients that do not support this protocol.
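
A sender could check a disco#info response for the feature along the following lines. This is a minimal sketch using Python's standard XML parser on the raw stanza text; the function name supports_rtt is hypothetical, and a real client would normally rely on the service discovery support of its XMPP library instead.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
RTT_FEATURE = "urn:xmpp:rtt:0"


def supports_rtt(iq_xml):
    """Return True if a disco#info result advertises the RTT feature."""
    iq = ET.fromstring(iq_xml)
    if iq.get("type") != "result":
        return False
    query = iq.find("{%s}query" % DISCO_INFO_NS)
    if query is None:
        return False
    features = query.findall("{%s}feature" % DISCO_INFO_NS)
    return any(feature.get("var") == RTT_FEATURE for feature in features)


response = """<iq from='juliet@capulet.lit/balcony'
    id='disco1'
    to='romeo@montague.lit/orchard'
    type='result'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <feature var='urn:xmpp:rtt:0'/>
  </query>
</iq>"""
may_send_rtt = supports_rtt(response)
```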

6. Implementation Notes

6.1 Key Press Intervals

6.1.1 Avoid Bursty Text Presentation

To prevent flooding the public XMPP network, transmissions of messages containing real-time text are rate-limited to the recommended <rtt> message transmission interval, usually 1 second according to Transmission Interval. If the display of text is not smoothed, text will appear in intermittent bursts. This hurts usability of real-time text.

6.1.2 Preserving Key Press Intervals

Through the use of the RECOMMENDED Element <w> – Interval, the original look-and-feel of typing can be preserved, despite the long transmission interval. Using the <w> element, the sender can record multiple key presses including key press intervals, and transmit them over the XMPP network in a single <message>. The recipient can then play back the sender's typing in real-time at original typing speed including the intervals between key presses. The text is displayed exactly as it was typed.

Much like VoIP is a packetization of sound, this spec enables packetization of typing including the original key press intervals. This enables the real-time feel of typing over virtually any Internet connection, and without requiring shorter transmission intervals. Look and feel of typing is also preserved over polled XMPP including XMPP Over BOSH [10], as well as over satellite and long international connections with heavy packet-bursting tendencies and variable latencies.

The recipient can watch the sender fluidly compose/edit their message in real-time without any “bursting” effects. This is “Natural Typing”, and appears indistinguishable from local typing. Since all key press intervals are preserved at a high precision, all subtleties of typing are preserved, including the 'mood' (calm typing versus panicked or emphatic typing, etc).

For an example transmission of key intervals, see Real World Message With Key Press Intervals.

6.2 Real-time Transmission

6.2.1 Monitoring Message Edits

For sending clients, there are several methods of capturing typing and message edits, in order to generate action elements for an <rtt> transmission. The most reliable and practical method is to monitor the text change event of a text box field (rather than monitoring key press events) since:

In the text change event, the current message string can be compared to the previous message string in order to calculate what text changes took place. For more information, see Rules for Attribute Values. The appropriate action elements are then generated, to represent text insertions and deletions. The key press interval can be measured as the time elapsed in milliseconds between text change events.
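
One simple way to compare the previous and current message strings is to strip their common prefix and suffix and describe what remains as at most one deletion and one insertion. The sketch below takes that approach; it is not the only possible comparison method, the function name diff_to_actions is hypothetical, and for simplicity it always writes the p attribute even when the default would apply.

```python
from xml.sax.saxutils import escape


def diff_to_actions(old, new):
    """Return serialized action elements that turn old into new."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # The suffix must not overlap the part already matched as prefix.
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    removed = len(old) - prefix - suffix
    inserted = new[prefix:len(new) - suffix]
    actions = []
    if removed:
        actions.append("<d p='%d' n='%d'/>" % (prefix, removed))
    if inserted:
        actions.append("<t p='%d'>%s</t>" % (prefix, escape(inserted)))
    return actions


edits = diff_to_actions("Hello Bob, tihsd is Alice!", "Hello Bob, this is Alice!")
```

Because Python strings are sequences of code points, the positions computed here are already in the units required by Rules for Attribute Values.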

6.2.2 Guidelines for Senders

6.2.3 Guidelines for Receivers

6.3 Remote Cursor

Senders MAY choose to transmit changes to cursor positions. Recipient clients MAY choose to display a cursor (or caret) within incoming real-time messages. This enhances usability of real-time text further, since it becomes easier for a recipient to observe the sender's real-time message edits.

6.3.1 Calculating Cursor Position

The <c> element (Cursor Position) MAY be used to specify an exact cursor position, as a zero-based index into the real-time message string. While between <c> elements, the current cursor position MAY be calculated from the last edit action element as follows:

6.3.2 Guidelines for Senders

The sender MAY choose to transmit cursor positions via <c> elements, only whenever the actual cursor position is different from the calculated position according to Calculating Cursor Position above.

Monitoring the actual cursor position may need to be done via a “selection changed” event of a text box field in many programming platforms. This event typically monitors the start/end indexes of a text marking operation, and usually doubles as the event for monitoring the cursor position. In this case, the start index should be used, since the transmission of the visual appearance of text marking operations is not supported in this current specification.

6.3.3 Guidelines for Receivers

Recipient software MAY choose to display a remote cursor within received real-time messages. The remote cursor SHOULD be clearly distinguishable from the sender's local cursor. One example is to use a non-blinking cursor, easily emulated with the vertical bar character '|'.

While waiting for the next <c> element (if any), the cursor position MAY be calculated from the last edit action element according to Calculating Cursor Position.

6.4 Other Guidelines

Implementors should also take the following considerations into account for real-time message transmissions.

6.4.1 Message Length Limit

A large sequence of rapid message changes may generate a large series of action elements in an <rtt> element, resulting in the <message> exceeding the XMPP server's maximum allowed length of a <message> stanza. This may result in dropped messages. It is acceptable to simply retransmit the whole real-time message using <rtt event='reset'> if the length of the <rtt> element would otherwise exceed the application's maximum chat message length. Retransmitting the whole real-time message has the disadvantage of discarding Key Press Intervals for one <rtt> element.
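
The fallback described above might look like the following sketch. The limit value, the function name choose_rtt_payload, and the use of the serialized payload length as the measure are all assumptions made for illustration; an application would substitute its own maximum chat message length.

```python
from xml.sax.saxutils import escape

MAX_RTT_CHARS = 1000  # hypothetical application limit, not defined by this document


def choose_rtt_payload(actions, full_text, limit=MAX_RTT_CHARS):
    """Return (event, payload) for the next <rtt> element.

    actions is a list of serialized action elements; full_text is the
    sender's current real-time message.
    """
    payload = "".join(actions)
    if len(payload) <= limit:
        return None, payload
    # Too long: retransmit the whole message as one Insert Text action.
    # Key press intervals for this transmission are discarded.
    return "reset", "<t>%s</t>" % escape(full_text)
```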

For long messages, the final <rtt> transmission may be made in a different <message> from the <message> containing the <body>. For example:

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='dda'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='95'>
    <t>Hello World...In a Super Long Message! [etc]</t>
  </rtt>
  <body>Hello World...In a Super Long Message! [etc]</body>
</message>

The message MAY be split into two separate message transmissions:

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='dda'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='95'>
    <t>Hello World...In a Super Long Message! [etc]</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='ddb'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='96' />
  <body>Hello World...In a Super Long Message! [etc]</body>
</message>

6.4.2 Performance

The user interface display of the real-time text output is usually the main performance bottleneck. Care should be taken to avoid inefficient entire-screen repainting on every single key press, since fast typists may type over 10 key presses per second. This is especially important for slower platforms. To improve performance, the display of real-time messages may need to be implemented as a separate display element, rather than as a string concatenated to the current message history, so that the display can efficiently be refreshed on every key press.

6.4.3 Battery Life

Battery life considerations are closely related to Performance. The addition of real-time text to a mobile device will typically have a significant impact on battery life, mostly due to more frequent screen refreshes. The specific implementation of interval action elements (playback of key press intervals) may be a factor, and this should be programmed efficiently. Also, in cases where screen updates are the primary bottleneck on a specific mobile device, and the code cannot be sufficiently optimized, the number of repaints per second may need to be throttled in order to prolong battery life, at the slight expense of the look-and-feel of typing transmissions. Also see XMPP on Mobile Devices [11].

7. Use Cases

The first examples are deliberately kept simple instead of real-world, and are designed to educate in a progressively more difficult manner. For simplicity, most examples do not include the RECOMMENDED Key Press Intervals except for the last Use Case example. For real-world communications in software implementations supporting key press intervals, most transmissions will tend to resemble the last Use Case example, Real World Message With Key Press Intervals.

7.1 Three Backspaces

<message to='bob@example.com' from='alice@example.com/home' id='a01' type='chat'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello bcak</t><e/><e/><e/><t>ack</t>
  </rtt>
</message>

Resulting real-time message: "Hello back"
This code sends the misspelled "Hello bcak", then <e/><e/><e/> backspaces 3 times, then sends "ack".

7.2 Three Backspaces In One Action Element

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello bcak</t><e n='3'/><t>ack</t>
  </rtt>
</message>

Resulting real-time message: "Hello back"
This code is the same as the previous example, demonstrating that <e n='3'/> does the same thing as <e/><e/><e/>.

7.3 Message Edits Split Into Multiple Transmissions

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello</t>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <t> bcak</t>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2'>
    <e n='3'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='3'>
    <t>ack</t>
  </rtt>
</message>

Resulting real-time message: "Hello back"
This code results in the same final text as the previous two examples, segmented into four separate messages.

7.4 Deleting Text From Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello Bob, this is Alice!</t><d n='4' p='5'/>
  </rtt>
</message>

Resulting real-time message: "Hello, this is Alice!"
This code outputs "Hello Bob, this is Alice!" then <d n='4' p='5'/> deletes 4 characters from position 5.
(This erases the text " Bob" including the preceding space character).

7.5 Inserting Text Into Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello, this is Alice!</t><t p='5'> Bob</t>
  </rtt>
</message>

Resulting real-time message: "Hello Bob, this is Alice!"
This code outputs "Hello, this is Alice!", then <t p='5'> inserts the specified text " Bob" at position 5.

7.6 Deleting And Replacing Text In Message

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello Bob, tihsd is Alice!</t>
    <d p='11' n='5'/>
    <t p='11'>this</t>
  </rtt>
</message>

Resulting real-time message: "Hello Bob, this is Alice!"
This code outputs "Hello Bob, tihsd is Alice!", then <d p='11' n='5'/> deletes 5 characters at position 11 in the string of text (erasing the mistyped word "tihsd"). Finally, <t p='11'>this</t> inserts the text "this" in place of the original misspelled word.
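
A sending client needs some way to turn a local edit into action elements like these. The following is a minimal sketch in Python of one possible approach, assuming the client can compare the text before and after an edit. The function name diff_to_actions is hypothetical and not part of this specification; positions are counted in Unicode code points, which is how Python indexes strings.

```python
from xml.sax.saxutils import escape


def diff_to_actions(old: str, new: str) -> list:
    """Return <d>/<t> action elements that turn `old` into `new`.

    Hypothetical helper: keeps the common prefix and suffix and
    replaces only the differing span in the middle.
    """
    limit = min(len(old), len(new))
    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    removed = len(old) - prefix - suffix
    inserted = new[prefix:len(new) - suffix]
    actions = []
    if removed:
        actions.append(f"<d p='{prefix}' n='{removed}'/>")
    if inserted:
        actions.append(f"<t p='{prefix}'>{escape(inserted)}</t>")
    return actions


print(diff_to_actions("Hello Bob, tihsd is Alice!", "Hello Bob, this is Alice!"))
# ["<d p='12' n='4'/>", "<t p='12'>his</t>"]
```

The sketch produces a narrower edit than the hand-written example (it keeps the shared leading "t"), but both sequences leave the recipient with the same text.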

7.7 Multiple Message Edits

This is an example message containing multiple consecutive edit actions.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Helo</t>
    <e/>
    <t>lo...planet</t>
    <e n='6'/>
    <t> World</t>
    <d n='3' p='5'/>
    <t p='5'> there,</t>
    <c p='18'/>
  </rtt>
</message>

Resulting real-time message: "Hello there, World", completed in the following series of steps:

Table 3:

Element              | Action                                  | Real-Time Message  | Cursor Pos
<t>Helo</t>          | Output "Helo"                           | Helo               | 4
<e/>                 | Backspace 1 character from end of line  | Hel                | 3
<t>lo...planet</t>   | Output "lo...planet"                    | Hello...planet     | 14
<e n='6'/>           | Backspace 6 characters from end of line | Hello...           | 8
<t> World</t>        | Output " World"                         | Hello... World     | 14
<d n='3' p='5'/>     | Delete 3 characters at position 5       | Hello World        | 5
<t p='5'> there,</t> | Output " there," at position 5          | Hello there, World | 12
<c p='18'/>          | Move cursor to end                      | Hello there, World | 18

Normally, the action elements are split into multiple separate transmissions, with Key Press Intervals added.
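
The steps in Table 3 can be reproduced with a short receiver-side sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the function name apply_actions is hypothetical, an omitted p is assumed to mean the end of the message, an omitted n is assumed to mean 1, and <w> and <g> are ignored because they do not change the text.

```python
import xml.etree.ElementTree as ET


def apply_actions(rtt_xml: str, text: str = "", cursor: int = 0):
    """Apply the action elements of one <rtt/> element; return (text, cursor)."""
    for el in ET.fromstring(rtt_xml):
        tag = el.tag.split('}')[-1]           # strip the XML namespace
        p = int(el.get('p', len(text)))       # assumption: default is end of message
        n = int(el.get('n', 1))               # assumption: default is 1
        if tag == 't':                        # insert text at position p
            s = el.text or ""
            text = text[:p] + s + text[p:]
            cursor = p + len(s)
        elif tag == 'e':                      # backspace n characters before p
            start = max(p - n, 0)
            text = text[:start] + text[p:]
            cursor = start
        elif tag == 'd':                      # forward-delete n characters at p
            text = text[:p] + text[p + n:]
            cursor = p
        elif tag == 'c':                      # move the cursor only
            cursor = p
    return text, cursor


RTT = """<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
  <t>Helo</t><e/><t>lo...planet</t><e n='6'/><t> World</t>
  <d n='3' p='5'/><t p='5'> there,</t><c p='18'/>
</rtt>"""

print(apply_actions(RTT))   # ('Hello there, World', 18)
```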

7.8 Three Consecutive Messages

Representing a short chat session of three separate messages:
Bob says: "Hello Alice"
Bob says: "This is Bob"
Bob says: "How are you?"

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <t> Alice</t>
  </rtt>
  <body>Hello Alice</body>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2' event='new'>
    <t>This i</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='3'>
    <t>s Bob</t>
  </rtt>
  <body>This is Bob</body>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='e05'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='4' event='new'>
    <t>How a</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='f06'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='5'>
    <t>re yo</t>
  </rtt>
</message>

<message to='alice@example.com' from='bob@example.com/home' type='chat' id='g07'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='6'>
    <t>u?</t>
  </rtt>
  <body>How are you?</body>
</message>
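
The message flow above can be sketched on the receiving side as follows. This Python fragment is only an illustration: the class name RealTimeView is hypothetical, it assumes that event='new' starts an empty real-time message and that a <body/> marks the message as complete, it handles only appended <t> elements, and it ignores seq. In a live XMPP stream the <body/> element would carry the stream's default namespace, which the bare examples here omit.

```python
import xml.etree.ElementTree as ET

RTT_NS = "{urn:xmpp:rtt:0}"


class RealTimeView:
    """Hypothetical receiver state for one sender (illustration only)."""

    def __init__(self):
        self.live = ""       # real-time message still being typed
        self.history = []    # completed messages, taken from <body/>

    def on_message(self, stanza_xml: str) -> None:
        msg = ET.fromstring(stanza_xml)
        rtt = msg.find(RTT_NS + "rtt")
        if rtt is not None:
            if rtt.get("event") == "new":
                self.live = ""                  # a new real-time message begins
            for t in rtt.findall(RTT_NS + "t"):
                self.live += t.text or ""       # this sketch handles appends only
        body = msg.find("body")
        if body is not None:
            self.history.append(body.text or "")  # assumed to be the final text
            self.live = ""


view = RealTimeView()
view.on_message("<message><rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>"
                "<t>Hello</t></rtt></message>")
print(view.live)      # Hello
view.on_message("<message><rtt xmlns='urn:xmpp:rtt:0' seq='1'><t> Alice</t></rtt>"
                "<body>Hello Alice</body></message>")
print(view.history)   # ['Hello Alice']
```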

This example also illustrates the following:

7.9 Real World Message With Key Press Intervals

This is the most important example. It is a transmission of “Hello there!” with Key Press Intervals. It illustrates a four-second typing sequence:

Between key presses, an Element <w> – Interval lets the receiving client pause briefly between action elements, so that the typing is played back with its original look and feel.

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>H</t>
    <w n='215'/><t>e</t>
    <w n='154'/><t>l</t>
    <w n='251'/><t>l</t>
    <w n='115'/><t>o</t>
    <w n='265'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <w n='40'/><t> </t>
    <w n='161'/><t>t</t>
    <w n='237'/><t>e</t>
    <w n='135'/><t>h</t>
    <w n='234'/><t>r</t>
    <w n='193'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2'>
    <w n='109'/><t>e</t>
    <w n='215'/><t>!</t>
    <w n='530'/><c p='11'/>
    <w n='108'/><c p='10'/>
    <w n='38'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='3'>
    <w n='109'/><c p='9'/>
    <w n='161'/><e p='9'/>
    <w n='150'/><e p='8'/>
    <w n='144'/><t p='7'>h</t>
    <w n='209'/><t p='8'>e</t>
    <w n='227'/>
  </rtt>
</message>

<message to='bob@example.com' from='alice@example.com/home' type='chat' id='e05'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='4'>
    <w n='445'/><c p='12'/>
  </rtt>
  <body>Hello there!</body>
</message>
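
A receiving client could turn the <w> elements of one transmission into a playback schedule. The Python sketch below shows the arithmetic only; the function name playback_schedule is hypothetical, and a real client would use timers rather than a list of offsets.

```python
import xml.etree.ElementTree as ET


def playback_schedule(rtt_xml: str):
    """Return ([(offset_ms, tag, text), ...], total_ms) for one <rtt/> element."""
    elapsed = 0
    schedule = []
    for el in ET.fromstring(rtt_xml):
        tag = el.tag.split('}')[-1]          # strip the XML namespace
        if tag == 'w':
            elapsed += int(el.get('n'))      # pause before the next action
        else:
            schedule.append((elapsed, tag, el.text))
    return schedule, elapsed


FIRST = """<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
  <t>H</t>
  <w n='215'/><t>e</t>
  <w n='154'/><t>l</t>
  <w n='251'/><t>l</t>
  <w n='115'/><t>o</t>
  <w n='265'/>
</rtt>"""

schedule, total = playback_schedule(FIRST)
print(schedule)
# [(0, 't', 'H'), (215, 't', 'e'), (369, 't', 'l'), (620, 't', 'l'), (735, 't', 'o')]
print(total)   # 1000
```

Summing the <w> values in the example gives 1000 milliseconds for each of the first four messages and 445 for the last, or 4445 milliseconds in total.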

This real-world example also illustrates the following:

8. Interoperability Considerations

There are other real-time text formats with interoperability considerations relating to the session setup level, the media transport level, and presentation level. For each environment where interoperability is supported, an interoperability specification should be documented that covers addressing, session control, media negotiation and media transcoding.

8.1 RFC 4103 and T.140

One environment for such interoperability considerations is SIP with real-time text (also called Text over IP, or ToIP), as specified in ITU-T T.140 and IETF RFC 4103. One reason for its importance is that the IETF and regional emergency service organizations specify this protocol combination for IP-based real-time emergency calls that support real-time text. Another reason is that SIP is currently the dominant peering protocol between services, and many implementations of real-time text in SIP exist.

Interoperability implies addressing translation, media negotiation and translation, and media transcoding. For media transcoding between this specification and T.140/RFC 4103, the real-time text transcoding is straightforward, except for the editing features of this specification. Moving the cursor backwards and inserting or deleting text far back in the message can cause a large number of erase operations in T.140, which take time and bandwidth to convey.
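
To make the cost concrete, the following Python sketch counts the characters a gateway would have to erase and resend for a single edit. It rests on a simplifying assumption that is not taken from T.140 itself: the far side can only append text or erase its last character, so everything after the edit position is erased and sent again. The function name erase_and_retype_cost is hypothetical.

```python
def erase_and_retype_cost(message: str, p: int, inserted: str = "", deleted: int = 0):
    """Return (erases, resent) for one edit at position p of `message`.

    Assumption: the far side supports only "append" and "erase last
    character", so the text after position p is erased and resent.
    """
    tail = message[p + deleted:]           # text that survives after the edit
    erases = len(message) - p              # erase back to position p
    resent = len(inserted) + len(tail)     # new text plus the surviving tail
    return erases, resent


# Inserting " Bob" at position 5 of a 21-character message:
print(erase_and_retype_cost("Hello, this is Alice!", 5, inserted=" Bob"))   # (16, 20)
# Appending the same text at the end costs no erasures:
print(erase_and_retype_cost("Hello, this is Alice!", 21, inserted=" Bob"))  # (0, 4)
```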

Note that T.140 specifies the use of ISO 6429 control codes for presentation characteristics such as text color, which cannot be represented in the plain text of this specification. Control codes from either side that cannot be presented on the other side of the conversion must be filtered out so that they do not disturb the presentation of text.

Note that a future version of this specification may support real-time transmission of XHTML-IM formatting, in order to support transcoding of ISO 6429 formatting codes.

8.2 Combination With Other Real-Time Media

In some cases, it may be beneficial in a real-time conversation situation to have simultaneous availability of multiple real-time media.

In the XMPP session environment, the Jingle protocol (Jingle [12]) is available for negotiation and transport of the more time-critical, real-time audio and video media. For clients that already support audio and/or video, it is RECOMMENDED to continue providing real-time text according to this specification, regardless of whether audio and/or video is negotiated.

Another real-time text standard (RFC 4103, RFC 5194) is used for SIP sessions with real-time text. When an implementor needs to decide which real-time text standard to use, it is generally recommended to use the real-time text specification of the session control standard in use for that particular session. This varies from implementation to implementation. For example, the Google Talk network uses XMPP messaging for instant messages sent during audio/video conversations. Therefore, in this situation, it is recommended to use this XMPP extension document to add real-time text functionality. However, there are other situations where it is necessary to support multiple real-time text standards and to interoperate between them. For more information, see the next section.

8.2.1 Total Conversation

According to ITU-T F.703 [13], "Total Conversation" defines the simultaneous use of audio, video, and real-time text. For convenience, some chat applications may be designed to have automatic negotiation of as many as possible of the three media preferred by the users.

9. Internationalization Considerations

There are special internationalization considerations involving real-time editing of international text, due to the character positioning and length values used by Action Elements, in the form of p and n attributes. Different programming platforms use different internal Unicode encodings, which may be different from the transmission encoding (UTF-8) for XMPP. To achieve universally correct calculations for p and n attributes, consider these factors:

Failure to calculate p and n values correctly by counting individual Unicode code points will result in interoperability problems in the form of scrambled text during real-time editing. In some cases, this problem does not become visible until a chat session occurs in a different language for the first time. It is critical to follow Rules for Attribute Values in order to maintain worldwide interoperability of international text.
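
The difference between the three ways of measuring a string can be shown with a short Python sketch. The helper names are hypothetical. Python strings are indexed by code point, so len() already gives the value needed for p and n; platforms that store text as UTF-16 report a larger length for characters outside the Basic Multilingual Plane.

```python
def code_points(s: str) -> int:
    return len(s)                             # Python str is indexed by code point


def utf16_units(s: str) -> int:
    return len(s.encode("utf-16-le")) // 2    # length seen on UTF-16 platforms


def utf8_bytes(s: str) -> int:
    return len(s.encode("utf-8"))             # size in the UTF-8 transmission encoding


# U+00E9 is inside the Basic Multilingual Plane, U+1F600 is outside it.
text = "h\u00e9llo \U0001F600"

print(code_points(text))   # 7
print(utf16_units(text))   # 8
print(utf8_bytes(text))    # 11
```

Only the first value is suitable for p and n. A client that sent the UTF-16 length (8) as a position would point one character past the end of this 7-code-point message.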

10. Security Considerations

10.1 Privacy

It is important for implementors of real-time text to educate users about real-time text. Users should be aware that their typing in the local input buffer is now visible to everyone in the current chat conversation. This may have security implications if users copy and paste private information into their chat entry buffer (e.g. a shopping invoice), intending to edit out the private parts of the pasted text (e.g. a credit card number) before sending the message. With real-time editing, recipients can watch all changes to the sender's text before the sender sends the final message.

10.2 Congestion Considerations

The nature of real-time text results in more frequent transmission of <message> elements than would otherwise happen in a non-real-time text conversation. This may lead to increased network and server loading of XMPP networks. Care SHOULD be taken to use a reasonable Transmission Interval and to avoid transmitting messages at an excessive rate, so as not to create unnecessary congestion on public XMPP networks. Also, see Best Practices to Discourage Denial of Service Attacks [14].

Network monitoring mechanisms (e.g. Message Delivery Receipts [15] and/or XMPP Ping [16]) MAY be used to monitor reliability and latency, in order to temporarily adjust the interval to prevent failure of real-time text transmissions during extreme network conditions.

That said, participants using this specification in the recommended way will cause a load that is only marginally higher than that of users communicating without this specification. This is very low compared to many other activities possible on XMPP networks, including VoIP and file transfers.

11. IANA Considerations

This document requires no interaction with the Internet Assigned Numbers Authority (IANA).

12. XMPP Registrar Considerations

12.1 Protocol Namespaces

The XMPP Registrar should include "urn:xmpp:rtt:0" in its registry of protocol namespaces (see <http://xmpp.org/registrar/namespaces.html>).

12.2 Namespace Versioning

If the protocol defined in this specification undergoes a revision that is not fully backwards-compatible with an older version, the XMPP Registrar shall increment the protocol version number found at the end of the XML namespaces defined herein, as described in Section 4 of XEP-0053.

13. XML Schema

<?xml version='1.0' encoding='UTF-8'?>

<xs:schema
    xmlns:xs='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:xmpp:rtt:0'
    xmlns='urn:xmpp:rtt:0'
    elementFormDefault='qualified'>

  <xs:annotation>
    <xs:documentation>
      The protocol documented by this schema is defined in
      XEP-0301: http://www.xmpp.org/extensions/xep-0301.html
    </xs:documentation>
  </xs:annotation>

  <xs:element name='rtt'>
    <xs:complexType>
      <!-- Action elements may appear in any order, any number of times. -->
      <xs:choice minOccurs='0' maxOccurs='unbounded'>
        <xs:element ref='t'/>
        <xs:element ref='e'/>
        <xs:element ref='d'/>
        <xs:element ref='c'/>
        <xs:element ref='w'/>
        <xs:element ref='g'/>
      </xs:choice>
      <!-- Attribute declarations must follow the content model. -->
      <xs:attribute name='seq' type='xs:nonNegativeInteger' use='required'/>
      <xs:attribute name='event' type='xs:string' use='optional'/>
    </xs:complexType>
  </xs:element>

  <!-- Text content plus an optional position attribute. -->
  <xs:element name='t'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='xs:string'>
          <xs:attribute name='p' type='xs:nonNegativeInteger' use='optional'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <!-- The elements below have empty content and carry attributes only. -->
  <xs:element name='e'>
    <xs:complexType>
      <xs:attribute name='p' type='xs:nonNegativeInteger' use='optional'/>
      <xs:attribute name='n' type='xs:nonNegativeInteger' use='optional'/>
    </xs:complexType>
  </xs:element>

  <xs:element name='d'>
    <xs:complexType>
      <xs:attribute name='p' type='xs:nonNegativeInteger' use='required'/>
      <xs:attribute name='n' type='xs:nonNegativeInteger' use='optional'/>
    </xs:complexType>
  </xs:element>

  <xs:element name='c'>
    <xs:complexType>
      <xs:attribute name='p' type='xs:nonNegativeInteger' use='required'/>
    </xs:complexType>
  </xs:element>

  <xs:element name='w'>
    <xs:complexType>
      <xs:attribute name='n' type='xs:nonNegativeInteger' use='required'/>
    </xs:complexType>
  </xs:element>
  
  <xs:element name='g' type='empty'/>

  <xs:simpleType name='empty'>
    <xs:restriction base='xs:string'>
      <xs:enumeration value=''/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>


Appendices


Appendix A: Document Information

Series: XEP
Number: 0301
Publisher: XMPP Standards Foundation
Status: Experimental
Type: Standards Track
Version: 0.1
Last Updated: 2011-06-29
Approving Body: XMPP Council
Dependencies: XMPP Core, XEP-0020
Supersedes: None
Superseded By: None
Short Name: NOT_YET_ASSIGNED


Appendix B: Author Information

Mark Rejhon

Organization: RealJabber.org and Rejhon Technologies Inc.
Email: mark@realjabber.org
JabberID: markybox@gmail.com
URI: http://www.realjabber.com


Appendix C: Legal Notices

Copyright

This XMPP Extension Protocol is copyright © 1999 - 2011 by the XMPP Standards Foundation (XSF).

Permissions

Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.

Disclaimer of Warranty

## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. ##

Limitation of Liability

In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.

IPR Conformance

This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <http://xmpp.org/about-xmpp/xsf/xsf-ipr-policy/> or obtained by writing to XMPP Standards Foundation, 1899 Wynkoop Street, Suite 600, Denver, CO 80202 USA).

Appendix D: Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.


Appendix E: Discussion Venue

The primary venue for discussion of XMPP Extension Protocols is the <standards@xmpp.org> discussion list.

Discussion on other xmpp.org discussion lists might also be appropriate; see <http://xmpp.org/about/discuss.shtml> for a complete list.

Errata can be sent to <editor@xmpp.org>.


Appendix F: Requirements Conformance

The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".


Appendix G: Notes

1. RealJabber.org is the author's web site containing work related to this specification, including animation examples of what real time text looks like. <http://www.realjabber.org>.

2. AOL AIM Real Time Text: <http://help.aol.com/help/microsites/microsite.do?cmd=displayKC&externalId=223568>.

3. IETF RFC 4103: RTP Payload for Text Conversation <http://tools.ietf.org/html/rfc4103>.

4. ITU-T T.140: Protocol for multimedia application text conversation <http://www.itu.int/rec/T-REC-T.140>.

5. RFC 6120: Extensible Messaging and Presence Protocol (XMPP): Core <http://tools.ietf.org/html/rfc6120>.

6. IETF RFC 5194: Framework for Real-Time Text over IP Using the Session Initiation Protocol (SIP) <http://tools.ietf.org/html/rfc5194>.

7. XML: Extensible Markup Language 1.0 (Fifth Edition) <http://www.w3.org/TR/xml/>.

8. XEP-0224: Attention <http://xmpp.org/extensions/xep-0224.html>.

9. XEP-0030: Service Discovery <http://xmpp.org/extensions/xep-0030.html>.

10. XEP-0206: XMPP Over BOSH <http://xmpp.org/extensions/xep-0206.html>.

11. XEP-0286: XMPP on Mobile Devices <http://xmpp.org/extensions/xep-0286.html>.

12. XEP-0166: Jingle <http://xmpp.org/extensions/xep-0166.html>.

13. ITU-T F.703: Multimedia conversational services <http://www.itu.int/rec/T-REC-F.703>.

14. XEP-0205: Best Practices to Discourage Denial of Service Attacks <http://xmpp.org/extensions/xep-0205.html>.

15. XEP-0184: Message Delivery Receipts <http://xmpp.org/extensions/xep-0184.html>.

16. XEP-0199: XMPP Ping <http://xmpp.org/extensions/xep-0199.html>.


Appendix H: Revision History

Note: Older versions of this specification might be available at http://xmpp.org/extensions/attic/

Version 0.1 (2011-06-29)

Initial published version.

(psa)

Version 0.0.3 (2011-06-24)

Third draft, minor edits.

(MDR)

Version 0.0.2 (2011-06-15)

Second draft.

(MDR)

Version 0.0.1 (2011-02-21)

First draft.

(MDR)

END