XEP-0231: Data Element

This specification defines an XMPP protocol extension for including small bits of binary data in an XML stanza.


NOTICE: This document is currently within Last Call or under consideration by the XMPP Council for advancement to the next stage in the XSF standards process.


Document Information

Series: XEP
Number: 0231
Publisher: XMPP Standards Foundation
Status: Proposed
Type: Standards Track
Version: 0.5
Last Updated: 2008-08-06
Approving Body: XMPP Council
Dependencies: XMPP Core, RFC 2045, RFC 2111, RFC 2965, RFC 4648
Supersedes: None
Superseded By: None
Short Name: NOT_YET_ASSIGNED
Wiki Page: <http://wiki.jabber.org/index.php/Data Element (XEP-0231)>


Author Information

Peter Saint-Andre

JabberID: stpeter@jabber.org
URI: https://stpeter.im/

Pavel Šimerda

JabberID: pavlix@pavlix.net
URI: http://www.pavlix.net/


Legal Notices

Copyright

This XMPP Extension Protocol is copyright (c) 1999 - 2008 by the XMPP Standards Foundation (XSF).

Permissions

Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.

Disclaimer of Warranty

## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. In no event shall the XMPP Standards Foundation or the authors of this Specification be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification. ##

Limitation of Liability

In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising out of the use or inability to use the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.

IPR Conformance

This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which may be found at <http://www.xmpp.org/extensions/ipr-policy.shtml> or obtained by writing to XSF, P.O. Box 1641, Denver, CO 80201 USA).

Discussion Venue

The preferred venue for discussion of this document is the Standards discussion list: <http://mail.jabber.org/mailman/listinfo/standards>.

Given that this XMPP Extension Protocol normatively references IETF technologies, discussion on the XSF-IETF list may also be appropriate (see <http://mail.jabber.org/mailman/listinfo/jsf-ietf> for details).

Errata may be sent to <editor@xmpp.org>.

Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.

Conformance Terms

The following keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".


Table of Contents


1. Introduction
2. Format
3. Caching of Data
4. Retrieving Uncached Data
5. Examples
6. Security Considerations
7. IANA Considerations
8. XMPP Registrar Considerations
    8.1. Protocol Namespaces
9. XML Schema
10. Acknowledgements
Notes
Revision History


1. Introduction

Sometimes it is desirable to include a small bit of binary data in an XMPP stanza. Typical use cases might be inclusion of an icon or emoticon in a message, a thumbnail in a file transfer request, a rasterized image in a whiteboarding session, or a small bit of media in a data form. At present, there is no lightweight method for including such data in an XMPP stanza, since existing methods (e.g., In-Band Bytestreams [1]) are designed for larger blobs of data and therefore require some form of negotiation (e.g., via SI File Transfer [2] or Jingle File Transfer [3]). Therefore, this document specifies just such a lightweight method, using a <data/> element that provides semantics similar to the data: URL scheme defined in RFC 2397 [4].

2. Format

The format for including binary data is straightforward: the data is encapsulated as the XML character data of a <data/> element qualified by the 'urn:xmpp:tmp:data-element' namespace (see Protocol Namespaces regarding issuance of one or more permanent namespaces), where the data MUST be encoded as Base64 in accordance with Section 4 of RFC 4648 [5] (note: the Base64 output MUST NOT include whitespace and MUST set the number of pad bits to zero).

The <data/> element MUST be used only to encapsulate small bits of binary data and MUST NOT be used for large data transfers. The definitions of "small" and "large" are necessarily loose. In general, the data SHOULD NOT be more than 8 kilobytes, and dedicated file transfer methods (e.g., SOCKS5 Bytestreams [6] or In-Band Bytestreams [7]) SHOULD be used for exchanging blobs of data larger than 8 kilobytes. Implementations or deployments may impose their own, stricter limits.
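The encoding and size rules above can be sketched in Python as follows. This is only an illustration, not part of the protocol: the names encode_payload, decode_payload, and MAX_INLINE_BYTES are hypothetical, "8 kilobytes" is assumed to mean 8192 bytes of raw (pre-encoding) data, and rejecting larger blobs outright is one possible local policy rather than something this specification mandates.

```python
import base64

# Assumption: "8 kilobytes" is read as 8192 bytes of raw data before encoding.
MAX_INLINE_BYTES = 8 * 1024


def encode_payload(raw: bytes) -> str:
    """Return Base64 character data for a <data/> element.

    Raises ValueError for blobs above the recommended limit (a local policy
    choice; the specification only says SHOULD NOT).
    """
    if len(raw) > MAX_INLINE_BYTES:
        raise ValueError("payload too large; use a dedicated file transfer method")
    # b64encode uses the standard RFC 4648 alphabet and emits no line breaks.
    return base64.b64encode(raw).decode("ascii")


def decode_payload(text: str) -> bytes:
    """Decode Base64 character data, tolerating whitespace added for readability."""
    compact = "".join(text.split())
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(compact, validate=True)
```

The decoder strips whitespace before decoding so that it can also handle data that was line-wrapped for display, as in the examples in this document.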

The following attributes are defined for the <data/> element.

Table 1: Defined Attributes

Attribute | Description | Inclusion
cid | A Content-ID that can be mapped to a cid: URL as specified in RFC 2111 [8]. The 'cid' value MUST be generated so that the local-part is a UUID as specified in RFC 4122 [9] and the domain is the XMPP domain identifier portion of the sending entity's JabberID. | RECOMMENDED
max-age | A suggestion regarding how long (in seconds) to cache the data; the meaning matches the Max-Age attribute from RFC 2965 [10]. | RECOMMENDED when sending a <data/> element containing XML character data
type | The value of the 'type' attribute MUST match the syntax specified in RFC 2045 [11]. That is, the value MUST include a top-level media type, the "/" character, and a subtype; in addition, it MAY include one or more optional parameters (e.g., an "audio/ogg" MIME type can include a "codecs" parameter as specified in RFC 4281 [12]). The "type/subtype" string SHOULD be registered in the IANA MIME Media Types Registry [13], but MAY be an unregistered or yet-to-be-registered value. | REQUIRED

The following example illustrates the format (line endings are provided for readability only).

Example 1. Data element format

<data xmlns='urn:xmpp:tmp:data-element' 
      cid='f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'
      max-age='86400'
      type='image/png'>
  iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
  C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
  AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
  REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
  ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
  vr4MkhoXe0rZigAAAABJRU5ErkJggg==
</data>
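
As a rough illustration of how a sender might assemble such an element, the following Python sketch uses the standard library's ElementTree. The helper names make_cid and build_data_element are hypothetical, and the use of a random (version 4) UUID is an assumption; the specification only requires that the local-part be a UUID per RFC 4122.

```python
import base64
import uuid
import xml.etree.ElementTree as ET

DATA_NS = "urn:xmpp:tmp:data-element"


def make_cid(domain: str) -> str:
    """Build a Content-ID of the form <UUID>@<XMPP domain of the sender>."""
    # Assumption: any RFC 4122 UUID is acceptable; a random one is used here.
    return f"{uuid.uuid4()}@{domain}"


def build_data_element(raw, media_type, domain, max_age=None, cid=None):
    """Return a <data/> element carrying `raw` as Base64 character data."""
    element = ET.Element(f"{{{DATA_NS}}}data", {"type": media_type})
    element.set("cid", cid or make_cid(domain))
    if max_age is not None:
        element.set("max-age", str(max_age))
    # No whitespace is inserted into the Base64 output.
    element.text = base64.b64encode(raw).decode("ascii")
    return element
```

Serialization onto the XML stream is left to whatever XMPP library is in use.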
  

When the <data/> element is included in an XMPP <message/> or <presence/> stanza, it SHOULD be included as a first-level child of the stanza.

When the <data/> element is included in an XMPP <iq/> stanza for data retrieval, it MUST be included as a first-level child of the stanza.

When the <data/> element is included in an XMPP <iq/> stanza to refer to the data, it MUST be included as a second-level child of the stanza.

3. Caching of Data

When one party sends a data element to another party, it SHOULD NOT include the data itself unless the data is particularly small (e.g., less than 1k) or is ephemeral and therefore will never be used again. Instead, it SHOULD send an empty <data/> element with a 'cid' attribute, then depend on the receiving party to retrieve the data if not cached.

As a hint regarding the suggested period for caching the data, the sending party SHOULD include a 'max-age' attribute whenever it sends a non-empty <data/> element. The semantics of the 'max-age' attribute exactly match those of the Max-Age attribute from RFC 2965.

It is RECOMMENDED for the receiving entity to cache data; however, the receiving entity MAY opt not to cache data, for example because it runs on a device that does not have sufficient space for data storage.

The default behavior is for the receiving entity to cache the data only for the life of the entity's application session (not a controlling user's presence session with the server or the controlling user's communication session with the contact from whom the user received the data); that is, the receiving entity would clear the cache when the application is terminated or restarted.

If it is not suggested to cache the data (e.g., because it is ephemeral), the value of the 'max-age' attribute MUST be "0" (the number zero).

A receiving entity MUST cache based on the JID of the sending entity; this helps to prevent certain data poisoning attacks.
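
The caching rules in this section could be implemented along the following lines. This is a minimal in-memory sketch under stated assumptions: the DataCache class and its method names are hypothetical, the sender's JID is used as given (this document does not say whether the full or bare JID is meant), and entries without a 'max-age' simply live until the process exits, which approximates the default "application session" lifetime.

```python
import time


class DataCache:
    """In-memory cache keyed by (sender JID, cid); cleared when the application exits."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # (sender_jid, cid) -> (payload, expiry or None)

    def store(self, sender_jid, cid, payload, max_age=None):
        if max_age == 0:
            return  # max-age='0' means the sender suggests not caching at all
        expires = None if max_age is None else self._clock() + max_age
        self._entries[(sender_jid, cid)] = (payload, expires)

    def lookup(self, sender_jid, cid):
        """Return the cached bytes, or None if absent or expired."""
        key = (sender_jid, cid)
        entry = self._entries.get(key)
        if entry is None:
            return None
        payload, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._entries[key]
            return None
        return payload
```

Keying on the sender as well as the Content-ID means that data supplied by one entity is never returned for a reference made by another, which is the point of the data poisoning safeguard.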

4. Retrieving Uncached Data

Data can be requested and transferred using the XMPP <iq/> stanza type by making reference to the 'cid' of the data to be retrieved. In particular, the requesting entity can request data by sending an IQ-get containing an empty <data/> element with a 'cid' attribute.

Example 2. Requesting data

<iq from='doctor@shakespeare.lit/pda'
    id='get-data-1'
    to='gentlewoman@shakespeare.lit/phone'
    type='get'>
  <data xmlns='urn:xmpp:tmp:data-element' 
        cid='f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'/>
</iq>
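
A request of this shape might be constructed as in the following Python sketch. The function name build_data_request is hypothetical, and the <iq/> element is left unqualified as in the examples in this document; a real XMPP library would normally supply the stream's default namespace and handle serialization.

```python
import xml.etree.ElementTree as ET

DATA_NS = "urn:xmpp:tmp:data-element"


def build_data_request(from_jid, to_jid, cid, iq_id):
    """Return an IQ-get whose only child is an empty <data/> element with a cid."""
    iq = ET.Element("iq", {
        "from": from_jid,
        "to": to_jid,
        "id": iq_id,
        "type": "get",
    })
    # The <data/> element is a first-level child and carries no character data.
    ET.SubElement(iq, f"{{{DATA_NS}}}data", {"cid": cid})
    return iq
```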
  

The responding entity then would either return an error (e.g., <item-not-found/> if it does not have data matching the Content-ID) or return the data.

Example 3. Returning data

<iq from='gentlewoman@shakespeare.lit/phone'
    id='get-data-1'
    to='doctor@shakespeare.lit/pda'
    type='result'>
  <data xmlns='urn:xmpp:tmp:data-element' 
        cid='f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'
        max-age='86400'
        type='image/png'>
    iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
    C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
    AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
    REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
    ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
    vr4MkhoXe0rZigAAAABJRU5ErkJggg==
  </data>
</iq>
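
The responder's side of this exchange can be sketched as follows. The function answer_data_request and the `store` mapping are hypothetical; the error shape (an <error/> of type "cancel" containing <item-not-found/> in the 'urn:ietf:params:xml:ns:xmpp-stanzas' namespace) follows the stanza error format of XMPP Core, and the choice of "cancel" is an assumption about what a typical implementation would send.

```python
import base64
import xml.etree.ElementTree as ET

DATA_NS = "urn:xmpp:tmp:data-element"
STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def answer_data_request(iq, store):
    """Build the reply to an IQ-get for data.

    `store` is a hypothetical dict mapping cid -> (media_type, raw_bytes, max_age).
    """
    reply = ET.Element("iq", {
        "from": iq.get("to", ""),
        "to": iq.get("from", ""),
        "id": iq.get("id", ""),
        "type": "result",
    })
    request = iq.find(f"{{{DATA_NS}}}data")
    cid = request.get("cid") if request is not None else None
    item = store.get(cid)
    if item is None:
        # No data matching the Content-ID: answer with <item-not-found/>.
        reply.set("type", "error")
        error = ET.SubElement(reply, "error", {"type": "cancel"})
        ET.SubElement(error, f"{{{STANZAS_NS}}}item-not-found")
        return reply
    media_type, raw, max_age = item
    data = ET.SubElement(reply, f"{{{DATA_NS}}}data", {
        "cid": cid,
        "type": media_type,
        "max-age": str(max_age),
    })
    data.text = base64.b64encode(raw).decode("ascii")
    return reply
```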
  

This specification does not place limits on the entities from which data can be requested. In particular, such an entity need not be the "owner" of the data (e.g., it could be a peer in a chatroom or whiteboarding session, or the chatroom or whiteboarding service itself).

In addition, bits of data could be hosted by XMPP servers, distributed via Publish-Subscribe [14] nodes, or included in data collections that are available via HTTP (e.g., emoticon sets). Such data could be identified by the value of the 'cid' attribute, but methods for specifying those values are out of scope for this specification.

5. Examples

As an example, consider the use of the <data/> element in conjunction with XHTML-IM [15]. Here the cid: URL scheme points to a data element within a <message/> stanza.

Example 4. A message with included data

<message from='ladymacbeth@shakespeare.lit/castle'
         to='macbeth@chat.shakespeare.lit'
         type='groupchat'>
  <body>Yet here's a spot.</body>
  <html xmlns='http://jabber.org/protocol/xhtml-im'>
    <body xmlns='http://www.w3.org/1999/xhtml'>
      <p>
        Yet here's a spot.
        <img alt='A spot'
             src='cid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'/>
      </p>
    </body>
  </html>
  <data xmlns='urn:xmpp:tmp:data-element' 
        cid='f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'
        max-age='86400'
        type='image/png'/>
</message>
  

Once the data element is communicated, a subsequent message in the same session can refer to the data again (via a cid: URL) without including the data element itself.

Example 5. A message with referenced data

<message from='ladymacbeth@shakespeare.lit/castle'
         to='macbeth@chat.shakespeare.lit'
         type='groupchat'>
  <body>Out, damned spot!</body>
  <html xmlns='http://jabber.org/protocol/xhtml-im'>
    <body xmlns='http://www.w3.org/1999/xhtml'>
      <p>
        Out, damned spot!
        <img alt='A spot'
             src='cid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'/>
      </p>
    </body>
  </html>
</message>
  

If the receiving entity has not cached the data, it can request the data as described in the Retrieving Uncached Data section of this document.
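
One way a receiving client might work out which references still need to be fetched is sketched below in Python. The function name uncached_cids is hypothetical, `cached` stands for whatever set of Content-IDs the client already holds for this sender, and the sketch assumes the cid: URL carries the Content-ID verbatim (it does not apply the percent-decoding that RFC 2111 permits).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DATA_NS = "urn:xmpp:tmp:data-element"


def uncached_cids(message, cached):
    """List Content-IDs referenced via cid: URLs that are neither inline nor cached."""
    # Content-IDs whose data travels in this very stanza need no retrieval.
    inline = {
        data.get("cid")
        for data in message.iter(f"{{{DATA_NS}}}data")
        if (data.text or "").strip()
    }
    wanted = []
    for img in message.iter(f"{{{XHTML_NS}}}img"):
        src = img.get("src", "")
        if not src.startswith("cid:"):
            continue
        cid = src[len("cid:"):]
        if cid not in inline and cid not in cached and cid not in wanted:
            wanted.append(cid)
    return wanted
```

Each Content-ID returned would then be requested with an IQ-get as described in the Retrieving Uncached Data section.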

6. Security Considerations

The ability to include arbitrary binary data implies that it is possible to send scripts, applets, images, and executable code, any of which can be harmful. To reduce the risk of such exposure, an implementation MAY choose not to display or process such data but instead either completely ignore the data, show only the value of the 'alt' attribute, or prompt a human user for approval (either explicitly via user action or implicitly via a list of approved entities from whom the user will accept binary data without per-event approval).
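
A receiving client's policy along these lines could look like the following Python sketch. Everything here is illustrative: the function decide_handling, the SAFE_TYPES allowlist, and the three outcome strings are hypothetical, and the specific media types listed are an example policy rather than a recommendation of this document.

```python
# Hypothetical local policy: media types the client is willing to render at all.
SAFE_TYPES = frozenset({"image/png", "image/jpeg", "image/gif"})


def decide_handling(type_attr, sender_jid, approved_senders):
    """Return "display", "prompt", or "ignore" for received binary data."""
    # Drop optional parameters (e.g. codecs) and compare case-insensitively.
    media_type = type_attr.split(";", 1)[0].strip().lower()
    if media_type not in SAFE_TYPES:
        return "ignore"
    if sender_jid in approved_senders:
        return "display"   # implicit approval via the list of approved entities
    return "prompt"        # ask the human user before showing the data
```

For example, decide_handling("image/png", "ladymacbeth@shakespeare.lit", {"ladymacbeth@shakespeare.lit"}) yields "display", while any type outside the allowlist is ignored regardless of sender.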

The receiving entity SHOULD cache data based on the sender's JabberID; this helps to avoid data poisoning attacks.

7. IANA Considerations

This document requires no interaction with the Internet Assigned Numbers Authority (IANA) [16].

8. XMPP Registrar Considerations

8.1 Protocol Namespaces

Until this specification advances to a status of Draft, its associated namespace shall be "urn:xmpp:tmp:data-element"; upon advancement of this specification, the XMPP Registrar [17] shall issue a permanent namespace in accordance with the process defined in Section 4 of XMPP Registrar Function [18].

9. XML Schema

<?xml version='1.0' encoding='UTF-8'?>

<xs:schema
    xmlns:xs='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:xmpp:tmp:data-element'
    xmlns='urn:xmpp:tmp:data-element'
    elementFormDefault='qualified'>

  <xs:element name='data'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='xs:base64Binary'>
          <xs:attribute name='cid' type='xs:string' use='optional'/>
          <xs:attribute name='max-age' type='xs:nonNegativeInteger' use='optional'/>
          <xs:attribute name='type' type='xs:string' use='required'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

</xs:schema>
  

10. Acknowledgements

Thanks to Rachel Blackman, Dave Cridland, and Tomasz Sterna for their feedback.


Notes

1. XEP-0047: In-Band Bytestreams <http://www.xmpp.org/extensions/xep-0047.html>.

2. XEP-0096: SI File Transfer <http://www.xmpp.org/extensions/xep-0096.html>.

3. XEP-0234: Jingle File Transfer <http://www.xmpp.org/extensions/xep-0234.html>.

4. RFC 2397: The data: URL scheme <http://tools.ietf.org/html/rfc2397>.

5. RFC 4648: The Base16, Base32, and Base64 Data Encodings <http://tools.ietf.org/html/rfc4648>.

6. XEP-0065: SOCKS5 Bytestreams <http://www.xmpp.org/extensions/xep-0065.html>.

7. XEP-0047: In-Band Bytestreams <http://www.xmpp.org/extensions/xep-0047.html>.

8. RFC 2111: Content-ID and Message-ID Uniform Resource Locators <http://tools.ietf.org/html/rfc2111>.

9. RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace <http://tools.ietf.org/html/rfc4122>.

10. RFC 2965: HTTP State Management Mechanism <http://tools.ietf.org/html/rfc2965>.

11. RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies <http://tools.ietf.org/html/rfc2045>.

12. RFC 4281: The Codecs Parameter for "Bucket" Media Types <http://tools.ietf.org/html/rfc4281>.

13. IANA registry of MIME media types <http://www.iana.org/assignments/media-types>.

14. XEP-0060: Publish-Subscribe <http://www.xmpp.org/extensions/xep-0060.html>.

15. XEP-0071: XHTML-IM <http://www.xmpp.org/extensions/xep-0071.html>.

16. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.

17. The XMPP Registrar maintains a list of reserved protocol namespaces as well as registries of parameters used in the context of XMPP extension protocols approved by the XMPP Standards Foundation. For further information, see <http://www.xmpp.org/registrar/>.

18. XEP-0053: XMPP Registrar Function <http://www.xmpp.org/extensions/xep-0053.html>.


Revision History

Version 0.5 (2008-08-06)

Removed alt attribute; more clearly specified where to include the data element in message, presence, and IQ stanzas; moved use cases to other specifications; removed service discovery features; modified examples.

(psa/ps)

Version 0.4 (2008-08-05)

Generalized text regarding inclusion of parameters in type attribute per RFC 2045; added max-age attribute, matching semantics from RFC 2965; added section on caching of data; more clearly specified generation of Content-ID.

(psa/ps)

Version 0.3 (2008-06-18)

Allowed inclusion of codecs parameter in type attribute per RFC 4281.

(psa)

Version 0.2 (2008-05-29)

Added service discovery feature for in-band message images use case.

(psa)

Version 0.1 (2008-01-30)

Initial published version.

(psa)

Version 0.0.4 (2008-01-29)

Separately described service discovery feature for inclusion of the data element in file previews.

(psa)

Version 0.0.3 (2007-12-27)

Described use cases for previewing data to be exchanged in file transfers and for inclusion of media information in data forms.

(psa)

Version 0.0.2 (2007-12-18)

Changed syntax to not use data: URL scheme; added cid and type attributes; described use cases for messaging and data retrieval.

(psa)

Version 0.0.1 (2007-11-09)

First draft.

(psa)

END