XEP-0231: Bits of Binary

Abstract
This specification defines an XMPP protocol extension for including or referring to small bits of binary data in an XML stanza.
Authors
  • Peter Saint-Andre
  • Pavel Šimerda
Copyright
© 2007 – 2022 XMPP Standards Foundation. See Legal Notices.
Status

Stable

NOTICE: The protocol defined herein is a Stable Standard of the XMPP Standards Foundation. Implementations are encouraged and the protocol is appropriate for deployment in production systems, but some changes to the protocol are possible before it becomes a Final Standard.
Type
Standards Track
Version
1.1 (2022-07-25)
Document Lifecycle
  1. Experimental
  2. Proposed
  3. Stable
  4. Final

1. Introduction

Sometimes it is desirable to include a small bit of binary data in an XMPP stanza. Typical use cases include an icon or emoticon in a message, a thumbnail in a file transfer request, a rasterized image in a whiteboarding session, or a small bit of media in a data form. Currently, there is no lightweight method for including such data in an XMPP stanza, since existing methods (e.g., In-Band Bytestreams (XEP-0047) [1]) are designed for larger blobs of data and therefore require some form of negotiation (e.g., via SI File Transfer (XEP-0096) [2] or Jingle File Transfer (XEP-0234) [3]).

This document specifies just such a lightweight method. The key building blocks are:

  1. A Content-ID ("cid") that uniquely identifies the data.
  2. A <data/> element (similar to the data: URL scheme defined in RFC 2397 [4]) that enables the sender and recipient to exchange the data identified by the cid.

2. Protocol

2.1 Data Exchange

The RECOMMENDED approach is for the sender to include the cid when communicating with the recipient. The recipient SHOULD then check its cache to determine whether the data identified by that cid is already present. If the data is cached, the recipient loads it from the cache. If the data is not cached, the recipient retrieves it by sending an IQ-get to the sender (or potentially some other entity) containing an empty <data/> element whose 'cid' attribute specifies the data to be retrieved; the sender replies with an IQ-result containing a <data/> element whose XML character data provides the binary data.
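The cache-first flow described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `resolve_data` is a hypothetical helper, and `fetch` is an assumed callback standing in for whatever code sends the IQ-get and decodes the reply.

```python
from typing import Callable, Dict, Optional


def resolve_data(cid: str,
                 cache: Dict[str, bytes],
                 fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Return the binary data identified by `cid`, preferring the local cache.

    `fetch` is a hypothetical callback that sends the IQ-get described
    above and returns the decoded payload, or None on error.
    """
    cached = cache.get(cid)
    if cached is not None:
        # Cache hit: no round trip to the sender is needed.
        return cached
    # Cache miss: retrieve the data with an IQ-get.
    data = fetch(cid)
    if data is not None:
        cache[cid] = data
    return data
```

For brevity this sketch keys the cache on the raw cid string; the caching rules later in this document suggest keying on the hash where possible.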

The <data/> element MUST be used only to encapsulate small bits of binary data and MUST NOT be used for large data transfers. Naturally, the definitions of "small" and "large" are rather loose. In general, the data SHOULD NOT be more than 8 kilobytes, and dedicated file transfer methods (e.g., SOCKS5 Bytestreams (XEP-0065) [5] or In-Band Bytestreams (XEP-0047) [1]) SHOULD be used for exchanging blobs of data larger than 8 kilobytes. However, implementations or deployments MAY impose their own limits.

If the data to be shared is particularly small (e.g., less than 1 kilobyte), the sender MAY send it directly by including a <data/> element in a <message/>, <presence/>, or <iq/> stanza. The following rules apply:

  1. When the <data/> element is directly included in an XMPP <message/> or <presence/> stanza, it SHOULD be a first-level child of the stanza.
  2. When the <data/> element is directly included in an XMPP <iq/> stanza, it MUST be a child of the appropriate first-level child (since the IQ stanza must not include more than one first-level child).
  3. When the <data/> element is used to retrieve the data from the sender as described under Retrieving Uncached Data, it MUST be a first-level child of the stanza.
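The size guidance and placement rules above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration under stated assumptions: the function names and strategy labels are hypothetical, and treating "1k" and "8 kilobytes" as 1024 and 8192 bytes of raw (pre-Base64) data is an interpretation, not something the text fixes. The placement helper covers rules 1 and 2 only; rule 3 concerns the retrieval request described under Retrieving Uncached Data.

```python
import xml.etree.ElementTree as ET

# Thresholds taken from the text above; treating "1k" and "8 kilobytes"
# as 1024 and 8192 bytes of raw (pre-Base64) data is an assumption.
INLINE_LIMIT = 1024
BOB_LIMIT = 8 * 1024


def choose_transport(size: int) -> str:
    """Pick a delivery strategy for `size` bytes of data (labels are hypothetical)."""
    if size < INLINE_LIMIT:
        return "inline"          # MAY be sent directly in the stanza
    if size <= BOB_LIMIT:
        return "cid-reference"   # send the cid, serve the data on request
    return "file-transfer"       # use SOCKS5 or In-Band Bytestreams instead


def attach_inline(stanza: ET.Element, data_el: ET.Element) -> None:
    """Place a <data/> element in a stanza following rules 1 and 2 above."""
    kind = stanza.tag.rsplit("}", 1)[-1]   # strip any '{namespace}' prefix
    if kind in ("message", "presence"):
        stanza.append(data_el)             # rule 1: first-level child
    elif kind == "iq":
        payloads = list(stanza)
        if len(payloads) != 1:
            raise ValueError("expected exactly one first-level child in <iq/>")
        payloads[0].append(data_el)        # rule 2: child of the IQ payload
    else:
        raise ValueError("not a stanza: " + kind)
```

Because inline inclusion is only permitted (MAY), an implementation could just as well send a cid reference for very small data; the "inline" result here is one possible policy.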

2.2 Referencing Data

The sender can refer to data that it hosts by including a cid in the data it sends. The following example shows how to include the cid in XHTML-IM (XEP-0071) [6], but any appropriate format can be used, such as Data Forms Media Element (XEP-0221) [7].

Example 1. An XHTML-IM message with a cid
<message from='ladymacbeth@shakespeare.lit/castle'
         to='macbeth@chat.shakespeare.lit'
         type='groupchat'>
  <body>Yet here's a spot.</body>
  <html xmlns='http://jabber.org/protocol/xhtml-im'>
    <body xmlns='http://www.w3.org/1999/xhtml'>
      <p>
        Yet here's a spot.
        <img alt='A spot'
             src='cid:sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org'/>
      </p>
    </body>
  </html>
</message>

The recipient can then retrieve the data from the sender as described in the next section.
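To find out which data it may need to retrieve, a recipient could scan the XHTML-IM payload for cid: URLs, as in the minimal sketch below. It assumes XHTML-IM is the referencing format (other formats such as XEP-0221 would need their own extraction), and `referenced_cids` is a hypothetical helper name. Note that the value placed in the 'cid' attribute of the retrieval request is the Content-ID without the "cid:" scheme prefix.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import unquote

XHTML_NS = "http://www.w3.org/1999/xhtml"


def referenced_cids(message_xml: str) -> List[str]:
    """Collect the Content-IDs referenced by XHTML <img/> elements in a message."""
    root = ET.fromstring(message_xml)
    cids = []
    for img in root.iter("{%s}img" % XHTML_NS):
        src = img.get("src", "")
        if src.lower().startswith("cid:"):
            # RFC 2111 cid: URLs may be %-encoded; the bare Content-ID is what
            # goes into the 'cid' attribute of the retrieval request.
            cids.append(unquote(src[4:]))
    return cids
```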

2.3 Retrieving Uncached Data

Data is requested and transferred using the XMPP <iq/> stanza type by making reference to the cid. In particular, the recipient requests the binary data by sending an IQ-get containing an empty <data/> element with a 'cid' attribute that matches the cid URI previously communicated.

Example 2. Requesting data
<iq from='doctor@shakespeare.lit/pda'
    id='get-data-1'
    to='ladymacbeth@shakespeare.lit/castle'
    type='get'>
  <data xmlns='urn:xmpp:bob'
        cid='sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org'/>
</iq>
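A request like Example 2 could be serialized with ElementTree as sketched below. `build_data_request` is a hypothetical helper; a real client library would normally build stanzas through its own API and handle stream namespaces, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

BOB_NS = "urn:xmpp:bob"


def build_data_request(from_jid: str, to_jid: str, iq_id: str, cid: str) -> str:
    """Serialize an IQ-get asking `to_jid` for the data identified by `cid`."""
    iq = ET.Element("iq", {"from": from_jid, "id": iq_id,
                           "to": to_jid, "type": "get"})
    # Setting xmlns as a plain attribute makes ElementTree emit
    # <data xmlns="urn:xmpp:bob" .../> instead of an 'ns0:' prefix.
    ET.SubElement(iq, "data", {"xmlns": BOB_NS, "cid": cid})  # empty element
    return ET.tostring(iq, encoding="unicode")
```

For example, `build_data_request('doctor@shakespeare.lit/pda', 'ladymacbeth@shakespeare.lit/castle', 'get-data-1', 'sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org')` yields a stanza equivalent to Example 2 (attribute quoting differs).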

The entity that receives the request (in this example, the original sender) then either returns an error (e.g., <item-not-found/> if it does not have data matching the Content-ID) or returns the data.

Example 3. Returning data
<iq from='ladymacbeth@shakespeare.lit/castle'
    id='get-data-1'
    to='doctor@shakespeare.lit/pda'
    type='result'>
  <data xmlns='urn:xmpp:bob'
        cid='sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org'
        max-age='86400'
        type='image/png'>
    iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
    C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
    AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
    REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
    ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
    vr4MkhoXe0rZigAAAABJRU5ErkJggg==
  </data>
</iq>
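The responding side can be sketched as a function that answers a parsed IQ-get with either an IQ-result like Example 3 or an <item-not-found/> error. This is a minimal sketch assuming a hypothetical in-memory `store` mapping each cid to its payload, MIME type, and optional max-age; the error uses the RFC 6120 stanza error namespace with type "cancel", and the stream namespace (jabber:client) is omitted for simplicity.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

BOB_NS = "urn:xmpp:bob"
STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"

# Hypothetical store: cid -> (payload, MIME type, max-age or None)
Store = Dict[str, Tuple[bytes, str, Optional[int]]]


def handle_data_request(request: ET.Element, store: Store) -> ET.Element:
    """Answer a parsed BoB IQ-get with an IQ-result or an <item-not-found/> error."""
    reply = ET.Element("iq", {"id": request.get("id", "")})
    # Swap addressing; omit attributes the request did not carry.
    if request.get("to"):
        reply.set("from", request.get("to"))
    if request.get("from"):
        reply.set("to", request.get("from"))

    query = request.find("{%s}data" % BOB_NS)
    cid = query.get("cid") if query is not None else None
    if cid is None or cid not in store:
        reply.set("type", "error")
        error = ET.SubElement(reply, "error", {"type": "cancel"})
        ET.SubElement(error, "{%s}item-not-found" % STANZAS_NS)
        return reply

    payload, mime_type, max_age = store[cid]
    reply.set("type", "result")
    data = ET.SubElement(reply, "{%s}data" % BOB_NS, {"cid": cid})
    if max_age is not None:
        data.set("max-age", str(max_age))
    data.set("type", mime_type)
    # Base64 without line breaks, as required by Section 2.5.
    data.text = base64.b64encode(payload).decode("ascii")
    return reply
```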

2.4 Caching Data

It is RECOMMENDED for the recipient to cache data; however, the recipient MAY opt not to cache data, for example because it runs on a device that does not have sufficient space for data storage.

The default behavior is for the recipient to cache the data only for the life of the entity's application session (not a client's presence session with the server or the controlling user's communication session with the contact from whom the user received the data); that is, the recipient would clear the cache when the application is terminated or restarted.

As a hint regarding the suggested period for caching the data, the sender MAY include a 'max-age' attribute whenever it sends a <data/> element. The meaning of the 'max-age' attribute exactly matches that of the Max-Age attribute from RFC 2965 [8].

If the sender suggests that the data not be cached (e.g., because it is ephemeral), the value of the 'max-age' attribute MUST be "0" (the number zero).

A recipient SHOULD cache data based on the hash of the data as encapsulated in the cid. However, if a hash cannot be extracted from the cid, if the recipient does not support the hashing algorithm used, or if the recipient does not support hashes, then the recipient SHOULD cache based on the JID of the sender.
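The caching rules above could be combined roughly as follows. This is a sketch under several assumptions: `BobCache` is a hypothetical class, the set of supported algorithms is illustrative, and the JID fallback is interpreted as scoping entries to (sender JID, cid). Keeping the cache in memory means it is naturally cleared when the application terminates, matching the default behavior. Keying on the hash is only safe if the data has been checked against that hash before it is stored; that verification is left out here to keep the example short.

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

CID_RE = re.compile(r"^(?P<algo>[^+@]+)\+(?P<hash>[0-9a-fA-F]+)@bob\.xmpp\.org$")
SUPPORTED_ALGOS = {"sha1", "sha-256"}   # assumption: what this client can verify


class BobCache:
    """In-memory cache, so it lasts only for the application session."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, ...], Tuple[bytes, Optional[float]]] = {}

    def _key(self, cid: str, sender_jid: str) -> Tuple[str, ...]:
        m = CID_RE.match(cid)
        if m and m.group("algo") in SUPPORTED_ALGOS:
            # Preferred: key on the hash, so identical data from any sender hits.
            return ("hash", m.group("algo"), m.group("hash").lower())
        # Fallback: scope the entry to the sender's JID.
        return ("jid", sender_jid, cid)

    def put(self, cid: str, sender_jid: str, data: bytes,
            max_age: Optional[int] = None) -> None:
        if max_age == 0:
            return  # the sender suggested not caching this data
        expires = None if max_age is None else self._clock() + max_age
        self._entries[self._key(cid, sender_jid)] = (data, expires)

    def get(self, cid: str, sender_jid: str) -> Optional[bytes]:
        key = self._key(cid, sender_jid)
        entry = self._entries.get(key)
        if entry is None:
            return None
        data, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._entries[key]  # max-age elapsed
            return None
        return data
```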

2.5 Format of the <data/> Element

To exchange binary data, the data is encapsulated as the XML character data of a <data/> element qualified by the 'urn:xmpp:bob' namespace, where the data MUST be encoded as Base64 in accordance with Section 4 of RFC 4648 [9] (note: the Base64 output MUST NOT include whitespace and MUST set the number of pad bits to zero).
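The encoding requirement can be met with Python's standard `base64` module, which implements the RFC 4648 Section 4 alphabet and emits no line breaks. The helper names below are hypothetical. The decoder strips whitespace before decoding; that leniency is a design choice (the examples in this document wrap the data for readability), not something this specification requires of receivers.

```python
import base64


def encode_bob_payload(data: bytes) -> str:
    """Encode for a <data/> element: standard alphabet, padded, no whitespace."""
    return base64.b64encode(data).decode("ascii")


def decode_bob_payload(text: str) -> bytes:
    """Decode <data/> character data, tolerating line breaks and indentation."""
    compact = "".join(text.split())            # drop all whitespace
    return base64.b64decode(compact, validate=True)  # reject non-alphabet chars
```

For instance, the 8-byte PNG signature encodes to `iVBORw0KGgo=`, which is why the example payloads in this document begin with those characters.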

The following attributes are defined for the <data/> element.

Table 1: Attributes of the data Element

  • cid (REQUIRED): A Content-ID that can be mapped to a cid: URL as specified in RFC 2111 [10]. The 'cid' value SHOULD be of the form algo+hash@bob.xmpp.org, where "algo" is the hashing algorithm used (e.g., "sha1" for the SHA-1 algorithm as specified in RFC 3174 [11]; see the next section for more information) and "hash" is the hex output of the algorithm applied to the binary data itself.
  • max-age (RECOMMENDED): A suggestion regarding how long (in seconds) to cache the data; the meaning matches the Max-Age attribute from RFC 2965 [8].
  • type (REQUIRED if the <data/> element is non-empty): The value of the 'type' attribute MUST match the syntax specified in RFC 2045 [12]. That is, the value MUST include a top-level media type, the "/" character, and a subtype; in addition, it MAY include one or more optional parameters (e.g., an "audio/ogg" MIME type can include a "codecs" parameter as specified in RFC 4281 [13]). The "type/subtype" string SHOULD be registered in the IANA MIME Media Types Registry [14], but MAY be an unregistered or yet-to-be-registered value.
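The cid and type rules in Table 1 can be sketched as below. This minimal illustration assumes SHA-1, the algorithm the table uses as its example; `make_cid` and `is_plausible_mime_type` are hypothetical helper names, and the MIME check only approximates the RFC 2045 token syntax without validating parameters.

```python
import hashlib
import re


def make_cid(data: bytes) -> str:
    """Build a SHA-1 based cid of the form algo+hash@bob.xmpp.org."""
    return "sha1+" + hashlib.sha1(data).hexdigest() + "@bob.xmpp.org"


# Rough approximation of the RFC 2045 token syntax: type "/" subtype,
# optionally followed by ";" parameters (not validated in detail here).
TOKEN = r"[!#$%&'*+\-.^_`{|}~0-9A-Za-z]+"
MIME_RE = re.compile("^" + TOKEN + "/" + TOKEN + r"(\s*;.*)?$")


def is_plausible_mime_type(value: str) -> bool:
    """Return True if `value` looks like type/subtype[; parameters]."""
    return MIME_RE.match(value) is not None
```

For example, `make_cid(b"abc")` returns `sha1+a9993e364706816aba3e25717850c26c9cd0d89d@bob.xmpp.org`.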

The following example illustrates the format (line breaks in the Base64 data are included for readability only).

Example 4. Data element format
<data xmlns='urn:xmpp:bob'
      cid='sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org'
      max-age='86400'
      type='image/png'>
  iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
  C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
  AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
  REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
  ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
  vr4MkhoXe0rZigAAAABJRU5ErkJggg==
</data>

2.6 Algorithm Names

The value of the 'algo' parameter MUST be one of the values from the IANA Hash Function Textual Names Registry [15] maintained by the Internet Assigned Numbers Authority (IANA) [16], or one of the values defined in Use of Cryptographic Hash Functions in XMPP (XEP-0300) [17], unless the hash is SHA-1, in which case the label "sha1" (rather than the IANA textual name "sha-1") MUST be used for historical reasons.
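A recipient that wants to check received data against its cid could map 'algo' labels to hash functions, as in the sketch below. The mapping is deliberately partial and illustrative (SHA-1 under its historical "sha1" label plus two IANA textual names); `verify_cid` is a hypothetical helper that returns None when the cid carries no hash it can check, which is the case where the caching rules fall back to the sender's JID.

```python
import hashlib
import re
from typing import Optional

# Partial, illustrative mapping from 'algo' labels to hashlib constructors.
ALGOS = {
    "sha1": hashlib.sha1,       # historical label required for SHA-1
    "sha-256": hashlib.sha256,  # IANA textual name
    "sha-512": hashlib.sha512,  # IANA textual name
}

CID_RE = re.compile(r"^(?P<algo>[^+@]+)\+(?P<hash>[0-9a-fA-F]+)@bob\.xmpp\.org$")


def verify_cid(cid: str, data: bytes) -> Optional[bool]:
    """True/False if the data can be checked against the cid; None otherwise."""
    m = CID_RE.match(cid)
    if m is None:
        return None                 # not of the form algo+hash@bob.xmpp.org
    algo = ALGOS.get(m.group("algo"))
    if algo is None:
        return None                 # unsupported or unknown algorithm
    return algo(data).hexdigest() == m.group("hash").lower()
```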

3. Determining Support

If an entity supports the protocol specified herein, it MUST advertise that fact by returning a feature of "urn:xmpp:bob" in response to Service Discovery (XEP-0030) [18] information requests.

Example 5. Service discovery information request
<iq from='doctor@shakespeare.lit/pda'
    id='disco1'
    to='ladymacbeth@shakespeare.lit/castle'
    type='get'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>
Example 6. Service discovery information response
<iq from='ladymacbeth@shakespeare.lit/castle'
    id='disco1'
    to='doctor@shakespeare.lit/pda'
    type='result'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    ...
    <feature var='urn:xmpp:bob'/>
    ...
  </query>
</iq>

In order for an application to determine whether an entity supports this protocol, where possible it SHOULD use the dynamic, presence-based profile of service discovery defined in Entity Capabilities (XEP-0115) [19]. However, if an application has not received entity capabilities information from an entity, it SHOULD use explicit service discovery instead.
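Checking a disco#info result such as Example 6 for the "urn:xmpp:bob" feature might look like the sketch below. `supports_bob` is a hypothetical helper; when Entity Capabilities are available, the same feature list would instead come from a capabilities cache, which is not shown here.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
BOB_NS = "urn:xmpp:bob"


def supports_bob(disco_result_xml: str) -> bool:
    """Return True if a disco#info result advertises the urn:xmpp:bob feature."""
    root = ET.fromstring(disco_result_xml)
    for feature in root.iter("{%s}feature" % DISCO_INFO_NS):
        if feature.get("var") == BOB_NS:
            return True
    return False
```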

4. Security Considerations

The ability to include arbitrary binary data implies that it is possible to send scripts, applets, images, and executable code, which may be harmful. To reduce the risk of such exposure, an implementation MAY choose not to display or process such data, but instead either completely ignore the data, show only alternative text (e.g., the value of the 'alt' attribute of an XHTML-IM <img/> element), or prompt a human user for approval (either explicitly via user action or implicitly via a list of approved entities from whom the user will accept binary data without per-event approval).
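One possible policy combining these options is sketched below. Everything here is an assumption for illustration: the `Action` values, the `decide` helper, and especially the allowlist of "safe" media types, which deliberately excludes formats such as SVG that can carry scripts. Data of any other type is never rendered, only its alternative text is shown, and data from senders outside the approved list requires user confirmation.

```python
from enum import Enum
from typing import Set


class Action(Enum):
    DISPLAY = "display"
    SHOW_ALT_ONLY = "show-alt-only"
    ASK_USER = "ask-user"


# Assumption: raster image types this client is willing to render.
SAFE_TYPES = {"image/png", "image/gif", "image/jpeg"}


def decide(sender_bare_jid: str, mime_type: str, approved: Set[str]) -> Action:
    """Choose how to treat received binary data (hypothetical policy)."""
    base_type = mime_type.split(";", 1)[0].strip().lower()  # drop parameters
    if base_type not in SAFE_TYPES:
        return Action.SHOW_ALT_ONLY   # never process scripts, executables, etc.
    if sender_bare_jid in approved:
        return Action.DISPLAY         # implicit approval via the approved list
    return Action.ASK_USER            # explicit per-event approval
```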

5. IANA Considerations

This document requires no interaction with the Internet Assigned Numbers Authority (IANA) [16].

6. XMPP Registrar Considerations

6.1 Protocol Namespaces

The XMPP Registrar [20] includes "urn:xmpp:bob" in its registry of protocol namespaces (see <https://xmpp.org/registrar/namespaces.html>).

7. XML Schema

<?xml version='1.0' encoding='UTF-8'?>

<xs:schema
    xmlns:xs='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:xmpp:bob'
    xmlns='urn:xmpp:bob'
    elementFormDefault='qualified'>

  <xs:annotation>
    <xs:documentation>
      The protocol documented by this schema is defined in
      XEP-0231: http://www.xmpp.org/extensions/xep-0231.html
    </xs:documentation>
  </xs:annotation>

  <xs:element name='data'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='xs:base64Binary'>
          <xs:attribute name='cid' type='xs:string' use='required'/>
          <xs:attribute name='max-age' type='xs:nonNegativeInteger' use='optional'/>
          <xs:attribute name='type' type='xs:string' use='optional'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

</xs:schema>

8. Acknowledgements

Thanks to Rachel Blackman, Dave Cridland, Zenon Kuder, and Tomasz Sterna for their feedback.


Appendices

Appendix A: Document Information

Series
XEP
Number
0231
Publisher
XMPP Standards Foundation
Status
Stable
Type
Standards Track
Version
1.1
Last Updated
2022-07-25
Approving Body
XMPP Council
Dependencies
XMPP Core, RFC 2045, RFC 2111, RFC 2965, RFC 3174, RFC 4648
Supersedes
None
Superseded By
None
Short Name
bob
Schema
<http://www.xmpp.org/schemas/bob.xsd>

Appendix B: Author Information

Peter Saint-Andre
Email
stpeter@stpeter.im
JabberID
stpeter@jabber.org
URI
https://stpeter.im/
Pavel Šimerda
JabberID
pavlix@pavlix.net
URI
http://www.pavlix.net/

Appendix C: Legal Notices

Copyright

This XMPP Extension Protocol is copyright © 1999 – 2024 by the XMPP Standards Foundation (XSF).

Permissions

Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.

Disclaimer of Warranty

## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. ##

Limitation of Liability

In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.

IPR Conformance

This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <https://xmpp.org/about/xsf/ipr-policy> or obtained by writing to XMPP Standards Foundation, P.O. Box 787, Parker, CO 80134 USA).

Visual Presentation

The HTML representation (you are looking at) is maintained by the XSF. It is based on the YAML CSS Framework, which is licensed under the terms of the CC-BY-SA 2.0 license.

Appendix D: Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 6120) and XMPP IM (RFC 6121) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.

Appendix E: Discussion Venue

The primary venue for discussion of XMPP Extension Protocols is the <standards@xmpp.org> discussion list.

Discussion on other xmpp.org discussion lists might also be appropriate; see <https://xmpp.org/community/> for a complete list.

Given that this XMPP Extension Protocol normatively references IETF technologies, discussion on the <xsf-ietf@xmpp.org> list might also be appropriate.

Errata can be sent to <editor@xmpp.org>.

Appendix F: Requirements Conformance

The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".

Appendix G: Notes

1. XEP-0047: In-Band Bytestreams <https://xmpp.org/extensions/xep-0047.html>.

2. XEP-0096: SI File Transfer <https://xmpp.org/extensions/xep-0096.html>.

3. XEP-0234: Jingle File Transfer <https://xmpp.org/extensions/xep-0234.html>.

4. RFC 2397: The data: URL scheme <http://tools.ietf.org/html/rfc2397>.

5. XEP-0065: SOCKS5 Bytestreams <https://xmpp.org/extensions/xep-0065.html>.

6. XEP-0071: XHTML-IM <https://xmpp.org/extensions/xep-0071.html>.

7. XEP-0221: Data Forms Media Element <https://xmpp.org/extensions/xep-0221.html>.

8. RFC 2965: HTTP State Management Mechanism <http://tools.ietf.org/html/rfc2965>.

9. RFC 4648: The Base16, Base32, and Base64 Data Encodings <http://tools.ietf.org/html/rfc4648>.

10. RFC 2111: Content-ID and Message-ID Uniform Resource Locators <http://tools.ietf.org/html/rfc2111>.

11. RFC 3174: US Secure Hash Algorithm 1 (SHA1) <http://tools.ietf.org/html/rfc3174>.

12. RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies <http://tools.ietf.org/html/rfc2045>.

13. RFC 4281: The Codecs Parameter for "Bucket" Media Types <http://tools.ietf.org/html/rfc4281>.

14. IANA registry of MIME media types <http://www.iana.org/assignments/media-types>.

15. IANA registry of Hash Function Textual Names <http://www.iana.org/assignments/hash-function-text-names>.

16. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.

17. XEP-0300: Use of Cryptographic Hash Functions in XMPP <https://xmpp.org/extensions/xep-0300.html>.

18. XEP-0030: Service Discovery <https://xmpp.org/extensions/xep-0030.html>.

19. XEP-0115: Entity Capabilities <https://xmpp.org/extensions/xep-0115.html>.

20. The XMPP Registrar maintains a list of reserved protocol namespaces as well as registries of parameters used in the context of XMPP extension protocols approved by the XMPP Standards Foundation. For further information, see <https://xmpp.org/registrar/>.

Appendix H: Revision History

Note: Older versions of this specification might be available at https://xmpp.org/extensions/attic/

  1. Version 1.1 (2022-07-25)

    Mention where to get textual names of hash functions.

    ssw
  2. Version 1.0 (2008-09-03)

    Per a vote of the XMPP Council, advanced status to Draft; concurrently, the XMPP Registrar issued the urn:xmpp:bob namespace.

    psa
  3. Version 0.9 (2008-08-16)

    Modified cid generation rules to use a hash of the data instead of a UUID (of the form algo+hash@bob.xmpp.org); modified caching rules to typically base checking on the hash, not the sender JID.

    psa/ps
  4. Version 0.8 (2008-08-07)

    Added section on determining support.

    psa/ps
  5. Version 0.7 (2008-08-06)

    Simplified the protocol; removed fetch element because the cid: URI uniquely identifies the data; changed the name of the protocol to something more catchy.

    psa/ps
  6. Version 0.6 (2008-08-06)

    More clearly described recommended protocol and usage; added fetch element to disambiguate data from reference; cleaned up text throughout.

    psa/ps
  7. Version 0.5 (2008-08-06)

    Removed alt attribute; more clearly specified where to include the data element in message, presence, and IQ stanzas; moved use cases to other specifications; removed service discovery features; modified examples.

    psa/ps
  8. Version 0.4 (2008-08-05)

    Generalized text regarding inclusion of parameters in type attribute per RFC 2045; added max-age attribute, matching semantics from RFC 2965; added section on caching of data; more clearly specified generation of Content-ID.

    psa/ps
  9. Version 0.3 (2008-06-18)

    Allowed inclusion of codecs parameter in type attribute per RFC 4281.

    psa
  10. Version 0.2 (2008-05-29)

    Added service discovery feature for in-band message images use case.

    psa
  11. Version 0.1 (2008-01-30)

    Initial published version.

    psa
  12. Version 0.0.4 (2008-01-29)

    Separately described service discovery feature for inclusion of the data element in file previews.

    psa
  13. Version 0.0.3 (2007-12-27)

    Described use cases for previewing data to be exchanged in file transfers and for inclusion of media information in data forms.

    psa
  14. Version 0.0.2 (2007-12-18)

    Changed syntax to not use data: URL scheme; added cid and type attributes; described use cases for messaging and data retrieval.

    psa
  15. Version 0.0.1 (2007-11-09)

    First draft.

    psa

Appendix I: Bib(La)TeX Entry

@report{saint-andre2007bob,
  title = {Bits of Binary},
  author = {Saint-Andre, Peter and Šimerda, Pavel},
  type = {XEP},
  number = {0231},
  version = {1.1},
  institution = {XMPP Standards Foundation},
  url = {https://xmpp.org/extensions/xep-0231.html},
  date = {2007-11-09/2022-07-25},
}
