File sharing in XMPP has mainly been addressed by synchronous solutions like SI File Transfer (XEP-0096) and Jingle File Transfer (XEP-0234). However, these extensions only address the transfer of files, and there is more to file sharing than the simple transfer of data.
Extensions that go beyond the simple transfer of data are File Information Sharing (XEP-0329) and HTTP File Upload (XEP-0363). XEP-0329 allows a user to share folder structures with other users, who can then browse the remote folder and fetch interesting files using existing file-transfer protocols. XEP-0363 describes a protocol for asking a server component for an HTTP storage URL. The client saves the file to that URL using HTTP PUT and afterwards shares the public URL with other users. While this provides some form of asynchronous file sharing, it does not provide integrity protection and it requires a server component.
This proposal aims to provide a protocol that enables XMPP clients to implement a great user experience (UX) around the process of sharing media in conversations. Shared media can be any form of static media, such as photos, videos, documents, or compressed archives. This is directly reflected in the requirements of this extension, which are outlined in the following sections.
The state of sharing media with chat partners in the XMPP community is a protocol zoo in 2016. There are three major protocols for sharing media in XMPP.
Bits of Binary (XEP-0231) is designed for small media, i.e. less than 8 KB in size, that is hosted server-side and transferred Base64-encoded in-band over an existing XMPP stream. Example use cases are custom emoticons that are referenced in XHTML-IM (XEP-0071) img tags, or thumbnails for Jingle File Transfer (XEP-0234).
Jingle File Transfer (XEP-0234) describes a peer-to-peer protocol for synchronous file transfer between two XMPP entities. It attempts a direct transmission, followed by a proxied transmission, via Jingle SOCKS5 Bytestreams Transport Method (XEP-0260). If neither works, it falls back to Jingle In-Band Bytestreams Transport Method (XEP-0261), which transfers the data in-band over the existing XMPP stream.
HTTP File Upload (XEP-0363) was designed as a simpler-to-implement alternative to Bits of Binary (XEP-0231). This is achieved by reusing the HTTP APIs in today's mobile and language SDKs. It requires a server component from which clients can request HTTP URLs to upload data to, and clients then share the corresponding download URL as plain text in a conversation.
|Protocol||File Size Limit||Integrity Verification||Transport||Multi Receiver Support||Server Support||Resumption|
|Bits of Binary (XEP-0231) ||8 KB||Yes||Inband||Yes||Required||No|
|Jingle File Transfer (XEP-0234) ||No||Yes||Inband/Direct/Proxy||No||Optional||Yes|
|HTTP File Upload (XEP-0363) ||Service Dependent||No||Outband (HTTP)||Yes||Required||Download only|
To share a photo, or any other kind of media, a user sends a message stanza to the contact. If the message has an empty body, it is recommended to add a message processing hint (see Message Processing Hints (XEP-0334)) to indicate that the message should be stored in message stores like Message Archive Management (XEP-0313).
The file element is the same as in Jingle File Transfer (XEP-0234). It MUST specify media-type, size, description, and one or more hash elements as described in Use of Cryptographic Hash Functions in XMPP (XEP-0300). The hash elements are essential, as they provide end-to-end file integrity and allow efficient caching and flexible retrieval methods.
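The required file metadata can be sketched in Python as follows. This is only an illustration: the `urn:xmpp:sims:1` namespace, the function name `build_media_sharing`, and the choice of SHA-256 as the only hash are assumptions made for the example, and the `<sources>` element is omitted. The Base64 hash encoding follows XEP-0300.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

# Assumption: namespace of the <media-sharing/> element, used here for illustration.
NS_SIMS = "urn:xmpp:sims:1"
NS_FILE = "urn:xmpp:jingle:apps:file-transfer:5"
NS_HASHES = "urn:xmpp:hashes:2"


def build_media_sharing(data: bytes, media_type: str, desc: str) -> ET.Element:
    """Build a <media-sharing/> element with media-type, size, desc and one hash."""
    sharing = ET.Element(f"{{{NS_SIMS}}}media-sharing")
    file_el = ET.SubElement(sharing, f"{{{NS_FILE}}}file")
    ET.SubElement(file_el, f"{{{NS_FILE}}}media-type").text = media_type
    ET.SubElement(file_el, f"{{{NS_FILE}}}size").text = str(len(data))
    ET.SubElement(file_el, f"{{{NS_FILE}}}desc").text = desc
    # XEP-0300 carries the digest as Base64 text, with the algorithm in 'algo'.
    hash_el = ET.SubElement(file_el, f"{{{NS_HASHES}}}hash", {"algo": "sha-256"})
    hash_el.text = base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")
    return sharing


if __name__ == "__main__":
    element = build_media_sharing(b"Hello World!", "text/plain", "A greeting")
    print(ET.tostring(element, encoding="unicode"))
```

A sender would typically add further hash elements for other algorithms so that receivers can verify against whichever one they support.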
On receipt of a reference to a <media-sharing> element inside a message, a client SHOULD look up in its local storage whether media matching any of the provided hashes has already been retrieved and is available. In that case no transfer needs to be initiated and the media can be displayed inline in the chat.
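A minimal sketch of such a local lookup, assuming the client indexes stored media by (algorithm, Base64 digest) pairs. The class `MediaCache` and its methods are hypothetical names and not part of this specification.

```python
from typing import Dict, Iterable, Optional, Tuple

# (XEP-0300 algorithm name, Base64 digest), e.g. ("sha-256", "f4Ox...")
HashKey = Tuple[str, str]


class MediaCache:
    """Hypothetical in-memory index from advertised hashes to local file paths."""

    def __init__(self) -> None:
        self._paths: Dict[HashKey, str] = {}

    def add(self, hashes: Iterable[HashKey], path: str) -> None:
        # Register the file under every hash it was advertised with.
        for key in hashes:
            self._paths[key] = path

    def lookup(self, hashes: Iterable[HashKey]) -> Optional[str]:
        # A single matching hash is enough to reuse the stored file.
        for key in hashes:
            if key in self._paths:
                return self._paths[key]
        return None
```

Indexing under every advertised hash means a later message that lists only one of those algorithms still hits the cache.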
If the media file is not available locally, it can be obtained via one of the references in the <sources> element. If a client supports HTTP downloads, it can simply download HTTP references.
If not, it can fetch the media file via a Publishing Available Jingle Sessions (XEP-0358) URI reference in the sources and initiate a Jingle File Transfer. If the client does not support Publishing Available Jingle Sessions (XEP-0358), it can attempt to fetch the media file via Jingle File Transfer (XEP-0234) by using the hash elements in the file element, as described in Jingle File Transfer.
A client MAY retrieve the file from sources other than those listed in the sources element. This may be via Jingle File Transfer (XEP-0234) from the sender's other resources or from a media caching service located at the local service. The standardization of such a cache is out of scope for this document.
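One way to read the order above is as a prioritized list of retrieval attempts. The sketch below is an interpretation, not a normative algorithm: the function name, the method labels, and the assumption that XEP-0358 references can be recognized by their `xmpp:` URI scheme are all illustrative.

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlsplit


def plan_retrieval(
    source_uris: Sequence[str],
    supports_http: bool,
    supports_jingle_pub: bool,
) -> List[Tuple[str, str]]:
    """Return (method, uri) attempts in the order described in the text."""
    plan: List[Tuple[str, str]] = []
    if supports_http:
        # HTTP references are the simplest to fetch, so they come first.
        plan += [
            ("http", uri)
            for uri in source_uris
            if urlsplit(uri).scheme in ("http", "https")
        ]
    if supports_jingle_pub:
        # Assumption: Jingle publication references are xmpp: URIs.
        plan += [
            ("jingle-pub", uri)
            for uri in source_uris
            if urlsplit(uri).scheme == "xmpp"
        ]
    # Last resort: request the file by hash via Jingle File Transfer.
    plan.append(("jingle-hash", ""))
    return plan
```

For example, a client without HTTP or XEP-0358 support would get a plan containing only the hash-based Jingle request.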
Regardless of the transport method used to obtain the file, the received content MUST be verified against one of the hashes. If the verification fails, the retrieved content MUST be discarded and retrieval using a different source can be attempted.
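The verification rule can be sketched as follows. The helper names are hypothetical, each source is modelled as a zero-argument callable returning bytes, and the algorithm table only covers a few XEP-0300 names that Python's `hashlib` provides directly.

```python
import base64
import hashlib
from typing import Callable, Iterable, Optional, Sequence, Tuple

# XEP-0300 algorithm names mapped to hashlib constructor names.
_ALGOS = {
    "sha-256": "sha256",
    "sha-512": "sha512",
    "sha3-256": "sha3_256",
    "sha3-512": "sha3_512",
}


def matches_any_hash(content: bytes, hashes: Iterable[Tuple[str, str]]) -> bool:
    """True if content matches at least one (algo, Base64 digest) pair."""
    for algo, expected_b64 in hashes:
        name = _ALGOS.get(algo)
        if name is None:
            continue  # unsupported algorithm: cannot verify against this hash
        actual = base64.b64encode(hashlib.new(name, content).digest()).decode("ascii")
        if actual == expected_b64.strip():
            return True
    return False


def fetch_verified(
    fetchers: Sequence[Callable[[], bytes]],
    hashes: Sequence[Tuple[str, str]],
) -> Optional[bytes]:
    """Try each source in turn and return the first content that verifies."""
    for fetch in fetchers:
        try:
            content = fetch()
        except OSError:
            continue  # transport failure: move on to the next source
        if matches_any_hash(content, hashes):
            return content
        # Verification failed: the content is dropped and the next source is tried.
    return None
```

Content that fails verification is never returned to the caller, so it cannot reach the cache or the chat view.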
This XEP delegates the actual transport of the media data to one of the existing file-transfer XEPs. Thus a client supporting this XEP MUST implement Jingle File Transfer (XEP-0234) and HTTP File Upload (XEP-0363).
If a user's server supports HTTP File Upload (XEP-0363), the client SHOULD upload the file to the service and add the retrieval URL to the <sources> tag, unless the user specifically asked not to store media in the cloud.
Using HTTP File Upload (XEP-0363) for media file transfer greatly improves the UX, since the HTTP server has a higher availability than XMPP end-user clients and can easily handle the load of the many requests that result from sharing media in Multi-User Chat (XEP-0045) and Mediated Information eXchange (MIX) (XEP-0369) rooms.
Sharing the raw data of media does not provide a complete user experience. Clients ideally need to be able to display the media inline in the chat. For this we set baseline requirements for the audio, video and picture formats that a client should be able to display. These requirements are shown in the following table.
A client will usually send only one format per media type if it creates the media itself.
|Media Type||Mime Type||Format/Container||Codec||Requirement||Comment|
|Audio||audio/m4a||MPEG4||AAC||SHOULD||Can be encoded/decoded by stock Android and iOS systems.|
|Image||image/jpeg||-||JPEG||SHOULD||Supported on common desktop and mobile systems. Use for photos.|
|Image||image/png||-||PNG||SHOULD||Supported on common desktop and mobile systems. Use for non-photos.|
|Video||image/gif||-||GIF||SHOULD||Widespread, long-established animation format supported everywhere.|
|Video||video/mp4||MPEG4||H.264 AVC||SHOULD||Can be encoded/decoded by stock Android and iOS systems.|
Depending on the size of the shared media, a client MAY want to automatically download and display the media instead of fetching and displaying the thumbnail. The size threshold depends on the network environment the client currently runs in.
If a client supports automatic retrieval it MUST disclose this feature to the end user and provide a way to disable it, as it may result in high network traffic.
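A sketch of such a decision, assuming the client knows the advertised size and its current network type. The threshold values and network labels are purely illustrative and are not defined by this specification.

```python
# Hypothetical per-network size limits in bytes; a real client would tune these.
AUTO_DOWNLOAD_LIMITS = {
    "wifi": 5 * 1024 * 1024,
    "cellular": 512 * 1024,
    "roaming": 0,  # never fetch automatically while roaming
}


def should_auto_download(size: int, network: str, user_enabled: bool) -> bool:
    """Decide whether to fetch the full media instead of only its thumbnail."""
    if not user_enabled:
        # The user has switched automatic retrieval off.
        return False
    limit = AUTO_DOWNLOAD_LIMITS.get(network, 0)  # unknown networks: be conservative
    return limit > 0 and size <= limit
```

With these example limits, a 300 KB photo is fetched automatically on a cellular connection, while a 2 MB video is fetched only on Wi-Fi.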
In cases where media is shared in a Multi-User Chat (XEP-0045)  or Mediated Information eXchange (MIX) (XEP-0369)  room the sender has to expect that a large number of clients may retrieve the shared media automatically. Ideally multiple sources, including HTTP or other high availability sources, are provided in the <sources> tag of the <media-sharing> tag in case the media is shared in a MUC/MIX room.
TODO: Describe a protocol for MIX members to advertise media availability to peers in a dedicated MIX channel PubSub node. Maybe as a dedicated XEP.
For the media sharing described in this XEP to work, it is REQUIRED for MAM to store the whole stanza instead of only the body content. If the MAM component of the user's server strips away the <media-sharing> tag, any shared media will be missing in archived messages.
To refer to shared media in an XHTML-IM message, this XEP takes advantage of the requirement for hash elements in the file metadata and of the ni URI format defined in RFC 6920. Using this URI format, XHTML-IM can easily refer to media that is attached to a message via a <media-sharing> element, as shown in the following example.
This way the client can acquire the content-addressable resource mentioned in the img tag of the XHTML-IM message and, when finished, show it in the rendered XHTML-IM message.
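The ni URI for a given file can be derived from its hash as sketched below. The function names are hypothetical, and only SHA-256 without an authority component is handled. RFC 6920 uses base64url without padding, whereas XEP-0300 uses standard Base64, so a conversion is needed.

```python
import base64
import hashlib


def ni_uri_for_data(data: bytes) -> str:
    """Build an RFC 6920 ni URI (SHA-256, empty authority) for raw content."""
    digest = hashlib.sha256(data).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{value}"


def ni_uri_from_xep0300(b64_digest: str) -> str:
    """Convert a Base64 SHA-256 digest from a hash element into an ni URI."""
    raw = base64.b64decode(b64_digest)
    value = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{value}"


if __name__ == "__main__":
    # Matches the "Hello World!" example given in RFC 6920.
    print(ni_uri_for_data(b"Hello World!"))
```

A receiving client can run the same conversion on the hash elements of an attached file to find which file an img tag refers to.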
The size element in the file element allows clients to load small files automatically and, for larger files, to give users a hint on how long a transfer might take.
The OPTIONAL thumbnail element in the file element improves the user experience as it provides further hints for users on whether the file could be of interest to them.
The desc element in the file element is critical for clients, as it enables them to make shared media accessible to users who rely on screen readers.
Mobile devices are able to attach the geographic location where a photo was taken to the photo. It is RECOMMENDED that a client implementing this XEP attempts to detect privacy-exposing metadata in shared media and, if found, provides the user with an option to clear the media of such metadata.
Requiring end-to-end media integrity prevents trivial server-side optimizations or other processing of shared media, as these would change the cryptographic hash of the media file. On the other hand, requiring a matching cryptographic hash guarantees that everybody sees exactly the same media that a user has shared in a group conversation.
Thanks to Kim Alvefur, Emmanuel Gil Peyrot, Kevin Smith, Nicolas Vérité, and Florian Schmaus for their helpful comments.
The XMPP Registrar  includes the following information in its registries.
The XMPP Registrar will include the following namespace in its registry of protocol namespaces at <https://xmpp.org/registrar/namespaces.html>:
REQUIRED for protocol specifications.
This XMPP Extension Protocol is copyright © 1999 – 2020 by the XMPP Standards Foundation (XSF).
Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.
NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.
This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <https://xmpp.org/about/xsf/ipr-policy> or obtained by writing to XMPP Standards Foundation, P.O. Box 787, Parker, CO 80134 USA).
The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 6120) and XMPP IM (RFC 6121) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.
The primary venue for discussion of XMPP Extension Protocols is the <firstname.lastname@example.org> discussion list.
Discussion on other xmpp.org discussion lists might also be appropriate; see <http://xmpp.org/about/discuss.shtml> for a complete list.
Errata can be sent to <email@example.com>.
The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".
17. The XMPP Registrar maintains a list of reserved protocol namespaces as well as registries of parameters used in the context of XMPP extension protocols approved by the XMPP Standards Foundation. For further information, see <https://xmpp.org/registrar/>.
Note: Older versions of this specification might be available at http://xmpp.org/extensions/attic/
Fix reference to XEP-0234.
Use 'urn:xmpp:hashes:2' and 'urn:xmpp:jingle:apps:file-transfer:5'.
Initial version approved by the council.
First draft processed by editor.