This JEP defines a storage protocol and common disk format for archiving of messages.
WARNING: This Standards-Track JEP is Experimental. Publication as a Jabber Enhancement Proposal does not imply approval of this proposal by the Jabber Software Foundation. Implementation of the protocol described herein is encouraged in exploratory implementations, but production systems should not deploy implementations of this protocol until it advances to a status of Draft.
Status: Experimental
Type: Standards Track
Number: 0136
Version: 0.4
Last Updated: 2005-12-21
JIG: Standards JIG
Approving Body: Jabber Council
Dependencies: XMPP Core, XMPP IM, JEP-0030
Supersedes: None
Superseded By: None
Short Name: archive
Wiki Page: <http://wiki.jabber.org/index.php/Message Archiving (JEP-0136)>
Email: justin@affinix.com
JID: justin@andbit.net
Email: ian.paterson@clientside.co.uk
JID: ian@zoofy.com
This Jabber Enhancement Proposal is copyright 1999 - 2005 by the Jabber Software Foundation (JSF) and is in full conformance with the JSF's Intellectual Property Rights Policy <http://www.jabber.org/jsf/ipr-policy.shtml>. This material may be distributed only subject to the terms and conditions set forth in the Creative Commons Attribution License (<http://creativecommons.org/licenses/by/2.5/>).
The preferred venue for discussion of this document is the Standards-JIG discussion list: <http://mail.jabber.org/mailman/listinfo/standards-jig>.
The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the Jabber Software Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this JEP has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Historically, clients have archived messages in local storage. However, that is clearly inconvenient for people who use more than one client machine (home, work, mobile) and whenever people upgrade to a new machine. Furthermore, security and resource limitations often prevent clients that run in constrained environments from accessing (sufficient) local storage.
This specification defines a protocol for storing and retrieving messages on a server. Each storage item consists of a collection of messages, usually a message thread. Clients can add, update, and remove collections on the server. This document also specifies a disk format that allows clients to share message archive files, much as email clients share common formats such as mbox and Maildir.
A server autoarchiving approach would have eliminated the need to submit collections to the server. However, this specification empowers clients instead. This approach enables them to store out-of-band messages like email as well. Also, since end-to-end encryption schemes typically use evanescent keys that are discarded immediately after use, server autoarchived encrypted messages would not be decryptable.
The protocol is designed to minimise the size of collections. This is necessary to mitigate the memory and bandwidth limitations of constrained clients and to alleviate karma issues.
Messages are stored in message collections on the server. The client uniquely specifies a collection using the pair of attributes: 'with' (bare JID with which the messages were exchanged) and 'start' (thread start-time).
The content of each individual message MUST be encapsulated in a <to/> or <from/> element. The time in seconds of the message relative to the start-time of the collection SHOULD be specified with a 'secs' attribute. The content SHOULD include a <body/> element. Other elements MAY be included, but they are NOT RECOMMENDED. To conserve bandwidth and storage, elements scoped by the 'http://jabber.org/protocol/xhtml-im' namespace SHOULD NOT be included. <thread/> elements and elements scoped by the 'jabber:x:delay', 'jabber:x:event' and 'http://jabber.org/protocol/chatstates' namespaces MUST NOT be included.
In compliance with XMPP Core, the server MUST respond to all <iq/> elements. However, most 'successful' responses have been omitted from this document in the interest of conciseness.
All times MUST be in the UTC time zone.
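The relationship between the 'start' attribute of a collection and the 'secs' attribute of each message can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, timezone-aware datetime values are assumed, and the timestamp layout is inferred from the examples in this document.

```python
from datetime import datetime, timezone


def format_utc(moment: datetime) -> str:
    """Render an aware datetime in the UTC layout used by the examples."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the time zone must be known")
    as_utc = moment.astimezone(timezone.utc).replace(tzinfo=None)
    return as_utc.isoformat(timespec="seconds") + "Z"


def secs_since_start(start: datetime, sent: datetime) -> int:
    """Whole seconds between the collection start and one message."""
    return int((sent - start).total_seconds())
```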
The client discovers whether the server supports this protocol using Service Discovery [1].
<iq type='get' to='montague.net'> <query xmlns='http://jabber.org/protocol/disco#info'/> </iq>
If the server supports this protocol, it MUST return a <feature/> element in the result with the 'var' attribute set to 'http://jabber.org/protocol/archive'.
<iq type='result' from='montague.net' to='romeo@montague.net/orchard'> <query xmlns='http://jabber.org/protocol/disco#info'> ... <feature var='http://jabber.org/protocol/archive'/> ... </query> </iq>
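A client might test a disco#info result for the archive feature along the following lines. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name is hypothetical and the stanza is assumed to arrive as a complete, well-formed string.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
ARCHIVE_NS = "http://jabber.org/protocol/archive"


def supports_archiving(iq_xml: str) -> bool:
    """Return True if a disco#info result advertises the archive feature."""
    iq = ET.fromstring(iq_xml)
    if iq.get("type") != "result":
        return False
    query = iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    # Each supported feature appears as <feature var='...'/> inside <query/>.
    return any(
        feature.get("var") == ARCHIVE_NS
        for feature in query.findall(f"{{{DISCO_INFO_NS}}}feature")
    )
```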
The messages to be stored are encapsulated in the <store/> element.
<iq type='set' to='montague.net'> <store xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:56:15Z' subject='She speaks!'> <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from> <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to> <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from> </store> </iq>
If the collection does not exist then the server MUST create a new collection. If the collection already exists then the server MUST append the messages to the existing collection.
A friendly name for the collection MAY be specified with a 'subject' attribute. If the collection already has a 'subject' then it is simply replaced. Note the Privacy Considerations for subject attributes.
<iq type='result' to='romeo@montague.net/orchard'/>
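The store request shown above could be assembled as in the following Python sketch. The function name and the (direction, secs, body) tuple layout are assumptions made for illustration; only the element and attribute names come from this document.

```python
from xml.sax.saxutils import escape, quoteattr

ARCHIVE_NS = "http://jabber.org/protocol/archive"


def build_store_iq(server, with_jid, start, messages, subject=None):
    """Serialize a <store/> request.

    messages is an iterable of (direction, secs, body) tuples, where
    direction is "from" or "to" and secs is relative to start.
    """
    attrs = [("xmlns", ARCHIVE_NS), ("with", with_jid), ("start", start)]
    if subject is not None:
        attrs.append(("subject", subject))
    attr_text = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
    parts = [f'<iq type="set" to={quoteattr(server)}>', f"<store {attr_text}>"]
    for direction, secs, body in messages:
        if direction not in ("from", "to"):
            raise ValueError("direction must be 'from' or 'to'")
        # escape() protects '<', '>' and '&' inside the message body.
        parts.append(
            f'<{direction} secs="{int(secs)}">'
            f"<body>{escape(body)}</body></{direction}>"
        )
    parts.append("</store></iq>")
    return "".join(parts)
```

Sending the same 'with' and 'start' values again with further messages would, per the rules above, append to the existing collection rather than create a new one.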
If the server cannot service a store request because the collection is too large then it MUST return a Not Acceptable error:
<iq type='error' to='romeo@montague.net/orchard'> <error code='406' type='modify'> <not-acceptable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/> </error> </iq>
The client MAY specify an absolute time for any message by providing a longer 'utc' attribute instead of a 'secs' attribute:
<iq type='set' to='montague.net'> <store xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:56:15Z' subject='She speaks!'> <from utc='1469-07-21T00:32:29Z'><body>Art thou not Romeo, and a Montague?</body></from> <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to> <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from> </store> </iq>
The client SHOULD include the 'name' attribute to specify the 'resource' of all messages that it received from a room:
<iq type='set' to='montague.net'> <store xmlns='http://jabber.org/protocol/archive' with='balcony@house.capulet.com' start='1469-07-21T03:16:37Z'> <from secs='0' name='benvolio'><body>She will indite him to some supper.</body></from> <from secs='5' name='mercutio'><body>A bawd, a bawd, a bawd! So ho!</body></from> <from secs='11' name='romeo'><body>What hast thou found?</body></from> </store> </iq>
The client sends an empty <retrieve/> element to request the download of a collection:
<iq type='get' to='montague.net'> <retrieve xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:56:15Z'/> </iq>
<iq type='result' to='romeo@montague.net/orchard'> <store xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:56:15Z' subject='She speaks!'> <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from> <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to> <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from> </store> </iq>
If the collection does not exist then the server MUST return a Not Found error:
<iq type='error' to='romeo@montague.net/orchard'> <error code='404' type='cancel'> <item-not-found xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/> </error> </iq>
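A client could interpret the two possible answers to a retrieve request roughly as follows. This Python sketch is illustrative only: the function name and the returned tuple layout are hypothetical, and a Not Found error is mapped to None as a design choice of the sketch.

```python
import xml.etree.ElementTree as ET

ARCHIVE_NS = "http://jabber.org/protocol/archive"
STANZA_ERR_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def parse_retrieve_response(iq_xml):
    """Return a list of (direction, secs, body) tuples, or None on Not Found."""
    iq = ET.fromstring(iq_xml)
    if iq.get("type") == "error":
        if iq.find(f".//{{{STANZA_ERR_NS}}}item-not-found") is not None:
            return None
        raise RuntimeError("unexpected error response")
    store = iq.find(f"{{{ARCHIVE_NS}}}store")
    if store is None:
        raise ValueError("result carries no <store/> element")
    messages = []
    for child in store:
        direction = child.tag.split("}", 1)[-1]  # strip the namespace part
        if direction not in ("from", "to"):
            continue  # skip anything that is not a plain message element
        secs = child.get("secs")
        body = child.findtext(f"{{{ARCHIVE_NS}}}body", default="")
        messages.append((direction, int(secs) if secs is not None else None, body))
    return messages
```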
To request the removal of a collection the client sends an empty <remove/> element.
<iq type='set' to='montague.net'> <remove xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:56:15Z'/> </iq>
If the collection does not exist then the server MUST return a Not Found error:
<iq type='error' to='romeo@montague.net/orchard'> <error code='404' type='cancel'> <item-not-found xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/> </error> </iq>
The client may remove several collections at once. The 'start' and 'end' attributes MAY be specified to indicate a date range.
<iq type='set' to='montague.net'> <remove xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:00:00Z' end='1469-07-21T04:00:00Z'/> </iq>
If the 'with' attribute is omitted then collections with any JID are removed.
If the end date is in the future then all collections after the start date are removed.
<iq type='set' to='montague.net'> <remove xmlns='http://jabber.org/protocol/archive' start='1469-07-21T02:00:00Z' end='2038-01-01T00:00:00Z'/> </iq>
If the start date is before all the collections in the archive then all collections prior to the end date are removed.
<iq type='set' to='montague.net'> <remove xmlns='http://jabber.org/protocol/archive' start='0000-01-01T00:00:00Z' end='1469-07-21T04:00:00Z'/> </iq>
<iq type='set' to='montague.net'> <remove xmlns='http://jabber.org/protocol/archive'/> </iq>
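The selection rules for removal can be summarised in a small Python predicate. This is a sketch under stated assumptions, not a normative algorithm: the function name is hypothetical, the start of a range is assumed to be inclusive and the end exclusive, and a request carrying only 'end' is assumed to behave like the equivalent list request.

```python
def matches_remove(collection, with_jid=None, start=None, end=None):
    """Decide whether one stored collection falls under a <remove/> request.

    collection is a (with_jid, start) pair. Times are strings such as
    '1469-07-21T02:56:15Z'; that fixed-width UTC layout sorts
    chronologically, so plain string comparison suffices.
    """
    coll_with, coll_start = collection
    if with_jid is not None and coll_with != with_jid:
        return False
    if start is not None and end is None:
        return coll_start == start  # a single, exactly identified collection
    if start is not None and coll_start < start:  # assumption: inclusive
        return False
    if end is not None and coll_start >= end:  # assumption: exclusive
        return False
    return True  # no constraints left: the collection is removed
```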
To request a list of collections the client sends an empty <list/> element. The 'start' and 'end' attributes MAY be specified to indicate a date range.
If the 'with' attribute is omitted then collections with any JID are returned. If only 'start' is specified then all collections on or after that date should be returned. If only 'end' is specified then all collections prior to that date should be returned.
<iq type='get' to='montague.net'> <list xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:00:00Z' end='1479-07-21T04:00:00Z'/> </iq>
<iq type='get' to='montague.net'> <list xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com'/> </iq>
The client MAY limit the number of items returned by the server with the 'maxitems' attribute.
<iq type='get' to='montague.net'> <list xmlns='http://jabber.org/protocol/archive' maxitems='50'/> </iq>
The collections (empty <store/> elements) in the result MUST be listed in chronological order.
<iq type='result' id='a1' to='romeo@montague.net/orchard'> <list xmlns='http://jabber.org/protocol/archive'> <store with='juliet@capulet.com' start='1469-07-21T02:56:15Z' subject='She speaks!'/> <store with='balcony@house.capulet.com' start='1469-07-21T03:16:37Z'/> ... </list> </iq>
If the requested list would be too long to return in its entirety without exceeding karma limits (or any other limit specified by an administrator), then the server SHOULD only return the first part of the list. In this case the server MUST indicate that the list is incomplete by setting the optional 'partial' attribute of the <list/> element to 'true'. The client MAY then request the remainder of the list, taking care to set the value of the 'start' attribute to one second after the time of the last collection in the partial list that it received.
<iq type='result' id='a1' to='romeo@montague.net/orchard'> <list partial='true' xmlns='http://jabber.org/protocol/archive'> <store with='juliet@capulet.com' start='1469-07-21T02:56:15Z' subject='She speaks!'/> <store with='balcony@house.capulet.com' start='1469-07-21T03:16:37Z'/> ... </list> </iq>
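The handling of partial lists might look like the following Python sketch. The request_list callable is hypothetical and stands in for whatever sends the <list/> request and parses the result; it is assumed to return the listed collections together with the value of the 'partial' attribute.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def next_start(last_start: str) -> str:
    """Return the timestamp one second after last_start."""
    moment = datetime.strptime(last_start, TIME_FORMAT) + timedelta(seconds=1)
    return moment.isoformat(timespec="seconds") + "Z"


def fetch_all_collections(request_list, start=None):
    """Collect every listed collection, following partial results.

    request_list(start) is a hypothetical callable returning
    (items, partial), where each item is a dict with a 'start' key.
    """
    collections = []
    while True:
        items, partial = request_list(start)
        collections.extend(items)
        if not partial or not items:
            return collections
        # Resume one second after the last collection already received.
        start = next_start(items[-1]["start"])
```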
Note: This section is a work in progress.
For clarity, most of the examples in this document are not encrypted. However, this protocol strongly RECOMMENDS the encryption of all collections.
To generate a secret symmetric encryption key, K, and an RSA-encrypted version of the key, C, the client SHOULD use the RSA-KEM key encapsulation mechanism (see ISO 18033-2) along with the user's public RSA key.
The client SHOULD encrypt the complete sequence of <from/> and <to/> elements, M, that it wants to store with the encryption key, K, and a randomly generated public label, L, employing the DEM1 data encapsulation mechanism with the SC2 symmetric encryption algorithm (see ISO 18033-2).
Note that the client MAY use the same key, K, for more than one collection. However, it MUST use each label, L, with only one plain text, M.
The client MUST base64 encode the encrypted messages, wrap them in a single <crypt/> element and set the 'label' attribute to the base64 encoded random label L.
The client MUST set the 'keyalg' attribute of the <store/> element to 'RSA-KEM-KDF2-SHA256' and the 'dataalg' attribute to 'DEM1-SC2-SHA256'. The 'key' attribute MUST also be set to the base64 encoded encrypted version of the key, C.
<iq type='set' to='montague.net'> <store xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:56:15Z' subject='She speaks!' keyalg='RSA-KEM-KDF2-SHA256' dataalg='DEM1-SC2-SHA256' key='bfXv33i+Ybqypa4ETLyorGkVl73v67SMvzX41MPRKA5cOp9wGDMgd8SirwIDAQAB'> <crypt label='VROLURBVEFDb3JwU0dDL'>E5Qbvfa2gI5lBZMAHryv4g+OGQ0SR+ysraP6LnD43m77VkIVni5c7yPeIbkFdicZ</crypt> </store> </iq>
The client MAY append messages to a collection in exactly the same way. In this case the client MUST use the same symmetric encryption key, K, and the same algorithms, but it MUST NOT use the same label, L. Note: when an encrypted collection is retrieved it may contain more than one <crypt/> element.
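The packaging step described above (base64 encoding and wrapping) can be sketched in Python as follows. The cryptographic operations themselves are deliberately out of scope here: the ciphertext, the label L and the encapsulated key C are assumed to have been produced elsewhere, and both function names are hypothetical.

```python
import base64
from xml.sax.saxutils import quoteattr


def build_crypt_element(ciphertext: bytes, label: bytes) -> str:
    """Wrap already-encrypted message data in a single <crypt/> element."""
    encoded_label = base64.b64encode(label).decode("ascii")
    encoded_data = base64.b64encode(ciphertext).decode("ascii")
    return f"<crypt label={quoteattr(encoded_label)}>{encoded_data}</crypt>"


def encrypted_store_attrs(encapsulated_key: bytes) -> dict:
    """Attributes to add to <store/> for an encrypted collection."""
    return {
        "keyalg": "RSA-KEM-KDF2-SHA256",
        "dataalg": "DEM1-SC2-SHA256",
        # C, the RSA-encrypted version of the symmetric key K
        "key": base64.b64encode(encapsulated_key).decode("ascii"),
    }
```

When appending to an existing encrypted collection, a client using such helpers would pass a freshly generated label for each new <crypt/> element while keeping the same key.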
The NESSIE-recommended RSA-KEM (with KDF2/SHA-256) key encapsulation scheme (see ISO 18033-2 at http://www.shoup.net/iso/std5.pdf, or ANSI-X9.44) was specified because its security is tightly proven (unlike RSA-OAEP or PKCS #1 v1.5) and it is very simple to implement.
The SHA-256 hash was specified since SHA-1 is broken (assuming the attacker has plenty of computing power). Other standard hashes are not optimised for 32-bit processors (e.g. Whirlpool, SHA-384, SHA-512).
The client SHOULD support the mechanisms specified in this document. The client MAY support other mechanisms. Future versions of this document MAY be modified to recommend other mechanisms.
The mechanisms for the publishing of public keys and the storage and retrieval of private keys are beyond the scope of this document. A future JEP will specify how clients may do this in an interoperable way.
Since collections should be stored in encrypted form on the server, this protocol does not provide for server-side searching of the content of messages. Although it is inconvenient for people who use more than one client machine, the historical approach of archiving to local storage offers significantly better performance when searching content. This section describes how an implementation could combine the two approaches to provide the benefits of both. The basic concept is that archived collections are 'replicated' locally. [2]
Each time the client connects to the server it 'synchronizes' its local archive with the 'master' archive on the server. It simply notes the time of the most recent collection in its local storage, adds one second, and retrieves the list of all the collections from the server on or after that time (see Obtaining a List of Collections). It then retrieves all the listed collections (see Retrieving a Collection) and adds them to its local copy of the archive.
<iq type='get' to='montague.net'> <list xmlns='http://jabber.org/protocol/archive' start='1469-07-21T02:56:16Z' end='2038-01-01T00:00:00Z'/> </iq>
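The synchronization procedure can be sketched in Python as follows. The two callables are hypothetical placeholders for the list and retrieve exchanges defined earlier, and the in-memory dict merely stands in for the client's local archive.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def synchronize(local, list_collections, retrieve):
    """Copy collections that are newer than the local archive.

    local maps (with_jid, start) to a collection's messages.
    list_collections(start) yields (with_jid, start) pairs and
    retrieve(with_jid, start) returns the messages; both are
    hypothetical stand-ins for the protocol exchanges.
    """
    start = None
    if local:
        newest = max(coll_start for _, coll_start in local)
        # Ask for everything from one second after the newest local copy.
        moment = datetime.strptime(newest, TIME_FORMAT) + timedelta(seconds=1)
        start = moment.isoformat(timespec="seconds") + "Z"
    added = []
    for key in list_collections(start):
        if key not in local:
            local[key] = retrieve(*key)
            added.append(key)
    return added
```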
The client can then use its local copy of the archive to perform efficient content searches of all collections (which may have been archived by any of the user's clients).
Before presenting the results of a search to its user the client SHOULD confirm that each of the collections it has found has not been deleted or modified by another of the user's clients. [3]
<iq type='get' to='montague.net'> <retrieve xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com' start='1469-07-21T02:56:15Z'/> </iq>
The file format uses the same XML constructs as the protocol. Each file may contain messages exchanged with a single JID. Any number of items may be stored in an archive file.
<?xml version='1.0'?> <archive xmlns='http://jabber.org/protocol/archive' with='juliet@capulet.com'> <store start='1469-07-21T02:56:15Z' subject='She speaks!'> <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from> <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to> <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from> </store> <store start='1469-07-21T09:56:15Z' keyalg='RSA-KEM-KDF2-SHA256' dataalg='DEM1-SC2-SHA256' key='bfXv33i+Ybqypa4ETLyorGkVl73v67SMvzX41MPRKA5cOp9wGDMgd8SirwIDAQAB'> <crypt label='VROLURBVEFDb3JwU0dDL'>E5Qbvfa2gI5lBZMAHryv4g+OGQ0SR+ysraP6LnD43m77VkIVni5c7yPeIbkFdicZ</crypt> </store> <store start='1469-07-23T23:08:25Z' keyalg='RSA-KEM-KDF2-SHA256' dataalg='DEM1-SC2-SHA256' key='VQQKExVUaGUgVVNFUlRSVVNUIE5ldHdvcmsxITAfBgNVBAsTGGh0dHA6Ly93d3cu'> <crypt label='nVzdC5jb20xGzAZBgNVB'>j98C5OBxOvG0I3KgqgHf35g+FFCgMSa9KOlaMCZ1+XtgHI3zzVAmbQQnmt/VDUVHQ2AswkDwf9c3V6aPryuvEeKaq</crypt> </store> </archive>
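An archive file in this format could be read as in the following Python sketch, which lists the collections in a file and notes which of them are encrypted. The function name and the shape of the returned dictionaries are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ARCHIVE_NS = "http://jabber.org/protocol/archive"


def summarize_archive(xml_text):
    """Return one summary dict per <store/> element in an archive file."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{ARCHIVE_NS}}}archive":
        raise ValueError("not an archive file")
    with_jid = root.get("with")
    summary = []
    for store in root.findall(f"{{{ARCHIVE_NS}}}store"):
        plain = store.findall(f"{{{ARCHIVE_NS}}}from") + store.findall(
            f"{{{ARCHIVE_NS}}}to"
        )
        summary.append(
            {
                "with": with_jid,
                "start": store.get("start"),
                # Encrypted collections carry <crypt/> instead of messages.
                "encrypted": store.find(f"{{{ARCHIVE_NS}}}crypt") is not None,
                "messages": len(plain),
            }
        )
    return summary
```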
Clients should not store one message at a time on the server since this increases both bandwidth consumption and the total number of transactions. It is instead RECOMMENDED that clients store messages only when the conversation thread appears to be terminated, i.e. when the user closes the chat window. If the user reopens the window and the thread continues then the client should append the new messages to the collection when the user closes the window again.
When appending messages to a collection clients SHOULD try to ensure that the total size of the collection will not exceed karma limits when it is retrieved later. This may be achieved by starting a new collection whenever a message thread becomes too long.
It is RECOMMENDED that the client synchronises all the times it sends to the server with server time. The client can achieve this using Entity Time [4] to estimate the difference between the server and client clocks.
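One way to estimate the clock difference is sketched below in Python. The midpoint method shown here is an assumption of this example and is not mandated by this document: it supposes that the server read its clock roughly halfway through the round trip of the Entity Time request. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def estimate_clock_offset(
    sent_at: datetime, received_at: datetime, server_time: datetime
) -> timedelta:
    """Estimate how far the server clock is ahead of the client clock.

    sent_at and received_at are client clock readings taken when the
    time request left and when the reply arrived; server_time is the
    UTC time reported in that reply.
    """
    midpoint = sent_at + (received_at - sent_at) / 2
    return server_time - midpoint


def to_server_time(client_time: datetime, offset: timedelta) -> datetime:
    """Adjust a client clock reading before sending it to the server."""
    return client_time + offset
```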
Server implementations SHOULD give system administrators the option to disable support for this protocol since archived conversations can consume significant storage space.
The client that originates a message MAY specify a 'false' value for the 'store' header (see Stanza Headers and Internet Metadata (SHIM) [5]). The recipient MUST NOT archive such a message or any of the information it contains. If the sender plans to use 'store' headers it MUST use Service Discovery to determine whether or not the recipient supports them. If not, the sender MUST warn its human user (if any) before sending the message.
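A recipient could check for the 'store' header before archiving a message, as in the following Python sketch. The <headers/> and <header/> structure and the namespace are taken from SHIM; the case-insensitive comparison of the header name and value is an assumption of this example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SHIM_NS = "http://jabber.org/protocol/shim"


def may_archive(message_xml: str) -> bool:
    """Return False if the sender marked the message with store='false'."""
    message = ET.fromstring(message_xml)
    for header in message.iter(f"{{{SHIM_NS}}}header"):
        name = (header.get("name") or "").lower()
        value = (header.text or "").strip().lower()
        if name == "store" and value == "false":
            return False
    return True
```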
Since the subject of each collection is not encrypted, the client MUST warn its human user (if any) before including 'subject' attributes on encrypted collections.
No interaction with the Internet Assigned Numbers Authority (IANA) [6] is required as a result of this JEP.
The Jabber Registrar [7] shall register the 'http://jabber.org/protocol/archive' namespace as a result of this JEP.
To follow.
Encryption Section
XML Schemas
1. JEP-0030: Service Discovery <http://www.jabber.org/jeps/jep-0030.html>.
2. Clients that run in constrained environments may not be able to implement the 'replication' technique if they are prevented from accessing (sufficient) local storage.
3. The replication mechanism described here is not perfect. For example, if a client removes a collection from both its local archive and the server's 'master' archive, that change is not reflected in any other local copies of the archive maintained by clients on other machines.
4. JEP-0090: Entity Time <http://www.jabber.org/jeps/jep-0090.html>.
5. JEP-0131: Stanza Headers and Internet Metadata (SHIM) <http://www.jabber.org/jeps/jep-0131.html>.
6. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.
7. The Jabber Registrar maintains a list of reserved Jabber protocol namespaces as well as registries of parameters used in the context of protocols approved by the Jabber Software Foundation. For further information, see <http://www.jabber.org/registrar/>.