JEP-0136: Message Archiving

This JEP defines a storage protocol and common disk format for archiving of messages.


WARNING: This Standards-Track JEP is Experimental. Publication as a Jabber Enhancement Proposal does not imply approval of this proposal by the Jabber Software Foundation. Implementation of the protocol described herein is encouraged in exploratory implementations, but production systems should not deploy implementations of this protocol until it advances to a status of Draft.


JEP Information

Status: Experimental
Type: Standards Track
Number: 0136
Version: 0.2
Last Updated: 2005-04-18
JIG: Standards JIG
Approving Body: Jabber Council
Dependencies: XMPP Core, XMPP IM, JEP-0030
Supersedes: None
Superseded By: None
Short Name: archive

Author Information

Justin Karneges

Email: justin@affinix.com
JID: justin@andbit.net

Ian Paterson

Email: ian.paterson@clientside.co.uk
JID: ian@zoofy.com

Legal Notice

This Jabber Enhancement Proposal is copyright 1999 - 2005 by the Jabber Software Foundation (JSF) and is in full conformance with the JSF's Intellectual Property Rights Policy <http://www.jabber.org/jsf/ipr-policy.shtml>. This material may be distributed only subject to the terms and conditions set forth in the Creative Commons Attribution License (<http://creativecommons.org/licenses/by/2.5/>).

Discussion Venue

The preferred venue for discussion of this document is the Standards-JIG discussion list: <http://mail.jabber.org/mailman/listinfo/standards-jig>.

Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the Jabber Software Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this JEP has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.

Conformance Terms

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.


Table of Contents

1. Introduction
2. Concepts
3. Determining Server Support
4. Storing Messages
5. Retrieving a Collection
6. Removing Collections
7. Obtaining a List of Collections
8. Encryption
9. File Format
10. Implementation Notes
11. Privacy Considerations
12. Known Issues
13. IANA Considerations
14. Jabber Registrar Considerations
Notes
Revision History


1. Introduction

Historically, clients have archived messages in local storage. However, that is clearly inconvenient for people who use more than one client machine (home, work, mobile) and whenever people upgrade to a new machine. Furthermore, security and resource limitations often prevent clients that run in constrained environments from accessing (sufficient) local storage.

This specification defines a protocol for storing and retrieving messages on a server. Each storage item consists of a collection of messages. This is usually a message thread. Clients are able to add/update/remove collections from the server. This document also specifies a disk format. This allows clients to share message archive files, in a way similar to email clients sharing common formats like mbox and Maildir.

A server autoarchiving approach would have eliminated the need to submit collections to the server. However, this specification empowers clients instead. This approach enables them to store out-of-band messages like email as well. Also, since end-to-end encryption schemes typically use evanescent keys that are discarded immediately after use, server autoarchived encrypted messages would not be decryptable.

The protocol is designed to minimise the size of collections. This is necessary to mitigate the memory and bandwidth limitations of constrained clients and to alleviate karma issues.

2. Concepts

Messages are stored in message collections on the server. The client uniquely specifies a collection using the pair of attributes: 'with' (bare JID with which the messages were exchanged) and 'start' (thread start-time).

The content of each individual message MUST be encapsulated in a <to/> or <from/> element. The time in seconds of the message relative to the start-time of the collection SHOULD be specified with a 'secs' attribute. The content SHOULD include a <body/> element. Other elements MAY be included, but they are NOT RECOMMENDED. To conserve bandwidth and storage, elements scoped by the 'http://jabber.org/protocol/xhtml-im' namespace SHOULD NOT be included. <thread/> elements and elements scoped by the 'jabber:x:delay', 'jabber:x:event' and 'http://jabber.org/protocol/chatstates' namespaces MUST NOT be included.

In compliance with XMPP Core, the server MUST respond to all <iq/> elements. However, most 'successful' responses have been omitted from this document in the interest of conciseness.

All times MUST be in the UTC time zone.

3. Determining Server Support

The client discovers whether the server supports this protocol using Service Discovery [1].

Example 1. Client Service Discovery request

    
<iq type='get' to='montague.net'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>

If the server supports this protocol, it MUST return a <feature/> element in the result with the 'var' attribute set to 'http://jabber.org/protocol/archive'.

Example 2. Server Service Discovery response

    
<iq type='result'
    from='montague.net'
    to='romeo@montague.net/orchard'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    ...
    <feature var='http://jabber.org/protocol/archive'/>
    ...
  </query>
</iq>
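
The check described above could be sketched as follows. This is a minimal illustration in Python using only the standard library; the function name supports_archiving is hypothetical and not part of this protocol, and the sketch assumes the complete <iq/> result is available as a string.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
ARCHIVE_NS = "http://jabber.org/protocol/archive"


def supports_archiving(iq_xml: str) -> bool:
    """Return True if a disco#info result advertises the archive feature."""
    iq = ET.fromstring(iq_xml)
    if iq.get("type") != "result":
        return False
    query = iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    # The <feature/> children inherit the disco#info namespace from <query/>.
    return any(
        feature.get("var") == ARCHIVE_NS
        for feature in query.findall(f"{{{DISCO_INFO_NS}}}feature")
    )
```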

4. Storing Messages

The messages to be stored are encapsulated in the <store/> element.

Example 3. Storing messages in a collection

    
<iq type='set' to='montague.net'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'>
    <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from>
    <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to>
    <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from>
  </store>
</iq>

If the collection does not exist then the server MUST create a new collection. If the collection already exists then the server MUST append the messages to the existing collection.

A friendly name for the collection MAY be specified with a 'subject' attribute. If the collection already has a 'subject' then it is simply replaced.
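
A client might assemble such a request as sketched below. The function name build_store_iq and the tuple layout of its 'messages' argument are assumptions made for illustration. The sketch computes each 'secs' value relative to the collection start-time and assumes that no message is earlier than the start.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ARCHIVE_NS = "http://jabber.org/protocol/archive"


def build_store_iq(server: str, with_jid: str, start: datetime,
                   messages, subject=None) -> str:
    """Serialize a <store/> request.

    messages: iterable of (direction, sent_at, body), where direction is
    'to' or 'from' and sent_at is a timezone-aware datetime.
    """
    start_utc = start.astimezone(timezone.utc)
    iq = ET.Element("iq", {"type": "set", "to": server})
    store = ET.SubElement(iq, "store", {
        "xmlns": ARCHIVE_NS,
        "with": with_jid,
        # All times are expressed in UTC.
        "start": start_utc.replace(tzinfo=None).isoformat(timespec="seconds") + "Z",
    })
    if subject is not None:
        store.set("subject", subject)
    for direction, sent_at, body in messages:
        if direction not in ("to", "from"):
            raise ValueError("direction must be 'to' or 'from'")
        secs = int((sent_at - start_utc).total_seconds())
        if secs < 0:
            raise ValueError("this sketch assumes no message precedes 'start'")
        message = ET.SubElement(store, direction, {"secs": str(secs)})
        ET.SubElement(message, "body").text = body
    return ET.tostring(iq, encoding="unicode")
```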

Example 4. Successful reply

    
<iq type='result' to='romeo@montague.net/orchard'/>

If the server cannot service a store request because the collection is too large then it MUST return a Not Acceptable error:

Example 5. Unsuccessful reply

    
<iq type='error' to='romeo@montague.net/orchard'>
  <error code='406' type='modify'>
    <not-acceptable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>

The client MAY specify an absolute time for any message by providing a longer 'utc' attribute instead of a 'secs' attribute:

Example 6. Storing offline messages in a collection

    
<iq type='set' to='montague.net'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'>
    <from utc='1469-07-21T00:32:29Z'><body>Art thou not Romeo, and a Montague?</body></from>
    <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to>
    <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from>
  </store>
</iq>
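
When reading a collection, a client has to resolve each message to an absolute time from either attribute. The following sketch shows one way to do so; the names parse_utc and message_time are hypothetical, and the sketch assumes every timestamp uses the second-precision 'Z' form shown in the examples.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(value: str) -> datetime:
    """Parse a timestamp such as '1469-07-21T02:56:15Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def message_time(collection_start: str, attrs: dict):
    """Return the absolute time of a message, or None if none was recorded.

    An absolute 'utc' attribute is used when present; otherwise 'secs'
    is added to the collection start-time.
    """
    if "utc" in attrs:
        return parse_utc(attrs["utc"])
    if "secs" in attrs:
        return parse_utc(collection_start) + timedelta(seconds=int(attrs["secs"]))
    return None
```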

For each message that it received from a room, the client SHOULD include a 'name' attribute that specifies the 'resource' (room nickname) of the sender:

Example 7. Storing groupchat messages in a collection

    
<iq type='set' to='montague.net'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='balcony@house.capulet.com'
         start='1469-07-21T03:16:37Z'>
    <from secs='0' name='benvolio'><body>She will indite him to some supper.</body></from>
    <from secs='5' name='mercutio'><body>A bawd, a bawd, a bawd! So ho!</body></from>
    <from secs='11' name='romeo'><body>What hast thou found?</body></from>
  </store>
</iq>

5. Retrieving a Collection

The client sends an empty <retrieve/> element to request the download of a collection:

Example 8. Requesting a collection

    
<iq type='get' to='montague.net'>
  <retrieve xmlns='http://jabber.org/protocol/archive'
            with='juliet@capulet.com'
            start='1469-07-21T02:56:15Z'/>
</iq>

Example 9. Receiving a collection

    
<iq type='result' to='romeo@montague.net/orchard'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'>
    <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from>
    <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to>
    <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from>
  </store>
</iq>
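
A client could unpack the returned collection roughly as follows. The function name parse_collection and the dictionary layout it returns are illustrative assumptions. Elements other than <to/> and <from/> (for example <crypt/>) are skipped by this sketch.

```python
import xml.etree.ElementTree as ET

NS = "{http://jabber.org/protocol/archive}"


def parse_collection(iq_xml: str) -> dict:
    """Extract collection attributes and messages from a retrieve result."""
    iq = ET.fromstring(iq_xml)
    store = iq.find(NS + "store")
    if store is None:
        raise ValueError("no <store/> element in response")
    messages = []
    for child in store:
        # Strip the namespace prefix to recover the local element name.
        direction = child.tag[len(NS):] if child.tag.startswith(NS) else None
        if direction not in ("to", "from"):
            continue
        messages.append({
            "direction": direction,
            "secs": child.get("secs"),
            "utc": child.get("utc"),
            "name": child.get("name"),
            "body": child.findtext(NS + "body"),
        })
    return {
        "with": store.get("with"),
        "start": store.get("start"),
        "subject": store.get("subject"),
        "messages": messages,
    }
```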

If the collection does not exist then the server MUST return a Not Found error:

Example 10. Unsuccessful reply

    
<iq type='error' to='romeo@montague.net/orchard'>
  <error code='404' type='cancel'>
    <item-not-found xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>

6. Removing Collections

To request the removal of a collection the client sends an empty <remove/> element.

Example 11. Removing a single collection

    
<iq type='set' to='montague.net'>
  <remove xmlns='http://jabber.org/protocol/archive'
          with='juliet@capulet.com'
          start='1469-07-21T02:56:15Z'/>
</iq>

If the collection does not exist then the server MUST return a Not Found error:

Example 12. Unsuccessful reply

    
<iq type='error' to='romeo@montague.net/orchard'>
  <error code='404' type='cancel'>
    <item-not-found xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>

The client MAY remove several collections at once. The 'start' and 'end' attributes MAY be specified to indicate a date range.

Example 13. Removing all collections with a specified JID between two times

    
<iq type='set' to='montague.net'>
  <remove xmlns='http://jabber.org/protocol/archive'
          with='juliet@capulet.com'
          start='1469-07-21T02:00:00Z'
          end='1469-07-21T04:00:00Z'/>
</iq>

If the 'with' attribute is omitted then collections with any JID are removed. If only 'start' is specified then all collections after that date should be removed. If only 'end' is specified then all collections prior to that date should be removed.

Example 14. Removing all collections

    
<iq type='set' to='montague.net'>
  <remove xmlns='http://jabber.org/protocol/archive'/>
</iq>
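
The range rules above could be applied by a server as sketched below. The function name in_range is hypothetical. Two points are assumptions, since the text does not settle them: the boundaries are treated as inclusive, and a request carrying 'with' and 'start' but no 'end' is treated as a range here, whereas Example 11 uses that same pair to address a single collection. Timestamps are compared as strings, which is valid only because they all share the fixed-width UTC form used in this document.

```python
def in_range(collection_with: str, collection_start: str,
             with_jid=None, start=None, end=None) -> bool:
    """Decide whether a collection is selected by a range request.

    Any filter left as None is treated as 'not specified'.
    """
    if with_jid is not None and collection_with != with_jid:
        return False
    if start is not None and collection_start < start:
        return False  # collection began before the requested range
    if end is not None and collection_start > end:
        return False  # collection began after the requested range
    return True
```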

7. Obtaining a List of Collections

To request a list of collections the client sends an empty <list/> element. The 'start' and 'end' attributes MAY be specified to indicate a date range.

If the 'with' attribute is omitted then collections with any JID are returned. If only 'start' is specified then all collections after that date should be returned. If only 'end' is specified then all collections prior to that date should be returned.

Example 15. Requesting a list

    
<iq type='get' to='montague.net'>
  <list xmlns='http://jabber.org/protocol/archive'
        with='juliet@capulet.com'
        start='1469-07-21T02:00:00Z'
        end='1479-07-21T04:00:00Z'/>
</iq>

Example 16. Requesting a list of all collections with a JID

    
<iq type='get' to='montague.net'>
  <list xmlns='http://jabber.org/protocol/archive'
        with='juliet@capulet.com'/>
</iq>

The client MAY limit the number of items returned by the server with the 'maxitems' attribute.

Example 17. Requesting a list of all collections

    
<iq type='get' to='montague.net'>
  <list xmlns='http://jabber.org/protocol/archive'
        maxitems='50'/>
</iq>

Example 18. Receiving a list

    
<iq type='result' id='a1' to='romeo@montague.net/orchard'>
  <list xmlns='http://jabber.org/protocol/archive'>
    <store with='juliet@capulet.com'
           start='1469-07-21T02:56:15Z'
           subject='She speaks!'/>
    <store with='balcony@house.capulet.com'
           start='1469-07-21T03:16:37Z'/>
    ...
  </list>
</iq>
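
A client could turn such a result into a simple index of collections, for instance as follows. The function name parse_list and the tuple layout are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

NS = "{http://jabber.org/protocol/archive}"


def parse_list(iq_xml: str) -> list:
    """Return (with, start, subject) for each collection in a list result."""
    iq = ET.fromstring(iq_xml)
    listing = iq.find(NS + "list")
    if listing is None:
        return []
    # 'subject' is optional, so it is None for unnamed collections.
    return [
        (store.get("with"), store.get("start"), store.get("subject"))
        for store in listing.findall(NS + "store")
    ]
```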

8. Encryption

For clarity, most of the examples in this document are not encrypted. However, the encryption of all collections is strongly RECOMMENDED.

To generate a secret symmetric encryption key, K, and an RSA-encrypted version of the key, C, the client SHOULD use the RSA-KEM key encapsulation mechanism (see ISO 18033-2) along with the user's public RSA key.

The client SHOULD encrypt the complete sequence of <from/> and <to/> elements, M, that it wants to store with the encryption key, K, and a randomly generated public label, L, employing the DEM1 data encapsulation mechanism with the SC2 symmetric encryption algorithm (see ISO 18033-2).

Note that the client MAY use the same key, K, for more than one collection. However, it MUST use each label, L, with only one plain text, M.

The client SHOULD base64 encode the encrypted messages, wrap them in a single <crypt/> element and set the 'label' attribute to the base64 encoded random label L.

The client MUST set the 'keyalg' attribute of the <store/> element to 'RSA-KEM-KDF2-SHA256' and the 'dataalg' attribute to 'DEM1-SC2-SHA256'. The 'key' attribute MUST also be set to the base64 encoded encrypted version of the key, C.

Example 19. Storing encrypted messages in a collection

    
<iq type='set' to='montague.net'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'
         keyalg='RSA-KEM-KDF2-SHA256'
         dataalg='DEM1-SC2-SHA256'
         key='bfXv33i+Ybqypa4ETLyorGkVl73v67SMvzX41MPRKA5cOp9wGDMgd8SirwIDAQAB'>
    <crypt label='VROLURBVEFDb3JwU0dDL'>E5Qbvfa2gI5lBZMAHryv4g+OGQ0SR+ysraP6LnD43m77VkIVni5c7yPeIbkFdicZ</crypt>
  </store>
</iq>
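
The packaging step (not the cryptography itself) could look like the sketch below. It treats the encapsulated key C, the label L and the encrypted messages as opaque byte strings produced elsewhere by RSA-KEM and DEM1; the function name build_encrypted_store and its parameter names are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

ARCHIVE_NS = "http://jabber.org/protocol/archive"


def build_encrypted_store(with_jid: str, start: str, encapsulated_key: bytes,
                          label: bytes, ciphertext: bytes, subject=None):
    """Wrap already-encrypted messages in a <store/> element.

    The three byte strings are assumed to be the outputs of the key and
    data encapsulation mechanisms; no encryption is performed here.
    """
    def b64(data: bytes) -> str:
        return base64.b64encode(data).decode("ascii")

    store = ET.Element("store", {
        "xmlns": ARCHIVE_NS,
        "with": with_jid,
        "start": start,
        "keyalg": "RSA-KEM-KDF2-SHA256",
        "dataalg": "DEM1-SC2-SHA256",
        "key": b64(encapsulated_key),
    })
    if subject is not None:
        store.set("subject", subject)  # note: the subject is NOT encrypted
    crypt = ET.SubElement(store, "crypt", {"label": b64(label)})
    crypt.text = b64(ciphertext)
    return store
```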

The NESSIE-recommended RSA-KEM (with KDF2/SHA-256) key encapsulation scheme (see ISO 18033-2 at http://www.shoup.net/iso/std5.pdf, or ANSI-X9.44) was specified because its security is tightly proven (unlike RSA-OAEP or PKCS #1 v1.5) and it is very simple to implement.

The SHA-256 hash was specified since SHA-1 is broken (assuming the attacker has plenty of computing power). Other standard hashes are not optimised for 32-bit processors (e.g. Whirlpool, SHA-384, SHA-512).

The client SHOULD support the mechanisms specified in this document. The client MAY support other mechanisms. Future versions of this document MAY be modified to recommend other mechanisms.

The mechanisms for the publishing of public keys and the storage and retrieval of private keys are beyond the scope of this document. A future JEP will specify how clients may do this in an interoperable way.

9. File Format

The file format uses the same XML constructs as the protocol. Each file contains only messages exchanged with a single JID. Any number of collections may be stored in an archive file.

Example 20. Example file

    
<?xml version='1.0'?>
<archive xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'>
  <store start='1469-07-21T02:56:15Z'
         subject='She speaks!'>
    <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from>
    <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to>
    <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from>
  </store>
  <store start='1469-07-21T09:56:15Z'
         keyalg='RSA-KEM-KDF2-SHA256'
         dataalg='DEM1-SC2-SHA256'
         key='bfXv33i+Ybqypa4ETLyorGkVl73v67SMvzX41MPRKA5cOp9wGDMgd8SirwIDAQAB'>
    <crypt label='VROLURBVEFDb3JwU0dDL'>E5Qbvfa2gI5lBZMAHryv4g+OGQ0SR+ysraP6LnD43m77VkIVni5c7yPeIbkFdicZ</crypt>
  </store>
  <store start='1469-07-23T23:08:25Z'
         keyalg='RSA-KEM-KDF2-SHA256'
         dataalg='DEM1-SC2-SHA256'
         key='VQQKExVUaGUgVVNFUlRSVVNUIE5ldHdvcmsxITAfBgNVBAsTGGh0dHA6Ly93d3cu'>
    <crypt label='nVzdC5jb20xGzAZBgNVB'>j98C5OBxOvG0I3KgqgHf35g+FFCgMSa9KOlaMCZ1+XtgHI3zzVAmbQQnmt/VDUVHQ2AswkDwf9c3V6aPryuvEeKaq</crypt>
  </store>
</archive>
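
A client exporting unencrypted collections to this format might proceed as sketched below. The function name build_archive and the layout of its 'collections' argument are assumptions for illustration. As in the example file, the 'with' attribute appears once on the <archive/> element rather than on each <store/> element.

```python
import xml.etree.ElementTree as ET

ARCHIVE_NS = "http://jabber.org/protocol/archive"


def build_archive(with_jid: str, collections) -> str:
    """Serialize collections exchanged with one JID as an archive file.

    collections: iterable of (start, subject, messages), where messages
    is an iterable of (direction, secs, body) and subject may be None.
    """
    archive = ET.Element("archive", {"xmlns": ARCHIVE_NS, "with": with_jid})
    for start, subject, messages in collections:
        store = ET.SubElement(archive, "store", {"start": start})
        if subject is not None:
            store.set("subject", subject)
        for direction, secs, body in messages:
            message = ET.SubElement(store, direction, {"secs": str(secs)})
            ET.SubElement(message, "body").text = body
    return "<?xml version='1.0'?>\n" + ET.tostring(archive, encoding="unicode")
```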

10. Implementation Notes

Clients should not store one message at a time on the server since this increases both bandwidth consumption and the total number of transactions. It is instead RECOMMENDED that clients store messages only when the conversation thread appears to be terminated, i.e. when the user closes the chat window. If the user reopens the window and the thread continues then the client should append the new messages to the collection when the user closes the window again.

When appending messages to a collection clients SHOULD try to ensure that the total size of the collection will not exceed karma limits when it is retrieved later. This may be achieved by starting a new collection whenever a message thread becomes too long.
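
One way to start a new collection whenever a thread grows too long is sketched below. The function name split_thread and the 'max_bytes' budget are hypothetical, since this document does not define a karma limit. The sketch uses the UTF-8 size of the message bodies as a rough stand-in for the size of the stored collection.

```python
def split_thread(messages, max_bytes: int) -> list:
    """Split a thread into chunks whose body text stays within max_bytes.

    messages: list of (sent_at, direction, body). Each returned chunk is
    intended to become its own collection, started at the time of its
    first message. A single oversized message still gets its own chunk.
    """
    chunks, current, size = [], [], 0
    for message in messages:
        cost = len(message[2].encode("utf-8"))
        if current and size + cost > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(message)
        size += cost
    if current:
        chunks.append(current)
    return chunks
```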

It is RECOMMENDED that the client synchronises all the times it sends to the server with server time. The client can achieve this using Entity Time [2] to estimate the difference between the server and client clocks.
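
The clock difference could be estimated as in the following sketch, which assumes the network delay is the same in both directions. The function names clock_offset and to_server_time are hypothetical. The server time is assumed to have already been parsed from the Entity Time response into a datetime.

```python
from datetime import datetime, timedelta


def clock_offset(server_time: datetime, sent_at: datetime,
                 received_at: datetime) -> timedelta:
    """Estimate (server clock - client clock).

    sent_at and received_at are client-clock readings taken when the time
    request was sent and when the response arrived. The server is assumed
    to have read its clock halfway between the two.
    """
    midpoint = sent_at + (received_at - sent_at) / 2
    return server_time - midpoint


def to_server_time(local_time: datetime, offset: timedelta) -> datetime:
    """Convert a client-clock reading to the estimated server time."""
    return local_time + offset
```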

Server implementations SHOULD give system administrators the option to disable support for this protocol since archived conversations can consume significant storage space.

11. Privacy Considerations

The client that originates a message MAY specify a 'false' value for the 'store' header (see Stanza Headers and Internet Metadata (SHIM) [3]). The recipient MUST NOT archive such a message or any of the information it contains. If the sender plans to use 'store' headers it MUST use Service Discovery to determine whether or not the recipient supports them. If not, the sender MUST warn its human user (if any) before sending the message.

12. Known Issues

The subject of each collection is not encrypted (like S/MIME?). The client MUST warn its human user (if any) before including 'subject' attributes on encrypted collections.

Servers will not be able to search the content of encrypted collections.

The 'Encryption' section is a work in progress.

13. IANA Considerations

No interaction with the Internet Assigned Numbers Authority (IANA) [4] is required as a result of this JEP.

14. Jabber Registrar Considerations

The Jabber Registrar [5] shall register the 'http://jabber.org/protocol/archive' namespace as a result of this JEP.


Notes

1. JEP-0030: Service Discovery <http://www.jabber.org/jeps/jep-0030.html>.

2. JEP-0090: Entity Time <http://www.jabber.org/jeps/jep-0090.html>.

3. JEP-0131: Stanza Headers and Internet Metadata (SHIM) <http://www.jabber.org/jeps/jep-0131.html>.

4. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.

5. The Jabber Registrar maintains a list of reserved Jabber protocol namespaces as well as registries of parameters used in the context of protocols approved by the Jabber Software Foundation. For further information, see <http://www.jabber.org/registrar/>.


Revision History

Version 0.2 (2005-04-18)

Complete rewrite. (ip)

Version 0.1 (2004-06-04)

Initial version. (jk)

