JEP-0136: Message Archiving

This JEP defines a storage protocol and common disk format for archiving of messages.


WARNING: This Standards-Track JEP is Experimental. Publication as a Jabber Enhancement Proposal does not imply approval of this proposal by the Jabber Software Foundation. Implementation of the protocol described herein is encouraged in exploratory implementations, but production systems should not deploy implementations of this protocol until it advances to a status of Draft.


JEP Information

Status: Experimental
Type: Standards Track
Number: 0136
Version: 0.4
Last Updated: 2005-12-21
JIG: Standards JIG
Approving Body: Jabber Council
Dependencies: XMPP Core, XMPP IM, JEP-0030
Supersedes: None
Superseded By: None
Short Name: archive
Wiki Page: <http://wiki.jabber.org/index.php/Message Archiving (JEP-0136)>

Author Information

Justin Karneges

Email: justin@affinix.com
JID: justin@andbit.net

Ian Paterson

Email: ian.paterson@clientside.co.uk
JID: ian@zoofy.com

Legal Notice

This Jabber Enhancement Proposal is copyright 1999 - 2005 by the Jabber Software Foundation (JSF) and is in full conformance with the JSF's Intellectual Property Rights Policy <http://www.jabber.org/jsf/ipr-policy.shtml>. This material may be distributed only subject to the terms and conditions set forth in the Creative Commons Attribution License (<http://creativecommons.org/licenses/by/2.5/>).

Discussion Venue

The preferred venue for discussion of this document is the Standards-JIG discussion list: <http://mail.jabber.org/mailman/listinfo/standards-jig>.

Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the Jabber Software Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this JEP has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.

Conformance Terms

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.


Table of Contents

1. Introduction
2. Concepts
3. Determining Server Support
4. Storing Messages in a Collection
5. Retrieving a Collection
6. Removing Collections
7. Obtaining a List of Collections
8. Encryption
9. Replication and Searching
10. File Format
11. Implementation Notes
11.1. Bandwidth Considerations
11.2. Karma Considerations
11.3. Synchronization
11.4. Storage Considerations
12. Privacy Considerations
12.1. Store Headers
12.2. Subject Attributes
13. IANA Considerations
14. Jabber Registrar Considerations
15. XML Schemas
16. To Do
Notes
Revision History


1. Introduction

Historically, clients have archived messages in local storage. However, that is clearly inconvenient for people who use more than one client machine (home, work, mobile) and whenever people upgrade to a new machine. Furthermore, security and resource limitations often prevent clients that run in constrained environments from accessing (sufficient) local storage.

This specification defines a protocol for storing and retrieving messages on a server. Each storage item consists of a collection of messages. This is usually a message thread. Clients are able to add/update/remove collections from the server. This document also specifies a disk format. This allows clients to share message archive files, in a way similar to email clients sharing common formats like mbox and Maildir.

A server autoarchiving approach would have eliminated the need to submit collections to the server. However, this specification empowers clients instead. This approach enables them to store out-of-band messages like email as well. Also, since end-to-end encryption schemes typically use evanescent keys that are discarded immediately after use, server autoarchived encrypted messages would not be decryptable.

The protocol is designed to minimise the size of collections. This is necessary to mitigate the memory and bandwidth limitations of constrained clients and to alleviate karma issues.

2. Concepts

Messages are stored in message collections on the server. The client uniquely specifies a collection using the pair of attributes: 'with' (bare JID with which the messages were exchanged) and 'start' (thread start-time).

The content of each individual message MUST be encapsulated in a <to/> or <from/> element. The time in seconds of the message relative to the start-time of the collection SHOULD be specified with a 'secs' attribute. The content SHOULD include a <body/> element. Other elements MAY be included, but they are NOT RECOMMENDED. To conserve bandwidth and storage, elements scoped by the 'http://jabber.org/protocol/xhtml-im' namespace SHOULD NOT be included. <thread/> elements and elements scoped by the 'jabber:x:delay', 'jabber:x:event' and 'http://jabber.org/protocol/chatstates' namespaces MUST NOT be included.

Complying with XMPP Core, the server MUST respond to all <iq/> elements. However, most 'successful' responses have been omitted from this document in the interest of conciseness.

All times MUST be in the UTC time zone.
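The relationship between the collection's 'start' time and each message's 'secs' attribute can be sketched as follows. This is a minimal illustration, not part of the protocol: the helper names (parse_utc, secs_since_start) are hypothetical, and the timestamp layout is assumed to be the one used in the examples of this document.

```python
from datetime import datetime, timezone

# Assumption: timestamps use the layout shown in the examples of this document.
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(stamp: str) -> datetime:
    """Parse a UTC timestamp such as '1469-07-21T02:56:15Z'."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def secs_since_start(start: str, message_time: str) -> int:
    """Return the value of the 'secs' attribute for one message."""
    delta = parse_utc(message_time) - parse_utc(start)
    return int(delta.total_seconds())
```

For instance, a message sent at 02:56:26 in a collection that started at 02:56:15 would carry secs='11'.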

3. Determining Server Support

The client discovers whether the server supports this protocol using Service Discovery [1].

Example 1. Client Service Discovery request

    
<iq type='get' to='montague.net'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>

If the server supports this protocol, it MUST return a <feature/> element in the result with the 'var' attribute set to 'http://jabber.org/protocol/archive'.

Example 2. Server Service Discovery response

    
<iq type='result'
    from='montague.net'
    to='romeo@montague.net/orchard'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    ...
    <feature var='http://jabber.org/protocol/archive'/>
    ...
  </query>
</iq>
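
A client-side check for the feature might look like the sketch below. It assumes the response stanza is available as a string and uses only the Python standard library; the function name supports_archiving is hypothetical.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
ARCHIVE_NS = "http://jabber.org/protocol/archive"


def supports_archiving(iq_xml: str) -> bool:
    """Return True if a disco#info result advertises the archive feature."""
    iq = ET.fromstring(iq_xml)
    if iq.get("type") != "result":
        return False
    query = iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    features = query.findall(f"{{{DISCO_INFO_NS}}}feature")
    return any(feature.get("var") == ARCHIVE_NS for feature in features)
```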

4. Storing Messages in a Collection

The messages to be stored are encapsulated in the <store/> element.

Example 3. Storing messages in a collection

    
<iq type='set' to='montague.net'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'>
    <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from>
    <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to>
    <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from>
  </store>
</iq>

If the collection does not exist then the server MUST create a new collection. If the collection already exists then the server MUST append the messages to the existing collection.

A friendly name for the collection MAY be specified with a 'subject' attribute. If the collection already has a 'subject' then it is simply replaced. Note the Privacy Considerations for subject attributes.
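
A client could assemble a store request along the following lines. This is only a sketch: build_store_iq and its parameters are hypothetical names, and the 'xmlns' declaration is written as a plain attribute as a shortcut for serialization.

```python
import xml.etree.ElementTree as ET

ARCHIVE_NS = "http://jabber.org/protocol/archive"


def build_store_iq(server, with_jid, start, messages, subject=None):
    """Serialize a store request.

    messages is an iterable of (direction, secs, body) tuples,
    where direction is 'to' or 'from'.
    """
    iq = ET.Element("iq", {"type": "set", "to": server})
    # Shortcut: emit the namespace declaration as an ordinary attribute.
    attrs = {"xmlns": ARCHIVE_NS, "with": with_jid, "start": start}
    if subject is not None:
        attrs["subject"] = subject
    store = ET.SubElement(iq, "store", attrs)
    for direction, secs, body in messages:
        if direction not in ("to", "from"):
            raise ValueError("direction must be 'to' or 'from'")
        message = ET.SubElement(store, direction, {"secs": str(secs)})
        ET.SubElement(message, "body").text = body
    return ET.tostring(iq, encoding="unicode")
```

Sending the same 'with' and 'start' values again with further messages would, per the rules above, append to the existing collection.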

Example 4. Successful reply

    
<iq type='result' to='romeo@montague.net/orchard'/>

If the server cannot service a store request because the collection is too large then it MUST return a Not Acceptable error:

Example 5. Unsuccessful reply

    
<iq type='error' to='romeo@montague.net/orchard'>
  <error code='406' type='modify'>
    <not-acceptable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>

The client MAY specify an absolute time for any message by providing a longer 'utc' attribute instead of a 'secs' attribute:

Example 6. Storing offline messages in a collection

    
<iq type='set' to='montague.net'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'>
    <from utc='1469-07-21T00:32:29Z'><body>Art thou not Romeo, and a Montague?</body></from>
    <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to>
    <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from>
  </store>
</iq>

The client SHOULD include the 'name' attribute on each message that it received from a room, set to the 'resource' of the sender's room JID:

Example 7. Storing groupchat messages in a collection

    
<iq type='set' to='montague.net'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='balcony@house.capulet.com'
         start='1469-07-21T03:16:37Z'>
    <from secs='0' name='benvolio'><body>She will indite him to some supper.</body></from>
    <from secs='5' name='mercutio'><body>A bawd, a bawd, a bawd! So ho!</body></from>
    <from secs='11' name='romeo'><body>What hast thou found?</body></from>
  </store>
</iq>

5. Retrieving a Collection

The client sends an empty <retrieve/> element to request the download of a collection:

Example 8. Requesting a collection

    
<iq type='get' to='montague.net'>
  <retrieve xmlns='http://jabber.org/protocol/archive'
            with='juliet@capulet.com'
            start='1469-07-21T02:56:15Z'/>
</iq>

Example 9. Receiving a collection

    
<iq type='result' to='romeo@montague.net/orchard'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'>
    <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from>
    <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to>
    <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from>
  </store>
</iq>

If the collection does not exist then the server MUST return a Not Found error:

Example 10. Unsuccessful reply

    
<iq type='error' to='romeo@montague.net/orchard'>
  <error code='404' type='cancel'>
    <item-not-found xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>

6. Removing Collections

To request the removal of a collection the client sends an empty <remove/> element.

Example 11. Removing a single collection

    
<iq type='set' to='montague.net'>
  <remove xmlns='http://jabber.org/protocol/archive'
          with='juliet@capulet.com'
          start='1469-07-21T02:56:15Z'/>
</iq>

If the collection does not exist then the server MUST return a Not Found error:

Example 12. Unsuccessful reply

    
<iq type='error' to='romeo@montague.net/orchard'>
  <error code='404' type='cancel'>
    <item-not-found xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>

The client may remove several collections at once. The 'start' and 'end' attributes MAY be specified to indicate a date range.

Example 13. Removing all collections with a specified JID between two times

    
<iq type='set' to='montague.net'>
  <remove xmlns='http://jabber.org/protocol/archive'
          with='juliet@capulet.com'
          start='1469-07-21T02:00:00Z'
          end='1469-07-21T04:00:00Z'/>
</iq>

If the 'with' attribute is omitted then collections with any JID are removed.

If the end date is in the future then all collections after the start date are removed.

Example 14. Removing all collections after a date

    
<iq type='set' to='montague.net'>
  <remove xmlns='http://jabber.org/protocol/archive'
          start='1469-07-21T02:00:00Z'
          end='2038-01-01T00:00:00Z'/>
</iq>

If the start date is before all the collections in the archive then all collections prior to the end date are removed.

Example 15. Removing all collections before a date

    
<iq type='set' to='montague.net'>
  <remove xmlns='http://jabber.org/protocol/archive'
          start='0000-01-01T00:00:00Z'
          end='1469-07-21T04:00:00Z'/>
</iq>

Example 16. Removing all collections

    
<iq type='set' to='montague.net'>
  <remove xmlns='http://jabber.org/protocol/archive'/>
</iq>
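
The selection rules illustrated by Examples 11 to 16 could be modelled on the server side roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, the range is treated as inclusive of 'start' and exclusive of 'end' (this section does not say which bound is inclusive), and timestamps are compared as strings, which is chronologically correct only for the fixed-width UTC layout used in the examples.

```python
from typing import Optional


def selected_for_removal(coll_with: str, coll_start: str,
                         with_jid: Optional[str] = None,
                         start: Optional[str] = None,
                         end: Optional[str] = None) -> bool:
    """Decide whether one stored collection matches a remove request."""
    if with_jid is not None and coll_with != with_jid:
        return False
    if end is None:
        # No 'end': either every collection (no 'start' either)
        # or the single collection identified by 'start'.
        return start is None or coll_start == start
    # Assumption: 'start' is an inclusive bound and 'end' an exclusive one.
    after_start = start is None or coll_start >= start
    return after_start and coll_start < end
```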

7. Obtaining a List of Collections

To request a list of collections the client sends an empty <list/> element. The 'start' and 'end' attributes MAY be specified to indicate a date range.

If the 'with' attribute is omitted then collections with any JID are returned. If only 'start' is specified then all collections on or after that date should be returned. If only 'end' is specified then all collections prior to that date should be returned.

Example 17. Requesting a list

    
<iq type='get' to='montague.net'>
  <list xmlns='http://jabber.org/protocol/archive'
        with='juliet@capulet.com'
        start='1469-07-21T02:00:00Z'
        end='1479-07-21T04:00:00Z'/>
</iq>

Example 18. Requesting a list of all collections with a JID

    
<iq type='get' to='montague.net'>
  <list xmlns='http://jabber.org/protocol/archive'
        with='juliet@capulet.com'/>
</iq>

The client MAY limit the number of items returned by the server with the 'maxitems' attribute.

Example 19. Requesting a list of all collections

    
<iq type='get' to='montague.net'>
  <list xmlns='http://jabber.org/protocol/archive'
        maxitems='50'/>
</iq>

The collections (empty <store/> elements) in the result MUST be listed in chronological order.

Example 20. Receiving a list

    
<iq type='result' id='a1' to='romeo@montague.net/orchard'>
  <list xmlns='http://jabber.org/protocol/archive'>
    <store with='juliet@capulet.com'
           start='1469-07-21T02:56:15Z'
           subject='She speaks!'/>
    <store with='balcony@house.capulet.com'
           start='1469-07-21T03:16:37Z'/>
    ...
  </list>
</iq>
  

If the requested list would be too long to return in its entirety without exceeding karma limits (or any other limit specified by an administrator), then the server SHOULD only return the first part of the list. In this case the server MUST indicate that the list is incomplete by setting the optional 'partial' attribute of the <list/> element to 'true'. The client MAY then request the remainder of the list, taking care to set the value of the 'start' attribute to one second after the time of the last collection in the partial list that it received.

Example 21. Receiving a partial list

    
<iq type='result' id='a1' to='romeo@montague.net/orchard'>
  <list partial='true' xmlns='http://jabber.org/protocol/archive'>
    <store with='juliet@capulet.com'
           start='1469-07-21T02:56:15Z'
           subject='She speaks!'/>
    <store with='balcony@house.capulet.com'
           start='1469-07-21T03:16:37Z'/>
    ...
  </list>
</iq>
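
A client might derive the 'start' value for its follow-up request from a partial list as sketched below. The function name next_list_start is hypothetical, and the sketch assumes the 'partial' attribute is written literally as 'true', as in Example 21.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

ARCHIVE_NS = "http://jabber.org/protocol/archive"
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def next_list_start(iq_xml: str):
    """Return the 'start' for the next list request, or None if no more is needed."""
    iq = ET.fromstring(iq_xml)
    listing = iq.find(f"{{{ARCHIVE_NS}}}list")
    if listing is None or listing.get("partial") != "true":
        return None
    stores = listing.findall(f"{{{ARCHIVE_NS}}}store")
    if not stores:
        return None
    # Collections are listed chronologically, so the last one is the latest.
    last = datetime.strptime(stores[-1].get("start"), STAMP_FORMAT)
    return (last + timedelta(seconds=1)).isoformat() + "Z"
```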
  

8. Encryption

Note: This section is a work in progress.

For clarity, most of the examples in this document are not encrypted. However, it is strongly RECOMMENDED that clients encrypt all collections.

To generate a secret symmetric encryption key, K, and an RSA-encrypted version of the key, C, the client SHOULD use the RSA-KEM key encapsulation mechanism (see ISO 18033-2) along with the user's public RSA key.

The client SHOULD encrypt the complete sequence of <from/> and <to/> elements, M, that it wants to store with the encryption key, K, and a randomly generated public label, L, employing the DEM1 data encapsulation mechanism with the SC2 symmetric encryption algorithm (see ISO 18033-2).

Note that the client MAY use the same key, K, for more than one collection. But it MUST use the label, L, with only one plain text, M.

The client MUST base64 encode the encrypted messages, wrap them in a single <crypt/> element and set the 'label' attribute to the base64 encoded random label L.

The client MUST set the 'keyalg' attribute of the <store/> element to 'RSA-KEM-KDF2-SHA256' and the 'dataalg' attribute to 'DEM1-SC2-SHA256'. The 'key' attribute MUST also be set to the base64 encoded encrypted version of the key, C.

Example 22. Storing encrypted messages in a collection

    
<iq type='set' to='montague.net'>
  <store xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'
         keyalg='RSA-KEM-KDF2-SHA256'
         dataalg='DEM1-SC2-SHA256'
         key='bfXv33i+Ybqypa4ETLyorGkVl73v67SMvzX41MPRKA5cOp9wGDMgd8SirwIDAQAB'>
    <crypt label='VROLURBVEFDb3JwU0dDL'>E5Qbvfa2gI5lBZMAHryv4g+OGQ0SR+ysraP6LnD43m77VkIVni5c7yPeIbkFdicZ</crypt>
  </store>
</iq>

The client MAY append messages to a collection in exactly the same way. In this case the client MUST use the same symmetric encryption key, K, and the same algorithms, but it MUST NOT use the same label, L. Note: when an encrypted collection is retrieved it may contain more than one <crypt/> element.
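
The packaging steps (a fresh label per plain text, base64 encoding, one <crypt/> element per encrypted sequence) can be sketched as below. The sketch deliberately leaves out the RSA-KEM and DEM1/SC2 operations themselves and takes the ciphertext as an opaque byte string. The names LabelAllocator and crypt_element are hypothetical, and the 15-byte label length is an assumption, since this document does not specify one.

```python
import base64
import secrets


class LabelAllocator:
    """Hand out random labels, never returning the same label twice."""

    def __init__(self, size: int = 15):  # assumption: label length in bytes
        self._size = size
        self._used = set()

    def fresh(self) -> bytes:
        while True:
            label = secrets.token_bytes(self._size)
            if label not in self._used:
                self._used.add(label)
                return label


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def crypt_element(ciphertext: bytes, label: bytes) -> str:
    """Wrap already-encrypted messages in a single <crypt/> element."""
    # The base64 alphabet needs no XML escaping.
    return f"<crypt label='{_b64(label)}'>{_b64(ciphertext)}</crypt>"
```

A client appending to an encrypted collection would call fresh() again for the new <crypt/> element while reusing the same key, K.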

The NESSIE-recommended RSA-KEM (with KDF2/SHA-256) key encapsulation scheme (see ISO 18033-2 at http://www.shoup.net/iso/std5.pdf, or ANSI-X9.44) was specified because its security is tightly proven (unlike RSA-OAEP or PKCS #1 v1.5) and it is very simple to implement.

The SHA-256 hash was specified since SHA-1 is broken (assuming the attacker has plenty of computing power). Other standard hashes are not optimised for 32-bit processors (e.g. Whirlpool, SHA-384, SHA-512).

The client SHOULD support the mechanisms specified in this document. The client MAY support other mechanisms. Future versions of this document MAY be modified to recommend other mechanisms.

The mechanisms for the publishing of public keys and the storage and retrieval of private keys are beyond the scope of this document. A future JEP will specify how clients may do this in an interoperable way.

9. Replication and Searching

Since collections should be stored in encrypted form on the server, this protocol does not provide for server-side searching of the content of messages. Although it is inconvenient for people who use more than one client machine, the historical approach of archiving to local storage offers significantly better performance when searching content. This section describes how an implementation could combine the two approaches to provide the benefits of both. The basic concept is that archived collections are 'replicated' locally. [2]

Each time the client connects to the server it 'synchronizes' its local archive with the 'master' archive on the server. It simply notes the time of the most recent collection in its local storage, adds one second, and retrieves the list of all the collections from the server on or after that time (see Obtaining a List of Collections). It then retrieves all the listed collections (see Retrieving a Collection) and adds them to its local copy of the archive.

Example 23. Requesting the List of Changes Since the Last Synchronization

    
<iq type='get' to='montague.net'>
  <list xmlns='http://jabber.org/protocol/archive'
            start='1469-07-21T02:56:16Z'
            end='2038-01-01T00:00:00Z'/>
</iq>

The client can then use its local copy of the archive to perform efficient content searches of all collections (which may have been archived by any of the user's clients).

Before presenting the results of a search to its user the client SHOULD confirm that each of the collections it has found has not been deleted or modified by another of the user's clients. [3]

Example 24. Verifying a Search Result

    
<iq type='get' to='montague.net'>
  <retrieve xmlns='http://jabber.org/protocol/archive'
            with='juliet@capulet.com'
            start='1469-07-21T02:56:15Z'/>
</iq>

10. File Format

The file format uses the same XML constructs as the protocol. Each file contains only messages exchanged with a single JID. Any number of collections may be stored in an archive file.

Example 25. Example file

    
<?xml version='1.0'?>
<archive xmlns='http://jabber.org/protocol/archive'
         with='juliet@capulet.com'>
  <store start='1469-07-21T02:56:15Z'
         subject='She speaks!'>
    <from secs='0'><body>Art thou not Romeo, and a Montague?</body></from>
    <to secs='11'><body>Neither, fair saint, if either thee dislike.</body></to>
    <from secs='14'><body>How cam'st thou hither, tell me, and wherefore?</body></from>
  </store>
  <store start='1469-07-21T09:56:15Z'
         keyalg='RSA-KEM-KDF2-SHA256'
         dataalg='DEM1-SC2-SHA256'
         key='bfXv33i+Ybqypa4ETLyorGkVl73v67SMvzX41MPRKA5cOp9wGDMgd8SirwIDAQAB'>
    <crypt label='VROLURBVEFDb3JwU0dDL'>E5Qbvfa2gI5lBZMAHryv4g+OGQ0SR+ysraP6LnD43m77VkIVni5c7yPeIbkFdicZ</crypt>
  </store>
  <store start='1469-07-23T23:08:25Z'
         keyalg='RSA-KEM-KDF2-SHA256'
         dataalg='DEM1-SC2-SHA256'
         key='VQQKExVUaGUgVVNFUlRSVVNUIE5ldHdvcmsxITAfBgNVBAsTGGh0dHA6Ly93d3cu'>
    <crypt label='nVzdC5jb20xGzAZBgNVB'>j98C5OBxOvG0I3KgqgHf35g+FFCgMSa9KOlaMCZ1+XtgHI3zzVAmbQQnmt/VDUVHQ2AswkDwf9c3V6aPryuvEeKaq</crypt>
  </store>
</archive>
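
Reading such a file could be sketched as follows. The function name read_archive and the shape of the returned data are hypothetical; encrypted collections are only flagged, not decrypted.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

ARCHIVE_NS = "http://jabber.org/protocol/archive"
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def _tag(name: str) -> str:
    return f"{{{ARCHIVE_NS}}}{name}"


def read_archive(xml_text: str):
    """Return (with_jid, collections) for one archive file."""
    root = ET.fromstring(xml_text)
    collections = []
    for store in root.findall(_tag("store")):
        start = datetime.strptime(store.get("start"), STAMP_FORMAT)
        messages = []
        for child in store:
            if child.tag not in (_tag("to"), _tag("from")):
                continue  # e.g. <crypt/> elements
            if child.get("utc") is not None:
                when = child.get("utc")  # absolute time given directly
            elif child.get("secs") is not None:
                offset = timedelta(seconds=int(child.get("secs")))
                when = (start + offset).isoformat() + "Z"
            else:
                when = None  # 'secs' is only a SHOULD
            direction = child.tag.rsplit("}", 1)[-1]
            messages.append((direction, when, child.findtext(_tag("body"))))
        collections.append({
            "start": store.get("start"),
            "subject": store.get("subject"),
            "encrypted": store.find(_tag("crypt")) is not None,
            "messages": messages,
        })
    return root.get("with"), collections
```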

11. Implementation Notes

11.1 Bandwidth Considerations

Clients should not store one message at a time on the server since this increases both bandwidth consumption and the total number of transactions. It is instead RECOMMENDED that clients store messages only when the conversation thread appears to be terminated, i.e. when the user closes the chat window. If the user reopens the window and the thread continues then the client should append the new messages to the collection when the user closes the window again.

11.2 Karma Considerations

When appending messages to a collection clients SHOULD try to ensure that the total size of the collection will not exceed karma limits when it is retrieved later. This may be achieved by starting a new collection whenever a message thread becomes too long.
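
One way a client might decide where to start a new collection is sketched below. The function name split_thread and the max_bytes budget are hypothetical, and the size estimate counts only the UTF-8 bytes of the message bodies, ignoring XML overhead; a real client would need to learn or guess the server's actual limit.

```python
from typing import List, Tuple

# (seconds since thread start, 'to' or 'from', body)
Message = Tuple[int, str, str]


def split_thread(messages: List[Message], max_bytes: int) -> List[List[Message]]:
    """Group consecutive messages so each group's bodies fit within max_bytes."""
    chunks: List[List[Message]] = []
    current: List[Message] = []
    used = 0
    for message in messages:
        size = len(message[2].encode("utf-8"))
        if current and used + size > max_bytes:
            chunks.append(current)
            current, used = [], 0
        # A single oversized message still gets a chunk of its own.
        current.append(message)
        used += size
    if current:
        chunks.append(current)
    return chunks
```

Each resulting group would be stored as its own collection, with a 'start' equal to the time of its first message and 'secs' values recalculated relative to that time.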

11.3 Synchronization

It is RECOMMENDED that the client synchronises all the times it sends to the server with server time. The client can achieve this using Entity Time [4] to estimate the difference between the server and client clocks.
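
The clock-difference estimate could be computed as in the sketch below. The function names are hypothetical, and taking the midpoint of the request and response times as the moment the server read its clock is an assumption of this sketch, not something this document specifies.

```python
from datetime import datetime, timedelta, timezone


def estimate_offset(sent_at: datetime, received_at: datetime,
                    server_time: datetime) -> timedelta:
    """Estimate (server clock - client clock) from one time query.

    sent_at and received_at are client-clock readings taken when the
    request was sent and when the response arrived.
    """
    # Assumption: the server sampled its clock halfway through the round trip.
    midpoint = sent_at + (received_at - sent_at) / 2
    return server_time - midpoint


def to_server_time(local_time: datetime, offset: timedelta) -> datetime:
    """Convert a client-clock reading to the server's timeline."""
    return local_time + offset
```

The client would apply to_server_time to every 'start' and 'utc' value before sending it to the server.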

11.4 Storage Considerations

Server implementations SHOULD give system administrators the option to disable support for this protocol since archived conversations can consume significant storage space.

12. Privacy Considerations

12.1 Store Headers

The client that originates a message MAY specify a 'false' value for the 'store' header (see Stanza Headers and Internet Metadata (SHIM) [5]). The recipient MUST NOT archive such a message or any of the information it contains. If the sender plans to use 'store' headers it MUST use Service Discovery to determine whether or not the recipient supports them. If not, the sender MUST warn its human user (if any) before sending the message.

12.2 Subject Attributes

Since the subject of each collection is not encrypted, the client MUST warn its human user (if any) before including 'subject' attributes on encrypted collections.

13. IANA Considerations

No interaction with the Internet Assigned Numbers Authority (IANA) [6] is required as a result of this JEP.

14. Jabber Registrar Considerations

The Jabber Registrar [7] shall register the 'http://jabber.org/protocol/archive' namespace as a result of this JEP.

15. XML Schemas

To follow.

16. To Do

Encryption Section

XML Schemas


Notes

1. JEP-0030: Service Discovery <http://www.jabber.org/jeps/jep-0030.html>.

2. Clients that run in constrained environments may not be able to implement the 'replication' technique if they are prevented from accessing (sufficient) local storage.

3. The replication mechanism described here is not perfect. For example, if a client removes a collection from its local archive and from the server's 'master' archive, that change is not reflected in any other local copies of the archive maintained by clients on other machines.

4. JEP-0090: Entity Time <http://www.jabber.org/jeps/jep-0090.html>.

5. JEP-0131: Stanza Headers and Internet Metadata (SHIM) <http://www.jabber.org/jeps/jep-0131.html>.

6. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.

7. The Jabber Registrar maintains a list of reserved Jabber protocol namespaces as well as registries of parameters used in the context of protocols approved by the Jabber Software Foundation. For further information, see <http://www.jabber.org/registrar/>.


Revision History

Version 0.4 (2005-12-21)

Added Replication and Searching section, partial attribute; minor improvements (ip)

Version 0.3 (2005-10-21)

Added more examples to Removing Collections (ip)

Version 0.2 (2005-04-18)

Complete rewrite. (ip)

Version 0.1 (2004-06-04)

Initial version. (jk)


END