XEP-xxxx: Jingle Audio/Video Conferences

Abstract
This specification defines a way to hold multiparty conferences with an SFU via Jingle.
Author
Jérôme Poisson
Copyright
© 2024 – 2024 XMPP Standards Foundation. SEE LEGAL NOTICES.
Status

ProtoXEP

WARNING: This document has not yet been accepted for consideration or approved in any official manner by the XMPP Standards Foundation, and this document is not yet an XMPP Extension Protocol (XEP). If this document is accepted as a XEP by the XMPP Council, it will be published at <https://xmpp.org/extensions/> and announced on the <standards@xmpp.org> mailing list.
Type
Standards Track
Version
0.0.1 (2024-07-29)
Document Lifecycle
  1. Experimental
  2. Proposed
  3. Stable
  4. Final

1. Introduction

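Audio/Video calls are possible with a single destination via Jingle (XEP-0166) [1] and Jingle RTP Sessions (XEP-0167) [2], and associated XEPs. It may be desirable to have calls with multiple people at the same time. Three main strategies exist to achieve that: mesh, where every participant connects directly to every other participant; Multipoint Control Unit (MCU), where a central service mixes the streams before sending them to participants; and Selective Forwarding Unit (SFU), where a central service forwards selected streams to participants without mixing them.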

The mesh strategy is covered by Multiparty Jingle (XEP-0272) [3]. There have been attempts to specify MCU and SFU strategies, notably with COnferences with LIghtweight BRIdging (COLIBRI) (XEP-0340) [4], but it is unmaintained and unused.

This specification proposes a simple way to implement an SFU strategy, with a service that is called in much the same way as in a one-to-one call.

2. Requirements

The design goals of this XEP are:

3. Glossary

4. Overview

A/V conferences work by using one Jingle session per stream: all sessions use unidirectional streams. A client joins an A/V conference by calling the JID of a conference entity and sending its own stream. The conference service then calls the client once for each stream it wants to send.

Delivering Conference Information to Jingle Participants (Coin) (XEP-0298) [5] is used to associate a stream with the originating entity.

Ad-Hoc Commands (XEP-0050) [6] is used with a well-known node to configure the room.

6. Joining a Conference

To join a conference, a user calls the conference room JID using Jingle RTP Sessions (XEP-0167) [2], as for a one-to-one call. The calling entity sends its audio/video streams (typically webcam and microphone) in a unidirectional way: each content MUST have its 'senders' attribute set to "initiator", as explained in Jingle (XEP-0166) [1] § 7.3.
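The send-only requirement can be illustrated with a minimal Python sketch that builds the <jingle/> payload of a join request. The function name build_join_offer and its parameters are hypothetical, and payload types and transport elements are left out; it only shows where the 'senders' attribute goes.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"
RTP_NS = "urn:xmpp:jingle:apps:rtp:1"


def build_join_offer(sid: str, initiator: str, media_by_name: dict[str, str]) -> ET.Element:
    """Build a <jingle/> session-initiate whose contents are all send-only.

    Hypothetical helper: payload types and transports are omitted.
    """
    jingle = ET.Element(f"{{{JINGLE_NS}}}jingle", {
        "sid": sid,
        "action": "session-initiate",
        "initiator": initiator,
    })
    for name, media in media_by_name.items():
        content = ET.SubElement(jingle, f"{{{JINGLE_NS}}}content", {
            "creator": "initiator",
            "name": name,
            # Only the joining entity sends media in this session.
            "senders": "initiator",
        })
        ET.SubElement(content, f"{{{RTP_NS}}}description", {"media": media})
    return jingle


offer = build_join_offer(
    "av_jingle_session_1",
    "juliet@capulet.lit/balcony",
    {"video0": "video", "audio1": "audio"},
)
```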

A joining entity MAY send the same audio or video stream with different quality for simulcast purposes. It is up to the SFU service to then select which one to forward.
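This specification does not define how the SFU chooses among simulcast encodings. As one possible policy, the following Python sketch picks the highest-bitrate encoding that fits a bandwidth budget. The Layer type, its fields, and the budget parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    content_name: str  # Jingle content name of this encoding (hypothetical naming)
    bitrate_kbps: int  # approximate bitrate of the encoding (hypothetical field)


def select_layer(layers: list[Layer], budget_kbps: int) -> Layer:
    """Pick the highest-bitrate layer within budget, else the lowest one."""
    if not layers:
        raise ValueError("at least one layer is required")
    fitting = [layer for layer in layers if layer.bitrate_kbps <= budget_kbps]
    if fitting:
        return max(fitting, key=lambda layer: layer.bitrate_kbps)
    # Nothing fits: fall back to the cheapest encoding.
    return min(layers, key=lambda layer: layer.bitrate_kbps)


layers = [Layer("video_low", 300), Layer("video_mid", 1200), Layer("video_high", 2500)]
```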

An SFU service MUST use Delivering Conference Information to Jingle Participants (Coin) (XEP-0298) [5]; notably, it MUST set the 'isfocus' attribute of the <conference-info> element to "true".
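A client may want to check that the entity it called really identifies itself as a conference focus. The sketch below, with the hypothetical helper is_focus, looks for the <conference-info> element in a parsed <jingle/> payload. It accepts both "true" and "1", assuming 'isfocus' is an XML Schema boolean.

```python
import xml.etree.ElementTree as ET

COIN_NS = "urn:xmpp:coin:1"


def is_focus(jingle: ET.Element) -> bool:
    """Return True if <jingle/> carries a <conference-info/> marked as focus."""
    info = jingle.find(f"{{{COIN_NS}}}conference-info")
    return info is not None and info.get("isfocus") in ("true", "1")


accept = ET.fromstring(
    "<jingle xmlns='urn:xmpp:jingle:1' sid='av_jingle_session_1' action='session-accept'>"
    "<conference-info xmlns='urn:xmpp:coin:1' isfocus='true'/>"
    "</jingle>"
)
plain = ET.fromstring("<jingle xmlns='urn:xmpp:jingle:1' sid='s' action='session-accept'/>")
```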

7. Receiving Streams

The SFU service MAY call the joining entity as many times as necessary, initiating Jingle RTP Sessions (XEP-0167) [2] sessions from the JID of the joined room. The joining entity SHOULD accept each session coming from a joined room. Each session MUST be unidirectional (stream sent from the SFU service to the joining entity) and MUST have its content 'senders' attribute set to "initiator" as explained in Jingle (XEP-0166) [1] § 7.3. These streams are the streams of other participants, as selected by the SFU.
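On the receiving side, the acceptance rule can be sketched as a small Python check. The function should_accept and the joined_rooms set are hypothetical, and the sender JID is compared as a plain string, whereas a real client would normalize JIDs first.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"


def should_accept(sender: str, jingle: ET.Element, joined_rooms: set[str]) -> bool:
    """Accept a session-initiate only from a joined room and only if receive-only."""
    if sender not in joined_rooms:
        return False
    if jingle.get("action") != "session-initiate":
        return False
    contents = jingle.findall(f"{{{JINGLE_NS}}}content")
    # Every content must be sent by the initiator (the SFU) only.
    return bool(contents) and all(c.get("senders") == "initiator" for c in contents)


incoming = ET.fromstring(
    "<jingle xmlns='urn:xmpp:jingle:1' sid='av_jingle_session_2' action='session-initiate'>"
    "<content creator='initiator' name='0' senders='initiator'/>"
    "<content creator='initiator' name='1' senders='initiator'/>"
    "</jingle>"
)
rooms = {"ball@conferences.shakespeare.lit"}
```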

To identify the streams' origin, the SFU service must specify it using Delivering Conference Information to Jingle Participants (Coin) (XEP-0298) [5], in particular by specifying the <user> element and its 'entity' attribute.
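As an illustration of how a client might read the origin of an incoming session, the following Python sketch collects the 'entity' attributes of the <user> elements. The helper name stream_origins is hypothetical.

```python
import xml.etree.ElementTree as ET

COIN_NS = "urn:xmpp:coin:1"


def stream_origins(jingle: ET.Element) -> list[str]:
    """Return the 'entity' values of all <user/> elements in <conference-info/>."""
    info = jingle.find(f"{{{COIN_NS}}}conference-info")
    if info is None:
        return []
    return [
        user.get("entity")
        for user in info.iter(f"{{{COIN_NS}}}user")
        if user.get("entity")
    ]


session = ET.fromstring(
    "<jingle xmlns='urn:xmpp:jingle:1' sid='av_jingle_session_2' action='session-initiate'>"
    "<conference-info xmlns='urn:xmpp:coin:1' isfocus='true'>"
    "<users><user entity='xmpp:romeo@montague.lit'/></users>"
    "</conference-info>"
    "</jingle>"
)
```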

8. Examples

8.1 Juliet Joins a Conference

Example 1. Juliet initiates a Jingle session with the conference room
<iq id="av_iq_1" type="set" from="juliet@capulet.lit/balcony" to="ball@conferences.shakespeare.lit">
  <jingle xmlns="urn:xmpp:jingle:1"
    sid="av_jingle_session_1"
    action="session-initiate"
    initiator="juliet@capulet.lit/balcony">
    <group xmlns="urn:xmpp:jingle:apps:grouping:0" semantics="BUNDLE">
      <content name="video0"/>
      <content name="audio1"/>
    </group>
    <content creator="initiator" name="video0" senders="initiator">
      <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="video">
        <!-- video stream description -->
      </description>
      <transport xmlns="urn:xmpp:jingle:transports:ice-udp:1"
        ufrag="ufrag_1"
        pwd="ice_pwd_1">
        <fingerprint xmlns="urn:xmpp:jingle:apps:dtls:0" hash="sha-256" setup="actpass">
          <!-- some fingerprint -->
        </fingerprint>
      </transport>
    </content>
    <content creator="initiator" name="audio1" senders="initiator">
      <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="audio">
        <!-- audio stream description -->
      </description>
      <transport xmlns="urn:xmpp:jingle:transports:ice-udp:1"
        ufrag="ufrag_1"
        pwd="ice_pwd_1">
        <!-- transport child elements -->
      </transport>
    </content>
  </jingle>
</iq>

<iq to="juliet@capulet.lit/balcony" from="ball@conferences.shakespeare.lit" id="av_iq_1" type="result"/>

  
Example 2. Conference room accepts the session
<iq id="room_iq_1" type="set" from="ball@conferences.shakespeare.lit" to="juliet@capulet.lit/balcony">
  <jingle xmlns="urn:xmpp:jingle:1"
    sid="av_jingle_session_1"
    action="session-accept"
    responder="ball@conferences.shakespeare.lit">
    <group xmlns="urn:xmpp:jingle:apps:grouping:0" semantics="BUNDLE">
      <content name="audio1"/>
      <content name="video0"/>
    </group>
    <content creator="initiator" name="video0">
      <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="video">
        <!-- video stream description -->
      </description>
      <!-- transport element -->
    </content>
    <content creator="initiator" name="audio1">
      <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="audio">
        <!-- audio stream description -->
      </description>
      <!-- transport element -->
    </content>
    <conference-info xmlns='urn:xmpp:coin:1' isfocus='true'/>
  </jingle>
</iq>
  

8.2 Romeo Joins the Conference

Example 3. Conference room initiates a session with Juliet to send Romeo's stream
<iq id="room_iq_2" type="set" from="ball@conferences.shakespeare.lit" to="juliet@capulet.lit/balcony">
  <jingle xmlns="urn:xmpp:jingle:1"
    sid="av_jingle_session_2"
    action="session-initiate"
    initiator="ball@conferences.shakespeare.lit">
    <group xmlns="urn:xmpp:jingle:apps:grouping:0" semantics="BUNDLE">
      <content name="0"/>
      <content name="1"/>
    </group>
    <content creator="initiator" name="0" senders="initiator">
      <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="audio">
        <!-- audio stream description -->
      </description>
      <transport xmlns="urn:xmpp:jingle:transports:ice-udp:1"
        ufrag="ufrag_2"
        pwd="ice_pwd_2">
        <fingerprint xmlns="urn:xmpp:jingle:apps:dtls:0" hash="sha-256" setup="actpass">
          <!-- some fingerprint -->
        </fingerprint>
      </transport>
    </content>
    <content creator="initiator" name="1" senders="initiator">
      <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="video">
        <!-- video stream description -->
      </description>
      <transport xmlns="urn:xmpp:jingle:transports:ice-udp:1"
        ufrag="ufrag_2"
        pwd="ice_pwd_2">
        <fingerprint xmlns="urn:xmpp:jingle:apps:dtls:0" hash="sha-256" setup="actpass">
          <!-- some fingerprint -->
        </fingerprint>
      </transport>
    </content>
    <conference-info xmlns='urn:xmpp:coin:1' isfocus='true'>
      <users>
        <user entity='xmpp:romeo@montague.lit' />
      </users>
    </conference-info>
  </jingle>
</iq>
  

9. Configuration

It may be desirable to have configuration options on the SFU service or on one of its rooms. For instance, a conference room may offer the option to record the session. While the definition of those options is beyond the scope of this specification, the configuration MUST be done via Ad-Hoc Commands (XEP-0050) [6] on the well-known node "urn:xmpp:av-conferences:config".
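As a rough illustration, a client could start the configuration command with a standard Ad-Hoc Commands "execute" request on the well-known node. The Python sketch below builds such a request. The helper name build_config_request is hypothetical, the <iq/> element is left without the stream's default namespace for brevity, and the form returned by the service is not defined by this specification.

```python
import xml.etree.ElementTree as ET

COMMANDS_NS = "http://jabber.org/protocol/commands"
CONFIG_NODE = "urn:xmpp:av-conferences:config"


def build_config_request(iq_id: str, to: str) -> ET.Element:
    """Build an <iq/> that executes the configuration command on a service or room."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id, "to": to})
    ET.SubElement(iq, f"{{{COMMANDS_NS}}}command", {
        "node": CONFIG_NODE,
        "action": "execute",
    })
    return iq


request = build_config_request("config1", "ball@conferences.shakespeare.lit")
```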

10. Business Rules

Only other peers' streams are sent to a joining entity. An SFU service MUST NOT send a stream back to the entity that originally sent it.

A stream MAY originate from a non-XMPP entity (e.g., using the SFU's own user interface or a gateway to another protocol). This means that the <user> 'entity' attribute may be something other than an "xmpp:" URI.
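The rule that a stream is never sent back to its sender can be sketched on the service side as a simple filter. The function streams_to_forward and the mapping from sender to stream identifiers are hypothetical. Senders are opaque strings here, so non-XMPP origins work the same way.

```python
def streams_to_forward(streams_by_sender: dict[str, list[str]], recipient: str) -> list[str]:
    """Return every known stream except those the recipient itself sent."""
    return [
        stream
        for sender, streams in streams_by_sender.items()
        if sender != recipient
        for stream in streams
    ]


streams = {
    "juliet@capulet.lit/balcony": ["juliet-video", "juliet-audio"],
    "romeo@montague.lit/orchard": ["romeo-video"],
}
```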

11. Discovering Support

If a client or a service supports the protocol specified in this XEP, it MUST advertise it by including the "urn:xmpp:jingle:av-conferences:0" discovery feature in response to a Service Discovery (XEP-0030) [7] information request.

Example 4. Service Discovery information request
<iq type='get'
    from='juliet@example.org/balcony'
    to='romeo@example.org/orchard'
    id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>
Example 5. Service Discovery information response
<iq type='result'
    from='romeo@example.org/orchard'
    to='juliet@example.org/balcony'
    id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    ...
    <feature var='urn:xmpp:jingle:av-conferences:0'/>
    ...
  </query>
</iq>

If an entity is an A/V conference service as specified by this XEP, it MUST have an identity with a category of "conference" and a type of "audio-video" in its response to a Service Discovery (XEP-0030) [7] information request.

Example 6. Service Discovery information request
<iq type='get'
    from='juliet@example.org/balcony'
    to='ball@conference.example.org'
    id='disco2'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>
Example 7. Service Discovery information response
<iq type='result'
    from='ball@conference.example.org'
    to='juliet@example.org/balcony'
    id='disco2'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    ...
    <identity category='conference' type='audio-video'/>
    ...
    <feature var='urn:xmpp:jingle:av-conferences:0'/>
    ...
  </query>
</iq>
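A client could combine both checks when deciding whether a JID is an A/V conference service. The following Python sketch, with the hypothetical helper is_av_conference_service, inspects a parsed disco#info <query/> element for the identity and the feature described above.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
FEATURE = "urn:xmpp:jingle:av-conferences:0"


def is_av_conference_service(query: ET.Element) -> bool:
    """Check a disco#info <query/> for the A/V conference identity and feature."""
    has_feature = any(
        feature.get("var") == FEATURE
        for feature in query.findall(f"{{{DISCO_INFO_NS}}}feature")
    )
    has_identity = any(
        identity.get("category") == "conference" and identity.get("type") == "audio-video"
        for identity in query.findall(f"{{{DISCO_INFO_NS}}}identity")
    )
    return has_feature and has_identity


service_info = ET.fromstring(
    "<query xmlns='http://jabber.org/protocol/disco#info'>"
    "<identity category='conference' type='audio-video'/>"
    "<feature var='urn:xmpp:jingle:av-conferences:0'/>"
    "</query>"
)
client_info = ET.fromstring(
    "<query xmlns='http://jabber.org/protocol/disco#info'>"
    "<feature var='urn:xmpp:jingle:av-conferences:0'/>"
    "</query>"
)
```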

12. Security Considerations

Streams are sent to the SFU service for redistribution. Although it is technically possible to have the stream encrypted end-to-end, this specification assumes that streams are decoded at the SFU level and are therefore not end-to-end encrypted, unlike one-to-one calls. Future specifications may outline a method for maintaining end-to-end encryption, but this is beyond the scope of this XEP; developers and end-users MUST assume that streams are decrypted at the SFU service level.

The general security considerations for one-to-one calls apply, with the added risk that more parties are participating in the call. However, the IP address of a device sending a stream is only known by the SFU service, not by other peers, unlike in one-to-one or Multiparty Jingle (XEP-0272) [3] calls.

13. IANA Considerations

TODO

14. XMPP Registrar Considerations

TODO

15. XML Schema

TODO

16. Acknowledgements

Thanks to the NLnet Foundation/NGI Assure for funding the work on this specification.


Appendices

Appendix A: Document Information

Series
XEP
Number
xxxx
Publisher
XMPP Standards Foundation
Status
ProtoXEP
Type
Standards Track
Version
0.0.1
Last Updated
2024-07-29
Approving Body
XMPP Council
Dependencies
XMPP Core, XEP-0001, XEP-0166, XEP-0167, XEP-0298
Supersedes
None
Superseded By
None
Short Name
av-conferences

Appendix B: Author Information

Jérôme Poisson
Email
goffi@goffi.org
JabberID
goffi@jabber.fr

Appendix C: Legal Notices

Copyright

This XMPP Extension Protocol is copyright © 1999 – 2024 by the XMPP Standards Foundation (XSF).

Permissions

Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.

Disclaimer of Warranty

## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. ##

Limitation of Liability

In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.

IPR Conformance

This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <https://xmpp.org/about/xsf/ipr-policy> or obtained by writing to XMPP Standards Foundation, P.O. Box 787, Parker, CO 80134 USA).

Visual Presentation

The HTML representation (you are looking at) is maintained by the XSF. It is based on the YAML CSS Framework, which is licensed under the terms of the CC-BY-SA 2.0 license.

Appendix D: Relation to XMPP

The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 6120) and XMPP IM (RFC 6121) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.

Appendix E: Discussion Venue

The primary venue for discussion of XMPP Extension Protocols is the <standards@xmpp.org> discussion list.

Discussion on other xmpp.org discussion lists might also be appropriate; see <https://xmpp.org/community/> for a complete list.

Errata can be sent to <editor@xmpp.org>.

Appendix F: Requirements Conformance

The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".

Appendix G: Notes

1. XEP-0166: Jingle <https://xmpp.org/extensions/xep-0166.html>.

2. XEP-0167: Jingle RTP Sessions <https://xmpp.org/extensions/xep-0167.html>.

3. XEP-0272: Multiparty Jingle <https://xmpp.org/extensions/xep-0272.html>.

4. XEP-0340: COnferences with LIghtweight BRIdging (COLIBRI) <https://xmpp.org/extensions/xep-0340.html>.

5. XEP-0298: Delivering Conference Information to Jingle Participants (Coin) <https://xmpp.org/extensions/xep-0298.html>.

6. XEP-0050: Ad-Hoc Commands <https://xmpp.org/extensions/xep-0050.html>.

7. XEP-0030: Service Discovery <https://xmpp.org/extensions/xep-0030.html>.

Appendix H: Revision History

Note: Older versions of this specification might be available at https://xmpp.org/extensions/attic/

  1. Version 0.0.1 (2024-07-29)

    First draft.

    jp

Appendix I: Bib(La)TeX Entry

@report{poisson2024av-conferences,
  title = {Jingle Audio/Video Conferences},
  author = {Poisson, Jérôme},
  type = {XEP},
  number = {xxxx},
  version = {0.0.1},
  institution = {XMPP Standards Foundation},
  url = {https://xmpp.org/extensions/xep-xxxx.html},
  date = {2024-07-29/2024-07-29},
}

END