Audio/Video calls are possible with a single destination via Jingle (XEP-0166) [1] and Jingle RTP Sessions (XEP-0167) [2], and associated XEPs. It may be desirable to have calls with multiple people at the same time. Three main strategies exist to achieve that:
A mesh network where each peer contacts each other peer: this is straightforward to implement and doesn't require any extra service, but it is heavy on network bandwidth and doesn't scale.
An MCU (Multipoint Conferencing Unit) which uses a central service that generally mixes the contents of participants: this is lighter on network bandwidth but heavy on computing resources and hard to scale.
An SFU (Selective Forwarding Unit) which uses a service that redistributes streams of participants as efficiently as possible: this is lighter on computing resources, acceptable in terms of network bandwidth, and can scale.
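To make the bandwidth trade-off between the three strategies concrete, the following minimal sketch counts the media streams a single participant sends and receives. It assumes one stream per participant, that an MCU returns a single mixed stream, and that an SFU forwards every other participant's stream; the function name is illustrative and not part of this specification.

```python
def streams_per_participant(strategy: str, n: int) -> tuple[int, int]:
    """Return (sent, received) stream counts for one participant among n.

    Assumptions (not from the specification): one stream per participant,
    an MCU sends back one mixed stream, an SFU forwards all other streams.
    """
    if n < 2:
        raise ValueError("a call needs at least two participants")
    others = n - 1
    if strategy == "mesh":
        # Every peer sends its stream to, and receives one from, each other peer.
        return others, others
    if strategy == "mcu":
        # One stream up to the mixer, one mixed stream back.
        return 1, 1
    if strategy == "sfu":
        # One stream up; the service forwards the other participants' streams.
        return 1, others
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("mesh", "mcu", "sfu"):
    sent, received = streams_per_participant(name, 10)
    print(f"{name}: {sent} sent, {received} received")
```

With ten participants, a mesh client uploads nine copies of its stream, whereas an SFU client uploads only one.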
The mesh strategy is covered by Multiparty Jingle (XEP-0272) [3]. There have been attempts to specify MCU and SFU strategies, notably with COnferences with LIghtweight BRIdging (COLIBRI) (XEP-0340) [4], but it is unmaintained and unused.
This specification proposes a simple way to implement an SFU strategy, in which the service is called in much the same way as the other party in a one-to-one call.
The design goals of this XEP are:
A/V conferences work using one Jingle session per stream, and all sessions use unidirectional streams. A client joins an A/V conference by calling the JID of a conference entity and sending its own stream. The conference service then calls the client once for each stream it wants to send.
Delivering Conference Information to Jingle Participants (Coin) (XEP-0298) [5] is used to associate a stream with the originating entity.
Ad-Hoc Commands (XEP-0050) [6] is used with a well-known node to configure the room.
To join a conference, a user calls the conference room JID using Jingle RTP Sessions (XEP-0167) [2], as for a one-to-one call. The calling entity sends its audio/video streams (typically webcam and microphone) in a unidirectional way: each content MUST have its 'senders' attribute set to "initiator" as explained in Jingle (XEP-0166) [1] § 7.3.
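The join request described above could be assembled as in the following sketch, which builds only the skeleton of a Jingle session-initiate. The namespaces are those of XEP-0166 and XEP-0167; the JIDs, session ID, stanza ID, and function name are hypothetical. Payload types and the transport element are omitted, so this is not a complete stanza, and the outer <iq/> is left without the client namespace for brevity.

```python
import xml.etree.ElementTree as ET

NS_JINGLE = "urn:xmpp:jingle:1"
NS_RTP = "urn:xmpp:jingle:apps:rtp:1"


def build_join_request(room_jid: str, own_jid: str, sid: str) -> ET.Element:
    """Build a skeleton session-initiate that sends audio and video one way."""
    # The stanza id "join1" is a placeholder chosen for this example.
    iq = ET.Element("iq", {"to": room_jid, "type": "set", "id": "join1"})
    jingle = ET.SubElement(
        iq,
        f"{{{NS_JINGLE}}}jingle",
        {"action": "session-initiate", "initiator": own_jid, "sid": sid},
    )
    for media in ("audio", "video"):
        content = ET.SubElement(
            jingle,
            f"{{{NS_JINGLE}}}content",
            # senders="initiator": only the joining entity sends media.
            {"creator": "initiator", "name": media, "senders": "initiator"},
        )
        # Payload types and transport are deliberately left out of this sketch.
        ET.SubElement(content, f"{{{NS_RTP}}}description", {"media": media})
    return iq


request = build_join_request(
    "balcony@conference.example",   # hypothetical room JID
    "romeo@montague.lit/orchard",   # hypothetical client JID
    "a73sjjvkla37jfea",             # hypothetical session ID
)
print(ET.tostring(request, encoding="unicode"))
```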
A joining entity MAY send the same audio or video stream with different quality for simulcast purposes. It is up to the SFU service to then select which one to forward.
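This specification leaves the selection policy to the SFU service. As one possible illustration, the sketch below forwards the highest-bitrate simulcast layer that fits a receiver's bandwidth budget, falling back to the lowest layer when none fits. The policy, the `SimulcastLayer` type, and the bitrate figures are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SimulcastLayer:
    """One quality variant of the same source stream (hypothetical type)."""
    name: str
    bitrate_kbps: int


def select_layer(
    layers: Sequence[SimulcastLayer], budget_kbps: int
) -> Optional[SimulcastLayer]:
    """Pick the best layer within budget, else the lowest-bitrate layer."""
    if not layers:
        return None
    affordable = [layer for layer in layers if layer.bitrate_kbps <= budget_kbps]
    if affordable:
        return max(affordable, key=lambda layer: layer.bitrate_kbps)
    # Nothing fits: degrade gracefully rather than forwarding nothing.
    return min(layers, key=lambda layer: layer.bitrate_kbps)


layers = [
    SimulcastLayer("low", 150),
    SimulcastLayer("medium", 500),
    SimulcastLayer("high", 1500),
]
print(select_layer(layers, 800).name)
```

A real service would likely also weigh factors such as the receiver's display size or who is currently speaking; those are outside the scope of this sketch.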
An SFU service MUST use Delivering Conference Information to Jingle Participants (Coin) (XEP-0298) [5]; notably, it MUST set the 'isfocus' attribute of the <conference-info> element to "true".
The SFU service MAY call the joining entity as many times as necessary by initiating Jingle RTP Sessions (XEP-0167) [2] sessions from the joined room. The joining entity SHOULD accept each session coming from a joined room. Each session MUST be unidirectional (the stream is sent from the SFU service to the joining entity), and each content MUST have its 'senders' attribute set to "initiator" as explained in Jingle (XEP-0166) [1] § 7.3. These streams are the streams of other participants, as selected by the SFU.
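On the client side, the rules above suggest a simple check before accepting an incoming session. The sketch below assumes that the session arrives from the room's JID (with or without a resource) and that the caller has already extracted the 'senders' value of every content; both function names are illustrative.

```python
def bare_jid(jid: str) -> str:
    """Strip the resource part, if any, from a JID."""
    return jid.split("/", 1)[0]


def should_accept(
    session_from: str, content_senders: list[str], joined_rooms: set[str]
) -> bool:
    """Decide whether an incoming Jingle session looks like an SFU stream.

    content_senders holds the 'senders' value of each <content/>. Jingle
    defaults a missing 'senders' attribute to "both", so the caller is
    expected to pass "both" in that case.
    """
    if bare_jid(session_from) not in joined_rooms:
        # Not from a room this client has joined: handle as an ordinary call.
        return False
    # Every content must flow from the SFU (the initiator) to this client.
    return bool(content_senders) and all(s == "initiator" for s in content_senders)


rooms = {"balcony@conference.example"}  # hypothetical joined room
print(should_accept("balcony@conference.example/sfu", ["initiator"], rooms))
print(should_accept("balcony@conference.example/sfu", ["both"], rooms))
```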
To identify the streams' origin, the SFU service must specify it using Delivering Conference Information to Jingle Participants (Coin) (XEP-0298) [5], in particular by specifying the <user> element and its 'entity' attribute.
It may be desirable to have configuration options on the SFU service or on one of its rooms. For instance, a conference room may offer the option to record the session. While the definition of those options is beyond the scope of this specification, the configuration MUST be done via Ad-Hoc Commands (XEP-0050) [6] on the well-known node "urn:xmpp:av-conferences:config".
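A client could start the configuration flow by executing the well-known node with a standard Ad-Hoc Commands request, as in this sketch. The command namespace comes from XEP-0050 and the node from the paragraph above; the target JID, stanza ID, and function name are hypothetical, and the form the service returns is not defined here.

```python
import xml.etree.ElementTree as ET

NS_COMMANDS = "http://jabber.org/protocol/commands"
CONFIG_NODE = "urn:xmpp:av-conferences:config"


def build_config_request(target_jid: str, request_id: str) -> ET.Element:
    """Build the initial Ad-Hoc 'execute' request for the configuration node."""
    iq = ET.Element("iq", {"to": target_jid, "type": "set", "id": request_id})
    ET.SubElement(
        iq,
        f"{{{NS_COMMANDS}}}command",
        {"node": CONFIG_NODE, "action": "execute"},
    )
    return iq


# Hypothetical room JID and stanza id.
request = build_config_request("balcony@conference.example", "config1")
print(ET.tostring(request, encoding="unicode"))
```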
Only other peers' streams are sent to a joining entity. An SFU service MUST NOT send a stream back to the entity that initially sent it.
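The forwarding rule amounts to excluding the originator from the list of recipients. A minimal sketch, assuming participants are identified by JID strings and using an illustrative function name:

```python
from typing import Iterable


def forwarding_targets(origin: str, participants: Iterable[str]) -> list[str]:
    """Return the participants that may receive a stream sent by origin."""
    # The originator is never sent its own stream back.
    return [p for p in participants if p != origin]


# Hypothetical participants of one room.
room = ["romeo@montague.lit", "juliet@capulet.lit", "mercutio@montague.lit"]
print(forwarding_targets("romeo@montague.lit", room))
```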
A stream MAY originate from a non-XMPP entity (e.g., using the SFU's own user interface or a gateway to another protocol). That means that the <user> 'entity' attribute may be something other than an "xmpp:" URL.
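Because the 'entity' value is not guaranteed to use the "xmpp:" scheme, a client should not assume it can be parsed as a JID. The sketch below shows one way to branch on the scheme; the function name, the "unknown" fallback label, and the sample values are assumptions for illustration.

```python
from urllib.parse import urlsplit


def describe_origin(entity: str) -> tuple[str, str]:
    """Return (scheme, identifier) for a Coin <user/> 'entity' value.

    For "xmpp:" values the identifier is the part after the scheme; for any
    other scheme the full value is kept, since its structure is unknown here.
    """
    parts = urlsplit(entity)
    scheme = parts.scheme.lower()
    if scheme == "xmpp":
        return "xmpp", parts.path
    return (scheme or "unknown"), entity


print(describe_origin("xmpp:romeo@montague.lit"))
print(describe_origin("sip:juliet@capulet.example"))
```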
If a client or a service supports the protocol specified in this XEP, it MUST advertise it by including the "urn:xmpp:jingle:av-conferences:0" discovery feature in response to a Service Discovery (XEP-0030) [7] information request.
If an entity is an A/V conference service as specified by this XEP, it MUST have an identity with a category of "conference" and a type of "audio-video" in its response to a Service Discovery (XEP-0030) [7] information request.
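Both discovery checks can be run against a disco#info result, as sketched below. The namespace is that of XEP-0030 and the feature string is the one given above; reading "audio-video" as the identity's 'type' attribute is an assumption based on how XEP-0030 identities are structured, and the sample response and function name are illustrative.

```python
import xml.etree.ElementTree as ET

NS_DISCO_INFO = "http://jabber.org/protocol/disco#info"
FEATURE = "urn:xmpp:jingle:av-conferences:0"


def inspect_disco_info(query: ET.Element) -> tuple[bool, bool]:
    """Return (supports_protocol, is_av_conference_service)."""
    features = {
        feature.get("var")
        for feature in query.findall(f"{{{NS_DISCO_INFO}}}feature")
    }
    is_service = any(
        identity.get("category") == "conference"
        and identity.get("type") == "audio-video"
        for identity in query.findall(f"{{{NS_DISCO_INFO}}}identity")
    )
    return FEATURE in features, is_service


# Hypothetical disco#info result payload from a conference service.
SAMPLE = """
<query xmlns='http://jabber.org/protocol/disco#info'>
  <identity category='conference' type='audio-video' name='A/V rooms'/>
  <feature var='urn:xmpp:jingle:av-conferences:0'/>
</query>
"""

print(inspect_disco_info(ET.fromstring(SAMPLE)))
```

A client that advertises the feature but is not a service would yield `(True, False)` from the same function.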
Streams are sent to the SFU service for redistribution. Although it is technically possible to have the stream encrypted end-to-end, this specification assumes that streams are decoded at the SFU level and are therefore not end-to-end encrypted, unlike one-to-one calls. Future specifications may outline a method for maintaining end-to-end encryption, but this is beyond the scope of this XEP; developers and end-users MUST assume that streams are decrypted at the SFU service level.
The general security considerations for one-to-one calls apply, with the added risk that more parties are participating in the call. However, the IP address of a device sending a stream is only known by the SFU service, not by other peers, unlike in one-to-one or Multiparty Jingle (XEP-0272) [3] calls.
TODO
TODO
TODO
Thanks to the NLnet Foundation/NGI Assure for funding the work on this specification.
This XMPP Extension Protocol is copyright © 1999 – 2024 by the XMPP Standards Foundation (XSF).
Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.
## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. ##
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.
This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <https://xmpp.org/about/xsf/ipr-policy> or obtained by writing to XMPP Standards Foundation, P.O. Box 787, Parker, CO 80134 USA).
The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 6120) and XMPP IM (RFC 6121) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.
The primary venue for discussion of XMPP Extension Protocols is the <standards@xmpp.org> discussion list.
Discussion on other xmpp.org discussion lists might also be appropriate; see <https://xmpp.org/community/> for a complete list.
Errata can be sent to <editor@xmpp.org>.
The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".
1. XEP-0166: Jingle <https://xmpp.org/extensions/xep-0166.html>.
2. XEP-0167: Jingle RTP Sessions <https://xmpp.org/extensions/xep-0167.html>.
3. XEP-0272: Multiparty Jingle <https://xmpp.org/extensions/xep-0272.html>.
4. XEP-0340: COnferences with LIghtweight BRIdging (COLIBRI) <https://xmpp.org/extensions/xep-0340.html>.
5. XEP-0298: Delivering Conference Information to Jingle Participants (Coin) <https://xmpp.org/extensions/xep-0298.html>.
6. XEP-0050: Ad-Hoc Commands <https://xmpp.org/extensions/xep-0050.html>.
7. XEP-0030: Service Discovery <https://xmpp.org/extensions/xep-0030.html>.
Note: Older versions of this specification might be available at https://xmpp.org/extensions/attic/
First draft.
@report{poisson2024av-conferences,
  title = {Jingle Audio/Video Conferences},
  author = {Poisson, Jérôme},
  type = {XEP},
  number = {xxxx},
  version = {0.0.1},
  institution = {XMPP Standards Foundation},
  url = {https://xmpp.org/extensions/xep-xxxx.html},
  date = {2024-07-29/2024-07-29},
}
END