Abstract: This specification defines an XMPP protocol extension for providing language translation facilities over XMPP. It supports human, machine, client-based, and server-based translations.
Authors: Boyd Fletcher, Daniel LaPrade, Keith Lirette, Brian Raymond
Copyright: © 1999 - 2011 XMPP Standards Foundation. SEE LEGAL NOTICES.
Status: Draft
Type: Standards Track
Version: 1.1rc1
Last Updated: in progress, last updated 2010-06-09
NOTICE: The protocol defined herein is a Draft Standard of the XMPP Standards Foundation. Implementations are encouraged and the protocol is appropriate for deployment in production systems, but some changes to the protocol are possible before it becomes a Final Standard.
1. Introduction
2. Glossary
3. Requirements
4. Use Cases
4.1. Message Delivery
4.1.1. Direct Translation
4.1.2. Translation With Pivot
4.1.3. Translation With Pivot Specifying Details
4.2. Discovering Translation Providers
4.2.1. Discovering Translation Providers On a Server
4.2.2. Discovering Identity of Providers
4.2.3. Discovering Language Support
4.3. Requesting a Translation from a Service
4.3.1. Requesting a Basic Translation
4.3.2. Requesting a Translation With Multiple Destination Languages
4.3.3. Requesting a Translation With a Specific Dictionary
5. Implementation Notes
6. Internationalization Considerations
7. Security Considerations
8. IANA Considerations
9. XMPP Registrar Considerations
9.1. Protocol Namespaces
9.2. Service Discovery Identities
10. XML Schema
10.1. langtrans
10.2. langtrans:items
Appendices
A: Document Information
B: Author Information
C: Legal Notices
D: Relation to XMPP
E: Discussion Venue
F: Requirements Conformance
G: Notes
H: Revision History
There currently exists no standard for describing language translations over a text chat protocol. While numerous products and services exist to provide translation of text, there exists no standardized protocol extension for requesting a translation and expressing the details of the translation over XMPP (see XMPP Core [1]). This document describes how to express a translation and its components in an XMPP message as well as a method to request translation.
Direct translation can be realized by either client-side translation before sending or transparent components translating messages on the fly. Discovering XMPP entities capable of translation allows for clients to request translation from them based on their capabilities. The remote XMPP entity could be either an automated translation service or a human providing translation.
The protocol defined herein addresses the following requirements:
Enable an XMPP entity to request a translation from a remote XMPP entity.
Enable an XMPP entity to express the following mandatory elements of a translation for any receiving entities.
Enable an XMPP entity to express the following optional elements of a translation for any receiving entities.
The following methods of translation are supported:
The following use cases illustrate simple scenarios for translation of expressions as well as requesting a translation from remote entities.
A message directly translated by the originating XMPP entity or by a transparent XMPP entity, delivered to a remote entity with only the required elements of source and destination language; this is the simplest case of translation from one language to another. The source-language text is identifiable because no <translation/> element names its language as a 'destination_lang'. Three translation methods are supported by doing the following:
<message xml:lang='en' from='bard@shakespeare.lit/globe' to='playwright@marlowe.lit/theatre'> <subject xml:lang='en'>Hello</subject> <subject xml:lang='fr'>Bonjour</subject> <body xml:lang='en'>How are you?</body> <body xml:lang='fr'>comment allez-vous?</body> <x xmlns='urn:xmpp:langtrans'> <translation destination_lang='fr' source_lang='en'/> </x> </message>
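The rule for identifying the original text can be sketched in Python as follows. This is a minimal illustration, not part of the protocol: the function name `original_body_languages` is hypothetical, and the sample stanza omits the stream's default namespace exactly as the example above does (on a live stream the <message/> and <body/> elements would normally be qualified by 'jabber:client').

```python
import xml.etree.ElementTree as ET

LANGTRANS_NS = "urn:xmpp:langtrans"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def original_body_languages(stanza_xml):
    """Return the languages of <body/> elements that no <translation/>
    element lists as a destination language (hypothetical helper)."""
    message = ET.fromstring(stanza_xml)
    destinations = {
        t.get("destination_lang")
        for t in message.iter("{%s}translation" % LANGTRANS_NS)
    }
    return [
        body.get(XML_LANG)
        for body in message.findall("body")
        if body.get(XML_LANG) not in destinations
    ]

EXAMPLE = (
    "<message xml:lang='en' from='bard@shakespeare.lit/globe' "
    "to='playwright@marlowe.lit/theatre'>"
    "<body xml:lang='en'>How are you?</body>"
    "<body xml:lang='fr'>comment allez-vous?</body>"
    "<x xmlns='urn:xmpp:langtrans'>"
    "<translation destination_lang='fr' source_lang='en'/>"
    "</x></message>"
)

print(original_body_languages(EXAMPLE))  # ['en']
```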
A message translated by the originating XMPP entity or by a transparent XMPP entity, delivered to a remote entity together with the pivot languages used to accomplish the translation. The source-language text is identifiable because no <translation/> element names its language as a 'destination_lang'. When a translation is done via a pivot language, the pivot languages and their order of use MUST be specified.
<message xml:lang='fr' from='bard@shakespeare.lit/globe' to='playwright@marlowe.lit/theatre'> <subject xml:lang='fr'>Bonjour</subject> <subject xml:lang='en'>Hello</subject> <subject xml:lang='ru'>Здравствуйте</subject> <body xml:lang='fr'>comment allez-vous?</body> <body xml:lang='en'>How are you?</body> <body xml:lang='ru'>Как вы?</body> <x xmlns='urn:xmpp:langtrans'> <translation destination_lang='en' source_lang='fr'/> <translation destination_lang='ru' source_lang='en'/> </x> </message>
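A receiving entity might reconstruct the order in which the pivot languages were used from the 'source_lang' and 'destination_lang' attributes alone. The sketch below assumes a single linear chain (each language is translated into at most one other language); the name `pivot_chain` is hypothetical and not defined by this specification.

```python
import xml.etree.ElementTree as ET

LANGTRANS_NS = "urn:xmpp:langtrans"

def pivot_chain(x_xml):
    """Order the languages of one linear translation chain (hypothetical helper)."""
    x = ET.fromstring(x_xml)
    # Map each source language to the language it was translated into.
    hops = {
        t.get("source_lang"): t.get("destination_lang")
        for t in x.findall("{%s}translation" % LANGTRANS_NS)
    }
    destinations = set(hops.values())
    # The original language is the only source that is never a destination.
    starts = [lang for lang in hops if lang not in destinations]
    if len(starts) != 1:
        raise ValueError("expected exactly one original language")
    chain = [starts[0]]
    while chain[-1] in hops:
        next_lang = hops[chain[-1]]
        if next_lang in chain:
            raise ValueError("translation chain contains a loop")
        chain.append(next_lang)
    return chain

EXAMPLE_X = (
    "<x xmlns='urn:xmpp:langtrans'>"
    "<translation destination_lang='en' source_lang='fr'/>"
    "<translation destination_lang='ru' source_lang='en'/>"
    "</x>"
)

print(pivot_chain(EXAMPLE_X))  # ['fr', 'en', 'ru']
```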
A message translated by the originating XMPP entity or by a transparent XMPP entity, delivered to a remote entity using pivot languages and machine translation. The source-language text is identifiable because no <translation/> element names its language as a 'destination_lang'.
<message xml:lang='fr' from='bard@shakespeare.lit/globe' to='playwright@marlowe.lit/theatre'> <subject xml:lang='fr'>Bonjour</subject> <subject xml:lang='en'>Hello</subject> <subject xml:lang='ru'>Здравствуйте</subject> <body xml:lang='fr'>comment allez-vous?</body> <body xml:lang='en'>How are you?</body> <body xml:lang='ru'>Как вы?</body> <x xmlns='urn:xmpp:langtrans'> <translation destination_lang='en' engine='SYSTRANS' source_lang='fr'/> <translation destination_lang='ru' engine='SYSTRANS' source_lang='en'/> </x> </message>
When connected to a server, an XMPP entity can locate translation providers by asking the server which translation providers are attached to it; this MUST be done using Service Discovery [2]. The server SHOULD return the available translation providers and the language pairings that the user has rights to use.
<iq type='get' id='disco1' to='shakespeare.lit'> <query xmlns='http://jabber.org/protocol/disco#items'/> </iq>
<iq type='result' id='disco1' from='shakespeare.lit' to='bard@shakespeare.lit/globe'> <query xmlns='http://jabber.org/protocol/disco#items'> ... <item jid='towerofbabel@shakespeare.lit' name='Tower of Babel Translation Bot'/> <item jid='translation.shakespeare.lit' name='Translation Provider Service'/> ... </query> </iq>
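A client could collect the candidate JIDs from such a result before querying each one for its identity. The following is a minimal sketch under that assumption; `candidate_jids` is a hypothetical name, and the sample omits the elided items shown as "..." above.

```python
import xml.etree.ElementTree as ET

DISCO_ITEMS_NS = "http://jabber.org/protocol/disco#items"

def candidate_jids(result_xml):
    """Return (jid, name) pairs from a disco#items result (hypothetical helper)."""
    iq = ET.fromstring(result_xml)
    if iq.get("type") != "result":
        raise ValueError("not a result stanza")
    return [
        (item.get("jid"), item.get("name"))
        for item in iq.iter("{%s}item" % DISCO_ITEMS_NS)
    ]

EXAMPLE_RESULT = (
    "<iq type='result' id='disco1' from='shakespeare.lit' "
    "to='bard@shakespeare.lit/globe'>"
    "<query xmlns='http://jabber.org/protocol/disco#items'>"
    "<item jid='towerofbabel@shakespeare.lit' name='Tower of Babel Translation Bot'/>"
    "<item jid='translation.shakespeare.lit' name='Translation Provider Service'/>"
    "</query></iq>"
)

for jid, name in candidate_jids(EXAMPLE_RESULT):
    print(jid, "-", name)
```

Each returned JID still has to be checked with a disco#info query, since the item list by itself does not say which entries provide translation.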
Service Discovery is used to determine whether a JID provides translation services. The JID can be either a bot (e.g., <towerofbabel@shakespeare.lit>) or a server component (e.g., <translation.shakespeare.lit>).
<iq type='get' to='translation.shakespeare.lit' from='bard@shakespeare.lit/globe'> <query xmlns='http://jabber.org/protocol/disco#info'/> </iq>
<iq type='result' to='bard@shakespeare.lit/globe' from='translation.shakespeare.lit'> <query xmlns='http://jabber.org/protocol/disco#info'> ... <identity category='automation' type='translation'/> <feature var='urn:xmpp:langtrans'/> ... </query> </iq>
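The check a client performs on the disco#info result might look like the sketch below. It assumes, as a design choice rather than a rule stated in this document, that both the 'automation'/'translation' identity and the 'urn:xmpp:langtrans' feature must be present; `is_translation_provider` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
LANGTRANS_NS = "urn:xmpp:langtrans"

def is_translation_provider(result_xml):
    """True if a disco#info result advertises translation (hypothetical helper)."""
    query = ET.fromstring(result_xml).find("{%s}query" % DISCO_INFO_NS)
    if query is None:
        return False
    has_identity = any(
        identity.get("category") == "automation"
        and identity.get("type") == "translation"
        for identity in query.findall("{%s}identity" % DISCO_INFO_NS)
    )
    has_feature = any(
        feature.get("var") == LANGTRANS_NS
        for feature in query.findall("{%s}feature" % DISCO_INFO_NS)
    )
    # Assumption: require both the identity and the feature.
    return has_identity and has_feature

EXAMPLE_INFO = (
    "<iq type='result' to='bard@shakespeare.lit/globe' "
    "from='translation.shakespeare.lit'>"
    "<query xmlns='http://jabber.org/protocol/disco#info'>"
    "<identity category='automation' type='translation'/>"
    "<feature var='urn:xmpp:langtrans'/>"
    "</query></iq>"
)

print(is_translation_provider(EXAMPLE_INFO))  # True
```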
Before using a translation service, an entity needs to know its supported languages and other details. It is permissible for a translation service to provide multiple translation engines for the same language pairing -- if this is done, then a separate <item/> tag MUST be used for each engine. A 'dictionary' attribute MAY be used to specify the dictionary for a specific <item/>. To specify more than one dictionary for a given language pairing, a separate <item/> tag MUST be used for each dictionary available for that language pairing.
<iq type='get' to='translation.shakespeare.lit' from='bard@shakespeare.lit/globe'> <query xmlns='urn:xmpp:langtrans:items'/> </iq>
<iq type='result' to='bard@shakespeare.lit/globe' from='translation.shakespeare.lit'> <query xmlns='urn:xmpp:langtrans:items'> <item source_lang='en' jid='translation.shakespeare.lit' destination_lang='fr' engine='SYSTRANS 2005 Release 2' pivotable='true'/> <item source_lang='en' jid='translation.shakespeare.lit' destination_lang='ko' engine='SYSTRANS 2005 Release 2' pivotable='true'/> <item source_lang='en' jid='translation.shakespeare.lit' destination_lang='ru' engine='SYSTRANS 2005 Release 2' pivotable='true'/> <item source_lang='en' jid='translation.shakespeare.lit' destination_lang='ru' engine='SYSTRANS 2005 Release 2' pivotable='true' dictionary='medical'/> <item source_lang='fr' jid='translation.shakespeare.lit' destination_lang='en' engine='SYSTRANS 2005 Release 2' pivotable='true' dictionary='standard'/> <item source_lang='ru' jid='translation.shakespeare.lit' destination_lang='en' engine='SYSTRANS 2005 Release 2' pivotable='true' dictionary='Medical 1.0'/> <item source_lang='ko' jid='translation.shakespeare.lit' destination_lang='en' engine='SYSTRANS 2005 Release 2' pivotable='true'/> </query> </iq>
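Because each engine and each dictionary gets its own <item/>, a client looking for a particular language pairing may receive several matches. The sketch below filters a result by pairing; `find_offers` is a hypothetical helper, and the sample contains only three of the items shown above.

```python
import xml.etree.ElementTree as ET

ITEMS_NS = "urn:xmpp:langtrans:items"

def find_offers(result_xml, source_lang, destination_lang):
    """List the items that serve one language pairing (hypothetical helper)."""
    iq = ET.fromstring(result_xml)
    offers = []
    for item in iq.iter("{%s}item" % ITEMS_NS):
        if (item.get("source_lang") == source_lang
                and item.get("destination_lang") == destination_lang):
            offers.append({
                "jid": item.get("jid"),
                "engine": item.get("engine"),
                # None means the item did not name a dictionary.
                "dictionary": item.get("dictionary"),
            })
    return offers

EXAMPLE_ITEMS = (
    "<iq type='result' to='bard@shakespeare.lit/globe' "
    "from='translation.shakespeare.lit'>"
    "<query xmlns='urn:xmpp:langtrans:items'>"
    "<item source_lang='en' jid='translation.shakespeare.lit' "
    "destination_lang='fr' engine='SYSTRANS 2005 Release 2' pivotable='true'/>"
    "<item source_lang='en' jid='translation.shakespeare.lit' "
    "destination_lang='ru' engine='SYSTRANS 2005 Release 2' pivotable='true'/>"
    "<item source_lang='en' jid='translation.shakespeare.lit' "
    "destination_lang='ru' engine='SYSTRANS 2005 Release 2' pivotable='true' "
    "dictionary='medical'/>"
    "</query></iq>"
)

for offer in find_offers(EXAMPLE_ITEMS, "en", "ru"):
    print(offer["engine"], offer["dictionary"])
```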
To request a translation from a provider, an entity sends an IQ stanza of type 'get' to the provider. The lack of a 'source_lang' attribute in the <translation/> element indicates a request for a translation.
<iq from='bard@shakespeare.lit/globe' id='translationReq_2' to='translation.shakespeare.lit' type='get'> <x xmlns='urn:xmpp:langtrans'> <source xml:lang='en'>How are you?</source> <translation destination_lang='fr'/> </x> </iq>
<iq type='result' id='translationReq_2' from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'> <x xmlns='urn:xmpp:langtrans'> <source xml:lang='en'>How are you?</source> <translation destination_lang='fr' source_lang='en' engine='default'>comment allez-vous?</translation> </x> </iq>
<iq from='bard@shakespeare.lit/globe' id='translationReq_4' to='translation.shakespeare.lit' type='get'> <x xmlns='urn:xmpp:langtrans'> <source xml:lang='en'>How are you?</source> <translation destination_lang='it'/> <translation destination_lang='de'/> </x> </iq>
<iq type='result' id='translationReq_4' from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'> <x xmlns='urn:xmpp:langtrans'> <source xml:lang='en'>How are you?</source> <translation destination_lang='it' source_lang='en' engine='default'>Come siete?</translation> <translation destination_lang='de' source_lang='en' engine='default'>Wie geht es Ihnen?</translation> </x> </iq>
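A requesting client might turn such a result into a mapping from destination language to translated text. This is a minimal sketch; `parse_translation_result` is a hypothetical name, and the sample stanza mirrors the multi-destination result above.

```python
import xml.etree.ElementTree as ET

LANGTRANS_NS = "urn:xmpp:langtrans"

def parse_translation_result(result_xml):
    """Map each destination language to its translated text (hypothetical helper)."""
    iq = ET.fromstring(result_xml)
    if iq.get("type") != "result":
        raise ValueError("not a result stanza")
    x = iq.find("{%s}x" % LANGTRANS_NS)
    if x is None:
        raise ValueError("result carries no langtrans payload")
    return {
        t.get("destination_lang"): t.text
        for t in x.findall("{%s}translation" % LANGTRANS_NS)
    }

EXAMPLE_RESULT = (
    "<iq type='result' id='translationReq_4' "
    "from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'>"
    "<x xmlns='urn:xmpp:langtrans'>"
    "<source xml:lang='en'>How are you?</source>"
    "<translation destination_lang='it' source_lang='en' engine='default'>"
    "Come siete?</translation>"
    "<translation destination_lang='de' source_lang='en' engine='default'>"
    "Wie geht es Ihnen?</translation>"
    "</x></iq>"
)

print(parse_translation_result(EXAMPLE_RESULT))
```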
If a specific dictionary is required, the requesting entity MAY name it in the 'dictionary' attribute of the <translation/> element. The dictionary SHOULD be one that the service returned during discovery, although a dictionary that was not returned MAY also be requested. Dictionary names are specific to the translation engine and are free-form text.
<iq from='bard@shakespeare.lit/globe' id='translationReq_6' to='translation.shakespeare.lit' type='get'> <x xmlns='urn:xmpp:langtrans'> <source xml:lang='en'>How are you?</source> <translation destination_lang='fr' dictionary='medical'/> </x> </iq>
<iq type='result' id='translationReq_6' from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'> <x xmlns='urn:xmpp:langtrans'> <source xml:lang='en'>How are you?</source> <translation destination_lang='fr' dictionary='medical' engine='default' source_lang='en'>comment allez-vous?</translation> </x> </iq>
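The request stanzas in this section could be generated along the following lines. This is an illustrative sketch only: `build_translation_request` and its parameters are hypothetical, and the namespace is written as a plain 'xmlns' attribute so that ElementTree serializes it without a prefix.

```python
import xml.etree.ElementTree as ET

LANGTRANS_NS = "urn:xmpp:langtrans"

def build_translation_request(sender, provider, request_id, text,
                              source_lang, destination_langs, dictionary=None):
    """Serialize a translation request IQ (hypothetical helper)."""
    iq = ET.Element("iq", {"from": sender, "id": request_id,
                           "to": provider, "type": "get"})
    # 'xmlns' and 'xml:lang' are set as literal attribute names on purpose;
    # ElementTree writes them out unchanged.
    x = ET.SubElement(iq, "x", {"xmlns": LANGTRANS_NS})
    source = ET.SubElement(x, "source", {"xml:lang": source_lang})
    source.text = text
    for lang in destination_langs:
        attrs = {"destination_lang": lang}
        if dictionary is not None:
            attrs["dictionary"] = dictionary
        # No 'source_lang' attribute: its absence marks this as a request.
        ET.SubElement(x, "translation", attrs)
    return ET.tostring(iq, encoding="unicode")

request = build_translation_request(
    "bard@shakespeare.lit/globe", "translation.shakespeare.lit",
    "translationReq_6", "How are you?", "en", ["fr"], dictionary="medical",
)
print(request)
```

Passing several destination languages produces one <translation/> element per language, all sharing the same optional dictionary.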
If the translation service cannot complete the translation, it SHOULD return a <bad-request/> error indicating that some part of the translation request was problematic, unless doing so would violate the privacy and security considerations in XMPP Core and XMPP IM, or local security and privacy policies.
<iq type='error' id='translationReq_7' from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'> <x xmlns='urn:xmpp:langtrans'> <source xml:lang='en'>How are you?</source> <translation destination_lang='dy'/> </x> <error type='modify'> <bad-request xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/> </error> </iq>
If privacy or security considerations make returning such a detailed error infeasible, the service SHOULD return a <service-unavailable/> error instead.
<iq type='error' id='translationReq_7' from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'> <x xmlns='urn:xmpp:langtrans'> <source xml:lang='en'>How are you?</source> <translation destination_lang='dy'/> </x> <error type='cancel'> <service-unavailable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/> </error> </iq>
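A client could distinguish the two error cases by reading the error type and the stanza error condition. The sketch below assumes the standard stanza error namespace 'urn:ietf:params:xml:ns:xmpp-stanzas'; `describe_error` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

STANZA_ERROR_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"

def describe_error(iq_xml):
    """Return (error type, condition) for an error IQ, or None otherwise
    (hypothetical helper)."""
    iq = ET.fromstring(iq_xml)
    error = iq.find("error")
    if iq.get("type") != "error" or error is None:
        return None
    prefix = "{%s}" % STANZA_ERROR_NS
    conditions = [
        child.tag[len(prefix):]
        for child in error
        if child.tag.startswith(prefix)
    ]
    return (error.get("type"), conditions[0] if conditions else None)

EXAMPLE_ERROR = (
    "<iq type='error' id='translationReq_7' "
    "from='translation.shakespeare.lit' to='bard@shakespeare.lit/globe'>"
    "<x xmlns='urn:xmpp:langtrans'>"
    "<source xml:lang='en'>How are you?</source>"
    "<translation destination_lang='dy'/>"
    "</x>"
    "<error type='modify'>"
    "<bad-request xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>"
    "</error></iq>"
)

print(describe_error(EXAMPLE_ERROR))  # ('modify', 'bad-request')
```

An error of type 'modify' suggests the request can be corrected and resent, whereas 'cancel' suggests it should not be retried as-is.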
In order to reduce user confusion and misunderstanding of a translated message body, it is RECOMMENDED that implementations of langtrans implement the following user interface features.
Note: The 'reviewed' and 'pivotable' attributes are of type "boolean" and MUST be handled accordingly. [3]
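Handling these attributes as xs:boolean means accepting all four lexical forms ("true", "1", "false", "0"). A minimal sketch follows; the function name `parse_xs_boolean` is hypothetical, and the default of false for a missing attribute follows the schemas in this document.

```python
def parse_xs_boolean(value, default=False):
    """Convert an xs:boolean lexical value to a Python bool (hypothetical helper).

    A missing attribute (None) yields the default.
    """
    if value is None:
        return default
    text = value.strip()  # xs:boolean collapses surrounding whitespace
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not a valid xs:boolean: %r" % value)

print(parse_xs_boolean("1"), parse_xs_boolean("false"), parse_xs_boolean(None))
# True False False
```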
In order to properly process multi-language messages, clients MUST implement support for multiple message bodies differentiated by the 'xml:lang' attribute as described in RFC 6120.
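Selecting which body to display could work as sketched below. The fallback to the stanza's own 'xml:lang' when the preferred language is absent is an assumption of this example, not a rule from this document, and `body_for_language` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def body_for_language(message_xml, preferred_lang):
    """Pick the <body/> in the preferred language, falling back to the
    message's own language (hypothetical helper)."""
    message = ET.fromstring(message_xml)
    default_lang = message.get(XML_LANG)
    bodies = {}
    for body in message.findall("body"):
        # A body without xml:lang inherits the language of the stanza.
        bodies[body.get(XML_LANG) or default_lang] = body.text
    if preferred_lang in bodies:
        return bodies[preferred_lang]
    return bodies.get(default_lang)

EXAMPLE_MESSAGE = (
    "<message xml:lang='en' from='bard@shakespeare.lit/globe' "
    "to='playwright@marlowe.lit/theatre'>"
    "<body xml:lang='en'>How are you?</body>"
    "<body xml:lang='fr'>comment allez-vous?</body>"
    "</message>"
)

print(body_for_language(EXAMPLE_MESSAGE, "fr"))  # comment allez-vous?
print(body_for_language(EXAMPLE_MESSAGE, "de"))  # How are you?
```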
Attacks may be easier against services that implement translation because such services can disclose information about the language pairings, engines, and dictionaries in use; however, no specific vulnerabilities are introduced.
This possible weakness can be mitigated by not returning specifics to requesting entities. The responding entity MAY also perform authorization checks in order to determine how to respond.
This document requires no interaction with the Internet Assigned Numbers Authority (IANA) [4].
The XMPP Registrar [5] includes 'urn:xmpp:langtrans' and 'urn:xmpp:langtrans:items' within its registry of protocol namespaces (see <http://xmpp.org/registrar/namespaces.html>).
Note: Before version 1.1 of this specification, the name of the items namespace was urn:xmpp:langtrans#items. Because the '#' character is not recommended in URN syntax (see RFC 2141 [6]), the name was changed to urn:xmpp:langtrans:items.
The XMPP Registrar includes a type of "translation" in the "automation" category within its registry of service discovery identities (see <http://xmpp.org/registrar/disco-categories.html>).
<?xml version='1.0' encoding='UTF-8'?> <xs:schema xmlns='urn:xmpp:langtrans' xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:xmpp:langtrans' elementFormDefault='qualified'> <xs:annotation> <xs:documentation> The protocol documented by this schema is defined in XEP-0171: http://www.xmpp.org/extensions/xep-0171.html </xs:documentation> </xs:annotation> <xs:element name='x'> <xs:complexType> <xs:sequence> <xs:element ref='source' minOccurs='0'/> <xs:element ref='translation' minOccurs='1' maxOccurs='unbounded'/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name='source'> <xs:complexType> <xs:simpleContent> <xs:extension base='xs:string'> <xs:attribute ref='xml:lang' use='required'/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element name='translation'> <xs:complexType> <xs:simpleContent> <xs:extension base='empty'> <xs:attribute name='charset' type='xs:string' use='optional'/> <xs:attribute name='source_lang' type='xs:language' use='optional' /> <xs:attribute name='destination_lang' type='xs:language' use='required'/> <xs:attribute name='dictionary' type='xs:string' use='optional'/> <xs:attribute name='engine' type='xs:string' use='optional' /> <xs:attribute name='reviewed' type='xs:boolean' use='optional' default='false'/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:simpleType name='empty'> <xs:restriction base='xs:string'> <xs:enumeration value=''/> </xs:restriction> </xs:simpleType> </xs:schema>
<?xml version='1.0' encoding='UTF-8'?> <xs:schema xmlns='urn:xmpp:langtrans:items' xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:xmpp:langtrans:items' elementFormDefault='qualified'> <xs:annotation> <xs:documentation> The protocol documented by this schema is defined in XEP-0171: http://www.xmpp.org/extensions/xep-0171.html </xs:documentation> </xs:annotation> <xs:element name='query'> <xs:complexType> <xs:sequence> <xs:element ref='item' minOccurs='0' maxOccurs='unbounded'/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name='item'> <xs:complexType> <xs:simpleContent> <xs:extension base='empty'> <xs:attribute name='dictionary' type='xs:string'/> <xs:attribute name='destination_lang' type='xs:language'/> <xs:attribute name='engine' type='xs:string' use='optional'/> <xs:attribute name='jid' type='xs:string' use='required'/> <xs:attribute name='name' type='xs:string' use='optional'/> <xs:attribute name='pivotable' type='xs:boolean' use='optional' default='false'/> <xs:attribute name='source_lang' type='xs:language' use='required'/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:simpleType name='empty'> <xs:restriction base='xs:string'> <xs:enumeration value=''/> </xs:restriction> </xs:simpleType> </xs:schema>
Series: XEP
Number: 0171
Publisher: XMPP Standards Foundation
Status: Draft
Type: Standards Track
Version: 1.1rc1
Last Updated: in progress, last updated 2010-06-09
Approving Body: XMPP Council
Dependencies: XMPP Core, XMPP IM, XEP-0030
Supersedes: None
Superseded By: None
Short Name: langtrans
XML Schema for langtrans namespace: <http://www.xmpp.org/schemas/langtrans.xsd>
XML Schema for langtrans:items namespace: <http://www.xmpp.org/schemas/langtrans-items.xsd>
Source Control: HTML, RSS
This document in other formats: XML, PDF
Boyd Fletcher
Email: boyd.fletcher@us.army.mil
Daniel LaPrade
Email: dlaprade@echostorm.net
Keith Lirette
Email: keith.lirette@tridsys.com
Brian Raymond
Email: braymond@echostorm.net
The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.
The primary venue for discussion of XMPP Extension Protocols is the <standards@xmpp.org> discussion list.
Discussion on other xmpp.org discussion lists might also be appropriate; see <http://xmpp.org/about/discuss.shtml> for a complete list.
Errata can be sent to <editor@xmpp.org>.
The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".
1. RFC 6120: Extensible Messaging and Presence Protocol (XMPP): Core <http://tools.ietf.org/html/rfc6120>.
2. XEP-0030: Service Discovery <http://xmpp.org/extensions/xep-0030.html>.
3. In accordance with Section 3.2.2.1 of XML Schema Part 2: Datatypes, the allowable lexical representations for the xs:boolean datatype are the strings "0" and "false" for the concept 'false' and the strings "1" and "true" for the concept 'true'; implementations MUST support both styles of lexical representation.
4. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.
5. The XMPP Registrar maintains a list of reserved protocol namespaces as well as registries of parameters used in the context of XMPP extension protocols approved by the XMPP Standards Foundation. For further information, see <http://xmpp.org/registrar/>.
6. RFC 2141: URN Syntax <http://tools.ietf.org/html/rfc2141>.
Note: Older versions of this specification might be available at http://xmpp.org/extensions/attic/
With author approval, the XMPP Registrar changed the items namespace from urn:xmpp:langtrans#items to urn:xmpp:langtrans:items because # is not recommended in URN syntax.
(psa) Per a vote of the XMPP Council, advanced status to Draft.
(psa) Modified semantics to use IQ stanzas for communication with servers; changed dst_lang to destination_lang and src_lang to source_lang; changed destination to destination_lang and derived_from to source_lang.
(kl/bf) Added text about use of Thread IDs.
(psa) Initial version.
(psa) Converted to XML format, cleaned up text, modified examples, changed pivotable and reviewed attributes to xs:boolean, corrected schema.
(psa) Changed xml:lang to destination, derived to derived_from; added service discovery identity.
(bf) Miscellaneous edits.
(bf) First draft.