Abstract: This specification defines a formatted text syntax for use in instant messages with simple text styling.
Author: Sam Whited
Copyright: © 1999 – 2017 XMPP Standards Foundation. SEE LEGAL NOTICES.
Status: Experimental
Type: Standards Track
Version: 0.1.3
Last Updated: 2018-02-14
WARNING: This Standards-Track document is Experimental. Publication as an XMPP Extension Protocol does not imply approval of this proposal by the XMPP Standards Foundation. Implementation of the protocol described herein is encouraged in exploratory implementations, but production systems are advised to carefully consider whether it is appropriate to deploy implementations of this protocol before it advances to a status of Draft.
1. Introduction
2. Requirements
3. Use Cases
4. Glossary
5. Business Rules
5.1. Blocks
5.1.1. Plain
5.1.2. Preformatted Text
5.1.3. Quotations
5.2. Spans
5.2.1. Plain
5.2.2. Strong
5.2.3. Emphasis
5.2.4. Strike through
5.2.5. Preformatted Span
6. Implementation Notes
7. Accessibility Considerations
8. Security Considerations
9. IANA Considerations
10. XMPP Registrar Considerations
11. XML Schema
12. Acknowledgements
Appendices
A: Document Information
B: Author Information
C: Legal Notices
D: Relation to XMPP
E: Discussion Venue
F: Requirements Conformance
G: Notes
H: Revision History
Historically, XMPP has had no system for simple text styling. Instead, specifications like XHTML-IM (XEP-0071) [1] that require full layout engines have been used, leading to numerous security issues with implementations. Some entities have also performed their own styling based on identifiers in the body. While this has worked well in the past, it is not interoperable and leads to entities each supporting their own informal styling languages.
This specification aims to provide a single, interoperable formatted text syntax that can be used by entities that do not require full layout engines.
Many important terms used in this document are defined in Unicode [2]. The terms "left-to-right" (LTR) and "right-to-left" (RTL) are defined in Unicode Standard Annex #9 [3]. The term "formatted text" is defined in RFC 7764 [4].
Parsers implementing message styling will first parse blocks and then parse child blocks or spans if allowed by the specific block type.
Individual lines of text that are not inside of a preformatted text block are considered a "plain" block. Plain blocks are not bound by styling directives and do not imply formatting themselves, but they may contain spans which imply formatting. Plain blocks may not contain child blocks.
<body>
(There are three blocks in this body marked by parens,)
(but there is no *formatting)
(as spans* may not escape blocks.)
</body>
A preformatted text block is started by a line beginning with "```" (U+0060 GRAVE ACCENT), and ended by a line containing only three grave accents or the end of the parent block (whichever comes first). Preformatted text blocks cannot contain child blocks or spans. Text inside a preformatted block SHOULD be displayed in a monospace font.
<body>
```ignored
(println "Hello, world!")
```
This should show up as monospace, preformatted text ⤴
</body>
<body>
> ```
> (println "Hello, world!")
The entire blockquote is a preformatted text block, but this line is plaintext!
</body>
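The preformatted block rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `split_blocks` and the tuple representation are assumptions made for the example, quotations are left out, and the closing line is assumed to match the three grave accents exactly (no trailing whitespace).

```python
from typing import List, Tuple

FENCE = "```"  # three U+0060 GRAVE ACCENT characters


def split_blocks(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Group lines into ("pre", [...]) and ("plain", [line]) blocks.

    Hypothetical helper: quotations are not handled in this sketch.
    """
    blocks: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(lines):
        if lines[i].startswith(FENCE):
            # Text after the opening fence on the same line is ignored.
            body: List[str] = []
            i += 1
            # Collect lines until a line containing only the fence,
            # or until the parent block runs out of lines.
            while i < len(lines) and lines[i] != FENCE:
                body.append(lines[i])
                i += 1
            i += 1  # step past the closing fence (harmless at end of input)
            blocks.append(("pre", body))
        else:
            # Every other line is its own plain block.
            blocks.append(("plain", [lines[i]]))
            i += 1
    return blocks


message = '```ignored\n(println "Hello, world!")\n```\nplain text after the block'
blocks = split_blocks(message.split("\n"))
```

An unterminated block simply runs to the end of its parent, so `split_blocks(["```", "a", "b"])` yields a single preformatted block containing both lines.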
A quotation is indicated by one or more lines with a byte stream beginning with a '>' (U+003E GREATER-THAN SIGN). Block quotes may contain any child block, including other quotations. Lines inside the block quote MUST have leading spaces trimmed before parsing the child block. It is RECOMMENDED that text inside of a block quote be indented or distinguished from the surrounding text in some other way.
<body>
> That that is, is.
Said the old hermit of Prague.
</body>
<body>
>> That that is, is.
> Said the old hermit of Prague.
Who?
</body>
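The nesting of quotations can be illustrated with a short recursive sketch. The function name `parse_quotes` and the use of nested Python lists to represent quotations are assumptions made for this example. Preformatted blocks are deliberately left out to keep the recursion visible.

```python
from typing import List, Union

Node = Union[str, list]  # a plain line, or a quotation holding child nodes


def parse_quotes(lines: List[str]) -> List[Node]:
    """Return plain lines as str and each quotation as a nested list."""
    out: List[Node] = []
    i = 0
    while i < len(lines):
        if lines[i].startswith(">"):
            inner: List[str] = []
            # Consecutive lines beginning with '>' form one quotation.
            while i < len(lines) and lines[i].startswith(">"):
                # Drop the '>' and trim leading spaces before
                # the child blocks are parsed.
                inner.append(lines[i][1:].lstrip(" "))
                i += 1
            out.append(parse_quotes(inner))
        else:
            out.append(lines[i])
            i += 1
    return out


tree = parse_quotes([
    ">> That that is, is.",
    "> Said the old hermit of Prague.",
    "Who?",
])
# tree == [[["That that is, is."], "Said the old hermit of Prague."], "Who?"]
```

The outer list holds one quotation followed by the plain line "Who?". That quotation in turn holds a nested quotation and one plain line.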
Matches of spans between two styling directives MUST contain some text between the two styling directives, and the opening styling directive MUST be located at the beginning of the line or after a whitespace character. The opening styling directive MUST NOT be followed by a whitespace character and the closing styling directive MUST NOT be preceded by a whitespace character. Spans are always parsed from the beginning of the byte stream to the end and are lazily matched. Characters that would be styling directives but do not follow these rules are not considered when matching and thus may be present between two other styling directives.
For example, each of the following would be styled as indicated:
Nothing would be styled in the following messages (where "\n" represents a new line):
Any text inside of a block that is not part of another span is implicitly considered to be inside of a "plain text" span.
<body> (Two spans, both )(*alike in dignity*) </body>
Text enclosed by '*' (U+002A ASTERISK) is strong and SHOULD be displayed with a heavier font weight than the surrounding text (bold).
<body> The full title is "Twelfth Night, or What You Will" but *most* people shorten it. </body>
Text enclosed by '_' (U+005F LOW LINE) is emphasized and SHOULD be displayed in italics.
<body> The full title is _Twelfth Night, or What You Will_ but _most_ people shorten it. </body>
Text enclosed by '~' (U+007E TILDE) SHOULD be displayed with a horizontal line through the middle.
<body> Everyone ~dis~likes cake. </body>
Text enclosed by a '`' (U+0060 GRAVE ACCENT) is a preformatted span and SHOULD be displayed inline in a monospace font. A preformatted span may only contain a single plain span. Inline formatting directives inside the preformatted span are not rendered. For example, the following all contain valid preformatted spans:
<body> Wow, I can write in `monospace`! </body>
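The span matching rules in this section can be sketched as a single left-to-right scan over one line. This is a simplified illustration under stated assumptions: the name `parse_spans` and the `(style, text)` tuples are hypothetical, whitespace is tested with Python's `str.isspace`, and spans nested inside other spans are not parsed.

```python
from typing import List, Tuple

# Styling directives and the span type each one opens and closes.
STYLES = {"*": "strong", "_": "emphasis", "~": "strike", "`": "pre"}


def parse_spans(line: str) -> List[Tuple[str, str]]:
    """Split one line into (style, text) spans; nesting is not handled."""
    spans: List[Tuple[str, str]] = []
    plain_start = 0
    i = 0
    while i < len(line):
        ch = line[i]
        # An opening directive sits at the start of the line or after
        # whitespace, and is not itself followed by whitespace.
        opens = (
            ch in STYLES
            and (i == 0 or line[i - 1].isspace())
            and i + 1 < len(line)
            and not line[i + 1].isspace()
        )
        if opens:
            # Lazy match: stop at the first closing directive that is not
            # preceded by whitespace. Starting at i + 2 guarantees that
            # some text lies between the two directives.
            j = i + 2
            while j < len(line):
                if line[j] == ch and not line[j - 1].isspace():
                    break
                j += 1
            if j < len(line):
                if plain_start < i:
                    spans.append(("plain", line[plain_start:i]))
                spans.append((STYLES[ch], line[i + 1:j]))
                i = j + 1
                plain_start = i
                continue
        i += 1
    # Whatever is left over is implicitly a plain span.
    if plain_start < len(line):
        spans.append(("plain", line[plain_start:]))
    return spans


result = parse_spans("Everyone ~dis~likes cake.")
# result == [("plain", "Everyone "), ("strike", "dis"), ("plain", "likes cake.")]
```

A directive that fails the rules is left in place as plain text, so `parse_spans("*not strong *")` returns the whole line as one plain span.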
This document does not define a regular grammar and thus styling cannot be matched by a regular expression. Instead, a simple parser can be constructed by first parsing all text into blocks and then recursively parsing the child-blocks inside block quotations, the spans inside individual lines, and by returning the text inside preformatted blocks without modification.
It is RECOMMENDED that formatting characters be displayed and formatted in the same manner as the text they apply to. For example, the string "*emphasis*" would be rendered as "*emphasis*" with both the asterisks and the word shown in a heavier font weight.
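One way to follow this recommendation is to keep the styling directives inside the styled element when rendering. The sketch below assumes HTML output and a hypothetical `render_span` helper. Neither the output format nor the tag choices come from this specification.

```python
import html

# Directive character and an assumed HTML tag for each span type.
TAGS = {
    "strong": ("*", "strong"),
    "emphasis": ("_", "em"),
    "strike": ("~", "del"),
    "pre": ("`", "code"),
}


def render_span(style: str, text: str) -> str:
    """Render one span as HTML, keeping the directive characters visible."""
    escaped = html.escape(text)
    if style == "plain":
        return escaped
    directive, tag = TAGS[style]
    # The directives go inside the element so that they are styled
    # in the same manner as the text they apply to.
    return f"<{tag}>{directive}{escaped}{directive}</{tag}>"


rendered = render_span("strong", "emphasis")
# rendered == "<strong>*emphasis*</strong>"
```

Escaping the span text with `html.escape` before wrapping it keeps markup in the message body from being interpreted by the display layer.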
When displaying text with formatting, developers should take care to ensure sufficient contrast exists between styled and unstyled text so that users with vision deficiencies are able to distinguish between the two.
Formatted text may also be rendered poorly by screen readers. When applying formatting it may be desirable to include directives to exclude formatting characters from being read.
Authors of message styling parsers should take care that improperly formatted messages cannot lead to buffer overruns or code execution.
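As one possible precaution, a parser might reject or fall back to plain text for bodies that exceed fixed bounds before any recursive parsing takes place. The limits and helper names below are hypothetical and are not defined by this specification. They only illustrate bounding the input size and the quotation nesting depth.

```python
MAX_BODY_CHARS = 65536  # hypothetical limit, not from the specification
MAX_QUOTE_DEPTH = 32    # hypothetical limit, not from the specification


def quote_depth(line: str) -> int:
    """Count nested '>' markers at the start of a line.

    Spaces between markers are skipped, mirroring the trimming of
    leading spaces inside a block quote.
    """
    depth = 0
    i = 0
    while i < len(line) and line[i] == ">":
        depth += 1
        i += 1
        while i < len(line) and line[i] == " ":
            i += 1
    return depth


def within_limits(body: str) -> bool:
    """Return True if the body is small and shallow enough to parse."""
    if len(body) > MAX_BODY_CHARS:
        return False
    return all(quote_depth(line) <= MAX_QUOTE_DEPTH for line in body.split("\n"))


ok = within_limits("> a reasonable quotation")
too_deep = within_limits(">" * 33 + " deeply nested")
```

Checking the depth iteratively, before recursion starts, means a hostile message cannot exhaust the call stack of a recursive block parser.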
This document requires no interaction with the Internet Assigned Numbers Authority (IANA) [5].
This specification requires no interaction with the XMPP Registrar [6].
This document does not define any new XML structure requiring a schema.
The author wishes to thank Kevin Smith for his review and feedback.
Series: XEP
Number: 0393
Publisher: XMPP Standards Foundation
Status: Experimental
Type: Standards Track
Version: 0.1.3
Last Updated: 2018-02-14
Approving Body: XMPP Council
Dependencies: XMPP Core, XEP-0001
Supersedes: XEP-0071
Superseded By: None
Short Name: styling
Source Control: HTML
This document in other formats: XML, PDF
Email: sam@samwhited.com
JabberID: sam@samwhited.com
URI: https://blog.samwhited.com/
The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 6120) and XMPP IM (RFC 6121) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.
The primary venue for discussion of XMPP Extension Protocols is the <standards@xmpp.org> discussion list.
Discussion on other xmpp.org discussion lists might also be appropriate; see <http://xmpp.org/about/discuss.shtml> for a complete list.
Errata can be sent to <editor@xmpp.org>.
The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".
1. XEP-0071: XHTML-IM <https://xmpp.org/extensions/xep-0071.html>.
2. The Unicode Standard, The Unicode Consortium <http://www.unicode.org/versions/latest/>.
3. Unicode Standard Annex #9, "Unicode Bidirectional Algorithm", edited by Mark Davis, Aharon Lanin, and Andrew Glass. An integral part of The Unicode Standard, <http://unicode.org/reports/tr9/>.
4. RFC 7764: Guidance on Markdown: Design Philosophies, Stability Strategies, and Select Registrations <http://tools.ietf.org/html/rfc7764>.
5. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.
6. The XMPP Registrar maintains a list of reserved protocol namespaces as well as registries of parameters used in the context of XMPP extension protocols approved by the XMPP Standards Foundation. For further information, see <https://xmpp.org/registrar/>.
Note: Older versions of this specification might be available at http://xmpp.org/extensions/attic/
Reorder block and span sections, simplify block parsing, and update the definition of a span. (ssw)
Clarify block quote and plain text parsing and formatting behavior. (ssw)
Minor clarifications and updates, add security considerations, and expand the glossary. (ssw)
First draft approved by the XMPP Council. (XEP Editor (ssw))
First draft. (ssw)