diff --git a/doc/apiref.rst b/doc/apiref.rst
index 15b9b9c..5424af9 100644
--- a/doc/apiref.rst
+++ b/doc/apiref.rst
@@ -294,8 +294,9 @@ String
 Jansson uses UTF-8 as the character encoding. All JSON strings must be
 valid UTF-8 (or ASCII, as it's a subset of UTF-8). Normal null
 terminated C strings are used, so JSON strings may not contain
-embedded null characters. All other Unicode codepoints U+0001 through
-U+10FFFF are allowed.
+embedded null characters unless length-aware functions such as
+:func:`json_stringn()` are used. All Unicode codepoints U+0000
+through U+10FFFF are allowed.

 .. function:: json_t *json_string(const char *value)

@@ -568,6 +569,9 @@ Object
 A JSON object is a dictionary of key-value pairs, where the key is a
 Unicode string and the value is any JSON value.

+Even though NUL bytes are allowed in string values, they are not
+allowed in object keys.
+
 .. function:: json_t *json_object(void)

    .. refcounting:: new
@@ -987,6 +991,19 @@ macros can be ORed together to obtain *flags*.

    .. versionadded:: 2.5

+``JSON_ALLOW_NUL``
+   Allow the ``\u0000`` escape inside string values. Rejecting it by
+   default is a safety measure; if you know your input can contain NUL
+   bytes, use this flag. If you don't use this flag, you don't have to
+   worry about NUL bytes inside strings unless you explicitly create
+   them yourself, e.g. with :func:`json_stringn()` or the ``s#``
+   format specifier of :func:`json_pack()`.
+
+   Object keys cannot have embedded NUL bytes even if this flag is
+   used.
+
+   .. versionadded:: 2.6
+
 Each function also takes an optional :type:`json_error_t` parameter
 that is filled with error information if decoding fails. It's also
 updated on success; the number of bytes of input read is written to
diff --git a/doc/conformance.rst b/doc/conformance.rst
index 09ada0e..de3947d 100644
--- a/doc/conformance.rst
+++ b/doc/conformance.rst
@@ -19,8 +19,11 @@ Strings
 =======

 JSON strings are mapped to C-style null-terminated character arrays,
-and UTF-8 encoding is used internally. All Unicode codepoints U+0000
-through U+10FFFF are allowed.
+and UTF-8 encoding is used internally.
+
+All Unicode codepoints U+0000 through U+10FFFF are allowed in string
+values. However, U+0000 is not allowed in object keys because of API
+restrictions.

 Unicode normalization or any other transformation is never performed
 on any strings (string values or object keys). When checking for