XML IN FIFTEEN MINUTES OR LESS
Written by David Megginson, david@megginson.com
Last modified: $Date$
This document is in the Public Domain and comes with NO WARRANTY!
1. Introduction
---------------
FlightGear uses XML for much of its configuration. This document
provides a minimal introduction to XML syntax, concentrating only on
the parts necessary for writing and understanding FlightGear
configuration files. For a full description, read the XML
Recommendation at
http://www.w3.org/TR/
This document describes general XML syntax. Most of the XML
configuration files in FlightGear use a special format called
"Property Lists" -- a separate document will describe the specific
features of the property-list format.
2. Elements and Attributes
--------------------------
An XML document is a tree structure with a single root, much like a
file system or a recursive, nested list structure (for LISP fans).
Every node in the tree is called an _element_: the start and end of
every element is marked by a _tag_: the _start tag_ appears at the
beginning of the element, and the _end tag_ appears at the end.
Here is an example of a start tag:
<foo>
Here is an example of an end tag:
</foo>
Here is an example of an element:
<foo>Hello, world!</foo>
The element in this example contains only data element, so it is a
leaf node in the tree. Elements may also contain other elements, as
in this example:
<bar>
<foo>Hello, world!</foo>
<foo>Goodbye, world!</foo>
</bar>
This time, the 'bar' element is a branch that contains other, nested
elements, while the 'foo' elements are leaf elements that contain only
data. Here's the tree in ASCII art (make sure you're not using a
proportional font):
bar +-- foo -- "Hello, world!"
|
+-- foo -- "Goodbye, world!"
There is always one single element at the top level: it is called the
_root element_. Elements may never overlap, so something like this is
always wrong (try to draw it as a tree diagram, and you'll understand
why):
<a><b></a></b>
Every element may have variables, called _attributes_, attached to
it. The attribute consists of a simple name=value pair in the start
tag:
<foo type="greeting">Hello, world!</foo>
Attribute values must be quoted with '"' or "'" (unlike in HTML), and
no two attributes may have the same name.
There are rules governing what can be used as an element or attribute
name. The first character of a name must be an alphabetic character
or '_'; subsequent characters may be '_', '-', '.', an alphabetic
character, or a numeric character. Note especially that names may not
begin with a number.
3. Data
-------
Some characters in XML documents have special meanings, and must
always be escaped when used literally:
< <
& &
Other characters have special meanings only in certain contexts, but
it still doesn't hurt to escape them:
> >
' '
" "
Here is how you would escape "x < 3 && y > 6" in XML data:
x < 3 && y > 6
Most control characters are forbidden in XML documents: only tab,
newline, and carriage return are allowed (that means no ^L, for
example). Any other character can be included in an XML document as a
character reference, by using its Unicode value; for example, the
following represents the French word "cafe" with an accent on the
final 'e':
café
By default, 8-bit XML documents use UTF-8, **NOT** ISO 8859-1 (Latin
1), so it's safest always to use character references for characters
above position 127 (i.e. for non-ASCII).
Whitespace always counts in XML documents, though some specific
applications (like property lists) have rules for ignoring it in some
contexts.
4. Comments
-----------
You can add a comment anywhere in an XML document except inside a tag
or declaration using the following syntax:
<!-- comment -->
The comment text must not contain "--", so be careful about using
dashes.
5. XML Declaration
------------------
Every XML document may begin with an XML declaration, starting with
"<?xml" and ending with "?>". Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
The XML declaration must always give the XML version, and it may also
specify the encoding (and other information, not discussed here).
UTF-8 is the default encoding for 8-bit documents; you could also try
<?xml version="1.0" encoding="ISO-8859-1"?>
to get ISO Latin 1, but some XML parsers might not support that
(FlightGear's does, for what it's worth).
6. Other Stuff
--------------
There are other kinds of things allowed in XML documents. You don't
need to use them for FlightGear, but in case anyone leaves one lying
around, it would be useful to be able to recognize it.
XML documents may contain different kinds of declarations starting
with "<!" and ending with ">":
<!DOCTYPE html SYSTEM "html.dtd">
<!ELEMENT foo (#PCDATA)>
<!ENTITY myname "John Smith">
and so on. They may also contain processing instructions, which look
a bit like the XML declaration:
<?foo processing instruction?>
Finally, they may contain references to _entities_, like the ones used
for escaping special characters, but with different names (we're
trying to avoid these in FlightGear):
&chapter1;
&myname;
Enjoy.