The good, clever folks down at the W3C have drafted another language (also derived from SGML, HTML's big daddy). This language is known as XML, and is a very good thing. Firstly, it's another mark-up language ("eXtensible Mark-up Language", to be precise). But it doesn't have any defined 'tags' or keywords whatsoever!
"How is that useful?" you ask. What can we do with a language that has no words in it?! Well, the reason it's known as an 'extensible' mark-up language is that you can 'extend' it - that is, make the words up yourself. You create the tags; the language only supplies the 'grammar' they have to follow. XML is a way of writing meta-data: data about data. The full language specification is at http://www.w3.org/TR/REC-xml, and while it's heavy reading, it describes every aspect of the language from start to finish.
OK, so what's so great about this, then?
If your program dumps a whole load of data to a file, then what happens when you want to use that data in another program? You have to drag up format specifications, the code that created the file in the first place, and so on. The reason is that once your data's in the file, that's all it is: data. A stream of numbers with no real meaning to anyone or anything. You've effectively encrypted it - anyone who doesn't have the format specification will have no idea how to read the data. Sure, they could try and figure it out - but that's as slow and difficult as standard code-breaking.
Surely, in the days of object-orientation and massively-multiplayer online games, there must be a better way? I think XML can fill part of the gap.
[size="5"]XML 101
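A single 'item' in XML is called an "element". At a bare minimum, an element consists of a tagname (in HTML, things like P, H1, or TABLE are tagnames), which appears in both an opening tag and a closing tag. If my tagname is "gibbon", I could write it like this: <gibbon></gibbon>. An element with nothing inside it can also be written in a condensed form, as a single tag ending in "/>": <gibbon/>.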
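To see that the full and condensed spellings describe the same element, you can feed both to an XML parser. A minimal sketch, assuming Python 3 and its standard xml.etree.ElementTree module (any conforming parser should behave the same way):

```python
import xml.etree.ElementTree as ET

# Full form: an opening tag and a matching closing tag.
full = ET.fromstring("<gibbon></gibbon>")
# Condensed form: a single tag ending in "/>".
condensed = ET.fromstring("<gibbon/>")

for elem in (full, condensed):
    # Tagname, attributes, number of children, and any text data.
    print(elem.tag, elem.attrib, len(elem), repr(elem.text))
```

Both lines print the same tagname, an empty attribute dict, no children and no data.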
At its fullest, an element can have three things: Attributes, Children, and Data.
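An Attribute is a "name=value" pair (e.g. family="mammal"). All the attributes go in the opening tag, after the tagname: <gibbon family="mammal"></gibbon>. Children are simply other elements nested between an element's opening and closing tags, and each child can have attributes, children and data of its own.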
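Once the XML is parsed, reading an attribute back out is just a lookup by name. A minimal sketch in Python 3 using the standard xml.etree.ElementTree module; the diet and habitat attributes are made up for illustration:

```python
import xml.etree.ElementTree as ET

# All attributes sit in the opening tag, after the tagname.
# "diet" is a hypothetical extra attribute.
gibbon = ET.fromstring('<gibbon family="mammal" diet="fruit"></gibbon>')

print(gibbon.attrib)                     # every attribute, as a name -> value dict
print(gibbon.get("family"))              # mammal
print(gibbon.get("habitat", "unknown"))  # absent attribute, so the default is returned
```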
Finally, an element can have 'data.' Data is anything you put between the opening and closing tags that isn't an element. At its simplest, that's just plain text; there's also something called a CDATA section, which you use when your text might contain characters such as < or & that would otherwise confuse the parser.
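The difference between data and children shows up clearly once the XML is parsed. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree; the note, expr and cage elements are hypothetical. The parser hands back the contents of a CDATA section as ordinary text, identical to what you get from the escaped (&lt;, &amp;) spelling:

```python
import xml.etree.ElementTree as ET

# Plain text data between the opening and closing tags.
note = ET.fromstring("<note>Likes bananas</note>")

# A CDATA section may contain < and & without escaping them...
expr = ET.fromstring("<expr><![CDATA[if (a < b && b > c)]]></expr>")
# ...which is equivalent to escaping them as entities.
escaped = ET.fromstring("<expr>if (a &lt; b &amp;&amp; b &gt; c)</expr>")

# Data followed by a child element: the text before the first child is .text,
# and the children are reached by iterating over the element.
cage = ET.fromstring('<cage>Block A<gibbon family="mammal"/></cage>')

print(note.text)                      # Likes bananas
print(expr.text == escaped.text)      # True
print(cage.text)                      # Block A
print([child.tag for child in cage])  # ['gibbon']
```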
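There's one last rule about XML: all your XML has to be 'well-formed.' To achieve that, you just have to make sure that every opening tag has a matching closing tag (or uses the condensed form), and that you close elements in the reverse of the order you opened them, so that each element sits entirely inside its parent. So, you can't do this, because <b> is still open when </a> closes its parent: <a><b></a></b>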
Conveniently, Internet Explorer (up till IE6, at least), when presented with an XML file, will check it and display it as a tree (and tell you if you messed it up), so you can check your XML syntax and layout by opening it in IE. There are plenty of other syntax-checking utilities out there, of course - including, I'm sure, something to write the XML for you, while you just build up a tree of your elements.
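If you'd rather not rely on a browser, a well-formedness check takes only a few lines in most languages. A minimal sketch in Python 3, using the standard xml.etree.ElementTree module, whose parser rejects anything that isn't well-formed (check_well_formed is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Hypothetical helper: None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

print(check_well_formed("<a><b></b></a>"))  # None: closed in reverse order of opening
print(check_well_formed("<a><b></a></b>"))  # error: </a> arrives while <b> is still open
print(check_well_formed("<a><b/></a>"))     # None: <b/> is in the condensed form
```

For a file on disk, ET.parse(path) raises the same ParseError, with the line and column of the problem in the message.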
[size="3"]An Example
Here's a little chunk of XML:
According to the code above, my fridge contains some mild cheddar cheese, a can of cola, and a Tupperware box containing a half-eaten sandwich. Could you get that just by reading it? I'd guess you did - well-written XML is very easy to understand like that. If I were to eat more of the sandwich, and put, I dunno, a piece of broccoli into the box, I could just change the code to:
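Because the data is a tree, that edit can also be made in code rather than by hand. A minimal sketch, assuming Python 3's xml.etree.ElementTree and made-up element and attribute names (fridge, cheese, can, tupperware, sandwich, broccoli, eaten and so on), since any names would do:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the fridge; the names are illustrative only.
FRIDGE = """<?xml version="1.0"?>
<fridge>
  <cheese type="cheddar" strength="mild"/>
  <can contents="cola"/>
  <tupperware>
    <sandwich eaten="half"/>
  </tupperware>
</fridge>"""

fridge = ET.fromstring(FRIDGE)
box = fridge.find("tupperware")

# Eat more of the sandwich...
box.find("sandwich").set("eaten", "most")
# ...and put a piece of broccoli in the box.
ET.SubElement(box, "broccoli", pieces="1")

print(ET.tostring(fridge, encoding="unicode"))
```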
You may be wondering about that first line, <?xml version="1.0"?>. It's the XML declaration, which the spec recommends for 'proper' XML data; really, it just gives the version of the language used to write the file (as the language will, nay, has, changed - they're already up to 1.1, but the parsers are still catching up). It's not strictly necessary, so you can leave it out if your file sizes are constricted or something, but it's a good habit.
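Tree-building APIs such as Python's xml.etree.ElementTree quietly drop the declaration, so to read the version you need a lower-level parser. Python 3's standard library ships a binding to expat (xml.parsers.expat) with a callback for exactly this. A minimal sketch; declared_version is a hypothetical helper:

```python
import xml.parsers.expat

def declared_version(xml_text):
    """Hypothetical helper: the version from <?xml ...?>, or None if there is no declaration."""
    found = {"version": None}

    def on_xml_decl(version, encoding, standalone):
        # expat calls this once, only if the document starts with a declaration.
        found["version"] = version

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_text, True)  # True: this is the whole document
    return found["version"]

print(declared_version('<?xml version="1.0"?><fridge/>'))  # 1.0
print(declared_version("<fridge/>"))                        # None
```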
[size="5"]XML in games
In my opinion, XML could be a valuable technology in games. It may not seem so at first, so here are a few applied examples.
[size="3"]Adventure games
It's not too hard to describe adventure game worlds in XML. For example:
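Among other things, that world defines two strings for the box: "look_empty", which reads "It's a box. It's empty.", and the one shown while the ball is still inside, "It's a box. I think there's something in it."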
I click 'Pick up' and then click on the ball, which I can see in the box. The game looks up the ball's 'onPickup' attribute and, because it starts with 'do:', executes the command that follows on my cunning adventure-game VM. The value of theBox's 'onLook' attribute gets set to "string:look_empty", and the game "get"s the ball, using a command which I predefined.
If I then look again at the box, the string "look_empty" is looked up, so I get "It's a box. It's empty."
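A toy version of that lookup-and-dispatch logic fits in a page. This is a minimal sketch in Python 3 using xml.etree.ElementTree, not the VM described above: the world markup, the "look_full" string id, the Game class and the "do:" command syntax (commands separated by ";", with set and get verbs) are all assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical world file: tag names, ids and the "do:" syntax are illustrative.
WORLD = """<world>
  <string id="look_empty">It's a box. It's empty.</string>
  <string id="look_full">It's a box. I think there's something in it.</string>
  <object id="theBox" onLook="string:look_full">
    <object id="theBall" onPickup="do:set theBox onLook string:look_empty; get theBall"/>
  </object>
</world>"""


class Game:
    """Hypothetical game state wrapped around the parsed world tree."""

    def __init__(self, xml_text):
        self.world = ET.fromstring(xml_text)
        self.inventory = []

    def find(self, object_id):
        # Search the whole tree for the object with this id.
        return self.world.find(f".//object[@id='{object_id}']")

    def string(self, string_id):
        return self.world.find(f"string[@id='{string_id}']").text

    def run(self, value):
        """Interpret an event attribute: 'string:<id>' returns text, 'do:<cmds>' runs commands."""
        if value is None:
            return None  # the object doesn't react to this event
        kind, _, body = value.partition(":")
        if kind == "string":
            return self.string(body)
        if kind == "do":
            for command in body.split(";"):
                if not command.strip():
                    continue
                verb, *args = command.split()
                if verb == "set":    # set <object> <attribute> <value>
                    obj_id, attr, new_value = args
                    self.find(obj_id).set(attr, new_value)
                elif verb == "get":  # get <object>: add it to the inventory
                    self.inventory.append(args[0])
            return None
        raise ValueError(f"unknown event value: {value!r}")

    def on(self, event, object_id):
        """Fire an event (e.g. 'onLook') on an object and return any text to show."""
        return self.run(self.find(object_id).get(event))


game = Game(WORLD)
print(game.on("onLook", "theBox"))  # It's a box. I think there's something in it.
game.on("onPickup", "theBall")
print(game.on("onLook", "theBox"))  # It's a box. It's empty.
print(game.inventory)               # ['theBall']
```

Note that the puzzle logic lives entirely in attribute values, so changing it means editing the world file rather than the engine.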
Bear in mind that all the script engine work is done by my game engine; XML doesn't do that for you, it just allows you to store the relevant information in a simple way. It's easily represented with a few classes (and I mean, two or three), and can be serialised efficiently (for saving/loading games - the entire world can be saved just by recursively writing out each element with all children). It's also excellent for testing or creating; you can edit the world state in Notepad and test the effects, rather than having to play with a hex editor or compile with custom-built tools.
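The 'recursively write out each element' idea looks something like this. A minimal sketch in Python 3 (write_element and save_world are hypothetical names); it uses xml.sax.saxutils to escape text and attribute values, and for brevity it ignores text that follows a child element (mixed content):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def write_element(elem, out, depth=0):
    """Hypothetical writer: append elem and all its children to out as indented lines."""
    pad = "  " * depth
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    text = (elem.text or "").strip()
    children = list(elem)
    if not children and not text:
        out.append(f"{pad}<{elem.tag}{attrs}/>")  # condensed form
        return
    if not children:
        out.append(f"{pad}<{elem.tag}{attrs}>{escape(text)}</{elem.tag}>")
        return
    out.append(f"{pad}<{elem.tag}{attrs}>")
    if text:
        out.append(pad + "  " + escape(text))
    for child in children:
        write_element(child, out, depth + 1)  # recursion handles any depth of nesting
    out.append(f"{pad}</{elem.tag}>")

def save_world(root):
    """Hypothetical save routine: declaration plus the whole tree."""
    lines = ['<?xml version="1.0"?>']
    write_element(root, lines)
    return "\n".join(lines) + "\n"

world = ET.fromstring(
    '<world><object id="theBox" onLook="string:look_full">'
    '<object id="theBall"/></object></world>'
)
saved = save_world(world)
print(saved, end="")
```

Loading is the reverse: hand the saved text back to the parser and you get the same tree.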
[size="3"]Saved/Loaded games
Talking of saving/loading games, there's an entire application right there.
So, we've got a hairy_monster, a volume of water (which is full, as opposed to drained), containing a shark.
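Saving and loading such a state can lean entirely on an off-the-shelf parser. A minimal sketch, assuming Python 3's xml.etree.ElementTree; the level, hairy_monster, water and shark elements and their attributes (name, health, state, hungry) are illustrative, not a fixed format:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def make_level():
    """Hypothetical level state matching the description above."""
    level = ET.Element("level", name="lagoon")
    ET.SubElement(level, "hairy_monster", health="100")
    water = ET.SubElement(level, "water", state="full")  # as opposed to "drained"
    ET.SubElement(water, "shark", hungry="yes")
    return level

def save_game(level, path):
    # Writes the declaration followed by the whole tree.
    ET.ElementTree(level).write(path, encoding="utf-8", xml_declaration=True)

def load_game(path):
    return ET.parse(path).getroot()

# A throwaway directory for the demo (it is not cleaned up afterwards).
path = os.path.join(tempfile.mkdtemp(), "savegame.xml")
save_game(make_level(), path)

level = load_game(path)
water = level.find("water")
print(water.get("state"))                         # full
print([child.tag for child in water])             # ['shark']
print(level.find("hairy_monster").get("health"))  # 100
```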
The only disadvantage of using XML as a format for saved games is that it might be a little too easy to read... you might not want people editing their saved games. In that case, you can encrypt the file after saving it and decrypt it before loading it.
[size="3"]Asset management
And so on and so on. The above code describes a simple layout of a few files on disk and in a .PAK file (or whatever you want to use, maybe ZIP files, maybe your own equivalent). It'd be brilliant for virtual file systems; perhaps as a packing list (to ensure that the whole package is present, or by adding checksums to each item to check that it hasn't been tampered with), or information about files to load at start-up. In my game code, I could call 'LoadModel("shotgun")' and it'd be able to look up the "model" "shotgun" (located at "shotgun/shotgun.mdl" inside folder "weapons" inside folder "game/media/models", making a grand total of "game/media/models/weapons/shotgun/shotgun.mdl"). If you happen to move all your assets around, you don't really want to be changing all your code. Not to mention the fact that this method gives you the capability to change things on the fly - you could change the "loc" of the "models" folder to "games/models" if you discover that you've been configured to use the old models, for example. It also lets you look at the files you've got without accessing the hard disk; if I want to spawn a random gib, I can just pick a random child from "gibs" and load it, without having to check through the directory itself.
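Resolving a name such as "shotgun" to a full path is a walk from the item up through its folders, joining each "loc" on the way. A minimal sketch in Python 3 with xml.etree.ElementTree; the asset markup, the AssetIndex class and its methods are hypothetical stand-ins for something like LoadModel. ElementTree elements don't know their parents, so the sketch builds a child-to-parent map once:

```python
import posixpath
import random
import xml.etree.ElementTree as ET

# Hypothetical asset list; only the shotgun path comes from the text above.
ASSETS = """<assets>
  <folder name="models" loc="game/media/models">
    <folder name="weapons" loc="weapons">
      <model name="shotgun" loc="shotgun/shotgun.mdl"/>
    </folder>
    <folder name="gibs" loc="gibs">
      <model name="gib_arm" loc="arm.mdl"/>
      <model name="gib_leg" loc="leg.mdl"/>
    </folder>
  </folder>
</assets>"""


class AssetIndex:
    """Hypothetical in-memory index over the asset list."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        # Child -> parent map, built once, so we can walk upwards.
        self.parent = {child: parent for parent in self.root.iter() for child in parent}

    def find(self, kind, name):
        return self.root.find(f".//{kind}[@name='{name}']")

    def path_of(self, kind, name):
        """Join the 'loc' of the item and of every enclosing folder into one path."""
        elem = self.find(kind, name)
        if elem is None:
            raise KeyError(f"{kind} {name!r} is not in the asset list")
        parts = []
        while elem is not None and elem is not self.root:
            parts.append(elem.get("loc", ""))
            elem = self.parent.get(elem)
        return posixpath.join(*reversed(parts))

    def random_child_path(self, folder_name):
        """Pick a random item from a folder without touching the disk."""
        child = random.choice(list(self.find("folder", folder_name)))
        return self.path_of(child.tag, child.get("name"))


index = AssetIndex(ASSETS)
print(index.path_of("model", "shotgun"))  # game/media/models/weapons/shotgun/shotgun.mdl
# Relocate every model on the fly by editing a single attribute.
index.find("folder", "models").set("loc", "games/models")
print(index.path_of("model", "shotgun"))  # games/models/weapons/shotgun/shotgun.mdl
print(index.random_child_path("gibs"))
```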
[size="5"]Conclusion
I hope I've set your mind off a little bit. These are only a few examples, but quite frankly, it appears to me that XML can be applied to, well, anything. You can describe structures or interfaces in it; something like <dword name="dwWidth"> could be useful sometime. You can use items to reference other items (as I demonstrated with the adventure game example). Heck, you can do anything.
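As one concrete take on describing structures, an XML layout can drive Python's standard struct module to pack and unpack binary records. A minimal sketch, assuming Python 3; apart from the dword named dwWidth, the element names, field names and the type-to-code table are assumptions:

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical structure description.
LAYOUT = """<struct name="SurfaceHeader">
  <dword name="dwWidth"/>
  <dword name="dwHeight"/>
  <word name="wFlags"/>
</struct>"""

# Assumed mapping to struct-module codes (standard sizes: 1, 2 and 4 bytes).
TYPE_CODES = {"byte": "B", "word": "H", "dword": "I"}

def struct_format(xml_text):
    """Build a little-endian struct format string and field-name list from the layout."""
    layout = ET.fromstring(xml_text)
    names = [field.get("name") for field in layout]
    fmt = "<" + "".join(TYPE_CODES[field.tag] for field in layout)
    return fmt, names

fmt, names = struct_format(LAYOUT)
packed = struct.pack(fmt, 640, 480, 1)
print(fmt, struct.calcsize(fmt))                     # <IIH 10
print(dict(zip(names, struct.unpack(fmt, packed))))
```

The same layout file can then serve both the code that writes a binary file and any tool that needs to read it.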
Next time, I'll look at how we actually use XML in code - I'll show you how to use a particular XML parser, expat (http://expat.sf.net/), to read the XML data in and get it into a tree structure, and then I'll show you how to write it back out again.
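Python's standard library wraps the same expat library as xml.parsers.expat, so the callback-driven approach can be tried there first. A minimal sketch, assuming Python 3 (the Node class and build_tree function are hypothetical, and expat's C API differs in detail):

```python
import xml.parsers.expat

class Node:
    """Hypothetical tree node: tagname, attributes, children and data."""
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = []
        self.data = ""

def build_tree(xml_text):
    """Use expat's start/end/character callbacks to build a Node tree; return the root."""
    roots = []
    stack = []  # the chain of currently open elements

    def start(tag, attrs):
        node = Node(tag, attrs)
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)

    def end(tag):
        stack.pop()

    def chars(text):
        if stack:  # expat may deliver one run of text in several pieces
            stack[-1].data += text

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return roots[0]

root = build_tree('<fridge><can contents="cola"/><note>Buy milk</note></fridge>')
print(root.tag, [child.tag for child in root.children])  # fridge ['can', 'note']
print(root.children[1].data)                             # Buy milk
```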
I wonder why I write all my articles at 2:30am... and then revise them at 3:00am...
Richard Fine (a.k.a. Superpig). Email me at [email="rfine@lycos.com"]rfine@lycos.com[/email], or catch me in the forums or on #gamedev. Happy coding.
Really interesting article. What do you think about creating an RPG skill tree using XML? Do you think it's a good idea? Thanks!