
Sadistic library authors (my rant about Xerces for C++)

Started by March 24, 2012 10:49 PM
26 comments, last by wack 12 years, 10 months ago
I am going to rant about my experiences with the xerces XML library for C++. I would also like to hear your own personal rants about your most hated software libraries, so feel free to post them here. Well here goes...

First of all, I should mention that I'm not new to programming. I have seen a lot of hairy stuff. I have been using C++ for almost 15 years now. I am experienced in lots of related semi-obscure technologies such as COM, DCOM, CORBA, ODBC...

... But as over-engineered as all of the above technologies are, I have found that they have absolutely nothing on xerces.

It all started with a hobby project, where I needed to read and write XML files. I started looking around for something that would fit my requirements of:

  • Fast loading of files from disk
  • Validation against a schema during load
  • Writing of XML files to disk
  • Cross-platform

The only one that seemed to fit the bill was xerces. So I download a copy, and the first thing I notice is that the compiled, optimized xerces DLL is a full 2.41 MB. OK, it's not gigantic, but it seemed pretty hefty for something that basically just manipulates text files.

Soon enough, I started seeing why. Reading files with the SAX parser turned out to be multiple inheritance galore. And the strings... they have rolled their own string class too. After pulling my hair out for a long time, I finally managed to get it to read my XML files, but no thanks to their documentation, which is utter gibberish and assumes you already have a PhD in xerces.

Just before I started writing this [s]post[/s] rant, I spent a few hours trying to figure out how to get xerces to write an XML file. I finally gave up when I realized it would involve all the junk seen here: http://stackoverflow...nd-c-on-windows
And that's just for saving the file; the DOM tree that contains the actual data has to be built separately.

Realizing that using xerces to write the file would gain me nothing (it doesn't perform validation etc. when saving), I gave up and started writing an implementation that writes directly to a C++ stream object instead.
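For anyone tempted by the same route, here is a minimal sketch of what writing XML directly to a C++ stream can look like. It assumes only the standard library; `XmlWriter`, `escape` and `example_document` are hypothetical names, not part of xerces or any other library. Like the approach described above it performs no validation, and it also ignores comments, CDATA and namespaces.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Escape the five characters that are special in XML text and attribute values.
std::string escape(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical minimal writer: streams elements straight to any std::ostream
// and keeps a stack of open element names so end() closes the right tag.
// Element and attribute names are written as-is; the caller must pass valid XML names.
class XmlWriter {
public:
    explicit XmlWriter(std::ostream& os) : os_(os) {
        os_ << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    }

    // Open an element with optional (name, value) attributes.
    void begin(const std::string& name,
               const std::vector<std::pair<std::string, std::string>>& attrs = {}) {
        indent();
        os_ << '<' << name;
        for (const auto& attr : attrs)
            os_ << ' ' << attr.first << "=\"" << escape(attr.second) << '"';
        os_ << ">\n";
        open_.push_back(name);
    }

    // Write a leaf element containing escaped text.
    void text_element(const std::string& name, const std::string& text) {
        indent();
        os_ << '<' << name << '>' << escape(text) << "</" << name << ">\n";
    }

    // Close the most recently opened element.
    void end() {
        assert(!open_.empty() && "end() called with no open element");
        std::string name = open_.back();
        open_.pop_back();
        indent();
        os_ << "</" << name << ">\n";
    }

private:
    // Two spaces per nesting level.
    void indent() { os_ << std::string(open_.size() * 2, ' '); }

    std::ostream& os_;
    std::vector<std::string> open_;
};

// Usage example: build a small document in memory and return it as a string.
std::string example_document() {
    std::ostringstream out;
    XmlWriter w(out);
    w.begin("level", {{"name", "Tom & Jerry's \"house\""}});
    w.text_element("note", "a < b");
    w.end();
    return out.str();
}
```

Escaping all five special characters everywhere is more than XML strictly requires (text content only needs `&` and `<` escaped, plus `>` in the `]]>` sequence), but it keeps the writer simple and the output well-formed in both text and attribute positions. Because it writes to any `std::ostream`, the same code works with a `std::ofstream` for saving to disk.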

My final words on this topic are: Screw you xerces, I hate you. And I hate your documentation even more.
Xerces was written by Java enterprise architects and later provided with the same API in C++. It doesn't take common C++ design idioms into consideration, since the API has to stay more or less identical between the two languages.

It's also a standard-conforming XML parser, meaning all that cruft needs to be there.

[quote]But as over-engineered as all of the above technologies are, I have found that they have absolutely nothing on xerces.[/quote]

CORBA wins that competition hands down.

Apache projects are best avoided unless you work with Java, with the possible exception of the server.

[quote]Just before I started writing this [s]post[/s] rant, I spent a few hours trying to figure out how to get xerces to write an XML file. I finally gave up when I realized it would involve all the junk seen here: http://stackoverflow...nd-c-on-windows[/quote]

Is it wrong that I glanced over that and thought: "what's wrong with that"?

It's a disaster, but there's a perverted reason why that design makes sense; it's just somewhat less verbose in Java. It also shows that it is a product of architects.

Xerces is a nice reminder of the prime time of Java architecture astronauts: people who had absolutely no clue about actual coding, but could suddenly build software architectures. While there are nuggets of good practice in there (such as external memory allocation and passing in factories to create types), the final result is a mess.
Had good experience with http://www.grinninglizard.com/tinyxml/

[quote]Had good experience with http://www.grinninglizard.com/tinyxml/[/quote]
Which can't validate, unfortunately. God help the OP if he needs XSLT.

[quote]Xerces is a nice reminder of the prime time of Java architecture astronauts. People who had absolutely no clue about actual coding, but could suddenly build software architectures.[/quote]
To a large extent this is the fault of XML, itself the result of architecture astronauts who are solving fanciful hypothetical problems rather than real ones. Xerces may just be an honest expression of the total fusion of Java and XML.

It's days like these I sympathize with the guys who swear by C. Not because it's a good idea to use C everywhere, but because so much of the world went this route.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
There have also been a couple of wrapper libraries for Xerces that greatly simplify common operations. XMLDOM seems to be the only one that's still around (or at least it's the only one I can find with a quick look). I don't know if it meets all the OP's needs, but it's worth a look.
Someone recently asked me if I knew how to do string manipulation in C. I replied: why would I want to know how to clip my nails with a sledgehammer?

This sounds like an analogous situation.

There are so many languages in which things like this are completely painless. If you really must use C(pp) (and there are plenty of good reasons), use it as an extension module to a program written in a language that will not have you pulling your hair out over such trivial things.

My rule of thumb: if you are writing int main(){} anywhere in your code, you are abusing C++ and doing it wrong.

[quote]Is it wrong that I glanced over that and thought: "what's wrong with that"?[/quote]

:)

[quote name='Eelco' timestamp='1332670504' post='4925077']
Someone recently asked me if I knew how to do string manipulation in C. I replied: why would I want to know how to clip my nails with a sledgehammer?

This sounds like an analogous situation.

There are so many languages in which things like this are completely painless. If you really must use C(pp) (and there are plenty of good reasons), use it as an extension module to a program written in a language that will not have you pulling your hair out over such trivial things.

My rule of thumb: if you are writing int main(){} anywhere in your code, you are abusing C++ and doing it wrong.
[/quote]

While that sounds good in theory, my experience is that few things in C++ are difficult enough to warrant the complexities in debugging and glue code that combining multiple languages always involves. My rant is mostly about xerces making things more difficult than they need to be, in any language.



[quote]There have also been a couple of wrapper libraries for Xerces that greatly simplify common operations. XMLDOM seems to be the only one that's still around (or at least it's the only one I can find with a quick look). I don't know if it meets all the OP's needs, but it's worth a look.[/quote]

Interesting. Though it looks like it's been a while since that one was updated, too. From a quick look, it seems to use the DOM model only. One strong reason for choosing xerces in the first place is that it also supports the SAX method, which doesn't need to build the whole XML tree in memory and is a lot faster. I am mostly done with the XML handling of my app now anyway and just wanted to vent a little, but I am pissed off enough to perhaps learn xerces properly, write my own wrapper around it and release it on the unsuspecting public. If I do, I am going to call it Leonidas.
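Since SAX versus DOM is the crux here, a toy illustration of the SAX idea may help: the parser pushes start-element, end-element and text events to callbacks while it scans, and never builds a tree. This is a hand-rolled sketch, not the xerces API; `Handler`, `parse_simple` and `count_items` are hypothetical names, and the parser only understands a tiny subset of XML (no attributes, comments, CDATA, entities or XML declaration).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical event callbacks; any of them may be left empty.
struct Handler {
    std::function<void(const std::string&)> start_element;
    std::function<void(const std::string&)> end_element;
    std::function<void(const std::string&)> characters;
};

// Scans `doc` once, left to right, firing events as tags and text are found.
// No tree is built; apart from the input string, only the current token is held.
// Supports only <name>, </name>, <name/> and plain text. Returns false on an
// unterminated or empty tag, or if start and end tags do not balance;
// it does not check that end-tag names match their start tags.
bool parse_simple(const std::string& doc, const Handler& h) {
    std::size_t i = 0;
    int depth = 0;
    while (i < doc.size()) {
        if (doc[i] != '<') {
            // Text run up to the next tag (or end of input).
            std::size_t next = doc.find('<', i);
            if (next == std::string::npos) next = doc.size();
            assert(next > i);  // each text run consumes at least one char
            if (h.characters) h.characters(doc.substr(i, next - i));
            i = next;
            continue;
        }
        std::size_t gt = doc.find('>', i);
        if (gt == std::string::npos) return false;       // unterminated tag
        std::string tag = doc.substr(i + 1, gt - i - 1);
        i = gt + 1;
        if (tag.empty()) return false;                   // "<>"
        if (tag[0] == '/') {                             // </name>
            if (depth == 0) return false;                // end tag with nothing open
            --depth;
            if (h.end_element) h.end_element(tag.substr(1));
        } else if (tag.back() == '/') {                  // <name/>
            std::string name = tag.substr(0, tag.size() - 1);
            if (h.start_element) h.start_element(name);
            if (h.end_element) h.end_element(name);
        } else {                                         // <name>
            ++depth;
            if (h.start_element) h.start_element(tag);
        }
    }
    return depth == 0;
}

// Usage example: count <item> elements without ever building a tree.
// Returns -1 if the input is rejected.
int count_items(const std::string& doc) {
    int count = 0;
    Handler h;
    h.start_element = [&count](const std::string& name) {
        if (name == "item") ++count;
    };
    return parse_simple(doc, h) ? count : -1;
}
```

The callbacks are plain `std::function` members rather than virtual methods on a handler base class; that is one way to avoid the inheritance-heavy style complained about earlier in the thread, though a virtual-interface design would work just as well. As with real SAX parsers, events already delivered before an error is detected stay delivered, so a handler that accumulates state should discard it when the parse returns false.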


[quote name='wack' timestamp='1332672306' post='4925081']
While that sounds good in theory, my experience is that few things in C++ are difficult enough to warrant the complexities in debugging and glue code that combining multiple languages always involves. My rant is mostly about xerces making things more difficult than they need to be, in any language.
[/quote]
I dunno; not having such libraries be part of the language ecosystem seems like a typical C++ problem, one that you'd run into time and time again. The fact that there apparently is no sensible library available for C++ is also a testament to its shortcomings as a language.

Coming from a Python and .NET perspective, the glue code has never been much of a barrier, but I suppose those are the exceptions.

[quote name='Eelco' timestamp='1332673834' post='4925084']
I dunno; not having such libraries be part of the language ecosystem seems like a typical C++ problem, one that you'd run into time and time again. The fact that there apparently is no sensible library available for C++ is also a testament to its shortcomings as a language.

Coming from a Python and .NET perspective, the glue code has never been much of a barrier, but I suppose those are the exceptions.
[/quote]

I think neither [s]Python[/s] nor C# would help in this case, since as far as I know, both of them only have DOM parsers, no SAX parsers. So perhaps Promit is right: it's XML itself that is the problem. Or would you say it's a testament to the shortcomings of C# and [s]Python[/s] that there is no SAX parser?

Edit: it seems Python has a SAX parser after all, but calling it from C++ would frankly involve a lot more code than just using xerces.

[quote name='wack']
I think neither [s]Python[/s] nor C# would help in this case, since as far as I know, both of them only have DOM parsers, no SAX parsers. So perhaps Promit is right: it's XML itself that is the problem. Or would you say it's a testament to the shortcomings of C# and [s]Python[/s] that there is no SAX parser?

Edit: it seems Python has a SAX parser after all, but calling it from C++ would frankly involve a lot more code than just using xerces.
[/quote]
My point is to not write your high-level application in C++. I don't know your other constraints, obviously, so these are more theoretical than practical musings, but if at all possible, I'd write my main application in Python, do my parsing from there (which is probably just a tidy wrapper around some efficient C library), and use boost::python to effortlessly integrate any C++ code that I'd need to write.

If the Python ecosystem did not provide a specific kind of parser, yes, I'd consider that a failure of that language's ecosystem, but of all the languages I've worked with, Python has the strongest ecosystem, hands down. Unless you are trying to do something very arcane, I'd be very surprised if Python couldn't do it. A quick Google search seems to indicate this functionality is in the standard library, even.

