
Sadistic library authors (my rant about Xerces for C++)

Started by March 24, 2012 10:49 PM
26 comments, last by wack 12 years, 10 months ago

[quote name='wack' timestamp='1332676418' post='4925085']
[quote name='Eelco' timestamp='1332673834' post='4925084']
[quote name='wack' timestamp='1332672306' post='4925081']
[quote name='Eelco' timestamp='1332670504' post='4925077']
Someone recently asked me if I knew how to do string manipulation in C. I replied: why would I want to know how to clip my nails with a sledgehammer?

This sounds like an analogous situation.

There are so many languages in which things like this are completely painless. If you really must use C(pp) (and there are plenty of good reasons), use it as an extension module to a program written in a language that will not have you pulling your hair out over such trivial things.

My rule of thumb: if you are writing int main(){} anywhere in your code, you are abusing C++ and doing it wrong.


While that sounds good in theory, my experience is that few things in C++ are difficult enough to warrant the complexities in debugging and glue code that combining multiple languages always involves. My rant is mostly about xerces making things more difficult than they need to be, in any language.
[/quote]
I dunno; not having such libraries be part of the language ecosystem seems like a typical C++ problem, one that you'd run into time and time again. The fact that there apparently is no sensible library available for C++ is also a testament to its shortcomings as a language.

Coming from a python and .net perspective, the glue code has never been much of a barrier, but I suppose those are the exceptions.
[/quote]

I think neither [s]Python[/s] nor C# would help in this case, since as far as I know, both of them only have DOM parsers, no SAX parsers. So perhaps Promit is right: it's XML itself that is the problem. Or would you say it's a testament to the shortcomings of C# and [s]Python[/s] that there is no SAX parser?

Edit: it seems python has a SAX parser after all, but calling it from C++ would frankly involve a lot more code than just using xerces.
[/quote]
My point is to not write your high level application in C++. I don't know your other constraints, obviously, so these are more theoretical than practical musings, but if at all possible, I'd write my main application in python, do my parsing from there (which is probably just a tidy wrapper around some efficient C library), and use boost::python to effortlessly integrate any C++ code that I'd need to write.

If the python ecosystem does not provide a specific kind of parser, yes, I'd consider that a failure of that language's ecosystem, but of all the languages I've worked with, python has the strongest ecosystem, hands down. Unless you are trying to do something very arcane, I'd be very surprised if python couldn't do it. A quick google seems to indicate this functionality is in the standard library, even.
[/quote]
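
For reference, the standard-library SAX parser mentioned above lives in the xml.sax module. Here is a minimal sketch of what streaming parsing looks like with it; the TitleCollector class, the collect_titles helper and the sample document are made up for illustration, and only xml.sax.ContentHandler and xml.sax.parseString come from the standard library.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element while the document streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not inside a <title> right now"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect_titles(xml_bytes):
    """Parse XML given as bytes and return the text of all <title> elements."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


# Hypothetical sample input, not taken from the thread.
SAMPLE = b"<library><book><title>Dune</title></book><book><title>Emma</title></book></library>"
print(collect_titles(SAMPLE))  # prints ['Dune', 'Emma']
```

The handler never holds the whole document in memory, which is the usual reason to pick SAX over DOM. Note that this checks well-formedness only; it does not validate against a schema.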

Well, let's just say we disagree. I have combined multiple languages many times, and maintained apps where it has been done by others. Almost always you end up regretting and cursing the debugging horror that appears. Most of the time it is better to select one language that has the features you need and stick with it, even if some of the features are better implemented in other languages.

[quote name='DoctorGlow' timestamp='1332636735' post='4925008']
Had good experience with http://www.grinninglizard.com/tinyxml/
Which can't validate, unfortunately. God help the OP if he needs XSLT.

Xerces is a nice reminder of the prime time of Java architecture astronauts. People who had absolutely no clue of actual coding, but could suddenly build software architectures.
To a large extent this is the fault of XML, itself the result of architecture astronauts who are solving fanciful hypothetical problems rather than real ones. Xerces may just be an honest expression of the total fusion of Java and XML.

It's days like these I sympathize with the guys who swear by C. Not because it's a good idea to use C everywhere, but because so much of the world went this route.
[/quote]

I actually take that back.

Xerces was designed a long time ago, before C++ standardization, before new design ideas like Alexandrescu's. It's an example of evolved C with classes.

ACE suffers from a similar problem, but we can compare it to asio. With XML, being verbose as it is, nobody is probably going to bother to write a modernized version.

Most standard Java libraries suffer from the same problem. They were designed and grew as Java got adopted. But over 15 years, many things changed, and most APIs would be approached differently today, benefiting from the insight gained.

C# and .Net are actually making use of these experiences and evolving both the language as well as APIs.

As an example - ORMs are flawed by design. They map 1:1 table->Table, row->Row, database->Database... Why build abstractions that don't abstract anything? LINQ is the meaningful step forward: it focuses on what one really wants to do with data, namely query and mangle it.

So even if xerces does not necessarily have an equivalent counterpart, these examples demonstrate that software design has improved.

Well, let's just say we disagree. I have combined multiple languages many times, and maintained apps where it has been done by others. Almost always you end up regretting and cursing the debugging horror that appears. Most of the time it is better to select one language that has the features you need and stick with it, even if some of the features are better implemented in other languages.

It depends on the languages involved I suppose. But mixing C into python is really as easy as breathing, and in the latest .NET the interop is almost completely transparent as well. But again, afaik these are indeed the exceptions; it can get pretty messy for other languages.

Indeed, debugging external code can be hard, though in my experience the extensions are usually fairly isolated and small bits of code. boost::python supports automated translation of C++ exceptions into python exceptions; I haven't used that myself, but it sounds nice in theory at least.

I do in fact plan on using Python in my app, but as an extension language to write small extensions, and not the main language.

I hate to say this, and probably will be flamed by almost everyone for it, but there are good reasons why python (and similar languages) will never be popular for writing large applications. It's easy to write stuff in, but when you start getting into large scale stuff that will need to be maintained for years or even decades, the typing system of Python will cause your app to become an unstable mess. There are many reasons for this, including:

  • People will come and go to the project over the years, and quite honestly, most of them will be morons. There is no static type system that will catch their errors early. You will have to run all of the app to ensure it's stable. Over and over and over again. Anyone who has been involved in testing apps with a few million lines of code knows that just doing one test run is very time consuming, and will not even cover all scenarios.
  • Writing automatic tests is often proposed as a solution for this, but it rarely works in practice, because (including, but not limited to):

    1. Most people are morons, and can't be trusted to properly evaluate how to write a sufficient test for the code.
    2. It is a gross waste of time to write tests for stuff that the compiler would easily catch if a statically typed language were used.
    3. It is extremely common that projects go on for longer than planned, because of unforeseen difficulties or bad estimates. When time starts running short, the things that are not "strictly necessary" are skipped. Yes, this means automated testing.

    In Python, it is too easy to make a change that breaks your app in interesting ways, without anybody noticing it until much later. Maybe only when it's too late.

    So, in summary, Python is fine for small stuff, but attempting to do anything large is doomed to fail, even if I'm sure someone can manage to find examples where people have succeeded against all odds.
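
To make the "breaks without anybody noticing" point concrete, here is a minimal sketch of the kind of error being described. The function names (format_price, first_failure) and the inputs are hypothetical, invented purely for illustration. The bug sits in a rarely taken branch, so nothing complains until a negative amount actually shows up at run time.

```python
def format_price(amount, currency="EUR"):
    """Return a display string; callers are expected to pass a number."""
    if amount < 0:
        # Rarely exercised branch with a latent bug: str + number.
        # Python only reports this when the line actually executes.
        return "refund " + abs(amount) + " " + currency
    return f"{amount:.2f} {currency}"


def first_failure(inputs):
    """Return the first input for which format_price raises TypeError, else None."""
    for value in inputs:
        try:
            format_price(value)
        except TypeError:
            return value
    return None


print(format_price(12.5))           # prints 12.50 EUR
print(first_failure([1, 2.5, 10]))  # prints None: the happy path looks fine
print(first_failure([1, -3, 10]))   # prints -3: the bug only shows up here
```

A test run that never feeds in a negative amount passes cleanly. With type annotations added, a static checker such as mypy would be expected to flag the concatenation without running anything, which is roughly the early feedback a compiler gives in a statically typed language.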
Dunno, I've never worked much on huge applications maintained over long periods. If that's the aim, C++ seems like the stuff of nightmares though. Nor have I ever missed typing in python for anything other than code completion. And of course, enthought.traits gives type checking in python plus more. Stupid programmers are not going to be saved by using another language; least of all C++. Stupid programmers are just going to grind to a halt working on a big project in C++; but true, at least that guarantees they will do no harm.

I've never heard the developers of mercurial complain about their work being impossible. And I suppose the reason why you don't see much python in commercial applications is because it is so easy to reverse engineer.

But true, I don't really know what I'm talking about from experience.

Still, if you are looking for a non-quirky and strict language to serve as a long term backbone for the high level structure of your project, how is C# for instance not a far better choice than C++?


Basically, as I see it, there are three somewhat sane language choices for larger projects today. They are C++, Java and C#. The D language that once looked promising seems to have failed. The similarities between these languages are far greater than the differences, so a lot of it comes down to personal preference. Since the thing I'm working on is still a personal project, that did have something to do with the choice. But I did use a set of pro/con criteria that are important to me when selecting the language also:

C#

  • +The syntax and language features have improved a lot lately. The only thing I still hate is the exception handling (which, to be fair, is equally terrible in C++).
  • +Extensive standard libraries.
  • -Not actually cross platform; it would be pretty stupid to use C# if you even suspect you need to run on non-Microsoft environments now, or at any point in the future. In particular, the server part of my app is intended to run on Linux, without using the sub-par Mono environment.
  • Long term future: Uncertain. As you can see, Microsoft is losing traction in lots of areas that are not desktop computing. Microsoft also has a proven track record of dumping languages when it suits them. Remember VB6?

Java

  • +Run-time environments are available for all major platforms.
  • -Language and associated libraries are starting to feel quite dated.
  • -Integration with the underlying platform is clumsy; users usually feel there is something "different".
  • Long term future: Uncertain. If there is one thing that can be counted on, it's that having anything to do with Oracle will come back and bite you in the ass somehow.

C++

  • +Flexible language; the additions in C++11 bring a lot of things that have been sorely missing.
  • +Cross platform, if you want it to be.
  • +No run-time environments required.
  • -Standard libraries contain the bare essentials only; you will likely need third-party libs.
  • Long term future: Seems stable. There are multiple good implementations, and no single company controlling the language. They have a proven track record of making sure to break as few things as possible between new versions of the standard.

But generally speaking, depending on what you are trying to accomplish, any of the three is a fine choice. It is easy enough to find people who can program in them, but the people who know C++ tend to be better programmers in general than those who know only Java, for instance.

As you can see, the "famous speed" of C++ wasn't even a deciding factor in this particular project; it's just a nice bonus.
I don't think microsoft deserves a bad rap for support. And no way will they drop C#, considering its widespread use in some fields. But if crossplatform is important to you, yeah... mono seems cute, but now there is some support I don't trust. That said, if MS ever dropped the ball on C# for inexplicable reasons, the momentum behind mono would quickly swell.

Java, I wouldn't want to use. Indeed the libraries suck, the language is clunky, and breaking out of safe code and writing some C is a major pain in the butt.

D is quite nice actually. I've used it a lot when it was still under development, and stopped using it eventually, but the recent releases are very stable and functional, and the toolchain has improved a lot too. The only thing that still sucks is library availability, which is what drove me away in the first place...
That said, it does have xml parsing in the stdlib, and not only that, but it blows the best C++ parser out of the water in terms of performance: http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/

Still, I can't imagine doing a large project in C++. The build times make my compile-and-correct coding style completely impossible, and it seriously pisses me off to repeat myself in a header file, performing the kind of automatable tasks that computers were invented to perform in the first place. And then there is the ecosystem, which got us started here; there are large gaps in functionality, and what is out there often takes longer to configure, compile and reverse-engineer than it takes to roll your own solution. Ugh.


The D language has completely failed to gain any traction whatsoever since they started. It seems well on its way to becoming a minor footnote in programming history at this point. Which is a shame, because it did indeed seem promising.

The failure of D, as far as I can see (having followed it from a safe distance), is mainly down to two reasons:

  1. No support from any of the large OS vendors, who saw greater benefit in peddling their own stuff instead.
  2. Internal bickering. Instead of making a decision and sticking with it, there are now two different standard libraries for it. Whoopee.



There are many XML-parsers out there that are faster than Xerces, but they are mostly toy XML-parsers for people who often have no idea of why they are using XML in the first place. The one benchmarked in D seems to be one of them.
I'd like to remind all of you that we are talking about XML, Xerces, and libraries. Not languages. If this becomes a language thread, I will end it.

This topic is closed to new replies.
