So I decided to write a C++ parser: lots of string manipulation, plenty of substr's and find_first_of on \n, \t, \f, et al.
I was up to about 500 lines of code, with quite a few functions such as trim and various splitting routines tailored to particular situations. For example, when parsing preprocessor directives you can have whitespace between the # and the directive keyword, as in '#   include', and all of that has to be taken care of. There are so many little situations like this that the code just bloats up, and this is only for comments and macros, and only for checking that a PP directive is present and well-formed, never mind checking whether the statement itself is valid.
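To give a rough idea, here is a minimal sketch of the kind of check I mean (names and structure are just illustrative, not my actual code): it allows whitespace before and after the # and pulls out the directive keyword using the same find_first_not_of / find_first_of / substr approach.

#include <iostream>
#include <string>

// Illustrative helper: given one source line, return the preprocessor
// directive keyword (e.g. "include"), or an empty string if the line is
// not a directive. Whitespace is allowed both before the '#' and between
// the '#' and the keyword, as in "#   include <foo>".
std::string directive_keyword(const std::string& line)
{
    const std::string ws = " \t\f\v";

    // Skip leading whitespace and check for '#'.
    std::size_t pos = line.find_first_not_of(ws);
    if (pos == std::string::npos || line[pos] != '#')
        return "";

    // Skip whitespace between '#' and the keyword.
    pos = line.find_first_not_of(ws, pos + 1);
    if (pos == std::string::npos)
        return "";

    // The keyword runs until the next whitespace character.
    std::size_t end = line.find_first_of(ws, pos);
    return line.substr(pos, end - pos);
}

int main()
{
    std::cout << directive_keyword("#   include <vector>") << '\n'; // prints "include"
    std::cout << directive_keyword("int x = 0;") << '\n';           // prints an empty line
}

And that is just one of the special cases; every directive and comment form needs its own little handler like this.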
Then I decided to load up the gcc source and look at the lexer and parser sections. Apart from the horrendous notation in gcc, you can see right away that it is in a different ballpark; the code and comments show this, and it makes my code look like a script kiddie's JavaScript.
I also loaded up the NASM source, which is nicely written code, and even though it is a full assembler and the parsing parts are only a fraction of the whole, it's obvious that I am a long way from the understanding and coding efficiency these guys have. I also see that Boost has a lexical analyzer and parser as part of their Spirit library, which, after reading through a bit of it, again seems so far beyond what I am doing; they have a complete academic understanding of the design of parsing, tokenizing and validating source code.
Quite demoralizing, I must say. I think I would make a good scripter, but I do wonder if I am cut out for this, and if I will ever progress to the level of the compiler source I am talking about. Is it mainly practice and hard work, or are some people just born to code at these high levels?