
How can I make a text parser?

Started by September 30, 2015 03:05 PM
17 comments, last by louie999 9 years, 2 months ago

Hi, so I got a question.

How can I make a text parser? like:

1. Open a file.

2. Read the contents of file.

3. Consider something like this in the text file:


NewObject MyObject

HP   = 100
DMG  = 5
Weapon = SomeWeapon

End

How can I make it find keywords such as "NewObject" and then save its value (which is "MyObject") to a std::map or something?

4. How can I make it identify wrong things in the text file? For example, if there is a "NewObject" keyword then there should be an "End" keyword; if not, it should produce an error.

Are there any good tutorials or C++ functions that would let me do this? Any help is appreciated. Thanks in advance...

Something like this:


foo=1,2,3,4
bar=0

And you parse it like this:


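#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream file( "sample.txt" );

    std::string line;
    while( std::getline( file, line ) )
    {
        std::istringstream iss( line );

        // Everything before the '=' is the key.
        std::string result;
        if( std::getline( iss, result, '=' ) )
        {
            if( result == "foo" )
            {
                // The rest of the line is a comma-separated list of values.
                std::string token;
                while( std::getline( iss, token, ',' ) )
                {
                    std::cout << token << std::endl;
                }
            }
            else if( result == "bar" )
            {
                //...
            }
        }
    }
}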



For smaller files, I like to load the entire thing, split it into statements, and then split each statement into tokens like this:


std::vector<std::string> lines = SplitString(LoadFileAsString("blah.txt"), '\n', IgnoreEmptyLines);

std::unordered_map<std::string, std::string> configMap;
for(const std::string &line : lines)
{
     //Ignore comments.
     if(line.front() == '#') continue;
     
     //Split line into tokens.
     std::vector<std::string> tokens = SplitString(line, '=');
     
     if(tokens.size() == 2)
     {
          configMap[tokens[0]] = tokens[1];
     }
     else
     {
          //Parse error or different line format...
     }
}

Which can then be wrapped into its own function:


std::unordered_map<std::string, std::string> configMap = LoadFileAsStringMap("config.txt", '\n', '=');
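The helpers used above are not shown in the post, so here is a rough sketch of one way they could look. The names `Trim` and `ParseKeyValueText` are made up for illustration, and this `SplitString` always drops empty pieces, which may not match the original's signature or behavior. It works on text that has already been loaded, so reading the file is left out.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: split `text` on `delim`, dropping empty pieces.
std::vector<std::string> SplitString(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, delim))
    {
        if (!piece.empty()) parts.push_back(piece);
    }
    return parts;
}

// Hypothetical helper: strip leading and trailing spaces, tabs and CRs.
std::string Trim(const std::string& s)
{
    const char* whitespace = " \t\r";
    const auto first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Parse "key = value" lines from already-loaded text into a map.
std::unordered_map<std::string, std::string> ParseKeyValueText(const std::string& text)
{
    std::unordered_map<std::string, std::string> result;
    for (const std::string& rawLine : SplitString(text, '\n'))
    {
        const std::string line = Trim(rawLine);
        if (line.empty() || line.front() == '#') continue; // blank line or comment

        const auto eq = line.find('=');
        if (eq == std::string::npos) continue; // not key=value; a real parser would report this

        result[Trim(line.substr(0, eq))] = Trim(line.substr(eq + 1));
    }
    return result;
}
```

Splitting on the first '=' (rather than on every '=') means a value is allowed to contain an equals sign itself, which seemed like the safer default for a sketch.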

That's just for basic key-value pairs. If I wanted more complicated structures, I'd take the same basic idea (breaking up the logic into smaller re-usable functions), but instead of splitting into lines first, I'd split the entire file by whitespace and keep the equals signs and even the newlines as tokens, so the file becomes one continuous sequence of tokens.

Then you walk through the entire set of tokens, one by one, but keep track of the current state ("I'm inside of an object called 'MyObject', I'm nested four deep, this variable is named 'HP', etc..."), which lets you detect "wrong" things. For example, if you reach the end of the list of tokens but 'state' is still in a New Object, then you've clearly forgotten an 'End', and need to report it as a syntax error. Likewise, if you're inside a New Object and encounter another New Object, then you either report a syntax error or interpret it to mean nesting one object inside the other, if your format requires that behavior.
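To make the state-tracking idea concrete, here is a minimal sketch for the NewObject/End format from the question. It is simplified on purpose: it drops newlines instead of keeping them as tokens, it assumes "=" is surrounded by whitespace, it treats nested objects as an error, and `ParseObjects` and `Properties` are names invented for this example.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Properties = std::map<std::string, std::string>;

// Returns object name -> properties; throws std::runtime_error on a syntax error.
std::map<std::string, Properties> ParseObjects(const std::string& text)
{
    // Split the whole text into whitespace-separated tokens.
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    for (std::string tok; stream >> tok; ) tokens.push_back(tok);

    std::map<std::string, Properties> objects;
    bool insideObject = false;  // the "state" being tracked
    std::string currentName;

    for (std::size_t i = 0; i < tokens.size(); ++i)
    {
        const std::string& tok = tokens[i];
        if (tok == "NewObject")
        {
            if (insideObject)
                throw std::runtime_error("NewObject inside '" + currentName + "' (missing End?)");
            if (i + 1 >= tokens.size())
                throw std::runtime_error("NewObject without a name");
            currentName = tokens[++i];
            objects[currentName];  // create an empty entry for this object
            insideObject = true;
        }
        else if (tok == "End")
        {
            if (!insideObject)
                throw std::runtime_error("End without NewObject");
            insideObject = false;
        }
        else
        {
            // Anything else must be the start of "key = value".
            if (!insideObject)
                throw std::runtime_error("'" + tok + "' outside of any object");
            if (i + 2 >= tokens.size() || tokens[i + 1] != "=")
                throw std::runtime_error("expected '" + tok + " = value'");
            objects[currentName][tok] = tokens[i + 2];
            i += 2;  // skip over "=" and the value
        }
    }

    // Reaching the end while still inside an object means an End is missing.
    if (insideObject)
        throw std::runtime_error("missing End for '" + currentName + "'");
    return objects;
}
```

Calling `ParseObjects` on the sample from the question should give one object named "MyObject" with three properties, while the same text without the final End should throw. A single bool is enough here because nesting is rejected; supporting nested objects would need a stack of open objects instead.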

Thanks guys for the quick answers. I think I'm beginning to understand now. Though, is there also pattern matching in C++? In Lua I would do something like:
string.match(str, "(.*) = (%d+),(%d+),(%d+)")
to match something like:
something = 1,2,3

Using templates you could create something like that, but it isn't an easy beginner task to write.

I don't believe there is something like that already in the standard library, but I'm not familiar with all the new C++11 and C++14 library extensions so I could be wrong. C++11 did add a regex library (#include <regex>).
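For what it's worth, the Lua pattern from the question translates fairly directly to `<regex>`. This is only a sketch: `Triple` and `MatchTriple` are made-up names, and it uses `std::regex_match`, which requires the whole line to match (Lua's `string.match` is unanchored, so `std::regex_search` would be the closer equivalent if that matters).

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the captured pieces of "name = 1,2,3".
struct Triple
{
    std::string name;
    int a = 0;
    int b = 0;
    int c = 0;
};

// Returns true and fills `out` if the whole line looks like "name = 1,2,3".
bool MatchTriple(const std::string& line, Triple& out)
{
    // Lua's %d becomes \d in the default ECMAScript grammar.
    static const std::regex pattern(R"((.*) = (\d+),(\d+),(\d+))");

    std::smatch match;
    if (!std::regex_match(line, match, pattern)) return false;

    out.name = match[1].str();
    out.a = std::stoi(match[2].str());  // std::stoi throws if the number is out of range
    out.b = std::stoi(match[3].str());
    out.c = std::stoi(match[4].str());
    return true;
}
```

Usage would be along the lines of `Triple t; if (MatchTriple(line, t)) { /* use t.name, t.a, t.b, t.c */ }`.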

If this is a part of something bigger, don't reinvent the wheel. Go get a JSON or YAML parser.

Why?

1. Toolchains: jq lets you pretty-print and query JSON files. Editors have modes to help you work on the files. There are syntax verifiers and so on.

2. Not finding all the bugs and edge-cases again. Someone's already worked out what to do when you try to put a JSON text into a JSON value.


C++ has regex now, but regex alone isn't enough to implement a parser for a programming language -- the basic shortcoming is that regex can't count things for itself, and counting tokens in one form or another is necessary (e.g. keeping track of matching brackets is a form of counting) in any programming language you'd want to use. Regex is sufficient for a more declarative syntax though -- say, key-value pairs for an initialization file.

Regex is a reasonable tool for tokenizing symbols though -- you just need to program the parsing/semantic analysis around that.
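As a small illustration of using regex only for the tokenizing step, something like the following could work. `Tokenize` is a hypothetical name and the token pattern is just a guess at what a simple config format needs (words or numbers, plus '=' and ','). Note that characters matching neither alternative are silently skipped here, which a real tokenizer would probably want to report.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split text into tokens: runs of letters/digits/underscores, or a single '=' or ','.
std::vector<std::string> Tokenize(const std::string& text)
{
    static const std::regex tokenPattern(R"([A-Za-z_0-9]+|[=,])");

    std::vector<std::string> tokens;
    // std::sregex_iterator walks over every non-overlapping match in order.
    for (std::sregex_iterator it(text.begin(), text.end(), tokenPattern), end; it != end; ++it)
    {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

The resulting token list can then be fed to whatever state-tracking or parsing code sits on top; the regex never has to know about nesting or matching keywords.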



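#include <fstream>
#include <string>

class MyObj
{
public:
    std::string myStr;
    int myInt;
    float myFloat;

    // Writes the fields in order, separated by spaces, one object per line.
    void WriteToStream(std::ofstream & file) const
    {
        file << myStr << ' ' << myInt << ' ' << myFloat << std::endl;
    }

    // Reads the fields back in the same order; operator>> skips whitespace.
    void ReadFromStream(std::ifstream & file)
    {
        file >> myStr >> myInt >> myFloat;
    }
};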

I think the simplest way to do this is as above. The first method uses a stream to output the class's fields in order, with a ' ' delimiter.

The second simply reads the same data back in. The stream handles the delimiter for you, since operator>> skips whitespace between values. Note that this only works as long as myStr itself contains no spaces.

You can write any number of these to the same file and read them in until eof.
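A rough sketch of the "read them in until eof" part, assuming the same space-separated layout. `Record` and `ReadAll` are stand-in names for this example rather than anything from the post, and the function takes a generic `std::istream` so it works with a file or a string stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for MyObj: one string, one int, one float per record.
struct Record
{
    std::string name;
    int count = 0;
    float value = 0.0f;
};

// Reads records until the stream runs out of data or a field fails to parse.
std::vector<Record> ReadAll(std::istream& in)
{
    std::vector<Record> records;
    Record r;
    // The extraction itself is the loop condition, so a failed or partial
    // read at the end of the file does not add a bogus record.
    while (in >> r.name >> r.count >> r.value)
    {
        records.push_back(r);
    }
    return records;
}
```

With a file this would be called as `std::ifstream file("objects.txt"); auto all = ReadAll(file);` (after also including `<fstream>`); for a quick check, a `std::istringstream` works the same way. Testing the extraction rather than `eof()` is the usual idiom, since `eof()` only becomes true after a read has already failed.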

Otherwise if you really need something more complex, then you probably need to be using xml or json.


If you want to be robust and well-designed, I recommend reading up on the following:

https://en.wikipedia.org/wiki/Recursive_descent_parser
https://en.wikipedia.org/wiki/LALR_parser

And examine the tools available to do most of the hard work for you:

https://en.wikipedia.org/wiki/Compiler-compiler
https://en.wikipedia.org/wiki/Category:Parser_generators

How can I make a text parser?


I hope you realize this is an entire branch of computer science?

Really, if you're not writing a text parser as an end in itself, choose to use a widespread common format (XML, YAML, JSON, INI) and use a library. There are many, all of which have been tested.

Otherwise, look into writing lexical analyzers, defining grammars built on the lexemes and using semantic insertion, and creating parsers for those grammars. There are handy tools for that, like boost::spirit (if it's still around).

Stephen M. Webb
Professional Free Software Developer

