
Simple parsing.

Started by February 20, 2015 02:09 PM
3 comments, last by Kain5056 9 years, 10 months ago

Sorry for the somewhat useless topic, but I'd like someone more experienced to tell me whether I'm going about this the right way.

I'm trying to parse very simple commands from an external text file using a vector of char arrays and an std::ifstream, but the approach seems a little clunky to me, so I have doubts about it.

In this test program, I'm trying to get the program to recognize and react to newlines and a smiley face.

This is the text file:

data.txt


Hello, I am a text file! SMILEY And this is a test!
SMILEY

SMILEY A very happy test! SMILEY

...And this is the program I test the method with:

main.cpp


#include <cstddef>
#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>

// fixed-size buffer holding one word from the file
struct character { char ch[10]; };

int main()
{
    std::vector<character> word_list;
    character ch_main;

    // read whitespace-separated words; after each word, record a "LINE"
    // marker if the very next character is a newline
    std::ifstream infile( "data.txt" );
    while( infile >> ch_main.ch )
    {
        word_list.push_back( ch_main );
        if( infile.peek() == '\n' )
        {
            character ch_temp;
            std::strcpy( ch_temp.ch , "LINE" );
            word_list.push_back( ch_temp );
        }
    }

    // print the words back, turning "LINE" into a line break and "SMILEY" into :-)
    for( std::size_t i = 0 ; i < word_list.size() ; i++ )
    {
        if( !std::strcmp( word_list[i].ch , "LINE" ) ) std::cout << std::endl;
        else if( !std::strcmp( word_list[i].ch , "SMILEY" ) ) std::cout << ":-) ";
        else
        {
            std::cout << word_list[i].ch;
            std::cout << " ";
        }
    }

    std::cin.ignore();
    return 0;
}

My intention is to find a simple method for reading external files of component lists for a component-based entity system. I have something like this in mind:

enemy behavior.txt


walk : looking_direction , speed 10 , seconds 2
shoot : bullet.txt , looking_direction , times 3
turn_around

Bear in mind I have zero experience with component systems, so I might be going about the whole thing completely wrong.

Thank you in advance. :-)

If your original intention is to dive into component systems, and you don't want to learn parser development because the parser is just a tool you have to create, I'd suggest saving time and using something existing, for example: http://www.grinninglizard.com/tinyxml/

Otherwise your parsing might become more and more complex. You might want to not only read the files but also write them, in a way that keeps hand-written and program-written content compatible. You might want more and more flexible formatting to keep your scripts maintainable. You might want tools that help you edit the files (e.g. hiding big branches of the script file). You might want recursive parsing (e.g. an enemy might have a weapon, and you might want to instantiate that weapon directly by passing it the corresponding subtree of the parsed file).

It might really pay off to switch to something existing.

You can create a class that manages the parsing of the input file stream, instead of hardcoding it in standalone functions.

The class holds the stream object and provides functions that consume the stream, such as float ReadFloat(), unsigned int ReadUnsignedInt(), ReadByte(), ReadFloatArray(...), etc.

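A minimal sketch of such a class, assuming whitespace-separated text input (not a binary format). The class name `StreamReader` and its `ReadWord` method are hypothetical; the other method names follow the ones suggested above. It works with any std::istream, so the same code reads from an std::ifstream or an std::istringstream.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper that owns the parsing of one input stream, so the rest
// of the program asks for typed values instead of handling raw characters.
class StreamReader
{
public:
    explicit StreamReader(std::istream& in) : in_(in) {}

    float ReadFloat() { return Read<float>("float"); }
    unsigned int ReadUnsignedInt() { return Read<unsigned int>("unsigned int"); }
    std::string ReadWord() { return Read<std::string>("word"); }

    // Reads a count followed by that many floats, e.g. "3 1.0 2.5 4".
    std::vector<float> ReadFloatArray()
    {
        unsigned int count = ReadUnsignedInt();
        std::vector<float> values;
        values.reserve(count);
        for (unsigned int i = 0; i < count; ++i)
            values.push_back(ReadFloat());
        return values;
    }

private:
    // Extracts one whitespace-delimited value; throws if it is missing or malformed.
    template <typename T>
    T Read(const char* what)
    {
        T value{};
        if (!(in_ >> value))
            throw std::runtime_error(std::string("expected ") + what);
        return value;
    }

    std::istream& in_;
};
```

Throwing on a bad value keeps the calling code short: a loader can read a whole record in sequence and catch a single std::runtime_error if the file is malformed.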

I second the suggestion to use an existing system for this, be that XML or JSON or something similar where you can get a library to do the heavy lifting.

If you're set on doing it yourself (and hey, that's fun too :) ), here are some suggestions to start with:

First, don't use char arrays; use std::string or even a vector of chars. Those grow as needed, so they won't have the buffer overrun problem your example program has (what if I type in a word longer than 9 characters? You'll probably crash, if you're lucky).
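For instance, the reading loop from the original program could use std::string instead. This is a sketch: the function name `ReadWords` is made up, and it takes any std::istream so it works with the std::ifstream from the question. It keeps the original "LINE" marker behaviour.

```cpp
#include <istream>
#include <sstream>   // for std::istringstream in the usage example
#include <string>
#include <vector>

// Reads whitespace-separated words into std::string, which grows as needed,
// so an unusually long word can no longer overrun a fixed-size buffer.
// Like the original program, it records a "LINE" marker whenever a word is
// immediately followed by a newline.
std::vector<std::string> ReadWords(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
    {
        words.push_back(word);
        if (in.peek() == '\n')
            words.push_back("LINE");
    }
    return words;
}
```

Usage: `std::ifstream infile("data.txt"); auto words = ReadWords(infile);` or, for quick tests, `std::istringstream src("Hello SMILEY\n"); auto words = ReadWords(src);`.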

Second, just read in your text file in its entirety. You'll have much more flexibility for look-ahead (and look-behind) if you're dealing with a vector of characters or a string. Unless you expect to type up several megabytes of text in these files, loading it all at once shouldn't cause memory issues.
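One common way to load a whole file into a std::vector<char> is to copy it through std::istreambuf_iterator. A sketch; the names `ReadAll` and `ReadFile` are hypothetical:

```cpp
#include <fstream>
#include <istream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Copies every remaining character of the stream into a vector.
// std::istreambuf_iterator reads raw characters, so spaces and newlines are kept.
std::vector<char> ReadAll(std::istream& in)
{
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Convenience wrapper for a file on disk. The file is opened in text mode,
// so on Windows "\r\n" line endings arrive as '\n'.
std::vector<char> ReadFile(const std::string& path)
{
    std::ifstream file(path);
    if (!file)
        throw std::runtime_error("could not open " + path);
    return ReadAll(file);
}
```

With the whole file in memory, looking ahead or behind is just indexing into the vector, which is what a tokenizer like the one below needs.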

Third, familiarize yourself with "tokenizing". Comparing strings is slow compared with comparing integers, so you don't want to do it more than you have to. You can cut down on it by turning your text file into a stream of tokens, each tagged with a type you can check cheaply. Example:


#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenType
{
  Word,
  Smiley,
  EndOfInput
};

struct Token
{
  TokenType type;
  std::string text;

  Token(): type(TokenType::EndOfInput), text("") {}
};

enum class TokenizerState
{
  SkipWhitespace,
  ReadToken
};

// Returns the token starting at aCurPosition and advances aCurPosition past it.
Token GetNextToken(const std::vector<char>& aInput, std::size_t& aCurPosition)
{
  Token retVal;
  TokenizerState state = TokenizerState::SkipWhitespace;
  while (aCurPosition < aInput.size())
  {
    char curChar = aInput[aCurPosition];
    bool isSpace = std::isspace(static_cast<unsigned char>(curChar)) != 0;
    if (state == TokenizerState::SkipWhitespace)
    {
      if (isSpace)
        ++aCurPosition;                     // skip the space and keep looking
      else
        state = TokenizerState::ReadToken;  // token starts here; don't consume yet
    }
    else
    {
      if (isSpace)
        break;                              // token complete; leave the space for next time
      retVal.text += curChar;               // part of the token: keep it and advance
      ++aCurPosition;
    }
  }

  // now retVal.text has the token text we need to classify
  if (retVal.text.empty())
    return retVal;                          // end of input: type stays EndOfInput
  if (retVal.text == "SMILEY")
    retVal.type = TokenType::Smiley;
  else
    retVal.type = TokenType::Word;
  return retVal;
}

std::vector<Token> ReadInput(const std::vector<char>& aInput)
{
  std::vector<Token> tokenList;
  std::size_t position = 0;
  Token nextToken = GetNextToken(aInput, position);
  while (nextToken.type != TokenType::EndOfInput)
  {
    tokenList.push_back(nextToken);
    nextToken = GetNextToken(aInput, position);
  }
  return tokenList;
}
(It should be trivial to expand the example to give newlines their own token if you want. Otherwise a newline is just whitespace between words, so the line structure your smiley example kept with its "LINE" marker is lost.)
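One way to do that, sketched under a few assumptions: a new `Newline` token type (hypothetical, not part of the example above), all other whitespace still skipped via `std::isspace`, and the two-state machine collapsed into two plain loops for brevity. `Tokenize` is a made-up name standing in for `ReadInput`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenType { Word, Smiley, Newline, EndOfInput };

struct Token
{
    TokenType type = TokenType::EndOfInput;
    std::string text;
};

Token GetNextToken(const std::vector<char>& input, std::size_t& pos)
{
    // Skip whitespace, but stop at '\n' because it is now a token of its own.
    while (pos < input.size() && input[pos] != '\n' &&
           std::isspace(static_cast<unsigned char>(input[pos])))
        ++pos;

    Token token;
    if (pos >= input.size())
        return token;                                   // EndOfInput

    if (input[pos] == '\n')
    {
        ++pos;
        token.type = TokenType::Newline;
        token.text = "\n";
        return token;
    }

    // Collect characters up to the next whitespace (including '\n').
    while (pos < input.size() && !std::isspace(static_cast<unsigned char>(input[pos])))
        token.text += input[pos++];

    token.type = (token.text == "SMILEY") ? TokenType::Smiley : TokenType::Word;
    return token;
}

std::vector<Token> Tokenize(const std::vector<char>& input)
{
    std::vector<Token> tokens;
    std::size_t pos = 0;
    for (Token t = GetNextToken(input, pos); t.type != TokenType::EndOfInput;
         t = GetNextToken(input, pos))
        tokens.push_back(t);
    return tokens;
}
```

Because '\r' still counts as ordinary whitespace, files with Windows line endings produce the same tokens.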

Now you'll have a vector of tokens, and you can check each token's type (a cheap integer comparison rather than a string comparison) to decide what to do with it. You only need to look at a token's text when you actually want it.
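As an illustration, here is the output loop from the original program rewritten against a token list: it switches on the type and only touches the text for plain words. The `Token` and `TokenType` definitions are minimal stand-ins for the ones in the tokenizer example, and `Render` is a hypothetical name; it builds a string instead of printing so the result is easy to check.

```cpp
#include <string>
#include <vector>

// Minimal stand-ins for the types from the tokenizer example.
enum class TokenType { Word, Smiley, EndOfInput };

struct Token
{
    TokenType type;
    std::string text;
};

// Decides what to do with each token by looking only at its type;
// the text is read only for plain words that get echoed back.
std::string Render(const std::vector<Token>& tokens)
{
    std::string out;
    for (const Token& token : tokens)
    {
        switch (token.type)
        {
        case TokenType::Smiley:
            out += ":-) ";
            break;
        case TokenType::Word:
            out += token.text + " ";
            break;
        case TokenType::EndOfInput:
            break;                  // never stored in the list; nothing to print
        }
    }
    return out;
}
```

Printing the result with `std::cout << Render(tokens);` reproduces what the original loop did for words and smileys.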

To do something more complex than your smiley example, you'll want to look into "parsing" and "abstract syntax trees", which group your token list into trees that can be traversed to do what you like.
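As a small step in that direction, here is a sketch that turns the behaviour file from the original question into a very shallow tree: a list of commands, each with its arguments. It assumes one command per line, ':' separating the name from its arguments, ',' separating arguments, and each argument being a key with an optional value. The names `Command`, `Argument`, `Trim` and `ParseBehavior` are hypothetical. For brevity it splits lines with `std::getline` instead of going through a token stream; that is enough for this flat format but would not cope with nesting, which needs a real grammar.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One "key value" pair such as "speed 10"; value is empty for bare words like "looking_direction".
struct Argument
{
    std::string key;
    std::string value;
};

// One line of the file: a command name and its (possibly empty) argument list.
struct Command
{
    std::string name;
    std::vector<Argument> args;
};

// Removes leading and trailing spaces, tabs and carriage returns.
std::string Trim(const std::string& s)
{
    const char* ws = " \t\r";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parses lines of the form  name [: arg , arg ...]  where each arg is "key [value]".
std::vector<Command> ParseBehavior(std::istream& in)
{
    std::vector<Command> commands;
    std::string line;
    while (std::getline(in, line))
    {
        std::size_t colon = line.find(':');
        Command cmd;
        cmd.name = Trim(line.substr(0, colon));    // whole line if there is no ':'
        if (cmd.name.empty())
            continue;                              // skip blank lines
        if (colon != std::string::npos)
        {
            std::istringstream argList(line.substr(colon + 1));
            std::string piece;
            while (std::getline(argList, piece, ','))
            {
                std::istringstream words(piece);
                Argument arg;
                words >> arg.key >> arg.value;     // value stays empty if absent
                if (!arg.key.empty())
                    cmd.args.push_back(arg);
            }
        }
        commands.push_back(cmd);
    }
    return commands;
}
```

For the three example lines this yields `walk` with `looking_direction`, `speed 10` and `seconds 2`, then `shoot` with three arguments, then `turn_around` with none; the entity system could then decide what to create based on `cmd.name`.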

If you're really wanting to get into full blown parsing and compilation, then you'll want to look into tools like ANTLR.

Thank you very much, SmkViper, this is exactly what I wanted. :-) As Krypt0n also suggested, I will eventually look into existing parsers, but for the time being I'd like to play with this subject a bit myself, at least for small test programs.

Thanks again. :-)

