.obj file text parsing

Started by October 25, 2017 05:55 AM
6 comments, last by matt77hias 7 years, 1 month ago

Hi,

I'm trying to load an .obj model into OpenGL, and I need to parse the file. Here is a sample of the file contents:
v 0.000000 3.897879 -1.378193
v -0.309017 3.856391 -1.352228

vt 0.314607 0.256292
vt 0.226357 0.365001

vn -0.156435 0.837235 -0.523990
vn -0.309017 0.806183 -0.504556

f 13/3/45 14/6/46 24/29/47
f 24/29/47 14/6/46 25/31/48

For the v / vt / vn lines, the source code is:


    std::vector<glm::vec3> vertex;
    std::vector<glm::vec2> texture;
    std::vector<glm::vec3> normals;

    std::string lineStr;

    // Loop on getline itself so a failed read never reprocesses a stale line.
    while (std::getline(inf, lineStr))
    {
        //std::cout << "LINESTR: " << lineStr << std::endl;

        // Check the length first so blank lines are never indexed at [1].
        if (lineStr.size() > 1 && lineStr[0] == 'v' && lineStr[1] == ' ')
        {
            std::stringstream os;
            std::string unused;
            double posx, posy, posz;
            os << lineStr;
            os >> unused >> posx >> posy >> posz;
            std::cout << " Pos X: " << posx << " Pos Y: " << posy << " Pos Z: " << posz << std::endl;

            vertex.push_back(glm::vec3(posx, posy, posz));
        }

        // The same source code is for vt and vn
    }

It seems to work, but I have some problems parsing the faces. I don't know exactly how to do it. I could use a lot of "if" statements, but I'm not sure that would be a good idea.
Any advice? :P

Thank you!

I used a set of switch statements in my obj parser, like this:


//based on graphics API (GL/VK) defined GHandle
GHandle Asset::ObjRead(IDataReader& stream)
{
   ...
   while(!stream.Eof())
   {
       String::Skip(stream, " \t\r\n");
       switch(stream.Peek())
       {
           case '#': //skip comment  
           case 'v':
           {
               stream.Get();
               switch(stream.Peek())
               {
                   case 't': //process texcoord
                   case 'n': //process normal
                   default: //process vertex
               }
           }
           break;
           case 'f': //process face group
           case 'g': //process group
           case 'o': //process object
           case 'u':
           {
               ...
               if(bufferLength == 6 && memcmp(buffer, "usemtl", 6) == 0) //process material use
           }
           break;
           case 'm':
           {
               ...
               if(bufferLength == 6 && memcmp(buffer, "mtllib", 6) == 0) //process material include
           }
           break;
       }
   }
   ...
     
   return vboHandle;
}

My engine relies heavily on streaming rather than on whole text arrays, so the parser is set up to be stream-optimized. It reads as few strings as possible, except where they are needed (for those material tags and group/object names, for example), and it uses a strict string comparison that exits early as soon as possible.
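
As a rough illustration of the switch-based dispatch described above, here is a minimal sketch built on standard streams instead of the engine's own IDataReader. The `ObjCounts` struct and the `CountObjStatements` function are hypothetical names invented for this example, and the code only counts statement kinds rather than filling any buffers.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical tally of statement kinds; not part of any real API.
struct ObjCounts {
    int positions = 0, texcoords = 0, normals = 0, faces = 0, other = 0;
};

ObjCounts CountObjStatements(std::istream& in) {
    ObjCounts counts;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string keyword;
        if (!(ls >> keyword)) continue;      // blank line: nothing to dispatch
        assert(!keyword.empty());            // a successful >> yields at least one char
        switch (keyword[0]) {
            case '#':                        // comment: skip
                break;
            case 'v':                        // v, vt or vn
                if (keyword == "v") ++counts.positions;
                else if (keyword == "vt") ++counts.texcoords;
                else if (keyword == "vn") ++counts.normals;
                else ++counts.other;
                break;
            case 'f':                        // face
                if (keyword == "f") ++counts.faces;
                else ++counts.other;
                break;
            default:                         // g, o, usemtl, mtllib, ...
                ++counts.other;
                break;
        }
    }
    return counts;
}
```

Switching on the first character keeps the common cases (v, vt, vn, f) cheap, and full string comparisons are only needed to tell apart keywords that share a first letter.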


I would personally read 'word' by 'word', letting spaces and newlines be the separators.

For the faces, I would simply read an integer, a char, an integer, a char and an integer again, so with something like this:

 


	int vert1,vert2,vert3; // the three indices of one "v/vt/vn" group
	char dump; // for the slash
	ifs >> vert1 >> dump >> vert2 >> dump >> vert3; // repeat once per face corner
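
The integer/char/integer pattern above can be extended to a whole face line. The following is only a sketch: `Corner` and `ParseTriangle` are hypothetical names, and it assumes triangles written in the full v/vt/vn form, so it rejects every other layout.

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical container for the indices of one face corner.
struct Corner { int v = 0, vt = 0, vn = 0; };

// Parses "f v/vt/vn v/vt/vn v/vt/vn" only; returns false for any other shape.
bool ParseTriangle(const std::string& line, std::array<Corner, 3>& out) {
    std::istringstream ls(line);
    std::string keyword;
    if (!(ls >> keyword) || keyword != "f") return false;
    for (Corner& c : out) {
        char slash1 = 0, slash2 = 0;
        // int, char, int, char, int: the chars swallow the two slashes.
        if (!(ls >> c.v >> slash1 >> c.vt >> slash2 >> c.vn)) return false;
        if (slash1 != '/' || slash2 != '/') return false;
    }
    return true;
}
```

Checking that the two separator characters really are slashes catches lines such as `f 1 2 3`, which would otherwise be misread.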
	

I already did something. I didn't think to use a char for the slash; that would have been easier. :P Instead, I replaced the slashes in the line with spaces.
Here is my code. :P

 


if (lineStr[0] == 'f')
		{
			std::cout << "LINE STR FACE: " << lineStr << std::endl;

			// Replace every slash with a space so operator>> can split the indices.
			for (std::size_t i = 0; i < lineStr.length(); i++)
			{
				if (lineStr[i] == '/')
				{
					lineStr[i] = ' ';
				}
			}

			std::stringstream os;
			std::string unused;

			os << lineStr;

			unsigned int v1, t1, n1, v2, t2, n2, v3, t3, n3;

			os >> unused >> v1 >> t1 >> n1 >> v2 >> t2 >> n2 >> v3 >> t3 >> n3;

			//...
		}

 

That certainly works. However, it's not the most efficient way to solve this, since you essentially process the string twice. You could use a combination of peek(), ignore() and operator>> to do this better. You also have to consider the different combinations of indices. There are 4 possible ways a face vertex can be given:

  1. f v v v ...
  2. f v/vt v/vt v/vt ...
  3. f v//vn v//vn v//vn ...
  4. f v/vt/vn v/vt/vn v/vt/vn ...

You can assume that a vertex position index is always given, and then check with peek() whether the next character is a slash. This way you can distinguish between the different cases. If it's a slash, just use ignore() to skip it. Also, if you want to support even more possibilities, you have to consider negative indices, which count backwards from the most recently defined element. For more in-depth information about the format, you can read the relevant parts on http://paulbourke.net/dataformats/obj/ and http://paulbourke.net/dataformats/mtl/
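
The peek()/ignore() approach could look roughly like the sketch below. `FaceVertex`, `ReadFaceVertex` and `ResolveIndex` are hypothetical names chosen for this example, and using 0 to mean "index not present" is an assumption that works because OBJ indices are 1-based.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical result type: 0 means "index not present".
struct FaceVertex { int v = 0, vt = 0, vn = 0; };

// Reads one of: v, v/vt, v//vn, v/vt/vn. Returns false on malformed input.
bool ReadFaceVertex(std::istream& in, FaceVertex& out) {
    out = FaceVertex{};
    if (!(in >> out.v)) return false;        // the position index is mandatory
    if (in.peek() != '/') return true;       // form 1: v
    in.ignore();                             // skip the first '/'
    if (in.peek() != '/') {
        if (!(in >> out.vt)) return false;   // forms 2 and 4: texcoord present
        if (in.peek() != '/') return true;   // form 2: v/vt
    }
    in.ignore();                             // skip the second '/'
    return static_cast<bool>(in >> out.vn);  // forms 3 and 4: normal present
}

// Converts a 1-based or negative (relative) OBJ index into a 0-based array index.
// `count` is the number of elements of that kind read so far.
// Returns -1 if the index is absent (0) or out of range.
int ResolveIndex(int index, int count) {
    assert(count >= 0);
    if (index > 0 && index <= count) return index - 1;
    if (index < 0 && -index <= count) return count + index;
    return -1;
}
```

Calling `ReadFaceVertex` in a loop on a line stream yields every corner of a face, so polygons with more than three vertices need no special handling. Note that peek() at the end of the stream puts the stream into a failed state, which simply ends such a loop on the next call.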

Here is what I did for parsing a single vertex:
   


 template < typename VertexT >
 const XMUINT3 OBJReader< VertexT >::ReadOBJVertexIndices() {
     
     const char *token = ReadChars();

     U32 vertex_index = 0;
     U32 texture_index = 0;
     U32 normal_index = 0;

     if (str_contains(token, "//")) {
       // v1//vn1
       const char *index_end = strchr(token, '/');
       if (StringToU32(token, index_end, vertex_index) == TokenResult::Invalid) {
         throw FormattedException(
           "%ls: line %u: invalid vertex index value found in %s.", 
           GetFilename().c_str(), GetCurrentLineNumber(), token);
       }
       if (StringToU32(index_end + 2, normal_index) == TokenResult::Invalid) {
         throw FormattedException(
           "%ls: line %u: invalid normal index value found in %s.", 
           GetFilename().c_str(), GetCurrentLineNumber(), token);
       }
     }
     else if (str_contains(token, '/')) {
       // v1/vt1 or v1/vt1/vn1
       const char *index_end = strchr(token, '/');
       if (StringToU32(token, index_end, vertex_index) == TokenResult::Invalid) {
         throw FormattedException(
           "%ls: line %u: invalid vertex index value found in %s.", 
           GetFilename().c_str(), GetCurrentLineNumber(), token);
       }

       if (str_contains(index_end + 1, '/')) {
         const char *texture_end = strchr(index_end + 1, '/');
         if (StringToU32(index_end + 1, texture_end, texture_index) == TokenResult::Invalid) {
           throw FormattedException(
             "%ls: line %u: invalid texture index value found in %s.", 
             GetFilename().c_str(), GetCurrentLineNumber(), token);
         }
         if (StringToU32(texture_end + 1, normal_index) == TokenResult::Invalid) {
           throw FormattedException(
             "%ls: line %u: invalid normal index value found in %s.", 
             GetFilename().c_str(), GetCurrentLineNumber(), token);
         }
       }
       else if (StringToU32(index_end + 1, texture_index) == TokenResult::Invalid) {
         throw FormattedException(
           "%ls: line %u: invalid texture index value found in %s.", 
           GetFilename().c_str(), GetCurrentLineNumber(), token);
       }
     }
     else if (StringToU32(token, vertex_index) == TokenResult::Invalid) {
       throw FormattedException(
         "%ls: line %u: invalid vertex index value found in %s.", 
         GetFilename().c_str(), GetCurrentLineNumber(), token);
     }

     return XMUINT3(vertex_index, texture_index, normal_index);
   }

Full OBJ/MTL has some difficulties for quick scanning due to optional tokens (especially MTL). What I basically did for all my ANSI file formats is to provide, for each basic type, one method for reading and one for checking what lies beyond the read head without advancing it. That way, lexing and parsing are quite easy and result in small code blocks. Unfortunately, OBJ face definitions needed to be different for some reason, so only for that method do I have such a giant code blob.
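
To illustrate the read/check pairing described above, here is a minimal sketch, not the actual reader from my code base. The `TokenReader` class and its method names are hypothetical, and only the integer and string cases are shown.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical reader over an in-memory text buffer.
class TokenReader {
public:
    explicit TokenReader(std::string text) : text_(std::move(text)) {}

    // True if the next token parses completely as an integer; does not advance.
    bool HasInt() const {
        std::size_t begin = 0, end = 0;
        if (!NextToken(begin, end)) return false;
        const std::string token = text_.substr(begin, end - begin);
        char* stop = nullptr;
        std::strtol(token.c_str(), &stop, 10);
        return stop == token.c_str() + token.size();
    }

    // Reads the next token as an integer and advances past it.
    // Precondition: HasInt() returned true.
    long ReadInt() {
        assert(HasInt());
        std::size_t begin = 0, end = 0;
        NextToken(begin, end);
        pos_ = end;
        return std::strtol(text_.substr(begin, end - begin).c_str(), nullptr, 10);
    }

    // Reads the next token verbatim and advances; returns "" at end of input.
    std::string ReadString() {
        std::size_t begin = 0, end = 0;
        if (!NextToken(begin, end)) return std::string();
        pos_ = end;
        return text_.substr(begin, end - begin);
    }

private:
    // Locates the next token after the read head without moving the head.
    bool NextToken(std::size_t& begin, std::size_t& end) const {
        begin = text_.find_first_not_of(" \t\r\n", pos_);
        if (begin == std::string::npos) return false;
        end = text_.find_first_of(" \t\r\n", begin);
        if (end == std::string::npos) end = text_.size();
        return true;
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

With such a pair, an optional token reduces to a single `if (reader.HasInt())` before the read, which is what keeps the per-statement parsing code short.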

🧙

2 hours ago, _Silence_ said:

I would personally read 'word' by 'word', letting spaces and newlines be the separators.

In most of these ANSI file formats you can use " \t\n\r" as your string of delimiter characters. (Note the tab character which you didn't mention but is quite common.) Load the file in memory, parse line by line (e.g. C's fgets) and tokenize the line (e.g. C's strtok_s).
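
For a C++ code base, the same tokenization could be sketched with std::string instead of strtok_s, whose signature differs between MSVC and C11 Annex K. `Tokenize` is a hypothetical helper name; the default delimiter string is the one given above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a line into tokens separated by any run of delimiter characters.
std::vector<std::string> Tokenize(const std::string& line,
                                  const std::string& delimiters = " \t\n\r") {
    std::vector<std::string> tokens;
    std::size_t begin = line.find_first_not_of(delimiters);
    while (begin != std::string::npos) {
        std::size_t end = line.find_first_of(delimiters, begin);
        if (end == std::string::npos) end = line.size();
        tokens.push_back(line.substr(begin, end - begin));
        begin = line.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

Including '\r' in the delimiters means files with Windows line endings do not leave a stray carriage return on the last token of each line.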

🧙
