In this article we will look at the most common motion capture format: BVH. BVH is an acronym that stands for BioVision Hierarchical data and is used for storing motion capture data. It is simple and easy to understand. We will write a simple class that can load, display and play data from the file.
Codebase now available:
https://github.com/edin-m/gamedev-bvh-loader-article
BVH format
More about the format can be found at these two links: http://www.cs.wisc.edu/graphics/Courses/cs-838-1999/Jeff/BVH.html and http://www.dcs.shef.ac.uk/intranet/research/resmes/CS0111.pdf
Basically, it has two parts: HIERARCHY and MOTION. As the names suggest, those two parts contain just that: hierarchies of skeletons and motion data. Inside the hierarchy part we have a description of skeletons. Even though the format permits multiple skeleton definitions, a file will rarely contain more than one.
A skeleton is made of bones, and bones are in turn defined by joints, so we define a skeleton by defining its joints. But if an elbow joint is the child of a shoulder joint, how do we know the length of the upper arm? By defining an offset. Let's look at an example:
HIERARCHY
ROOT Hips
{
    OFFSET 0.00 0.00 0.00
    CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
    JOINT Chest
    {
        OFFSET 5.00 0.00 0.00
        CHANNELS 3 Zrotation Xrotation Yrotation
        End Site
        {
            OFFSET 0.00 5.00 0.00
        }
    }
    JOINT Leg
    {
        OFFSET -5.0 0.0 0.0
        CHANNELS 3 Zrotation Xrotation Yrotation
        End Site
        {
            OFFSET 0.0 5.0 0.0
        }
    }
}
MOTION
Frames: 2
Frame Time: 0.033333
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 45.00 0.00 0.00 0.00 0.00
The first joint of the hierarchy is the root joint, so it is defined with the keyword ROOT. Every joint that descends from it is defined with the JOINT keyword followed by the joint name.
End Site joints are special: they have no children and no name. A joint contains OFFSET and CHANNELS. The offset is the joint's position relative to its parent, so it also gives the length of the bone between the two.
Most commonly, a ROOT joint will have an offset of (0, 0, 0) (the x, y and z components). The CHANNELS line gives the number of channels, followed by the names of the channels for which the MOTION part contains animation data, in the order the values appear. Again, the most common case is a ROOT joint with 6 channels (xyz position and zxy rotation) while every other joint has 3.
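To make the CHANNELS line concrete, here is a small sketch that splits one such line into its channel names. The function parse_channels() is a hypothetical helper for illustration only; it is not part of the loader in this article, which reads the channels token by token instead.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the article's Bvh class).
// Splits one CHANNELS line into its channel names. Returns an empty vector
// if the line does not start with CHANNELS or lists fewer names than the
// announced count.
std::vector<std::string> parse_channels(const std::string& line)
{
    std::istringstream in(line);
    std::string keyword;
    std::size_t count = 0;

    if (!(in >> keyword >> count) || keyword != "CHANNELS")
        return {};

    std::vector<std::string> names;
    std::string name;
    while (names.size() < count && in >> name)
        names.push_back(name);

    if (names.size() != count)
        names.clear(); // the line was truncated

    return names;
}
```

For the Chest joint above, parse_channels("CHANNELS 3 Zrotation Xrotation Yrotation") yields the three names in file order, which is also the order in which the rotations have to be applied later.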
End Site joints don't have animation data so they don't need a CHANNELS line. They only have an OFFSET, which tells us the length of the last bone. The MOTION part starts with two lines: Frames: gives the number of frames, and Frame Time: gives the duration of one frame in seconds (so the frame rate is FPS = 1.0 / frame_time). After them comes one line per frame, holding one float value for every joint/channel combination, in the same order the joints and channels were specified in the hierarchy part, from top to bottom.
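The layout described above maps naturally onto one flat float array. The following sketch shows the arithmetic; the three function names are made up for this illustration and do not appear in the loader.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not part of the article's Bvh class).

// Position of one channel value inside a flat array that stores the
// MOTION section frame after frame.
std::size_t motion_index(std::size_t frame, std::size_t channel,
                         std::size_t channels_per_frame)
{
    assert(channel < channels_per_frame);
    return frame * channels_per_frame + channel;
}

// Playback rate in frames per second for a given "Frame Time:" value.
double frames_per_second(double frame_time)
{
    assert(frame_time > 0.0);
    return 1.0 / frame_time;
}

// Frame to show after elapsed_seconds of playback, looping at the end.
std::size_t frame_at(double elapsed_seconds, double frame_time,
                     std::size_t num_frames)
{
    assert(frame_time > 0.0 && num_frames > 0);
    return static_cast<std::size_t>(elapsed_seconds / frame_time) % num_frames;
}
```

In the sample file there are 6 + 3 + 3 = 12 channels per frame, so the value 45.00 (second frame, Xrotation of Chest, channel 7 counting from zero) sits at motion_index(1, 7, 12), which is 19.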
The example is dull and is almost all zeroes, but you get the point. Once we have the loader you can change the values and play with it. Interpreting MOTION and actually changing joint positions is described later on. First, we'll do the loading.
Code
We will define a few structures we'll need for storing data:
#include <vector>
#include <glm/glm.hpp>

// channel flags, one bit per channel type
#define Xposition 0x01
#define Yposition 0x02
#define Zposition 0x04
#define Zrotation 0x10
#define Xrotation 0x20
#define Yrotation 0x40

typedef struct { float x, y, z; } OFFSET;

typedef struct JOINT JOINT;
struct JOINT
{
    const char* name = NULL;          // joint name
    JOINT* parent = NULL;             // joint parent
    OFFSET offset;                    // offset data
    unsigned int num_channels = 0;    // num of channels joint has
    short* channels_order = NULL;     // ordered list of channels
    std::vector<JOINT*> children;     // joint's children
    glm::mat4 matrix;                 // local transformation matrix (premultiplied with parent's)
    unsigned int channel_start = 0;   // index of joint's channel data in motion array
};

typedef struct { JOINT* rootJoint; int num_channels; } HIERARCHY;

// a named struct, because the members carry default initializers
struct MOTION
{
    unsigned int num_frames = 0;              // number of frames
    unsigned int num_motion_channels = 0;     // number of motion channels
    float* data = NULL;                       // motion float data array
    unsigned* joint_channel_offsets = NULL;   // number of channels from beginning of hierarchy for i-th joint
};
Most of these parameters are self-explanatory. For each joint we need a list of children, local transformation matrix and channel order at least.
Bvh class
class Bvh
{
    JOINT* loadJoint(std::istream& stream, JOINT* parent = NULL);
    void loadHierarchy(std::istream& stream);
    void loadMotion(std::istream& stream);
public:
    Bvh();
    ~Bvh();

    // loading
    void load(const std::string& filename);

    /** Loads motion data from a frame into local matrices */
    void moveTo(unsigned frame);

    const JOINT* getRootJoint() const { return rootJoint; }
    unsigned getNumFrames() const { return motionData.num_frames; }
private:
    JOINT* rootJoint;
    MOTION motionData;
};
This is a simple class and below are functions for loading:
void Bvh::load(const std::string& filename)
{
    std::fstream file;
    file.open(filename.c_str(), std::ios_base::in);

    if( file.is_open() )
    {
        std::string line;

        // read token by token until the HIERARCHY keyword shows up
        while( file >> line )
        {
            if( trim(line) == "HIERARCHY" )
            {
                loadHierarchy(file);
                break;
            }
        }

        file.close();
    }
}

void Bvh::loadHierarchy(std::istream& stream)
{
    std::string tmp;

    // stop as soon as a read fails instead of reusing a stale token
    while( stream >> tmp )
    {
        if( trim(tmp) == "ROOT" )
            rootJoint = loadJoint(stream);
        else if( trim(tmp) == "MOTION" )
            loadMotion(stream);
    }
}
JOINT* Bvh::loadJoint(std::istream& stream, JOINT* parent)
{
    JOINT* joint = new JOINT;
    joint->parent = parent;

    // load joint name; the string is heap-allocated (and never freed here)
    // so that the const char* stored in the joint stays valid
    std::string* name = new std::string;
    stream >> *name;
    joint->name = name->c_str();

    std::string tmp;

    // setting local matrix to identity
    joint->matrix = glm::mat4(1.0);

    // running channel index across all joints; being static it is never
    // reset, so this loader handles only one file per program run
    static int _channel_start = 0;
    unsigned channel_order_index = 0;

    while( stream >> tmp )
    {
        tmp = trim(tmp);

        // loading channels
        char c = tmp.at(0);
        if( c == 'X' || c == 'Y' || c == 'Z' )
        {
            if( tmp == "Xposition" )
            {
                joint->channels_order[channel_order_index++] = Xposition;
            }
            if( tmp == "Yposition" )
            {
                joint->channels_order[channel_order_index++] = Yposition;
            }
            if( tmp == "Zposition" )
            {
                joint->channels_order[channel_order_index++] = Zposition;
            }
            if( tmp == "Xrotation" )
            {
                joint->channels_order[channel_order_index++] = Xrotation;
            }
            if( tmp == "Yrotation" )
            {
                joint->channels_order[channel_order_index++] = Yrotation;
            }
            if( tmp == "Zrotation" )
            {
                joint->channels_order[channel_order_index++] = Zrotation;
            }
        }

        if( tmp == "OFFSET" )
        {
            // reading the offset values
            stream >> joint->offset.x >> joint->offset.y >> joint->offset.z;
        }
        else if( tmp == "CHANNELS" )
        {
            // loading num of channels
            stream >> joint->num_channels;

            // adding to motiondata
            motionData.num_motion_channels += joint->num_channels;

            // this joint's values start here within each frame of motion data
            joint->channel_start = _channel_start;
            _channel_start += joint->num_channels;

            // creating array for channel order specification
            joint->channels_order = new short[joint->num_channels];
        }
        else if( tmp == "JOINT" )
        {
            // loading child joint with this joint as its parent
            JOINT* tmp_joint = loadJoint(stream, joint);
            joint->children.push_back(tmp_joint);
        }
        else if( tmp == "End" )
        {
            // loading End Site joint: skip the tokens "Site" and "{"
            stream >> tmp >> tmp;

            JOINT* tmp_joint = new JOINT;
            tmp_joint->parent = joint;
            tmp_joint->num_channels = 0;
            tmp_joint->name = "EndSite";
            joint->children.push_back(tmp_joint);

            stream >> tmp;
            if( tmp == "OFFSET" )
                stream >> tmp_joint->offset.x
                       >> tmp_joint->offset.y
                       >> tmp_joint->offset.z;

            // closing brace of the End Site block
            stream >> tmp;
        }
        else if( tmp == "}" )
            return joint;
    }

    // input ended before the closing brace; return what was read so far
    return joint;
}
void Bvh::loadMotion(std::istream& stream)
{
    std::string tmp;

    while( stream >> tmp )
    {
        if( trim(tmp) == "Frames:" )
        {
            // loading frame number
            stream >> motionData.num_frames;
        }
        else if( trim(tmp) == "Frame" )
        {
            // loading frame time (read but not stored in this version);
            // tmp receives the token "Time:"
            float frame_time;
            stream >> tmp >> frame_time;

            int num_frames = motionData.num_frames;
            int num_channels = motionData.num_motion_channels;

            // creating motion data array
            motionData.data = new float[num_frames * num_channels];

            // foreach frame read and store floats
            for( int frame = 0; frame < num_frames; frame++ )
            {
                for( int channel = 0; channel < num_channels; channel++ )
                {
                    // calculating index for storage and reading the float into it
                    int index = frame * num_channels + channel;
                    stream >> motionData.data[index];
                }
            }
        }
    }
}
The loading code should be easy to read. load() calls loadHierarchy(), which calls loadJoint() for the root joint and loadMotion() once it reaches the MOTION keyword. loadJoint() loads a joint recursively, and all those ifs just take care of channel ordering. loadMotion() loads the number of frames and the frame time, then iterates through all channels of every frame, reads a float, calculates where to store it and stores it. This version does not support multiple hierarchies, but that can be added easily.
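The listings call a trim() helper that is not shown in this article. A minimal version could look like the sketch below, assuming all it has to do is strip leading and trailing whitespace from a token; the implementation in the linked codebase may differ.

```cpp
#include <string>

// Assumed behaviour of the trim() helper used by the loader:
// returns a copy of s without leading and trailing whitespace.
std::string trim(const std::string& s)
{
    const char* whitespace = " \t\r\n";

    const std::string::size_type first = s.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::string(); // empty or all-whitespace input

    const std::string::size_type last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}
```

Since operator>> already skips whitespace when extracting a std::string, the helper mostly acts as a safety net here.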
JOINT transformations
If we imagine a simplified human skeleton, a hand would be the child of an arm, which is itself the child of a shoulder, and so on. We can go all the way up to the root joint which can be, for example, the hips (which it actually is in most files). In order to find the absolute position of all of the root joint's descendants we have to apply each parent's transformation to its children. You probably know that this can be achieved using matrices. That's why every joint has a "local transformation matrix".
Basically, the transformation matrix is composed of rotation and translation parameters (BVH does not support bone scaling so we don't have a scale). This can be represented using a standard 4x4 matrix where the translation parameters sit in the 4th column. Note that OpenGL uses column-major ordering, which in memory looks just like the transpose of a row-major ordered matrix. Since OpenGL uses it, so do GLSL and GLM (the math library we use here, which is modeled on GLSL). We mention this because we'll need it later when we read a joint's position out of its matrix. The function that does the positioning is moveTo(), and it uses a static helper function defined inside the .cpp file (it cannot be used outside, and does not need to be):
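To illustrate what column-major ordering means in memory, here is a small sketch that uses a plain array instead of GLM. The type alias and function names are chosen for this example only.

```cpp
#include <array>
#include <cassert>

// A 4x4 matrix stored as 16 consecutive floats (illustration only).
using Mat4 = std::array<float, 16>;

// Index of element (row, column) when columns are stored one after another,
// which is the layout OpenGL and GLM use.
int column_major_index(int row, int column)
{
    assert(row >= 0 && row < 4 && column >= 0 && column < 4);
    return column * 4 + row;
}

// Index of the same element when rows are stored one after another.
int row_major_index(int row, int column)
{
    assert(row >= 0 && row < 4 && column >= 0 && column < 4);
    return row * 4 + column;
}

// Column-major translation matrix: identity with (x, y, z) in the 4th column.
Mat4 column_major_translation(float x, float y, float z)
{
    Mat4 m{}; // all zeroes
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    m[column_major_index(0, 3)] = x;
    m[column_major_index(1, 3)] = y;
    m[column_major_index(2, 3)] = z;
    return m;
}
```

With this layout the translation ends up in elements 12, 13 and 14 of the array, which is why the whole 4th column can be fetched in one go.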
/** Calculates JOINT's local transformation matrix for specified frame starting index */
static void moveJoint(JOINT* joint, MOTION* motionData, int frame_starts_index)
{
    // we'll need index of motion data's array with start of this specific joint
    int start_index = frame_starts_index + joint->channel_start;

    // translate identity matrix to this joint's offset parameters
    joint->matrix = glm::translate(glm::mat4(1.0),
                                   glm::vec3(joint->offset.x,
                                             joint->offset.y,
                                             joint->offset.z));

    // here we transform joint's local matrix with each specified channel's values
    // which are read from motion data
    for(unsigned i = 0; i < joint->num_channels; i++)
    {
        // channel alias
        const short& channel = joint->channels_order[i];

        // extract value from motion data
        float value = motionData->data[start_index + i];

        if( channel & Xposition )
        {
            joint->matrix = glm::translate(joint->matrix, glm::vec3(value, 0, 0));
        }
        if( channel & Yposition )
        {
            joint->matrix = glm::translate(joint->matrix, glm::vec3(0, value, 0));
        }
        if( channel & Zposition )
        {
            joint->matrix = glm::translate(joint->matrix, glm::vec3(0, 0, value));
        }

        // BVH stores angles in degrees. GLM 0.9.6 and later expect radians,
        // hence glm::radians(); with an older GLM that takes degrees,
        // pass value directly instead.
        if( channel & Xrotation )
        {
            joint->matrix = glm::rotate(joint->matrix, glm::radians(value), glm::vec3(1, 0, 0));
        }
        if( channel & Yrotation )
        {
            joint->matrix = glm::rotate(joint->matrix, glm::radians(value), glm::vec3(0, 1, 0));
        }
        if( channel & Zrotation )
        {
            joint->matrix = glm::rotate(joint->matrix, glm::radians(value), glm::vec3(0, 0, 1));
        }
    }

    // then we apply parent's (already accumulated) matrix to this joint's
    // local transformation matrix
    if( joint->parent != NULL )
        joint->matrix = joint->parent->matrix * joint->matrix;

    // when we have calculated this joint's matrix do the same to all children
    for(auto& child : joint->children)
        moveJoint(child, motionData, frame_starts_index);
}

void Bvh::moveTo(unsigned frame)
{
    // we calculate motion data's array start index for a frame
    unsigned start_index = frame * motionData.num_motion_channels;

    // recursively transform skeleton
    moveJoint(rootJoint, &motionData, start_index);
}
What we do (for each joint, starting from the root) is take each value from the motion data and apply it, in the order the channels were defined in the file, with either glm::translate() or glm::rotate(). The static helper function moveJoint() transforms the joints recursively, so every child is processed after its parent's matrix is final. What remains is to display the result.
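As a worked example, the sketch below computes by hand where the End Site of Chest ends up in the second frame of the sample file, where the only non-zero channel is the Xrotation of Chest (45 degrees). It uses a tiny hand-written matrix type instead of GLM so that it stands on its own; all names are made up for this illustration.

```cpp
#include <array>
#include <cmath>

// Column-major 4x4 matrix: element (row, col) is at index col * 4 + row.
using Mat4 = std::array<double, 16>;

Mat4 identity()
{
    Mat4 m{}; // all zeroes
    m[0] = m[5] = m[10] = m[15] = 1.0;
    return m;
}

// Standard matrix product a * b for column-major storage.
Mat4 multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

Mat4 translation(double x, double y, double z)
{
    Mat4 m = identity();
    m[12] = x; m[13] = y; m[14] = z; // 4th column
    return m;
}

// Rotation about the X axis; the angle is in degrees, as in a BVH file.
Mat4 rotation_x(double degrees)
{
    const double rad = degrees * 3.14159265358979323846 / 180.0;
    Mat4 m = identity();
    m[5] = std::cos(rad);  m[9]  = -std::sin(rad);
    m[6] = std::sin(rad);  m[10] =  std::cos(rad);
    return m;
}

// Accumulated matrix of Chest's End Site in the second frame of the sample.
Mat4 chest_end_site_frame2()
{
    // Hips: OFFSET (0, 0, 0) and all six channels are zero
    const Mat4 hips = identity();

    // Chest: OFFSET (5, 0, 0), then Zrotation 0, Xrotation 45, Yrotation 0
    const Mat4 chest_local = multiply(translation(5.0, 0.0, 0.0), rotation_x(45.0));
    const Mat4 chest = multiply(hips, chest_local);

    // End Site: OFFSET (0, 5, 0), no channels
    return multiply(chest, translation(0.0, 5.0, 0.0));
}
```

The 4th column of the result holds the position: x = 5, and y and z are both 5 * cos(45 degrees), roughly 3.54. In other words the bone of length 5 that pointed straight up is now tilted 45 degrees around the X axis.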
Using the class and displaying a skeleton
Constructing a vertex array from the skeleton's joint data is not the Bvh class's job. We'll do that where we need it. Using recursion and std::vector we can easily construct the vertex and index arrays:
std::vector<glm::vec4> vertices;
std::vector<GLshort> indices;

GLuint bvhVAO;
GLuint bvhVBO;
Bvh* bvh = NULL;

/** put translated joint vertices into array */
void bvh_to_vertices(const JOINT* joint,
                     std::vector<glm::vec4>& vertices,
                     std::vector<GLshort>& indices,
                     GLshort parentIndex = 0)
{
    // joint position is the 4th COLUMN of its matrix (column-major ordering)
    glm::vec4 translatedVertex = joint->matrix[3];

    // pushing current
    vertices.push_back(translatedVertex);

    // avoid putting root twice
    GLshort myindex = vertices.size() - 1;
    if( parentIndex != myindex )
    {
        // one line segment from parent to this joint
        indices.push_back(parentIndex);
        indices.push_back(myindex);
    }

    // foreach child same thing
    for(auto& child : joint->children)
        bvh_to_vertices(child, vertices, indices, myindex);
}

void bvh_load_upload(int frame = 1)
{
    // using Bvh class
    if( bvh == NULL )
    {
        bvh = new Bvh;
        bvh->load("file.bvh");
    }

    bvh->moveTo(frame);

    // start from empty arrays so repeated calls do not accumulate old frames
    vertices.clear();
    indices.clear();

    const JOINT* rootJoint = bvh->getRootJoint();
    bvh_to_vertices(rootJoint, vertices, indices);

    // here goes OpenGL stuff with gen/bind buffer and sending data
    // basically you want to use GL_DYNAMIC_DRAW so you can update same VBO
}
Note there are some C++11 features. If you use GCC you should add a C++11 switch such as -std=c++11 or -std=gnu++11 to make it compile. The bvh_to_vertices() function helps us reconstruct the vertices from the skeleton info.
Outro
So we've looked at the BVH format and how to load and display it. This is just a basic loader, which can be the stepping stone for some more advanced things like animation blending and mixing. That's it. Hope you like it.