
How do all the coordinate systems relate to each other?


I would really like some clarification on the coordinate systems available in OpenGL.

This is what I know:

Model coordinates are coordinates that are local to an object, where the origin is usually at the center of the object. These are common in software packages like Blender and other modeling tools.

World coordinates are the coordinates that hold all the objects in the scene, where the origin is at some fixed point in the world.

View coordinates are coordinates where the origin is usually at the viewer or camera.

Clip space is where coordinates are clipped; dividing x, y, and z by the w component then produces normalized device coordinates.

Screen coordinates are coordinates with the origin at the lower-left corner of the screen; the x axis increases to the right and the y axis increases upward.

My questions are:

1) How does one go about getting the model coordinates of an object using OpenGL? Say for example one is constructing a Cube with the following vertices:


    const GLfloat vertices[] =
    {
        -0.5f,  0.5f,  0.0f,  // Top Left
        -0.5f, -0.5f,  0.0f,  // Bottom Left
         0.5f, -0.5f,  0.0f,  // Bottom Right
         0.5f,  0.5f,  0.0f,  // Top Right

        -0.5f,  0.5f, -1.0f,  // Top Left     (back)
        -0.5f, -0.5f, -1.0f,  // Bottom Left  (back)
         0.5f, -0.5f, -1.0f,  // Bottom Right (back)
         0.5f,  0.5f, -1.0f   // Top Right    (back)
    };

and the following indices:


    const int indices[] =
    {
        0, 1, 3,  // front
        1, 2, 3,

        4, 5, 6,  // back
        4, 6, 7,

        4, 5, 1,  // left
        1, 0, 4,

        3, 6, 2,  // right
        7, 6, 3,

        7, 4, 3,  // top
        4, 0, 3,

        1, 2, 5,  // bottom
        2, 5, 6
    };

2) How does one activate the view coordinate system? By default, when you run the program, the viewer appears to be staring straight at world space.

* What are the uses of view coordinate spaces, anyway?

3) How is the screen coordinate system accessed? How does one go about interacting with the screen coordinate system?

4) What is the point of clip space? In most of my OpenGL programs so far, most of my data values have been between -1 and 1. And from my understanding, this is where x, y, and z are divided by the w component, producing normalized device coordinates.


    const GLfloat vertices[] =
    {
        -0.5f,  0.5f,  0.0f,  // Top Left
        -0.5f, -0.5f,  0.0f,  // Bottom Left
         0.5f, -0.5f,  0.0f,  // Bottom Right
         0.5f,  0.5f,  0.0f,  // Top Right

        -0.5f,  0.5f, -1.0f,  // Top Left     (back)
        -0.5f, -0.5f, -1.0f,  // Bottom Left  (back)
         0.5f, -0.5f, -1.0f,  // Bottom Right (back)
         0.5f,  0.5f, -1.0f   // Top Right    (back)
    };

5) Even without the use of model, view, and projection matrices, is the order in which coordinate systems are transformed still local -> world -> view -> clip -> screen, or does execution skip directly to world coordinate space?

This is quite a confusing topic. You mentioned 'screen space' has the origin in the bottom left; that's true of OpenGL's window coordinates, but normalized device coordinates put the origin at the dead centre of the screen, with x going left to right from -1 to +1 and y going bottom to top from -1 to +1. (I am no longer sure about this.)

Spaces are always just relative things; you could use any strange space you want, and as long as it ends up in a space OpenGL knows about, you are fine. I live at a particular number in my street, but that street is also a particular street in my town, and the town in the country, etc. I could use longitude/latitude and skip right to where I am in the world without needing local/world coordinates first.

To transform between spaces you can use transformation matrices. You will often see a 'world' matrix, 'view' matrix, 'projection' matrix, etc., but it's easier to understand them if you instead call them modelToWorld, worldToView, viewToProjection, etc., because that's what they are doing: transforming an object from one space to another. The modelToWorld matrix transforms an object's vertices from model space to world space. If you have a vertex in model space at (1, 2, 3) and the model itself is at world-space position (1, 1, 1), then to transform the vertex into world space you just add on the world position (translation) to give (2, 3, 4). The key is those transformations. You don't always need to use a matrix, but a matrix can hold a lot of transformations at once and the maths is consistent.
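To make that concrete, here is a minimal sketch of the (1, 2, 3) example using the GLM library (recommended later in this thread); the variable names are just illustrative:

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // modelToWorld for a model sitting at (1, 1, 1) in the world.
    glm::mat4 modelToWorld = glm::translate(glm::mat4(1.0f), glm::vec3(1.0f, 1.0f, 1.0f));

    // A vertex at (1, 2, 3) in model space; w = 1 means the translation applies.
    glm::vec4 vertexModel(1.0f, 2.0f, 3.0f, 1.0f);

    // Same result as adding the positions directly: (2, 3, 4).
    glm::vec4 vertexWorld = modelToWorld * vertexModel;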

View space is useful because it makes the projection easier and you can make certain assumptions, because you know you will be at (0, 0, 0). Many lighting effects need to know the direction to a point or a light; that's easy to work out when you know for sure you are at (0, 0, 0): the direction to a point IS the position of that point. You also know for sure that the direction you are looking IS (0, 0, -1). You certainly don't need this, but it helps.

You could quite happily define your vertices in screen space directly (thus world space is your screen space), so there is no order of transformation then; local, world, view, and projection are not necessary. The order in which you transform them is important, and you could even go lower than model space. You could also have many parallel spaces; as long as you can get them all into the same space at the end (screen space) you are OK.

You can combine multiple transformations into a single transform and thus go straight from model to screen space.

I did write a bit about screen space/clip space but in searching to verify I was correct I ended up more confused than I started with so I'll let someone else who has a clue comment on it. Clip space is important in far/near plane clipping though.




1) How does one go about getting the model coordinates of an object using OpenGL? Say for example one is constructing a Cube with the following vertices:

You don't get them, you define them. The numbers you have listed _are_ the coordinates of the cube in model-space.

A "space" is just a coordinate system, and you keep your vertices in whatever space is convenient for the calculations you need to do at the time.

When modelling, it's convenient to keep the origin close to where you work, to keep the numbers small, and you want the origin somewhere where it makes sense when you place the object in the world (maybe in the middle for a spaceship, or at its feet for a character).


2) How does one activate the view coordinate system? By default, when you run the program, the viewer appears to be staring straight at world space.
* What are the uses of view coordinate spaces, anyway?

This question doesn't really make sense.

You don't activate coordinate systems. You transform vertices between spaces.

View space is a coordinate system in which the world and all its objects are transformed such that the origin is at the camera, the z axis goes out of the scene, y is up, and x is to the right.

This is convenient when, for example, you need to do calculations to find out whether an object is visible from the camera's viewpoint.

After the view transform, you apply a perspective transform, which makes things further away look smaller; this calculation is easier to do if you assume a coordinate system with the origin at the camera.
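As a sketch of how a worldToView matrix is often built (using GLM again; the camera values here are made up):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // Transforms world space so the camera sits at the origin looking down -z,
    // with +y up and +x to the right.
    glm::mat4 worldToView = glm::lookAt(
        glm::vec3(0.0f, 0.0f, 3.0f),   // camera position in world space
        glm::vec3(0.0f, 0.0f, 0.0f),   // the point the camera looks at
        glm::vec3(0.0f, 1.0f, 0.0f));  // which way is "up"

After this transform, the perspective projection can assume the viewer is at (0, 0, 0).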


3) How is the screen coordinate system accessed? How does one go about interacting with the screen coordinate system?

This also doesn't really make sense. You don't interact with coordinate systems. You do calculations on vertices, which are described using a coordinate system.

You can then transform the vertex to _change_ which coordinate system it is expressed in.

You do this because the new coordinate system makes for easier calculations for whatever you want to do.

It's a bit like looking at something from different angles. If you look at a cube from straight above, it looks like a square, which is an easier shape than the mess of triangles you see when you look at it from an angle.


4) What is the point of clip space?

The point of it is that it is a space where clipping is easy to do, with good precision (floats have their best resolution between -1 and 1).

Anything outside the -1 to 1 range is outside the view and doesn't have to be drawn.
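As a rough sketch of that idea (the GPU does this for you, and the hardware actually clips in clip space against the ±w bounds before the divide, but the effect is the same):

    #include <glm/glm.hpp>

    // After the perspective divide, x, y, z are normalized device coordinates.
    bool insideView(const glm::vec4& clipPos)
    {
        glm::vec3 ndc = glm::vec3(clipPos) / clipPos.w;  // divide x, y, z by w

        // Anything outside [-1, 1] on any axis is outside the view volume.
        return glm::all(glm::lessThanEqual(glm::abs(ndc), glm::vec3(1.0f)));
    }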


5) Even without the use of model, view, and projection matrix; Is the order in which coordinates system are transformed still, local -> world -> view -> clip -> screen or does execution just skip directly to world coordinate space.

This is where linear algebra comes in. Without going into any kind of detail, the transformation between spaces is done with matrix multiplications.

A nice property of matrix multiplication is that you can multiply matrices together to concatenate them, and "jump" between spaces without having to transform through each step.

When giving vertices to the GPU, whether you give them in local coordinates or in world coordinates depends a bit on what you want to do.

But in any case, after giving it the vertices and the matrices it needs, it will transform them directly into clip space; they are then clipped, transformed to screen space, and rasterized. (The details depend on the type of GPU, but in any case it does not go through all the "spaces" step by step.)
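A minimal sketch of that concatenation (GLM, column-vector convention, so the rightmost matrix is applied first; the parameter names are just placeholders):

    #include <glm/glm.hpp>

    // One combined matrix takes a vertex straight from model space to clip space:
    // clip = viewToClip * worldToView * modelToWorld * vertex
    glm::mat4 makeModelToClip(const glm::mat4& modelToWorld,
                              const glm::mat4& worldToView,
                              const glm::mat4& viewToClip)
    {
        return viewToClip * worldToView * modelToWorld;
    }

A vertex shader can then do a single multiply, e.g. gl_Position = modelToClip * vec4(position, 1.0), instead of stepping through each space.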

I always thought the terminology was a bit confusing in regard to this. Typically, you have something like:

Object Space -- Model Transform --> World Space -- View Transform --> Camera Space -- Projection Transform --> Clip Space (and, after the perspective divide, Normalized Device Coordinates)

That's a lot of names, and some people use slightly different ones. We could get rid of half of the names if we just used the same ones for spaces and transforms (e.g. object, world, view, projection space/transform). However somehow I doubt people will agree on one convention anytime soon...

That's a lot of names, and some people use slightly different ones. We could get rid of half of the names if we just used the same ones for spaces and transforms (e.g. object, world, view, projection space/transform). However somehow I doubt people will agree on one convention anytime soon...

If you just name the spaces, and the transformation matrices between them, I find it less confusing.

Model space ---(m2w matrix)--> world space ---(w2v matrix)---> view space ---(v2p)---> projection space, etc..

This way it's also easy to generate names for any kind of matrix, "jumping" through spaces, or even going backwards.

A matrix to go from model, directly to projection space would be called m2p and a matrix to go from view space to model space would be called v2m.
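In code that naming might look something like this (a sketch with made-up matrix values, using GLM):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // Hypothetical per-frame matrices under this naming scheme.
    glm::mat4 m2w = glm::translate(glm::mat4(1.0f), glm::vec3(1.0f, 0.0f, 0.0f));
    glm::mat4 w2v = glm::lookAt(glm::vec3(0.0f, 0.0f, 3.0f),
                                glm::vec3(0.0f), glm::vec3(0.0f, 1.0f, 0.0f));

    // "Jumping" spaces: adjacent names must line up, so mistakes are easy to spot.
    glm::mat4 m2v = w2v * m2w;           // model -> view in one step
    glm::mat4 v2m = glm::inverse(m2v);   // ...and back again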

If you just name the spaces, and the transformation matrices between them, I find it less confusing.

Model space ---(m2w matrix)--> world space ---(w2v matrix)---> view space ---(v2p)---> projection space, etc..


Yes, that is exactly the kind of convention I would love to see catching on. I think it would remove a lot of potential for confusion...

If you have a vertex in model space at (1, 2, 3) and the model itself is at world-space position (1, 1, 1), then to transform the vertex into world space you just add on the world position (translation) to give (2, 3, 4). The key is those transformations. You don't always need to use a matrix, but a matrix can hold a lot of transformations at once and the maths is consistent.

This is another question I have: how exactly do transformations (translation, rotation, scale) transform a vertex? In my head, I always get the feeling that if you translate a vertex at (1, 2, 3) by (1, 1, 1), all you have done is move it to (2, 3, 4) in model space. I do not understand how using a series of transforms can convert a vertex to another coordinate system. It looks like all I need to do is pick an arbitrary point in a different coordinate system (say world space) and perform a transformation with the model vertex. Or is there something more to it than that?

You don't get them, you define them. The numbers you have listed _are_ the coordinates of the cube in model-space.

    const GLfloat vertices[] =
    {
        -0.5f,  0.5f,  0.0f,  // Top Left
        -0.5f, -0.5f,  0.0f,  // Bottom Left
         0.5f, -0.5f,  0.0f,  // Bottom Right
         0.5f,  0.5f,  0.0f,  // Top Right

        -0.5f,  0.5f, -1.0f,  // Top Left     (back)
        -0.5f, -0.5f, -1.0f,  // Bottom Left  (back)
         0.5f, -0.5f, -1.0f,  // Bottom Right (back)
         0.5f,  0.5f, -1.0f   // Top Right    (back)
    };

I always got the impression that these were coordinates in world space. So, if I wanted to convert these model coordinates into world coordinates, I would have to apply a model2world matrix. Which means I would have to multiply a model matrix, consisting of the values in the vertices array, by a world matrix. My question is: what values would be in the world matrix? Would the world matrix consist of the desired location in world space where I want to place my model?


Unfortunately the why and how of this is a very big topic (a subset of linear algebra). While I can do the maths behind it, I don't feel qualified to explain why it works. In the world of 3D graphics, the transformations you do have a pattern to them (https://en.wikipedia.org/wiki/Transformation_matrix), and you can use these known patterns to create the matrix you require. Multiplying one matrix by another has a certain effect; in terms of computer graphics, one matrix would be the transform and the other would be a vector representing your position. That gives another matrix as a result, which is your new vector (position).

Consider a point on the x axis at x = 3, and a 'translation' transform that moves points 4 units down the positive x axis. That transform looks like this: +4. You apply it and you get 3 + 4 = 7. Your new x position in 'world space' is now 7. Transforming using matrices works exactly the same way (just with a lot more maths behind it): transform * modelPos = worldPos. I have had to change the order of the values since there are rules about what can be multiplied by what, which really just depends on how you define your vectors and matrices.

The coordinates you defined are in whatever space you decide they are in. If the origin, scale, and axes of two spaces are the same, then it really doesn't matter which space they are in; it only becomes an issue when they are no longer the same (when you actually move your model). The model matrix itself would not contain anything about your actual vertices. If you move your model so that it is at x = 5, then your modelToWorld matrix will be a translation matrix with x = 5. https://en.wikipedia.org/wiki/Translation_(geometry) shows the general form of a translation matrix: the identity matrix with the offset (vx, vy, vz) in its final column; here vx would be 5, vy 0, vz 0.

You should check out http://glm.g-truc.net/0.9.7/index.html you can just use it without having to worry about how to create all these matrices yourself (and do the maths with them). It is a very involved topic.
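For example, the modelToWorld matrix for the x = 5 case above could be built with GLM like this (a sketch; the name modelToWorld is just illustrative):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // Identity rotation/scale; only the translation part is set (vx = 5, vy = 0, vz = 0).
    glm::mat4 modelToWorld = glm::translate(glm::mat4(1.0f), glm::vec3(5.0f, 0.0f, 0.0f));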


Model coordinates are coordinates that are local to an object, where the origin is usually at the center of the object.

It doesn't have to be at the center. Personally, I'd rather have the model origin at "ground level" i.e. at the center of the model's feet.

And if, for example, it's a model of a warrior holding a very long spear that juts out five feet in front of him, I don't want the origin to be literally in the center of the entire model, I'd want it to be in the center of the human part of the model, without taking the spear into consideration, if that's what the game engine is expecting.

Model Coordinates... ...Common in software packages like Blender and other modeling tools.

These aren't file formats you are switching between, where it doesn't really matter whether you choose PNG or JPEG.

In 3D applications, multiple coordinate spaces need to be used, and your triangles get converted between them to end up on the screen.

I always got the impression that these were coordinates in world space.


If a model's origin lines up directly with the world origin (and if other things like rotation and scale also match), then those two coordinate systems (for that one instance of that model) line up, and it just so happens that no conversion is necessary. The conversion would basically be like multiplying a number by 1: 5 * 1 is still 5; the multiplication didn't change the value.

When you were creating the cube, you were imagining it at the center of the world, and were accidentally mixing world space and model space, because it so happened that they both were the same for that one cube (since the model's origin was at the world's origin). Possibly even for the camera as well, since the camera origin also starts off by default at world origin.

I always got the impression that these were coordinates in world space.


So, the numbers are just numbers. Like "2" could be miles or kilometers if you wanted it to be. In a miles/kilometers analogy, 1.6 kilometers-per-mile would be the transform between the two systems. Coordinate systems and matrices and vectors can be thought of in the same ways.

My question is: what values would be in the world matrix? Would the world matrix consist of the desired location in world space where I want to place my model?


The easiest way that I've found to visualize what's inside a matrix is:


                                                       x y z w
    New X axis direction, expressed in the old system = (1,0,0,0)
    New Y axis direction, expressed in the old system = (0,1,0,0)
    New Z axis direction, expressed in the old system = (0,0,1,0)
    New origin position, expressed in the old system  = (0,0,0,1)  // also called 'Translation'

(Caveat: depending on the convention, the rows should be columns instead. I've written it this way so I can label the "rows" in ASCII.)

Basically, if you imagine an untransformed "coordinate system" as four ideas:

- Which direction the X axis goes (obviously it goes in the 1,0,0 direction)
- Which direction the Y axis goes (0,1,0)
- Z axis (0,0,1)
- Where the origin is (0,0,0)

When the matrix has 1's down the diagonal and everything else is 0 like I've written here, that's the "Identity" matrix. When you multiply a vector or another matrix by this matrix, the output is the same as the input.

Rotation matrices put values in the upper left 3x3 corner of the matrix which, if you drew those rows as if they were at the arrowhead of an axis line on a graph, look exactly as if you had rotated the axis lines themselves.

Scale matrices also modify the upper left 3x3, but only along the diagonal portion. If you drew those rows as axis arrowheads, it would look as if the axis indicators were stretching.

Translation matrices put their X,Y,Z values directly in the X,Y,Z elements of the 4th row.


You might be wondering what the heck the 'w' column of the matrix is for...

There are two major types of vectors in 3D models: Positions, and Normals. Normals need to rotate when the model rotates, but their X,Y,Z values are relative to the vertex they're on, so they *shouldn't* be affected by the translation part of the matrix. Positions need to rotate and translate.

The "w" part of a vector allows it to control whether or not to ignore the translation row of the matrix (which gets multiplied by the 'w' part of the vector)

Position vectors use (x,y,z,1).
Direction vectors use (x,y,z,0). Normals are direction vectors.

When w is zero like it is in the Normal, it zeroes out the result of the multiplication with the Translation part of the matrix, which causes it to be "ignored."

If you look back at the identity matrix, you'll see that the three axis rows are 'Direction' vectors, and the Translation row is a 'Position' vector.

Multiplying two matrices together effectively computes the same thing as transforming those four rows as if they were vectors themselves.
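A small sketch of that w trick in code (GLM, with made-up values; note GLM uses the column-vector convention, i.e. the "columns instead" case from the caveat above):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    glm::mat4 transform = glm::translate(glm::mat4(1.0f), glm::vec3(10.0f, 0.0f, 0.0f));

    glm::vec4 position(1.0f, 2.0f, 3.0f, 1.0f);  // w = 1: translation applies
    glm::vec4 normal(0.0f, 1.0f, 0.0f, 0.0f);    // w = 0: translation is ignored

    glm::vec4 movedPosition = transform * position;  // (11, 2, 3, 1)
    glm::vec4 movedNormal   = transform * normal;    // still (0, 1, 0, 0)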

