I'm trying to understand the details of the space transformations that take a vertex from local space to screen space. I'm working with DirectX, so I'm following this documentation. This is an excerpt from it which shows how the pipeline should transform a vertex:

I'm pretty clear on the first 3 transformations: model to world, world to view space, and view space to projection space (clip space). What confuses me is that some of the steps in this flow don't seem to match my own experiments in code. Here is an example:

In this vertex shader I do the 3 basic transforms. After the perspective transform the object should be in clip space, and as you can see, the debug value of the vertex at the end of the perspective transform is <16.14 -2 7.3 7.4>.
So now the vertex should be in projection space, aka ‘clip space’.
Going back to the documentation, this is what it says happens in clip space:

Looking at this, for the vertex to pass the clip test X has to satisfy X > -Wp. In my case X is 16.14 and W is 7.4, so that holds. But X also has to satisfy X ≤ Wp, and in my case it does not. So that's the first issue I have, and it is not making sense to me.
OK, that issue aside, moving on to the next step after clipping, which is viewport scaling. The doc says that this is the viewport matrix:
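To make the check concrete, here is a minimal sketch of the clip test as I read it, run on my debug value. The function name `clip_test` is just something I made up for illustration, and the Y and Z bounds are my assumption about how the same rule extends to the other components (with Z tested against 0 rather than -W).

```python
def clip_test(v):
    """Return, per component, whether a clip-space vertex passes the test.

    Assumed bounds: -w < x <= w, -w < y <= w, 0 < z <= w.
    """
    x, y, z, w = v
    return {
        "x": -w < x <= w,
        "y": -w < y <= w,
        "z": 0 < z <= w,
    }


# The debug value from my vertex shader, after the perspective transform.
clip_vertex = (16.14, -2.0, 7.3, 7.4)
result = clip_test(clip_vertex)

for component, passed in result.items():
    print(component, "passes" if passed else "fails")
```

With my numbers only the X component fails, which is exactly the part that confuses me.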

It scales X by dwWidth and Y by (-Height), which flips and scales Y at the same time. But how can this be done at this stage? As far as I understand, my vertex position is still <16.14 -2 7.3 7.4>, and for this scaling to make sense, doesn't the vertex already have to be in normalized device coordinates by the time the operation is performed? That's my second issue.
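Here is a rough sketch of how I understand the viewport step when it is applied to the vertex while it is still homogeneous. Everything about the viewport is an assumption made up for the sketch: an 800x600 viewport at (0, 0) with a 0 to 1 depth range, and a matrix that scales by half the width and half the height and offsets by W times the viewport center. If the real matrix scales by the full width and height the numbers change, but my question stays the same.

```python
def viewport_transform(v, width=800, height=600, x0=0, y0=0,
                       min_z=0.0, max_z=1.0):
    """Apply an assumed viewport matrix to a row vector (x, y, z, w).

    The offsets are multiplied by w because the vertex has not been
    divided by w yet. All viewport parameters are hypothetical.
    """
    x, y, z, w = v
    return (
        x * (width / 2) + w * (x0 + width / 2),     # scale and shift X
        -y * (height / 2) + w * (y0 + height / 2),  # flip, scale, shift Y
        z * (max_z - min_z) + w * min_z,            # remap depth range
        w,                                          # W is left untouched
    )


clip_vertex = (16.14, -2.0, 7.3, 7.4)
scaled = viewport_transform(clip_vertex)
print(scaled)
```

The result is still a 4-component vector with the original W, so nothing here has been normalized yet.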
So, moving to the last step, which is the perspective divide. In my case, if I transform my current vertex by the Wp matrix (in the above diagram), I get <12800 5640 7.3 7.4>. After the perspective divide the numbers are still completely out of bounds. So what am I missing here?
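For reference, this is the arithmetic of that last step as a small sketch, using only the values quoted above. The helper name `perspective_divide` is hypothetical, and I also divide the original clip-space vertex for comparison.

```python
def perspective_divide(v):
    """Divide x, y and z by w and drop w."""
    x, y, z, w = v
    if w == 0:
        raise ValueError("w is zero, the divide is undefined")
    return (x / w, y / w, z / w)


# Value I get after applying the viewport matrix to my vertex.
after_viewport = (12800.0, 5640.0, 7.3, 7.4)
screen = perspective_divide(after_viewport)

# The untouched clip-space vertex, divided directly for comparison.
clip_vertex = (16.14, -2.0, 7.3, 7.4)
ndc = perspective_divide(clip_vertex)

print("after viewport, divided:", [round(c, 2) for c in screen])
print("clip space, divided:    ", [round(c, 2) for c in ndc])
```

The divided X of the clip-space vertex comes out above 1, which lines up with the same vertex failing the X ≤ Wp test earlier.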