No offense, but most of your opening post is very confusing (at least to me). E.g. there is no such thing as a target position within a view matrix. Moreover, "farther viewing" should be done by zooming. And multiplying a position is not meaningful from a mathematical point of view. So it seems to me that a deeper insight may be helpful ...
Let's mostly ignore that the camera is somewhat special due to its view-defining purpose. Instead, let's think of the camera as an object in the world like any other. The placement (position and orientation) of the object w.r.t. the world is given as matrix C. Then the relation between a point (or direction) v in the local space of the camera and its counterpart v' in world space is just
v' = C * v
Bringing C onto the other side of the equation like so (where inv(...) means the inverse matrix)
inv(C) * v' = inv(C) * C * v = v
gives the opposite direction: expressing a world co-ordinate v' in the local space of C.
Specifically for the camera, C may be called the "camera matrix", and inv(C) may be substituted by V, which is usually called the "view matrix", because the view matrix is just the inverse of the camera matrix. (The math itself applies to any pair of spaces; we merely apply it to the camera in this case.)
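To make the relation between C and V concrete, here is a minimal Python sketch. The helper names (mat_mul, transform, translation, rotation_y, rigid_inverse) and the example numbers are my own assumptions for illustration, not part of any particular engine. It also assumes that C contains only rotation and translation, so its inverse can be built from the transposed rotation instead of a general matrix inversion.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (list of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Apply the 4x4 matrix m to the column vector v = [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(x, y, z):
    """Translation matrix for column vectors (offset in the last column)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_y(angle):
    """Rotation about the Y axis by angle (radians), column-vector convention."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(m):
    """Invert a matrix made of rotation and translation only (assumption)."""
    # The inverse of a pure rotation is its transpose.
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    # The inverse translation is -R^T * t.
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [inv_t[0]],
            r_t[1] + [inv_t[1]],
            r_t[2] + [inv_t[2]],
            [0.0, 0.0, 0.0, 1.0]]

# Camera matrix C: camera placed at (10, 2, -5), turned 90 degrees about Y.
C = mat_mul(translation(10.0, 2.0, -5.0), rotation_y(math.radians(90.0)))
# View matrix V is just the inverse of the camera matrix.
V = rigid_inverse(C)

v_local = [0.0, 0.0, -1.0, 1.0]   # a point given in camera local space
v_world = transform(C, v_local)   # v' = C * v
v_back = transform(V, v_world)    # inv(C) * v' = v
```

Up to floating point rounding, v_back equals v_local again, which is exactly the round trip described by the two equations above.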
Attaching the camera to the vehicle means establishing a geometrical constraint, so that moving the vehicle "automatically" moves the camera, too. This kind of thing is usually called "parenting" or - more technically - "forward kinematics". It means that we define a "local placement" L for the camera, i.e. a spatial relation that expresses the position and orientation of the camera not in the world but w.r.t. the vehicle's world placement W. In other words, L defines how much translation and rotation must be applied to a vector in the camera's local space so that it is given in the vehicle's local space.
The formula needed to transform from a local space to its parent space is already shown above, where we've used the world space as the parent space. However, any number of parent spaces can be chained. What we want here is to go from the camera local space into the vehicle local space, and from there to the world space. So we have
v' = L * v
v'' = W * v'
or together
v'' = W * ( L * v ) = ( W * L ) * v
from which we see that "parenting" just means concatenating the particular transformation matrices. But be aware that matrix multiplication is not commutative, so the order of the matrices is important. In the given example we have a composite matrix
W * L
for parenting.
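The following Python sketch illustrates the concatenation and why the order matters. The helper functions and the concrete placements (a vehicle at (100, 0, 50) turned 90 degrees about Y, a camera mounted 1.5 units up and 0.5 units along negative Z in the vehicle) are assumptions chosen only for the example.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (list of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Apply the 4x4 matrix m to the column vector v = [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(x, y, z):
    """Translation matrix for column vectors (offset in the last column)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_y(angle):
    """Rotation about the Y axis by angle (radians), column-vector convention."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# W: world placement of the vehicle (assumed example values).
W = mat_mul(translation(100.0, 0.0, 50.0), rotation_y(math.radians(90.0)))
# L: local placement of the camera relative to the vehicle (assumed values).
L = translation(0.0, 1.5, -0.5)

v = [0.0, 0.0, 0.0, 1.0]  # the camera's own origin, in camera local space

# Two steps: camera local -> vehicle local -> world.
step_by_step = transform(W, transform(L, v))
# One step with the concatenated matrix W * L.
combined = transform(mat_mul(W, L), v)
# Wrong order L * W, shown only to demonstrate non-commutativity.
swapped = transform(mat_mul(L, W), v)
```

Here step_by_step and combined agree (about [99.5, 1.5, 50, 1]), while swapped yields about [100, 1.5, 49.5, 1]: with the wrong order the camera offset is no longer rotated together with the vehicle.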
Now the question arises of how L is built. As a placement it has both a positional and an orientational part. Both can be set to fixed values, meaning that the camera is rigidly mounted in the vehicle cockpit. Less strict parenting can be done, too. E.g. the position can be fixed while the orientation is driven by targeting a "look-at" point. Let's investigate this example a bit further.
So we define the placement matrix L to be composed of a translational and a rotational part, T and R resp., in the usual order (as you've hopefully noticed, this post uses column vectors):
L := T * R
To calculate the look-at vector, i.e. the unit direction vector from the position of the camera to the target point, both positions must be given in the same space, and the resulting look-at vector will be in that space, too. Because L is given in vehicle space, and hence so is R (which the look-at vector is a part of), we are interested in a vehicle-local look-at vector. T is already vehicle-local, but the target point p is presumably given in world space. So we need to transform it first
inv(W) * p
and can then compute the difference vector
d = inv(W) * p - T * 0
where 0 denotes the origin point vector in homogeneous co-ordinates, i.e. [ 0 0 0 1 ]. From here, normalization and matrix building are done as usual, so I'll skip that here.
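For completeness, here is a minimal Python sketch of computing d and normalizing it. The helper names and all concrete values (vehicle placement, camera offset, target point) are assumptions for illustration only. The sketch further assumes that W consists only of rotation and translation, so inv(W) can be built from the transposed rotation. Building R from the look-at vector depends on your axis conventions and is not shown.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (list of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Apply the 4x4 matrix m to the column vector v = [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(x, y, z):
    """Translation matrix for column vectors (offset in the last column)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_y(angle):
    """Rotation about the Y axis by angle (radians), column-vector convention."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(m):
    """Invert a matrix made of rotation and translation only (assumption)."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]  # transposed rotation
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [inv_t[0]],
            r_t[1] + [inv_t[1]],
            r_t[2] + [inv_t[2]],
            [0.0, 0.0, 0.0, 1.0]]

# Assumed example values.
W = mat_mul(translation(100.0, 0.0, 50.0), rotation_y(math.radians(90.0)))
T = translation(0.0, 1.5, 0.0)        # camera position inside the vehicle
p = [100.0, 1.5, 40.0, 1.0]           # target point in world space
origin = [0.0, 0.0, 0.0, 1.0]         # the "0" of the formula

# d = inv(W) * p - T * 0, everything expressed in vehicle local space.
p_local = transform(rigid_inverse(W), p)
cam_pos = transform(T, origin)
d = [a - b for a, b in zip(p_local, cam_pos)]  # w becomes 0: a direction

# Normalize the spatial part to get the unit look-at vector.
# (A real implementation should guard against a zero-length d.)
length = math.sqrt(sum(c * c for c in d[:3]))
look_at = [c / length for c in d[:3]]
```

With these numbers the target lies 10 units from the camera along the vehicle's local +X axis, so look_at comes out as approximately [1, 0, 0]. Note that the homogeneous component of d is 0, as expected for the difference of two points.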
HtH