In my low level renderer:
Resource lists hold buffer & texture bindings. State groups hold resource lists, UBO bindings, pipeline state (depth/stencil, blend, raster) and a high level shader technique binding.
A shader technique contains many different shader program objects to be used by the same object - depth only rendering, forward shading, deferred g-buffer filling, etc.
A draw-item is created from a collection of state groups, a draw description (linear/indexed, primitive type, number of primitives), and a shader-pass ID (shadow pass, forward shading pass, etc). From those inputs, the minimal set of states and bindings can be extracted to form the draw item.
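To make the draw-item step concrete, here is a minimal C++ sketch. All of the type and function names (StateGroup, Technique, DrawDesc, makeDrawItem) are hypothetical stand-ins rather than the real API, states are reduced to plain integer IDs, and the rule that earlier state groups override later ones is an assumption of mine:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

using PassId = int;     // e.g. 0 = depth only, 1 = forward (hypothetical IDs)
using ProgramId = int;

// A technique maps shader-pass IDs to concrete shader programs.
struct Technique {
    std::map<PassId, ProgramId> programs;
};

// Every field is optional: a group only sets what it cares about.
struct StateGroup {
    std::optional<int> depthStencil;
    std::optional<int> blend;
    std::optional<int> raster;
    const Technique* technique = nullptr;
};

struct DrawDesc {
    bool indexed = false;
    int primitiveType = 0;
    std::uint32_t primitiveCount = 0;
};

// The flattened result: exactly one value per state, one program.
struct DrawItem {
    int depthStencil = 0, blend = 0, raster = 0;  // 0 = default state
    ProgramId program = -1;
    DrawDesc desc;
};

// Assumption: groups are ordered by priority, so the first group that
// sets a state wins. Returns nullopt when no technique is bound or the
// technique has no program for the requested pass.
std::optional<DrawItem> makeDrawItem(const std::vector<const StateGroup*>& groups,
                                     const DrawDesc& desc, PassId pass) {
    std::optional<int> depthStencil, blend, raster;
    const Technique* technique = nullptr;
    for (const StateGroup* g : groups) {
        if (!depthStencil) depthStencil = g->depthStencil;
        if (!blend) blend = g->blend;
        if (!raster) raster = g->raster;
        if (!technique) technique = g->technique;
    }
    if (!technique) return std::nullopt;
    auto it = technique->programs.find(pass);
    if (it == technique->programs.end()) return std::nullopt;

    DrawItem item;
    item.depthStencil = depthStencil.value_or(0);
    item.blend = blend.value_or(0);
    item.raster = raster.value_or(0);
    item.program = it->second;
    item.desc = desc;
    return item;
}
```

Calling makeDrawItem with a material group followed by a group of defaults gives a draw item that takes the material's states where they are set and the defaults everywhere else. An empty result means the object simply doesn't take part in that pass.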
In my high level renderer:
Rendering pipelines declare lists of render stages that they want to collect draw items for. A stage has a shader pass ID, render target(s) / depth target (FBO), a camera frustum, and a state group that will be appended to any draw items created for that stage.
A model is made up of nodes and meshes. Nodes have bounding volumes for visibility culling, and meshes. Meshes have a collection of draw items (potentially one for each stage declared by the current set of pipelines). When creating a model, the pipelines are queried to find out the list of potential stages, so that draw items for each stage can be pre-created.
Scenes are collections of models.
To render a scene, the current set of pipelines first generates a list of stages that they will be drawing. The scene then collects a list of draw items applicable to each stage (meshes that have a draw item for that stage, i.e. their shader has a program for the stage's pass ID, and whose node is visible to the stage's camera frustum). The pipelines can then submit those lists of draw items in the appropriate order.
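The collection step could look roughly like the sketch below. Again the names (Scene, Stage, collectDrawItems, etc.) are made up for illustration, the bounding volume is assumed to be a sphere, and the frustum is assumed to be a set of inward-facing planes; a real implementation would likely sort or batch the results as well:

```cpp
#include <map>
#include <vector>

// Plane with inward-facing normal: a point p is inside when n.p + d >= 0.
struct Plane  { float nx, ny, nz, d; };
struct Sphere { float x, y, z, radius; };
struct Frustum { std::vector<Plane> planes; };

// Conservative sphere/frustum test: reject only when the sphere is
// entirely behind at least one plane.
bool isVisible(const Frustum& f, const Sphere& s) {
    for (const Plane& p : f.planes) {
        float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

using StageId = int;
struct DrawItem { int id; };                          // opaque for this sketch
struct Stage { StageId id; Frustum frustum; };
struct Mesh  { std::map<StageId, DrawItem> items; };  // pre-created per stage
struct Node  { Sphere bounds; std::vector<Mesh> meshes; };
struct Model { std::vector<Node> nodes; };
struct Scene { std::vector<Model> models; };

// Gather every draw item that exists for this stage and whose node
// passes the stage's frustum test.
std::vector<const DrawItem*> collectDrawItems(const Scene& scene, const Stage& stage) {
    std::vector<const DrawItem*> out;
    for (const Model& model : scene.models) {
        for (const Node& node : model.nodes) {
            if (!isVisible(stage.frustum, node.bounds)) continue;
            for (const Mesh& mesh : node.meshes) {
                auto it = mesh.items.find(stage.id);
                if (it != mesh.items.end()) out.push_back(&it->second);
            }
        }
    }
    return out;
}
```

Because a mesh only has a draw item for stages its shader supports, the "does this object take part in this pass?" check becomes a simple lookup at collection time.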
To implement shadow mapping:
I'd add a new shadow stage at the start of my pipeline, which uses the depth-only shader pass ID and the light's frustum. I'd modify the forward shading stage's state group to contain a texture binding for the shadow map and a UBO binding for any related data (e.g. the light's view matrix), ensuring every object in the forward shading pass can access the shadow map. The scene will then frustum cull from the light's point of view and collect any shadow casters. The pipeline will submit these draws before the forward shading stage, which can then consume the results.
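As a rough illustration of that stage setup, here is a C++ sketch of a pipeline declaring its two stages. Everything in it is hypothetical: the names, the binding slot numbers, and the use of plain integer handles in place of real texture, buffer and frustum objects:

```cpp
#include <vector>

using Handle = int;  // stand-in for a texture / buffer / frustum handle

enum class PassId { DepthOnly, Forward };

struct TextureBinding { int slot; Handle texture; };
struct UboBinding     { int slot; Handle buffer; };

// Only the parts of a state group that matter for this example.
struct StateGroup {
    std::vector<TextureBinding> textures;
    std::vector<UboBinding> ubos;
};

struct Stage {
    PassId pass;
    Handle colorTarget;   // -1 = no colour target bound
    Handle depthTarget;
    Handle frustum;       // frustum used for culling this stage
    StateGroup appended;  // appended to every draw item in the stage
};

struct ShadowResources {
    Handle shadowMap;     // depth texture the shadow stage renders into
    Handle lightUbo;      // holds the light's matrix and related data
    Handle lightFrustum;
};

// Arbitrary slot choices, assumed for this sketch only.
constexpr int kShadowMapSlot = 7;
constexpr int kLightUboSlot = 3;

std::vector<Stage> buildStages(const ShadowResources& s, Handle backBuffer,
                               Handle sceneDepth, Handle cameraFrustum) {
    // Stage 1: depth-only pass from the light's point of view.
    Stage shadow{PassId::DepthOnly, -1, s.shadowMap, s.lightFrustum, {}};

    // Stage 2: forward shading, with the shadow map and light data
    // bound for every draw item via the stage's state group.
    Stage forward{PassId::Forward, backBuffer, sceneDepth, cameraFrustum, {}};
    forward.appended.textures.push_back({kShadowMapSlot, s.shadowMap});
    forward.appended.ubos.push_back({kLightUboSlot, s.lightUbo});

    // Order matters: the shadow map must be written before it is read.
    return {shadow, forward};
}
```

Nothing in the models or materials has to change here. The shadow map reaches the forward shaders purely through the stage's appended state group.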