Geometry
The biggest show-stopper is draw calls: the more vertices fit into a single mesh, the better the GPU handles it and the faster your level runs. The major engines offer a technique named static mesh baking that combines meshes that never change into a set of full-capacity vertex buffers. Every vertex buffer has a size limit, so the result is split across N buffers, each of which produces just one draw call. AAA engines use the same approach for the HUD/UI: updating a set of vertex buffers is less costly than drawing every UI element one by one.
I worked on a console game where the original designers had built every tile as a single block or element, like drawing pixels in the 90s. After baking the whole scene in 3ds Max, the game ran 60% faster.
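To make the baking step more concrete, here is a minimal CPU-side sketch of the merging logic. It is not how any particular engine implements it. The `Mesh` and `Batch` types, the `bake_static_meshes` function and the 65,536-vertex limit (what 16-bit indices can address) are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed per-buffer capacity; real limits depend on the engine and index format.
MAX_VERTICES = 65536


@dataclass
class Mesh:
    vertices: list  # list of (x, y, z) tuples, already in world space
    indices: list   # triangle indices into `vertices`


@dataclass
class Batch:
    vertices: list = field(default_factory=list)
    indices: list = field(default_factory=list)


def bake_static_meshes(meshes, max_vertices=MAX_VERTICES):
    """Merge static meshes into as few capacity-limited batches as possible.

    Each returned Batch stands for one vertex buffer, i.e. one draw call.
    """
    batches = [Batch()]
    for mesh in meshes:
        if len(mesh.vertices) > max_vertices:
            raise ValueError("mesh does not fit into a single buffer")
        current = batches[-1]
        # Start a new buffer when the current one would overflow.
        if len(current.vertices) + len(mesh.vertices) > max_vertices:
            current = Batch()
            batches.append(current)
        # Indices must be shifted by the number of vertices already stored.
        base = len(current.vertices)
        current.vertices.extend(mesh.vertices)
        current.indices.extend(base + i for i in mesh.indices)
    return batches
```

With three four-vertex quads and a deliberately tiny limit of 8 vertices, the sketch produces two batches, so three draw calls become two. In a real level the ratio is far more dramatic because thousands of small meshes collapse into a handful of buffers.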
Vegetation
It depends on the kind of vegetation. Trees and other larger plants are a combination of mesh and shader: at a distance they are flattened to a fake 2D tree that always 'looks' at the player's location, and they expand into full-blown 3D meshes once the player comes within a certain distance. You can watch this effect in The Elder Scrolls series when using a cheat to fly high: the trees below become flat at some distance.
Grass and foliage are rendered using a geometry shader that creates the blades at the player's position to simulate an endless meadow. The shader is also used to make the grass move 'in the wind' or when the player walks through it.
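The switch between the flat tree and the full mesh can be sketched roughly as follows. The function name `select_tree_lod`, the 80-unit switch distance and the Y-up coordinate convention are assumptions for illustration. In practice this decision usually lives in the engine's LOD system and the rotation is done in a shader.

```python
import math


def select_tree_lod(tree_pos, player_pos, mesh_distance=80.0):
    """Pick the representation for one tree.

    Positions are (x, y, z) tuples with Y pointing up (an assumption).
    Returns ("mesh", 0.0) when the player is close, otherwise
    ("billboard", yaw) where yaw is the rotation around the Y axis, in
    radians, that turns the flat quad towards the player.
    """
    dx = player_pos[0] - tree_pos[0]
    dz = player_pos[2] - tree_pos[2]
    distance = math.hypot(dx, dz)  # horizontal distance only
    if distance <= mesh_distance:
        return ("mesh", 0.0)
    # Rotate only around the vertical axis so the tree stays upright.
    yaw = math.atan2(dx, dz)
    return ("billboard", yaw)
```

Because the billboard rotates only around the vertical axis, it stays upright but looks flat when seen from high above, which is the effect you notice when flying over the trees.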
Dynamic Units
Enemies, items and anything else that moves in the world are kept in memory as flat data classes and simulated that way for as long as it isn't necessary to render a full-blown 3D model. When the player comes within a certain distance of a unit, its model is loaded/shown on demand, and it is unloaded/hidden again once the player moves beyond that distance plus a certain threshold.
I worked on a car driving game where we needed to simulate a highway full of cars in both directions. We used just around 50 to 100 models at a time to simulate traffic of 1,000 or 10,000 cars per world chunk, simply by putting cars that got out of the player's reach into a pool and taking cars from that pool when they entered the visibility distance. Everything else was just a plain data description in memory.
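One way to read the "distance plus threshold" rule is as a small hysteresis band, so that a unit sitting right at the boundary does not flicker between loaded and unloaded. The sketch below assumes that reading. The function name and the two default values are made up for illustration.

```python
def update_loaded_state(is_loaded, distance, load_distance=100.0, unload_margin=20.0):
    """Return whether the unit's 3D model should be loaded after this update.

    The model is loaded once the player is within `load_distance` and is
    only unloaded again beyond `load_distance + unload_margin`.
    """
    if not is_loaded and distance <= load_distance:
        return True
    if is_loaded and distance > load_distance + unload_margin:
        return False
    return is_loaded  # inside the band: keep whatever state we had
```

With these defaults a unit loads at 100 units, stays loaded at 110, and is only dropped once the player is more than 120 units away.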
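The pooling scheme described above could look roughly like this. It is a simplified sketch and not the code from that game. `CarRecord`, `CarModelPool`, `update_traffic`, the one-dimensional highway and the 300-unit visibility distance are all assumptions for illustration, and the "models" are plain strings standing in for real 3D model handles.

```python
from dataclasses import dataclass


@dataclass
class CarRecord:
    position: float       # distance along the highway
    speed: float          # units per second, negative for oncoming traffic
    model: object = None  # pooled model handle, or None while data-only


class CarModelPool:
    """A fixed set of reusable model handles (placeholder strings here)."""

    def __init__(self, size):
        self.free = [f"model-{i}" for i in range(size)]

    def acquire(self):
        return self.free.pop() if self.free else None

    def release(self, model):
        self.free.append(model)


def update_traffic(cars, pool, player_pos, dt, visible_distance=300.0):
    # Every car is simulated as cheap plain data, visible or not.
    for car in cars:
        car.position += car.speed * dt
    # First hand back the models of cars that left the visible range ...
    for car in cars:
        if car.model is not None and abs(car.position - player_pos) > visible_distance:
            pool.release(car.model)
            car.model = None
    # ... then give models to cars that entered it.
    for car in cars:
        if car.model is None and abs(car.position - player_pos) <= visible_distance:
            car.model = pool.acquire()  # stays None if the pool is exhausted
```

Releasing before acquiring means a model freed in one update can be reused in the same update, so the pool only has to be as large as the highest number of cars visible at once.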
Conclusion
There is no real 'this is how they did it' answer to your question, and you shouldn't refer to Crysis (or CryEngine) anymore because it is becoming obsolete. 90% of all games are made with Unity or Unreal these days, apart from the in-house engines the big AAA studios use. So this will always be a matter of doing your own research for each particular game.