I haven't done ocean rendering but have done a few water shaders with ripples, Fresnel, etc.
One important thing to note is that your geometry has a limited resolution set by you (vertex grid density), so your knowledge of "true" normals only goes as far as that. Any normal detail finer than the grid can't really be accurate, since normals in the pixel/fragment shader are simply interpolated between vertices, but it can be faked with noise. So if the algorithm uses a coarser vertex grid for processing purposes, you can still sprinkle in some good ole Perlin variation with careful parameter selection to add finer detail (basically procedural normal mapping, which can be surprisingly cheap). If you want a lot of control over it, the parameterization can get involved (i.e. if your waves change size or speed, how do you make the noise still look natural and fitting?).
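To make the "procedural normal mapping" idea concrete, here's a rough sketch in Python-as-pseudo-shader-code. It uses a simple value noise (a stand-in for Perlin noise; everything here is illustrative, not from any paper) and tilts the coarse interpolated normal by the noise gradient, which is the standard bump-mapping trick:

```python
import math

def hash2(ix: int, iy: int) -> float:
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    n = ix * 374761393 + iy * 668265263
    n = (n ^ (n >> 13)) * 1274126177
    return ((n ^ (n >> 16)) & 0xFFFFFFFF) / 0xFFFFFFFF

def smoothstep(t: float) -> float:
    return t * t * (3.0 - 2.0 * t)

def value_noise(x: float, y: float) -> float:
    """Bilinearly interpolated value noise (simpler cousin of Perlin noise)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = smoothstep(x - ix), smoothstep(y - iy)
    a, b = hash2(ix, iy), hash2(ix + 1, iy)
    c, d = hash2(ix, iy + 1), hash2(ix + 1, iy + 1)
    return (a * (1 - fx) + b * fx) * (1 - fy) + (c * (1 - fx) + d * fx) * fy

def perturbed_normal(nx, ny, nz, x, z, freq=8.0, strength=0.15, eps=1e-3):
    """Tilt a coarse surface normal by the noise gradient (central differences)."""
    dndx = (value_noise((x + eps) * freq, z * freq) -
            value_noise((x - eps) * freq, z * freq)) / (2 * eps)
    dndz = (value_noise(x * freq, (z + eps) * freq) -
            value_noise(x * freq, (z - eps) * freq)) / (2 * eps)
    # Subtract the scaled height gradient from the horizontal components,
    # then renormalize -- the usual bump-mapping formulation.
    px, py, pz = nx - strength * dndx, ny, nz - strength * dndz
    length = math.sqrt(px * px + py * py + pz * pz)
    return px / length, py / length, pz / length
```

Since the noise is deterministic in (x, z), the perturbation stays consistent frame to frame; `freq` and `strength` are the knobs you'd tie to wave size/speed when parameterizing.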
Having said that, the paper I think you're talking about uses a deterministic, analytic formula to displace the surface based on parameters like time t and a "wave vector". In volumetric rendering, when it's done by people smarter than me, there are examples where a function's derivative is taken analytically in order to determine the normal of the surface it describes. Whether that derivative is obtainable for this particular function I don't know, sorry. The alternative is to sample the function at several very close points and use the differences to approximate the derivative.
What I'm getting at is that if an analytic solution for the normal, or even just brute-force multisampling, runs fast enough, I'm not sure you need the vertex grid at all. A volumetric post-process shader can do a lot (I made volumetric clouds that way, no vertices except the screen quad).
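A minimal sketch of that "no vertex grid" route: ray-march an analytic height field along a camera ray (the kind you'd generate per-pixel from a screen quad), stepping until the ray dips below the surface, then bisecting for a precise hit. The height function and step sizes here are made-up placeholders, not from the paper:

```python
import math

def height(x, z):
    # Placeholder analytic surface; swap in the paper's displacement formula.
    return 0.2 * math.sin(1.5 * x) * math.cos(1.1 * z)

def raymarch_heightfield(ro, rd, t_max=50.0, dt=0.1, bisect_iters=16):
    """Return the hit point on y = height(x, z), or None if the ray misses.

    ro: ray origin (x, y, z); rd: normalized ray direction.
    """
    t_prev, t = 0.0, dt
    while t < t_max:
        px, py, pz = (ro[i] + rd[i] * t for i in range(3))
        if py < height(px, pz):            # stepped below the surface
            lo, hi = t_prev, t
            for _ in range(bisect_iters):  # refine the crossing by bisection
                mid = 0.5 * (lo + hi)
                mx, my, mz = (ro[i] + rd[i] * mid for i in range(3))
                if my < height(mx, mz):
                    hi = mid
                else:
                    lo = mid
            return tuple(ro[i] + rd[i] * hi for i in range(3))
        t_prev, t = t, t + dt
    return None
```

Once you have the hit point, the analytic or finite-difference normal gives you shading, so the whole ocean can live in one fragment shader.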