How long compilation takes depends on many things. First, the compiler has to handle your preprocessor directives: includes, but also any conditional or definition that starts with a #. The preprocessor has to resolve all of these directives from top to bottom before anything else can take place.
Every include is processed: the file is loaded and added to the translation unit as well. Once the conditionals are resolved and the preprocessor knows which code to pass on to the compiler, macro replacement takes place. Any macro (including function-like macros, i.e. a define that takes arguments) is expanded wherever you used it in code. The expansion really is recursive: the result is rescanned until the preprocessor finds no more defined identifiers in it, or until a macro refers to itself, at which point the preprocessor stops expanding that name. These steps may happen at the same time; this is preprocessor dependent.
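To make the conditional and macro steps a bit more concrete, here is a minimal sketch. All macro and function names in it (ENABLE_FAST_PATH, SQUARE, SUM_OF_SQUARES, SELF, STRINGIFY) are made up for illustration and are not taken from any real project.

```cpp
#include <string>

// All names below are hypothetical and exist only for this illustration.

// Conditional compilation: only one branch ever reaches the compiler.
#define ENABLE_FAST_PATH 1
#if ENABLE_FAST_PATH
inline int path_id() { return 1; }
#else
inline int path_id() { return 0; }
#endif

// Nested macros: expanding SUM_OF_SQUARES yields text that still contains
// SQUARE, so the result is rescanned and SQUARE is expanded as well.
#define SQUARE(x) ((x) * (x))
#define SUM_OF_SQUARES(a, b) (SQUARE(a) + SQUARE(b))

inline int sum_of_squares(int a, int b) { return SUM_OF_SQUARES(a, b); }

// Self-reference: while SELF is being expanded, the inner SELF is left
// alone, so the expansion stops instead of looping forever.
#define SELF (1 + SELF)

// Two-step stringification: the argument is macro-expanded first and only
// then turned into a string literal, which lets us look at the result.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

inline std::string self_expansion_text() { return STRINGIFY(SELF); }
```

Calling self_expansion_text() returns the text SELF expands to, and that text still contains the unexpanded name SELF. Most compilers can also dump the preprocessed output directly (for example the -E option of GCC and Clang), which is handy for checking how much code an include or macro really produces.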
The preprocessor I wrote in C# to detect dependencies between C++ files, for example, does all of this on the fly.
Templates are then instantiated (specific code is generated for each template and each distinct set of arguments passed to it, which is why templates may cause code bloat), and finally the code is passed on to the actual compilation step.
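A small sketch of what "specific code for each set of arguments" means in practice. The function names twice and call_count are hypothetical; the point is only that every distinct template argument gets its own copy of the function, including its own static variable.

```cpp
// Hypothetical helper: each instantiation (call_count<int>,
// call_count<double>, ...) is a separate function with its own static.
template <typename T>
int& call_count() {
    static int count = 0;
    return count;
}

// Hypothetical function template: the compiler generates one version of
// this function for every distinct T it is used with.
template <typename T>
T twice(T value) {
    ++call_count<T>();
    return value + value;
}

// Using the template with two types forces two instantiations.
inline int    twice_int(int v)       { return twice(v); }
inline double twice_double(double v) { return twice(v); }
```

After one call each to twice_int and twice_double, call_count<int>() and call_count<double>() are both 1 and refer to different objects, because two independent functions were generated. With heavily templated headers this happens again in every compilation unit that uses them, which is where the compile time goes.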
So to answer your question, it depends: how many include files do you have, how many macros do you use and how complex are they, and how many templates do you instantiate with different arguments? Did you set up the include guards correctly (so a file is not included twice when it has already been processed in this compilation unit), or do you include more files than necessary? Could a simple forward declaration be used instead of the whole header?
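Include guards and forward declarations could look roughly like this. The three "files" are squeezed into one listing so it stays self-contained, and the names Widget, Renderer, widget.h, renderer.h and widget.cpp are invented for the example.

```cpp
// --- widget.h (hypothetical header) ---
#ifndef WIDGET_H
#define WIDGET_H

class Renderer;  // forward declaration: enough for pointers and references

class Widget {
public:
    explicit Widget(Renderer* renderer) : renderer_(renderer) {}
    Renderer* renderer() const { return renderer_; }
    int draw() const;  // defined in widget.cpp, where Renderer is complete
private:
    Renderer* renderer_;
};

#endif  // WIDGET_H

// --- renderer.h (hypothetical header) ---
#ifndef RENDERER_H
#define RENDERER_H

class Renderer {
public:
    int draw_calls = 0;
};

#endif  // RENDERER_H

// --- widget.cpp (hypothetical source; only here is renderer.h needed) ---
int Widget::draw() const { return ++renderer_->draw_calls; }
```

Because widget.h only forward declares Renderer, every file that includes widget.h is spared from parsing renderer.h and whatever that header pulls in. The guards make sure that a second include of the same header in one compilation unit expands to nothing.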
By the way, 40 seconds is nothing. Huge projects like game engines (Unreal, for example) use so-called "Unity Files", where everything is included at once in a single file. This is an attempt to reduce the huge build times that may otherwise occur; our Unreal project, for example, took more than 10 minutes to compile before we switched it to generate a Unity File.
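A unity file is nothing more than a source file that includes other source files, so shared headers are parsed once instead of once per file. The following is a rough sketch of how a build step might generate one; make_unity_source is a hypothetical helper, and this is not how Unreal's build tool actually does it.

```cpp
#include <string>
#include <vector>

// Hypothetical generator: returns the text of a unity file that pulls in
// every given source file through an #include directive.
std::string make_unity_source(const std::vector<std::string>& sources) {
    std::string out = "// Generated unity file, do not edit by hand.\n";
    for (const std::string& source : sources) {
        out += "#include \"" + source + "\"\n";
    }
    return out;
}
```

The returned text would be written to something like unity_0.cpp and compiled in place of the individual files. The trade-off is that all of those files now share one compilation unit, so static functions or anonymous-namespace names that happen to be identical in two files will collide.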
What I like to do in such cases is have our custom build tool generate a dependency graph of every include and where it is used. This not only helps to avoid circular dependencies and to modularize the project, but also reveals unnecessary include directives that can cause much higher compile times.
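As a rough idea of what such a tool does at its core, here is a minimal sketch that collects include directives and builds an "included by" map. It is not our actual build tool: find_includes and build_included_by are hypothetical names, and the scan is deliberately naive (it ignores conditionals, block comments and include paths).

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Returns the names used in #include "..." and #include <...> lines.
std::vector<std::string> find_includes(const std::string& source) {
    std::vector<std::string> result;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        // A directive starts with '#' as the first non-blank character.
        std::size_t pos = line.find_first_not_of(" \t");
        if (pos == std::string::npos || line[pos] != '#') continue;
        pos = line.find_first_not_of(" \t", pos + 1);
        if (pos == std::string::npos) continue;
        if (line.compare(pos, 7, "include") != 0) continue;
        // The name sits between quotes or angle brackets.
        std::size_t open = line.find_first_of("\"<", pos + 7);
        if (open == std::string::npos) continue;
        char closing = (line[open] == '"') ? '"' : '>';
        std::size_t close = line.find(closing, open + 1);
        if (close == std::string::npos) continue;
        result.push_back(line.substr(open + 1, close - open - 1));
    }
    return result;
}

using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Maps every included name to the files that include it.
// `files` maps a file name to its contents.
IncludeGraph build_included_by(const std::map<std::string, std::string>& files) {
    IncludeGraph included_by;
    for (const auto& file : files) {
        for (const std::string& header : find_includes(file.second)) {
            included_by[header].push_back(file.first);
        }
    }
    return included_by;
}
```

Headers with a very long "included by" list are the expensive ones to touch, and headers that are included but whose declarations are never used in the including file are candidates for removal or for a forward declaration.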