Hello,
so for context: I've got my own IL/bytecode-based compiled programming language, which is stack-based (similar to Java's bytecode). I have a JIT compiler which at the moment does nothing more than translate all the VM operations into one blob of machine code (so that no dispatch loop is necessary, and constants need not be fetched from a bytecode stream but can be treated as actual constants). This is already multiple times faster than executing the bytecode in a VM, but it is still slower than it could be, since it needs to use the VM's stack for passing operands. For example:
PushConst 4
LoadLocal 0
Mul
PushConst 1
Add
This is the equivalent of the expression (4 * x + 1). Ignoring any fancy optimizations, the same operation would look like this in machine code:
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], edi
mov eax, dword ptr [rbp - 4]
shl eax, 2
add eax, 1
pop rbp
ret
Obviously, all the operands are either stored in a register or on the application's own stack.
What I'm curious about is: is there a general algorithm to turn such a stack-based IL into a full register-based variant that can then be optimized? For a directly compiled language like C++, I suppose it's easy, as all those steps are performed based on the AST. But for an IL like Java's bytecode, there is no AST. Is the Java bytecode simply parsed back into an AST somehow, or is there another method?
For very simple examples, I could come up with something, but for anything more complicated, including jumps and labels, I don't have any idea. I could do another primitive optimization by looking for patterns like "Add" preceded by "PushConst" and use registers for adjacent operations, but I'd rather have a solution that scales.
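To show what I mean by "something" for the straight-line case: my idea was to simulate the stack at compile time, with virtual register names in place of runtime values, so every push allocates a fresh virtual register and every operation pops its operand registers and pushes a result register. The output would be three-address code that a register allocator could then map onto real registers. A rough Python sketch (the opcode names and the tuple encoding are just made up for illustration, not my real IL):

```python
def to_three_address(code):
    stack = []    # compile-time stack holding virtual register names
    out = []      # emitted three-address instructions
    counter = 0

    def fresh():
        # allocate a new virtual register name
        nonlocal counter
        counter += 1
        return f"v{counter}"

    for op, *args in code:
        if op == "PushConst":
            r = fresh()
            out.append((r, "const", args[0]))
            stack.append(r)
        elif op == "LoadLocal":
            r = fresh()
            out.append((r, "local", args[0]))
            stack.append(r)
        elif op in ("Mul", "Add"):
            # pop operand registers, push the result register
            b = stack.pop()
            a = stack.pop()
            r = fresh()
            out.append((r, op.lower(), a, b))
            stack.append(r)
    return out

program = [
    ("PushConst", 4),
    ("LoadLocal", 0),
    ("Mul",),
    ("PushConst", 1),
    ("Add",),
]

for insn in to_three_address(program):
    print(insn)
# ('v1', 'const', 4)
# ('v2', 'local', 0)
# ('v3', 'mul', 'v1', 'v2')
# ('v4', 'const', 1)
# ('v5', 'add', 'v3', 'v4')
```

For code with jumps, though, I'd presumably have to split the IL into basic blocks and make the virtual-stack state agree at every label, and that's exactly the part I don't know how to do in general.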