Virtual machine design.
Greetings,
After one month''s struggle, I have managed to construct a C-like script compiler and it can generate intermediate code for most of the language constructs. The intermediate code is in a quadruple form: op oprand1 oprand2 opdest; op stands for for operator, oprand1 and oprand2 stand for the source operands, and opdest stands for the destination operand. And now I plan to build a virtual machine for my little compiler; instead of further transform the quadruples, I intend that the vm runs these quadruples directly. Can anybody give some suggestions for doing that? I think the hard part should be function-related quadruples like call, ret, etc. and the run-time stack manipulation.
Thanks for any response and regards!
That's awesome! I found building my first compiler to be so rewarding.
To your question... you mention you don't want to further transform the intermediate code. Are you implying that you have already resolved identifiers with actual addresses? I imagine not, because you havn't thought of the VM architecture yet. My experience has taught me that it helps significantly if you do this in the compilation\assembling phase rather than trying to do this while executing the instructions.
Regardless, the way that I went about it was to have a lookup table where the opcode is the index. In order for this to work you either to need be reading in bytecode at this point, or have a hashing function for the opcode string. That lookup table contained structures which described the individual opcodes. The VM would then gobble up the correct number of operands from the bytecode stream and invoke a function pointer in that structure and pass the operands to it. For each opcode, I had one function defined which simply executed the logic for that opcode.
Resolving the operands was trivial because I had already transformed all identifier references into addresses. In my case, the VM was stack based, so the addresses were offsets from either the current function frame (for function arguments and local variables) or relative addresses from the bottom of the stack(for global variables). Pointers were supported by flagging each operand with an "is indirect" bit. (*edit: I should also mention that when implementing more complex variable types such as classes, arrays, and constants there is a bit more voodoo to actually resolving the operands. The general gist is that before each operand in the bytecode stream there are a few bits which flag what the operand is and the VM then performs whatever operations on that to genereate an actual value. If your opcodes only accept specific types of data, and you change the opcode depending on the data then this is not necessary).
Handling function instructions isn't that tough if you're using a stack. What I did was this: when the VM encountered a function call instruction it would push the current instruction address onto the stack and then change the next instruction address to be whatever is the argument to call. This, of course, is assuming that the assembler\compiler expanded out the arguments of the function call into PUSH operations before the CALL instruction, greatly simplifying things. Doing this though, you need to be sure, when resolving variable addresses, to remember there is a value between the beginning of the function-frame on the stack and the first parameter.
When exiting the function, you require the function to pop off the return value -- save it -- pop off all its parameters and then jmp to the return value. (Keep in mind, you don't need to physically pop values off a stack, you can just decrement the top-address. Usually the stack is implemented as a large array).
Anyway, there are TONS of things to consider when building a VM. I highly recommend picking up Game Scripting Mastery by Alex Varnese. It's a great book and addresses many of the questions you have. (be sure to go to amazon from gamedev so they get some commision!!)
Good luck!
- Jason Citron
- Team nine:14, System Architect
- www.nine14.net
--------------------------------------------------
Checkout the prototype of Fates Forgiven at www.fatesforgiven.com
[edited by - clash on June 11, 2004 7:48:37 AM]
[edited by - clash on June 11, 2004 7:49:53 AM]
To your question... you mention you don't want to further transform the intermediate code. Are you implying that you have already resolved identifiers with actual addresses? I imagine not, because you havn't thought of the VM architecture yet. My experience has taught me that it helps significantly if you do this in the compilation\assembling phase rather than trying to do this while executing the instructions.
Regardless, the way that I went about it was to have a lookup table where the opcode is the index. In order for this to work you either to need be reading in bytecode at this point, or have a hashing function for the opcode string. That lookup table contained structures which described the individual opcodes. The VM would then gobble up the correct number of operands from the bytecode stream and invoke a function pointer in that structure and pass the operands to it. For each opcode, I had one function defined which simply executed the logic for that opcode.
Resolving the operands was trivial because I had already transformed all identifier references into addresses. In my case, the VM was stack based, so the addresses were offsets from either the current function frame (for function arguments and local variables) or relative addresses from the bottom of the stack(for global variables). Pointers were supported by flagging each operand with an "is indirect" bit. (*edit: I should also mention that when implementing more complex variable types such as classes, arrays, and constants there is a bit more voodoo to actually resolving the operands. The general gist is that before each operand in the bytecode stream there are a few bits which flag what the operand is and the VM then performs whatever operations on that to genereate an actual value. If your opcodes only accept specific types of data, and you change the opcode depending on the data then this is not necessary).
Handling function instructions isn't that tough if you're using a stack. What I did was this: when the VM encountered a function call instruction it would push the current instruction address onto the stack and then change the next instruction address to be whatever is the argument to call. This, of course, is assuming that the assembler\compiler expanded out the arguments of the function call into PUSH operations before the CALL instruction, greatly simplifying things. Doing this though, you need to be sure, when resolving variable addresses, to remember there is a value between the beginning of the function-frame on the stack and the first parameter.
When exiting the function, you require the function to pop off the return value -- save it -- pop off all its parameters and then jmp to the return value. (Keep in mind, you don't need to physically pop values off a stack, you can just decrement the top-address. Usually the stack is implemented as a large array).
Anyway, there are TONS of things to consider when building a VM. I highly recommend picking up Game Scripting Mastery by Alex Varnese. It's a great book and addresses many of the questions you have. (be sure to go to amazon from gamedev so they get some commision!!)
Good luck!
- Jason Citron
- Team nine:14, System Architect
- www.nine14.net
--------------------------------------------------
Checkout the prototype of Fates Forgiven at www.fatesforgiven.com
[edited by - clash on June 11, 2004 7:48:37 AM]
[edited by - clash on June 11, 2004 7:49:53 AM]
- Jason Citron- Programmer, Stormfront Studios- www.stormfront.com
Thanks for your response, Clash. Your comments point to the heart of the problem
And also thanks for your recommendation of "Game Scripting Mastery", it''s really awesome. As to my view of the book, it''s so very detailed than I expected. The virtual machine design chapters are good; however, I do not think that it is so necessary to translate the intermediate code into XASM source (another lex/parse process here!), and then turn it into real nemeric code native to the XVM, since we are not dealing with a hard-ware cpu; we are working on a soft-ware vm instead. So I still intend for my project that my vm runs the quadruple intermediate code, which is generated by the parser as a high-level structure and is easily analyzed. No wonder this can reduce a lot of work and anyhow, I can not see much performance degradation in this way.
And also thanks for your recommendation of "Game Scripting Mastery", it''s really awesome. As to my view of the book, it''s so very detailed than I expected. The virtual machine design chapters are good; however, I do not think that it is so necessary to translate the intermediate code into XASM source (another lex/parse process here!), and then turn it into real nemeric code native to the XVM, since we are not dealing with a hard-ware cpu; we are working on a soft-ware vm instead. So I still intend for my project that my vm runs the quadruple intermediate code, which is generated by the parser as a high-level structure and is easily analyzed. No wonder this can reduce a lot of work and anyhow, I can not see much performance degradation in this way.
You mention that your intermediate code is generated in a "high-level structure." Are you referring to an actual struct or the text-representation?
If you are referring to a data-type, then the benefit of creating bytecode would certainly not be there for performance. However, if you intend to release your product, I imagine your method now will require you to give out the script-source so that it can be compiled to those structures. Using bytecode would prevent others from seeing and modifying your scripts. Of course, you could always use a pack-file type setup in which case your scripts will be safe anyway. Or, you could write out those structures verbatim and reload them at run-time.
If you were referring to having the VM essentially parse and analyze the text-representation, then you''re back to an expensive process of transforming that text into meaningful data. Precompiling it into bytecode would avoid this step.
Best of luck.
- Jason Citron
- Team nine:14, System Architect
- www.nine14.net
--------------------------------------------------
Checkout the prototype of Fates Forgiven at www.fatesforgiven.com
If you are referring to a data-type, then the benefit of creating bytecode would certainly not be there for performance. However, if you intend to release your product, I imagine your method now will require you to give out the script-source so that it can be compiled to those structures. Using bytecode would prevent others from seeing and modifying your scripts. Of course, you could always use a pack-file type setup in which case your scripts will be safe anyway. Or, you could write out those structures verbatim and reload them at run-time.
If you were referring to having the VM essentially parse and analyze the text-representation, then you''re back to an expensive process of transforming that text into meaningful data. Precompiling it into bytecode would avoid this step.
Best of luck.
- Jason Citron
- Team nine:14, System Architect
- www.nine14.net
--------------------------------------------------
Checkout the prototype of Fates Forgiven at www.fatesforgiven.com
- Jason Citron- Programmer, Stormfront Studios- www.stormfront.com
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement