Creating a scripting language - variables

Started by October 19, 2003 11:16 PM
5 comments, last by INVERSED 21 years, 1 month ago
OK, I'm currently working on a scripting language (compiled bytecode). I have a lexer, a parser, and various other objects working to a satisfactory level, but the thing I'm still curious about is variables. One way I could do it is to have a table of all the variables in the script and just fill in their positions on the virtual machine stack as they become apparent. I'm concerned about this method because it would eat up more memory, although it may be a little faster in some respects. I don't know, really. Does anyone have any other suggestions that would require making two passes over the source code?
Write more poetry.http://www.Me-Zine.org
A runtime stack is the way to go. How much memory it takes up depends on whether you use an array or a linked list for it. An array is fast because it is referenced by indices, but a linked list lets the stack grow to any size. It's up to you, but I would go with the speed of an array and just add some sort of opcode to set the stack size from within the script. If you really think about it, it doesn't *waste* too much memory... e.g.:

struct sVariable
{
    char iType;        // tag: which union member is in use
    union
    {
        float fFloat;
        char* pString;
        int   iInt;
    };                 // anonymous union: the members share the same storage
};

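The union only takes as much room as its largest member (4 bytes on a 32-bit target), so the struct carries 5 bytes of data. Most compilers pad it for alignment, though, so sizeof usually reports 8 there.

So a stack of 200 variables would be about 1600 bytes with padding (1000 if you pack the struct).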
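
The exact size depends on the compiler and target, so it is worth checking rather than assuming. A minimal sketch, assuming a C++11 compiler; the names Variable and stackBytes are made up for illustration and are not from the post:

```cpp
#include <cstddef>

// Same layout as sVariable above; the name Variable is illustrative.
struct Variable {
    char type;            // tag: which union member is live
    union {
        float asFloat;
        char* asString;
        int   asInt;
    };
};

// Bytes needed for a stack of `slots` variables, padding included.
constexpr std::size_t stackBytes(std::size_t slots) {
    return slots * sizeof(Variable);
}

// The struct holds one tag byte plus the union, and the compiler rounds the
// total up to a multiple of the struct's alignment.
static_assert(sizeof(Variable) >= 1 + sizeof(float), "tag byte plus union payload");
static_assert(sizeof(Variable) % alignof(Variable) == 0, "size is padded to alignment");
```

Printing stackBytes(200) on the target you care about gives the real figure for a 200-slot stack.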

IMO the array is the way to go. Just make sure to add good error checking for stack overflow and whatnot... or if you get really creative you could make your VM reallocate the stack at a bigger size (you could just use vectors if you're thinking C++)... just a thought.
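
As a rough illustration of that idea, here is a small array-backed stack with the error checking built in. It is only a sketch: RuntimeStack, Value, and maxSlots are hypothetical names, and Value is a plain int standing in for the tagged struct above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical value type; a real VM would store the tagged struct instead.
using Value = int;

class RuntimeStack {
public:
    // `maxSlots` plays the role of the operand of a "set stack size" opcode.
    explicit RuntimeStack(std::size_t maxSlots) : maxSlots_(maxSlots) {
        slots_.reserve(maxSlots);
    }

    void push(Value v) {
        if (slots_.size() >= maxSlots_) throw std::overflow_error("script stack overflow");
        slots_.push_back(v);
    }

    Value pop() {
        if (slots_.empty()) throw std::underflow_error("script stack underflow");
        Value v = slots_.back();
        slots_.pop_back();
        return v;
    }

    // Indexed access is what makes the array-backed stack fast.
    Value& at(std::size_t index) {
        if (index >= slots_.size()) throw std::out_of_range("bad stack index");
        return slots_[index];
    }

    std::size_t size() const { return slots_.size(); }

private:
    std::size_t maxSlots_;
    std::vector<Value> slots_;
};
```

Dropping the maxSlots check in push would give the "reallocate at a bigger size" variant, since std::vector grows on its own.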

Hope I may have helped.

EDIT:

About the variable tables... in your compiled code there is no need for variable tables, because any instruction (opcode) that references a "variable" just holds a reference (index) into the stack. When compiling, however, you must create an identifier table.

Here is a sample assembly-like script function:

func myfunc:
    local Var1          ; create a local variable
    local Var2          ; create a 2nd local variable
    mov Var1, 0         ; move 0 into Var1
    add Var2, 2, 1      ; add 2+1 and store in Var2
    ret


Parsing this script with 2 passes would go something like this:



FIRST PASS:
-----------

- Look for any DIRECTIVE that would define an identifier (e.g. "local" and "func" from our above script)
- Add these identifiers to their appropriate lookup tables

from the above script we would have 2 lookup tables:

Func Identifier Table:
myfunc

Local Identifier Table:
Var1
Var2

- Calculate things like instruction count, line count, and the local data size of each function (anything that is useful to the second pass).


SECOND PASS:
------------

- Parse instructions into opcodes (make sure to check for type mismatches, undefined identifiers, and all that good stuff)

Take the line "mov Var1, 0" from the script above.

This could be thought of as:

mov StackRef[0], Literal[0]

Imagine mov's opcode is 0, the StackRef operand type is 0, and the Literal operand type is 1.
The bytecode (represented in numbers) would look like this:

0 0 0 1 0

I put a type number in front of each operand value, so this reads as: opcode 0, then (type 0, value 0) for the stack reference, then (type 1, value 0) for the literal.

To figure out what the stack reference is, you use the local data size from the first pass. Since there are 2 local variables, we know our locals are stored at stack indices 0 and 1.
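
The two passes can be sketched in a few lines of C++. This is only an illustration under assumptions: the function names collectLocals and assemble are invented here, only "local" and "mov NAME, LITERAL" are handled, identifiers are treated as case-sensitive, and each opcode, type, and value is stored as one byte using the numbering from the example above.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed encodings, matching the example: mov = 0, StackRef = 0, Literal = 1.
constexpr std::uint8_t OP_MOV = 0, TYPE_STACKREF = 0, TYPE_LITERAL = 1;

// Pass 1: record each "local NAME" and give it the next free stack index.
std::map<std::string, std::uint8_t> collectLocals(const std::vector<std::string>& lines) {
    std::map<std::string, std::uint8_t> locals;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string word, name;
        if (in >> word >> name && word == "local") {
            const std::uint8_t next = static_cast<std::uint8_t>(locals.size());
            locals.emplace(name, next);  // a repeated declaration keeps its first index
        }
    }
    return locals;
}

// Pass 2: encode "mov NAME, LITERAL" as the opcode followed by a
// type/value pair for each operand.
std::vector<std::uint8_t> assemble(const std::vector<std::string>& lines) {
    const auto locals = collectLocals(lines);
    std::vector<std::uint8_t> code;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string word, dest;
        int literal = 0;
        if (!(in >> word) || word != "mov") continue;  // only mov is handled in this sketch
        if (!(in >> dest >> literal)) throw std::runtime_error("malformed mov: " + line);
        if (!dest.empty() && dest.back() == ',') dest.pop_back();  // drop the operand separator
        const auto it = locals.find(dest);
        if (it == locals.end()) throw std::runtime_error("undefined identifier: " + dest);
        code.insert(code.end(), {OP_MOV, TYPE_STACKREF, it->second,
                                 TYPE_LITERAL, static_cast<std::uint8_t>(literal)});
    }
    return code;
}
```

Feeding it the sample function with consistently cased names yields the five bytes 0 0 0 1 0 for the mov line, and a name that was never declared with "local" is reported as an undefined identifier.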



I hope I may have helped in some way... if I have caused any confusion, please ask.

-sky.

[edited by - skyblu on October 20, 2003 1:41:46 PM]
If you use a 'variable stack', don't forget to make local variables relative to the 'current position' so that recursion and functions calling functions work properly. Either that, or set up a new stack for each function when it is called, but that would probably be slower.
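
To make the 'current position' idea concrete, here is one way it could look. This is a sketch, not the poster's code: Machine, enter, leave, and local are hypothetical names, and the saved frame bases live in a separate vector purely to keep the example short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical VM state: one shared stack plus the start of the current frame.
struct Machine {
    std::vector<int> stack;
    std::vector<std::size_t> savedBases;  // frame bases of the callers
    std::size_t frameBase = 0;

    // On function entry: remember the caller's frame, then reserve zeroed locals.
    void enter(std::size_t localCount) {
        savedBases.push_back(frameBase);
        frameBase = stack.size();
        stack.resize(stack.size() + localCount, 0);
    }

    // On return: drop this call's locals and restore the caller's frame.
    void leave() {
        stack.resize(frameBase);
        frameBase = savedBases.back();
        savedBases.pop_back();
    }

    // Local slot `offset` of whichever call is currently running.
    int& local(std::size_t offset) { return stack[frameBase + offset]; }
};

// Recursive example: every call uses "local 0", yet the calls never collide.
int sumTo(Machine& vm, int n) {
    vm.enter(1);
    vm.local(0) = n;
    const int rest = (n > 0) ? sumTo(vm, n - 1) : 0;
    const int result = vm.local(0) + rest;  // still this call's n after the recursion
    vm.leave();
    return result;
}
```

Because every access goes through frameBase, the bytecode for a function can hard-code offsets 0, 1, 2... no matter how deep the call chain is.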
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
quote: Original post by INVERSED
OK, I'm currently working on a scripting language (compiled bytecode). I have a lexer, a parser, and various other objects working to a satisfactory level, but the thing I'm still curious about is variables. One way I could do it is to have a table of all the variables in the script and just fill in their positions on the virtual machine stack as they become apparent. I'm concerned about this method because it would eat up more memory, although it may be a little faster in some respects. I don't know, really. Does anyone have any other suggestions that would require making two passes over the source code?

Do not use a table. Do what assembly does -- when compiling to create the bytecode, replace all variable names with the stack offset. Then each "variable" is actually just an index into the stack.

~CGameProgrammer( );
I just recently finished writing an assembler and found that, for local variables, negative indices are the way to go. If you think of an index of -1 as being "the top of the runtime stack," then the local variable indices will resolve correctly regardless of how deep the recursive function calls get. The only trick with that technique is that you need to keep track of the top of the current stack frame "before" any code is executed (such as Push), so that the correct offset can be added to the negative index. When a function is called, push that value as well as the function's index onto the stack.

Hope that helps a bit..
- Jason Citron- Programmer, Stormfront Studios- www.stormfront.com
quote: Original post by Clash
I just recently finished writing an assembler and found that, for local variables, negative indices are the way to go. If you think of an index of -1 as being "the top of the runtime stack," then the local variable indices will resolve correctly regardless of how deep the recursive function calls get.


Yes, this is the way to go... if you want to implement globals, reference them with indices >= 0; any local data is referenced with indices < 0.
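
A small sketch of how a VM might resolve operands under that sign convention. The names Memory, frameTop, and resolve are assumptions for illustration; frameTop is taken to be fixed at function entry, right after the locals are allocated and before any temporaries are pushed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical resolver: non-negative indices are globals, negative are locals.
struct Memory {
    std::vector<int> slots;    // globals first, then the call frames
    std::size_t frameTop = 0;  // one past the last local of the current frame

    int& resolve(int index) {
        if (index >= 0) {
            // Absolute slot: a global.
            return slots.at(static_cast<std::size_t>(index));
        }
        // Counted back from the frame top, so -1 is the last declared local.
        const std::size_t back = static_cast<std::size_t>(-static_cast<long long>(index));
        return slots.at(frameTop - back);  // at() throws if the index is out of range
    }
};
```

With two globals followed by a frame of two locals, resolve(0) and resolve(1) reach the globals while resolve(-1) and resolve(-2) reach the locals, whatever the frame's absolute position is.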

sky





my teeth are tired from chewing demon bones.
Thanks for all the replies, I appreciate the help. The idea of using a stack had occurred to me, and I do in fact plan to use one. As my current script compiler stands, it does all of its preprocessing, syntax checking, and code generation in one pass. I suppose I would need two passes to use negative stack indices, so that I can figure out just how many variables get declared locally at a time. Is there any way around this, or is it necessary to do 2 passes?
Write more poetry.http://www.Me-Zine.org