
Scripting Language Genesis

Started by November 01, 2004 07:28 AM
116 comments, last by cmp 20 years, 2 months ago
Quote:
Quote:

the way you implement it is totally up to you: you could use one big array or c's malloc (i think i would go this way).

I was thinking about having one big chunk of memory, but then that would lead to fragmentation problems, so I thought why not just leave it to the OS?

i would just use the malloc implementation provided by your compiler, because it may be the fastest for a given platform.

Quote:
Quote:

no, in your language every method is 'virtual', if you think of virtual as just meaning that the address of a method is unknown at compile time, so it has to be determined at run time.

I was referring to what C++ calls 'virtual', that the method can be overridden. Using your definition, then only interface methods are virtual, the others are known at compile time.

are you really sure that you understood what virtual means in c++?
if you say:
#include <stdio.h>

struct A{
  void hello() { printf("class a\n"); }
  virtual void hello2() { printf("class a\n"); }
};

struct B : A{
  void hello() { printf("class b\n"); }
  void hello2() { printf("class b\n"); }
};

void test(){
  A   real_a;
  B   real_b;
  A* a1;
  A* a2;   // both pointers have the static type A*
  a1 = &real_a;
  a2 = &real_b;
  a1->hello();
  a2->hello();
  a1->hello2();
  a2->hello2();
}

the output would be:
class a
class a
class a
class b

in c++ you can 'override' every method, but if the method is not declared as virtual the method of the pointer's class is used; if the method is virtual the method of the real class is used.
so every method of yours is virtual.

Quote:
Quote:

what do you mean by an offset to the object's structure? if the offset and the address of the object's structure are known at compile time, you can simply use the real address of the method.

Because of overriding. This code
class base
    void print()
        print("One")

class step1 : base
    void print()
        print("Two")

class step2 : base
    void print()
        print("Three")

class step3 : step2
    void print()
        print("Four")

base a = base.alloc().init()
base b = step1.alloc().init()
base c = step2.alloc().init()
base d = step3.alloc().init()

a.print()
b.print()
c.print()
d.print()

// should output
One
Two
Three
Four

Yes you know the offset for the method, but you do not know which one to use. So you use the object to get its class, then you use the offset of the method to jump to the right place in the class, then you jump to the method. Every class that inherits from base will have that method location pointing to a method. Even if they don't implement that method, it will just point to the last implementation of it.


this is just normal c++ code with every method being virtual, but it's totally inefficient. as you can trust the c++ compiler writers to have decent knowledge about this topic, you should probably choose to use their way.
the first step of compiling should be to check if a method is overridden anywhere; if it is, it's marked as virtual (or whatever you want to call it). for every method that is not virtual you can determine the address at compile time, because every time you call this method on the object (in the source code), you know the direct address in the bytecode.
for example:
call 0x2310

if the method is virtual, you should really use a vtable: methods get overridden on a per-class basis, so you don't waste much space. a function call to this method would then look like this:
mov eax, [vtable_start + offset * pointer_size] // i'm not sure about the asm syntax
call eax

as you can see the memory overhead of this technique is one pointer per object (the pointer to the vtable) and one pointer per virtual method per class (the pointer to the method in the vtable), and the runtime overhead is to construct a pointer to the vtable position (vtable_start + offset * pointer_size), dereference it and then call the resulting address.

as you can see every method being virtual causes a memory and runtime overhead, which you should avoid.
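
To make the class-pointer-plus-offset lookup concrete, here is a minimal C++ sketch of a hand-rolled vtable. All names in it (Object, ClassInfo, Method, SLOT_PRINT, call_virtual) are made up for illustration and are not taken from either poster's code; the methods return strings instead of printing so the result is easy to check.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout; this only illustrates the lookup scheme.
struct Object;
using Method = std::string (*)(Object&);

// One table per class (the "vtable"). Slot numbers are fixed at compile time.
struct ClassInfo {
    const char* name;
    Method      vtable[1];
};

// Every object carries exactly one pointer to its class.
struct Object {
    const ClassInfo* cls;
};

constexpr int SLOT_PRINT = 0;  // compile-time offset of print()

std::string base_print(Object&)  { return "One"; }
std::string step1_print(Object&) { return "Two"; }
std::string step2_print(Object&) { return "Three"; }
std::string step3_print(Object&) { return "Four"; }

const ClassInfo BASE  = {"base",  {base_print}};
const ClassInfo STEP1 = {"step1", {step1_print}};
const ClassInfo STEP2 = {"step2", {step2_print}};
const ClassInfo STEP3 = {"step3", {step3_print}};
// A subclass of step2 that does not override print() reuses the parent's entry.
const ClassInfo QUIET = {"quiet", {step2_print}};

// A virtual call: load the class, index the table by the known slot, jump.
std::string call_virtual(Object& obj, int slot) {
    return obj.cls->vtable[slot](obj);
}
```

The caller only ever knows the slot number; which function runs depends on the class pointer stored in the object, which is the extra indirection discussed above.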

Quote:
Quote:
Just on an aside, what would be wrong with using pointers on the code and link everything together?

what do you mean, i don't understand your explanation.

Instead of using ID values for a method, just use pointers. If you want to find an object's method, just follow the pointer to it.
// virtual machine implementation
class voodoo_class
{
    char* name;
    voodoo_method* methods[];
};

Instead of plain data. The byte code could even be linked this way, a reference to a constant could be replaced with a pointer to the constant.
if you really strive for simplicity and clarity you should also try to follow these principles in your vm, so why let the vm know about classes, if it doesn't give you any big advantage?
since your language is statically typed, and does not need to know anything about method names, etc, you should use some bytecode more like assembler.
the vm itself will be very lightweight - you can write such a vm in under 1000 lines of code and be able to express anything you will want.

as i said earlier in this thread i have once written a vm like this, but as i lately looked at the code, it was only one big mess (it seems like i had just started a rewrite when i stopped).
and all comments are in german, so i think it won't help much if i upload code.
but i will try to update at least the file describing all opcodes and upload it.
Quote:
Quote:
Quote:
the way you implement it is totally up to you: you could use one big array or c's malloc (i think i would go this way).

I was thinking about having one big chunk of memory, but then that would lead to fragmentation problems, so I thought why not just leave it to the OS?

i would just use the malloc implementation provided by your compiler, because it may be the fastest for a given platform.

Exactly. I was under the impression that the OS had something to do with memory as well.
Quote:
Quote:
Quote:
no, in your language every method is 'virtual', if you think of virtual as just meaning that the address of a method is unknown at compile time, so it has to be determined at run time.

I was referring to what C++ calls 'virtual', that the method can be overridden. Using your definition, then only interface methods are virtual, the others are known at compile time.

are you really sure that you understood what virtual means in c++?

Fairly sure.

The difference in my language is that even though every method can be overridden, the location (offset) of the method is known at compile time. The vm uses the offset to find the actual method to use.

Thinking about it, I now realise that this is probably similar to what virtual means in C++, and that's where the confusion started. I think my methods are less virtual than C++'s, because multiple inheritance can never work.

I think your code has a few mistakes, but I get its meaning.

Quote:
this is just normal c++ code with every method being virtual, but it's totally inefficient. as you can trust the c++ compiler writers to have decent knowledge about this topic, you should probably choose to use their way.

The way I was planning to do it sounds exactly the same as the way C++ does it. Each object has a pointer to its class. An offset (that you know) from the class is the pointer to the method that you want.
Quote:
as you can see every method being virtual causes a memory and runtime overhead, which you should avoid.

Yes there is overhead, but I consider the advantages outweigh the disadvantages. Also because this is going to be a dynamic system, you don't know beforehand if the method is going to be overridden, new classes can (and will) be added and removed on the fly.
Quote:
if you really strive for simplicity and clarity you should also try to follow these principles in your vm, so why let the vm know about classes, if it doesn't give you any big advantage?
since your language is statically typed, and does not need to know anything about method names, etc, you should use some bytecode more like assembler.
the vm itself will be very lightweight - you can write such a vm in under 1000 lines of code and be able to express anything you will want.

All code will belong to an object, so I think for sandboxing and simplification it would be good to have objects that represent code objects. I am not planning on having global methods, so in fact there won't be much extra work because I won't have to implement global methods.

And later if I want, I will be able to build in real security - certain objects are only allowed to be instanced by certain other objects. The object will be able to introspect the state of the vm, and will be able to find out what type of object is trying to instantiate it, and return null. Only 'native' code will be able to do this, so an executable is relatively secure (using md5 sums etc). There is always going to be a way around any type of security if all the code is on one computer, but making it as hard as possible to cheat is always good.

Quote:
as i said earlier in this thread i have once written a vm like this, but as i lately looked at the code, it was only one big mess (it seems like i had just started a rewrite when i stopped).
and all comments are in german, so i think it won't help much if i upload code.
but i will try to update at least the file describing all opcodes and upload it.

That would be great. It is surprising how similar German and English are in terms of the larger words, so it will be quite useful. Just off topic, but is it true that in German there are 16 words for 'the'?
yes i think it is true, because each noun in a sentence has a gender and a case.
you have 3 genders (masculine, feminine and neuter) and 4 cases: nominative (the subject), genitive, and dative and accusative (both being objects). and you also have to think about the number, either singular or plural.
but as i recount them, i don't think there are 16.

my opcode table:
#ifndef INSTRUCTIONS_H
#define INSTRUCTIONS_H

/*	this vm is a stackbased one so the only
**	instruction having a parameter in the
**	bytecode is the push instruction.
**
**	every other instruction takes its
**	arguments from the stack.
**
**	methods do clean up the stack: they
**	remove their arguments themselves and
**	push the return value on the stack.
*/

enum __opcode{
	END	= 0,

	XCALL,		//	ID		calls a method implemented in native code,
			//			id is a 32 bit id assigned by the v

	//	code flow instructions
	JMP,		//	ADDRESS
	CALL,		//	ADDRESS
	RET,

	//	conditional jumping
	CJMP,		//	CONDITION ADDRESS
	CCALL,		//	CONDITION ADDRESS
	CRET,		//	CONDITION

	ALLOC,		//	SIZE		allocates a chunk of memory, returns the address
	FREE,		//	ADDRESS		frees a block of memory

	PUSH,		//	pushes a value on the stack, but instead of expecting its argument on
			//	the stack, it is directly followed by a 32 bit number being its argument
			//	in the bytecode
	PUSH_V,		//	VARADDR		the same as *p, with p being a pointer
	PUSH_BV,	//	VARADDR		the same as PUSH_V, but pushing one byte
	PUSH_WV,	//	VARADDR		the same as PUSH_V, but pushing one word

	POP_V,		//	VARADDR
	POP_BV,		//	VARADDR
	POP_WV,		//	VARADDR

	POP,		//	just pops the last value from the stack
	PICK,		//	OFFSET		put the value at stack_ptr - OFFSET * STACK_VALUE_SIZE
			//			on top of the stack
	PUT,		//	OFFSET VALUE	sets the stack value at OFFSET to VALUE
	DUB,		//			clones the topmost value

	//	math instructions
	//	for integers
	ADD,		//	A B	A + B
	SUB,		//	A B	A - B
	SUBI,		//	A B	B - A
	MUL,		//	A B	A * B
	DIV,		//	A B	A / B
	DIVI,		//	A B	B / A
	MOD,		//	A B	A % B
	MODI,		//	A B	B % A

	//	for floating point
	FADD,
	FSUB,
	FSUBI,
	FMUL,
	FDIV,
	FDIVI,

	//	bitwise logic
	AND,		//	A B	returns A & B
	OR,		//	A B	returns A | B
	XOR,		//	A B	returns A ^ B

	//	logical
	LAND,		//	A B	returns A && B
	LOR,		//	A B	returns A || B
	NOT,		//	A	returns !A

	//	comparison operations
	EQ,		//	A B	returns A == B
	NE,		//	A B	returns A != B
	LT,		//	A B	returns A < B
	LE,		//	A B	returns A <= B
	GT,		//	A B	returns A > B
	GE		//	A B	returns A >= B
};

/*	example
**	instruction		stack after the instruction
**	PUSH 3			3
**	PUSH 4			3 4
**	ADD			7
**	PUSH 5			7 5
**	MUL			35
**	DUB			35 35
**	DUB			35 35 35
**	ADD			35 70
**	SUBI			35	(B - A with A = 35, B = 70)
*/

#endif


you may have to define the numbers for each opcode, since they should all be contiguous and stay under 255, because i would just use a big array of function pointers, one for each opcode function, and use the opcode as an index into this array.
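
A minimal C++ sketch of that dispatch idea: one table of function pointers, indexed directly by the opcode byte. The names (Vm, OpFn, OP_PUSH and so on) are invented for this illustration, and PUSH takes a one-byte operand here to keep the sketch short, unlike the 32 bit operand in the opcode table above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature VM, only to show table-driven dispatch.
struct Vm {
    std::vector<std::int32_t> stack;
    std::vector<std::uint8_t> code;
    std::size_t pc = 0;
    bool running = true;
};
using OpFn = bool (*)(Vm&);

// Contiguous opcode numbers so they can index the table directly.
enum Op : std::uint8_t { OP_END = 0, OP_PUSH, OP_ADD, OP_MUL, OP_COUNT };

bool op_end(Vm& vm) { vm.running = false; return true; }

bool op_push(Vm& vm) {  // operand is the single byte after the opcode
    if (vm.pc + 1 >= vm.code.size()) return false;
    vm.stack.push_back(vm.code[++vm.pc]);
    return true;
}

bool op_add(Vm& vm) {
    if (vm.stack.size() < 2) return false;
    std::int32_t b = vm.stack.back(); vm.stack.pop_back();
    vm.stack.back() += b;
    return true;
}

bool op_mul(Vm& vm) {
    if (vm.stack.size() < 2) return false;
    std::int32_t b = vm.stack.back(); vm.stack.pop_back();
    vm.stack.back() *= b;
    return true;
}

bool run(Vm& vm) {
    static const OpFn table[OP_COUNT] = {op_end, op_push, op_add, op_mul};
    while (vm.running) {
        if (vm.pc >= vm.code.size()) return false;  // ran off the end
        std::uint8_t op = vm.code[vm.pc];
        if (op >= OP_COUNT) return false;           // unknown opcode
        if (!table[op](vm)) return false;           // handler reported an error
        ++vm.pc;
    }
    return true;
}
```

The bounds check on the opcode is what makes the contiguous numbering matter: any byte below OP_COUNT is guaranteed to have a handler.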

another nice idea that i came up with when i was thinking about this is an extendable vm: you would have < 128 opcodes, so there would be another 128 free for use. the user could then specify which function should be used for each of the other opcodes. this way you could even rip out the floating point instructions, and let the user load a dll which contains them, if he really needs them.
you would only need a load dll instruction and a use function instruction. for example:
push offset_of_dll_name
ldll # load dll
push offset_of_function_name
push opcode_number_to_use
sof # set opcode function
and if you say that dlls are only loaded from a special dll directory, it is even safe, since the user would only install safe dlls (at least he should).
an interesting side effect of this technique would be that self-modifying code would be fairly easy to write.
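
Here is a rough C++ sketch of the "set opcode function" half of that idea, leaving out the actual dll loading. The names (set_opcode_function, FIRST_USER_OPCODE, ext_negate) are assumptions made for this example; the only point is that opcodes 128 to 255 are slots that can be filled at run time.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names; dll loading itself is deliberately left out.
struct Vm;
using OpFn = bool (*)(Vm&);

struct Vm {
    std::array<OpFn, 256> ops{};     // nullptr means "no handler installed"
    std::vector<std::int32_t> stack;
};

constexpr int FIRST_USER_OPCODE = 128;  // 0..127 stay reserved for the vm

// Rough counterpart of the proposed "sof" instruction.
bool set_opcode_function(Vm& vm, int opcode, OpFn fn) {
    if (opcode < FIRST_USER_OPCODE || opcode > 255 || fn == nullptr)
        return false;                // built-in opcodes cannot be replaced
    vm.ops[opcode] = fn;
    return true;
}

// Executes one opcode; fails if nothing is installed in that slot.
bool execute(Vm& vm, std::uint8_t opcode) {
    OpFn fn = vm.ops[opcode];
    return fn != nullptr && fn(vm);
}

// Example extension: negate the topmost stack value.
bool ext_negate(Vm& vm) {
    if (vm.stack.empty()) return false;
    vm.stack.back() = -vm.stack.back();
    return true;
}
```

In a real version the function pointer would come from a symbol looked up in the loaded library rather than from code compiled into the vm.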
Quote:
Original post by cmp
yes i think it is true, because each noun in a sentence has a gender and a case.
you have 3 genders (masculine, feminine and neuter) and 4 cases: nominative (the subject), genitive, and dative and accusative (both being objects). and you also have to think about the number, either singular or plural.
but as i recount them, i don't think there are 16.

I'm not sure where I heard it, just I knew that there were a few.

Quote:
my opcode table:
*** Source Snippet Removed ***

Thanks a lot. It seems fairly logical.

Quote:
you may have to define the numbers for each opcode, since they should all be contiguous and stay under 255, because i would just use a big array of function pointers, one for each opcode function, and use the opcode as an index into this array.

Good idea.

Quote:
another nice idea that i came up with when i was thinking about this is an extendable vm: you would have < 128 opcodes, so there would be another 128 free for use. the user could then specify which function should be used for each of the other opcodes. this way you could even rip out the floating point instructions, and let the user load a dll which contains them, if he really needs them.

Sounds cool.

I wasn't planning on having comparison operators as opcodes; I was going to have them as method calls. I'm also not going to have the basic arithmetic operations in the bytecode. You can extend the language by providing new objects that are implemented by C++ code. Because they can introspect the virtual machine, they could do some interesting stuff. This is the way I am going to write the features of this language - the closure type things, lists etc.

Also the code can be in a dynamic library, or anywhere else - as long as it 'registers' itself to the compiler and the virtual machine, it is good. In terms of bindings, I'm a little at a loss for what to do. I've heard that lua's stack based binding is not too good. I think I need a way to have pre written binding methods. Perhaps one for one parameter, another for 2 etc. up to about 10? Also duplicated for methods that return something. The method still has to deal with these object objects, but I think it will make things simpler.

What do you think? It means that for binding the stack automatically gets popped, and the return automatically pushed on. Also the object references get converted into objects that can have more methods to manipulate them.
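
One possible shape for such automatic bindings in C++, as a sketch only: a variadic template replaces the hand-written one-parameter, two-parameter, ... versions, pops the arguments and pushes the result. Value is a plain int standing in for an object reference, and bind_native, call_with_stack, native_sub and native_sum3 are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Value  = int;                    // stand-in for an object reference
using Stack  = std::vector<Value>;
using Native = std::function<bool(Stack&)>;

// Calls fn with the values s[base], s[base + 1], ... as its arguments.
template <typename... Args, std::size_t... I>
Value call_with_stack(Value (*fn)(Args...), const Stack& s, std::size_t base,
                      std::index_sequence<I...>) {
    return fn(s[base + I]...);
}

// Wraps a plain function so the vm can call it: the wrapper pops as many
// arguments as the function takes and pushes the return value.
template <typename... Args>
Native bind_native(Value (*fn)(Args...)) {
    return [fn](Stack& s) {
        constexpr std::size_t n = sizeof...(Args);
        if (s.size() < n) return false;          // not enough arguments
        std::size_t base = s.size() - n;
        Value result = call_with_stack(fn, s, base,
                                       std::index_sequence_for<Args...>{});
        s.resize(base);                          // pop the arguments
        s.push_back(result);                     // push the return value
        return true;
    };
}

// Two example native functions with different parameter counts.
Value native_sub(Value a, Value b)           { return a - b; }
Value native_sum3(Value a, Value b, Value c) { return a + b + c; }
```

The native functions never touch the stack themselves, and the argument count check falls out of the function's own signature. Functions returning nothing would need a second overload, which is left out here.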

I think one part of the API will also have a similar way to call methods. I don't think I will place any restrictions on what the C++ code can do, so you need to make sure that the method you are calling does take the number of parameters you provide. Perhaps there is a 'safe' method calling API that does check.

In terms of actually getting the method name that you want to call, there will be a hashed lookup table for each object. Something like
vm.get_class("integer").get_method("add")
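
A small C++ sketch of what such a hashed lookup could look like, using std::unordered_map. The class and member names (Vm, ClassInfo, MethodInfo, define_class, add_method) are placeholders for this example, and lookups return pointers so that a missing class or method can be reported as nullptr.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical descriptors; a real vm would store more than an offset.
struct MethodInfo {
    int offset;  // slot of the method in the class's method table
};

class ClassInfo {
public:
    void add_method(const std::string& name, int offset) {
        methods_[name] = MethodInfo{offset};
    }
    // Returns nullptr when the class has no method of that name.
    const MethodInfo* get_method(const std::string& name) const {
        auto it = methods_.find(name);
        return it == methods_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, MethodInfo> methods_;
};

class Vm {
public:
    // Creates the class on first use and returns it for registration.
    ClassInfo& define_class(const std::string& name) { return classes_[name]; }
    // Returns nullptr when no class of that name has been registered.
    const ClassInfo* get_class(const std::string& name) const {
        auto it = classes_.find(name);
        return it == classes_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, ClassInfo> classes_;
};
```

The name lookup only needs to happen once, for instance at link time; after that the caller can keep the returned offset and skip the hash tables entirely.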

Another important idea is that C++ objects also need to be able to access the properties of other C++ objects somehow. This code should work, and none of the actual polygon data should be available in voodoo (could be an option too - but probably not necessary)
poly_soup a = poly_soup.load_soup("object.dxf")
world b = world.alloc().init()
b.add_soup(a,vector3d.origin)
b.render()

This means the add_soup C++ code needs to know that the object poly_soup has C++ data attached to it.

If you are implementing some code in C++ you need to provide xml that says what you implement, and preferably what you need to see in the system (imports etc).
you could simply use a register_class function, where you would pass a descriptor of this class, with all elements and all methods (i think angelscript does it this way).
or you could go the lua way and make the user use the stack directly when supplying functions in c++. but as you said this would be really painful.

Quote:
I wasn't planning on having comparison operators as opcodes, I was going to have them as method calls

but they are frequently used throughout the code, so maybe you should supply opcodes for them. if not, you would have to call a function, which would have to be looked up and then be called - sounds terribly slow to me.
Quote:
Original post by cmp
you could simply use a register_class function, where you would pass a descriptor of this class, with all elements and all methods (i think angelscript does it this way).
or you could go the lua way and make the user use the stack directly when supplying functions in c++. but as you said this would be really painful.

Quote:
I wasn't planning on having comparison operators as opcodes, I was going to have them as method calls

but they are frequently used throughout the code, so maybe you should supply opcodes for them. if not, you would have to call a function, which would have to be looked up and then be called - sounds terribly slow to me.

True it is slower (2 lookups instead of a function) but it is easier to program (no native types).

I would say that they aren't used that frequently in what I am considering programming. The code would be something like this, for a simple ai
class bad_guy : enemy
    void character_sighted(character c)
        if (c.distance(self) < 20)
            attack_player()

not
void quicksort(int l,int u,double g[]) {
    if (l>=u) return;
    else if(l==u-1) {
        if (g[l]>g[u]) {
            double t = g[l];
            g[l] = g[u];
            g[u] = t;
        }
        return;
    }
    int a=l,b=u,c=(a+b)/2;
    double p=g[c];
    g[c]=g[u];
    g[u]=p;
    while (a<b) {
        while (g[a]<=p&&a<b) a++;
        while (p<=g[b]&&a<b) b--;
        if (g[a]>g[b]) {
            double t=g[a];
            g[a]=g[b];
            g[b]=t;
        }
    }
    g[u]=g[b];
    g[b]=p;
    quicksort(l,a-1,g);
    quicksort(b+1,u,g);
}

A quicksort algorithm that I programmed a while ago. (it is purposely hard to read, if you want a 'nice' version just ask)

All the processor intensive algorithms will be programmed in C++, where they should be. Even using opcodes is slow anyway.

So if I throw away integer operation opcodes I get a 'cleaner' virtual machine (completely OO), and it is easier to program (debatable, but nicer).
Progressing a bit on the virtual machine...

What do you do about objects being deleted? The pointer will just go to some memory - we don't know if the object that is there is the one we want or not. I was thinking about having a collection of pointers to the pointers in other places (stack, objects properties) and whenever you assign the variable to another object, you tell the object to remove the pointer to the pointer. Whenever you point your pointer to the object, you tell it to remember your pointer. When the object is going to be deleted, it points all the pointers that were pointing at it, to null.

Is this too memory intensive? Remember that this implementation is for ease of use, not speed of execution. How does Java know when to throw a null pointer exception?
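
The pointer-to-pointer bookkeeping described above could be sketched in C++ roughly like this. Ref and Object are names invented for the example: a Ref is any slot that can point at a script object (a stack entry or a property), and each Object remembers the Refs currently pointing at it so it can null them when it dies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Object;

// A slot holding a reference to a script object.
struct Ref {
    Object* target = nullptr;

    Ref() = default;
    Ref(const Ref&) = delete;             // copying would need re-registration
    Ref& operator=(const Ref&) = delete;
    ~Ref() { point_to(nullptr); }         // unregister when the slot goes away

    void point_to(Object* obj);
};

struct Object {
    std::vector<Ref*> referrers;          // every Ref currently pointing here
    int value = 0;

    // On deletion, every slot that pointed at this object becomes null.
    ~Object() {
        for (Ref* r : referrers) r->target = nullptr;
    }
};

void Ref::point_to(Object* obj) {
    if (target) {                         // tell the old object to forget us
        auto& v = target->referrers;
        v.erase(std::remove(v.begin(), v.end(), this), v.end());
    }
    target = obj;
    if (target) target->referrers.push_back(this);  // register with the new one
}
```

The cost is one vector entry per live reference plus a linear search on every reassignment, which gives a feel for the memory and time trade-off being asked about.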


Also I was thinking about C++ calling a script method, how to do it and still be fast? For the one off things something like this would be fine
vm->owner_object("integer")->get_property("some_constant")
but that would be really bad for the internal api implementation (the implementation of the list class for example).

The problem is that at some point this needs to be done anyway, because even if you know the offset for a certain method at one time it could change in the future. There is already a linking stage (each chunk of bytecode that is loaded needs to be linked to the actual objects and methods) so if the C++ code asks also to be linked to the methods and objects that it wants it might work. From then on it needs to use the values that it was given.

I think once a class has been subclassed in the vm then no new methods or properties can be added to it, otherwise the offsets for the new methods and properties that the subclass adds to it will have to be changed. Also methods and properties of a class that has been subclassed can't be deleted for the same reason. Whole classes can be unloaded using the memory method described above - and all the pointers to it will turn to null. And the definition tree will also be removed (so no objects can link to it from then on). I don't think code can be re-linked once it has been loaded, it has to be removed and loaded from source again.

I am planning to use this load/unload functionality to provide scripting in levels and with objects. A script accompanies the level or character and gets loaded into memory. The xml file that comes with the level / 3D model has the name of the class that sets up all the script objects for that level. The objects mostly subclass classes that already exist, like triggers, characters etc. This way, no code for the rest of the engine references them directly, they just use their inherited methods and interface definitions to communicate to them. This means that when the level is unloaded, they can just be deleted without fuss.

On the -using scripts to extend games- topic, what do you think is best, subclassing a class that calls its own methods - or using delegates?

Subclassing
class character
    void tick()
        if (keyboard.forward_key())
            forward()
        elseif (keyboard.backwards_key())
            backwards()
        elseif (keyboard.left_key())
            left()
        elseif (keyboard.right_key())
            right()

    void forward()
    void backwards()
    void left()
    void right()

// my character
class hero : character
    void forward()
        animation.run_forward(1.0)
    void backwards()
        animation.run_backwards(0.5)
    void left()
        animation.strafe_left(0.7)
    void right()
        animation.strafe_right(0.7)


Delegates
class game_state
    character player

    void tick()
        if (keyboard.forward_key())
            player.forward()
        elseif (keyboard.backwards_key())
            player.backwards()
        elseif (keyboard.left_key())
            player.left()
        elseif (keyboard.right_key())
            player.right()

class character
    void forward()
    void backwards()
    void left()
    void right()

// my character
class hero : character
    void forward()
        animation.run_forward(1.0)
    void backwards()
        animation.run_backwards(0.5)
    void left()
        animation.strafe_left(0.7)
    void right()
        animation.strafe_right(0.7)

It doesn't change the actual implementation code (hero is the same in both cases). I think the delegate style is slightly better because the programmer doesn't have access to the tick method (can be private in whatever object that has it). But then again the subclassing method encapsulates all the character code in one object.
Quote:

What do you do about objects being deleted? The pointer will just go to some memory - we don't know if the object that is there is the one we want or not. I was thinking about having a collection of pointers to the pointers in other places (stack, objects properties) and whenever you assign the variable to another object, you tell the object to remove the pointer to the pointer. Whenever you point your pointer to the object, you tell it to remember your pointer. When the object is going to be deleted, it points all the pointers that were pointing at it, to null.

if you want to use a garbage collected approach this won't be necessary, because an object is only deleted when no pointers to it are left.
if you don't want to use gc, you could simply use ids for objects instead of real pointers. every time an object gets deleted its id gets invalidated, and for the next object a new id gets chosen - if you use a 32 bit id, you will have so many ids that an id will only have to be reused after a long time.
if you use this approach you can easily tell if an id is invalid (simply look for an object with this id; if it does not exist, the id is invalid. even if an object with this id does exist, you can check if it is of the desired type).
the only problem would be the lookup speed.
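
A minimal C++ sketch of the id approach, assuming a hash map from id to object. ObjectTable, ObjectId and type_id are names chosen for this example, and id wrap-around after 2^32 allocations is not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical object record; type_id stands in for a real class pointer.
struct Object {
    int type_id;
};

using ObjectId = std::uint32_t;

class ObjectTable {
public:
    // Every new object gets a fresh id; ids are never handed out twice here.
    ObjectId create(int type_id) {
        ObjectId id = next_id_++;
        objects_[id] = std::make_unique<Object>(Object{type_id});
        return id;
    }

    // Deleting the object invalidates its id.
    void destroy(ObjectId id) { objects_.erase(id); }

    // Returns nullptr for a stale id or for an object of the wrong type.
    Object* lookup(ObjectId id, int expected_type) const {
        auto it = objects_.find(id);
        if (it == objects_.end()) return nullptr;
        return it->second->type_id == expected_type ? it->second.get() : nullptr;
    }

private:
    ObjectId next_id_ = 1;  // 0 is left free to act as the null id
    std::unordered_map<ObjectId, std::unique_ptr<Object>> objects_;
};
```

Every access pays for one hash lookup, which is the lookup speed cost mentioned above; in exchange a dangling reference becomes a checkable null instead of a stray pointer.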
Quote:

On the -using scripts to extend games- topic, what do you think is best, subclassing a class that calls its own methods - or using delegates?

why don't you just use closures? - they are normally used in event based environments.
but if i had to choose between own methods and delegates, i would rather use delegates.
Quote:
Original post by cmp
if you want to use a garbage collected approach this won't be necessary, because an object is only deleted when no pointers to it are left.
if you don't want to use gc, you could simply use ids for objects instead of real pointers. every time an object gets deleted its id gets invalidated, and for the next object a new id gets chosen - if you use a 32 bit id, you will have so many ids that an id will only have to be reused after a long time.
if you use this approach you can easily tell if an id is invalid (simply look for an object with this id; if it does not exist, the id is invalid. even if an object with this id does exist, you can check if it is of the desired type).
the only problem would be the lookup speed.

Is the pointer method bad?

I did think about the id method, but you can't really be guaranteed that nothing bad will happen if another object ends up using that id.

On the subject of safety, should I check each method / property access for out of bounds? Or just have a mode that has a lot of safety features, and only use it for debugging?

Because, if it is possible to screw up by accessing a method that is too far away from the object (off the end of the array) anyway then the memory issue isn't such a big deal.

I was planning on using a memory method like the one used in Objective-C. If you don't want an object to turn null on you (you are using it) then you 'retain' it. If you don't need it any more then you 'release' it. On the last release call to the object, it deletes itself. It's like an active retain count, but you have control over it and there's no garbage collector.
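
For comparison, a bare-bones C++ sketch of that retain/release scheme. The class name Counted and the destroyed flag are inventions for this example (the flag exists only so the deletion can be observed from outside); it mirrors the manual counting style described above rather than any particular runtime.

```cpp
#include <cassert>

// Hypothetical manually reference-counted object.
class Counted {
public:
    explicit Counted(bool* destroyed_flag) : destroyed_(destroyed_flag) {}

    // The creator starts with one claim; every other user adds its own.
    void retain() { ++count_; }

    // Giving up the last claim deletes the object.
    void release() {
        if (--count_ == 0) delete this;
    }

    int retain_count() const { return count_; }

private:
    // Private destructor: the object can only die through release().
    ~Counted() {
        if (destroyed_) *destroyed_ = true;
    }

    int   count_ = 1;
    bool* destroyed_;
};
```

Unlike the back-pointer scheme, a retained reference never turns null; the object simply stays alive until its last user lets go.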

Quote:
Quote:
On the -using scripts to extend games- topic, what do you think is best, subclassing a class that calls its own methods - or using delegates?

why don't you just use closures? - they are normally used in event based environments.
but if i had to choose between own methods and delegates, i would rather use delegates.

Closures are good for the simple extensions, but if you wanted to add another npc for example (lots of actions etc.), it would be best to have a new object.

I also like the delegate method. Perhaps there are two objects involved: the character (the delegate) and the character controller (the calling object). You could think of the character controller as the senses and communication, and the character as the actions on the information.

I read an article about the AI in Halo 2, it was quite good. They use a sort-of object system.
when i had some spare time, i wrote a little vm:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "xvm.h"
#include "opcodes.h"

//	all opcodes
xvm_bool __xvm_end(XVM_vm* vm){
	vm->state = XVM_STOPPED;

	return XVM_TRUE;
}

xvm_bool __xvm_throw(XVM_vm* vm){
	if(vm->stack.top < 1)
		return XVM_FALSE;

	//	the exception id is the topmost value (top points one past it)
	vm->state		= XVM_EXCEPTION;
	vm->exception.id	= vm->stack.mem[vm->stack.top - 1];
	vm->stack.top--;

	return XVM_TRUE;
}

xvm_bool __xvm_push(XVM_vm* vm){
	xvm_int32	value;

	if(vm->code.size >= (vm->code.current + 4)){
		if(vm->stack.size >= (vm->stack.top + 1)){
			//	((xvm_uint32*)vm->code.mem)[vm->code.current + 1] is not possible,
			//	currently maybe because of endianness, or memory alignment

			//	read the 32 bit operand byte by byte, most significant byte first
			value = vm->code.mem[vm->code.current + 1];
			value <<= 8;
			value += vm->code.mem[vm->code.current + 2];
			value <<= 8;
			value += vm->code.mem[vm->code.current + 3];
			value <<= 8;
			value += vm->code.mem[vm->code.current + 4];

			vm->stack.mem[vm->stack.top] = value;
			vm->stack.top++;

			//	skip the data
			vm->code.current += 4;
		}else
			return XVM_FALSE;
	}else
		return XVM_FALSE;

	return XVM_TRUE;
}

//	stack modification
xvm_bool __xvm_dub(XVM_vm* vm){
	if(vm->stack.size < (vm->stack.top + 1))
		return XVM_FALSE;
	if(vm->stack.top < 1)
		return XVM_FALSE;

	vm->stack.mem[vm->stack.top] = vm->stack.mem[vm->stack.top - 1];
	vm->stack.top++;

	return XVM_TRUE;
}

xvm_bool __xvm_pick(XVM_vm* vm){
	xvm_int32 offset = 0;

	if(vm->stack.size < (vm->stack.top + 1))
		return XVM_FALSE;
	if(vm->stack.top < 2)
		return XVM_FALSE;

	vm->stack.top--;
	offset = vm->stack.mem[vm->stack.top];

	//	offset > 0	relative
	//	offset == 0	same as dub
	//	offset < 0	from the beginning of the stack
	if(offset < 0){
		//	compare the absolute index against the stack height
		if(-offset >= vm->stack.top)
			return XVM_FALSE;

		vm->stack.mem[vm->stack.top] = vm->stack.mem[-offset];
	}else if(offset == 0){
		return __xvm_dub(vm);
	}else{
		if(offset >= vm->stack.top)
			return XVM_FALSE;
		vm->stack.mem[vm->stack.top] = vm->stack.mem[vm->stack.top - 1 - offset];
	}
	vm->stack.top++;
	return XVM_TRUE;
}

xvm_bool __xvm_pop(XVM_vm* vm){
	if(vm->stack.top < 1)
		return XVM_FALSE;

	vm->stack.top--;

	return XVM_TRUE;
}

xvm_bool __xvm_put(XVM_vm* vm){
	xvm_int32	offset = 0;
	xvm_int32	value = 0;

	if(vm->stack.top < 2)
		return XVM_FALSE;

	offset = vm->stack.mem[vm->stack.top - 2];
	value = vm->stack.mem[vm->stack.top - 1];
	vm->stack.top -= 2;

	//	reject offsets outside the stack in either direction
	if(offset >= vm->stack.top || -offset >= vm->stack.top)
		return XVM_FALSE;

	if(offset < 0){
		vm->stack.mem[-offset] = value;
	}else{
		vm->stack.mem[vm->stack.top - 1 - offset] = value;
	}

	return XVM_TRUE;
}

//	(top - 2) = (top - 2) + (top - 1)
//	note: division and modulo by zero are not checked
#define __XVM_INT_OP(__name__, __op__)					\
xvm_bool __name__(XVM_vm* vm){						\
	if(vm->stack.top < 2)						\
		return XVM_FALSE;					\
									\
	xvm_int32	A = vm->stack.mem[vm->stack.top - 2];		\
	xvm_int32	B = vm->stack.mem[vm->stack.top - 1];		\
	vm->stack.mem[vm->stack.top - 2] = __op__ ;			\
	vm->stack.top--;						\
									\
	return XVM_TRUE;						\
}

__XVM_INT_OP(	__xvm_add,	A + B)
__XVM_INT_OP(	__xvm_sub,	A - B)
__XVM_INT_OP(	__xvm_subi,	B - A)
__XVM_INT_OP(	__xvm_mul,	A * B)
__XVM_INT_OP(	__xvm_div,	A / B)
__XVM_INT_OP(	__xvm_divi,	B / A)
__XVM_INT_OP(	__xvm_mod,	A % B)
__XVM_INT_OP(	__xvm_modi,	B % A)

//	the interface
const int XVM_STD_OPCODE_COUNT = 255;
const int XVM_STD_CBUFF_SIZE = 1024;
const int XVM_STD_SBUFF_SIZE = 100;

XVM_vm* XVM_init(){
	XVM_vm* vm = NULL;

	//	test all sizes
	if(	sizeof(xvm_uint8) != 1 ||
		sizeof(xvm_uint32) != 4 * sizeof(xvm_uint8))
	{
		printf("wrong sizes\n");
		exit(-1);
	}

	vm = (XVM_vm*)malloc(sizeof(XVM_vm));

	//	allocate a zeroed array for all opcode functions,
	//	so that unused slots can be detected as NULL
	vm->opcodes = (XVM_opcode_func*) calloc(XVM_STD_OPCODE_COUNT, sizeof(XVM_opcode_func));

	vm->opcode_count	= XVM_STD_OPCODE_COUNT;
	vm->opcode_used		= 0;

	vm->code.mem		= (xvm_uint8*) malloc(sizeof(xvm_uint8) * XVM_STD_CBUFF_SIZE);
	vm->code.size		= XVM_STD_CBUFF_SIZE;
	vm->code.used		= 0;
	vm->code.current	= 0;

	vm->stack.mem		= (xvm_int32*) malloc(sizeof(xvm_int32) * XVM_STD_SBUFF_SIZE);
	vm->stack.size		= XVM_STD_SBUFF_SIZE;
	vm->stack.top		= 0;

	vm->state		= XVM_PAUSED;

	return vm;
}

void XVM_free(XVM_vm* vm){
	//	release the buffers owned by the vm before the vm itself
	free(vm->opcodes);
	free(vm->code.mem);
	free(vm->stack.mem);
	free(vm);
}

xvm_bool XVM_load_opcode(XVM_vm* vm, xvm_uint32 num, XVM_opcode_func func){
	//	valid indices are 0 .. opcode_count - 1
	if(num >= vm->opcode_count)
		return XVM_FALSE;

	vm->opcodes[num] = func;
	return XVM_TRUE;
}

xvm_bool XVM_load_std_opcodes(XVM_vm* vm){
	//	basic
	XVM_load_opcode(vm,	END,	__xvm_end);
	XVM_load_opcode(vm,	PUSH,	__xvm_push);

	//	stack mod
	XVM_load_opcode(vm,	PICK,	__xvm_pick);
	XVM_load_opcode(vm,	DUB,	__xvm_dub);
	XVM_load_opcode(vm,	POP,	__xvm_pop);
	XVM_load_opcode(vm,	PUT,	__xvm_put);

	//	arithmetic
	XVM_load_opcode(vm,	ADD,	__xvm_add);
	XVM_load_opcode(vm,	SUB,	__xvm_sub);
	XVM_load_opcode(vm,	SUBI,	__xvm_subi);
	XVM_load_opcode(vm,	MUL,	__xvm_mul);
	XVM_load_opcode(vm,	DIV,	__xvm_div);
	XVM_load_opcode(vm,	DIVI,	__xvm_divi);
	XVM_load_opcode(vm,	MOD,	__xvm_mod);
	XVM_load_opcode(vm,	MODI,	__xvm_modi);

	return XVM_TRUE;
}

xvm_bool XVM_load_code(XVM_vm* vm, const xvm_uint8* code, xvm_uint32 size){
	if((vm->code.used + size) > vm->code.size)
		return XVM_FALSE;

	void* dst = vm->code.mem + vm->code.used;
	memcpy(dst, code, size);

	vm->code.used += size;

	return XVM_TRUE;
}

xvm_bool XVM_run(XVM_vm* vm){
	xvm_bool	result;
	xvm_uint8	opcode;
	xvm_bool (*func) (XVM_vm*);

	//	if paused resume
	if(vm->state == XVM_PAUSED)
		vm->state = XVM_RUNNING;

	while(vm->state == XVM_RUNNING){
		//	refuse opcodes that have no function loaded
		opcode = vm->code.mem[vm->code.current];
		if(opcode >= vm->opcode_count || vm->opcodes[opcode] == NULL){
			vm->state = XVM_ERROR;
			return XVM_FALSE;
		}

		//	run the current opcode
		func = vm->opcodes[opcode];
		result = func(vm);

		if(result != XVM_TRUE){
			vm->state = XVM_ERROR;
			return XVM_FALSE;
		}

		vm->code.current++;

		if(vm->code.current >= vm->code.used){
			//	this was a comparison (==) in the first version, which had no effect
			vm->state		= XVM_EXCEPTION;
			vm->exception.id	= EXC_CODE_OVERFLOW;
			return XVM_FALSE;
		}
	}

	return XVM_TRUE;
}

as you can see, there is much that would have to be added, but as a proof of concept it works perfectly.
it could be used this way:
#include <stdio.h>
#include "xvm.h"
#include "opcodes.h"

xvm_uint8 code[255] =
	{
	PUSH, INT(1),
	PUSH, INT(2),
	PUSH, INT(3),
	PUSH, INT(4),
	PUSH, INT(5),
	SUBI,
	PICK,
	END
};

void print_stack(XVM_vm* vm){
	int i = 0;
	for(i = 0; i < vm->stack.top; i++)
		printf("[%d] : %d\n", i, vm->stack.mem[i]);
}

int main(int argc, char** argv){
	XVM_vm*	vm = NULL;

	vm = XVM_init();
	XVM_load_std_opcodes(vm);
	XVM_load_code(vm, code, 255);

	if(XVM_run(vm) != XVM_TRUE){
		if(vm->state == XVM_ERROR)
			printf("an error occurred during execution in an opcode function\n");
		else if(vm->state == XVM_EXCEPTION){
			printf("an exception was thrown during execution\n");
			printf("id = %d\n", vm->exception.id);
		}else
			printf("an error occurred\n");
	}else{
		//	display the stack
		print_stack(vm);
	}

	XVM_free(vm);
	return 0;
}


the INT macro translates an integer into the 2's complement format (i think my vm uses a different endianness than the x86, so it's needed).
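
The INT macro itself is not shown in the thread, so the following C++ is only a guess at the byte layout it has to produce: four bytes, most significant first, matching the way the push opcode reassembles its operand. encode_be and decode_be are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Splits a 32 bit value into four bytes, most significant byte first.
std::array<std::uint8_t, 4> encode_be(std::int32_t v) {
    std::uint32_t u = static_cast<std::uint32_t>(v);  // two's complement bits
    std::array<std::uint8_t, 4> out{{
        static_cast<std::uint8_t>(u >> 24),
        static_cast<std::uint8_t>(u >> 16),
        static_cast<std::uint8_t>(u >> 8),
        static_cast<std::uint8_t>(u)
    }};
    return out;
}

// Rebuilds the value the same way the push opcode does: shift, then add a byte.
std::int32_t decode_be(const std::uint8_t* p) {
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i)
        u = (u << 8) | p[i];
    // Converting back to a signed type assumes two's complement, which holds
    // on mainstream platforms and is guaranteed from C++20 on.
    return static_cast<std::int32_t>(u);
}
```

Because the bytes are written and read in a fixed order, the bytecode means the same thing regardless of the host's native endianness.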

[Edited by - cmp on December 4, 2004 12:49:54 PM]

This topic is closed to new replies.
