Advertisement

Assembly-Language Script Evolution

Started by August 19, 2007 06:04 AM
10 comments, last by Vorpy 17 years, 3 months ago
I'm making an evolver like Tierra based on an assembler-like scripting language similar to the "Little Man Computer" model ( http://en.wikipedia.org/wiki/Little_man_computer ). Here is the instruction set: 3xx - ADD - Take the value stored in mailbox xx and add it to whatever value is currently on the accumulator. 4xx - SUBTRACT - Take the value stored in mailbox xx and subtract it from whatever value is currently on the accumulator. 5xx - STORE - Take the value from the accumulator (non-destructive) and store it in mailbox xx (destructive). 6xx - LOAD - Take the value from mailbox xx (non-destructive) and enter it in the accumulator (destructive). 7xx - BRANCH (unconditional) - Reset the program counter to the value xx. That is, xx will be the next instruction executed. 8xx - BRANCH IF ZERO - If the accumulator contains the value 0, reset the program counter to the value xx. Otherwise, do nothing. 9xx - BRANCH IF POSITIVE - If the accumulator is 0 or positive, reset the program counter to the value xx. Otherwise, do nothing. 298 - INPUT - Fetch the value from the opened file's current line, put it in the accumulator (destructive), and move on to the next line. 299 - OUTPUT - Fetch the value from the accumulator (non-destructive), put it in the current line of the currently opened file, move on to the next line. 0xx - OPEN FILE - Open file number 0xx (can be any value from 000 to 099). This opens the file for INPUT or OUTPUT. 1xx - OPEN FILE - Also opens a file for INPUT or OUTPUT (can be any value from 100 to 199). 2xx - OPEN FILE - Also opens a file for INPUT or OUTPUT (can be any value from 200 up to 297.). In this case, "mailbox" refers to lines in the file that is running (not to be confused with the file is opened by the file that is running). The scripts will be in their own files with valid names numbering from 000 to 297. What are some good pointers for me? Something I'm wondering about is how will I be able to make sense of it once some mutations happen. I don't want to be looking at a file full of numbers, how do I help make it easier on my eyes to see if any interesting mutation developed. I'll have to make the very first self-replicating script by hand. It'll be a script that copies itself to the file with a number one higher than itself. EDIT: And mutations will happen randomly during OUTPUT by having the 3-digit decimel string converted to a binary stream and having one random digit inverted. So for example, when any time a script would OUTPUT a line to another file, there would be let's say a 1 in 1000 chance of a mutation. If this occured, the 3-digit decimal that would normally be outputted to this other file would be converted to a binary stream. Let's say if the decimal was 152, then the binary stream would be 10011000. A digit from this would be picked randomly and inverted, so let's say it picked the fourth digit 100[1]1000 and inverted it to 10001000, turning the decimal value into 136. EDIT: The script interpreter that runs the files will be a console window that will randomly pick a file number, execute that file, and once that script has finished, move on to another random file. But one problem I see is that the script interpreter might get stuck in an infinite loop and never be able to move on to other files (if for example a mutation developed where a BRANCH statement made the script repeat over and over again). How would I prevent this? [Edited by - polyfrag on August 19, 2007 6:31:30 AM]
Assembly language programs will probably be too fragile for anything interesting to be generated with this scheme. Any single mutation would cause the program to become unable to replicate itself.

To make it easier to read the programs, write a disassembler to translate the numbers back into instruction names.

To prevent infinite loops, just limit the number of instructions a program is allowed to execute.
Advertisement
But isn't this the same way Tierra did mutation?

Quote: Ray altered the Tierra system to simulate a computer with a slight flaw. Every now and again, the machine code instruction which copied data between memory cells would randomly flip one of the bits during copying. If the data being copied was the machine code of the program itself as it tried to reproduce, the result would be a slightly different mutant program.

http://www.infidels.org/library/modern/meta/getalife/coretierra.html
Well, I just read through some stuff at the tierra web page to try to figure out what it does. It has a link to Avida, which is based on Tierra.

Some of my observations:

Tierra uses a timesharing model, where the OS allocates timeslices to processes and switches between them when their timeslice expires (after a certain number of instructions).

Processes allocate memory before writing to it. This prevents programs from corrupting each other. There is an instruction that allocates a requested amount of contiguous memory and then ages the process.

When the system runs out of memory, it starts killing processes, starting with the oldest, to free up space.

Many instructions use templates consisting of a series of no-ops following the instruction. The system then looks for a complimentary template nearby; the direction of the search depends on the instruction; it can be forward, backward, or the closest either way. There are at least 2 types of no-ops, so for example nop1 and nop2 are compliments. These instructions do things like jump, jump and store a return address, load an address into a register, etc.

The interesting results tend to involve things like smaller programs becoming more successful because they can copy themselves faster. Timeslicing or parallel execution is a major part of the system, as well as the use of templates to act as labels for jumps which allow processes to sometimes jump into another process's code.
Quote: Many instructions use templates consisting of a series of no-ops following the instruction. The system then looks for a complimentary template nearby; the direction of the search depends on the instruction; it can be forward, backward, or the closest either way. There are at least 2 types of no-ops, so for example nop1 and nop2 are compliments. These instructions do things like jump, jump and store a return address, load an address into a register, etc.

The interesting results tend to involve things like smaller programs becoming more successful because they can copy themselves faster. Timeslicing or parallel execution is a major part of the system, as well as the use of templates to act as labels for jumps which allow processes to sometimes jump into another process's code.


I don't understand how tierra's assembly language works then... what are templates?
Yeah, it took me a while to figure out what templates were, and Tierra's documentation is confusing. The Avida documentation explains it more clearly.

A template is just a continuous series of nops. The language has at least two nops. Let's call them nop1 and nop2. These instructions do nothing on their own, and the program advances right through them. The template is used sort of like a label. For example, nop1 nop1 nop1 nop2 might be used to mark the start of a loop or function. To jump to that place, an appropriate instruction is followed by the template's complement, nop2 nop2 nop2 nop1. The instruction searches for the compliment, and might jump to it or load its address into a register.

These templates allow parasite programs to evolve by matching templates in the code of other programs, so they might set the registers a certain way and call another program's function. They also might allow control flow to change around a bit, I don't know. The whole thing is kind of confusing to me and I'm amazed that it actually did something interesting.

Avida also allows nops to modify adjacent instructions, so an instruction to load a value into register A might load it into register B instead if preceded by a certain nop.
Advertisement
I still don't understand what you were saying about mutation... What I got was that if the program had a timesharing model where several processes took turns running their own scripts, it added the possibility of two programs writing to a single file, overwriting each other's instructions, which added a new mechanism for mutation.
I really didn't see anything in there about two programs writing to the same place at the same time. My impression of it was that a process asks for memory allocation and is then given write access to an area of memory, but I could be wrong. If a process dies while in the middle of copying itself then that is how code from different processes can become mixed up.

The template system works as a way to create simple labels. Because the labels are so simple, it is not difficult for a mutation to cause a jump to a different label or to a label in a different program.
Quote: I really didn't see anything in there about two programs writing to the same place at the same time. My impression of it was that a process asks for memory allocation and is then given write access to an area of memory, but I could be wrong. If a process dies while in the middle of copying itself then that is how code from different processes can become mixed up.


Oh yeah that's what I meant, processes.


Quote: The template system works as a way to create simple labels. Because the labels are so simple, it is not difficult for a mutation to cause a jump to a different label or to a label in a different program.


I don't like using labels so much. How would I do input and output? With a 3 number decimel, its easy to manipulate the data. The commands, variables, and data values are all encodable in this format. But if I use a format like:

Add_To_Accumulator VariableA
Store_Accumulator_In VariableB
Branch_To_Line LineX
No_Op
No_Op
No_Op
No_Op
This_is_Line LineX
No_Op
No_Op
Branch_To_Line LineZ
Data_Value 213419
This_is_Line LineZ
Stop_Program

Then I can no longer manipulate the commands. Without labels I could do something like:

305 //ADD to accumulator value from line 5
503 //STORE accumulator value on line 3
000 //OPEN FILE #000, but this will be increased by 1 each iteration
701 //BRANCH (unconditional) back to line 1
001 //This is the value that will be added to the accumulator

Here I can seamlessly manipulate commands and their variables in the same way. If I were to do reproduction using labels, I would have to use a Copy_Line command, like here:

Set_Variable File_Name 2341235
Open_File File_Name
Set_Variable Line_Counter 1
This_is_Line Line_Where_Loop_Starts
Append_Line_To_File File_Name Line_Counter
Increment_Variable Line_Counter
Branch_To_Line Line_Where_Loop_Starts

Which removes the possibility of manipulating the lines. How would I do something like changing the command? Without labels, I can just change the front number. I could make the mutation randomly change the command if I used labels, but what if the new command had a different amount of arguments?

[Edited by - polyfrag on August 21, 2007 12:06:08 AM]
The labels are the patterns of no_ops.

NO_OP1
NO_OP1
NO_OP1
do some stuff
call routine by searching forward for the matching label
do more stuff
NO_OP1
NO_OP1
NO_OP2
Jump backwards to label matching the following template
NO_OP2
NO_OP2
NO_OP2
Dummy Instruction to separate NO_OPs
NO_OP2
NO_OP2
NO_OP1
subroutine stuff
jump to address that was pushed onto the stack by the subroutine call

The program flow here will be: do some stuff, subroutine stuff, do more stuff, and then it jumps back to the beginning and repeats.

I think this system is what encouraged parasites to evolve since it is easy to jump to the templates of another program. If the jump address is specified in the instruction itself then programs will have difficulty jumping into the code of other programs, or even just randomly combining in a somewhat meaningful way. Or maybe it would still work, I don't know.

This topic is closed to new replies.

Advertisement