Advertisement

Implementing the range-based for loop (aka foreach)

Started by September 30, 2024 05:39 PM
29 comments, last by WitchLord 4 weeks, 1 day ago

Last time I tried to implement a new engine property for AS, which gave me some basic understanding of AngelScript's codebase. Now, I'm looking at the source of AS and trying to implement a range-based for loop (for(auto item : range) in C++). This feature is also named “for each” in some languages. I found that this topic has been marked in the to-do list 10 years ago, saying this topic needs some further considerations. I decide to post this to ask for suggestions and help.

My Proposed Implementation

The key concepts of range-based for loop in C++ (and likely in most of the languages) are:

  1. being(): Returns the initial iterator position.
  2. operator!= and end(): Decide when to end the loop.
  3. operator* : To retrieve value from the iterator.
  4. operator++: Advance the iterator to the next position.

The separate iterator type will easily lead to unsafe code like dangling reference/pointer, thus I decide to put those specific methods inside the class to iterate, so the class code can validate the iterator.

For example, given a script class:

class Range
{
  string[] data(4); // An array of string whose size is 4

  // Here use `uint` as the iterator type for simplicity,
  // it can be a complex custom type in real-world application
  
  // Can have a non-`const` overload if the class wants different iterator type in const and non-const version
  // For example, providing a writable reference in non-`const` version.
  uint opForBegin() const
  {
    return 0;
  }
  
  bool opForEnd(uint iter) const
  {
    return iter == 4;
  }
  
  string& opForValue(uint iter)
  {
    // Can perform so checks to validate the iterator here
    return data[iter];
  }
  
  // Overload for const version
  const string& opForValue(uint iter) const { /* ... */ }
  
  uint opForNext(uint iter) const
  {
    // Can perform so checks to validate the iterator here
    return iter + 1;
  }
};

The same rule also applied to an application-registered class.

Then the range-based for will look like this:

Range r;
for(auto& item : r) // The `item` is reference to a string
  do_something(item);

which is equivalent to

Range r;
// Add an intermediate variable `iter` for explanation
for(auto iter = r.opForBegin(); !r.opForEnd(iter); iter = r.opForNext(iter))
{
  auto& item = r.opForValue(iter);
  do_something(item);
}

About Safety Concerns

  1. The lifetime of the container/range must be kept throughout the loop. If the range is a return value, then the values in the whole expression must be kept to reduce surprising bug. For example, given for(auto& item : gen_list_of_ref(new_ref_value()).get_subrange(…)), any intermediate values should be kept since it might be referred in the range to iterate. (BTW, the range-based for in C++ also had the same problem until C++23 introduced a new rule of lifetime expansion.)
  2. Dangling reference can be avoided by performing validation inside those specific methods for application-registered container class. For example, checking the pointer whether it is pointing the buffer of current container, then raise an exception, etc. This can also guarantee that if container is modified inside the loop, the program won't crash due to access violation at least. Though the result will be obviously undefined or unexpected. (In my opinion, it's script writer's duty to make sure container is not modified during the loop. The AS only needs to provide a basic guarantee that prevents bad script from crashing the host application.)

Further Developments

Here are some more complex ideas, that I may not implement them in the initial version. But I think they are worth of discussion.

  1. Multiple iterating values at the same time.

    for(auto key, val : dict) for iterating over a dictionary-like object. This can probably be done by declaring opForValue0(iterator_type iter), opForValue1(iterator_type iter)

  2. set_opForValue and get_opForValue properties for complex logic, like the get/set_opIndex.
  3. An behavior called DESTROY_ITERATOR for destroying iterator after loop for application-registered type. Then the host can registered void* opForBegin_impl() as int opForBegin() (like the template callback bool (asITypeInfo*, bool&) is registered as bool f(int&in, bool&out) ), thus the iterator type can be completely opaque to the script. But this might need the compiler to prohibit script from using these methods directly.

About Changes to Existing Code

If I understand the codebase correctly, modifying as_parser.cpp and as_compiler.cpp is enough for the basic functionalities. Some helper interface can be added to the script engine to make it easy for application-registered code to utilize those script feature. An example is to register an enumerate wrapper for who still needs index in range-based for, for(auto idx, val : enumerate(array))

None

Hi Henry,

I haven't had the time to sit down and rethink all the complications that needs to be resolved for implementing a for-each support, though given that it is basically syntax sugar for a normal for-loop it should be fine. If you want to move ahead with this feel free to do so, I'd love to take a look at it once you get it working.

One thing: I think this warrants a new keyword, e.g. ‘foreach’, rather than reusing the ‘for’ keyword.

Just for curiosity: for-each loop in different languages: https://en.wikipedia.org/wiki/Foreach_loop

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

Advertisement

One thing: I think this warrants a new keyword, e.g. ‘foreach’, rather than reusing the ‘for’ keyword.

A new keyword might break existing script and tools, e.g. method named foreach for iterating a container, or code completion / syntax highlighting tools in many editors.

The compiler can distinguish between two types of for since the range-based one will use : inside the parentheses. This also won't add too much complexity to the parser/compiler. I think an additional branch like IsRangeVarDecl() inside the asCParse::ParseFor is enough.

None

Yes, I understand that the parser can easily see the difference between the two. But I'm not designing a language for being easy to parse, it should be easy for the human to read as well. That is why I will suggest to introduce the keyword foreach to clearly indicate the difference between the loop commands.

Yes, it might break someone's scripts if they decide to upgrade AngelScript. But so can any other change I do. I will not limit the choices for new features because it might break something, otherwise I might as well stop developing new features all together. I will always try to maintain backwards compatibility when reasonable, but I can only do that for the scope of my library, not for anything that someone else might have added on top of my library.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

As an alternative to the keyword ‘foreach’, it would be possible to use a context sensitive word ‘each’. So the syntax would be

for each(auto i : rangei) {}

That would have less risk of conflicting with existing scripts. ‘each’ would not be a reserved keyword and would only have a meaning to the compiler in this context.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

I agree with WitchLord's sentiment about updating. If this conflicts with people's existing implementations, so be it.

Personally I think the foreach keyword makes a lot more sense than a separate for each, as it's a lot more familiar to users of other programming languages.

Additionally, it looks like ActionScript of all places uses for each which is interesting to note as many of our users have confused Angelscript with ActionScript! It wouldn't hurt setting it apart a bit more from ActionScript 🙂

Advertisement

I'd greatly appreciate some ‘foreach’-kind of operator to reduce things in scripts like

  auto fieldCount = container.GetFieldCount();
  for (auto i = 0; i < fieldCount; ++i)
  {
    auto field = c.GetField(i);
...

Since exact copy-paste from-to C++ is unlikely, I vote ‘foreach’ as a keyword (two keywords breaks the uniformity IMO) :D

None

I was busy the past few days. Now I have time to continue this work. After seeing the discussion in this post, I decided to implement it as foreach(type var_name : some_range) . I have completed the parsing part, and now I'm working on the compiling part. But I'm still trying to understand details of the AS compiler, so this might take some time to finish.

UPDATE:

Can I generate a snDeclaration type script node in the parser satisfying CompileDeclaration, which is equivalent to type $iter = range.opForBegin(), as well as other syntactic sugars provided by the foreach. Or should I just write a new compiling code for foreach since it seems that foreach cannot directly reuse other parts of the compiler such as CompileDeclaration.

UPDATE 2

Maybe I can imitate the code of ImplicitConvObjectToPrimitive for how it creates intermediate variable and makes function call?

None

In general, it is not the job of the parser to try to fit the nodes into what the compiler is already using. The parser should just create the script nodes as it sees them. The compiler is the one that should interpret the script nodes to produce the correct bytecode sequence.

If you're in doubt, just copy the logic into a new method and change it the way you need it. I can do the clean-up to do the code reuse where possible.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

WitchLord said:

If you're in doubt, just copy the logic into a new method and change it the way you need it.

I copied logic from other places and adjust it. The major part is come from CompileForStatement. But the assertion keeps saying invalid stack size (line 2485 of as_bytecode.cpp, in the function DebugOutput). It seems that I didn't figure out the correct way to generate byte code setting stack size.

Here is my script

class Range
{
  int[] data(4);
  Range() { data = {1, 2, 3, 4}; }
  uint opForBegin() { return 0; }
  bool opForEnd(uint iter) const { return iter == 4; }
  int& opForValue(uint iter) { return data[iter]; }
  uint opForNext(uint iter) { return iter + 1; }
};

void Test()
{
  Range r;
  int s;
  foreach(int i : r)
    s += i;
  Print('foreach:' + s);
}

And here is the dumped bytecode

void Test()

Temps: 5, 8, 10

Variables: 
 002: (heap) Range r
 003: int s
 005: int 
 008: (heap) const string@ 
 010: (heap) string@ 


- 13,3 -
    0  10 *    SUSPEND
               VarDecl  0
    1  10 *    CALL     93           (Range@ Range())
    3  10 *    STOREOBJ v2
- 15,11 -
               VarDecl  1
            {
    4  10 *    SUSPEND
    5  10 *    CALLINTF 100           (uint Range::opForBegin())
    7   8 *    CpyRtoV4 v5
    8   8 *    CpyVtoV4 v4, v5
   10   8 *    JMP      +16              (d:28)
            4:
   12   6 *    SUSPEND
- 16,5 -
   13   6 *    SUSPEND
   14   6 *    PshV4    v4
   15   7 *    CALLINTF 102           (int& Range::opForValue(uint))
   17   4 *    RDR4     v5
   18   4 *    CpyVtoV4 v6, v5
   20   4 *    ADDi     v3, v3, v6
            3:
   22   4 *    PshV4    v4
   23   5 *    CALLINTF 103           (uint Range::opForNext(uint))
   25   2 *    CpyRtoV4 v5
   26   2 *    CpyVtoV4 v4, v5
- 15,11 -
            1:
   28   8 *    SUSPEND
   29   8 *    PshV4    v4
   30   9 *    CALLINTF 101           (bool Range::opForEnd(uint) const)
   32   6 *    JLowZ    -22              (d:12)
- 17,3 -
            2:
            }
   34   6 *    SUSPEND
   35   6 *    PGA      0x61c340          (str:foreach:)
   38   8 *    RefCpyV  v8, 0x5df380          (type:string)
   41   8 *    PopPtr
   42   6 *    PshV4    v3
   43   7 *    PshVPtr  v8
   44   9 *    CALLSYS  79           (string@ string::opAdd(int) const)
   46   6 *    STOREOBJ v10
   47   6 *    FREE     v8, 0x5df380          (type:string)
   50   6 *    ChkNullV v10
   51   6 *    PshVPtr  v10
   52   8 *    CALLSYS  90           (void Print(const string&in))
   54   6 *    FREE     v10, 0x5df380          (type:string)
- 18,2 -
   57   6 *    SUSPEND
   58   6 *    FREE     v2, 0x5fe8f0          (type:Range)
            0:
   61   6 *    RET      0

None

Advertisement