Implementing the range-based for loop (aka foreach)

Started by September 30, 2024 05:39 PM
5 comments, last by Miss 8 hours, 49 minutes ago

Last time I tried to implement a new engine property for AS, which gave me a basic understanding of AngelScript's codebase. Now I'm reading the AS source and trying to implement a range-based for loop (for(auto item : range) in C++), a feature called “for each” in some languages. I found that this topic was put on the to-do list 10 years ago, with a note saying it needs further consideration, so I decided to post here to ask for suggestions and help.

My Proposed Implementation

The key concepts of the range-based for loop in C++ (and likely in most other languages) are:

  1. begin(): Returns the initial iterator position.
  2. operator!= and end(): Decide when to end the loop.
  3. operator*: Retrieves the value from the iterator.
  4. operator++: Advances the iterator to the next position.
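
For comparison, here is a minimal C++ sketch of how those four operations drive a loop. The function names join_ranged, join_manual and check_join are made up for illustration. The second function is a hand-written approximation of what the compiler generates for the first, not the exact wording of the C++ standard.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Concatenate all items using the range-based for loop.
std::string join_ranged(const std::vector<std::string>& items) {
    std::string out;
    for (const auto& item : items) out += item;
    return out;
}

// The same loop written by hand with the four operations listed above.
std::string join_manual(const std::vector<std::string>& items) {
    std::string out;
    auto it = items.begin();        // 1. begin(): initial iterator position
    const auto last = items.end();  // 2. end(): evaluated once, before the loop
    for (; it != last; ++it) {      // 2. operator!= ends the loop, 4. operator++ advances
        const auto& item = *it;     // 3. operator*: fetch the current value
        out += item;
    }
    return out;
}

// Quick self-check: both forms visit the same elements in the same order.
void check_join() {
    const std::vector<std::string> items{"a", "b", "c"};
    assert(join_manual(items) == join_ranged(items));
}
```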

A separate iterator type easily leads to unsafe code such as dangling references or pointers, so I decided to put these methods inside the class being iterated, where the class's own code can validate the iterator.

For example, given a script class:

class Range
{
  string[] data(4); // An array of string whose size is 4

  // `uint` is used as the iterator type here for simplicity;
  // it can be a complex custom type in a real-world application
  
  // A non-`const` overload can be added if the class wants different iterator types
  // for the const and non-const versions, for example to provide a writable reference
  // in the non-`const` version.
  uint opForBegin() const
  {
    return 0;
  }
  
  bool opForEnd(uint iter) const
  {
    return iter == 4;
  }
  
  string& opForValue(uint iter)
  {
    // Can perform some checks to validate the iterator here
    return data[iter];
  }
  
  // Overload for const version
  const string& opForValue(uint iter) const { /* ... */ }
  
  uint opForNext(uint iter) const
  {
    // Can perform some checks to validate the iterator here
    return iter + 1;
  }
};

The same rules also apply to an application-registered class.

Then the range-based for will look like this:

Range r;
for(auto& item : r) // The `item` is reference to a string
  do_something(item);

which is equivalent to

Range r;
// Add an intermediate variable `iter` for explanation
for(auto iter = r.opForBegin(); !r.opForEnd(iter); iter = r.opForNext(iter))
{
  auto& item = r.opForValue(iter);
  do_something(item);
}
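
To make the expansion concrete outside of AngelScript, the following is a rough C++ model of the same protocol. It is only a sketch: the class mirrors the script class above, while for_each_in and demo_fill_and_join are hypothetical names standing in for the code the compiler would generate and for a caller. Nothing here is existing AngelScript API.

```cpp
#include <array>
#include <cassert>
#include <string>

// C++ stand-in for the script class `Range` above.
class Range {
public:
    using Iter = unsigned;  // the post uses `uint` as the iterator type

    Iter opForBegin() const { return 0; }
    bool opForEnd(Iter iter) const { return iter == data_.size(); }
    // at() is bounds-checked, so a bad iterator throws instead of reading out of range.
    std::string& opForValue(Iter iter) { return data_.at(iter); }
    const std::string& opForValue(Iter iter) const { return data_.at(iter); }
    Iter opForNext(Iter iter) const { return iter + 1; }

private:
    std::array<std::string, 4> data_{};
};

// Hypothetical helper: the loop the compiler would emit for
//   for(auto& item : r) body(item);
template <typename R, typename Body>
void for_each_in(R& r, Body body) {
    for (auto iter = r.opForBegin(); !r.opForEnd(iter); iter = r.opForNext(iter)) {
        auto& item = r.opForValue(iter);  // const or non-const overload, depending on R
        body(item);
    }
}

// Fill every slot through the writable reference, then read back through a const view.
std::string demo_fill_and_join() {
    Range r;
    unsigned n = 0;
    for_each_in(r, [&n](std::string& item) { item = std::to_string(n++); });

    const Range& view = r;
    std::string joined;
    for_each_in(view, [&joined](const std::string& item) { joined += item; });
    assert(joined.size() == 4);  // four one-character entries were visited
    return joined;
}
```

The const view picks the const overload of opForValue, which is the behaviour the comment in the script class describes for const and non-const versions.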

About Safety Concerns

  1. The container/range must stay alive throughout the loop. If the range is a return value, every value in the whole expression should be kept alive to avoid surprising bugs. For example, given for(auto& item : gen_list_of_ref(new_ref_value()).get_subrange(…)), all intermediate values should be kept, since the range being iterated might refer to them. (BTW, range-based for in C++ had the same problem until C++23 introduced a new lifetime extension rule.)
  2. Dangling references can be avoided by performing validation inside these methods of an application-registered container class, for example checking whether the pointer still points into the current container's buffer and raising an exception if it does not. This also guarantees that if the container is modified inside the loop, the program at least won't crash with an access violation, though the result will obviously be undefined or unexpected. (In my opinion, it's the script writer's duty to make sure the container is not modified during the loop. AS only needs to provide a basic guarantee that prevents a bad script from crashing the host application.)
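
The second point can be sketched in C++ as follows. This is an illustrative model only: CheckedList and the two demo functions are hypothetical, an index is used as the iterator, and a C++ exception stands in for whatever error mechanism a real application-registered class would use to report the problem to the script.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class CheckedList {
public:
    using Iter = std::size_t;

    explicit CheckedList(std::vector<std::string> items) : items_(std::move(items)) {}

    Iter opForBegin() const { return 0; }
    // `>=` rather than `==`, so a container that shrank mid-loop still ends the loop.
    bool opForEnd(Iter iter) const { return iter >= items_.size(); }
    std::string& opForValue(Iter iter) {
        // Validate before touching memory: a stale iterator becomes an error, not a crash.
        if (iter >= items_.size()) throw std::out_of_range("stale iterator");
        return items_[iter];
    }
    Iter opForNext(Iter iter) const { return iter + 1; }

    void clear() { items_.clear(); }
    void pop_back() { if (!items_.empty()) items_.pop_back(); }

private:
    std::vector<std::string> items_;
};

// Returns true when using an iterator after the container was emptied is reported as an error.
bool stale_iterator_is_rejected() {
    CheckedList list(std::vector<std::string>{"a", "b", "c"});
    auto iter = list.opForNext(list.opForBegin());  // refers to "b"
    assert(!list.opForEnd(iter));
    list.clear();                                   // the container is modified
    try {
        list.opForValue(iter);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Runs the expanded loop while the body shrinks the container and
// returns how many elements were visited before the loop ended.
std::size_t visits_while_shrinking() {
    CheckedList list(std::vector<std::string>{"a", "b", "c", "d"});
    std::size_t visited = 0;
    for (auto iter = list.opForBegin(); !list.opForEnd(iter); iter = list.opForNext(iter)) {
        list.opForValue(iter);  // validated access
        ++visited;
        list.pop_back();        // a badly behaved loop body
    }
    return visited;
}
```

Note the limit of this kind of check: it validates the iterator each time one of the methods is called, but it cannot protect a reference that the loop body already holds when the container is modified.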

Further Developments

Here are some more complex ideas that I may not implement in the initial version, but I think they are worth discussing.

  1. Multiple iterating values at the same time.

    for(auto key, val : dict) for iterating over a dictionary-like object. This could probably be done by declaring opForValue0(iterator_type iter) and opForValue1(iterator_type iter).

  2. set_opForValue and get_opForValue properties for complex logic, like get_opIndex/set_opIndex.
  3. A behavior called DESTROY_ITERATOR for destroying the iterator after the loop, for application-registered types. The host could then register void* opForBegin_impl() as int opForBegin() (much as the template callback bool (asITypeInfo*, bool&) is registered as bool f(int&in, bool&out)), so the iterator type can be completely opaque to the script. This might require the compiler to prohibit scripts from calling these methods directly.
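
As a rough illustration of the first idea, here is a C++ model of a dictionary-like class exposing opForValue0 and opForValue1, plus a hand-written expansion of the hypothetical for(auto key, val : dict). The class Dict and the functions describe and demo_describe are invented for this sketch, and std::map is used only as a convenient backing store.

```cpp
#include <cassert>
#include <map>
#include <string>

class Dict {
public:
    using Iter = std::map<std::string, int>::const_iterator;

    void set(const std::string& key, int value) { entries_[key] = value; }

    Iter opForBegin() const { return entries_.begin(); }
    bool opForEnd(Iter iter) const { return iter == entries_.end(); }
    // One accessor per loop variable: value 0 is the key, value 1 is the mapped value.
    const std::string& opForValue0(Iter iter) const {
        assert(iter != entries_.end());
        return iter->first;
    }
    int opForValue1(Iter iter) const {
        assert(iter != entries_.end());
        return iter->second;
    }
    Iter opForNext(Iter iter) const { return ++iter; }

private:
    std::map<std::string, int> entries_;
};

// Hand-written expansion of the hypothetical
//   for(auto key, val : dict) { ... }
std::string describe(const Dict& dict) {
    std::string out;
    for (auto iter = dict.opForBegin(); !dict.opForEnd(iter); iter = dict.opForNext(iter)) {
        const auto& key = dict.opForValue0(iter);
        auto val = dict.opForValue1(iter);
        out += key + "=" + std::to_string(val) + ";";
    }
    return out;
}

std::string demo_describe() {
    Dict d;
    d.set("b", 2);
    d.set("a", 1);
    return describe(d);  // std::map iterates in key order
}
```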

About Changes to Existing Code

If I understand the codebase correctly, modifying as_parser.cpp and as_compiler.cpp is enough for the basic functionality. Some helper interfaces could be added to the script engine to make it easy for application-registered code to use this script feature. One example is registering an enumerate wrapper for those who still need the index in a range-based for: for(auto idx, val : enumerate(array))
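
For readers unfamiliar with the enumerate idea, this is a minimal C++ sketch of such a wrapper. It is not the proposed AngelScript helper, only an analogue: Enumerate, enumerate, label_all and demo_enumerate are illustrative names, and each loop step yields a pair of index and element reference.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

template <typename Container>
class Enumerate {
public:
    using Inner = decltype(std::begin(std::declval<Container&>()));
    // Index by value, element by reference, so the loop body can write through.
    using Entry = std::pair<std::size_t, decltype(*std::declval<Inner&>())>;

    class iterator {
    public:
        iterator(std::size_t index, Inner inner) : index_(index), inner_(inner) {}
        bool operator!=(const iterator& other) const { return inner_ != other.inner_; }
        iterator& operator++() { ++index_; ++inner_; return *this; }
        Entry operator*() const { return Entry(index_, *inner_); }
    private:
        std::size_t index_;
        Inner inner_;
    };

    explicit Enumerate(Container& c) : c_(c) {}
    iterator begin() const { return iterator(0, std::begin(c_)); }
    iterator end() const { return iterator(0, std::end(c_)); }  // index is not compared

private:
    Container& c_;
};

template <typename Container>
Enumerate<Container> enumerate(Container& c) { return Enumerate<Container>(c); }

// Prefix every name with its index, in place, and return the labels joined by spaces.
std::string label_all(std::vector<std::string>& names) {
    std::string out;
    for (auto entry : enumerate(names)) {
        entry.second = std::to_string(entry.first) + ":" + entry.second;
        out += entry.second + " ";
    }
    return out;
}

std::string demo_enumerate() {
    std::vector<std::string> names{"x", "y", "z"};
    std::string labels = label_all(names);
    assert(names[0] == "0:x");  // the write went through to the container
    return labels;
}
```

The wrapper only borrows the container, so the container has to outlive the loop, which is the same lifetime concern raised in the safety section.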

Hi Henry,

I haven't had the time to sit down and rethink all the complications that need to be resolved to implement for-each support, though given that it is basically syntactic sugar for a normal for-loop it should be fine. If you want to move ahead with this, feel free to do so. I'd love to take a look at it once you get it working.

One thing: I think this warrants a new keyword, e.g. ‘foreach’, rather than reusing the ‘for’ keyword.

Just out of curiosity, here is how the for-each loop looks in different languages: https://en.wikipedia.org/wiki/Foreach_loop

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

One thing: I think this warrants a new keyword, e.g. ‘foreach’, rather than reusing the ‘for’ keyword.

A new keyword might break existing scripts and tools, e.g. a method named foreach for iterating over a container, or the code completion and syntax highlighting tools in many editors.

The compiler can distinguish between the two kinds of for, since the range-based one uses : inside the parentheses. This also won't add much complexity to the parser or compiler. I think an additional branch like IsRangeVarDecl() inside asCParser::ParseFor is enough.
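
A lookahead of that kind could work roughly as in the C++ sketch below. This is a toy model and not AngelScript's parser: the tokens are plain strings, and tokenize, is_range_for_header and check_headers are hypothetical names. It assumes the header has already been cut down to the tokens between the parentheses of the for statement, and it skips a : that belongs to a ?: conditional or sits inside nested brackets.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Toy tokenizer: tokens are separated by whitespace in the input text.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);
    return tokens;
}

// Returns true when the header contains a top-level ':' before any top-level ';'.
bool is_range_for_header(const std::vector<std::string>& header) {
    int depth = 0;            // nesting level of (), [] and {}
    int pending_ternary = 0;  // '?' tokens still waiting for their ':'
    for (const auto& tok : header) {
        if (tok == "(" || tok == "[" || tok == "{") {
            ++depth;
        } else if (tok == ")" || tok == "]" || tok == "}") {
            if (depth > 0) --depth;
        } else if (depth == 0 && tok == ";") {
            return false;     // classic for(init; cond; step)
        } else if (depth == 0 && tok == "?") {
            ++pending_ternary;
        } else if (depth == 0 && tok == ":") {
            if (pending_ternary == 0) return true;  // range separator
            --pending_ternary;                      // ':' closes a conditional
        }
    }
    return false;
}

// Usage examples.
void check_headers() {
    assert(is_range_for_header(tokenize("auto & item : r")));
    assert(!is_range_for_header(tokenize("int i = 0 ; i < n ; ++ i")));
    assert(!is_range_for_header(tokenize("int i = a ? b : c ; i < n ; ++ i")));
}
```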

Yes, I understand that the parser can easily see the difference between the two. But I'm not designing a language to be easy to parse. It should be easy for a human to read as well. That is why I suggest introducing the keyword foreach to clearly indicate the difference between the loop commands.

Yes, it might break someone's scripts if they decide to upgrade AngelScript. But so can any other change I make. I will not limit the choices for new features because they might break something, otherwise I might as well stop developing new features altogether. I will always try to maintain backwards compatibility when reasonable, but I can only do that within the scope of my library, not for anything that someone else might have added on top of it.

As an alternative to the keyword ‘foreach’, it would be possible to use a context-sensitive word ‘each’. So the syntax would be

for each(auto i : rangei) {}

That would have less risk of conflicting with existing scripts. ‘each’ would not be a reserved keyword and would only have a meaning to the compiler in this context.

I agree with WitchLord's sentiment about updating. If this conflicts with people's existing implementations, so be it.

Personally I think the foreach keyword makes a lot more sense than a separate for each, as it's a lot more familiar to users of other programming languages.

Additionally, it looks like ActionScript of all places uses for each, which is interesting to note as many of our users have confused AngelScript with ActionScript! It wouldn't hurt to set it apart a bit more from ActionScript 🙂
