Last time I tried to implement a new engine property for AS, which gave me a basic understanding of AngelScript's codebase. Now I'm looking at the AS source and trying to implement a range-based for loop (for(auto item : range) in C++). This feature is also called “for each” in some languages. I found that this topic was put on the to-do list 10 years ago, with a note saying it needs further consideration. I decided to post this to ask for suggestions and help.
My Proposed Implementation
The key concepts of the range-based for loop in C++ (and likely in most other languages) are:
- begin(): Returns the initial iterator position.
- operator!= and end(): Decide when to end the loop.
- operator*: Retrieves the value from the iterator.
- operator++: Advances the iterator to the next position.
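To make the four pieces above concrete, here is a minimal C++ sketch of a hand-written range and the loop the compiler conceptually expands it into. CountingRange and the two sum functions are hypothetical names made up for this illustration.

```cpp
#include <cassert>

// Hypothetical minimal range: yields first, first + 1, ..., last - 1.
struct CountingRange {
    struct Iterator {
        int value;
        int operator*() const { return value; }              // retrieve the value
        Iterator& operator++() { ++value; return *this; }    // advance to the next position
        bool operator!=(const Iterator& other) const { return value != other.value; }
    };

    int first;
    int last;

    Iterator begin() const { return Iterator{first}; }  // initial position
    Iterator end() const { return Iterator{last}; }     // one-past-the-end position
};

// Sum the range using the range-based for syntax.
int sum_with_range_for(const CountingRange& r) {
    int total = 0;
    for (int v : r) total += v;
    return total;
}

// Sum the range using a hand-written expansion of the same loop.
int sum_with_expansion(const CountingRange& r) {
    assert(r.first <= r.last);  // otherwise `!=` would never become false
    int total = 0;
    auto it = r.begin();
    auto stop = r.end();
    for (; it != stop; ++it) {
        int v = *it;
        total += v;
    }
    return total;
}
```

Both functions visit the same values, which is the property the proposal below tries to keep while moving the iterator logic into the container itself.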
A separate iterator type easily leads to unsafe code such as dangling references or pointers, so I decided to put these methods inside the class being iterated, so that the class's own code can validate the iterator.
For example, given a script class:
class Range
{
    string[] data(4); // An array of strings whose size is 4

    // `uint` is used as the iterator type here for simplicity;
    // it can be a complex custom type in a real-world application.
    // A non-`const` overload can be added if the class wants different
    // iterator types in the const and non-const versions,
    // for example to provide a writable reference in the non-`const` version.
    uint opForBegin() const
    {
        return 0;
    }

    bool opForEnd(uint iter) const
    {
        return iter == 4;
    }

    string& opForValue(uint iter)
    {
        // Can perform some checks to validate the iterator here
        return data[iter];
    }

    // Overload for the const version
    const string& opForValue(uint iter) const { /* ... */ }

    uint opForNext(uint iter) const
    {
        // Can perform some checks to validate the iterator here
        return iter + 1;
    }
};
The same rules would also apply to an application-registered class.
Then the range-based for will look like this:
Range r;
for(auto& item : r) // `item` is a reference to a string
    do_something(item);
which is equivalent to
Range r;
// Add an intermediate variable `iter` for explanation
for(auto iter = r.opForBegin(); !r.opForEnd(iter); iter = r.opForNext(iter))
{
    auto& item = r.opForValue(iter);
    do_something(item);
}
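The expansion above can be modelled in plain C++ to check that the four methods are enough to drive a loop. This is only a sketch of the proposed protocol, not AngelScript compiler code: the Range class below is a C++ stand-in for the script class, and for_each_in and demo_join are hypothetical helpers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// C++ stand-in for the script class `Range`; the method names follow the proposal.
class Range {
public:
    using Iter = unsigned;

    Iter opForBegin() const { return 0; }
    bool opForEnd(Iter it) const { return it >= data.size(); }
    std::string& opForValue(Iter it) { return data.at(it); }              // writable access
    const std::string& opForValue(Iter it) const { return data.at(it); }  // read-only access
    Iter opForNext(Iter it) const { return it + 1; }

private:
    std::vector<std::string> data = std::vector<std::string>(4);
};

// Hypothetical helper: performs the same expansion the compiler would generate.
template <class R, class Body>
void for_each_in(R& range, Body body) {
    for (auto it = range.opForBegin(); !range.opForEnd(it); it = range.opForNext(it)) {
        auto& item = range.opForValue(it);  // picks the const overload for a const range
        body(item);
    }
}

// Fills the range through the writable overload, then reads it back through the const one.
std::string demo_join() {
    Range r;
    int n = 0;
    for_each_in(r, [&](std::string& s) { s = "item" + std::to_string(n++); });
    assert(n == 4);  // all four elements were visited exactly once

    const Range& cr = r;
    std::string joined;
    for_each_in(cr, [&](const std::string& s) { joined += s; joined += ' '; });
    return joined;
}
```

Note that the stand-in uses `>=` instead of `== 4` in opForEnd, so the loop still terminates if the container ever becomes shorter than the iterator position.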
About Safety Concerns
- The lifetime of the container/range must be extended for the whole loop. If the range is a return value, then every value in the whole expression must be kept alive to avoid surprising bugs. For example, given for(auto& item : gen_list_of_ref(new_ref_value()).get_subrange(…)), all intermediate values should be kept alive, since the range being iterated may refer to them. (BTW, the range-based for in C++ had the same problem until C++23 introduced a new lifetime extension rule.)
- Dangling references can be avoided by performing validation inside these methods for application-registered container classes. For example, checking whether a pointer still points into the buffer of the current container, and raising an exception if it does not. This also guarantees that if the container is modified inside the loop, the program at least won't crash due to an access violation, though the result will obviously be undefined or unexpected. (In my opinion, it's the script writer's duty to make sure the container is not modified during the loop. AS only needs to provide a basic guarantee that prevents a bad script from crashing the host application.)
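The validation idea in the second point can be sketched in C++ as follows. CheckedList is a hypothetical host-side container that uses an index as the iterator and a C++ exception as the error signal; a real host would presumably report the error through the engine's script exception mechanism instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host container that validates every iterator it is handed.
class CheckedList {
public:
    using Iter = std::size_t;

    explicit CheckedList(std::vector<int> values) : data(std::move(values)) {}

    Iter opForBegin() const { return 0; }
    // `>=` rather than `==`, so a shrunken container ends the loop instead of running past it.
    bool opForEnd(Iter it) const { return it >= data.size(); }
    Iter opForNext(Iter it) const { return it + 1; }

    int& opForValue(Iter it) {
        if (it >= data.size())
            throw std::out_of_range("iterator no longer refers to an element");
        return data[it];
    }

    void clear() { data.clear(); }

private:
    std::vector<int> data;
};

// The loop body empties the container mid-iteration; the loop just stops early.
int sum_until_cleared(CheckedList& list) {
    int total = 0;
    for (auto it = list.opForBegin(); !list.opForEnd(it); it = list.opForNext(it)) {
        total += list.opForValue(it);
        if (total >= 3) list.clear();  // simulates a script modifying the container
    }
    return total;
}

// Using a stale iterator directly is reported as an error, not a crash.
bool stale_access_is_caught() {
    CheckedList list(std::vector<int>{1, 2, 3});
    CheckedList::Iter it = list.opForNext(list.opForBegin());
    assert(!list.opForEnd(it));  // position 1 is valid before the modification
    list.clear();
    try {
        list.opForValue(it);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

The result of the interrupted loop is still "unexpected" from the script writer's point of view, but the host never touches memory outside the container.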
Further Developments
Here are some more complex ideas that I may not implement in the initial version, but I think they are worth discussing.
- Multiple iterating values at the same time, e.g. for(auto key, val : dict) for iterating over a dictionary-like object. This can probably be done by declaring opForValue0(iterator_type iter), opForValue1(iterator_type iter), …
- set_opForValue and get_opForValue properties for complex logic, like get/set_opIndex.
- A behavior called DESTROY_ITERATOR for destroying the iterator after the loop for application-registered types. Then the host can register void* opForBegin_impl() as int opForBegin() (like the template callback bool (asITypeInfo*, bool&) is registered as bool f(int&in, bool&out)), so the iterator type can be completely opaque to the script. But this might need the compiler to prohibit scripts from using these methods directly.
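The first idea, multiple values per iteration, could expand as in the C++ sketch below. It is only a model of how for(auto key, val : dict) might be lowered, assuming the opForValue0/opForValue1 naming from the list above. Dict here is a hypothetical stand-in that simply stores pairs in insertion order without deduplicating keys.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dictionary-like stand-in: an ordered list of (key, value) pairs.
class Dict {
public:
    using Iter = std::size_t;

    void set(std::string key, int value) { entries.emplace_back(std::move(key), value); }

    Iter opForBegin() const { return 0; }
    bool opForEnd(Iter it) const { return it >= entries.size(); }
    Iter opForNext(Iter it) const { return it + 1; }

    // One accessor per loop variable: value 0 is the key, value 1 is the mapped value.
    const std::string& opForValue0(Iter it) const {
        assert(it < entries.size());  // debug-only validation of the iterator
        return entries[it].first;
    }
    int opForValue1(Iter it) const {
        assert(it < entries.size());
        return entries[it].second;
    }

private:
    std::vector<std::pair<std::string, int>> entries;
};

// Assumed lowering of: for(auto key, val : dict) out += key + "=" + val + ";"
std::string dump(const Dict& dict) {
    std::string out;
    for (auto it = dict.opForBegin(); !dict.opForEnd(it); it = dict.opForNext(it)) {
        const std::string& key = dict.opForValue0(it);
        int val = dict.opForValue1(it);
        out += key + "=" + std::to_string(val) + ";";
    }
    return out;
}
```

With this shape the iterator is advanced once per iteration and each loop variable gets its own accessor call.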
About Changes to Existing Code
If I understand the codebase correctly, modifying as_parser.cpp and as_compiler.cpp is enough for the basic functionality. Some helper interfaces can be added to the script engine to make it easy for application-registered code to use this script feature. One example is registering an enumerate wrapper for those who still need the index in a range-based for: for(auto idx, val : enumerate(array))
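An enumerate wrapper of this kind can be sketched in C++ on top of the same protocol. Everything here is hypothetical: Words is a small stand-in container, Enumerate is the wrapper, and it exposes the index and the element through opForValue0 and opForValue1 as in the multiple-values idea. How such a wrapper would actually be registered with the engine is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in container implementing the proposed methods.
class Words {
public:
    using Iter = std::size_t;
    explicit Words(std::vector<std::string> values) : data(std::move(values)) {}

    Iter opForBegin() const { return 0; }
    bool opForEnd(Iter it) const { return it >= data.size(); }
    Iter opForNext(Iter it) const { return it + 1; }
    const std::string& opForValue(Iter it) const {
        assert(it < data.size());  // debug-only validation of the iterator
        return data[it];
    }

private:
    std::vector<std::string> data;
};

// Hypothetical wrapper: pairs a running index with the wrapped range's iterator.
template <class R>
class Enumerate {
public:
    struct Iter {
        std::size_t index;       // how many elements have been visited so far
        typename R::Iter inner;  // iterator of the wrapped range
    };

    explicit Enumerate(R& r) : range(r) {}

    Iter opForBegin() const { return Iter{0, range.opForBegin()}; }
    bool opForEnd(const Iter& it) const { return range.opForEnd(it.inner); }
    Iter opForNext(const Iter& it) const { return Iter{it.index + 1, range.opForNext(it.inner)}; }

    std::size_t opForValue0(const Iter& it) const { return it.index; }                    // idx
    auto& opForValue1(const Iter& it) const { return range.opForValue(it.inner); }        // val

private:
    R& range;  // the wrapped range must outlive the wrapper
};

// Assumed lowering of: for(auto idx, val : enumerate(words)) out += idx + ":" + val + "\n"
std::string number_lines(Words& words) {
    Enumerate<Words> e(words);
    std::string out;
    for (auto it = e.opForBegin(); !e.opForEnd(it); it = e.opForNext(it)) {
        out += std::to_string(e.opForValue0(it)) + ":" + e.opForValue1(it) + "\n";
    }
    return out;
}
```

The wrapper holds a reference to the wrapped range, so it is also an instance of the lifetime concern described earlier: the range passed to enumerate has to stay alive for the whole loop.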