Yet another performance comparison (AS vs Small)

Maurizio Ferraris · 2014-12-24T06:09:03

Hi all,

I know that this matter has been covered several times, but all the previous posts show a clear interest in it. So, having done my own performance comparison, I submit it to the community.

I understand that the scope of this comparison is very specific, but that comes from a particular need I have. I currently build automation machines whose real-time control is done by a PC. Machine personalization is done with a series of programs currently written in Small. These programs are compiled into byte code, and the byte code is then injected into the real-time environment, where a virtual machine does the actual execution. The virtual machine exposes several system functions to the script for operating on the physical machine hardware (reading/setting inputs/outputs, moving axes, setting variables, etc.).

Due to the limitations of the Small language (lack of structures, no typing, no doubles, ...) I am investigating the possibility of switching to AngelScript. Having solved most of the interface problems, I now have an AngelScript compiler and a real-time virtual machine running, so I am able to make some performance comparisons.

I started with a very simple script. Here is the AS version:

    int TestNum;

    void main(void)
    {
        int count = 0;
        for(int col = 0; col < 10000; col++)
        {
            count++;
            ExtVar = count;   // Line A: Makes one call to external environment
            TestNum = count;  // Line B: Only set a script variable
        }
    }

Here is the Small version:

    new TestNum;

    main()
    {
        new count = 0;
        for(new col = 0; col < 10000; col++)
        {
            count++;
            Set_ExtVar(count);  // Line A: Makes one call to external environment
            TestNum = count;    // Line B: Only set a script variable
        }
    }

As you can see they are very similar: basically a single loop that increments one local variable and then makes an assignment. Lines A and B in the code are alternatives. Only one of them was present during each evaluation.

These are the results in ms:

              AS      Small
    Line A    4.37    1.85
    Line B    3.05    1.07

It looks like AS is two to three times slower than Small.

A few final notes:

- The test was done on the same machine, with similar load and the same real-time environment.
- I didn't investigate the JIT, because I am not sure it can even work in the real-time environment.
- The external call is the same in both tests and executes the same code, but it adds 0.8 ms in Small and 1.3 ms in AS. Thus it is possible to conclude that the external function call is at least 0.5 ms slower.
- The jitter between test runs is on the order of 0.05 ms in all cases.
- The slight syntax difference between the two sources in line A is due to the fact that Small does not allow the definition of an external opAssign method, so an explicit function call is made. I believe that this results in very similar byte code anyway.
- I am under the impression that the AS virtual machine makes some allocations/deallocations during the test run, even if everything looks static (or at least allocated at the beginning). I say this because if I let the test run repeatedly for one hour, then when I shut down the real-time system the call to engine->Release takes as much as 5 minutes to complete. I am not able at the moment to tell exactly why.

Should you need further information or want me to make further tests, just ask.

Regards.

Mau.

PS: Unfortunately these results force us to stay with the current solution, but I will keep an eye on further AS development, especially in the performance area.
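For readers who want the figures above on a per-iteration basis, here is a small worked calculation. It is only a sketch and rests on one assumption the post does not spell out: that each number in the table is the total time for one run of main(), i.e. 10000 loop iterations. The names (Timing, per_iteration_ns and so on) are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumption: each table entry is the total time of one 10000-iteration run.
constexpr int kIterations = 10000;

// Measured totals in milliseconds, copied from the results table.
struct Timing {
    double line_a_ms;  // loop body with the external call
    double line_b_ms;  // loop body with the script-variable assignment
};

constexpr Timing kAngelScript{4.37, 3.05};
constexpr Timing kSmall{1.85, 1.07};

// Cost of one loop iteration in nanoseconds.
constexpr double per_iteration_ns(double total_ms) {
    return total_ms * 1.0e6 / kIterations;
}

// Difference between Line A and Line B, expressed per call in nanoseconds.
constexpr double external_call_ns(const Timing& t) {
    return per_iteration_ns(t.line_a_ms - t.line_b_ms);
}

// How many times slower AS is than Small for the same line.
constexpr double slowdown(double as_ms, double small_ms) {
    return as_ms / small_ms;
}
```

Under that assumption one iteration costs roughly 437 ns (AS) against 185 ns (Small) for Line A and 305 ns against 107 ns for Line B, the A-minus-B difference is about 132 ns per iteration in AS against 78 ns in Small, and the slowdown factors come out at about 2.4 and 2.9.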

ThyReaper

488

November 28, 2014 04:43 AM

If it would be possible for you to use our JIT, it would be fairly simple to add a custom line callback to it which caches the line number for each call. I'm not sure exactly what requirements exist for a real time program, but the JIT's behavior is simpler than most of what AngelScript's own compiler has to do.

WitchLord

4,860

November 28, 2014 08:56 PM

If I recall my university time correctly the main requirement for a real-time application is that the response time is predictable, i.e. it cannot be varying, e.g. due to increasing amount of time spent doing memory allocations, garbage collection, etc.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

ThyReaper

488

November 28, 2014 11:34 PM

That's my limited understanding as well, and I believe the JIT does satisfy that requirement as long as its allocator does as well (though that could be replaced with a suitable allocator if needed).

DvDmanDT

1,951

November 29, 2014 12:04 AM

There's typically a response time requirement, such that a task must complete within X µs. This may be hard to guarantee with unpredictable/varying algorithms. In such cases, you need to verify that the worst case falls within your limits, and that's a very hot research topic at the moment. In short, it's much better to just not do unpredictable/varying stuff.

Then there are multiple levels of real-time and safety-critical systems. Some standards and certifications have extremely strict requirements where you can barely have branches in your code, while others are rather relaxed. If you are using something like an x86 CPU then you are likely on the more relaxed side.

ziomau

Author

115

December 02, 2014 07:49 AM

Hi All,

Thanks for all the information and suggestions.

GarbageCollect

I was under the impression that disabling the GC would cause objects to be released immediately when needed.

Ok, my fault, I will add an explicit call and see. I will report here my results.

One additional question here:

I can call GarbageCollect on a thread with no strict real-time requirements, but I need to know how the execution of the GC will interact with the execution of the script interpreter, which must continue to run in real time on live objects.

I mean: does the GC just raise a semaphore at the beginning and drop it at the end (600 ms later), or will it protect individual critical accesses to shared areas with finer granularity?

Or (better), since the GC is disabled during script execution, does it just run on its "to be released" memory, being sure that nothing will be added there?

JIT

Regarding the JIT, this is probably not the best place to ask, but I need some general information to understand whether it is applicable to my case.

Just a few questions, if someone can give me a simple answer or redirect me to the available information ...

1) Does the JIT generate processor (x86 in my case) instructions, or is it some additional byte code optimization that is still interpreted by a virtual machine?

2) In case real processor instructions are generated, I expect that they are first generated as data. So what is the method used to "jump" to the data? Nowadays processors normally disable data execution, and only privileged instructions are allowed to change this.

3) Instead, in case only some internal optimization is done and not real processor instructions, what is the expected execution speed gain against the average script code?

Real Time

As previous posters have indicated there are several kinds of "real time". Sometimes they have to be certified and/or undergo particular scrutiny.

Fortunately, in my case, my code is not involved in deep space or life-saving applications, so there is no need for certification (that's also why I can use code like AS or Small without the need to certify them).

Nevertheless, I need to control machine automation and compute the spatial trajectories of several motors, and this imposes strict timing requirements.

This requirement is what is normally called "hard real time", which means the deadline must never be missed. This is opposed to "soft real time", where the requirement has to be fulfilled on average but an occasional deadline miss is acceptable (games and audio processors fall into this second category, where the problem is normally solved by a sufficiently deep buffer).

The "hard real time" requirement means that I must be sure there aren't bottlenecks or unnecessary critical sections that may cause priority inversion or other nasty (for real time) effects.
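To make the hard/soft distinction concrete, here is a minimal sketch of how a series of measured cycle times could be judged against a deadline under each policy. All names here (DeadlineReport, analyse, meets_hard, meets_soft) are hypothetical, and measuring a worst case this way is not a proof: it only tells you what happened in the runs you observed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of measured cycle times against one deadline.
struct DeadlineReport {
    double worst_us;     // longest observed cycle, in microseconds
    std::size_t misses;  // number of cycles that exceeded the deadline
    std::size_t total;   // number of cycles observed
};

// Scan the measurements once, tracking the worst case and the miss count.
inline DeadlineReport analyse(const std::vector<double>& cycle_us,
                              double deadline_us) {
    DeadlineReport r{0.0, 0, cycle_us.size()};
    for (double t : cycle_us) {
        if (t > r.worst_us) r.worst_us = t;
        if (t > deadline_us) ++r.misses;
    }
    return r;
}

// Hard real time: not a single miss is tolerated.
inline bool meets_hard(const DeadlineReport& r) {
    return r.misses == 0;
}

// Soft real time: misses are tolerated up to a chosen fraction of cycles.
inline bool meets_soft(const DeadlineReport& r, double max_miss_ratio) {
    if (r.total == 0) return true;
    return static_cast<double>(r.misses) / static_cast<double>(r.total)
           <= max_miss_ratio;
}
```

With five cycles of which one overruns, meets_hard fails while meets_soft passes for any tolerated miss ratio of 20% or more, which is the practical difference between the two categories.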

Thanks.

Mau.

ThyReaper

488

December 02, 2014 08:46 AM

> JIT
>
> Regarding the JIT, this is probably not the best place to ask, but I need some general information to understand whether it is applicable to my case.
>
> Just a few questions, if someone can give me a simple answer or redirect me to the available information ...
>
> 1) Does the JIT generate processor (x86 in my case) instructions, or is it some additional byte code optimization that is still interpreted by a virtual machine?
>
> 2) In case real processor instructions are generated, I expect that they are first generated as data. So what is the method used to "jump" to the data? Nowadays processors normally disable data execution, and only privileged instructions are allowed to change this.
>
> 3) Instead, in case only some internal optimization is done and not real processor instructions, what is the expected execution speed gain against the average script code?

1) It produces native x86 instructions with fallback to the VM under various conditions (some specific types of calls it can't handle natively, script exceptions, and any new ops that might be added since it was last updated).

2) The JIT requests a page from the OS which can be set to be executable. There is a rather simple class, CodePage, which is responsible for this allocation and can easily be changed. The JIT does expect that new code pages can be allocated dynamically, but a single large static page should be sufficient for most purposes. Jumping to the executable page is handled by the JIT instructions in the VM.

3) Native code runs between 2x and 10x faster depending on the exact code being executed and the architecture involved.
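As an illustration of answer 2, this is roughly what "request a page, fill it as data, then make it executable" looks like on a POSIX system using mmap and mprotect. It is a sketch only: ExecPage is a made-up class and is not the JIT's CodePage, the six bytes of machine code are x86-specific, and an environment that lacks these calls (or forbids executable mappings) would need a different mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper (not the JIT's CodePage): one page that is written
// as plain data first and only afterwards switched to executable.
class ExecPage {
public:
    ExecPage() : size_(static_cast<std::size_t>(sysconf(_SC_PAGESIZE))) {
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        mem_ = (p == MAP_FAILED) ? nullptr : p;
    }
    ~ExecPage() {
        if (mem_) munmap(mem_, size_);
    }
    ExecPage(const ExecPage&) = delete;
    ExecPage& operator=(const ExecPage&) = delete;

    bool valid() const { return mem_ != nullptr; }

    // Copy generated machine code into the page while it is still writable.
    bool write(const void* code, std::size_t len) {
        if (!mem_ || len > size_) return false;
        std::memcpy(mem_, code, len);
        return true;
    }

    // Drop write permission and enable execution.
    bool make_executable() {
        return mem_ && mprotect(mem_, size_, PROT_READ | PROT_EXEC) == 0;
    }

    void* entry() const { return mem_; }

private:
    std::size_t size_;
    void* mem_;
};

// Builds and calls a tiny function that returns 42.
// Returns -1 on non-x86 targets or if any step is refused by the system.
inline int run_generated_constant() {
#if defined(__x86_64__) || defined(__i386__)
    const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00,  // mov eax, 42
                                  0xC3};                         // ret
    ExecPage page;
    if (!page.write(code, sizeof code) || !page.make_executable()) return -1;
    using Fn = int (*)();
    Fn fn = reinterpret_cast<Fn>(page.entry());
    return fn();
#else
    return -1;
#endif
}
```

The point of the two-step permission change is the one raised in question 2: the page is never writable and executable at the same time, and the switch is done by an ordinary OS call rather than a privileged instruction in the application.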

WitchLord

4,860

December 02, 2014 01:42 PM

> GarbageCollect
> I was under the impression that disabling the GC would cause objects to be released immediately when needed.
> Ok, my fault, I will add an explicit call and see. I will report here my results.
> One additional question here:
> I can call GarbageCollect on a thread with no strict real-time requirements, but I need to know how the execution of the GC will interact with the execution of the script interpreter, which must continue to run in real time on live objects.
> I mean: does the GC just raise a semaphore at the beginning and drop it at the end (600 ms later), or will it protect individual critical accesses to shared areas with finer granularity?
> Or (better), since the GC is disabled during script execution, does it just run on its "to be released" memory, being sure that nothing will be added there?

The garbage collector is non-blocking, i.e. you can run it in a secondary thread while the primary thread continues to execute the script. (Of course, in that case you cannot compile the library with AS_NO_THREADS, which turns off support for multithreading.)

Your scripts appear to be well written and do not generate garbage on their own (since you didn't get any memory accumulation during normal script execution) but even if your scripts did generate garbage they wouldn't be blocked by the fact that the garbage collector was processing in a second thread.
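A minimal sketch of the arrangement described above, with the collector driven from a secondary thread, might look like the following. GcWorker is a hypothetical class, and the step callback is a stand-in so that the example compiles without the library; in an AngelScript host it would presumably wrap a call such as engine->GarbageCollect(asGC_ONE_STEP). Thread priority and pacing are left out and would matter in a real-time system.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical background worker that keeps invoking a collection step
// until it is told to stop. `step` is a stand-in for the real GC call.
class GcWorker {
public:
    explicit GcWorker(std::function<void()> step)
        : step_(std::move(step)),
          running_(true),
          thread_([this] { loop(); }) {}

    ~GcWorker() { stop(); }

    // Ask the worker to finish and wait for it; safe to call twice.
    void stop() {
        running_.store(false);
        if (thread_.joinable()) thread_.join();
    }

    unsigned long steps_done() const { return steps_.load(); }

private:
    void loop() {
        while (running_.load()) {
            step_();              // one small unit of collection work
            steps_.fetch_add(1);
            // Yield between steps so the worker never monopolises a core.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::function<void()> step_;
    std::atomic<bool> running_;
    std::atomic<unsigned long> steps_{0};
    std::thread thread_;  // declared last: starts after the other members exist
};
```

The primary thread simply constructs a GcWorker at start-up and calls stop() at shutdown; it never waits on the worker while scripts are executing.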

Regards,

Andreas


ziomau

Author

115

December 04, 2014 10:24 AM

> 2) The JIT requests a page from the OS which can be set to be executable. There is a rather simple class, CodePage, which is responsible for this allocation and can easily be changed. The JIT does expect that new code pages can be allocated dynamically, but a single large static page should be sufficient for most purposes. Jumping to the executable page is handled by the JIT instructions in the VM.

That's what I imagined.

The problem for me is that the real time environment is not equivalent to the OS.

Somewhat similar, not identical and far from complete.

I will investigate if the page request call is available in my environment, or if some sort of workaround is possible.

Anyway, thanks for the hint.

Mau.

audioboy77

108

December 23, 2014 03:54 PM

I have been studying the AngelScript docs and so far I am very impressed. I was planning to integrate it into our codebase next year, as it looks pretty much perfect for our needs.

However, the talk about garbage collection and its implications has raised some concerns.

Basically, I don't understand why garbage collection is really necessary at all, given that all objects are either stack based or reference counted. Like ziomau, I also assumed that the memory used by the reference-counted objects would be released when they are destructed (i.e. on the last Release call, which sets the retain count to 0).

We use reference counting heavily in our code base (for realtime audio applications) and this approach never causes problems, as our design ensures that objects are never actually destroyed in the realtime threads, as we ensure that the very last Release call will always be made in the main thread. (But just to clarify, localised / without a global "garbage" list, hence avoiding the known drawbacks of that).
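One way to picture the "last Release always happens on the main thread" rule is with std::shared_ptr: as long as the main thread keeps its own reference, dropping a reference on the real-time side can never run the destructor. This is only an illustration of the ordering, run on a single thread; Voice and realtime_release are invented names, not anything from our codebase or from AngelScript.

```cpp
#include <cassert>
#include <memory>

// Hypothetical audio object that records when its destructor runs.
struct Voice {
    explicit Voice(bool* destroyed) : destroyed_(destroyed) {}
    ~Voice() { *destroyed_ = true; }
    bool* destroyed_;
};

// What the real-time side does when it is finished with an object:
// it only drops its own reference. While the main thread still holds
// one, this decrements the count and never frees memory.
inline void realtime_release(std::shared_ptr<Voice>& rt_ref) {
    rt_ref.reset();
}
```

In use, the main thread creates the object and hands a copy of the pointer to the real-time thread; the object is destroyed only when the main thread later resets its own copy, so the deallocation cost lands outside the real-time path.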

Would it not make sense to have a compiler flag to release memory on object destruction instead of using a garbage collector at all? Then it would be more under the application's / script writer's control.

I think the global garbage list also implies that objects created on additional threads will always be deleted in the main thread (assuming the garbage collector runs on the main thread). Is this correct? I would have to check, but this is also likely to cause issues for us.

Does that make sense or am I missing something? Any further insight into how the garbage collection works would be helpful.

audioboy77

108

December 23, 2014 04:49 PM

> The garbage collector is non-blocking, i.e. you can run it in a secondary thread while the primary thread continues to execute the script. (Of course, in that case you cannot compile the library with AS_NO_THREADS, which turns off support for multithreading.)

I guess that assumes that malloc and free are non-blocking. I'm not sure, but I have always assumed that they are not (as it would be very difficult to write non-blocking allocators, at least fast ones, using linked lists for example).
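For what it's worth, the usual way around a blocking malloc in a real-time path is to reserve memory up front and hand it out from a pool. Below is a minimal sketch of a fixed-size block pool whose allocate and release are constant-time free-list operations that never call malloc or free. BlockPool is a hypothetical class, it is deliberately not thread-safe (one pool per real-time thread), and whether fixed-size blocks fit a given script engine's allocation pattern is a separate question this thread does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool. All memory is reserved in the
// constructor; allocate() and release() only relink a free list.
// Not thread-safe: intended for use from a single thread.
class BlockPool {
    struct Node {
        Node* next;
    };

public:
    BlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(block_size_ * block_count),
          free_head_(nullptr) {
        // Thread every block onto the free list once, at start-up.
        for (std::size_t i = 0; i < block_count; ++i) {
            push(storage_.data() + i * block_size_);
        }
    }

    // Returns a block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (!free_head_) return nullptr;
        Node* n = free_head_;
        free_head_ = n->next;
        return n;
    }

    // Returns a block previously obtained from allocate() to the pool.
    void release(void* p) {
        if (p) push(p);
    }

private:
    // Blocks must be able to hold a Node and stay suitably aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    void push(void* p) {
        free_head_ = new (p) Node{free_head_};
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    Node* free_head_;
};
```

Exhaustion is reported by returning nullptr rather than by falling back to the heap, so the caller decides what a full pool means for its deadline.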
