I made the algorithm myself; it took quite a lot of work to get it running this fast with proper flocking behaviour.
And yep, that's the point: the main challenge is making a really fast algorithm.
There are plenty of boids algorithms available, of course; the trick is to get the behaviour looking nice AND running fast.
I'd be very impressed if you got 100k boids, even in Unity or Java on the web. I wouldn't be surprised by that number running natively on Windows, but in a browser it would be hard to achieve at an acceptable framerate.
Obviously on a GPU platform, the pixels being rendered are practically free (whereas on my platform they use about 75% of the available power). So the trick is to develop some really fast flock code and object handling.
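For anyone wondering where the speed usually comes from, here's a minimal sketch of one common approach. This is NOT my algorithm. It's the generic uniform-grid ("spatial hash") trick. A naive flock compares every boid with every other boid, which is O(n²). This version buckets boids into cells the size of the neighbour radius, so each boid only checks its own cell and the 8 around it. The function names (`build_grid`, `neighbours`, `step`) and the rule weights (`w_coh`, `w_ali`, `w_sep`) are hypothetical, picked for illustration and not tuned for nice-looking behaviour.

```python
import math
from collections import defaultdict


def build_grid(positions, cell):
    """Bucket boid indices by the grid cell their (x, y) position falls in."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(i)
    return grid


def neighbours(i, positions, grid, radius):
    """Indices of boids within `radius` of boid i.

    Assumes the grid was built with cell size == radius, so the 3x3 block
    of cells around boid i is guaranteed to contain every neighbour.
    """
    x, y = positions[i]
    cx, cy = math.floor(x / radius), math.floor(y / radius)
    r2 = radius * radius
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # .get() so empty cells aren't inserted into the defaultdict
            for j in grid.get((cx + dx, cy + dy), ()):
                if j != i:
                    px, py = positions[j]
                    if (px - x) ** 2 + (py - y) ** 2 <= r2:
                        found.append(j)
    return found


def step(positions, velocities, radius=1.0, w_coh=0.01, w_ali=0.05, w_sep=0.05, dt=1.0):
    """One flocking update using the three classic boids rules (hypothetical weights)."""
    grid = build_grid(positions, radius)
    new_vel = []
    for i, ((x, y), (vx, vy)) in enumerate(zip(positions, velocities)):
        nbrs = neighbours(i, positions, grid, radius)
        ax = ay = 0.0
        if nbrs:
            n = len(nbrs)
            # Cohesion: steer towards the neighbours' centre of mass
            mx = sum(positions[j][0] for j in nbrs) / n
            my = sum(positions[j][1] for j in nbrs) / n
            ax += w_coh * (mx - x)
            ay += w_coh * (my - y)
            # Alignment: steer towards the neighbours' average velocity
            avx = sum(velocities[j][0] for j in nbrs) / n
            avy = sum(velocities[j][1] for j in nbrs) / n
            ax += w_ali * (avx - vx)
            ay += w_ali * (avy - vy)
            # Separation: push away from each neighbour, harder when closer (~1/d)
            for j in nbrs:
                ddx = x - positions[j][0]
                ddy = y - positions[j][1]
                d2 = ddx * ddx + ddy * ddy
                if d2 > 0:
                    ax += w_sep * ddx / d2
                    ay += w_sep * ddy / d2
        new_vel.append((vx + ax * dt, vy + ay * dt))
    new_pos = [(x + vx * dt, y + vy * dt) for (x, y), (vx, vy) in zip(positions, new_vel)]
    return new_pos, new_vel
```

The grid is rebuilt every frame, which is cheap (O(n)) compared to the neighbour checks it saves. In a browser you'd want flat typed arrays rather than lists of tuples, but the idea is the same.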