
WTF Sleep sort

Started by June 18, 2011 05:05 PM
25 comments, last by way2lazy2care 13 years, 4 months ago

That's also plainly not the same algorithm as "sleep sort."


It is.

Instead of buckets, we have process/thread stacks/address spaces holding the numbers. Each thread/process is a single bucket.

The granularity of the OS scheduler is finite, and so is the delay precision; hence equal numbers (or even merely similar ones, when the granularity is inadequate) end up in the same "bucket".
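To make the bucket reading concrete: a minimal sketch (hypothetical; a fixed granularity constant stands in for the scheduler's timer resolution) showing how equal, or merely similar, values collapse into the same bucket:

    #include <cstdio>
    #include <map>
    #include <vector>

    int main() {
        const int granularity = 10;                 // assumed timer resolution
        std::vector<int> input = {12, 3, 11, 3, 7};

        // Each distinct quantized delay is one "bucket"; equal values, and
        // merely similar ones (11 and 12 here), land in the same bucket.
        std::map<int, std::vector<int>> buckets;
        for (int v : input)
            buckets[v / granularity].push_back(v);

        // "Wakeups" happen in bucket (delay) order; within a bucket, values
        // come out in arrival order here, and a real scheduler guarantees
        // even less.
        for (const auto& [delay, values] : buckets)
            for (int v : values)
                std::printf("%d\n", v);
    }

With granularity 10, the values 11 and 12 share a bucket and come out in arrival order (12 before 11), which is exactly the reordering that finite granularity causes.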

Sure, but if you're relying on facts about the OS scheduler, you're also running a heapsort, because the scheduler is using a heap to sort processes by wakeup time. The time complexity of the heapsort will dominate all of the other terms in your analysis. Also, I revised my original post once I realized what you were saying, so you might want to look at that too.
-~-The Cow of Darkness-~-

What heap sort? What scheduler?

The algorithm has time complexity O(n).
Using an O(log n) process scheduler to perform the initial allocation results in O(n log n) initial thread creation, which must, if the algorithm is to be correct, complete before the first number is output. So this part must be treated as a constant, or the algorithm isn't correct. The second part, the invocation of the individual threads, must again be done in constant time, or the algorithm isn't correct.

I've given estimates on the completion times required for the algorithm to work correctly in a previous reply. All these values end up as algorithmic constants. The OS scheduler isn't capable of reliably meeting them: suspend the script for 5 seconds and the output will be incorrect.
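To illustrate that failure mode: a sketch (hypothetical; C++ threads stand in for the script's processes, and a deliberate stall stands in for the suspended script) where a larger value overtakes a smaller one:

    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> input = {5, 1}; // a correct sort prints 1 first
        std::vector<std::thread> threads;

        for (size_t i = 0; i < input.size(); ++i) {
            threads.emplace_back([v = input[i]] {
                std::this_thread::sleep_for(std::chrono::seconds(v));
                std::printf("%d\n", v);
            });
            // Simulate a suspended/slow setup phase: the thread for 5 is
            // already sleeping, but the thread for 1 doesn't exist yet,
            // so 5 prints first and the output is wrong.
            if (i == 0)
                std::this_thread::sleep_for(std::chrono::seconds(6));
        }
        for (auto& t : threads)
            t.join();
    }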

So how something is individually implemented doesn't matter: either it behaves as if everything is a constant factor, or the algorithm doesn't produce correct results.

Interpreted more generally, though, I argue that there's no reason to take the O(n) step if you're running the algorithm in parallel. You have 24 cores, each core takes its numbers (in parallel), each core waits, and each core puts its numbers in the sorted array as it finishes. In practice there will probably be an O(n) term somewhere, but that's not really part of the algorithm.

What cores?

Look at what the algorithm actually does, or better yet, why it works and when it would fail. Don't get stuck on trivialities of what might be happening in one specific case.

Scheduling doesn't require threads, a heap, or sorting.

Using an O(log n) process scheduler to perform the initial allocation results in O(n log n) initial thread creation, which must, if the algorithm is to be correct, complete before the first number is output.


OK, sure.

So this part must be treated as a constant, or the algorithm isn't correct.

Wait, what? n log n grows with n. Thus, treating it as constant in asymptotic analysis is not correct. It doesn't matter if it happens "before the algorithm"; it's still the dominant term.

The second part, the invocation of the individual threads, must again be done in constant time, or the algorithm isn't correct.

Yes, this is also true. The shell script's behavior does indeed become incorrect as n goes to infinity.

So how something is individually implemented doesn't matter: either it behaves as if everything is a constant factor, or the algorithm doesn't produce correct results.

Yes, I agree, so let's abstract away these details for a bit, which is what I think we both were trying to do.

Don't get stuck on trivialities of what might be happening in one specific case.

Scheduling doesn't require threads, a heap, or sorting.

That's precisely the opposite of what I'm saying in the second half of my post. I'm saying that we can run a variant of the algorithm in parallel (on multiple "cores") that never does anything requiring O(n) time, making the runtime complexity O(B) (where B is the largest number, in keeping with Prefect's notation) instead. Obviously the algorithm requires us to run in parallel anyway, so there's no reason to include the O(n) term at all.

My point, then, is that either we take the script as written, which requires O(n log n) time (and also doesn't work, but that could easily be fixed), or we take the general idea behind "sleep sort" and run it in such a way that it takes O(B) time, eliminating the O(n) term altogether. This is, I think, all that Prefect was trying to say.
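A sketch of that reading (hypothetical; one waiter thread per number, with 100 ms standing in for one time unit): the array fills in wake order, and the elapsed time tracks B rather than n:

    #include <chrono>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> input = {3, 1, 4, 1, 5, 9, 2, 6}; // B = 9
        std::vector<int> sorted;
        std::mutex m;

        auto start = std::chrono::steady_clock::now();

        // Each waiter sleeps in parallel and appends when it wakes, so the
        // array fills in value order and the run takes about B ticks.
        std::vector<std::thread> waiters;
        for (int v : input)
            waiters.emplace_back([&, v] {
                std::this_thread::sleep_for(std::chrono::milliseconds(100 * v));
                std::lock_guard<std::mutex> lock(m);
                sorted.push_back(v);
            });
        for (auto& t : waiters)
            t.join();

        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        for (int v : sorted)
            std::printf("%d ", v);
        std::printf("\nelapsed: ~%lld ms, i.e. about 100 * B\n", (long long)ms);
    }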
-~-The Cow of Darkness-~-

Obviously the algorithm requires us to run in parallel anyway, so there's no reason to include the O(n) term at all.

No, it does not. There is nothing inherently parallel about the algorithm or about sleeping. OSes today run thousands of threads with a base install, yet we don't have 1000-core systems.

Threads or processes are just separate stack frames or memory buckets.

My point, then, is that either we take the script as written, which requires O(n log n) time
There is no log n. I have no clue where log n suddenly came into this discussion.

A scheduler or process-management routine of some OS might be called by sleep, and that might be implemented in O(log n), O(n^2), or O(n!) time. But the complexity of sleep or schedule itself is O(1). Even Linux has an O(1) scheduler.

Wait, what? n log n grows with n. Thus, treating it as constant in asymptotic analysis is not correct. It doesn't matter if it happens "before the algorithm"; it's still the dominant term.
Scheduling n numbers takes O(n) time. It can be implemented so as to require a larger running time, but the complexity of scheduling n numbers is O(n).

If one implements this on a system where schedule() takes O(log n), then the algorithm will be correct only if the n log n setup completes within some constant bound C, and C is less than the time it takes to print the first batch.


The confusing part here is that, unlike usual algorithms, where time and space are analyzed separately and correctness can choose to ignore one or the other, here they are one and the same shared resource. So the time needed to process individual parts directly affects correctness. The trade-offs in how long scheduling takes are made between performance and memory. But from an algorithmic perspective, they are O(1). There is nothing about schedule() that requires more than O(1).

For example, when analyzing algorithms that involve arithmetic operations, one doesn't typically consider the time needed to perform them, despite their complexity being O(log n) (multiplication, for example). So quicksort is analyzed as if all arithmetic operations, as well as element accesses, were done in constant time. The same anomalies appear there: as the size of the array being quicksorted grows, paging effects cause the running time to exhibit step-function properties, despite the algorithm remaining unchanged.

No, it does not. There is nothing inherently parallel about the algorithm or about sleeping.

(...)

There is no log n. I have no clue where log n suddenly came into this discussion.

A scheduler or process-management routine of some OS might be called by sleep, and that might be implemented in O(log n), O(n^2), or O(n!) time. But the complexity of sleep or schedule itself is O(1). Even Linux has an O(1) scheduler.


OK, let's assume that there's no parallelism at all, then. It is my understanding that, in this case, typically, when a process goes to sleep, it is descheduled by the OS and placed on some data structure so that it can be woken up at the appropriate time. The data structure needs to allow newly sleeping threads to be put on it, and we need to be able to retrieve the next thread to wake up as well. It is my understanding that this data structure is typically a Fibonacci heap or a sorted list in modern operating systems (including Linux).

Fibonacci heaps support constant-time insert, and they allow the minimum element to be found in constant time, but it still takes O(log n) time to remove the minimum element (which is necessary to allow more than one thread to wake up). As such, ultimately, what's being run is a heap sort in O(n log n), or, in the case of a sorted list, an insertion sort in O(n^2). Even if it's not a heap-sort, there's still no magical (and certainly no comparison-based) data structure that will allow all of these operations to be performed in constant time unless you're doing something in parallel.
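A sketch of that claim (hypothetical structure; a binary heap via std::priority_queue stands in for the Fibonacci heap, since only the asymptotics matter here): the timer queue popping wakeups in time order is exactly a heapsort of the values:

    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    int main() {
        std::vector<int> input = {3, 1, 4, 1, 5};

        // The "timer queue": each sleep(v) becomes an insert keyed by
        // wakeup time.
        using Entry = std::pair<int, int>; // (wakeup time, value)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> timers;
        for (int v : input)
            timers.push({v, v});

        // The scheduler waking threads in time order performs n
        // remove-min operations at O(log n) each: an O(n log n) heapsort.
        while (!timers.empty()) {
            std::printf("%d\n", timers.top().second);
            timers.pop();
        }
    }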

That, I suspect, is where Sneftel (and everyone else) is getting the idea that this is actually just a heap sort.
-~-The Cow of Darkness-~-

1) Put N buckets on the floor, label them 0 .. N - 1.
2) Throw numbers in the buckets. (correctness invariant: this step must complete before the next step starts)
3) Look at the clock, see what hour it is. Pick up that bucket and print out its numbers.
In other words, step 2) is: sort.

"Throw numbers in the buckets" is the actual sorting; the rest is just a fancy (and slow) print loop. You seem to have decided that this is a constant factor; I don't see how. The algorithm is O(sort phase) + O(n). Can you demonstrate a sort phase that will run in O(n)?

I don't see how "this part must be treated as a constant, or the algorithm isn't correct" follows.
I would side with Antheus. Big-O analysis of an algorithm has nothing to do with the system/hardware it is running on. O() is about the worst-case behavior of an algorithm taken inside a black box, where the only thing certain about the algorithm is the algorithm itself.

I actually agree with this. In theory, however, "sleep sort," as an abstract concept, runs in O(B) time, where B is the largest integer on the list. This is unquestionably true, and it's also what Prefect, I, and just about everyone else have said: the sorting algorithm never finishes until the last number is printed/saved, and obviously the last number's thread takes time proportional to itself to run.

Antheus, however, is arguing that, for various reasons, there is also an O(n) term in practice (note that this really doesn't matter, since O(n) will always be less than O(B) anyway). I'm simply pointing out that, in the shell script posted, the real work is being done by the Unix scheduler, and that not only is it not being run in parallel, but it is ultimately a comparison sort, and so bounded by O(n log n).

Note that even in the best, theoretical case, the algorithm needs to take O(B) time. I see no justification for saying the complexity is anything other than this, except possibly O(n log n), as O(n log n) most accurately represents the original code posted. In any case, O(n) time is not really relevant; Antheus' suggestion that this is like a bucket sort is extremely interesting but really doesn't have much to do with the asymptotic complexity.
-~-The Cow of Darkness-~-

Antheus, however, is arguing that, for various reasons, there is also an O(n) term in practice (note that this really doesn't matter, since O(n) will always be less than O(B) anyway). I'm simply pointing out that, in the shell script posted, the real work is being done by the Unix scheduler, and that not only is it not being run in parallel, but it is ultimately a comparison sort, and so bounded by O(n log n).


Whether we treat this as a comparison sort comes down to one thing: is the strict ordering of sleep() timings preserved or not?

My claim is that it's not, since we're working with a finite set of time values (integers in the original example, but even floats have the same problem). Given [1,1,1,1], in what order will they be printed?

As such, many input values will result in the same discrete delay, hence a "bucket".

If, however, we require that the sort be stable, then it becomes a full comparison sort, subject to the usual n log n comparison lower bound.

The stability argument here is a difficult one, especially after looking at real-world schedulers, which, on a non-real-time OS or across multiple threads, are not deterministic; hence the argument for a comparison sort doesn't hold. We end up with an unstable sort. Scheduler behavior therefore becomes a property of the algorithm itself. A scheduler which doesn't preserve strict order can be O(1). Even an order-preserving scheduler can be O(1), at the expense of memory. It's impractical as a general-purpose scheduler, but it looks something like this:

    #include <list>

    struct Process;                               // processes are opaque here
    void execute_all(std::list<Process*>& procs); // runs every process in a slot

    const int MAX = 1 << 20;                  // largest supported wakeup time
    std::list<Process*> next_to_execute[MAX]; // next_to_execute[i] = processes to execute at time i
    int now = 0;                              // current scheduler tick
    bool running = true;

    // O(1) sleep: p asks to run again n ticks from now. Appending to the
    // slot needs no heap and no comparisons.
    void sleep(Process* p, int n) {
        next_to_execute[now + n].push_back(p);
    }

    void scheduler_main() {
        while (running) {
            execute_all(next_to_execute[now]); // wake everything due at this tick
            now++;
        }
    }
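A concrete instance of the sketch above (hypothetical; plain integers stand in for processes and printing stands in for execution), which makes the whole sort cost O(n) inserts plus an O(B) tick loop:

    #include <cstdio>
    #include <list>
    #include <vector>

    int main() {
        std::vector<int> input = {3, 1, 4, 1, 5};
        const int MAX = 10; // must exceed the largest value B

        std::list<int> next_to_execute[MAX];
        for (int v : input)
            next_to_execute[v].push_back(v); // the O(1) "sleep(v)"

        for (int now = 0; now < MAX; ++now)  // the O(B) tick loop
            for (int v : next_to_execute[now])
                std::printf("%d\n", v);
    }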


To demonstrate the instability, the original example could be modified to store tuples of (value, index). Under a typical scheduler, values belonging to the same bucket would print in arbitrary order. At the very least, most schedulers do not guarantee order of execution, especially across multiple threads, where such automatic synchronization would serialize the entire code.
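A sketch of that modification (hypothetical, thread-based): each value is tagged with its input index, and tuples sharing a time slot may complete in either order:

    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <utility>
    #include <vector>

    int main() {
        // Two equal values, tagged with their input positions.
        std::vector<std::pair<int, int>> input = {{1, 0}, {1, 1}, {2, 2}};

        std::vector<std::thread> threads;
        for (const auto& p : input)
            threads.emplace_back([p] {
                std::this_thread::sleep_for(std::chrono::seconds(p.first));
                // Both (1, _) tuples wake in the same slot; which prints
                // first is up to the scheduler, so the sort is not stable.
                std::printf("(%d, %d)\n", p.first, p.second);
            });
        for (auto& t : threads)
            t.join();
    }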

For me, the stable comparison sort is a special case of the problem, one which most schedulers will not be able to achieve due to the non-deterministic nature of threads. So we might as well go for an unstable non-comparison sort.

(...) doesn't have much to do with the asymptotic complexity.
Bucket sort is O(n). If the original example can be shown to be equivalent, then it has the same complexity.
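For reference, the O(n) sort phase being claimed: a bucket/counting sort over non-negative integers bounded by B (a sketch; assumes the values fit the assumed bound):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> input = {3, 1, 4, 1, 5};
        const int B = 9; // assumed bound on the values

        std::vector<int> count(B + 1, 0);
        for (int v : input)              // O(n) scatter: throw numbers in buckets
            ++count[v];
        for (int v = 0; v <= B; ++v)     // O(B) gather: pick buckets up in order
            for (int i = 0; i < count[v]; ++i)
                std::printf("%d\n", v);
    }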

