Sometimes you need the top k items in a list. A naive way to do it is to sort the list, then take the first k items. The problem with that approach is that you don’t need to sort the whole list, so it’s inefficient and becomes more so as the list length, n, increases relative to k.

To sort just the top k items, you can use a bounded priority queue; after feeding the whole list into it, the queue contains the top k items. This approach is better, but expends unnecessary effort maintaining the sort in the queue as items are added. If n is much larger than k, it would be faster to find the top items without sorting, then sort only the final list of top items.

The code below does a partial sort in linear time. Upon return, the first k positions of the array contain the largest or smallest items of the array, depending on the compare() function provided by the Ordered trait of the class contained in the array. The items in the first k positions are not sorted.

package scala

/*
 * FirstK
 *
 * These functions provide a linear partial sort of an array of Ordered objects. The array is
 * sorted in-place and upon return the the k smallest or largest items are in the first k positions.
 * The first k values are not sorted, but are less than or equal to the values in the rest of the
 * array. Whether the first k items are the largest or the smallest of the array is determined
 * by the compare function of the objects in the array.
 *
 * For example:
 * 	case class Foo(n: Int) extends Ordered[Foo] { def compare(other: Foo) = this.n.compare(other.n) }
 * means that firstK will return the smallest n values in Array[Foo]
 * 	case class Foo(n: Int) extends Ordered[Foo] { def compare(other: Foo) = -this.n.compare(other.n) }
 * means that firstK will return the largest n values in Array[Foo]
 *
 *  Example from scalatest unit test:
 *
 *    case class Foo(n: Int) extends Ordered[Foo] {
 *      def compare(other: Foo) = this.n.compare(other.n)
 *    }
 *
 * 	  test(" Test findFirstK() Length 2" ) {
 *      val array = Array(new Foo(2), new Foo(1))
 *      FirstK.findFirstK(array,  1)       // k = 1
 *      assert(array.deepEquals(Array(Foo(1), Foo(2))))
 *    }
 */
object FirstK {
	def swap[T](array: Array[T], x: Int, y: Int) = {
		val t = array(x)
		array(x) = array(y)
		array(y) = t
	}

	// Lomuto Partition
	def partition[T <% Ordered[T]](array: Array[T], left: Int, right:Int, pivotIndex: Int): Int = {
		val pivotValue = array(pivotIndex)
		swap(array, pivotIndex, right)              // Move pivot to end
		var storeIndex = left
		for (i <- left until right ) {
			if (array(i) <= pivotValue) {			// will use the compare fn, so may use > for desc sort
				swap(array, storeIndex, i)
				storeIndex += 1
			}
		}
		swap(array, right, storeIndex) 				// Move pivot to its final place
		storeIndex
	}

	// Note that p here is the position that ends the range of 'first' values.
	def findFirstToP[T <% Ordered[T]](array: Array[T], left: Int, right:Int, p: Int): Unit = {
		if (right > left) {
			val pivotIndex = (left + right) / 2
			val pivotNewIndex = partition(array, left, right, pivotIndex)
			if (pivotNewIndex > p)
				findFirstToP(array, left, pivotNewIndex - 1, p)
			if (pivotNewIndex < p)
				findFirstToP(array, pivotNewIndex + 1, right, p)
		}
	}

 	// intended entry fn; includes parameter checking. Modifies the array so that the first k items
 	// are largest/smallest of the array.
	def findFirstK[T <% Ordered[T]](array: Array[T],  k: Int) : Unit = {
		if (array==null || k==0 || k>array.size) {
			throw new IllegalArgumentException()
		}
		findFirstToP(array, 0, array.length - 1, k-1)		// fn is position-based so, k-1 for 'first k'
	}
}
Bookmark and Share

Leave a Reply

You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>