gary on June 21st, 2010

Working with Vaadin, Jetty, Scala and Eclipse, I’ve seen intermittent deployment failures in which the web application would fail with a 500 error and the following message:

Problem accessing /HomepageDashboard/. Reason:

scala/ScalaObject
Caused by:

java.lang.NoClassDefFoundError: scala/ScalaObject

The problem is that the Scala library files are not being copied correctly into the WEB-INF/lib directory. This omission can be verified by listing the contents of the war file:

jar tvf HomepageDashboard.war

and checking the contents of the WEB-INF/lib directory.
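The same check can be scripted. This is only a rough sketch: the object and method names (WarCheck, entryNames, hasScalaLibrary) are made up for illustration, and it assumes the jar you care about is named scala-library.jar.

```scala
import java.util.zip.ZipFile
import scala.collection.mutable.ListBuffer

object WarCheck {
  // List every entry name in the war (a war is just a zip file).
  def entryNames(warPath: String): List[String] = {
    val zip = new ZipFile(warPath)
    try {
      val entries = zip.entries()
      val names = ListBuffer.empty[String]
      while (entries.hasMoreElements) names += entries.nextElement().getName
      names.toList
    } finally zip.close()
  }

  // Keep only the jars packaged under WEB-INF/lib/.
  def libJars(names: Seq[String]): Seq[String] =
    names.filter(n => n.startsWith("WEB-INF/lib/") && n.endsWith(".jar"))

  // True if the Scala runtime jar made it into the war.
  def hasScalaLibrary(names: Seq[String]): Boolean =
    libJars(names).exists(_.endsWith("/scala-library.jar"))
}
```

For a real war, you would call something like WarCheck.hasScalaLibrary(WarCheck.entryNames("HomepageDashboard.war")).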

Some suggested fixes, such as checking the Scala library in the project’s “Java EE Modules Dependencies”, have not worked consistently for me. Instead, the simple solution is to manually copy the files to where they need to go. The files are found inside the Scala distribution jar. Just extract and copy the needed files:

jar xvf scala.library_2.7.7.final.jar
cp lib/scala-dbc.jar lib/scala-library.jar lib/scala-swing.jar WebContent/WEB-INF/lib/

When the Eclipse plugin deploys correctly, it copies all three. You probably only need scala-library.jar.

gary on April 7th, 2010

In the previous post, I showed how to call multivariate linear regression functions in the Apache Commons Math Statistics library. You might want to compare your results to Excel, perhaps to check your implementation or because you manually develop your analytic process before automating it in code. Here are some tips for comparing Excel results with those from code.

First, to perform a multivariate linear regression in Excel, use the LINEST function. It must be entered as an array formula to show all of the results, so control-U and command-enter are your friends. Select a 5-row by n-column group of cells, where n is the number of variables you’re fitting. Hit control-U to edit. Enter “=linest(y,x,0,1)” where y covers your output values and x covers the n columns of input values. Then hit command-enter to set the array formula for all of the cells. Check the Excel help for more information about LINEST usage.

Once you’ve figured out the array formula gymnastics, note that the coefficients are returned backwards! That’s right, the first row of the array formula contains the calculated coefficients for your fit, but in reverse order compared to the order of your input x columns. Go figure. You can even see this reversal in the third example in the LINEST help. Look for the section where the regression equation is constructed using the new coefficients.

The third parameter may cause confusion. If you’ve used LINEST for simple regressions, you may have used linest(y,x,,1) as your function, where x and y are columns. In that case, LINEST provides the slope in row 1, column 1 of the output array, and the intercept in row 1, column 2. Above we set the third parameter to 0 or FALSE, which causes Excel to force the regression plane through the origin. (Leaving it out, or setting it to 1 or TRUE, makes Excel calculate the intercept itself.) Forcing the fit through the origin allows us to compare results between Excel and the Commons Math regression functions. But wait, what if I don’t want the fit to go through the origin? What if I have no reason to think my function goes through 0?

Fitting without a separate intercept is actually the standard way of doing regression. Derivations of regression equations don’t treat the intercept calculation separately. Instead, it’s built in. Remember, linear regression is linear in the regression coefficients. The function of the coefficients goes through the origin. The function you construct with those coefficients may or may not…

If you want a constant value in your regressor variables, add it as a column. The resulting coefficient is the intercept. More specifically, regression provides an output coefficient for each column of input. To create a non-zero intercept, add a column of 1s to your data. Regression will find a coefficient for that column. It will be the constant part of your function, the intercept.

For example, when fitting a single column of data, Excel allows you to input an x column and a y column. That’s one input variable, x. Then linest(y,x,,1) will return two values, the slope and the intercept. A more general way of thinking about this problem is to use two input columns: the original x values and a column of 1s. Then linest(y,x,0,1) returns a coefficient for each column. The resulting equation is y=b1 * x1 + b0 * x0, where x0 is always 1.

With the generalized view, we can now compare results between Excel and coded results such as from Commons Math. The Excel coefficients in the first row of the LINEST array function and the values returned by estimateRegressionParameters() should match.
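Because Excel lists the coefficients in reverse column order, the comparison has to flip one side first. Here is a minimal sketch of such a check; the helper name matchesExcel and the default tolerance are my own choices for illustration, and the Excel row is assumed to be pasted in by hand.

```scala
object CompareWithExcel {
  // excelFirstRow: the coefficients as they appear, left to right, in the first
  //   row of the LINEST output (that is, in reverse column order).
  // beta: the coefficients from code, in the same order as the input columns.
  def matchesExcel(excelFirstRow: Array[Double],
                   beta: Array[Double],
                   tol: Double = 1e-6): Boolean =
    excelFirstRow.length == beta.length &&
      excelFirstRow.reverse.zip(beta).forall { case (e, b) => math.abs(e - b) <= tol }
}
```

A tolerance is used rather than exact equality because the two systems will rarely agree to the last floating point digit.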

For example, to fit a parabola in both systems, we would perform a linear regression on a*x^2 + b*x + c. Create columns containing x^2, x, and 1. Then the regression will return a, b, and c.
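To make the parabola case concrete, here is a self-contained sketch. It builds the three columns and then solves the normal equations with a small hand-rolled Gaussian elimination, which stands in for Commons Math only so the example runs without any jars. The names ParabolaFit, designMatrix and fit are illustrative, not part of any library.

```scala
object ParabolaFit {
  // One row per observation, with columns x^2, x, and 1.
  def designMatrix(xs: Array[Double]): Array[Array[Double]] =
    xs.map(x => Array(x * x, x, 1.0))

  // Solve (X^T X) b = X^T y by Gaussian elimination with partial pivoting.
  // Assumes the columns of x are linearly independent.
  def fit(x: Array[Array[Double]], y: Array[Double]): Array[Double] = {
    val n = x(0).length
    // Augmented matrix [X^T X | X^T y].
    val a = Array.tabulate(n, n + 1) { (i, j) =>
      if (j < n) x.map(r => r(i) * r(j)).sum
      else x.zip(y).map { case (r, yi) => r(i) * yi }.sum
    }
    for (col <- 0 until n) {
      // Swap in the row with the largest entry in this column.
      val pivot = (col until n).maxBy(r => math.abs(a(r)(col)))
      val tmp = a(col); a(col) = a(pivot); a(pivot) = tmp
      for (row <- col + 1 until n) {
        val f = a(row)(col) / a(col)(col)
        for (j <- col to n) a(row)(j) -= f * a(col)(j)
      }
    }
    // Back substitution.
    val b = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(j => a(i)(j) * b(j)).sum
      b(i) = (a(i)(n) - s) / a(i)(i)
    }
    b
  }
}
```

Feeding it points generated from 2*x^2 + 3*x + 1 should give back coefficients close to 2, 3 and 1, in the same order as the columns.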

Finally, note that Excel returns the residual and regression sums of squares. You can use these to check the r-squared calculations in your code: r^2 = 1-ssRes/(ssRes+ssReg).


gary on April 7th, 2010

Linear regression in Scala is as easy as calling the routines in the Apache Commons Math jar. We just need to add the calculation of the coefficient of determination, aka r^2, to see how well we fit. Here’s how:

Download the library from here and put the commons-math-2.1.jar file on your classpath. Here’s the example in the documentation converted to Scala:

		import org.apache.commons.math.stat.regression.OLSMultipleLinearRegression

		val regression = new OLSMultipleLinearRegression()
		// example from apache: http://commons.apache.org/math/userguide/stat.html#a1.5_Multiple_linear_regression
		val y = Array(11.0, 12.0, 13.0, 14.0, 15.0, 16.0)
		// one row per observation, one column per regressor
		val x = Array(
			Array(1.0, 0.0, 0.0, 0.0, 0.0, 0.0),
			Array(1.0, 2.0, 0.0, 0.0, 0.0, 0.0),
			Array(1.0, 0.0, 3.0, 0.0, 0.0, 0.0),
			Array(1.0, 0.0, 0.0, 4.0, 0.0, 0.0),
			Array(1.0, 0.0, 0.0, 0.0, 5.0, 0.0),
			Array(1.0, 0.0, 0.0, 0.0, 0.0, 6.0)
		)

		regression.newSampleData(y, x)

		val beta = regression.estimateRegressionParameters()
		println("betas: " + beta.map("%.3f".format(_)).mkString(", "))

		// residuals, if needed
		val residuals = regression.estimateResiduals()

This works so cleanly because Scala compiles Array[Double] to double[]. It’s compatible with the Java library, while still allowing all the cool Scala functions like map().
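You can see the interop for yourself without Commons Math. The sketch below (the object and method names are just for illustration) checks that an Array[Double] really is a primitive double[] at runtime and hands one straight to a plain Java API, java.util.Arrays.sort.

```scala
object ArrayInterop {
  // At runtime a Scala Array[Double] is a JVM double[].
  def isPrimitiveDoubleArray(xs: Array[Double]): Boolean =
    xs.getClass.getComponentType == java.lang.Double.TYPE

  // Pass the array directly to a Java method that takes double[].
  def sortedCopy(xs: Array[Double]): Array[Double] = {
    val copy = xs.clone()
    java.util.Arrays.sort(copy)
    copy
  }

  // The Scala collection methods still work on the same array.
  def formatted(xs: Array[Double]): String =
    xs.map("%.1f".format(_)).mkString(", ")
}
```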

Commons Math provides the linear algebra needed to solve the normal equation and return the desired equation coefficients. But it doesn’t provide the coefficient of determination, often called “r-squared,” which tells you how much of the data’s variance is explained by the regression. We can add that calculation with the following code. Given the original Y values and the residuals, use:

	def calcRSquared(y:Array[Double], residuals:Array[Double]) = {
		val ssReg = sumSq(y.zip(residuals).map{case(a,b) => a - b})
		val rMean = sum(residuals) / residuals.size.toDouble
		val ssRes = sumSq(residuals.map(_ - rMean))

		1.0 - ssRes / (ssReg + ssRes)
	}

	def sq(in:Double) = in * in
	def sum(in: Array[Double]) = (0.0 /: in){_ + _}
	def sumSq(in: Array[Double]) = sum(in.map(sq))

The conciseness of Scala means that in reading the code, you can quickly discern both what a function does and how it does it. ssReg, for example, is literally “the sum of the squares of pairs of items from y and residuals subtracted” or, more colloquially, “the sum of the squares of differences between y and residuals elements.”


Here’s a way to ensure that only one instance of your application runs at a time. Perhaps it updates a resource and you need to prevent duplicate updates. Or, in another use, I recently wanted to make sure an application was always running. So I wrote a cron job to periodically start the program; if it was already running, the new start failed.

A traditional method for ensuring that only a single instance of a program runs uses file locking and PIDs. A program, upon start up, would get its process id (PID) from the OS, then write it to a file called “program.lock” in a known, fixed location. Then it would read the file and check the written PID against its own. If they match, great; continue. If they don’t match, it means that another process succeeded at creating the file, i.e. got the lock, so this application must shut down. You might think that if the file exists, then no check is necessary. However, the idea is to handle the situation when multiple instances start at nearly identical times. Neither sees the file, so both try to create it, but only one succeeds because Unix file creation is atomic.

The traditional method is clever, but challenging to get right. Lock files must be removed on exit, requiring more code. If the program terminates unexpectedly, provisions must be made to catch the exit signal and clean up or use additional methods to reap leftover lockfiles.
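For comparison, here is a stripped-down sketch of the lock-file idea. It is a simplification, not the full PID scheme described above: it relies only on atomic file creation via java.nio.file.Files.createFile and does not write or check a PID. The object name LockFile and the method tryAcquire are made up for this example.

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}

object LockFile {
  // Returns true if this process created the lock file, false if it already existed.
  def tryAcquire(lockPath: Path): Boolean =
    try {
      Files.createFile(lockPath)      // the exists-check and the create are one atomic step
      lockPath.toFile.deleteOnExit()  // best effort: only runs on a normal JVM exit
      true
    } catch {
      case _: FileAlreadyExistsException => false
    }
}
```

Note the weakness the post describes: if the JVM is killed, deleteOnExit never runs and the stale lock file blocks the next start.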

Here’s a much simpler method. Like the traditional method, each application attempts to hold a shared resource. Only one succeeds, and the hold on the resource is automatically released on program termination. The resource? An OS network port. So the only risk is accidentally choosing a port needed by another application for actual communication.

All you have to do is choose a port to use as an “application exclusion group id”. Several applications use the id. But only one application among the group sharing the id can run at a time.

Here’s the code. Include the file in your program, then call SingletonApp.performSolo as shown in the demo. To demo, just run multiple copies of the file as shown in the comments. Only one will run at a time…

import java.net.BindException
import java.net.ServerSocket

/**
 * This class enables applications to ensure that only one instance
 * of the application runs at a time.
 *
 * @author  Gary Boone, PhD
 * @version 1.0, 2010/02/24
 */
object SingletonApp {

	var serverSocket:ServerSocket = null

	/**
	 * Pick a port number to use as the application exclusion
	 * group id. Choose a port that won't conflict with other
	 * applications. See
	 * http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
	 * for common port numbers.
	 *
	 * Additionally, pass in a closure to be run if the application
	 * is already running.
	 */
	def performSolo( portToHold:Int)( fail: => Unit ) = {
		try {
			serverSocket = new ServerSocket(portToHold)
		} catch {
		  	case e:BindException => fail
		}
	}

	/**
	 * Demo the SingletonApp class
	 *
	 * Compile, or run with:
	 * scala -i SingletonApp.scala -e 'SingletonApp.main(null)' &
	 * Then start another instance to see it fail.
	 */
	def main( args:Array[String] ) : Unit = {
		val DEMO_PORT_TO_HOLD = 15486

		// ensure single instance
		SingletonApp.performSolo(DEMO_PORT_TO_HOLD) {
			println("failing due to already-running instance.")
			sys.exit(1)	// on Scala 2.8 and earlier, use exit instead
		}

		println("Application started. Try to start another instance.")
		Thread.sleep(300000)		// 3e5 ms = 300 sec = 5 min
		println("exiting")
	}
}
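If you would rather get a value back than pass in a failure closure, the same trick can be wrapped as below. This is a variation for illustration only; PortLock and tryHold are hypothetical names, not part of the code above.

```scala
import java.net.{BindException, ServerSocket}

object PortLock {
  // Some(socket) if we now hold the port, None if it is already bound.
  // Keep a reference to the socket for as long as the application runs.
  def tryHold(port: Int): Option[ServerSocket] =
    try Some(new ServerSocket(port))
    catch { case _: BindException => None }
}
```

A caller might write PortLock.tryHold(15486) match { case None => sys.exit(1); case Some(_) => () } near the top of main.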
gary on November 2nd, 2009

Sometimes you need the top k items in a list. A naive way to do it is to sort the list, then take the first k items. The problem with that approach is that you don’t need to sort the whole list, so it’s inefficient and becomes more so as the list length, n, increases relative to k.

To sort just the top k items, you can use a bounded priority queue; after feeding the whole list into it, the queue contains the top k items. This approach is better, but expends unnecessary effort maintaining the sort in the queue as items are added. If n is much larger than k, it would be faster to find the top items without sorting, then sort only the final list of top items.
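The bounded priority queue approach might look like the sketch below. The names BoundedTopK and topK are mine; the queue is scala.collection.mutable.PriorityQueue with a reversed ordering, so its head is always the smallest item kept so far.

```scala
import scala.collection.mutable

object BoundedTopK {
  // Returns the k largest items, largest first.
  def topK[T](items: Iterable[T], k: Int)(implicit ord: Ordering[T]): List[T] = {
    require(k >= 0, "k must not be negative")
    // Reversed ordering: dequeue() removes the smallest item currently held.
    val heap = mutable.PriorityQueue.empty[T](ord.reverse)
    for (item <- items) {
      heap.enqueue(item)
      if (heap.size > k) heap.dequeue() // evict the smallest to stay at k items
    }
    heap.toList.sorted(ord.reverse)
  }
}
```

Every insertion costs O(log k), so the whole pass is O(n log k). That is the overhead the partial sort below avoids.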

The code below does a partial sort in linear time on average. Upon return, the first k positions of the array contain the largest or smallest items of the array, depending on the compare() function provided by the Ordered trait of the class contained in the array. The items in the first k positions are not sorted.

package scala

/*
 * FirstK
 *
 * These functions provide a linear partial sort of an array of Ordered objects. The array is
 * sorted in-place and upon return the k smallest or largest items are in the first k positions.
 * The first k values are not sorted, but are less than or equal to the values in the rest of the
 * array. Whether the first k items are the largest or the smallest of the array is determined
 * by the compare function of the objects in the array.
 *
 * For example:
 * 	case class Foo(n: Int) extends Ordered[Foo] { def compare(other: Foo) = this.n.compare(other.n) }
 * means that firstK will return the k items with the smallest n values in Array[Foo]
 * 	case class Foo(n: Int) extends Ordered[Foo] { def compare(other: Foo) = -this.n.compare(other.n) }
 * means that firstK will return the k items with the largest n values in Array[Foo]
 *
 *  Example from scalatest unit test:
 *
 *    case class Foo(n: Int) extends Ordered[Foo] {
 *      def compare(other: Foo) = this.n.compare(other.n)
 *    }
 *
 * 	  test(" Test findFirstK() Length 2" ) {
 *      val array = Array(new Foo(2), new Foo(1))
 *      FirstK.findFirstK(array,  1)       // k = 1
 *      assert(array.deepEquals(Array(Foo(1), Foo(2))))
 *    }
 */
object FirstK {
	def swap[T](array: Array[T], x: Int, y: Int) = {
		val t = array(x)
		array(x) = array(y)
		array(y) = t
	}

	// Lomuto Partition
	def partition[T <% Ordered[T]](array: Array[T], left: Int, right:Int, pivotIndex: Int): Int = {
		val pivotValue = array(pivotIndex)
		swap(array, pivotIndex, right)              // Move pivot to end
		var storeIndex = left
		for (i <- left until right ) {
			if (array(i) <= pivotValue) {			// will use the compare fn, so may use > for desc sort
				swap(array, storeIndex, i)
				storeIndex += 1
			}
		}
		swap(array, right, storeIndex) 				// Move pivot to its final place
		storeIndex
	}

	// Note that p here is the position that ends the range of 'first' values.
	def findFirstToP[T <% Ordered[T]](array: Array[T], left: Int, right:Int, p: Int): Unit = {
		if (right > left) {
			val pivotIndex = (left + right) / 2
			val pivotNewIndex = partition(array, left, right, pivotIndex)
			if (pivotNewIndex > p)
				findFirstToP(array, left, pivotNewIndex - 1, p)
			if (pivotNewIndex < p)
				findFirstToP(array, pivotNewIndex + 1, right, p)
		}
	}

 	// intended entry fn; includes parameter checking. Modifies the array so that the first k items
 	// are largest/smallest of the array.
	def findFirstK[T <% Ordered[T]](array: Array[T],  k: Int) : Unit = {
		if (array==null || k<=0 || k>array.size) {
			throw new IllegalArgumentException()
		}
		findFirstToP(array, 0, array.length - 1, k-1)		// fn is position-based so, k-1 for 'first k'
	}
}
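Newer Scala versions deprecate the view bounds (<%) used above. Here is a sketch of the same quickselect idea written against Ordering instead, with the final "sort only the top k" step included. It is a rewrite for illustration, not a drop-in replacement: the names FirstKSketch and firstKSorted are hypothetical.

```scala
object FirstKSketch {
  private def swap[T](a: Array[T], i: Int, j: Int): Unit = {
    val t = a(i); a(i) = a(j); a(j) = t
  }

  // Lomuto partition: returns the final index of the pivot.
  private def partition[T](a: Array[T], left: Int, right: Int, pivotIndex: Int)
                          (implicit ord: Ordering[T]): Int = {
    val pivot = a(pivotIndex)
    swap(a, pivotIndex, right)
    var store = left
    for (i <- left until right) {
      if (ord.lteq(a(i), pivot)) { swap(a, store, i); store += 1 }
    }
    swap(a, right, store)
    store
  }

  // Rearranges a so that positions 0..p hold the p+1 smallest items, unsorted.
  @annotation.tailrec
  private def select[T](a: Array[T], left: Int, right: Int, p: Int)
                       (implicit ord: Ordering[T]): Unit =
    if (right > left) {
      val idx = partition(a, left, right, (left + right) / 2)
      if (idx > p) select(a, left, idx - 1, p)
      else if (idx < p) select(a, idx + 1, right, p)
    }

  // Partially sorts a in place, then sorts only the first k items.
  def firstKSorted[T](a: Array[T], k: Int)(implicit ord: Ordering[T]): List[T] = {
    require(a != null && k > 0 && k <= a.length, "k must be between 1 and the array length")
    select(a, 0, a.length - 1, k - 1)
    a.iterator.take(k).toList.sorted
  }
}
```

To get the largest items instead, pass a reversed ordering, for example firstKSorted(array, 3)(Ordering[Int].reverse).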
gary on October 2nd, 2009

If you’ve played around with the Option class, you know that it allows you to note that no value is available and do so without nulls. For example, using getOrElse() you can access a value or a default when no value is present:

scala> val a:Option[Int] = Some(5)
a: Option[Int] = Some(5)
scala> val b:Option[Int] = None
b: Option[Int] = None
scala> a.getOrElse(0)
res1: Int = 5
scala> b.getOrElse(0)
res2: Int = 0

But a problem you’re likely to come across is how to access an instance variable when an instance of a class is stored in an Option. For example,

scala> case class S(lines:Int, pages:Int)
defined class S
scala> val sOpt:Option[S] = Some(new S(16,5) )
sOpt: Option[S] = Some(S(16,5))
scala> val nOpt:Option[S] = None
nOpt: Option[S] = None

Here, we’ve created a simple class and put an instance of it into an Option. Then we made another Option for the same class, this one containing None. So given an Option[S], how do you get the lines value out of it? You can’t do it directly because the Option may contain None.

The answer is to use map, passing in the accessor or other function you want to apply. In this case, we just use the accessor lines. An Option is returned, so getOrElse() then returns the value or a default.

scala> val numLines = sOpt.map{ _.lines }.getOrElse(0)
numLines: Int = 16
scala> val numLines = nOpt.map{ _.lines }.getOrElse(0)
numLines: Int = 0
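Outside the REPL, the same pattern might be written as below. The wrapper object and method names are only for illustration; the second method shows the pattern match that map plus getOrElse is shorthand for.

```scala
object OptionAccess {
  case class S(lines: Int, pages: Int)

  // map applies the accessor only when a value is present.
  def numLines(s: Option[S]): Int = s.map(_.lines).getOrElse(0)

  // The equivalent written out as a pattern match.
  def numLinesByMatch(s: Option[S]): Int = s match {
    case Some(value) => value.lines
    case None        => 0
  }
}
```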

The underscore is an abbreviation for the object in the map. Don’t forget that you can also pass functions into the map.

scala> sOpt.map{println}
S(16,5)
scala> nOpt.map{println}

Note that the instance’s toString() was printed by the first line, whereas the second printed nothing because the Option contained a None. (I’ve omitted the REPL result lines.)

Finally, note that complex functions can be built using the normal block “=>” syntax. Here, we print as well as accessing the lines instance variable:

scala> val numLines = sOpt.map{x => println(x); x.lines}.getOrElse(0)
S(16,5)
numLines: Int = 16
scala> val numLines = nOpt.map{x => println(x); x.lines}.getOrElse(0)
numLines: Int = 0
scala>


gary on September 4th, 2009

The Jersey package is Sun’s reference implementation of the JAX-RS standard for RESTful web services. It’s easy to use; with a few annotations, you’ll have your own REST API. The examples show how to use the included Grizzly server, allowing you to create embedded servers. That is, instead of fussy XML configurations for the large web servers, you can create small, easily deployable servers that you control via REST calls.

Using eclipse, just create your project, adding the jars required by Jersey. Follow the Getting Started guide for an example.

To deploy, you can package your project into a single jar using “File | Export…” Then you can run the jar using

java -Xmx512m -jar MyServer.jar

where you use -Xmx to make sure your server has enough memory. There’s just one problem…

When you run the jar, the program will start up cleanly. But when you hit it with a url, you see

The ResourceConfig instance does not contain any root resource classes.

followed by some exception. The problem is the server can’t find your classes. The simple trick is to add your bin directory to your classpath. Use “Project | Properties | Java Build Path.” On the Libraries tab, click “Add External Class Folder…” and add the “bin” directory in your project root.

Next time you run, you’ll see

Scanning for root resource and provider classes in the packages:
main
Root resource classes found:
class main.MyClass1Resource
class main.MyClass2Resource
class main.MyClass3Resource
class main.MyClass4Resource

as the embedded Grizzly server scans the package (“main”) you gave it when you initialized it:

initParams.put("com.sun.jersey.config.property.packages", "main");

Instant RESTful server, one line deploy. What’s not to love?



gary on August 20th, 2009
// Sum the values in a list
val sum = (0.0 /: list){ _ + _ }
// What it says: it's an abbreviated form of
// list.foldLeft(0.0) { (s,i) => s + i }

// Find the minimum value in a list
val minValue = list.reduceLeft( _ min _ )
// What it says: Note that it's reduce, not fold. It applies the
// min function to the series of results obtained by applying
// the function from the left. 

// Note the difference between fold and reduce: fold takes an
// initial value, while reduce does not.
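
Here is a small runnable version of the two snippets. One practical consequence of the difference: foldLeft on an empty list simply returns the initial value, while reduceLeft on an empty list throws an exception, so this sketch guards it with an Option. The object and method names are just for the example.

```scala
object FoldVsReduce {
  // fold: starts from 0.0, so an empty list sums to 0.0.
  def sum(list: List[Double]): Double = list.foldLeft(0.0)(_ + _)

  // reduce: has no initial value, so an empty list has to be handled separately.
  def minValue(list: List[Double]): Option[Double] =
    if (list.isEmpty) None else Some(list.reduceLeft(_ min _))
}
```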

gary on August 17th, 2009

To work effectively with an API, you have to be able to search it quickly. The best way I’ve found to do so is with a locally installed copy in CHM format. That’s the Windows helpfile format. With a good CHM browser, search-as-you-type allows you to find classes and methods as fast as you can type.

The Scala 2.7.5 API is available in CHM thanks to “bongole” and his project on github, scala-api-chm. The project includes code to compile the API into CHM, but all you need is the zipped chm file, scala_api-2.7.5.zip (direct link).

To view on a Mac, the best CHM browser I’ve found is ArCHMock.


gary on June 25th, 2009

Trying to start a program created with the eclipse Scala plugin, I saw an error that the main class couldn’t be found. That was cryptic, as there was a main. Stranger still, this is a program I had run successfully before.

The first part of solving the problem was to look for clues in the Problems view. Opening it, I saw this:

error while loading Configgy, Scala signature Configgy has wrong version expected: 5.0 found: 4.1 in /Users/…

So the first error was due to the compilation failing and not producing a main class that eclipse could launch.

After further digging, the version error turned out to be caused by a mismatch between the Scala version (2.7.5) I used to compile the Configgy library and the Scala version (2.8) that the eclipse plugin used to compile the rest of the program. I didn’t install Scala 2.8, so how could this occur? The Scala plugin installs its own version of Scala. Because I had selected the nightly build version of the plugin, the Scala versions drifted apart during one of the eclipse updates.

You can tell which version the eclipse Scala plugin uses by looking at the jar name in the eclipse/plugins directory, e.g., scala.tools.nsc_2.75.final.jar

The solution was to revert to the stable, release version of the plugin. Do that by using this url in the eclipse Software Updates | Available Software | Manage Sites… list.

http://www.scala-lang.org/scala-eclipse-plugin

More complete directions here.

