Orange is my favorite color

Here’s a guerilla debugging tactic based upon an old post of mine, Finding Memory Leaks in Coldfusion JVM, and some inspiration from Charlie Arehart. I’m in the process of migrating from ColdFusion 8 to 10 and am encountering some unexpected errors. This stack trace snippet should strike fear into the heart of any developer:

ava.lang.StackOverflowError at org.apache.xerces.dom.ParentNode.item(Unknown Source) at
net.sf.saxon.dom.NodeWrapper$ChildEnumeration.skipFollowingTextNodes(NodeWrapper.java:1166) at
net.sf.saxon.dom.NodeWrapper$ChildEnumeration.next(NodeWrapper.java:1194) at
net.sf.saxon.tree.util.Navigator$EmptyTextFilter.next(Navigator.java:918) at
net.sf.saxon.tree.util.Navigator$DescendantEnumeration.advance(Navigator.java:1052) at
net.sf.saxon.tree.util.Navigator$DescendantEnumeration.advance(Navigator.java:1111) at
net.sf.saxon.tree.util.Navigator$BaseEnumeration.next(Navigator.java:949) at
net.sf.saxon.tree.util.Navigator$DescendantEnumeration.advance(Navigator.java:1043) at
net.sf.saxon.tree.util.Navigator$BaseEnumeration.next(Navigator.java:949) at
... (endless repeat)

Something is stuck in a loop but because the stack trace fills up, you can’t see any of the lines of code being executed just before the ka-blammo. So, what to do?

Brute Force Debugging

My solution was to swipe a perl script to run commands at very short intervals and dump jstack output into a series of text files. Low-tech, but effective. This is the timer.pl script:

#!/usr/bin/perl

use strict;
use warnings;
use Time::HiRes qw/time sleep/;

sub launch {
    return if fork;
    exec @_;
    die "Couldn't exec";
}

$SIG{CHLD} = 'IGNORE';

my $interval = shift;
my $start = time();
while (1) {
    launch(@ARGV);
    $start += $interval;
    sleep $start - time();
}

And a separate shell script, dump.sh, so the redirect would create a new file every time it was run. 9892 is just the process id of the ColdFusion instance:

#!/bin/sh
sudo -u nobody /usr/java/latest/bin/jstack 9892 > t_$(date '+%Y%m%d%H%M%S_%N')

I tried running it every second, but that wasn’t enough to catch the stack overflow in the act. I upped it to every 500ms, 100ms and finally 50ms until it caught the offending code. Here’s the command in action:

perl timer.pl 0.05 ./dump.sh 

That says run the dump.sh every 50ms which will output to files formatted like t_20130402171521_009840232 using the current date and time.

Results

Running every 50ms, it doesn’t take long to fill up a directory so I started the script on the command line and in my web browser, triggered the offending code. Here’s what my directory looks like:

 52 t_20130402171519_959011475
 52 t_20130402171520_009648175
 52 t_20130402171520_060316767
 52 t_20130402171520_110450371
 52 t_20130402171520_160301392
 52 t_20130402171520_210357359
 52 t_20130402171520_260432922
 52 t_20130402171520_309171913
 52 t_20130402171520_360033048
 52 t_20130402171520_409978373
 52 t_20130402171520_459903958
 52 t_20130402171520_510488116
 52 t_20130402171520_558777256
 60 t_20130402171520_610244244
 72 t_20130402171520_659494494
136 t_20130402171520_709058569
136 t_20130402171520_760286651
136 t_20130402171520_810351161
136 t_20130402171520_860382457
136 t_20130402171520_910136459
136 t_20130402171520_960370954
136 t_20130402171521_009840232
 72 t_20130402171521_059986004
136 t_20130402171521_110340195
136 t_20130402171521_160277982
136 t_20130402171521_209090149
 84 t_20130402171521_260090368
136 t_20130402171521_310428721
136 t_20130402171521_367027484
136 t_20130402171521_410303418
136 t_20130402171521_460314216
136 t_20130402171521_509053355
136 t_20130402171521_560120477
136 t_20130402171521_610417764
128 t_20130402171521_677647036
136 t_20130402171521_712667159
136 t_20130402171521_758884870
136 t_20130402171521_810562747
136 t_20130402171521_860425517
...

That’s kilobytes in the first column, filename in the second. You can see the steady state while I’m going to push the button in the web browser and then very quickly the stack traces get larger as code is executing. What I wasn’t getting until I stepped down to 50ms iterations were the intermediate 72kb and 84kb stack trace which actually showed the offending lines between the stack overflows churning.

While this hasn’t solved the problem yet, at least I know where to go looking (and it’s at you, XmlSearch()) to continue the troubleshooting process.

Conclusion

This is a pretty hacky solution but if you’re this far into the deep end of the “what do I do?” pool, it’s a useful technique. A line debugger would be far better but I can’t replicate this in my test environment. So, forget the “right” way to do it and let’s just get some useful information so we can make progress.

Many moons ago, I helped write the docs for TableSorter, a jQuery plugin for sorting HTML tables. It was last updated in 2008 but it’s as useful today as it was then. Thankfully, Rob Garrison has been making significant updates to it on GitHub. Importantly, he’s also brought in and radically improved one of my favorite widgets for Grouping. This lets you click a header and see collapsible groups. While the old widget was pretty limited, Rob has gone wild with this one giving you all kind of formatting capability over how the groups are formed and titled.

Two huge thumbs up: take a look at this jQuery table grouping demo.

If you’re using TableSorter 2.0.3 or 2.0.5 from 2008, take a look at Rob’s version today.

Nerd post today. Most of my MXUnit unit tests look something like:

<cffunction name="setUp" returntype="void" access="public">
	<cfscript>
		variables.beanFactory = createObject("component", "coldspring.beans.DefaultXmlBeanFactory").init();
		variables.beanFactory.loadBeansFromXmlFile("/config/globalbeans.xml", true);
		variables.service = variables.beanFactory.getBean("BatchbookService");
	</cfscript>

	<cfset localMode = true />
</cffunction>

You can read more about my localMode trick for testing remote APIs but I’m simply loading up my Coldspring configuration file for my app which includes dozens and dozens and dozens of beans. I have three problems with this:

  1. The unit tests depend on the configuration used to run the site which might not always be in my environment
  2. It slows down my tests by generating beans that my unit test doesn’t need
  3. My production and test bean config may not be the same (credentials, etc)

I might be able to get around #2 by using lazy init but the principle still holds: the unit test should be as standalone as possible. In the case I’m working on today, I have a client I’ve written (and released) for a third party API from Batchbook. But I need to have some Batchbook-to-My-App conversions of JSON to queries and so forth. I don’t want to bake those into my publicly-released client so what I really want in my production app is some bean config that looks like:

<bean id="BatchbookService" class="model.external.batchbook.BatchbookService">
	<constructor-arg name="BatchbookClient">
		<bean class="model.external.batchbook.batchbook">
	        <constructor-arg name="apikey"><value>password</value></constructor-arg>
	        <constructor-arg name="endpoint"><value>https://somehost.batchbook.com/api/v1</value></constructor-arg>
			<constructor-arg name="restconsumer"><ref bean="restconsumer" /></constructor-arg>
			<constructor-arg name="JSONUtil"><ref bean="JSONUtil" /></constructor-arg>
		</bean>
	</constructor-arg>
</bean>
<bean id="RestConsumer" class="model.external.restconsumer" />
<bean id="JSONUtil" class="model.utils.JSONUtil" />

But if my BatchbookClient is a nested constructor-arg, how can I get it from the beanfactory to test it?

Alternative Coldspring Configurations

Turns out there’s more than one way to load beans into Coldspring and passing a path to an XML configuration file is just one. Looking over the 1.0 reference guide, I found two additional methods of interest:

  • loadBeansFromXmlRaw(xmlStringConfig, true)
  • loadBeansFromXmlObj(parsedXmlConfig, true)

Applying that to my current unit test, my setup method now looks like:

<cffunction name="setup">
	<cfset var beanConfigs = "" />
	<cfset var localMode = true />

	<cfsavecontent variable="beanConfigs">
		<beans>
			<bean id="BatchbookService" class="model.external.batchbook.BatchbookService">
				<constructor-arg name="BatchbookClient"><ref bean="BatchbookClient" /></constructor-arg>
			</bean>
			<bean id="BatchbookClient" class="model.external.batchbook.batchbook">
		        <constructor-arg name="apikey"><value>password</value></constructor-arg>
		        <constructor-arg name="endpoint"><value>https://somehost.batchbook.com/api/v1</value></constructor-arg>
				<constructor-arg name="restconsumer"><ref bean="restconsumer" /></constructor-arg>
				<constructor-arg name="JSONUtil"><ref bean="JSONUtil" /></constructor-arg>
			</bean>
			<bean id="RestConsumer" class="model.external.restconsumer" />
			<bean id="JSONUtil" class="model.utils.JSONUtil" />
		</beans>
	</cfsavecontent>

	<cfscript>
		variables.beanFactory = createObject("component", "coldspring.beans.DefaultXmlBeanFactory").init();
		variables.beanFactory.loadBeansFromXmlRaw(beanConfigs, true);

		variables.bb = variables.beanFactory.getBean("BatchbookClient");
		variables.svc = variables.beanFactory.getBean("BatchbookService");
	</cfscript>

</cffunction>

Coldspring now instantiates much quicker and it only loads beans related to my unit test. Plus, I have the flexibility to directly address the BatchbookClient and run it through its paces alongside the BatchbookService.

If you only have a few unit tests, these improvements won’t mean much but as your test suites grow (and you work towards continuous deployment…) you will find that speeding up testing is always a good thing and helps encourage you to do it.

2012 was like a bolt of lightning. I hope 2013 is as exciting as Graham Hill taking pole at a wet Spa Francorchamps in 1965:

Graham Hill, Spa Franchorchamps, 1965

Pic linked from wren.