Orange is my favorite color

Strings sorting like numbersOn MotorsportReg.com, we store vehicle numbers as a string because every so often someone has a letter as part of their number. The problem with storing numbers as strings is that they sort poorly. Instead of “1, 2, 3… 10, 20, 30…” you are presented with “1, 10, 2, 20, 3, 30…”.

In our reporting system, to get around the fact that oftentimes this value was empty, I was using a simple (but naive) fix in my CF query of query:

CAST(field + '0' AS INTEGER)

That worked really well until last night when someone actually had one of those numbers with a letter in it, specifically “T60″. Like I said, it happens pretty rarely. ColdFusion didn’t like trying to cast a true string to a number and it barfed. (On a side note, it didn’t use the CFERROR-specified template, why not?)

Back to the drawing board – and back to the database. Why not create a value that, as far as the application is concerned, is a true integer? Because Postgres supports regular expressions in its query engine, it took only a few minutes of searching the documentation and an article on General Bits about partial sort ordering to come up with the following calculated field:

COALESCE(CAST(SUBSTRING(vchNumber FROM '([0-9]{1,10})') AS INTEGER), 0) AS intNumber

The COALESCE is in there to guarantee that if no numeric values exist that it will come back as a 0. Originally I had the Regular Expression as [0-9]*, which should have matched any numbers in the field. Fields ending with a letter worked fine but those beginning with a non-numeric value didn’t return anything. I added the range of characters {1,10} (this is a varchar(10) field) and suddenly it returned the numeric values perfectly.

Technically what I want is an expression that matches anything that isn’t a letter/punctuation/whitespace/etc because a number like “T50T20″ only returns “50″ but we can live with that level of sorting. This value is never displayed, it’s only used to order the results.

2 Comments

  1. Nathan Dintenfass said:

    on June 23, 2006 at 9:01 am

    Wouldn’t you want “T50″ to display after the numeric values? Or, is the “T” in an ID like that always non-meaningful as far as the sort?

  2. brian said:

    on June 23, 2006 at 9:22 am

    Possibly – every club uses it slightly differently and very few use letters at all. I think most often it’s either “just because” or it’s some kind of class/group designation in which case the number is “more important”.

    If this becomes the case, we could fix it by prepending “999″ to the front of any value that begins with a letter and that would solve 99% of the cases since rarely does a number exceed 3 digits (in fact, basically never).

    There are multiple other ways of grouping people in the system so using the number to do that would be inefficient at best.

{ RSS feed for comments on this post}