
# The 1, 10, 2, 3 problem.

When you sort strings alphabetically, you end up with

• Project 1 Report
• Project 10 Report
• Project 2 Report
• Project 3 Report

We’ve all seen this, right? Here’s an example in Perl:
```
my @examples = (
    "Project 1 Report",
    "Project 2 Report",
    "Project 3 Report",
    "Project 10 Report",
    "Project 2342 Report",
    "Alpha 2.9",
    "Alpha 10.1",
);

my @sorted = sort @examples;
print join( "\n", @sorted ) . "\n";
```

This’ll give you:
```
Alpha 10.1
Alpha 2.9
Project 1 Report
Project 10 Report
Project 2 Report
Project 2342 Report
Project 3 Report
```
What we really want is an alphabetic sort that treats numbers magically.

Here’s a quick fix to make it do the thing you meant, by zero-padding any numbers before comparing the strings:
```
my @sorted = sort sensible_sort @examples;
print join( "\n", @sorted ) . "\n";

sub sensible_sort {
    # clone these so we don't modify originals
    my $a1 = $a;
    my $b1 = $b;
    $a1 =~ s/(\d+)/sprintf("%020d",$1)/ge;
    $b1 =~ s/(\d+)/sprintf("%020d",$1)/ge;
    return $a1 cmp $b1;
}
```

This modifies the strings used in the sort comparisons by finding every run of one or more digits (0-9) and replacing it with a 20-digit version, padded with zeroes. (20 digits is a number picked from my arse; 6 would probably do.) This technique is easy enough to do in C or PHP or whatnot. There may well be a library out there which already does it, but it’s a neat, self-contained little technique which makes the lists from our learning objects repository far saner. Lecture 10 was listed second!
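One thing worth knowing: `sort` calls the comparison sub many times, so the padding regex gets re-run on the same strings over and over. Here is a minimal sketch of building each padded key only once and sorting on that instead (the usual Schwartzian transform idiom). It assumes the same `%020d` padding and numbers that fit in Perl's native integer; `padded_key` and `sensible_sorted` are names made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: zero-pad every run of digits to 20 characters.
sub padded_key {
    my ($s) = @_;
    $s =~ s/(\d+)/sprintf("%020d", $1)/ge;
    return $s;
}

# Hypothetical helper: pair each string with its key, sort on the key,
# then hand back the original strings in their new order.
sub sensible_sorted {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ padded_key($_), $_ ] } @_;
}

my @examples = ( "Project 10 Report", "Project 2 Report", "Alpha 10.1", "Alpha 2.9" );
print "$_\n" for sensible_sorted(@examples);
```

For a handful of strings the original comparison sub is perfectly fine; this only starts to matter when the lists get long.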


## 4 Responses


1. You can also use the numeric sort function, which is likely to be fast if your code does this sort a lot of times, e.g.:

my @sorted = sort {$a $b} @numbers;

• I think you meant sort { $a <=> $b }, which is fine for numbers, but not for strings containing numbers where you still want the text to sort alphabetically.

2. Great suggestion from an old mentor of mine via Facebook:

“If you wanted to deal with numbers longer than twenty digits, you could use a substitution something like s/0*(\d+)/sprintf("%016x", length($1)) . $1/ge, which would be good for numbers with up to 2^64 digits :-)”

3. Unfortunate that your comments thing blindly strips anything between angle brackets, resulting in the above comments losing a critical part of their code snippets…

Anyway, there is indeed a module to do this already in Perl, which I recommend because it probably Does The Right Thing in a larger set of circumstances (including different locales). It’s called Sort::Naturally.
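For the curious, the length-prefix idea from the second comment can be sketched roughly like this. It strips leading zeroes from each run of digits and prefixes the run with its own length as 16 hex digits, so a shorter number always sorts before a longer one and equal-length numbers compare digit by digit. `length_key` is a made-up name for this illustration, and the key is recomputed on every comparison to keep the sketch short.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each run of digits with
# <length as 16 hex digits><digits without leading zeroes>.
sub length_key {
    my ($s) = @_;
    $s =~ s/0*(\d+)/sprintf("%016x", length($1)) . $1/ge;
    return $s;
}

# Numbers far too long for a 20-digit zero pad.
my @versions = ( "v1" . ( "0" x 24 ), "v" . ( "9" x 24 ), "v7" );
my @sorted = sort { length_key($a) cmp length_key($b) } @versions;
print "$_\n" for @sorted;
```

Because the digits are never converted to a number, this sidesteps any integer overflow in `sprintf`, at the cost of treating "007" and "7" as identical keys.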
