Locale Sort


This is the implementation of an option for sort that makes it use the locale's collating conventions. It is a patch to sort.c in textutils-1.22.

I have not checked thoroughly, but at least textutils 2.0.11 sorts using the user's locale, so this patch is no longer needed there.

The locale-aware ordering is activated by giving a -l option, either globally or for a single key.
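For readers unfamiliar with locale collation, here is a minimal sketch of the kind of comparison such an option implies: select the collating rules with setlocale and compare lines with the C library's strcoll instead of a byte-wise compare. The names compare_lines and sort_lines are made up for this illustration and are not taken from the patch.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two lines with the current LC_COLLATE rules. */
int compare_lines(const void *a, const void *b)
{
    const char *const *la = a;
    const char *const *lb = b;
    return strcoll(*la, *lb);
}

/* Sort an array of NUL-terminated lines in place.
   A locale_name of "" means "take the locale from the environment".
   Returns 0 on success, -1 if the requested locale is not available. */
int sort_lines(char **lines, size_t n, const char *locale_name)
{
    assert(lines != NULL || n == 0);
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(lines, n, sizeof *lines, compare_lines);
    return 0;
}
```

Calling sort_lines(lines, n, "") uses whatever locale the user's environment selects, while passing "C" gives the traditional byte ordering.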

If used together with the -M option in a key, it will use the locale's abbreviated month names for sorting that key.
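On POSIX systems the locale's abbreviated month names can be queried with nl_langinfo. The sketch below shows one way a month key might be mapped to a number under that assumption. The function locale_month is hypothetical and is not the patch's code.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Return 1..12 if s starts with the current locale's abbreviated name
   for that month, ignoring case, or 0 if no month matches.
   strncasecmp folds case byte by byte, so multibyte locales would
   need more care than this sketch takes. */
int locale_month(const char *s)
{
    static const nl_item abmon[12] = {
        ABMON_1, ABMON_2, ABMON_3, ABMON_4,  ABMON_5,  ABMON_6,
        ABMON_7, ABMON_8, ABMON_9, ABMON_10, ABMON_11, ABMON_12
    };
    assert(s != NULL);
    for (int i = 0; i < 12; i++) {
        const char *name = nl_langinfo(abmon[i]);
        size_t len = strlen(name);
        if (len > 0 && strncasecmp(s, name, len) == 0)
            return i + 1;
    }
    return 0;
}
```

The names returned depend on the LC_TIME category, so setlocale(LC_TIME, "") (or LC_ALL) has to be called before the lookup for the user's locale to take effect.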

The code seems fairly robust, and works as expected for my locale. I would appreciate help testing it in other locales. (I have several installed on my system, but I don't know what the expected results would be, and I also lack files for testing.) The program passes all the tests in the textutils-1.22 distribution, so at least I didn't break basic behaviour.

It has a number of drawbacks, the most noticeable being its slowness. On a 60,000-line file of random characters, which normally gets sorted in ~1.2 seconds, it took more than 80 seconds to run, so it shouldn't be used unless it is *needed*.
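One commonly used way to reduce the cost of repeated strcoll calls is to transform each key once with strxfrm and then compare the transformed keys with strcmp. The sketch below only illustrates that general technique. It is not what the patch does, it has not been measured against the timings above, and make_collation_key is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd collation key for s under the current LC_COLLATE
   locale, or NULL on allocation failure.  Comparing two such keys with
   strcmp orders them the same way strcoll orders the original strings. */
char *make_collation_key(const char *s)
{
    assert(s != NULL);
    size_t need = strxfrm(NULL, s, 0) + 1;   /* length query, plus NUL */
    char *key = malloc(need);
    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}
```

The trade-off is memory: every line needs a transformed copy of its key, which can be noticeably larger than the original text in some locales.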

Another problem is the interaction with the -d and especially the -f options. Although -d seems to work correctly, much more testing is needed. As for -f, I can't even tell what it should do, as my locale (es_MX) already sorts with case folding. I don't know how other locales behave, so I simply cannot implement it.

Last problem: since I use the library's collating functions, sort keys can no longer contain embedded NULs, because a NUL is taken as the end of the string and any further characters are ignored. I haven't been able to come up with a solution: everything I've thought of either breaks in weird situations, or is too slow, or both. Embedded NULs also don't seem common enough to me to warrant all this trouble.
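To make the limitation concrete, the following sketch compares two keys that differ only after an embedded NUL. A length-aware byte comparison tells them apart, while strcoll has no length argument and stops at the first NUL. The function names are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise comparison that honours explicit lengths, so bytes after
   an embedded NUL still take part in the comparison. */
int compare_bytes(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    size_t n = alen < blen ? alen : blen;
    int diff = memcmp(a, b, n);
    if (diff != 0)
        return diff;
    return (alen > blen) - (alen < blen);
}

/* Locale comparison: strcoll only ever sees the text before the
   first NUL in each key. */
int compare_locale(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b);
}
```

For the four-byte keys "ab\0x" and "ab\0y", compare_bytes reports a difference, but compare_locale sees "ab" twice and treats the keys as equal.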
 

Further Ideas:

  • Make -l take a suboption indicating which locale to use for this key, enabling one to sort different keys with different locales. It doesn't seem like that much trouble, but I wonder if anyone would ever use it.
  • Somehow arrange to compare characters instead of strings. It might be faster, and would help eliminate the embedded NUL problem. It seems difficult, and looks like it would need messing with the C library, which doesn't sound like a good idea.
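The first idea could perhaps be built on the POSIX.1-2008 per-locale functions, which take the locale as an argument instead of relying on the global setlocale state. This is only a sketch under that assumption: those functions are newer than textutils-1.22 and are not available on every system (it is written with glibc in mind), and struct key_locale and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical per-key state: each sort key carries its own locale. */
struct key_locale {
    locale_t loc;
};

/* Load the collating rules named by `name` (e.g. a -l suboption).
   Returns 0 on success, -1 if the locale is not available. */
int key_locale_open(struct key_locale *k, const char *name)
{
    assert(k != NULL && name != NULL);
    k->loc = newlocale(LC_COLLATE_MASK, name, (locale_t)0);
    return k->loc == (locale_t)0 ? -1 : 0;
}

/* Compare two key strings with this key's own collating rules. */
int key_locale_compare(const struct key_locale *k, const char *a, const char *b)
{
    assert(k != NULL && a != NULL && b != NULL);
    return strcoll_l(a, b, k->loc);
}

/* Release the locale object once the key is no longer needed. */
void key_locale_close(struct key_locale *k)
{
    assert(k != NULL);
    freelocale(k->loc);
}
```

Because each key owns its locale object, two keys could collate under different locales within the same run without switching the global locale between comparisons.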

Download:

You can download the patch right here. It's a gzipped tar that contains the patch and this readme.
Apply it to your source tree with patch.
     
