Welcome to ciysys blog

Array.sort() performance

Published on: 25th July 2023

Overview

V8 JavaScript engine has optimized the array sorting using Timsort. By right, you should not care much about it. This is true if you are sorting the native data type (number and string) but not true for an Object type (such as Date and the custom Object). For sorting an object array, you will have to write a compare function.

Before we start

We are writing codes on Node.js and we need the chance package to help in generating random data for the sorting test.

npm install chance

Here's the header of your Node.js script:

const { performance } = require('node:perf_hooks');
const chance0 = require('chance');
const chance = new chance0();

The Array.sort() function

The Array.sort() function can take a compare function and it is an optional parameter. For the compare function, it has 2 parameters and it should return 3 possible values (range). Here's the explanation,

function(a, b)

Where

If the compare function has not been specified, the sorting process will sort the items based on its type. The default compare function will be able to handle number and string type without much performance issue. For other types such as object type or Date type, it calls toString() to convert the item into string value and uses it for the comparison. Please refer to the references section in this article.

Let's try to test out the sorting performance for each type.

Sorting the native type, the simplest use case.

We have 1,000 age records in an array to be sorted. Do we need a compare function?

let l1 = [];
let l1_2 = [];

for (let i2 = 0; i2 < 1000; i2++) {
    let age = chance.age();
    l1.push(age);
    l1_2.push(age);
}

performance.mark('a1');
l1.sort();
performance.mark('a2');
performance.measure('sorting Number type with default sorter', 'a1', 'a2');

performance.mark('a1');
l1_2.sort((a, b) => a - b);
performance.mark('a2');
performance.measure('sorting Number type with custom sorter', 'a1', 'a2');

console.log(performance.getEntriesByType('measure'));

Performance output will look something like this. It seems that sorting the number with a custom compare function runs faster. Does it surprise you?

[
    PerformanceMeasure {
        name: 'sorting Number type with default sorter',
        entryType: 'measure',
        startTime: 279.9359999895096,
        duration: 0.35339999198913574,
        detail: null
    },
    PerformanceMeasure {
        name: 'sorting Number type with custom sorter',
        entryType: 'measure',
        startTime: 280.4722999930382,
        duration: 0.2400999665260315,
        detail: null
    }
]

Sorting the Date type, the simplest use case.

Let's try to sort the items in Date type and see what will happen.

let l2 = [];
let l2_2 = [];

for (let i2 = 0; i2 < 1000; i2++) {
    let dob = chance.birthday();
    l2.push(dob);
    l2_2.push(dob);
}

performance.mark('a1');
l2.sort();
performance.mark('a2');
performance.measure('sorting Date type with default sorter', 'a1', 'a2');

performance.mark('a1');
l2_2.sort((a, b) => a.getTime() - b.getTime());
performance.mark('a2');
performance.measure('sorting Date type with custom sorter', 'a1', 'a2');

console.log(performance.getEntriesByType('measure'));

Performance output will look something like this. It seems that sorting the Date with a custom compare function runs faster. Does it surprise you?

[
    PerformanceMeasure {
        name: 'sorting Date type with default sorter',
        entryType: 'measure',
        startTime: 312.19940000772476,
        duration: 34.29030001163483,
        detail: null
    },
    PerformanceMeasure {
        name: 'sorting Date type with custom sorter',
        entryType: 'measure',
        startTime: 346.7824000120163,
        duration: 0.589199960231781,
        detail: null
    }
]

As mentioned earlier, if the compare function has not been specified, the Date type will be converted to string (by calling toString()) and then run the comparison. That's why sorting the Date type without the compare function will be slower.

Sorting String type using Intl.Collator()

Sorting string type sounds intuitive but sorting the string type in case insensitive way will require extra codes. By default, it sorts the items with low level bit comparison. Here's the explanation,

let l3 = ['a1', 'A9'];
l3.sort();
console.log(l3);

/*
output (it is NOT sorted in case insensitive way):

    [ 'A9', 'a1' ]
*/

To sort the items in case insensitive ways, we use Intl.Collator() to help in the compare function.

let collator = new Intl.Collator('en', { numeric: true, sensitivity: 'base' });
let l3_2 = ['a1', 'A9'];
l3_2.sort((a, b) => collator.compare(a, b));
console.log(l3_2);

/*
output (the items are sorted in case insensitive):

    [ 'a1', 'A9' ]
*/

The best part of using a Intl.Collator() to sort a string is that it is able to sort the numbers in string type because we set numeric:true.

let l3_3 = ['31', '3', '30'];
l3_3.sort((a, b) => collator.compare(a, b));
console.log(l3_3);

/*
output:

    [ '3', '30', '31' ]
*/

Let's run some tests on sorting the strings.

let l4 = [];
let l4_2 = [];

for (let i2 = 0; i2 < 1000; i2++) {
    let name = chance.name();
    l4.push(name);
    l4_2.push(name);
}

//-------------------------
performance.mark('a1');
l4.sort();
performance.mark('a2');
performance.measure('sorting String type with default sorter', 'a1', 'a2');

//-------------------------
performance.mark('a1');
l4_2.sort((a, b) => collator.compare(a, b));
performance.mark('a2');
performance.measure('sorting String type with custom sorter', 'a1', 'a2');

console.log(performance.getEntriesByType('measure'));

Performance output - default sorting function is faster than using Intl.Collator() that is because it sorts the items with low level bit comparison while the collactor.compare() is doing an extra process like a == A.

[
    PerformanceMeasure {
        name: 'sorting String type with default sorter',
        entryType: 'measure',
        startTime: 363.6834000349045,
        duration: 0.9098999500274658,
        detail: null
    },
    PerformanceMeasure {
        name: 'sorting String type with custom sorter',
        entryType: 'measure',
        startTime: 364.6313999891281,
        duration: 3.0295000076293945,
        detail: null
    }
]

Sorting object by multiple fields

The above example is sorting the array using the V8 default implementation and it does not sort multiple fields in the object. We will have to implement a compare function that is able to sort an array by multiple fields in the object.

Our standard compare function

First, we need a compare function that helps in comparing the item for the given field name and handles the field type.

/**
* Returns the comparison result for 2 given fields.
* @param {Any} a - an object.
* @param {Any} b - another object.
* @param {String} fa - field name in a.
* @param {String} fb - field name in b.
* @param {Object} [opt] - if this param is missing, `pd` var (declared at the top in this file) will be used.
* @param {Intl.Collator} [opt.collator] - the Collator for comparing the String data.
* @returns {Number}
*/
function comp_a_b(a, b, fa, fb, opt) {
    // read the value once.
    const a2 = a[fa];
    const b2 = b[fb];

    // special handler for undefined & null value before running the comparison.
    if (typeof b2 == 'undefined' 
        || b2 == null
    ) {
        return 1;
    }

    if (typeof a2 == 'undefined' 
        || a2 == null
    ) {
        return -1;
    }

    // compare the easiest type first.
    if (typeof a2 == 'number') {
        return a2 - b2;
    }

    // for string type.
    if (typeof a2 == 'string') {
        if (!opt) {
            opt = {};
        }
        
        if (!opt.collator) {
            // create a collator once and reuse it in the entire sorting process.
            opt.collator = new Intl.Collator('en', { numeric: true, sensitivity: 'base' });
        }
        return opt.collator.compare(a2, b2);
    }

    // for date type - to avoid conversion to String type for comparison. This reduced the duration by half.
    if (a2 instanceof Date) {
        return a2.getTime() - b2.getTime();
    }

    // use the native comparison for other types.
    if (a2 > b2) {
        return 1;
    }

    if (a2 < b2) {
        return -1;
    }
    
    return 0;
}

Second, we need another function in comparing multiple fields for 2 objects.

/**
* The function that compares multiple fields in 2 objects.
* @param {Any} a - an object.
* @param {Any} b - another object.
* @param {string|String[]} [fld] - the field name if a/b is an object. This field may be an array of another string array.
* @param {string|String[]} [fld2] - the field name in b if b is an object. If undefined/null, follows 'fld' param.
* @param {boolean} [desc] - by default, this is 'false' which sorts the array in ascending order. 
* Pass in NULL if the sorting order is different for each field.
* @param {Object} [opt] - if this param is missing, it will slow down the sorting process due to creating Collator() multiple times.
* @param {Intl.Collator} [opt.collator] - the Collator for comparing the String data.
* @returns {Number} - returns -1, 0 or 1.
*/
function arrSort(a, b, fld, fld2, desc, opt) {
    const order = (desc ? -1 : 1);

    // handle undefined & null array item.
    if (typeof a == 'undefined' || a == null) {
        return -1 * order;            
    }
    if (typeof b == 'undefined' || b == null) {
        return 1 * order;
    }
    
    // handle native data type.
    if (typeof fld == 'undefined' || fld == null) {    
        if (a > b
            || typeof b == 'undefined' 
            || b == null
        ) {
            return 1 * order;
        }
        
        if (a < b 
            || typeof a == 'undefined' 
            || a == null
        ) {
            return -1 * order;
        }

        return 0;
    }
    else {
        // handle object type

        // if `fld2` has not been specified, it will follow the same field settings in `a`.
        if (typeof fld2 == 'undefined' || fld2 == null) {
            fld2 = fld;
        }

        if (Array.isArray(fld)) {
            // for sorting by multiple fields.
            let i9 = 0;
            let m = fld.length;
            let fa, fb, order9;

            for (let i = 0; i < m; i++) {
                // decide field a & the sorting order.
                if (Array.isArray(fld[i])) {
                    // each field has a different sorting order.
                    fa = fld[i][0];
                    order9 = (fld[i][1] ? -1 : 1);
                }
                else {
                    fa = fld[i];
                    order9 = order;
                }

                // decide field b.
                if (Array.isArray(fld2[i])) {
                    fb = fld2[i][0];
                }
                else {
                    fb = fld2[i];
                }

                i9 = (comp_a_b(a, b, fa, fb, opt) * order9);

                // stop the comparison if a[fa]<>b[fb]. This will speed up the sorting process.
                if (i9 != 0) {
                    return i9;
                }
            }
            
            return i9;
        }
        else {
            // sort by single field.
            return comp_a_b(a, b, fld, fld2, opt) * order;
        }
    }
}

Now, we are ready to sort the object array by multiple fields. To see if our compare function works as expected, we will create objects with the static value, run the script and review the result.

let obj1 = { id: 1, name: 'A9', is_active: 1, dt: new Date(2001, 0, 1) };
let obj2 = { id: 3, name: 'a3', is_active: 0, dt: new Date(2001, 4, 1)  };
let obj3 = { id: 2, name: 'a1', is_active: 1, dt: new Date(2001, 7, 1)  };
let obj4 = { id: 4, name: 'a2', is_active: 1, dt: new Date(2001, 2, 1)  };

let l5 = [ obj1, obj2, obj3, obj4 ];

let sort_opt = {};
l5.sort((a,b) => arrSort(a, b, 
    [
        // in descending order
        ['is_active', true], 
        // in ascending order
        ['name', false]
    ],
    null, true, sort_opt));

console.log(l5);

/*
Output - sort by 'is_active desc, name'.

[
    { id: 2, name: 'a1', is_active: 1, dt: 2001-07-31T16:00:00.000Z },
    { id: 4, name: 'a2', is_active: 1, dt: 2001-02-28T16:00:00.000Z },
    { id: 1, name: 'A9', is_active: 1, dt: 2000-12-31T16:00:00.000Z },
    { id: 3, name: 'a3', is_active: 0, dt: 2001-04-30T16:00:00.000Z }
]
*/

l5.sort((a,b) => arrSort(a, b, 
    [
        // in descending order
        ['is_active', true], 
        // in ascending order
        ['dt', false],
        ['name', false],
    ],
    null, true, sort_opt));

console.log(l5);

/*
Output - sort by 'is_active desc, dt, name'.

[
    { id: 1, name: 'A9', is_active: 1, dt: 2000-12-31T16:00:00.000Z },
    { id: 4, name: 'a2', is_active: 1, dt: 2001-02-28T16:00:00.000Z },
    { id: 2, name: 'a1', is_active: 1, dt: 2001-07-31T16:00:00.000Z },
    { id: 3, name: 'a3', is_active: 0, dt: 2001-04-30T16:00:00.000Z }
]
*/

Sorting by multiple fields

After we have learned how the sorting process works, we are going to try sorting the object array by multiple fields just like the ORDER BY keyword provided in any SQL RDBMS database.

The object that we are going to create contains a number, string and Date type field. So, this is sufficient in our test on multiple fields of different types.

let l5_2 = [];

for (let i2 = 0; i2 < 1000; i2++) {
    l5_2.push({
        id: chance.age(),
        name: chance.name(),
        is_active: chance.bool() ? 1 : 0,
        dob: chance.birthday()
    });
}

performance.mark('a1');
l5_2.sort((a,b) => arrSort(a, b, 
    [
        // in descending order
        ['is_active', true], 
        // in ascending order
        ['dob', false],
        ['name', false],
    ],
    null, true, sort_opt));
performance.mark('a2');
performance.measure('sorting object on multiple fields (is_active, dob & name) with custom sorter', 'a1', 'a2');

performance.mark('a1');
l5_2.sort((a,b) => arrSort(a, b, 
    [
        // in ascending order
        ['name', false],
        ['dob', false],
        // in descending order
        ['is_active', true], 
    ],
    null, true, sort_opt));
performance.mark('a2');
performance.measure('sorting object on multiple (name, dob & is_active) with custom sorter', 'a1', 'a2');

console.log(performance.getEntriesByType('measure'));

Performance output - this is the most of our concern. From the output, it looks like sorting 1,000 objects by multiple fields is still considered very fast.

[
    PerformanceMeasure {
        name: 'sorting object on multiple fields (is_active, dob & name) with custom sorter',
        entryType: 'measure',
        startTime: 367.33680003881454,
        duration: 7.760299980640411,
        detail: null
    },
    PerformanceMeasure {
        name: 'sorting object on multiple (name, dob & is_active) with custom sorter',
        entryType: 'measure',
        startTime: 375.1438000202179,
        duration: 6.355099976062775,
        detail: null
    }
]

Use case

Conclusion

Array.sort() has provided a sorting process for native data types but it does not sort the multiple fields in an object array. So, it is still left out to the programmer to fill in the space.

References

Jump to #NODEJS blog

Author

Lau Hon Wan, software developer.