Sort and search arrays in C / C++ - PART 3

Posted on April 17, 2023 at 16:24 (GMT +00:00) by Colin

Welcome to Sort and search arrays in C / C++ - PART 3

Today I wanted to focus my attention on the searching algorithm. Let's find some integers!

First I will start with the most simple implementation which brute-force iterates through each integer until it finds what it wants.

The code to do this is super simple....

//
// the most basic brute-force index of
//
int array_indexof(int a[], int len, int value) {
    // loop through our items in reverse
    for (int i = len-1; i >= 0; i--) {
        // if we find the value, return the index
        if (a[i] == value) return i;
    }

    // if we get here, we didn't find it, return -1
    return -1;
}

Even though this is not efficient, it is still ridiculously fast, this is because generally the slowest operations are data manipulation. Newer CPU's are fantastic at prefetching steams of data by using branch prediction. If you are not familiar with branch prediction, I really suggest you go search and have a read about how it works.

Regardless, we want to optimise our function, so we will again be using divide and conquer.

First I wanted to give an example of partitioning, this method would be faster than our original but it is completely pointless.

//
// a useless version of our function that partitions the array and searches 1 or the other half
//
int array_indexof2(int a[], int len, int value) {
    int pivot = len/2;

    if ((len > 1) && (a[pivot] > value)) {
        // search the first half of the array
        for (int i = 0; i < pivot; i++) {
            // if we find the value, return the index
            if (a[i] == value) return i;
        }

    } else {
        // search the second half of the array in reverse
        for (int i = pivot; i >= 0; i--) {
            // if we find the value, return the index
            if (a[i] == value) return i;
        }
    }

    // if we get here, we didn't find it, return -1
    return -1;
}

I just wanted to show what is the point of partitioning. Here we can see depending on the middle value of our array, we will search either the lower or the upper half of the array. So with divide and conquer what we want to achieve is to keep dividing the array by the divided (or middle) index.

Writing such a function is actually even more simple than this indexof2 function.

//
// divide and conquer style indexof, also known as binary search
//
int array_dacindexof(int a[], int len, int value) {
    // partition lower index, which starts at the array zero position
    int lower = 0;
    // partition upper index, this is the length of the array -1 (because 0-index)
    int upper = len-1;

    // while the lower index is lower or equal to the upper we keep partitioning and searching
    while (lower <= upper) {
        // we set the current index as the lower + upper and divide it by 2. This gives us a position between the 2 positions.
        int index = (lower + upper) / 2;

        // if by luck the item at index is the value we want, just return
        if (a[index] == value) return index;
        // else if the item at index is lower than our value we will move our partition lower index up to this point + 1
        else if (a[index] < value) lower = index + 1;
        // else we move our partition upper index to our current index - 1 and the while will continue
        else upper = index - 1;
    }
    
    // if we finished the while loop without a result, it doesn't exist, return -1
    return -1;
}

I have commented the code step by step. This is one of the most easiest indexof functions to write.

I have updated the original code main function to run the dac index of 20 times.

int main() {
    // a constant number we can change easily, how large the array will be.
    const int ARRAY_SIZE = 100000;

    // declare our array
    int a[ARRAY_SIZE];

    // call our fill function
    array_fillrand(a, ARRAY_SIZE);
    //int a[] = {5, 1, 15, 9872, 42, 2576, 234, 4, 557, 4, 78, 353464, 32534, 47523, 2};

    // this is useless as we know the size already, but I want to keep it as in another
    // article we may go into sorting dynamic sized arrays
    int len = sizeof(a) / sizeof(int);

    // printing this amount of numbers is crazy, we will modify our print function to print
    // only the first 15 numbers so we can see a snapshot of it working...
    std::cout << "Unsorted: " << std::endl;
    array_print(a, len);

	// get start tick
	int start_tick = clock();

	// sort the array
	// array_heapsort(a, len);
	array_dacsort(a, 0, len-1);

	// get end tick
	int end_tick = clock();

	// output the difference to get total tick count, but we will convert to seconds
	std::cout << "Sort took " << ((float)end_tick - start_tick) / CLOCKS_PER_SEC << " seconds" << std::endl;

	// print sorted array
	array_print(a, len);

	srand(rand());

	for (int c = 0; c < 20; c++) {
        int getindex = rand();
        // let's find our number
        start_tick = clock();
        int index = array_dacindexof(a, len, getindex);
        end_tick = clock();

        std::cout << "Found value " << getindex << " at index: " << index << " in " << ((float)end_tick - start_tick) / CLOCKS_PER_SEC << " seconds" << std::endl;
	}
    return 0;
}

After I finished writing this dac index function, I done some searching to see if anyone had divised a better way of doing it, but it seems this is considered the best way of doing it.

I wanted to just add that an unsorted array, you could only use the linear indexof, however, we could slightly improve this function by removing the count index comparison.

When declaring our array, we add an additional array entry at the end (This could be modified to work with multi threaded read access to the array)

So we update our main to be like so:

// a constant number we can change easily, how large the array will be.
    const int ARRAY_SIZE = 500000;

    // declare our array
    int a[ARRAY_SIZE+1]; // we are going to add 1 extra as our failsafe for unsorted find

    // call our fill function
    array_fillrand(a, ARRAY_SIZE);

Notice our declaration has +1 for an additional array that won't be filled or used in general, except for our linear index of.

//
// a linear index of that has a failsafe value so we don't need to compare the current index
//
int array_indexof_fs(int a[], int len, int value) {
    // this is our failsafe, we add the value in our empty array item so it is 100% sure the while will find the value
    // even if it does not exist in our array
    a[len] = value;

    // starting from the 0-index
    int i = 0;

    // now we can just do a single compare until we find the value, if it doesn't exist we will hit our failsafe
    while (a[i++] != value);

    // if the index is the length of the array we didn't find it else return the actual index
    return (i == len) ? -1 : i;
}

So this version of the indexof will save 1 compare instruction per array item at the cost of 1 mov and a cmp at the end.

And on that note, I will wrap this post up with the full code we have so far.

/*
    ! *****************************************************************************

    MIT License

    Copyright (c) Colin J.D. Stewart. All rights reserved.

    The following license applies to this software and associated documentation 
    files (the "Software"):

    Permission is hereby granted, free of charge, to any person obtaining a copy 
    of this Software, to use, copy, modify, merge, publish, distribute, sublicense, 
    and/or sell copies of the Software, subject to the following conditions:

    1. 	The above copyright notice and this permission notice shall be included 
        in all copies or substantial portions of the Software.
        
    2. 	If you use this Software in a product or service, an attribution in the 
        form of a copyright notice, a link to the license, and a disclaimer of 
        warranties (and any other similar attribution) is required for any 
        distribution, reproduction, or derivative works of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR 
    OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, 
    ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 
    OTHER DEALINGS IN THE SOFTWARE.

    ! ***************************************************************************** 
*/

#include <iostream>
#include <time.h>



//
// divide and conquer quicksort implementation.
//
void array_dacsort(int a[], int lower, int upper) {
    // if lower index is higher or equal to upper just return
    if (lower >= upper) return;

    // let's start to calculate the pivot of the partition starting with the passed upper index of the array
    int pivot = a[upper];
    // we will use "i" as a track index
    int i = lower - 1;

    // loop through our array from lower to upper indexes
    for (int j = lower; j < upper; j++)
        // if the value of the current index is lower than the array pivot, we should increment the track index and swap the values
        if (a[j] < pivot) std::swap(a[++i], a[j]);

    // swap the last upper value with the current track index + 1
    std::swap(a[i+1], a[upper]);

    // now we will do a recursive call to sort the 2 partitioned parts
    array_dacsort(a, lower, i);
    array_dacsort(a, i+2, upper);
}


//
// procedure to re-arrange values obeying heap properties.
//
inline void array_heapify(int a[], int parent, int len) {
    // upper is our current upper maximum index
    int upper = parent;

    // simple while loop
    while (true) {
        // set the left index like a binary tree, we want the left child which is parent * 2 + 1
        // use bitwise instead of multiplication
        int left = (parent << 1) + 1;
        // right is 1 index higher
        int right = left + 1;

        // self explanatory
        if ((left < len) && (a[left] > a[upper])) upper = left;
        if ((right < len) && (a[right] > a[upper])) upper = right;
        if (upper == parent) break;

        std::swap(a[parent], a[upper]);

        parent = upper;
    }
}


//
// our heapsort procedure that is basically the loops calling our re-arrange
//
void array_heapsort(int a[], int len) {
    // make sure we have at least 2 items
    if (len < 2) return;

    // build the heap
    for (int i = (len / 2 - 1); i >= 0; i--)
        array_heapify(a, i, len);

    // sort the heap
    for (int i = len - 1; i >= 0; i--) {
		// swap the root element
        std::swap(a[0], a[i]);
		// re-arrange using our heapify procedure
        array_heapify(a, 0, i);
    }
}

//
// procedure to sort the array
//
void array_sort(int a[], int len) {
    // make sure we have at least 2 items
    if (len < 2) return;

    // let's run through the input array starting from the second item
    for (int i = 1; i < len; i++) {
        // store the value of index "i" for later and so we don't need to keep accessing the array offset
        int value = a[i];
        // j set as the current index and used to go in reverse through the array
        int j = i;

        // while the current index is above the first item and
        // the value of the item before the current index is above our value THEN
        while ((j > 0) && (a[j-1] > value)) {
            // set the value at current index to the previous value
            a[j] = a[j-1];
            // decrement our j index to continue our while condition
            j--;
        }

        // finally set the value of whatever our current index is to the saved value earlier
        a[j] = value;
    }
}



//
// the most basic loop to find the index of a value
//
int array_indexof(int a[], int len, int value) {
    // loop through our items in reverse
    for (int i = len-1; i >= 0; i--) {
        // if we find the value, return the index
        if (a[i] == value) return i;
    }

    // if we get here, we didn't find it, return -1
    return -1;
}


//
// a linear index of that has a failsafe value so we don't need to compare the current index
//
int array_indexof_fs(int a[], int len, int value) {
    // this is our failsafe, we add the value in our empty array item so it is 100% sure the while will find the value
    // even if it does not exist in our array
    a[len] = value;

    // starting from the 0-index
    int i = 0;

    // now we can just do a single compare until we find the value, if it doesn't exist we will hit our failsafe
    while (a[i++] != value);

    // if the index is the length of the array we didn't find it else return the actual index
    return (i == len) ? -1 : i;
}

//
// divide and conquer style indexof, also known as binary search
//
int array_dacindexof(int a[], int len, int value) {
    // partition lower index, which starts at the array zero position
    int lower = 0;
    // partition upper index, this is the length of the array -1 (because 0-index)
    int upper = len-1;

    // while the lower index is lower or equal to the upper we keep partitioning and searching
    while (lower <= upper) {
        // we set the current index as the lower + upper and divide it by 2. This gives us a position between the 2 positions.
        int index = (lower + upper) / 2;

        // if by luck the item at index is the value we want, just return
        if (a[index] == value) return index;
        // else if the item at index is lower than our value we will move our partition lower index up to this point + 1
        else if (a[index] < value) lower = index + 1;
        // else we move our partition upper index to our current index - 1 and the while will continue
        else upper = index - 1;
    }

    // if we finished the while loop without a result, it doesn't exist, return -1
    return -1;
}


//
// print array values
//
void array_print(int a[], int len) {
    // we have modified this to show a max of 15 items
    for (int i = 0; i < ((len > 15) ? 15 : len); i++) {
        std::cout << a[i] << ", ";
    }
    std::cout << std::endl;
}



//
// fill an array with random integers
//
void array_fillrand(int a[], int len) {
    // Set random seed based on current time
    srand(1987);

    // generate random numbers
    for (int i = 0; i < len; i++) {
        a[i] = rand();
    }
}


//
// our starting point
//
int main() {
    // a constant number we can change easily, how large the array will be.
    const int ARRAY_SIZE = 100000;

    // declare our array
    int a[ARRAY_SIZE];

    // call our fill function
    array_fillrand(a, ARRAY_SIZE);
    //int a[] = {5, 1, 15, 9872, 42, 2576, 234, 4, 557, 4, 78, 353464, 32534, 47523, 2};

    // this is useless as we know the size already, but I want to keep it as in another
    // article we may go into sorting dynamic sized arrays
    int len = sizeof(a) / sizeof(int);

    // printing this amount of numbers is crazy, we will modify our print function to print
    // only the first 15 numbers so we can see a snapshot of it working...
    std::cout << "Unsorted: " << std::endl;
    array_print(a, len);

	// get start tick
	int start_tick = clock();

	// sort the array
	// array_heapsort(a, len);
	array_dacsort(a, 0, len-1);

	// get end tick
	int end_tick = clock();

	// output the difference to get total tick count, but we will convert to seconds
	std::cout << "Sort took " << ((float)end_tick - start_tick) / CLOCKS_PER_SEC << " seconds" << std::endl;

	// print sorted array
	array_print(a, len);

	srand(rand());

	for (int c = 0; c < 20; c++) {
        int getindex = rand();
        // let's find our number
        start_tick = clock();
        int index = array_dacindexof(a, len, getindex);
        end_tick = clock();

        std::cout << "Found value " << getindex << " at index: " << index << " in " << ((float)end_tick - start_tick) / CLOCKS_PER_SEC << " seconds" << std::endl;
	}
    return 0;
}

As I noted in the previous article on this subject, I will probably convert this into a c++ library for re-usability and will add sorting of different types of data.

Until next time.....

/Programming /Algorithms