Saturday, June 25, 2016

Median of Medians algorithm misunderstanding?

What I understand already

I understand that the median of medians algorithm (which I will denote MoM) is an O(N) algorithm with a high constant factor. It splits the input into groups of k elements (usually 5), finds the median of each group, and then recursively finds the median of those medians. Using that value as the pivot guarantees that the pivot's rank lies between roughly 3n/10 and 7n/10, where n is the number of elements in the set being partitioned.
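As a rough check of the 3n/10 and 7n/10 figures, here is the usual counting argument written out. It ignores rounding and the last partial group, and the constant c (the linear work per call for grouping and partitioning) is an assumed symbol, not something from the tutorial.

```latex
% Half of the n/5 group medians are <= the pivot p, and each of those
% groups contributes 3 elements (the median and the 2 below it) that are <= p:
\#\{x : x \le p\} \;\ge\; 3 \cdot \frac{1}{2} \cdot \frac{n}{5} \;=\; \frac{3n}{10}
% By symmetry at least 3n/10 elements are >= p, so either side of the
% partition holds at most 7n/10 elements. The running time then satisfies
T(n) \;\le\; T\!\left(\frac{n}{5}\right) + T\!\left(\frac{7n}{10}\right) + cn
% Guessing T(n) <= 10cn and substituting:
T(n) \;\le\; 10c \cdot \frac{n}{5} + 10c \cdot \frac{7n}{10} + cn \;=\; 2cn + 7cn + cn \;=\; 10cn
```

The guess closes only because 1/5 + 7/10 < 1, which is also where the high constant factor comes from.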

I keep getting a segmentation fault when I run this code for MoM, but I'm not sure why. I've debugged it and believe the issue lies in the call medianOfMedian(medians, 0, medians.size()-1, medians.size()/2);. However, I thought this was logically sound, since the function is supposed to find the median recursively by calling itself. Perhaps my base case isn't correct? In a tutorial by YogiBearian on YouTube (a Stanford professor, link: https://www.youtube.com/watch?v=YU1HfMiJzwg ), he did not mention any extra base case for the recursive call on the N/5 medians in MoM.

Complete Code

Note: Per suggestions, I have added a base case and switched to the vector's .at() function for element access.

static const int GROUP_SIZE = 5;
/* Helper function for m of m. This function divides the array into chunks of 5 
 * and finds the median of each group and puts it into a vector to return.
 * The last group will be sorted and the median will be found despite its uneven size.
 */
vector<int> findMedians(vector<int>& vec, int start, int end){
    vector<int> medians;
    for(int i = start; i <= end; i+= GROUP_SIZE){
        std::sort(vec.begin()+i, min(vec.begin()+i+GROUP_SIZE, vec.end()));
        medians.push_back(vec.at(min(i + (GROUP_SIZE/2), (i + end)/2)));
    }
    return medians;
}

/* Job is to partition the array into chunks of 5(subject to change via const)
 * And then find the median of them. Do this recursively using select as well.
 */
int medianOfMedian(vector<int>& vec, int start, int end, int k){
    /* Acquire the medians of the 5-groups */
    vector<int> medians = findMedians(vec, start, end);

    /* Find the median of this */
    int pivotVal;
    if(medians.size() == 1)
        pivotVal = medians.at(0);
    else
        pivotVal = medianOfMedian(medians, 0, medians.size()-1, medians.size()/2);

    /* Stealing a page from select() ... */
    int pivot = partitionHelper(vec, pivotVal, start, end);

    cout << "After pivoting with the value " << pivot << " we get : " << endl;
    for(int i = start; i < end; i++){
        cout << vec.at(i) << ", ";
    }
    cout << "\n\n" << endl;
    usleep(10000);
    int length = pivot - start + 1;
    if(k < length){
        return medianOfMedian(vec, k, start, pivot-1);
    }
    else if(k == length){
        return vec[k];
    }
    else{
        return medianOfMedian(vec, k-length, pivot+1, end);
    }

}

Some extra functions for helping unit test

Here are some unit tests that I wrote for these 2 functions. Hopefully they help.

/* Builds a vector of `size` pseudo-random values in the range [0, mod) */
vector<int> initialize(int size, int mod){
    vector<int> vec(size);
    for(int i = 0; i < size; i++){
        vec[i] = rand() % mod;
    }
    return vec;
}

/* Unit test for findMedians */
void testFindMedians(){
    const int SIZE = 36;
    const int MOD = 20;
    vector<int> vec = initialize(SIZE, MOD);
    for(int i = 0; i < SIZE; i++){
        cout << vec[i] << ", ";
    }
    cout << "\n\n" << endl;

    vector<int> medians = findMedians(vec, 0, SIZE-1);

    cout << "The 5-sorted version: " << endl;
    for(int i = 0; i < SIZE; i++){
        cout << vec[i] << ", ";
    }
    cout << "\n\n" << endl;

    cout << "The medians extracted: " << endl;
    for(int i = 0; i < medians.size(); i++){
        cout << medians[i] << ", ";
    }
    cout << "\n\n" << endl;
}

/* Unit test for medianOfMedian */
void testMedianOfMedian(){
    const int SIZE = 30;
    const int MOD = 70;
    vector<int> vec = initialize(SIZE, MOD);
    cout << "Given array : " << endl;
    for(int i = 0; i < SIZE; i++){
        cout << vec[i] << ", ";
    }
    cout << "\n\n" << endl;
    int median = medianOfMedian(vec, 0, vec.size()-1, vec.size()/2);
    cout << "\n\nThe median is : " << median << endl;

    cout << "As opposed to sorting and then showing the median... : " << endl;
    std::sort(vec.begin(), vec.end());
    cout << "sorted array : " << endl;
    for(int i = 0; i < SIZE; i++){
        if(i == SIZE/2)
            cout << "**";
        cout << vec[i] << ", ";
    }
    cout << "Median : " << vec[SIZE/2] << endl;
}
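The tests above print the arrays for inspection by eye. A possible automated variant is sketched below: it compares a selection function against std::nth_element on random inputs. The helper name agreesWithNthElement and its parameters are hypothetical, and it assumes that k is a 0-based rank within vec[start..end], which may not match the convention the code in this question intends.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical signature for the function under test:
// select(vec, start, end, k) returns the k-th smallest (0-based) of vec[start..end].
using SelectFn = std::function<int(std::vector<int>&, int, int, int)>;

// Runs `trials` random cases and reports whether `select` always agrees
// with std::nth_element. Sizes are in [1, maxSize], values in [0, mod).
bool agreesWithNthElement(const SelectFn& select, int trials, int maxSize, int mod) {
    assert(trials > 0 && maxSize > 0 && mod > 0);
    for (int t = 0; t < trials; ++t) {
        int size = 1 + std::rand() % maxSize;
        std::vector<int> data(size);
        for (int& x : data) x = std::rand() % mod;
        int k = std::rand() % size;

        // Reference answer: after nth_element, position k holds the k-th smallest.
        std::vector<int> expected = data;
        std::nth_element(expected.begin(), expected.begin() + k, expected.end());

        // The function under test gets its own copy, since it may reorder it.
        std::vector<int> work = data;
        if (select(work, 0, size - 1, k) != expected[k]) return false;
    }
    return true;
}
```

Wrapping medianOfMedian in a lambda with this signature would let the same check run over many sizes, including those that are not multiples of 5.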

Extra section about the output that I'm getting

Given array :
7, 49, 23, 48, 20, 62, 44, 8, 43, 29, 20, 65, 42, 62, 7, 33, 37, 39, 60, 52, 53, 19, 29, 7, 50, 3, 69, 58, 56, 65,

After pivoting with the value 5 we get :
23, 29, 39, 42, 43,

After pivoting with the value 0 we get :
39,

Segmentation Fault: 11

Everything seems fine until the segmentation fault. I'm confident that my partition function works as well (it was one of my implementations for the LeetCode question).

Disclaimer: This is not a homework problem, but rather my own curiosity about the algorithm after I used quickSelect in a LeetCode problem set.

Please let me know if my question needs more detail to qualify as an MCVE, thanks!

EDIT: I figured out that the recursive partition scheme in my code is wrong. As Pradhan has pointed out, I somehow end up with empty vectors, which leads to start and end being 0 and -1 respectively, and the segmentation fault comes from the resulting infinite recursion. Still trying to figure this part out.
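For comparison, here is a minimal sketch of an index-based version in which the range can never become empty. It is an illustration under assumptions, not a confirmed fix: selectInPlace and partitionAround are hypothetical names, partitionAround only stands in for partitionHelper (which is not shown here), and k is taken to be a 0-based rank relative to start. The recursive calls pass their arguments in the declared order (vec, start, end, k), whereas the recursive calls in medianOfMedian above pass k in the start position.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for partitionHelper (assumed behavior): moves values smaller than
// pivotVal to the front of vec[start..end] and returns pivotVal's final index.
int partitionAround(std::vector<int>& vec, int pivotVal, int start, int end) {
    int p = start;
    while (vec[p] != pivotVal) ++p;        // pivotVal is known to be in range
    std::swap(vec[p], vec[end]);           // park the pivot at the end
    int store = start;
    for (int i = start; i < end; ++i)
        if (vec[i] < pivotVal) std::swap(vec[i], vec[store++]);
    std::swap(vec[store], vec[end]);       // drop the pivot into its slot
    return store;
}

// Returns the k-th smallest value of vec[start..end]; k is 0-based and
// relative to start (an assumption about the intended convention).
int selectInPlace(std::vector<int>& vec, int start, int end, int k) {
    assert(start <= end && k >= 0 && k <= end - start);
    // Base case: five or fewer elements are sorted directly, so the
    // recursion always terminates on a non-empty range.
    if (end - start + 1 <= 5) {
        std::sort(vec.begin() + start, vec.begin() + end + 1);
        return vec[start + k];
    }
    // Median of each group, with the last group clamped to `end`.
    std::vector<int> medians;
    for (int i = start; i <= end; i += 5) {
        int last = std::min(i + 4, end);
        std::sort(vec.begin() + i, vec.begin() + last + 1);
        medians.push_back(vec[i + (last - i) / 2]);
    }
    int m = static_cast<int>(medians.size());
    int pivotVal = selectInPlace(medians, 0, m - 1, m / 2);

    int pivot = partitionAround(vec, pivotVal, start, end);
    int leftLen = pivot - start;           // elements strictly left of the pivot
    if (k < leftLen) return selectInPlace(vec, start, pivot - 1, k);
    if (k == leftLen) return vec[pivot];
    return selectInPlace(vec, pivot + 1, end, k - leftLen - 1);
}
```

Because the pivot itself is excluded from both recursive ranges, each call works on a strictly smaller, non-empty range. With many duplicate values the split can be less balanced than the 3n/10 bound suggests, but the result stays correct.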
