Error: Get Sizes of Pairwise Intersection between Sorted Sets Very Efficiently

I pray that someone can give an answer/some suggestion. All suggestions are welcomed. I start writing c since last week, when python libraries are too slow.

How to do the following very fast without using too much memory on a single cpu?

Input

ar_element is a 1D array of the elements of all the sets, arranged as [ elements of 0th set, elemenst of 1st set, ..., elements of num_set - 1th set ]. Elements within each set is already sorted from small to large.
ar_elemptr[i] is the index of the 1st element of the ith set in the ar_element array.

The input is similar to a matrix in compressed sparse row matrix format.

Output

ar_setsize is the size of all pairwise intersection between the sets. This array has (num_set)*(num_set - 1)/2 elements. It can be in following form: [ size of intersection between 1st set and 0th set, [that bween 2nd and 0th set] [that between 2nd and 1st set], etc].

I will multi-thread later, but that is another question for SO. I heard that the cpu can run multiple instructions at the same time, but I don't now how to take advantage of that.

Thank you for your help.

The benchmark code is at here. I can't post it on stackoverflow because it is a bit too long for a SO post.

Adding your implementation is easy. Just follow the format at the main function, which is at the end of the whole program. Feel free to ask any question about it.

gcc can compile it with gcc -O3 set_intersection.c icc (intel c compiler) can compile it with icc set_intersection.c -lrt When posting the timing it produces, please also state the compiler option you used. Sometimes it can mean 2 times speed difference.

Here is the output that I get (compiled with icc -O3):

~~~~~~~~~~~~~~
~ Small Test ~
~~~~~~~~~~~~~~

    Test Parameters

        Title                                   Small Test
        Verbosity                               1         
        Cycle Makes New Test Data               1         
        # Cycles per Implementation             2         
        # Set in Test Data                      10        
        Minimal Set Size                        5         
        Maximal Set Size                        10        
        Max # Consecutive Small Spacing         4         
        Size of Small Spacing                   2         
        Size of Large Spacing                   10        

...

    ~~~ Implementation 2/2: 2 ptr stepping A ~~~

        Warning: Timing is not avaliable for verbosity level >= 1

        = Cycle 1/2 =

        Create new test data

        The sets are:

            set 0: [3 5 6 8 14]
            set 1: [9 11 12 15 17 18 24 25]
            set 2: [2 3 5 14 15 17 19 20]
            set 3: [10 11 13 14 16 18]
            set 4: [2 3 9 11 12]
            set 5: [9 10 11 13 22 24 26]
            set 6: [1 3 5 7 9 11 21 22 31 33]
            set 7: [2 3 4 14 16 18 19]
            set 8: [5 6 7 8 15 16]
            set 9: [7 9 10 12 21 23 24 25]

        The result is:

            row 0:
            row 1:  0 
            row 2:  3 2 
            row 3:  1 2 1 
            row 4:  1 3 2 1 
            row 5:  0 3 0 3 2 
            row 6:  2 2 2 1 3 3 
            row 7:  2 1 4 3 2 0 1 
            row 8:  3 1 2 1 0 0 2 1 
            row 9:  0 4 0 1 2 3 3 0 1 



~~~~~~~~~~~~~~
~ Large Test ~
~~~~~~~~~~~~~~

    Test Parameters

        Title                                   Large Test
        Verbosity                               0         
        Cycle Makes New Test Data               1         
        # Cycles per Implementation             1         
        # Set in Test Data                      1000      
        Minimal Set Size                        2000      
        Maximal Set Size                        6000      
        Max # Consecutive Small Spacing         10        
        Size of Small Spacing                   200       
        Size of Large Spacing                   7000      

    Limits of Types

        Max value of usigned short              65535     
        Max value of short                      32767     
        Max value of usigned int                4294967295
        Max value of int                        2147483647
        Max value of usigned long               18446744073709551615
        Max value of long                       9223372036854775807
        Max value of usigned long long          18446744073709551615
        Max value of long long                  9223372036854775807

    Compiler

        Compiler Name                           Intel C   
        Version String                          Intel(R) C++ gcc 4.1.2 mode
        Version                                 1600      
        Build Date(YYYYMMDD)                    20160415  

    Timing

        ------------------------------------------------
        |      Implementation       |Time (s) per cycle|
        ------------------------------------------------
        |Nothing                    |   7.881500e-02   |
        |2 ptr stepping A           |   1.532209e+01   |
        ------------------------------------------------

My implementation is as below. (Don't laugh out too loud... I am a newbie to C. I spent 4 days writing the benchmark code without doing anything else.)

void pairwise_set_intersect(set_id   num_set,
                            elem_id* ar_elemptr,
                            element* ar_element,
                            setsize* ar_setsize)
{

    set_id  i; // Index for outer loop over the sets
    set_id  j; // Index for inner loop over the sets
    set_id  r; // Index of the result array
    setsize s; // Counter for the size of intersection between
               // a pair of sets.

    element e1;// Current element of the 1st set
    element e2;// Current element of the 2nd set.
    elem_id a;     // Index of e1 in the ar_element array
    elem_id b;     // Index of e2 in the ar_element array
    elem_id a_end; // (Index of last element of 1st set) + 1
    elem_id b_end; // (Index of last element of 2nd set) + 1

    r = 0;     // Set index of the result array to one.
    for (i = 1; i < num_set; i++)   // Outer loop over the sets.
    {
        for (j = 0; j < i; j++)     // Inner loop over the sets.
        {
            s = 0;                  // Counting the size of the
                                    // intersection between
                                    // the 1st and the 2nd set.

            a     = ar_elemptr[i];  // Index of the ar_element array,
                                    // for 1st element of 1st set

            b     = ar_elemptr[j];  // Index of the ar_element array,
                                    // for 1st element of 2nd set.

            a_end = ar_elemptr[i+1];// Ending index of the 1st set

            b_end = ar_elemptr[j+1];// Ending index of the 2nd set

            // loop for calculating s until all the elements
            // of the 1st or the 2nd set have been processed.
            while ( (a < a_end) && (b < b_end) )
            {
                e1 = ar_element[a]; // current element of 1st set
                e2 = ar_element[b]; // current element of 2nd set

                s += (e1 == e2);    // Increment size counter if
                                    // the elements are equal

                a += (e1 <= e2);    // Increment index to 1st set
                                    // if current element of 1st
                                    // set is less than or equal
                                    // to current element of the
                                    // 2nd set

                b += (e1 >= e2);    // similar to above.
            }

            ar_setsize[r] = s;      // Put the size of intersection
                                    // into the result array
            r += 1;                 // Increase index of the result
                                    // array
        }
    }

}

Updates

Found agner.org/optimize The vector class library seems useful. (6/20/2016)
Double binary search, as suggested by khaja in Efficient list intersection algorithm. This seems to benefit from the high sparsity of the data, which is about 748/750 most of the time. (6/22/2016)
simd binary search (6/22/2016)

Error

vendredi 24 juin 2016

Get Sizes of Pairwise Intersection between Sorted Sets Very Efficiently

Aucun commentaire:

Enregistrer un commentaire