I pray that someone can give an answer/some suggestion. All suggestions are welcomed. I start writing c since last week, when python libraries are too slow.
How to do the following very fast without using too much memory on a single cpu?
Input
ar_element
is a 1D array of the elements of all the sets, arranged as [ elements of 0th set, elemenst of 1st set, ..., elements ofnum_set - 1
th set ]. Elements within each set is already sorted from small to large.ar_elemptr[i]
is the index of the 1st element of thei
th set in thear_element
array.
The input is similar to a matrix in compressed sparse row matrix format.
Output
ar_setsize
is the size of all pairwise intersection between the sets. This array has(num_set)*(num_set - 1)/2
elements. It can be in following form: [ size of intersection between 1st set and 0th set, [that bween 2nd and 0th set] [that between 2nd and 1st set], etc].
I will multi-thread later, but that is another question for SO. I heard that the cpu can run multiple instructions at the same time, but I don't now how to take advantage of that.
Thank you for your help.
The benchmark code is at here. I can't post it on stackoverflow because it is a bit too long for a SO post.
Adding your implementation is easy. Just follow the format at the main function, which is at the end of the whole program. Feel free to ask any question about it.
gcc can compile it with gcc -O3 set_intersection.c icc (intel c compiler) can compile it with icc set_intersection.c -lrt When posting the timing it produces, please also state the compiler option you used. Sometimes it can mean 2 times speed difference.
Here is the output that I get (compiled with icc -O3):
~~~~~~~~~~~~~~
~ Small Test ~
~~~~~~~~~~~~~~
Test Parameters
Title Small Test
Verbosity 1
Cycle Makes New Test Data 1
# Cycles per Implementation 2
# Set in Test Data 10
Minimal Set Size 5
Maximal Set Size 10
Max # Consecutive Small Spacing 4
Size of Small Spacing 2
Size of Large Spacing 10
...
~~~ Implementation 2/2: 2 ptr stepping A ~~~
Warning: Timing is not avaliable for verbosity level >= 1
= Cycle 1/2 =
Create new test data
The sets are:
set 0: [3 5 6 8 14]
set 1: [9 11 12 15 17 18 24 25]
set 2: [2 3 5 14 15 17 19 20]
set 3: [10 11 13 14 16 18]
set 4: [2 3 9 11 12]
set 5: [9 10 11 13 22 24 26]
set 6: [1 3 5 7 9 11 21 22 31 33]
set 7: [2 3 4 14 16 18 19]
set 8: [5 6 7 8 15 16]
set 9: [7 9 10 12 21 23 24 25]
The result is:
row 0:
row 1: 0
row 2: 3 2
row 3: 1 2 1
row 4: 1 3 2 1
row 5: 0 3 0 3 2
row 6: 2 2 2 1 3 3
row 7: 2 1 4 3 2 0 1
row 8: 3 1 2 1 0 0 2 1
row 9: 0 4 0 1 2 3 3 0 1
~~~~~~~~~~~~~~
~ Large Test ~
~~~~~~~~~~~~~~
Test Parameters
Title Large Test
Verbosity 0
Cycle Makes New Test Data 1
# Cycles per Implementation 1
# Set in Test Data 1000
Minimal Set Size 2000
Maximal Set Size 6000
Max # Consecutive Small Spacing 10
Size of Small Spacing 200
Size of Large Spacing 7000
Limits of Types
Max value of usigned short 65535
Max value of short 32767
Max value of usigned int 4294967295
Max value of int 2147483647
Max value of usigned long 18446744073709551615
Max value of long 9223372036854775807
Max value of usigned long long 18446744073709551615
Max value of long long 9223372036854775807
Compiler
Compiler Name Intel C
Version String Intel(R) C++ gcc 4.1.2 mode
Version 1600
Build Date(YYYYMMDD) 20160415
Timing
------------------------------------------------
| Implementation |Time (s) per cycle|
------------------------------------------------
|Nothing | 7.881500e-02 |
|2 ptr stepping A | 1.532209e+01 |
------------------------------------------------
My implementation is as below. (Don't laugh out too loud... I am a newbie to C. I spent 4 days writing the benchmark code without doing anything else.)
void pairwise_set_intersect(set_id num_set,
elem_id* ar_elemptr,
element* ar_element,
setsize* ar_setsize)
{
set_id i; // Index for outer loop over the sets
set_id j; // Index for inner loop over the sets
set_id r; // Index of the result array
setsize s; // Counter for the size of intersection between
// a pair of sets.
element e1;// Current element of the 1st set
element e2;// Current element of the 2nd set.
elem_id a; // Index of e1 in the ar_element array
elem_id b; // Index of e2 in the ar_element array
elem_id a_end; // (Index of last element of 1st set) + 1
elem_id b_end; // (Index of last element of 2nd set) + 1
r = 0; // Set index of the result array to one.
for (i = 1; i < num_set; i++) // Outer loop over the sets.
{
for (j = 0; j < i; j++) // Inner loop over the sets.
{
s = 0; // Counting the size of the
// intersection between
// the 1st and the 2nd set.
a = ar_elemptr[i]; // Index of the ar_element array,
// for 1st element of 1st set
b = ar_elemptr[j]; // Index of the ar_element array,
// for 1st element of 2nd set.
a_end = ar_elemptr[i+1];// Ending index of the 1st set
b_end = ar_elemptr[j+1];// Ending index of the 2nd set
// loop for calculating s until all the elements
// of the 1st or the 2nd set have been processed.
while ( (a < a_end) && (b < b_end) )
{
e1 = ar_element[a]; // current element of 1st set
e2 = ar_element[b]; // current element of 2nd set
s += (e1 == e2); // Increment size counter if
// the elements are equal
a += (e1 <= e2); // Increment index to 1st set
// if current element of 1st
// set is less than or equal
// to current element of the
// 2nd set
b += (e1 >= e2); // similar to above.
}
ar_setsize[r] = s; // Put the size of intersection
// into the result array
r += 1; // Increase index of the result
// array
}
}
}
Updates
- Found agner.org/optimize The vector class library seems useful. (6/20/2016)
- Double binary search, as suggested by khaja in Efficient list intersection algorithm. This seems to benefit from the high sparsity of the data, which is about 748/750 most of the time. (6/22/2016)
- simd binary search (6/22/2016)
Aucun commentaire:
Enregistrer un commentaire