Tuesday, June 2, 2015

5 Tips for Speeding Up Your Code Using TBB


Intel Threading Building Blocks (TBB) is a library that helps your application use parallelism without your having to learn all the subtleties of threading, and helps you avoid its pitfalls. If you use TBB to speed up your software, below are a few tips to help you on your way.

You don’t always need a blocked_range

A blocked_range in TBB is a template class that’s used to divide (recursively) an array or range of data into smaller pieces so that TBB can then process each piece concurrently. However, you can also control how the division is done by supplying your own range type. You have to provide a simple class D that implements these methods:
  • bool empty() const;
  • bool is_divisible() const;
  • D(D& d, tbb::split);
The empty() function returns true if the range is empty, while is_divisible() returns true if the range can be split into two non-empty sub-ranges. The third method is the splitting constructor: it takes a reference d to an existing D and a dummy argument of type tbb::split, which distinguishes it from the copy constructor. It splits the range into two roughly equal halves, after which d refers to the first half and the newly constructed object to the second half.
The advantage of this is that you’re free to partition the range as you choose. For example, if you are processing image data, a 2D array of pixels could be divided into bands of rows or into tiles.

Use parallel_do

The traditional do-while loop has never been what you’d call a popular construct compared to while and for loops. In my opinion, the do-while loop was a design mistake in C, C++ and other C-related languages. If you compare do-while to Pascal’s repeat-until, the two are similar: both run the statement at least once and then check for loop termination at the end. However, repeat-until(done) terminates in the more logical but opposite sense to do-while(!done), and I think repeat-until is clearer. Especially as in northern England the word while is sometimes used to mean until! “He were there while 11:00 pm”.
In TBB, you can use tbb::parallel_do when the number of iterations isn’t known in advance, as it would have to be with tbb::parallel_for. An additional advantage of parallel_do is that the loop body can take a second parameter of type parallel_do_feeder, which it can use to add additional work items while the loop is running.

Use Concurrent Containers

The C++ STL containers are thread safe only in a limited sense. Simultaneous reads of the same object are safe, as are simultaneous writes to different objects. Simultaneous writes to the same object, however, cause data races and will probably corrupt the STL container.
But Intel TBB comes to the rescue. It provides several concurrent container template classes, specifically concurrent_hash_map, concurrent_vector and concurrent_queue. Locking is done only on the part of the container that needs it, but overall performance is still a bit slower than that of the STL containers, so only use these if a speedup is likely.

Avoid false sharing with cache_aligned_allocator

False sharing occurs when two or more threads running on different processors access different data items that happen to sit in the same cache line, and at least one of them writes to it. Modern processors read data from memory not just a byte, word or quadword at a time but in a fixed-size block called a cache line. When false sharing occurs, a write by one processor makes the other processor’s copy of the cache line stale, and that copy has to be refreshed, which diminishes performance.
Intel TBB provides two allocator template classes similar to std::allocator. If your program does a lot of memory allocation then you would use the scalable_allocator, but to avoid false sharing you’d instead use the cache_aligned_allocator. Two objects allocated separately by this are guaranteed not to share a cache line.

Use Tasks Not Threads

In the .NET world, the Task Parallel Library has made multitasking a lot simpler than managing threads, and with Intel TBB you have the equivalent for C++. Thread programming can be messy and is very easy to get wrong. If you develop tasks instead, you let the task scheduler allocate tasks to threads, and task startup can be up to 100x faster than thread startup on Windows.
It’s worth reading the Catalog of Recommended Task Patterns to understand how best to use tasks.

Conclusion

Hopefully, these tips will give you a flavor of TBB’s power and how you can get the most out of it.
