• Datum: 11.-12. Oktober 2017
  • Dauer: 9:00 - 17:00 Uhr
  • Ort: Stuttgart, Hotel Steigenberger Graf Zeppelin
  • Sprache: Englisch
  • Referent: Dr. Andrei Alexandrescu
  • Fruehbucher: 1.990,- € zzgl. MwSt. (19%) bei Anmeldung bis 28.07.2017
  • Preis: 2.190,- € zzgl. MwSt. (19%)


This two-day course introduces attendees to a thorough approach to optimization techniques for contemporary computing architectures. It is based on Andrei's career-long experience with tuning performance of various software systems, from Machine Learning research to high-performance libraries to Facebook-scale computing backends.


A large category of applications have no boundaries on desired speed, meaning there's no point of diminishing returns in making code faster. Better speed means less power consumed for the same work, more workload for the same data center expense, better features for the end user, better machine learning, better analytics, and more. Yet information on writing fast code is scant and difficult to find. Software engineering folklore is rife with tales of optimizations. Programmers commonly discuss and argue whether a piece of code is supposed to be faster than another, or what to do to improve the performance of a system small or large. The course teaches systematic, scientific approaches to measuring and improving code performance.


Optimization has received increased attention during the past decade, a trend that is likely to intensify. Serial execution speed has stalled and, after parallelizing what's possible, in any system we have single-thread speed of sheer algorithmic execution as the lasting bottleneck. Systematic algorithmic improvements that work in conjunction with hardware tradeoffs are key to improving performance. Optimization has always been an art, and in particular optimizing code on contemporary hardware has become a task of formidable complexity. This is because modern hardware has a few peculiarities about it that are not sufficiently understood and explored. This class offers a thorough dive in this fascinating world.



Please note: This course is being actively developed. The actual course might contain more topics and slight variations on the topics outlined below.


  • The Art of Benchmarking 
    • Amdahl's and Lhadma's Law
    • The use of intuition

  • Conducting Time Measurements

    • Noise and how to avoid it
    • Quantization noise
    • Background noise
    • Overhead noise
    • Performance counters

  • Baselines

    • Pitfalls in choosing baselines (or failing to)
    • Choosing good baselines
    • Differential timing

  • Strength Reduction

    • Motivation
    • Strength operations hierarchy
    • Running example

  • Minimizing Indirections

    • Motivation
    • Running example

  • Replacing branches with arithmetic

    • Motivation
    • Running example
    • "Too much of a good thing"

  • Eager Computation: Tables vs. Computation 

    • Dynamically initialized static data vs. statically initialized static data
    • Running example

  • Lazy Computation 

    • Memoization
    • Computation vs. Tables
    • Lazy Structuring

  • "The Forgotten Parallelism:" Instruction-Level Parallelism 

    • Relationship with data dependencies
    • Relationship with associativity of operations and other arithmetic underpinnigs
    • Redesigning algorithms for minimizing data dependencies

  • Inlining

    • Beware compiler's most vexing inlining
    • "Dark matter" code
    • I-Cache considerations
    • Controling inlining of cdtors
    • Case study: a custom shared_ptr

      • Beware inline destructors
      • Lazy refcount allocation
      • Skip last decrement
      • Prefer zero
      • Use dedicated allocators
      • Use smaller counters

  • Copy Elision 

    • Elision vs. moving
    • API design to avoid unnecessary work

  • Scalable Use of the STL

    • A few words on hash tables
    • The affix scrambling trick
    • Resizing std::vector

  • Building Structure on Top of Arrays

    • Motivation
    • The rule of "sevenish"
    • Sorted arrays, and improvements thereof
    • Binary heaps
    • Bring to front
    • Randomized bring to front

  • Large Set Operations and Derivatives 

    • Bilinear version
    • Improved version for skewed statistics
    • Galloping search

  • Contention Minimization
    • Work on the side, lock, swap
    • Readers-Writer locks
    • Using R-W locks with the STL

  • Algorithm design topics 
    • Sentinels
    • Vacancies
    • Faster Hoare partition
    • Redesigning Quickselect

      • Better Layout
      • Smaller groups
      • Adaptation


Attendee Profile

This is aimed at C++ programmers who have efficiency of generated code as a primary concern.



The format is a highly interactive lecture. Questions during the lecture are encouraged. Use of laptops for trying out examples is allowed.

Dr. Andrei Alexandrescu
Dr. Andrei Alexandrescu
Dr. Andrei Alexandrescu is a researcher, software engineer, and author.


"The explanations of Andrei led to many "Aha!" moments and made it definitely worth my time."
S. Montellese, Supercomputing Systems AG


"Thanks to Andrei for sharing his knowledge!"
P. Kaczmarczyk, ESG Elektroniksystem - und Logistik GmbH

"Bringing my application to speed of light seems possible after this course."
C. Lang, Supercomputing Systems AG


+49 (0)711 138183-0


AGB für Seminare