Scientific Computing and Numerics (SCAN) Seminar
Track finding and fitting are among the most computationally challenging problems for event reconstruction in particle physics. At CERN, for example, they will become by far the dominant computational problem in the upcoming "high luminosity" era of the Large Hadron Collider (LHC), known as the HL-LHC. Massively parallel processing will be essential due to the extremely high event rate of the HL-LHC. Various track finding techniques have been explored in an effort to expand parallelism, but the most common ones in use today are those based on the Kalman Filter. Significant experience has been accumulated with Kalman Filter techniques in real tracking detector systems; they are known to provide high physics performance, are robust, and are in fact the very ones that have been used in the design of the tracking system for the HL-LHC.
Kalman Filter methods have already been parallelized on a large-grain, per-event basis. However, modern CPU design has been pushing the available parallelism down to much finer-grain scales, due mainly to power density constraints. While processor features continue to shrink in size in accordance with Moore's Law, not all existing code implementations will see the benefit. In order to realize the expected performance/price gains, it will be necessary for algorithms to exploit larger numbers of lightweight cores, together with specialized functional units such as extra-wide vector units. Examples of these lower-power, multi-core processor technologies today include Intel's Xeon Phi and GPGPUs.
In our current investigation, we parallelize the usual Kalman Filter algorithm per track, rather than per event. We find that by grouping multiple tracks into a carefully optimized data structure, and processing them simultaneously within the vector unit associated with a single thread, Kalman-Filter-based track fitting can achieve large speedups on both Intel Xeon and Xeon Phi. We report on our progress towards an end-to-end track reconstruction algorithm that fully exploits vectorization and parallelization techniques in a realistic simulation setup.
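
To illustrate the per-track idea, the following is a minimal sketch of a Kalman Filter measurement update applied to many tracks at once. It is not the data structure or code used in this work. It assumes a toy two-parameter track state (position and slope), a one-dimensional position measurement, and NumPy array operations standing in for the hardware vector unit. The function names `kalman_update_batch` and `kalman_update_single` are hypothetical.

```python
import numpy as np

def kalman_update_batch(x, P, m, r):
    """Measurement update for N tracks at once (toy model, an assumption).

    x: (N, 2) states [position, slope]
    P: (N, 2, 2) state covariances
    m: (N,) measured positions
    r: scalar measurement variance
    The measurement matrix is H = [1, 0], so H P H^T is simply P[:, 0, 0].
    """
    S = P[:, 0, 0] + r                  # innovation variance per track, shape (N,)
    K = P[:, :, 0] / S[:, None]         # Kalman gain P H^T / S, shape (N, 2)
    resid = m - x[:, 0]                 # measurement residual per track
    x_new = x + K * resid[:, None]      # updated states
    HP = P[:, 0, :]                     # H P per track, shape (N, 2)
    P_new = P - K[:, :, None] * HP[:, None, :]   # P - K H P
    return x_new, P_new

def kalman_update_single(x, P, m, r):
    """Reference update for one track using explicit matrix algebra."""
    H = np.array([[1.0, 0.0]])
    S = H @ P @ H.T + r                 # (1, 1)
    K = P @ H.T / S                     # (2, 1)
    x_new = x + (K * (m - H @ x)).ravel()
    P_new = P - K @ H @ P
    return x_new, P_new
```

In the batched version the track index is the leading array axis, so each arithmetic step runs over all tracks with the same instruction stream. That is the property a vector unit needs; the per-track reference function computes the same update one track at a time.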