Exploration of Optimization Techniques

Dmitri Smirnov

Updated

Motivation

  • Reconstruction of particle tracks and vertices is intrinsically an optimization problem searching for best solution in a multidimensional parameter space
  • In the field of particle physics we are constantly searching for fast and efficient reconstruction techniques to cope with growing number of particles produced in high energy collisions

Requirements

  • Applicable to the following class of problems
    • A set of measurements coming from a number of underlying processes that can be (approximately) described by (roughly) known models
  • Minimal tuning of initial parameters, i.e. defaults should satisfy many problems
  • Perform for a wide range of the number of input measurements (orders of magnitude)
  • Auto seeding. Components should be tried (identified) during the fit automatically controlled by a minimal set of parameters
  • Robustness. Converge to global optimum from different initial conditions

EM Algorithm

  • After reviewing the literature we chose to explore the EM algorithm (and its variations) due to the following:
    • Simple implementation
    • Guaranteed to converge
    • Extensible and adoptable, with
  • References:
    • "The Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin, 1977) is an iterative statistical technique for computing maximum likelihood parameter estimates from incomplete data", (DA Variant of the EM Algorithm, Ueda, Nakano, 1995)
    • wiki article

Software Stack

  • For prototyping and quick tests of new ideas we chose to work within Python ecosystem due to its reach codebase for data analysis
      Anticipated dependencies
    • numpy for fast vector and matrix calculations
    • pandas for easy data manipulation
    • plotly for visualization

Development

Release Schedule

  • 0.0.x
    • Initial implementation of EM
    • Layout controls and visualization
    • Sporadic updates while optimizing software stack
    • Tests with toy data samples
  • 0.1.0
    • Automatic seed creation
    • Support multidimensional data $(N>2)$ with arbitrary 2D/(1D?) projections
    • Read user data from files
    • Tests with realistic data

Release Schedule

  • 0.1.x
    • ...
    • Automatic seed merging?
  • 1.x.x
    • C++ implementation