We currently support mainly coarse-grained crowd-benchmarking and crowd-tuning, i.e. global statistics per whole program or per single main function. Next, we need to extend the repository to keep statistics for multiple functions/kernels within the analyzed program, while keeping track of any optimizations applied at that level (compiler flags, passes, OpenCL/CUDA/OpenMP/MPI parameters, etc.). This will help unify Caffe/TensorFlow/BLAS crowd-benchmarking and crowd-tuning under one engine ...
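
As a rough illustration of what finer-grained statistics could look like, the sketch below keeps execution times per function/kernel and per set of optimizations, rather than one global number per program. It is only a minimal in-memory sketch: the class `ProgramStats`, its methods `record`, `per_kernel` and `best`, and the kernel names and optimization keys in the usage lines are all hypothetical and do not describe the existing repository format.

```python
from collections import defaultdict
from statistics import mean


class ProgramStats:
    """Hypothetical per-kernel statistics container for one analyzed program."""

    def __init__(self, program):
        self.program = program
        # (kernel name, sorted optimization items) -> list of times in seconds
        self._times = defaultdict(list)

    def record(self, kernel, time_s, **optimizations):
        """Store one measurement for a kernel under a given set of optimizations.

        Optimization values are assumed to be hashable (e.g. strings).
        """
        key = (kernel, tuple(sorted(optimizations.items())))
        self._times[key].append(time_s)

    def per_kernel(self):
        """Return {kernel: [summary per optimization set, ...]}."""
        out = defaultdict(list)
        for (kernel, opts), times in self._times.items():
            out[kernel].append({
                "optimizations": dict(opts),
                "runs": len(times),
                "min_s": min(times),
                "mean_s": mean(times),
            })
        return dict(out)

    def best(self, kernel):
        """Return the summary with the lowest minimum time for this kernel."""
        return min(self.per_kernel()[kernel], key=lambda r: r["min_s"])


# Usage with made-up kernel names and timings (illustrative values only).
stats = ProgramStats("caffe-time")
stats.record("conv1", 0.012, compiler_flags="-O2")
stats.record("conv1", 0.010, compiler_flags="-O3")
stats.record("conv1", 0.011, compiler_flags="-O3")
stats.record("gemm", 0.030, opencl_local_work_size="64")
print(stats.best("conv1")["optimizations"])  # prints {'compiler_flags': '-O3'}
```

Keying each measurement by both the kernel and its optimization set is one possible design choice: it keeps results for different flags or runtime parameters separate, so the same structure could in principle serve both benchmarking (summaries per kernel) and tuning (picking the best optimization set per kernel).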