Currently, the "experiment runs" oracle compares the reference measurements with the observed ones by computing element-wise similarity. However, the authors of EgWalker caution that measurements may vary on hardware/software setups that differ from the one used in their original EuroSys'25 paper.
Thus, we need to account for these differences and instead compare the shapes of the two distributions using Pearson correlation, which is insensitive to uniform scaling and constant offsets in the measurements.
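
A minimal sketch of the difference between the two checks, not the oracle's actual code: the function names (`elementwise_match`, `pearson`, `shapes_match`), the 10% tolerance, the 0.9 threshold, and the sample numbers are all assumptions made for illustration.

```python
import math


def elementwise_match(reference, observed, rel_tol=0.1):
    """Element-wise check: every observed value must be within rel_tol of its reference."""
    return len(reference) == len(observed) and all(
        math.isclose(r, o, rel_tol=rel_tol) for r, o in zip(reference, observed)
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / norm


def shapes_match(reference, observed, threshold=0.9):
    """Shape check: accept when the correlation reaches the (assumed) threshold."""
    return pearson(reference, observed) >= threshold


# Hypothetical measurements: the observed machine is roughly 2x slower throughout.
reference = [1.0, 2.0, 4.0, 8.0]
observed = [2.1, 3.9, 8.2, 15.8]

print(elementwise_match(reference, observed))  # absolute values differ too much
print(shapes_match(reference, observed))       # the trend across runs is preserved
```

With these sample numbers the element-wise check rejects the run while the correlation check accepts it, which is the behavior wanted when only the absolute timings shift between setups. The threshold would still need to be tuned against real runs.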