Metrics We Care About

1. MCS minimality

1.1. Big section on effectiveness of timing heuristics

1.2. Deep dive into a handful of cases

2. Scalability

2.1. Scaling graph (already included)

2.2. Back-of-the-envelope calculation of the limits of different designs

2.3. Demonstrate cases where replay and delta debugging (dd) work on very long traces

3. Number/relevance of bugs found

3.1. Find more bugs

3.2. Submit patches after debugging is complete

3.3. Thoroughly document cases where STS was *not* useful, and why

4. Developer time saved

4.1. A/B test on developer time spent

5. Overhead for instrumentation

5.1. Document LOC needed to instrument

5.2. Evaluate the effectiveness and relative complexity of non-determinism mitigation techniques

6. Non-determinism

6.1. Stacked graph of the effectiveness of different techniques for mitigating non-determinism

6.2. MCS size vs. intermediate retries experiment