- [ ] Datasets that can be used to evaluate skills
  - [ ] Dataset for developer skills
  - [ ] Dataset for user skills
- [ ] What percentage of datasets will be run per PR
- [ ] When to run complete suite for the skills
- [ ] What are the metrics to capture for evaluation