As device and voltage scaling cease, ever-increasing performance targets can only be achieved through the design of parallel, heterogeneous architectures. The workloads targeted by these domain-specific architectures must be designed to leverage the strengths of the platform: a task that has proven to be extremely difficult and expensive.
Machine learning has the potential to automate this process by understanding the features of computation that optimize device utilization and throughput.
Unfortunately, applications of this technique have utilized small data-sets and specific feature extraction, limiting the impact of their contributions.
To address this problem I present Dash-Database; a repository of C and C++ programs for software-defined radio applications and its neighboring fields; a methodology for structuring the features of computation using kernels, and a set of evaluation metrics to standardize computation data sets. Dash-Database contributes a general data set that supports machine understanding of computation and standardizes the input corpus utilized for machine learning of computation; currently only a small set of benchmarks and features are being used.
I present an evaluation of Dash-Database using three novel metrics: breadth, depth and richness; and compare its results to a data set largely representative of those used in prior work, indicating a 5x increase in breadth, 40x increase in depth, and a rich set of sample features.
Using Dash-Database, the broader community can work toward a general machine understanding of computation that can automate the design of workloads for domain-specific computation.