From Data Collection to Learning from Distributed Data: a Minimum Cost Incentive Mechanism for Private Discrete Distribution Estimation and an Optimal Stopping Approach for Iterative Training in Federated Learning
The first half of this dissertation introduces a minimum cost incentive mechanism for collecting discrete distributed private data for big-data analysis. The goal of an incentive mechanism is to incentivize informative reports and make sure randomization in the reported data does not exceed a target level. It answers two fundamental questions: what is the minimum payment required to incentivize an individual to submit data with quality level $\epsilon$? and what incentive mechanisms can achieve the minimum payment? A lower bound on the minimum amount of payment required for guaranteeing quality level $\epsilon$ is derived. Inspired by the lower bound, our incentive mechanism (WINTALL) first decides a winning answer based on reported data, then pays to individuals whose reported data match the winning answer. The expected payment of WINTALL matches lower bound asymptotically. Real-world experiments on Amazon Mechanical Turk are presented to further illustrate novelty of the principle behind WINTALL.
The second half studies problem of iterative training in Federated Learning. A system with a single parameter server and $M$ client devices is considered for training a predictive learning model with distributed data. The clients communicate with the parameter server using a common wireless channel so each time, only one device can transmit. The training is an iterative process consisting of multiple rounds. Adaptive training is considered where the parameter server decides when to stop/restart a new round, so the problem is formulated as an optimal stopping problem. While this optimal stopping problem is difficult to solve, a modified optimal stopping problem is proposed. Then a low complexity algorithm is introduced to solve the modified problem, which also works for the original problem. Experiments on a real data set shows significant improvements compared with policies collecting a fixed number of updates in each iteration.