Matching Items (2)

Description

We propose a new strategy for blackjack, BB-Player, which leverages Hidden Markov Models (HMMs) in online planning to sample a normalized predicted deck distribution for a partially-informed distance heuristic. Viterbi learning is applied to the most-likely sampled future sequence in each game state to generate transition and emission matrices for this upcoming sequence. These are then iteratively updated with each observed game on a given deck. Ultimately, this process informs a heuristic to estimate the true symbolic distance left, which allows BB-Player to determine the action with the highest likelihood of winning (by opponent bust or blackjack) and not going bust. We benchmark this strategy against six common card counting strategies from three separate levels of difficulty and a randomized action strategy. On average, BB-Player is observed to beat card-counting strategies in win optimality, attaining a 30.00% expected win percentage, though it falls short of beating state-of-the-art methods.

Contributors: Lakamsani, Sreeharsha (Author) / Ren, Yi (Thesis director) / Lee, Heewook (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2023-05
Description
Bicycle stabilization has become a popular topic because of its complex dynamic behavior and the large body of bicycle modeling research. Riding a bicycle requires accurately performing several tasks, such as balancing and navigation, which may be difficult for disabled people. Their problems could be partially reduced by providing steering assistance. Many control techniques have been applied to stabilize these highly maneuverable and efficient machines, achieving interesting results but with some limitations, including strict environmental requirements. This thesis expands on the work of Randlov and Alstrom, using reinforcement learning for bicycle self-stabilization with robotic steering. It applies the deep deterministic policy gradient algorithm, which, unlike the Q-learning technique, can handle continuous action spaces. The research involved training the algorithm in virtual environments, followed by simulations to assess its results. Hardware testing was also conducted on Arizona State University’s RISE lab Smart bicycle platform to test its self-balancing performance. A detailed analysis of the bicycle trial runs is presented. Testing was validated by plotting the real-time states and actions collected during outdoor testing, which included the roll angle of the bicycle. Further improvements to model training and hardware testing are also presented.
Contributors: Turakhia, Shubham (Author) / Zhang, Wenlong (Thesis advisor) / Yong, Sze Zheng (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)
Created: 2020