2026-05-24T06:06:42Z https://keep.lib.asu.edu/oai/request

oai:keep.lib.asu.edu:node-202321 2025-08-18T22:22:09Z oai_pmh:all oai_pmh:repo_items

202321 https://hdl.handle.net/2286/R.2.N.202321 http://rightsstatements.org/vocab/InC/1.0/ All Rights Reserved 2025 139 pages Doctoral Dissertation Academic theses en Kumarage, Kumarage Tharindu Sandaruwan Liu, Huan Li, Baoxin Davulcu, Hasan Basharat, Arslan Arizona State University Partial requirement for: Ph.D., Arizona State University, 2025 Field of study: Computer Science
Artificial intelligence (AI), particularly generative AI, has undergone a remarkable transformation with the advent of large language models (LLMs), enabling scalable and human-like content generation. These capabilities have accelerated progress across journalism, education, and scientific discovery. However, the same advances that drive innovation also introduce avenues for misuse, particularly by creating deceptive or harmful content that threatens public trust and the safe deployment of AI technologies. In response to these threats, two complementary directions have emerged: (1) AI-generated content forensics, which offers reactive techniques to identify and attribute synthetic text, and (2) safety training of LLMs, which aims to proactively prevent harmful outputs through alignment techniques. However, both practices face a growing reliability crisis. As LLMs become more sophisticated and adversaries more strategic, existing forensic detectors struggle to maintain accuracy under content manipulation and newly emerging multi-turn threats such as AI-powered social engineering. Moreover, safety training of LLMs often fails to generalize beyond narrow training distributions, remains vulnerable to jailbreak attempts, and can lead to over-refusal of legitimate user queries, thereby reducing utility. In my dissertation, I address these fundamental reliability challenges. In the forensic domain, I have proposed incorporating domain-specific expertise to enhance robustness and accuracy in identifying AI-generated news content.
I further conducted red-teaming evaluations of current forensic detectors, revealing vulnerabilities under adversarial manipulation and instruction tuning. To address increasingly conversational threats, I extended single-turn forensic analysis to multi-turn social engineering attacks, providing new methodologies and datasets that capture attacker intent, victim traits, and dynamic interaction patterns. Finally, in proactive safety training, I have proposed a reasoning-driven alignment approach to improve LLM safety behavior. Rather than overfitting to known threats, this approach enables the creation of safety reasoning data that trains models to reason about the latent harmfulness of prompts, thereby improving robustness and generalization. These contributions offer a path toward more resilient and trustworthy generative AI systems by reinforcing detection and prevention mechanisms against evolving AI misuse threats.
Artificial Intelligence Nanoscience
Towards Trustworthy AI: A Study of Reactive Forensics and Proactive Safety Reasoning