<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-05-19T11:01:49Z</responseDate><request verb="GetRecord" metadataPrefix="oai_dc">https://keep.lib.asu.edu/oai/request</request><GetRecord><record><header><identifier>oai:keep.lib.asu.edu:node-201880</identifier><datestamp>2025-07-17T19:39:31Z</datestamp><setSpec>oai_pmh:all</setSpec><setSpec>oai_pmh:repo_items</setSpec></header><metadata><oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:identifier>201880</dc:identifier>
          <dc:identifier>https://hdl.handle.net/2286/R.2.N.201880</dc:identifier>
                  <dc:rights>http://rightsstatements.org/vocab/InC/1.0/</dc:rights>
          <dc:rights>All Rights Reserved</dc:rights>
                  <dc:date>2025</dc:date>
                  <dc:format>146 pages</dc:format>
                  <dc:type>Doctoral Dissertation</dc:type>
          <dc:type>Academic theses</dc:type>
                  <dc:language>en</dc:language>
                  <dc:contributor>Gupta, Mohit</dc:contributor>
          <dc:contributor>Eiris, Ricardo</dc:contributor>
          <dc:contributor>Czerniawski, Thomas</dc:contributor>
          <dc:contributor>Zeng, Ruijie</dc:contributor>
          <dc:contributor>Arizona State University</dc:contributor>
                  <dc:description>Partial requirement for: Ph.D., Arizona State University, 2025</dc:description>
          <dc:description>Field of study: Civil, Environmental and Sustainable Engineering</dc:description>
          <dc:description>Piping and Instrumentation Diagrams (P&amp;IDs) are critical engineering schematics in the process industry. Extracting information from P&amp;IDs remains challenging due to two key issues: (1) P&amp;IDs are often shared in non-machine-readable formats, rendering them incompatible with CAD tools for searchability and editing; and (2) interpreting topological relationships among mechanical equipment and piping components requires domain expertise, which CAD tools currently do not support. As a result, semantic information extraction remains mostly manual. Prior work used neural networks to automate digitization. However, these approaches relied on small datasets with extensive class-specific labels, making them impractical given the variability in P&amp;ID standards and the hundreds of unique symbols used. Similarly, approaches for semantic information extraction were often narrow in scope and rule-based, limiting flexibility and adaptation.
To address challenges in symbol detection due to variability in P&amp;ID standards and symbol representation, a hybrid framework combining class-agnostic symbol detection with self-supervised differentiation using Siamese Networks is proposed. This approach reduces annotation effort by 2.5× compared to class-specific methods while achieving a Top-5 accuracy of over 95%. A complete end-to-end digitization pipeline is then proposed, detecting symbols, text, and lines while structuring connectivity into a graph-like structure that facilitates interoperability with industrial formats like DEXPI and SFILES 2.0. A novel line detection algorithm is developed, achieving an F1-score of 0.997 with a runtime of 2.1 seconds and outperforming prior pixel-traversal methods that take 40-50 minutes or more. Text detection is performed using a fine-tuned KerasOCR model. The overall time for end-to-end digitization of a single P&amp;ID sheet is 22 seconds, roughly 120× faster than prior methods.
For semantic information extraction, the graph is converted into a Knowledge Graph (KG) enriched with equipment metadata. A question-answering system, powered by Large Language Models (LLMs), extracts information with a tiered context injection strategy, improving accuracy from 12.7% to 99.5% by grounding queries in schema definitions and domain knowledge. Additionally, an open-source PIDQA dataset consisting of 64,000 question-answer pairs is introduced as the first benchmark for industrial KG querying.
By integrating advances in computer vision, graph-based reasoning, and LLMs, this work establishes a scalable, annotation-efficient framework to convert static P&amp;IDs into dynamic, queryable knowledge graphs.</dc:description>
                  <dc:subject>Civil Engineering</dc:subject>
          <dc:subject>Artificial Intelligence</dc:subject>
          <dc:subject>Computer Science</dc:subject>
          <dc:subject>Computer vision</dc:subject>
          <dc:subject>Knowledge Graphs</dc:subject>
          <dc:subject>P&amp;ID</dc:subject>
          <dc:subject>Piping &amp; Instrumentation</dc:subject>
          <dc:subject>Siamese Network</dc:subject>
          <dc:subject>YOLO</dc:subject>
                  <dc:title>Converting Raw P&amp;IDs into Queryable Knowledge Graphs with AI</dc:title></oai_dc:dc></metadata></record></GetRecord></OAI-PMH>
