Search Content

Learning Energy Consumption and Demand Models Through Data Mining for Reverse Engineering

Description

The estimation of energy demand (by power plants) has traditionally relied on historical energy use data for the region(s) that a plant produces for. Regression analysis, artificial neural network and Bayesian theory are the most common approaches for analysing these data. Such data and techniques do not generate reliable results.…

The estimation of energy demand (by power plants) has traditionally relied on historical energy use data for the region(s) that a plant produces for. Regression analysis, artificial neural network and Bayesian theory are the most common approaches for analysing these data. Such data and techniques do not generate reliable results. Consequently, excess energy has to be generated to prevent blackout; causes for energy surge are not easily determined; and potential energy use reduction from energy efficiency solutions is usually not translated into actual energy use reduction. The paper highlights the weaknesses of traditional techniques, and lays out a framework to improve the prediction of energy demand by combining energy use models of equipment, physical systems and buildings, with the proposed data mining algorithms for reverse engineering. The research team first analyses data samples from large complex energy data, and then, presents a set of computationally efficient data mining algorithms for reverse engineering. In order to develop a structural system model for reverse engineering, two focus groups are developed that has direct relation with cause and effect variables. The research findings of this paper includes testing out different sets of reverse engineering algorithms, understand their output patterns and modify algorithms to elevate accuracy of the outputs.

ContributorsNaganathan, Hariharan (Author) / Chong, Oswald (Author) / Ye, Long (Author) / Ira A. Fulton School of Engineering (Contributor)

Created2015-12-09

Low-Investment Energy Retrofit Framework for Small and Medium Office Buildings

Description

Small and medium office buildings consume a significant parcel of the U.S. building stock energy consumption. Still, owners lack resources and experience to conduct detailed energy audits and retrofit analysis. We present an eight-steps framework for an energy retrofit assessment in small and medium office buildings. Through a bottom-up approach…

Small and medium office buildings consume a significant parcel of the U.S. building stock energy consumption. Still, owners lack resources and experience to conduct detailed energy audits and retrofit analysis. We present an eight-steps framework for an energy retrofit assessment in small and medium office buildings. Through a bottom-up approach and a web-based retrofit toolkit tested on a case study in Arizona, this methodology was able to save about 50% of the total energy consumed by the case study building, depending on the adopted measures and invested capital. While the case study presented is a deep energy retrofit, the proposed framework is effective in guiding the decision-making process that precedes any energy retrofit, deep or light.

ContributorsRios, Fernanda (Author) / Parrish, Kristen (Author) / Chong, Oswald (Author) / Ira A. Fulton School of Engineering (Contributor)

Created2016-05-20

A Non-Stationary Analysis Using Ensemble Empirical Mode Decomposition to Detect Anomalies in Building Energy Consumption

Description

Commercial buildings’ consumption is driven by multiple factors that include occupancy, system and equipment efficiency, thermal heat transfer, equipment plug loads, maintenance and operational procedures, and outdoor and indoor temperatures. A modern building energy system can be viewed as a complex dynamical system that is interconnected and influenced by external…

Commercial buildings’ consumption is driven by multiple factors that include occupancy, system and equipment efficiency, thermal heat transfer, equipment plug loads, maintenance and operational procedures, and outdoor and indoor temperatures. A modern building energy system can be viewed as a complex dynamical system that is interconnected and influenced by external and internal factors. Modern large scale sensor measures some physical signals to monitor real-time system behaviors. Such data has the potentials to detect anomalies, identify consumption patterns, and analyze peak loads. The paper proposes a novel method to detect hidden anomalies in commercial building energy consumption system. The framework is based on Hilbert-Huang transform and instantaneous frequency analysis. The objectives are to develop an automated data pre-processing system that can detect anomalies and provide solutions with real-time consumption database using Ensemble Empirical Mode Decomposition (EEMD) method. The finding of this paper will also include the comparisons of Empirical mode decomposition and Ensemble empirical mode decomposition of three important type of institutional buildings.

ContributorsNaganathan, Hariharan (Author) / Chong, Oswald (Author) / Huang, Zigang (Author) / Cheng, Ying (Author) / Ira A. Fulton School of Engineering (Contributor)

Created2016-05-20

Semi-Supervised Energy Modeling (SSEM) for Building Clusters Using Machine Learning Techniques

Description

There are many data mining and machine learning techniques to manage large sets of complex energy supply and demand data for building, organization and city. As the amount of data continues to grow, new data analysis methods are needed to address the increasing complexity. Using data from the energy loss…

There are many data mining and machine learning techniques to manage large sets of complex energy supply and demand data for building, organization and city. As the amount of data continues to grow, new data analysis methods are needed to address the increasing complexity. Using data from the energy loss between the supply (energy production sources) and demand (buildings and cities consumption), this paper proposes a Semi-Supervised Energy Model (SSEM) to analyse different loss factors for a building cluster. This is done by deep machine learning by training machines to semi-supervise the learning, understanding and manage the process of energy losses. Semi-Supervised Energy Model (SSEM) aims at understanding the demand-supply characteristics of a building cluster and utilizes the confident unlabelled data (loss factors) using deep machine learning techniques. The research findings involves sample data from one of the university campuses and presents the output, which provides an estimate of losses that can be reduced. The paper also provides a list of loss factors that contributes to the total losses and suggests a threshold value for each loss factor, which is determined through real time experiments. The conclusion of this paper provides a proposed energy model that can provide accurate numbers on energy demand, which in turn helps the suppliers to adopt such a model to optimize their supply strategies.

ContributorsNaganathan, Hariharan (Author) / Chong, Oswald (Author) / Chen, Xue-wen (Author) / Ira A. Fulton Schools of Engineering (Contributor)

Created2015-09-14

Data Mining of High Density Genomic Variant Data for Prediction of Alzheimer's Disease Risk

Description

Background: The discovery of genetic associations is an important factor in the understanding of human illness to derive disease pathways. Identifying multiple interacting genetic mutations associated with disease remains challenging in studying the etiology of complex diseases. And although recently new single nucleotide polymorphisms (SNPs) at genes implicated in immune response,…

Background: The discovery of genetic associations is an important factor in the understanding of human illness to derive disease pathways. Identifying multiple interacting genetic mutations associated with disease remains challenging in studying the etiology of complex diseases. And although recently new single nucleotide polymorphisms (SNPs) at genes implicated in immune response, cholesterol/lipid metabolism, and cell membrane processes have been confirmed by genome-wide association studies (GWAS) to be associated with late-onset Alzheimer's disease (LOAD), a percentage of AD heritability continues to be unexplained. We try to find other genetic variants that may influence LOAD risk utilizing data mining methods.

Methods: Two different approaches were devised to select SNPs associated with LOAD in a publicly available GWAS data set consisting of three cohorts. In both approaches, single-locus analysis (logistic regression) was conducted to filter the data with a less conservative p-value than the Bonferroni threshold; this resulted in a subset of SNPs used next in multi-locus analysis (random forest (RF)). In the second approach, we took into account prior biological knowledge, and performed sample stratification and linkage disequilibrium (LD) in addition to logistic regression analysis to preselect loci to input into the RF classifier construction step.

Results: The first approach gave 199 SNPs mostly associated with genes in calcium signaling, cell adhesion, endocytosis, immune response, and synaptic function. These SNPs together with APOE and GAB2 SNPs formed a predictive subset for LOAD status with an average error of 9.8% using 10-fold cross validation (CV) in RF modeling. Nineteen variants in LD with ST5, TRPC1, ATG10, ANO3, NDUFA12, and NISCH respectively, genes linked directly or indirectly with neurobiology, were identified with the second approach. These variants were part of a model that included APOE and GAB2 SNPs to predict LOAD risk which produced a 10-fold CV average error of 17.5% in the classification modeling.

Conclusions: With the two proposed approaches, we identified a large subset of SNPs in genes mostly clustered around specific pathways/functions and a smaller set of SNPs, within or in proximity to five genes not previously reported, that may be relevant for the prediction/understanding of AD.

ContributorsBriones, Natalia (Author) / Dinu, Valentin (Author) / College of Health Solutions (Contributor)

Created2012-01-25

Differential Expression of MicroRNAs as Predictors of Glioblastoma Phenotypes

Description

Background: Glioblastoma is the most aggressive primary central nervous tumor and carries a very poor prognosis. Invasion precludes effective treatment and virtually assures tumor recurrence. In the current study, we applied analytical and bioinformatics approaches to identify a set of microRNAs (miRs) from several different human glioblastoma cell lines that exhibit…

Background: Glioblastoma is the most aggressive primary central nervous tumor and carries a very poor prognosis. Invasion precludes effective treatment and virtually assures tumor recurrence. In the current study, we applied analytical and bioinformatics approaches to identify a set of microRNAs (miRs) from several different human glioblastoma cell lines that exhibit significant differential expression between migratory (edge) and migration-restricted (core) cell populations. The hypothesis of the study is that differential expression of miRs provides an epigenetic mechanism to drive cell migration and invasion.

Results: Our research data comprise gene expression values for a set of 805 human miRs collected from matched pairs of migratory and migration-restricted cell populations from seven different glioblastoma cell lines. We identified 62 down-regulated and 2 up-regulated miRs that exhibit significant differential expression in the migratory (edge) cell population compared to matched migration-restricted (core) cells. We then conducted target prediction and pathway enrichment analysis with these miRs to investigate potential associated gene and pathway targets. Several miRs in the list appear to directly target apoptosis related genes. The analysis identifies a set of genes that are predicted by 3 different algorithms, further emphasizing the potential validity of these miRs to promote glioblastoma.

Conclusions: The results of this study identify a set of miRs with potential for decreased expression in invasive glioblastoma cells. The verification of these miRs and their associated targeted proteins provides new insights for further investigation into therapeutic interventions. The methodological approaches employed here could be applied to the study of other diseases to provide biomedical researchers and clinicians with increased opportunities for therapeutic interventions.

ContributorsBradley, Barrie (Author) / Loftus, Joseph C. (Author) / Mielke, Clinton (Author) / Dinu, Valentin (Author) / College of Health Solutions (Contributor)

Created2014-01-18

Next-Generation Sequencing Methylation Profiling of Subjects With Obesity Identifies Novel Gene Changes

Description

Background: Obesity is a metabolic disease caused by environmental and genetic factors. However, the epigenetic mechanisms of obesity are incompletely understood. The aim of our study was to investigate the role of skeletal muscle DNA methylation in combination with transcriptomic changes in obesity.

Results: Muscle biopsies were obtained basally from lean (n = 12; BMI = 23.4 ± 0.7…

Background: Obesity is a metabolic disease caused by environmental and genetic factors. However, the epigenetic mechanisms of obesity are incompletely understood. The aim of our study was to investigate the role of skeletal muscle DNA methylation in combination with transcriptomic changes in obesity.

Results: Muscle biopsies were obtained basally from lean (n = 12; BMI = 23.4 ± 0.7 kg/m[superscript 2]) and obese (n = 10; BMI = 32.9 ± 0.7 kg/m[superscript 2]) participants in combination with euglycemic-hyperinsulinemic clamps to assess insulin sensitivity. We performed reduced representation bisulfite sequencing (RRBS) next-generation methylation and microarray analyses on DNA and RNA isolated from vastus lateralis muscle biopsies. There were 13,130 differentially methylated cytosines (DMC; uncorrected P < 0.05) that were altered in the promoter and untranslated (5' and 3'UTR) regions in the obese versus lean analysis. Microarray analysis revealed 99 probes that were significantly (corrected P < 0.05) altered. Of these, 12 genes (encompassing 22 methylation sites) demonstrated a negative relationship between gene expression and DNA methylation. Specifically, sorbin and SH3 domain containing 3 (SORBS3) which codes for the adapter protein vinexin was significantly decreased in gene expression (fold change −1.9) and had nine DMCs that were significantly increased in methylation in obesity (methylation differences ranged from 5.0 to 24.4 %). Moreover, differentially methylated region (DMR) analysis identified a region in the 5'UTR (Chr.8:22,423,530–22,423,569) of SORBS3 that was increased in methylation by 11.2 % in the obese group. The negative relationship observed between DNA methylation and gene expression for SORBS3 was validated by a site-specific sequencing approach, pyrosequencing, and qRT-PCR. Additionally, we performed transcription factor binding analysis and identified a number of transcription factors whose binding to the differentially methylated sites or region may contribute to obesity.

Conclusions: These results demonstrate that obesity alters the epigenome through DNA methylation and highlights novel transcriptomic changes in SORBS3 in skeletal muscle.

ContributorsDay, Samantha (Author) / Coletta, Rich (Author) / Kim, Joon Young (Author) / Campbell, Latoya (Author) / Benjamin, Tonya R. (Author) / Roust, Lori R. (Author) / De Filippis, Elena A. (Author) / Dinu, Valentin (Author) / Shaibi, Gabriel (Author) / Mandarino, Lawrence J. (Author) / Coletta, Dawn (Author) / College of Liberal Arts and Sciences (Contributor)

Created2016-07-18

BitTorious Volunteer: Server-Side Extensions for Centrally-Managed Volunteer Storage in BitTorrent Swarms

Description

Background: Our publication of the BitTorious portal [1] demonstrated the ability to create a privatized distributed data warehouse of sufficient magnitude for real-world bioinformatics studies using minimal changes to the standard BitTorrent tracker protocol. In this second phase, we release a new server-side specification to accept anonymous philantropic storage donations by…

Background: Our publication of the BitTorious portal [1] demonstrated the ability to create a privatized distributed data warehouse of sufficient magnitude for real-world bioinformatics studies using minimal changes to the standard BitTorrent tracker protocol. In this second phase, we release a new server-side specification to accept anonymous philantropic storage donations by the general public, wherein a small portion of each user’s local disk may be used for archival of scientific data. We have implementated the server-side announcement and control portions of this BitTorrent extension into v3.0.0 of the BitTorious portal, upon which compatible clients may be built.

Results: Automated test cases for the BitTorious Volunteer extensions have been added to the portal’s v3.0.0 release, supporting validation of the “peer affinity” concept and announcement protocol introduced by this specification. Additionally, a separate reference implementation of affinity calculation has been provided in C++ for informaticians wishing to integrate into libtorrent-based projects.

Conclusions: The BitTorrent “affinity” extensions as provided in the BitTorious portal reference implementation allow data publishers to crowdsource the extreme storage prerequisites for research in “big data” fields. With sufficient awareness and adoption of BitTorious Volunteer-based clients by the general public, the BitTorious portal may be able to provide peta-scale storage resources to the scientific community at relatively insignificant financial cost.

ContributorsLee, Preston (Author) / Dinu, Valentin (Author) / College of Health Solutions (Contributor)

Created2015-11-04

BitTorious: Global Controlled Genomics Data Publication, Research, and Archiving Via BitTorrent Extensions

Description

Background: Centralized silos of genomic data are architecturally easier to initially design, develop and deploy than distributed models. However, as interoperability pains in EHR/EMR, HIE and other collaboration-centric life sciences domains have taught us, the core challenge of networking genomics systems is not in the construction of individual silos, but the…

Background: Centralized silos of genomic data are architecturally easier to initially design, develop and deploy than distributed models. However, as interoperability pains in EHR/EMR, HIE and other collaboration-centric life sciences domains have taught us, the core challenge of networking genomics systems is not in the construction of individual silos, but the interoperability of those deployments in a manner embracing the heterogeneous needs, terms and infrastructure of collaborating parties. This article demonstrates the adaptation of BitTorrent to private collaboration networks in an authenticated, authorized and encrypted manner while retaining the same characteristics of standard BitTorrent.

Results: The BitTorious portal was sucessfully used to manage many concurrent domestic Bittorrent clients across the United States: exchanging genomics data payloads in excess of 500GiB using the uTorrent client software on Linux, OSX and Windows platforms. Individual nodes were sporadically interrupted to verify the resilience of the system to outages of a single client node as well as recovery of nodes resuming operation on intermittent Internet connections.

Conclusions: The authorization-based extension of Bittorrent and accompanying BitTorious reference tracker and user management web portal provide a free, standards-based, general purpose and extensible data distribution system for large ‘omics collaborations.

ContributorsLee, Preston (Author) / Dinu, Valentin (Author) / College of Health Solutions (Contributor)

Created2014-12-21

Statistical Methods for Analyzing Immunosignatures

Description

Background: Immunosignaturing is a new peptide microarray based technology for profiling of humoral immune responses. Despite new challenges, immunosignaturing gives us the opportunity to explore new and fundamentally different research questions. In addition to classifying samples based on disease status, the complex patterns and latent factors underlying immunosignatures, which we attempt…

Background: Immunosignaturing is a new peptide microarray based technology for profiling of humoral immune responses. Despite new challenges, immunosignaturing gives us the opportunity to explore new and fundamentally different research questions. In addition to classifying samples based on disease status, the complex patterns and latent factors underlying immunosignatures, which we attempt to model, may have a diverse range of applications.

Methods: We investigate the utility of a number of statistical methods to determine model performance and address challenges inherent in analyzing immunosignatures. Some of these methods include exploratory and confirmatory factor analyses, classical significance testing, structural equation and mixture modeling.

Results: We demonstrate an ability to classify samples based on disease status and show that immunosignaturing is a very promising technology for screening and presymptomatic screening of disease. In addition, we are able to model complex patterns and latent factors underlying immunosignatures. These latent factors may serve as biomarkers for disease and may play a key role in a bioinformatic method for antibody discovery.

Conclusion: Based on this research, we lay out an analytic framework illustrating how immunosignatures may be useful as a general method for screening and presymptomatic screening of disease as well as antibody discovery.

ContributorsBrown, Justin (Author) / Stafford, Phillip (Author) / Johnston, Stephen (Author) / Dinu, Valentin (Author) / College of Health Solutions (Contributor)

Created2011-08-19

ASU Scholarship Showcase

Filtering by

Learning Energy Consumption and Demand Models Through Data Mining for Reverse Engineering

Low-Investment Energy Retrofit Framework for Small and Medium Office Buildings

A Non-Stationary Analysis Using Ensemble Empirical Mode Decomposition to Detect Anomalies in Building Energy Consumption

Semi-Supervised Energy Modeling (SSEM) for Building Clusters Using Machine Learning Techniques

Data Mining of High Density Genomic Variant Data for Prediction of Alzheimer's Disease Risk

Differential Expression of MicroRNAs as Predictors of Glioblastoma Phenotypes

Next-Generation Sequencing Methylation Profiling of Subjects With Obesity Identifies Novel Gene Changes

BitTorious Volunteer: Server-Side Extensions for Centrally-Managed Volunteer Storage in BitTorrent Swarms

BitTorious: Global Controlled Genomics Data Publication, Research, and Archiving Via BitTorrent Extensions

Statistical Methods for Analyzing Immunosignatures