Matching Items (3)
Description
Recent technological advances enable the collection of complex, heterogeneous, and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates a need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns, and improve decision making. All the proposed methods can generalize to other industrial fields.

The first topic of this dissertation is data clustering. Data clustering is often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner.

The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability.

The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification with emphasis on both accuracy and interpretability. GCRNN contains a convolutional network component to extract high-level features and a recurrent network component to enhance the modeling of temporal characteristics. A feed-forward fully connected network with sparse group lasso regularization is used to generate the final classification and provide good interpretability.
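As a rough illustration of the sparse group lasso regularization mentioned above, the sketch below computes the standard penalty (an elementwise L1 term plus a size-weighted L2 norm per group) for a weight matrix. The abstract does not say how GCRNN defines its groups or sets its coefficients, so the grouping by rows and the names `sparse_group_lasso_penalty`, `lam_l1`, and `lam_group` are assumptions made for this example only.

```python
import numpy as np

def sparse_group_lasso_penalty(weights, groups, lam_l1=0.01, lam_group=0.01):
    """Sparse group lasso penalty for a 2-D weight matrix.

    weights: array of shape (n_inputs, n_units).
    groups:  list of row-index lists; each list is one group (assumed grouping).
    """
    weights = np.asarray(weights, dtype=float)
    # Elementwise L1 term encourages sparsity within groups.
    l1_term = np.abs(weights).sum()
    # Group term: L2 (Frobenius) norm of each group's block, scaled by sqrt(block size),
    # encourages whole groups to be zeroed out together.
    group_term = 0.0
    for idx in groups:
        block = weights[idx, :]
        group_term += np.sqrt(block.size) * np.linalg.norm(block)
    return lam_l1 * l1_term + lam_group * group_term

# Hypothetical example: three input features, two hidden units, two groups.
W = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [3.0, 4.0]])
groups = [[0, 1], [2]]
penalty = sparse_group_lasso_penalty(W, groups, lam_l1=1.0, lam_group=1.0)
print(penalty)
```

In training, a term like this would be added to the classification loss; groups whose weights shrink to zero can then be read as inputs the classifier does not use, which is one way such a penalty supports interpretation.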

The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making, and pattern visualization of time series data. The CRNN autoencoder is proposed not only to achieve low reconstruction error but also to generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.
Contributors: Lin, Sangdi (Author) / Runger, George C. (Thesis advisor) / Kocher, Jean-Pierre A (Committee member) / Pan, Rong (Committee member) / Escobedo, Adolfo R. (Committee member) / Arizona State University (Publisher)
Created: 2018
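Description
After 30 years of development, China's commodity futures market has begun to fulfill its functions of coordinating resource allocation and hedging operating risk. However, constrained by the development of the underlying industries and of the futures market itself, market efficiency varies widely across futures products. As China's economy moves from a stage of incremental growth to one centered on existing stock, futures are becoming increasingly important as tools for corporate price management and risk control, so studying the efficiency of China's commodity futures market has considerable practical significance.

Starting from the basic function of futures, namely resource allocation, this dissertation takes a novel approach and defines an efficient market as one whose futures prices can reasonably allocate the social resources of the industry concerned. In terms of structure, it first summarizes the characteristics of mature international futures products and identifies four hypothesized indicators that reflect the resource-allocation capability of futures: futures-spot convergence, profit volatility, inventory volatility, and changes in cash flow. It then uses mathematical models to demonstrate the relationship between the indicator data and product maturity, and finally applies this set of indicators to test the efficiency of China's commodity markets. Methodologically, the dissertation first applies the Bai-Perron endogenous multiple structural break model to detect break points in the time series and then runs multiple regressions on each segment between break points. After removing seasonality and cyclicality, it uses the results of stationarity tests and ARCH-effect tests to select the appropriate GARCH model, which is then used to describe the volatility of the time series.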
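To make the final modeling step more concrete, here is a minimal sketch of the GARCH(1,1) conditional variance recursion, sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]. The abstract does not state which GARCH order or parameter values were used, so the (1,1) order, the parameter values, the residual series, and the function name `garch11_variance` are all assumptions for illustration; in practice the parameters would be estimated from data.

```python
import numpy as np

def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model for given parameters."""
    residuals = np.asarray(residuals, dtype=float)
    sigma2 = np.empty_like(residuals)
    # Initialize with the sample variance of the residuals (one common choice).
    sigma2[0] = residuals.var()
    for t in range(1, len(residuals)):
        # Today's variance depends on yesterday's squared shock and yesterday's variance.
        sigma2[t] = omega + alpha * residuals[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Hypothetical residuals, e.g. from a regression on one segment between break points.
eps = np.array([0.0, 1.0, -2.0, 0.5])
sigma2 = garch11_variance(eps, omega=0.1, alpha=0.2, beta=0.7)
print(sigma2)
```

A large shock raises the next period's variance through `alpha`, and `beta` controls how slowly that elevated volatility decays, which is how a GARCH model captures volatility clustering.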

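Based on the mathematical verification, we conclude that the four indicators (futures-spot convergence, profit volatility, inventory volatility, and changes in cash flow) can serve as test indicators of futures maturity. Applying this method to some of the actively traded domestic products shows that Dalian soybean meal futures already exhibit the characteristics of a mature product, and the dissertation concludes that the soybean meal futures market is efficient. The four indicators for PTA and corn starch futures have improved over time in recent years but, because the period is short, are not yet stable; these can be regarded as products approaching maturity. Rebar and aluminum futures perform poorly on most indicators, indicating a weak capability to allocate social resources, so the dissertation concludes that the rebar and aluminum futures markets are active but not efficient. Further analysis suggests that poor futures-spot convergence is the key factor limiting a product's resource-allocation capability, and that ill-defined underlying assets, the difficulty of creating warehouse receipts, low industry participation, and other restrictive factors in contract design are in turn important causes of poor futures-spot convergence.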
Contributors: Wang, Ping (Author) / Gu, Bin (Thesis advisor) / Li, Feng (Thesis advisor) / Yan, Hong (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
The power of science lies in its ability to infer and predict the existence of objects from which no direct information can be obtained experimentally or observationally. A well-known example is ascertaining the existence of black holes of various masses in different parts of the universe from indirect evidence, such as X-ray emissions. In the field of complex networks, the problem of detecting hidden nodes can be stated as follows. Consider a network whose topology is completely unknown but whose nodes consist of two types: one accessible and another inaccessible from the outside world. The accessible nodes can be observed or monitored, and it is assumed that time series are available from each node in this group. The inaccessible nodes are shielded from the outside and are essentially "hidden." The question is: based solely on the available time series from the accessible nodes, can the existence and locations of the hidden nodes be inferred? A completely data-driven, compressive-sensing-based method is developed to address this issue by utilizing complex weighted networks of nonlinear oscillators, evolutionary games, and geospatial networks.
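For readers unfamiliar with compressive sensing, the sketch below shows the generic idea: recovering a sparse vector from fewer linear measurements than unknowns by solving an L1-regularized least-squares problem. It uses iterative soft-thresholding (ISTA), chosen here only for brevity. The abstract does not describe the dissertation's actual formulation, so the random measurement matrix, the problem sizes, the solver, and the names `ista` and `lam` are assumptions for illustration and not the author's method.

```python
import numpy as np

def ista(A, y, lam=0.01, n_iter=2000):
    """Approximately minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by ISTA."""
    # Step size 1/L, where L is the squared spectral norm of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step on the least-squares term.
        z = x - step * (A.T @ (A @ x - y))
        # Soft-thresholding step enforces sparsity.
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return x

# Hypothetical setup: 30 measurements of a 60-dimensional vector with 3 nonzeros.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60)) / np.sqrt(30)
x_true = np.zeros(60)
x_true[[5, 20, 41]] = [1.5, -2.0, 1.0]
y = A @ x_true

x_hat = ista(A, y)
print("largest recovered entries at indices:", np.argsort(-np.abs(x_hat))[:3])
```

In a network-reconstruction setting, the unknown sparse vector would typically hold the coupling weights into one node, and the measurements would come from that node's time series; how hidden nodes are then inferred is specific to the dissertation and is not reproduced here.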

Both microbes and multicellular organisms actively regulate their cell fate determination to cope with changing environments or to ensure proper development. Here, synthetic biology approaches are used to engineer bistable gene networks to demonstrate that stochastic and permanent cell fate determination can be achieved by initializing gene regulatory networks (GRNs) at the boundary between dynamic attractors. This is experimentally realized by linking a synthetic GRN to a natural output of galactose metabolism regulation in yeast. By combining mathematical modeling and flow cytometry, the engineered systems are shown to be bistable, and inherent gene expression stochasticity is shown not to induce spontaneous state transitions at steady state. By interfacing rationally designed synthetic GRNs with background gene regulation mechanisms, this work investigates intricate properties of networks that illuminate possible regulatory mechanisms for cell differentiation and development that can be initiated from points of instability.
Contributors: Su, Ri-Qi (Author) / Lai, Ying-Cheng (Thesis advisor) / Wang, Xiao (Thesis advisor) / Bliss, Daniel (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created: 2015