Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval
Fruit fly embryogenesis is one of the best understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to conduct image retrieval based on body part keywords and images. Currently, the keyword annotation of spatiotemporal gene expression patterns is conducted manually. However, this manual practice does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on the expression patterns may be made more accurate using keywords.
Results
In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations integrating spatial information and sparse features, overcoming the limitations of prior schemes.
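As a rough illustration of what "sparse features" can mean here, the sketch below encodes a feature vector as a sparse combination of dictionary atoms by solving an l1-regularized least-squares problem with ISTA. The abstract does not specify the solver, the dictionary, or the regularization weight, so the function names, the random dictionary, and the value of `lam` are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1 norm: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(D, x, a, lam):
    # 0.5 * ||x - D a||^2 + lam * ||a||_1
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def sparse_code(D, x, lam=0.1, n_iter=200):
    # ISTA: gradient step on the quadratic term, then soft-thresholding.
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Hypothetical data: a random dictionary with unit-norm atoms and a signal
# built from three of them. Real inputs would be local image descriptors.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
true_code = np.zeros(50)
true_code[[3, 17, 41]] = [1.5, -2.0, 1.0]
x = D @ true_code

code = sparse_code(D, x, lam=0.05, n_iter=500)
print("nonzero coefficients:", np.count_nonzero(code))
```

The resulting `code` vector would play the role of the sparse feature for one image patch; how such codes are pooled together with spatial information is described in the paper itself, not in this sketch.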
Conclusions
We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that the integration of spatial information and sparse features leads to consistent performance improvement in image annotation, while for the task of retrieval, sparse features alone yield better results.
This can make mutation detection difficult; and while increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increased depth. The problem of accurate genotyping is exacerbated when there is not a reference genome or other auxiliary information available.
I explore several methods for sensitively detecting mutations in non-model organisms using an example Eucalyptus melliodora individual. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, with structured data, a likelihood framework that is aware of this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm.
I also use this data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. I simulate data and show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores while requiring no reference or other information and performs as well as other methods.
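To make "calibration" concrete: a Phred-scaled base quality Q encodes a predicted error probability p through Q = -10 log10(p), and a score is well calibrated when the observed error rate among bases reporting Q matches that p. The sketch below is not KBBQ's algorithm; it only shows the conversion and a simple per-score comparison of reported and empirical quality. It assumes each base comes with a known `is_error` flag, as would be the case for simulated data, and the function names and add-one smoothing are illustrative choices.

```python
import math

def phred_to_error_prob(q):
    # Q = -10 * log10(p)  =>  p = 10 ** (-Q / 10)
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    return -10 * math.log10(p)

def empirical_quality(observations):
    """Map each reported quality to the quality implied by observed errors.

    observations: iterable of (reported_q, is_error) pairs, one per base.
    Add-one smoothing keeps the estimate finite when a bin has no errors.
    """
    counts = {}
    for q, is_error in observations:
        n, e = counts.get(q, (0, 0))
        counts[q] = (n + 1, e + int(is_error))
    return {
        q: error_prob_to_phred((e + 1) / (n + 2))
        for q, (n, e) in counts.items()
    }

# Hypothetical example: bases reported at Q40 whose observed error rate
# corresponds to a much lower quality, i.e. the scores are overconfident.
obs = [(40, False)] * 990 + [(40, True)] * 10
for q, emp in empirical_quality(obs).items():
    print(f"reported Q{q}, empirical Q{emp:.1f}")
```

A recalibrator aims to replace each reported score with something close to the empirical value; the difficulty the abstract describes is deciding which mismatches are errors when no trusted variant set or reference is available.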
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration on the quality of output variant calls and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
Pathways of Distinction Analysis of Liver Cancer Data: Genetic Differences Between Males and Females
In this paper, we present a Bayesian analysis for the Weibull proportional hazard (PH) model used in step-stress accelerated life testing. The key mathematical and graphical differences between the Weibull cumulative exposure (CE) model and the PH model are illustrated. Compared with the CE model, the PH model provides more flexibility in fitting step-stress testing data and has attractive mathematical properties that make it desirable in the Bayesian framework. A Markov chain Monte Carlo algorithm with an adaptive rejection sampling technique is used for posterior inference. We demonstrate the performance of this method on both simulated and real datasets.
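The contrast between the two models can be sketched for a simple two-step test with a stress change at time tau. The abstract gives no formulas, so the parameterizations below are assumptions based on common textbook forms: under CE, the time spent at the first stress is converted into an equivalent age at the second stress; under PH, each stress level scales a shared Weibull baseline hazard. All function and parameter names are illustrative.

```python
import math

def survival_ce(t, tau, theta1, theta2, beta):
    # Cumulative exposure: after the switch at tau, the unit behaves as if
    # it had already aged (tau / theta1) in units of the second scale.
    if t < tau:
        u = t / theta1
    else:
        u = tau / theta1 + (t - tau) / theta2
    return math.exp(-(u ** beta))

def survival_ph(t, tau, lam1, lam2, beta):
    # Proportional hazards: hazard is lam_i * beta * t**(beta - 1) in step i,
    # so the cumulative hazard accumulates piecewise in t**beta.
    if t < tau:
        H = lam1 * t ** beta
    else:
        H = lam1 * tau ** beta + lam2 * (t ** beta - tau ** beta)
    return math.exp(-H)

# Hypothetical parameters, matched so both models agree before the switch
# (lam_i = theta_i ** -beta).
tau, theta1, theta2, beta = 1.0, 2.0, 1.0, 2.0
lam1, lam2 = theta1 ** -beta, theta2 ** -beta
for t in (0.5, 1.0, 1.5, 2.0):
    print(t, survival_ce(t, tau, theta1, theta2, beta),
          survival_ph(t, tau, lam1, lam2, beta))
```

With these forms the two models coincide when beta = 1 (the exponential case) and before the stress change, and diverge after the change when beta differs from 1.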
We describe mechanical metamaterials created by folding flat sheets in the tradition of origami, the art of paper folding, and study them in terms of their basic geometric and stiffness properties, as well as load bearing capability. A periodic Miura-ori pattern and a non-periodic Ron Resch pattern were studied. Unexceptional coexistence of positive and negative Poisson's ratio was reported for the Miura-ori pattern, which is consistent with the interesting shear behavior and infinite bulk modulus of the same pattern. Unusually strong load bearing capability of the Ron Resch pattern was found and attributed to its unique way of folding. This work paves the way to the study of intriguing properties of origami structures as mechanical metamaterials.