Theoretical analysis and optimization for SC DC-DC converters have been presented in prior works, however optimization of different capacitors, namely flying and input/output decoupling capacitors, in SC voltage regulators (SCVRs) under an area constraint has not been addressed. A methodology to optimize flying and decoupling capacitance for area-constrained on-chip SCVRs to achieve the highest system-level power efficiency. Considering both conversion efficiency and droop voltage against fast load transients, the proposed model determines the optimal ratio between flying and decoupling.
Based on the previous design, a fully integrated switched-capacitor voltage regulator with voltage comparison and on-chip lossless current sensing control is proposed. Based on the voltage comparison result and sensed current as the load current changes, the frequency of the SC converters are modulated for optimal efficiency. The voltage regulator targets 2.1V input voltage and 0.9V output voltage, which offers higher-voltage power transfer across chip package. A 17-phase interleaved structure is used to reduce output voltage ripple.
In 65nm CMOS, the regulator is implemented with MIM-capacitor, targeting 2.1V input voltage and 0.9V output voltage. According to the measurement results, the proposed SC voltage regulator achieves 69.6% peak efficiency at 60mA load current, which corresponds to a 4.2mW/mm2 power-area density and 12.5mW
F power-capacitance density. The efficiency across 20mA to 92mA regulator load current range is above 62%. The steady-state output voltage ripple across 22x load current range of 3.5mA-76mA is between 50mV to 60mV.
on-square matrix. It has a wide range
of applications especially in Multiple Input-Multiple Output (MIMO) communication
systems. Unfortunately it has high computation complexity { for matrix size of nxn,
QRD has O(n3) complexity and back substitution, which is used to solve a system
of linear equations, has O(n2) complexity. Thus, as the matrix size increases, the
hardware resource requirement for QRD and back substitution increases signicantly.
This thesis presents the design and implementation of a
exible QRD and back substitution accelerator using a folded architecture. It can support matrix sizes of
4x4, 8x8, 12x12, 16x16, and 20x20 with low hardware resource requirement.
The proposed architecture is based on the systolic array implementation of the
Givens algorithm for QRD. It is built with three dierent types of computation blocks
which are connected in a 2-D array structure. These blocks are controlled by a
scheduler which facilitates reusability of the blocks to perform computation for any
input matrix size which is a multiple of 4. These blocks are designed using two
basic programming elements which support both the forward and backward paths to
compute matrix R in QRD and column-matrix X in back substitution computation.
The proposed architecture has been mapped to Xilinx Zynq Ultrascale+ FPGA
(Field Programmable Gate Array), ZCU102. All inputs are complex with precision
of 40 bits (38 fractional bits and 1 signed bit). The architecture can be clocked at
50 MHz. The synthesis results of the folded architecture for dierent matrix sizes
are presented. The results show that the folded architecture can support QRD and
back substitution for inputs of large sizes which otherwise cannot t on an FPGA
when implemented using a
at architecture. The memory sizes required for dierent
matrix sizes are also presented.
Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated a better success than frame-level features for recognition-related tasks. Conventionally, such features are obtained via a brute-force collection of statistics over frames, thereby losing important local information in the process which affects the performance. To overcome these limitations, a novel feature extraction approach using latent topic models (LTMs) is presented in this study. Speech is assumed to comprise of a mixture of emotion-specific topics, where the latter capture emotionally salient information from the co-occurrences of frame-level acoustic features and yield better descriptors. Specifically, a supervised replicated softmax model (sRSM), based on restricted Boltzmann machines and distributed representations, is proposed to learn naturally discriminative topics. The proposed features are evaluated for the recognition of categorical or continuous emotional attributes via within and cross-corpus experiments conducted over acted and spontaneous expressions. In a within-corpus scenario, sRSM outperforms competing LTMs, while obtaining a significant improvement of 16.75% over popular statistics-based turn-level features for valence-based classification, which is considered to be a difficult task using only speech. Further analyses with respect to the turn duration show that the improvement is even more significant, 35%, on longer turns (>6 s), which is highly desirable for current turn-based practices. In a cross-corpus scenario, two novel adaptation-based approaches, instance selection, and weight regularization are proposed to reduce the inherent bias due to varying annotation procedures and cultural perceptions across databases. Experimental results indicate a natural, yet less severe, deterioration in performance - only 2.6% and 2.7%, thereby highlighting the generalization ability of the proposed features.
inherent low voltage stability. Moreover, designs increasingly use compiled instead of custom memory blocks, which frequently employ static, rather than pre-charged dynamic RFs. In this work, the various RFs designed for a microprocessor cache and register files are discussed. Comparison between static and dynamic RF power dissipation and timing characteristics is also presented. The relative timing and power advantages of the designs are shown to be dependent on the memory aspect ratio, i.e. array width and height.
ligence. The innovations in ANN has led to ground breaking technological advances
like self-driving vehicles,medical diagnosis,speech Processing,personal assistants and
many more. These were inspired by evolution and working of our brains. Similar
to how our brain evolved using a combination of epigenetics and live stimulus,ANN
require training to learn patterns.The training usually requires a lot of computation
and memory accesses. To realize these systems in real embedded hardware many
Energy/Power/Performance issues needs to be solved. The purpose of this research
is to focus on methods to study data movement requirement for generic Neural Net-
work along with the energy associated with it and suggest some ways to improve the
design.Many methods have suggested ways to optimize using mix of computation and
data movement solutions without affecting task accuracy. But these methods lack a
computation model to calculate the energy and depend on mere back of the envelope calculation. We realized that there is a need for a generic quantitative analysis
for memory access energy which helps in better architectural exploration. We show
that the present architectural tools are either incompatible or too slow and we need
a better analytical method to estimate data movement energy. We also propose a
simplistic yet effective approach that is robust and expandable by users to support
various systems.