Abstract—Linear-feedback shift register (LFSR) counters have been shown to be well suited to applications requiring large arrays of counters and can improve the area and performance compared with conventional binary counters. However, signif icant logic is required to decode the count order into binary, causing system-on-chip designs to be unfeasible. This paper presents a counter design based on multiple LFSR stages that retains the advantages of a single-stage LFSR but only requires decoding logic that scales logarithmically with the number of stages rather than exponentially with the number of bits as required by other methods. A four-stage four-bit LFSR proof of concept was fabricated in 130-nm CMOS and was characterized in a time-to-digital converter application at 800 MHz.
Abstract—In this paper, a new pseudorandom number gen erator (PRNG) based on the logistic map has been proposed. To prevent the system to fall into short period orbits as well as increasing the randomness of the generated sequences, the pro posed algorithm dynamically changes the parameters of the chaotic system. This PRNG has been implemented in a Virtex 7 f ield-programmable gate array (FPGA) with a 32-bit fixed point precision, using a total of 510 lookup tables (LUTs) and 120 registers. The sequences generated by the proposed algorithm have been subjected to the National Institute of Standards and Technology (NIST) randomness tests, passing all of them. By comparing the randomness with the sequences generated by a raw 32-bit logistic map, it is shown that, by using only an additional 16% of LUTs, the proposed PRNG obtains a much better performance in terms of randomness, increasing the NIST passing rate from 0.252 to 0.989. Finally, the proposed bitwise dynamical PRNG is compared with other chaos-based realizations previously proposed, showing great improvement in terms of resources and randomness.
Abstract—A nondestructive column-selection-enabled 10T SRAM for aggressive power reduction is presented in this brief. It frees a half-selected behavior by exploiting the bitline-shared data-aware write scheme. The differential-VDD (Diff-VDD) technique is adopted to improve the write ability of the design. In addition, its decoupled read bitlines are given permission to be charged and discharged depending on the stored data bits. In combination with the proposed dropped-VDD biasing, it achieves the significant power reduction. The experimental results show that the proposed design provides the 3.3× improvement in the write margin compared with the standard Diff-10T SRAM. A 5.5-kb 10T SRAM in a 65-nm CMOS process has a total power of 51.25 µW and a leakage power of 41.8 µW when operating at 6.25 MHz at 0.5 V, achieving 56.3% reduction in dynamic power and 32.1% reduction in leakage power compared with the previous single-ended 10T SRAM.
Abstract—This paper describes a bandwidth (BW)- and slew rate (SR)-enhanced class AB voltage follower (VF). A thorough small signal analysis of the proposed and a state-of-the-art AB-enhanced VF is presented to compare their performance. The proposed circuit has 50-MHz BW, 19.5-V/µs SR, and a BW figure of merit of 41.6 (MHz × pF/µW) for CL = 50 pF. It provides 13 times higher current efficiency and 15 times higher BW than the conventional VF with equal 60-µW static power dissipation. The experimental and simulation results of a fabricated test chip in the 130-nm CMOS technology validate the proposed circuit.
Abstract— Approximate computing is one of best suited efficient data processing for error resilient applications, such as signal and image processing, computer vision, machine learning, data mining etc. Approximate computing reduces accuracy which is acceptable as a cost of increasing the circuit characteristics depends on the application. Desirable accuracy is the threshold point for controlling the trade off, between accuracy and circuit characteristics under the control of the circuit designer. In this work, the rounding technique is introduced as an efficient method for controlling this trade off. In this regard multiplier circuits as a critical building block for computing in most of the processors have been considered for the evaluation of the rounding technique efficiency. The impact of the rounding method is investigated by comparison of circuit characteristics for three multipliers. These three multipliers are the conventional Wallace tree accurate multiplier, DRUM [4] the recently proposed approximate multiplier and the rounded based approximate multiplier proposed in this work. Simulation results for three selected technologies show significant improvement on the circuit characteristics in terms of power, area, speed, and energy for proposed multiplier in comparison with their counterparts. Input data rounding pattern and the probability of the repetition for rounded values has been introduced as two essential items to control the level of the accuracy
Abstract—This paper presents a performance comparison of new authenticated encryption (AE) algorithms which are aimed at providing better security and resource efficiency compared to existing standards. Specifically, these algorithms improve the security of existing AE standards by providing a critical property termed nonce-misuse resistance. This paper addresses algorithm to architectural mappings of several candidates from the ongoing Competition for AE: Security, Applicability, and Robustness as well as a submission from the Crypto Forum Research Group. Implementations of the architectures on both field-programmable gate arrays and application-specific integrated circuits platforms are provided and compared with the architecture of a popular standard: Advanced Encryption Standard in Galois Counter mode (AES-GCM). Optimizations that are applicable to AE, in general, and nonce-misuse-resistant architectures, in particu lar, are presented. A hardware–software codesign approach to optimization is also discussed. The implementations via pro posed optimizations demonstrate that new AE algorithms can provide comparable performance as standard AES-GCM while enhancing
Abstract—A scalable approximate multiplier, called truncation- and rounding-based scalable approximate multiplier (TOSAM) is presented, which reduces the number of partial products by truncating each of the input operands based on their leading one-bit position. In the proposed design, multiplication is performed by shift, add, and small fixed-width multiplication operations resulting in large improvements in the energy consumption and area occupation compared to those of the exact multiplier. To improve the total accuracy, input operands of the multiplication part are rounded to the nearest odd number. Because input operands are truncated based on their leading one-bit positions, the accuracy becomes weakly dependent on the width of the input operands and the multiplier becomes scalable. Higher improvements in design parameters (e.g., area and energy consumption) can be achieved as the input operand widths increase. To evaluate the efficiency of the proposed approximate multiplier, its design parameters are compared with those of an exact multiplier and some other recently proposed approximate multipliers. Results reveal that the proposed approximate multiplier with a mean absolute relative error in the range of 11%–0.3% improves delay, area, and energy consumption up to 41%, 90%, and 98%, respectively, compared to those of the exact multiplier. It also outperforms other approximate multipliers in terms of speed, area, and energy consumption. The proposed approximate multiplier has an almost Gaussian error distribution with a near-zero mean value. We exploit it in the structure of a JPEG encoder, sharpening, and classification applications. The results indicate that the quality degradation of the output is negligible. In addition, we suggest an accuracy configurable TOSAM where the energy consumption of the multiplication operation can be adjusted based on the minimum required accuracy.
Abstract—A nondestructive column-selection-enabled 10T SRAM for aggressive power reduction is presented in this brief. It frees a half-selected behavior by exploiting the bitline-shared data-aware write scheme. The differential-VDD (Diff-VDD) technique is adopted to improve the write ability of the design. In addition, its decoupled read bitlines are given permission to be charged and discharged depending on the stored data bits. In combination with the proposed dropped-VDD biasing, it achieves the significant power reduction. The experimental results show that the proposed design provides the 3.3× improvement in the write margin compared with the standard Diff-10T SRAM. A 5.5-kb 10T SRAM in a 65-nm CMOS process has a total power of 51.25 µW and a leakage power of 41.8 µW when operating at 6.25 MHz at 0.5 V, achieving 56.3% reduction in dynamic power and 32.1% reduction in leakage power compared with the previous single-ended 10T SRAM.
Abstract—Linear-feedback shift register (LFSR) counters have been shown to be well suited to applications requiring large arrays of counters and can improve the area and performance compared with conventional binary counters. However, signif icant logic is required to decode the count order into binary, causing system-on-chip designs to be unfeasible. This paper presents a counter design based on multiple LFSR stages that retains the advantages of a single-stage LFSR but only requires decoding logic that scales logarithmically with the number of stages rather than exponentially with the number of bits as required by other methods. A four-stage four-bit LFSR proof of concept was fabricated in 130-nm CMOS and was characterized in a time-to-digital converter application at 800 MHz.
Abstract: In recent years, ultra-low-voltage (ULV) operation is gaining more importance for achieving minimum energy consumption. Full adder is the basic computational arithmetic block in many of the computing and signal/image processing applications. Here, a new hybrid 1-bit full adder circuit which employs both Gate Diffusion Input (GDI) logic and multi-threshold voltage (MVT) transistor logic is reported. The main objective of the proposed MVT-GDI-based hybrid full adder design is to provide minimum energy consumption with less area. The proposed hybrid design is simulated using standard 45 nm CMOS process technology at an ULV of 0.2 V. The post-layout simulation results have shown that the proposed design achieved significant improvements in comparison with the other reported designs by achieving >57%, 92% savings in the Energy and EDP, respectively, with only 14 transistors. Monte–Carlo simulations have also been performed and is found that the proposed design methodology yields full functionality and robustness against local and global process variations. Normalised energy metrics to 32 and 22 nm technologies shows that the proposed design achieves >57% energy savings in prior to the recent works.
Abstract: In recent years, ultra-low-voltage (ULV) operation is gaining more importance for achieving minimum energy consumption. Full adder is the basic computational arithmetic block in many of the computing and signal/image processing applications. Here, a new hybrid 1-bit full adder circuit which employs both Gate Diffusion Input (GDI) logic and multi-threshold voltage (MVT) transistor logic is reported. The main objective of the proposed MVT-GDI-based hybrid full adder design is to provide minimum energy consumption with less area. The proposed hybrid design is simulated using standard 45 nm CMOS process technology at an ULV of 0.2 V. The post-layout simulation results have shown that the proposed design achieved significant improvements in comparison with the other reported designs by achieving >57%, 92% savings in the Energy and EDP, respectively, with only 14 transistors. Monte–Carlo simulations have also been performed and is found that the proposed design methodology yields full functionality and robustness against local and global process variations. Normalised energy metrics to 32 and 22 nm technologies shows that the proposed design achieves >57% energy savings in prior to the recent works.