# Novel Design Technique of Address Decoder for SRAM

## Arvind Kumar Mishra

Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, Odisha, India, e-mail: arvindengg10@gmail.com

## Debiprasad Priyabrata Acharya

Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, Odisha, India, e-mail: dpacharya@nitrkl.ac.in

## Pradip Kumar Patra

Sankalp Semiconductor, Bhubaneswar, Odisha, India, e-mail: pradip\_p@sankalpsemi.com

Abstract: The address decoder is an important digital block in SRAM; it accounts for up to half of the total chip access time and a significant part of the total SRAM power in a normal read/write cycle. Designing an address decoder involves two objectives: first, choosing the optimal circuit technique and, second, sizing its transistors. A novel address decoder circuit is presented and analysed in this paper. An address decoder using alternate NAND-NOR stages with a predecoder and a replica inverter chain circuit is proposed and compared with the traditional and universal block architectures in 90 nm CMOS technology. The delay and power dissipation of the proposed decoder are 60.49% and 52.54% of those of the traditional architecture and 82.35% and 73.80% of those of the universal block architecture, respectively.

Index Terms: Address decoder, SRAM architecture, cache memory.

## I. INTRODUCTION

SRAM modules are frequently used as register and cache memory in most digital and computer systems because their speed is compatible with that of the processor and they are accessed at least once in every clock cycle. In the memory hierarchy that connects bulk storage to the processor, SRAMs are used as the cache levels [7]. As cache memory, SRAM is very fast and speeds up the interface between processor and memory. With recent advances in VLSI technology, processor clock rates have increased sharply. However, while the speed of CMOS transistors has increased substantially with improvements in VLSI technology, memory access times have not improved proportionally, because memory densities have increased at the same time to handle large amounts of data. Therefore, SRAMs dominate the design of high-speed computers and other digital systems.

The address decoder is an essential element of every SRAM memory block and must respond at very high frequency. The access time and power consumption of memories are largely determined by the decoder design. The design of a random access memory (RAM) is generally divided into two parts: the decoder, which is the circuitry from the address input to the wordline, and the sense and column circuits, which include the circuitry from the bitline to the data input/output [2]. Because of the large number of storage cells in memories, various address decoder designs have been proposed to reduce power consumption and improve performance. Usually, different kinds of precharged dynamic decoders are used [3]. The design of a dynamic decoder is complex and has a higher probability of wrong sensing. A traditional static decoder gives more accurate results, but it has more transistors and a large delay. Some solutions use hierarchical decoders with predecoding, and binary tree decoders built from demultiplexers have also been implemented [4] [5] [6].

This paper describes the design of a high-speed, low-power decoder using alternate NAND-NOR stages, a predecoder and a replica circuit. The remainder of this paper is organized as follows. Section II starts with the conventional AND decoder, then reviews the universal block decoding scheme, and finally proposes a new design of a 5:32 decoder. Section III presents simulation results for the decoder, obtained with Cadence tools in UMC 90 nm CMOS technology, together with a corner analysis. Section IV concludes the paper.

## II. DECODER DESIGN

The row and column decoders are essential elements in all random-access memories. The time taken to access data and the power consumption of a memory are largely determined by decoder performance. A row decoder takes an n-bit address and provides 2<sup>n</sup> outputs, exactly one of which is activated to select a row of SRAM cells. The output that is activated depends directly on the address applied to the memory. The SRAM row decoder can have a single-stage or multi-stage architecture. In a single-stage decoder, all decoding is realized in a single block. Multiple-stage decoding uses several hierarchically linked blocks. Normally, the most significant bits are decoded (pre-decoded) in the first decoder stage, which effectively selects the array to be accessed by providing enable signals for the subsequent decoder stage(s) that enable a particular word line. The number of outputs of the last decoding stage corresponds to the number of rows (word lines) to be decoded. Single-stage row decoders are attractive for small single-block memories. However, most memory architectures are based on splitting the row address space into several blocks decoded by separate decoder stages. This technique has proven to consume less power and to be faster for large memory architectures with multiple arrays.
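The difference between single-stage and multi-stage decoding can be illustrated with a small behavioural model. This is only a sketch, not the circuit of this paper: the function names `decode` and `decode_two_stage` and the parameter `msb_bits` are hypothetical, and logic levels are modelled as the integers 0 and 1.

```python
def decode(address: int, n_bits: int) -> list[int]:
    """Single-stage model: 2**n_bits word lines, exactly one of them high."""
    if not 0 <= address < 2 ** n_bits:
        raise ValueError("address out of range")
    return [1 if row == address else 0 for row in range(2 ** n_bits)]


def decode_two_stage(address: int, n_bits: int, msb_bits: int) -> list[int]:
    """Two-stage model: the MSBs enable a block, the LSBs pick a row in it."""
    lsb_bits = n_bits - msb_bits
    block_enable = decode(address >> lsb_bits, msb_bits)
    row_select = decode(address & (2 ** lsb_bits - 1), lsb_bits)
    # A word line goes high only if both its block and its row are selected.
    return [b & r for b in block_enable for r in row_select]


if __name__ == "__main__":
    # Both models select the same word line for every 5-bit address.
    print(all(decode_two_stage(a, 5, 2) == decode(a, 5) for a in range(32)))
```

In this model the two-stage decoder needs a 2:4 and a 3:8 decode plus 32 combining gates instead of 32 five-input gates, which is the kind of trade-off that motivates hierarchical decoding.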

## A. Conventional AND Decoder

A conventional decoder using CMOS AND gates is shown in Fig. 1. Two-input AND gates are used because the decoder delay increases drastically as the number of gate inputs increases. This is the basic static decoder circuit. There is a problem with implementing this decoder in CMOS technology: AND gates are not directly available in CMOS, so each one must be realized as two serially connected gates, a NAND and a NOT. This increases the number of transistors, the power consumption and the delay. The decoder structure therefore has to be realized directly with NOT, NAND and NOR gates only [1].

Fig. 1 Conventional 3-to-8 decoder using CMOS AND gates [4]

## B. Universal Block Decoding Scheme

As shown in Table 1, a NAND gate gives a unique logic-low output when both of its inputs are high and a high output for every other combination. Therefore, a decoder cannot be built using NAND gates alone. A NOR gate gives a unique logic-high output when both of its inputs are low and a low output for every other combination. Used on its own, each of these gates needs additional inverters to form a decoder, which increases the number of transistors as well as the delay of the circuit. However, their distinct and complementary properties can be combined to give excellent results, because a NAND gate produces a low output but requires all inputs high, whereas a NOR gate produces a high output but requires all inputs low.

TABLE 1 TRUTH TABLE FOR LOGIC GATES

| Input A | Input B | AND output | OR output | NAND output | NOR output |
|---------|---------|------------|-----------|-------------|------------|
| 0       | 0       | 0          | 0         | 1           | 1          |
| 0       | 1       | 0          | 1         | 1           | 0          |
| 1       | 0       | 0          | 1         | 1           | 0          |
| 1       | 1       | 1          | 1         | 0           | 0          |



Fig. 2 Schematic of 4-to-16 decoder, divided into blocks [1]

To design a decoder, gates with a unique output are required. As shown in Table 1, a NOR gate gives a unique high output when both inputs are low, and a NAND gate gives a unique low output when both inputs are high. Based on this principle, a universal design scheme has been proposed that builds the decoder from a combination of NAND and NOR gates [1]. For active-high outputs, the last stage of the decoder consists of NOR gates and the stage before it of NAND gates; the stages continue to alternate back to the input stage. The number of decoder inputs determines the number of stages and hence the type of the first stage, either NAND or NOR. In the block architecture, the first stage is NOR for an even number of inputs and NAND for an odd number of inputs. Fig. 2 shows the architecture of this decoder [1], taking a 4:16 decoder as an example.
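The alternating-stage principle can be checked with a behavioural model. The sketch below is one possible reading of the scheme, not a transcription of Fig. 2: it assumes that each stage of 2-input gates adds one address bit to the lines decoded so far, and that true and complemented address bits are available in whichever polarity a stage needs. All function names are hypothetical.

```python
def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def nor(a: int, b: int) -> int:
    return 1 - (a | b)


def stage_kinds(n_inputs: int) -> list[str]:
    """Gate type of each 2-input stage, input side first; the last is NOR."""
    stages = n_inputs - 1
    return ["NOR" if (stages - 1 - s) % 2 == 0 else "NAND" for s in range(stages)]


def literal(bit: int, wanted: int, active_high: bool) -> int:
    """Match signal for one address bit, in the polarity the gate expects."""
    match = 1 if bit == wanted else 0
    return match if active_high else 1 - match


def block_decoder(address_bits: list[int]) -> list[int]:
    """address_bits[0] is the LSB; returns 2**n active-high output lines."""
    if len(address_bits) < 2:
        raise ValueError("need at least two address bits")
    lines: list[int] = []
    for s, kind in enumerate(stage_kinds(len(address_bits))):
        gate = nand if kind == "NAND" else nor
        # A NAND stage needs active-high inputs and produces active-low
        # outputs; a NOR stage needs active-low inputs and produces
        # active-high outputs, so consecutive stages fit together directly.
        active_high_in = kind == "NAND"
        new_bit = address_bits[s + 1]
        if s == 0:
            lines = [gate(literal(address_bits[0], code & 1, active_high_in),
                          literal(new_bit, code >> 1, active_high_in))
                     for code in range(4)]
        else:
            half = len(lines)
            lines = [gate(lines[code % half],
                          literal(new_bit, code // half, active_high_in))
                     for code in range(2 * half)]
    return lines


if __name__ == "__main__":
    print(stage_kinds(4))                       # stage order for a 4:16 decoder
    print(block_decoder([1, 0, 1, 0]).index(1)) # selected line for address 0101
```

Under these assumptions the model gives a NOR first stage for an even number of inputs and a NAND first stage for an odd number, in agreement with the rule stated above, and no inverter is needed between stages.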

The problem with the block architecture decoder is that it is not fully optimized in terms of transistor count, delay and power dissipation. In addition, the path lengths differ for different inputs: the LSB has to travel through every stage from input to output, while the MSB has to travel through only the last stage. As a result, some address combinations drive multiple outputs high for a short time because of the path delay differences.

As shown in Fig. 3, when the address is 00000, line 15 at the decoder output goes high for some duration before line 0 goes high. This is caused by the different path delays at the output stage. It results in false selection of a cell and extra power dissipation. Moreover, only a single inverter drives a stage of many gates, so the decoder delay increases for a large number of inputs. The delay also increases as the number of stages increases. To eliminate these problems, a new decoding scheme is proposed.



## C. Proposed Decoding Scheme

We propose a 5:32 decoder for SRAM (Fig. 4) that uses a predecoder and an inverter replica based circuit in addition to alternate NAND and NOR stages. In this architecture, the predecoder circuit reduces the gate count and the number of stages from input to output, which reduces the delay and power consumption. The predecoder can be applied to reduce the number of stages in decoder structures with 4, 8, 16, ... input combinations. Here, one stage has been removed.



Fig. 4 Proposed 5:32 decoder using Predecoder and replica circuit

Fig. 4 shows the proposed 5:32 decoder, in which the NAND and NOR stages work together to produce a unique output. The predecoder circuitry reduces the number of stages compared with the universal architecture and also reduces the transistor count, which makes the proposed decoder faster and lowers its power dissipation. Replica circuitry is used to overcome the problem of multiple selections caused by variable path delay. It gives the MSB the same delay as the LSB, so that a fixed-delay circuit is formed for every change of logic combination.
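The effect of unequal path delays, and the way an equalizing replica delay removes it, can be illustrated with a toy unit-delay model. This is a sketch under assumptions that are not taken from the paper: a single final-stage NOR gate, integer time steps, and hypothetical names and delay values (`delayed`, `nor_output_waveform`, delays of 1 and 3 units).

```python
def delayed(before: int, after: int, delay: int, t: int) -> int:
    """Value at time t of a signal whose edge reaches the gate at t = delay."""
    return after if t >= delay else before


def nor_output_waveform(delay_deep: int, delay_direct: int,
                        t_end: int = 6) -> list[int]:
    """Output of NOR(deep, direct) for a line that should stay low.

    `deep` arrives through the earlier decoder stages and rises 0 -> 1;
    `direct` is the MSB-side input and falls 1 -> 0. The address changes
    at t = 0 and each edge reaches the gate after its own path delay.
    """
    wave = []
    for t in range(t_end):
        deep = delayed(0, 1, delay_deep, t)
        direct = delayed(1, 0, delay_direct, t)
        wave.append(1 - (deep | direct))
    return wave


if __name__ == "__main__":
    # Short direct path: the line is falsely selected while both inputs are low.
    print(nor_output_waveform(delay_deep=3, delay_direct=1))
    # Replica delay inserted so that both paths match: the line stays low.
    print(nor_output_waveform(delay_deep=3, delay_direct=3))
```

In the first case the output is high during the interval between the two arrival times, which corresponds to the false selection described above; in the second case both edges arrive together and no pulse appears.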

The first stage of this decoder is always the predecoder, which can be built from either NAND or NOR gates depending on the number of input lines. In this case, the first stage is a NOR-based architecture. A NOR gate provides a unique high output when all of its inputs are low. The next stage uses NAND gates, because a NAND gate gives a unique low output when all of its inputs are high. The NAND outputs can in turn be decoded by a NOR stage, and a predecoder can be employed again as the number of input combinations increases.



Fig. 5 Layout of proposed decoder

This simple approach is the basic principle of the design technique, and decoders of this type can be designed with it. The third stage of this decoder needs an inverter for decoding, but a simple inverter gives false decoding because of the different path delays in the different gate stages. The replica circuit overcomes this problem of multiple selections. A CMOS inverter has an optimal fan-out of 4, so driving the 16x2 stage requires 8 inverters, 4 for the high logic level and 4 for the low logic level. The replica chain is built on this basis and the decoder is designed accordingly. The layout of the proposed decoder is shown in Fig. 5.
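The inverter count follows from simple fan-out arithmetic. The sketch below assumes one reading of the "16x2 stage": 32 gate inputs in total, 16 that need the true signal and 16 that need its complement, each counted as one unit load. The function name `inverters_needed` is hypothetical.

```python
import math


def inverters_needed(gate_inputs: int, fan_out: int = 4) -> int:
    """Number of drivers required if each drives at most `fan_out` inputs."""
    return math.ceil(gate_inputs / fan_out)


# Assumed load of the 16x2 stage: 16 true inputs and 16 complemented inputs.
true_inputs = 16
complement_inputs = 16

total = inverters_needed(true_inputs + complement_inputs)
per_polarity = inverters_needed(true_inputs)

if __name__ == "__main__":
    print(total, per_polarity)  # 8 inverters in total, 4 per logic polarity
```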

Simulation results show that the transistor count, delay and power dissipation of the proposed decoder are the smallest in comparison with the traditional and block architectures. Fig. 6 shows that, as the size of the decoder increases, the advantage of the proposed decoder over the block and traditional decoder architectures grows.



Fig. 6 Delay comparison of proposed architecture with traditional and Block architecture



TABLE 2 COMPARISON BETWEEN TRADITIONAL, BLOCK AND PROPOSED

|                    | Traditional | Block | Proposed |
|--------------------|-------------|-------|----------|
| Delay (ps)         | 162         | 119   | 98       |
| No. of transistors | 370         | 250   | 250      |
| Power (µW)         | 295         | 210   | 155      |

## III. SIMULATION RESULTS

For the 5:32 decoder, a comparison between the traditional, universal block and proposed architectures is shown in Table 2. The delay, number of transistors and power dissipation of the proposed architecture are lower than those of the traditional and universal block architectures. Fig. 6 shows that the proposed decoder performs better than the traditional and block decoders and that its advantage grows as the decoder size increases. Fig. 7 shows the simulation of the proposed decoder.
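The relative figures quoted in the abstract and conclusion can be reproduced from the Table 2 values with a few lines of arithmetic. The sketch assumes that the delay entries are assigned as 162 ps (traditional), 119 ps (block) and 98 ps (proposed), which is the assignment consistent with the 98 ps TT-corner delay in Table 3; the helper name `percent_of` is hypothetical.

```python
def percent_of(value: float, reference: float) -> float:
    """`value` expressed as a percentage of `reference`, to two decimals."""
    return round(100 * value / reference, 2)


# Assumed assignment of the Table 2 entries (see the lead-in).
delay_ps = {"traditional": 162, "block": 119, "proposed": 98}
power_uw = {"traditional": 295, "block": 210, "proposed": 155}

ratios = {
    "delay vs traditional": percent_of(delay_ps["proposed"], delay_ps["traditional"]),
    "power vs traditional": percent_of(power_uw["proposed"], power_uw["traditional"]),
    "delay vs block": percent_of(delay_ps["proposed"], delay_ps["block"]),
    "power vs block": percent_of(power_uw["proposed"], power_uw["block"]),
}

if __name__ == "__main__":
    for name, value in ratios.items():
        print(f"{name}: {value}%")
```

The computed ratios agree with the quoted 60.49%, 52.54%, 82.35% and 73.80% to within 0.01 percentage points.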

Table 3 gives the results of the corner analysis: the largest delay, 129.5 ps, occurs in the SS corner, whereas the smallest, 81.6 ps, occurs in the FF corner.

TABLE 3 CORNER ANALYSIS

| Corner | Delay (ps) |
|--------|------------|
| TT     | 98         |
| SS     | 129.5      |
| FF     | 81.6       |
| FS     | 97         |
| SF     | 101        |
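The spread across corners can be expressed relative to the typical (TT) corner. The sketch below only post-processes the Table 3 values; the variable and function names are hypothetical.

```python
# Corner delays in ps, taken from Table 3.
corner_delay_ps = {"TT": 98, "SS": 129.5, "FF": 81.6, "FS": 97, "SF": 101}


def deviation_from_tt(corner: str) -> float:
    """Delay change of a corner relative to TT, in percent (one decimal)."""
    tt = corner_delay_ps["TT"]
    return round(100 * (corner_delay_ps[corner] - tt) / tt, 1)


if __name__ == "__main__":
    for corner in corner_delay_ps:
        print(corner, deviation_from_tt(corner))
```

By this measure the SS corner is roughly a third slower than TT and the FF corner roughly a sixth faster, while the mixed FS and SF corners stay within a few percent of TT.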

## IV. CONCLUSION

A decoder with NAND and NOR stages, a predecoder and a replica circuit has been designed; it has lower power consumption and delay than the traditional and block architectures. The delay and power dissipation of the proposed decoder are 60.49% and 52.54% of those of the traditional architecture and 82.35% and 73.80% of those of the universal block architecture, respectively. A high-speed decoder is an essential component of a fast SRAM. The proposed decoder has been used to develop a 1-kb, 8-bit, 1.25-GHz SRAM memory.

## REFERENCES

- [1] I. Brzozowski, Ł. Zachara and A. Kos, "Universal Design Method of n-to-2<sup>n</sup> Decoders," Mixed Design of Integrated Circuits and Systems Conference, Poland, June 2013.
- [2] B. Amrutur and M. Horowitz, "Fast Low-Power Decoders for RAMs," IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1506-1515, 2001.
- [3] M. Turi and J. Frias, "High-performance low-power selective precharge schemes for address decoder," IEEE Trans. on Circuits & Systems, vol. 55, no. 9, pp. 917-621, Sept. 2008.
- [4] J. M. Rabaey, A. Chandrakasan and B. Nikolic, "Digital Integrated Circuits: A Design Perspective," PHI Learning, 2009.
- [5] B. S. Amrutur, "Design and Analysis of Fast Low Power SRAMs," Ph.D. thesis, Stanford University, 1999.
- [6] A. Pavlov and M. Sachdev, "CMOS SRAM Circuit Design and Parametric Test in Nano-Scaled Technologies," Springer.
- [7] K. Zhang, "Embedded Memories for Nano-Scale VLSIs," Springer.