# Parametric Performance Analysis of Synchronous and Asynchronous Heterogeneous Network on Chip

Ayas Kanta Swain, Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, Odisha. Email: swaina@nitrkl.ac.in

Anil Kumar Rajput, Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, Odisha. Email: rajputanilkumar@gmail.com

Kamalakanta Mahapatra, Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, Odisha. Email: kkm@nitrkl.ac.in

Abstract: This paper presents a comparison of the throughput and end-to-end latency of synchronous and asynchronous heterogeneous NoCs under uniform and exponential traffic conditions using different parameters. The parameters chosen are the number of cores, the load (traffic) and the number of VCs of a router. Further, the sink bandwidth of synchronous and asynchronous NoCs under uniform traffic is studied and compared. The experimental results show that the asynchronous NoC offers more bandwidth, higher throughput and lower latency than the synchronous NoC for a given number of VCs and cores.

Keywords: Route, Traffic, Heterogeneous NoC, Latency, Throughput.

## I. INTRODUCTION

Advances in semiconductor technology and submicron technology nodes, together with the demands of high-performance, computation-intensive applications such as mobile and satellite applications, enable the integration of computing resources such as CPUs, DSPs, Intellectual Property (IP) cores and peripherals into a single chip, termed a multiprocessor System-on-Chip (MPSoC). An effective means of communication between these cores is essential; it should provide scalability, higher bandwidth and better modularity, along with the performance needed to meet the required computational tasks [1].

Network-on-Chip (NoC) has emerged as the communication fabric in many-core chips and stacks, replacing traditional buses and crossbars [2]. The basic building blocks of a NoC are routers, cores and network interfaces (NIs). In graph-theoretic terms, a NoC consists of nodes and links. The router at every node is connected to its neighbouring nodes via on-chip local wiring called interconnect (links), which allows multiple communications between cores to be multiplexed over the same interconnect to provide higher bandwidth and better scalability.

To ease the designer's burden and simplify the design, the majority of NoC designs assume a homogeneous structure, i.e. the traffic and timing requirements between cores are known at design time. In practice, however, traffic is distributed unevenly across the chip, between cores, external memories, NIs and computation units, and differences in module-to-module bandwidth and delay can be observed. This necessitates heterogeneity in NoC design [3].

Figure 1 shows an example of a heterogeneous traffic load, where the links at the center of the network carry a higher load than the links at the periphery. Heterogeneity can be added to a network by varying the link capacity and the number of virtual channels (VCs) of each unidirectional router port, which increases system performance while reducing area and power compared with a homogeneous network [4][5].



Figure 1. Example of Heterogeneous Traffic Loads

The router is the basic building block of a NoC design. A router transfers data as packets, which consist of header flits, data flits and tail flits. The header flit contains the destination node of the packet. According to how packets are transferred, NoCs are classified into synchronous-router-based NoCs and asynchronous-router-based NoCs. The efficiency of a NoC depends on the bandwidth and scalability of the system and on how the NoC spreads traffic to support the bandwidth requirement [6].

This paper explores the performance of heterogeneous NoC architectures for different parameters. Because the router plays an important role in transmitting packets between nodes through links, the parameters taken for the performance analysis are the router type and the number of VCs.

The rest of the paper is organized as follows. Section II presents the basic background of NoC. Section III discusses the experimental set-up and parameters. Section IV gives the performance analysis. A comparative analysis of the synchronous and asynchronous NoCs is presented in Section V. Finally, Section VI gives the conclusion of this paper.

## II. BACKGROUND

A NoC is built from two modules: routers and network interfaces. Routers are built as a collection of connected ports. There are three research areas in NoC design: topology, switching techniques and routing algorithms.

Topology is the interconnection pattern of resources in a NoC. Various topologies are available, e.g. mesh, torus, tree and spidergon. Several researchers have suggested that a 2-D mesh architecture is more efficient in terms of latency, power consumption and ease of implementation than other topologies [7].

Routing in a NoC determines the path that each packet follows between a source and destination pair. Several properties of routing algorithms are essential for interconnection networks: connectivity, adaptivity, deadlock and livelock freedom, and fault tolerance. Connectivity is the ability to route packets from any source node to any destination node. Adaptivity is the ability to route packets through alternative paths in the presence of contention or faulty components. Deadlock and livelock freedom is the ability to guarantee that packets will not block or wander across the network forever.

The most commonly used routing algorithms are XY, OE and DyAD. Performance metrics show that the OE routing algorithm performs better than the XY routing algorithm, and that the DyAD routing algorithm performs better than both XY and OE in terms of latency, throughput and total network power [8].

The wormhole switching technique is best suited to NoC design. Efficient router design is also an emerging research area in NoC design. According to how packets are transferred, NoCs are classified into synchronous-router-based NoCs and asynchronous-router-based NoCs.

#### A. Synchronous NoC Router Architecture

The basic NoC router consists of three main components: the buffer, the routing unit and the crossbar switch. The buffer and the routing process are sensitive to the rising and falling edges of the clock.

The routing unit computes the route to the destination from the header flit, and it must be aware of the upcoming trailer flit so that it stops granting the selected output switch to further flits once the trailer flit has passed. The buffer is implemented as a circular queue to use the buffer space efficiently, and it includes a control register holding the empty and full flags, which govern when old flits may be replaced by new ones.
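The circular-queue buffer described above can be sketched in Python as follows. This is a minimal illustration, not the router's actual implementation: the class name `FlitBuffer`, the `depth` parameter and the choice to reject a write when the full flag is set are assumptions made for the example.

```python
class FlitBuffer:
    """Fixed-depth circular queue of flits with empty/full flags (illustrative)."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.depth = depth
        self.head = 0   # index of the oldest stored flit
        self.tail = 0   # index where the next flit will be written
        self.count = 0  # number of flits currently stored

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.depth

    def push(self, flit):
        """Store a flit; return False (back-pressure) when the buffer is full."""
        if self.full:
            return False
        self.slots[self.tail] = flit
        self.tail = (self.tail + 1) % self.depth  # wrap around the array
        self.count += 1
        return True

    def pop(self):
        """Remove and return the oldest flit, or None when the buffer is empty."""
        if self.empty:
            return None
        flit = self.slots[self.head]
        self.slots[self.head] = None              # slot is now free for reuse
        self.head = (self.head + 1) % self.depth
        self.count -= 1
        return flit
```

Because the read and write indices wrap around, a slot freed by a transmitted flit is reused immediately without shifting the remaining flits.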

An arbitration unit locks the dedicated output channel until the end of packet transmission, which resolves contention. Once the routing process has completed successfully, the routing unit grants the requesting input port and enables the selected output multiplexer to connect them. These grant and activation signals are disabled when the trailer flit of the packet has been transmitted to its output port.

A crossbar switch connects the input and output ports of the router. The control signals of the crossbar switch are driven by the output of the arbiter. With wormhole switching, all remaining flits of a packet follow the header flit in a pipelined manner, and they are blocked if their header is blocked on the way to the destination. Once a packet transmission has finished, the switch is unlocked to serve other input channels.
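The locking behaviour of the arbiter under wormhole switching can be illustrated with a small Python sketch. The class name `OutputPortArbiter` and the flit labels `"head"`, `"body"` and `"tail"` are hypothetical, and the sketch models a single output port only.

```python
class OutputPortArbiter:
    """Locks one output port to a single input port for the length of a packet."""

    def __init__(self):
        self.owner = None  # input port currently holding the output, if any

    def request(self, input_port, flit_type):
        """Return True if the flit from input_port may cross the switch now.

        flit_type is one of "head", "body" or "tail" (hypothetical labels).
        """
        if self.owner is None:
            if flit_type != "head":
                return False          # only a header flit can open a new path
            self.owner = input_port   # lock the output for this packet
        elif self.owner != input_port:
            return False              # output locked by another packet: blocked
        if flit_type == "tail":
            self.owner = None         # the tail flit releases the lock
        return True
```

A competing header is refused while the port is locked, so the flits behind it stall as well, which is the blocking behaviour described above.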

#### B. Asynchronous NoC Router Architecture

Like the synchronous router, the asynchronous router consists of three main modules: the input buffer, the crossbar switch and the routing unit.

The asynchronous NoC router contains no clock; instead, all data transmissions are driven by handshaking signals. A four-phase handshake protocol is employed. A receive activity contains four phases:

- 1) Wait for input to become valid.
- 2) Acknowledge to the sender that the transmission has been accomplished.
- 3) Wait for inputs to become neutral.
- 4) Make the acknowledge signal low.

A send activity contains four subsequent phases:

- 1) Send a valid output.
- 2) Wait for acknowledge.
- 3) Make the output neutral.
- 4) Wait for the acknowledge signal to go low.
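The interleaving of the send and receive phases listed above can be traced with a short Python sketch. It is a sequential model under simplifying assumptions: the two sides are stepped in lock-step inside one function, and the names `four_phase_transfer`, `valid` and `ack` are chosen for illustration only.

```python
def four_phase_transfer(items):
    """Transfer items one by one using a four-phase (return-to-zero) handshake.

    Returns the received items and a trace of (valid, ack) signal pairs.
    """
    valid, ack = False, False  # both signals start in the neutral/low state
    data = None
    received, trace = [], []

    for item in items:
        # Sender phase 1: drive a valid output.
        data, valid = item, True
        trace.append((valid, ack))
        # Receiver phases 1-2: the input is valid, so latch it and acknowledge.
        received.append(data)
        ack = True
        trace.append((valid, ack))
        # Sender phases 2-3: acknowledge seen, return the output to neutral.
        data, valid = None, False
        trace.append((valid, ack))
        # Receiver phases 3-4: the input is neutral, so lower acknowledge.
        ack = False
        trace.append((valid, ack))
        # Sender phase 4: acknowledge is low again, the next transfer may start.

    return received, trace
```

Each item produces the same four signal states, and both signals return to low before the next item is sent.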

The remaining operation of the asynchronous NoC is the same as that of the synchronous NoC in terms of buffering, arbitration and switching [9].

## III. EXPERIMENTAL SET-UP

For the simulation we use 3x3 and 4x4 mesh topologies. Heterogeneous NoCs (HNOCs) [10][11] is used as the simulation environment to evaluate the performance of the NoCs under observation. HNOCs is an open-source simulator based on OMNeT++. OMNeT++ is an event-driven simulation engine that provides C++ APIs to describe and configure models, define topologies, collect simulation data and carry out performance analysis. It is the only simulator that supports heterogeneous NoCs with variable link capacities and a variable number of VCs per port [12].

The simulation uses the XY routing algorithm and the wormhole switching technique. Performance is analyzed for synchronous and asynchronous heterogeneous NoCs in terms of end-to-end latency, throughput, loss probability and sink bandwidth. The link data rate is 16 Gbps, the packet size is 32 bits and the operating frequency is 500 MHz.
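For reference, XY routing on a 2-D mesh can be sketched as below. This is an illustration of the general technique rather than the simulator's code: the `(x, y)` coordinate convention and the function names `xy_route` and `hop_count` are assumptions.

```python
def xy_route(src, dst):
    """Return the routers visited from src to dst under XY routing.

    src and dst are (x, y) mesh coordinates (an assumed convention).
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:              # first travel along the X dimension
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:              # then travel along the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src, dst):
    """Number of links traversed between src and dst."""
    return len(xy_route(src, dst)) - 1
```

Under this scheme the corner-to-corner path takes 4 hops in a 3x3 mesh (`hop_count((0, 0), (2, 2))`) and 6 hops in a 4x4 mesh (`hop_count((0, 0), (3, 3))`).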

## IV. PERFORMANCE ANALYSIS

The performance analysis of a NoC identifies the circumstances under which the NoC offers higher speed along with wider bandwidth. Hence latency and throughput must be analyzed against the offered load. The performance analysis of the two heterogeneous NoCs was performed by considering the following parameters:



Figure 2. (a)-(b) End-to-End Latency vs. offered Load under Uniform Traffic; (c)-(d) Throughput vs. offered Load under Uniform Traffic.



Figure 3. (a) Sink Bandwidth vs. different Load under Uniform Traffic at VC2; (b) Sink Bandwidth vs. different Load under Uniform Traffic at VC4

1) End-to-End Latency: Latency is the average delay required to transfer packets from source to destination. The end-to-end latency is given by the maximum latency for a source-destination pair at the farthest distance in the network.

2) Throughput: The rate at which the network can successfully accept and deliver the injected packets. Saturation throughput is reached when injected packets are lost or discarded.

3) Sink Bandwidth: The rate at which a destination sink accepts the packets sent by the source node.

4) Loss Probability: The ratio of the packets lost to the total packets sent by the source nodes in the network, expressed as a percentage. A loss probability of 0 means that no packet is ever lost; a value of 100 means that all packets are lost.
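The loss probability and throughput definitions above translate directly into code. The sketch below is illustrative: the function names and the choice of bits and nanoseconds as input units are assumptions, not the simulator's interface.

```python
def loss_probability(packets_sent, packets_received):
    """Percentage of packets sent by the sources that never reached a sink."""
    if packets_sent == 0:
        return 0.0  # nothing was sent, so nothing was lost
    return 100.0 * (packets_sent - packets_received) / packets_sent


def throughput_gbps(bits_delivered, duration_ns):
    """Delivered throughput; bits per nanosecond is numerically equal to Gbps."""
    return bits_delivered / duration_ns
```

For example, 900 packets received out of 1000 sent gives a loss probability of 10%.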

#### A. Latency and Throughput Analysis under Uniform Traffic

In the uniform traffic pattern, source nodes send an equal amount of traffic to all destinations in the NoC. Synchronous and asynchronous heterogeneous NoCs have been analyzed under uniform traffic. Figure 2(a)-(d) presents the end-to-end latency and throughput of synchronous and asynchronous heterogeneous NoCs with different numbers of cores and VCs.

From Figure 2(a)-(d) we observe that the asynchronous NoC has lower end-to-end latency and higher throughput than the synchronous NoC. We also observe that when the number of cores increases, the latency of the NoC increases by 10-20% and the throughput decreases by 6-10%, because the path delay between source and destination grows. When the number of VCs increases from 2 to 4, the end-to-end latency decreases by 3-5% and the throughput increases by 4-8%, because more VCs reduce the queuing time of packets.

Figure 4. (a)-(b) End-to-End Latency vs. offered Load under Exponential Traffic; (c)-(d) Throughput vs. offered Load under Exponential Traffic.

Figure 5. (a) Sink Bandwidth vs. different Load under Exponential Traffic at VC2; (b) Sink Bandwidth vs. different Load under Exponential Traffic at VC4

Figure 3 presents the sink bandwidth of synchronous and asynchronous heterogeneous NoCs with different numbers of cores and VCs under uniform traffic. It shows that at higher loads the asynchronous NoC offers more sink bandwidth than the synchronous NoC.

#### B. Latency and Throughput Analysis under Exponential Traffic

Exponential traffic is an on/off type of traffic in which packets are generated at a constant rate during the ON period and no traffic is generated during the OFF period. The same analysis has been carried out under exponential traffic for both synchronous and asynchronous heterogeneous NoCs. Figure 4(a)-(d) shows the end-to-end latency and throughput for different numbers of cores and VCs.
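An on/off source of this kind can be sketched as follows. The sketch assumes that the ON and OFF durations are drawn from exponential distributions, which the text does not state explicitly; the function name and the parameters `mean_on`, `mean_off`, `packet_interval` and `horizon` are hypothetical.

```python
import random


def on_off_injection_times(mean_on, mean_off, packet_interval, horizon, seed=0):
    """Return packet injection times of an on/off source up to `horizon`.

    ON and OFF durations are exponentially distributed (an assumption);
    during ON, one packet is injected every `packet_interval` time units.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while t < horizon:
        # ON period: constant-rate injection until the period (or horizon) ends.
        on_end = min(t + rng.expovariate(1.0 / mean_on), horizon)
        n = 0
        while t + n * packet_interval < on_end:
            times.append(t + n * packet_interval)
            n += 1
        # OFF period: no traffic is generated.
        t = on_end + rng.expovariate(1.0 / mean_off)
    return times
```

With equal mean ON and OFF durations the source is silent for roughly half of the time, so its average load is about half of its peak injection rate.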

From Figure 4(a)-(d) we observe that when the number of cores increases, the end-to-end latency increases by 10-18% and the throughput decreases by 8-13%. When the number of VCs increases from 2 to 4, the end-to-end latency decreases by 10-15% and the throughput increases by 11-18%. Figure 5 presents the sink bandwidth of synchronous and asynchronous heterogeneous NoCs with different numbers of cores and VCs under exponential traffic.



Figure 6. Event log viewer snapshot of Synchronous Router



Figure 7. Event log viewer snapshot of Asynchronous Router

TABLE I. Saturation Throughput (Gbps) at different no. of VCs under Uniform Traffic

| No. of VCs | 4x4 Async | 4x4 Sync | 3x3 Async | 3x3 Sync |
|------------|-----------|----------|-----------|----------|
| VC=2       | 0.80      | 0.70     | 0.89      | 0.88     |
| VC=4       | 0.85      | 0.79     | 0.91      | 0.90     |

TABLE II. Loss Probability (%) at different no. of VCs and different offered Load under Uniform Traffic

| Load (Gbps) | VC=2, 4x4 Async | VC=2, 4x4 Sync | VC=2, 3x3 Async | VC=2, 3x3 Sync | VC=4, 4x4 Async | VC=4, 4x4 Sync | VC=4, 3x3 Async | VC=4, 3x3 Sync |
|-------------|-----------------|----------------|-----------------|----------------|-----------------|----------------|-----------------|----------------|
| 0.80        | 0.62            | 0.84           | 0               | 0              | 0               | 0.12           | 0               | 0              |
| 0.90        | 2.76            | 0.84           | 0               | 0.04           | 0.95            | 1.17           | 0               | 0              |
| 1.00        | 6.05            | 11.82          | 0.13            | 2.17           | 3.57            | 5.25           | 0.53            | 0.88           |
| 1.50        | 27.48           | 32.97          | 16.53           | 24.45          | 24.70           | 27.73          | 14.83           | 21.49          |
| 2.00        | 51.33           | 57.27          | 44.6            | 51.57          | 49.01           | 53.25          | 45.75           | 48.88          |

## V. COMPARATIVE ANALYSIS OF NOC

The comparative analysis of synchronous and asynchronous heterogeneous NoCs is based on the tables of loss probability and saturation throughput under uniform traffic. Table I presents the saturation throughput of synchronous and asynchronous heterogeneous NoCs under uniform traffic. In this table, the 3x3 asynchronous NoC with VC=4 has the highest saturation throughput, 0.91 Gbps. Table II presents the loss probability of synchronous and asynchronous heterogeneous NoCs under uniform traffic.

From Table II we observe that the asynchronous NoC with fewer cores has a lower loss probability as the offered load increases.
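To make the comparison in Table I easier to read, the relative gain of the asynchronous NoC over the synchronous NoC can be computed from the tabulated saturation throughputs. The snippet below is a small worked calculation; the dictionary layout and the function name `async_gain_percent` are chosen for illustration.

```python
# Saturation throughput (Gbps) copied from Table I: {mesh: {VCs: (async, sync)}}
TABLE_I = {
    "4x4": {2: (0.80, 0.70), 4: (0.85, 0.79)},
    "3x3": {2: (0.89, 0.88), 4: (0.91, 0.90)},
}


def async_gain_percent(mesh, vcs):
    """Relative saturation-throughput gain of the async NoC over the sync NoC."""
    asyn, sync = TABLE_I[mesh][vcs]
    return 100.0 * (asyn - sync) / sync


for mesh, rows in TABLE_I.items():
    for vcs in rows:
        print(f"{mesh} VC={vcs}: {async_gain_percent(mesh, vcs):.1f}%")
```

On these figures the gain is roughly 14% (VC=2) and 8% (VC=4) for the 4x4 mesh, and about 1% for the 3x3 mesh.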

The event log graphs in Figure 6 and Figure 7 demonstrate the operation of the synchronous and asynchronous routers. The synchronous router takes eight cycles to transmit two flits, as shown in Figure 6, whereas the asynchronous router takes only six cycles to transmit two flits, as shown in Figure 7.

## VI. CONCLUSION

We observed that the asynchronous NoC gives better performance in terms of bandwidth, end-to-end latency and throughput. Further, we observed that latency decreases as the number of VCs increases (by 3-5% when the number of VCs increases from 2 to 4) and increases as the number of cores increases (the 4x4 mesh has 10-20% more latency than the 3x3 mesh), and that the throughput of a particular router cannot be increased beyond a certain limit. Finally, we conclude that for lower latency and higher throughput, an asynchronous NoC with a larger number of VCs is best suited.

## REFERENCES

- [1] W. Dally and B. Towles, "Principles and Practices of Interconnection Networks," Morgan Kaufmann, 2004.
- [2] N. Enright Jerger and L.-S. Peh, "On-Chip Networks," Synthesis Lectures on Computer Architecture, Morgan and Claypool, 2009.
- [3] B. Itzhak, I. Cidon, A. Kolodny, M. Shabum and N. Shmual, "Heterogeneous NoC Router Architecture," IEEE Transactions on Parallel and Distributed Systems, pp. 1-14, 2013.
- [4] Y. Ben-Izhak, I. Cidon, A. Kolodny, "Optimizing heterogeneous NoC design," in Proceedings of the International Workshop on System Level Interconnect Prediction, ACM, 2012.
- [5] A. Mishra, N. V. Krishnan and C. Das, "A case for heterogeneous on-chip interconnects for CMPs," in Proceedings of the 38th Annual International Symposium on Computer Architecture, pp. 389-399, 2011.
- [6] S. Abbaan and J. A. Lee, "A parametric-based performance evaluation and design trade-offs for interconnect architecture using FPGA for Network On Chip," Microprocessors and Microsystems, Elsevier, Vol. 38, pp. 378-398, 2014.
- [7] T. N. K. Reddy, A. K. Swain, J. K. Singh, K. K. Mahapatra, "Performance Assessment of Different Network-on-Chip Topologies," 2nd International Conference on Devices, Circuits and Systems (ICDCS), pp. 1-5, 2014.
- [8] J. K. Singh, A. K. Swain, T. N. K. Reddy, K. K. Mahapatra, "Performance Evaluation of Different Routing Algorithms in Network-on-Chip," IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, pp. 180-185, 2013.
- [9] P. M. Yaghini, A. Eghbal, H. Pedram, H. R. Aramid, "Investigation of transient fault effect in synchronous and asynchronous Network-on-Chip router," Journal of Systems Architecture, Elsevier, Vol. 57, pp. 61-68, 2011.
- [10] Y. B. Itzhak, E. Zahavi, I. Cidon and A. Kolodny, "HNOC: Modular Open-Source Simulator for Heterogeneous Network-on-Chip," IEEE Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XII), pp. 51-57, 2012.
- [11] http://webee.technion.ac.il/matrics/software.html.
- [12] http://www.omnetpp.org/models/catalog.