
Model and simulation of exascale communication networks

Journal of Simulation

Abstract

Exascale supercomputers will have millions or even hundreds of millions of processing cores and the potential for nearly billion-way parallelism. Exascale compute and data storage architectures will depend critically on the interconnection network. The most popular interconnection network for current and future supercomputer systems is the torus (eg, the k-ary n-cube). This paper focuses on the modelling and simulation of ultra-large-scale torus networks using Rensselaer's Optimistic Simulator System. We compare the communication delays predicted by our model with those measured on the actual torus network of Blue Gene/L using 2048 processors. Our performance experiments demonstrate the ability to simulate million-node to billion-node torus networks. The torus network model for a 16-million-node configuration shows a high degree of strong scaling when going from 1024 cores to 32 768 cores on Blue Gene/L, with a peak event rate of nearly 5 billion events per second. We also demonstrate the performance of our torus network model configured with 1 billion nodes on both Blue Gene/L and Blue Gene/P systems. The best observed event rate, at 128 K cores, is 12.36 billion events per second on Blue Gene/P.
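As an illustration of the k-ary n-cube topology named in the abstract, the following minimal Python sketch computes a node's neighbours and the minimal hop count between two nodes of a torus. It is not taken from the paper's model: the function names (`torus_neighbors`, `torus_hops`, `node_count`) are hypothetical, and the sketch assumes every dimension has the same radix k, wraparound links in each dimension, and minimal routing.

```python
def torus_neighbors(coord, k):
    """Neighbours of a node in a k-ary n-cube (assumed: wraparound in every dimension)."""
    neighbors = []
    for dim in range(len(coord)):
        for step in (-1, 1):
            nxt = list(coord)
            # Modular arithmetic implements the wraparound link of the torus.
            nxt[dim] = (nxt[dim] + step) % k
            neighbors.append(tuple(nxt))
    return neighbors


def torus_hops(src, dst, k):
    """Minimal hop count: in each dimension, take the shorter way around the ring."""
    return sum(min((d - s) % k, (s - d) % k) for s, d in zip(src, dst))


def node_count(k, n):
    """Number of nodes in a k-ary n-cube."""
    return k ** n


if __name__ == "__main__":
    k = 8  # hypothetical radix chosen only for this illustration
    print(torus_neighbors((0, 0, 0), k))
    print(torus_hops((0, 0, 0), (7, 4, 1), k))
    print(node_count(k, 3))
```

For radix k of at least 3, each node has 2n distinct neighbours, and the largest minimal distance is n times the integer part of k/2. As a purely arithmetical example of scale, a 256-ary 3-cube would contain 16 777 216 nodes; the abstract does not state which dimensions the 16-million-node configuration actually uses.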





Acknowledgements

This work was supported in part by the Office of Advanced Scientific Computing Research, Office of Science, US Department of Energy, under Contracts DE-AC02-06CH11357 and DE-FC02-10ER25989/DE-SC0004875, and in part by the NSF CNS NeTS Program, Contract #0435259. This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract DE-AC02-06CH11357. Computing time on Intrepid was provided by a US Department of Energy INCITE award. Rensselaer's Computational Center for Nanotechnology Innovations (CCNI) provided the Blue Gene/L computing resources.


Cite this article

Liu, N., Carothers, C., Cope, J. et al. Model and simulation of exascale communication networks. J Simulation 6, 227–236 (2012). https://doi.org/10.1057/jos.2012.4


  • DOI: https://doi.org/10.1057/jos.2012.4
