
Due to my new role, I have been thinking a lot about how modern network architecture has evolved over the years. When I first entered networking, it was the "cool" thing we did so that we could play Doom in our fraternity house with a bunch of other guys. Those were the days of old 10Base-2 connectors. Shortly after pulling and terminating cables, and in some cases repairing the old baseband systems we can thank IBM for putting in schools, I graduated to a real job. But really, since the standardization of cabling around Cat5 and Cat6, haven't we come a long way? Yes, I realize that things like electrical current can interfere with cable, but for the most part, if you have well-terminated copper cable you can run at a minimum 100M without a problem.

This got me thinking about why networks were ever built to plan for dropped packets. Did it go back to the original ARPAnet design and the idea that it had to survive a nuclear blast? Yes, we all know that rumor is not true, but part of the ARPAnet's purpose was in fact to work across an unreliable medium. Is this why protocols were built on top to compensate for that problem with reliable delivery? Does that same problem with an unreliable medium still exist today? If not, how would we change our network architectures? These questions have plagued me for the last month or so, and hence I decided to do some research.

Here are a couple thoughts:

Dropped Packets Suck in the Data Center!

Whether it is from oversubscription or a microburst in the network, dropped packets by and large suck. This problem is only going to get worse as network speeds increase. Are you aware that on a 10Gb Ethernet link, a single second of dropped packets spews roughly 15 million packets onto the data center floor? Worse yet, we are now in the era of 40Gb Ethernet, with 100Gb Ethernet soon to follow. Yes, the math is correct: a single second of impediment on 100Gb Ethernet would throw away roughly 150 million packets (assuming the smallest frame size). I don't care what application you are using; if you lose 150 million packets, the user experience is going to be bad.
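If you want to check my math, here is the back-of-the-envelope version. It is a minimal sketch that assumes minimum-size 64-byte frames plus the preamble and inter-frame gap every frame carries on the wire:

```python
# Back-of-the-envelope packet rates at minimum Ethernet frame size.
# Assumes 64-byte frames plus the 8-byte preamble and 12-byte
# inter-frame gap that accompany every frame on the wire.

MIN_FRAME_BYTES = 64          # smallest legal Ethernet frame
PREAMBLE_BYTES = 8            # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12     # mandatory idle time between frames

bits_on_wire = (MIN_FRAME_BYTES + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8

for label, gbps in [("10GbE", 10), ("40GbE", 40), ("100GbE", 100), ("1TbE", 1000)]:
    pps = gbps * 1e9 / bits_on_wire
    print(f"{label}: one second of loss ~= {pps / 1e6:,.1f} million packets")
```

That works out to roughly 14.9 million packets per second at 10Gb, 148.8 million at 100Gb, and nearly 1.5 billion once you reach the Terabit range.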

This makes me ask the question: why would you ever build a data center network that is massively oversubscribed? You could suggest that you are playing the economics game…explain that to your CIO when the #1 revenue-producing application is down. You could also hypothesize that it is only oversubscription if everyone talks at the same time…which I understand. The premise I am suggesting is that we should never build a network that only works "in a perfect world." Believe me, I have seen some of the finest pieces of PowerPoint engineering that in practice fail worse than Ben Affleck and Gigli did in 2003.
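To put a number on the economics game, here is a hypothetical example. The port counts are illustrative assumptions, not any particular product:

```python
# A hypothetical leaf switch: 48 x 10G server-facing ports feeding
# only 4 x 40G uplinks toward the spine. The port counts are assumptions
# chosen for illustration.

server_capacity = 48 * 10     # Gb/s of demand from the servers
uplink_capacity = 4 * 40      # Gb/s of capacity toward the spine

ratio = server_capacity / uplink_capacity
print(f"Oversubscription: {ratio:.0f}:1 "
      f"({server_capacity}G of demand contending for {uplink_capacity}G of uplink)")
# 3:1 looks fine "in a perfect world" where nobody talks at once,
# and guarantees drops the moment everyone does.
```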

Dropped packets degrade the user experience!

In a world that demands instant customer satisfaction from our applications, we can't afford to drop packets in the data center. Akamai did a study in 2009 that said the new expectation for loading a page was 2 seconds. Worse yet, 40% of users at that time said they would abandon a page that took longer than 3 seconds to load. Gartner claims that by 2014, 80% of data center traffic will be east/west. I once again ask the question: if this is true, why would you build a data center network that is highly oversubscribed for intra-data-center traffic?

In my previous life, I was responsible for a legacy application that made 1,200 one-way trips to put away data in a database. Can you imagine the impact that dropped packets would have on its performance? To make matters worse, the application had to perform flawlessly to come anywhere near the 5-second threshold we had set for the entire application (of which our piece was a smaller portion).
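To make that concrete, here is a rough sketch with assumed numbers: a half-millisecond data center RTT, a 200 ms TCP minimum retransmission timeout, and the 1,200 one-way trips treated as 600 sequential request/response pairs. None of these figures come from the actual application; they just show how little loss it takes to eat a latency budget.

```python
# How quickly retransmission timeouts eat a 5-second budget for a chatty
# application. All numbers are assumptions chosen for illustration.

RTT_MS = 0.5                  # assumed round-trip time inside the data center
MIN_RTO_MS = 200              # typical TCP minimum retransmission timeout
ROUND_TRIPS = 1200 // 2       # 1,200 one-way trips ~= 600 request/response pairs
BUDGET_MS = 5000              # the 5-second end-to-end threshold

base_ms = ROUND_TRIPS * RTT_MS
for timeouts in (0, 1, 5, 10, 25):
    total_ms = base_ms + timeouts * MIN_RTO_MS
    verdict = "ok" if total_ms <= BUDGET_MS else "BLOWN"
    print(f"{timeouts:>2} retransmission timeouts -> {total_ms:6.0f} ms ({verdict})")
```

And remember, that 5-second budget covered the whole service chain, not just our piece of it.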

Anyone working in the modern enterprise with any type of legacy application and a service chain a mile long knows exactly what I am talking about. The truth of the matter is that business applications struggle to meet their SLA metrics even when everything works perfectly…so again, why build a data center network that is highly oversubscribed and likely to drop packets?

There is good news!

The good news is that with standards and modern advances in silicon and signaling, we don't have to live in the dark ages of data center network architecture. We can in fact build a completely non-blocking architecture that scales to thousands of physical servers and even more virtual servers. As I dive back into the networking world, I am impressed with the advances that have been made in hardware capabilities. I am suggesting that the architectures need to evolve to take better advantage of those hardware capabilities. Certainly the differentiation will come in the software space and in the integration of that software into the business and data systems. You can call it what you like – a fabric, a two-tier architecture, leaf/spine – it doesn't matter. What does matter is that you move with the speed needed to stop dropping packets in the data center.
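For the skeptics, here is the rough port math behind a non-blocking two-tier fabric. The switch sizes are hypothetical; the point is only that splitting leaf ports 1:1 between servers and spines scales to thousands of server ports with no oversubscription:

```python
# Leaf/spine port math for a non-blocking two-tier fabric built from
# identical k-port switches. The radixes below are illustrative assumptions.

def nonblocking_fabric(ports_per_switch: int) -> tuple[int, int, int]:
    """Return (spines, leaves, server_ports) for identical k-port switches."""
    downlinks = ports_per_switch // 2        # leaf ports facing servers
    uplinks = ports_per_switch - downlinks   # leaf ports facing spines (1:1 split)
    spines = uplinks                         # one uplink from each leaf to each spine
    leaves = ports_per_switch                # each spine port feeds one leaf
    return spines, leaves, leaves * downlinks

for k in (32, 64, 128):
    spines, leaves, servers = nonblocking_fabric(k)
    print(f"{k}-port switches: {spines} spines, {leaves} leaves, "
          f"{servers:,} non-blocking server ports")
```

With 64-port switches that is already 2,048 non-blocking server ports, and 128-port switches push it past 8,000, before you even consider virtual servers.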

Let's face it – the application and data teams just expect the network to work. I will argue that the network is the furthest thing from a commodity in any enterprise. Just ask any CIO how much work gets done when the network is down. BUT we have to evolve our architectures to fulfill current and future business needs…and I am suggesting that we can start by architecting solutions that don't drop packets even during the highest periods of business need! The technology, products, and standards are there…remember, 150 million packets is just the start…what happens as speeds move to the Terabit range?

I come from a long history of being an architect, for small organizations up through one of the largest enterprise organizations in the United States. During my time I have heard many people say, "in architecture it isn't about picking the right answer, it is about picking which right answer." At first I was a bit taken aback by this premise, but I later came to realize that throughout my career as an engineer, my life was pretty binary: it was either right or wrong, 1 or 0, and all the rest didn't matter. It was only after working for a very large organization with lots of smart people that I realized there are often many right answers, and whether YOU think one is the right answer depends on the perspective you come from.

I studied Psychology while attending Illinois Wesleyan University, and I very much enjoyed the behavioral elements of it. I never in a million years thought it would apply to my career in IT. What I found was that many of the storied careers of my peer architects came with a set of standards they believed in. Throughout their entire careers, they had been behaviorally conditioned into thinking that their answer was the right answer. Sort of like the rat in a cage figuring out how to press the bar to get food…they had to win the battle to get the reward.

In closing on my first blog post (they will get better), I would challenge you in the future to think not just about your answer, but also about the person across the table and how they arrived at theirs. It was when I started doing that that I forged stronger bonds of friendship and also got more done as an organization. So, in other words: pick which answer is best, but don't assume that yours is better!