An Introduction to InfiniBand

Date: 2013-06-19
Author: 10Gtek
 
The typical computing facility of today is built from several distinct key resources, including switches, servers, storage units, and the user base of workstations. In addition, there are a variety of ways to interconnect these resources. For example, the local area network (LAN) is well known as an Ethernet structure, generally using copper and fiber cable to support Fast or Gigabit Ethernet as the connection between user workstations and the switch resource located in the data center. Other interconnect approaches are often used between other resources, such as Fibre Channel between switching and storage.
 
Most interconnects between servers and from the server farm to the switch currently are accomplished by Ethernet, often Gigabit Ethernet but occasionally 10 Gigabit Ethernet as well. The June 2006 publication of the IEEE 802.3an standard anticipates wider adoption of 10GBASE-T Ethernet over copper. However, the silicon is currently expensive, consumes relatively high power per switch port, and is not widely available in production quantities. Also, current implementations of Ethernet as a server interconnect have latencies (i.e., the length of time it takes to get a message from one processor to the next) that are higher than those of some other interconnect solutions, and the Ethernet architecture is not readily scalable to speeds beyond 10 Gigabits/second (Gb/s). These factors converge to open a window of opportunity for an alternative interconnect option between servers and switches.
 
One option that is receiving widespread interest in the computing industry is InfiniBand, a scalable, high-speed interconnect architecture that has extremely low latency. These features have helped InfiniBand capture a market niche over the past several years as a high-speed connectivity solution between servers and switches that can be implemented alongside Ethernet and Fibre Channel in academic, scientific, and financial high-performance computing (HPC) environments. Now, with the introduction of virtualized LANs and computing clusters in enterprise environments to run increasingly sophisticated and bandwidth-hungry applications, InfiniBand is positioned for wider deployment.
 
10Gtek offers this technical article to briefly describe the features of InfiniBand technology and to discuss current and projected trends regarding its adoption in the marketplace.
 
Where is InfiniBand Currently Deployed?
InfiniBand enjoys the advantage of broad industry support via the InfiniBand Trade Association (IBTA) and its business partners. Some leading InfiniBand switch equipment vendors include Cisco, Mellanox, SilverStorm Technologies, and Voltaire. In addition, pre-configured InfiniBand clusters are available from the main server vendors including IBM, HP, Dell, and Sun.
 
InfiniBand’s strongest niche is high-performance computing clusters in any application requiring parallel processing and maximal performance. Clusters network many servers together to result in a scalable, modular environment that increases the overall availability and efficiency of the computing infrastructure. The cluster configuration has many advantages, such as using off-the-shelf processing equipment to realize cost reductions. Also, the cluster lends itself readily to scaling by adding server nodes, switch ports, and interconnect technologies such as InfiniBand. Enterprises deploy clustered computing architectures to run advanced applications such as financial and atmospheric modeling, and finite element analyses of complex three-dimensional shapes (for example, automobiles in wind tunnels).
 
InfiniBand also is increasingly used alongside virtualization technologies. Virtualization software coordinates server resources to maximize processing capability and server/storage efficiency: lightly used servers assist others as needed to complete intense processing jobs instead of delaying those tasks until resources become available. In this way, virtualization allows for server consolidation: a smaller number of servers can handle the same workload because each is utilized more efficiently. Different virtualization solutions such as Xen or Cisco VFrame can be used with InfiniBand to provide direct access between the virtualized server environments and I/O, which significantly improves I/O performance and lowers processing overhead.
 
Finally, InfiniBand has found some application in network-attached storage (NAS), storage area networks (SANs), and clustered storage systems. The storage market, especially data warehousing and disaster recovery, is growing quickly in response to federal regulations governing records retention such as Sarbanes Oxley, the Health Insurance Portability and Accountability Act (HIPAA), and the Federal Rules of Civil Procedure (FRCP). Fibre Channel is the incumbent interconnect for SANs, although alternatives such as InfiniBand potentially offer price and performance benefits.
 
How Does InfiniBand Work?
Interconnect technologies such as InfiniBand work by allowing devices to communicate and work together as “nodes” in a computing “fabric” or “switch fabric” that otherwise would be located and operated separately (for example, between processors, as well as between processors and networking devices). A series of two or more switches is often used in the fabric to incorporate redundancy for improved reliability. As more switches are added, the aggregated bandwidth of the fabric increases and multiple routes are provided for communications between any two nodes.
 
InfiniBand uses Remote Direct Memory Access (RDMA) technology when sending messages between processor nodes. Here, intermediate I/O buffers are bypassed: the adapter writes data directly to, or reads data directly from, the user memory of the target processor. This avoids copying the same data between different memory areas, which increases the time the processor spends performing useful calculations and significantly reduces latency. InfiniBand is therefore very useful as a computing cluster interconnect, as tightly coupled cluster applications require low latencies for optimum application performance.
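To make the zero-copy point concrete, the minimal Python sketch below models transfer time for a message that is staged through intermediate buffers versus one that is moved directly into user memory. Every figure in it (WIRE_RATE_GBPS, MEM_COPY_GBPS, BASE_LATENCY_US, the message sizes) is an assumed round number chosen only to illustrate the relationship, not a measured or vendor-quoted value.

# Illustrative latency model (not a benchmark): compares a conventional
# transfer that stages data through intermediate buffers with an RDMA-style
# zero-copy transfer. All figures are assumptions for illustration.

WIRE_RATE_GBPS = 10      # assumed 4X SDR link rate
MEM_COPY_GBPS = 20       # assumed memory-copy bandwidth per node
BASE_LATENCY_US = 5      # assumed fixed per-message cost (adapters, switching)

def transfer_time_us(msg_bytes: int, copies: int) -> float:
    """Time for one message: wire time plus 'copies' buffer-to-buffer copies."""
    wire_us = msg_bytes * 8 / (WIRE_RATE_GBPS * 1e3)        # bits / (bits per microsecond)
    copy_us = copies * msg_bytes * 8 / (MEM_COPY_GBPS * 1e3)
    return BASE_LATENCY_US + wire_us + copy_us

for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    staged = transfer_time_us(size, copies=2)   # e.g., user buffer -> intermediate -> adapter
    rdma = transfer_time_us(size, copies=0)     # adapter reads/writes user memory directly
    print(f"{size // 1024:5d} KiB  staged: {staged:8.1f} us   zero-copy: {rdma:8.1f} us")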
 
Like other interconnects, InfiniBand uses special interfaces in switch fabric nodes to send and receive information, typically cards that plug into the PCI Express slots of the equipment. The type of interface card depends on the type of node. InfiniBand components require one of two kinds of cards to interface with other network components:
 
• A Host Channel Adapter (HCA) is an interface card that is associated with the servers in the network. It is typically installed in a server and communicates with the server’s memory and processor as well as with the switch fabric. The HCA enables RDMA and is largely responsible for the low latency seen in InfiniBand installations.
 
• A Target Channel Adapter (TCA) is an interface card that is used with network devices that are not processors. The TCA includes an I/O controller that is specific to that device’s protocol (e.g., Fibre Channel or Ethernet) and can communicate with either an HCA or the switch fabric. TCAs are located near storage, peripherals, or an I/O device network to provide gateways from the fabric to Ethernet or Fibre Channel networks.
 
Figure 1 shows a schematic of a typical InfiniBand-based computing facility, including main resources and channel adapters.
Figure 1: Example network architecture with InfiniBand fabric
 
InfiniBand can work both with conventional switches as a pure server interconnect, and with multifabric server switches to combine the server interconnect function with Ethernet and Fibre Channel gateways. With a multifabric switch, InfiniBand architecture is used to connect servers to the switch fabric, Fibre Channel is used to interconnect from the switch to the storage units, and Ethernet is used to connect from the switch to the user base through more traditional switches and to the local area network (see Figure 2). Multifabric switches maximize the potential of InfiniBand by allowing an entire fabric of servers to share virtualized pools of I/O and storage resources connected through Ethernet or Fibre Channel switches.
 
 
Figure 2: Example network design using InfiniBand multifabric server switches
 
Benefits of InfiniBand
Open Standard. In 1999, seven industry leaders (Compaq, Dell, HP, IBM, Intel, Microsoft and Sun) formed the InfiniBand Trade Association (IBTA) to design an open standard for moving high volumes of data between processors and I/O devices. Today’s membership has expanded to more than 30 companies. This open standard has helped InfiniBand compete with proprietary interconnect technologies in the cluster computing market due to the cost reductions and economies of scale that are achievable with widespread vendor support.
 
High Speed and Scalability. The IBTA standard supports Single Data Rate (SDR) signaling over both copper and fiber optic media at a basic rate of 2.5 Gb/s, termed 1X (also referred to as a lane). An InfiniBand standard 4X cable supports 10 Gb/s, and a 12X cable supports an upper limit of 30 Gb/s. Double Data Rate (DDR) and Quad Data Rate (QDR) signaling over InfiniBand cabling permit single lanes to be scaled up to 5 Gb/s and 10 Gb/s, respectively, for a potential maximum data rate of 120 Gb/s over 12X cables.
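The quoted data rates follow directly from multiplying the per-lane signaling rate by the lane count of each link width; the short Python sketch below simply tabulates that arithmetic (raw signaling rates as named above, with 8b/10b encoding overhead not deducted).

# Per-lane signaling rate times lane count for each standard link width.
LANE_RATE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}
LINK_WIDTHS = {"1X": 1, "4X": 4, "12X": 12}

for rate_name, lane_gbps in LANE_RATE_GBPS.items():
    row = ", ".join(f"{width}: {lane_gbps * lanes:g} Gb/s"
                    for width, lanes in LINK_WIDTHS.items())
    print(f"{rate_name}  ->  {row}")

# SDR  ->  1X: 2.5 Gb/s, 4X: 10 Gb/s, 12X: 30 Gb/s
# DDR  ->  1X: 5 Gb/s, 4X: 20 Gb/s, 12X: 60 Gb/s
# QDR  ->  1X: 10 Gb/s, 4X: 40 Gb/s, 12X: 120 Gb/s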
 
This scalability gives InfiniBand a significant advantage over Ethernet and other proprietary interconnects as a high-speed solution to link servers and switches. Enterprise applications increasingly require more bandwidth in the data center than Gigabit Ethernet can provide, and 10GBASE-T solutions currently in development are projected to be 1-2 years from being widely available in quantity and cost-competitive with other interconnect technologies. The flexibility for InfiniBand to scale up to a projected 120 Gb/s may widen that window of opportunity.
 
Low Latency. Latency is a measure of signaling delay, and is the time taken for a data packet to move fully from one computer node to another node, such as server to server. Latency is particularly important in clustered computing applications, as it influences overall performance by defining how fast an application can get the data it needs. Latency also sets a boundary on the overall size and scalability of a processing cluster by controlling the number of messages that can be transported over the interconnect.
 
By incorporating RDMA technology, InfiniBand typically achieves a low latency of 3-5 microseconds, with some manufacturers claiming latencies as low as 1-2 microseconds. In contrast, Ethernet latencies typically range from 20-80 microseconds, as the data packet must be stripped down and re-assembled each time it is sent across a connection. Techniques exist within the Ethernet sector to reduce latency. For example, TCP Offload Engine (TOE) electronics can reduce processor involvement when disassembling and re-assembling data packets. However, additional circuit-board complexity, cost, and real estate are needed to achieve latency results close to those obtained via InfiniBand, and currently the maximum data rate over Ethernet is 10 Gb/s.
 
Over conventional LANs and other areas of an enterprise network, Ethernet latencies do not impede network performance. However, some server cluster applications (such as financial trading data centers) need very low latencies in order to reduce the compute time associated with managing high volumes of tightly coupled processing requests, and therefore may be better suited to InfiniBand solutions.
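To illustrate why these microsecond differences matter at cluster scale, the sketch below totals the time spent on a chain of small, serialized request/response exchanges, where per-message latency dominates because the payloads are tiny. The latency values are drawn from the ranges quoted above; the exchange count is an arbitrary example.

# Rough sense of why per-message latency dominates tightly coupled workloads:
# a chain of small, serialized request/response exchanges costs roughly
# N * round-trip latency, since payload transfer time is negligible.

EXCHANGES = 100_000  # assumed number of dependent small messages in a job step

for name, one_way_us in (("InfiniBand (~4 us one way)", 4), ("Ethernet (~40 us one way)", 40)):
    round_trip_us = 2 * one_way_us
    total_s = EXCHANGES * round_trip_us / 1e6
    print(f"{name}: {EXCHANGES} serialized exchanges take ~{total_s:.1f} s")

# InfiniBand (~4 us one way): 100000 serialized exchanges take ~0.8 s
# Ethernet (~40 us one way): 100000 serialized exchanges take ~8.0 s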
 
Low Power Consumption. Power consumption is a key concern of every network manager. The drive for ever-increasing processor speeds raises the power drawn by CPUs, which generates increasing amounts of heat. Typical heat management strategies are to optimize ventilation and to increase the capacity of the data center cooling system. These strategies address heat management but do not reduce the amount of heat generated in the processor; furthermore, increased cooling only increases the energy bill.
 
InfiniBand technologies help to reduce power consumption (and associated cooling needs) because they require less power than currently available 10 Gb/s Ethernet technologies. First-generation 10 Gb/s Ethernet server adapter cards are starting to be introduced to the marketplace, but are limited by high power consumption. For example, 10GBASE-SR adapter cards have a power consumption of 10-15 W, and a recently introduced 10GBASE-T card featuring RDMA and TOE draws slightly less than 25 W (roughly the limit of the power available to a PCI Express card in the server). The 10GBASE-T power consumption drops to about 15 W if RDMA/TOE features are excluded, but the solution then loses competitiveness against the latency characteristics of InfiniBand.
 
By contrast, a typical 10 or 20 Gb/s InfiniBand adapter card for copper draws between 3 and 4 W, and a 10 Gb/s InfiniBand transceiver for fiber draws about 1 W. The impact of this reduced power draw is considerable in today’s clustered data center architectures, where the number of processor nodes is typically a few hundred, and power savings add up even faster in large supercomputing sites that can easily grow to several thousand nodes. In this way, InfiniBand technology promotes a lower total cost of operation and a greener data center facility.
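The back-of-the-envelope sketch below scales the per-port figures quoted above to whole clusters. The node counts are arbitrary examples, the adapter values are midpoints of the ranges quoted in this article, and COOLING_FACTOR is an assumed multiplier (roughly one watt of cooling per watt of load), not a measured facility figure.

# Aggregate adapter power for example cluster sizes, using midpoints of the
# per-port figures quoted in this article and an assumed cooling overhead.

ADAPTER_WATTS = {
    "InfiniBand copper adapter (10/20 Gb/s)": 3.5,   # midpoint of 3-4 W
    "10GBASE-SR Ethernet adapter": 12.5,             # midpoint of 10-15 W
    "10GBASE-T adapter with RDMA/TOE": 25.0,
}
COOLING_FACTOR = 2.0  # assumed: total power ~ 2x adapter load once cooling is included

for nodes in (300, 3000):
    print(f"--- {nodes} nodes ---")
    for name, watts in ADAPTER_WATTS.items():
        total_kw = nodes * watts * COOLING_FACTOR / 1000
        print(f"{name}: ~{total_kw:.1f} kW including cooling")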
 
InfiniBand Cabling Considerations
Characteristics of InfiniBand Copper and Fiber Cabling. The InfiniBand open standard includes requirements for both fiber and copper cable assemblies. Standard InfiniBand 4X copper cables consist of eight concurrent runs (or four lanes) of 100 Ohm impedance Twinax that transmit at 2.5 Gb/s per lane to allow an overall speed of 10 Gb/s. The assembly operates in dual simplex (simultaneous bi-directional) mode, where one send and one receive run in each lane carry data independently. The typical length of passive InfiniBand copper cables is up to 15-20 meters, and can be significantly increased (to 30-40 m) with the use of active circuitry located within the connector. Typical copper InfiniBand cables and connectors are shown in Figure 4.
 
High-speed copper cables used in InfiniBand applications tend to be thicker and shorter than cables commonly used in Ethernet applications due to a number of factors, including signal attenuation effects, which are magnified at higher data rates. Signal attenuation increases as cable length increases, and often a lower-gauge wire (i.e., thicker conductors) must be used to partially compensate for the higher attenuation, resulting in the greater overall cable thickness and bend radius characteristic of InfiniBand copper assemblies. Some copper cables manufactured for InfiniBand use very thin-gauge conductors (e.g., 30-32 AWG) and feature a smaller bend radius for improved cable management, but the associated higher attenuation shortens the reach of these cables accordingly.
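The reach trade-off can be pictured as a simple loss budget: usable length is roughly the allowed end-to-end loss divided by the attenuation per meter, and attenuation per meter falls as conductors get thicker. Every number in the sketch below is a placeholder assumption chosen to show the shape of the relationship; none is taken from the InfiniBand specification or from any cable datasheet.

# Sketch of the gauge-versus-reach trade-off using placeholder values.
LOSS_BUDGET_DB = 20.0  # assumed allowable end-to-end loss at the signaling rate

ATTENUATION_DB_PER_M = {      # assumed figures for illustration only
    "24 AWG (thicker, larger bend radius)": 1.0,
    "28 AWG": 1.7,
    "30-32 AWG (thin, easier routing)": 2.5,
}

for gauge, db_per_m in ATTENUATION_DB_PER_M.items():
    print(f"{gauge}: reach ~ {LOSS_BUDGET_DB / db_per_m:.0f} m")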
 
The standard InfiniBand 4X fiber cable is a 12-fiber ribbon that is terminated with a multi-fiber array connector (see Figure 4); 50-micron fiber cables have a reach of up to 300 m. The first four fiber strands are used as the transmit path, the center four are unused, and the last four are used as the receive path. Each strand carries 2.5 Gb/s for an aggregate full-duplex bandwidth of up to 10 Gb/s. Pluggable parallel InfiniBand optic transceivers perform electrical-to-optical and optical-to-electrical conversion of the data stream. Transceiver modules are based on 850-nm vertical cavity surface emitting laser (VCSEL) technology and feature four transmit and four receive channels in one package.
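The fiber-position assignments described above map cleanly to code; the sketch below simply encodes that first-four/middle-four/last-four layout and the 2.5 Gb/s carried by each active strand.

# Position-to-role map for the 12-fiber 4X ribbon: fibers 1-4 transmit,
# 5-8 are unused, 9-12 receive, each active fiber carrying 2.5 Gb/s.

LANE_RATE_GBPS = 2.5

def fiber_role(position: int) -> str:
    """Role of a fiber by its 1-based position in the ribbon."""
    if 1 <= position <= 4:
        return "transmit"
    if 5 <= position <= 8:
        return "unused"
    if 9 <= position <= 12:
        return "receive"
    raise ValueError("a 4X ribbon has 12 fiber positions")

roles = [fiber_role(p) for p in range(1, 13)]
tx_gbps = roles.count("transmit") * LANE_RATE_GBPS
rx_gbps = roles.count("receive") * LANE_RATE_GBPS
print(roles)
print(f"aggregate: {tx_gbps:g} Gb/s transmit + {rx_gbps:g} Gb/s receive (full duplex)")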
 
In practice, a balance of reach and cost considerations has made copper the preferred medium for InfiniBand cabling deployments. The distance between switches and servers in cluster applications rarely exceeds 15 meters, so the reach benefits of fiber commonly are not required; also, the costs for copper and fiber InfiniBand cabling assemblies are about equal. However, the cost of copper InfiniBand electronics and adapter cards traditionally has been less than the cost of fiber electronics. This price difference is due in large part to the economies of scale available via built-in copper-based switch ports.
 
Figure 4: Example InfiniBand cables and connectors: 4X copper straight latch (© GORE, left), 12X copper straight latch (© Associated Enterprises / Meritec, middle), and multi-fiber array connector (© US Conec, right)
 
Cable Deployment and Clustered Computing System Design. InfiniBand cabling deployments have a decided impact on server cluster layout and size. The operation of a switch fabric requires that each port be able to access any other port on the fabric, and a number of switching combinations have been devised to handle that requirement. In small to medium-sized clusters, several 24-port switches may connect to server nodes; in large clusters, there may be two tiers of switches. Other InfiniBand switches are available with more than 96 ports, and these single-tier, high-port-count switches offer an attractive alternative to tiered switch designs.
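As a rough illustration of when tiering becomes necessary, the sketch below sizes a non-blocking two-tier fat tree built from fixed-port switches. The fat-tree topology and the half-down/half-up port split are common cluster design assumptions, not something this article prescribes.

# Node-count limits for single-tier versus two-tier designs, assuming a
# non-blocking fat tree: each leaf switch dedicates half its ports to servers
# and half to spine switches.

def max_nodes(ports_per_switch: int, tiers: int) -> int:
    if tiers == 1:
        return ports_per_switch          # every port connects a server
    half = ports_per_switch // 2
    return half * ports_per_switch       # leaf switches * server ports per leaf

for ports in (24, 96):
    print(f"{ports}-port switches: single tier = {max_nodes(ports, 1)} nodes, "
          f"two tiers = {max_nodes(ports, 2)} nodes")

# 24-port switches: single tier = 24 nodes, two tiers = 288 nodes
# 96-port switches: single tier = 96 nodes, two tiers = 4608 nodes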
 
 
Cable management becomes a central issue with that many ports, as the large diameter, short reach, and large bend radius of InfiniBand copper cables can limit the space available for other network components and cabling. For example, if the cables are positioned too tightly together within the rack, they can severely impact airflow through the racks or cabinets containing the equipment, which affects switch reliability and facility uptime. Also, high transmission rates are generally less tolerant of poor cable installation practices, so it is crucial that the proper bend radius of InfiniBand copper cables be maintained in order to optimize system performance.
 
Another point of concern to network managers is that as data rates increase, the maximum allowable length of copper cable is reduced, which can limit the number of nodes that can be interconnected. As an illustrative example (actual figures depend upon many factors), an InfiniBand-based cluster might be configured with a few thousand nodes using 15-meter InfiniBand 4X copper assemblies and a maximum data rate of SDR. If this same cluster were built with InfiniBand copper cables supporting DDR, the copper assembly length would be reduced to 7-8 meters, and the reduced distance available between servers and switches would permit only several hundred nodes in the cluster. In this case, unconventional rack and cabinet layouts (for example, in a “U” shape) can potentially increase the number of nodes interconnected using shorter-length cables.
 
In order to extend copper cable lengths in the face of higher data speeds, the use of active InfiniBand cable assemblies is anticipated in which electronic circuitry incorporated into the connector amplifies transmitted data and/or regenerates received data. Also, the decreasing length of passive copper assemblies for higher data speeds may mean that other techniques will be adopted in cluster computing designs. For example, an increased use of blade servers would take advantage of their compact form factor to reduce the overall length of cables used; however, higher server densities would need to be balanced against heat dissipation considerations.
 
Use of InfiniBand fiber assemblies can alleviate some of the challenges associated with managing copper cabling. The small form factor and low bend radius of flat 12-fiber ribbons would help maximize cabinet space, and the longer reach (300 m) of fiber assemblies enables active equipment to be spaced farther apart to facilitate airflow in high-density cluster environments. These advantages are balanced by budgetary considerations, as the higher cost of InfiniBand fiber transceivers can significantly increase the total cost of the cluster deployment.
 
Outlook
InfiniBand is gaining market share among enterprise data center and clustered computing environments as a cost-competitive, high-performance interconnect solution between servers, server farms, and switches.
 
InfiniBand carries price advantages over proprietary interconnects due to its open standard and wide support by active equipment vendors, as well as performance advantages due to its scalability. The higher throughput and lower latencies achieved by InfiniBand offer advantages over 10 Gb/s Ethernet technologies. Further, InfiniBand adapter cards require less power than 10 Gb/s Ethernet cards, which has a significant impact on the power and cooling costs of computing clusters.
 
Careful management of InfiniBand copper cables is required to ensure high-speed performance and to manage the heat that can be generated in high-density computing clusters. Overall, in the absence of a widely available 10 Gb/s Ethernet interconnect solution, enterprises can take advantage of the speed and scalability of InfiniBand to handle high-bandwidth business critical applications.
 
 
Cost and Performance Comparison of Selected 10 Gb/s Interconnect Options

Interconnect        | Scalable Past 10 Gb/s? | Max. Reach (m) | Latency (µs) | Switch Port Power (W) | Availability | Cost ⁴
InfiniBand, copper  | Yes, up to 30 Gb/s ¹   | 15-20 ³        | ~1-5         | ~3-4                  | High         | $$
InfiniBand, fiber   | Yes, up to 30 Gb/s ¹   | 300            | ~1-5         | ~1                    | Medium       | $$$
Ethernet, copper    | No ²                   | 100            | ~20-80       | ~15                   | Low          | $$$$
Ethernet, fiber     | No ²                   | 300            | ~20-80       | ~1-2                  | High         | $$$

1. Assumes 12X cabling at SDR; can potentially scale to 60 and 120 Gb/s using DDR and QDR.
2. At time of writing, IEEE High Speed Study Group was evaluating 100 Gb/s copper and fiber PMDs.
3. Active InfiniBand copper assemblies can reach 30-40 m.
4. Assumes sum of NIC plus switch/PMDs.