Showing posts with label Arm Server Chips. Show all posts

Friday, 22 July 2016

Unknown

New Cavium ThunderX2 adopts 64-bit ARM-based servers to address application and workload requirements

Semiconductor vendor Cavium announced Monday ThunderX2, its second generation of workload optimized ARM server SoCs that targets high performance volume servers deployed by public/private cloud and telecom communications data centers and high performance computing applications. It is optimized for data center workloads such as compute, security, storage, data analytics, network function virtualization and distributed databases.

The ThunderX2 line currently includes four processor families, each optimized for a different class of workload.

The ThunderX2_CP has been optimized for cloud compute workloads such as private and public clouds, web serving, web caching and web search, and for commercial HPC workloads such as computational fluid dynamics (CFD) and reservoir modeling. This family supports multiple 10/25/40/50/100 GbE network interfaces and PCIe Gen3 interfaces. It also includes accelerators for virtualization and vSwitch offload.

The ThunderX2_ST has been optimized for big data, cloud storage, massively parallel processing (MPP) databases and data warehousing workloads. This family supports multiple 10/25/40/50/100 GbE network interfaces, PCIe Gen3 interfaces and SATAv3 interfaces. It also includes hardware accelerators for data protection, integrity and security, and for efficient user-to-user data movement.

The ThunderX2_SC has been optimized for secure web front-end, security appliances and cloud RAN type workloads. This family supports multiple 10/25/40/50/100 GbE interfaces and PCIe Gen3 interfaces. Integrated hardware accelerators include Cavium’s industry leading, 5th generation NITROX security technology with acceleration for IPSec, RSA and SSL.

The ThunderX2_NT has been optimized for media servers, scale-out embedded applications and NFV type workloads. This family supports multiple 10/25/40/50/100 GbE interfaces. It also includes OCTEON style hardware accelerators for packet parsing, shaping, lookup, QoS and forwarding.

“The Cavium ThunderX2 will expand the market opportunity for ARM-based server technologies by addressing demanding application and workload requirements for compute, storage networking and security,” said Simon Segars, CEO, ARM. “ThunderX2 demonstrates Cavium’s ability to deliver a combination of innovation and engineering execution and the new product family increases the momentum for server deployments powered by ARM processors in large scale data centers and end user environments.”

Cavium’s ThunderX2 SoC line is supported by a comprehensive software ecosystem ranging from platform level systems management and firmware to commercial operating systems, development environments and applications.

Cavium has actively engaged in server industry standards groups such as UEFI and delivered numerous reference platforms to an array of community and corporate partners. Cavium has also demonstrated its position in the open source software community driving upstream kernel enablement for ThunderX, actively contributing to Linaro’s enterprise and networking groups, investing in Linux Foundation projects such as Xen and OPNFV and sponsoring the FreeBSD Foundation’s ARMv8 server implementation.

ThunderX2 will deliver two to three times the performance across a range of standard benchmarks and applications compared to ThunderX, while boosting the market reach of the ThunderX line of processors by targeting applications that require high single thread performance such as web search, graph analytics, a variety of enterprise applications such as massively parallel processing (MPP) databases, data warehousing and enterprise HPC applications such as computational fluid dynamics (CFD) and reservoir modelling. ThunderX2 will deliver comparable performance at a better total cost of ownership compared to the next generation of traditional server processors.




Thursday, 21 July 2016

Unknown

Cavium ThunderX block diagrams for its four families

During our work with the Cavium ThunderX platform, we have had access to a single 48-core ARMv8 2.0GHz SKU. Cavium’s strategy is to target different processor models at different markets. Aside from clock speed and core count, Cavium has feature-based differentiation for its different ThunderX product families. We received permission to publish information on the different SKUs and their unique features.

The four Cavium ThunderX families (ThunderX_CP, ThunderX_ST, ThunderX_SC, ThunderX_NT) have a similar base architecture. From this common architecture, each chip is tailored to its targeted application. There are a few common underpinnings but this is the basic block diagram that shows all potential components of the platform.

The ThunderX_CP family is targeted at compute workloads. These chips target public and private clouds, web caching and web serving, search, social media and similar applications. The main accelerator in this family is the vSwitch Offload Engine. The application workloads this family targets will take advantage of the high speed networking, multitude of cores and RAM bandwidth.

With Cavium’s strong networking portfolio as well as 16x SATA III 6.0Gbps ports (60% more than a comparable Intel single socket system), storage is an application where the ThunderX architecture is well suited. The ThunderX_ST family targets block, object and distributed file storage workloads, distributed database applications and Hadoop style workloads. It comes in 8-48 core variants and has additional hardware accelerators available, such as the compression engine.
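The "60% more" figure can be sanity-checked with a little arithmetic. The sketch below assumes the comparable Intel single socket system exposes 10 SATA ports; that baseline is not stated in the post and is only the value that makes the quoted percentage work out. The function name `percent_more` is made up for illustration.

```python
def percent_more(ours: int, baseline: int) -> float:
    """Return how many percent larger `ours` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    # Multiply before dividing so whole-number results stay exact.
    return (ours - baseline) * 100 / baseline


THUNDERX_SATA_PORTS = 16        # from the article
ASSUMED_INTEL_SATA_PORTS = 10   # assumption: baseline implied by the 60% figure

extra = percent_more(THUNDERX_SATA_PORTS, ASSUMED_INTEL_SATA_PORTS)
print(f"{extra:.0f}% more SATA ports")  # 60% more SATA ports
```

With a different baseline port count the percentage changes accordingly, so the 60% claim only holds for the platform the authors had in mind.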

Building on the high core count and the high-end networking throughput, the Cavium ThunderX_SC is focused on applications like eCommerce web servers. Cavium includes hardware accelerators for SSL, IPSec, deep packet inspection, anti-virus and anti-malware to free CPU cycles. Here Cavium leverages IP that it also uses in its dedicated Nitrox III chips.

With Cavium’s background as a networking company, and given the specs of the ThunderX chips, one area where we think the company will do well is the networking and SP cloud space. The ThunderX_NT targets network function virtualization (NFV) servers and the telecom clouds being deployed with a mix of high bandwidth along with hardware accelerators. Like the storage family, the networking focused family can scale from 8-48 cores.

Final Words

We have been working with a single socket Cavium ThunderX machine, the Gigabyte R120-T30 server, thus far and the results have been impressive. Cavium is embracing a different approach to product differentiation than Intel. While Intel is generally focused on raw x86 performance, Cavium’s SoC design allows it to differentiate on features. This can include networking, storage accelerators and other vectors. The Cavium ThunderX family is the first ARMv8 chip we have seen, in production, that can legitimately match and outpace parts of the Intel Xeon E5 line in terms of performance.





Monday, 13 June 2016

Unknown

Cavium's 64-Bit ARM ThunderX2 Packs up to 54 Cores


Cavium unveiled its 64-bit ARM-based ThunderX2 processor for servers in cloud data centers used for workloads such as compute, security, storage, data analytics, network function virtualization (NFV) and distributed databases.

The second generation ARM processor from Cavium, which offers a number of on-board accelerators and advanced capabilities, packs up to 54 cores, enabling it to deliver two to three times the performance across a wide range of standard benchmarks and applications compared to ThunderX. It is built in 14nm FinFET process and is compliant with ARMv8.2 architecture as well as ARM's Server Base System Architecture (SBSA) standard.


Cavium ARM's Server Base System


Key ThunderX2 features will include:

  • 2nd generation of full custom Cavium ARM core: 2.4 to 2.8GHz in normal mode, Up to 3 GHz in Turbo mode; > 2X single thread performance compared to ThunderX.
  • Up to 54 cores per socket delivering 2-3X socket level performance compared to ThunderX.
  • Cache: 40K I-Cache and 64K D-cache, highly associative; 32MB shared Last Level Cache (LLC).
  • Single and dual socket configuration support using 2nd generation of Cavium Coherent Interconnect with > 2.5X coherent bandwidth compared to ThunderX.
  • System Memory: 6 DDR4 memory controllers per socket; Dual DIMM per memory controller, for a total of 12 DIMMs per socket.
  • Full system virtualization for low latency from virtual machine to IO enabled through Cavium virtSOC technology.
  • Integrated 10/25/40/50/100GbE network connectivity.
  • Multiple integrated SATAv3 interfaces.
  • Integrated PCIe Gen3 interfaces, x1, x4, x8 and x16 support.
  • Integrated Hardware Accelerators: OCTEON style packet parsing, shaping, lookup, QoS and forwarding; Virtual Switch (vSwitch) offload; Virtualization, storage and NITROX V security.

Four versions of the ThunderX2 will be offered:

  • ThunderX2_CP: Optimized for cloud compute workloads such as private and public clouds, web serving, web caching and web search, and for commercial HPC workloads such as computational fluid dynamics (CFD) and reservoir modeling. This family supports multiple 10/25/40/50/100 GbE network interfaces and PCIe Gen3 interfaces. It also includes accelerators for virtualization and vSwitch offload.
  • ThunderX2_ST: Optimized for big data, cloud storage, massively parallel processing (MPP) databases and data warehousing workloads. This family supports multiple 10/25/40/50/100 GbE network interfaces, PCIe Gen3 interfaces and SATAv3 interfaces. It also includes hardware accelerators for data protection, integrity and security, and for efficient user-to-user data movement.
  • ThunderX2_SC: Optimized for secure web front-end, security appliances and cloud RAN type workloads. This family supports multiple 10/25/40/50/100 GbE interfaces and PCIe Gen3 interfaces. Integrated hardware accelerators include Cavium’s industry leading, 5th generation NITROX security technology with acceleration for IPSec, RSA and SSL.
  • ThunderX2_NT: Optimized for media servers, scale-out embedded applications and NFV type workloads. This family supports multiple 10/25/40/50/100 GbE interfaces. It also includes OCTEON style hardware accelerators for packet parsing, shaping, lookup, QoS and forwarding.

"ThunderX2 combines our next generation core that will deliver significantly higher single thread performance with next generation IO and hardware accelerators to provide a compelling value proposition for the server market and greatly expand the serviceable server TAM," said Syed Ali, President and CEO of Cavium. "ThunderX2 will enable flexible, scalable and fully optimizable servers for next generation software defined data centers."

Unknown

Exploring A Production Cavium ThunderX Platform With A Gigabyte R120-T30 1U Server



Cavium sent us a Gigabyte R120-T30 just about three weeks after the production scale 48-core ThunderX ARM processor began shipping. We are going to have a much more comprehensive view of performance over the next few weeks, and we are seeing much better performance out of this system than we initially expected. In the meantime, we wanted to provide an overview of the platform we received. The impact of the platform cannot be overstated. This is production silicon, a 48-core 64-bit ARMv8 part that is the first direct competitor to the Intel Xeon E5-1600 and Xeon E5-2600 lines we have seen since AMD effectively exited the market. After spending a few weeks with the platform, we feel ready to publish an overview. The Cavium ThunderX is a very different approach to a server processor. While we will explore the performance of the 48-core Cavium ThunderX in a series of upcoming pieces, the Gigabyte R120-T30 1U server provides insights into some of the biggest selling points of the ThunderX platform.

Test Configuration 
The test system supplied by Cavium was configured as follows:
  • Server: Gigabyte R120-T30 (4x 3.5″ hot swap bay 1U)
  • Motherboard: Gigabyte MT30-GS0
  • CPU: Cavium ThunderX 48-core ARMv8
  • RAM: 128GB (4x 32GB DDR4 2133MHz RDIMMs) (1TB max capacity using 8x 128GB DIMMs)
  • SSDs: 2x SanDisk CloudSpeed Ultra 800GB SATA
  • PSU: 400W 80 Plus Gold

What this configuration hides is the fact that this system is a lot more advanced than a similar Intel platform from an I/O perspective. The Cavium ThunderX provides significantly more I/O as standard than Intel’s Xeon E5 architecture.

Gigabyte R120-T30 1U server with Cavium ThunderX Overview

From the outside, the Gigabyte R120-T30 1U looks much like a standard 1U 4-bay 3.5″ server. What is inside the machine is different than what almost anyone who has purchased a server over the past 7 years has experienced.
1U server with Cavium

Opening the top cover, we see what appears to be a standard form factor motherboard with a large CPU heatsink. The Gigabyte R120-T30 has 5x hot swap fans and a PCIe riser slot as well.

1U Gigabyte server with Cavium

On either side of the CPU we have 4 DIMM slots. Each can take DDR4 RDIMMs to fill the ThunderX’s quad channel memory design with up to 2 DIMMs per channel (DPC). We had 4x 32GB DDR4 2133MHz RDIMMs installed, but we also added a set of 8x 32GB (256GB) DDR4 RDIMMs just to validate that all eight DIMM slots could be populated. The ThunderX supports up to 128GB DDR4 DIMMs, so with 8x 128GB we get a total of 1TB of memory in a single CPU platform. Here is what the Gigabyte MT30-GS0 motherboard looks like without the fan shroud and PCIe riser:
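The capacity figures above follow directly from channels × DIMMs per channel × DIMM size. A minimal sketch of that arithmetic, using only the numbers quoted in the text; the function name `max_memory_gb` is made up for illustration, and the assumption that the shipped 4-DIMM set sits one DIMM per channel is ours rather than something the article states:

```python
def max_memory_gb(channels: int, dimms_per_channel: int, dimm_gb: int) -> int:
    """Total capacity in GB for a fully populated memory configuration."""
    for name, value in (("channels", channels),
                        ("dimms_per_channel", dimms_per_channel),
                        ("dimm_gb", dimm_gb)):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    return channels * dimms_per_channel * dimm_gb


CHANNELS = 4   # quad channel memory design (from the article)
MAX_DPC = 2    # up to 2 DIMMs per channel (from the article)

print(max_memory_gb(CHANNELS, MAX_DPC, 128))  # 1024 GB, i.e. 1TB maximum
print(max_memory_gb(CHANNELS, MAX_DPC, 32))   # 256 GB, the 8x 32GB validation set
# Assumption: the shipped 4x 32GB set is populated one DIMM per channel.
print(max_memory_gb(CHANNELS, 1, 32))         # 128 GB, the configuration as received
```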

SoC for Data Center Servers

While the newest Intel Xeon E5-1600 and E5-2600 V4 systems can utilize 3DPC for 1.5TB of RAM per CPU, 1TB is still an impressive total. For comparison, Intel’s Xeon D SoC tops out at 128GB (dual channel memory and 2 DPC). Moving further down the Intel stack, the Xeon E3 V5 lineup tops out at a paltry 64GB. We have verified via the STREAM benchmark that there is significantly more bandwidth than on the Intel Xeon D platform, even using faster DDR4-2400 RAM, but not as much as on the Intel Xeon E5-2600 V4 platform. More on this in a future piece.

As we removed the CPU heatsink, it seemed very familiar. We noticed that the heatsink mounting points in the Gigabyte platform were just about an LGA2011 narrow ILM width. We pulled out an LGA2011 narrow ILM heatsink and, sure enough, the mounting points lined up. We did not try installing a different heatsink as we were unsure of the pressure specs of the CPU package. Gigabyte did a nice job re-purposing existing designs for this new platform.
Cavium ThunderX SoC
 Underneath the heatsink we see a soldered Cavium ThunderX SoC:


The overall package is not socketed, but it is large compared to a LGA2011 chip. Here is an example as a size comparison:
 Cavium ThunderX UP – arm server SoC
In terms of storage, the Cavium ThunderX has access to 16x SATA ports per CPU.  To put that in perspective, an Intel Xeon E5-1600 V3, Intel Xeon E5-2600 V4 dual socket system, or even the quad socket Xeon E5-4600 series all use the same PCH which provides SATA connectivity (Intel onboard SAS was abandoned after Ivy Bridge.) With Cavium, each processor can give you up to 16 SATA III ports (note this can be lower depending on SKU.) Our 1U platform only has 4 of the 16x SATA III ports wired, but it does mean that a 3U 16-bay chassis will not require an add-on HBA.

1U arm server platform

We did verify that the platform was correctly telling Ubuntu 14.04 LTS that we have a SATA III 6.0Gbps connection:
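One way to script this kind of check is to parse the kernel log for the usual libata "SATA link up" messages. The sketch below is only an illustration: the sample lines follow the kernel's typical message format but were not captured from the test system, and the helper name `sata_link_speeds` is hypothetical.

```python
import re
from typing import Dict

# Matches lines such as "ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
# with or without a leading "[   timestamp]" prefix.
LINK_UP = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?(ata\d+(?:\.\d+)?): SATA link up (\d+(?:\.\d+)?) Gbps"
)


def sata_link_speeds(log_text: str) -> Dict[str, float]:
    """Map each ATA port with an active link to its negotiated speed in Gbps."""
    speeds: Dict[str, float] = {}
    for line in log_text.splitlines():
        match = LINK_UP.search(line.strip())
        if match:
            speeds[match.group(1)] = float(match.group(2))
    return speeds


# Illustrative sample only, not output from the R120-T30.
SAMPLE = """\
[    2.101000] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.102000] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.103000] ata3: SATA link down (SStatus 0 SControl 300)
"""

speeds = sata_link_speeds(SAMPLE)
slow_ports = sorted(port for port, gbps in speeds.items() if gbps < 6.0)
print(speeds)       # {'ata1': 6.0, 'ata2': 3.0}
print(slow_ports)   # ['ata2']
```

On a real system the input text would come from the kernel log (for example the output of `dmesg`) rather than a hard-coded sample.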

arm servers for cloud computing

The SATA ports are arranged in two 8-port sets with 7-pin SATA III connectors. It seems like the dual socket Gigabyte/Cavium systems are using higher-density connectors.


The motherboard also has two PCIe 3.0 x8 slots. ThunderX supports PCIe slots only up to x8, but since most add-on cards come in PCIe x1, x4 or x8 form factors, that covers most cards.

The networking side is completely unlike what we see from most Intel platforms. Our test unit has a total of 80Gbps worth of networking from the SoC. There is a QSFP+ 40Gb Ethernet port and four SFP+ 10Gb Ethernet ports. Just to give one an idea, this is equivalent to having an Intel Fortville X710-am2 onboard. In card format, on the Intel side that would occupy a PCIe 3.0 x8 slot with an Intel XL710-QDA2 installed.
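The 80Gbps total is simply the sum of the port speeds listed above. A small sketch of that tally; the function name and the dictionary layout are illustrative choices, not anything taken from Cavium or Gigabyte tooling:

```python
from typing import Dict


def total_bandwidth_gbps(ports: Dict[int, int]) -> int:
    """Sum nominal bandwidth for a mapping of port speed (Gbps) to port count."""
    return sum(speed * count for speed, count in ports.items())


# Port mix of the test unit as described in the article.
R120_T30_PORTS = {
    40: 1,  # one QSFP+ 40Gb Ethernet port
    10: 4,  # four SFP+ 10Gb Ethernet ports
}

print(total_bandwidth_gbps(R120_T30_PORTS))  # 80
```

These are nominal line rates; usable throughput depends on the workload and the software stack.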
Cavium ThunderX UP for hpc appliances

For those looking at high network bandwidth, this is a truly awesome setup onboard.

Rounding out the list of features, one can see a standard VGA port, serial port, IPMI out-of-band management LAN port and four USB 3.0 connectors (two front, two rear). The second ARM SoC onboard this platform is the AST2400, which is probably the #1 BMC in the industry. It is great to see Gigabyte/Cavium use the industry standard BMC.

Final Words
If you cannot tell where this is headed, we have some interesting comparisons coming in future pieces. The Intel Xeon D platform we have today still provides lower power and better single threaded performance, but in terms of memory bandwidth, memory capacity, SATA storage expansion, and networking, it is thoroughly out-classed by the Cavium ThunderX. From what we have heard in terms of pricing both list and street, we expect the Cavium ThunderX platforms to be very competitive with the higher-end Xeon D platforms in terms of price.

Shifting to the Intel Xeon E5-1600 V3 and Intel Xeon E5-2600 V4 side, for NVMe storage, those platforms have a clear PCIe lane advantage. Furthermore, we expect areas like Windows / VMware virtualization hosts, HPC servers, and systems with high single threaded performance needs or higher memory capacities to stay as Intel strongholds for the near future.

On the other hand, as the software side matures, especially with releases like the recent Ubuntu 16.04 LTS, we are seeing performance in web stack applications get significantly better. As an example, our OpenSSL / RSA testing was showing an ~80% performance increase moving from stock OpenSSL 1.0.1g to 1.0.2g on the ThunderX 48-core part. With OpenSSL 1.1, it looks like we will again see a solid performance gain. These types of performance increases are unlike what we see on the Intel side and show how rapidly the ARM side is maturing.

We will have more on management features in a coming piece in this series, but we can share that the experience is an absolute pleasure compared to some of the ARM development boards. The AST2400 integration is something we were pleased to see.

If you can use the onboard I/O or accelerators in the Cavium ThunderX workload optimized SKUs, Cavium is going to make a strong case. From the pricing we have seen thus far, if one can utilize a unique combination of 48x ARM cores, high-speed networking, SATA ports and accelerators, the Cavium platform is very compelling.

We are going to have a more in-depth look at day to day management and (small scale) operations with the ThunderX platform in the coming weeks. We have also been running baseline performance figures using the ThunderX as a web appliance (e.g. as an nginx, redis, and SSL offload server). Stay tuned to STH for some very interesting tips, tricks and performance results.




