Co-packaged optics could accelerate generative AI computing

Researchers at IBM Research have announced a new set of advances in chip assembly and packaging, called co-packaged optics, that promise to improve energy efficiency and increase bandwidth by bringing optical link connections inside devices and within the walls of the data centers used to train and deploy large language models. The new process promises a sixfold increase in the number of optical fibers that can be connected at the edge of a chip, a measure known as beachfront density. As artificial intelligence demands ever more bandwidth, this innovation will use the world’s first successful polymer optical waveguide to bring the speed and bandwidth of optics right to the edge of chips.

Early results suggest that switching from conventional electrical interconnects to co-packaged optics will reduce energy costs for training AI models, speed up model training, and dramatically increase energy efficiency for data centers.

Today’s advanced chip and chip packaging technologies typically use electrical signals for the transistors in the microelectronics that power phones, computers, and almost everything we make. Transistors, for their part, have gotten many times smaller over the decades, enabling us to pack more capacity into a given space. But even the most capable semiconductor components are only as fast as the connections between them.

IBM’s prototype polymer optical waveguides bring the speed and bandwidth of fiber-optic connections right to the edge of chips, replacing sluggish electrical connectors.

These connections allow us to use electronic devices seamlessly in our daily lives, such as when we drive our cars, which contain chips in almost every system from the seats to the tires. “Even your refrigerator has electronics to help everything work properly,” says John Knickerbocker, an IBM Research engineer prominent in chiplets and advanced packaging.

However, Knickerbocker and his team are thinking on a smaller scale. Because optical connectors cost less and are more energy efficient, they are good candidates for improving the performance of chip-to-chip and device-to-device communication in data centers, where generative AI computing requires ever higher bandwidth.

“Large language models have made AI very popular these days throughout the tech industry,” says Knickerbocker. “And the resulting growth of LLMs—and generative AI more broadly—requires exponential growth in high-speed connections between chips and data centers.”

Hsianghan Hsu (left) and John Knickerbocker (right) inspect a polymer optical waveguide module under a microscope at IBM Research’s global headquarters in Yorktown Heights, New York.

And while optical cables can carry data in and out of data centers, what happens inside is a different story. Even today’s most advanced chips still communicate via copper-based wires that carry electrical signals. It takes a lot of energy to make the connection from the edge of a chip to a circuit board, then from the circuit board over miles of optical cable, and then back to another module and another chip in a remote data center. Whether you’re transmitting data or a voice call, sending a signal across all of these junctions costs energy. Low-bandwidth wired connections within servers also slow down GPU accelerators, which sit idle while waiting for data.

Electrical signals use electrons to deliver power and signal communication from one device to another. Optics, on the other hand, which has been used in communication technologies for decades, uses light to transmit data. Fiber optic cables, hair-thin and sometimes thousands of kilometers long, can transmit hundreds of terabits of data per second. Bundled together and insulated in cables that run under the sea, optical fibers carry almost all of the global trade and communication traffic that flows between continents.

Bringing the power of optical connections to circuit boards, and all the way to chips, results in a more than 80% reduction in energy consumption compared to electrical connections, Knickerbocker and his colleagues have found: a drop from 5 picojoules per bit to less than 1. Across thousands of chips and millions of operations, this adds up to huge savings.
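The arithmetic behind that figure can be checked with a short sketch. The 5 and 1 picojoule-per-bit values come from the article; the function names and the 1 terabit-per-second link rate in the usage note are illustrative assumptions, not IBM figures.

```python
PICOJOULE = 1e-12  # joules

def energy_saving_fraction(old_pj_per_bit: float, new_pj_per_bit: float) -> float:
    """Fractional energy saved per bit when moving from the old link to the new one."""
    return (old_pj_per_bit - new_pj_per_bit) / old_pj_per_bit

def link_power_watts(pj_per_bit: float, bits_per_second: float) -> float:
    """Power drawn by a link that spends pj_per_bit on every transmitted bit."""
    return pj_per_bit * PICOJOULE * bits_per_second

# Article figures: 5 pJ/bit for electrical links, under 1 pJ/bit for optical.
saving = energy_saving_fraction(5.0, 1.0)

# Hypothetical link rate chosen only for illustration: 1 Tb/s.
ASSUMED_RATE_BPS = 1e12
electrical_w = link_power_watts(5.0, ASSUMED_RATE_BPS)
optical_w = link_power_watts(1.0, ASSUMED_RATE_BPS)
```

Under the assumed 1 Tb/s rate, a single link would draw roughly 5 W electrically versus about 1 W optically, which is where the 80% figure comes from.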

John Knickerbocker carefully handles polymer optical waveguide modules in the lab. These connectors promise to reduce the amount of time GPUs sit idle waiting for data during AI model training.

The Chiplet and Advanced Packaging team at IBM Research is looking to streamline this system with co-packaged optics, an approach that promises to improve the efficiency and density of communications, both within and between chips. Part of bringing optical connections to integrated circuit boards is building in transmitters and photodetectors to send and receive optical signals. Optical fibers are about 250 micrometers in diameter, about three times the width of a human hair. That may sound tiny, but four fibers add up to a millimeter, and as the millimeters add up, you quickly run out of space at the edges of a chip.

The solution, as IBM researchers saw it, lies in the next generation of optical links that enable much closer connections: the polymer optical waveguide. This device allows high-density bundles of optical fibers to be lined up right at the edge of a silicon chip so that the chip can communicate directly through the polymer fibers. High-fidelity optical connections require exacting tolerances of half a micron or less between a fiber and connector, a feat the team has now achieved.

Thanks to these approaches, the team has demonstrated the viability of a 50 micron pitch for optical channels, coupled to silicon photonics waveguides and connectors pluggable to single-mode glass fiber (SMF) arrays, using standard assembly packaging processes. This represents an 80% size reduction from the conventional 250 micron pitch, and tests show the pitch can shrink even further, down to 20 or 25 microns, which would equate to a 1,000% to 1,200% increase in bandwidth.
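As a rough check on these numbers, the sketch below converts channel pitch into channels per millimeter of chip edge and into a density gain over the 250 micron baseline. The pitch values are from the article; the function names are illustrative, and treating bandwidth as proportional to channel count is an assumption made here for simplicity.

```python
def channels_per_mm(pitch_um: float) -> float:
    """Number of optical channels that fit along 1 mm of chip edge at a given pitch."""
    return 1000.0 / pitch_um

def density_gain(baseline_pitch_um: float, new_pitch_um: float) -> float:
    """How many times more channels fit in the same edge length."""
    return baseline_pitch_um / new_pitch_um

BASELINE_UM = 250.0  # conventional fiber pitch cited in the article

for pitch in (50.0, 25.0, 20.0):
    print(
        f"{pitch:>5.1f} um pitch: "
        f"{channels_per_mm(pitch):.0f} channels/mm, "
        f"{density_gain(BASELINE_UM, pitch):.1f}x the baseline density"
    )
```

On this reading, the 25 and 20 micron pitches give 10 times and 12.5 times the baseline channel density, which lines up approximately with the quoted 1,000% to 1,200% range if those percentages are read as multiples of the baseline.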

An exploded view of the prototype co-packaged optics module.

The insertion loss of a photonic integrated circuit (PIC) to SMF optical link has typically been 1.5 to 2 decibels (dB) per channel, but in this case it has been shown to be below 1.2 dB per full optical connection. In addition, demonstrations with 18.4 micrometer pitch optical waveguides have shown less than 30 dB of crosstalk, indicating that this co-packaged optics technology is scalable to very high bandwidth density for on-chip interconnect.
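To put the decibel figures in more familiar terms, the sketch below converts an insertion loss in dB into the fraction of optical power that survives the connection, using the standard relation fraction = 10^(-dB/10). The loss values are from the article; the function name is illustrative.

```python
def transmitted_fraction(loss_db: float) -> float:
    """Fraction of input optical power remaining after a loss of loss_db decibels."""
    return 10.0 ** (-loss_db / 10.0)

# Insertion-loss figures quoted in the article, in dB.
for label, loss_db in (
    ("typical PIC-to-SMF, upper bound", 2.0),
    ("typical PIC-to-SMF, lower bound", 1.5),
    ("co-packaged optics, reported ceiling", 1.2),
):
    print(f"{label}: {transmitted_fraction(loss_db):.1%} of power transmitted")
```

A 1.2 dB loss corresponds to roughly 76% of the light getting through, compared with about 63% to 71% for the typical 2 to 1.5 dB range.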

This means that, taking a page from the telecommunications industry’s book, they can transmit multiple wavelengths of light per optical channel, which has the potential to increase this bandwidth gain by at least 4,000% and as much as 8,000%.

In addition to the fiber-to-chip and fiber-to-board connections, they also reinforce conventional glass fibers with high-strength polymers, a step that improves durability and efficiency but also requires advanced modeling and simulation of optical lengths to ensure light can pass through several components without loss. This is the “co-packaged” part of the whole.

The polymer optical waveguides are being tested at IBM’s factory in Bromont, Québec. There they are exposed to heat and cold cycles, high humidity and mechanical stress testing.

This development process also includes industry-standard reliability stress testing to ensure that all the optical and electrical connections still function when subjected to the stresses seen during manufacturing and use. Components are exposed to temperatures from -40°C to 125°C, as well as mechanical durability testing to confirm that the optical fibers can withstand bending without breaking or incurring data loss. This testing takes place at IBM Research’s global headquarters in Yorktown Heights, New York, as well as at IBM’s plant in Bromont, Québec.

“The big deal is not only that we’ve got this big density improvement for communication on the module, but we’ve also shown that this is compatible with stress tests that optical links haven’t passed before,” says Knickerbocker. IBM’s modules are intended to be compatible with standard electronic passive advanced packaging assembly processes, which can lead to lower manufacturing costs. With this innovation, IBM can produce co-packaged optical modules at its Bromont facility.

The team is building a roadmap for the next steps this technology will take, including soliciting feedback from IBM customers and enabling co-packaged optics to meet generative AI compute business needs. “We will also work with the component suppliers to position them for this next step in the technology,” says Knickerbocker, “as well as position them for the ability to support production volumes, not just prototypes.”