Intel Arria 10 FPGA 1Gbps Communication Channel: Model and Design

> James Skelly, ECG795: High-Speed PCB Design, Dr. Sarah Harris, May 8, 2020

# **Overall Design**

Main Components of System:

- Arria 10 FPGA (Transmitter IC)
- 48.5cm PCB trace with package, via, and bonding parasitics model
- Receiver IC

Total Price: \$560.90



Figure 1: System block diagram modeled in LTSpice.

# **Overall Design**

| Parameter                                              | Value  | Unit |
|--------------------------------------------------------|--------|------|
| Capacitance to Parallel Line (Same Layer)              | 3.63   | pF/m |
| Capacitance to Parallel Line (Layer Above)             | 19.47  | pF/m |
| Capacitance to Ground Plane                            | 106.13 | pF/m |
| Differential Capacitance (Cd)                          | 3.63   | pF/m |
| Stray Capacitance (Cc)                                 | 129.37 | pF/m |
| Capacitance of Line                                    | 133    | pF/m |
| Resistance of Line                                     | 4.65   | Ω/m  |
| Inductance of Line                                     | 368    | nH/m |
| Mutual Inductance                                      | 10.04  | nH/m |
| Odd-Mode Line Impedance                                | 51.18  | Ω    |
| Z <sub>0</sub> of Line                                 | 52.60  | Ω    |
| Velocity of Signal On Line                             | 0.143  | m/ns |
| Delay of Line                                          | 2.8    | ns   |
| Coefficient of Capacitive Crosstalk (k <sub>cx</sub> ) | 0.027  | -    |
| Coefficient of Inductive Crosstalk (k <sub>lx</sub> )  | 0.027  | -    |
| Coefficient of Forward Crosstalk                       | 0      | -    |
| Coefficient of Reverse Crosstalk                       | 0.0135 | -    |

Table 1: Design parameters and calculated values.
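
The derived rows of Table 1 can be cross-checked from the per-unit-length values. The sketch below assumes the standard lossless-line and coupled-line relations (as presented in [2]); the formulas themselves are not stated on the slide, and the 40cm length used for the delay is an assumption that happens to reproduce the tabulated 2.8 ns.

```python
import math

# Per-unit-length values copied from Table 1, converted to SI units.
C_d = 3.63e-12      # differential (coupling) capacitance, F/m
C_c = 129.37e-12    # stray capacitance, F/m
L_line = 368e-9     # line inductance, H/m
M = 10.04e-9        # mutual inductance, H/m
length = 0.40       # m (assumed: the 40cm minimum trace length)

C_line = C_c + C_d                                  # total line capacitance, F/m
z0 = math.sqrt(L_line / C_line)                     # characteristic impedance, ohms
z_odd = math.sqrt((L_line - M) / (C_c + 2 * C_d))   # odd-mode impedance, ohms
v = 1 / math.sqrt(L_line * C_line)                  # propagation velocity, m/s
delay = length / v                                  # one-way delay, s

k_cx = C_d / C_line          # capacitive crosstalk coefficient
k_lx = M / L_line            # inductive crosstalk coefficient
k_fx = (k_cx - k_lx) / 2     # forward crosstalk coefficient
k_rx = (k_cx + k_lx) / 4     # reverse crosstalk coefficient

print(f"Z0 = {z0:.2f} ohm, Zodd = {z_odd:.2f} ohm")
print(f"v = {v * 1e-9:.3f} m/ns, delay = {delay * 1e9:.2f} ns")
print(f"kcx = {k_cx:.3f}, klx = {k_lx:.3f}, kfx = {k_fx:.4f}, krx = {k_rx:.4f}")
```

Because the capacitive and inductive coefficients are nearly equal, the forward crosstalk term cancels, which is consistent with the zero entry in the table.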

# Intel Arria 10 FPGA

Features:

- 484-pin FBGA package
  - Pitch = 1mm (39.4 mil)
  - Ball diameter (min) = 0.5mm
  - Ball diameter (max) = 0.7mm
- High-speed LVDS support (72 pairs)
  - Recommended VCM = 1.25V
  - Max LVDS voltage = 1.6V
  - Min LVDS voltage = 1V
- 2.1Gbps maximum data rate
  - Design uses 1Gbps data rate
  - Max transmitter rise time = 130ps
  - Min transmitter rise time = 20ps

#### Intel Arria 10 Price: \$410.00



Figure 2: Intel Arria 10 FPGA soldered to a PCB. Source: [11]

### PCB Trace, Parasitics Model

#### Components:

- Series L to model a bond wire
- Parallel RC with series L to model BGA ball [5]
- Series L with shunt C to model PCB via [6]
- Lossy RLC transmission line SPICE model
- Termination resistor mismatched by 20%
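
As a rough illustration of what a 20% termination mismatch means, the sketch below evaluates the usual voltage reflection coefficient for a termination that is 20% too large and 20% too small. The helper name `reflection_coefficient` and the normalized impedance are illustrative choices, not part of the LTSpice model.

```python
def reflection_coefficient(z_term: float, z0: float) -> float:
    """Voltage reflection coefficient for a line of impedance z0 terminated in z_term."""
    return (z_term - z0) / (z_term + z0)

z0 = 1.0  # normalized line impedance; only the ratio z_term / z0 matters
gamma_high = reflection_coefficient(1.2 * z0, z0)  # termination 20% too large
gamma_low = reflection_coefficient(0.8 * z0, z0)   # termination 20% too small
worst_case = max(abs(gamma_high), abs(gamma_low))

print(f"+20%: {gamma_high:+.3f}, -20%: {gamma_low:+.3f}, worst case: {worst_case:.3f}")
```

The low-side mismatch gives the larger magnitude, about 0.111, which agrees with the worst-case reflection figure used in the noise budget.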



#### Package, Via, Bonding Parasitics

Figure 3: Line model symbol view.



Figure 4: Line model schematic including characteristic trace transmission line properties, package, via, and bonding parasitics.

# **Receiver IC**

#### Components:

- CMOS comparator
  - Design adapted from [4]
  - VDD = 1.8V
  - Uses 180nm BSIM devices
  - Requires 0.8V external bias
- D Flip Flops
  - Design adapted from [4]
  - Data clocked at 1GHz
  - Introduce clock feedthrough
- Output Buffers
  - Remove clock feedthrough
  - Clean up output signals

#### Receiver IC Price: \$33.56



Figure 5: Receiver IC circuitry modeled in LTSpice.

### **CMOS** Comparator

#### Stages:

- Externally-biased diff-amp for preamplification
- Decision circuit
- Self-biased diff-amp for postamplification
- Output buffer



Figure 6: Comparator symbol view.



Figure 7: Schematic of CMOS comparator [4] used to reliably receive data at the far end of the PCB traces.

# Signaling System

- LVDS standard (72 pairs, 144 total lines)
- Data rate of 1Gbps (1ns bit time ideally)
- Bipolar differential signaling (voltage mode)
- Common mode voltage of 1.25V
- Voltage swing of 800mV
- Two possible signal levels (Figure 9)
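
The signal levels implied by these numbers can be sketched as follows. This assumes the 800mV swing is the peak-to-peak differential voltage (Vp - Vm going from +400mV to -400mV), which is consistent with the ideal per-line levels of 1.45V and 1.05V quoted in the noise budget.

```python
v_cm = 1.25      # common-mode voltage, V
swing = 0.800    # differential peak-to-peak swing, V (assumed definition)

# Each line moves +/- swing/4 around the common-mode voltage.
v_high = v_cm + swing / 4
v_low = v_cm - swing / 4

diff_logic1 = v_high - v_low   # Vp - Vm for a logic 1
diff_logic0 = v_low - v_high   # Vp - Vm for a logic 0

print(f"line levels: {v_low:.2f} V to {v_high:.2f} V")
print(f"differential: {diff_logic1:+.2f} V (1), {diff_logic0:+.2f} V (0)")
```

Both line levels stay inside the 1V to 1.6V LVDS range listed for the Arria 10.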



Figure 8: Diagram of LVDS signaling used for design.

#### Voltage Swing = 800mV



Figure 9: System "Logic 1" vs. "Logic 0."

# **Timing System**

- Single edge-triggered DFFs to clock data out
- Clock generator with wide inverter to create clock complement signal for DFFs
- Synchronous timing convention, models a per-line closed loop timing system
- Parameter td\_clk used to set the clock rising edge to the center of the data for simulation
- Clock feedthrough evident in plot of Figure 10 on both rising and falling edges of clock



Figure 10: D flip flops used to clock data out (left); clock generator (middle); plot showing how data is clocked on rising edge (right).

### **Board Stack-up and Parameters**

- FR-4 PCB (relative permittivity of 4.4)
- 8 total board layers
  - 1 oz copper for each layer
    - 0.036mm (1.4 mils)
  - Middle layers (6) for signals
  - Top layer for power
  - Bottom layer for ground
- Board thickness of 1.6mm (62 mils)
- Alternating dielectrics between layers
  - Core layer, 0.203mm (8 mils)
  - Prepreg layer, 0.170mm (6.7 mils)
- BGA ball diameter parameters
  - Minimum = 0.5mm (19.7 mils)
  - Maximum = 0.7mm (27.6 mils)
- Trace width = 0.102mm (4 mils)
- Trace spacing = 0.381mm (15 mils)
- Minimum trace length = 40cm
- Maximum trace length = 48.5 cm
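
A quick sanity check on the stack-up: the listed copper and dielectric thicknesses should add up to roughly the 1.6mm board thickness. The sketch assumes seven dielectric layers between the eight copper layers; the split between core and prepreg is not given, so both alternating arrangements are evaluated.

```python
copper_mm = 0.036     # 1 oz copper layer thickness
core_mm = 0.203       # core dielectric thickness
prepreg_mm = 0.170    # prepreg dielectric thickness
n_copper = 8
n_dielectric = n_copper - 1   # assumed: one dielectric between adjacent copper layers

def board_thickness(n_core: int) -> float:
    """Total thickness in mm for a stack with n_core cores and the rest prepreg."""
    n_prepreg = n_dielectric - n_core
    return n_copper * copper_mm + n_core * core_mm + n_prepreg * prepreg_mm

core_heavy = board_thickness(4)      # 4 cores + 3 prepregs
prepreg_heavy = board_thickness(3)   # 3 cores + 4 prepregs
print(f"{core_heavy:.3f} mm, {prepreg_heavy:.3f} mm")
```

Either arrangement lands within a few hundredths of a millimeter of the stated 1.6mm.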



Figure 10: 8-layer board stack-up and parameters.


# Noise Budget

- Parasitics and termination resistor mismatch
  - Noisy signals at the far end of the channel
- Ideally, signals should swing from 1.45V to 1.05V
  - Attenuation is apparent from Figure 12
- Observable reflections in transient analysis of Vp, Vm
- Maximum signal swing (800mV) limits effects of independent noise sources.



Figure 11: Noisy eye diagram of Vp, Vm signals (far end of PCB).



Figure 12: Noisy inputs to comparator (outputs of modeled parasitic PCB trace).

# Noise Budget

| Parameter                                   | Value               |
|---------------------------------------------|---------------------|
| Voltage Swing                               | 800mV               |
| Gross Margin                                | 400mV               |
| Reverse Crosstalk Coefficient               | 0.0135              |
| Forward Crosstalk Coefficient               | 0                   |
| Worst-case (ISI) Reflections (20% mismatch) | 0.111               |
| Attenuation Coefficient                     | 0.130               |
| Kn                                          | 0.255 (204mV)       |
| Receiver Offset, Sensitivity                | 40mV                |
| Transmitter Offset                          | 10mV                |
| Bounded Noise Total                         | 254mV               |
| Net Margin                                  | 146mV               |
| Power Supply Noise                          | 5mV                 |
| Total Gaussian Noise                        | 15mV <sub>RMS</sub> |
| VSNR                                        | 9.73                |
| BER                                         | 2.77e-21            |
| MTBF (in seconds)                           | 361 billion         |
| MTBF (in years)                             | 11,459              |
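
The chain of values in the noise budget can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the Gaussian-tail approximation BER ≈ exp(-VSNR²/2) and MTBF = 1/(BER × bit rate) are not written on the slide, but they reproduce the tabulated numbers when intermediate values are rounded the way the table rounds them.

```python
import math

swing_mv = 800.0
gross_margin_mv = swing_mv / 2                      # 400 mV

# Proportional noise terms (fractions of the swing), from the table.
k_rx, k_fx, k_reflect, k_atten = 0.0135, 0.0, 0.111, 0.130
k_n = k_rx + k_fx + k_reflect + k_atten             # about 0.255

# Fixed bounded noise sources, mV.
rx_offset_mv, tx_offset_mv = 40.0, 10.0

proportional_mv = round(k_n * swing_mv)             # rounded to whole mV, as in the table
bounded_mv = proportional_mv + rx_offset_mv + tx_offset_mv
net_margin_mv = gross_margin_mv - bounded_mv

gaussian_rms_mv = 15.0
vsnr = round(net_margin_mv / gaussian_rms_mv, 2)    # rounded to two decimals, as in the table

ber = math.exp(-vsnr**2 / 2)                        # assumed Gaussian-tail approximation
bit_rate = 1e9                                      # 1Gbps link
mtbf_s = 1 / (ber * bit_rate)
mtbf_years = mtbf_s / (365 * 24 * 3600)             # assumed 365-day year

print(f"Kn = {k_n:.3f} ({proportional_mv} mV), bounded = {bounded_mv:.0f} mV")
print(f"net margin = {net_margin_mv:.0f} mV, VSNR = {vsnr:.2f}")
print(f"BER = {ber:.2e}, MTBF = {mtbf_s:.3g} s = {mtbf_years:,.0f} years")
```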

### Timing Budget Diagram



Figure 13: Timing diagram showing aperture time, rise time, ideal bit time.

# **Timing Budget**

| Parameter                              | Value  |
|----------------------------------------|--------|
| Transmitter Clock Jitter               | 20 ps  |
| Receiver Clock Jitter                  | 20 ps  |
| Transmitter Jitter                     | 160 ps |
| Receiver Jitter                        | 30 ps  |
| Trace Delay                            | 2.8 ns |
| Data Rise Time, Fall Time (Worst Case) | 130 ps |
| Transmitter Skew                       | 15 ps  |
| Receiver Skew                          | 30 ps  |
| Clock Rise Time, Fall Time             | 20 ps  |
| Total Uncertainty                      | 425 ps |
| Aperture Time (Max)                    | 555 ps |
| Bit Time                               | 1 ns   |

Table 3: System timing budget.
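
The Total Uncertainty entry in Table 3 appears to be the sum of the jitter, skew, and rise/fall-time rows, with trace delay excluded. A minimal check, with dictionary keys chosen here only for readability:

```python
# Timing uncertainty terms from Table 3, in picoseconds (trace delay excluded).
uncertainty_ps = {
    "transmitter clock jitter": 20,
    "receiver clock jitter": 20,
    "transmitter jitter": 160,
    "receiver jitter": 30,
    "data rise/fall time (worst case)": 130,
    "transmitter skew": 15,
    "receiver skew": 30,
    "clock rise/fall time": 20,
}

total_uncertainty_ps = sum(uncertainty_ps.values())
bit_time_ps = 1000
fraction_of_bit = total_uncertainty_ps / bit_time_ps

print(f"total uncertainty = {total_uncertainty_ps} ps "
      f"({fraction_of_bit:.1%} of the bit time)")
```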

### **Power Consumption**





Figure 14: Plot of voltage, current, and power traces with alternating ones and zeros in transmitter.

Figure 15: Average power from Fig. 14.



| Interval Start: | 1ns       |
|-----------------|-----------|
| Interval End:   | 6ns       |
| Average:        | -8.6617mW |

Figure 17: Average power from Fig. 16

Figure 16: Plot of voltage, current, and power traces with alternating ones and zeros in receiver circuitry.

### **Power Consumption**

| Parameter                                   | Value   |
|---------------------------------------------|---------|
| Average Power Supplied by Vp                | 1.96mW  |
| Average Power Supplied by Vm                | 1.96mW  |
| Average Power Supplied by Transmitter       | 3.92mW  |
| Average Current Drawn by VDD                | 4.61mA  |
| Average Power Dissipated by Comparator      | 6.132mW |
| Average Power Dissipated by D Flip Flops    | 1.306mW |
| Average Power Dissipated by Clock Generator | 0.517mW |
| Average Power Dissipated by Output Buffers  | 0.367mW |
| Average Power Supplied by VDD               | 8.66mW  |
| Total Average Power Dissipated by One Link  | 12.58mW |
| Total Average Power Dissipated by Design    | 905.8mW |
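
The per-link and whole-design totals follow from the transmitter and VDD supply figures in the table. The sketch below assumes the design total is simply the per-link power multiplied by the 72 LVDS pairs.

```python
# Average power figures from the table, in mW.
p_vp_mw = 1.96
p_vm_mw = 1.96
p_vdd_mw = 8.66
n_links = 72     # LVDS pairs in the design

p_tx_mw = p_vp_mw + p_vm_mw              # power supplied by the transmitter
p_link_mw = p_tx_mw + p_vdd_mw           # total average power for one link
p_design_mw = p_link_mw * n_links        # assumed: all links dissipate the same power

print(f"transmitter = {p_tx_mw:.2f} mW, one link = {p_link_mw:.2f} mW, "
      f"design = {p_design_mw:.1f} mW")
```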

### **Cost Analysis**

| Item                                                           | Value                  |
|----------------------------------------------------------------|------------------------|
| PCB Area                                                       | 73.1 in <sup>2</sup>   |
| Cost of 8-layer Board                                          | \$0.75/in <sup>2</sup> |
| Calculated Cost of Board                                       | \$54.84                |
| Arria 10 FPGA Cost (484-pin BGA Package)                       | \$410.00               |
| Receiver IC Cost (160-pin BGA Package)                         | \$33.56                |
| Termination Resistor Cost (Quantity 5000, 100 $\Omega$ , ±20%) | \$62.50                |
| Cost Per Link of System                                        | \$7.79                 |
| Total Cost of System                                           | \$560.90               |

Table 5: System cost analysis.
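
The totals in Table 5 can be checked by summing the line items. The sketch takes the tabulated board cost as given (area times unit price lands within about a cent of it) and assumes the per-link cost is the system total divided by the 72 links.

```python
# Line items from Table 5, in US dollars.
board_cost = 54.84
fpga_cost = 410.00
receiver_cost = 33.56
resistor_cost = 62.50
n_links = 72

board_estimate = 73.1 * 0.75     # area (in^2) times cost per in^2
total_cost = board_cost + fpga_cost + receiver_cost + resistor_cost
cost_per_link = total_cost / n_links

print(f"board estimate = ${board_estimate:.2f}")
print(f"total = ${total_cost:.2f}, per link = ${cost_per_link:.2f}")
```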

### **Transient Performance**

Vp, Vm on Arria 10 IC before leaving chip

Vp, Vm at far end of PCB trace before comparator

Vp, Vm after being clocked (DFF outputs)

Vp, Vm at outputs of final output buffers



Figure 18: Plotting Vp and Vm at different nodes throughout the system.

## Eye Diagrams

Vp, Vm on Arria 10 IC before leaving chip

Vp, Vm at far end of PCB trace before comparator

Vp, Vm after being clocked (DFF outputs)

Vp, Vm at outputs of final output buffers



Figure 19: Eye diagrams of signals from Figure 18.

### TDR, TDT Waveforms

- Pulse source with amplitude of 1V, rise time of 20ps
- Used to test the response of the system to very fast rise times
- Observable parasitics:
  - Voltage peaks from inductance in the line
  - Voltage valleys from capacitance in the line
  - 7.5ns round trip delay from transmission line model
  - Reflections due to 20% mismatched termination resistor





Figure 21: TDR pulse source.

Figure 22: TDR, TDT waveforms with 1V input pulse on Vp line, Vm grounded.

# Summary, Future Work

Summary of system performance:

- System operates reliably at 1Gbps
- Total average power consumption = 905.8mW
  - 12.58mW per link
- Total cost of system = \$560.90
  - \$7.79 per link
- Ideal bit width of 1ns, aperture time of 555ps
- Gross margin of 400mV, net margin of 146mV

Future work and improvements:

- Design PLL for clock synchronization
- DFFs instead of inverters at comparator output
- Better system modeling (VCVS for attenuation)



Figure 23: Arria 10 signal integrity development kit diagram. Source: [11]

#### References

- [1] R. J. Baker, *CMOS: Circuit Design, Layout, and Simulation*. Hoboken, NJ: IEEE Press/Wiley, 2010.
- [2] W. J. Dally and J. W. Poulton, *Digital Systems Engineering*. Cambridge, U.K.; New York, NY, USA: Cambridge University Press, 1998.
- [3] Texas Instruments, "High Speed PCB Layout Techniques," *Texas Instruments*, 2004. [Online]. Available: http://www.ti.com/lit/ml/slyp173.pdf?ts=1588841554647 [Accessed: April 23, 2020].
- [4] D. M. Harris and S. L. Harris, *Digital Design and Computer Architecture*. Boston: Elsevier/Morgan Kaufmann, 2012.
- [5] Intel Corporation, "The next 10<sup>th</sup> Gen Intel Core processors," Intel Arria 10 Device Datasheet, Dec. 2013 [Revised March 2020].
- [6] T. Chang, P. H. Cheng, H. C. Huang, R. S. Lee and R. Lo, "Parasitic characteristics of BGA packages," Proceedings. 1998 IEEE Symposium on IC/Package Design Integration (Cat. No.98CB36211), Santa Cruz, CA, USA, 1998, pp. 124-129, doi: 10.1109/IPDI.1998.663644.
- [7] A. Weiler, A. Pakosta, and A. Verma, "High-Speed Layout Guidelines," Texas Instruments, App. Report SCAA082A, Nov. 2006 [Revised Aug. 2017].
- [8] John Ardizzoni, "A Practical Guide to High-Speed Printed-Circuit-Board Layout," Analog Devices, Sept. 2005.
- [9] Stephen Craig, "Escaping BGAs Methods of Routing Traces from BGA Footprints," MacroFab, Aug. 2016.
- [10] Intel Corporation, "Altera Device Package Information," 484-Pin Ultra FineLine Ball-Grid Array (UBGA) Flip Chip Datasheet, Dec. 2018.
- [11] Intel Corporation, "Intel Arria 10 FPGAs," *Intel Corporation*, 2020. [Online]. Available: https://www.intel.com/content/www/us/en/products/programmable/fpga/arria-10.html [Accessed: April 2, 2020].