





# **Detailed Technical Points**



National Research Council Canada / Conseil national de recherches Canada



## Outline

- FIR filter.
- Recirculation.
- Correlator chip.
- LTA controller.



## **FIR Filter**

- Desirable to implement in an FPGA to minimize NRE and risk and to maximize flexibility (baseband input arrangement, 8-bit processing, 2-stage capability).
- Currently, a 1024-tap, 4-bit FIR is not affordable in an FPGA (it requires an XC2V4000-5 at >\$1000 each).
- However, reducing the number of taps to ~500 should allow it to fit in an XC2V1500-5 @ ~\$344 (Xilinx price projection to 2004).
- But the budget is \$200/chip...
- So, with 3-bit data and a cosine-symmetric design, a 511-tap FIR should fit in an XC2V1000-5 @ \$224 each (Xilinx price projection to 2004).
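Assuming "cosine symmetric" means the taps are even-symmetric about the centre tap (h[k] = h[N-1-k], as for a linear-phase filter), the two samples that share a coefficient can be added before the multiply. This roughly halves the multiplier/LUT count and is what makes a 511-tap fit plausible. The minimal behavioural sketch below uses hypothetical function names; it is Python, not an FPGA design. It checks the folded form against the direct form:

```python
def fir_direct(h, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n-k]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


def fir_folded(h, x):
    """Folded FIR for even-symmetric taps, h[k] == h[N-1-k].

    Samples that share a coefficient are pre-added, so an odd-length
    N-tap filter needs only (N + 1) // 2 multiplies per output.
    """
    n_taps = len(h)
    assert all(h[k] == h[n_taps - 1 - k] for k in range(n_taps)), "taps not symmetric"

    def sample(i):
        return x[i] if i >= 0 else 0  # zero initial state

    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps // 2):
            # one multiply serves the pair of taps k and N-1-k
            acc += h[k] * (sample(n - k) + sample(n - (n_taps - 1 - k)))
        if n_taps % 2:  # the centre tap of an odd-length filter has no partner
            mid = n_taps // 2
            acc += h[mid] * sample(n - mid)
        y.append(acc)
    return y
```

For a 511-tap filter the folded loop performs 256 multiplies per output instead of 511; bit growth and requantization are not modelled here.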



## **FIR Filter**

- With an XC2V1000-5 chip for a cosine-symmetric FIR:
  - Only ~320 taps fit with 4 bits (direct realization, 4773 logic slices).
  - With 8-bit initial quantization this drops to only 160 taps. But with 4-bit requantization only a 1/8 bandpass is required, so two chips can be ganged to yield 320 taps. Ganging chips will probably be required for narrower band selection, but it reduces the number of sub-bands available...
  - But 8-bit initial quantization with 7-bit requantization (because of the output data highways) is a problem: only 160 taps are available to produce a 1/16 bandpass, which is not acceptable performance!
    - Solution: 1 pol'n per Station Board? (but still only 320 taps).
    - Solution: smaller LUT? (less reject-band attenuation when we want it).
    - Solution: no 7-bit requantization; just discard the interference sub-band...
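Ganging is sketched here under an assumption the slide does not spell out. The tap set is cut into contiguous blocks, one block per chip. Each chip sees the input delayed by its block offset, and the partial sums are added. The hypothetical Python below confirms that this partition reproduces the full filter exactly:

```python
def fir(h, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def ganged_fir(h, x, n_chips=2):
    """Split one long FIR across n_chips smaller filters and sum their outputs.

    The chip holding taps h[start:start+block] is fed the input delayed by
    `start` samples; adding every chip's partial output gives the full FIR.
    """
    block = -(-len(h) // n_chips)  # ceiling division: taps per chip
    total = [0] * len(x)
    for start in range(0, len(h), block):
        taps = h[start:start + block]
        delayed = ([0] * start + list(x))[:len(x)]  # input delayed by `start`
        partial = fir(taps, delayed)
        total = [t + p for t, p in zip(total, partial)]
    return total
```

The cost is one extra adder stage and a delay line between chips, which is why ganging trades sub-band count for tap count.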



## **FIR Direct Realization: B=4-bits**



B. Carlson, 2001-Nov-02



## **Cosine Symmetric B=3-bits**






#### **31-tap, poly-phase=4, cosine symmetric FIR**
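Assume "poly-phase=4" refers to a 4-branch polyphase decomposition, as used when decimating by 4. Under that assumption, the 31 taps split into sub-filters h[p], h[p+4], h[p+8], ..., and each sub-filter only has to run at one quarter of the input rate. The sketch below uses hypothetical names and is behavioural only. It checks the polyphase form against filtering followed by keeping every 4th output:

```python
def fir(h, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def polyphase_decimate(h, x, m):
    """Filter x with h and keep outputs y[0], y[m], y[2m], ... via m sub-filters.

    Phase p holds taps h[p], h[p+m], h[p+2m], ...; together the phases cover
    every tap index k = p + j*m exactly once.
    """
    phases = [h[p::m] for p in range(m)]
    n_out = -(-len(x) // m)  # ceiling division
    y = []
    for i in range(n_out):
        n = i * m  # only every m-th output is ever computed
        acc = 0
        for p, taps in enumerate(phases):
            for j, c in enumerate(taps):
                idx = n - p - j * m  # input index for tap p + j*m
                if idx >= 0:
                    acc += c * x[idx]
        y.append(acc)
    return y
```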



## **Or...minimum LUT fit**

- Size the LUT and adder tree to the number of bits actually used by each tap.
- Requires one design for every set of tap coefficients.
  - *May* be possible/practical with HDL coding of the FPGA.
  - But savings drop as the filter narrows: average bits per tap = 5 for a 1/16 bandpass, 7 for 1/64, 10 for 1/256.
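One way to get a per-tap bit count is to quantize the taps to signed integers. For each coefficient, take the narrowest two's-complement word that holds it; the LUT and adder tree for that tap would then be sized to that width. This is a minimal sketch under two assumptions that are not the design's stated method: two's-complement integer taps and a symmetric full-scale quantizer. The function names are hypothetical:

```python
def twos_complement_bits(c):
    """Smallest two's-complement width that can hold integer c."""
    if c >= 0:
        return c.bit_length() + 1   # one extra bit for the sign
    return (-c - 1).bit_length() + 1


def quantize(h, bits):
    """Round real taps to signed integers; the largest |tap| maps to 2**(bits-1) - 1.

    Python's round() rounds halves to even.
    """
    peak = max(abs(v) for v in h)
    scale = (2 ** (bits - 1) - 1) / peak
    return [round(v * scale) for v in h]


def average_tap_bits(int_taps):
    """Mean per-tap width if each tap's LUT/adder is fitted to its own value."""
    return sum(twos_complement_bits(c) for c in int_taps) / len(int_taps)
```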





## **Or...Gate Array**



1023 taps: ~16k logic slices $\equiv$ 4M system gates (Xilinx) $\equiv$ ~32k FFs $\approx$ 200k gates? (PLUS: dual-port memory for sub-band multi-beaming!)
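The flip-flop figure follows from the Virtex-II slice structure, which has two flip-flops per slice. The 200k-gate guess then corresponds to roughly six gate-array gates per flip-flop. That ratio is a rough assumption, not a measured one:

$$
N_{\mathrm{FF}} \approx 16\,\mathrm{k\ slices} \times 2\ \frac{\mathrm{FF}}{\mathrm{slice}} = 32\,\mathrm{k\ FFs},
\qquad
\frac{200\,\mathrm{k\ gates}}{32\,\mathrm{k\ FFs}} \approx 6\ \frac{\mathrm{gates}}{\mathrm{FF}}
$$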





## NRC · CNRC
















## **FIR Filter Sub-band Boundary Loss Curves**







With 1023 taps, ~1 MHz *more* near each sub-band boundary is degraded by requantization loss (1.3%) and fringe-rotator loss (2.2%). With 511 taps the extra degraded region is ~2.5 MHz, and with 255 taps it is ~4.5 MHz.


EVLA Correlator Conceptual Design Review



Clearly not acceptable performance.











## Recirculation

- Use DPSRAM (dual-port SRAM; expensive at \$98): $2 \times 256k \times 18 \equiv 512k \times 18$.
- Good enough for 256k spectral points.
- Requires 1 msec correlator chip readout.
- Can only afford two data and phase paths:
  - Carry phase with data to simplify phase generation.
  - Allows 4 pol'n products on one baseband.
  - Simultaneous non-recirculation correlation.
- Controlled by DUMPTRIG and Recirculation Controller configuration.









Mean Timestamp = 47.5 msec






#### DUMPTRIG: 16X recirculation, no pulsar phase binning

Each "recirculation block" must have its own bin to accumulate into. Thus, there are logical blocks/dumps and logical pulsar time/phase bins that map into physical bins in the LTA.
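The slide does not give the mapping itself. One simple way to guarantee that every (block, phase bin) pair lands in its own physical LTA bin is a row-major layout: each recirculation block gets a contiguous run of phase bins, starting at a base address. The layout and the `BinMap` name below are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinMap:
    """Hypothetical row-major map of (recirculation block, phase bin) to LTA bins."""
    n_recirc_blocks: int   # e.g. 16 for 16X recirculation
    n_phase_bins: int      # 1 when pulsar phase binning is off
    base: int = 0          # first physical LTA bin allotted to this map

    @property
    def n_physical_bins(self) -> int:
        return self.n_recirc_blocks * self.n_phase_bins

    def physical_bin(self, recirc_blk: int, phase_bin: int) -> int:
        """Return the physical LTA bin for one logical (block, phase bin) pair."""
        if not 0 <= recirc_blk < self.n_recirc_blocks:
            raise ValueError("recirculation block out of range")
        if not 0 <= phase_bin < self.n_phase_bins:
            raise ValueError("phase bin out of range")
        return self.base + recirc_blk * self.n_phase_bins + phase_bin
```

With 16X recirculation and no phase binning, `BinMap(16, 1)` uses 16 physical bins. Turning on 4 pulsar phase bins would quadruple that.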



**Recirculation Controller FPGA Simplified Block Diagram** 

#### **Recirculation Controller to Correlator Chip Functional Timing**



**DUMP_EN control bits:**

- **CLRS:** If set, the output of the lag-chain shift registers is cleared.
- **DC2, DC1, DC0:** dump action, encoded as follows.

| DC2 | DC1 | DC0 | ACTION |
|-----|-----|-----|--------|
| 0 | 0 | 0 | First dump of data into LTA. Just save data. |
| 0 | 0 | 1 | Add data to existing LTA data and save. |
| 0 | 1 | 0 | Last dump: add to LTA data; flag LTA bin as ready. |
| 0 | 1 | 1 | Speed dump: bypass LTA directly to output. |
| 1 | 0 | 0 | Dump data and discard it. |

- **PB[0:15]:** Phase bin number for this particular dump.
- **HSP[0:3]:** Harmonic suppression phase. This 4-bit phase has been added to the PHASE data and must be removed by the LTA controller. It suppresses harmonics of strong narrowband interference and can be turned on or off by a control register in the recirculation controller. The correlator chip simply passes this phase data on to the LTA controller.
- **RECIRC_BLK** and **TIMESTAMP** are for the data dump that just occurred.
  - TIMESTAMP word 0, bits 0-31: number of seconds since last epoch.
  - TIMESTAMP word 1, bits 0-28: number of clocks since last PPS; bits 29-31: major epoch.
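A small decoder for the DUMP_EN action codes and for TIMESTAMP word 1 follows directly from the table. It assumes bit 0 is the least-significant bit. It takes DC2..DC0 as already-extracted bits, because their positions within the control word are not given. `DumpAction`, `decode_dc` and `split_timestamp_word1` are hypothetical names:

```python
from enum import Enum


class DumpAction(Enum):
    FIRST = "first dump into LTA: just save data"
    ACCUMULATE = "add data to existing LTA data and save"
    LAST = "last dump: add to LTA data, flag LTA bin as ready"
    SPEED_DUMP = "speed dump: bypass LTA directly to output"
    DISCARD = "dump data and discard it"


# (DC2, DC1, DC0) codes from the DUMP_EN table; other codes are not defined there.
_DC_CODES = {
    (0, 0, 0): DumpAction.FIRST,
    (0, 0, 1): DumpAction.ACCUMULATE,
    (0, 1, 0): DumpAction.LAST,
    (0, 1, 1): DumpAction.SPEED_DUMP,
    (1, 0, 0): DumpAction.DISCARD,
}


def decode_dc(dc2, dc1, dc0):
    """Map the three DC bits to a dump action; reject undefined codes."""
    try:
        return _DC_CODES[(dc2, dc1, dc0)]
    except KeyError:
        raise ValueError(f"undefined DC code {dc2}{dc1}{dc0}") from None


def split_timestamp_word1(word1):
    """TIMESTAMP word 1: bits 0-28 = clocks since last PPS, bits 29-31 = major epoch."""
    clocks_since_pps = word1 & ((1 << 29) - 1)
    major_epoch = (word1 >> 29) & 0b111
    return clocks_since_pps, major_epoch
```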


## **Correlator Chip**

- Plan a 2048 complex-lag chip with 4-bit (16-level) multipliers and 5-level fringe rotation.
- Specialized interfaces and control for high performance.
- Knows very little about recirculation: it passes information from the Recirculation Controllers on to the LTA controller via the output data frame.
- 16 x 128 complex-lag chip. Each 128 c-lag section is individually controlled and *always* has its own output data frame.
  - Homogeneous, simple, and fast operation.
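To make the 16 × 128 organization concrete, here is a behavioural sketch of a complex-lag accumulation. The lag vector is then split into 128-lag sections, each of which would get its own output data frame. It is an illustration under stated assumptions, not the chip's logic. The fringe phase is applied as a floating-point rotation of the delayed stream. Neither the 4-bit/16-level multipliers nor the 5-level fringe rotator are modelled. All names are hypothetical:

```python
import math

LAGS_PER_SECTION = 128   # from the slide: 16 x 128 complex lags
N_SECTIONS = 16


def complex_lags(x, y, phase, n_lags):
    """Accumulate in-phase/quadrature products for lags 0..n_lags-1.

    Lag k pairs x[n] with y[n-k]; the y sample is rotated by its own fringe
    phase (radians). x, y and phase are assumed to have equal length.
    """
    acc_i = [0.0] * n_lags
    acc_q = [0.0] * n_lags
    for n in range(len(x)):
        for k in range(min(n_lags, n + 1)):
            m = n - k
            prod = x[n] * y[m]
            acc_i[k] += prod * math.cos(phase[m])   # in-phase accumulator
            acc_q[k] += prod * math.sin(phase[m])   # quadrature accumulator
    return list(zip(acc_i, acc_q))


def split_sections(lags, size=LAGS_PER_SECTION):
    """Cut the lag vector into independently dumped sections (one frame each)."""
    return [lags[i:i + size] for i in range(0, len(lags), size)]
```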















#### **Correlator Chip Output Data Frame:**

| Word | Fields (bit 31 to bit 0) |
|------|--------------------------|
| W0 | SYNCH (alternating 1010... pattern) |
| W1 | STATUS BITS, Reserved, CCID, HSP-Y, HSP-X, Cmmd |
| W2 | BBID-Y, BBID-X, SBID-Y, SBID-X, SID-Y, SID-X |
| W3 | DATA (Phase) BIN, RECIRC_BLK-Y, RECIRC_BLK-X |
| W4 | DVCOUNT-Center |
| W5 | DVCOUNT-Edge |
| W6 | DATA_BIAS |
| W7 | TIMESTAMP-0 |
| W8 | TIMESTAMP-1 |
| W9 | Lag 0 In-phase accumulator |
| | Lag 0 Quadrature accumulator |
| | Lag 1 In-phase accumulator |
| | Lag 1 Quadrature accumulator |
| | ... |
| W262 | Lag 127 In-phase accumulator |
| W263 | Lag 127 Quadrature accumulator |
| W264 | SYNCH (repeating 000111 pattern) |
| W265 | Parity Check |

## **LTA Controller**

- One LTA controller per correlator chip, in an FPGA (or one per 4 correlator chips).
- The correlator chip data frame tells the LTA Controller what to do with the data, and exactly where to put it.
  - But the LTA Controller is smart enough not to overwrite good data waiting for output.
  - Smart enough for burst operation.
  - "Speed dump" bypasses LTA RAM...straight to output.
- Use a 128 MHz readout so that a cheap, slow-speed-grade FPGA can be used.
- Transmits ready data on a local FPDP bus, which flows to the output FPDP interface.
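The bin-handling rules above can be sketched as a small behavioural model. The slide does not say what happens when a dump targets a bin still waiting for output. Here `dump` simply refuses it and returns `False` so the caller can retry, which is an assumption. The class and its string action names are hypothetical and mirror the DC2..DC0 codes in the DUMP_EN table:

```python
from collections import deque


class LtaModel:
    """Hypothetical behavioural model of the LTA controller's bin handling.

    action is one of "first", "add", "last", "speed", "discard".
    """

    def __init__(self):
        self.bins = {}          # bin number -> accumulated lag values
        self.ready = set()      # bins flagged ready, waiting for output
        self.output = deque()   # frames queued for the local FPDP bus

    def dump(self, action, bin_no, lags):
        """Apply one dump; return False if it would overwrite a ready bin."""
        if action == "discard":
            return True
        if action == "speed":
            self.output.append(("speed", bin_no, list(lags)))  # bypass LTA RAM
            return True
        if bin_no in self.ready:
            return False        # good data still waiting: do not overwrite
        if action == "first":
            self.bins[bin_no] = list(lags)
        elif action in ("add", "last"):
            if bin_no not in self.bins:
                raise ValueError(f"bin {bin_no} has no first dump")
            self.bins[bin_no] = [a + b for a, b in zip(self.bins[bin_no], lags)]
        else:
            raise ValueError(f"unknown action {action!r}")
        if action == "last":
            self.ready.add(bin_no)
        return True

    def drain(self):
        """Queue every ready bin as a normal output frame and free its storage."""
        for bin_no in sorted(self.ready):
            self.output.append(("normal", bin_no, self.bins.pop(bin_no)))
        self.ready.clear()
```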















LTA Controller FPGA Functional Block Diagram


#### LTA Controller: FPDP Normal Output Data Frame

| Word | Fields (bit 31 to bit 0) |
|------|--------------------------|
| W0 | SYNCH (alternating 0101... pattern) |
| W1 | DATA_BIN#, Reserved, ChipID, CCID, FType |
| W2 | BBID-Y, BBID-X, SBID-Y, SBID-X, SID-Y, SID-X |
| W3 | STATUS BITS, FRAME_COUNT, RECIRC_BLK-Y, RECIRC_BLK-X |
| W4 | DVCOUNT-Center |
| W5 | DVCOUNT-Edge |
| W6 | Reserved |
| W7 | TIMESTAMP-0 |
| W8 | TIMESTAMP-1 |
| W9 | Lag 0 In-phase accumulator |
| | Lag 0 Quadrature accumulator |
| | Lag 1 In-phase accumulator |
| | Lag 1 Quadrature accumulator |
| | ... |
| | Lag 127 In-phase accumulator |
| W263 | Lag 127 Quadrature accumulator |
| W264 | SYNCH pattern, Board ID |
| W265 | CHECKSUM |

#### LTA Controller: FPDP "Speed Dump" Data Frame

| Word | Fields (bit 31 to bit 0) |
|------|--------------------------|
| W0 | SYNCH (alternating 0101... pattern) |
| W1 | STATUS BITS, Reserved, ChipID, CCID, HSP-Y, HSP-X, FType |
| W2 | BBID-Y, BBID-X, SBID-Y, SBID-X, SID-Y, SID-X |
| W3 | Phase BIN#, RECIRC_BLK-Y, RECIRC_BLK-X |
| W4 | DVCOUNT-Center |
| W5 | DVCOUNT-Edge |
| W6 | DATA_BIAS |
| W7 | TIMESTAMP-0 |
| W8 | TIMESTAMP-1 |
| | Lag 0 In-phase accumulator |
| | Lag 0 Quadrature accumulator |
| | Lag 1 In-phase accumulator |
| | Lag 1 Quadrature accumulator |
| | ... |
| | Lag 127 In-phase accumulator |
| W263 | Lag 127 Quadrature accumulator |
| W264 | SYNCH pattern, Board ID |
| W265 | CHECKSUM |







# This Just In ...






## **FIR Filter Chip**

- AMI Semiconductor FPGA-to-gate-array conversion (0.18 μm; full production capacity available Dec. 2001).
  - NRE: \$200k (XCV2000E-8 [1024 taps, 4-bit])
  - Per-chip cost: \$50 (10k-piece pricing; 5k-piece pricing unavailable).
  - 13-16 week lead time for a straight conversion.
- Total FIR cost: ~\$430k (saves up to \$500k).
- Some loss of flexibility because the chip is no longer programmable, but the design is still very flexible.
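Whether a gate-array conversion pays off is a simple trade of NRE against unit cost. The sketch below uses hypothetical helper names to find the break-even chip count. It plugs in the \$200k NRE and \$50 unit price above, with the \$224 XC2V1000-5 projection from the FIR slides as an example FPGA price. That gives 1150 chips:

```python
def total_cost(nre, unit_cost, n_chips):
    """One-time engineering cost plus per-chip cost."""
    return nre + unit_cost * n_chips


def break_even_chips(nre, asic_unit, fpga_unit):
    """Smallest chip count at which NRE + gate-array units cost no more than FPGAs."""
    if fpga_unit <= asic_unit:
        raise ValueError("FPGA must cost more per unit for a break-even to exist")
    saving_per_chip = fpga_unit - asic_unit
    return -(-nre // saving_per_chip)  # ceiling division


n = break_even_chips(200_000, 50, 224)
print(n, total_cost(200_000, 50, n), 224 * n)
```

Above that count, every extra chip saves the unit-price difference (\$174 in this example).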

## **Correlator Chip**

- AMI Semiconductor could convert the FPGA design to a 0.18 $\mu$m gate array and scale up the number of lags.
  - Cost: NRE \$200k, \$50 each in 10k quantities.
- Could build a prototype correlator chip in an FPGA and then convert the FPGA design to a gate array.
  - Can test and debug the design fully with tools now being purchased.
  - Eliminates the need for development using full-custom toolsets.
  - Drop-in replacement for the full-custom chip (footprint compatible).
- The big question is power: is a gate array capable?
  - Must get serious answers before any decision is made.
- Potential savings: ~\$300k (over the *budgeted* \$1 million).



## **Correlator Chip**

- THIS PAGE CONFIDENTIAL

## **High-Speed Cabling**

- "Woven Electronics" cable with MDR-80 connector:
  - 3 m: roughly \$100-\$125 each in 3k quantity (GORE: \$268 each). Savings: ~\$400k.
  - 13 m: waiting for quotation (guess ~\$230). Savings: ~\$160k.
  - Will work on a cable configuration to meet our requirements: probably flat cable, and expanded PTFE is available as well, not just normal Woven Electronics flat cable.
  - May require "relaxed" Baseline Rack cable routing.



## **Modified Baseline Rack Cabling**




## **Recirculation Memory**

- 256k x 36-bit IDT memory now available for ~\$82.
- Could allow *full* performance recirculation on 2 pairs (no phase jitter, 4-bit data).
- Could allow *reduced* performance recirculation on all 8 basebands (8x3 + 8 + 2 = 34 bits), but with 3-bit data (and 4/fs phase jitter?).
- **BUT**: for I/O, requires a 600E-8 FPGA (\$263 vs current \$143 for 400E-8). **Total extra cost: ~\$300k**.




## **Summary**

- FIR savings (~\$500k) + cable savings (~\$500k) + correlator chip savings (~\$300k) = \$1.3 million.
- Total project cost \$10.4 million, including \$1.4 million contingency.
- Improved recirculation width costs *additional* ~\$300k.