Shared-multiplier polyphase FIR filter

Posted by Markus Nentwig on Jul 31 2013 under FPGA | Multirate DSP | Verilog

Keywords: FPGA, interpolating decimating FIR filter, sample rate conversion, shared multiplexed pipelined multiplier

Discussion, working code (parametrized Verilog) and Matlab reference design for a FIR polyphase resampler with arbitrary interpolation and decimation ratio, mapped to one multiplier and RAM.

Introduction

A polyphase filter can be as straightforward as multirate DSP ever gets, if it doesn't turn into a semi-deterministic, three-legged little dance between input, output and clock rates to the tune of pipeline delays.

Now, as usual, there's more than one way to skin a cat. To give a ballpark number, a 100 MHz clock rate should not pose a challenge on an entry-level FPGA evaluation board such as this one (Altera Cyclone IV) or this (Xilinx Spartan 6). While CD audio has two channels, this example implements only a single filter unit for simplicity.

Included are a testbench with multi-rate input/output file IO and an optional top-level Matlab script that may be useful on its own as a small, self-contained design flow. The example can be run without relying on commercial tools (see links below for some selected open-source software).

Excursus: Polyphase filtering

The Verilog implementation needs only about 50 lines of code, but tells little about what it is doing, and why.

"Phase" in the word "polyphase" refers to the idea that time-delayed replicas of the same signal are involved.

![]()

The red samples are interpolated by first interleaving zeros between the original samples (upsampling), then lowpass filtering at the resulting higher rate:

![]()

By discarding unwanted output samples, the rate can be lowered by an integer factor n (decimation: throw away n-1 out of every n samples).

Efficient FIR structure

Fig. 3 shows samples in a FIR filter after interleaving zeros with the input data.
![]()

Every second multiplier does not contribute to the result, because its input sample is predictably zero. In fig. 4, the delay line was shifted once. Now different multipliers are idle.

![]()

Shifting and multiplying zero samples is obviously not an efficient use of hardware resources. Neither is calculating output samples only to throw them away.

![]()

It is interesting to note that the computational effort is solely determined by the output rate. Neither interpolation nor decimation causes any workload by itself. For example, a five-tap filter with a conversion ratio of 999999/1000000 still needs only five multiplications per output sample. However, the number of coefficients skyrockets due to the high interpolation rate (5*999999). As a rule of thumb, polyphase filters are most attractive for small interpolation factors, which keep the number of coefficients manageable.

Example: Coefficient sequence for 5 up 4 down resampling

Fig. 5 illustrates the sequence of events in a 5-up 4-down resampler.

![]()

An interesting observation is that the bank index seems to run backwards. This is because steps of x and x-k are identical in modulo-k arithmetic (rewrite the latter as -(k-x)).

n > m

If the decimation factor is higher than the interpolation factor, the output rate is lower than the input rate. Occasionally, the bank index will wrap around more than once, and the delay line is shifted several times before a new output sample is calculated.

n < m

Similarly, a decimation factor smaller than the interpolation factor results in multiple subsequent output samples without delay line shift, as shown in the example.

Cycle length

For a rate-changing filter, m and n can be expected to be free of common factors, because it would make little sense to first interpolate, then immediately decimate by the same number. If so, the filter operates in cycles of m output samples (n input samples), and every coefficient will be used exactly once during each cycle.
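
The indexing scheme can be tried out in a few lines of Python before looking at the hardware. The following is only a sketch under assumptions, not the Verilog from the download: the function names, the layout of the coefficient list `h` (bank b holds `h[b], h[b+m], ...`) and the integer test data are made up for illustration. It compares the bank-indexed resampler against the brute-force "interleave zeros, filter, discard" approach and records the bank index used for each output sample.

```python
def naive_resample(x, h, m, n):
    """Brute force: interleave m-1 zeros, FIR filter with h, keep every n-th sample."""
    up = []
    for s in x:
        up.append(s)
        up.extend([0] * (m - 1))
    out = []
    for i in range(0, len(up), n):
        acc = 0
        for j, c in enumerate(h):
            if i - j >= 0:
                acc += c * up[i - j]
        out.append(acc)
    return out


def polyphase_resample(x, h, m, n):
    """Same output, but only len(h)//m multiplications per output sample.

    Assumes len(h) is a multiple of m. Returns (outputs, bank index per output).
    """
    assert len(h) % m == 0
    if not x:
        return [], []
    taps = len(h) // m
    banks = [h[b::m] for b in range(m)]   # bank b: h[b], h[b+m], h[b+2m], ...
    delay = [x[0]] + [0] * (taps - 1)     # delay[0] holds the newest input sample
    pos = 1                               # next input sample to load
    bank = 0
    out, bank_seq = [], []
    while True:
        # one multiply-accumulate pass over the input history
        out.append(sum(c * d for c, d in zip(banks[bank], delay)))
        bank_seq.append(bank)
        # advance by the decimation factor; every wrap consumes one input sample
        bank += n
        while bank >= m:
            bank -= m
            if pos >= len(x):
                return out, bank_seq
            delay = [x[pos]] + delay[:-1]
            pos += 1


x = [1, 2, 3, 4, 5, 6, 7, 8]
h = list(range(1, 11))                    # 10 coefficients: 5 banks of 2 taps
y, seq = polyphase_resample(x, h, 5, 4)
print(seq[:6])                            # [0, 4, 3, 2, 1, 0]
print(y == naive_resample(x, h, 5, 4))    # True
```

For 5 up, 4 down the printed bank sequence counts down after the first sample, and the first two outputs are produced without shifting the delay line, which matches the n < m case described above.
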

Note that in the example implementation, this cycle is not explicitly coded, but happens as a consequence of repeatedly adding to the bank index.

Decimation phase

A special case is to set m=n, that is, interpolate and decimate by the same number. Only a single coefficient bank will be used at a time, and it can be selected via the initial value of bankPtr (by default 0). This structure can be used as a variable delay. Functionally, it is identical to a conventional rate-1 FIR filter.

Implementation architecture

Fig. 6 shows an overview of the implementation.

![]()

It consists mainly of

Write port B of the RAM is unused and could be controlled externally to reconfigure the filter.

Control logic

The control block implements the indexing scheme from fig. 5. Every time the bank index wraps around, an input sample is consumed. For each output sample, macCount iterates over the input history and coefficients.

Pipeline delays are compensated with delay lines that synchronize the trigger commands to reset the MAC (macSet) and to store its output (macRead). The second delay line is longer to compensate for the output register of the MAC model. The delay line concept is quite flexible, because it makes it easy to replace RAM and MAC with FPGA-specific building blocks, and to insert additional pipeline registers if needed.

Otherwise, there isn't much to say. Details are documented in the code. Maybe a single bit doesn't deserve to be called "state machine", but it is meant to be extended to support multiple channels or the like.

For most parameter settings, conditional expressions in the code become constants and the control logic simplifies greatly. While the synthesis tool will probably spot most of those, some manual cleaning might also pay off.

Other components

The download .zip file includes generic implementations for dual-port RAM and MAC. The Matlab script serves as top level file, and does the following:

To run, invoke top in test_FIR.v. The system calls in test_FIR.m show how it is done in iverilog and command line ModelSim. By default, the simulation creates a .vcd or .lxt2 file that can be dragged and dropped into gtkwave. For faster simulations, disable it by commenting out dumpvars in test_FIR.v.

Conclusion

This article describes a Verilog implementation of a polyphase FIR resampler with arbitrary interpolation and decimation factors that multiplexes all operations to a single, pipelined multiplier. A file-streaming testbench and a Matlab reference implementation are included.

Download / links

Download design and testbench

Markus received his Dipl. Ing. degree in electrical engineering / communications in 1999. Work interests include RF transceiver system design, implementation, modeling and verification. He works as senior architect for Renesas Mobile Europe in Finland.