Fast Placement Methods for RCS

Fast Placement Methods for Reconfigurable Computing Systems

Reconfigurable Computing System (RCS) is a system in which the hardware adapts itself to the running application. Such systems usually consist of a host processor, tightly coupled with a reconfigurable functional unit (RFU), which could be an FPGA chip, or some reconfigurable hardware on the same die as the CPU. The fact that fast, tailor-made RFU operations (RFUOPs) can be implemented on RFU opens a new horizon to computing domain. Speedups of 10x to 100x (e.g., string matching, DNA sequencing) can be achieved by implementing many modules on the RFU to perform fine-grain parallel operations.

One of the serious hurdles in realizing efficient reconfigurable computing systems (RCS) is the placement and routing of the RFUOPs (reconfigurable functional unit operations) on the RFU. In this project, we try to find fast methods for placing RFUOP modules on RFU as compact as possible.

If the flow of the program being mapped to RCS is known at compile time, then the placement stage can be modeled as a three-dimensional placement problem: assigning (x,y,t) tuples to each of the modules (x and y specify the location of the module on the chip and t is the time step in which the RFUOP starts executing). On the other hand, if the flow of the program cannot be determined during compilation, then a fast online placement method should be exploited to locate modules on the RFU during runtime.

Online Placement
By partitioning the empty space on RFU into empty rectangles, we can check if there is enough room on RFU for a new module, by comparing the module's dimensions against empty rectangles.
We have explored two methods: one which keeps all maximal empty rectangles (KAMER) and another method which keeps only linear (in terms of number of modules currently on RFU) number of empty rectangles. The figures below show examples of partitioning of the empty space by these methods. The figure on the left shows KAMER, and the one on the right shows the linear partitioning result.


Keeping all maximal empty rectangles (KAMER)	Keeping linear number of empty rectangles

We have shown that the former method generates placements which are 5% better in quality than the latter, but about 15 times slower. The figure below shows a snapshot of our online placement program.

Offline 3-D Placement
When the flow of the program can be accurately predicted at compile time (e.g., in DSP applications), we can use the information on operation start and end times to optimize the set of RFUOPs mapped on the RFU as well as their locations so that the RFU real estate is used efficiently.

When dealing with the offline case, we can reformulate placement of RFUOPs as a three dimensional placement problem. The figure below shows an example scheduled DFG and its corresponding 3D floorplan.

We implemented a number of annealing and greedy algorithms to solve the 3D floorplanning problem. In the annealing methods, we used KAMER to place X% largest (X being a parameter of the algorithm) modules, and then used low temperature annealing to improve the placement. The annealing could displace a module on the RFU (no change in start or end time), move an operation from CPU to RFU and vice versa. In the greedy method, we used KAMER to place as many modules as it could, and then used zero-temperature annealing to the resulting placement (only applying moves which immediately improve the placement quality).

Another algorithm which happens to be the best (both in terms of running time and generated quality) is a greedy method that we called Best Fit Offline Placement (BFOP). The method is similar to the well-studied bin-packing problem. Modules are first sorted on their volume (time span times area) in decreasing order. Then, for each RFUOP, the algorithm finds candidate corners. A candidate corner is a corner of an empty box (in the 3D floorplan space) which is large enough to accommodate the RFUOP module. From the candidate corners, the one which belongs to the smallest box (resulting in minimum wasted area) is picked. The figure below shows a corner for a new module.

Using Firm Templates in 3D Placement
We have observed that by using more than one shape in the library for each module, we can gain considerably in placement quality. If a module cannot be placed at its first shape, the other dimensions of the same module in the library are tried. Whichever can be placed will be chosen.

Related Published Papers

K. Bazargan, R. Kastner and M. Sarrafzadeh,
"3-D Floorplanning: Simulated Annealing and Greedy Placement Methods for Reconfigurable Computing Systems",
submitted to Design Automation for Embedded Systems (DAfES) - RSP'99 Special Issue, April 2000. (gzipped ps) (PDF)
K. Bazargan, R. Kastner and M. Sarrafzadeh,
"Fast Template Placement for Reconfigurable Computing Systems",
to appear in IEEE Design and Test - Special Issue on Reconfigurable Computing, January-March 2000. (gzipped ps) (PDF)
R. Kastner, K. Bazargan and M. Sarrafzadeh,
"Physical Design for Reconfigurable Computing Systems using Firm Templates",
Workshop on Reconfigurable Computing (WoRC), pp. 19-26, 1999. (gzipped ps) (PDF)
K. Bazargan, R. Kastner and M. Sarrafzadeh,
"3-D Floorplanning: Simulated Annealing and Greedy Placement Methods for Reconfigurable Computing Systems",
10th IEEE International Workshop on Rapid System Prototyping (RSP' 99), pp. 38-43, 1999. (gzipped ps) (PDF)
K. Bazargan and M. Sarrafzadeh,
"Fast Online Placement for Reconfigurable Computing Systems",
IEEE Symposium on FPGAs for Custom Computing Machines (FCCM), pp. 300-302, 1999. (gzipped ps) (PDF)

Presentation Slides

You can download slides in gzipped ps (gzipped ps with notes) format
OR
view them online (more convenient, with notes).

Back to Research Page.
Back to home.
Last updated Oct 23, 1999