
A Multiplexed Memory Port for Run Time Reconfigurable Applications

James Atwell

Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering

Dr. Peter M. Athanas, Chair
Dr. James R. Armstrong
Dr. Mark Jones

December 1999
Blacksburg, Virginia

Keywords: Configurable Computing, FPGA, JHDL, RTR, VHDL

(ABSTRACT)

Configurable computing machines (CCMs) are available as plug-in cards for standard workstations. CCMs make it possible to achieve computing feats on workstations that were previously possible only with supercomputers. However, it is difficult to create applications for CCMs. The development environment is fragmented and complex, and compilers for CCMs, though emerging, are still in their infancy and inefficient. This thesis addresses the difficulties of implementing run time reconfiguration (RTR) on CCMs. Tools and techniques are introduced to simplify the development and synthesis of RTR applications and their partitions. A multiplexed memory port (MMP), written in both JHDL and VHDL, is presented. The MMP simplifies the memory interface, eases the task of writing applications and creating partitions, and makes applications platform independent. The MMP is incorporated into an existing CCM compiler, and it is shown that the MMP can increase the compiler's functionality and efficiency.
Terms

ASIC                 Application Specific Integrated Circuit
CCM                  Configurable Computing Machine
CLB                  Configurable Logic Block
COTS                 Commercial Off The Shelf
DSP                  Digital Signal Processing
FPGA                 Field Programmable Gate Array
GDL                  Graph Description Language
HDL                  Hardware Description Language
MMP                  Multiplexed Memory Port
PE                   Processing Element
RTR                  Run Time Reconfigurable
Spatial Partition    A division of the computational task into parallel tasks
Temporal Partition   A division of the computational task into sequential tasks

Table of Contents

TERMS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF LISTINGS
LIST OF TABLES
CHAPTER 1. INTRODUCTION
  1.1 CUSTOM COMPUTING MACHINES
  1.2 MOTIVATION FOR THIS RESEARCH
  1.3 CONTRIBUTIONS
  1.4 THESIS ORGANIZATION
CHAPTER 2. CCM BACKGROUND
  2.1 CONFIGURABLE COMPUTING MACHINES
    2.1.1 Run Time Reconfiguration of CCMs
      2.1.1.1 Virtual Hardware
      2.1.1.2 Reconfiguration Penalty of FPGAs
    2.1.2 CCM Host Communications
  2.2 RTR APPLICATION DEVELOPMENT DESIGN FLOW
    2.2.1 Specification
    2.2.2 Behavioral Model
    2.2.3 Partitioning the Application
    2.2.4 Writing Host Software
    2.2.5 Creating the Hardware
      2.2.5.1 Spatial and Temporal Partitioning
    2.2.6 Adding Platform Specific Code
    2.2.7 Synthesis
    2.2.8 Hardware Software Integration
  2.3 CCM COMPILERS
    2.3.1 The Ideal RTR Design Environment
      2.3.1.1 Creating Hardware From High Level Languages
    2.3.2 The Janus Compiler
      2.3.2.1 Janus Overview
      2.3.2.3 Janus Operation Specification
      2.3.2.4 Requirements of synthesized operators
      2.3.2.5 Janus Applications
[...]
  3.2 OPERATION OF THE MMP
  3.3 USING THE MMP WITH THE JANUS COMPILER
  3.4 DETAILED MULTIPLEXED MEMORY PORT DESIGN
    3.4.1 VHDL Version of the MMP
    3.4.2 JHDL Version of the MMP
    3.4.3 VHDL MMP vs. JHDL MMP
  3.5 ADDRESS GENERATION FOR THE MMP
    3.5.1 Address Generation with Constants
    3.5.2 Address Generation with Muxes
    3.5.3 Address Generation with Counters
    3.5.4 Address Generation with a Generic Address Generator
CHAPTER 4. WILDFORCE SPECIFICS AND TOOLS
  4.1 WILDFORCE ARCHITECTURAL OVERVIEW
  4.2 PE MEMORY
  4.3 PE INTERCONNECTS
  4.4 MULTIPLEXED MEMORY PORT
  4.5 VHDL SKELETON STRUCTURE FOR WILDFORCE LOGIC CORE
    4.5.1 Hierarchical Structure
    4.5.2 The Logic Core Model
    4.5.3 Problems with the current VHDL structure
    4.5.4 Creating the VHDL Computational Component
      4.5.4.1 Building a Structural Model
      4.5.4.2 Creating WILDFORCE Application with mmpcomp.vhd
  4.6 JHDL SKELETON STRUCTURE FOR WILDFORCE LOGIC CORE
    4.6.1 Hierarchical Structure
    4.6.2 JHDL Logic Core Description
CHAPTER 5. USING THE MMP
  5.1 JHDL EXAMPLE, RADIX-4 BUTTERFLY
    5.1.1 Radix-4 Butterfly Implementation Details
      5.1.1.1 Complex Multiplier
      5.1.1.2 Radix-4 Butterfly Component
      5.1.1.3 Radix-4 Butterfly with MMP
      5.1.1.4 Radix-4 Butterfly Implemented as WILDFORCE PE
      5.1.1.5 Address Generation for the radix-4 Butterfly
      5.1.1.5 Simulation Results
      5.1.1.6 Notes about other implementations
    5.1.2 A 16 point FFT
  5.2 LINEAR FILTER
  5.3 INCORPORATION OF THE MMP INTO JANUS COMPILER
    5.3.1 New WILDFORCE Architecture Description
      5.3.1.1 New WILDFORCE Element
      5.3.1.2 New WILDFORCE Platform
    5.3.2 Expanding the Capabilities of Janus
[...]
    7.1.2 Improved Efficiency
  7.2 TASK SHARING
  7.3 BETTER GENERIC ADDRESS GENERATOR
  7.4 SMARTER JANUS MEMORY TOOLS
  7.5 FURTHER IMPROVEMENTS TO THE JANUS SCHEDULER
APPENDIX A. SOURCE CODE
  A.1 MULTIPLEXED MEMORY PORT SOURCE FILES
    A.1.1 VHDL MMP Source Files
      A.1.1.1 mem_port.vhd
      A.1.1.2 mem_port_tb.vhd
    A.1.2 JHDL MMP Source Files
      A.1.2.1 mux_8_1.java
      A.1.2.2 struct_m.java
  A.2 GENERIC WILDFORCE STRUCTURE FILES
    A.2.1 Generic WILDFORCE VHDL Files
      A.2.1.1 vt_sklc1.vhd
      A.2.1.2 mmpcomp.vhd
      A.2.1.3 jchstcfg.vhd
      A.2.1.4 janbehav.vhd
      A.2.1.5 janpack.vhd
      A.2.1.6 Libinfo.txt
    A.2.2 Files for Sample Computational Element
      A.2.2.1 januslibtest.vhd
      A.2.2.2 mmpcomp_sample.vhd
    A.2.3 Generic WILDFORCE JHDL Files
      A.2.3.1 user_component_mmp.java
      A.2.3.2 generic_pe.java
      A.2.3.3 BusGrantFSM.java
      A.2.3.4 BusGrantFSM.fsm
  A.3 RADIX-4 BUTTERFLY SOURCE FILES
    A.3.1 butterfly_4.java
    A.3.2 tb_bf4.java
    A.3.3 bfly4_mmp.java
    A.3.4 complex_mult.java
    A.3.5 tb_cm.java
    A.3.6 mux_4_1.java
    A.3.7 generic_pe_bly4.java
  A.4 16 POINT FFT SOURCE FILES
    A.4.1 fft_16pt.java
      A.4.1.2 tb_fft16.java
  A.5 LINEAR FILTER SOURCE FILES
    A.5.1 lin_filt.vhd
    A.5.2 lin_filt_tb.vhd
    A.5.3 lin_full.vhd
    A.5.4 addgen.vhd
    A.5.5 addgen_tb.vhd
    A.5.6 add.vhd
    A.5.7 vt_lflc.vhd
  A.6 JANUS SOURCE FILES
    A.6.1 New WILDFORCE Architecture
      A.6.1.1 W4_XC4062_MMP.java
      A.6.1.2 WildForceXL_4062_MMP.java
    A.6.2 Janus MMP
      A.6.2.1 janusmmp.java
      A.6.2.2 tb_jmmp.java
    A.6.3 New Scheduler, MultiOpScheduler.java
    A.6.4 DynamicPE_MMP.java
BIBLIOGRAPHY
VITA

List of Figures

Figure 2.1 RTR Application Design Process
Figure 3.1 Multiplexed Memory Port
Figure 3.2 Timing diagram for the MMP
Figure 3.3 Counter for JHDL Version of MMP
Figure 3.4 Schematic of SOURCE Structure
Figure 3.5 Schematic of WRITE_SEL, ADDR, and DATA_OUT Structure
Figure 4.1 WILDFORCE PCI Board
Figure 4.2 Block Diagram of WF
Figure 4.3 Timing Diagram for a Memory Read
Figure 4.4 Timing Diagram for a Memory Write
Figure 4.5 Timing Diagram for Consecutive Memory Reads and Writes
Figure 4.6 Hierarchy of VHDL PE
Figure 4.7 Hierarchy of JHDL Structure
Figure 5.1 Radix-4 Butterfly
Figure 5.2 Block Diagram of Radix-4 Component
Figure 5.3 Formulas for Radix-4 Butterfly
Figure 5.4 Asynchronous, Complex Multiplier
Figure 5.5 Complex Multiplier Block Diagram
Figure 5.6 Simulation of Radix-4 Butterfly, Data Inputs
Figure 5.7 Simulation of Radix-4 Butterfly, Twiddle Inputs
Figure 5.8 Simulation of Radix-4 Butterfly, Outputs
Figure 5.9 Simulation with WILDFORCE Signals
Figure 5.10 Linear Filter Operation
Figure 5.11 Sample of Linear Filter Application

List of Listings

Listing 2.1 Janus Operation Interface
Listing 3.1 Main VHDL Code for MMP
Listing 3.2 Interface for Generic Address Generator
Listing 4.1 Interface of Generic VHDL Component
Listing 4.2 Interface for generic_pe.java
Listing 4.3 Generic Signals are Mapped to WILDFORCE Specific Signals
Listing 5.1 New WILDFORCE PE Description
Listing 5.2 New Load and Retrieve Memory Functions

List of Tables

Table 3.1 A 20 x 10 Matrix
Table 5.1 Number of CLBs Required for Various Radix-4 Bit Widths
Table 5.2 Results From First FFT
Table 5.3 Results From Second FFT

Chapter 1. Introduction

As long as there have been computers, there have been software applications that run too slowly. This drives an endless effort to make computers faster and more powerful. Until recently, the best option for speeding up an application was to implement it in an Application Specific Integrated Circuit (ASIC) targeted to the computational properties of the software. Some applications, such as image processing, require so much computation that software solutions are not even considered. For high-volume products such as workstation video cards, the high cost and risk of developing an ASIC are justified. For the average software developer, however, building a custom ASIC for each application is not an option.

A recent development in the effort to gain more processing power is the non-application specific integrated circuit. These devices contain configurable hardware that can be reprogrammed for the current application, making it possible to approach ASIC performance without the cost and risk of an ASIC.
1.1 Custom Computing Machines

The most common programmable hardware devices are Field Programmable Gate Arrays (FPGAs). FPGAs contain arrays of Configurable Logic Blocks (CLBs). Several companies use FPGAs in configurable hardware boards for workstations. These boards use the FPGAs as processing elements (PEs). In addition to the PEs, board designers may include RAM, FIFOs, PCI bus controllers, video chips, and other hardware. These reprogrammable hardware boards are commonly called configurable computing machines (CCMs).

1.2 Motivation For This Research

While powerful CCMs make it possible to do high performance computing on workstations, the tools for developing CCM applications are at a level where only experts with detailed knowledge of the CCM can create applications. As part of the effort to bring the power of configurable hardware to all programmers, this thesis contributes several tools for streamlining application design and implementation.

1.3 Contributions

The major contribution of this thesis is a multiplexed memory port. VHDL and JHDL versions of the multiplexed memory port have been written. Using the multiplexed memory port, the capabilities of the Janus CCM compiler have been expanded. In addition to the multiplexed memory port, frameworks for developing CCM applications have been created and are presented.

1.4 Thesis Organization

Chapter 2 presents the background of CCMs. After a brief history of CCMs, sample applications are presented. A discussion of the difficult aspects of creating applications on CCMs motivates the need for better tools. A detailed overview of the current design process shows where improvements can be made. Efforts to automate the design process are then presented.

A multiplexed memory port (MMP) is presented in Chapter 3. It addresses many of the concerns raised in Chapter 2. A detailed description of the technical aspects of the MMP is provided so that readers may use this tool in their own applications.
Chapter 4 presents an application framework for the WILDFORCE CCM that insulates the application designer from the details of the CCM's physical hardware. By sacrificing some functionality and flexibility, the framework makes application development easier. This is useful not only for novice application developers, but also for developing architecture-independent applications and compilers.

The tools of Chapter 3 and Chapter 4 are brought together in Chapter 5 with example applications. The JHDL example and the VHDL example can guide programmers in using the MMP in their own applications. The usefulness of the JHDL MMP is further demonstrated by incorporating it into the Janus CCM compiler. Finally, the results of the example applications and the compiler work are presented in Chapter 6. Future research to extend the work in this thesis is also explored.

Chapter 2. CCM Background

This chapter introduces configurable computing machines. CCM architectures are discussed, and some examples and results of their use are given. The design process for CCMs is explored, and the state of the software tools used in that process is covered.

2.1 Configurable Computing Machines

Over the last ten years, CCMs have generally been used as fixed pieces of hardware to accelerate software. A computationally intensive piece of a program is converted into hardware and implemented on the CCM, resulting in execution 10 to 1000 times faster than software alone. The CCM takes on the computationally intensive parts of an algorithm that would take much longer to execute on the workstation. In such applications, the CCM acts like a rapidly prototyped ASIC peripheral to the computer's processor. CCMs have been used as custom hardware for a wide assortment of applications, including finite difference methods [1], 2-D convolution [2], median filtering [3], and image interpolation [4].
These examples demonstrate that execution speed increases of 10 to 1000 times can be achieved over software implementations.

Another use of CCMs is rapid prototyping of ASIC designs. The designer has the option of synthesizing a chip design into a virtual version that can be loaded into the FPGAs on a CCM. This allows ASIC designers to quickly create a prototype and test their design in real hardware before committing it to silicon. Although some problems, such as routing or synthesis difficulties, can be discovered this way, rapid prototyping is not perfect. The designer is limited to the logic and I/O available on the FPGA. Also, the operating speed is lower than that of a real ASIC and the power consumption is higher.

2.1.1 Run Time Reconfiguration of CCMs

The hardware on CCMs can be reconfigured at any time the user desires. The application designer is not limited to reconfiguring the hardware only when a new application is started; the hardware can be reconfigured during the execution of a single application. The concept of reconfiguring hardware during program execution (run time) has been around for many years, but the slow speed of reconfiguring the processing elements on the CCM has usually made reconfiguration non-beneficial. However, the time required for configuring FPGAs has decreased, and run time reconfiguration (RTR) can now offer increased performance over non-RTR implementations. Refer to [5] for an introduction and overview of challenges faced by RTR systems.

2.1.1.1 Virtual Hardware

The most common reason for using RTR is a lack of resources. A full hardware implementation of some applications may require more FPGA resources than are available. For example, an application may require eight FPGAs to create all the hardware required. If the CCM only has four FPGAs, then the designer must partition the hardware into sections and either implement some of these sections in software or rotate the hardware onto the CCM over time.
By breaking the hardware into sections and exchanging these over time (time multiplexing them) on the CCM, it may be possible to create all the hardware required for the application, although it will not all be available at the same time [6]. Since RTR makes it possible to use more hardware than is physically available, reconfiguration provides virtual hardware. Two examples of using RTR to reduce the number of FPGAs required to implement an application are neural networks [7] and a wireless video coding system [8].

RTR can be better explained by looking at a specific example such as the dynamic instruction set computer (DISC) [9]. The DISC uses RTR to create a computer that uses custom instructions. The instructions can be custom made for each application, so the application will execute faster than on a fixed instruction set computer (FISC). A long sequence of instructions on the FISC may be replaced by one custom instruction on the DISC. Applications are written using the custom instructions. When the application runs, the custom instructions are loaded into the hardware as new instructions are encountered. Other RTR examples are genome scanning [10], a document search engine [11] and automatic target recognition [12].

2.1.1.2 Reconfiguration Penalty of FPGAs

RTR has been shown to increase performance and decrease physical hardware requirements despite the fact that conventional FPGAs are not designed for it. Most FPGAs use a serial configuration scheme to load the configuration into the FPGA. The configuration bit files are large, and as FPGA densities increase, their size continues to grow. Current FPGAs can take as long as 30 ms to receive all the configuration information. This time is called the reconfiguration penalty. For applications that do not change their configuration, this time is insignificant, but for RTR applications it has been a bottleneck when trying to increase the speed of applications.
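The cost of the reconfiguration penalty can be estimated with simple arithmetic. The Java sketch below (Java is used to stay close to the JHDL tooling discussed later) assumes a fixed penalty per reconfiguration and computes both the fraction of run time spent reconfiguring and the speedup of an RTR implementation once the penalty is charged to the hardware. The class and method names are hypothetical, and the 30 ms figure from the text is used only as an example value, not as a measurement of any particular device.

```java
/**
 * Hypothetical model of the FPGA reconfiguration penalty.
 * All times are in milliseconds; values passed in are illustrative.
 */
final class ReconfigPenalty {
    private ReconfigPenalty() {}

    /** Fraction of total run time spent loading configurations. */
    static double overheadFraction(int reconfigurations, double penaltyMs,
                                   double computeMs) {
        double configMs = reconfigurations * penaltyMs;
        return configMs / (configMs + computeMs);
    }

    /**
     * Speedup of an RTR hardware implementation over software once every
     * reconfiguration is charged to the hardware's run time.
     */
    static double effectiveSpeedup(double softwareMs, double hardwareComputeMs,
                                   int reconfigurations, double penaltyMs) {
        return softwareMs / (hardwareComputeMs + reconfigurations * penaltyMs);
    }
}
```

For example, four reconfigurations of 30 ms each around 120 ms of hardware computation spend half of the run time reconfiguring, and a partition that is 100 times faster than a 1000 ms software version drops to a 10 times speedup if it needs three 30 ms reconfigurations. This is why the number of reconfigurations matters so much to the scheduling and partitioning tools discussed next.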
One result of the reconfiguration penalty is a strong emphasis on scheduling and partitioning tools that minimize the number of reconfigurations required.

Another limitation of most FPGAs is that they must be completely reconfigured if any change to the configuration is required. Some experimental chips address this concern. The Xilinx XC6200 allowed small areas of the FPGA to be reconfigured, so changes could be made without reconfiguring the entire chip (dynamic reconfiguration). The areas that do not need to be reconfigured can continue operating while other sections are reconfigured. Unfortunately, Xilinx discontinued these chips due to low demand, which may have been due to the primitive tools application designers had to work with. The Atmel AT6000 and AT40K are also dynamically reconfigurable FPGAs. Both support partial reconfiguration without loss of data and with continued execution. Configuration can be done in parallel or serially. In addition to the logic cells, the AT40K devices have high speed SRAM distributed throughout the chip. Stockwood [5] gives an excellent introduction to dynamic reconfiguration as well as a tool to simulate such systems. These tools are further advanced by Robinson [13].

Other devices allow the user to load several configurations into memory and swap these configurations during run time without having to load a new configuration or losing the data stored in the chip. One example is the context switchable reconfigurable computer (CSRC) made by Lockheed Sanders [14]. The CSRC can store four configurations and exchange them in one clock cycle. The user can load a new configuration into an inactive context while another configuration is executing. This type of device greatly reduces or eliminates the reconfiguration penalty.
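The context-switching idea can be sketched as a small data structure. The Java model below is an illustration only: the class name, the use of strings to stand for configurations, and the choice to charge no execution time for loading are assumptions, not details of the CSRC. It shows why preloading an inactive context hides the reconfiguration penalty, since the only cost visible to the running application is the single-cycle switch.

```java
/**
 * Hypothetical software model of a multi-context FPGA in the style of the
 * CSRC: several configurations are stored on chip and one is active.
 * New configurations may only be loaded into inactive contexts, so loading
 * can overlap execution. A context switch is counted as one clock cycle;
 * loading is not charged any execution cycles in this model.
 */
final class MultiContextDevice {
    private final String[] contexts;  // stored configurations (null = empty)
    private int active = 0;           // index of the executing context
    private long switchCycles = 0;    // clock cycles spent switching contexts

    MultiContextDevice(int contextCount, String initialConfiguration) {
        if (contextCount < 2) {
            throw new IllegalArgumentException("a multi-context device needs at least two contexts");
        }
        contexts = new String[contextCount];
        contexts[0] = initialConfiguration;
    }

    /** Loads a configuration into an inactive context while the active one keeps running. */
    void load(int context, String configuration) {
        if (context == active) {
            throw new IllegalStateException("cannot overwrite the executing context");
        }
        contexts[context] = configuration;
    }

    /** Makes a previously loaded context active; modeled as one clock cycle. */
    void switchTo(int context) {
        if (contexts[context] == null) {
            throw new IllegalStateException("context " + context + " has not been loaded");
        }
        active = context;
        switchCycles++;
    }

    String activeConfiguration() {
        return contexts[active];
    }

    long switchCycles() {
        return switchCycles;
    }
}
```

With four contexts, as in the CSRC, a host could keep up to three upcoming configurations preloaded while the current one executes.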
While the CSRC is basically an FPGA with a configuration scheme optimized for RTR, other new architectures designed specifically for RTR are radically different. For example, the Stallion [15] chip developed at Virginia Tech uses program streams. The configuration data and the data to be processed are combined into a stream that flows through the device, configuring the hardware and processing the data as it goes. As RTR becomes more accepted as a way of implementing software, more chips may address the specific concerns of RTR.

2.1.2 CCM Host Communications

One weakness of CCMs has been communication with the host. Card based CCMs must communicate with the host through a bus such as the PCI bus. To avoid the overhead associated with bus communications, some experimental systems directly connect FPGAs to microprocessors. It has been shown that FPGAs closely coupled with microprocessors can produce orders of magnitude improvement in execution speed over microprocessors alone [16], [17]. More boards and devices are being designed with combinations of microprocessors and FPGAs, utilizing the best aspects of each [18], [19], [20]. This new style may one day be standard on workstation motherboards.

Research is also being done on combinations of CCMs and distributed computing. The Tower of Power at Virginia Tech uses 16 PCs, each with a reconfigurable WILDFORCE board, connected by a high-speed network [21].

2.2 RTR Application Development Design Flow

Developing RTR applications can be a long, arduous task. The software tools for developing applications for CCMs are in their infancy. They offer limited (if any) ability to simulate the host/hardware interaction and are usually unable to take advantage of the reconfigurable possibilities of the platforms. A typical design flow is shown in Figure 2.1.
Figure 2.1 RTR Application Design Process (stages: Specification; Behavioral Model; Partitioning; Software Writing (Host Code) and Compile; Hardware Writing (VHDL, JHDL, etc.), Simulation and Synthesis; Host/Hardware Integration)

2.2.1 Specification

All designs should start with a detailed specification of the exact tasks the application should perform, including details on how fast the tasks must be completed.

2.2.2 Behavioral Model

After the specification has been completed, a behavioral model is made. This may be a text based model in C, Java, VHDL, JHDL [22], or Verilog, or it could be a graphical model in Matlab (Simulink), HP VEE, SystemView [23] or SPW [24]. SystemView has the option of generating bit streams for some pre-synthesized functions such as an FFT or FIR filter, as well as a seamless link to Matlab's Simulink. Verilog and VHDL are often used because most FPGA tools can synthesize them for the target devices. At this stage, problems with the specification can be found and worked out before moving to the implementation stages.

2.2.3 Partitioning the Application

Unless the application is for a stand alone CCM (no host required), the application must be divided between the host and the CCM. When using a CCM in a workstation, a host program must run on the workstation to control the flow of data to and from the CCM and monitor the state of the hardware.

2.2.4 Writing Host Software

The workstation host program is usually written in C. Generally the host handles data storage and control aspects such as reading and writing data to disk, reading and writing data to the CCM, loading configurations on the CCM, and monitoring the hardware. CCM vendors provide libraries of function calls that ease the task of writing a host program, but it can still be a challenge for one who is inexperienced. If the application is to be a run time reconfiguring application, the host program will be more complex.
It must oversee the exchange of FPGA images on the processing elements (PEs) and more complex data manipulation.

2.2.5 Creating the Hardware

While the host software is written in a C environment, the hardware must be written in a completely different language and development environment. The currently available FPGA vendor tools (hardware design tools supplied by the makers of the FPGAs) allow the user to create hardware for the FPGAs on the CCM using hardware description languages (HDLs) such as VHDL and Verilog. These are synthesized into a form (an FPGA bit stream or image) that can be loaded onto an FPGA. This is a fairly comfortable level to work at if one is familiar with the HDL and the CCM.

This process does have some drawbacks. Even though one can describe and simulate hardware in an HDL, most synthesis tools can only synthesize a subset of the HDL, so the designer must write the code so that it is synthesizable. More importantly, since the FPGA tool sets and the host software tool sets use different languages (and neither is designed by the CCM maker), the host/hardware interaction is often difficult (if not impossible) to simulate and debug.

Another painful problem is that the design tools do not predict how many FPGA resources a piece of hardware will require or how fast the hardware will run. The size and speed are not known until after the hardware has been synthesized. After many long hours of simulating and debugging a section of hardware, it may have to be rewritten because it requires more resources than the FPGA has or does not run at the required speed. If the hardware is too big for a single PE, the designer must divide it among multiple PEs. This must be done by hand; the vendor tools do not help divide the hardware.

2.2.5.1 Spatial and Temporal Partitioning

If an application is too big to fit into a single FPGA, it must be divided into partitions.
If more than one FPGA is available, the application can be spatially divided among the FPGAs. A good overview of research on how to spatially divide an application can be found in [25]. Methods for partitioning a graph into two parts (bipartitioning) are explored in [26], [27] and [28]. Multiple partitions can be handled by repeated use of bipartitioning [29]. Methods specific to partitioning for FPGAs are found in [30], [31], [32] and [33]. Other references are [34] and [35].

If the application will not fit in the available number of FPGAs, it must be temporally partitioned. This means the application must be partitioned in time, so that one part is implemented on the CCM in one time slice while other parts are implemented on the CCM in other time slices (run time reconfiguration). Research on how to temporally partition an application can be found in [25]. Although much research has been done on how to partition applications into temporal and spatial partitions, few commercial tools offer these features. Most FPGA tools are derived from ASIC tools, where resources are not strictly limited and partitioning is not as crucial.

2.2.6 Adding Platform Specific Code

After the computational elements are written, simulated and debugged, extra code for the CCM must be added to the hardware partitions. Besides the raw computations, the hardware must include provisions for interacting with the host. This includes tasks such as memory requests, interrupts and address generation.

Each temporal partition needs some type of control to tell the host when it has finished its task. This can be a centralized control (in just one PE) or a distributed control (in each PE). When the exact number of cycles needed to complete the computations in each spatial partition is known (meaning there are no conditional cases), one can implement a simple counter in a single PE that counts the clock cycles.
All the calculations are completed when it reaches a certain count. This approach only works for rigidly timed applications. A more flexible approach is to have each operator in each spatial partition generate a DONE signal that is asserted when it has completed its calculations. This method allows operators to perform computations where the number of cycles to complete the computation is not known in advance. All of the DONE signals must be asserted before the host is notified that the temporal partition has finished its tasks.

2.2.7 Synthesis

After the hardware has been written, simulated and debugged, it needs to be synthesized. In some cases, rewriting the hardware description will be necessary to make the hardware partitions synthesizable. If any code is rewritten, the hardware must be simulated again to make sure it still meets the requirements of the specification. The synthesis phase includes using the FPGA vendor tools to place and route the design on a specific FPGA. Once this has been done, the vendor tools create reports that the designer uses to determine the exact size and speed of the partitions. If the design is too slow, changes can be made. If the partitions are too big, the design must be repartitioned. Synthesis is often a time consuming part of the design process. For example, synthesizing a design that uses 90% or more of the resources of a Xilinx XC4062 takes about one hour.

2.2.8 Hardware Software Integration

Once all the partitions have been synthesized, the host software can be tested in conjunction with the hardware. There are currently no good commercial tools for simulating this phase, and the host and hardware interaction can be very difficult to troubleshoot. The next section shows how JHDL can help with this and other problems.

2.3 CCM Compilers

This section starts with a brief look at a theoretical ideal RTR design environment and then gives details of a specific CCM compiler tool, Janus.
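The two completion schemes described in Section 2.2.6 can be compared in a small software model. The Java sketch below is a behavioral illustration, not JHDL and not code from any vendor library, and its class names are hypothetical. The centralized scheme counts clock cycles up to a known total, while the distributed scheme reports completion only when every operator's DONE signal is asserted, which amounts to a logical AND of all DONE signals.

```java
/** Hypothetical models of the two temporal-partition completion schemes. */
final class CompletionControl {
    private CompletionControl() {}

    /** Centralized control: a cycle counter for a rigidly timed partition. */
    static final class CycleCounter {
        private final long cyclesToFinish;
        private long count = 0;

        CycleCounter(long cyclesToFinish) {
            this.cyclesToFinish = cyclesToFinish;
        }

        /** Advances one clock cycle; returns true once the known count is reached. */
        boolean tick() {
            if (count < cyclesToFinish) {
                count++;
            }
            return count >= cyclesToFinish;
        }
    }

    /**
     * Distributed control: the partition is finished only when every
     * operator's DONE signal is asserted (a logical AND of all DONE lines).
     */
    static boolean partitionDone(boolean... doneSignals) {
        for (boolean done : doneSignals) {
            if (!done) {
                return false;  // at least one operator is still busy
            }
        }
        return true;
    }
}
```

The counter needs the cycle count at design time, which is why it only suits rigidly timed partitions; the AND of DONE signals works for data dependent run times at the cost of routing one DONE line per operator.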
2.3.1 The Ideal RTR Design Environment

An ideal RTR design environment lets the designer easily move pieces of high level code between software (on the host) and hardware (on the CCM), gives the designer an always up to date status of the resources available on the CCM, guarantees a minimum execution speed for the hardware portions before synthesis, and provides an accurate simulation of the entire system. The designer can change the choice of FPGAs and CCMs and get updated information based on the current choice.

While this environment would be ideal for today's CCM application designers, a high level language compiler is necessary to allow the average programmer to take advantage of CCMs.

2.3.1.1 Creating Hardware From High Level Languages

A current goal of the RTR research community is a CCM compiler that accepts the behavioral model (or high level language model) and optimizes the application, providing the partitioning and scheduling that maximize the speed of the application. The compiler creates the final implementation, complete with host code and board images (synthesized FPGA configurations ready to download).

Although some work has been done to create tools that convert algorithms written in high level languages such as C into hardware, it is still a difficult task. C is often the language of choice for this effort because of its popularity. Despite its common use, it is difficult to use for describing hardware because it is a sequential language and has no provisions for specifying bit widths or controlling the physical timing of events. Algorithms written as sequential programs are difficult to convert to parallel implementations that can efficiently take advantage of the parallel execution that makes FPGAs (and ASICs) so fast. A language capable of parallel description can ease the task of the compiler by encouraging designers to think in terms of parallel execution.
The designer can create algorithms optimized for parallel execution rather than relying on the compiler to figure out where parallelism can be used. (It may not be possible to generate a parallel implementation from a sequential algorithm that is as efficient as one obtained by starting with a parallel algorithm.)

Using C rather than VHDL can also cause problems when creating component libraries. VHDL allows the insertion and simulation of physical time delays, so the hardware can be accurately simulated in time as well as logic. C does not allow time to be explicitly described. Some tools can back annotate VHDL when it is synthesized so the component can later be simulated with exact timing specifications for the target device or fabrication process. If a good library of pre-synthesized components is used, the timing and area problems created by the unpredictability of synthesis results can be avoided.

The Janus project at Virginia Tech addresses some of these issues by making it possible to compile a project of arbitrary size onto an arbitrary platform. The compiled project can consist of any number of temporal partitions that are swapped in and out of the platform. An important part of the success of the Janus project is the development of an HDL based on Java called JHDL [22] [36]. JHDL allows the host software and the custom hardware to be written in the same language, making it possible to simulate the interaction between host and CCM.

2.3.2 The Janus Compiler

The Janus project is an ongoing effort at Virginia Tech to create a compiler for RTR applications. This section introduces the aspects of the compiler that are important for this thesis.

2.3.2.1 Janus Overview

In the current approach, a user creates synthesizable operators in structural JHDL [22]. Users are insulated from most of the architecture specifics by a structure similar to the one in Chapter 4. However, users are responsible for making sure each operator is synthesizable and fits on the target PE.
Also, users must write operators so that the memory timing of the specific architecture is met. To make the operators more architecture independent, a memory interface tool such as the MMP can be used to isolate users from architecture specific memory timing issues. For example, when a user writes operators for the WILDFORCE board without the MMP, the user must account for the WILDFORCE specific latency of two clock cycles to read data from memory. With the MMP, the user can assume the latency for reading data from memory will always be one clock cycle, regardless of the architecture. Chapter 3 has more details about using the MMP with Janus.

Users create applications by combining operators into stages. The compiler implements the stages by combining operators into temporal partitions and synthesizing them. Janus has a scheduler that creates a list of events that will be executed when the application runs. At run time, Janus oversees the execution of the application by configuring the CCM, loading data to the PE memories, fetching data from the PE memories, and performing other tasks.

2.3.2.3 Janus Operation Specification

Janus operators are not just simple gates, adders or filters. Along with the computational task, operators must generate memory addresses, control the flow of data to and from their memory port (or ports), monitor the computational task and generate a DONE signal when the computational task is complete. A designer must be careful to meet all the operator requirements in Section 2.3.2.4 or the system may not work. To ease the task of the compiler, a standardized operator model with a known interface is required, so that operators can be incorporated into a Janus application and connected to hardware. The operation I/O is shown in Listing 2.1.
public void Operation(Node parent, Wire data_in, Wire data_out, Wire addr,
                      Wire write_sel, Wire strobe, Wire enable, Wire done);

Listing 2.1 Janus Operation Interface

The interface shown in Listing 2.1 is the standard interface that all operations must use. It is written in JHDL. The name of the operation replaces Operation; however, the input and output names must stay as they are. The inputs and outputs of the operation are defined as wires. The inputs are DATA_IN and ENABLE. The outputs are DATA_OUT, ADDR, WRITE_SEL, STROBE and DONE. The signals ENABLE, STROBE and DONE are control signals, while the rest form a basic memory port interface that Janus uses to connect the operation to a memory port of the PE.

Each operator is given exclusive access to a memory port. This means that for PEs with only one memory port (as in the WILDFORCE), only one operator will be put in each PE. The operator will most likely need some type of address generator for its memory interface. Various address generation schemes are covered in Chapter 3.

2.3.2.4 Requirements of Synthesized Operators

Operators must meet the following requirements to work in the Janus system. Each operator must assert its DONE signal when it completes its task. (Operator run time can be data dependent.) The operator must use the ENABLE input to control the execution of the computations. If ENABLE is negated, the operator must stop execution and then resume when ENABLE is asserted, without causing any errors or losing any data.

2.3.2.5 Janus Applications

A Janus application consists of Ordered and Unordered stages. Each Unordered stage contains a set of operations that can be executed in parallel (there are no data interdependencies). Ordered stages are built by sequentially adding Unordered stages to an Ordered stage. The order in which the Unordered stages are added determines their sequence of execution within the Ordered stage at run time.
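The operator requirements of Section 2.3.2.4 can be made concrete with a cycle-level behavioral model. The Java class below is a hypothetical illustration that does not use the JHDL API: it sums a block of words, advancing one address per clock edge only while ENABLE is asserted, holding all of its state while ENABLE is negated, and asserting DONE once the last word has been accumulated. Memory reads are assumed to return data in the same step, which ignores the memory latency issues discussed in Section 2.3.2.1.

```java
/**
 * Hypothetical cycle-level model of a Janus-style operator that sums
 * `length` words starting at address 0. It advances only while ENABLE is
 * asserted, holds all state while ENABLE is negated, and asserts DONE
 * when finished. Reads are modeled as returning data immediately.
 */
final class SumOperator {
    private final int[] memory;   // stands in for the PE memory behind the port
    private final int length;     // number of words to sum
    private int addr = 0;         // ADDR output: next word to read
    private long sum = 0;         // accumulated result
    private boolean done;         // DONE output

    SumOperator(int[] memory, int length) {
        if (length < 0 || length > memory.length) {
            throw new IllegalArgumentException("length must fit in memory");
        }
        this.memory = memory;
        this.length = length;
        this.done = (length == 0);
    }

    /** One clock edge with the given ENABLE level. */
    void clock(boolean enable) {
        if (!enable || done) {
            return;  // ENABLE negated or work finished: hold all state
        }
        sum += memory[addr];
        addr++;
        done = (addr == length);  // once set, DONE stays asserted
    }

    int address() { return addr; }
    boolean done() { return done; }
    long result() { return sum; }
}
```

Because the only state is the address counter and the running sum, pausing is trivially lossless here; a real operator with pipelined datapaths must also freeze its pipeline registers when ENABLE is negated.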
2.4 Synthesizable Components

As part of the effort to ease the task of writing operators and applications, an extensive library of synthesizable components has been created in both VHDL and JHDL. By using the library, a designer can avoid repeating work that has already been done. Each component was carefully tested and synthesized so the user knows how many CLBs it will require when synthesized. The designer can use this information to avoid making operators that use more resources than are available. For example, Libinfo.txt specifies that a 32 bit, signed multiplier requires 922 CLBs. Using this information, a designer knows that only two 32 bit, signed multipliers will fit into the 2304 CLBs of a Xilinx XC4062.

All VHDL components were carefully simulated, synthesized, and tested on an FPGA to make sure they function correctly. The number of CLBs required for each bit width was recorded for later use by compilers or application designers. The JHDL components created at BYU do not give their CLB sizes, but they should be similar to their VHDL counterparts.

One of the major goals for the components is to make them parameterized. VHDL allows the designer to create flexible components. Some aspects of their operation can be changed based on parameters called generics. For example, the most common use in this research is a generic that controls the bit width of the component. Rather than building an adder for every possible bit width, one can use a generic to specify the bit width, so the same component can be used for any bit width. When the file is synthesized, the synthesizer creates an adder of the width specified by the generic. Making the exact function of a component flexible maximizes its usefulness and reusability. Other generics let the designer specify parameters such as how many bits to shift and which bit to extract from a word.
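The CLB budgeting described above is simple arithmetic that a compiler or designer can automate. The Java helper below is a hypothetical sketch: the 922 CLB figure for a 32 bit, signed multiplier and the 2304 CLB capacity of the XC4062 come from the text, while the class and method names are invented for illustration. Like the library data, it counts only CLBs and ignores routing resources.

```java
/**
 * Hypothetical CLB budgeting helper. Only logic (CLB) usage is checked;
 * routing resources are ignored, as they are in the component library data.
 */
final class ClbBudget {
    private ClbBudget() {}

    /** How many copies of one component fit in a device, by CLB count alone. */
    static int copiesThatFit(int deviceClbs, int componentClbs) {
        if (componentClbs <= 0) {
            // CLB-free components (e.g. shifts) are limited by routing, not CLBs.
            throw new IllegalArgumentException("component must use at least one CLB");
        }
        return deviceClbs / componentClbs;  // integer division drops partial copies
    }

    /** True if the listed components together stay within the device's CLBs. */
    static boolean fits(int deviceClbs, int... componentClbs) {
        long total = 0;
        for (int clbs : componentClbs) {
            total += clbs;
        }
        return total <= deviceClbs;
    }
}
```

For the multiplier example, `copiesThatFit(2304, 922)` yields 2, and `fits(2304, 922, 922, 922)` is false because three multipliers would need 2766 CLBs.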
It is interesting to note that many functions require no CLBs (logic resources); they only require wiring (routing resources). For example, shifts can be accomplished by simply directing the wires to the appropriate place. No effort was made to determine the amount of routing resources required by the components.

While working on the library of components, it became clear that some memory tools would be useful for writing applications. The next section examines CCM memories and how they affect the Janus compiler effort.

2.5 Effects of CCM Memory Limitations on the Janus Compiler

Algorithms and applications that benefit from CCM implementations tend to be highly parallel with high data throughput. Given this, plus the fact that memory access has sometimes been a bottleneck for processors [37], it makes sense to design CCMs with multiple attached memories or memory ports. However, many contemporary reconfigurable platforms have been designed with only one memory port to each PE, even though FPGAs do not inherently have the limitation of a single path to memory.

The lack of memory ports may be due to the need for interconnect. The PEs on the WILDFORCE board have 72 pins dedicated to connecting each PE to its adjacent neighbors, as well as another 36 pins connected to a configurable crossbar. The crossbar can be used to further interconnect the PEs. When PEs had less logic, it was necessary to spread even simple circuits across several PEs, and interconnect helped connect crucial parts of the board. As FPGAs continue to grow to much larger sizes, interconnect may not be as crucial. In the future it may be more beneficial to use the pins for more memory ports or more attached memories than for interconnect.

Another possible contributing factor for each PE having only a single memory port is the long history of sequential software programming. High level languages such as C are inherently sequential.
Languages and compilers have developed alongside microprocessor architectures that have generally followed the von Neumann architecture with the standard fetch, decode, execute cycle. Programmers are usually not trained to write algorithms or code in a parallel fashion unless they specialize in parallel processing.

Some applications where data is continually stored and retrieved internally can ease the memory port bottleneck by using memory internal to the PE. RAMs can be created inside the FPGA, but it takes many CLBs to create them. Some FPGAs include dedicated RAM in various locations throughout the chip, since it is much more efficient to build dedicated RAM in silicon than to configure CLBs as RAM.

The benefits of multiple memory ports are obvious when looking at common CCM applications. For example, in an FFT, if eight pieces of data can be read on each clock cycle, the algorithm could run eight times faster than an implementation that allows only one piece of data to be read on each clock cycle. Another example is complex multiplication. Multiplying two complex numbers requires four pieces of data on the inputs (the real and imaginary parts of each multiplicand) and produces two outputs (the real and imaginary parts of the result). Another example of an algorithm that benefits from access to more than one value each clock cycle is edge detection with the Roberts operator [38]. Multiple memory ports can not only increase the speed but also greatly ease the writing of these applications.

The SLAAC [39] board from ISI East is an example of a new architecture that includes more memories per PE. Each PE has four memories. However, no interconnect has been given up. Each PE is a Xilinx XC40150 and includes 144 pins of interconnect for two neighbors (72 pins for each neighbor) and 72 pins of interconnect for a crossbar [40].

Others have also worked on the memory bandwidth problem.
The MoM architecture [41] increases application speeds by using complex memory interfaces for the PEs to provide several new pieces of data to each PE on each clock cycle. Also, Fross [42] has shown how a modification to a Splash 2 (a WILDFORCE type architecture) can increase performance by using a shared memory architecture.

The lack of attached memories and memory ports limits the efficiency of the Janus compiler. The compiler assumes that the operators have exclusive access to a memory port. For example, WILDFORCE PEs only have one memory with one memory port, so only one operator can be assigned to each PE. If the operators are small, this results in inefficient use of the PE logic space. To maximize the application execution speed, the compiler must fit as many operators as possible into a temporal partition, avoiding as many reconfigurations as possible. A multiplexed memory port has been created to give each PE eight virtual memory ports, allowing the compiler to assign up to eight operations per PE.

Chapter 3. Multiplexed Memory Port

CCMs can have varying hardware structures but usually have some common traits that designers assume when designing a compiler. One of the most important assumptions is that each PE has an attached memory that can be accessed by the host as well as the PE. The PE accesses the memory through a memory port. The number of memory ports that each PE has to the attached memory can affect the ease of writing algorithms and the efficiency of compilers. Complex algorithms can be much easier to implement if more than one memory port is available. Various approaches to developing a compiler revealed that the compiler could be simplified and achieve more efficient results if PEs had multiple memory ports.

A multiplexed memory port (MMP) provides a way to simulate the existence of multiple memory ports by creating virtual memory ports. The MMP can decrease the time required to create a new application.
It can also increase the speed of applications generated by a compiler by decreasing the number of reconfigurations required. Another important role of the MMP is to abstract away the memory interface details of the specific CCM. For example, applications written for the WILDFORCE without the MMP must account for the two cycle delay for reading memory. Moving such an application to another platform with a different delay would require rewriting parts of the application. Rather than rewriting each application, the MMP can be rewritten for the new platform so the application sees the same memory interface characteristics.

The MMP allows one to design operations with the assumption that there is access to eight memory ports on each clock cycle. On each MMP clock cycle the PE can read or write data through each memory port. For example, for an application that smooths a picture by averaging several pixels, the designer can use several memory ports to read in values and use another port to write the result to memory, completing an entire operation on each MMP clock cycle.

3.1 Inputs and Outputs of the MMP

Figure 3.1 Multiplexed Memory Port (virtual port signals Address0-Address7, Sink0-Sink7, Source0-Source7 and Mem_Dir0-Mem_Dir7; Clock and Clk_Enable; physical port signals Write_Sel, Addr, Data_In and Data_Out)

The standard version of the MMP provides eight virtual memory ports given one physical memory port. Each port has a connection for address (ADDRESSn), read/write (MEM_DIRn), data to memory (SINKn), and data from memory (SOURCEn). Four of the additional lines of the MMP (WRITE_SEL, ADDR, DATA_IN, DATA_OUT) connect to the physical memory port of the PE. The virtual ports are connected to the physical memory port one at a time in a cyclic manner, as shown in Figure 3.2.
The remaining line, CLK_ENABLE, is a control signal that can be used by the PE.

3.1.1 CLK_ENABLE

The name CLK_ENABLE comes from using the signal to enable the clock of the operators connected to the virtual memory ports. The user can do this by ANDing CLK_ENABLE with the real clock or by using CLK_ENABLE as the clock. With this approach, the designer can write the operator using the simple assumption that every time the PE reads data from memory, the data will be available on the next clock cycle. CLK_ENABLE can be used strictly as a clock signal, but it can also be used as a data valid signal. If the designer treats it as a data valid signal, other processing can be done with the current data while waiting for the next data valid signal (as long as the inputs to the memory ports stay constant). The operators can then be designed so that they are free to perform calculations that do not require memory accesses on each clock cycle (in between the enabled cycles). In either case, during the cycle in which CLK_ENABLE is asserted, the operator must take in the data (or put out the next output value) and update the address and read/write line for the next memory port cycle.

3.1.2 MEM_DIR

Each memory port has a MEM_DIR line. This is the read/write line that determines whether that memory port reads or writes on each virtual clock cycle. A logic 1 on MEM_DIR sets the memory port to read (i.e., read data from the specified address and output it on SOURCE). A logic 0 on MEM_DIR sets the memory port to write (i.e., write the value on SINK to the specified address in memory). The source data is registered as it is read from memory. The sink data is not registered.

3.1.3 SOURCE

Each memory port has a source port. SOURCE outputs the data that is read from memory. The source data is registered as it is read from memory.
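The port semantics above can be summarized in a small behavioral model. The sketch below is plain Java, not the thesis's JHDL or VHDL MMP, and the class and method names are hypothetical. It assumes that ports are serviced in order 0 to 7 within one MMP cycle, so a read on a later port sees a write made by an earlier port in the same cycle. It also keeps each SOURCE value registered until that port reads again.

```java
import java.util.Arrays;

/**
 * Hypothetical behavioral model of one MMP cycle (not the thesis code).
 * ASSUMPTIONS: ports are serviced in order 0..7; a write is visible to any
 * later port in the same MMP cycle; SOURCE values are registered and only
 * change when their port performs a read.
 */
class MmpCycleModel {
    static final int PORTS = 8;
    static final int READ = 1;   // MEM_DIR = 1: read memory onto SOURCE
    static final int WRITE = 0;  // MEM_DIR = 0: write SINK into memory

    private final int[] memory;
    private final int[] source = new int[PORTS]; // registered SOURCE outputs

    MmpCycleModel(int words) {
        memory = new int[words];
    }

    int[] memory() {
        return memory;
    }

    /** Runs one MMP cycle; returns the SOURCE values seen when CLK_ENABLE is asserted. */
    int[] cycle(int[] address, int[] memDir, int[] sink) {
        for (int port = 0; port < PORTS; port++) {
            if (memDir[port] == READ) {
                source[port] = memory[address[port]];
            } else {
                memory[address[port]] = sink[port]; // SOURCE for this port is left unchanged
            }
        }
        return Arrays.copyOf(source, PORTS);
    }
}
```

Because SOURCE is registered, a port set to write keeps presenting the last value it read. For that reason the sketch returns the whole SOURCE array rather than only the entries for reading ports.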
3.1.4 SINK

Each memory port has a sink port. Data that is written to memory is sent to SINK. The sink data is not registered, so it must remain constant until the CLK_ENABLE line goes high.

3.1.5 ADDRESS

The address for reading or writing data is applied to the address port. Each memory port has access to the entire memory. Memory addressing schemes are discussed in Section 3.5.

3.1.6 Other Lines

The MMP has four additional lines that connect to the physical memory port of the PE: WRITE_SEL, ADDR, DATA_IN and DATA_OUT. During an MMP cycle, these lines are connected in turn to the lines of each virtual memory port to simulate the existence of eight memory ports.

3.2 Operation of the MMP

An MMP cycle takes 11 clock cycles, numbered 0 to 10. The data on the inputs of each memory port (ADDRESSn, MEM_DIRn, SINKn) must remain constant until the last cycle, when CLK_ENABLE goes high; the inputs may be changed only during that last cycle (while CLK_ENABLE is asserted). On the 11th clock cycle (cycle 10), the MMP asserts CLK_ENABLE to signify that source data is valid and sink data has been written to memory. On the WILDFORCE, a memory read takes two clock cycles. Thus cycle 2 feeds data to port 0 if it was set to read, cycle 3 feeds port 1, cycle 4 feeds port 2, and so on up to cycle 9, which feeds port 7. Cycle 10 is the enable (or data valid) cycle. Figure 3.2 shows when the signals of the physical memory interface (ADDR, WRITE_SEL, DATA_OUT, DATA_IN) are connected to each virtual memory port.
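The schedule above can be written down as a short function. This Java sketch uses hypothetical names. It takes the read latency as a parameter (2 for the WILDFORCE), on the assumption that only the latency changes when the MMP is ported to another platform. Each request for port n is placed on cycle n, which matches Listing 3.1: state "0000" drives ADDRESS0, and state "0010" captures SOURCE0.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative MMP schedule generator (hypothetical, not the thesis code).
 * ASSUMPTION: the same request order applies for any read latency.
 */
class MmpSchedule {
    static final int PORTS = 8;

    /** Issue eight requests, wait for the last read to return, then one enable cycle. */
    static int cyclesPerMmpCycle(int readLatency) {
        return PORTS + readLatency + 1;
    }

    /** One line per clock cycle describing what the physical memory port is doing. */
    static List<String> describe(int readLatency) {
        int total = cyclesPerMmpCycle(readLatency);
        List<String> lines = new ArrayList<>();
        for (int cycle = 0; cycle < total; cycle++) {
            StringBuilder sb = new StringBuilder("cycle " + cycle + ":");
            if (cycle < PORTS) {
                // ADDR, WRITE_SEL and DATA_OUT carry this port's request
                sb.append(" request port ").append(cycle);
            }
            int readPort = cycle - readLatency;
            if (readPort >= 0 && readPort < PORTS) {
                // DATA_IN returns the data requested readLatency cycles earlier
                sb.append(" data_in port ").append(readPort);
            }
            if (cycle == total - 1) {
                sb.append(" CLK_ENABLE");
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

For readLatency = 2 this gives the 11-cycle MMP cycle of Section 3.2:

- requests on cycles 0 to 7;
- returned data for ports 0 to 7 on cycles 2 to 9;
- CLK_ENABLE on cycle 10.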
[Figure 3.2 Timing diagram for the MMP: ADDR, WRITE_SEL, DATA_OUT, DATA_IN, CLK_ENABLE and CLOCK over cycles 0 to 10, showing which of ports 0 to 7 is connected to each physical signal]

3.3 Using the MMP with the Janus Compiler

Since work on compilers was the biggest motivating factor for the MMP, it is important to show that the MMP can be used with a compiler to improve its results. Some work on both the MMP and the Janus tools was necessary to integrate the MMP and achieve the desired results. The Janus compiler can schedule more than one operation per PE if the target CCM architecture has more than one memory port for each PE. The first step is to create a new architecture that has eight memory ports per PE; Janus can then schedule up to eight operations per PE. The MMP can implement this scheduling after some minor changes. When users write a Janus operation, they assume that the operation has exclusive use of a memory port and that its address space starts at location 0. Therefore each operation that Janus assigns to a PE is designed with its address space starting at 0. With the standard version of the MMP, the operators in the same PE would access and overwrite each other's data. To account for this, a modified version of the MMP was created that limits the addresses each memory port can access. For each memory port, the upper address lines are tied to a unique value. Thus each memory port has an address space of 0 to 32767 and cannot access the memory of the other memory ports. The code for this version of the MMP is in Appendix A under janusmmp.java. Chapter 5 has details on how Janus was modified to use the MMP.
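The address restriction can be expressed as a small mapping function. In this Java sketch the names are hypothetical, and the bit positions are an assumption: the port number is placed in the three address bits just above the 15-bit local address (32768 words per port). The operator's own upper bits are discarded, so no port can reach another port's region. The actual tie-off values are in janusmmp.java.

```java
/**
 * Hypothetical model of the Janus MMP address restriction.
 * ASSUMPTION: the port number occupies the three bits directly above a
 * 15-bit local address; the real wiring is defined in janusmmp.java.
 */
class JanusAddressMap {
    static final int LOCAL_BITS = 15;                    // 2^15 = 32768 words per port
    static final int LOCAL_MASK = (1 << LOCAL_BITS) - 1; // 0x7FFF

    /** Physical address placed on ADDR for a given port and operator-local address. */
    static int physicalAddress(int port, int localAddress) {
        if (port < 0 || port > 7) {
            throw new IllegalArgumentException("port must be 0..7");
        }
        // Upper address lines are tied to the port number; only the low
        // 15 bits of the operator's address reach memory.
        return (port << LOCAL_BITS) | (localAddress & LOCAL_MASK);
    }
}
```

Under this assumption:

- location 0 of port 1 maps to physical address 32768;
- an operator that generates address 32768 on port 2 wraps back to the start of port 2's own region instead of reaching port 3.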
3.4 Detailed Multiplexed Memory Port Design

3.4.1 VHDL Version of the MMP

The code for the VHDL MMP is in mem_port.vhd. The VHDL version is built around a case statement, and the active memory port is determined by the PORT_SELECT signal. The first three cases of the case statement are shown in Listing 3.1. Simple if-then statements control the flow of data to and from memory.

case PORT_SELECT is
  when "0000" =>
    CLK_ENABLE <= '0';
    PORT_SELECT <= "0001";
    MEM_ADDRESS <= ADDRESS0;
    MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION0;
    if MEMORY_DIRECTION0 = '0' then
      DATA_TO_MEM <= SINK0;  -- value to write to memory
    end if;
  when "0001" =>
    CLK_ENABLE <= '0';
    PORT_SELECT <= "0010";
    MEM_ADDRESS <= ADDRESS1;
    MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION1;
    if MEMORY_DIRECTION1 = '0' then
      DATA_TO_MEM <= SINK1;  -- value to write to memory
    end if;
  when "0010" =>
    CLK_ENABLE <= '0';
    PORT_SELECT <= "0011";
    MEM_ADDRESS <= ADDRESS2;
    MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION2;
    if MEMORY_DIRECTION2 = '0' then
      DATA_TO_MEM <= SINK2;  -- value to write to memory
    end if;
    if MEMORY_DIRECTION0 = '1' then  -- port 0 read: data requested 2 cycles ago arrives now
      SOURCE0 <= DATA_FROM_MEM;
    end if;
  -- remaining cases omitted
end case;

Listing 3.1 Main VHDL Code for MMP

This structure makes it easy to add more virtual memory ports. For example, the VHDL application in Chapter 5 uses an MMP with 13 virtual memory ports.

3.4.2 JHDL Version of the MMP

JHDL does not support the type of behavioral modeling used to build the VHDL version, so the JHDL MMP must be built as a structural model. The JHDL structural model is not as flexible as the VHDL behavioral model. The JHDL version of the MMP is controlled by a four-bit counter. The counter is wired to reset at 1010 (decimal 10), as shown in Figure 3.3, so that it steps through the clock cycles 0 to 10 of an MMP cycle. The outputs of the counter (C0, C1, C2, C3) drive the logic for the rest of the MMP.
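The counter-driven control can be sketched in plain Java as follows. This is a model, not JHDL code, and its names are hypothetical. It makes four assumptions:

- the counter wraps from 10 back to 0;
- the three low bits C2 C1 C0 select the 8-to-1 muxes;
- SOURCE register n is enabled at count n + 2 when MEM_DIRn is 1;
- CLK_ENABLE is decoded from count 10.

At counts 8 to 10, C2 C1 C0 select ports 0 to 2 again. The sketch does not model how the real design keeps a write from repeating on those counts; that detail is in struct_m.java.

```java
/**
 * Hypothetical cycle model of the JHDL MMP control logic (not JHDL code).
 * ASSUMPTIONS: wrap at 10; SOURCE n enabled at count n + 2 for reads;
 * CLK_ENABLE decoded from count 10.
 */
class JhdlMmpCounter {
    private int count = 0; // value of bits C3..C0

    int count() {
        return count;
    }

    /** Advances one clock cycle: 0, 1, ..., 10, 0, ... */
    void tick() {
        count = (count == 10) ? 0 : count + 1;
    }

    /** Select input of the 8-to-1 muxes on WRITE_SEL, ADDR and DATA_OUT: bits C2 C1 C0. */
    int muxSelect() {
        return count & 0b111;
    }

    /** Enable for SOURCE register `port`: a function of its MEM_DIR line and the count. */
    boolean sourceEnable(int port, int memDir) {
        return memDir == 1 && count == port + 2;
    }

    /** CLK_ENABLE is asserted on the last clock cycle of the MMP cycle. */
    boolean clkEnable() {
        return count == 10;
    }
}
```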
[Figure 3.3 Counter for JHDL Version of MMP: four-bit counter with a reset input and outputs C3, C2, C1, C0]

The DATA_IN line is connected to eight enabled registers, as shown in Figure 3.4.

[Figure 3.4 Schematic of SOURCE Structure: DATA_IN feeds registers SOURCE0 to SOURCE7, each enabled by its own combinational logic block (Port 0 to Port 7)]

A different combinational logic function drives the enable signal of each SOURCE register to achieve the timing shown in Figure 3.2. Each function depends on the read/write line of the associated port and the current value of the counter. The remaining MMP outputs (WRITE_SEL, ADDR, DATA_OUT) are driven by 8-to-1 muxes, as shown in Figure 3.5.

[Figure 3.5 Schematic of WRITE_SEL, ADDR, and DATA_OUT Structure: three 8-to-1 muxes selected by C2 C1 C0, choosing among Mem_Dir0 to Mem_Dir7 (WRITE_SEL), Sink0 to Sink7 (DATA_OUT) and Address0 to Address7 (ADDR)]

The MMP files are in Appendix A. The main code for the JHDL MMP is in struct_m.java. The JHDL MMP that has been modified for use with the Janus compiler is in janusmmp.java.

3.4.3 VHDL MMP vs. JHDL MMP

The VHDL MMP uses behavioral modeling and is more flexible than the JHDL MMP. The number of virtual memory ports in the VHDL version can easily be changed by rearranging some code. This is a simple modification compared to the JHDL version, where changing the number of memory ports requires the designer to create a new structural model. It is also easier to use the VHDL MMP with existing code. VHDL requires that the clocks for synchronous components pass through the component interface, so the CLK_ENABLE output can be routed to the clock input of existing code. This keeps the code from executing until each virtual clock cycle is complete. However, JHDL uses an implied clock, so it cannot be overridden with another signal.
To use CLK_ENABLE as a clock, the JHDL components must have been written with an enable line that controls execution. If such an enable line exists, the JHDL MMP can be used by connecting CLK_ENABLE to it. JHDL components do not have generics like VHDL components, but in JHDL it is easier to make the bit width of the inputs variable. Unlike in VHDL, the user does not need to pass the actual bit width as a generic; the width of the inputs can be left undefined. When the component is simulated or synthesized, the widths of the inputs are fixed for each instance of the component based on the signals passed to it.

3.5 Address Generation for the MMP

While writing sample applications and operators, it became apparent that one of the most common and important tasks is generating the addresses for memory accesses. Several methods that can be used are described below. These are the same methods one might use for address generation in any application that accesses memory. Address generation can be harder in JHDL because one is limited to structural models: the designer does not have the option of for loops or if-then structures, which must instead be built with counters and muxes, respectively.

3.5.1 Address Generation with Constants

The simplest use of the MMP is to leave the address lines constant during the whole operation (temporal partition). Some ports can be used as inputs, with their MEM_DIR lines tied to 1, and other ports can be used as outputs, with their MEM_DIR lines tied to 0. Alternatively, data can be read in on the first cycle, processed, and the results written out to the same addresses that were used to read the data. In this case the designer only needs to change the MEM_DIR lines.

3.5.2 Address Generation with Muxes

If the memory address only alternates between two values, a 2-to-1 multiplexor (mux) is probably the easiest structure to use. The inputs to each mux can be constants. Then some logic is needed to drive the select lines.
This could be as simple as connecting the select line to the clock. For more complicated timing, a counter can count the number of clock cycles that have passed, and a logical function of the count can drive the select line of the mux.

3.5.3 Address Generation with Counters

Some applications need a steady progression of address values. For example, to sum the values in the first 64 locations in memory (addresses 0 to 63) and store the result in address 64, use the output of a counter as the address. On each clock cycle, add the input to a running sum register, and connect the output of the register to a SINK port. Use the inverse of the seventh counter bit to drive the read/write line. This is bit 6 when counting from bit 0, and it first goes high when the count reaches 64. When the counter reaches 64, the MEM_DIR line of the memory port becomes 0 and the value in the sum register is written to location 64. Note that the sum output can always be connected to SINK, since it is only written when the read/write line is set to write.

3.5.4 Address Generation with a Generic Address Generator

A generic address generator component was written in VHDL to make complex address generation easy. The interface is shown in Listing 3.2.
generic (
  PATH_DELAY: INTEGER := 0;   -- number of clock cycles to wait before starting
  BIT_WIDTH:  INTEGER := 22;  -- width of address
  INIT0: INTEGER := 0;        -- initial value of loop 0
  TERM0: INTEGER := 0;        -- final value of loop 0
  INC0:  INTEGER := 0;        -- value to increment each loop 0
  INIT1: INTEGER := 0;        -- initial value of loop 1
  TERM1: INTEGER := 0;        -- final value of loop 1
  INC1:  INTEGER := 0;        -- value to increment each loop 1
  INIT2: INTEGER := 0;        -- initial value of loop 2
  TERM2: INTEGER := 0;        -- final value of loop 2
  INC2:  INTEGER := 0);       -- value to increment each loop 2
port (
  CLK:     in  STD_LOGIC;
  CLR:     in  STD_LOGIC;
  ADDRESS: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
  DONE:    out STD_LOGIC
);

Listing 3.2 Interface for Generic Address Generator

There are three nested loops, I, J, and K:

- I is the outermost loop and uses INIT0, TERM0 and INC0;
- J is the middle loop and uses INIT1, TERM1 and INC1;
- K is the innermost loop and uses INIT2, TERM2 and INC2.

The resulting address is I + J + K. I represents the row index, J represents the column index and K represents a third-dimension index. As an example, consider an image stored in memory as a 20 x 10 matrix, as in Table 3.1.

Table 3.1 A 20 x 10 Matrix

  0    1    2    3    4    5    6    7    8    9
 10   11   12   13   14   15   16   17   18   19
 20   21   22   23   24   25   26   27   28   29
  .    .    .    .    .    .    .    .    .    .
  .    .    .    .    .    .    .    .    .    .
190  191  192  193  194  195  196  197  198  199

The numbers in the table are the addresses of the memory cells. For a row traversal, INIT0 = 0, TERM0 = 200, and INC0 = 10, so I increments from 0 to 190 in steps of 10 (the TERM value itself is not generated). For the J term, INIT1 = 0, TERM1 = 10, and INC1 = 1, so J increments from 0 to 9 in steps of 1. With these values the address generator produces the sequence 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and so on up to 199. For a column traversal, the I and J values are switched: INIT0 = 0, TERM0 = 10, INC0 = 1, INIT1 = 0, TERM1 = 200 and INC1 = 10.
With these values the address generator produces the sequence 0, 10, 20, 30, ..., 190, 1, 11, 21, and so on up to 199. The generic address generator asserts the DONE output when it has completed the sequence. The DONE signal can be used to signal the host that processing is complete.

Chapter 4. WILDFORCE Specifics and Tools

4.1 WILDFORCE Architectural Overview

The various configurations of the MMP in Chapter 3 have been implemented on a configurable computing machine made by Annapolis Micro Systems (AMS) [43]. Figure 4.1 shows the AMS WILDFORCE configurable computing machine.

Figure 4.1 WILDFORCE PCI Board

Figure 4.2 shows a block diagram of the WILDFORCE board. The WILDFORCE board has four PEs plus an additional control PE. Each PE has a memory that can be accessed by the host or the PE. A PE cannot directly access the memories of other PEs. Adjacent PEs are connected through a systolic array, and the crossbar can be used to create data paths between non-adjacent PEs. At the beginning and end of the systolic array (PE1 and PE4) are FIFOs. Each PE also has a mezzanine port (not shown) that can be used for additional memory or other hardware.

[Figure 4.2 Block Diagram of WF: control PE CPE0 with Mem0; PE1 to PE4 with memories Mem1 to Mem4; crossbar; FIFOs Fifo0, Fifo1 and Fifo4]

4.2 PE Memory

Each PE has a single attached memory that can be accessed by the host and the PE. The PE memories are one of the primary paths used to pass data between the host and the PEs. Applications generally write data to the memory for processing by the PE; when the PE completes the processing, the host reads the results from memory. One aspect that complicates writing applications is that when the PE reads a word, it takes two cycles for the data to appear. Figure 4.3 shows a timing diagram for a memory read.
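The two-cycle read latency can be captured in a small clocked model, together with the single-cycle write described in the text that accompanies Figure 4.4. Such a model is handy when checking a design against Figures 4.3 to 4.5. This Java sketch has hypothetical names and makes simplifying assumptions: reads pass through a two-stage delay line, writes take effect in the cycle they are issued, and bus arbitration is ignored.

```java
/**
 * Hypothetical clocked model of WILDFORCE PE memory timing.
 * ASSUMPTIONS: read data appears on the data-in bus two cycles after its
 * address; a write completes in the cycle its address and data are applied;
 * bus arbitration is not modeled.
 */
class WildforceMemory {
    static final int READ_LATENCY = 2;

    private final int[] words;
    // stages[0] is the oldest read in flight; null means no read was issued that cycle.
    private final Integer[] stages = new Integer[READ_LATENCY];

    WildforceMemory(int size) {
        words = new int[size];
    }

    /** Host-side load of a word before the PE clock starts. */
    void preload(int addr, int value) {
        words[addr] = value;
    }

    int peek(int addr) {
        return words[addr];
    }

    /**
     * One PE clock. writeSelN follows PE_MemWriteSel_n: 0 = write, 1 = read.
     * Returns the value on the data-in bus this cycle, or null if no read completes.
     */
    Integer clock(int writeSelN, int addr, int dataOut) {
        Integer dataIn = stages[0];
        for (int i = 0; i < READ_LATENCY - 1; i++) {
            stages[i] = stages[i + 1]; // advance the delay line
        }
        if (writeSelN == 0) {
            words[addr] = dataOut;                   // write: address and data in the same cycle
            stages[READ_LATENCY - 1] = null;
        } else {
            stages[READ_LATENCY - 1] = words[addr];  // read: returned READ_LATENCY cycles later
        }
        return dataIn;
    }
}
```

The pattern in Figure 4.5 can be replayed with this model: read Addr0, write Addr1, read Addr2, then write Addr3 and Addr4. D_In0 appears on the third clock and D_In2 on the fifth.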
[Figure 4.3 Timing Diagram for a Memory Read: PE_Pclk, PE_MemWriteSel_n, PE_MemAddr_OutReg (Addr) and PE_MemData_InReg (Data)]

A memory write is less complicated than a memory read. The address and the data to be written are applied to the memory interface in the same clock cycle. A timing diagram for a memory write is shown in Figure 4.4.

[Figure 4.4 Timing Diagram for a Memory Write: PE_Pclk, PE_MemWriteSel_n, PE_MemAddr_OutReg (Addr) and PE_MemData_OutReg (Data)]

Various combinations of memory reads and writes can be done on consecutive clock cycles. An example of consecutive reads and writes is shown in Figure 4.5.

[Figure 4.5 Timing Diagram for Consecutive Memory Reads and Writes: PE_MemAddr_OutReg carries Addr0 to Addr4; PE_MemData_OutReg carries D_Out1, D_Out3 and D_Out4; PE_MemData_InReg returns D_In0 and D_In2]

4.3 PE Interconnects

CCMs often have additional FPGAs dedicated to creating links between PEs. These FPGAs form a reconfigurable crossbar that can be used to create additional communication channels between the PEs. On the WILDFORCE, without the crossbar, there is only one connection between adjacent PEs. The interconnect and crossbar structure varies between CCMs, which makes it difficult to design an architecture-independent compiler that uses the crossbar. A good compiler should be able to use these interconnects to maximize data flow and minimize the number of reconfigurations required, but it is difficult to do so without making the compiler CCM specific. The Janus framework currently does not allow communication between PEs.

4.4 Multiplexed Memory Port

During work on several versions of the Janus project, it became apparent that the compiler results would be more efficient if the PEs had more memory ports. With this in mind, a multiplexed memory port (MMP) was developed; it is presented in Chapter 3.
Although the MMP has been designed and tested only on the WILDFORCE board, it can easily be modified for other CCMs. To make the MMP work for a particular CCM, connect its memory signals to the specific signals used by that CCM. For the WILDFORCE board, the MMP signals are connected as follows:

- ADDR is connected to PE_MemAddr_OutReg.
- DATA_IN is connected to PE_MemData_InReg.
- DATA_OUT is connected to PE_MemData_OutReg.
- WRITE_SEL is connected to PE_MemWriteSel_n.

The timing for memory reads and writes may also need to be adjusted within the MMP to match the specific timing of the CCM.

4.5 VHDL Skeleton Structure for WILDFORCE Logic Core

In addition to the MMP, an application structure was created to ease the somewhat cumbersome task of creating and synthesizing each spatial partition (one for each PE) for the WILDFORCE board. The user (who may be a compiler) creates a computational element as a VHDL component, assuming that eight memory ports are available. On each clock cycle, a 32-bit word can be written to or read from each port, with the data being valid on the next clock cycle. After the component is tested, the user inserts it into a higher-level VHDL file (which includes the MMP) by mapping the data and address lines to the appropriate signals. The top-level file can then be synthesized into a new partition. The provided structure takes care of all the startup and shutdown conditions.

4.5.1 Hierarchical Structure

When creating a spatial partition (an FPGA-sized computational element) in VHDL for a WILDFORCE PE, a hierarchical structure is used. This structure is shown in Figure 4.6.

Figure 4.6 Hierarchy of VHDL PE:
WILDFORCE PE (VT_PEx.vhd)
  WILDFORCE Logic Core Description (VT_SKLCx.vhd)
    Generic Component with MMP (MMPCOMP.vhd)
      Multiplexed Memory Port
      Synthesizable User Component

The WILDFORCE hardware is abstracted into a structural hierarchy of models.
These files are provided to simplify what the designer or compiler must do to create a working partition, and most of them do not need to be modified. At the top level of the hierarchy is the PE, which represents the FPGA. The PE level contains a logic core component at the next level down. The logic core contains an FSM that controls the execution of the hardware, including startup conditions, reset, memory requests, setting an interrupt to signal the host that the task is complete, and other tasks. The logic core also contains a generic computational component. That component in turn contains the MMP and the user's synthesizable VHDL component, which does the actual data processing for the partition. The user simply needs to map the MMP signals to the signals in their component. The interface of the generic component is shown in Listing 4.1.

port (CLOCK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      DONE: out STD_LOGIC;
      MEM_SIGNAL0: out STD_LOGIC;                        -- used for read/write signal
      MEM_SIGNAL1: out STD_LOGIC_VECTOR (21 downto 0);   -- used for passing address
      MEM_SIGNAL2: out STD_LOGIC_VECTOR (31 downto 0);   -- used for data to memory
      MEM_SIGNAL3: in  STD_LOGIC_VECTOR (31 downto 0);   -- used for data from memory
      DATA_SIGNAL0: in  STD_LOGIC_VECTOR (35 downto 0);  -- used for PE_LEFT_IN
      DATA_SIGNAL1: out STD_LOGIC_VECTOR (35 downto 0);  -- used for PE_LEFT_OUT
      DATA_SIGNAL2: in  STD_LOGIC_VECTOR (35 downto 0);  -- used for PE_RIGHT_IN
      DATA_SIGNAL3: out STD_LOGIC_VECTOR (35 downto 0)   -- used for PE_RIGHT_OUT
     );

Listing 4.1 Interface of Generic VHDL Component

The MEM_SIGNALs and DATA_SIGNALs pass signals up through the hierarchy, connecting the top level of the PE to the WILDFORCE-specific ports. The MEM_SIGNALs carry the address, data from memory, data to memory, and the read/write signal. The DATA_SIGNALs are connected to PE_Left_In, PE_Left_Out, PE_Right_In and PE_Right_Out and can be used for communication between adjacent PEs.
4.5.2 The Logic Core Model

A common procedure has been established for executing partitions within an application. At run time, a WILDFORCE application:

1) loads the WILDFORCE FPGAs with PE images (bit streams),
2) loads the FPGA memories with data to be processed,
3) starts the board clock (enables the clock),
4) waits for an interrupt (Done signal),
5) reads data from the memories, and
6) if another partition is to be used, goes back to step 1.

The Logic Core accomplishes this using an FSM. The FSM is constructed with a CASE statement; the current state is held in a register and updated as the conditions for the next state are met.

4.5.3 Problems with the current VHDL structure

One difficulty with the current configuration is that the memory interface signals must be passed up through the hierarchy from the multiplexed memory port to the top-level design, VT_PEx.vhd. VHDL is very strict about passing all signals through the component interfaces, which is somewhat cumbersome because the signals have to be routed through the entire VHDL PE structure. The signals that are passed up are WILDFORCE specific, but the MMP files should not be. This is handled by assuming that the generic PE files for other CCMs will include generic signals that the MMP can use to transfer the necessary signals. Since all CCMs will probably need this kind of signal passing, this is probably a safe assumption.

4.5.4 Creating the VHDL Computational Component

A library of synthesizable VHDL components has been created to ease the task (for the user or the compiler) of creating a computational element. Each component has a component declaration and a VHDL behavioral model. These can be used to build a VHDL model that links the components together, and this structure can then be simulated. The component declarations are all included in the file janpack.vhd. This file also includes a package required by the square root operator (from Logiblox).
The behavioral (and synthesizable) models are gathered together in the file janbehav.vhd.

4.5.4.1 Building a Structural Model

To make a VHDL component, first include the standard IEEE library.

library IEEE;
use IEEE.std_logic_1164.all;

Next include the component declarations and behavioral models.

library janlib;
use janlib.janpack.all;

Then declare the entity and its input and output ports.

entity USER_COMPONENT is
  port (
    CLOCK:  in  STD_LOGIC;
    CLR:    in  STD_LOGIC;
    A:      in  STD_LOGIC_VECTOR(15 downto 0);
    B:      in  STD_LOGIC_VECTOR(15 downto 0);
    RESULT: out STD_LOGIC_VECTOR(15 downto 0)
  );
end USER_COMPONENT;

Next comes the architecture declaration. It is followed by all the signals that are internal to the structure and interconnect the components; this example needs none.

architecture USER_COMPONENT_arch of USER_COMPONENT is

In the main body, each component instance must be instantiated. Each component has a generic input (bit_width) that determines the bit width of the component inputs. The structure is created by using these instantiations to map the entity inputs, internal signals, and entity outputs. This simple example builds the function RESULT = A nand B.

begin
  nand1 : nandgate
    generic map (bit_width => 16)
    port map (CLOCK, CLR, A, B, RESULT);
end USER_COMPONENT_arch;

The complete file can be found in Appendix A under januslibtest.vhd.

4.5.4.2 Creating a WILDFORCE Application with mmpcomp.vhd

The USER_COMPONENT is incorporated into mmpcomp.vhd to create a WILDFORCE application, and the component interface must be updated in mmpcomp.vhd. Two address generators produce the addresses for reading inputs A and B from memory, and a third produces the addresses for writing the results. The memory direction lines are fixed in this example. The signals from the MMP are mapped to USER_COMPONENT using a port map. The resulting file, mmpcomp_sample.vhd, is in Appendix A.
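Before synthesizing the partition, it can help to have a software reference for what the sample application should leave in memory. The following Java sketch is hypothetical. The base addresses, the element count, and the use of the low 16 bits of each 32-bit word are all assumptions, since the actual address generator settings are only in mmpcomp_sample.vhd.

```java
/**
 * Hypothetical software reference ("golden") model for the nand sample.
 * ASSUMPTIONS: A, B and RESULT occupy contiguous regions starting at
 * aBase, bBase and outBase; the 16-bit operands are the low 16 bits of
 * each 32-bit memory word.
 */
class NandReference {
    static void run(int[] memory, int aBase, int bBase, int outBase, int count) {
        for (int i = 0; i < count; i++) {
            int a = memory[aBase + i] & 0xFFFF;
            int b = memory[bBase + i] & 0xFFFF;
            memory[outBase + i] = ~(a & b) & 0xFFFF; // RESULT = A nand B, 16 bits wide
        }
    }
}
```

Comparing this array against a memory dump from simulation gives a quick check that the address generators and port mappings are correct.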
After mmpcomp.vhd has been modified for the user component, the top-level WILDFORCE PE file, VT_PEx.vhd, can be simulated and synthesized.

4.6 JHDL Skeleton Structure for WILDFORCE Logic Core

The JHDL structure is similar to the VHDL structure. The JHDL version is simpler and more compact because JHDL does not require signals to be explicitly passed through the component interfaces. For example, the memory address signal, PE_MemAddr_OutReg, can be written from any component without being included in the component interface.

4.6.1 Hierarchical Structure

Figure 4.7 shows the structure of a JHDL PE for the WILDFORCE board.

Figure 4.7 Hierarchy of JHDL Structure:
WILDFORCE PE (generic_pe.java)
  Logic Core Description (user_component_MMP.java)
    Multiplexed Memory Port
    Synthesizable User Component

The generic WILDFORCE PE file is generic_pe.java. This is the top-level file that must be synthesized to create a spatial partition. The interface for generic_pe is shown in Listing 4.2.

public class generic_pe extends pelca {
  public static CellInterface[] cell_interface = {
    out("MEM_ADDRESS", 20),
    out("DATA_TO_MEM", 32),
    in("DATA_FROM_MEM", 32),
    in("grant", 1),
    out("request", 1),
    out("strobe", 1),
    out("MEM_WRITE_SEL", 1),
    in("reset", 1),
    out("intreq", 1),
    in("intack", 1)
  };
  // remainder of the class omitted

Listing 4.2 Interface for generic_pe.java

The generic signal names in the interface are mapped to the WILDFORCE-specific signals in the body of generic_pe.java, as shown in Listing 4.3.
Wire MEM_ADDRESS   = port("MEM_ADDRESS", MemAddr(19,0));
Wire DATA_TO_MEM   = port("DATA_TO_MEM", MemDataOut());
Wire DATA_FROM_MEM = port("DATA_FROM_MEM", MemDataIn());
Wire grant         = port("grant", MemBusGrant_n());
Wire request       = port("request", MemBusReq_n());
Wire strobe        = port("strobe", MemStrobe_n());
Wire MEM_WRITE_SEL = port("MEM_WRITE_SEL", MemWriteSel_n());
Wire reset         = port("reset", Reset());
Wire intreq        = port("intreq", InterruptReq_n());
Wire intack        = port("intack", InterruptAck_n());

Listing 4.3 Generic Signals are Mapped to WILDFORCE Specific Signals

The generic_pe.java file includes an FSM that is functionally equivalent to the FSM in the VHDL Logic Core file, but in JHDL the FSM is created differently. The FSM is treated as a component and instantiated in generic_pe.java with the following line:

BusGrantFSM fsm = new BusGrantFSM(this, intack, grant, done, request, enable, intreq);

The file BusGrantFSM.java contains the interface for the FSM; the actual FSM is built from a description in BusGrantFSM.fsm.

4.6.2 JHDL Logic Core Description

The file user_component_MMP.java is a generic file that is used with generic_pe.java. It includes the MMP and the wires needed to use it. After the user component is tested, it can be added to user_component_MMP.java by mapping its source, sink, address, and memory direction lines to the appropriate ports.

Chapter 5. Using the MMP

Chapter 3 and Chapter 4 presented the tools and the framework needed to build temporal and spatial partitions. Building on that basis, Section 5.1 constructs an example application in JHDL and Section 5.2 constructs an example application in VHDL. Section 5.3 shows how the MMP is used to enhance the functionality of a CCM compiler.

5.1 JHDL Example, Radix-4 Butterfly

The Fast Fourier Transform (FFT) is a common operation in Digital Signal Processing (DSP) applications.
When the number of input samples is a power of four (or is padded with zeros to a power of four), a more efficient version of the FFT algorithm can be implemented. It uses a structure built from the component shown in Figure 5.1.

[Figure 5.1 Radix-4 Butterfly: inputs a, b, c, d multiplied by Tw0 to Tw3, with multiplications by -j, -1 and j feeding the sums E, F, G and H]

The boxes in Figure 5.1 represent multiplications and the circles represent sums. The Twn values (Tw0, Tw1, Tw2, Tw3) are commonly referred to as twiddle factors. Their values are determined by the number of samples in the input, and they vary for each butterfly in the FFT structure. For more details on FFT algorithms and twiddle factors see [44]. Figure 5.2 shows the radix-4 butterfly in a simplified component form.

[Figure 5.2 Block Diagram of Radix-4 Component: data inputs a, b, c, d; inputs Twiddle0 to Twiddle3; outputs E, F, G, H]

Since higher-order FFTs use many butterfly stages, each with a different set of twiddle factors, the twiddle factors are not fixed but are left as inputs. The inputs and outputs are assumed to be complex, so each input and output has a real port and an imaginary port. In this implementation, the data is in signed (two's complement) form. Solving for the outputs E, F, G, and H from Figure 5.1 gives the following formulas, where R and I refer to the real and imaginary parts of the complex numbers.

$$
\begin{aligned}
E_R &= (a\,Tw_0)_R + (b\,Tw_1)_R + (c\,Tw_2)_R + (d\,Tw_3)_R \\
E_I &= (a\,Tw_0)_I + (b\,Tw_1)_I + (c\,Tw_2)_I + (d\,Tw_3)_I \\
F_R &= (a\,Tw_0)_R + (b\,Tw_1)_I - (c\,Tw_2)_R - (d\,Tw_3)_I \\
F_I &= (a\,Tw_0)_I - (b\,Tw_1)_R - (c\,Tw_2)_I + (d\,Tw_3)_R \\
G_R &= (a\,Tw_0)_R - (b\,Tw_1)_R + (c\,Tw_2)_R - (d\,Tw_3)_R \\
G_I &= (a\,Tw_0)_I - (b\,Tw_1)_I + (c\,Tw_2)_I - (d\,Tw_3)_I \\
H_R &= (a\,Tw_0)_R - (b\,Tw_1)_I - (c\,Tw_2)_R + (d\,Tw_3)_I \\
H_I &= (a\,Tw_0)_I + (b\,Tw_1)_R - (c\,Tw_2)_I - (d\,Tw_3)_R
\end{aligned}
$$

Figure 5.3 Formulas for Radix-4 Butterfly

The multiplications by -j, -1, and j in Figure 5.1 are implemented by swapping the real and imaginary parts and/or changing signs.
These effects are hard-wired into the butterfly component, resulting in a direct implementation of the equations above.

5.1.1 Radix-4 Butterfly Implementation Details

5.1.1.1 Complex Multiplier

Each multiplication in the formulas of Figure 5.3 is a complex multiplication. An asynchronous complex multiplier component, shown in Figure 5.4, was built to perform it.

[Figure 5.4 Asynchronous, Complex Multiplier: inputs X_Real, X_Imaginary, Y_Real, Y_Imaginary; outputs OutR_High, OutR_Low, OutI_High, OutI_Low]

The complex multiplier uses four signed N-bit real multipliers. It has four inputs, consisting of the real and imaginary parts of two complex numbers, and the input width N is variable. The output is a complex number with 2N bits for the real part and 2N bits for the imaginary part. It is divided into four outputs: the high and low N-bit sections of the real and imaginary parts. Given two complex numbers $x = x_R + jx_I$ and $y = y_R + jy_I$,

$$x \cdot y = (x_R y_R - x_I y_I) + j\,(x_R y_I + x_I y_R)$$

so that

$$Out_R = x_R y_R - x_I y_I, \qquad Out_I = x_R y_I + x_I y_R.$$

Figure 5.5 shows a block diagram of the complex multiplier.

[Figure 5.5 Complex Multiplier Block Diagram: products XR YR, XI YI, XR YI and XI YR combined into OUT R and OUT I]

The code for the complex multiplier is in complex_mult.java, which can be found in Appendix A.

5.1.1.2 Radix-4 Butterfly Component

The code for the radix-4 butterfly component from Figure 5.2 is in butterfly_4.java. It has 17 input ports and eight output ports:

- eight inputs for the real and imaginary parts of the four data samples;
- eight inputs for the real and imaginary parts of the four twiddle factors;
- one ENABLE input;
- eight outputs for the real and imaginary parts of the four results.

The input width is variable; to avoid errors, all data and twiddle inputs must be the same width. The output width is twice the input width.
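A software reference model of the equations in Figure 5.3 is useful for checking simulation outputs such as those in Figures 5.6 to 5.8. The Java sketch below is not butterfly_4.java, and its names are hypothetical. It computes the four complex products directly in 64-bit arithmetic instead of time-multiplexing one multiplier, and, like the hardware, it does not check for overflow.

```java
/**
 * Hypothetical reference model of the radix-4 butterfly (Figure 5.3).
 * Complex values are {real, imag} pairs of signed longs; no overflow checks.
 */
class Radix4Reference {
    /** (xR + j xI)(yR + j yI) = (xR yR - xI yI) + j (xR yI + xI yR) */
    static long[] complexMult(long[] x, long[] y) {
        return new long[] {
            x[0] * y[0] - x[1] * y[1],
            x[0] * y[1] + x[1] * y[0]
        };
    }

    /** Returns {E, F, G, H} for data inputs a..d and twiddle factors tw[0..3]. */
    static long[][] butterfly(long[] a, long[] b, long[] c, long[] d, long[][] tw) {
        long[] p = complexMult(a, tw[0]); // a Tw0
        long[] q = complexMult(b, tw[1]); // b Tw1
        long[] r = complexMult(c, tw[2]); // c Tw2
        long[] s = complexMult(d, tw[3]); // d Tw3
        // E = p + q + r + s
        long[] e = { p[0] + q[0] + r[0] + s[0], p[1] + q[1] + r[1] + s[1] };
        // F = p - j q - r + j s
        long[] f = { p[0] + q[1] - r[0] - s[1], p[1] - q[0] - r[1] + s[0] };
        // G = p - q + r - s
        long[] g = { p[0] - q[0] + r[0] - s[0], p[1] - q[1] + r[1] - s[1] };
        // H = p + j q - r - j s
        long[] h = { p[0] - q[1] - r[0] + s[1], p[1] + q[0] - r[1] - s[0] };
        return new long[][] { e, f, g, h };
    }
}
```

With all twiddle factors equal to 1, the butterfly reduces to a four-point DFT, which gives a simple check. Inputs a = 1 and d = 1, with the others 0, produce E = 2, F = 1 + j, G = 0 and H = 1 - j.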
The component is synchronous even though a clock input is not shown in Figure 5.2. In JHDL the clock is an implied signal that is always available and does not need to be passed explicitly through the component interface.

The radix-4 butterfly requires four complex multiplications, for a total of 16 real multipliers. If the inputs are 32 bits wide, one real multiplier requires 544 CLBs. A full implementation of a 32-bit radix-4 butterfly would therefore require 16 x 544 = 8,704 CLBs. This is more than a single WILDFORCE PE provides (the current boards use Xilinx 4062s with 2304 CLBs each).

For this reason a single complex multiplier is used. It is time multiplexed over four clock cycles to perform the four complex multiplications, so a single butterfly operation takes four clock cycles. With this configuration, a butterfly with inputs up to 23 bits wide can be implemented in a Xilinx 4062. Access to the single complex multiplier is controlled by a two-bit counter, which can be turned on and off using the butterfly's ENABLE input.

An overflow can occur when the results of the four multiplications are added together. To keep this implementation simple, no overflow checks are performed. Figures 5.6, 5.7, and 5.8 show the results of simulating the radix-4 butterfly component by itself.

[Figure 5.6 Simulation of Radix-4 Butterfly, Data Inputs]

[Figure 5.7 Simulation of Radix-4 Butterfly, Twiddle Inputs]

[Figure 5.8 Simulation of Radix-4 Butterfly, Outputs]

5.1.1.3 Radix-4 Butterfly with MMP

Once the radix-4 butterfly has been tested, it can be combined with the MMP to create an operation that can be used on the WILDFORCE board. Up to this point the butterfly component has been written and simulated without any consideration of memory access or WILDFORCE-specific issues. Now the tools from Chapter 3 and Chapter 4 can be used to quickly turn the radix-4 component into a spatial partition for a WILDFORCE PE.
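The area argument and the time-multiplexing scheme of Section 5.1.1.2 can be illustrated with a cycle-level model. In the model, one complex multiplier is reused for four clock cycles under the control of a two-bit counter, and the outputs are formed once all four products have been captured. This is a hypothetical sketch, not butterfly_4.java. The class and method names are illustrative, the inputs are assumed to be plain integers, and overflow is ignored, as in the thesis implementation.

```java
// Hypothetical cycle-level model: one complex multiplier shared over four cycles.
public class TimeMultiplexedButterfly {
    private final long[][] products = new long[4][2]; // registered products A, B, C, D
    private int counter = 0;                          // two-bit counter, 0..3
    private int multiplierUses = 0;                   // how often the single multiplier fired

    // 16 real multipliers of 544 CLBs each (32-bit inputs), figures from the text.
    public static int fullParallelClbs() { return 16 * 544; }

    // One clock edge with ENABLE asserted. Returns {E, F, G, H} on the cycle
    // that captures the fourth product, otherwise null.
    public long[][] clock(long[][] data, long[][] tw) {
        long[] x = data[counter], w = tw[counter];
        products[counter][0] = x[0] * w[0] - x[1] * w[1];
        products[counter][1] = x[0] * w[1] + x[1] * w[0];
        multiplierUses++;
        counter = (counter + 1) & 3;                  // wraps like a two-bit counter
        if (counter != 0) return null;
        long[] A = products[0], B = products[1], C = products[2], D = products[3];
        return new long[][] {
            {A[0] + B[0] + C[0] + D[0], A[1] + B[1] + C[1] + D[1]},  // E
            {A[0] + B[1] - C[0] - D[1], A[1] - B[0] - C[1] + D[0]},  // F
            {A[0] - B[0] + C[0] - D[0], A[1] - B[1] + C[1] - D[1]},  // G
            {A[0] - B[1] - C[0] + D[1], A[1] + B[0] - C[1] - D[0]}}; // H
    }

    public int multiplierUses() { return multiplierUses; }

    public static void main(String[] args) {
        System.out.println("Fully parallel 32-bit butterfly: " + fullParallelClbs()
            + " CLBs vs 2304 in one XC4062");
        TimeMultiplexedButterfly bf = new TimeMultiplexedButterfly();
        long[][] data = {{1, 2}, {3, -1}, {0, 4}, {-2, 1}};
        long[][] tw = {{1, 0}, {1, 0}, {1, 0}, {1, 0}};
        long[][] out = null;
        int cycles = 0;
        while (out == null) { out = bf.clock(data, tw); cycles++; }
        System.out.println("cycles=" + cycles + " E=" + out[0][0] + "+" + out[0][1] + "j");
    }
}
```

The model trades a factor of four in multiplier area for a factor of four in latency, which is the trade made in the thesis design.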
A new file called bfly4_mmp.java is created from the generic file user_component_MMP.java (see the JHDL generic files in Chapter 4 and Appendix A). The new file replaces the generic user component with the radix-4 butterfly and maps the MMP signals to the butterfly component. At this point, the input bit width of the radix-4 component is set by how many bits from each MMP source are mapped to the component inputs. Here, eight bits are used for the real and imaginary parts of each input.

The WILDFORCE memory uses 32-bit words, so the four data inputs and the four twiddle factors are packed into four memory locations. Each of the four outputs is 32 bits wide (counting the real and imaginary parts), so each output is packed into a single memory location. The new file also includes the code required to generate the addresses for the memory ports (see Section 5.1.1.5). The resulting component can be used in generic_pe.java to create a WILDFORCE PE.

5.1.1.4 Radix-4 Butterfly Implemented as WILDFORCE PE

To complete the spatial partition, a synthesizable PE is built by incorporating bfly4_mmp.java from Section 5.1.1.3 into the generic file generic_pe.java. The new file is called generic_pe_bfly4.java. The only change required to the generic file is the name of the user component, which changes from user_component_mmp to bfly4_mmp. The downloadable PE bit stream for the spatial partition is then created by netlisting and synthesizing generic_pe_bfly4.java.

The synthesized PE with an eight-bit radix-4 butterfly takes 650 CLBs. Other bit widths were implemented to find the number of CLBs each one requires and to determine the largest bit width that can be used. The results are shown in Table 5.1. The 30-bit version needs 2942 CLBs, more than the 2304 available in a Xilinx 4062, so 23 bits is the largest of the tested widths that fits.

Table 5.1 Number of CLBs Required for Various Radix-4 Bit Widths

Bit Width | 8   | 16   | 23   | 30
----------|-----|------|------|-----
CLBs      | 650 | 1484 | 2256 | 2942

5.1.1.5 Address Generation for the Radix-4 Butterfly

The bfly4_mmp.java operation uses eight memory locations.
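The memory map of this operation, detailed in the following paragraphs, can be summarized in code. Each port's 2-to-1 multiplexer selects the constant p on the read cycle and p + 4 on the write cycle. Eight-bit complex samples are packed two per 32-bit word, and each 16-bit real and imaginary result pair fills one word. The thesis does not state the bit order inside a word. The layout below (real part above imaginary part, first sample in the upper half) is therefore an assumption, and all names are hypothetical.

```java
// Hypothetical model of the bfly4_mmp memory map; the bit layout is an assumption.
public class Bfly4MemoryMap {

    // Per-port 2-to-1 address mux: constant p when reading, p + 4 when writing.
    public static int address(int port, boolean writeCycle) {
        if (port < 0 || port > 3) throw new IllegalArgumentException("port must be 0..3");
        return writeCycle ? port + 4 : port;
    }

    // Packs two complex samples (8-bit real, 8-bit imaginary each) into one 32-bit word.
    public static int packPair(int re0, int im0, int re1, int im1) {
        return ((re0 & 0xFF) << 24) | ((im0 & 0xFF) << 16) | ((re1 & 0xFF) << 8) | (im1 & 0xFF);
    }

    // Extracts the signed 8-bit field starting at bit 'shift'; the byte cast sign-extends.
    public static int field8(int word, int shift) {
        return (byte) (word >>> shift);
    }

    // Packs a 16-bit real and a 16-bit imaginary result into one 32-bit output word.
    public static int packResult(int re16, int im16) {
        return ((re16 & 0xFFFF) << 16) | (im16 & 0xFFFF);
    }

    public static int resultReal(int word) { return (short) (word >>> 16); }

    public static int resultImag(int word) { return (short) word; }

    public static void main(String[] args) {
        for (int port = 0; port < 4; port++) {
            System.out.println("port " + port + ": read address " + address(port, false)
                + ", write address " + address(port, true));
        }
    }
}
```

Because the two address constants per port are fixed, the only run-time state needed is the read/write select. This is why a single mux per port suffices.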
The four complex inputs are packed into locations 0 and 1, the four complex twiddle factors into locations 2 and 3, and the four complex outputs into locations 4, 5, 6, and 7. A counter counts the CLK_ENABLE pulses that have passed to determine which memory locations to use. On the first cycle the component reads the four complex inputs and the four twiddle factors. On the second cycle the results are written to addresses 4 to 7.

A simple 2-to-1 mux controls the address applied to the address lines of each memory port. Each mux has two constants tied to its inputs, and its select line is driven by a read/write line. For port 0 the mux has the constants zero and four, port 1 has one and five, port 2 has two and six, and port 3 has three and seven. The muxes output the first set of addresses on the first cycle and the second set on the second cycle.

5.1.1.6 Simulation Results

Figure 5.9 shows the simulation results for the WILDFORCE implementation of the radix-4 butterfly. The PE start-up process is completed in the first four steps, and the first MMP cycle starts on step four. The data and twiddle inputs have been read in by step ten. The radix-4 butterfly starts processing on step ten and completes on step 14. The CLK_ENABLE signal is asserted on step 14, and the second MMP cycle begins on step 15. The WRITE_SEL line is negated to write the results to memory locations 4, 5, 6, and 7.

[Figure 5.9 Simulation with WILDFORCE Signals]

5.1.1.7 Notes about Other Implementations

For an FFT that has more than four data samples (requiring more than one radix-4 butterfly), the twiddle factors will have non-integer values. In this case the user can represent the data in a fixed-point format, and no changes need to be made to the hardware.
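The claim that no hardware changes are needed can be seen in a small model. A signed integer multiplier is applied to two eight-bit fixed-point values, each with four fractional bits. It produces a 16-bit product that simply has eight fractional bits, so only the interpretation of the bits changes. The sketch assumes round-to-nearest when encoding values, and the helper names are hypothetical.

```java
// Sketch: an ordinary signed integer multiplier handles fixed-point data unchanged.
public class FixedPointQ44 {
    static final int FRAC_IN = 4;             // Q4.4: four fractional bits per input
    static final int FRAC_OUT = 2 * FRAC_IN;  // the product carries eight fractional bits (Q8.8)

    // Encodes a value as an 8-bit Q4.4 integer, rounding to nearest (range -8.0 to 7.9375).
    public static int toQ44(double v) {
        long raw = Math.round(v * (1 << FRAC_IN));
        if (raw < -128 || raw > 127) throw new IllegalArgumentException("out of Q4.4 range: " + v);
        return (int) raw;
    }

    // Plain integer multiply, as an 8 x 8 signed hardware multiplier would compute it.
    public static int multiply(int a, int b) { return a * b; }

    // Interprets a 16-bit product as Q8.8.
    public static double fromQ88(int p) { return p / (double) (1 << FRAC_OUT); }

    public static void main(String[] args) {
        int a = toQ44(1.5), b = toQ44(-2.25);
        int p = multiply(a, b);
        System.out.println(a + " * " + b + " = " + p + " -> " + fromQ88(p));
        // Adding two Q8.8 products keeps the Q8.8 format.
        System.out.println("sum: " + fromQ88(p + multiply(a, a)));
    }
}
```

Here 1.5 and -2.25 encode as 24 and -36. Their integer product is -864, which read as Q8.8 is -3.375.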
As a concrete case, suppose an eight-bit fixed-point format is used with four integer bits and four bits to the right of the binary point. The complex multiplication then produces 16-bit results with eight integer bits and eight fractional bits. Adding the complex multiplication results does not change the format.

If a floating-point format is required, changes must be made to the butterfly code. JHDL provides a floating-point multiplier and a floating-point adder. The multipliers in the complex multiplier simply need to be replaced with the JHDL floating-point multiplier, and the adders in the butterfly with the JHDL floating-point adder. In this configuration the outputs have the same number of bits as the inputs, which simplifies higher order FFTs.

5.1.2 A 16-Point FFT

A 16-point FFT component was created using the radix-4 butterfly to explore the issues one might face when building higher order FFTs. A 16-point FFT can be built using two stages with four radix-4 butterflies in each stage. The 16 data inputs are connected to the first stage. The outputs of the first stage are connected to the inputs of the second stage. The outputs of the second stage are the outputs of the 16-point FFT component.

The user has many implementation choices to make. Since the multiplications in a butterfly double the number of bits, a two-stage version quadruples it. However, the user can choose to connect less than the full output of the first stage to the second stage. For example, if the input is eight bits with four fractional bits, the user may choose to pass only the integer part of the first-stage result to the second stage.

Two 16-point FFTs were computed using eight-bit inputs for the first stage and only the lower eight bits of the first-stage output as input to the second stage. For the second stage, the twiddle factors were eight bits wide, with seven fractional bits and one sign bit (2's complement format).
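The two-stage structure can be sketched in floating point. The first stage applies four butterflies with unit twiddle factors to the decimated inputs x(q), x(q+4), x(q+8), x(q+12). The second stage combines corresponding first-stage outputs using the twiddle factors W16^(q*k1) = exp(-j*2*pi*q*k1/16). This decimation-in-time arrangement is chosen for illustration, since the thesis does not list its exact wiring or twiddle assignment. The class and method names are hypothetical. The sketch also ignores the word-length choices above, so it should agree with an exact DFT.

```java
// Sketch: 16-point FFT from two stages of four radix-4 butterflies (decimation in time).
public class Fft16TwoStage {

    static double[] mul(double[] p, double[] q) {
        return new double[] {p[0] * q[0] - p[1] * q[1], p[0] * q[1] + p[1] * q[0]};
    }

    // W_n^k = exp(-j 2 pi k / n) as {re, im}.
    static double[] twiddle(int k, int n) {
        double ang = -2 * Math.PI * k / n;
        return new double[] {Math.cos(ang), Math.sin(ang)};
    }

    // Radix-4 butterfly of Figure 5.3; returns {E, F, G, H}.
    static double[][] butterfly(double[][] in, double[][] tw) {
        double[] A = mul(in[0], tw[0]), B = mul(in[1], tw[1]);
        double[] C = mul(in[2], tw[2]), D = mul(in[3], tw[3]);
        return new double[][] {
            {A[0] + B[0] + C[0] + D[0], A[1] + B[1] + C[1] + D[1]},
            {A[0] + B[1] - C[0] - D[1], A[1] - B[0] - C[1] + D[0]},
            {A[0] - B[0] + C[0] - D[0], A[1] - B[1] + C[1] - D[1]},
            {A[0] - B[1] - C[0] + D[1], A[1] + B[0] - C[1] - D[0]}};
    }

    public static double[][] fft16(double[][] x) {
        double[][] unit = {{1, 0}, {1, 0}, {1, 0}, {1, 0}};
        // Stage 1: butterfly q processes x[q], x[q+4], x[q+8], x[q+12] with unit twiddles.
        double[][][] y = new double[4][][];
        for (int q = 0; q < 4; q++) {
            y[q] = butterfly(new double[][] {x[q], x[q + 4], x[q + 8], x[q + 12]}, unit);
        }
        // Stage 2: butterfly k1 combines y[0..3][k1] with twiddles W16^(q*k1).
        double[][] out = new double[16][];
        for (int k1 = 0; k1 < 4; k1++) {
            double[][] tw = new double[4][];
            for (int q = 0; q < 4; q++) tw[q] = twiddle(q * k1, 16);
            double[][] r = butterfly(new double[][] {y[0][k1], y[1][k1], y[2][k1], y[3][k1]}, tw);
            for (int k2 = 0; k2 < 4; k2++) out[k1 + 4 * k2] = r[k2];  // output index k1 + 4*k2
        }
        return out;
    }

    // Direct DFT for comparison.
    public static double[][] dft(double[][] x) {
        int n = x.length;
        double[][] out = new double[n][2];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double[] w = twiddle(t * k, n);
                out[k][0] += x[t][0] * w[0] - x[t][1] * w[1];
                out[k][1] += x[t][0] * w[1] + x[t][1] * w[0];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // First input set of Table 5.2
        double[][] x = {{2, 3}, {3, 4}, {4, 5}, {5, 6}, {6, 7}, {7, 8}, {8, 9}, {9, 10},
                        {10, 11}, {11, 12}, {12, 13}, {13, 14}, {14, 15}, {15, 2}, {2, 3}, {3, 4}};
        double[][] X = fft16(x);
        for (int k = 0; k < 16; k++) System.out.printf("X(%d) = %.4f %+.4fj%n", k, X[k][0], X[k][1]);
    }
}
```

The sketch uses the first input set of Table 5.2. On that input, bins 0, 4, 8, and 12 come out as 124 + 126j, -2, -8 + 6j, and 14 + 12j, which matches the Matlab column.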
The results of the first FFT are shown in Table 5.2. Finite word length effects, caused by using only seven fractional bits, account for the small differences between the JHDL results and the Matlab simulation results.

Table 5.2 Results From First FFT

Index | Input Data 1, x(n) | Matlab Simulation Results | JHDL 16 pt FFT
------|--------------------|---------------------------|------------------------
0     | 2 + 3j             | 124 + 126j                | 123.03125 + 125.015625j
1     | 3 + 4j             | -42.8611 - 11.2297j       | -42.766 - 11.28125j
2     | 4 + 5j             | -3.4142 - 12.5858j        | -3.40625 - 12.46875j
3     | 5 + 6j             | 2.0453 - 1.3847j          | 2.09375 - 1.28175j
4     | 6 + 7j             | -2                        | -1.984375
5     | 7 + 8j             | -0.411 - 3.3666j          | -0.21875 - 3.296875j
6     | 8 + 9j             | 4.3848 - 0.5858j          | 4.59375 - 0.5625j
7     | 9 + 10j            | 1.8359 + 6.5256j          | 1.765625 + 6.53125j
8     | 10 + 11j           | -8 + 6j                   | -7.9375 + 5.953125j
9     | 11 + 12j           | -11.7663 - 5.7408j        | -11.734375 - 5.78125j
10    | 12 + 13j           | -0.5858 - 15.4142j        | -0.5625 - 15.3125j
11    | 13 + 14j           | 14.9252 - 7.9879j         | 14.96875 - 7.76525j
12    | 14 + 15j           | 14 + 12j                  | 13.890625 + 11.90625j
13    | 15 + 2j            | -8.9615 + 20.3372j        | -8.78125 + 20.359375j
14    | 2 + 3j             | -32.3848 - 3.4142j        | -32.375 - 3.40625j
15    | 3 + 4j             | -18.8065 - 61.153j        | -18.828125 - 61.03125j

For the second FFT, the input data values used two fractional bits. As in the first FFT, the lower eight bits of the first-stage output (four fractional bits) were used as input to the second stage. The twiddle factors again had seven fractional bits and one sign bit, so the output has 4 + 7 = 11 fractional bits.
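The eight-bit twiddle format (one sign bit, seven fractional bits) cannot represent +1 exactly; its largest positive value is 127/128. A minimal sketch shows the effect, assuming round-to-nearest with saturation (the thesis does not state the rounding used). In the second stage, bin 0 of a 16-point FFT uses only the twiddle factor W16^0 = 1. Under this assumption, bin 0 is therefore scaled by 127/128: 124 becomes 123.03125 and 126 becomes 125.015625, which agrees with the first JHDL row of Table 5.2. Likewise, 30 becomes 29.765625, consistent with the 29.7656 reported for the second FFT. This is one plausible contribution to the finite word length differences, not a full account of them. The class and method names are hypothetical.

```java
// Sketch: quantizing twiddle factors to eight-bit Q1.7 (sign bit plus seven fractional bits).
// Assumption: round to nearest, then saturate to the representable range [-1, 127/128].
public class TwiddleQ17 {

    public static int toQ17(double v) {
        long raw = Math.round(v * 128);
        return (int) Math.max(-128, Math.min(127, raw));
    }

    public static double fromQ17(int q) { return q / 128.0; }

    // Largest absolute quantization error over the real and imaginary parts of W_n^k, k = 0..n-1.
    public static double worstTwiddleError(int n) {
        double worst = 0;
        for (int k = 0; k < n; k++) {
            double ang = -2 * Math.PI * k / n;
            for (double v : new double[] {Math.cos(ang), Math.sin(ang)}) {
                worst = Math.max(worst, Math.abs(fromQ17(toQ17(v)) - v));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double one = fromQ17(toQ17(1.0));  // +1 saturates to 127/128
        System.out.println("W16^0 quantizes to " + one);
        System.out.println("worst twiddle error for N = 16: " + worstTwiddleError(16));
        System.out.println("bin 0 scaled: " + 124 * one + " + " + 126 * one + "j");
    }
}
```

The worst error is 1/128, set by the saturated +1 terms. Every other twiddle component is within 1/256 of its exact value.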
Table 5.3 Results From Second FFT

Index | Input Data 2, x(n) | Matlab Simulation Results | JHDL 16 pt FFT
------|--------------------|---------------------------|---------------------
0     | 2.5 - 3j           | 30 - 13.5j                | 29.7656 - 13.3945j
1     | 2.25 + 2.5j        | -8.246 - 19.151j          | -8.2422 - 18.9863j
2     | 3.5 + 2j           | 16.798 + 4.766j           | 16.6973 + 4.8477j
3     | 4 - 5.5j           | -0.3 + 1.324j             | -0.2637 + 1.2617j
4     | 3.75               | 9.75 + 5.75j              | 9.6738 + 5.7051j
5     | 4 - 2.25j          | 0.4123 - 11.829j          | 0.4434 - 11.7539j
6     | -6j                | -6.576 - 9.14j            | -6.6582 - 9.3516j
7     | 1.5 + 3j           | -3.268 + 0.539j           | -3.2031 + 0.6152j
8     | 7 + 4j             | -1.5 - 1.5j               | -1.4883 - 1.4883j
9     | 5.5 + j            | -0.057 + 2.11j            | -0.0664 + 2.2246j
10    | 2.75 - 2j          | 12.202 - 13.266j          | 12.0762 - 13.2812j
11    | 0.5 - 3.75j        | -16.468 - 20.062j         | -16.3965 - 20.1016j
12    | -3.25 - 3j         | 1.75 + 1.25j              | 1.7363 + 1.2402j
13    | -5 - 2.25j         | 1.891 - 27.131j           | 1.9121 - 27.0469j
14    | -2 + 0.5j          | 13.576 + 33.64j           | 13.6035 + 33.6602j
15    | 3 + 1.25j          | -9.965 + 18.2j            | -9.9023 + 18.2246j

5.2 Linear Filter

The linear filter is used for image processing. It creates a new output image based on an input image [45]. The pixels in the output image are calculated using a 3 x 3 template. Each pixel in a 3 x 3 square of the input image is multiplied by a factor, and the sum of the nine products is used as the output pixel at the center position, as in Figure 5.10. Depending on the multiplication factors used, several useful functions can be performed.

[Figure 5.10 Linear Filter Operation: the 3 x 3 square of input-image pixels A B C / D E F / G H I produces the output-image pixel at the position of E]

The basic component is the 3 x 3 linear filter. The code for the filter is in lin_filt.vhd in Appendix A. The linear filter component has 18 inputs and one output. Nine inputs are for the multiplication factors, which are four-bit values in 2's complement form, allowing positive and negative factors. The remaining nine inputs are the values in the square of input pixels. The output is:

Output = A*MA + B*MB + C*MC + D*MD + E*ME + F*MF + G*MG + H*MH + I*MI

A color image has three values for each pixel, each representing the strength of one of the pixel colors (red, blue, or green). Each color value ranges from 0 to 255.
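The per-pixel computation for one color channel can be sketched as follows. This is an illustrative software model, not lin_filt.vhd. The factors are plain Java ints, so the four-bit factor width is not modelled. Results are not clipped back to the 0 to 255 range, and border pixels are simply left at zero. The example uses the factors shown with Figure 5.11. The class and method names are hypothetical.

```java
// Illustrative model of the 3 x 3 linear filter for one color channel.
public class LinearFilter3x3 {

    // m holds the factors MA..MI in row order; returns A*MA + B*MB + ... + I*MI
    // for the 3 x 3 square centered on (row, col).
    public static int pixel(int[][] img, int row, int col, int[] m) {
        int sum = 0, i = 0;
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                sum += img[row + dr][col + dc] * m[i++];
            }
        }
        return sum;
    }

    // Filters every interior pixel; border pixels stay 0 in this sketch.
    public static int[][] apply(int[][] img, int[] m) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int r = 1; r < h - 1; r++) {
            for (int c = 1; c < w - 1; c++) {
                out[r][c] = pixel(img, r, c, m);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] edge = {-1, -1, -1, -1, 8, -1, -1, -1, -1};  // factors shown with Figure 5.11
        int[][] img = {
            {10, 10, 10, 10},
            {10, 10, 10, 10},
            {10, 10, 200, 10},
            {10, 10, 10, 10}};
        for (int[] row : apply(img, edge)) System.out.println(java.util.Arrays.toString(row));
    }
}
```

With these factors, a uniform region produces 0 and an isolated bright pixel produces a large response. This is the edge-enhancing behaviour illustrated in Figure 5.11. A color image runs three such filters, one per color.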
A separate filter is needed to process each color: the nine red values, the nine blue values, and the nine green values. The file lin_full.vhd is a modified version of the generic file mmpcomp.vhd from Chapter 4. It changes the standard eight-port MMP to a 13-port MMP, adds three lin_filt components (one for each color), and configures the generic address generators for the memory. It also contains some simple logic for packing and unpacking the data from the 32-bit memory words.

Without the MMP, the designer would have to build a complex memory interface. That interface would read the multiplication factors and pixel values, signal to the filters that the data is ready to be processed, and then write the result to the correct memory location. The designer would also need to fully understand the subtleties of the WILDFORCE memory interface. Using the 13-port MMP greatly simplifies the creation of the application.

To create a synthesizable PE, a new file called vt_lflc1.vhd is made by modifying the generic file vt_sklc1.vhd. The only change required is setting the name of the user component to lin_full. The new file can then be synthesized into a downloadable PE image. Figure 5.11 shows an original image and the same image after it has been processed with the linear filter, using the multiplication factors listed with the figure.

[Figure 5.11 Sample of Linear Filter Application; multiplication factors:
-1 -1 -1
-1  8 -1
-1 -1 -1]

5.3 Incorporation of the MMP into Janus Compiler

The development of the MMP was driven by research on CCM compilers. Section 3.3 covered the changes necessary in the MMP to incorporate it into the Janus compiler. This section covers the changes required within the Janus tool set to complete the incorporation. The Janus compiler limits the number of operations that will fit in a PE to the number of memory ports in that PE. For example, on the WILDFORCE board each PE has one memory port, so only one operation can be assigned to each PE.
One of the motivating factors for building the MMP was to extend the number of operations that the compiler can assign to a PE, and this goal has been achieved.

5.3.1 New WILDFORCE Architecture Description

The Janus compiler is designed to be architecture independent. One way this is achieved is by keeping the architecture-specific information separate from the tools. A CCM architecture is described by two files. One describes the characteristics of the PEs on the board, and the other describes the implementation of the PEs on the board. A new architecture was needed to use the MMP; its details are given below.

5.3.1.1 New WILDFORCE Element

A new description for a WILDFORCE PE, called W4_XC4062_MMP, was created. The only difference between it and the standard WILDFORCE PE is the number of memory ports. When Janus reads the new PE description, it schedules the application with the limitation that the WILDFORCE PE has eight memory ports.

public class W4_XC4062_MMP extends ProcessingElement {
    public W4_XC4062_MMP(String n) {
        super(n);
    }

    protected final int MaxLogic = 2304;
    protected final int MaxMemPort = 8;  // Modified for MMP

    public int getMaxLogic() { return MaxLogic; }

    public int getMaxMemPort() { return MaxMemPort; }

    public Stack getMemPorts() {
        Stack v = new Stack();
        v.push("mem7");
        v.push("mem6");
        v.push("mem5");
        v.push("mem4");
        v.push("mem3");
        v.push("mem2");
        v.push("mem1");
        v.push("mem0");
        return v;
    }
}

Listing 5.1 New WILDFORCE PE Description

The MaxMemPort value was changed to 8, and seven additional memory ports were added to the stack, which holds the available memory ports for the PE.

5.3.1.2 New WILDFORCE Platform

A new WILDFORCE platform file called WildForceXL_4062_MMP was created to use the MMP. It extends the WildForceXL_4062 class and replaces the standard WILDFORCE PEs with the new W4_XC4062_MMP PEs. It also overrides the standard load and retrieve memory functions.
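In Listing 5.1 the ports are pushed in reverse order, mem7 first. Because `java.util.Stack` is last-in, first-out, code that pops ports from this stack receives mem0 first, then mem1, and so on. Whether Janus actually consumes the stack by popping is an assumption here. The sketch below only demonstrates the stack order, using the same push sequence as the listing (with a generic `Stack<String>` in place of the raw type), and its class name is hypothetical.

```java
import java.util.Stack;

// Sketch: the order in which ports pushed as in Listing 5.1 come back off the stack.
public class MemPortStackOrder {

    public static Stack<String> memPorts() {
        Stack<String> v = new Stack<>();
        for (int p = 7; p >= 0; p--) v.push("mem" + p);  // same order as Listing 5.1: mem7 first
        return v;
    }

    public static void main(String[] args) {
        Stack<String> v = memPorts();
        System.out.println("top of stack: " + v.peek());
        StringBuilder order = new StringBuilder();
        while (!v.isEmpty()) order.append(v.pop()).append(' ');
        System.out.println("pop order: " + order.toString().trim());
    }
}
```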
New memory functions were required because of the way Janus operations are written. Each operation is written with the assumption that it has access to all of memory starting at address 0, so two operations combined on one PE may overwrite each other's data. To prevent this, the MMP hardware divides the address space into eight blocks, each with addresses 0 to 32767. The new memory functions shown below add an offset to the target memory address based on which memory port is reading or writing.

On the host side, suppose the operation requests that its data be placed starting at address 0. The load memory function adds the appropriate offset before loading the data into memory. On the hardware side, the MMP adds the same offset, so when the operation requests data from address 0 it actually reads from its own block of memory. As a result, an operation cannot read or write another operation's memory block.

public void loadMemory(String pe, String mem, MemoryImage image) {
    // parse out "board#pe#" to node and pe ids
    int nodeID = this.getNodeID(pe);
    int peID = this.getPEID(pe);
    int offset = 32768 * this.getMemTarget(mem);  // 2^15 * mem port number
    int target = offset + image.getTarget();
    int buf[] = image.getData();
    int start = image.getStart();
    int len = image.getLength();
}

public void retrieveMemory(String pe, String mem, MemoryImage image) {
    // parse out "board#pe#" to node and pe ids
    int nodeID = this.getNodeID(pe);
    int peID = this.getPEID(pe);
    int offset = 32768 * this.getMemTarget(mem);  // 2^15 * mem port number
    int target = offset + image.getTarget();
    int buf[] = image.getData();
    int start = image.getStart();
    int len = image.getLength();
}

Listing 5.2 New Load and Retrieve Memory Functions

With these two classes the user can run Janus and choose the new platform, WildForceXL_4062_MMP.
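The block addressing used by the new load and retrieve functions can be sketched as a single function. The physical address is 32768 times the port number plus the operation's own address. Because 32768 = 2^15, this is the same as placing the port number in address bits 15 and above. The range checks are an illustrative addition that Listing 5.2 does not show, and the class name is hypothetical.

```java
// Sketch of the per-port memory blocks used by loadMemory/retrieveMemory (Listing 5.2).
public class MmpAddressBlocks {
    public static final int BLOCK_SIZE = 32768;  // 2^15 words per port, as in Listing 5.2
    public static final int PORTS = 8;

    // Physical address = block offset of the port + the operation's own address.
    // The range checks are illustrative; they keep an operation inside its own block.
    public static int physicalAddress(int port, int localAddress) {
        if (port < 0 || port >= PORTS) throw new IllegalArgumentException("port must be 0..7");
        if (localAddress < 0 || localAddress >= BLOCK_SIZE) {
            throw new IllegalArgumentException("address must be 0..32767");
        }
        return BLOCK_SIZE * port + localAddress;
    }

    public static void main(String[] args) {
        for (int port = 0; port < PORTS; port++) {
            int lo = physicalAddress(port, 0), hi = physicalAddress(port, BLOCK_SIZE - 1);
            // The offset is a multiple of 2^15, so it equals putting the port number in bits 15 and up.
            boolean sameAsShift = lo == (port << 15) && hi == ((port << 15) | (BLOCK_SIZE - 1));
            System.out.printf("port %d: %d..%d (matches port << 15: %b)%n", port, lo, hi, sameAsShift);
        }
    }
}
```

The eight blocks together span 2^18 words, well within the 22-bit address bus of the MMP.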
Janus then uses the limitation of eight memory ports rather than one and creates the scheduling for the new PEs. A sample application was built to test this behavior, and Janus handled it correctly.

5.3.2 Expanding the Capabilities of Janus

The Janus tools were limited to Unordered stages with a single operation type. This limitation was useful for creating a first working set of tools but is not inherent to the system. Since the MMP expands the number of operations that can fit on a WILDFORCE PE, it makes it possible to eliminate the restriction of one operation type per Unordered stage. The restriction is used only in the scheduler, and Janus was designed so that new schedulers could be written later and substituted for existing ones with minimal effort.

A new scheduler without the one-operation-type restriction was written: MultiOpScheduler.java, which can replace the existing scheduler, BasicScheduler.java. When used with the MMP, it extends the number of operations per Unordered stage to 32. The MultiOp Scheduler correctly places operations of varying types and combinations on the WILDFORCE PEs and correctly creates the run-time scheduling for the application.

Another feature of the Janus compiler that required some work to retain was the automatic generation of a netlist for each PE. To retain it, a new file called DynamicPE_MMP.java was created that includes the janusmmp class. This feature has also been successfully implemented with the MMP.

Chapter 6. Results

Several aspects of creating spatial and temporal partitions, both manually and automatically, have been examined. A multiplexed memory port and other structures have been created in JHDL and VHDL to address some of the existing problems. The MMP has also been incorporated into a CCM compiler.

6.1 Area and Performance Penalties

While the MMP makes it easier for humans and compilers to create partitions, there is a price paid in area and performance.
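The cycle cost of the MMP can be traced through the state machine in mem_port.vhd (Appendix A). The following sketch steps through one MMP cycle and counts the cycles that issue a memory address. It is a simplified software model of that state sequence with hypothetical names, and it does not model the data or the R/W line.

```java
// Simplified model of the mem_port.vhd state sequence for one MMP cycle.
public class MmpCycleCount {

    // Returns {total cycles, cycles that issue a memory address}.
    public static int[] countCycles() {
        int portSelect = 0, delay = 0, cycles = 0, addressCycles = 0;
        boolean clkEnable = false;
        while (!clkEnable) {
            cycles++;
            if (portSelect < 7) {          // states "0000".."0110": address for port portSelect
                addressCycles++;
                portSelect++;
            } else if (delay == 0) {       // state "0111", DELAY = 0: address for port 7
                addressCycles++;
                delay = 1;
            } else if (delay < 3) {        // DELAY = 1, 2: only capture read data for ports 6 and 7
                delay++;
            } else {                       // DELAY = 3: assert CLK_ENABLE, return to "0000"
                clkEnable = true;
            }
        }
        return new int[] {cycles, addressCycles};
    }

    public static void main(String[] args) {
        int[] c = countCycles();
        int idle = c[0] - c[1];
        System.out.printf("%d cycles, %d issue an address, %d do not (%.1f%%)%n",
            c[0], c[1], idle, 100.0 * idle / c[0]);
    }
}
```

The model gives 11 cycles per MMP cycle, 8 of which issue an address. That leaves 3, about 27.3%, that do not access memory.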
The MMP requires a three-cycle delay that would not be necessary with customized memory interface schemes. For an eight-port MMP this means that three out of every eleven cycles (27.3%) do not access memory. The results of implementing the MMP in a Xilinx 4062 show its speed and area. The Xilinx tools report that the MMP alone can run at 16 MHz. An eight-port MMP uses 237 CLBs, and a 13-port MMP uses 342 CLBs.

6.2 Incorporation into Janus Compiler

The MMP has been successfully incorporated into the Janus CCM compiler. The compiler can now achieve more efficient results, and application designers are no longer restricted to one type of operation in each Unordered stage.

Chapter 7. Future Work

The research presented in this thesis opens up many areas of further work.

7.1 Improved MMP

7.1.1 More Parameters

The standard MMP is fixed at eight memory ports. Ideally the designer would be able to specify the number of memory ports needed. It may also be helpful to let the designer specify the bit width of the ports. If the designer only needs eight-bit memory ports, there may be a way to turn one 32-bit memory port into four eight-bit memory ports.

7.1.2 Improved Efficiency

All existing versions of the MMP can waste clock cycles on each MMP cycle. For the eight-port MMP, during cycles 8 and 9 the MMP is waiting for any data that may have been read by Port 6 or Port 7. These extra cycles could be skipped if neither Port 6 nor Port 7 reads from memory.

7.2 Task Sharing

It may be possible for the MMP to control task sharing, with access to the physical memory port arbitrated much as an operating system arbitrates access to a processor. Operations could have a memory port request signal that they assert when they need to access memory.
This may be the situation in a desktop workstation where two different programs simultaneously have operations implemented in the reconfigurable hardware. 62 7.3 Better Generic Address Generator The generic address generator could be improved by adding a looping option. In some cases the user may want the address to repeatedly go through a sequence of memory locations a fixed number of times. An additional generic value could be added that lets the user specify how many times to loop through the given sequence. 7.4 Smarter Janus Memory Tools Currently, if there are eight operations on a PE, the data for each operation is written to the memory and read from the memory by the host as separate operations. A small gain in speed may be realized by combining these eight operations into a single fetch or retire memory operation. 7.5 Further Improvements to the Janus Scheduler The new MultiOp Scheduler does not work for the case that there are more types of operations than there are available memory ports in a single Unordered stage. To allow this case, a reconfiguration of the hardware must be performed at some point within the Unordered stage. The current schedulers do not have the ability to schedule a reconfiguration during an Unordered stage. 63 Appendix A. 
Source Code A.1 Multiplexed Memory Port Source Files A.1.1 VHDL MMP Source Files A.1.1.1 mem_port.vhd library IEEE; use IEEE.STD_LOGIC_1164.all; entity MEM_PORT is generic ( PATH_DELAY: INTEGER := 0; -- delay until valid data (this is not implemented yet) --(# clk cycles to wait before writing or reading from memory) DATA_WIDTH: INTEGER := 32; -- width of data vectors ADDRESS_WIDTH: INTEGER := 22; -- width of address bus MEMORY_PORTS: INTEGER := 8 -- number of memory ports (currently ignored (fixed at 8)) ); port ( CLK: in STD_LOGIC; CLR: in STD_LOGIC; CLK_ENABLE : out STD_LOGIC; MEM_PORT_WRITE_SEL_N: out STD_LOGIC; -- conveys R/W signal up to skeleton graph MEM_ADDRESS: out STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); --address for memory read or write DATA_FROM_MEM: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); --data coming from memory DATA_TO_MEM: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); -- data going to memory MEMORY_DIRECTION0: in STD_LOGIC; MEMORY_DIRECTION1: in STD_LOGIC; MEMORY_DIRECTION2: in STD_LOGIC; MEMORY_DIRECTION3: in STD_LOGIC; MEMORY_DIRECTION4: in STD_LOGIC; MEMORY_DIRECTION5: in STD_LOGIC; MEMORY_DIRECTION6: in STD_LOGIC; MEMORY_DIRECTION7: in STD_LOGIC; ADDRESS0: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS1: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS2: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS3: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS4: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS5: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS6: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS7: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); SINK0: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK1: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK2: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK3: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK4: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK5: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK6: in STD_LOGIC_VECTOR(DATA_WIDTH-1 
downto 0); SINK7: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE0: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE1: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE2: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE3: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); 64 SOURCE4: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE5: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE6: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE7: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0)); end MEM_PORT; architecture MEM_PORT_ARCH of MEM_PORT is signal PORT_SELECT : STD_LOGIC_VECTOR(3 downto 0); signal DELAY : integer; begin process(CLR, CLK) variable FIRST_TIME : integer; begin if CLR = '0' then MEM_PORT_WRITE_SEL_N <= '1'; -- Read Memory CLK_ENABLE <= '0'; PORT_SELECT <= "0000"; DELAY <= 0; elsif CLK'EVENT and CLK = '0' then case PORT_SELECT is when "0000" => CLK_ENABLE <= '0'; PORT_SELECT <= "0001"; MEM_ADDRESS <= ADDRESS0; MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION0; if MEMORY_DIRECTION0 = '0' then DATA_TO_MEM <= SINK0; -- value to write to memory end if; when "0001" => CLK_ENABLE <= '0'; PORT_SELECT <= "0010"; MEM_ADDRESS <= ADDRESS1; MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION1; if MEMORY_DIRECTION1 = '0' then DATA_TO_MEM <= SINK1; -- value to write to memory end if; when "0010" => CLK_ENABLE <= '0'; PORT_SELECT <= "0011"; MEM_ADDRESS <= ADDRESS2; MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION2; if MEMORY_DIRECTION2 = '0' then DATA_TO_MEM <= SINK2; -- value to write to memory end if; if MEMORY_DIRECTION0 = '1' then -- read port 0 data if necessary from 2 cycles ago SOURCE0 <= DATA_FROM_MEM; end if; when "0011" => CLK_ENABLE <= '0'; PORT_SELECT <= "0100"; MEM_ADDRESS <= ADDRESS3; MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION3; if MEMORY_DIRECTION3 = '0' then DATA_TO_MEM <= SINK3; -- value to write to memory end if; if MEMORY_DIRECTION1 = '1' then -- read port 1 data if necessary from 2 cycles ago SOURCE1 <= DATA_FROM_MEM; end if; when "0100" => CLK_ENABLE <= '0'; 
PORT_SELECT <= "0101"; MEM_ADDRESS <= ADDRESS4; MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION4; if MEMORY_DIRECTION4 = '0' then DATA_TO_MEM <= SINK4; -- value to write to memory end if; if MEMORY_DIRECTION2 = '1' then -- read port 2 data if necessary from 2 cycles ago 65 SOURCE2 <= DATA_FROM_MEM; end if; when "0101" => CLK_ENABLE <= '0'; PORT_SELECT <= "0110"; MEM_ADDRESS <= ADDRESS5; MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION5; if MEMORY_DIRECTION5 = '0' then DATA_TO_MEM <= SINK5; -- value to write to memory end if; if MEMORY_DIRECTION3 = '1' then -- read port 3 data if necessary from 2 cycles ago SOURCE3 <= DATA_FROM_MEM; end if; when "0110" => CLK_ENABLE <= '0'; PORT_SELECT <= "0111"; MEM_ADDRESS <= ADDRESS6; MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION6; if MEMORY_DIRECTION6 = '0' then DATA_TO_MEM <= SINK6; -- value to write to memory end if; if MEMORY_DIRECTION4 = '1' then -- read port 4 data if necessary from 2 cycles ago SOURCE4 <= DATA_FROM_MEM; end if; when "0111" => -- This cycle actually takes 3 cycles, waiting for previous data if DELAY = 0 then -- write port 7 data and/or read port 5 data CLK_ENABLE <= '0'; PORT_SELECT <= "0111"; MEM_ADDRESS <= ADDRESS7; MEM_PORT_WRITE_SEL_N <= MEMORY_DIRECTION7; if MEMORY_DIRECTION7 = '0' then DATA_TO_MEM <= SINK7; -- value to write to memory end if; if MEMORY_DIRECTION5 = '1' then -- read port 5 data if necessary from 2 cycles ago SOURCE5 <= DATA_FROM_MEM; end if; DELAY <= 1; elsif DELAY = 1 then -- read port 6 data if necessary CLK_ENABLE <= '0'; PORT_SELECT <= "0111"; DELAY <= 2; if MEMORY_DIRECTION6 = '1' then -- read port 6 data if necessary from 2 cycles ago SOURCE6 <= DATA_FROM_MEM; end if; elsif DELAY = 2 then -- read port 7 data if necessary CLK_ENABLE <= '0'; DELAY <= 3; PORT_SELECT <= "0111"; if MEMORY_DIRECTION7 = '1' then -- read port 7 data if necessary from 2 cycles ago SOURCE7 <= DATA_FROM_MEM; end if; else CLK_ENABLE <= '1'; -- turn on clock pulse (all data has been read) PORT_SELECT <= "0000"; DELAY <= 0; end 
if; when others => PORT_SELECT <= "0000"; MEM_PORT_WRITE_SEL_N <= '1'; -- Read Memory end case; end if; end process; end MEM_PORT_ARCH; 66 A.1.1.2 mem_port_tb.vhd library IEEE; use IEEE.std_logic_1164.all; use IEEE.std_logic_arith.all; entity MEM_PORT_TB is end MEM_PORT_TB; architecture MEM_PORT_TB_ARCH of MEM_PORT_TB is component MEM_PORT generic ( PATH_DELAY: INTEGER := 0; -- delay until valid data --(# clk cycles to wait before writing or reading from memory) --(this is not implemented yet) DATA_WIDTH: INTEGER := 32; -- width of data vectors ADDRESS_WIDTH: INTEGER := 22; -- width of address bus MEMORY_PORTS: INTEGER := 8 -- number of memory ports (currently ignored (fixed at 8)) ); port ( CLK: in STD_LOGIC; CLR: in STD_LOGIC; CLK_ENABLE : out STD_LOGIC; MEM_PORT_WRITE_SEL_N: out STD_LOGIC; -- conveys R/W signal up to skeleton graph MEM_ADDRESS: out STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); --address for memory read or write DATA_FROM_MEM: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); --data coming from memory DATA_TO_MEM: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); -- data going to memory MEMORY_DIRECTION0: in STD_LOGIC; MEMORY_DIRECTION1: in STD_LOGIC; MEMORY_DIRECTION2: in STD_LOGIC; MEMORY_DIRECTION3: in STD_LOGIC; MEMORY_DIRECTION4: in STD_LOGIC; MEMORY_DIRECTION5: in STD_LOGIC; MEMORY_DIRECTION6: in STD_LOGIC; MEMORY_DIRECTION7: in STD_LOGIC; ADDRESS0: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS1: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS2: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS3: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS4: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS5: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS6: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); ADDRESS7: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); SINK0: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK1: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK2: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK3: in 
STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK4: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK5: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK6: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SINK7: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE0: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE1: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE2: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE3: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE4: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE5: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE6: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); SOURCE7: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0)); end component; 67 signal CLOCK: STD_LOGIC; signal CLR: STD_LOGIC; signal CLK_ENABLE: STD_LOGIC; signal MEM_PORT_WRITE_SEL_N: STD_LOGIC; signal MEM_ADDRESS: STD_LOGIC_VECTOR(21 downto 0); signal DATA_FROM_MEM: STD_LOGIC_VECTOR(31 downto 0); signal DATA_TO_MEM: STD_LOGIC_VECTOR(31 downto 0); signal MEMORY_DIRECTION0, MEMORY_DIRECTION1, MEMORY_DIRECTION2, MEMORY_DIRECTION3, MEMORY_DIRECTION4, MEMORY_DIRECTION5, MEMORY_DIRECTION6, MEMORY_DIRECTION7: STD_LOGIC; signal ADDRESS0, ADDRESS1, ADDRESS2, ADDRESS3, ADDRESS4, ADDRESS5, ADDRESS6, ADDRESS7: STD_LOGIC_VECTOR(21 downto 0); signal SINK0, SINK1, SINK2, SINK3, SINK4, SINK5, SINK6, SINK7: STD_LOGIC_VECTOR(31 downto 0); signal SOURCE0, SOURCE1, SOURCE2, SOURCE3, SOURCE4, SOURCE5, SOURCE6, SOURCE7: STD_LOGIC_VECTOR(31 downto 0); begin mem_port1 : MEM_PORT port map (CLK => CLOCK, CLR => CLR, CLK_ENABLE => CLK_ENABLE, MEM_PORT_WRITE_SEL_N => MEM_PORT_WRITE_SEL_N, MEM_ADDRESS => MEM_ADDRESS, DATA_FROM_MEM => DATA_FROM_MEM, DATA_TO_MEM => DATA_TO_MEM, MEMORY_DIRECTION0 => MEMORY_DIRECTION0, MEMORY_DIRECTION1 => MEMORY_DIRECTION1, MEMORY_DIRECTION2 => MEMORY_DIRECTION2, MEMORY_DIRECTION3 => MEMORY_DIRECTION3, MEMORY_DIRECTION4 => MEMORY_DIRECTION4, MEMORY_DIRECTION5 => MEMORY_DIRECTION5, MEMORY_DIRECTION6 => MEMORY_DIRECTION6, MEMORY_DIRECTION7 => 
MEMORY_DIRECTION7,
    ADDRESS0 => ADDRESS0, ADDRESS1 => ADDRESS1, ADDRESS2 => ADDRESS2, ADDRESS3 => ADDRESS3,
    ADDRESS4 => ADDRESS4, ADDRESS5 => ADDRESS5, ADDRESS6 => ADDRESS6, ADDRESS7 => ADDRESS7,
    SINK0 => SINK0, SINK1 => SINK1, SINK2 => SINK2, SINK3 => SINK3,
    SINK4 => SINK4, SINK5 => SINK5, SINK6 => SINK6, SINK7 => SINK7,
    SOURCE0 => SOURCE0, SOURCE1 => SOURCE1, SOURCE2 => SOURCE2, SOURCE3 => SOURCE3,
    SOURCE4 => SOURCE4, SOURCE5 => SOURCE5, SOURCE6 => SOURCE6, SOURCE7 => SOURCE7);

  CLR <= '0', '1' after 40 ns;

  process (CLOCK, CLR, CLK_ENABLE)
    variable CYCLES: INTEGER := 0;
  begin
    if CLR = '0' then
      CLOCK <= '0';
      DATA_FROM_MEM <= "00000000000000000000000000000001";
      MEMORY_DIRECTION0 <= '0';
      MEMORY_DIRECTION1 <= '0';
      MEMORY_DIRECTION2 <= '0';
      MEMORY_DIRECTION3 <= '0';
      MEMORY_DIRECTION4 <= '0';
      MEMORY_DIRECTION5 <= '0';
      MEMORY_DIRECTION6 <= '0';
      MEMORY_DIRECTION7 <= '0';
      ADDRESS0 <= (others => '0');
      ADDRESS1 <= (others => '0');
      ADDRESS2 <= (others => '0');
      ADDRESS3 <= (others => '0');
      ADDRESS4 <= (others => '0');
      ADDRESS5 <= (others => '0');
      ADDRESS6 <= (others => '0');
      ADDRESS7 <= (others => '0');
      SINK0 <= (others => '0');
      SINK1 <= (others => '0');
      SINK2 <= (others => '0');
      SINK3 <= (others => '0');
      SINK4 <= (others => '0');
      SINK5 <= (others => '0');
      SINK6 <= (others => '0');
      SINK7 <= (others => '0');
    else
      CLOCK <= not(CLOCK) after 20 ns;
    end if;

    if CLOCK'EVENT and CLOCK = '1' then
      DATA_FROM_MEM <= DATA_FROM_MEM(30 downto 0) & '0';
    end if;

    if CLK_ENABLE'EVENT and CLK_ENABLE = '1' then
      CYCLES := CYCLES + 1;
      if CYCLES = 1 then
        MEMORY_DIRECTION0 <= '1';
        MEMORY_DIRECTION1 <= '0';
        MEMORY_DIRECTION2 <= '1';
        MEMORY_DIRECTION3 <= '0';
        MEMORY_DIRECTION4 <= '1';
        MEMORY_DIRECTION5 <= '0';
        MEMORY_DIRECTION6 <= '1';
        MEMORY_DIRECTION7 <= '0';
        ADDRESS0 <= "0000000000000000000000";
        ADDRESS1 <= "0000000000000000000001";
        ADDRESS2 <= "0000000000000000000010";
        ADDRESS3 <= "0000000000000000000011";
        ADDRESS4 <= "0000000000000000000100";
        ADDRESS5 <= "0000000000000000000101";
        ADDRESS6 <= "0000000000000000000110";
        ADDRESS7 <= "0000000000000000000111";
        SINK0 <= "00000000000000000000000000000010";
        SINK1 <= "00000000000000000000000000000100";
        SINK2 <= "00000000000000000000000000000110";
        SINK3 <= "00000000000000000000000000001000";
        SINK4 <= "00000000000000000000000000001010";
        SINK5 <= "00000000000000000000000000001100";
        SINK6 <= "00000000000000000000000000001110";
        SINK7 <= "00000000000000000000000000010000";
      else
        MEMORY_DIRECTION0 <= '0';
        MEMORY_DIRECTION1 <= '1';
        MEMORY_DIRECTION2 <= '0';
        MEMORY_DIRECTION3 <= '1';
        MEMORY_DIRECTION4 <= '0';
        MEMORY_DIRECTION5 <= '1';
        MEMORY_DIRECTION6 <= '0';
        MEMORY_DIRECTION7 <= '1';
        ADDRESS0 <= ADDRESS0(20 downto 0) & '0';
        ADDRESS1 <= ADDRESS1(20 downto 0) & '0';
        ADDRESS2 <= ADDRESS2(20 downto 0) & '0';
        ADDRESS3 <= ADDRESS3(20 downto 0) & '0';
        ADDRESS4 <= ADDRESS4(20 downto 0) & '0';
        ADDRESS5 <= ADDRESS5(20 downto 0) & '0';
        ADDRESS6 <= ADDRESS6(20 downto 0) & '0';
        ADDRESS7 <= ADDRESS7(20 downto 0) & '0';
        SINK0 <= SINK0(30 downto 0) & '0';
        SINK1 <= SINK1(30 downto 0) & '0';
        SINK2 <= SINK2(30 downto 0) & '0';
        SINK3 <= SINK3(30 downto 0) & '0';
        SINK4 <= SINK4(30 downto 0) & '0';
        SINK5 <= SINK5(30 downto 0) & '0';
        SINK6 <= SINK6(30 downto 0) & '0';
        SINK7 <= SINK7(30 downto 0) & '0';
      end if;
    end if;
  end process;

end MEM_PORT_TB_ARCH;

A.1.2 JHDL MMP Source Files

A.1.2.1 mux_8_1.java

// James Atwell - 8 to 1 multiplexor
package visc.cc.rtr.mmp;

import byucc.jhdl.Logic.*;
import byucc.jhdl.base.*;
import byucc.jhdl.Xilinx.*;
import byucc.jhdl.Xilinx.XC4000.*;
import byucc.jhdl.Xilinx.XC4000.carryLogic.*;

public class mux_8_1 extends Logic {

  public static CellInterface[] cell_interface = {
    param("DATA_WIDTH", INTEGER),
    param("SELECT_WIDTH", INTEGER), // declared so bind("SELECT_WIDTH", ...) has a matching parameter
    in("DATA_IN0", "DATA_WIDTH"),
    in("DATA_IN1", "DATA_WIDTH"),
    in("DATA_IN2", "DATA_WIDTH"),
    in("DATA_IN3", "DATA_WIDTH"),
    in("DATA_IN4", "DATA_WIDTH"),
    in("DATA_IN5", "DATA_WIDTH"),
    in("DATA_IN6", "DATA_WIDTH"),
    in("DATA_IN7", "DATA_WIDTH"),
    in("SELECT", "SELECT_WIDTH"),
    out("DATA_OUT", "DATA_WIDTH")};

  public
static final String cellname = "mux_8_1thatwasfun";

  public mux_8_1(Node parent, Wire DATA_IN0, Wire DATA_IN1, Wire DATA_IN2, Wire DATA_IN3,
                 Wire DATA_IN4, Wire DATA_IN5, Wire DATA_IN6, Wire DATA_IN7,
                 Wire SELECT, Wire DATA_OUT) {
    super(parent);
    int WIDTH1 = DATA_IN0.getWidth();
    bind("DATA_WIDTH", WIDTH1);
    int WIDTH2 = SELECT.getWidth();
    bind("SELECT_WIDTH", WIDTH2);
    port("DATA_IN0", DATA_IN0);
    port("DATA_IN1", DATA_IN1);
    port("DATA_IN2", DATA_IN2);
    port("DATA_IN3", DATA_IN3);
    port("DATA_IN4", DATA_IN4);
    port("DATA_IN5", DATA_IN5);
    port("DATA_IN6", DATA_IN6);
    port("DATA_IN7", DATA_IN7);
    port("SELECT", SELECT);
    port("DATA_OUT", DATA_OUT);

    Wire INTERNAL_0 = wire(this, WIDTH1, "INTERNAL_0");
    Wire INTERNAL_1 = wire(this, WIDTH1, "INTERNAL_1");
    Wire INTERNAL_2 = wire(this, WIDTH1, "INTERNAL_2");
    Wire INTERNAL_3 = wire(this, WIDTH1, "INTERNAL_3");
    Wire INTERNAL_4 = wire(this, WIDTH1, "INTERNAL_4");
    Wire INTERNAL_5 = wire(this, WIDTH1, "INTERNAL_5");

    //first level
    mux_o(DATA_IN0, DATA_IN1, SELECT.getWire(0), INTERNAL_0);
    mux_o(DATA_IN2, DATA_IN3, SELECT.getWire(0), INTERNAL_1);
    mux_o(DATA_IN4, DATA_IN5, SELECT.getWire(0), INTERNAL_2);
    mux_o(DATA_IN6, DATA_IN7, SELECT.getWire(0), INTERNAL_3);
    //second level
    mux_o(INTERNAL_0, INTERNAL_1, SELECT.getWire(1), INTERNAL_4);
    mux_o(INTERNAL_2, INTERNAL_3, SELECT.getWire(1), INTERNAL_5);
    //third level - choose final output
    mux_o(INTERNAL_4, INTERNAL_5, SELECT.getWire(2), DATA_OUT);
  }
}

A.1.2.2 struct_m.java

// James Atwell - multiplexed memory port
// This is the actual memory port file
package visc.cc.rtr.mmp;

import byucc.jhdl.base.*;
import byucc.jhdl.Logic.*;
import byucc.jhdl.Xilinx.*;
import byucc.jhdl.Xilinx.XC4000.*;
import byucc.jhdl.Xilinx.XC4000.carryLogic.*;

public class struct_m extends Logic {

  public static CellInterface[] cell_interface = {
    out("CLK_ENABLE", 1),
    out("MEM_WRITE_SEL", 1),
    out("MEM_ADDRESS", 20),
    in("DATA_FROM_MEM", 32),
    out("DATA_TO_MEM", 32),
    in("MEMORY_DIRECTION0", 1),
in("MEMORY_DIRECTION1", 1),
    in("MEMORY_DIRECTION2", 1),
    in("MEMORY_DIRECTION3", 1),
    in("MEMORY_DIRECTION4", 1),
    in("MEMORY_DIRECTION5", 1),
    in("MEMORY_DIRECTION6", 1),
    in("MEMORY_DIRECTION7", 1),
    in("ADDRESS0", 20), in("ADDRESS1", 20), in("ADDRESS2", 20), in("ADDRESS3", 20),
    in("ADDRESS4", 20), in("ADDRESS5", 20), in("ADDRESS6", 20), in("ADDRESS7", 20),
    in("SINK0", 32), in("SINK1", 32), in("SINK2", 32), in("SINK3", 32),
    in("SINK4", 32), in("SINK5", 32), in("SINK6", 32), in("SINK7", 32),
    out("SOURCE0", 32), out("SOURCE1", 32), out("SOURCE2", 32), out("SOURCE3", 32),
    out("SOURCE4", 32), out("SOURCE5", 32), out("SOURCE6", 32), out("SOURCE7", 32)
  };

  public static final String cellname = "memory_port_guy";

  public struct_m(Node parent, Wire CLK_ENABLE, Wire MEM_WRITE_SEL, Wire MEM_ADDRESS,
                  Wire DATA_FROM_MEM, Wire DATA_TO_MEM,
                  Wire MEMORY_DIRECTION0, Wire MEMORY_DIRECTION1, Wire MEMORY_DIRECTION2,
                  Wire MEMORY_DIRECTION3, Wire MEMORY_DIRECTION4, Wire MEMORY_DIRECTION5,
                  Wire MEMORY_DIRECTION6, Wire MEMORY_DIRECTION7,
                  Wire ADDRESS0, Wire ADDRESS1, Wire ADDRESS2, Wire ADDRESS3,
                  Wire ADDRESS4, Wire ADDRESS5, Wire ADDRESS6, Wire ADDRESS7,
                  Wire SINK0, Wire SINK1, Wire SINK2, Wire SINK3,
                  Wire SINK4, Wire SINK5, Wire SINK6, Wire SINK7,
                  Wire SOURCE0, Wire SOURCE1, Wire SOURCE2, Wire SOURCE3,
                  Wire SOURCE4, Wire SOURCE5, Wire SOURCE6, Wire SOURCE7) {
    super(parent);
    connect("CLK_ENABLE", CLK_ENABLE);
    connect("MEM_WRITE_SEL", MEM_WRITE_SEL);
    connect("MEM_ADDRESS", MEM_ADDRESS);
    connect("DATA_FROM_MEM", DATA_FROM_MEM);
    connect("DATA_TO_MEM", DATA_TO_MEM);
    connect("MEMORY_DIRECTION0", MEMORY_DIRECTION0); // =1 means input(sink), 0 means output (source)
    connect("MEMORY_DIRECTION1", MEMORY_DIRECTION1);
    connect("MEMORY_DIRECTION2", MEMORY_DIRECTION2);
    connect("MEMORY_DIRECTION3", MEMORY_DIRECTION3);
    connect("MEMORY_DIRECTION4", MEMORY_DIRECTION4);
    connect("MEMORY_DIRECTION5", MEMORY_DIRECTION5);
    connect("MEMORY_DIRECTION6", MEMORY_DIRECTION6);
    connect("MEMORY_DIRECTION7",
MEMORY_DIRECTION7);
    connect("ADDRESS0", ADDRESS0);
    connect("ADDRESS1", ADDRESS1);
    connect("ADDRESS2", ADDRESS2);
    connect("ADDRESS3", ADDRESS3);
    connect("ADDRESS4", ADDRESS4);
    connect("ADDRESS5", ADDRESS5);
    connect("ADDRESS6", ADDRESS6);
    connect("ADDRESS7", ADDRESS7);
    connect("SINK0", SINK0);
    connect("SINK1", SINK1);
    connect("SINK2", SINK2);
    connect("SINK3", SINK3);
    connect("SINK4", SINK4);
    connect("SINK5", SINK5);
    connect("SINK6", SINK6);
    connect("SINK7", SINK7);
    connect("SOURCE0", SOURCE0);
    connect("SOURCE1", SOURCE1);
    connect("SOURCE2", SOURCE2);
    connect("SOURCE3", SOURCE3);
    connect("SOURCE4", SOURCE4);
    connect("SOURCE5", SOURCE5);
    connect("SOURCE6", SOURCE6);
    connect("SOURCE7", SOURCE7);

    Wire SELECT = wire(this, 3, "SELECT");
    Wire SELECT_DELAY = wire(this, 3, "SELECT_DELAY");
    regc_o(regc(SELECT), SELECT_DELAY); // SELECT delayed by 2 cycles for data to come from memory

    // assert CLK_ENABLE after data has been read
    regc_o(and(SELECT_DELAY.gw(2), SELECT_DELAY.gw(1), SELECT_DELAY.gw(0)), CLK_ENABLE);

    // SOURCE0 to SOURCE7 are registers loaded from DATA_FROM_MEM; each one is
    // enabled when SELECT_DELAY equals its port number
    regce_o(DATA_FROM_MEM, and(not(SELECT_DELAY.gw(2)), not(SELECT_DELAY.gw(1)), not(SELECT_DELAY.gw(0)), "and0"), SOURCE0); //000
    regce_o(DATA_FROM_MEM, and(not(SELECT_DELAY.gw(2)), not(SELECT_DELAY.gw(1)), (SELECT_DELAY.gw(0)), "and1"), SOURCE1); //001
    regce_o(DATA_FROM_MEM, and(not(SELECT_DELAY.gw(2)), (SELECT_DELAY.gw(1)), not(SELECT_DELAY.gw(0)), "and2"), SOURCE2); //010
    regce_o(DATA_FROM_MEM, and(not(SELECT_DELAY.gw(2)), (SELECT_DELAY.gw(1)), (SELECT_DELAY.gw(0)), "and3"), SOURCE3); //011
    regce_o(DATA_FROM_MEM, and((SELECT_DELAY.gw(2)), not(SELECT_DELAY.gw(1)), not(SELECT_DELAY.gw(0)), "and4"), SOURCE4); //100
    regce_o(DATA_FROM_MEM, and((SELECT_DELAY.gw(2)), not(SELECT_DELAY.gw(1)), (SELECT_DELAY.gw(0)), "and5"), SOURCE5); //101
    regce_o(DATA_FROM_MEM, and((SELECT_DELAY.gw(2)), (SELECT_DELAY.gw(1)), not(SELECT_DELAY.gw(0)), "and6"), SOURCE6); //110
    regce_o(DATA_FROM_MEM, and((SELECT_DELAY.gw(2)), (SELECT_DELAY.gw(1)), (SELECT_DELAY.gw(0)), "and7"),
SOURCE7); //111

    // out_select is a counter that cycles from 0 to 10 and then repeats.
    // It drives SELECT with the values 0 to 7; for counts 8 to 10 the
    // output mux forces SELECT to 0 until the counter resets.
    // new out_select(this, SELECT);
    Wire SUM = wire(this, 4, "SUM");
    Wire NEXT_COUNT = wire(this, 4, "NEXT_COUNT");
    Wire COUNT = wire(this, 4, "COUNT");

    // the 4 bit counter
    // wired to reset when count gets to 1010B
    mux_o(constant(4,0), SUM, (not(and(COUNT.getWire(3), COUNT.getWire(1)))), NEXT_COUNT, "reset_mux");
    add_o(constant(4,1), COUNT, SUM, "adder");
    reg_o(NEXT_COUNT, COUNT);

    // select output
    mux_o(COUNT.range(2,0), constant(3,0), COUNT.getWire(3), SELECT, "output_mux");

    // ADDRESS, DATA_TO_MEM, and MEMORY_DIRECTION are each time multiplexed to the real memory port.
    new mux_8_1(this, ADDRESS0, ADDRESS1, ADDRESS2, ADDRESS3,
                ADDRESS4, ADDRESS5, ADDRESS6, ADDRESS7, SELECT, MEM_ADDRESS);
    new mux_8_1(this, SINK0, SINK1, SINK2, SINK3,
                SINK4, SINK5, SINK6, SINK7, SELECT, DATA_TO_MEM);
    new mux_8_1(this, MEMORY_DIRECTION0, MEMORY_DIRECTION1, MEMORY_DIRECTION2, MEMORY_DIRECTION3,
                MEMORY_DIRECTION4, MEMORY_DIRECTION5, MEMORY_DIRECTION6, MEMORY_DIRECTION7,
                SELECT, MEM_WRITE_SEL);
  }
}

A.2 Generic WILDFORCE Structure Files

A.2.1 Generic WILDFORCE VHDL Files

A.2.1.1 vt_sklc1.vhd

-- Skeleton Logic Core Description
-- James Atwell
-------------------------------------------------------------------------------
-- Entity             : PE1_Logic_Core
-- Architecture       : Shell
-- Current Filename   : vt_sklc1.vhd
-- Original Filename  : vf_pe1lc.vhd
-- Original Date      : 8/17/98
-- Date Last Modified : 1/22/99
-------------------------------------------------------------------------------
library IEEE, WF4;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use IEEE.std_logic_arith.all;
use WF4.AMS_XC4000.all;
use WF4.WFPCI_Package.all;
use wf4.WFPCI_Components.all;

----------------------- Entity Declaration ----------------------
entity PE1_Logic_Core is
  generic (
    BD_ID : integer := 0;
    PE_ID : integer := 1
  );
  port (
    PE_Pclk : in
std_logic;
    PE_Mclk : in std_logic;
    PE_Kclk : in std_logic;
    PE_Reset : in std_logic;
    PE_InterruptReq_n : out std_logic;
    PE_InterruptAck_n : in std_logic;
    PE_MemAddr_OutReg : out std_logic_vector ( 21 downto 0 );
    PE_MemData_InReg : in std_logic_vector ( 31 downto 0 );
    PE_MemData_OutReg : out std_logic_vector ( 31 downto 0 );
    PE_MemBusReq_n : out std_logic;
    PE_MemBusGrant_n : in std_logic;
    PE_MemStrobe_n : out std_logic;
    PE_MemWriteSel_n : out std_logic;
    PE_MemHoldReq_n : out std_logic;
    PE_MemHoldAck_n : in std_logic;
    PE_Left_In : in std_logic_vector ( 35 downto 0 );
    PE_Left_Out : out std_logic_vector ( 35 downto 0 );
    PE_Left_OE : out std_logic_vector ( 35 downto 0 );
    PE_Right_In : in std_logic_vector ( 35 downto 0 );
    PE_Right_Out : out std_logic_vector ( 35 downto 0 );
    PE_Right_OE : out std_logic_vector ( 35 downto 0 );
    PE_XbarData_In : in std_logic_vector ( 35 downto 0 );
    PE_XbarData_Out : out std_logic_vector ( 35 downto 0 );
    PE_XbarData_OE : out std_logic_vector ( 35 downto 0 );
    PE_XbarData_WE : out std_logic_vector ( 8 downto 0 );
    PE_FifoSelect : out std_logic_vector ( 1 downto 0 );
    PE_Fifo_WE_n : out std_logic;
    PE_FifoPtrIncr_EN : out std_logic;
    PE_HostToPeFifoEmpty_n : in std_logic;
    PE_HostToPeFifoAlmostEmpty_n : in std_logic;
    PE_PeToHostFifoAlmostFull_n : in std_logic;
    PE_PeToHostFifoFull_n : in std_logic;
    PE_HostToPeMboxEmpty_n : in std_logic;
    PE_PeToHostMboxFull_n : in std_logic;
    PE_ExtIoToPeFifoEmpty_n : in std_logic;
    PE_ExtIoToPeFifoAlmostEmpty_n : in std_logic;
    PE_PeToExtIoFifoAlmostFull_n : in std_logic;
    PE_PeToExtIoFifoFull_n : in std_logic;
    PE_CPE_Bus_In : in std_logic_vector ( 1 downto 0 );
    PE_CPE_Bus_Out : out std_logic_vector ( 1 downto 0 );
    PE_CPE_Bus_OE : out std_logic_vector ( 1 downto 0 )
  );
end PE1_Logic_Core;

-------------------------- Architecture Declaration -------------------------
architecture SKELETON_UNIT of PE1_Logic_Core is

  -- Jim's homemade FSM type
  type SKELETON_FSM_STATES is (STATE_INIT, STATE_MEMREQUEST, STATE_WAIT_FOR_DONE,
                               STATE_INTERRUPT_HOST, STATE_DONE);

  -- Jim Inserted Signals
  signal SKELETON_FSM: SKELETON_FSM_STATES;
  signal SKELETON_FSM_NEXT: SKELETON_FSM_STATES;
  signal START_GRAPH_N: STD_LOGIC; -- controls whether graph is 'running' or cleared to 0.
  signal GRAPH_DONE: STD_LOGIC; -- conveys signal from graph that it is done
  signal MEM_PORT_WRITE_SEL_N: STD_LOGIC; -- conveys R/W signal from memory port.

  component USER_VHDL_MODEL
    port (
      CLOCK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      DONE: out STD_LOGIC;
      MEM_SIGNAL0: out STD_LOGIC; --used for read/write signal
      MEM_SIGNAL1: out STD_LOGIC_VECTOR (21 downto 0); -- used for passing address
      MEM_SIGNAL2: out STD_LOGIC_VECTOR (31 downto 0); -- used for data to memory
      MEM_SIGNAL3: in STD_LOGIC_VECTOR (31 downto 0); -- used for data from memory
      DATA_SIGNAL0: in STD_LOGIC_VECTOR (35 downto 0); -- used for PE_LEFT_IN
      DATA_SIGNAL1: out STD_LOGIC_VECTOR (35 downto 0); -- used for PE_LEFT_OUT
      DATA_SIGNAL2: in STD_LOGIC_VECTOR (35 downto 0); -- used for PE_RIGHT_IN
      DATA_SIGNAL3: out STD_LOGIC_VECTOR (35 downto 0) -- used for PE_RIGHT_OUT
    );
  end component;

begin

  JUGDISH : USER_VHDL_MODEL port map (
    CLOCK => PE_PCLK,
    CLR => START_GRAPH_N,
    DONE => GRAPH_DONE,
    MEM_SIGNAL0 => MEM_PORT_WRITE_SEL_N, -- gets R/W signal from memport via graph
    MEM_SIGNAL1 => PE_MemAddr_OutReg,
    MEM_SIGNAL2 => PE_MemData_OutReg, -- used for writing data to memory
    MEM_SIGNAL3 => PE_MemData_InReg, -- used for reading data from memory
    DATA_SIGNAL0 => PE_LEFT_IN,
    DATA_SIGNAL1 => PE_LEFT_OUT,
    DATA_SIGNAL2 => PE_RIGHT_IN,
    DATA_SIGNAL3 => PE_RIGHT_OUT);

  -- Synchronous process that clocks the state machines, and handles reset conditions.
  P_Sync: process (PE_Reset, PE_Pclk)
  begin
    if (PE_Reset = '1') then
      SKELETON_FSM <= STATE_INIT;
    elsif ( (PE_Pclk'event) and (PE_Pclk = '1') ) then
      SKELETON_FSM <= SKELETON_FSM_NEXT;
    end if;
  end process P_Sync;

  -- Asynchronous process that determines the current and next states of the state machine.
  SKELETON_ASync : process(SKELETON_FSM, PE_MemBusGrant_n, PE_MemData_InReg, PE_InterruptAck_n,
                           MEM_PORT_WRITE_SEL_N, GRAPH_DONE)
  begin
    -- Initialization state. We wait for the reset signal to become deasserted before continuing.
    case SKELETON_FSM is
      when STATE_INIT =>
        START_GRAPH_N <= '0'; -- Graph cleared and held at reset conditions
        PE_InterruptReq_n <= '1'; -- Not Interrupting host
        PE_MemBusReq_n <= '1'; -- Don't Request Memory Yet
        PE_MemStrobe_n <= '1'; -- Don't Strobe Memory
        PE_MemWriteSel_n <= '0'; -- Set to Write Memory
        SKELETON_FSM_NEXT <= STATE_MEMREQUEST;

      -- Request access to the memory bus, and wait until we have been
      -- granted access from the DPMC before starting the read cycle.
      when STATE_MEMREQUEST =>
        START_GRAPH_N <= '0'; -- Graph cleared and held at reset conditions
        PE_InterruptReq_n <= '1'; -- Not Interrupting host
        PE_MemBusReq_n <= '0'; -- Request Mem Bus by pulling low
        PE_MemStrobe_n <= '1'; -- Don't Strobe Memory
        PE_MemWriteSel_n <= '1'; -- Set to Read Memory
        if (PE_MemBusGrant_n = '0') then -- wait for memory access
          SKELETON_FSM_NEXT <= STATE_WAIT_FOR_DONE;
        else
          SKELETON_FSM_NEXT <= STATE_MEMREQUEST;
        end if;

      -- Memory has been granted.
      -- Wait for Done Signal
      when STATE_WAIT_FOR_DONE =>
        -- Start graph. Graph will have a CLR line that signals whether to process or not.
        START_GRAPH_N <= '1'; -- Graph allowed to process data
        PE_InterruptReq_n <= '1'; -- No Interrupt
        PE_MemBusReq_n <= '0'; -- Request Memory
        PE_MemStrobe_n <= '0'; -- Strobe Memory
        if MEM_PORT_WRITE_SEL_N = '1' then
          PE_MemWriteSel_n <= '1'; -- Set to Read Memory
        else
          PE_MemWriteSel_n <= '0'; -- Set to Write Memory
        end if;
        if GRAPH_DONE = '1' then -- look for done signal from graph
          SKELETON_FSM_NEXT <= STATE_INTERRUPT_HOST;
        else
          SKELETON_FSM_NEXT <= STATE_WAIT_FOR_DONE;
        end if;

      when STATE_INTERRUPT_HOST =>
        START_GRAPH_N <= '1'; -- Graph not held in reset
        PE_InterruptReq_n <= '0'; -- Interrupting host
        PE_MemBusReq_n <= '1'; -- Let memory be available to host
        PE_MemStrobe_n <= '1'; -- Don't Strobe Memory
        PE_MemWriteSel_n <= '1'; -- Set to Read Memory
        if PE_InterruptAck_n = '1' then -- look for interrupt acknowledge
          SKELETON_FSM_NEXT <= STATE_INTERRUPT_HOST;
        else
          SKELETON_FSM_NEXT <= STATE_DONE;
        end if;

      when STATE_DONE =>
        START_GRAPH_N <= '1'; -- Graph not held in reset
        PE_InterruptReq_n <= '1'; -- Release Interrupt Request
        PE_MemBusReq_n <= '1'; -- Let memory be available to host
        PE_MemStrobe_n <= '1'; -- Don't Strobe Memory
        PE_MemWriteSel_n <= '1'; -- Set to Read Memory
        SKELETON_FSM_NEXT <= STATE_DONE;

      -- Catch any other possible states
      when others =>
        START_GRAPH_N <= '1'; -- Graph not held in reset
        PE_InterruptReq_n <= '1'; -- Not Interrupting host
        PE_MemBusReq_n <= '1'; -- Release Mem Bus Request
        PE_MemStrobe_n <= '1';
        PE_MemWriteSel_n <= '1'; -- Set to Read Memory
        SKELETON_FSM_NEXT <= STATE_INIT;
    end case;
  end process SKELETON_ASync;

  ---------------------------------------------------------------------------
  -- "Inactive" output port signal assignments
  ---------------------------------------------------------------------------
  PE_MemHoldReq_n <= '1'; -- Disable memory hold requests
  PE_Left_OE <= ( others => '0' ); -- Disable left port output
  PE_Right_OE <= ( others => '0' ); -- Disable right port output
  PE_XbarData_OE <= ( others => '0' ); -- Disable crossbar port output
  PE_XbarData_WE <= ( others => '0' ); -- Disable crossbar port writes
  PE_FifoSelect <= "00"; -- Deselect fifo
                         -- "00" selects none
                         -- "01" selects External I/O Fifo
                         -- "10" selects On-Board Fifo
                         -- "11" selects On-Board Mailbox
  PE_Fifo_WE_n <= '1'; -- Disable fifo write mode
  PE_FifoPtrIncr_EN <= '0'; -- Disable fifo pointer increment
  PE_CPE_Bus_OE <= ( others => '0' ); -- Disable CPE bus output

end SKELETON_UNIT;

A.2.1.2 mmpcomp.vhd

library IEEE;
use IEEE.std_logic_1164.all;
-- library janlib;
-- use janlib.janpack.all;

entity USER_VHDL_MODEL is
  port (
    CLOCK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    DONE: out STD_LOGIC;
    MEM_SIGNAL0: out STD_LOGIC; --used for read/write signal
    MEM_SIGNAL1: out STD_LOGIC_VECTOR (21 downto 0); -- used for passing address
    MEM_SIGNAL2: out STD_LOGIC_VECTOR (31 downto 0); -- used for data to memory
    MEM_SIGNAL3: in STD_LOGIC_VECTOR (31 downto 0); -- used for data from memory
    DATA_SIGNAL0: in STD_LOGIC_VECTOR (35 downto 0); -- used for PE_LEFT_IN
    DATA_SIGNAL1: out STD_LOGIC_VECTOR (35 downto 0); -- used for PE_LEFT_OUT
    DATA_SIGNAL2: in STD_LOGIC_VECTOR (35 downto 0); -- used for PE_RIGHT_IN
    DATA_SIGNAL3: out STD_LOGIC_VECTOR (35 downto 0) -- used for PE_RIGHT_OUT
  );
end USER_VHDL_MODEL;

architecture USER_VHDL_MODEL_arch of USER_VHDL_MODEL is

  component MEM_PORT
    generic (
      PATH_DELAY: INTEGER := 0;     -- delay until valid data
                                    --(# clk cycles to wait before writing or reading from memory)
                                    --(this is not implemented yet.
                                    -- use addgen delay)
      DATA_WIDTH: INTEGER := 32;    -- width of data vectors
      ADDRESS_WIDTH: INTEGER := 22; -- width of address bus
      MEMORY_PORTS: INTEGER := 8    -- number of memory ports (currently ignored (fixed at 8))
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      CLK_ENABLE: out STD_LOGIC;
      MEM_PORT_WRITE_SEL_N: out STD_LOGIC; -- conveys R/W signal up to skeleton graph
      MEM_ADDRESS: out STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); --address for memory read or write
      DATA_FROM_MEM: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); --data coming from memory
      DATA_TO_MEM: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); -- data going to memory
      MEMORY_DIRECTION0: in STD_LOGIC;
      MEMORY_DIRECTION1: in STD_LOGIC;
      MEMORY_DIRECTION2: in STD_LOGIC;
      MEMORY_DIRECTION3: in STD_LOGIC;
      MEMORY_DIRECTION4: in STD_LOGIC;
      MEMORY_DIRECTION5: in STD_LOGIC;
      MEMORY_DIRECTION6: in STD_LOGIC;
      MEMORY_DIRECTION7: in STD_LOGIC;
      ADDRESS0: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS1: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS2: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS3: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS4: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS5: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS6: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS7: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      SINK0: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK1: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK2: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK3: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK4: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK5: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK6: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK7: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE0: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE1: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE2: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE3: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE4: out
STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE5: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE6: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE7: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0));
  end component;

  component ADDGEN
    generic (
      PATH_DELAY: INTEGER := 0; -- delay until valid data (# clk cycles to wait before writing or reading from memory)
      BIT_WIDTH: INTEGER := 22; -- width of input vector
      INIT0: INTEGER := 0; TERM0: INTEGER := 0; INC0: INTEGER := 0;
      INIT1: INTEGER := 0; TERM1: INTEGER := 0; INC1: INTEGER := 0;
      INIT2: INTEGER := 0; TERM2: INTEGER := 0; INC2: INTEGER := 0
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      ADDRESS : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      DONE: out STD_LOGIC
    );
  end component;

  component USER_COMPONENT
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC
    );
  end component;

  signal GSINK0 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK1 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK2 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK3 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK4 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK5 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK6 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK7 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE0 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE1 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE2 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE3 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE4 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE5 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE6 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE7 : STD_LOGIC_VECTOR(31 downto 0);
  signal GADDRESS0 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS1 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS2 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS3 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS4 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS5 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS6 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS7 : STD_LOGIC_VECTOR(21 downto 0);
  signal
GMEMORY_DIRECTION0 : STD_LOGIC;
  signal GMEMORY_DIRECTION1 : STD_LOGIC;
  signal GMEMORY_DIRECTION2 : STD_LOGIC;
  signal GMEMORY_DIRECTION3 : STD_LOGIC;
  signal GMEMORY_DIRECTION4 : STD_LOGIC;
  signal GMEMORY_DIRECTION5 : STD_LOGIC;
  signal GMEMORY_DIRECTION6 : STD_LOGIC;
  signal GMEMORY_DIRECTION7 : STD_LOGIC;
  signal DONE0 : STD_LOGIC;
  signal DONE1 : STD_LOGIC;
  signal DONE2 : STD_LOGIC;
  signal DONE3 : STD_LOGIC;
  signal DONE4 : STD_LOGIC;
  signal DONE5 : STD_LOGIC;
  signal DONE6 : STD_LOGIC;
  signal DONE7 : STD_LOGIC;
  signal MCLK_ENABLE : STD_LOGIC;
  signal VIRTUAL_CLK : STD_LOGIC;

begin

  VIRTUAL_CLK <= CLOCK and MCLK_ENABLE;
  DONE <= DONE0 and DONE1 and DONE2 and DONE3 and DONE4 and DONE5 and DONE6 and DONE7;

  main_0 : MEM_PORT
    generic map (DATA_WIDTH => 32, ADDRESS_WIDTH => 22)
    port map (CLK => CLOCK, CLR => CLR, CLK_ENABLE => MCLK_ENABLE,
              MEM_PORT_WRITE_SEL_N => MEM_SIGNAL0, MEM_ADDRESS => MEM_SIGNAL1,
              DATA_TO_MEM => MEM_SIGNAL2, DATA_FROM_MEM => MEM_SIGNAL3,
              MEMORY_DIRECTION0 => GMEMORY_DIRECTION0, MEMORY_DIRECTION1 => GMEMORY_DIRECTION1,
              MEMORY_DIRECTION2 => GMEMORY_DIRECTION2, MEMORY_DIRECTION3 => GMEMORY_DIRECTION3,
              MEMORY_DIRECTION4 => GMEMORY_DIRECTION4, MEMORY_DIRECTION5 => GMEMORY_DIRECTION5,
              MEMORY_DIRECTION6 => GMEMORY_DIRECTION6, MEMORY_DIRECTION7 => GMEMORY_DIRECTION7,
              ADDRESS0 => GADDRESS0, ADDRESS1 => GADDRESS1, ADDRESS2 => GADDRESS2, ADDRESS3 => GADDRESS3,
              ADDRESS4 => GADDRESS4, ADDRESS5 => GADDRESS5, ADDRESS6 => GADDRESS6, ADDRESS7 => GADDRESS7,
              SINK0 => GSINK0, SINK1 => GSINK1, SINK2 => GSINK2, SINK3 => GSINK3,
              SINK4 => GSINK4, SINK5 => GSINK5, SINK6 => GSINK6, SINK7 => GSINK7,
              SOURCE0 => GSOURCE0, SOURCE1 => GSOURCE1, SOURCE2 => GSOURCE2, SOURCE3 => GSOURCE3,
              SOURCE4 => GSOURCE4, SOURCE5 => GSOURCE5, SOURCE6 => GSOURCE6, SOURCE7 => GSOURCE7);

  -- The following address generators can be removed if they are not required.
  -- If they are removed, the extra DONE signals should be tied to 1.
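  -- Illustrative only (an assumed usage, not part of the design as listed):
  -- if just ports 0 and 1 were needed, ADD_GEN_2 through ADD_GEN_7 could be
  -- deleted and their DONE signals tied high, so that the AND of DONE0..DONE7
  -- driving DONE depends only on the generators that remain:
  --   DONE2 <= '1'; DONE3 <= '1'; DONE4 <= '1';
  --   DONE5 <= '1'; DONE6 <= '1'; DONE7 <= '1';
  -- These tie-offs stay commented out here because all eight generators are
  -- instantiated and already drive DONE0 through DONE7.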
  ADD_GEN_0: ADDGEN -- Address Generator for memory port 0
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 0, INC0 => 0,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS0, DONE => DONE0);

  ADD_GEN_1: ADDGEN -- Address Generator for memory port 1
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 0, INC0 => 0,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS1, DONE => DONE1);

  ADD_GEN_2: ADDGEN -- Address Generator for memory port 2
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 0, INC0 => 0,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS2, DONE => DONE2);

  ADD_GEN_3: ADDGEN -- Address Generator for memory port 3
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 0, INC0 => 0,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS3, DONE => DONE3);

  ADD_GEN_4: ADDGEN -- Address Generator for memory port 4
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 0, INC0 => 0,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS4, DONE => DONE4);

  ADD_GEN_5: ADDGEN -- Address Generator for memory port 5
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 0, INC0 => 0,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS5, DONE => DONE5);

  ADD_GEN_6: ADDGEN -- Address Generator for memory port 6
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 0, INC0 => 0,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS =>
GADDRESS6, DONE => DONE6);

  ADD_GEN_7: ADDGEN -- Address Generator for memory port 7
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 0, INC0 => 0,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS7, DONE => DONE7);

  -- If some ports are not used leave them set to read
  GMEMORY_DIRECTION0 <= '1'; -- Set Port 0 direction to input
  GMEMORY_DIRECTION1 <= '1'; -- Set Port 1 direction to input
  GMEMORY_DIRECTION2 <= '1'; -- Set Port 2 direction to input
  GMEMORY_DIRECTION3 <= '1'; -- Set Port 3 direction to input
  GMEMORY_DIRECTION4 <= '1'; -- Set Port 4 direction to input
  GMEMORY_DIRECTION5 <= '1'; -- Set Port 5 direction to input
  GMEMORY_DIRECTION6 <= '1'; -- Set Port 6 direction to input
  GMEMORY_DIRECTION7 <= '1'; -- Set Port 7 direction to input

  user_component0 : USER_COMPONENT
    port map (CLK => VIRTUAL_CLK, CLR => CLR);

end USER_VHDL_MODEL_arch;

A.2.1.3 jchstcfg.vhd

-- James Atwell
-- Configuration file for generic skeleton file
library WF4,work;
use WF4.Clocks_Package.all;
--use WF4.wf4host;
----------------------------------------------------------------------------
-- FIFOIF Architectures
--
-- Name       Description
-- ====       ===========
-- Behavior   Standard FIFO interface for FIFO communication.
-- Blank      Do not process any FIFO I/O.
--
-- FIFOIF Generics:
--
-- Name                 Description
-- ====                 ============
-- FIFO_In_File         Name of the input file to the FIFO.
--
-- FIFO_Out_File        Name of the output file from the FIFO.
--
-- FIFO_Out_File_Type   Type of the output file from the FIFO.
--                      (ie ASCII_BIN | ASCII_HEX )
----------------------------------------------------------------------------
-- WF4_ArrayBoard Generics:
--
-- Name       Description
-- ====       ============
-- Board_ID   ID Number of the WF4 board.
----------------------------------------------------------------------------
-- WFPCI_Clocks Generics:
--
-- Name             Description
-- ====             ============
-- EClk_Frequency   Clock frequency for the external I/O connection.
-- PClk_Frequency   Processor Clock frequency (300 kHz to 40 MHz).
----------------------------------------------------------------------------
-- FIFO Architectures :
--
-- Name       Description
-- ====       ===========
-- Blank      Blank FIFO architecture whose signals are all
--            inactive. Configure FIFOs with this architecture
--            when you do not need to use a FIFO.
-- Behavior   FIFO architecture that describes the behavior of
--            the bi-directional 512-by-36 WF4 FIFOs.
--
-- FIFO Generics:
--
-- Name                        Description
-- ====                        ============
-- PeToHostAlmostEmptyThresh   If there are less than
--                             'PeToHostAlmostEmptyThresh' words
--                             left in FIFO2, AE_n goes low.
--
-- HostToPeAlmostEmptyThresh   If there are less than
--                             'HostToPeAlmostEmptyThresh' words
--                             left in FIFO1, AE_n goes low.
--
-- HostToPeAlmostFullThresh    If there are less than
--                             'HostToPeAlmostFullThresh' words
--                             which can be written until FIFO2
--                             becomes full, AF_n goes low.
--                             (*NOTE: If this value is set to "1", the host
--                             will attempt to completely fill the FIFO to
--                             the PE.)
--
-- PeToHostAlmostFullThresh    If there are less than
--                             'PeToHostAlmostFullThresh' words
--                             which can be written until FIFO1
--                             becomes full, AF_n goes low.
----------------------------------------------------------------------------
-- Mezzanine_Card Architectures
--
-- Name     Description
-- ====     ===========
-- Memory   This daughter board is a memory card.
--          Refer to the Memory_Part section for
--          available settings.
-- None     The mezzanine connector is unused at this PE.
----------------------------------------------------------------------------
-- Memory_Part Architectures
--
-- Name             Description
-- ====             ===========
-- Static           Memory allocated at beginning of simulation.
-- Dynamic          Memory allocated as needed (hash-based).
-- Linear_Dynamic   Memory allocated as needed (linked list-based).
-- Zero            Memory is read-only, initialized to all zeros.
-- Blank           Memory is blank.
-- None            Same as "Blank".
--
-- Memory_Part Generics
--
-- Name       Description
-- ====       ===========
-- Load_File  Initialization file (string).
-- Size       Address space size, e.g. 2**10 (integer).
-----------------------------------------------------------------------------
-- Crossbar Generics:
--
-- Name         Description
-- ====         ============
-- Config_File  File containing up to 16 configurations.
-- Num_PEs      Number of PEs per board.
-----------------------------------------------------------------------------
-- External_IO Architectures:
--
-- Name   Description
-- ====   ============
-- Blank  User-modifiable unpopulated I/O board.
-- None   Architecture that should be chosen if no
--        external I/O board is attached.
-----------------------------------------------------------------------------

configuration SystemConfig of WF4HOST is
  for Behavior
    -----------------------------------------------
    -- Configuration for Board number 0
    -----------------------------------------------
    for GEN_BOARDS ( 0 )
      -----------------------------------------------
      -- Configuration for FIFO0 Interface
      -- (The configuration for the FIFOs is
      -- located below the processing element
      -- configurations.)
      -----------------------------------------------
      for P_FIFO0 : FIFOIF
        use entity WF4.FIFOIF ( Blank )
        generic map ( FIFO_In_File       => "",
                      FIFO_Out_File      => "",
                      FIFO_Out_File_Type => ASCII_BIN );
      end for; -- P_FIFO0
      -----------------------------------------------
      -- Configuration for FIFO1 Interface
      -- (The configuration for the FIFOs is
      -- located below the processing element
      -- configurations.)
      -----------------------------------------------
      for P_FIFO1 : FIFOIF
        use entity WF4.FIFOIF ( Blank )
        generic map ( FIFO_In_File       => "",
                      FIFO_Out_File      => "",
                      FIFO_Out_File_Type => ASCII_BIN );
      end for; -- P_FIFO1
      -----------------------------------------------
      -- Configuration for FIFO4 Interface
      -- (The configuration for the FIFOs is
      -- located below the processing element
      -- configurations.)
      -----------------------------------------------
      for P_FIFO4 : FIFOIF
        use entity WF4.FIFOIF ( Blank )
        generic map ( FIFO_In_File       => "",
                      FIFO_Out_File      => "",
                      FIFO_Out_File_Type => ASCII_BIN );
      end for; -- P_FIFO4
      -----------------------------------------------
      -- Configuration for the test WF4 board
      -----------------------------------------------
      for WF4BOARD : WF4_ArrayBoard
        use entity work.WF4_ArrayBoard ( Structure )
        generic map ( Board_ID => 0 );
        for Structure
          -----------------------------------------------
          -- Configuration for the Clock generation
          -----------------------------------------------
          for CLK_GEN: WFPCI_Clocks
            use entity WF4.WFPCI_Clocks ( Behavior )
            generic map ( EClk_Frequency => 15 MHz,
                          PClk_Frequency => 20 MHz );
          end for; -- CLK_GEN
          -----------------------------------------------
          -- Configuration for CPE0
          -----------------------------------------------
          for U_PE0: CPE0
            use entity WF4.CPE0 ( Core );
            for Core
              for U_Core: CPE0_Logic_Core
                use entity WORK.CPE0_Logic_Core ( Shell );
              end for; -- U_Core
            end for; -- Core
          end for; -- U_PE0
          -----------------------------------------------
          -- Configuration for PE1
          -----------------------------------------------
          for U_PE1: PE1
            use entity WF4.PE1 ( Core );
            for Core
              for U_LC: PE1_Logic_Core
                use entity WORK.PE1_Logic_Core ( SKELETON_UNIT );
              end for; -- U_LC
            end for; -- Core
          end for; -- U_PE1
          -----------------------------------------------
          -- Configuration for PE2
          -----------------------------------------------
          for U_PE2: PE2
            use entity WF4.PE2 ( Core );
            for Core
              for U_LC: PE2_Logic_Core
                use entity WORK.PE2_Logic_Core ( Shell );
              end for; -- U_LC
            end for; -- Core
          end for; -- U_PE2
          -----------------------------------------------
          -- Configuration for PE3
          -----------------------------------------------
          for U_PE3: PE3
            use entity WF4.PE3 ( Core );
            for Core
              for U_LC: PE3_Logic_Core
                use entity WORK.PE3_Logic_Core ( Shell );
              end for; -- U_LC
            end for; -- Core
          end for; -- U_PE3
          -----------------------------------------------
          -- Configuration for PE4
          -----------------------------------------------
          for U_PE4: PE4
            use entity WF4.PE4 ( Core );
            for Core
              for U_LC: PE4_Logic_Core
                use entity WORK.PE4_Logic_Core ( Shell );
              end for; -- U_LC
            end for; -- Core
          end for; -- U_PE4
          -----------------------------------------------
          -- Configuration for Fifo 0
          -- (The configuration for the FIFO interfaces
          -- is located right at the top of this file.)
          -----------------------------------------------
          for U_FIFO0: FIFO
            use entity WF4.FIFO ( Blank )
            generic map ( PeToHostAlmostEmptyThresh => 30,
                          HostToPeAlmostEmptyThresh => 30,
                          HostToPeAlmostFullThresh  => 1,
                          PeToHostAlmostFullThresh  => 30 );
          end for; -- U_FIFO0
          -----------------------------------------------
          -- Configuration for Fifo 1
          -- (The configuration for the FIFO interfaces
          -- is located right at the top of this file.)
          -----------------------------------------------
          for U_FIFO1: FIFO
            use entity WF4.FIFO ( Blank )
            generic map ( PeToHostAlmostEmptyThresh => 30,
                          HostToPeAlmostEmptyThresh => 30,
                          HostToPeAlmostFullThresh  => 1,
                          PeToHostAlmostFullThresh  => 30 );
          end for; -- U_FIFO1
          -----------------------------------------------
          -- Configuration for Fifo 4
          -- (The configuration for the FIFO interfaces
          -- is located right at the top of this file.)
          -----------------------------------------------
          for U_FIFO4: FIFO
            use entity WF4.FIFO ( Blank )
            generic map ( PeToHostAlmostEmptyThresh => 30,
                          HostToPeAlmostEmptyThresh => 30,
                          HostToPeAlmostFullThresh  => 1,
                          PeToHostAlmostFullThresh  => 30 );
          end for; -- U_FIFO4
          -----------------------------------------------
          -- Configuration for Mezzanine Cards
          -----------------------------------------------
          for GEN_MEMS ( 0 to 4 )
            for U_MC: Mezzanine_Card
              use entity WF4.Mezzanine_Card ( Memory );
              for Memory
                for U_MEM: Memory_Part
                  use entity WF4.Memory_Part ( Dynamic )
                  generic map ( Load_File => "c:\temp\memory.txt",
                                Size      => 2**4 );
                end for; -- U_MEM
              end for; -- Memory
            end for; -- U_MC
          end for; -- GEN_MEMS
          -----------------------------------------------
          -- Configuration for Crossbar
          -----------------------------------------------
          for U_XBAR: Crossbar
            use entity WF4.Crossbar ( Behavior )
            generic map ( Config_File => "" );
          end for; -- U_XBAR
          -----------------------------------------------
          -- Configuration for External I/O board
          -----------------------------------------------
          for U_EXTIO: External_IO
            use entity wf4.External_IO ( None );
          end for; -- U_EXTIO
        end for; -- Structure
      end for; -- WF4BOARD
    end for; -- GEN_BOARDS
  end for; -- Behavior
end SystemConfig;

A.2.1.4 janbehav.vhd

-- janbehav.vhd holds all the synthesizable behavioral models for
-- the janus library.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

-- This code is used for synthesis.
-- It synthesizes to 0 CLBs because it is just wires.
-- This operator adds one bit to an n bit vector to create an n+1
-- bit vector. The bit can be added as the LSB or MSB.
entity addbit is
  generic (
    SIDE: INTEGER := 0;
    BIT_WIDTH: INTEGER := 8 -- width of the input vector
  );
  port (
    VECT_IN  : in  STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    BIT_IN   : in  STD_LOGIC_VECTOR(0 downto 0);
    VECT_OUT : out STD_LOGIC_VECTOR(BIT_WIDTH downto 0)
  );
end addbit;

architecture struct of addbit is
begin
  -- add bit as LSB (shift rest left)
  SIDE0: if( SIDE=0 ) generate
    process(VECT_IN, BIT_IN)
    begin
      VECT_OUT(0) <= BIT_IN(0);
      for I in 1 to BIT_WIDTH loop
        VECT_OUT(I) <= VECT_IN(I-1);
      end loop;
    end process;
  end generate;

  -- add bit as MSB
  SIDE1: if( SIDE=1 ) generate
    process(VECT_IN, BIT_IN)
    begin
      VECT_OUT(BIT_WIDTH) <= BIT_IN(0);
      for I in 0 to BIT_WIDTH-1 loop
        VECT_OUT(I) <= VECT_IN(I);
      end loop;
    end process;
  end generate;
end struct;

-- This is a registered AND gate.
-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity andgate is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end andgate;

architecture andgate_arch of andgate is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      RESULT <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      RESULT <= A and B;
    end if;
  end process;
end andgate_arch;

-- This code is used for simulation and is also synthesizable.
-- It synthesizes to 0 CLBs and 0 latency because it is just wires.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity asl is
  generic (
    SHIFT: INTEGER := 4;
    BIT_WIDTH: INTEGER := 8 -- width of the input vector
  );
  port (
    VECT_IN  : in  STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    VECT_OUT : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end asl;

architecture struct of asl is
begin
  -- Both assignments sit in one process so VECT_OUT has a single driver:
  -- all bits are cleared, then the upper bits are overwritten.
  process(VECT_IN)
  begin
    VECT_OUT <= (others => '0');
    VECT_OUT(BIT_WIDTH-1 downto SHIFT) <= VECT_IN(BIT_WIDTH-1-SHIFT downto 0);
  end process;
end struct;

-- This code is used for simulation and is also synthesizable.
-- It synthesizes to 0 CLBs because it is just wires.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity asr is
  generic (
    SHIFT: INTEGER := 5;
    BIT_WIDTH: INTEGER := 8 -- width of the input vector
  );
  port (
    VECT_IN  : in  STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    VECT_OUT : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end asr;

architecture struct of asr is
begin
  -- Single driver: clear all bits, then overwrite the lower bits.
  process(VECT_IN)
  begin
    VECT_OUT <= (others => '0');
    VECT_OUT(BIT_WIDTH-SHIFT-1 downto 0) <= VECT_IN(BIT_WIDTH-1 downto SHIFT);
  end process;
end struct;

-- Constant Operator
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity const is
  generic (
    BIT_WIDTH: INTEGER:=8;
    VALUE: INTEGER:=35
  );
  port (
    Q: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end const;

architecture struct of const is

  FUNCTION two_comp(vect : STD_LOGIC_VECTOR) RETURN STD_LOGIC_VECTOR IS
    variable local_vect : STD_LOGIC_VECTOR(vect'HIGH downto 0);
    variable toggle : BOOLEAN := FALSE;
  BEGIN
    FOR i IN 0 to vect'HIGH LOOP
      IF (toggle = TRUE) THEN
        IF (vect(i) = '0') THEN
          local_vect(i) := '1';
        ELSE
          local_vect(i) := '0';
        END IF;
      ELSE
        local_vect(i) := vect(i);
        IF (vect(i) = '1') THEN
          toggle := TRUE;
        END IF;
      END IF;
    END LOOP;
    RETURN local_vect;
  END two_comp;

  FUNCTION int_2_SLV( value, bitwidth : INTEGER ) RETURN STD_LOGIC_VECTOR IS
    VARIABLE running_value : INTEGER := value;
    VARIABLE running_result : STD_LOGIC_VECTOR(bitwidth-1 DOWNTO 0);
  BEGIN
    IF (value < 0) THEN
      running_value := -1 * value;
    END IF;
    FOR i IN 0 TO bitwidth-1 LOOP
      IF running_value MOD 2 = 0 THEN
        running_result(i) := '0';
      ELSE
        running_result(i) := '1';
      END IF;
      running_value := running_value/2;
    END LOOP;
    IF (value < 0) THEN -- find the two's complement
      RETURN two_comp(running_result);
    ELSE
      RETURN running_result;
    END IF;
  END int_2_SLV;

begin
  Q <= int_2_SLV(VALUE, BIT_WIDTH);
  --process
  --  variable QCONV:INTEGER:=VALUE;
  --begin
  --  Q <= CONV_STD_LOGIC_VECTOR(VALUE, BIT_WIDTH);
  --  for I in Q'high downto 0 loop
  --    if QCONV >= 2**I then
  --      Q(I) <= '1';
  --      QCONV := QCONV - 2**I;
  --    else
  --      Q(I) <= '0';
  --    end if;
  --  end loop;
  --  wait;
  --end process;
end struct;

library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- fbmux. Synthesizable.
entity FBMUX is
  generic (
    BIT_WIDTH: INTEGER:= 8
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    SEL: in STD_LOGIC_VECTOR(0 downto 0);
    O: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end FBMUX;

architecture FBMUX_ARCH of FBMUX is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      O <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      if SEL(0) = '0' then
        O <= A;
      end if;
    end if;
  end process;
end FBMUX_ARCH;

-- This code is used for simulation and is also synthesizable.
-- It synthesizes to 0 CLBs because it is just wires.
-- getbit extracts a bit from an input vector
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity getbit is
  generic (
    SIDE: INTEGER := 0;
    BITNUM: INTEGER := 0; -- bit to be extracted (0 = LSB)
    BIT_WIDTH: INTEGER := 2 -- width of input vector
  );
  port (
    VECT_IN : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    BIT_OUT : out STD_LOGIC_VECTOR(0 downto 0)
  );
end getbit;

architecture struct of getbit is
begin
  BIT_OUT(0) <= VECT_IN(BITNUM);
end struct;

-- 2 to 1 mux
-- This model can be synthesized
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity mux2 is
  generic (
    BIT_WIDTH: INTEGER:= 8
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    d0: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    d1: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    s0: in STD_LOGIC_VECTOR(0 downto 0);
    o: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end mux2;

architecture mux2_arch of mux2 is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      o <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      if s0(0) = '0' then
        o <= d0;
      else
        o <= d1;
      end if;
    end if;
  end process;
end mux2_arch;

-- 4 to 1 MUX
-- This model can be synthesized
--
-- s1 | s0 | Output
-- ----------------
--  0 |  0 | d0
--  0 |  1 | d1
--  1 |  0 | d2
--  1 |  1 | d3
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity mux4 is
  generic (
    BIT_WIDTH: INTEGER:= 8
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    d0: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    d1: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    d2: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    d3: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    s0: in STD_LOGIC_VECTOR(0 downto 0);
    s1: in STD_LOGIC_VECTOR(0 downto 0);
    o: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end mux4;

architecture mux4_arch of mux4 is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      o <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      if s1(0) = '0' then
        if s0(0) = '0' then
          o <= d0;
        else
          o <= d1;
        end if;
      else
        if s0(0) = '0' then
          o <= d2;
        else
          o <= d3;
        end if;
      end if;
    end if;
  end process;
end mux4_arch;

-- This is a registered NAND gate.
-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity nandgate is
  generic (
    BIT_WIDTH: INTEGER := 1 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end nandgate;

architecture nandgate_arch of nandgate is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      RESULT <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      RESULT <= A nand B;
    end if;
  end process;
end nandgate_arch;

-- Two's Complementer
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity NEGATE is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of the input vector
  );
  port(
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    Q: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
end NEGATE;

architecture negate_arch of NEGATE is
begin
  process(CLK, CLR)
    variable A_SIGNED: SIGNED(BIT_WIDTH-1 downto 0);
    variable SUM: SIGNED(BIT_WIDTH-1 downto 0);
  begin
    if CLR = '0' then
      Q <= (others => '0');
    elsif (CLK'EVENT and CLK='1') then
      A_SIGNED := SIGNED(A);
      SUM := -A_SIGNED;
      Q <= STD_LOGIC_VECTOR(SUM);
    end if;
  end process;
end negate_arch;

-- This is a registered NOR gate.
-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity norgate is
  generic (
    BIT_WIDTH: INTEGER := 1 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end norgate;

architecture norgate_arch of norgate is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      RESULT <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      RESULT <= A nor B;
    end if;
  end process;
end norgate_arch;

-- This is a registered NOT gate.
-- This file is used for simulation and is synthesizable.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity notgate is
  generic (
    BIT_WIDTH: INTEGER := 1 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end notgate;

architecture notgate_arch of notgate is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      RESULT <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      RESULT <= not A;
    end if;
  end process;
end notgate_arch;

-- This is a registered OR gate.
-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity orgate is
  generic (
    BIT_WIDTH: INTEGER := 1 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end orgate;

-- or gate
architecture orgate_arch of orgate is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      RESULT <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      RESULT <= A or B;
    end if;
  end process;
end orgate_arch;

-- registers
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity reg is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vector
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    D : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    Q : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end reg;

architecture struct of reg is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      Q <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      Q <= D;
    end if;
  end process;
end struct;

-- This code is used for simulation and is also synthesizable.
-- It synthesizes to 0 CLBs because it is just wires.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity rmvbit is
  generic (
    SIDE: INTEGER := 0;
    BIT_WIDTH: INTEGER := 8 -- width of the input vector
  );
  port (
    VECT_IN  : in  STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    VECT_OUT : out STD_LOGIC_VECTOR(BIT_WIDTH-2 downto 0);
    BIT_OUT  : out STD_LOGIC_VECTOR(0 downto 0)
  );
end rmvbit;

architecture struct of rmvbit is
begin
  --take away LSB
  SIDE0: if( SIDE=0 ) generate
    process(VECT_IN)
    begin
      BIT_OUT(0) <= VECT_IN(0);
      for I in 0 to BIT_WIDTH-2 loop
        VECT_OUT(I) <= VECT_IN(I+1);
      end loop;
    end process;
  end generate;

  --take away MSB
  SIDE1: if( SIDE=1 ) generate
    process(VECT_IN)
    begin
      BIT_OUT(0) <= VECT_IN(BIT_WIDTH-1);
      for I in 0 to BIT_WIDTH-2 loop
        VECT_OUT(I) <= VECT_IN(I);
      end loop;
    end process;
  end generate;
end struct;

-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity sadd is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    C_IN: in STD_LOGIC_VECTOR(0 downto 0);
    Q_OUT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    C_OUT: out STD_LOGIC_VECTOR(0 downto 0)
  );
end sadd;

architecture sadd_arch of sadd is
begin
  process(CLR, CLK)
    variable PADDED_CIN: STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    variable A_SIGNED: SIGNED(BIT_WIDTH-1 downto 0);
    variable C_SIGNED: SIGNED(BIT_WIDTH downto 0);
  begin
    if CLR = '0' then
      Q_OUT <= (others => '0');
      C_OUT(0) <= '0';
    elsif CLK'EVENT and CLK = '1' then
      A_SIGNED := SIGNED(A);
      PADDED_CIN := (others => '0');
      PADDED_CIN(0) := C_IN(0);
      C_SIGNED := CONV_SIGNED(A_SIGNED, BIT_WIDTH+1) + SIGNED(B) + SIGNED(PADDED_CIN);
      Q_OUT <= STD_LOGIC_VECTOR(C_SIGNED(BIT_WIDTH-1 downto 0));
      C_OUT(0) <= C_SIGNED(BIT_WIDTH);
    end if;
  end process;
end sadd_arch;

-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity scadd is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESET: in STD_LOGIC_VECTOR(0 downto 0);
    O: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end scadd;

architecture scadd_arch of scadd is
  signal TOTAL: STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
begin
  process(CLR, CLK)
    variable A_UNSIGNED: UNSIGNED(BIT_WIDTH-1 downto 0);
    variable T_UNSIGNED: UNSIGNED(BIT_WIDTH-1 downto 0);
    variable C_UNSIGNED: UNSIGNED(BIT_WIDTH downto 0);
  begin
    if CLR = '0' then
      O <= (others => '0');
      TOTAL <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      if RESET(0) = '0' then
        TOTAL <= (others => '0');
        O <= (others => '0');
      else
        A_UNSIGNED := UNSIGNED(A);
        T_UNSIGNED := UNSIGNED(TOTAL);
        C_UNSIGNED := CONV_UNSIGNED(A_UNSIGNED, BIT_WIDTH+1) +
                      CONV_UNSIGNED(T_UNSIGNED, BIT_WIDTH+1);
        O <= STD_LOGIC_VECTOR(C_UNSIGNED(BIT_WIDTH-1 downto 0));
        TOTAL <= STD_LOGIC_VECTOR(C_UNSIGNED(BIT_WIDTH-1 downto 0));
      end if;
    end if;
  end process;
end scadd_arch;

-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity scmp is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    EQ, GT: out STD_LOGIC_VECTOR(0 downto 0)
  );
end scmp;

architecture scmp_arch of scmp is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      EQ(0) <= '0';
      GT(0) <= '0';
    elsif CLK'EVENT and CLK = '1' then
      if A = B then
        EQ(0) <= '1';
        GT(0) <= '0';
      elsif (A(BIT_WIDTH-1) = '1' and B(BIT_WIDTH-1) = '1') or
            (A(BIT_WIDTH-1) = '0' and B(BIT_WIDTH-1) = '0') then
        if A < B then
          EQ(0) <= '0';
          GT(0) <= '0';
        else
          EQ(0) <= '0';
          GT(0) <= '1';
        end if;
      elsif A(BIT_WIDTH-1) = '0' and B(BIT_WIDTH-1) = '1' then
        EQ(0) <= '0';
        GT(0) <= '1';
      else
        EQ(0) <= '0';
        GT(0) <= '0';
      end if;
    end if;
  end process;
end scmp_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity sink is
  generic (
    PATH_DELAY: INTEGER := 0; -- delay until valid data (# clk cycles to wait before writing to memory)
    BIT_WIDTH: INTEGER := 32; -- width of input vector
    FILENAME: STRING:= "sink1.txt"; -- name of file to write to
    INIT0: INTEGER:= 0; TERM0: INTEGER:= 0; INC0: INTEGER:= 0;
    INIT1: INTEGER:= 0; TERM1: INTEGER:= 0; INC1: INTEGER:= 0;
    INIT2: INTEGER:= 0; TERM2: INTEGER:= 0; INC2: INTEGER:= 0
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    D : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end sink;

architecture synth of sink is
  signal FAKE_MEMORY_LOCATION: STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      FAKE_MEMORY_LOCATION <= (others => '0');
    elsif (CLK'EVENT and CLK = '1') then
      FAKE_MEMORY_LOCATION <= D; -- first attempt at really simple synthesizable sink
    end if;
  end process;
end synth;

-- The Source operator is for simulation only
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity source is
  generic (
    BIT_WIDTH: INTEGER := 32; -- width of output vector
    FILENAME: STRING:= "source1.txt"; -- name of file to read from
    INIT0: INTEGER:= 0; TERM0: INTEGER:= 0; INC0: INTEGER:= 0;
    INIT1: INTEGER:= 0; TERM1: INTEGER:= 0; INC1: INTEGER:= 0;
    INIT2: INTEGER:= 0; TERM2: INTEGER:= 0; INC2: INTEGER:= 0
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    Q : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end source;

architecture synth of source is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      Q <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      Q <= (others => '1'); -- The simplest of examples to start with
    end if;
  end process;
end synth;

-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity spmult is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    PRODH: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    PRODL: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end spmult;

architecture spmult_arch of spmult is
begin
  process(CLR, CLK)
    variable A_SIGNED: SIGNED(BIT_WIDTH-1 downto 0);
    variable B_SIGNED: SIGNED(BIT_WIDTH-1 downto 0);
    variable PROD: SIGNED((BIT_WIDTH*2)-1 downto 0);
  begin
    if CLR = '0' then
      PRODH <= (others => '0');
      PRODL <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      A_SIGNED := SIGNED(A);
      B_SIGNED := SIGNED(B);
      PROD := A_SIGNED * B_SIGNED;
      PRODH <= STD_LOGIC_VECTOR(PROD((BIT_WIDTH*2)-1 downto BIT_WIDTH));
      PRODL <= STD_LOGIC_VECTOR(PROD(BIT_WIDTH-1 downto 0));
    end if;
  end process;
end spmult_arch;

-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity ssub is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    C_IN: in STD_LOGIC_VECTOR(0 downto 0);
    Q_OUT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    C_OUT: out STD_LOGIC_VECTOR(0 downto 0)
  );
end ssub;

architecture ssub_arch of ssub is
begin
  process(CLR, CLK)
    variable PADDED_CIN: STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    variable A_SIGNED: SIGNED(BIT_WIDTH-1 downto 0);
    variable C_SIGNED: SIGNED(BIT_WIDTH downto 0);
  begin
    if CLR = '0' then
      Q_OUT <= (others => '0');
      C_OUT(0) <= '0';
    elsif CLK'EVENT and CLK = '1' then
      A_SIGNED := SIGNED(A);
      PADDED_CIN := (others => '0');
      PADDED_CIN(0) := C_IN(0);
      C_SIGNED := CONV_SIGNED(A_SIGNED, BIT_WIDTH+1) - SIGNED(B) - SIGNED(PADDED_CIN);
      Q_OUT <= STD_LOGIC_VECTOR(C_SIGNED(BIT_WIDTH-1 downto 0));
      C_OUT(0) <= C_SIGNED(BIT_WIDTH);
    end if;
  end process;
end ssub_arch;

-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity ucmp is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    EQ, GT: out STD_LOGIC_VECTOR(0 downto 0)
  );
end ucmp;

architecture ucmp_arch of ucmp is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      EQ(0) <= '0';
      GT(0) <= '0';
    elsif CLK'EVENT and CLK = '1' then
      if A = B then
        EQ(0) <= '1';
        GT(0) <= '0';
      elsif A > B then
        EQ(0) <= '0';
        GT(0) <= '1';
      else
        EQ(0) <= '0';
        GT(0) <= '0';
      end if;
    end if;
  end process;
end ucmp_arch;

-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity umax is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESET: in STD_LOGIC_VECTOR(0 downto 0);
    O: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end umax;

architecture umax_arch of umax is
  signal MAXNUM : STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      O <= (others => '0');
      MAXNUM <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      if RESET(0) = '0' then
        MAXNUM <= (others => '0');
      elsif A > MAXNUM then
        MAXNUM <= A;
      end if;
      O <= MAXNUM;
    end if;
  end process;
end umax_arch;

-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_arith.all;

entity umin is
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESET: in STD_LOGIC_VECTOR(0 downto 0);
    O: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end umin;

architecture umin_arch of umin is
  signal MINNUM : STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      O <= (others => '0');
      MINNUM <= (others => '1');
    elsif CLK'EVENT and CLK = '1' then
      if RESET(0) = '0' then
        MINNUM <= (others => '1');
      elsif A < MINNUM then
        MINNUM <= A;
      end if;
      O <= MINNUM;
    end if;
  end process;
end umin_arch;

-- This is a registered XOR gate.
-- This file is used for simulation and is synthesizable.
-- Currently, both input vectors must be the same size.
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity xorgate is
  generic (
    BIT_WIDTH: INTEGER := 1 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end xorgate;

architecture xorgate_arch of xorgate is
begin
  process(CLR, CLK)
  begin
    if CLR = '0' then
      RESULT <= (others => '0');
    elsif CLK'EVENT and CLK = '1' then
      RESULT <= A xor B;
    end if;
  end process;
end xorgate_arch;

A.2.1.5 janpack.vhd

library IEEE;
use IEEE.STD_LOGIC_1164.all;

package JANPACK is

  component addbit
    generic (
      SIDE: INTEGER := 0;
      BIT_WIDTH: INTEGER := 8 -- width of the input vector
    );
    port (
      VECT_IN  : in  STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      BIT_IN   : in  STD_LOGIC_VECTOR(0 downto 0);
      VECT_OUT : out STD_LOGIC_VECTOR(BIT_WIDTH downto 0)
    );
  end component;

  component andgate
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
  end component;

  component asl
    generic (
      SHIFT: INTEGER := 0; -- number of places to shift
      BIT_WIDTH: INTEGER := 8 -- width of the input vector
    );
    port (
      VECT_IN  : in  STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      VECT_OUT : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
    );
  end component;

  component asr
    generic (
      SHIFT: INTEGER := 0;
      BIT_WIDTH: INTEGER := 8 -- width of the input vector
    );
    port (
      VECT_IN  : in  STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      VECT_OUT : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
    );
  end component;

  component const
    generic (
      BIT_WIDTH: INTEGER;
      VALUE: INTEGER
    );
    port (
      Q: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
    );
  end component;

  component FBMUX
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: IN STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      SEL: IN STD_LOGIC_VECTOR(0 downto 0);
      O: OUT STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
  end component;

  component getbit
    generic (
      BITNUM: INTEGER := 0; -- bit to be extracted (0 = LSB)
      BIT_WIDTH: INTEGER := 8 -- width of input vector
    );
    port (
      VECT_IN : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      BIT_OUT : out STD_LOGIC_VECTOR(0 downto 0)
    );
  end component;

  component mux2
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      d0: IN STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      d1: IN STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      s0: IN STD_LOGIC_VECTOR(0 downto 0);
      o: OUT STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
  end component;

  component mux4
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      d0: IN STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      d1: IN STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      d2: IN STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      d3: IN STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      s0: IN STD_LOGIC_VECTOR(0 downto 0);
      s1: IN STD_LOGIC_VECTOR(0 downto 0);
      o: OUT STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
  end component;

  component nandgate
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
  end component;

  component negate
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of the input vector
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      Q : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
    );
  end component;

  component norgate
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
  end component;

  component notgate
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
  end component;

  component orgate
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0));
  end component;

  component reg
    generic (
      BIT_WIDTH: INTEGER := 8
    );
    PORT(
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      D : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      Q : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
    );
  end component;

  component rmvbit
    generic (
      SIDE: INTEGER := 0;
      BIT_WIDTH: INTEGER := 8 -- width of the input vector
    );
    port (
      VECT_IN  : in  STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      VECT_OUT : out STD_LOGIC_VECTOR(BIT_WIDTH-2 downto 0);
      BIT_OUT  : out STD_LOGIC_VECTOR(0 downto 0)
    );
  end component;

  component sadd
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      C_IN: in STD_LOGIC_VECTOR(0 downto 0);
      Q_OUT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      C_OUT: out STD_LOGIC_VECTOR(0 downto 0)
    );
  end component;

  component scadd
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      RESET: in STD_LOGIC_VECTOR(0 downto 0);
      O: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
    );
  end component;

  component scmp
    generic (
      BIT_WIDTH: INTEGER := 8 -- width of input vectors
    );
    PORT(
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A,B : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      EQ, GT : out STD_LOGIC_VECTOR(0 downto 0)
    );
  end component;

  component sink
    generic (
      PATH_DELAY: INTEGER := 0;
      BIT_WIDTH: INTEGER := 32; -- width of input vector
      FILENAME: STRING:= "sink1.txt"; -- name of file to write to
      INIT0: INTEGER:= 0;
    TERM0: INTEGER:= 0;
    INC0:INTEGER:= 0;
    INIT1: INTEGER:= 0;
    TERM1: INTEGER:= 0;
    INC1:INTEGER:= 0;
    INIT2: INTEGER:= 0;
    TERM2: INTEGER:= 0;
    INC2:INTEGER:= 0
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    D : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end component;

component source
  generic (
    BIT_WIDTH: INTEGER := 32; -- width of input vector
    FILENAME: STRING:= "source1.txt"; -- name of file to read from
    INIT0: INTEGER:= 0;
    TERM0: INTEGER:= 0;
    INC0:INTEGER:= 0;
    INIT1: INTEGER:= 0;
    TERM1: INTEGER:= 0;
    INC1:INTEGER:= 0;
    INIT2: INTEGER:= 0;
    TERM2: INTEGER:= 0;
    INC2:INTEGER:= 0
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    Q : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end component;

component spmult
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    PRODH: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    PRODL: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end component;

component sqrt
  generic (
    BIT_WIDTH: INTEGER:=8
  );
  port (
    din: IN STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    CLK: IN STD_LOGIC;
    ce: IN STD_LOGIC_VECTOR(0 downto 0);
    dout: OUT STD_LOGIC_VECTOR((BIT_WIDTH/2)-1 downto 0));
end component;

component ssub
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    C_IN: in STD_LOGIC_VECTOR(0 downto 0);
    Q_OUT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    C_OUT: out STD_LOGIC_VECTOR(0 downto 0)
  );
end component;

component ucmp
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  PORT(
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A,B : in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    EQ, GT : out STD_LOGIC_VECTOR(0 downto 0)
  );
end component;

component umax
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESET: in STD_LOGIC_VECTOR(0 downto 0);
    O: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end component;

component umin
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESET: in STD_LOGIC_VECTOR(0 downto 0);
    O: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end component;

component xorgate
  generic (
    BIT_WIDTH: INTEGER := 8 -- width of input vectors
  );
  port (
    CLK: in STD_LOGIC;
    CLR: in STD_LOGIC;
    A: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    B: in STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
    RESULT: out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0)
  );
end component;

end JANPACK;

A.2.1.6 Libinfo.txt

/****************************************************************
 * File           : Libinfo.txt
 * Original Author: James Atwell
 * Last Updated:
 *   10/12/98 ~ J Atwell
 *     - Changed multipliers to 32x32->32:32
 *   9/16/98 ~ J Atwell
 *     - added BIT_WIDTH attribute
 *   9/11/98 ~ J Atwell
 *     - Added feedback mux, fbmux
 *   9/7/98 ~ J Atwell
 *     - Fixed some syntax errors
 *     - Put in alphabetical order
 *   9/5/98 ~ J Atwell
 *     - Made minor changes so I/O is correct
 *   9/4/98 ~ J Atwell
 *     - Removed ocomp
 *     - changed tcomp to negate
 *     - changed min/max to umin/umax
 *     - added smin and smax
 *   9/2/98 ~ J Atwell
 *     - changed remvbit to rmvbit
 *   9/1/98 ~ J Hess
 *     - added bit widths to NAME attributes
 *     - added 'gate' to and, nand, or, nor, xor
 *   8/31/98 ~ J Atwell
 *     - added muxes defined by Jason
 *     - changed generic SIDE from left/right to 0/1
 *     - added estimates for unknown CLB counts
 *   8/26/98 ~ J Atwell
 *     - Changed removebit to remvbit
 *   8/24/98 ~ J Hess
 *     - Expanded operator list
 *   8/21/98 ~ J Atwell
 *     - Changed Mux CLB Counts
 *   8/13/98 ~ J Atwell
 *     - removed errant comment
 *   8/12/98 ~ J Atwell
 *     - added semicolons to end of each declaration
 *     - fixed sadd32 error
 *     - modified sink operator
 *   8/10/98 ~ J Atwell
 *     - added CLB counts for unsigned operators
 *     - modified comments to match C syntax
 *     - Removed const operators (commented out)
 *   8/7/98 ~ J Atwell
 *     - added unsigned comparators
 *   8/5/98 ~ J Atwell
 *     - added signed compare operators
 *     - added scmp CLB sizes
 *   7/31/98 ~ J Atwell
 *     - added constant operators
 *   7/29/98 ~ J Atwell
 *     - added CLB counts to the shift operators
 *     - redid logic operators to be registered
 *     - removed xnor operators
 *     - added CLB counts for logic operators
 *   7/27/98 ~ J Atwell
 *     - added arithmetic shift right operators
 *     - added arithmetic shift left operators
 *   7/17/98 ~ J Atwell
 *     - added join and split operators to library
 *   7/15/98 ~ J Atwell
 *     - put in LogiBLOX CLB Sizes
 *   7/10/98 ~ J Atwell
 *     - 1 bit logic operators completed
 *   7/8/98 ~ J Atwell
 *     - added 32 bit accumulator
 *   6/17/98 ~ J Atwell
 *     - fixed names of sqrt operators
 *     - added more split and join operators
 *     - removed wildforce memread/write
 *     - added source and sink operators
 *   6/16/98 ~ J Atwell
 *     - added square root operators
 *     - added split and join operators
 *     - added constant operator
 *   6/5/98 ~ J Atwell
 *     - Adjusted I/O list (removed C, CE, CLR)
 *     - added 32 bit functions
 *     - cleaned up logic functions
 *     - Changed add/sub to LogiBLOX components
 *   6/2/98 ~ J Atwell
 *     - changed wfmemin/out to wfmemread/write
 *   5/29/98 ~ J Atwell
 *     - added reg8, reg16 CLB Count and IO
 *     - added sadd8, sadd16 IO map
 *     - fixed complementer names
 *   5/26/98 ~ J Atwell
 ****************************************************************/

/******************************************************************************************************/
/* This file contains information about functions included in the library for our 'Tool'.           */
/* Most functions have 1 to 32 bit versions.                                                          */
/* Most functions are clocked with a minimum latency of 1 clock cycle.                                */
/* Those that are not clocked optimize out to just routing with a latency of 0.                       */
/*                                                                                                    */
/* The format of the information is as follows:                                                       */
/* Name<CLB=__, LAT=__, NAME=__>(Input1:Length(Bits), Input2:Length, ...)->(Output1:Length, ...)      */
/* Name will start with s or u for signed or unsigned (if relevant), followed by the                  */
/* function name, followed by 8 or 16 to indicate 8 bit or 16 bit function.                           */
/* CLB is the number of CLBs used to implement this function (some are estimated).                   */
/* LAT is the latency, or number of clock cycles until the output is valid. This is a                 */
/* 'start up' latency. All operators are pipelined such that once the pipeline is full,               */
/* the next output should be available after each clock cycle.                                        */
/* NAME will probably be used to specify the synthesis or XNF file.                                   */
/* CE is Clock Enable. This active high input enables transfer of data into the internal register.    */
/*                                                                                                    */
/******************************************************************************************************/

/* addbit creates a new vector with one bit added as the MSB or the LSB */
/* of the input vector. Use SIDE = 1 for MSB and SIDE = 0 for LSB. */
addbit<CLB=0, LAT=0, NAME=addbit1, SIDE = 0, BIT_WIDTH = 1> (VECT_IN:1, BIT_IN:1) -> (VECT_OUT:2);
addbit<CLB=0, LAT=0, NAME=addbit2, SIDE = 0, BIT_WIDTH = 2> (VECT_IN:2, BIT_IN:1) -> (VECT_OUT:3);
addbit<CLB=0, LAT=0, NAME=addbit3, SIDE = 0, BIT_WIDTH = 3> (VECT_IN:3, BIT_IN:1) -> (VECT_OUT:4);
addbit<CLB=0, LAT=0, NAME=addbit4, SIDE = 0, BIT_WIDTH = 4> (VECT_IN:4, BIT_IN:1) -> (VECT_OUT:5);
addbit<CLB=0, LAT=0, NAME=addbit5, SIDE = 0, BIT_WIDTH = 5> (VECT_IN:5, BIT_IN:1) -> (VECT_OUT:6);
addbit<CLB=0, LAT=0, NAME=addbit6, SIDE = 0, BIT_WIDTH = 6> (VECT_IN:6, BIT_IN:1) -> (VECT_OUT:7);
addbit<CLB=0, LAT=0, NAME=addbit7, SIDE = 0, BIT_WIDTH = 7> (VECT_IN:7, BIT_IN:1) -> (VECT_OUT:8);
addbit<CLB=0, LAT=0, NAME=addbit8, SIDE = 0, BIT_WIDTH = 8> (VECT_IN:8, BIT_IN:1) -> (VECT_OUT:9);
addbit<CLB=0, LAT=0, NAME=addbit9, SIDE = 0, BIT_WIDTH = 9> (VECT_IN:9, BIT_IN:1) -> (VECT_OUT:10);
addbit<CLB=0, LAT=0, NAME=addbit10, SIDE = 0, BIT_WIDTH = 10> (VECT_IN:10, BIT_IN:1) -> (VECT_OUT:11);
addbit<CLB=0, LAT=0, NAME=addbit11, SIDE = 0, BIT_WIDTH = 11> (VECT_IN:11, BIT_IN:1) -> (VECT_OUT:12);
addbit<CLB=0, LAT=0, NAME=addbit12, SIDE = 0, BIT_WIDTH = 12> (VECT_IN:12, BIT_IN:1) -> (VECT_OUT:13);
addbit<CLB=0, LAT=0, NAME=addbit13, SIDE = 0, BIT_WIDTH = 13> (VECT_IN:13, BIT_IN:1) -> (VECT_OUT:14);
addbit<CLB=0, LAT=0, NAME=addbit14, SIDE = 0, BIT_WIDTH = 14> (VECT_IN:14, BIT_IN:1) -> (VECT_OUT:15);
addbit<CLB=0, LAT=0, NAME=addbit15, SIDE = 0, BIT_WIDTH = 15> (VECT_IN:15, BIT_IN:1) -> (VECT_OUT:16);
addbit<CLB=0, LAT=0, NAME=addbit16, SIDE = 0, BIT_WIDTH = 16> (VECT_IN:16, BIT_IN:1) -> (VECT_OUT:17);
addbit<CLB=0, LAT=0, NAME=addbit17, SIDE = 0, BIT_WIDTH = 17> (VECT_IN:17, BIT_IN:1) -> (VECT_OUT:18);
addbit<CLB=0, LAT=0, NAME=addbit18, SIDE = 0, BIT_WIDTH = 18> (VECT_IN:18, BIT_IN:1) -> (VECT_OUT:19);
addbit<CLB=0, LAT=0, NAME=addbit19, SIDE = 0, BIT_WIDTH = 19> (VECT_IN:19, BIT_IN:1) -> (VECT_OUT:20);
addbit<CLB=0, LAT=0, NAME=addbit20, SIDE = 0, BIT_WIDTH = 20> (VECT_IN:20, BIT_IN:1) -> (VECT_OUT:21);
addbit<CLB=0, LAT=0, NAME=addbit21, SIDE = 0, BIT_WIDTH = 21> (VECT_IN:21, BIT_IN:1) -> (VECT_OUT:22);
addbit<CLB=0, LAT=0, NAME=addbit22, SIDE = 0, BIT_WIDTH = 22> (VECT_IN:22, BIT_IN:1) -> (VECT_OUT:23);
addbit<CLB=0, LAT=0, NAME=addbit23, SIDE = 0, BIT_WIDTH = 23> (VECT_IN:23, BIT_IN:1) -> (VECT_OUT:24);
addbit<CLB=0, LAT=0, NAME=addbit24, SIDE = 0, BIT_WIDTH = 24> (VECT_IN:24, BIT_IN:1) -> (VECT_OUT:25);
addbit<CLB=0, LAT=0, NAME=addbit25, SIDE = 0, BIT_WIDTH = 25> (VECT_IN:25, BIT_IN:1) -> (VECT_OUT:26);
addbit<CLB=0, LAT=0, NAME=addbit26, SIDE = 0, BIT_WIDTH = 26> (VECT_IN:26, BIT_IN:1) -> (VECT_OUT:27);
addbit<CLB=0, LAT=0, NAME=addbit27, SIDE = 0, BIT_WIDTH = 27> (VECT_IN:27, BIT_IN:1) -> (VECT_OUT:28);
addbit<CLB=0, LAT=0, NAME=addbit28, SIDE = 0, BIT_WIDTH = 28> (VECT_IN:28, BIT_IN:1) -> (VECT_OUT:29);
addbit<CLB=0, LAT=0, NAME=addbit29, SIDE = 0, BIT_WIDTH = 29> (VECT_IN:29, BIT_IN:1) -> (VECT_OUT:30);
addbit<CLB=0, LAT=0, NAME=addbit30, SIDE = 0, BIT_WIDTH = 30> (VECT_IN:30, BIT_IN:1) -> (VECT_OUT:31);
addbit<CLB=0, LAT=0, NAME=addbit31, SIDE = 0, BIT_WIDTH = 31> (VECT_IN:31, BIT_IN:1) -> (VECT_OUT:32);

/*ands*/
/* generated by hand*/
andgate<CLB=2, LAT=1, NAME=and1, BIT_WIDTH = 1>(A:1, B:1)->(RESULT:1); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=3, LAT=1, NAME=and2, BIT_WIDTH = 2>(A:2, B:2)->(RESULT:2); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=4, LAT=1, NAME=and3, BIT_WIDTH = 3>(A:3, B:3)->(RESULT:3); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=5, LAT=1, NAME=and4, BIT_WIDTH = 4>(A:4, B:4)->(RESULT:4); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=6, LAT=1, NAME=and5, BIT_WIDTH = 5>(A:5, B:5)->(RESULT:5); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=7, LAT=1, NAME=and6, BIT_WIDTH = 6>(A:6, B:6)->(RESULT:6); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=8, LAT=1, NAME=and7, BIT_WIDTH = 7>(A:7, B:7)->(RESULT:7); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=9, LAT=1, NAME=and8, BIT_WIDTH = 8>(A:8, B:8)->(RESULT:8); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=10, LAT=1, NAME=and9, BIT_WIDTH = 9>(A:9, B:9)->(RESULT:9); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=11, LAT=1, NAME=and10, BIT_WIDTH = 10>(A:10, B:10)->(RESULT:10); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=12, LAT=1, NAME=and11, BIT_WIDTH = 11>(A:11, B:11)->(RESULT:11); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=13, LAT=1, NAME=and12, BIT_WIDTH = 12>(A:12, B:12)->(RESULT:12); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=14, LAT=1, NAME=and13, BIT_WIDTH = 13>(A:13, B:13)->(RESULT:13); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=15, LAT=1, NAME=and14, BIT_WIDTH = 14>(A:14, B:14)->(RESULT:14); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=16, LAT=1, NAME=and15, BIT_WIDTH = 15>(A:15, B:15)->(RESULT:15); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=17, LAT=1, NAME=and16, BIT_WIDTH = 16>(A:16, B:16)->(RESULT:16); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=18, LAT=1, NAME=and17, BIT_WIDTH = 17>(A:17, B:17)->(RESULT:17); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=19, LAT=1, NAME=and18, BIT_WIDTH = 18>(A:18, B:18)->(RESULT:18); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=20, LAT=1, NAME=and19, BIT_WIDTH = 19>(A:19, B:19)->(RESULT:19); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=21, LAT=1, NAME=and20, BIT_WIDTH = 20>(A:20, B:20)->(RESULT:20); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=22, LAT=1, NAME=and21, BIT_WIDTH = 21>(A:21, B:21)->(RESULT:21); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=23, LAT=1, NAME=and22, BIT_WIDTH = 22>(A:22, B:22)->(RESULT:22); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=24, LAT=1, NAME=and23, BIT_WIDTH = 23>(A:23, B:23)->(RESULT:23); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=25, LAT=1, NAME=and24, BIT_WIDTH = 24>(A:24, B:24)->(RESULT:24); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=26, LAT=1, NAME=and25, BIT_WIDTH = 25>(A:25, B:25)->(RESULT:25); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=27, LAT=1, NAME=and26, BIT_WIDTH = 26>(A:26, B:26)->(RESULT:26); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=28, LAT=1, NAME=and27, BIT_WIDTH = 27>(A:27, B:27)->(RESULT:27); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=29, LAT=1, NAME=and28, BIT_WIDTH = 28>(A:28, B:28)->(RESULT:28); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=30, LAT=1, NAME=and29, BIT_WIDTH = 29>(A:29, B:29)->(RESULT:29); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=31, LAT=1, NAME=and30, BIT_WIDTH = 30>(A:30, B:30)->(RESULT:30); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=32, LAT=1, NAME=and31, BIT_WIDTH = 31>(A:31, B:31)->(RESULT:31); /* Other Inputs (CLK:1, CLR:1)*/
andgate<CLB=33, LAT=1, NAME=and32, BIT_WIDTH = 32>(A:32, B:32)->(RESULT:32); /* Other Inputs (CLK:1, CLR:1)*/

/* Parameterized Shift Operators */
/* 2's complement shift, where # of shifts is determined by SHIFT attribute*/
/* generated by hand. no XNF files for these, just synthesizable code*/
/* the generic SHIFT should be entered as an INTEGER*/
/* asl is arithmetic shift left. bit shifted in is 0 so functionally the same as logical shift left.*/
asl<CLB=0, LAT=0, NAME=asl1, SHIFT=0, BIT_WIDTH = 1>(VECT_IN:1)->(VECT_OUT:1); /*this one doesn't really make sense*/
asl<CLB=0, LAT=0, NAME=asl2, SHIFT=0, BIT_WIDTH = 2>(VECT_IN:2)->(VECT_OUT:2);
asl<CLB=0, LAT=0, NAME=asl3, SHIFT=0, BIT_WIDTH = 3>(VECT_IN:3)->(VECT_OUT:3);
asl<CLB=0, LAT=0, NAME=asl4, SHIFT=0, BIT_WIDTH = 4>(VECT_IN:4)->(VECT_OUT:4);
asl<CLB=0, LAT=0, NAME=asl5, SHIFT=0, BIT_WIDTH = 5>(VECT_IN:5)->(VECT_OUT:5);
asl<CLB=0, LAT=0, NAME=asl6, SHIFT=0, BIT_WIDTH = 6>(VECT_IN:6)->(VECT_OUT:6);
asl<CLB=0, LAT=0, NAME=asl7, SHIFT=0, BIT_WIDTH = 7>(VECT_IN:7)->(VECT_OUT:7);
asl<CLB=0, LAT=0, NAME=asl8, SHIFT=0, BIT_WIDTH = 8>(VECT_IN:8)->(VECT_OUT:8);
asl<CLB=0, LAT=0, NAME=asl9, SHIFT=0, BIT_WIDTH = 9>(VECT_IN:9)->(VECT_OUT:9);
asl<CLB=0, LAT=0, NAME=asl10, SHIFT=0, BIT_WIDTH = 10>(VECT_IN:10)->(VECT_OUT:10);
asl<CLB=0, LAT=0, NAME=asl11, SHIFT=0, BIT_WIDTH = 11>(VECT_IN:11)->(VECT_OUT:11);
asl<CLB=0, LAT=0, NAME=asl12, SHIFT=0, BIT_WIDTH = 12>(VECT_IN:12)->(VECT_OUT:12);
asl<CLB=0, LAT=0, NAME=asl13, SHIFT=0, BIT_WIDTH = 13>(VECT_IN:13)->(VECT_OUT:13);
asl<CLB=0, LAT=0, NAME=asl14, SHIFT=0, BIT_WIDTH = 14>(VECT_IN:14)->(VECT_OUT:14);
asl<CLB=0, LAT=0, NAME=asl15, SHIFT=0, BIT_WIDTH = 15>(VECT_IN:15)->(VECT_OUT:15);
asl<CLB=0, LAT=0, NAME=asl16, SHIFT=0, BIT_WIDTH = 16>(VECT_IN:16)->(VECT_OUT:16);
asl<CLB=0, LAT=0, NAME=asl17, SHIFT=0, BIT_WIDTH = 17>(VECT_IN:17)->(VECT_OUT:17);
asl<CLB=0, LAT=0, NAME=asl18, SHIFT=0, BIT_WIDTH = 18>(VECT_IN:18)->(VECT_OUT:18);
asl<CLB=0, LAT=0, NAME=asl19, SHIFT=0, BIT_WIDTH = 19>(VECT_IN:19)->(VECT_OUT:19);
asl<CLB=0, LAT=0, NAME=asl20, SHIFT=0, BIT_WIDTH = 20>(VECT_IN:20)->(VECT_OUT:20);
asl<CLB=0, LAT=0, NAME=asl21, SHIFT=0, BIT_WIDTH = 21>(VECT_IN:21)->(VECT_OUT:21);
asl<CLB=0, LAT=0, NAME=asl22, SHIFT=0, BIT_WIDTH = 22>(VECT_IN:22)->(VECT_OUT:22);
asl<CLB=0, LAT=0, NAME=asl23, SHIFT=0, BIT_WIDTH = 23>(VECT_IN:23)->(VECT_OUT:23);
asl<CLB=0, LAT=0, NAME=asl24, SHIFT=0, BIT_WIDTH = 24>(VECT_IN:24)->(VECT_OUT:24);
asl<CLB=0, LAT=0, NAME=asl25, SHIFT=0, BIT_WIDTH = 25>(VECT_IN:25)->(VECT_OUT:25);
asl<CLB=0, LAT=0, NAME=asl26, SHIFT=0, BIT_WIDTH = 26>(VECT_IN:26)->(VECT_OUT:26);
asl<CLB=0, LAT=0, NAME=asl27, SHIFT=0, BIT_WIDTH = 27>(VECT_IN:27)->(VECT_OUT:27);
asl<CLB=0, LAT=0, NAME=asl28, SHIFT=0, BIT_WIDTH = 28>(VECT_IN:28)->(VECT_OUT:28);
asl<CLB=0, LAT=0, NAME=asl29, SHIFT=0, BIT_WIDTH = 29>(VECT_IN:29)->(VECT_OUT:29);
asl<CLB=0, LAT=0, NAME=asl30, SHIFT=0, BIT_WIDTH = 30>(VECT_IN:30)->(VECT_OUT:30);
asl<CLB=0, LAT=0, NAME=asl31, SHIFT=0, BIT_WIDTH = 31>(VECT_IN:31)->(VECT_OUT:31);
asl<CLB=0, LAT=0, NAME=asl32, SHIFT=0, BIT_WIDTH = 32>(VECT_IN:32)->(VECT_OUT:32);

/* asr is arithmetic shift right. (bit shifted in is same as high bit)*/
asr<CLB=0, LAT=0, NAME=asr1, SHIFT=0, BIT_WIDTH = 1>(VECT_IN:1)->(VECT_OUT:1); /*this one doesn't really make sense*/
asr<CLB=0, LAT=0, NAME=asr2, SHIFT=0, BIT_WIDTH = 2>(VECT_IN:2)->(VECT_OUT:2);
asr<CLB=0, LAT=0, NAME=asr3, SHIFT=0, BIT_WIDTH = 3>(VECT_IN:3)->(VECT_OUT:3);
asr<CLB=0, LAT=0, NAME=asr4, SHIFT=0, BIT_WIDTH = 4>(VECT_IN:4)->(VECT_OUT:4);
asr<CLB=0, LAT=0, NAME=asr5, SHIFT=0, BIT_WIDTH = 5>(VECT_IN:5)->(VECT_OUT:5);
asr<CLB=0, LAT=0, NAME=asr6, SHIFT=0, BIT_WIDTH = 6>(VECT_IN:6)->(VECT_OUT:6);
asr<CLB=0, LAT=0, NAME=asr7, SHIFT=0, BIT_WIDTH = 7>(VECT_IN:7)->(VECT_OUT:7);
asr<CLB=0, LAT=0, NAME=asr8, SHIFT=0, BIT_WIDTH = 8>(VECT_IN:8)->(VECT_OUT:8);
asr<CLB=0, LAT=0, NAME=asr9, SHIFT=0, BIT_WIDTH = 9>(VECT_IN:9)->(VECT_OUT:9);
asr<CLB=0, LAT=0, NAME=asr10, SHIFT=0, BIT_WIDTH = 10>(VECT_IN:10)->(VECT_OUT:10);
asr<CLB=0, LAT=0, NAME=asr11, SHIFT=0, BIT_WIDTH = 11>(VECT_IN:11)->(VECT_OUT:11);
asr<CLB=0, LAT=0, NAME=asr12, SHIFT=0, BIT_WIDTH = 12>(VECT_IN:12)->(VECT_OUT:12);
asr<CLB=0, LAT=0, NAME=asr13, SHIFT=0, BIT_WIDTH = 13>(VECT_IN:13)->(VECT_OUT:13);
asr<CLB=0, LAT=0, NAME=asr14, SHIFT=0, BIT_WIDTH = 14>(VECT_IN:14)->(VECT_OUT:14);
asr<CLB=0, LAT=0, NAME=asr15, SHIFT=0, BIT_WIDTH = 15>(VECT_IN:15)->(VECT_OUT:15);
asr<CLB=0, LAT=0, NAME=asr16, SHIFT=0, BIT_WIDTH = 16>(VECT_IN:16)->(VECT_OUT:16);
asr<CLB=0, LAT=0, NAME=asr17, SHIFT=0, BIT_WIDTH = 17>(VECT_IN:17)->(VECT_OUT:17);
asr<CLB=0, LAT=0, NAME=asr18, SHIFT=0, BIT_WIDTH = 18>(VECT_IN:18)->(VECT_OUT:18);
asr<CLB=0, LAT=0, NAME=asr19, SHIFT=0, BIT_WIDTH = 19>(VECT_IN:19)->(VECT_OUT:19);
asr<CLB=0, LAT=0, NAME=asr20, SHIFT=0, BIT_WIDTH = 20>(VECT_IN:20)->(VECT_OUT:20);
asr<CLB=0, LAT=0, NAME=asr21, SHIFT=0, BIT_WIDTH = 21>(VECT_IN:21)->(VECT_OUT:21);
asr<CLB=0, LAT=0, NAME=asr22, SHIFT=0, BIT_WIDTH = 22>(VECT_IN:22)->(VECT_OUT:22);
asr<CLB=0, LAT=0, NAME=asr23, SHIFT=0, BIT_WIDTH = 23>(VECT_IN:23)->(VECT_OUT:23);
asr<CLB=0, LAT=0, NAME=asr24, SHIFT=0, BIT_WIDTH = 24>(VECT_IN:24)->(VECT_OUT:24);
asr<CLB=0, LAT=0, NAME=asr25, SHIFT=0, BIT_WIDTH = 25>(VECT_IN:25)->(VECT_OUT:25);
asr<CLB=0, LAT=0, NAME=asr26, SHIFT=0, BIT_WIDTH = 26>(VECT_IN:26)->(VECT_OUT:26);
asr<CLB=0, LAT=0, NAME=asr27, SHIFT=0, BIT_WIDTH = 27>(VECT_IN:27)->(VECT_OUT:27);
asr<CLB=0, LAT=0, NAME=asr28, SHIFT=0, BIT_WIDTH = 28>(VECT_IN:28)->(VECT_OUT:28);
asr<CLB=0, LAT=0, NAME=asr29, SHIFT=0, BIT_WIDTH = 29>(VECT_IN:29)->(VECT_OUT:29);
asr<CLB=0, LAT=0, NAME=asr30, SHIFT=0, BIT_WIDTH = 30>(VECT_IN:30)->(VECT_OUT:30);
asr<CLB=0, LAT=0, NAME=asr31, SHIFT=0, BIT_WIDTH = 31>(VECT_IN:31)->(VECT_OUT:31);
asr<CLB=0, LAT=0, NAME=asr32, SHIFT=0, BIT_WIDTH = 32>(VECT_IN:32)->(VECT_OUT:32);

/*feedback mux.*/
/* The mux holds its current value if the select line is 1.*/
/* Otherwise the mux passes the input value.*/
fbmux<CLB=1, LAT=1, NAME=fbmux1, BIT_WIDTH = 1>(A:1, SEL:1)->(O:1);
fbmux<CLB=1, LAT=1, NAME=fbmux2, BIT_WIDTH = 2>(A:2, SEL:1)->(O:2);
fbmux<CLB=2, LAT=1, NAME=fbmux3, BIT_WIDTH = 3>(A:3, SEL:1)->(O:3);
fbmux<CLB=2, LAT=1, NAME=fbmux4, BIT_WIDTH = 4>(A:4, SEL:1)->(O:4);
fbmux<CLB=3, LAT=1, NAME=fbmux5, BIT_WIDTH = 5>(A:5, SEL:1)->(O:5);
fbmux<CLB=3, LAT=1, NAME=fbmux6, BIT_WIDTH = 6>(A:6, SEL:1)->(O:6);
fbmux<CLB=4, LAT=1, NAME=fbmux7, BIT_WIDTH = 7>(A:7, SEL:1)->(O:7);
fbmux<CLB=4, LAT=1, NAME=fbmux8, BIT_WIDTH = 8>(A:8, SEL:1)->(O:8);
fbmux<CLB=5, LAT=1, NAME=fbmux9, BIT_WIDTH = 9>(A:9, SEL:1)->(O:9);
fbmux<CLB=5, LAT=1, NAME=fbmux10, BIT_WIDTH = 10>(A:10, SEL:1)->(O:10);
fbmux<CLB=6, LAT=1, NAME=fbmux11, BIT_WIDTH = 11>(A:11, SEL:1)->(O:11);
fbmux<CLB=6, LAT=1, NAME=fbmux12, BIT_WIDTH = 12>(A:12, SEL:1)->(O:12);
fbmux<CLB=7, LAT=1, NAME=fbmux13, BIT_WIDTH = 13>(A:13, SEL:1)->(O:13);
fbmux<CLB=7, LAT=1, NAME=fbmux14, BIT_WIDTH = 14>(A:14, SEL:1)->(O:14);
fbmux<CLB=8, LAT=1, NAME=fbmux15, BIT_WIDTH = 15>(A:15, SEL:1)->(O:15);
fbmux<CLB=8, LAT=1, NAME=fbmux16, BIT_WIDTH = 16>(A:16, SEL:1)->(O:16);
fbmux<CLB=9, LAT=1, NAME=fbmux17, BIT_WIDTH = 17>(A:17, SEL:1)->(O:17);
fbmux<CLB=9, LAT=1, NAME=fbmux18, BIT_WIDTH = 18>(A:18, SEL:1)->(O:18);
fbmux<CLB=10, LAT=1, NAME=fbmux19, BIT_WIDTH = 19>(A:19, SEL:1)->(O:19);
fbmux<CLB=10, LAT=1, NAME=fbmux20, BIT_WIDTH = 20>(A:20, SEL:1)->(O:20);
fbmux<CLB=11, LAT=1, NAME=fbmux21, BIT_WIDTH = 21>(A:21, SEL:1)->(O:21);
fbmux<CLB=11, LAT=1, NAME=fbmux22, BIT_WIDTH = 22>(A:22, SEL:1)->(O:22);
fbmux<CLB=12, LAT=1, NAME=fbmux23, BIT_WIDTH = 23>(A:23, SEL:1)->(O:23);
fbmux<CLB=12, LAT=1, NAME=fbmux24, BIT_WIDTH = 24>(A:24, SEL:1)->(O:24);
fbmux<CLB=13, LAT=1, NAME=fbmux25, BIT_WIDTH = 25>(A:25, SEL:1)->(O:25);
fbmux<CLB=13, LAT=1, NAME=fbmux26, BIT_WIDTH = 26>(A:26, SEL:1)->(O:26);
fbmux<CLB=14, LAT=1, NAME=fbmux27, BIT_WIDTH = 27>(A:27, SEL:1)->(O:27);
fbmux<CLB=14, LAT=1, NAME=fbmux28, BIT_WIDTH = 28>(A:28, SEL:1)->(O:28);
fbmux<CLB=15, LAT=1, NAME=fbmux29, BIT_WIDTH = 29>(A:29, SEL:1)->(O:29);
fbmux<CLB=15, LAT=1, NAME=fbmux30, BIT_WIDTH = 30>(A:30, SEL:1)->(O:30);
fbmux<CLB=16, LAT=1, NAME=fbmux31, BIT_WIDTH = 31>(A:31, SEL:1)->(O:31);
fbmux<CLB=16, LAT=1, NAME=fbmux32, BIT_WIDTH = 32>(A:32, SEL:1)->(O:32);

/*Bit adding, removal and access*/
getbit<CLB=0, LAT=0, NAME=getbit2, BITNUM=0, BIT_WIDTH = 2>(VECT_IN:2) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit3, BITNUM=0, BIT_WIDTH = 3>(VECT_IN:3) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit4, BITNUM=0, BIT_WIDTH = 4>(VECT_IN:4) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit5, BITNUM=0, BIT_WIDTH = 5>(VECT_IN:5) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit6, BITNUM=0, BIT_WIDTH = 6>(VECT_IN:6) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit7, BITNUM=0, BIT_WIDTH = 7>(VECT_IN:7) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit8, BITNUM=0, BIT_WIDTH = 8>(VECT_IN:8) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit9, BITNUM=0, BIT_WIDTH = 9>(VECT_IN:9) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit10, BITNUM=0, BIT_WIDTH = 10>(VECT_IN:10) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit11, BITNUM=0, BIT_WIDTH = 11>(VECT_IN:11) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit12, BITNUM=0, BIT_WIDTH = 12>(VECT_IN:12) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit13, BITNUM=0, BIT_WIDTH = 13>(VECT_IN:13) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit14, BITNUM=0, BIT_WIDTH = 14>(VECT_IN:14) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit15, BITNUM=0, BIT_WIDTH = 15>(VECT_IN:15) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit16, BITNUM=0, BIT_WIDTH = 16>(VECT_IN:16) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit17, BITNUM=0, BIT_WIDTH = 17>(VECT_IN:17) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit18, BITNUM=0, BIT_WIDTH = 18>(VECT_IN:18) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit19, BITNUM=0, BIT_WIDTH = 19>(VECT_IN:19) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit20, BITNUM=0, BIT_WIDTH = 20>(VECT_IN:20) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit21, BITNUM=0, BIT_WIDTH = 21>(VECT_IN:21) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit22, BITNUM=0, BIT_WIDTH = 22>(VECT_IN:22) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit23, BITNUM=0, BIT_WIDTH = 23>(VECT_IN:23) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit24, BITNUM=0, BIT_WIDTH = 24>(VECT_IN:24) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit25, BITNUM=0, BIT_WIDTH = 25>(VECT_IN:25) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit26, BITNUM=0, BIT_WIDTH = 26>(VECT_IN:26) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit27, BITNUM=0, BIT_WIDTH = 27>(VECT_IN:27) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit28, BITNUM=0, BIT_WIDTH = 28>(VECT_IN:28) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit29, BITNUM=0, BIT_WIDTH = 29>(VECT_IN:29) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit30, BITNUM=0, BIT_WIDTH = 30>(VECT_IN:30) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit31, BITNUM=0, BIT_WIDTH = 31>(VECT_IN:31) -> (BIT_OUT:1);
getbit<CLB=0, LAT=0, NAME=getbit32, BITNUM=0, BIT_WIDTH = 32>(VECT_IN:32) -> (BIT_OUT:1);

/*muxes (mux2_8 means 8 bit 2 to 1 mux, mux4_8 means 8 bit 4 to 1 mux)*/
/* generated with Xilinx Core Generator*/
/*2x1 muxes*/
mux2<CLB=1, LAT=1, NAME=mux2_1, BIT_WIDTH = 1>(D0:1, D1:1, S0:1)->(O:1);
mux2<CLB=1, LAT=1, NAME=mux2_2, BIT_WIDTH = 2>(D0:2, D1:2, S0:1)->(O:2);
mux2<CLB=2, LAT=1, NAME=mux2_3, BIT_WIDTH = 3>(D0:3, D1:3, S0:1)->(O:3);
mux2<CLB=2, LAT=1, NAME=mux2_4, BIT_WIDTH = 4>(D0:4, D1:4, S0:1)->(O:4);
mux2<CLB=3, LAT=1, NAME=mux2_5, BIT_WIDTH = 5>(D0:5, D1:5, S0:1)->(O:5);
mux2<CLB=3, LAT=1, NAME=mux2_6, BIT_WIDTH = 6>(D0:6, D1:6, S0:1)->(O:6);
mux2<CLB=4, LAT=1, NAME=mux2_7, BIT_WIDTH = 7>(D0:7, D1:7, S0:1)->(O:7);
mux2<CLB=4, LAT=1, NAME=mux2_8, BIT_WIDTH = 8>(D0:8, D1:8, S0:1)->(O:8);
mux2<CLB=5, LAT=1, NAME=mux2_9, BIT_WIDTH = 9>(D0:9, D1:9, S0:1)->(O:9);
mux2<CLB=5, LAT=1, NAME=mux2_10, BIT_WIDTH = 10>(D0:10, D1:10, S0:1)->(O:10);
mux2<CLB=6, LAT=1, NAME=mux2_11, BIT_WIDTH = 11>(D0:11, D1:11, S0:1)->(O:11);
mux2<CLB=6, LAT=1, NAME=mux2_12, BIT_WIDTH = 12>(D0:12, D1:12, S0:1)->(O:12);
mux2<CLB=7, LAT=1, NAME=mux2_13, BIT_WIDTH = 13>(D0:13, D1:13, S0:1)->(O:13);
mux2<CLB=7, LAT=1, NAME=mux2_14, BIT_WIDTH = 14>(D0:14, D1:14, S0:1)->(O:14);
mux2<CLB=8, LAT=1, NAME=mux2_15, BIT_WIDTH = 15>(D0:15, D1:15, S0:1)->(O:15);
mux2<CLB=8, LAT=1, NAME=mux2_16, BIT_WIDTH = 16>(D0:16, D1:16, S0:1)->(O:16);
mux2<CLB=9, LAT=1, NAME=mux2_17, BIT_WIDTH = 17>(D0:17, D1:17, S0:1)->(O:17);
mux2<CLB=9, LAT=1, NAME=mux2_18, BIT_WIDTH = 18>(D0:18, D1:18, S0:1)->(O:18);
mux2<CLB=10, LAT=1, NAME=mux2_19, BIT_WIDTH = 19>(D0:19, D1:19, S0:1)->(O:19);
mux2<CLB=10, LAT=1, NAME=mux2_20, BIT_WIDTH = 20>(D0:20, D1:20, S0:1)->(O:20);
mux2<CLB=11, LAT=1, NAME=mux2_21, BIT_WIDTH = 21>(D0:21, D1:21, S0:1)->(O:21);
mux2<CLB=11, LAT=1, NAME=mux2_22, BIT_WIDTH = 22>(D0:22, D1:22, S0:1)->(O:22);
mux2<CLB=12, LAT=1, NAME=mux2_23, BIT_WIDTH = 23>(D0:23, D1:23, S0:1)->(O:23);
mux2<CLB=12, LAT=1, NAME=mux2_24, BIT_WIDTH = 24>(D0:24, D1:24, S0:1)->(O:24);
mux2<CLB=13, LAT=1, NAME=mux2_25, BIT_WIDTH = 25>(D0:25, D1:25, S0:1)->(O:25);
mux2<CLB=13, LAT=1, NAME=mux2_26, BIT_WIDTH = 26>(D0:26, D1:26, S0:1)->(O:26);
mux2<CLB=14, LAT=1, NAME=mux2_27, BIT_WIDTH = 27>(D0:27, D1:27, S0:1)->(O:27);
mux2<CLB=14, LAT=1, NAME=mux2_28, BIT_WIDTH = 28>(D0:28, D1:28, S0:1)->(O:28);
mux2<CLB=15, LAT=1, NAME=mux2_29, BIT_WIDTH = 29>(D0:29, D1:29, S0:1)->(O:29);
mux2<CLB=15, LAT=1, NAME=mux2_30, BIT_WIDTH = 30>(D0:30, D1:30, S0:1)->(O:30);
mux2<CLB=16, LAT=1, NAME=mux2_31, BIT_WIDTH = 31>(D0:31, D1:31, S0:1)->(O:31);
mux2<CLB=16, LAT=1, NAME=mux2_32, BIT_WIDTH = 32>(D0:32, D1:32, S0:1)->(O:32);

/*4x1 muxes*/
mux4<CLB=1, LAT=1, NAME=mux4_1, BIT_WIDTH = 1>(D0:1, D1:1, D2:1, D3:1, S0:1, S1:1)->(O:1);
mux4<CLB=2, LAT=1, NAME=mux4_2, BIT_WIDTH = 2>(D0:2, D1:2, D2:2, D3:2, S0:1, S1:1)->(O:2);
mux4<CLB=3, LAT=1, NAME=mux4_3, BIT_WIDTH = 3>(D0:3, D1:3, D2:3, D3:3, S0:1, S1:1)->(O:3);
mux4<CLB=4, LAT=1, NAME=mux4_4, BIT_WIDTH = 4>(D0:4, D1:4, D2:4, D3:4, S0:1, S1:1)->(O:4);
mux4<CLB=5, LAT=1, NAME=mux4_5, BIT_WIDTH = 5>(D0:5, D1:5, D2:5, D3:5, S0:1, S1:1)->(O:5);
mux4<CLB=6, LAT=1, NAME=mux4_6, BIT_WIDTH = 6>(D0:6, D1:6, D2:6, D3:6, S0:1, S1:1)->(O:6);
mux4<CLB=7, LAT=1, NAME=mux4_7, BIT_WIDTH = 7>(D0:7, D1:7, D2:7, D3:7, S0:1, S1:1)->(O:7);
mux4<CLB=8, LAT=1, NAME=mux4_8, BIT_WIDTH = 8>(D0:8, D1:8, D2:8, D3:8, S0:1, S1:1)->(O:8);
mux4<CLB=9, LAT=1, NAME=mux4_9, BIT_WIDTH = 9>(D0:9, D1:9, D2:9, D3:9, S0:1, S1:1)->(O:9);
mux4<CLB=10, LAT=1, NAME=mux4_10, BIT_WIDTH = 10>(D0:10, D1:10, D2:10, D3:10, S0:1, S1:1)->(O:10);
mux4<CLB=11, LAT=1, NAME=mux4_11, BIT_WIDTH = 11>(D0:11, D1:11, D2:11, D3:11, S0:1, S1:1)->(O:11);
mux4<CLB=12, LAT=1, NAME=mux4_12, BIT_WIDTH = 12>(D0:12, D1:12, D2:12, D3:12, S0:1, S1:1)->(O:12);
mux4<CLB=13, LAT=1, NAME=mux4_13, BIT_WIDTH = 13>(D0:13, D1:13, D2:13, D3:13, S0:1, S1:1)->(O:13);
mux4<CLB=14, LAT=1, NAME=mux4_14, BIT_WIDTH = 14>(D0:14, D1:14, D2:14, D3:14, S0:1, S1:1)->(O:14);
mux4<CLB=15, LAT=1, NAME=mux4_15, BIT_WIDTH = 15>(D0:15, D1:15, D2:15, D3:15, S0:1, S1:1)->(O:15);
mux4<CLB=16, LAT=1, NAME=mux4_16, BIT_WIDTH = 16>(D0:16, D1:16, D2:16, D3:16, S0:1, S1:1)->(O:16);
mux4<CLB=17, LAT=1, NAME=mux4_17, BIT_WIDTH = 17>(D0:17, D1:17, D2:17, D3:17, S0:1, S1:1)->(O:17);
mux4<CLB=18, LAT=1, NAME=mux4_18, BIT_WIDTH = 18>(D0:18, D1:18, D2:18, D3:18, S0:1, S1:1)->(O:18);
mux4<CLB=19, LAT=1, NAME=mux4_19, BIT_WIDTH = 19>(D0:19, D1:19, D2:19, D3:19, S0:1, S1:1)->(O:19);
mux4<CLB=20, LAT=1, NAME=mux4_20, BIT_WIDTH = 20>(D0:20, D1:20, D2:20, D3:20, S0:1, S1:1)->(O:20);
mux4<CLB=21, LAT=1, NAME=mux4_21, BIT_WIDTH = 21>(D0:21, D1:21, D2:21, D3:21, S0:1, S1:1)->(O:21);
mux4<CLB=22, LAT=1, NAME=mux4_22, BIT_WIDTH = 22>(D0:22, D1:22, D2:22, D3:22, S0:1, S1:1)->(O:22);
mux4<CLB=23, LAT=1, NAME=mux4_23, BIT_WIDTH = 23>(D0:23, D1:23, D2:23, D3:23, S0:1, S1:1)->(O:23);
mux4<CLB=24, LAT=1, NAME=mux4_24, BIT_WIDTH = 24>(D0:24, D1:24, D2:24, D3:24, S0:1, S1:1)->(O:24);
mux4<CLB=25, LAT=1, NAME=mux4_25, BIT_WIDTH = 25>(D0:25, D1:25, D2:25, D3:25, S0:1, S1:1)->(O:25);
mux4<CLB=26, LAT=1, NAME=mux4_26, BIT_WIDTH = 26>(D0:26, D1:26, D2:26, D3:26, S0:1, S1:1)->(O:26);
mux4<CLB=27, LAT=1, NAME=mux4_27, BIT_WIDTH = 27>(D0:27, D1:27, D2:27, D3:27, S0:1, S1:1)->(O:27);
mux4<CLB=28, LAT=1, NAME=mux4_28, BIT_WIDTH = 28>(D0:28, D1:28, D2:28, D3:28, S0:1, S1:1)->(O:28);
mux4<CLB=29, LAT=1, NAME=mux4_29, BIT_WIDTH = 29>(D0:29, D1:29, D2:29, D3:29, S0:1, S1:1)->(O:29);
mux4<CLB=30, LAT=1, NAME=mux4_30, BIT_WIDTH = 30>(D0:30, D1:30, D2:30, D3:30, S0:1, S1:1)->(O:30);
mux4<CLB=31, LAT=1, NAME=mux4_31, BIT_WIDTH = 31>(D0:31, D1:31, D2:31, D3:31, S0:1, S1:1)->(O:31);
mux4<CLB=32, LAT=1, NAME=mux4_32, BIT_WIDTH = 32>(D0:32, D1:32, D2:32, D3:32, S0:1, S1:1)->(O:32);

/*nands*/
/* generated by hand*/
nandgate<CLB=2, LAT=1, NAME=nand1, BIT_WIDTH = 1>(A:1, B:1)->(RESULT:1); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=3, LAT=1, NAME=nand2, BIT_WIDTH = 2>(A:2, B:2)->(RESULT:2); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=4, LAT=1, NAME=nand3, BIT_WIDTH = 3>(A:3, B:3)->(RESULT:3); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=5, LAT=1, NAME=nand4, BIT_WIDTH = 4>(A:4, B:4)->(RESULT:4); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=6, LAT=1, NAME=nand5, BIT_WIDTH = 5>(A:5, B:5)->(RESULT:5); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=7, LAT=1, NAME=nand6, BIT_WIDTH = 6>(A:6, B:6)->(RESULT:6); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=8, LAT=1, NAME=nand7, BIT_WIDTH = 7>(A:7, B:7)->(RESULT:7); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=9, LAT=1, NAME=nand8, BIT_WIDTH = 8>(A:8, B:8)->(RESULT:8); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=10, LAT=1, NAME=nand9, BIT_WIDTH = 9>(A:9, B:9)->(RESULT:9); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=11, LAT=1, NAME=nand10, BIT_WIDTH = 10>(A:10, B:10)->(RESULT:10); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=12, LAT=1, NAME=nand11, BIT_WIDTH = 11>(A:11, B:11)->(RESULT:11); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=13, LAT=1, NAME=nand12, BIT_WIDTH = 12>(A:12, B:12)->(RESULT:12); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=14, LAT=1, NAME=nand13, BIT_WIDTH = 13>(A:13, B:13)->(RESULT:13); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=15, LAT=1, NAME=nand14, BIT_WIDTH = 14>(A:14, B:14)->(RESULT:14); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=16, LAT=1, NAME=nand15, BIT_WIDTH = 15>(A:15, B:15)->(RESULT:15); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=17, LAT=1, NAME=nand16, BIT_WIDTH = 16>(A:16, B:16)->(RESULT:16); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=18, LAT=1, NAME=nand17, BIT_WIDTH = 17>(A:17, B:17)->(RESULT:17); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=19, LAT=1, NAME=nand18, BIT_WIDTH = 18>(A:18, B:18)->(RESULT:18); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=20, LAT=1, NAME=nand19, BIT_WIDTH = 19>(A:19, B:19)->(RESULT:19); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=21, LAT=1, NAME=nand20, BIT_WIDTH = 20>(A:20, B:20)->(RESULT:20); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=22, LAT=1, NAME=nand21, BIT_WIDTH = 21>(A:21, B:21)->(RESULT:21); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=23, LAT=1, NAME=nand22, BIT_WIDTH = 22>(A:22, B:22)->(RESULT:22); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=24, LAT=1, NAME=nand23, BIT_WIDTH = 23>(A:23, B:23)->(RESULT:23); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=25, LAT=1, NAME=nand24, BIT_WIDTH = 24>(A:24, B:24)->(RESULT:24); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=26, LAT=1, NAME=nand25, BIT_WIDTH = 25>(A:25, B:25)->(RESULT:25); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=27, LAT=1, NAME=nand26, BIT_WIDTH = 26>(A:26, B:26)->(RESULT:26); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=28, LAT=1, NAME=nand27, BIT_WIDTH = 27>(A:27, B:27)->(RESULT:27); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=29, LAT=1, NAME=nand28, BIT_WIDTH = 28>(A:28, B:28)->(RESULT:28); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=30, LAT=1, NAME=nand29, BIT_WIDTH = 29>(A:29, B:29)->(RESULT:29); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=31, LAT=1, NAME=nand30, BIT_WIDTH = 30>(A:30, B:30)->(RESULT:30); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=32, LAT=1, NAME=nand31, BIT_WIDTH = 31>(A:31, B:31)->(RESULT:31); /* Other Inputs (CLK:1, CLR:1)*/
nandgate<CLB=33, LAT=1, NAME=nand32, BIT_WIDTH = 32>(A:32, B:32)->(RESULT:32); /* Other Inputs (CLK:1, CLR:1)*/

/*complementors*/
negate<CLB=1, LAT=1, NAME=negate1, BIT_WIDTH = 1>(A:1)->(Q:1);
negate<CLB=1, LAT=1, NAME=negate2, BIT_WIDTH = 2>(A:2)->(Q:2);
negate<CLB=2, LAT=1, NAME=negate3, BIT_WIDTH = 3>(A:3)->(Q:3);
negate<CLB=2, LAT=1, NAME=negate4, BIT_WIDTH = 4>(A:4)->(Q:4);
negate<CLB=3, LAT=1, NAME=negate5, BIT_WIDTH = 5>(A:5)->(Q:5);
negate<CLB=3, LAT=1, NAME=negate6, BIT_WIDTH = 6>(A:6)->(Q:6);
negate<CLB=4, LAT=1, NAME=negate7, BIT_WIDTH = 7>(A:7)->(Q:7);
negate<CLB=4, LAT=1, NAME=negate8, BIT_WIDTH = 8>(A:8)->(Q:8);
negate<CLB=5, LAT=1, NAME=negate9, BIT_WIDTH = 9>(A:9)->(Q:9);
negate<CLB=5, LAT=1, NAME=negate10, BIT_WIDTH = 10>(A:10)->(Q:10);
negate<CLB=6, LAT=1, NAME=negate11, BIT_WIDTH =
11>(A:11)->(Q:11); negate<CLB=6, LAT=1, NAME=negate12, BIT_WIDTH = 12>(A:12)->(Q:12); negate<CLB=7, LAT=1, NAME=negate13, BIT_WIDTH = 13>(A:13)->(Q:13); negate<CLB=7, LAT=1, NAME=negate14, BIT_WIDTH = 14>(A:14)->(Q:14); negate<CLB=8, LAT=1, NAME=negate15, BIT_WIDTH = 15>(A:15)->(Q:15); negate<CLB=8, LAT=1, NAME=negate16, BIT_WIDTH = 16>(A:16)->(Q:16); negate<CLB=9, LAT=1, NAME=negate17, BIT_WIDTH = 17>(A:17)->(Q:17); negate<CLB=9, LAT=1, NAME=negate18, BIT_WIDTH = 18>(A:18)->(Q:18); negate<CLB=10, LAT=1, NAME=negate19, BIT_WIDTH = 19>(A:19)->(Q:19); negate<CLB=10, LAT=1, NAME=negate20, BIT_WIDTH = 20>(A:20)->(Q:20); negate<CLB=11, LAT=1, NAME=negate21, BIT_WIDTH = 21>(A:21)->(Q:21); negate<CLB=11, LAT=1, NAME=negate22, BIT_WIDTH = 22>(A:22)->(Q:22); negate<CLB=12, LAT=1, NAME=negate23, BIT_WIDTH = 23>(A:23)->(Q:23); negate<CLB=12, LAT=1, NAME=negate24, BIT_WIDTH = 24>(A:24)->(Q:24); negate<CLB=13, LAT=1, NAME=negate25, BIT_WIDTH = 25>(A:25)->(Q:25); negate<CLB=13, LAT=1, NAME=negate26, BIT_WIDTH = 26>(A:26)->(Q:26); negate<CLB=14, LAT=1, NAME=negate27, BIT_WIDTH = 27>(A:27)->(Q:27); negate<CLB=15, LAT=1, NAME=negate28, BIT_WIDTH = 28>(A:28)->(Q:28); negate<CLB=15, LAT=1, NAME=negate29, BIT_WIDTH = 29>(A:29)->(Q:29); negate<CLB=15, LAT=1, NAME=negate30, BIT_WIDTH = 30>(A:30)->(Q:30); negate<CLB=16, LAT=1, NAME=negate31, BIT_WIDTH = 31>(A:31)->(Q:31); negate<CLB=16, LAT=1, NAME=negate32, BIT_WIDTH = 32>(A:32)->(Q:32); /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, 
CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /*nors*/ /* generated by hand*/ norgate<CLB=2, LAT=1, NAME=nor1, BIT_WIDTH = 1>(A:1, B:1)->(RESULT:1); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=3, LAT=1, NAME=nor2, BIT_WIDTH = 2>(A:2, B:2)->(RESULT:2); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=4, LAT=1, NAME=nor3, BIT_WIDTH = 3>(A:3, B:3)->(RESULT:3); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=5, LAT=1, NAME=nor4, BIT_WIDTH = 4>(A:4, B:4)->(RESULT:4); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=6, LAT=1, NAME=nor5, BIT_WIDTH = 5>(A:5, B:5)->(RESULT:5); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=7, LAT=1, NAME=nor6, BIT_WIDTH = 6>(A:6, B:6)->(RESULT:6); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=8, LAT=1, NAME=nor7, BIT_WIDTH = 7>(A:7, B:7)->(RESULT:7); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=9, LAT=1, NAME=nor8, BIT_WIDTH = 8>(A:8, B:8)->(RESULT:8); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=10, LAT=1, NAME=nor9, BIT_WIDTH = 9>(A:9, B:9)->(RESULT:9); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=11, LAT=1, NAME=nor10, BIT_WIDTH = 10>(A:10, B:10)->(RESULT:10); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=12, LAT=1, NAME=nor11, BIT_WIDTH = 11>(A:11, B:11)->(RESULT:11); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=13, LAT=1, NAME=nor12, BIT_WIDTH = 12>(A:12, B:12)->(RESULT:12); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=14, LAT=1, NAME=nor13, BIT_WIDTH = 13>(A:13, B:13)->(RESULT:13); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=15, LAT=1, NAME=nor14, BIT_WIDTH = 14>(A:14, B:14)->(RESULT:14); /* Other 
Inputs (CLK:1, CLR:1)*/ norgate<CLB=16, LAT=1, NAME=nor15, BIT_WIDTH = 15>(A:15, B:15)->(RESULT:15); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=17, LAT=1, NAME=nor16, BIT_WIDTH = 16>(A:16, B:16)->(RESULT:16); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=18, LAT=1, NAME=nor17, BIT_WIDTH = 17>(A:17, B:17)->(RESULT:17); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=19, LAT=1, NAME=nor18, BIT_WIDTH = 18>(A:18, B:18)->(RESULT:18); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=20, LAT=1, NAME=nor19, BIT_WIDTH = 19>(A:19, B:19)->(RESULT:19); /* Other Inputs (CLK:1, CLR:1)*/ norgate<CLB=21, LAT=1, NAME=nor20, BIT_WIDTH = 20>(A:20, B:20)->(RESULT:20); /* Other Inputs (CLK:1, CLR:1)*/ 114 norgate<CLB=22, LAT=1, NAME=nor21, BIT_WIDTH = 21>(A:21, B:21)->(RESULT:21); norgate<CLB=23, LAT=1, NAME=nor22, BIT_WIDTH = 22>(A:22, B:22)->(RESULT:22); norgate<CLB=24, LAT=1, NAME=nor23, BIT_WIDTH = 23>(A:23, B:23)->(RESULT:23); norgate<CLB=25, LAT=1, NAME=nor24, BIT_WIDTH = 24>(A:24, B:24)->(RESULT:24); norgate<CLB=26, LAT=1, NAME=nor25, BIT_WIDTH = 25>(A:25, B:25)->(RESULT:25); norgate<CLB=27, LAT=1, NAME=nor26, BIT_WIDTH = 26>(A:26, B:26)->(RESULT:26); norgate<CLB=28, LAT=1, NAME=nor27, BIT_WIDTH = 27>(A:27, B:27)->(RESULT:27); norgate<CLB=29, LAT=1, NAME=nor28, BIT_WIDTH = 28>(A:28, B:28)->(RESULT:28); norgate<CLB=30, LAT=1, NAME=nor29, BIT_WIDTH = 29>(A:29, B:29)->(RESULT:29); norgate<CLB=31, LAT=1, NAME=nor30, BIT_WIDTH = 30>(A:30, B:30)->(RESULT:30); norgate<CLB=32, LAT=1, NAME=nor31, BIT_WIDTH = 31>(A:31, B:31)->(RESULT:31); norgate<CLB=33, LAT=1, NAME=nor32, BIT_WIDTH = 32>(A:32, B:32)->(RESULT:32); /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, 
CLR:1)*/ /*nots*/ /* generated by hand*/ notgate<CLB=2, LAT=1, NAME=not1, BIT_WIDTH = 1>(A:1)->(RESULT:1); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=3, LAT=1, NAME=not2, BIT_WIDTH = 2>(A:2)->(RESULT:2); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=4, LAT=1, NAME=not3, BIT_WIDTH = 3>(A:3)->(RESULT:3); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=5, LAT=1, NAME=not4, BIT_WIDTH = 4>(A:4)->(RESULT:4); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=6, LAT=1, NAME=not5, BIT_WIDTH = 5>(A:5)->(RESULT:5); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=7, LAT=1, NAME=not6, BIT_WIDTH = 6>(A:6)->(RESULT:6); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=8, LAT=1, NAME=not7, BIT_WIDTH = 7>(A:7)->(RESULT:7); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=9, LAT=1, NAME=not8, BIT_WIDTH = 8>(A:8)->(RESULT:8); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=10, LAT=1, NAME=not9, BIT_WIDTH = 9>(A:9)->(RESULT:9); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=11, LAT=1, NAME=not10, BIT_WIDTH = 10>(A:10)->(RESULT:10); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=12, LAT=1, NAME=not11, BIT_WIDTH = 11>(A:11)->(RESULT:11); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=13, LAT=1, NAME=not12, BIT_WIDTH = 12>(A:12)->(RESULT:12); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=14, LAT=1, NAME=not13, BIT_WIDTH = 13>(A:13)->(RESULT:13); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=15, LAT=1, NAME=not14, BIT_WIDTH = 14>(A:14)->(RESULT:14); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=16, LAT=1, NAME=not15, BIT_WIDTH = 15>(A:15)->(RESULT:15); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=17, LAT=1, NAME=not16, BIT_WIDTH = 16>(A:16)->(RESULT:16); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=18, LAT=1, NAME=not17, BIT_WIDTH = 17>(A:17)->(RESULT:17); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=19, LAT=1, NAME=not18, BIT_WIDTH = 18>(A:18)->(RESULT:18); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=20, LAT=1, NAME=not19, BIT_WIDTH = 19>(A:19)->(RESULT:19); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=21, LAT=1, 
NAME=not20, BIT_WIDTH = 20>(A:20)->(RESULT:20); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=22, LAT=1, NAME=not21, BIT_WIDTH = 21>(A:21)->(RESULT:21); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=23, LAT=1, NAME=not22, BIT_WIDTH = 22>(A:22)->(RESULT:22); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=24, LAT=1, NAME=not23, BIT_WIDTH = 23>(A:23)->(RESULT:23); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=25, LAT=1, NAME=not24, BIT_WIDTH = 24>(A:24)->(RESULT:24); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=26, LAT=1, NAME=not25, BIT_WIDTH = 25>(A:25)->(RESULT:25); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=27, LAT=1, NAME=not26, BIT_WIDTH = 26>(A:26)->(RESULT:26); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=28, LAT=1, NAME=not27, BIT_WIDTH = 27>(A:27)->(RESULT:27); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=29, LAT=1, NAME=not28, BIT_WIDTH = 28>(A:28)->(RESULT:28); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=30, LAT=1, NAME=not29, BIT_WIDTH = 29>(A:29)->(RESULT:29); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=31, LAT=1, NAME=not30, BIT_WIDTH = 30>(A:30)->(RESULT:30); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=32, LAT=1, NAME=not31, BIT_WIDTH = 31>(A:31)->(RESULT:31); /* Other Inputs (CLK:1, CLR:1)*/ notgate<CLB=33, LAT=1, NAME=not32, BIT_WIDTH = 32>(A:32)->(RESULT:32); /* Other Inputs (CLK:1, CLR:1)*/ /*ors*/ /* generated by hand*/ orgate<CLB=2, LAT=1, NAME=or1, BIT_WIDTH = 1>(A:1, B:1)->(RESULT:1); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=3, LAT=1, NAME=or2, BIT_WIDTH = 2>(A:2, B:2)->(RESULT:2); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=4, LAT=1, NAME=or3, BIT_WIDTH = 3>(A:3, B:3)->(RESULT:3); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=5, LAT=1, NAME=or4, BIT_WIDTH = 4>(A:4, B:4)->(RESULT:4); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=6, LAT=1, NAME=or5, BIT_WIDTH = 5>(A:5, B:5)->(RESULT:5); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=7, LAT=1, NAME=or6, BIT_WIDTH = 6>(A:6, B:6)->(RESULT:6); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=8, LAT=1, 
NAME=or7, BIT_WIDTH = 7>(A:7, B:7)->(RESULT:7); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=9, LAT=1, NAME=or8, BIT_WIDTH = 8>(A:8, B:8)->(RESULT:8); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=10, LAT=1, NAME=or9, BIT_WIDTH = 9>(A:9, B:9)->(RESULT:9); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=11, LAT=1, NAME=or10, BIT_WIDTH = 10>(A:10, B:10)->(RESULT:10); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=12, LAT=1, NAME=or11, BIT_WIDTH = 11>(A:11, B:11)->(RESULT:11); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=13, LAT=1, NAME=or12, BIT_WIDTH = 12>(A:12, B:12)->(RESULT:12); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=14, LAT=1, NAME=or13, BIT_WIDTH = 13>(A:13, B:13)->(RESULT:13); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=15, LAT=1, NAME=or14, BIT_WIDTH = 14>(A:14, B:14)->(RESULT:14); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=16, LAT=1, NAME=or15, BIT_WIDTH = 15>(A:15, B:15)->(RESULT:15); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=17, LAT=1, NAME=or16, BIT_WIDTH = 16>(A:16, B:16)->(RESULT:16); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=18, LAT=1, NAME=or17, BIT_WIDTH = 17>(A:17, B:17)->(RESULT:17); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=19, LAT=1, NAME=or18, BIT_WIDTH = 18>(A:18, B:18)->(RESULT:18); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=20, LAT=1, NAME=or19, BIT_WIDTH = 19>(A:19, B:19)->(RESULT:19); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=21, LAT=1, NAME=or20, BIT_WIDTH = 20>(A:20, B:20)->(RESULT:20); /* Other Inputs (CLK:1, CLR:1)*/ orgate<CLB=22, LAT=1, NAME=or21, BIT_WIDTH = 21>(A:21, B:21)->(RESULT:21); /* Other Inputs (CLK:1, CLR:1)*/ 115 orgate<CLB=23, LAT=1, NAME=or22, BIT_WIDTH = 22>(A:22, B:22)->(RESULT:22); orgate<CLB=24, LAT=1, NAME=or23, BIT_WIDTH = 23>(A:23, B:23)->(RESULT:23); orgate<CLB=25, LAT=1, NAME=or24, BIT_WIDTH = 24>(A:24, B:24)->(RESULT:24); orgate<CLB=26, LAT=1, NAME=or25, BIT_WIDTH = 25>(A:25, B:25)->(RESULT:25); orgate<CLB=27, LAT=1, NAME=or26, BIT_WIDTH = 26>(A:26, B:26)->(RESULT:26); orgate<CLB=28, LAT=1, NAME=or27, 
BIT_WIDTH = 27>(A:27, B:27)->(RESULT:27); orgate<CLB=29, LAT=1, NAME=or28, BIT_WIDTH = 28>(A:28, B:28)->(RESULT:28); orgate<CLB=30, LAT=1, NAME=or29, BIT_WIDTH = 29>(A:29, B:29)->(RESULT:29); orgate<CLB=31, LAT=1, NAME=or30, BIT_WIDTH = 30>(A:30, B:30)->(RESULT:30); orgate<CLB=32, LAT=1, NAME=or31, BIT_WIDTH = 31>(A:31, B:31)->(RESULT:31); orgate<CLB=33, LAT=1, NAME=or32, BIT_WIDTH = 32>(A:32, B:32)->(RESULT:32); /*delay buffers*/ reg<CLB=1, LAT=1, NAME=reg1, BIT_WIDTH = 1>(D:1)->(Q:1); reg<CLB=1, LAT=1, NAME=reg2, BIT_WIDTH = 2>(D:2)->(Q:2); reg<CLB=2, LAT=1, NAME=reg3, BIT_WIDTH = 3>(D:3)->(Q:3); reg<CLB=2, LAT=1, NAME=reg4, BIT_WIDTH = 4>(D:4)->(Q:4); reg<CLB=3, LAT=1, NAME=reg5, BIT_WIDTH = 5>(D:5)->(Q:5); reg<CLB=3, LAT=1, NAME=reg6, BIT_WIDTH = 6>(D:6)->(Q:6); reg<CLB=4, LAT=1, NAME=reg7, BIT_WIDTH = 7>(D:7)->(Q:7); reg<CLB=4, LAT=1, NAME=reg8, BIT_WIDTH = 8>(D:8)->(Q:8); reg<CLB=5, LAT=1, NAME=reg9, BIT_WIDTH = 9>(D:9)->(Q:9); reg<CLB=5, LAT=1, NAME=reg10, BIT_WIDTH = 10>(D:10)->(Q:10); reg<CLB=6, LAT=1, NAME=reg11, BIT_WIDTH = 11>(D:11)->(Q:11); reg<CLB=6, LAT=1, NAME=reg12, BIT_WIDTH = 12>(D:12)->(Q:12); reg<CLB=7, LAT=1, NAME=reg13, BIT_WIDTH = 13>(D:13)->(Q:13); reg<CLB=7, LAT=1, NAME=reg14, BIT_WIDTH = 14>(D:14)->(Q:14); reg<CLB=8, LAT=1, NAME=reg15, BIT_WIDTH = 15>(D:15)->(Q:15); reg<CLB=8, LAT=1, NAME=reg16, BIT_WIDTH = 16>(D:16)->(Q:16); reg<CLB=9, LAT=1, NAME=reg17, BIT_WIDTH = 17>(D:17)->(Q:17); reg<CLB=9, LAT=1, NAME=reg18, BIT_WIDTH = 18>(D:18)->(Q:18); reg<CLB=10, LAT=1, NAME=reg19, BIT_WIDTH = 19>(D:19)->(Q:19); reg<CLB=10, LAT=1, NAME=reg20, BIT_WIDTH = 20>(D:20)->(Q:20); reg<CLB=11, LAT=1, NAME=reg21, BIT_WIDTH = 21>(D:21)->(Q:21); reg<CLB=11, LAT=1, NAME=reg22, BIT_WIDTH = 22>(D:22)->(Q:22); reg<CLB=12, LAT=1, NAME=reg23, BIT_WIDTH = 23>(D:23)->(Q:23); reg<CLB=12, LAT=1, NAME=reg24, BIT_WIDTH = 24>(D:24)->(Q:24); reg<CLB=13, LAT=1, NAME=reg25, BIT_WIDTH = 25>(D:25)->(Q:25); reg<CLB=13, LAT=1, NAME=reg26, BIT_WIDTH = 26>(D:26)->(Q:26); 
reg<CLB=14, LAT=1, NAME=reg27, BIT_WIDTH = 27>(D:27)->(Q:27); reg<CLB=15, LAT=1, NAME=reg28, BIT_WIDTH = 28>(D:28)->(Q:28); reg<CLB=15, LAT=1, NAME=reg29, BIT_WIDTH = 29>(D:29)->(Q:29); reg<CLB=15, LAT=1, NAME=reg30, BIT_WIDTH = 30>(D:30)->(Q:30); reg<CLB=16, LAT=1, NAME=reg31, BIT_WIDTH = 31>(D:31)->(Q:31); reg<CLB=16, LAT=1, NAME=reg32, BIT_WIDTH = 32>(D:32)->(Q:32); /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /* Other Inputs (CLK:1, CLR:1)*/ /*rmvbit creates a new vector with either the MSB or LSB removed from*/ /*the input vector. 
use SIDE = 1 for MSB and SIDE = 0 for LSB*/ rmvbit<CLB=0, LAT=0, NAME=rmvbit2, SIDE = 0, BIT_WIDTH = 2> (VECT_IN:2) -> (VECT_OUT:1, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit3, SIDE = 0, BIT_WIDTH = 3> (VECT_IN:3) -> (VECT_OUT:2, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit4, SIDE = 0, BIT_WIDTH = 4> (VECT_IN:4) -> (VECT_OUT:3, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit5, SIDE = 0, BIT_WIDTH = 5> (VECT_IN:5) -> (VECT_OUT:4, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit6, SIDE = 0, BIT_WIDTH = 6> (VECT_IN:6) -> (VECT_OUT:5, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit7, SIDE = 0, BIT_WIDTH = 7> (VECT_IN:7) -> (VECT_OUT:6, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit8, SIDE = 0, BIT_WIDTH = 8> (VECT_IN:8) -> (VECT_OUT:7, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit9, SIDE = 0, BIT_WIDTH = 9> (VECT_IN:9) -> (VECT_OUT:8, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit10, SIDE = 0, BIT_WIDTH = 10> (VECT_IN:10) -> (VECT_OUT:9, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit11, SIDE = 0, BIT_WIDTH = 11> (VECT_IN:11) -> (VECT_OUT:10, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit12, SIDE = 0, BIT_WIDTH = 12> (VECT_IN:12) -> (VECT_OUT:11, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit13, SIDE = 0, BIT_WIDTH = 13> (VECT_IN:13) -> (VECT_OUT:12, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit14, SIDE = 0, BIT_WIDTH = 14> (VECT_IN:14) -> (VECT_OUT:13, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit15, SIDE = 0, BIT_WIDTH = 15> (VECT_IN:15) -> (VECT_OUT:14, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit16, SIDE = 0, BIT_WIDTH = 16> (VECT_IN:16) -> (VECT_OUT:15, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit17, SIDE = 0, BIT_WIDTH = 17> (VECT_IN:17) -> (VECT_OUT:16, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit18, SIDE = 0, BIT_WIDTH = 18> (VECT_IN:18) -> (VECT_OUT:17, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit19, SIDE = 0, BIT_WIDTH = 19> (VECT_IN:19) -> (VECT_OUT:18, BIT_OUT:1); rmvbit<CLB=0, LAT=0, NAME=rmvbit20, SIDE = 0, BIT_WIDTH = 20> (VECT_IN:20) -> (VECT_OUT:19, BIT_OUT:1); 
rmvbit<CLB=0, LAT=0, NAME=rmvbit21, SIDE = 0, BIT_WIDTH = 21> (VECT_IN:21) -> (VECT_OUT:20, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit22, SIDE = 0, BIT_WIDTH = 22> (VECT_IN:22) -> (VECT_OUT:21, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit23, SIDE = 0, BIT_WIDTH = 23> (VECT_IN:23) -> (VECT_OUT:22, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit24, SIDE = 0, BIT_WIDTH = 24> (VECT_IN:24) -> (VECT_OUT:23, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit25, SIDE = 0, BIT_WIDTH = 25> (VECT_IN:25) -> (VECT_OUT:24, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit26, SIDE = 0, BIT_WIDTH = 26> (VECT_IN:26) -> (VECT_OUT:25, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit27, SIDE = 0, BIT_WIDTH = 27> (VECT_IN:27) -> (VECT_OUT:26, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit28, SIDE = 0, BIT_WIDTH = 28> (VECT_IN:28) -> (VECT_OUT:27, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit29, SIDE = 0, BIT_WIDTH = 29> (VECT_IN:29) -> (VECT_OUT:28, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit30, SIDE = 0, BIT_WIDTH = 30> (VECT_IN:30) -> (VECT_OUT:29, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit31, SIDE = 0, BIT_WIDTH = 31> (VECT_IN:31) -> (VECT_OUT:30, BIT_OUT:1);
rmvbit<CLB=0, LAT=0, NAME=rmvbit32, SIDE = 0, BIT_WIDTH = 32> (VECT_IN:32) -> (VECT_OUT:31, BIT_OUT:1);

/*adders*/
sadd<CLB=3, LAT=1, NAME=sadd1, BIT_WIDTH = 1>(A:1, B:1, C_IN:1)->(Q_OUT:1, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=3, LAT=1, NAME=sadd2, BIT_WIDTH = 2>(A:2, B:2, C_IN:1)->(Q_OUT:2, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=4, LAT=1, NAME=sadd3, BIT_WIDTH = 3>(A:3, B:3, C_IN:1)->(Q_OUT:3, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=4, LAT=1, NAME=sadd4, BIT_WIDTH = 4>(A:4, B:4, C_IN:1)->(Q_OUT:4, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=5, LAT=1, NAME=sadd5, BIT_WIDTH = 5>(A:5, B:5, C_IN:1)->(Q_OUT:5, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=5, LAT=1, NAME=sadd6, BIT_WIDTH = 6>(A:6, B:6, C_IN:1)->(Q_OUT:6, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=6, LAT=1, NAME=sadd7, BIT_WIDTH = 7>(A:7, B:7, C_IN:1)->(Q_OUT:7, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=6, LAT=1, NAME=sadd8, BIT_WIDTH = 8>(A:8, B:8, C_IN:1)->(Q_OUT:8, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=7, LAT=1, NAME=sadd9, BIT_WIDTH = 9>(A:9, B:9, C_IN:1)->(Q_OUT:9, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=7, LAT=1, NAME=sadd10, BIT_WIDTH = 10>(A:10, B:10, C_IN:1)->(Q_OUT:10, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=8, LAT=1, NAME=sadd11, BIT_WIDTH = 11>(A:11, B:11, C_IN:1)->(Q_OUT:11, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=8, LAT=1, NAME=sadd12, BIT_WIDTH = 12>(A:12, B:12, C_IN:1)->(Q_OUT:12, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=9, LAT=1, NAME=sadd13, BIT_WIDTH = 13>(A:13, B:13, C_IN:1)->(Q_OUT:13, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=9, LAT=1, NAME=sadd14, BIT_WIDTH = 14>(A:14, B:14, C_IN:1)->(Q_OUT:14, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=10, LAT=1, NAME=sadd15, BIT_WIDTH = 15>(A:15, B:15, C_IN:1)->(Q_OUT:15, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=10, LAT=1, NAME=sadd16, BIT_WIDTH = 16>(A:16, B:16, C_IN:1)->(Q_OUT:16, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=11, LAT=1, NAME=sadd17, BIT_WIDTH = 17>(A:17, B:17, C_IN:1)->(Q_OUT:17, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=11, LAT=1, NAME=sadd18, BIT_WIDTH = 18>(A:18, B:18, C_IN:1)->(Q_OUT:18, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=12, LAT=1, NAME=sadd19, BIT_WIDTH = 19>(A:19, B:19, C_IN:1)->(Q_OUT:19, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=12, LAT=1, NAME=sadd20, BIT_WIDTH = 20>(A:20, B:20, C_IN:1)->(Q_OUT:20, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=13, LAT=1, NAME=sadd21, BIT_WIDTH = 21>(A:21, B:21, C_IN:1)->(Q_OUT:21, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=13, LAT=1, NAME=sadd22, BIT_WIDTH = 22>(A:22, B:22, C_IN:1)->(Q_OUT:22, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=14, LAT=1, NAME=sadd23, BIT_WIDTH = 23>(A:23, B:23, C_IN:1)->(Q_OUT:23, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=14, LAT=1, NAME=sadd24, BIT_WIDTH = 24>(A:24, B:24, C_IN:1)->(Q_OUT:24, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=15, LAT=1, NAME=sadd25, BIT_WIDTH = 25>(A:25, B:25, C_IN:1)->(Q_OUT:25, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=15, LAT=1, NAME=sadd26, BIT_WIDTH = 26>(A:26, B:26, C_IN:1)->(Q_OUT:26, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=16, LAT=1, NAME=sadd27, BIT_WIDTH = 27>(A:27, B:27, C_IN:1)->(Q_OUT:27, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=16, LAT=1, NAME=sadd28, BIT_WIDTH = 28>(A:28, B:28, C_IN:1)->(Q_OUT:28, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=17, LAT=1, NAME=sadd29, BIT_WIDTH = 29>(A:29, B:29, C_IN:1)->(Q_OUT:29, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=17, LAT=1, NAME=sadd30, BIT_WIDTH = 30>(A:30, B:30, C_IN:1)->(Q_OUT:30, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=18, LAT=1, NAME=sadd31, BIT_WIDTH = 31>(A:31, B:31, C_IN:1)->(Q_OUT:31, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
sadd<CLB=18, LAT=1, NAME=sadd32, BIT_WIDTH = 32>(A:32, B:32, C_IN:1)->(Q_OUT:32, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/

/*accumulator*/
/*RESET is active low*/
scadd<CLB=18, LAT=1, NAME=scadd1, BIT_WIDTH = 1>(A:1, RESET:1)->(Q_OUT:1, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd2, BIT_WIDTH = 2>(A:2, RESET:1)->(Q_OUT:2, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd3, BIT_WIDTH = 3>(A:3, RESET:1)->(Q_OUT:3, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd4, BIT_WIDTH = 4>(A:4, RESET:1)->(Q_OUT:4, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd5, BIT_WIDTH = 5>(A:5, RESET:1)->(Q_OUT:5, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd6, BIT_WIDTH = 6>(A:6, RESET:1)->(Q_OUT:6, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd7, BIT_WIDTH = 7>(A:7, RESET:1)->(Q_OUT:7, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd8, BIT_WIDTH = 8>(A:8, RESET:1)->(Q_OUT:8, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd9, BIT_WIDTH = 9>(A:9, RESET:1)->(Q_OUT:9, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd10, BIT_WIDTH = 10>(A:10, RESET:1)->(Q_OUT:10, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd11, BIT_WIDTH = 11>(A:11, RESET:1)->(Q_OUT:11, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd12, BIT_WIDTH = 12>(A:12, RESET:1)->(Q_OUT:12, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd13, BIT_WIDTH = 13>(A:13, RESET:1)->(Q_OUT:13, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd14, BIT_WIDTH = 14>(A:14, RESET:1)->(Q_OUT:14, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd15, BIT_WIDTH = 15>(A:15, RESET:1)->(Q_OUT:15, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd16, BIT_WIDTH = 16>(A:16, RESET:1)->(Q_OUT:16, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd17, BIT_WIDTH = 17>(A:17, RESET:1)->(Q_OUT:17, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd18, BIT_WIDTH = 18>(A:18, RESET:1)->(Q_OUT:18, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd19, BIT_WIDTH = 19>(A:19, RESET:1)->(Q_OUT:19, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd20, BIT_WIDTH = 20>(A:20, RESET:1)->(Q_OUT:20, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd21, BIT_WIDTH = 21>(A:21, RESET:1)->(Q_OUT:21, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd22, BIT_WIDTH = 22>(A:22, RESET:1)->(Q_OUT:22, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd23, BIT_WIDTH = 23>(A:23, RESET:1)->(Q_OUT:23, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd24, BIT_WIDTH = 24>(A:24, RESET:1)->(Q_OUT:24, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd25, BIT_WIDTH = 25>(A:25, RESET:1)->(Q_OUT:25, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd26, BIT_WIDTH = 26>(A:26, RESET:1)->(Q_OUT:26, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd27, BIT_WIDTH = 27>(A:27, RESET:1)->(Q_OUT:27, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd28, BIT_WIDTH = 28>(A:28, RESET:1)->(Q_OUT:28, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd29, BIT_WIDTH = 29>(A:29, RESET:1)->(Q_OUT:29, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd30, BIT_WIDTH = 30>(A:30, RESET:1)->(Q_OUT:30, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd31, BIT_WIDTH = 31>(A:31, RESET:1)->(Q_OUT:31, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
scadd<CLB=18, LAT=1, NAME=scadd32, BIT_WIDTH = 32>(A:32, RESET:1)->(Q_OUT:32, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/

/* Compare Operators*/
/* generated by hand.*/
/* signed operators*/
/* The operators accept two inputs A and B. The inputs are assumed to be in 2's complement format.*/
/* Two outputs are set or cleared based on A and B. The Eq output is set if A and B are equal and */
/* cleared if not. The GT output is set if A is greater than B and cleared if not. The default*/
/* case is Eq = 0 and GT = 0.*/
scmp<CLB=4, LAT=1, NAME=scmp1, BIT_WIDTH = 1>(A:1, B:1)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=5, LAT=1, NAME=scmp2, BIT_WIDTH = 2>(A:2, B:2)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=6, LAT=1, NAME=scmp3, BIT_WIDTH = 3>(A:3, B:3)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=7, LAT=1, NAME=scmp4, BIT_WIDTH = 4>(A:4, B:4)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=8, LAT=1, NAME=scmp5, BIT_WIDTH = 5>(A:5, B:5)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=9, LAT=1, NAME=scmp6, BIT_WIDTH = 6>(A:6, B:6)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=10, LAT=1, NAME=scmp7, BIT_WIDTH = 7>(A:7, B:7)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=11, LAT=1, NAME=scmp8, BIT_WIDTH = 8>(A:8, B:8)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=13, LAT=1, NAME=scmp9, BIT_WIDTH = 9>(A:9, B:9)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=14, LAT=1, NAME=scmp10, BIT_WIDTH = 10>(A:10, B:10)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=15, LAT=1, NAME=scmp11, BIT_WIDTH = 11>(A:11, B:11)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=16, LAT=1, NAME=scmp12, BIT_WIDTH = 12>(A:12, B:12)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=17, LAT=1, NAME=scmp13, BIT_WIDTH = 13>(A:13, B:13)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=18, LAT=1, NAME=scmp14, BIT_WIDTH = 14>(A:14, B:14)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=19, LAT=1, NAME=scmp15, BIT_WIDTH = 15>(A:15, B:15)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=20, LAT=1, NAME=scmp16, BIT_WIDTH = 16>(A:16, B:16)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=23, LAT=1, NAME=scmp17, BIT_WIDTH = 17>(A:17, B:17)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=24, LAT=1, NAME=scmp18, BIT_WIDTH = 18>(A:18, B:18)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=25, LAT=1, NAME=scmp19, BIT_WIDTH = 19>(A:19, B:19)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=26, LAT=1, NAME=scmp20, BIT_WIDTH = 20>(A:20, B:20)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=27, LAT=1, NAME=scmp21, BIT_WIDTH = 21>(A:21, B:21)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=28, LAT=1, NAME=scmp22, BIT_WIDTH = 22>(A:22, B:22)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=29, LAT=1, NAME=scmp23, BIT_WIDTH = 23>(A:23, B:23)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=30, LAT=1, NAME=scmp24, BIT_WIDTH = 24>(A:24, B:24)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=31, LAT=1, NAME=scmp25, BIT_WIDTH = 25>(A:25, B:25)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=32, LAT=1, NAME=scmp26, BIT_WIDTH = 26>(A:26, B:26)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=33, LAT=1, NAME=scmp27, BIT_WIDTH = 27>(A:27, B:27)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=34, LAT=1, NAME=scmp28, BIT_WIDTH = 28>(A:28, B:28)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=35, LAT=1, NAME=scmp29, BIT_WIDTH = 29>(A:29, B:29)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=36, LAT=1, NAME=scmp30, BIT_WIDTH = 30>(A:30, B:30)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=37, LAT=1, NAME=scmp31, BIT_WIDTH = 31>(A:31, B:31)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
scmp<CLB=37, LAT=1, NAME=scmp32, BIT_WIDTH = 32>(A:32, B:32)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/

/* max operators (keep larger of input and internally stored old maximum) */
/*signed versions*/
/*RESET is active low*/
smax<CLB=25, LAT=1, NAME=smax1, BIT_WIDTH = 1>(A:1, RESET:1)->(O:1); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=25, LAT=1, NAME=smax2, BIT_WIDTH = 2>(A:2, RESET:1)->(O:2); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=26, LAT=1, NAME=smax3, BIT_WIDTH = 3>(A:3, RESET:1)->(O:3); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=26, LAT=1, NAME=smax4, BIT_WIDTH = 4>(A:4, RESET:1)->(O:4); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=27, LAT=1, NAME=smax5, BIT_WIDTH = 5>(A:5, RESET:1)->(O:5); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=27, LAT=1, NAME=smax6, BIT_WIDTH = 6>(A:6, RESET:1)->(O:6); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=28, LAT=1, NAME=smax7, BIT_WIDTH = 7>(A:7, RESET:1)->(O:7); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=28, LAT=1, NAME=smax8, BIT_WIDTH = 8>(A:8, RESET:1)->(O:8); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=29, LAT=1, NAME=smax9, BIT_WIDTH = 9>(A:9, RESET:1)->(O:9); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=29, LAT=1, NAME=smax10, BIT_WIDTH = 10>(A:10, RESET:1)->(O:10); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=30, LAT=1, NAME=smax11, BIT_WIDTH = 11>(A:11, RESET:1)->(O:11); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=30, LAT=1, NAME=smax12, BIT_WIDTH = 12>(A:12, RESET:1)->(O:12); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=31, LAT=1, NAME=smax13, BIT_WIDTH = 13>(A:13, RESET:1)->(O:13); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=31, LAT=1, NAME=smax14, BIT_WIDTH = 14>(A:14, RESET:1)->(O:14); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=32, LAT=1, NAME=smax15, BIT_WIDTH = 15>(A:15, RESET:1)->(O:15); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=32, LAT=1, NAME=smax16, BIT_WIDTH = 16>(A:16, RESET:1)->(O:16); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=33, LAT=1, NAME=smax17, BIT_WIDTH = 17>(A:17, RESET:1)->(O:17); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=33, LAT=1, NAME=smax18, BIT_WIDTH = 18>(A:18, RESET:1)->(O:18); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=34, LAT=1, NAME=smax19, BIT_WIDTH = 19>(A:19, RESET:1)->(O:19); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=34, LAT=1, NAME=smax20, BIT_WIDTH = 20>(A:20, RESET:1)->(O:20); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=35, LAT=1, NAME=smax21, BIT_WIDTH = 21>(A:21, RESET:1)->(O:21); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=35, LAT=1, NAME=smax22, BIT_WIDTH = 22>(A:22, RESET:1)->(O:22); /* Other Inputs (CLK:1, CLR:1)*/
smax<CLB=36, LAT=1, NAME=smax23, BIT_WIDTH = 23>(A:23, RESET:1)->(O:23); /* Other
Inputs (CLK:1, CLR:1)*/ smax<CLB=36, LAT=1, NAME=smax24, BIT_WIDTH = 24>(A:24, RESET:1)->(O:24); /* Other Inputs (CLK:1, CLR:1)*/ smax<CLB=37, LAT=1, NAME=smax25, BIT_WIDTH = 25>(A:25, RESET:1)->(O:25); /* Other Inputs (CLK:1, CLR:1)*/ smax<CLB=37, LAT=1, NAME=smax26, BIT_WIDTH = 26>(A:26, RESET:1)->(O:26); /* Other Inputs (CLK:1, CLR:1)*/ smax<CLB=38, LAT=1, NAME=smax27, BIT_WIDTH = 27>(A:27, RESET:1)->(O:27); /* Other Inputs (CLK:1, CLR:1)*/ smax<CLB=38, LAT=1, NAME=smax28, BIT_WIDTH = 28>(A:28, RESET:1)->(O:28); /* Other Inputs (CLK:1, CLR:1)*/ smax<CLB=39, LAT=1, NAME=smax29, BIT_WIDTH = 29>(A:29, RESET:1)->(O:29); /* Other Inputs (CLK:1, CLR:1)*/ smax<CLB=39, LAT=1, NAME=smax30, BIT_WIDTH = 30>(A:30, RESET:1)->(O:30); /* Other Inputs (CLK:1, CLR:1)*/ smax<CLB=40, LAT=1, NAME=smax31, BIT_WIDTH = 31>(A:31, RESET:1)->(O:31); /* Other Inputs (CLK:1, CLR:1)*/ smax<CLB=40, LAT=1, NAME=smax32, BIT_WIDTH = 32>(A:32, RESET:1)->(O:32); /* Other Inputs (CLK:1, CLR:1)*/ /* min operators (keep smaller of input and internally stored old minimum) */ /*RESET is active low*/ smin<CLB=25, LAT=1, NAME=smin1, BIT_WIDTH = 1>(A:1, RESET:1)->(O:1); /* Other Inputs (CLK:1, CLR:1)*/ smin<CLB=25, LAT=1, NAME=smin2, BIT_WIDTH = 2>(A:2, RESET:1)->(O:2); /* Other Inputs (CLK:1, CLR:1)*/ smin<CLB=26, LAT=1, NAME=smin3, BIT_WIDTH = 3>(A:3, RESET:1)->(O:3); /* Other Inputs (CLK:1, CLR:1)*/ smin<CLB=26, LAT=1, NAME=smin4, BIT_WIDTH = 4>(A:4, RESET:1)->(O:4); /* Other Inputs (CLK:1, CLR:1)*/ smin<CLB=27, LAT=1, NAME=smin5, BIT_WIDTH = 5>(A:5, RESET:1)->(O:5); /* Other Inputs (CLK:1, CLR:1)*/ smin<CLB=27, LAT=1, NAME=smin6, BIT_WIDTH = 6>(A:6, RESET:1)->(O:6); /* Other Inputs (CLK:1, CLR:1)*/ smin<CLB=28, LAT=1, NAME=smin7, BIT_WIDTH = 7>(A:7, RESET:1)->(O:7); /* Other Inputs (CLK:1, CLR:1)*/ smin<CLB=28, LAT=1, NAME=smin8, BIT_WIDTH = 8>(A:8, RESET:1)->(O:8); /* Other Inputs (CLK:1, CLR:1)*/ smin<CLB=29, LAT=1, NAME=smin9, BIT_WIDTH = 9>(A:9, RESET:1)->(O:9); /* Other Inputs (CLK:1, CLR:1)*/ 
smin<CLB=29, LAT=1, NAME=smin10, BIT_WIDTH = 10>(A:10, RESET:1)->(O:10); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=30, LAT=1, NAME=smin11, BIT_WIDTH = 11>(A:11, RESET:1)->(O:11); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=30, LAT=1, NAME=smin12, BIT_WIDTH = 12>(A:12, RESET:1)->(O:12); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=31, LAT=1, NAME=smin13, BIT_WIDTH = 13>(A:13, RESET:1)->(O:13); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=31, LAT=1, NAME=smin14, BIT_WIDTH = 14>(A:14, RESET:1)->(O:14); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=32, LAT=1, NAME=smin15, BIT_WIDTH = 15>(A:15, RESET:1)->(O:15); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=32, LAT=1, NAME=smin16, BIT_WIDTH = 16>(A:16, RESET:1)->(O:16); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=33, LAT=1, NAME=smin17, BIT_WIDTH = 17>(A:17, RESET:1)->(O:17); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=33, LAT=1, NAME=smin18, BIT_WIDTH = 18>(A:18, RESET:1)->(O:18); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=34, LAT=1, NAME=smin19, BIT_WIDTH = 19>(A:19, RESET:1)->(O:19); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=34, LAT=1, NAME=smin20, BIT_WIDTH = 20>(A:20, RESET:1)->(O:20); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=35, LAT=1, NAME=smin21, BIT_WIDTH = 21>(A:21, RESET:1)->(O:21); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=35, LAT=1, NAME=smin22, BIT_WIDTH = 22>(A:22, RESET:1)->(O:22); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=36, LAT=1, NAME=smin23, BIT_WIDTH = 23>(A:23, RESET:1)->(O:23); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=36, LAT=1, NAME=smin24, BIT_WIDTH = 24>(A:24, RESET:1)->(O:24); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=37, LAT=1, NAME=smin25, BIT_WIDTH = 25>(A:25, RESET:1)->(O:25); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=37, LAT=1, NAME=smin26, BIT_WIDTH = 26>(A:26, RESET:1)->(O:26); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=38, LAT=1, NAME=smin27, BIT_WIDTH = 27>(A:27, RESET:1)->(O:27); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=38, LAT=1, NAME=smin28, BIT_WIDTH = 28>(A:28, RESET:1)->(O:28); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=39, LAT=1, NAME=smin29, BIT_WIDTH = 29>(A:29, RESET:1)->(O:29); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=39, LAT=1, NAME=smin30, BIT_WIDTH = 30>(A:30, RESET:1)->(O:30); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=40, LAT=1, NAME=smin31, BIT_WIDTH = 31>(A:31, RESET:1)->(O:31); /* Other Inputs (CLK:1, CLR:1)*/
smin<CLB=40, LAT=1, NAME=smin32, BIT_WIDTH = 32>(A:32, RESET:1)->(O:32); /* Other Inputs (CLK:1, CLR:1)*/

/*parallel multipliers*/
spmult<CLB=20, LAT=1, NAME=spmult1, BIT_WIDTH = 1>(A:1, B:1)->(PRODH:1, PRODL:1); /* Other Inputs (CLK:1, CLR:1)*/ /*would be implemented as and1*/
spmult<CLB=25, LAT=1, NAME=spmult2, BIT_WIDTH = 2>(A:2, B:2)->(PRODH:2, PRODL:2); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=30, LAT=1, NAME=spmult3, BIT_WIDTH = 3>(A:3, B:3)->(PRODH:3, PRODL:3); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=35, LAT=1, NAME=spmult4, BIT_WIDTH = 4>(A:4, B:4)->(PRODH:4, PRODL:4); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=40, LAT=1, NAME=spmult5, BIT_WIDTH = 5>(A:5, B:5)->(PRODH:5, PRODL:5); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=45, LAT=1, NAME=spmult6, BIT_WIDTH = 6>(A:6, B:6)->(PRODH:6, PRODL:6); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=50, LAT=1, NAME=spmult7, BIT_WIDTH = 7>(A:7, B:7)->(PRODH:7, PRODL:7); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=54, LAT=1, NAME=spmult8, BIT_WIDTH = 8>(A:8, B:8)->(PRODH:8, PRODL:8); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=70, LAT=1, NAME=spmult9, BIT_WIDTH = 9>(A:9, B:9)->(PRODH:9, PRODL:9); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=80, LAT=1, NAME=spmult10, BIT_WIDTH = 10>(A:10, B:10)->(PRODH:10, PRODL:10); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=100, LAT=1, NAME=spmult11, BIT_WIDTH = 11>(A:11, B:11)->(PRODH:11, PRODL:11); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=120, LAT=1, NAME=spmult12, BIT_WIDTH = 12>(A:12, B:12)->(PRODH:12, PRODL:12); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=150, LAT=1, NAME=spmult13, BIT_WIDTH = 13>(A:13, B:13)->(PRODH:13, PRODL:13); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=180, LAT=1, NAME=spmult14, BIT_WIDTH = 14>(A:14, B:14)->(PRODH:14, PRODL:14); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=200, LAT=1, NAME=spmult15, BIT_WIDTH = 15>(A:15, B:15)->(PRODH:15, PRODL:15); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=213, LAT=1, NAME=spmult16, BIT_WIDTH = 16>(A:16, B:16)->(PRODH:16, PRODL:16); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=350, LAT=1, NAME=spmult17, BIT_WIDTH = 17>(A:17, B:17)->(PRODH:17, PRODL:17); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=400, LAT=1, NAME=spmult18, BIT_WIDTH = 18>(A:18, B:18)->(PRODH:18, PRODL:18); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=450, LAT=1, NAME=spmult19, BIT_WIDTH = 19>(A:19, B:19)->(PRODH:19, PRODL:19); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=500, LAT=1, NAME=spmult20, BIT_WIDTH = 20>(A:20, B:20)->(PRODH:20, PRODL:20); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=550, LAT=1, NAME=spmult21, BIT_WIDTH = 21>(A:21, B:21)->(PRODH:21, PRODL:21); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=600, LAT=1, NAME=spmult22, BIT_WIDTH = 22>(A:22, B:22)->(PRODH:22, PRODL:22); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=650, LAT=1, NAME=spmult23, BIT_WIDTH = 23>(A:23, B:23)->(PRODH:23, PRODL:23); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=700, LAT=1, NAME=spmult24, BIT_WIDTH = 24>(A:24, B:24)->(PRODH:24, PRODL:24); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=730, LAT=1, NAME=spmult25, BIT_WIDTH = 25>(A:25, B:25)->(PRODH:25, PRODL:25); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=770, LAT=1, NAME=spmult26, BIT_WIDTH = 26>(A:26, B:26)->(PRODH:26, PRODL:26); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=800, LAT=1, NAME=spmult27, BIT_WIDTH = 27>(A:27, B:27)->(PRODH:27, PRODL:27); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=825, LAT=1, NAME=spmult28, BIT_WIDTH = 28>(A:28, B:28)->(PRODH:28, PRODL:28); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=850, LAT=1, NAME=spmult29, BIT_WIDTH = 29>(A:29, B:29)->(PRODH:29, PRODL:29); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=875, LAT=1, NAME=spmult30, BIT_WIDTH = 30>(A:30, B:30)->(PRODH:30, PRODL:30); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=900, LAT=1, NAME=spmult31, BIT_WIDTH = 31>(A:31, B:31)->(PRODH:31, PRODL:31); /* Other Inputs (CLK:1, CLR:1)*/
spmult<CLB=922, LAT=1, NAME=spmult32, BIT_WIDTH = 32>(A:32, B:32)->(PRODH:32, PRODL:32); /* Other Inputs (CLK:1, CLR:1)*/

/*Square Root Operators*/
/* generated with Xilinx Core Generator*/
sqrt<CLB=8, LAT=, NAME=sqrt1, BIT_WIDTH = 1>(DIN:1)->(DOUT:1); /* Other Inputs (CE:1, C:1)*/ /*should be implemented as a reg1*/
sqrt<CLB=10, LAT=, NAME=sqrt2, BIT_WIDTH = 2>(DIN:2)->(DOUT:1); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=13, LAT=, NAME=sqrt3, BIT_WIDTH = 3>(DIN:3)->(DOUT:1); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=15, LAT=, NAME=sqrt4, BIT_WIDTH = 4>(DIN:4)->(DOUT:2); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=17, LAT=, NAME=sqrt5, BIT_WIDTH = 5>(DIN:5)->(DOUT:2); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=20, LAT=, NAME=sqrt6, BIT_WIDTH = 6>(DIN:6)->(DOUT:3); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=23, LAT=, NAME=sqrt7, BIT_WIDTH = 7>(DIN:7)->(DOUT:3); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=25, LAT=5, NAME=sqrt8, BIT_WIDTH = 8>(DIN:8)->(DOUT:4); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=40, LAT=, NAME=sqrt9, BIT_WIDTH = 9>(DIN:9)->(DOUT:4); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=50, LAT=, NAME=sqrt10, BIT_WIDTH = 10>(DIN:10)->(DOUT:5); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=55, LAT=, NAME=sqrt11, BIT_WIDTH = 11>(DIN:11)->(DOUT:5); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=60, LAT=, NAME=sqrt12, BIT_WIDTH = 12>(DIN:12)->(DOUT:6); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=65, LAT=, NAME=sqrt13, BIT_WIDTH = 13>(DIN:13)->(DOUT:6); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=70, LAT=, NAME=sqrt14, BIT_WIDTH = 14>(DIN:14)->(DOUT:7); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=75, LAT=, NAME=sqrt15, BIT_WIDTH = 15>(DIN:15)->(DOUT:7); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=81, LAT=9, NAME=sqrt16, BIT_WIDTH = 16>(DIN:16)->(DOUT:8); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=90, LAT=, NAME=sqrt17, BIT_WIDTH = 17>(DIN:17)->(DOUT:8); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=100, LAT=, NAME=sqrt18, BIT_WIDTH = 18>(DIN:18)->(DOUT:9); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=110, LAT=, NAME=sqrt19, BIT_WIDTH = 19>(DIN:19)->(DOUT:9); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=120, LAT=, NAME=sqrt20, BIT_WIDTH = 20>(DIN:20)->(DOUT:10); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=130, LAT=, NAME=sqrt21, BIT_WIDTH = 21>(DIN:21)->(DOUT:10); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=140, LAT=, NAME=sqrt22, BIT_WIDTH = 22>(DIN:22)->(DOUT:11); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=150, LAT=, NAME=sqrt23, BIT_WIDTH = 23>(DIN:23)->(DOUT:11); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=170, LAT=, NAME=sqrt24, BIT_WIDTH = 24>(DIN:24)->(DOUT:12); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=180, LAT=, NAME=sqrt25, BIT_WIDTH = 25>(DIN:25)->(DOUT:12); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=200, LAT=, NAME=sqrt26, BIT_WIDTH = 26>(DIN:26)->(DOUT:13); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=220, LAT=, NAME=sqrt27, BIT_WIDTH = 27>(DIN:27)->(DOUT:13); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=240, LAT=, NAME=sqrt28, BIT_WIDTH = 28>(DIN:28)->(DOUT:14); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=250, LAT=, NAME=sqrt29, BIT_WIDTH = 29>(DIN:29)->(DOUT:14); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=260, LAT=, NAME=sqrt30, BIT_WIDTH = 30>(DIN:30)->(DOUT:15); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=280, LAT=, NAME=sqrt31, BIT_WIDTH = 31>(DIN:31)->(DOUT:15); /* Other Inputs (CE:1, C:1)*/
sqrt<CLB=289, LAT=17, NAME=sqrt32, BIT_WIDTH = 32>(DIN:32)->(DOUT:16); /* Other Inputs (CE:1, C:1)*/

/*subtractors*/
ssub<CLB=3, LAT=1, NAME=ssub1, BIT_WIDTH = 1>(A:1, B:1, C_IN:1)->(Q_OUT:1, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=3, LAT=1, NAME=ssub2, BIT_WIDTH = 2>(A:2, B:2, C_IN:1)->(Q_OUT:2, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=4, LAT=1, NAME=ssub3, BIT_WIDTH = 3>(A:3, B:3, C_IN:1)->(Q_OUT:3, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=4, LAT=1, NAME=ssub4, BIT_WIDTH = 4>(A:4, B:4, C_IN:1)->(Q_OUT:4, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=5, LAT=1, NAME=ssub5, BIT_WIDTH = 5>(A:5, B:5, C_IN:1)->(Q_OUT:5, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=5, LAT=1, NAME=ssub6, BIT_WIDTH = 6>(A:6, B:6, C_IN:1)->(Q_OUT:6, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=6, LAT=1, NAME=ssub7, BIT_WIDTH = 7>(A:7, B:7, C_IN:1)->(Q_OUT:7, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=6, LAT=1, NAME=ssub8, BIT_WIDTH = 8>(A:8, B:8, C_IN:1)->(Q_OUT:8, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=7, LAT=1, NAME=ssub9, BIT_WIDTH = 9>(A:9, B:9, C_IN:1)->(Q_OUT:9, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=7, LAT=1, NAME=ssub10, BIT_WIDTH = 10>(A:10, B:10, C_IN:1)->(Q_OUT:10, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=8, LAT=1, NAME=ssub11, BIT_WIDTH = 11>(A:11, B:11, C_IN:1)->(Q_OUT:11, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=8, LAT=1, NAME=ssub12, BIT_WIDTH = 12>(A:12, B:12, C_IN:1)->(Q_OUT:12, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=9, LAT=1, NAME=ssub13, BIT_WIDTH = 13>(A:13, B:13, C_IN:1)->(Q_OUT:13, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=9, LAT=1, NAME=ssub14, BIT_WIDTH = 14>(A:14, B:14, C_IN:1)->(Q_OUT:14, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=10, LAT=1, NAME=ssub15, BIT_WIDTH = 15>(A:15, B:15, C_IN:1)->(Q_OUT:15, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=10, LAT=1, NAME=ssub16, BIT_WIDTH = 16>(A:16, B:16, C_IN:1)->(Q_OUT:16, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=11, LAT=1, NAME=ssub17, BIT_WIDTH = 17>(A:17, B:17, C_IN:1)->(Q_OUT:17, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=11, LAT=1, NAME=ssub18, BIT_WIDTH = 18>(A:18, B:18, C_IN:1)->(Q_OUT:18, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=12, LAT=1, NAME=ssub19, BIT_WIDTH = 19>(A:19, B:19, C_IN:1)->(Q_OUT:19, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=12, LAT=1, NAME=ssub20, BIT_WIDTH = 20>(A:20, B:20, C_IN:1)->(Q_OUT:20, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=13, LAT=1, NAME=ssub21, BIT_WIDTH = 21>(A:21, B:21, C_IN:1)->(Q_OUT:21, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=14, LAT=1, NAME=ssub22, BIT_WIDTH = 22>(A:22, B:22, C_IN:1)->(Q_OUT:22, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=14, LAT=1, NAME=ssub23, BIT_WIDTH = 23>(A:23, B:23, C_IN:1)->(Q_OUT:23, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=14, LAT=1, NAME=ssub24, BIT_WIDTH = 24>(A:24, B:24, C_IN:1)->(Q_OUT:24, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=15, LAT=1, NAME=ssub25, BIT_WIDTH = 25>(A:25, B:25, C_IN:1)->(Q_OUT:25, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=15, LAT=1, NAME=ssub26, BIT_WIDTH = 26>(A:26, B:26, C_IN:1)->(Q_OUT:26, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=16, LAT=1, NAME=ssub27, BIT_WIDTH = 27>(A:27, B:27, C_IN:1)->(Q_OUT:27, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=16, LAT=1, NAME=ssub28, BIT_WIDTH = 28>(A:28, B:28, C_IN:1)->(Q_OUT:28, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=17, LAT=1, NAME=ssub29, BIT_WIDTH = 29>(A:29, B:29, C_IN:1)->(Q_OUT:29, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=17, LAT=1, NAME=ssub30, BIT_WIDTH = 30>(A:30, B:30, C_IN:1)->(Q_OUT:30, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=18, LAT=1, NAME=ssub31, BIT_WIDTH = 31>(A:31, B:31, C_IN:1)->(Q_OUT:31, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/
ssub<CLB=18, LAT=1, NAME=ssub32, BIT_WIDTH = 32>(A:32, B:32, C_IN:1)->(Q_OUT:32, C_OUT:1); /* Other Inputs (CLK:1, CLR:1)*/

/* unsigned operators*/
ucmp<CLB=2, LAT=1, NAME=ucmp1, BIT_WIDTH = 1>(A:1, B:1)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=3, LAT=1, NAME=ucmp2, BIT_WIDTH = 2>(A:2, B:2)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=4, LAT=1, NAME=ucmp3, BIT_WIDTH = 3>(A:3, B:3)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=5, LAT=1, NAME=ucmp4, BIT_WIDTH = 4>(A:4, B:4)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=6, LAT=1, NAME=ucmp5, BIT_WIDTH = 5>(A:5, B:5)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=7, LAT=1, NAME=ucmp6, BIT_WIDTH = 6>(A:6, B:6)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=8, LAT=1, NAME=ucmp7, BIT_WIDTH = 7>(A:7, B:7)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=9, LAT=1, NAME=ucmp8, BIT_WIDTH = 8>(A:8, B:8)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=10, LAT=1, NAME=ucmp9, BIT_WIDTH = 9>(A:9, B:9)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=11, LAT=1, NAME=ucmp10, BIT_WIDTH = 10>(A:10, B:10)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=12, LAT=1, NAME=ucmp11, BIT_WIDTH = 11>(A:11, B:11)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=13, LAT=1, NAME=ucmp12, BIT_WIDTH = 12>(A:12, B:12)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=14, LAT=1, NAME=ucmp13, BIT_WIDTH = 13>(A:13, B:13)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=15, LAT=1, NAME=ucmp14, BIT_WIDTH = 14>(A:14, B:14)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=16, LAT=1, NAME=ucmp15, BIT_WIDTH = 15>(A:15, B:15)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=17, LAT=1, NAME=ucmp16, BIT_WIDTH = 16>(A:16, B:16)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=19, LAT=1, NAME=ucmp17, BIT_WIDTH = 17>(A:17, B:17)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=20, LAT=1, NAME=ucmp18, BIT_WIDTH = 18>(A:18, B:18)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=21, LAT=1, NAME=ucmp19, BIT_WIDTH = 19>(A:19, B:19)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=22, LAT=1, NAME=ucmp20, BIT_WIDTH = 20>(A:20, B:20)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=23, LAT=1, NAME=ucmp21, BIT_WIDTH = 21>(A:21, B:21)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=24, LAT=1, NAME=ucmp22, BIT_WIDTH = 22>(A:22, B:22)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=25, LAT=1, NAME=ucmp23, BIT_WIDTH = 23>(A:23, B:23)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=26, LAT=1, NAME=ucmp24, BIT_WIDTH = 24>(A:24, B:24)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=27, LAT=1, NAME=ucmp25, BIT_WIDTH = 25>(A:25, B:25)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=28, LAT=1, NAME=ucmp26, BIT_WIDTH = 26>(A:26, B:26)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=29, LAT=1, NAME=ucmp27, BIT_WIDTH = 27>(A:27, B:27)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=30, LAT=1, NAME=ucmp28, BIT_WIDTH = 28>(A:28, B:28)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=31, LAT=1, NAME=ucmp29, BIT_WIDTH = 29>(A:29, B:29)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=32, LAT=1, NAME=ucmp30, BIT_WIDTH = 30>(A:30, B:30)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=33, LAT=1, NAME=ucmp31, BIT_WIDTH = 31>(A:31, B:31)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/
ucmp<CLB=34, LAT=1, NAME=ucmp32, BIT_WIDTH = 32>(A:32, B:32)->(Eq:1, GT:1); /* Other Inputs (CLK:1, CLR:1)*/

/* max operators (keep larger of input and internally stored old maximum) */
/*unsigned versions*/
/*RESET is active low*/
umax<CLB=25, LAT=1, NAME=umax1, BIT_WIDTH = 1>(A:1, RESET:1)->(O:1); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=25, LAT=1, NAME=umax2, BIT_WIDTH = 2>(A:2, RESET:1)->(O:2); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=26, LAT=1, NAME=umax3, BIT_WIDTH = 3>(A:3, RESET:1)->(O:3); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=26, LAT=1, NAME=umax4, BIT_WIDTH = 4>(A:4, RESET:1)->(O:4); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=27, LAT=1, NAME=umax5, BIT_WIDTH = 5>(A:5, RESET:1)->(O:5); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=27, LAT=1, NAME=umax6, BIT_WIDTH = 6>(A:6, RESET:1)->(O:6); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=28, LAT=1, NAME=umax7, BIT_WIDTH = 7>(A:7, RESET:1)->(O:7); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=28, LAT=1, NAME=umax8, BIT_WIDTH = 8>(A:8, RESET:1)->(O:8); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=29, LAT=1, NAME=umax9, BIT_WIDTH = 9>(A:9, RESET:1)->(O:9); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=29, LAT=1, NAME=umax10, BIT_WIDTH = 10>(A:10, RESET:1)->(O:10); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=30, LAT=1, NAME=umax11, BIT_WIDTH = 11>(A:11, RESET:1)->(O:11); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=30, LAT=1, NAME=umax12, BIT_WIDTH = 12>(A:12, RESET:1)->(O:12); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=31, LAT=1, NAME=umax13, BIT_WIDTH = 13>(A:13, RESET:1)->(O:13); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=31, LAT=1, NAME=umax14, BIT_WIDTH = 14>(A:14, RESET:1)->(O:14); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=32, LAT=1, NAME=umax15, BIT_WIDTH = 15>(A:15, RESET:1)->(O:15); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=32, LAT=1, NAME=umax16, BIT_WIDTH = 16>(A:16, RESET:1)->(O:16); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=33, LAT=1, NAME=umax17, BIT_WIDTH = 17>(A:17, RESET:1)->(O:17); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=33, LAT=1, NAME=umax18, BIT_WIDTH = 18>(A:18, RESET:1)->(O:18); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=34, LAT=1, NAME=umax19, BIT_WIDTH = 19>(A:19, RESET:1)->(O:19); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=34, LAT=1, NAME=umax20, BIT_WIDTH = 20>(A:20, RESET:1)->(O:20); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=35, LAT=1, NAME=umax21, BIT_WIDTH = 21>(A:21, RESET:1)->(O:21); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=35, LAT=1, NAME=umax22, BIT_WIDTH = 22>(A:22, RESET:1)->(O:22); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=36, LAT=1, NAME=umax23, BIT_WIDTH = 23>(A:23, RESET:1)->(O:23); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=36, LAT=1, NAME=umax24, BIT_WIDTH = 24>(A:24, RESET:1)->(O:24); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=37, LAT=1, NAME=umax25, BIT_WIDTH = 25>(A:25, RESET:1)->(O:25); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=37, LAT=1, NAME=umax26, BIT_WIDTH = 26>(A:26, RESET:1)->(O:26); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=38, LAT=1, NAME=umax27, BIT_WIDTH = 27>(A:27, RESET:1)->(O:27); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=38, LAT=1, NAME=umax28, BIT_WIDTH = 28>(A:28, RESET:1)->(O:28); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=39, LAT=1, NAME=umax29, BIT_WIDTH = 29>(A:29, RESET:1)->(O:29); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=39, LAT=1, NAME=umax30, BIT_WIDTH = 30>(A:30, RESET:1)->(O:30); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=40, LAT=1, NAME=umax31, BIT_WIDTH = 31>(A:31, RESET:1)->(O:31); /* Other Inputs (CLK:1, CLR:1)*/
umax<CLB=40, LAT=1, NAME=umax32, BIT_WIDTH = 32>(A:32, RESET:1)->(O:32); /* Other Inputs (CLK:1, CLR:1)*/

/* min operators (keep smaller of input and internally stored old minimum) */
/*RESET is active low*/
umin<CLB=25, LAT=1, NAME=umin1, BIT_WIDTH = 1>(A:1, RESET:1)->(O:1); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=25, LAT=1, NAME=umin2, BIT_WIDTH = 2>(A:2, RESET:1)->(O:2); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=26, LAT=1, NAME=umin3, BIT_WIDTH = 3>(A:3, RESET:1)->(O:3); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=26, LAT=1, NAME=umin4, BIT_WIDTH = 4>(A:4, RESET:1)->(O:4); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=27, LAT=1, NAME=umin5, BIT_WIDTH = 5>(A:5, RESET:1)->(O:5); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=27, LAT=1, NAME=umin6, BIT_WIDTH = 6>(A:6, RESET:1)->(O:6); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=28, LAT=1, NAME=umin7, BIT_WIDTH = 7>(A:7, RESET:1)->(O:7); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=28, LAT=1, NAME=umin8, BIT_WIDTH = 8>(A:8, RESET:1)->(O:8); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=29, LAT=1, NAME=umin9, BIT_WIDTH = 9>(A:9, RESET:1)->(O:9); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=29, LAT=1, NAME=umin10, BIT_WIDTH = 10>(A:10, RESET:1)->(O:10); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=30, LAT=1, NAME=umin11, BIT_WIDTH = 11>(A:11, RESET:1)->(O:11); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=30, LAT=1, NAME=umin12, BIT_WIDTH = 12>(A:12, RESET:1)->(O:12); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=31, LAT=1, NAME=umin13, BIT_WIDTH = 13>(A:13, RESET:1)->(O:13); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=31, LAT=1, NAME=umin14, BIT_WIDTH = 14>(A:14, RESET:1)->(O:14); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=32, LAT=1, NAME=umin15, BIT_WIDTH = 15>(A:15, RESET:1)->(O:15); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=32, LAT=1, NAME=umin16, BIT_WIDTH = 16>(A:16, RESET:1)->(O:16); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=33, LAT=1, NAME=umin17, BIT_WIDTH = 17>(A:17, RESET:1)->(O:17); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=33, LAT=1, NAME=umin18, BIT_WIDTH = 18>(A:18, RESET:1)->(O:18); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=34, LAT=1, NAME=umin19, BIT_WIDTH = 19>(A:19, RESET:1)->(O:19); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=34, LAT=1, NAME=umin20, BIT_WIDTH = 20>(A:20, RESET:1)->(O:20); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=35, LAT=1, NAME=umin21, BIT_WIDTH = 21>(A:21, RESET:1)->(O:21); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=35, LAT=1, NAME=umin22, BIT_WIDTH = 22>(A:22, RESET:1)->(O:22); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=36, LAT=1, NAME=umin23, BIT_WIDTH = 23>(A:23, RESET:1)->(O:23); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=36, LAT=1, NAME=umin24, BIT_WIDTH = 24>(A:24, RESET:1)->(O:24); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=37, LAT=1, NAME=umin25, BIT_WIDTH = 25>(A:25, RESET:1)->(O:25); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=37, LAT=1, NAME=umin26, BIT_WIDTH = 26>(A:26, RESET:1)->(O:26); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=38, LAT=1, NAME=umin27, BIT_WIDTH = 27>(A:27, RESET:1)->(O:27); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=38, LAT=1, NAME=umin28, BIT_WIDTH = 28>(A:28, RESET:1)->(O:28); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=39, LAT=1, NAME=umin29, BIT_WIDTH = 29>(A:29, RESET:1)->(O:29); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=39, LAT=1, NAME=umin30, BIT_WIDTH = 30>(A:30, RESET:1)->(O:30); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=40, LAT=1, NAME=umin31, BIT_WIDTH = 31>(A:31, RESET:1)->(O:31); /* Other Inputs (CLK:1, CLR:1)*/
umin<CLB=40, LAT=1, NAME=umin32, BIT_WIDTH = 32>(A:32, RESET:1)->(O:32); /* Other Inputs (CLK:1, CLR:1)*/

/*xors*/
/* generated by hand*/
xorgate<CLB=2, LAT=1, NAME=xor1, BIT_WIDTH = 1>(A:1, B:1)->(RESULT:1); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=3, LAT=1, NAME=xor2, BIT_WIDTH = 2>(A:2, B:2)->(RESULT:2); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=4, LAT=1, NAME=xor3, BIT_WIDTH = 3>(A:3, B:3)->(RESULT:3); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=5, LAT=1, NAME=xor4, BIT_WIDTH = 4>(A:4, B:4)->(RESULT:4); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=6, LAT=1, NAME=xor5, BIT_WIDTH = 5>(A:5, B:5)->(RESULT:5); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=7, LAT=1, NAME=xor6, BIT_WIDTH = 6>(A:6, B:6)->(RESULT:6); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=8, LAT=1, NAME=xor7, BIT_WIDTH = 7>(A:7, B:7)->(RESULT:7); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=9, LAT=1, NAME=xor8, BIT_WIDTH = 8>(A:8, B:8)->(RESULT:8); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=10, LAT=1, NAME=xor9, BIT_WIDTH = 9>(A:9, B:9)->(RESULT:9); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=11, LAT=1, NAME=xor10, BIT_WIDTH = 10>(A:10, B:10)->(RESULT:10); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=12, LAT=1, NAME=xor11, BIT_WIDTH = 11>(A:11, B:11)->(RESULT:11); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=13, LAT=1, NAME=xor12, BIT_WIDTH = 12>(A:12, B:12)->(RESULT:12); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=14, LAT=1, NAME=xor13, BIT_WIDTH = 13>(A:13, B:13)->(RESULT:13); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=15, LAT=1, NAME=xor14, BIT_WIDTH = 14>(A:14, B:14)->(RESULT:14); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=16, LAT=1, NAME=xor15, BIT_WIDTH = 15>(A:15, B:15)->(RESULT:15); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=17, LAT=1, NAME=xor16, BIT_WIDTH = 16>(A:16, B:16)->(RESULT:16); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=18, LAT=1, NAME=xor17, BIT_WIDTH = 17>(A:17, B:17)->(RESULT:17); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=19, LAT=1, NAME=xor18, BIT_WIDTH = 18>(A:18, B:18)->(RESULT:18); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=20, LAT=1, NAME=xor19, BIT_WIDTH = 19>(A:19, 
B:19)->(RESULT:19); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=21, LAT=1, NAME=xor20, BIT_WIDTH = 20>(A:20, B:20)->(RESULT:20); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=22, LAT=1, NAME=xor21, BIT_WIDTH = 21>(A:21, B:21)->(RESULT:21); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=23, LAT=1, NAME=xor22, BIT_WIDTH = 22>(A:22, B:22)->(RESULT:22); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=24, LAT=1, NAME=xor23, BIT_WIDTH = 23>(A:23, B:23)->(RESULT:23); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=25, LAT=1, NAME=xor24, BIT_WIDTH = 24>(A:24, B:24)->(RESULT:24); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=26, LAT=1, NAME=xor25, BIT_WIDTH = 25>(A:25, B:25)->(RESULT:25); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=27, LAT=1, NAME=xor26, BIT_WIDTH = 26>(A:26, B:26)->(RESULT:26); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=28, LAT=1, NAME=xor27, BIT_WIDTH = 27>(A:27, B:27)->(RESULT:27); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=29, LAT=1, NAME=xor28, BIT_WIDTH = 28>(A:28, B:28)->(RESULT:28); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=30, LAT=1, NAME=xor29, BIT_WIDTH = 29>(A:29, B:29)->(RESULT:29); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=31, LAT=1, NAME=xor30, BIT_WIDTH = 30>(A:30, B:30)->(RESULT:30); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=32, LAT=1, NAME=xor31, BIT_WIDTH = 31>(A:31, B:31)->(RESULT:31); /* Other Inputs (CLK:1, CLR:1)*/ xorgate<CLB=33, LAT=1, NAME=xor32, BIT_WIDTH = 32>(A:32, B:32)->(RESULT:32); /* Other Inputs (CLK:1, CLR:1)*/ A.2.2 Files for Sample Computational Element A.2.2.1 januslibtest.vhd library IEEE; use IEEE.std_logic_1164.all; library janlib; use janlib.janpack.all; 126 entity USER_COMPONENT is port ( CLOCK: in STD_LOGIC; CLR: in STD_LOGIC; A: in STD_LOGIC_VECTOR(15 downto 0); B: in STD_LOGIC_VECTOR(15 downto 0); RESULT: out STD_LOGIC_VECTOR(15 downto 0) ); end USER_COMPONENT; architecture USER_COMPONENT_arch of USER_COMPONENT is begin nand1 : nandgate generic map (bit_width => 16) port map (CLOCK, CLR, A, B, RESULT); end 
USER_COMPONENT_arch;

A.2.2.2 mmpcomp_sample.vhd

library IEEE;
use IEEE.std_logic_1164.all;
-- library janlib;
-- use janlib.janpack.all;

entity USER_VHDL_MODEL is
  port (CLOCK: in STD_LOGIC;
        CLR: in STD_LOGIC;
        DONE: out STD_LOGIC;
        MEM_SIGNAL0: out STD_LOGIC; -- used for read/write signal
        MEM_SIGNAL1: out STD_LOGIC_VECTOR (21 downto 0); -- used for passing address
        MEM_SIGNAL2: out STD_LOGIC_VECTOR (31 downto 0); -- used for data to memory
        MEM_SIGNAL3: in STD_LOGIC_VECTOR (31 downto 0); -- used for data from memory
        DATA_SIGNAL0: in STD_LOGIC_VECTOR (35 downto 0); -- used for PE_LEFT_IN
        DATA_SIGNAL1: out STD_LOGIC_VECTOR (35 downto 0); -- used for PE_LEFT_OUT
        DATA_SIGNAL2: in STD_LOGIC_VECTOR (35 downto 0); -- used for PE_RIGHT_IN
        DATA_SIGNAL3: out STD_LOGIC_VECTOR (35 downto 0) -- used for PE_RIGHT_OUT
       );
end USER_VHDL_MODEL;

architecture USER_VHDL_MODEL_arch of USER_VHDL_MODEL is

  component MEM_PORT
    generic (
      PATH_DELAY: INTEGER := 0;     -- delay until valid data
                                    -- (# clk cycles to wait before writing or reading from memory)
                                    -- (this is not implemented yet; use the ADDGEN delay)
      DATA_WIDTH: INTEGER := 32;    -- width of data vectors
      ADDRESS_WIDTH: INTEGER := 22; -- width of address bus
      MEMORY_PORTS: INTEGER := 8    -- number of memory ports (currently ignored; fixed at 8)
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      CLK_ENABLE: out STD_LOGIC;
      MEM_PORT_WRITE_SEL_N: out STD_LOGIC; -- conveys R/W signal up to skeleton graph
      MEM_ADDRESS: out STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0); -- address for memory read or write
      DATA_FROM_MEM: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); -- data coming from memory
      DATA_TO_MEM: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0); -- data going to memory
      MEMORY_DIRECTION0: in STD_LOGIC;
      MEMORY_DIRECTION1: in STD_LOGIC;
      MEMORY_DIRECTION2: in STD_LOGIC;
      MEMORY_DIRECTION3: in STD_LOGIC;
      MEMORY_DIRECTION4: in STD_LOGIC;
      MEMORY_DIRECTION5: in STD_LOGIC;
      MEMORY_DIRECTION6: in STD_LOGIC;
      MEMORY_DIRECTION7: in STD_LOGIC;
      ADDRESS0: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS1: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS2: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS3: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS4: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS5: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS6: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      ADDRESS7: in STD_LOGIC_VECTOR(ADDRESS_WIDTH-1 downto 0);
      SINK0: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK1: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK2: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK3: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK4: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK5: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK6: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SINK7: in STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE0: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE1: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE2: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE3: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE4: out
               STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE5: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE6: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0);
      SOURCE7: out STD_LOGIC_VECTOR(DATA_WIDTH-1 downto 0));
  end component;

  component ADDGEN
    generic (
      PATH_DELAY: INTEGER := 0; -- delay until valid data (# clk cycles to wait before writing or reading from memory)
      BIT_WIDTH: INTEGER := 22; -- width of input vector
      INIT0: INTEGER := 0; TERM0: INTEGER := 0; INC0: INTEGER := 0;
      INIT1: INTEGER := 0; TERM1: INTEGER := 0; INC1: INTEGER := 0;
      INIT2: INTEGER := 0; TERM2: INTEGER := 0; INC2: INTEGER := 0
    );
    port (
      CLK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      ADDRESS : out STD_LOGIC_VECTOR(BIT_WIDTH-1 downto 0);
      DONE: out STD_LOGIC
    );
  end component;

  component USER_COMPONENT
    port (
      CLOCK: in STD_LOGIC;
      CLR: in STD_LOGIC;
      A: in STD_LOGIC_VECTOR(15 downto 0);
      B: in STD_LOGIC_VECTOR(15 downto 0);
      RESULT: out STD_LOGIC_VECTOR(15 downto 0)
    );
  end component;

  signal GSINK0 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK1 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK2 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK3 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK4 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK5 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK6 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSINK7 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE0 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE1 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE2 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE3 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE4 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE5 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE6 : STD_LOGIC_VECTOR(31 downto 0);
  signal GSOURCE7 : STD_LOGIC_VECTOR(31 downto 0);
  signal GADDRESS0 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS1 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS2 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS3 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS4 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS5 : STD_LOGIC_VECTOR(21 downto
0);
  signal GADDRESS6 : STD_LOGIC_VECTOR(21 downto 0);
  signal GADDRESS7 : STD_LOGIC_VECTOR(21 downto 0);
  signal GMEMORY_DIRECTION0 : STD_LOGIC;
  signal GMEMORY_DIRECTION1 : STD_LOGIC;
  signal GMEMORY_DIRECTION2 : STD_LOGIC;
  signal GMEMORY_DIRECTION3 : STD_LOGIC;
  signal GMEMORY_DIRECTION4 : STD_LOGIC;
  signal GMEMORY_DIRECTION5 : STD_LOGIC;
  signal GMEMORY_DIRECTION6 : STD_LOGIC;
  signal GMEMORY_DIRECTION7 : STD_LOGIC;
  signal DONE0 : STD_LOGIC;
  signal DONE1 : STD_LOGIC;
  signal DONE2 : STD_LOGIC;
  signal DONE3 : STD_LOGIC;
  signal DONE4 : STD_LOGIC;
  signal DONE5 : STD_LOGIC;
  signal DONE6 : STD_LOGIC;
  signal DONE7 : STD_LOGIC;
  signal MCLK_ENABLE : STD_LOGIC;
  signal VIRTUAL_CLK : STD_LOGIC;

begin

  VIRTUAL_CLK <= CLOCK and MCLK_ENABLE;
  DONE <= DONE0 and DONE1 and DONE2 and DONE3 and DONE4 and DONE5 and DONE6 and DONE7;

  main_0 : MEM_PORT
    generic map (DATA_WIDTH => 32, ADDRESS_WIDTH => 22)
    port map (CLK => CLOCK, CLR => CLR, CLK_ENABLE => MCLK_ENABLE,
              MEM_PORT_WRITE_SEL_N => MEM_SIGNAL0, MEM_ADDRESS => MEM_SIGNAL1,
              DATA_TO_MEM => MEM_SIGNAL2, DATA_FROM_MEM => MEM_SIGNAL3,
              MEMORY_DIRECTION0 => GMEMORY_DIRECTION0, MEMORY_DIRECTION1 => GMEMORY_DIRECTION1,
              MEMORY_DIRECTION2 => GMEMORY_DIRECTION2, MEMORY_DIRECTION3 => GMEMORY_DIRECTION3,
              MEMORY_DIRECTION4 => GMEMORY_DIRECTION4, MEMORY_DIRECTION5 => GMEMORY_DIRECTION5,
              MEMORY_DIRECTION6 => GMEMORY_DIRECTION6, MEMORY_DIRECTION7 => GMEMORY_DIRECTION7,
              ADDRESS0 => GADDRESS0, ADDRESS1 => GADDRESS1, ADDRESS2 => GADDRESS2, ADDRESS3 => GADDRESS3,
              ADDRESS4 => GADDRESS4, ADDRESS5 => GADDRESS5, ADDRESS6 => GADDRESS6, ADDRESS7 => GADDRESS7,
              SINK0 => GSINK0, SINK1 => GSINK1, SINK2 => GSINK2, SINK3 => GSINK3,
              SINK4 => GSINK4, SINK5 => GSINK5, SINK6 => GSINK6, SINK7 => GSINK7,
              SOURCE0 => GSOURCE0, SOURCE1 => GSOURCE1, SOURCE2 => GSOURCE2, SOURCE3 => GSOURCE3,
              SOURCE4 => GSOURCE4, SOURCE5 => GSOURCE5, SOURCE6 => GSOURCE6, SOURCE7 => GSOURCE7);

  -- For this sample application, 3 of the memory ports are used.
  -- The address generators are set up to perform 5 operations each.
  -- The first address generator counts from 0 (INIT0) to 5 (TERM0) so that
  -- port 0 reads 5 inputs (one on each virtual clock cycle) for input A.
  -- The second address generator counts from 5 to 10 to read the 5 inputs for B.
  -- The third address generator counts from 10 to 15 for port 2 to write the 5 results.
  ADD_GEN_0: ADDGEN -- Address Generator for memory port 0
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 0, TERM0 => 5, INC0 => 1,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS0, DONE => DONE0);

  ADD_GEN_1: ADDGEN -- Address Generator for memory port 1
    generic map (PATH_DELAY => 0, BIT_WIDTH => 22,
                 INIT0 => 5, TERM0 => 10, INC0 => 1,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS1, DONE => DONE1);

  ADD_GEN_2: ADDGEN -- Address Generator for memory port 2
    generic map (PATH_DELAY => 1, BIT_WIDTH => 22,
                 INIT0 => 10, TERM0 => 15, INC0 => 1,
                 INIT1 => 0, TERM1 => 0, INC1 => 0,
                 INIT2 => 0, TERM2 => 0, INC2 => 0)
    port map (CLK => VIRTUAL_CLK, CLR => CLR, ADDRESS => GADDRESS2, DONE => DONE2);

  -- For the unused ports, the DONE signals are set to 1.
  DONE3 <= '1';
  DONE4 <= '1';
  DONE5 <= '1';
  DONE6 <= '1';
  DONE7 <= '1';

  GMEMORY_DIRECTION0 <= '1'; -- Set Port 0 direction to input
  GMEMORY_DIRECTION1 <= '1'; -- Set Port 1 direction to input
  GMEMORY_DIRECTION2 <= '0'; -- Set Port 2 direction to output
  -- Ports 3 to 7 are not used, so just set them to read
  GMEMORY_DIRECTION3 <= '1'; -- Set Port 3 direction to input
  GMEMORY_DIRECTION4 <= '1'; -- Set Port 4 direction to input
  GMEMORY_DIRECTION5 <= '1'; -- Set Port 5 direction to input
  GMEMORY_DIRECTION6 <= '1'; -- Set Port 6 direction to input
  GMEMORY_DIRECTION7 <= '1'; -- Set Port 7 direction to input

  -- Port 0 is used for A inputs, Port 1 is used for B inputs, and Port 2 is used
  -- for writing results to memory
  user_component0 : USER_COMPONENT
    port map (CLOCK => VIRTUAL_CLK, CLR => CLR,
              A => GSOURCE0(15 downto 0), B => GSOURCE1(15 downto 0),
              RESULT => GSINK2(15 downto 0));

end USER_VHDL_MODEL_arch;

A.2.3 Generic WILDFORCE JHDL Files

A.2.3.1 user_component_mmp.java

// This is a generic user operation file that gives an
// outline of an operation used by generic_pe to make a
// computational element on a PE of the Wildforce board.
// This operation also uses the generic Janus operation
// interface so that the operation could be used in a
// Janus application with no changes to the file.
// This version uses the multiplexed memory port (mmp) to
// make more memory ports available to the user and take
// away the burden of dealing with the odd timing of the
// Wildforce memory accesses.

import byucc.jhdl.base.*;
import byucc.jhdl.Logic.*;
import byucc.jhdl.Xilinx.*;
import byucc.jhdl.Xilinx.XC4000.*;
import byucc.jhdl.Xilinx.XC4000.carryLogic.*;

public class user_component_mmp extends Logic {

  // interface uses Janus operation specification
  public static CellInterface[] cell_interface = {
    in("data_in", 32),
    out("data_out", 32),
    in("addr", 20),
    in("write_sel", 1),
    in("strobe", 1),
    in("enable", 1),
    out("done", 1)};

  public static final String cellname = "user_comp_mmp";

  public
  user_component_mmp(Node parent, Wire data_in, Wire data_out, Wire addr,
                     Wire write_sel, Wire strobe, Wire enable, Wire done) {
    super(parent);
    connect("data_in", data_in);     // data from memory port
    connect("data_out", data_out);   // data to memory port
    connect("addr", addr);           // memory address
    connect("write_sel", write_sel); // active low
    connect("strobe", strobe);       // memory strobe (keep low?)
    connect("enable", enable);       // goes high when board is ready to start
    connect("done", done);           // set when stage done

    Wire CLK_ENABLE = wire(this, 1, "CLK_ENABLE");
    Wire MEM_DIR = wire(this, 1, "MEM_DIR"); // shared direction bit, driven below
    Wire MEM_DIR0 = wire(this, 1, "MEM_DIR0");
    Wire MEM_DIR1 = wire(this, 1, "MEM_DIR1");
    Wire MEM_DIR2 = wire(this, 1, "MEM_DIR2");
    Wire MEM_DIR3 = wire(this, 1, "MEM_DIR3");
    Wire MEM_DIR4 = wire(this, 1, "MEM_DIR4");
    Wire MEM_DIR5 = wire(this, 1, "MEM_DIR5");
    Wire MEM_DIR6 = wire(this, 1, "MEM_DIR6");
    Wire MEM_DIR7 = wire(this, 1, "MEM_DIR7");
    Wire SOURCE0 = wire(this, 32, "SOURCE0"); // the sources are connected to registers.
    Wire SOURCE1 = wire(this, 32, "SOURCE1"); // the registers load the data when it is their cycle
    Wire SOURCE2 = wire(this, 32, "SOURCE2");
    Wire SOURCE3 = wire(this, 32, "SOURCE3");
    Wire SOURCE4 = wire(this, 32, "SOURCE4");
    Wire SOURCE5 = wire(this, 32, "SOURCE5");
    Wire SOURCE6 = wire(this, 32, "SOURCE6");
    Wire SOURCE7 = wire(this, 32, "SOURCE7");
    Wire SINK0 = wire(this, 32, "SINK0");
    Wire SINK1 = wire(this, 32, "SINK1");
    Wire SINK2 = wire(this, 32, "SINK2");
    Wire SINK3 = wire(this, 32, "SINK3");
    Wire SINK4 = wire(this, 32, "SINK4");
    Wire SINK5 = wire(this, 32, "SINK5");
    Wire SINK6 = wire(this, 32, "SINK6");
    Wire SINK7 = wire(this, 32, "SINK7");
    Wire ADDRESS0 = wire(this, 20, "ADDRESS0");
    Wire ADDRESS1 = wire(this, 20, "ADDRESS1");
    Wire ADDRESS2 = wire(this, 20, "ADDRESS2");
    Wire ADDRESS3 = wire(this, 20, "ADDRESS3");
    Wire ADDRESS4 = wire(this, 20, "ADDRESS4");
    Wire ADDRESS5 = wire(this, 20, "ADDRESS5");
    Wire ADDRESS6 = wire(this, 20, "ADDRESS6");
    Wire ADDRESS7 = wire(this, 20, "ADDRESS7");

    // *********************************
    // multiplexed memory port
    new struct_m(this, enable, CLK_ENABLE, write_sel, addr, data_in, data_out,
                 MEM_DIR0, MEM_DIR1, MEM_DIR2, MEM_DIR3, MEM_DIR4, MEM_DIR5, MEM_DIR6, MEM_DIR7,
                 ADDRESS0, ADDRESS1, ADDRESS2, ADDRESS3, ADDRESS4, ADDRESS5, ADDRESS6, ADDRESS7,
                 SINK0, SINK1, SINK2, SINK3, SINK4, SINK5, SINK6, SINK7,
                 SOURCE0, SOURCE1, SOURCE2, SOURCE3, SOURCE4, SOURCE5, SOURCE6, SOURCE7);

    // *****MEMORY DIRECTIONS*********
    // change the code below to get the proper memory direction
    // '0' is output (write), '1' is input (read)
    regpe_o(not(MEM_DIR), CLK_ENABLE, MEM_DIR); // alternate read and write
    // in this template all eight ports follow the shared MEM_DIR bit
    buf_o(MEM_DIR, MEM_DIR0);
    buf_o(MEM_DIR, MEM_DIR1);
    buf_o(MEM_DIR, MEM_DIR2);
    buf_o(MEM_DIR, MEM_DIR3);
    buf_o(MEM_DIR, MEM_DIR4);
    buf_o(MEM_DIR, MEM_DIR5);
    buf_o(MEM_DIR, MEM_DIR6);
    buf_o(MEM_DIR, MEM_DIR7);

    // *** ADDRESSES ****
    // change the code below to produce the addressing you require
    buf_o(constant(20,0), ADDRESS0);
    buf_o(constant(20,1), ADDRESS1);
    buf_o(constant(20,2), ADDRESS2);
    buf_o(constant(20,3), ADDRESS3);
    buf_o(constant(20,4), ADDRESS4);
    buf_o(constant(20,5), ADDRESS5);
    buf_o(constant(20,6), ADDRESS6);
    buf_o(constant(20,7), ADDRESS7);

    // ******** Counter **********
    // This counter counts the number of MMP virtual clock cycles.
    // It can be used to set the done signal when the operation is complete.
    Wire NEXT_COUNT = wire(this, 4, "NEXT_COUNT");
    Wire COUNT = wire(this, 4, "COUNT");
    add_o(constant(4,1), COUNT, NEXT_COUNT, "counter_adder");
    regce_o(NEXT_COUNT, CLK_ENABLE, COUNT, "counter_holder");
    // In this case the done signal is set after two virtual clock cycles.
    buf_o(COUNT.getWire(1), done); // COUNT = 0010
    // ****************************

    // Next should come the actual logic for the operation.
    // Using the CLK_ENABLE signal as a clock or data valid, the user
    // does not have to worry about the timing of the memory. For example, if the user
    // specifies an address and a read on a port on one 'clock' cycle, then the data
    // will be available on the next 'clock' cycle.
    // Map sources and sinks to the necessary places for the component.
    // Don't forget to set the done signal when you are done.
  }
}

A.2.3.2 generic_pe.java

// James Atwell
// 9/21/99
// generic_pe.java
// This file can be compiled into a usable pe.
// The only change that needs to be made to use a Janus style
// operation is the name of the operation in the last line.

import byucc.jhdl.platforms.wildforce.*;
import byucc.jhdl.base.*;
import byucc.jhdl.Xilinx.XC4000.*;

public class generic_pe extends pelca {

  public static CellInterface[] cell_interface = {
    out("MEM_ADDRESS", 20),
    out("DATA_TO_MEM", 32),
    in("DATA_FROM_MEM", 32),
    in("grant", 1),
    out("request", 1),
    out("strobe", 1),
    out("MEM_WRITE_SEL", 1),
    in("reset", 1),
    out("intreq", 1),
    in("intack", 1)
  };

  public generic_pe(pe parent) {
    super(parent);
    Wire MEM_ADDRESS = port("MEM_ADDRESS",MemAddr(19,0));       // Address lines
    Wire DATA_TO_MEM = port("DATA_TO_MEM",MemDataOut());        // Memory Write lines
    Wire DATA_FROM_MEM = port("DATA_FROM_MEM",MemDataIn());     // Memory Read lines
    Wire grant = port("grant",MemBusGrant_n());                 // Grants Access to Memory
    Wire request = port("request",MemBusReq_n());               // Used to Request Memory Access
    Wire strobe = port("strobe",MemStrobe_n());
    Wire MEM_WRITE_SEL = port("MEM_WRITE_SEL",MemWriteSel_n()); // Read/Write line
    Wire reset = port("reset",Reset());                         // PE Global Reset
    Wire intreq = port("intreq", InterruptReq_n());
    Wire intack = port("intack", InterruptAck_n());

    Wire enable = wire(this,1,"enable");     // goes high when ok to start accessing memory
    Wire done = wire(this,1,"done");
    Wire int_done = wire(this,1,"int_done"); // user must set int_done when done processing to return control to host
    Wire int_strobe = wire(this,1,"int_strobe");

    or_o(grant,int_strobe,strobe);
    buf_o(int_done, done);

    // FSM below controls startup and shutdown conditions for the WildForce board
    BusGrantFSM fsm = new BusGrantFSM(this,intack,grant,done,request,enable,intreq);

    // The user operation goes below
    // The user is responsible for driving the INT_DONE signal when processing is complete
    // The user should not drive the STROBE signal.
    new user_component_mmp(this, DATA_FROM_MEM, DATA_TO_MEM, MEM_ADDRESS, MEM_WRITE_SEL, strobe, enable, int_done);
  }
}

A.2.3.3 BusGrantFSM.java

//package visc.cc.rtr.janus.hardware.platforms;
import byucc.jhdl.base.*;
import byucc.jhdl.Logic.*;
import byucc.jhdl.Fsm.*;

public class BusGrantFSM extends Fsm {

  public static final String[] portnames = {
    "intack", "bus_grant", "done", "bus_request", "enable", "interrupt"};
  public static final String[] portwidths = {
    "1", "1", "1", "1", "1", "1"};
  public static final String[] portios = {
    "in", "in", "in", "out", "out", "out"};

  public BusGrantFSM (Node parent,Wire intack,Wire bus_grant,Wire done,Wire bus_request,Wire enable,Wire interrupt) {
    super (parent, "BusGrantFSM");
    port("intack",intack);
    port("bus_grant", bus_grant);
    port("done", done);
    port("bus_request", bus_request);
    port("enable",enable);
    port("interrupt",interrupt);
    buildFsm("BusGrantFSM.fsm"); // Ask that consistency checking be done
  }
}

A.2.3.4 BusGrantFSM.fsm

.inputs intack bus_grant done;
.outputs bus_request enable interrupt;
.states x a b c d;
.encodings default;
--- x a 101;
-1- a a 001;
-0- a b 001;
--0 b b 011;
--1 b c 011;
1-- c c 100;
0-- c d 100;
--- d d 101;

A.3 Radix-4 Butterfly Source Files

A.3.1 butterfly_4.java

// This class performs an FFT Radix 4 butterfly operation. It takes 4 clock cycles.
import byucc.jhdl.base.*;
import byucc.jhdl.Logic.*;
import byucc.jhdl.Xilinx.*;
import byucc.jhdl.Xilinx.XC4000.*;
import byucc.jhdl.Xilinx.XC4000.carryLogic.*;

public class butterfly_4 extends Logic {

  public static CellInterface[] cell_interface = {
    // These are the actual data inputs to the FFT
    param("WIDTH", INTEGER),
    param("OWIDTH", INTEGER),
    in("ENABLE", 1),
    in("INPUTA_REAL", "WIDTH"), in("INPUTA_IMAG", "WIDTH"),
    in("INPUTB_REAL", "WIDTH"), in("INPUTB_IMAG", "WIDTH"),
    in("INPUTC_REAL", "WIDTH"), in("INPUTC_IMAG", "WIDTH"),
    in("INPUTD_REAL", "WIDTH"), in("INPUTD_IMAG", "WIDTH"),
    // These are the twiddle factors
    in("TW0_REAL", "WIDTH"), in("TW0_IMAG", "WIDTH"),
    in("TW1_REAL", "WIDTH"), in("TW1_IMAG", "WIDTH"),
    in("TW2_REAL", "WIDTH"), in("TW2_IMAG", "WIDTH"),
    in("TW3_REAL", "WIDTH"), in("TW3_IMAG", "WIDTH"),
    out("OUTPUTE_REAL", "OWIDTH"), out("OUTPUTE_IMAG", "OWIDTH"),
    out("OUTPUTF_REAL", "OWIDTH"), out("OUTPUTF_IMAG", "OWIDTH"),
    out("OUTPUTG_REAL", "OWIDTH"), out("OUTPUTG_IMAG", "OWIDTH"),
    out("OUTPUTH_REAL", "OWIDTH"), out("OUTPUTH_IMAG", "OWIDTH")};

  public static final String cellname = "BUTTERFLYORB";

  public butterfly_4(Node parent, Wire ENABLE,
                     Wire INPUTA_REAL, Wire INPUTA_IMAG, Wire INPUTB_REAL, Wire INPUTB_IMAG,
                     Wire INPUTC_REAL, Wire INPUTC_IMAG, Wire INPUTD_REAL, Wire INPUTD_IMAG,
                     Wire TW0_REAL, Wire TW0_IMAG, Wire TW1_REAL, Wire TW1_IMAG,
                     Wire TW2_REAL, Wire TW2_IMAG, Wire TW3_REAL, Wire TW3_IMAG,
                     Wire OUTPUTE_REAL, Wire OUTPUTE_IMAG, Wire OUTPUTF_REAL, Wire OUTPUTF_IMAG,
                     Wire OUTPUTG_REAL, Wire OUTPUTG_IMAG, Wire OUTPUTH_REAL, Wire OUTPUTH_IMAG) {
    super(parent);
    int IN_WIDTH = INPUTA_REAL.getWidth();
    bind("WIDTH", IN_WIDTH);
    int OUT_WIDTH = 2*IN_WIDTH;
    bind("OWIDTH", OUT_WIDTH);
    connect("ENABLE", ENABLE);
    connect("INPUTA_REAL", INPUTA_REAL);
    connect("INPUTA_IMAG", INPUTA_IMAG);
    connect("INPUTB_REAL", INPUTB_REAL);
    connect("INPUTB_IMAG", INPUTB_IMAG);
    connect("INPUTC_REAL", INPUTC_REAL);
    connect("INPUTC_IMAG", INPUTC_IMAG);
    connect("INPUTD_REAL",
            INPUTD_REAL);
    connect("INPUTD_IMAG", INPUTD_IMAG);
    connect("TW0_REAL", TW0_REAL);
    connect("TW0_IMAG", TW0_IMAG);
    connect("TW1_REAL", TW1_REAL);
    connect("TW1_IMAG", TW1_IMAG);
    connect("TW2_REAL", TW2_REAL);
    connect("TW2_IMAG", TW2_IMAG);
    connect("TW3_REAL", TW3_REAL);
    connect("TW3_IMAG", TW3_IMAG);
    connect("OUTPUTE_REAL", OUTPUTE_REAL);
    connect("OUTPUTE_IMAG", OUTPUTE_IMAG);
    connect("OUTPUTF_REAL", OUTPUTF_REAL);
    connect("OUTPUTF_IMAG", OUTPUTF_IMAG);
    connect("OUTPUTG_REAL", OUTPUTG_REAL);
    connect("OUTPUTG_IMAG", OUTPUTG_IMAG);
    connect("OUTPUTH_REAL", OUTPUTH_REAL);
    connect("OUTPUTH_IMAG", OUTPUTH_IMAG);

    // These are the results of the multiplication of the data with the twiddle factors
    Wire ATW_REAL = wire(this, 2*IN_WIDTH, "ATW_REAL");
    Wire ATW_IMAG = wire(this, 2*IN_WIDTH, "ATW_IMAG");
    Wire BTW_REAL = wire(this, 2*IN_WIDTH, "BTW_REAL");
    Wire BTW_IMAG = wire(this, 2*IN_WIDTH, "BTW_IMAG");
    Wire CTW_REAL = wire(this, 2*IN_WIDTH, "CTW_REAL");
    Wire CTW_IMAG = wire(this, 2*IN_WIDTH, "CTW_IMAG");
    Wire DTW_REAL = wire(this, 2*IN_WIDTH, "DTW_REAL");
    Wire DTW_IMAG = wire(this, 2*IN_WIDTH, "DTW_IMAG");

    // ******** Counter **********
    // This counter counts the regular clock cycles
    Wire NEXT_COUNT = wire(this, 2, "NEXT_COUNT");
    Wire COUNT = wire(this, 2, "COUNT");
    add_o(constant(2,1), COUNT, NEXT_COUNT, "counter_adder");
    regce_o(NEXT_COUNT, ENABLE, COUNT, "counter_holder");
    Wire CNT0 = wire(this, 1, "CNT0");
    Wire CNT1 = wire(this, 1, "CNT1");
    buf_o(COUNT.getWire(0), CNT0);
    buf_o(COUNT.getWire(1), CNT1);
    // ****************************

    // Start Butterfly Structure
    // Multiplication of Twiddle factors
    Wire XMULT_IN_REAL = wire(this, IN_WIDTH, "XMULT_IN_REAL");
    Wire XMULT_IN_IMAG = wire(this, IN_WIDTH, "XMULT_IN_IMAG");
    Wire YMULT_IN_REAL = wire(this, IN_WIDTH, "YMULT_IN_REAL");
    Wire YMULT_IN_IMAG = wire(this, IN_WIDTH, "YMULT_IN_IMAG");
    Wire MULT_OUT_LREAL = wire(this, IN_WIDTH, "MULT_OUT_LREAL");
    Wire MULT_OUT_HREAL = wire(this, IN_WIDTH, "MULT_OUT_HREAL");
    Wire
         MULT_OUT_LIMAG = wire(this, IN_WIDTH, "MULT_OUT_LIMAG");
    Wire MULT_OUT_HIMAG = wire(this, IN_WIDTH, "MULT_OUT_HIMAG");
    Wire SELECT_INS = wire(this, 2, "SELECT_INS");
    buf_o(concat(CNT1, CNT0), SELECT_INS);

    // The input to the complex multiplier is multiplexed so that the operation can get by with only
    // one complex multiplier. Four complex multipliers would be too big for a PE.
    new mux_4_1(this, INPUTA_REAL, INPUTB_REAL, INPUTC_REAL, INPUTD_REAL, SELECT_INS, XMULT_IN_REAL);
    new mux_4_1(this, INPUTA_IMAG, INPUTB_IMAG, INPUTC_IMAG, INPUTD_IMAG, SELECT_INS, XMULT_IN_IMAG);
    new mux_4_1(this, TW0_REAL, TW1_REAL, TW2_REAL, TW3_REAL, SELECT_INS, YMULT_IN_REAL);
    new mux_4_1(this, TW0_IMAG, TW1_IMAG, TW2_IMAG, TW3_IMAG, SELECT_INS, YMULT_IN_IMAG);
    new complex_mult(this, XMULT_IN_REAL, XMULT_IN_IMAG, YMULT_IN_REAL, YMULT_IN_IMAG,
                     MULT_OUT_HREAL, MULT_OUT_LREAL, MULT_OUT_HIMAG, MULT_OUT_LIMAG);

    // multiplex the outputs from the complex multiplier to feed the result registers
    regce_o(concat(MULT_OUT_HREAL, MULT_OUT_LREAL), and(not(CNT1), not(CNT0)), ATW_REAL, "ATW_REAL");
    regce_o(concat(MULT_OUT_HIMAG, MULT_OUT_LIMAG), and(not(CNT1), not(CNT0)), ATW_IMAG, "ATW_IMAG");
    regce_o(concat(MULT_OUT_HREAL, MULT_OUT_LREAL), and(not(CNT1), CNT0), BTW_REAL, "BTW_REAL");
    regce_o(concat(MULT_OUT_HIMAG, MULT_OUT_LIMAG), and(not(CNT1), CNT0), BTW_IMAG, "BTW_IMAG");
    regce_o(concat(MULT_OUT_HREAL, MULT_OUT_LREAL), and(CNT1, not(CNT0)), CTW_REAL, "CTW_REAL");
    regce_o(concat(MULT_OUT_HIMAG, MULT_OUT_LIMAG), and(CNT1, not(CNT0)), CTW_IMAG, "CTW_IMAG");
    regce_o(concat(MULT_OUT_HREAL, MULT_OUT_LREAL), and(CNT1, CNT0), DTW_REAL, "DTW_REAL");
    regce_o(concat(MULT_OUT_HIMAG, MULT_OUT_LIMAG), and(CNT1, CNT0), DTW_IMAG, "DTW_IMAG");

    // combine the twiddled inputs to find the FFT outputs
    add_o(add(ATW_REAL, BTW_REAL), add(CTW_REAL, DTW_REAL), OUTPUTE_REAL); // outpute_real = a_real + b_real + c_real + d_real
    add_o(add(ATW_IMAG, BTW_IMAG), add(CTW_IMAG, DTW_IMAG), OUTPUTE_IMAG); // outpute_imag = a_imag + b_imag + c_imag + d_imag
    add_o(sub(ATW_REAL, CTW_REAL), sub(BTW_IMAG, DTW_IMAG), OUTPUTF_REAL); // outputf_real = a_real + b_imag - c_real - d_imag
    add_o(sub(ATW_IMAG, BTW_REAL), sub(DTW_REAL, CTW_IMAG), OUTPUTF_IMAG); // outputf_imag = a_imag - b_real - c_imag + d_real
    add_o(sub(ATW_REAL, BTW_REAL), sub(CTW_REAL, DTW_REAL), OUTPUTG_REAL); // outputg_real = a_real - b_real + c_real - d_real
    add_o(sub(ATW_IMAG, BTW_IMAG), sub(CTW_IMAG, DTW_IMAG), OUTPUTG_IMAG); // outputg_imag = a_imag - b_imag + c_imag - d_imag
    add_o(sub(ATW_REAL, BTW_IMAG), sub(DTW_IMAG, CTW_REAL), OUTPUTH_REAL); // outputh_real = a_real - b_imag - c_real + d_imag
    add_o(sub(ATW_IMAG, CTW_IMAG), sub(BTW_REAL, DTW_REAL), OUTPUTH_IMAG); // outputh_imag = a_imag + b_real - c_imag - d_real
  }
}

A.3.2 tb_bf4.java

import byucc.jhdl.base.*;
import byucc.jhdl.Logic.*;
import byucc.jhdl.Xilinx.XC4000.*;
import java.io.*;

public class tb_bf4 extends Synchronous implements TestBench {
  Wire ENABLE;
  Wire INPUTA_REAL;
  Wire INPUTA_IMAG;
  Wire INPUTB_REAL;
  Wire INPUTB_IMAG;
  Wire INPUTC_REAL;
  Wire INPUTC_IMAG;
  Wire INPUTD_REAL;
  Wire INPUTD_IMAG;
  // These are the twiddle factors
  Wire TW0_REAL;
  Wire TW0_IMAG;
  Wire TW1_REAL;
  Wire TW1_IMAG;
  Wire TW2_REAL;
  Wire TW2_IMAG;
  Wire TW3_REAL;
  Wire TW3_IMAG;
  Wire OUTPUTE_REAL;
  Wire OUTPUTE_IMAG;
  Wire OUTPUTF_REAL;
  Wire OUTPUTF_IMAG;
  Wire OUTPUTG_REAL;
  Wire OUTPUTG_IMAG;
  Wire OUTPUTH_REAL;
  Wire OUTPUTH_IMAG;

  public tb_bf4 (Node parent) {
    super(parent);
    ENABLE = Logic.wire(this, 1, "ENABLE");
    INPUTA_REAL = Logic.wire(this, 8, "INPUTA_REAL");
    INPUTA_IMAG = Logic.wire(this, 8, "INPUTA_IMAG");
    INPUTB_REAL = Logic.wire(this, 8, "INPUTB_REAL");
    INPUTB_IMAG = Logic.wire(this, 8, "INPUTB_IMAG");
    INPUTC_REAL = Logic.wire(this, 8, "INPUTC_REAL");
    INPUTC_IMAG = Logic.wire(this, 8, "INPUTC_IMAG");
    INPUTD_REAL = Logic.wire(this, 8, "INPUTD_REAL");
    INPUTD_IMAG = Logic.wire(this, 8, "INPUTD_IMAG");
    // These are the twiddle factors
    TW0_REAL = Logic.wire(this, 8, "TW0_REAL");
    TW0_IMAG = Logic.wire(this, 8, "TW0_IMAG");
    TW1_REAL = Logic.wire(this, 8, "TW1_REAL");
    TW1_IMAG = Logic.wire(this, 8, "TW1_IMAG");
    TW2_REAL = Logic.wire(this, 8, "TW2_REAL");
    TW2_IMAG = Logic.wire(this, 8, "TW2_IMAG");
    TW3_REAL = Logic.wire(this, 8, "TW3_REAL");
    TW3_IMAG = Logic.wire(this, 8, "TW3_IMAG");
    OUTPUTE_REAL = Logic.wire(this, 16, "OUTPUTE_REAL");
    OUTPUTE_IMAG = Logic.wire(this, 16, "OUTPUTE_IMAG");
    OUTPUTF_REAL = Logic.wire(this, 16, "OUTPUTF_REAL");
    OUTPUTF_IMAG = Logic.wire(this, 16, "OUTPUTF_IMAG");
    OUTPUTG_REAL = Logic.wire(this, 16, "OUTPUTG_REAL");
    OUTPUTG_IMAG = Logic.wire(this, 16, "OUTPUTG_IMAG");
    OUTPUTH_REAL = Logic.wire(this, 16, "OUTPUTH_REAL");
    OUTPUTH_IMAG = Logic.wire(this, 16, "OUTPUTH_IMAG");
    new butterfly_4(this, ENABLE, INPUTA_REAL, INPUTA_IMAG, INPUTB_REAL, INPUTB_IMAG,
                    INPUTC_REAL, INPUTC_IMAG, INPUTD_REAL, INPUTD_IMAG,
                    TW0_REAL, TW0_IMAG, TW1_REAL, TW1_IMAG, TW2_REAL, TW2_IMAG, TW3_REAL, TW3_IMAG,
                    OUTPUTE_REAL, OUTPUTE_IMAG, OUTPUTF_REAL, OUTPUTF_IMAG,
                    OUTPUTG_REAL, OUTPUTG_IMAG, OUTPUTH_REAL, OUTPUTH_IMAG);
  }

  public void clock()
  {
    System.out.println("\nclock " + count + ":");
    System.out.println("ENABLE=" + ENABLE.get(this));
    System.out.println("INPUTA_REAL=" + INPUTA_REAL.get(this));
    System.out.println("INPUTA_IMAG=" + INPUTA_IMAG.get(this));
    System.out.println("INPUTB_REAL=" + INPUTB_REAL.get(this));
    System.out.println("INPUTB_IMAG=" + INPUTB_IMAG.get(this));
    System.out.println("INPUTC_REAL=" + INPUTC_REAL.get(this));
    System.out.println("INPUTC_IMAG=" + INPUTC_IMAG.get(this));
    System.out.println("INPUTD_REAL=" + INPUTD_REAL.get(this));
    System.out.println("INPUTD_IMAG=" + INPUTD_IMAG.get(this));
    System.out.println("TW0_REAL=" + TW0_REAL.get(this));
    System.out.println("TW0_IMAG=" + TW0_IMAG.get(this));
    System.out.println("TW1_REAL=" + TW1_REAL.get(this));
    System.out.println("TW1_IMAG=" + TW1_IMAG.get(this));
    System.out.println("TW2_REAL=" + TW2_REAL.get(this));
    System.out.println("TW2_IMAG=" + TW2_IMAG.get(this));
    System.out.println("TW3_REAL=" +
        TW3_REAL.get(this));
    System.out.println("TW3_IMAG=" + TW3_IMAG.get(this));
    System.out.println("OUTPUTE_REAL=" + OUTPUTE_REAL.get(this));
    System.out.println("OUTPUTE_IMAG=" + OUTPUTE_IMAG.get(this));
    System.out.println("OUTPUTF_REAL=" + OUTPUTF_REAL.get(this));
    System.out.println("OUTPUTF_IMAG=" + OUTPUTF_IMAG.get(this));
    System.out.println("OUTPUTG_REAL=" + OUTPUTG_REAL.get(this));
    System.out.println("OUTPUTG_IMAG=" + OUTPUTG_IMAG.get(this));
    System.out.println("OUTPUTH_REAL=" + OUTPUTH_REAL.get(this));
    System.out.println("OUTPUTH_IMAG=" + OUTPUTH_IMAG.get(this));
    ENABLE.put(this, ENABLE_DATA[count%10]);
    INPUTA_REAL.put(this, INPUTA_REAL_DATA[count%10]);
    INPUTA_IMAG.put(this, INPUTA_IMAG_DATA[count%10]);
    INPUTB_REAL.put(this, INPUTB_REAL_DATA[count%10]);
    INPUTB_IMAG.put(this, INPUTB_IMAG_DATA[count%10]);
    INPUTC_REAL.put(this, INPUTC_REAL_DATA[count%10]);
    INPUTC_IMAG.put(this, INPUTC_IMAG_DATA[count%10]);
    INPUTD_REAL.put(this, INPUTD_REAL_DATA[count%10]);
    INPUTD_IMAG.put(this, INPUTD_IMAG_DATA[count%10]);
    TW0_REAL.put(this, TW0_REAL_DATA[count%10]);
    TW0_IMAG.put(this, TW0_IMAG_DATA[count%10]);
    TW1_REAL.put(this, TW1_REAL_DATA[count%10]);
    TW1_IMAG.put(this, TW1_IMAG_DATA[count%10]);
    TW2_REAL.put(this, TW2_REAL_DATA[count%10]);
    TW2_IMAG.put(this, TW2_IMAG_DATA[count%10]);
    TW3_REAL.put(this, TW3_REAL_DATA[count%10]);
    TW3_IMAG.put(this, TW3_IMAG_DATA[count%10]);
    count++;
  }

  public void reset()
  {
    ENABLE.put(this, 0);
    INPUTA_REAL.put(this, 0);
    INPUTA_IMAG.put(this, 0);
    INPUTB_REAL.put(this, 0);
    INPUTB_IMAG.put(this, 0);
    INPUTC_REAL.put(this, 0);
    INPUTC_IMAG.put(this, 0);
    INPUTD_REAL.put(this, 0);
    INPUTD_IMAG.put(this, 0);
    TW0_REAL.put(this, 0);
    TW0_IMAG.put(this, 0);
    TW1_REAL.put(this, 0);
    TW1_IMAG.put(this, 0);
    TW2_REAL.put(this, 0);
    TW2_IMAG.put(this, 0);
    TW3_REAL.put(this, 0);
    TW3_IMAG.put(this, 0);
  }

  // Here is the data that we will pump in
  static int ENABLE_DATA[] = { 0,0,0,1,1,1,1,0,0,0 };
  static int INPUTA_REAL_DATA[] = { 2,2,2,2,2,2,2,2,2,2 };
  static int
             INPUTA_IMAG_DATA[] = { 1,1,1,1,1,1,1,1,1,1 };
  static int INPUTB_REAL_DATA[] = { 2,2,2,2,2,2,2,2,2,2 };
  static int INPUTB_IMAG_DATA[] = { 3,3,3,3,3,3,3,3,3,3 };
  static int INPUTC_REAL_DATA[] = { 4,4,4,4,4,4,4,4,4,4 };
  static int INPUTC_IMAG_DATA[] = { 5,5,5,5,5,5,5,5,5,5 };
  static int INPUTD_REAL_DATA[] = { 6,6,6,6,6,6,6,6,6,6 };
  static int INPUTD_IMAG_DATA[] = { 7,7,7,7,7,7,7,7,7,7 };
  static int TW0_REAL_DATA[] = { 8,8,8,8,8,8,8,8,8,8 };
  static int TW0_IMAG_DATA[] = { 9,9,9,9,9,9,9,9,9,9 };
  static int TW1_REAL_DATA[] = { 10,10,10,10,10,10,10,10,10,10 };
  static int TW1_IMAG_DATA[] = { 11,11,11,11,11,11,11,11,11,11 };
  static int TW2_REAL_DATA[] = { 12,12,12,12,12,12,12,12,12,12 };
  static int TW2_IMAG_DATA[] = { 13,13,13,13,13,13,13,13,13,13 };
  static int TW3_REAL_DATA[] = { 14,14,14,14,14,14,14,14,14,14 };
  static int TW3_IMAG_DATA[] = { 15,15,15,15,15,15,15,15,15,15 };
  int count=0;

  public static void main(String argv[]) {
    HWSystem hw = new HWSystem();
    tb_bf4 test_bench = new tb_bf4(hw);
    hw.cycle(20);
  }
}

A.3.3 bfly4_mmp.java

// This class performs an FFT Radix 4 butterfly operation.
// It uses a multiplexed memory port (mmp) to simplify the operation.

import byucc.jhdl.base.*;
import byucc.jhdl.Logic.*;
import byucc.jhdl.Xilinx.*;
import byucc.jhdl.Xilinx.XC4000.*;
import byucc.jhdl.Xilinx.XC4000.carryLogic.*;

public class bfly4_mmp extends Logic {

  // interface uses Janus operation specification
  public static CellInterface[] cell_interface = {
    in("data_in", 32),
    out("data_out", 32),
    in("addr", 20),
    in("write_sel", 1),
    in("strobe", 1),
    in("enable", 1),
    out("done", 1)};

  public static final String cellname = "BUTTERFLYORB_mmp";

  public bfly4_mmp(Node parent, Wire data_in, Wire data_out, Wire addr,
                   Wire write_sel, Wire strobe, Wire enable, Wire done) {
    super(parent);
    connect("data_in", data_in);     // data from memory port
    connect("data_out", data_out);   // data to memory port
    connect("addr", addr);           // memory address
    connect("write_sel", write_sel); // active low
    connect("strobe",
            strobe);                 // memory strobe (keep low?)
    connect("enable", enable);       // goes high when board is ready to start
    connect("done", done);           // set when stage done

    Wire CLK_ENABLE = wire(this, 1, "CLK_ENABLE");
    Wire MEM_DIR = wire(this, 1, "MEM_DIR");
    Wire SOURCE0 = wire(this, 32, "SOURCE0"); // the sources are connected to registers.
    Wire SOURCE1 = wire(this, 32, "SOURCE1"); // the registers load the data when it is their cycle
    Wire SOURCE2 = wire(this, 32, "SOURCE2");
    Wire SOURCE3 = wire(this, 32, "SOURCE3");
    Wire SOURCE4 = wire(this, 32, "SOURCE4");
    Wire SOURCE5 = wire(this, 32, "SOURCE5");
    Wire SOURCE6 = wire(this, 32, "SOURCE6");
    Wire SOURCE7 = wire(this, 32, "SOURCE7");
    Wire SINK0 = wire(this, 32, "SINK0");
    Wire SINK1 = wire(this, 32, "SINK1");
    Wire SINK2 = wire(this, 32, "SINK2");
    Wire SINK3 = wire(this, 32, "SINK3");
    Wire SINK4 = wire(this, 32, "SINK4");
    Wire SINK5 = wire(this, 32, "SINK5");
    Wire SINK6 = wire(this, 32, "SINK6");
    Wire SINK7 = wire(this, 32, "SINK7");
    Wire ADDRESS0 = wire(this, 20, "ADDRESS0");
    Wire ADDRESS1 = wire(this, 20, "ADDRESS1");
    Wire ADDRESS2 = wire(this, 20, "ADDRESS2");
    Wire ADDRESS3 = wire(this, 20, "ADDRESS3");
    Wire ADDRESS4 = wire(this, 20, "ADDRESS4");
    Wire ADDRESS5 = wire(this, 20, "ADDRESS5");
    Wire ADDRESS6 = wire(this, 20, "ADDRESS6");
    Wire ADDRESS7 = wire(this, 20, "ADDRESS7");

    // *********************************
    // multiplexed memory port
    new struct_m(this, enable, CLK_ENABLE, write_sel, addr, data_in, data_out,
                 MEM_DIR, MEM_DIR, MEM_DIR, MEM_DIR, MEM_DIR, MEM_DIR, MEM_DIR, MEM_DIR,
                 ADDRESS0, ADDRESS1, ADDRESS2, ADDRESS3, ADDRESS4, ADDRESS5, ADDRESS6, ADDRESS7,
                 SINK0, SINK1, SINK2, SINK3, SINK4, SINK5, SINK6, SINK7,
                 SOURCE0, SOURCE1, SOURCE2, SOURCE3, SOURCE4, SOURCE5, SOURCE6, SOURCE7);

    // *****MEMORY DIRECTIONS*********
    // change the code below to get the proper memory direction
    // '0' is output (write), '1' is input (read)
    regpe_o(not(MEM_DIR), CLK_ENABLE, MEM_DIR);

    // *** ADDRESSES ****
    // change the code below to produce the addressing you require
    buf_o(mux(constant(20,4), constant(20, 0), MEM_DIR), ADDRESS0);
    buf_o(mux(constant(20,5), constant(20, 1), MEM_DIR), ADDRESS1);
    buf_o(mux(constant(20,6), constant(20, 2), MEM_DIR), ADDRESS2);
    buf_o(mux(constant(20,7), constant(20, 3), MEM_DIR), ADDRESS3);
    buf_o(constant(20,8), ADDRESS4);
    buf_o(constant(20,8), ADDRESS5);
    buf_o(constant(20,8), ADDRESS6);
    buf_o(constant(20,8), ADDRESS7);

    // ******** Counter **********
    // This counter counts the number of MMP virtual clock cycles.
    // It can be used to set the done signal when the operation is complete.
    Wire NEXT_COUNT = wire(this, 4, "NEXT_COUNT");
    Wire COUNT = wire(this, 4, "COUNT");
    add_o(constant(4,1), COUNT, NEXT_COUNT, "counter_adder");
    regce_o(NEXT_COUNT, CLK_ENABLE, COUNT, "counter_holder");
    // In this case the done signal is set after two virtual clock cycles.
    buf_o(COUNT.getWire(1), done);
    // ****************************

    // Butterfly Structure
    // For this example, inputs are 8 bits, packed into 32-bit words
    // These are the actual data inputs to the FFT
    Wire INPUTA_REAL = wire(this, 8, "INPUTA_REAL"); // Port 0, bits 15 downto 8
    Wire INPUTA_IMAG = wire(this, 8, "INPUTA_IMAG"); // Port 0, bits 7 downto 0
    Wire INPUTB_REAL = wire(this, 8, "INPUTB_REAL"); // Port 0, bits 31 downto 24
    Wire INPUTB_IMAG = wire(this, 8, "INPUTB_IMAG"); // Port 0, bits 23 downto 16
    Wire INPUTC_REAL = wire(this, 8, "INPUTC_REAL"); // Port 1, bits 15 downto 8
    Wire INPUTC_IMAG = wire(this, 8, "INPUTC_IMAG"); // Port 1, bits 7 downto 0
    Wire INPUTD_REAL = wire(this, 8, "INPUTD_REAL"); // Port 1, bits 31 downto 24
    Wire INPUTD_IMAG = wire(this, 8, "INPUTD_IMAG"); // Port 1, bits 23 downto 16
    // These are the twiddle factors
    Wire TW0_REAL = wire(this, 8, "TW0_REAL"); // Port 2, bits 15 downto 8
    Wire TW0_IMAG = wire(this, 8, "TW0_IMAG"); // Port 2, bits 7 downto 0
    Wire TW1_REAL = wire(this, 8, "TW1_REAL"); // Port 2, bits 31 downto 24
    Wire TW1_IMAG = wire(this, 8, "TW1_IMAG"); // Port 2, bits 23 downto 16
    Wire TW2_REAL
= wire(this, 8, "TW2_REAL"); // Port 3, bits 15 downto 8 Wire TW2_IMAG = wire(this, 8, "TW2_IMAG"); // Port 3, bits 7 downto 0 Wire TW3_REAL = wire(this, 8, "TW3_REAL"); // Port 3, bits 31 downto 24 Wire TW3_IMAG = wire(this, 8, "TW3_IMAG"); // Port 3, bits 23 downto 16 // These are the outputs of the Radix 4 butterfly Wire OUTPUTE_REAL = wire(this, 16, "OUTPUTE_REAL"); // Port 4, bits 31 downto 16 Wire OUTPUTE_IMAG = wire(this, 16, "OUTPUTE_IMAG"); // Port 4, bits 15 downto 0 Wire OUTPUTF_REAL = wire(this, 16, "OUTPUTF_REAL"); // Port 5, bits 31 downto 16 Wire OUTPUTF_IMAG = wire(this, 16, "OUTPUTF_IMAG"); // Port 5, bits 15 downto 0 Wire OUTPUTG_REAL = wire(this, 16, "OUTPUTG_REAL"); // Port 6, bits 31 downto 16 Wire OUTPUTG_IMAG = wire(this, 16, "OUTPUTG_IMAG"); // Port 6, bits 15 downto 0 Wire OUTPUTH_REAL = wire(this, 16, "OUTPUTH_REAL"); // Port 7, bits 31 downto 16 Wire OUTPUTH_IMAG = wire(this, 16, "OUTPUTH_IMAG"); // Port 7, bits 15 downto 0 buf_o(SOURCE0.range(15,8), INPUTA_REAL); buf_o(SOURCE0.range(7,0), INPUTA_IMAG); buf_o(SOURCE0.range(31,24), INPUTB_REAL); buf_o(SOURCE0.range(23,16), INPUTB_IMAG); buf_o(SOURCE1.range(15,8), INPUTC_REAL); buf_o(SOURCE1.range(7,0), INPUTC_IMAG); buf_o(SOURCE1.range(31,24), INPUTD_REAL); buf_o(SOURCE1.range(23,16), INPUTD_IMAG); buf_o(SOURCE2.range(15,8), TW0_REAL); buf_o(SOURCE2.range(7,0), TW0_IMAG); buf_o(SOURCE2.range(31,24), TW1_REAL); buf_o(SOURCE2.range(23,16), TW1_IMAG); buf_o(SOURCE3.range(15,8), TW2_REAL); buf_o(SOURCE3.range(7,0), TW2_IMAG); buf_o(SOURCE3.range(31,24), TW3_REAL); buf_o(SOURCE3.range(23,16), TW3_IMAG); buf_o(concat(OUTPUTE_REAL, OUTPUTE_IMAG), SINK0); buf_o(concat(OUTPUTF_REAL, OUTPUTF_IMAG), SINK1); buf_o(concat(OUTPUTG_REAL, OUTPUTG_IMAG), SINK2); buf_o(concat(OUTPUTH_REAL, OUTPUTH_IMAG), SINK3); new butterfly_4(this, enable, INPUTA_REAL, INPUTA_IMAG, INPUTB_REAL, INPUTB_IMAG, INPUTC_REAL, INPUTC_IMAG, INPUTD_REAL, INPUTD_IMAG, TW0_REAL, TW0_IMAG, TW1_REAL, TW1_IMAG, TW2_REAL, TW2_IMAG, 
TW3_REAL, TW3_IMAG, OUTPUTE_REAL, OUTPUTE_IMAG, OUTPUTF_REAL, OUTPUTF_IMAG, OUTPUTG_REAL, OUTPUTG_IMAG, OUTPUTH_REAL, OUTPUTH_IMAG); } } A.3.4 complex_mult.java // James Atwell - complex multiplication // Takes two complex inputs of any width and returns a complex result // (INPUTS must be the same width) // The outputs are given as high and low words the same size as the inputs import byucc.jhdl.Logic.*; import byucc.jhdl.base.*; import byucc.jhdl.Xilinx.*; import byucc.jhdl.Xilinx.XC4000.*; import byucc.jhdl.Xilinx.XC4000.carryLogic.*; import byucc.jhdl.modgen.arrayMult; 141 public class complex_mult extends Logic { public static CellInterface[] cell_interface = { param("X_WIDTH", INTEGER), in("X_REAL", "X_WIDTH"), in("X_IMAG", "X_WIDTH"), in("Y_REAL", "X_WIDTH"), in("Y_IMAG", "X_WIDTH"), out("OUTR_HIGH", "X_WIDTH"), out("OUTR_LOW", "X_WIDTH"), out("OUTI_HIGH", "X_WIDTH"), out("OUTI_LOW", "X_WIDTH")}; public static final String cellname = "complex_mult_unit"; public complex_mult(Node parent, Wire X_REAL, Wire X_IMAG, Wire Y_REAL, Wire Y_IMAG, Wire OUTR_HIGH, Wire OUTR_LOW, Wire OUTI_HIGH, Wire OUTI_LOW) { super(parent); if( X_REAL.getWidth() != Y_REAL.getWidth() ) { throw new BuildException(cellname + ": I/O widths must be the same"); } int WIDTH = X_REAL.getWidth(); bind("X_WIDTH", WIDTH); connect("X_REAL", X_REAL); connect("X_IMAG", X_IMAG); connect("Y_REAL", Y_REAL); connect("Y_IMAG", Y_IMAG); connect("OUTR_HIGH", OUTR_HIGH); connect("OUTR_LOW", OUTR_LOW); connect("OUTI_HIGH", OUTI_HIGH); connect("OUTI_LOW", OUTI_LOW); // For complex numbers x and y where x = a + jb and y = c + jd, // This section finds the real part of the result = ac - bd. 
    Wire TEMP_0 = wire(this, 2*WIDTH, "TEMP_0");
    Wire TEMP_1 = wire(this, 2*WIDTH, "TEMP_1");
    Wire TEMP_4 = wire(this, 2*WIDTH-1, "TEMP_4");
    Wire OUTR = wire(this, 2*WIDTH, "OUTR");
    new arrayMult(this, X_REAL, Y_REAL, constant(this, 1, 1), TEMP_0, 0 ,0); // a*c
    new arrayMult(this, X_IMAG, Y_IMAG, constant(this, 1, 1), TEMP_1, 0 ,0); // b*d
    sub_o(TEMP_0, TEMP_1, OUTR, "OUTR_RES"); // OUTR = Xr*Yr - Xi*Yi

    // This section finds the imaginary part of the result = ad + bc
    Wire TEMP_2 = wire(this, 2*WIDTH, "TEMP_2");
    Wire TEMP_3 = wire(this, 2*WIDTH, "TEMP_3");
    Wire OUTI = wire(this, 2*WIDTH, "OUTI");
    new arrayMult(this, X_REAL, Y_IMAG, constant(this, 1, 1), TEMP_2, 0 ,0); // a*d
    new arrayMult(this, X_IMAG, Y_REAL, constant(this, 1, 1), TEMP_3, 0 ,0); // b*c
    add_o(TEMP_2, TEMP_3, OUTI, "OUTI_RES"); // OUTI = Xr*Yi + Xi*Yr

    buf_o(range(OUTR, (2*WIDTH)-1, WIDTH), OUTR_HIGH);
    buf_o(range(OUTR, WIDTH-1, 0), OUTR_LOW);
    buf_o(range(OUTI, (2*WIDTH)-1, WIDTH), OUTI_HIGH);
    buf_o(range(OUTI, WIDTH-1, 0), OUTI_LOW);
  }
}

A.3.5 tb_cm.java

import byucc.jhdl.base.*;
import byucc.jhdl.Logic.*;
import byucc.jhdl.Xilinx.XC4000.*;
import java.io.*;

public class tb_cm extends Synchronous implements TestBench
{
  Wire XMULT_IN_REAL;
  Wire XMULT_IN_IMAG;
  Wire YMULT_IN_REAL;
  Wire YMULT_IN_IMAG;
  Wire MULT_OUT_REAL;
  Wire MULT_OUT_IMAG;
  Wire fake_rhigh;
  Wire fake_ihigh;

  public tb_cm (Node parent) {
    super(parent);
    XMULT_IN_REAL = Logic.wire(this, 32, "XMULT_IN_REAL");
    XMULT_IN_IMAG = Logic.wire(this, 32, "XMULT_IN_IMAG");
    YMULT_IN_REAL = Logic.wire(this, 32, "YMULT_IN_REAL");
    YMULT_IN_IMAG = Logic.wire(this, 32, "YMULT_IN_IMAG");
    MULT_OUT_REAL = Logic.wire(this, 32, "MULT_OUT_REAL");
    MULT_OUT_IMAG = Logic.wire(this, 32, "MULT_OUT_IMAG");
    fake_rhigh = Logic.wire(this, 32, "fake_rhigh");
    fake_ihigh = Logic.wire(this, 32, "fake_ihigh");
    new complex_mult(this, XMULT_IN_REAL, XMULT_IN_IMAG, YMULT_IN_REAL, YMULT_IN_IMAG,
                     fake_rhigh, MULT_OUT_REAL, fake_ihigh, MULT_OUT_IMAG);
    // Just use lower 32 bits of result
  }

  public void clock() {
    System.out.println("\nclock " + count + ":");
    System.out.println("XMULT_IN_REAL=" + XMULT_IN_REAL.get(this));
    System.out.println("XMULT_IN_IMAG=" + XMULT_IN_IMAG.get(this));
    System.out.println("YMULT_IN_REAL=" + YMULT_IN_REAL.get(this));
    System.out.println("YMULT_IN_IMAG=" + YMULT_IN_IMAG.get(this));
    System.out.println("fake_rhigh=" + fake_rhigh.get(this));
    System.out.println("MULT_OUT_REAL=" + MULT_OUT_REAL.get(this));
    System.out.println("fake_ihigh=" + fake_ihigh.get(this));
    System.out.println("MULT_OUT_IMAG=" + MULT_OUT_IMAG.get(this));
    XMULT_IN_REAL.put(this, XMULT_IN_REAL_DATA[count%10]);
    XMULT_IN_IMAG.put(this, XMULT_IN_IMAG_DATA[count%10]);
    YMULT_IN_REAL.put(this, YMULT_IN_REAL_DATA[count%10]);
    YMULT_IN_IMAG.put(this, YMULT_IN_IMAG_DATA[count%10]);
    count++;
  }

  public void reset() {
    XMULT_IN_REAL.put(this, 0);
    XMULT_IN_IMAG.put(this, 0);
    YMULT_IN_REAL.put(this, 0);
    YMULT_IN_IMAG.put(this, 0);
  }

  // Here is the data that we will pump in
  static int XMULT_IN_REAL_DATA[] = { 2,2,4,6,7,5,2,3,4,5 };
  static int XMULT_IN_IMAG_DATA[] = { 1,3,5,7,6,4,2,3,4,5 };
  static int YMULT_IN_REAL_DATA[] = { 8,10,12,14,15,13,2,3,4,5 };
  static int YMULT_IN_IMAG_DATA[] = { 9,11,13,15,14,12,2,3,4,5 };
  int count=0;

  public static void main(String argv[]) {
    HWSystem hw = new HWSystem();
    tb_cm test_bench = new tb_cm(hw);
    hw.cycle(20);
  }
}

A.3.6 mux_4_1.java

// James Atwell - 4 to 1 multiplexor
import byucc.jhdl.Logic.*;
import byucc.jhdl.base.*;
import byucc.jhdl.Xilinx.*;
import byucc.jhdl.Xilinx.XC4000.*;
import byucc.jhdl.Xilinx.XC4000.carryLogic.*;

public class mux_4_1 extends Logic {
  public static CellInterface[] cell_interface = {
    param("DATA_WIDTH", INTEGER),
    in("DATA_IN0", "DATA_WIDTH"),
    in("DATA_IN1", "DATA_WIDTH"),
    in("DATA_IN2", "DATA_WIDTH"),
    in("DATA_IN3", "DATA_WIDTH"),
    in("SELECT", 2),
    out("DATA_OUT", "DATA_WIDTH")};

  public static final String cellname = "mux_4_1thisisfun";

  public mux_4_1(Node parent, Wire DATA_IN0, Wire DATA_IN1, Wire DATA_IN2, Wire DATA_IN3,
                 Wire SELECT, Wire DATA_OUT) {
    super(parent);
    int WIDTH1 = DATA_IN0.getWidth();
    bind("DATA_WIDTH", WIDTH1);
    connect("DATA_IN0", DATA_IN0);
    connect("DATA_IN1", DATA_IN1);
    connect("DATA_IN2", DATA_IN2);
    connect("DATA_IN3", DATA_IN3);
    connect("SELECT", SELECT);
    connect("DATA_OUT", DATA_OUT);
    Wire INTERNAL_0 = wire(this, WIDTH1, "INTERNAL_0");
    Wire INTERNAL_1 = wire(this, WIDTH1, "INTERNAL_1");
    //first level
    mux_o( DATA_IN0, DATA_IN1, SELECT.getWire(0), INTERNAL_0);
    mux_o( DATA_IN2, DATA_IN3, SELECT.getWire(0), INTERNAL_1);
    //second level
    mux_o( INTERNAL_0, INTERNAL_1, SELECT.getWire(1), DATA_OUT);
  }
}

A.3.7 generic_pe_bfly4.java

// James Atwell
// 9/21/99
// generic_pe_bfly4.java
// This file can be compiled into a usable pe
// The only change that needs to be made to use a Janus style
// operation is the name of the operation in the last line.
import byucc.jhdl.platforms.wildforce.*;
import byucc.jhdl.base.*;
import byucc.jhdl.Xilinx.XC4000.*;

public class generic_pe_bfly4 extends pelca {
  public static CellInterface[] cell_interface = {
    out("MEM_ADDRESS", 20),
    out("DATA_TO_MEM", 32),
    in("DATA_FROM_MEM", 32),
    in("grant", 1),
    out("request", 1),
    out("strobe", 1),
    out("MEM_WRITE_SEL", 1),
    in("reset", 1),
    out("intreq", 1),
    in("intack", 1) };

  public generic_pe_bfly4(pe parent) {
    super(parent);
    Wire MEM_ADDRESS = port("MEM_ADDRESS",MemAddr(19,0));       // Address lines
    Wire DATA_TO_MEM = port("DATA_TO_MEM",MemDataOut());        // Memory Write lines
    Wire DATA_FROM_MEM = port("DATA_FROM_MEM",MemDataIn());     // Memory Read lines
    Wire grant = port("grant",MemBusGrant_n());                 // Grants Access to Memory
    Wire request = port("request",MemBusReq_n());               // Used to Request Memory Access
    Wire strobe = port("strobe",MemStrobe_n());
    Wire MEM_WRITE_SEL = port("MEM_WRITE_SEL",MemWriteSel_n()); // Read/Write line
    Wire reset = port("reset",Reset());                         // PE Global Reset
    Wire intreq = port("intreq", InterruptReq_n());
    Wire intack = port("intack", InterruptAck_n());
    Wire enable = wire(this,1,"enable");     // goes high when ok to start accessing memory
    Wire done = wire(this,1,"done");
    Wire int_done = wire(this,1,"int_done"); // user must set int_done when done processing to return control to host
    Wire int_strobe = wire(this,1,"int_strobe");
    or_o(grant,int_strobe,strobe);
    buf_o(int_done, done);
    // FSM below controls startup and shutdown conditions for the PE
    BusGrantFSM fsm = new BusGrantFSM(this,intack,grant,done,request,enable,intreq);
    // Change name of operation below
    new bfly4_mmp(this, DATA_FROM_MEM, DATA_TO_MEM, MEM_ADDRESS, MEM_WRITE_SEL, strobe, enable, int_done);
  }
}

A.4 16 Point FFT Source Files

A.4.1 fft_16pt.java

import byucc.jhdl.base.*;
import byucc.jhdl.Logic.*;
import byucc.jhdl.Xilinx.*;
import byucc.jhdl.Xilinx.XC4000.*;
import byucc.jhdl.Xilinx.XC4000.carryLogic.*;

public class fft_16pt extends Logic {
  public static CellInterface[] cell_interface = {
    in("x_0R", 8), in("x_0I", 8), in("x_1R", 8), in("x_1I", 8),
    in("x_2R", 8), in("x_2I", 8), in("x_3R", 8), in("x_3I", 8),
    in("x_4R", 8), in("x_4I", 8), in("x_5R", 8), in("x_5I", 8),
    in("x_6R", 8), in("x_6I", 8), in("x_7R", 8), in("x_7I", 8),
    in("x_8R", 8), in("x_8I", 8), in("x_9R", 8), in("x_9I", 8),
    in("x_10R", 8), in("x_10I", 8), in("x_11R", 8), in("x_11I", 8),
    in("x_12R", 8), in("x_12I", 8), in("x_13R", 8), in("x_13I", 8),
    in("x_14R", 8), in("x_14I", 8), in("x_15R", 8), in("x_15I", 8),
    out("X_0R", 16), out("X_0I", 16), out("X_1R", 16), out("X_1I", 16),
    out("X_2R", 16), out("X_2I", 16), out("X_3R", 16), out("X_3I", 16),
    out("X_4R", 16), out("X_4I", 16), out("X_5R", 16), out("X_5I", 16),
    out("X_6R", 16), out("X_6I", 16), out("X_7R", 16), out("X_7I", 16),
    out("X_8R", 16), out("X_8I", 16), out("X_9R", 16), out("X_9I", 16),
    out("X_10R", 16), out("X_10I", 16), out("X_11R", 16), out("X_11I", 16),
    out("X_12R", 16), out("X_12I", 16), out("X_13R", 16), out("X_13I", 16),
    out("X_14R", 16), out("X_14I", 16), out("X_15R", 16), out("X_15I", 16)};

  public static final String cellname = "16_point_FFT";

  public fft_16pt(Node parent,
      Wire x_0R, Wire x_0I, Wire x_1R, Wire x_1I, Wire x_2R, Wire x_2I, Wire x_3R, Wire x_3I,
      Wire x_4R, Wire x_4I, Wire x_5R, Wire x_5I, Wire x_6R, Wire x_6I, Wire x_7R, Wire x_7I,
      Wire x_8R, Wire x_8I, Wire x_9R, Wire x_9I, Wire x_10R, Wire x_10I, Wire x_11R, Wire x_11I,
      Wire x_12R, Wire x_12I, Wire x_13R, Wire x_13I, Wire x_14R, Wire x_14I, Wire x_15R, Wire x_15I,
      Wire X_0R, Wire X_0I, Wire X_1R, Wire X_1I, Wire X_2R, Wire X_2I, Wire X_3R, Wire X_3I,
      Wire X_4R, Wire X_4I, Wire X_5R, Wire X_5I, Wire X_6R, Wire X_6I, Wire X_7R, Wire X_7I,
      Wire X_8R, Wire X_8I, Wire X_9R, Wire X_9I, Wire X_10R, Wire X_10I, Wire X_11R, Wire X_11I,
      Wire X_12R, Wire X_12I, Wire X_13R, Wire X_13I, Wire X_14R, Wire X_14I, Wire X_15R, Wire X_15I) {
    super(parent);
    connect("x_0R", x_0R); connect("x_0I", x_0I);
    connect("x_1R", x_1R); connect("x_1I", x_1I);
    connect("x_2R", x_2R); connect("x_2I", x_2I);
    connect("x_3R", x_3R); connect("x_3I", x_3I);
    connect("x_4R", x_4R); connect("x_4I", x_4I);
    connect("x_5R", x_5R); connect("x_5I", x_5I);
    connect("x_6R", x_6R); connect("x_6I", x_6I);
    connect("x_7R", x_7R); connect("x_7I", x_7I);
    connect("x_8R", x_8R); connect("x_8I", x_8I);
    connect("x_9R", x_9R); connect("x_9I", x_9I);
    connect("x_10R", x_10R); connect("x_10I", x_10I);
    connect("x_11R", x_11R); connect("x_11I", x_11I);
    connect("x_12R", x_12R); connect("x_12I", x_12I);
    connect("x_13R", x_13R); connect("x_13I", x_13I);
    connect("x_14R", x_14R); connect("x_14I", x_14I);
    connect("x_15R", x_15R); connect("x_15I", x_15I);
    connect("X_0R", X_0R); connect("X_0I", X_0I);
    connect("X_1R", X_1R); connect("X_1I", X_1I);
    connect("X_2R", X_2R); connect("X_2I", X_2I);
    connect("X_3R", X_3R); connect("X_3I", X_3I);
    connect("X_4R", X_4R); connect("X_4I", X_4I);
    con...
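The control section of the butterfly operation at the start of this section (the operation that generic_pe_bfly4 instantiates as bfly4_mmp) toggles MEM_DIR on every virtual clock, selects per-port addresses from MEM_DIR, and raises done from bit 1 of a 4-bit counter. That schedule can be checked in software before synthesis. The following is a minimal plain-Java sketch, not JHDL code; the class name MmpScheduleModel is hypothetical, and the model assumes that MEM_DIR and COUNT reset to 0 and that JHDL's mux(a, b, sel) passes b when sel is 1.

```java
// Hypothetical plain-Java model of the bfly4_mmp control logic (no JHDL required).
// Assumptions: MEM_DIR and COUNT reset to 0, and mux(a, b, sel) passes b when sel is 1.
public class MmpScheduleModel {

    // Address driven onto MMP port 'port' (mirrors the buf_o/mux lines in the listing).
    static int portAddress(int port, boolean memDirRead) {
        if (port >= 4) {
            return 8;                          // ports 4..7 are tied to constant 8
        }
        return memDirRead ? port : port + 4;   // read from 0..3, write to 4..7
    }

    // done follows bit 1 of the 4-bit counter (buf_o(COUNT.getWire(1), done)).
    static boolean done(int count) {
        return ((count >> 1) & 1) == 1;
    }

    public static void main(String[] args) {
        boolean memDirRead = false; // assumed reset value of the MEM_DIR register
        int count = 0;              // assumed reset value of COUNT
        for (int cycle = 0; cycle < 4; cycle++) {
            System.out.printf("virtual cycle %d: MEM_DIR=%d ADDRESS0=%d COUNT=%d done=%b%n",
                    cycle, memDirRead ? 1 : 0, portAddress(0, memDirRead), count, done(count));
            memDirRead = !memDirRead;   // regpe_o(not(MEM_DIR), CLK_ENABLE, MEM_DIR) toggles
            count = (count + 1) & 0xF;  // 4-bit add_o + regce_o counter wraps at 16
        }
    }
}
```

Under these assumptions, operands are read from addresses 0 to 3 when MEM_DIR is 1 ('1' is input), results are written to addresses 4 to 7 when MEM_DIR is 0, and done first goes high when COUNT reaches 2, matching the listing's comment that done is set after two virtual clock cycles.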
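The tb_cm test bench prints the low words of complex_mult for each of its ten test vectors but does not check them against expected values. A small software reference model makes those values explicit. The sketch below is plain Java (ComplexMultModel is a hypothetical name) and assumes that arrayMult produces the full unsigned 2*WIDTH-bit product and that sub_o and add_o wrap modulo 2^64 at WIDTH = 32; if arrayMult were configured as a signed multiplier, results for large operands would differ.

```java
// Hypothetical reference model for complex_mult as exercised by tb_cm (plain Java, no JHDL).
// Assumes unsigned 32x32 -> 64-bit multipliers and a 64-bit subtract/add that wrap modulo 2^64.
public class ComplexMultModel {
    // Test vectors copied from tb_cm.
    static final int[] X_REAL = {2, 2, 4, 6, 7, 5, 2, 3, 4, 5};
    static final int[] X_IMAG = {1, 3, 5, 7, 6, 4, 2, 3, 4, 5};
    static final int[] Y_REAL = {8, 10, 12, 14, 15, 13, 2, 3, 4, 5};
    static final int[] Y_IMAG = {9, 11, 13, 15, 14, 12, 2, 3, 4, 5};

    /** Returns {OUTR_HIGH, OUTR_LOW, OUTI_HIGH, OUTI_LOW} for x = a + jb, y = c + jd. */
    static int[] complexMult(int a, int b, int c, int d) {
        long ua = Integer.toUnsignedLong(a), ub = Integer.toUnsignedLong(b);
        long uc = Integer.toUnsignedLong(c), ud = Integer.toUnsignedLong(d);
        long outR = ua * uc - ub * ud; // ac - bd; long arithmetic wraps modulo 2^64
        long outI = ua * ud + ub * uc; // ad + bc; long arithmetic wraps modulo 2^64
        return new int[] {
            (int) (outR >>> 32), (int) outR,
            (int) (outI >>> 32), (int) outI
        };
    }

    public static void main(String[] args) {
        for (int i = 0; i < X_REAL.length; i++) {
            int[] r = complexMult(X_REAL[i], X_IMAG[i], Y_REAL[i], Y_IMAG[i]);
            // tb_cm wires only the low words to MULT_OUT_REAL and MULT_OUT_IMAG.
            System.out.printf("vector %d: MULT_OUT_REAL=%d MULT_OUT_IMAG=%d (high words %d, %d)%n",
                    i, r[1], r[3], r[0], r[2]);
        }
    }
}
```

For the first vector (x = 2 + j1, y = 8 + j9) the model gives a real part of 16 - 9 = 7 and an imaginary part of 18 + 8 = 26. For the second (x = 2 + j3, y = 10 + j11) the real part is 20 - 33 = -13, so under the wrapping assumption the low word is 0xFFFFFFF3 and the discarded fake_rhigh word is all ones.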