Ashok Kulkarni, S2C 10/12/2011 05:04 PM EDT
Introduction Silicon process technologies have entered into the sub-30 nm realm, making it feasible to realize more than several billion transistors on a single chip. This, in turn, has made it possible to implement multiple extremely large and complex functions into a single SoC/ASIC/ASSP (hereafter referred to as SoC that applies to ASIC and ASSP) and conceive a myriad of applications catering to both consumer space and commercial usage. Furthermore, many consumer gadgets have a short life-cycle (some as low as 3 to 6 months) and are often differentiated by the software content that runs on them.
There are two primary requirements for a successful launch of these devices. First, the design must undergo thorough and comprehensive verification prior to tape-out. Second, an affordable hardware platform representing the SoC prototype must be available for software development before the first batch of silicon arrives.
Silicon re-spin is not an option anymore as the mask cost alone runs into tens of millions of dollars and the lost market opportunity due to delayed product introduction may mean risking the business viability altogether.
This article first looks into various available solutions that can be used for functional verification and software development. It briefly compares various solutions and summarizes the pros and cons in using these solutions. The article focuses on FPGA-based prototyping and discusses various factors that must be taken into account to successfully implement a prototyping strategy.
Solutions for functional verification and software development There are broadly two categories of verification options available: one based on software simulation techniques and the second based on hardware platforms; the latter option is often referred to as hardware-assisted verification. The hardware platforms can be categorized as hardware acceleration, emulation, or FPGA-based prototyping. When compared in terms of ease of use, performance, and cost, each approach has its own pros and cons.
Simulation: RTL simulators are easy to set up and use, and they are relatively inexpensive. They have both high controllability and observability and hence offer 100% signal visibility into the design. The disadvantage is that the simulation performance is very low, typically in the range of 20 Hz to 30 Hz (slightly higher speeds are possible in regression mode in some simulators). Modern designs are tens of millions of ASIC gates in size and require a very large number of test suites to be run. Even when the simulations run on a high-end computing machine, the effective performance is very low. Further, at this speed it is impractical to develop even low-level diagnostic software. Also, it is not feasible to debug and verify the design in the context of the real hardware interface. Hence RTL simulators are often used only for block-level design development.
Hardware Acceleration: With hardware acceleration, the design being verified is mapped into hardware (an array of FPGAs or custom processors) to accelerate the design performance. The real bottleneck in performance is that the test bench resides in the software simulator. The performance is directly proportional to the ratio of TB (test bench) activity and the design under test (DUT) activity. To speed up performance, the TB may be required to communicate at the application level, instead of the signal level, through synthesized transactors that are mapped to the hardware along with the DUT. Transaction-based verification (TBV) minimizes the interaction of the test bench to boost performance. One major limitation of accelerators is that they verify the design in isolation and not within the context of the system. Cost is another issue that makes this approach unaffordable for multiple software and hardware developers. Typical performance for hardware acceleration is in the 250 Hz to 100 kHz range. Once again, this speed is insufficient for software development.
Emulation: Emulation allows you to map both the design and the synthesizable test bench into the hardware (this often consists of an array of FPGAs or custom processors). You can verify the emulated design in the context of the actual system using in-circuit mode. The debugging capabilities offered by emulators are powerful; however, the performance is typically in the range of 500 kHz to 750 kHz, and possibly as high as 1.5 MHz when the design clocks are highly synchronous to each other. At these speeds it is possible to run low-level diagnostic software, but not to develop application software or to bring up an operating system (which may require several hundred million CPU instructions). While emulators can be shared, the overall cost is very high: several hundred thousand to over a million dollars. This high cost prohibits creating multiple platforms (replicates) and limits the number of early users in the form of software developers and potential customers.
FPGA-Based Prototyping: The key benefits of FPGA-based prototyping are low cost, high performance, and easy deployment. In addition to these benefits, the ability to interface with real-time devices allows the designer to observe the SoC prototype behavior in the context of a real system. Low system cost makes it affordable to create multiple pre-silicon hardware platforms for software development. FPGA-based prototyping offers significantly higher operating performance, typically in the range of 10 MHz to 80 MHz, thereby enabling verification of designs that are subjective in nature, such as video processors. Higher performance also makes it practical to develop software (device drivers, OS boot-up, and application software), which is not possible using other verification techniques. A small system footprint makes this approach easily deployable and portable to the field for demonstration and testing.
In summary, their high performance and low cost make FPGA-based rapid prototypes ideal hardware platforms for:
Comprehensive design verification
Pre-silicon software development platform
Multiple, low cost replicates
In order to compare the performance of the various approaches, let us consider the time required to decode a video stream using the H.264 standard for the baseline profile (used for video phone, conferencing etc.).
We will assume a compressed high definition (HD) video with a resolution up to 1280x720p, an operating frequency of 88 MHz, and real-time decoding at up to 30 frames per second. The baseline profile includes I (Intra-coded) and P (Predictive coded) slice, CAVLC (Context-based Adaptive Variable Length Coding) entropy encoding, FMO (Flexible Macro block Ordering), ASO (Arbitrary Slice Ordering), and Redundant Slice.
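As a rough back-of-the-envelope sketch of this comparison (not measured data), the Python snippet below assumes that real-time decoding at 88 MHz and 30 frames per second costs about 88e6 / 30 design clock cycles per frame, and that each platform runs the design at the upper end of the speed range quoted earlier in this article. The names and the cycles-per-frame model are illustrative assumptions.

```python
# Assumption: one frame costs (real-time clock) / (frame rate) design cycles.
REALTIME_CLOCK_HZ = 88e6
FRAMES_PER_SECOND = 30
CYCLES_PER_FRAME = REALTIME_CLOCK_HZ / FRAMES_PER_SECOND

# Assumption: upper end of each speed range quoted in the article.
PLATFORM_SPEED_HZ = {
    "RTL simulation": 30.0,
    "Hardware acceleration": 100e3,
    "Emulation": 1.5e6,
    "FPGA-based prototyping": 80e6,
}


def seconds_per_frame(platform_hz):
    """Wall-clock seconds to clock one frame's worth of cycles."""
    return CYCLES_PER_FRAME / platform_hz


def format_duration(seconds):
    """Render a duration in the most readable unit."""
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} hours"
    if seconds >= 1:
        return f"{seconds:.1f} seconds"
    return f"{seconds * 1000:.1f} milliseconds"


for name, speed_hz in PLATFORM_SPEED_HZ.items():
    print(f"{name:24s} {format_duration(seconds_per_frame(speed_hz))} per frame")
```

Under these assumptions, a single frame takes roughly 27 hours in an RTL simulator at 30 Hz, about half a minute on an accelerator, about two seconds on an emulator, and about 37 milliseconds on an 80 MHz FPGA prototype, which is close to the 33 milliseconds available per frame in real time.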
Prototyping challenges FPGA-based prototyping systems do have certain limitations. For example, although debug tools have been significantly improving, debugging a design is not as intuitive as in a simulator. Also, the process of mapping the RTL design into the board and verifying is not completely automated and needs user intervention. However, the overall benefits of FPGA-based prototyping significantly outweigh these minor limitations.
Some of the key challenges in a prototyping system are:
Debug visibility and RTL correlation with debug probes
The ability to debug any portion of a design mapped over multiple FPGAs
High-speed data transfer between the verification and design environments using high-level interfaces such as C APIs and SCE-MI
The availability of reference designs for quick validation, and a wide choice of daughter boards to interface with different protocols such as PCIe, USB 3.0, GbE and other emerging standards
Availability of comprehensive self-tests that quickly isolate prototyping board issues from design issues.
In the following sections we will consider how these challenges affect overall productivity and the approach that S2C has taken to overcome these challenges.
Addressing FPGA-based prototyping challenges S2C Inc. is currently offering fourth-generation FPGA-based hardware prototyping platforms. The FPGAs on the prototyping system could be from Altera or Xilinx. The boards are available in various device implementations to meet different design sizes such as a single FPGA, dual FPGA, and quad FPGA configurations. The systems are designed such that it is easy to scale the gate capacity by stacking or tiling two or more boards. The design can be configured into the prototyping system boards via JTAG, standard USB, or SD card.
Debug Visibility: In a typical design flow, the design is represented in a Hardware Description Language (HDL) – such as Verilog, SystemVerilog, or VHDL – and simulated in an RTL simulator. Most design errors are discovered and fixed in a simulation environment. In simulation, the visibility and controllability of debug signals are much greater, so errors can be fixed easily. The designer can observe and fix incorrect design behavior at the RTL level, whereas debugging the design on a prototype means working with a gate-level synthesized netlist. The synthesis process (RTL to gate-level transformation) often renames signals and sometimes optimizes them away completely, making it difficult to correlate signals at the two levels of abstraction.
This year, S2C introduced its Verification Module (VM) (patent pending) to address many of these verification challenges.
Figure 1(a). S2C Verification Module (with Altera FPGA)
Figure 1(b). S2C Verification Module (with Xilinx FPGA)
First, this solution allows the designer to set debug probes at the RTL level. The setting is intuitive and is performed using S2C's TAI Player Pro software GUI, or it can be scripted in a Tcl command file. Another important capability offered by the VM debug solution is RTL visibility. The gate-level debug signal names listed in the debug tool after synthesis are very similar to the original RTL names, as shown in Figure 2.
Figure 2. The Verification Module maintains the original RTL signal names while performing gate-level debug.
This makes it very easy to traverse from gate-level signals to the corresponding RTL so that an error detected during a gate-level debug session is easily identified in the RTL. This significantly simplifies the debugging process.
Second, it allows you to use standard vendor debug tools, such as SignalTap (for Altera FPGAs) and ChipScope (for Xilinx FPGAs). For example, for a Stratix 4 based Quad S4 prototyping system, S2C's TAI Player Pro software allows you to define 1,920 signals. At any given time during the debug session, you can observe up to 480 of those 1,920 signals. If the debug point of interest doesn’t lie within the selected 480 signals, you can simply select another set of 480 signals.
Third, it drastically reduces the debug iteration time. For instance, when a new set of signals is selected for observation in a standard waveform viewer, the design need not be recompiled. This means the design doesn’t go through synthesis and place and route again, which form the “long pole” of the design cycle.
Multi-FPGA Debug: Most designs don’t fit into a single FPGA. Often it is necessary to partition the design and map it over multiple FPGAs. When debugging a design on the prototype, it is not always known in advance which portion of the design, and therefore which FPGA, might contain a potential design error. And even if you know precisely which design block has potential errors, it is quite possible that the block has been partitioned and mapped over multiple FPGAs. Most debug tools currently available are capable of debugging only a single FPGA at any given time, which prevents simultaneous visibility into more than one FPGA.
Figure 3. The Verification Module allows multi-FPGA debug and high-speed data transfer
The Verification Module addresses this limitation by enabling multi-FPGA debug. For an Altera-based prototyping system you would use an S4 VM as illustrated in Figure 1(a); for Xilinx-based FPGAs you would use a V6 VM as illustrated in Figure 1(b). The S4 VM is based on a Stratix 4 FPGA and the V6 VM is based on a Xilinx Virtex 6 FPGA. The VMs can be used to debug any number of FPGAs available on a given prototyping system, including a single FPGA. The VM technology allows users to set various trigger conditions on each FPGA. The trigger conditions are set in the SignalTap (Altera) or ChipScope (Xilinx) debug settings menu, on one or more of the probes that have been pre-defined and made available for the debug session. Likewise, you can set trigger conditions on additional FPGAs of interest (all FPGAs if needed) available on the board. If S1, S2, ..., Sn are the trigger conditions for the individual FPGAs, where n is the number of FPGAs selected for debug, then the master trigger condition S is the Boolean product (logical AND) of the individual trigger conditions: S = S1 · S2 · S3 · ... · Sn. Note that each of S1, S2, and so on is itself the Boolean product of the debug probe logic values of the design for the given FPGA.
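The trigger combination described above can be modeled in a few lines of Python. This is only a logical sketch of the Boolean product; it does not use or represent any SignalTap, ChipScope, or S2C API, and the function names are hypothetical.

```python
def fpga_trigger(probe_matches):
    """S_i: Boolean product (AND) of the probe match results in one FPGA.

    probe_matches is a list of booleans, one per debug probe, each True
    when that probe currently holds its programmed trigger value.
    """
    return all(probe_matches)


def master_trigger(per_fpga_probe_matches):
    """S = S1 . S2 . ... . Sn over the FPGAs selected for debug."""
    return all(fpga_trigger(matches) for matches in per_fpga_probe_matches)


# Hypothetical snapshot: two FPGAs selected for debug.
snapshot = [
    [True, True],        # FPGA 1: both probes match, so S1 is True
    [True, False, True], # FPGA 2: one probe does not match, so S2 is False
]
print(master_trigger(snapshot))
```

Because the combination is an AND, the master trigger fires only when every selected FPGA satisfies its own condition at the same time; in the snapshot above it does not fire.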
High Speed Data Transfer: In prototyping, it is often necessary to transfer large amounts of data between the design and the verification environment. This data could be design data that is stored in the FPGA memory or prototype board memory as an image file. Or it could be data captured from a debug session in the form of a VCD file. The data size can sometimes run into gigabytes. To efficiently manage such large amounts of data you need two capabilities. First, you need high-speed ports that support protocols such as PCIe with multiple lanes and can quickly transfer large amounts of data; for example, x4 PCIe Gen 2 can provide a raw data bandwidth of up to 20 gigabits (or 2.5 Gbytes) per second. Second, you need the ability to communicate with the design from the computer verification environment through high-level commands such as transactors or simple C-API calls. Transaction-level communication is efficient, and when implemented as a SCE-MI (Standard Co-Emulation Modeling Interface) interface, it is portable across multiple projects and prototyping systems. S2C prototyping systems allow designs to communicate with the verification environment via C-API calls and a SCE-MI based TLM interface.
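The bandwidth arithmetic above can be checked with a short Python sketch. The per-lane signaling rates and line encodings (8b/10b for Gen 1 and Gen 2, 128b/130b for Gen 3) come from the PCIe specifications; the function names are hypothetical, and the estimate ignores packet and protocol overhead, so real throughput will be lower.

```python
# Per-lane signaling rate in GT/s for each PCIe generation.
PCIE_GT_PER_LANE = {1: 2.5, 2: 5.0, 3: 8.0}

# Fraction of raw bits that carry data after line encoding.
ENCODING_EFFICIENCY = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}


def raw_gbps(gen, lanes):
    """Raw link bandwidth in gigabits per second, one direction."""
    return PCIE_GT_PER_LANE[gen] * lanes


def payload_gbytes_per_s(gen, lanes):
    """Bandwidth in Gbytes/s after line encoding (no packet overhead)."""
    return raw_gbps(gen, lanes) * ENCODING_EFFICIENCY[gen] / 8


def transfer_seconds(size_gbytes, gen, lanes):
    """Best-case time to move size_gbytes over the link."""
    return size_gbytes / payload_gbytes_per_s(gen, lanes)


print(raw_gbps(2, 4))              # raw rate of a x4 Gen 2 link
print(payload_gbytes_per_s(2, 4))  # after 8b/10b encoding
print(transfer_seconds(4.0, 2, 4)) # hypothetical 4-Gbyte VCD dump
```

For a x4 Gen 2 link this gives the 20 gigabits per second raw figure quoted above; after 8b/10b encoding about 2 Gbytes per second remain for data, so a hypothetical 4-Gbyte capture would need at least two seconds to transfer.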
Reference Designs Enable Quicker Implementation: Designs that need to interface to high-speed memories such as DDR3 (potentially running at over GHz speeds), PCIe Gen 3, or USB 3.0 pose additional challenges in implementing and verifying the prototype design. One is the sheer functional complexity. Another is that, to comply with the protocol specifications, the timing requirements are stringent and require the system to interface with high-speed devices. For example, USB 3.0 implemented in ‘super-speed’ mode runs at up to 5 Gbps, while an x4 PCIe Gen 3 interface would need to run at about 32 Gbps. In each of the designs mentioned above, whether you are implementing a controller or a PHY, a working reference design goes a long way. It often serves as a starting point for designers to develop their own implementation. For example, if you are developing a PCIe controller, you could run the reference design and make sure it meets both the functional and timing requirements of the protocol. Later, as you develop your own controller design, you can replace the controller in the reference design with the design in progress and validate it for correct behavior without changing the rest of the verification environment. Starting from a known working reference design makes it much easier to develop and debug your own design. S2C offers a number of ‘Prototype Ready Reference Designs’ that allow users to begin quickly with a known working verification environment and plug in their own design. Users can then compare a working design with a design in progress, which makes debug easier.
In addition to reference designs, it is important to have a library of daughter cards that can be used to interface with other devices and validate the design in the context of a system. These daughter cards may contain third-party IP or standard bus interfaces such as PCIe or GbE.
Debug the Design not the Prototype Board: Regardless of whether you build the prototyping system yourself or acquire it from a prototyping vendor, one thing is obvious – you want to spend as little time as possible debugging the prototyping board. After all, your goal is to debug your design and not the prototype board. Consider a prototyping system with four Stratix 4 SE820 devices in the F1760 package. The total number of pins, including power and ground, on such boards can be close to 8,000. In order to make sure that the pin connectivity is correct and all signals in the prototyping system function properly, it is critical to have comprehensive self-tests that identify and pinpoint the sources of any problems. More importantly, these tests must be easy to run, have short run-times, and provide detailed yet easy-to-understand results. Such self-tests are critical whenever you suspect that the board itself could be the problem. All S2C prototyping systems come with comprehensive self-tests that allow you to verify I/O pins, clock pins, memory interfaces, and all other signals.
Summary Advances in process technologies have made it feasible to implement very complex capabilities and offer a number of functions in SoC/ASIC/ASSP devices.
Software content continues to dominate and plays a key differentiating role among similar devices. To ensure that the designs are thoroughly verified and that the software is developed to match the desired functional behavior, it is critical to create prototypes of these designs.
Due to its low cost, high performance, and easy deployment, FPGA-based prototyping serves as the ideal pre-silicon platform choice for verification and software development. S2C offers a number of capabilities – debug visibility, multi-FPGA debug, high-speed data transfer, prototype ready reference designs, and a library of daughter cards – that make SoC verification and software development an easier and faster process.
About the author Ashok Kulkarni is the Director of Applications Engineering at S2C Inc. His twenty years of experience spans both the semiconductor and EDA industries, in ASIC design, applications support, ASIC consulting, and technical marketing.
Ashok has worked for Synopsys, Cadence, Mentor Graphics, Synplicity, Quickturn and Semi-Custom Logic in various roles. He holds an MSEE from Louisiana State University and an MSEE from Bombay University.