A natively flexible 32-bit Arm microprocessor
To take full advantage of the highly automated and fast implementation and verification offered by modern silicon integrated circuit design flows, we developed a small standard cell library. A standard cell library is a collection of small, pre-verified building blocks from which much larger and more complex designs can be built quickly and easily using sophisticated electronic design automation tools such as synthesis, place and route.
Before implementation of the standard cell library could begin, preliminary investigations were carried out to determine the most appropriate standard cell architecture for the library given the constraints of the target technology. The cell architecture is the set of features common to all cells in the library, such as cell height, power strap size and routing grid, which allow cells to be assembled in a standard way to form larger structures. These common characteristics are largely governed by the design rules of the manufacturing process, but are also influenced by the performance and area requirements of the final design.
Once the cell architecture was established, the next step was to determine the content of the cell library, not only in terms of the variety of logic functions, but also in terms of the number of drive-strength variants of each logic function. Because the effort required to design, implement and characterize each standard cell is significant, we decided to run some trials with a small prototype library and then expand the library as needed. To evaluate the performance of this small prototype standard cell library, simple representative circuits (such as ring oscillators, counters and shift arrays) were implemented, fabricated and tested.
We migrated from the 1.0 μm design rules to the new 0.8 μm FlexIC design rules to reduce area and, therefore, increase yield. As this involved redesigning every cell in the library with smaller transistors, we also took the opportunity to modify the standard cell architecture to include MT1 (metal-tracking 1) pins, making it easier for the router to connect to the cells. Improvements made to the resistive material (higher sheet resistance, Rs) also allowed a 3× reduction in the size of the resistors.
This dramatic reduction in the size of transistors and resistors reduced the area of most cells by approximately 50% (see Extended Data Fig. 1), which improved manufacturing yield by reducing the overall design size. However, because there were still some manufacturing yield issues that we could further alleviate by changing the standard cell architecture, the library was redesigned again. This time, we focused on measures that could improve the overall yield of the final design, such as including redundant vias and contacts, reducing the number of vertices in source-drain polygons (where possible) and keeping the size of stacked transistors to a minimum. Additionally, we reverted to the lower sheet resistance to improve the process spread, but we were able to maintain the area savings by using tighter resistor layouts. To improve the overall quality of logic synthesis, a number of complex AND-OR-INVERT and OR-AND-INVERT logic gates were added to the library, as well as higher drive-strength versions of simple logic gates, such as NAND2_X2 and NOR2_X2.
The FlexLogIC process is an NMOS process and therefore relies on a resistive load to pull the cell output up to the supply voltage to drive a logic 1. As a result, the rise times of the cell output are much slower than the fall times, and this asymmetry can affect performance, especially on heavily loaded nets. To improve timing on critical nets, such as the clock, we added buffers with an active transistor pull-up. Although these active pull-ups slightly increase the area, they have the added benefit of reducing static power consumption. The layouts and simulated transfer characteristics of buffers with resistive pull-up and active transistor pull-up are shown in Extended Data Fig. 2.
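The rise and fall asymmetry of a resistively loaded NMOS cell can be illustrated with a first-order RC model. The sketch below is not taken from the FlexLogIC characterization data: the supply voltage, load resistance, transistor on-resistance and net capacitance are hypothetical values chosen only to show the trend, and the function names are our own.

```python
import math

# All component values below are hypothetical, for illustration only.
VDD_V = 3.0          # supply voltage (assumed)
R_LOAD_OHM = 1.0e6   # resistive pull-up load (assumed)
R_ON_OHM = 1.0e5     # on-resistance of the NMOS pull-down (assumed)
C_NET_F = 1.0e-12    # capacitance of the driven net (assumed)


def transition_time_s(resistance_ohm, capacitance_f, low=0.1, high=0.9):
    """10%-90% transition time of a first-order RC node."""
    return resistance_ohm * capacitance_f * math.log((1.0 - low) / (1.0 - high))


def parallel_ohm(a_ohm, b_ohm):
    """Equivalent resistance of two resistors in parallel."""
    return a_ohm * b_ohm / (a_ohm + b_ohm)


# Rising edge: the pull-down is off, so the net charges through the load only.
rise_s = transition_time_s(R_LOAD_OHM, C_NET_F)

# Falling edge: the pull-down is on while the load still conducts, so the
# time constant is set by the two resistances in parallel.
fall_s = transition_time_s(parallel_ohm(R_LOAD_OHM, R_ON_OHM), C_NET_F)

# Static power while the output is held low: current flows continuously
# from the supply through the load and the pull-down transistor.
static_power_w = VDD_V ** 2 / (R_LOAD_OHM + R_ON_OHM)

print(f"rise time: {rise_s * 1e6:.2f} us")
print(f"fall time: {fall_s * 1e6:.2f} us")
print(f"rise/fall ratio: {rise_s / fall_s:.1f}")
print(f"static power (output low): {static_power_w * 1e6:.2f} uW")
```

In this simplified model the rise time scales with the load resistance, whereas the static current with the output low scales inversely with it, so a passive load cannot improve both at once. An active pull-up, which can be switched off while the output is low, is one way around this trade-off.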
This simple standard cell library was then used successfully as the target technology to implement the PlasticARM SoC using a typical silicon integrated circuit design flow based on industry-standard electronic design automation tools. The contents of the standard cell library and the cell usage information are shown in Extended Data Table 1.
Because we do not yet have a dedicated FlexIC static random access memory, we created a simple register file by carefully placing a few modified standard cells in a tiled array that is connected by abutment to form a 32 × 32-bit memory (this block can be seen in the chip layout in Fig. 1c).
FlexLogIC technology (see Extended Data Table 2) has four routable metal layers, of which only the lower two are used inside standard cells. This leaves the top two metal layers free for interconnect between standard cells, which can then be routed over the top of neighboring cells, resulting in a much improved overall gate density of approximately 300 gates per mm².
Process parameters and statistical variations of TFT parameters are summarized in Extended Data Table 2. FlexLogIC is a proprietary 200 mm wafer semiconductor fabrication process that creates patterned layers of metal-oxide thin-film transistors and resistors, with four routable (gold-free) metal layers deposited on a flexible polyimide substrate according to the FlexIC design. Repeated instances of the FlexIC design are achieved by performing multiple sequences of thin-film material deposition, patterning and etching. To facilitate handling, enable the use of industry-standard processing tools and achieve submicron patterning of features (down to 0.8 μm), the flexible polyimide substrate is spin-coated onto glass at the start of production. The process has been optimized to ensure that the variation in thickness is substantially less than 3% over a lateral distance of 20 mm. Thin-film material deposition is achieved through a combination of physical vapor deposition, atomic layer deposition and solution processing (for example, spin coating). Substrate processing conditions have been carefully optimized to minimize film stress and substrate bow. Feature patterning is done using a 5× photolithographic stepper tool, which images a shot that is repeated several times across the 200 mm diameter wafer. Each shot is individually focused, which further compensates for any variation in thickness in the spin-coated film. Process measurements were carried out using process control structures.
Simulation, testing and validation
We captured the timing characteristics of the functional PlasticARM FlexIC using a test measurement setup and compared the measured results with those of its register transfer level (RTL) simulation to validate functionality.
The RTL simulation is shown in Extended Data Fig. 3. It begins by resetting PlasticARM to a known state by setting the RESET input to ‘0’. When RESET is then set to ‘1’, the processor is released from its reset state and starts executing code from ROM. First, the GPIO output pin is toggled once before the three tests depicted in Fig. 2 are performed. In the first test, data are read from ROM and added to an accumulator, and the sum is compared with an expected value (see Fig. 2a). If the values match, a short burst of two pulses is sent to the GPIO output, as shown in Extended Data Fig. 3a. If the values differ, the period and duty cycle of the pulses on the GPIO output are increased, as shown in Extended Data Fig. 3b. In the second test (Fig. 2b), data are written to RAM, read back and compared. If the data have not been corrupted while writing to or reading from RAM, a short burst of three pulses is sent to the GPIO output, as shown in Extended Data Fig. 3a. If the data have been corrupted, the period and duty cycle of the pulses on the GPIO output are increased as before. In the final test (Fig. 2c), the processor enters an infinite loop and measures the time for which a ‘1’ is applied to the GPIO input pin. If the GPIO input is held at ‘1’ without interruption for long enough, the GPIO output changes from ‘0’ to ‘1’. PlasticARM was implemented for a clock frequency of 20 kHz. As it does not use a timer, a value was chosen in the software that corresponds to the GPIO input signal being held at ‘1’ for approximately 1 s when operating at 20 kHz. In the simulation in Extended Data Fig. 3a, this value corresponds to 20,459 clock cycles, which at 20 kHz gives 1.02295 s.
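The three tests and the cycle-count arithmetic can be sketched as a small behavioural model. This is an illustrative reconstruction, not the test program that runs on PlasticARM: the function names, the 32-bit wrap-around of the accumulator, the one-sample-per-clock-cycle treatment of the GPIO input and the rule that any ‘0’ restarts the count are all assumptions. Only the clock frequency and the cycle count are taken from the text.

```python
CLOCK_HZ = 20_000      # clock frequency stated in the text
HOLD_CYCLES = 20_459   # hold duration in clock cycles, from the simulation


def rom_checksum_test(rom_words, expected_sum):
    """Test 1: accumulate words read from ROM and compare with the expected sum."""
    acc = 0
    for word in rom_words:
        acc = (acc + word) & 0xFFFFFFFF  # assumed 32-bit wrap-around
    return acc == expected_sum


def ram_readback_test(ram, pattern):
    """Test 2: write a pattern to RAM, read it back and compare."""
    for addr, value in enumerate(pattern):
        ram[addr] = value
    return all(ram[addr] == value for addr, value in enumerate(pattern))


def gpio_hold_test(input_samples, hold_cycles=HOLD_CYCLES):
    """Test 3: True once the input has been 1 for hold_cycles samples in a row.

    Assumes one sample per clock cycle and that any 0 restarts the count.
    """
    run = 0
    for sample in input_samples:
        run = run + 1 if sample == 1 else 0
        if run >= hold_cycles:
            return True
    return False


# Hold time implied by the cycle count at the stated clock frequency.
hold_seconds = HOLD_CYCLES / CLOCK_HZ

if __name__ == "__main__":
    print("ROM test passed:", rom_checksum_test([1, 2, 3], 6))
    print("RAM test passed:", ram_readback_test([0] * 4, [0xA5, 0x5A, 0xFF, 0x00]))
    print("GPIO hold detected:", gpio_hold_test([1] * HOLD_CYCLES))
    print(f"hold time at {CLOCK_HZ} Hz: {hold_seconds:.5f} s")
```

Because the hold duration is expressed as a cycle count rather than measured with a timer, the real-time duration scales inversely with the clock frequency: the same software value corresponds to a shorter hold time when the chip is clocked faster than 20 kHz.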
After fabrication, PlasticARM was tested on a wafer probe station while still attached to its glass carrier. The input signals, including the clock signal, were generated externally with a Xilinx ZC702 FPGA evaluation board. Input and output signals were captured using a Saleae Logic Pro 16 logic analyzer. Measurements were made at 3 V and 4.5 V, with different clock frequencies. An experiment with the power supply set to 3 V and a clock frequency of 20 kHz is shown in Extended Data Fig. 4. The I/O voltage of the ZC702 caps the inputs and outputs at 2.5 V. The waveform of the measured data is shown in Extended Data Fig. 4a, and corresponds to the waveform in the RTL simulation of the three tests in Extended Data Fig. 3a. PlasticARM is fully functional up to 29 kHz at 3 V and 40 kHz at 4.5 V.
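To put the measured maximum frequencies in context, the short calculation below compares them with the 20 kHz clock frequency used for implementation. The frequencies and supply voltages are those reported above; the helper names and the notion of headroom as a percentage over the 20 kHz figure are our own framing.

```python
TARGET_CLOCK_KHZ = 20.0                   # implementation clock frequency
MAX_CLOCK_KHZ = {3.0: 29.0, 4.5: 40.0}    # supply voltage (V) -> measured maximum (kHz)


def clock_period_us(freq_khz):
    """Clock period in microseconds for a frequency given in kHz."""
    return 1_000.0 / freq_khz


def headroom_percent(max_khz, target_khz=TARGET_CLOCK_KHZ):
    """How far the measured maximum exceeds the target frequency, in percent."""
    return (max_khz / target_khz - 1.0) * 100.0


for supply_v, max_khz in MAX_CLOCK_KHZ.items():
    print(
        f"{supply_v} V: max {max_khz} kHz, "
        f"period {clock_period_us(max_khz):.2f} us, "
        f"headroom {headroom_percent(max_khz):.0f}%"
    )
```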