
Notes from the Design Automation Conference – Part 1

Wednesday, June 8th, 2011 by rwilliamson


Every once in a while we like to play ‘reporter’ and talk about the things we see at industry trade shows. We try to relate what we learn to what our typical reader expects, which is commentary on what is inside technology from a reverse engineering perspective. Our own Randy Torrance is at the Design Automation Conference talking about reverse engineering, and has taken some time to walk the floor and attend the conference.

All the usual suspects are at the Design Automation Conference. Cadence, Synopsys and Mentor all have large booths and a solid presence. TSMC has a booth straddling both sides of a main aisle. But the booth that surprised me the most was GlobalFoundries. I didn’t do any square foot measurements, but they are in the running for the biggest booth here. And it blocks one of the most popular aisles. It seems every time I get lost I walk right into it. They also seem very busy in other aspects of this show. Clearly they’re putting on a big push this year.

For most of the day I thought I would check out the sessions and see what was up and coming in EDA. I wasn’t disappointed, as there were lots of interesting papers covering the gamut of IC CAD. But, as usual, we are looking for trends. Are there any overriding subjects that everyone seems to be talking about? The first one to jump out on day 1 was high level design. It looks like the EDA industry is doing pretty well at supporting most of the implementation stages of IC design. Everything from RTL to tape-out is pretty well served, but the creation of the original design at a high level still seems to be an issue. I guess it’s not a surprise, as 10 years ago everyone was happy to design chips using either of the main RTL languages, Verilog or VHDL. That worked well for million-gate designs. But as we approach billion-gate designs, higher level languages have become necessary. And the EDA industry is trying to keep up.

In this morning’s keynote address Lisa Su, Senior VP and General Manager of Networking and Multimedia for Freescale Semiconductor, made it clear that high level design is a major concern. One of her main issues is hardware-software co-design. In today’s complex chips it takes more effort to design the software than the hardware. However, the software design can’t start until there is some hardware to design it to. So problems aren’t identified until it is almost too late to fix them, given the narrow market windows and heavy competition. Dr. Su’s point was that “it” (the software) needs to be done quicker.

In fact, the software can’t even wait for the final chip design at tape-out. Electronic System Level (ESL) design needs to be good enough that it can be used as a Virtual Platform (VP) for the software to be developed on as soon as the ESL is complete (before RTL is even done).

At a luncheon given by Mentor Graphics this theme was continued. A collection of industry executives agreed that ESL was absolutely required, but not quite ready to solve all the problems yet. Gadi Singer, VP of the SOC Enabling Group at Intel, needs ESL not just to enable more complex designs, but also to speed up simulations, accelerate the design cycle, and reduce the lines of code that need to be verified by 10X. But mainly he needs ESL to allow early hardware-software co-design. This theme was echoed by John Goodenough, VP of Design Technology at ARM, Ken Hansen, CTO at Freescale, and Jean-Marc Chateau, Director of System Platforms and Tools at STM. Jean-Marc stated that about 80% of STM’s SOCs are now designed partly using ESL, but he needs the other 20% to follow suit. He also needs ESL to find a way to incorporate power management.

The top 3 priorities of circuit designers are now 1) functionality, 2) performance, and 3) power management, according to Jean-Marc. ESL has a good handle on the first, is doing OK on the second, but is not yet in the game for power. We need to do a bit of self-promotion here: Chipworks services also try to help companies address similar problems by showing them, early on in the high level design, what other leading designers have done in the same or related chips.

So high level design is the watchword of the day. I guess it shouldn’t really be a surprise. It’s the decisions you make early on in the design cycle that have the most impact at the end.

Evaluating the Software in Automotive ECUs

Thursday, March 25th, 2010 by chipworks

contributed by Stanko Vuleta

Reverse Engineering Software and Systems

As part of our patent support business, Chipworks applies software reverse engineering to generate evidence of patent infringement, which is used by IP groups and outside counsel in patent licensing negotiations and litigation. This evidence takes the form of a claim chart, which maps the relevant patent claim elements to the infringing product. We apply software reverse engineering to analyze and document infringement on a wide variety of products, from consumer electronics to communication devices and automotive systems.

We recently went inside an automotive electronic control unit (ECU). These ECUs comprise a wide range of units like the powertrain control module (PCM), body control module (BCM), electric power steering (EPS), airbag control unit (ACU), or electronic brake control module (EBCM). In light of the recent media attention targeting the automotive industry, we thought it would be interesting to share some of our findings in relation to the perceived quality of the systems software against other semiconductor-based systems we have analyzed.

Since our overall findings are a bit mixed, we will spare the blushes of the leading automotive companies by not publishing the make or type of module we analyzed.

Reverse Engineering an Automotive Module

First we disassembled the module, reverse engineered the PCB, scoped its signals while operational, and reverse engineered its software.

Two types of software analysis were done. We extracted the raw binary code from the module and analyzed it statically (called ‘dead’ code analysis). We also analyzed the code while it was in action on the target device (called ‘dynamic’ or ‘live’ code analysis).

We then rated several aspects of the unit (e.g., physical protection, code complexity) for quality. The results were surprising to us.

Physical Protection: Good

The module has a heat sink which also plays the role of a mechanical cover. The CPU detects the presence of the module and stops running the code if it is removed. Below is an example of one of the holes used for screwing the module to the PCB.

Automotive resistor

Note the resistor connecting the screw pad to the CPU.

Code and Data Space Protection: Good

The code space is checked for corruption and tampering by calculating and checking a CRC over the complete code space. The twist on this particular implementation is that there are several routines doing similar CRC checks, making tampering with the code more difficult.
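A check of this kind can be sketched as follows. This is only an illustration under stated assumptions: the CRC variant used by the module is not given, so the sketch uses the common CRC-32 with the reflected polynomial 0xEDB88320, and the names crc32_compute and code_space_intact are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Chosen only as a
 * familiar example; the CRC used by the analyzed module is not known. */
static uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out one bit; apply the polynomial if that bit was set. */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Hypothetical integrity check: recompute the CRC over the code image and
 * compare it with a reference value stored at build time. */
static bool code_space_intact(const uint8_t *code, size_t len,
                              uint32_t expected_crc)
{
    return crc32_compute(code, len) == expected_crc;
}
```

Calling a check like code_space_intact from several independent routines, as the module does, means that patching out a single comparison is not enough to hide a modified image.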

The content of the RAM data space cannot, of course, be checked for correctness since it changes all the time. Instead, the memory integrity is checked by writing and reading back the standard 0xAA and 0x55 patterns in the background. As one would expect, atomic instructions are used for these accesses, so the test cannot be interrupted while a location holds a test pattern.
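A background RAM test of this style might look like the sketch below. It is not the module's code: enter_critical and exit_critical are hypothetical stand-ins for whatever mechanism the target uses to make the sequence atomic (for example, masking interrupts), and they are empty here so the example runs on a host machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: on a real target these would mask and unmask
 * interrupts so that nothing else sees the cell holding a test pattern. */
static void enter_critical(void) {}
static void exit_critical(void) {}

/* Test one RAM cell with the 0xAA and 0x55 patterns, then restore it. */
static bool ram_cell_ok(volatile uint8_t *cell)
{
    bool ok = true;
    enter_critical();
    uint8_t saved = *cell;          /* keep the live value */
    *cell = 0xAA;
    if (*cell != 0xAA) ok = false;  /* alternating bits 10101010 */
    *cell = 0x55;
    if (*cell != 0x55) ok = false;  /* inverse pattern 01010101 */
    *cell = saved;                  /* put the live value back */
    exit_critical();
    return ok;
}

/* Walk a region one cell at a time; returns false on the first bad cell. */
static bool ram_region_ok(volatile uint8_t *base, size_t len)
{
    assert(base != NULL);
    for (size_t i = 0; i < len; i++) {
        if (!ram_cell_ok(&base[i])) {
            return false;
        }
    }
    return true;
}
```

The two patterns are bitwise complements, so between them every bit of the cell is driven to both 0 and 1.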

Compared with other consumer and communication devices that come across our desks, both the code and data space protection were a notch higher.

Code Complexity: Bad

We analyzed the inner workings of the code and completely dissected a number of software routines. The surprise here was just how much code was created for what was supposed to be simple processing of a few sensor values. A lot of the code improved sensor resolution and produced more precise readings; correction, normalization, and adjustment routines were abundant.

This left us with a question: why was it necessary to go to great pains to provide 0.003% precision when 0.3% would have sufficed for the purpose?

This observation was further confirmed by the sheer number of routines in the code: over 700 in total. Around 200 of them were involved in the comparatively simple task of processing two sensor inputs and outputting two variables. This certainly looks like overkill.

The calling tree, showing which routines call which others, looks rather intimidating.

Automotive calling tree

The overwhelming impression was that precision was put far ahead of code simplicity. Occam’s razor was rather dull in the process of this code development.

Debuggability: Bad

A good coding practice is to sprinkle the code with debug logs. Granted, they do increase the code size, but they can also save your bacon when you need to debug a problem in the field on a live system. Usually, the execution of the debug logs is skipped and turned on only when needed.

We did not see evidence of debug logs in this code.
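For readers unfamiliar with the practice, a switchable debug log can be as small as the sketch below. Everything in it is hypothetical (the DEBUG_LOG macro, the debug_enabled flag, the scale_sensor routine and its gain of 2); it only illustrates log statements that stay in the code but are skipped until someone turns them on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Runtime switch: off by default, turned on only when debugging. */
static bool debug_enabled = false;
/* Counter kept so the effect of the switch can be observed. */
static unsigned debug_lines_emitted = 0;

/* When the switch is off, the cost is a single flag test. */
#define DEBUG_LOG(...)                        \
    do {                                      \
        if (debug_enabled) {                  \
            fprintf(stderr, __VA_ARGS__);     \
            debug_lines_emitted++;            \
        }                                     \
    } while (0)

/* Hypothetical sensor routine with a debug log sprinkled in. */
static int scale_sensor(int raw)
{
    assert(raw >= 0); /* assumption: raw ADC counts are non-negative */
    int scaled = raw * 2;
    DEBUG_LOG("scale_sensor: raw=%d scaled=%d\n", raw, scaled);
    return scaled;
}
```

On a real target the output would go to whatever channel is available (a serial port or a RAM trace buffer, for instance) rather than stderr.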

The CPU in question did not have a JTAG or any other debug port. The only way to debug it effectively is to use an in-circuit emulator (ICE). The problem with the ICE is that the CPU needs to be completely removed and replaced with a bulky ICE module. Such a contraption cannot fit mechanically in the narrow space provided for the module. Couple this with the above-mentioned physical protection, and the result is a very difficult setup to debug in the field.

Error Handling: Ugly

Another good coding practice for mission critical applications is to check for software errors and log any found. A balance needs to be achieved between too much error checking and too little. Too much leads to code bloat (more code, more bugs), and too little will let code bugs lurk undetected.

The voluminous code we inspected is certainly not a result of too many error checks. In many of the routines inspected, there were no checks for software errors.

We did, however, find checks verifying the raw sensor values and final output values, but not much more. Even these errors were not logged immediately when they were detected, potentially making the debugging more difficult.

However, the most worrisome error checks were those that tested whether a value fell outside its allowed range. One would think that alarm bells would go off upon detecting any such error, and that it would be logged and appropriately handled. Instead, all that was done was to simply cap the value to the maximum (or minimum) and forward it on as if nothing had happened!
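The difference between silently capping a value and capping it while leaving a trace is small in code terms. The sketch below is hypothetical throughout (the fault_log_t record and the clamp_and_report function are illustrative names, not anything found in the module); it shows a range check that still caps the value but records that it did so.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fault record: how often a value was out of range, and the
 * most recent offending value, so the event can be diagnosed later. */
typedef struct {
    uint32_t out_of_range_count;
    int32_t  last_bad_value;
} fault_log_t;

/* Cap value to [lo, hi], but record the event instead of hiding it. */
static int32_t clamp_and_report(int32_t value, int32_t lo, int32_t hi,
                                fault_log_t *log)
{
    assert(lo <= hi);
    assert(log != NULL);
    if (value < lo || value > hi) {
        log->out_of_range_count++;
        log->last_bad_value = value;
        return (value < lo) ? lo : hi;
    }
    return value; /* in range: pass through unchanged, nothing logged */
}
```

A caller can then inspect out_of_range_count and decide whether to raise a diagnostic code or enter a safe mode, rather than carrying on as if the capped value were a genuine reading.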