
Reverse Engineering

Reverse engineering is the process of extracting the design of a system from its code. It can never recover the design that a proper software engineering process would have produced, but it can produce an "as is" representation of the system.

Introduction

Reverse engineering is not part of a normal software development process. In fact, if a proper process had been in place, the design would already exist and there would be no need to reverse engineer the system.

I do not enjoy reverse engineering systems, and I don't know any software engineer who does. Unfortunately, there are systems around that need to be kept alive.

Reverse Engineering Tools

Reverse engineering tools exist for some languages. For others, it is sometimes necessary to develop your own tools to aid in the work.

Tools for C code

Tools can be found to help reverse engineer C code. The ones I have used are ctags, cflow and ccalls. Either cflow or ccalls will document the flow of control through the system, but both are of limited use. ctags can be used to identify the data and functions in the system.
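To give a feel for the kind of flow-of-control report cflow and ccalls produce, here is a deliberately naive Python sketch that builds a caller-to-callee map from C source with regular expressions and prints it as an indented tree. It is not how either tool works internally. It assumes tidy, macro-free code with no comments, no parentheses inside string literals and no function pointers; the names call_graph and format_tree are made up for this example.

```python
import re

# C keywords that can be followed by "(" but are not function calls.
C_KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}

# A top-level function definition: return type, name, parameter list, "{".
DEF_RE = re.compile(r"^\s*[A-Za-z_][\w\s*]*?\b([A-Za-z_]\w*)\s*\([^;{]*\)\s*\{", re.M)
# Any identifier immediately followed by "(" inside a function body.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def call_graph(source: str) -> dict[str, list[str]]:
    """Map each defined function to the functions it calls, in order of first call."""
    graph: dict[str, list[str]] = {}
    pos = 0
    while (m := DEF_RE.search(source, pos)) is not None:
        name = m.group(1)
        # Scan forward from the opening brace to its matching closing brace.
        depth, i = 1, m.end()
        while i < len(source) and depth > 0:
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
            i += 1
        body = source[m.end():i - 1] if depth == 0 else source[m.end():]
        callees: list[str] = []
        for callee in CALL_RE.findall(body):
            if callee not in C_KEYWORDS and callee not in callees:
                callees.append(callee)
        graph[name] = callees
        pos = i  # resume after the body so inner blocks are not taken for definitions
    return graph


def format_tree(graph: dict[str, list[str]], root: str, depth: int = 0,
                path: tuple[str, ...] = ()) -> list[str]:
    """Render an indented call tree starting at root."""
    lines = ["    " * depth + root + "()"]
    if root in path:  # recursive call: show it once, then stop
        return lines
    for callee in graph.get(root, []):
        lines.extend(format_tree(graph, callee, depth + 1, path + (root,)))
    return lines


SAMPLE = """
int helper(int x);

static int helper(int x) {
    return x * 2;
}

int process(int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += helper(i);
    }
    return total;
}

int main(void) {
    printf("%d", process(3));
    return 0;
}
"""

if __name__ == "__main__":
    print("\n".join(format_tree(call_graph(SAMPLE), "main")))
```

Running it prints main() with printf() and process() indented beneath it, and helper() beneath process(). Like the real tools, this shows who calls whom, but says nothing about the data, which is why ctags is needed alongside it.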

Tools for C++

I used Rhapsody to attempt to reverse engineer an embedded system written in C++. Unfortunately, the system had been written but never designed. Instead of a design, it relied on heroes working around the clock before each demonstration or delivery to fix bugs that seemed to come out of nowhere. The failure to extract the design was not the fault of Rhapsody, which created all the diagrams it was supposed to. The black mass it displayed simply reflected the lack of design and the tight coupling of the system.

Tools for MUMPS

I led the development of a set of tools to reverse engineer MUMPS. The requirements for the tools were tailored to a single system, so the tools were too. Had they been used only on the system they were developed for, everything would have been fine, but news of our tools spread to other companies, who enlisted our help to understand their own systems. Unfortunately, the tools could not extract the required information unless I went into the code being analysed and modified it to be compatible with them, which defeated the purpose of automating the process.

After leaving the company, I redesigned the tools I felt were most useful, this time for general use rather than for a single system. A full description of the tools is given elsewhere.

Language Conversions

I have done language conversions and recommend against them. In the early days of computer programming, we had an acronym, GIGO, which means Garbage In, Garbage Out: converting poorly written code just produces poorly written code in another language. If the languages are similar, a conversion can be simple, but be aware that converting the language does not guarantee that the intent of the code is converted. I will briefly discuss three conversions I have done, along with their pros and cons.

Fortran IV to Fortran 77

I had to write a translator to perform this conversion. The translator was straightforward: it converted 95% of the code automatically and marked the parts that needed manual intervention. The 5% that could not be converted automatically consisted of vendor-specific enhancements, which should be avoided.
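The translator itself is not shown here. As a rough illustration of the approach (convert automatically where a rule applies, mark everything else for a person), here is a hypothetical Python sketch with only two rules, both chosen for illustration: it rewrites Hollerith strings in FORMAT statements (for example 5HHELLO) as the apostrophe-delimited character constants that Fortran 77 added, and it flags ENCODE/DECODE, a common non-standard vendor extension, with a fixed-form comment line. A real translator would need many more rules; this one, for instance, would mangle Hollerith-looking text that already sits inside a quoted string.

```python
import re

# Hypothetical rule: a Hollerith string is a count, "H", then exactly that many
# characters. The look-behind skips digits inside descriptors such as F10.2.
HOLLERITH_RE = re.compile(r"(?<![A-Za-z0-9_])(\d+)H")
# ENCODE/DECODE were vendor extensions, not standard Fortran.
VENDOR_RE = re.compile(r"\b(ENCODE|DECODE)\s*\(")
FLAG = "C *** MANUAL: vendor-specific ENCODE/DECODE"


def convert_hollerith(stmt: str) -> str:
    """Rewrite nH... Hollerith strings as apostrophe-delimited constants."""
    out, pos = [], 0
    while (m := HOLLERITH_RE.search(stmt, pos)) is not None:
        n = int(m.group(1))
        text = stmt[m.end():m.end() + n]
        if n == 0 or len(text) < n:  # malformed: leave the rest untouched
            break
        out.append(stmt[pos:m.start()])
        out.append("'" + text.replace("'", "''") + "'")  # double embedded quotes
        pos = m.end() + n
    out.append(stmt[pos:])
    return "".join(out)


def translate(lines: list[str]) -> tuple[list[str], int]:
    """Convert what the rules cover; flag the rest for manual intervention."""
    out: list[str] = []
    flagged = 0
    for line in lines:
        if line[:1] in ("C", "c"):  # fixed-form comment line: pass through
            out.append(line)
        elif VENDOR_RE.search(line.upper()):
            out.extend([FLAG, line])  # keep the line, but mark it for a human
            flagged += 1
        elif "FORMAT" in line.upper():
            out.append(convert_hollerith(line))
        else:
            out.append(line)
    return out, flagged


if __name__ == "__main__":
    source = [
        "      WRITE(6,100)",
        "  100 FORMAT(1X, 5HHELLO, F10.2)",
        "      ENCODE(10, 200, BUF) X",
    ]
    converted, needs_review = translate(source)
    print("\n".join(converted))
    print(f"{needs_review} line(s) need manual intervention")
```

The flagged line is left in place rather than guessed at, which keeps the automatic part honest: everything the rules cannot handle is visible in the output and counted.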

Pascal to C

This was relatively easy, since I was able to find an open-source Pascal to C translator that handled between 90% and 95% of the code.

MUMPS to C++

This should never happen. These languages are as far apart as two languages can be. I call my tool for converting from MUMPS to C++ a transformer rather than a translator, because that is what it does. I first tried to write a translator, but a translator could not properly capture the intent of the code; the transformer does. I have provided samples of MUMPS and C++ code. The transformer took four years to reach a working prototype, and two more years to go from the prototype to a complete application.

(M)UMPS to Caché

If you want to convert from MUMPS to object-oriented technology, convert to Caché. The conversion is painless, and any code that does not translate directly is flagged so it can be modified. Beyond the code, Caché can also read MUMPS databases.