
Introduction

I have been building software systems for many years, and I find reverse engineering tools very useful when I have to add functionality or move a system to a different platform. Here I describe how I use the tools I developed to reverse engineer MUMPS code; equivalent tools exist for reverse engineering other languages.

Routine Unpack Report

The routine unpack report lists the number of routines (files) extracted, the size of each routine (number of lines and number of characters), and whether each file was properly identified. A file that cannot be properly identified is assigned a pseudo name.
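The bookkeeping behind such a report can be sketched in a few lines of Python. This is only an illustration of the idea, not my actual tool: the `UnpackEntry` record, the `unpack_report` function, and the `PSEUDO001`-style naming scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnpackEntry:
    name: str         # routine name, or a pseudo name if unidentified
    lines: int        # number of lines in the routine
    chars: int        # number of characters in the routine
    identified: bool  # False when a pseudo name had to be assigned


def unpack_report(routines):
    """Build report entries from (name_or_None, source_text) pairs."""
    entries = []
    pseudo_count = 0
    for name, text in routines:
        identified = name is not None
        if not identified:
            pseudo_count += 1
            # Hypothetical pseudo-name scheme, for illustration only
            name = f"PSEUDO{pseudo_count:03d}"
        entries.append(
            UnpackEntry(name, len(text.splitlines()), len(text), identified)
        )
    return entries


if __name__ == "__main__":
    for entry in unpack_report([("FXAKICK", "A\nB\n"), (None, "X")]):
        print(entry)
```

The number of routines extracted is simply the length of the returned list.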

Duplicate Routine Report

The duplicate routine report lists the files that were not properly identified. These should be checked against the identified files to determine whether they are duplicates: copies given a temporary identity to preserve the code while the properly identified routine was being modified. Sometimes the temporary files are kept under the temporary identity.

Excluded File Report

An attempt is made to find the properly identified file that was modified from each temporary file. If it is found, the temporary file name is moved to the excluded file report. If no matching file can be found, the pseudo name is retained. This can occur if the routine name begins with a percent (%) character.
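One way to approximate this matching is to compare each pseudo-named file against every identified file and accept the closest match above a similarity threshold. The sketch below uses Python's standard `difflib` module for the comparison. The `split_excluded` function, the 0.8 threshold, and the line-by-line comparison are assumptions made for illustration; they are not how my tools necessarily decide a match.

```python
import difflib


def split_excluded(identified, pseudo, threshold=0.8):
    """Pair pseudo-named files with the identified file they most resemble.

    identified, pseudo: dicts mapping file name -> source text.
    Returns (excluded, retained): excluded maps a pseudo name to its
    matching identified name; retained lists pseudo names with no match.
    The threshold is an assumed cut-off, not a documented value.
    """
    excluded = {}
    retained = []
    for pname, ptext in pseudo.items():
        best_name, best_ratio = None, 0.0
        for iname, itext in identified.items():
            # Compare line by line so a small edit does not hide the match
            ratio = difflib.SequenceMatcher(
                None, ptext.splitlines(), itext.splitlines()
            ).ratio()
            if ratio > best_ratio:
                best_name, best_ratio = iname, ratio
        if best_name is not None and best_ratio >= threshold:
            excluded[pname] = best_name
        else:
            retained.append(pname)
    return excluded, retained
```

A pseudo-named file that differs from an identified routine by only a line or two lands in `excluded`; one with no close relative stays in `retained` and keeps its pseudo name.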

Syntax Error Report

After the duplicate files are removed, the files are parsed using the CheckError script, which generates an error line for each exception. Not all exceptions are serious errors; in fact, most interpreters will ignore many of them. My tools ignore those exceptions that arise only because the syntax does not conform exactly to the standard. Serious errors are passed to the tools, but are not processed.

Corrected Errors Report

If the number of serious errors is not excessive, I can correct them. This is not easy, since I have only the code to go by and do not know the intent of the programmer. This report records the changes I made to the code so that I can use as much code as possible in the analysis. When an error is encountered, everything from the error through the end of the line is treated as part of the error, since it is difficult to skip past the error and continue processing on the same line.

LSS Report (A.K.A. Metric Report)

This report gives the actual size of each routine and some insight into its complexity. Following is a sample metric for FXAKICK:
 FXAKICK  S  F  I  E  Q  D  G  J  H  X  N  K  L  U  O  R   W  C  o stl   x  c  cmd
         32  3 12  0  4  1  0  0  0  0  1  0  0  0  0  0   0  0 17  53  99 126  80 
          0  5  0  0 12 11  0  0  0  0  0  0  0  0  0  0   0  0  0  28
          2  0  0  0 10  0  0  0  0  0  0  0  0  0  0  0   0  0  0  12
FXAKICK has 156 lines, 13 labels,  126 comments, 81 LSS in 46 source lines, nesting = 5
File: mumps/FXAKICK.m contains 1 routine
In this routine, there were 11 argument-less DO statements, some of which were nested within other argument-less DO statements, bringing the nesting to a depth of 5. In the metric report, nesting is the nesting level: nesting = 1 means there are no argument-less DO statements. So for FXAKICK to have a nesting level of 5, there must have been 4 argument-less DO statements nested as follows:
  IF expression1 DO
  .IF expression2 DO
  ..FOR expression3 DO
  ...IF expression4 DO
  ....; this is nesting level 5
The number of paths affects the complexity of a routine. This routine has 0 GOTO statements, which is good. Of its 156 lines, 46 contained source code and 126 had comments. Although I can read some Japanese, I don't have a Japanese font set installed on my system, so I cannot read the comments to determine their value. In any case, the value of comments is subjective, so I can only report whether comments are present and how many there are.
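The nesting level described above can be estimated by counting the leading dots on each line, since each level of argument-less DO adds one dot. The following Python sketch is a simplification written for illustration, not my metric tool: the `dot_level` and `nesting_level` names are hypothetical, and the label handling assumes that a label starts in the first column and is followed by a space or tab.

```python
def dot_level(line):
    """Count the leading dots that mark argument-less DO block depth."""
    # Assumption: a line starting in column 1 begins with a label;
    # drop it so only the line body is examined.
    if line and not line[0].isspace():
        parts = line.split(None, 1)
        line = parts[1] if len(parts) > 1 else ""
    dots = 0
    for ch in line.lstrip():
        if ch == ".":
            dots += 1
        elif ch != " ":  # dots may be separated by spaces
            break
    return dots


def nesting_level(lines):
    """Nesting as reported in the metric: 1 means no argument-less DO."""
    return 1 + max((dot_level(line) for line in lines), default=0)


if __name__ == "__main__":
    sample = [
        "  IF expression1 DO",
        "  .IF expression2 DO",
        "  ..FOR Expression3 DO",
        "  ...IF Expression4 DO",
        "  ....this is nesting level 5",
    ]
    print(nesting_level(sample))
```

Applied to the five-line sample, the deepest line carries four dots, which corresponds to a nesting level of 5.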

Data Usage Report

This report shows how data is used by a routine (it is also broken down by label). Following is a sample of the data usage for routine FXAKICK:
Routine: FXAKICK

  Local:
          name UuSMNnKkdFprPR@
         %CSFG Uu---n------P--
         %NPTN Uu---n------P--
         AKISU U-S------------
        CASETP U-S------------
         CSCNT U-S------------
          GOKI U-S------------
       HANSOSU UuS------------
             I ---------F-----
        MBSYO3 U-S------------
       TCASEID U-S------------
        TCRAN2 U-S------------
           TMP U-S------------
           TXT U-S-----------@
          ZCNT U-S------------
       ZONEREC U-S------------
        ZONETP U-S------F-----
           ZTP U-S------------

  Global:
          name UuSMNnKkdFprPR@
        NNISMV Uu-------------
In this report, three local variables are used before they are set. Two of them, %CSFG and %NPTN, are only used and never set: they are input parameters, and they were excluded from a NEW statement. The third, HANSOSU, is used before it is set and may be an implicit parameter. Another variable, TXT, contained the target of an indirect reference. Variables I and ZONETP were the control variables of FOR statements: I appears to have been used as a simple counter, while ZONETP appears to have also been used as an index for referencing data.
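The observations above can be pulled out of the report mechanically by lining up each row with the header `UuSMNnKkdFprPR@`. The Python sketch below does that for a few rows copied from the sample. The `parse_usage` and `with_flag` helpers are hypothetical, and the flag meanings used here (`u` for used before set, `S` for set, `F` for FOR variable, `@` for indirection) are only those implied by the commentary on this sample; the remaining columns are left uninterpreted.

```python
FLAGS = "UuSMNnKkdFprPR@"  # column header copied from the report


def parse_usage(rows):
    """Map each variable name to the set of flag letters marked in its row."""
    usage = {}
    for row in rows:
        name, marks = row.split()
        if len(marks) != len(FLAGS):
            raise ValueError(f"unexpected flag width for {name}")
        usage[name] = {flag for flag, mark in zip(FLAGS, marks) if mark != "-"}
    return usage


def with_flag(usage, flag):
    """Names of variables whose row has the given flag set."""
    return sorted(name for name, flags in usage.items() if flag in flags)


if __name__ == "__main__":
    rows = [
        "%CSFG Uu---n------P--",
        "%NPTN Uu---n------P--",
        "HANSOSU UuS------------",
        "I ---------F-----",
        "TXT U-S-----------@",
        "ZONETP U-S------F-----",
    ]
    usage = parse_usage(rows)
    print("used before set:", with_flag(usage, "u"))
    print("FOR variables:  ", with_flag(usage, "F"))
    print("indirection:    ", with_flag(usage, "@"))
```

Filtering the `u` rows further for those without `S` would isolate the variables that are only used and never set.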

Text Control Flow

The control flow report is useful to a software engineer because it presents the flow of the system. My reports carry additional information, such as how a label may have been reached, and the second column gives the depth in the control flow. In the following sample, ^FXAKICK is the label of the routine, and the [S] indicates that it is an in-line label. The depth of 0 indicates that this routine is not referenced, meaning that either I was not provided with the complete code for the system, or this is an entry point that an operator or programmer might invoke from the command line. It is also possible that it is invoked using indirection. If I were going to work on this system beyond using my tools to generate the reports, I would have to write a script to search out all the variables used with indirection and determine the possible values they might contain. As the data usage report above indicated, the two variables show up as explicit parameters.
   107    0  ^FXAKICK [S] XP( %NPTN %CSFG )
   108    1   START^FXAKICK [R]
Upon examining the code, I found that ^FXAKICK used to be referenced by STRN050 and SUPN020 as an extrinsic function call, but both references have been commented out. FXAKICK therefore now appears to be dead code that is never invoked, although there is no guarantee that it is not still being invoked through indirection.
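The depth column can be illustrated with a small Python sketch that treats the references between labels as a graph. It is an assumption-laden illustration rather than my report generator: the `control_flow_depths` function is hypothetical, the input is a hand-built mapping from each label to the labels it references, and depth is taken to be the shortest distance from an unreferenced label.

```python
from collections import deque


def control_flow_depths(calls):
    """Assign a depth to each label reachable from an unreferenced label.

    calls maps each label to the list of labels it references.
    Labels that nothing references get depth 0 (possible entry points,
    or dead code). Indirect references are invisible to this analysis.
    """
    referenced = {callee for callees in calls.values() for callee in callees}
    roots = sorted(label for label in calls if label not in referenced)
    depths = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        label = queue.popleft()
        for callee in calls.get(label, []):
            if callee not in depths:
                depths[callee] = depths[label] + 1
                queue.append(callee)
    return depths


if __name__ == "__main__":
    # The references from STRN050 and SUPN020 are commented out in the
    # source, so they are deliberately absent from this mapping.
    calls = {
        "^FXAKICK": ["START^FXAKICK"],
        "START^FXAKICK": [],
    }
    print(control_flow_depths(calls))
```

Because the commented-out references do not appear in the mapping, ^FXAKICK comes out at depth 0, which is exactly the situation that calls for a manual check of indirection.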