Introduction
I have been building software systems for many years, and find
reverse engineering tools very useful when I have to add functionality
or move a system to a different platform. I will describe how I use
the tools I developed to reverse engineer MUMPS code here, but there
are equivalent tools for reverse engineering other languages.
Routine Unpack Report
The routine unpack report lists the number of routines (files)
extracted, the size of each routine (number of lines, number of
characters), and whether each file was properly identified. A file
that cannot be properly identified is assigned a pseudo name.
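As an illustration only, the bookkeeping behind such a report could be sketched in Python as follows. This is not the actual unpack tool. It assumes that a routine is identified by a label starting in the first column of its first line, and the pseudo name format (UNKNOWN001, UNKNOWN002, ...) is hypothetical.

```python
import re

# Assumption: a routine is identified by the label that starts its first line.
NAME_RE = re.compile(r"^[%A-Za-z][A-Za-z0-9]*")


def unpack_report(routines):
    """Build one report row per extracted routine text."""
    rows = []
    pseudo_count = 0
    for text in routines:
        lines = text.splitlines()
        match = NAME_RE.match(lines[0]) if lines else None
        if match:
            name, identified = match.group(0), True
        else:
            # Hypothetical pseudo name format for unidentified files.
            pseudo_count += 1
            name, identified = f"UNKNOWN{pseudo_count:03d}", False
        rows.append({
            "name": name,
            "lines": len(lines),
            "chars": len(text),
            "identified": identified,
        })
    return rows
```

The length of the returned list gives the number of routines extracted, and the rows with identified set to False are the candidates for the duplicate routine report.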
Duplicate Routine Report
The duplicate routine report lists the files that were not properly
identified. These should be checked against the identified files to
determine whether they are duplicates that were given a temporary
identity to preserve the code while the properly identified routine
was being modified. Sometimes the temporary files are kept under the
temporary identity.
Excluded File Report
An attempt is made to find the properly identified file that was
modified from each temporary file. If it is found, the temporary file
name is moved to the excluded file report. If a matching file cannot
be found, the pseudo name is retained; this can occur if the routine
name begins with a percent (%) character.
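One way such a match could be attempted is by line-level similarity. The Python sketch below is an assumption and not a description of the matching my tools perform: it uses the standard difflib module, and the function name find_original and the 0.8 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher


def find_original(temp_text, identified, threshold=0.8):
    """Return the name of the identified routine most similar to temp_text.

    identified maps routine names to their source text. Returns None when
    no routine reaches the (hypothetical) similarity threshold.
    """
    temp_lines = temp_text.splitlines()
    best_name, best_ratio = None, 0.0
    for name, text in identified.items():
        # Compare line by line so that a few edited lines do not hide a match.
        ratio = SequenceMatcher(None, temp_lines, text.splitlines()).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= threshold else None
```

A temporary file for which this returns a name would move to the excluded file report. One for which it returns None would keep its pseudo name.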
Syntax Error Report
After the duplicate files are removed, the files are parsed using the
CheckError script, which generates an error line for each exception.
Not all exceptions are serious errors; in fact, most interpreters will
ignore many of them. The errors my tools ignore are exceptions that
arise only because the syntax does not conform exactly to the
standard. Serious errors are passed to the tools, but are not
processed.
Corrected Errors Report
If the number of serious errors is not excessive, I can correct them.
This is not easy, since I have only the code to go by and do not know
the intent of the programmer. This report records the changes I made
to the code so that I can use as much code as possible in the analysis.
When an error is encountered, everything from the error through the end
of the line is treated as part of the error, since it is difficult to
skip past the error and continue processing on the same line.
LSS Report (A.K.A. Metric Report)
This report provides some information as to the actual size of each routine,
and insight as to its complexity. Following is a sample metric for FXAKICK:
FXAKICK S F I E Q D G J H X N K L U O R W C o stl x c cmd
32 3 12 0 4 1 0 0 0 0 1 0 0 0 0 0 0 0 17 53 99 126 80
0 5 0 0 12 11 0 0 0 0 0 0 0 0 0 0 0 0 0 28
2 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12
FXAKICK has 156 lines, 13 labels, 126 comments, 81 LSS in 46 source lines, nesting = 5
File: mumps/FXAKICK.m contains 1 routine
In this routine, there were 11 argument-less DO statements. Some of
them were nested within other argument-less DO statements, causing the
nesting to reach a depth of 5. In the metric report, nesting is the
nesting level: a nesting of 1 means the routine has no argument-less DO
statements. So for FXAKICK to have a nesting level of 5, there must
have been 4 argument-less DO statements nested one inside another, as
follows:
IF expression1 DO
.IF expression2 DO
..FOR expression3 DO
...IF expression4 DO
....this is nesting level 5
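The relationship between the leading dots and the nesting level can be sketched in a few lines of Python. This is an illustration, not the metric tool itself. It assumes that any label and leading whitespace have already been removed from each line, as in the sample above, and the function names are hypothetical.

```python
def dot_depth(body):
    """Count the leading dots of a line body, allowing spaces between dots."""
    depth = 0
    for ch in body:
        if ch == ".":
            depth += 1
        elif ch not in " \t":
            break
    return depth


def nesting_level(bodies):
    """Nesting level as used here: deepest dot depth plus one."""
    return max((dot_depth(body) for body in bodies), default=0) + 1


sample = [
    "IF expression1 DO",
    ".IF expression2 DO",
    "..FOR expression3 DO",
    "...IF expression4 DO",
    "....this is nesting level 5",
]
print(nesting_level(sample))  # 5
```

A routine with no dotted lines at all gets a nesting level of 1, which matches the convention described above.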
The number of paths affects the complexity of a routine. This routine
has 0 GOTO statements, which is good. Out of 156 lines, 46 lines
contained source code and 126 lines had comments. Although I can read
some Japanese, I don't have the Japanese font set installed on my
system, so I cannot read the comments to determine their value. In any
case, the value of comments is subjective, so I could report only
whether there were comments and how many.
Data Usage Report
This report shows how data is used by a routine (it is also broken down
by label). Following is a sample of the data usage for routine FXAKICK:
Routine: FXAKICK
Local:
name      UuSMNnKkdFprPR@
%CSFG     Uu---n------P--
%NPTN     Uu---n------P--
AKISU     U-S------------
CASETP    U-S------------
CSCNT     U-S------------
GOKI      U-S------------
HANSOSU   UuS------------
I         ---------F-----
MBSYO3    U-S------------
TCASEID   U-S------------
TCRAN2    U-S------------
TMP       U-S------------
TXT       U-S-----------@
ZCNT      U-S------------
ZONEREC   U-S------------
ZONETP    U-S------F-----
ZTP       U-S------------
Global:
name      UuSMNnKkdFprPR@
NNISMV    Uu-------------
In this report, three local variables are used before they are set.
Two of them, %CSFG and %NPTN, are only used and never set. They are
input parameters, and they were excluded from a NEW statement. The
third variable, HANSOSU, is used before it is set and may be an
implicit parameter.
Another variable, TXT, contained the target of an indirect reference.
Variables I and ZONETP were the control variables of FOR statements. I
appears to have been used as a simple counter, while ZONETP appears to
have been used as an index for referencing data as well.
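The same reading of the report can be done mechanically. The Python sketch below is illustrative only. The flag meanings it relies on are inferred from the discussion above and are assumptions: u marks a variable used before it is set, S marks a variable that is set, F marks a FOR variable, and @ marks an indirection target. The function name parse_usage is hypothetical.

```python
def parse_usage(rows):
    """Map each variable name to the set of flag letters shown in its row."""
    usage = {}
    for row in rows:
        name, flags = row.split()
        usage[name] = {ch for ch in flags if ch != "-"}
    return usage


# A few rows copied from the FXAKICK report above.
rows = [
    "%CSFG Uu---n------P--",
    "%NPTN Uu---n------P--",
    "HANSOSU UuS------------",
    "I ---------F-----",
    "TXT U-S-----------@",
    "ZONETP U-S------F-----",
]
usage = parse_usage(rows)

# Assumed meanings: 'u' = used before set, 'S' = set.
used_before_set = [name for name, flags in usage.items() if "u" in flags]
never_set = [name for name in used_before_set if "S" not in usage[name]]
for_variables = [name for name, flags in usage.items() if "F" in flags]
```

Under these assumptions, never_set picks out the likely input parameters, and the remaining entries of used_before_set are the ones worth inspecting by hand.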
Text Control Flow
The control flow report is useful to a software engineer because it
presents the flow of the system. My reports have additional information,
such as how a label may have been reached. The second column shows the
depth in the control flow. In the following sample, ^FXAKICK is the
label of the routine, and the [S] indicates that it is an in-line label.
The depth 0 indicates that this routine has not been referenced, meaning
either that I was not provided with the complete code for the system or
that this is an entry point that an operator or programmer might invoke
from the command line. It is also possible that it is invoked using
indirection.
If I were going to work on this system beyond using my tools to
generate the reports, I would have to write a script to search out all
the variables used with indirection and determine the possible values
they might contain.
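A first pass at such a script might look like the Python sketch below. It is only a starting point under stated assumptions: it finds simple name indirection of the form @NAME outside string literals and comments, it does not handle @(expression) forms, and it does not attempt the harder step of determining the possible values. The function names are hypothetical.

```python
import re

# Assumption: only simple name indirection (@NAME or @^NAME) is of interest.
INDIRECT_RE = re.compile(r"@(\^?[%A-Za-z][A-Za-z0-9]*)")


def code_only(line):
    """Drop string literals and any trailing comment from one source line."""
    out, in_string = [], False
    for ch in line:
        if ch == '"':
            in_string = not in_string  # a doubled quote toggles twice
        elif in_string:
            continue
        elif ch == ";":
            break  # the rest of the line is a comment
        else:
            out.append(ch)
    return "".join(out)


def indirection_variables(routines):
    """Map each variable used with indirection to the routines that use it.

    routines maps routine names to their source text.
    """
    found = {}
    for name, text in routines.items():
        for line in text.splitlines():
            for var in INDIRECT_RE.findall(code_only(line)):
                found.setdefault(var, set()).add(name)
    return {var: sorted(names) for var, names in found.items()}
```

Each variable reported this way would then have to be traced back through its SET statements to list the values it can hold.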
Just as the data usage report above indicated, the two input variables
show up as explicit parameters in the XP( ) list:
107 0 ^FXAKICK [S] XP( %NPTN %CSFG )
108 1 START^FXAKICK [R]
Upon examining the code, I found that ^FXAKICK used to be referenced by
STRN050 and SUPN020 as an extrinsic function call, but both references
have been commented out. FXAKICK now appears to be dead code that is
never invoked, but there is no guarantee that it is not still being
invoked using indirection.
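The manual check described here can be approximated by a scan that separates live references from commented-out ones. The Python sketch below is an illustration and not part of my tools. The function names are hypothetical, and it deliberately counts a reference inside a string literal as live, since such a string may be the target of indirection.

```python
import re


def split_comment(line):
    """Split one source line into (code, comment) at the first ';' that is
    outside a string literal."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == ";" and not in_string:
            return line[:i], line[i:]
    return line, ""


def find_references(target, routines):
    """Return (live, commented) lists of (routine, line number) pairs for
    lines in other routines that mention ^target."""
    pattern = re.compile(r"\^" + re.escape(target) + r"(?![A-Za-z0-9])")
    live, commented = [], []
    for name, text in routines.items():
        if name == target:
            continue  # references from the routine to itself do not count
        for number, line in enumerate(text.splitlines(), start=1):
            code, comment = split_comment(line)
            if pattern.search(code):
                live.append((name, number))
            elif pattern.search(comment):
                commented.append((name, number))
    return live, commented
```

An empty live list together with a non-empty commented list corresponds to the situation found for FXAKICK, with the same caveat that indirection built from computed strings would escape this scan.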