
Effort to Transform

Size and complexity are the determining factors

Even though tools are used to automate a transformation, the steps involved are the same for the smallest system as for the largest system. I wouldn't even consider transforming a system smaller than 50K LSS (50K SLOC for MUMPS). I feel that for a smaller system, a software engineer familiar with MUMPS, the target language, and database design can do it better and faster.

I will concede that the analysis tools could help by identifying the interface (implicit parameters, naked references, etc.). The flow analysis would also be helpful.

For systems larger than 50K LSS, the steps remain constant, but the effort per LSS is actually less. The effort per LSS is determined by the total effort divided by total LSS, and multiplied by the complexity.
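As a minimal sketch of that calculation, the function below divides total effort by total LSS and multiplies by a complexity factor. The function name, the unit of effort (hours), and the sample numbers are assumptions made for illustration; they are not taken from any real project.

```python
def effort_per_lss(total_effort: float, total_lss: int, complexity: float) -> float:
    """Effort per LSS = (total effort / total LSS) * complexity factor."""
    if total_lss <= 0:
        raise ValueError("total_lss must be positive")
    return total_effort / total_lss * complexity


# Illustrative numbers only (hypothetical project, effort in hours).
per_lss = effort_per_lss(total_effort=4000.0, total_lss=100_000, complexity=1.5)
print(f"effort per LSS: {per_lss:.3f} hours")
```

How the complexity factor itself is derived from the drivers listed here is not specified, so it is passed in as a plain number.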

The drivers for complexity are:

GOTO statements

Goto in any language leads to spaghetti code. Some languages (such as C) do not allow a goto to go out of the current function. In MUMPS, a goto has no bounds. If the command "GOTO label" is used, execution remains in the current routine and continues at the target label. However, if the command "GOTO label^routine" is used, execution continues at the target label in the target routine. A good, disciplined programmer would never use a "GOTO", let alone a "boundless GOTO". When I come across a GOTO in any code, my first thought is that the programmer failed to do a design and needed a GOTO to get out of a corner.

GOTO statements with indirection

As bad as a GOTO statement is, it becomes worse when the target is hidden inside a variable that cannot be examined in a normal static analysis. Although it is true that I can traverse the tree backwards to determine the probable value for the variable, it is at best an estimate, and not an absolute.
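The three kinds of GOTO target discussed so far can be told apart with a very small check. The sketch below is only an illustration, not the analysis tool itself: the function name is hypothetical, and it assumes the target has already been isolated from the command and from any post condition.

```python
def classify_goto_target(target: str) -> str:
    """Classify a single, already isolated GOTO target.

    Assumes the post condition (if any) has been stripped off beforehand.
    """
    target = target.strip()
    if "@" in target:
        # Indirection: the real target is only known at run time.
        return "indirect"
    if "^" in target:
        # label^routine or ^routine: execution leaves the current routine.
        return "cross-routine"
    # Plain label: execution stays in the current routine.
    return "local"


for sample in ("LOOP", "START^BILLING", "@TARGET"):
    print(sample, "->", classify_goto_target(sample))
```

Only the "indirect" case defeats static analysis outright; the other two can at least be resolved to a label.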

XECUTE statements

Worse still is the use of the XECUTE command. By using this command a programmer can be extremely subversive and elusive.

I once had to work on some MUMPS code that made extreme use of the XECUTE command. The code was so horrible that I thought I was misinterpreting it. The code took pieces of data from the global data, concatenated them into a string, and then XECUTEd the string. So I examined the data and found it to be code. When I concatenated the strings of data to build the command that was being executed, I found that it repeated the same process and executed another concatenated string.

I followed the logic (or lack thereof) for three iterations before going to one of the local MUMPS experts. Although I was horrified, he informed me that I was seeing the power of MUMPS. I thought that self-modifying code was gone when we came out of the dark ages.

Post conditions

Although post conditional statements are not necessarily bad, they do tend to add to the complexity. I will normally assign a post condition (it is really a pre-condition since it is tested before the command is executed) the same weight as an if statement. It really is an if statement that affects a single command, or as in the DO and GOTO, it can affect a single operand of the command.
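To illustrate why a post condition carries the weight of an if statement, the sketch below splits a command-level post condition off a simple command and renders the result in a C-like form. It is a toy, under stated assumptions: the function names are hypothetical, the condition contains no spaces or string literals, and argument-level post conditions (as in DO and GOTO) are not handled.

```python
def split_postcondition(command: str):
    """Split 'SET:X>5 Y=1' into ('SET', 'X>5', 'Y=1').

    The condition is None when the command has no post condition.
    """
    head, _, argument = command.strip().partition(" ")
    name, colon, condition = head.partition(":")
    return name, (condition if colon else None), argument


def as_if_statement(command: str) -> str:
    """Render the command with its post condition as an explicit if."""
    name, condition, argument = split_postcondition(command)
    body = f"{name} {argument}".strip()
    if condition is None:
        return body
    return f"if ({condition}) {{ {body} }}"


print(as_if_statement("SET:X>5 Y=1"))
print(as_if_statement("SET Y=1"))
```

The first call yields an if wrapped around the SET; the second returns the command unchanged because there is no condition to test.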

$TEXT statements

The $TEXT statement is used to mix data with code, or code with data, depending on your view. I wonder why all the practices that I learned were bad seem to be found in MUMPS code. When I ask a MUMPS programmer about it, the standard answer is that "it is the power of MUMPS".

$TEST statements

This system variable is the leading side effect in a MUMPS program. This is what enables the ELSE to become a maybe. It also allows the ELSE to be used without an IF. The ELSE is not really an else; it is actually "if $TEST is not true then".

If a programmer is not aware of this (and some aren't), it is possible to end up with code that works only most of the time.
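The "maybe" can be shown with a toy model of $TEST. This is a sketch of the semantics described above, not interpreter code: the class and function names are hypothetical, and the inner IF stands in for any command between the IF and the ELSE that happens to change $TEST, for example an IF inside a called subroutine.

```python
class DollarTest:
    """Toy model of the MUMPS $TEST special variable."""

    def __init__(self) -> None:
        self.value = False

    def if_(self, condition: bool) -> bool:
        # IF with an argument stores the truth value in $TEST.
        self.value = bool(condition)
        return self.value

    def else_(self) -> bool:
        # ELSE means "if $TEST is not true then"; it never looks at an IF.
        return not self.value


def run(inner_condition: bool) -> list:
    """Run an IF line followed by an ELSE line and record what executed."""
    flag = DollarTest()
    trace = []
    if flag.if_(True):
        trace.append("if-branch")
        # Something in the IF branch evaluates another IF and resets $TEST.
        flag.if_(inner_condition)
    if flag.else_():
        trace.append("else-branch")
    return trace


print(run(True))
print(run(False))
```

With `run(True)` only the if-branch executes, which is what the programmer expects. With `run(False)` both branches execute, because the ELSE sees the value left behind by the inner IF.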

ELSE statements

This is the MUMPS maybe described under $TEST. An ELSE statement has a higher complexity value than an IF, since it always requires manual intervention.

Nesting

Current versions of the MUMPS interpreter have added nesting capability to the MUMPS language in an attempt to catch up with the modern structured languages. However, in MUMPS, the nesting is implemented as a stack operation. Inside the nest, the QUIT that normally acts as a return statement now acts as a break statement. In a nested expression, the QUIT will break from an IF the same as it would from a FOR loop. I have tested the transformed code in this area, and have not found any reason to require manual intervention, but it still adds to the complexity.

Naked references

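MUMPS has more ways for programmers to be creative and hide their intent than any language I have ever worked with. A naked reference omits the global name and refers back to the last global referenced, so its meaning depends on whichever global access happened to execute before it.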
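A minimal sketch of how a naked reference resolves, assuming the usual rule that the new subscripts replace the last subscript of the previous global reference. The function name and the tuple representation of a reference are hypothetical, chosen only for the illustration.

```python
def resolve_naked(last_ref, naked_subscripts):
    """Resolve a naked reference against the last global referenced.

    last_ref is (global_name, subscripts), e.g. ("^A", (1, 2)).
    The naked subscripts replace the final subscript of last_ref.
    """
    name, subscripts = last_ref
    if not subscripts:
        raise ValueError("a naked reference needs a prior subscripted global")
    return name, tuple(subscripts[:-1]) + tuple(naked_subscripts)


last = ("^A", (1, 2))
print(resolve_naked(last, (3,)))     # models ^(3) after ^A(1,2)
print(resolve_naked(last, (3, 4)))   # models ^(3,4) after ^A(1,2)
```

The same text `^(3)` therefore names a different node every time the preceding global access changes, which is why a static analysis has to track the last reference along every path.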

Indirect references

Although a useful concept, indirection is extremely difficult to process using static analysis. It not only adds to the complexity during the transformation, but also always requires manual intervention, at a minimum to examine the generated code.

HALT

A HALT in MUMPS is really a halt. It is transformed into a C++ exit command. The code is marked for manual intervention.

BREAK

The BREAK command is used during debugging. I use the equivalent to transform the BREAK. It is also flagged for manual intervention. When I see a BREAK command in a delivered system, it makes me wonder if it is there because that path has not been debugged.

Side effects

Many MUMPS programmers rely on side effects. This is a dangerous practice, since most maintenance programmers do not have extensive experience with the side effects of MUMPS. I will always subscribe to good clean structured code that is easily understood and maintained, and anything that deviates is wrong.