How Jazillian Translates C To Java Code

Jazillian takes the source files that you provide and applies a series of transformation rules to convert various C constructs and patterns into their Java equivalents, generating “natural” code. For more information, see How Jazillian Translates Legacy Code to Java Code.

Example 1: "Hello, World"

The classic C program:

int main(int argc, char *argv[]) {
	printf("Hello, world!\n");
}
Becomes the classic Java program:
public class Hello {
	public static void main(String[] args) {
		System.out.println("Hello, world!");
	}
}
In this case, Jazillian applied the following "rules":
  1. Assuming the file was called "hello.c", the Java file "Hello.java" was created.
  2. The signature of the main() function was changed.
  3. The printf() call was changed to a System.out.println() call.
  4. The function was enclosed inside a new "Hello" class.
The key to Jazillian is that it produces not just correct, but reasonable Java code. Each of the "rules" applied here illustrates this.

Jazillian makes reasonable assumptions about what programmers do.

You can't tell from this example, but Jazillian translates each and every C main() function into the "real" Java main(). This is not technically always correct (for example, you may have a function that you happened to call "main" that takes just a single "int" argument). In the real world, however, a "main()" function is almost always meant to be the function that's invoked on running the application.

Jazillian takes steps to generate realistic code.

You may have noticed that the "printf rule" did not produce:
System.out.print("Hello, world!\n");
That would have been perfectly valid, working Java code. However, it is not "real" Java code. A real Java programmer would not have embedded the (platform-specific) newline in the text, and instead would write:
System.out.println("Hello, world!");
So the "printf rule" was smart enough to see that the last argument ended with a newline character, and so it should remove the newline character and use "println" instead of "print".
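To make the heuristic concrete, here is a minimal sketch of what such a "printf rule" might look like. It is not Jazillian's actual code: the class and method names are made up for illustration, and it assumes a printf() call whose only argument is a plain string literal with no format specifiers.

```java
// Hypothetical sketch of a "printf rule"; names and structure are assumptions.
class PrintfRuleSketch {

    // Takes the source text of a C string literal (quotes included, escapes
    // still spelled as backslash sequences) and returns a Java statement.
    static String translate(String cLiteral) {
        String body = cLiteral.substring(1, cLiteral.length() - 1);
        if (endsWithNewlineEscape(body)) {
            // Drop the trailing backslash-n and let println supply the newline.
            String trimmed = body.substring(0, body.length() - 2);
            return "System.out.println(\"" + trimmed + "\");";
        }
        return "System.out.print(\"" + body + "\");";
    }

    // True if the text ends with a newline escape. An odd number of
    // backslashes before the final 'n' means the last backslash escapes
    // the 'n'; an even number means the backslashes escape each other.
    static boolean endsWithNewlineEscape(String body) {
        if (!body.endsWith("n")) {
            return false;
        }
        int backslashes = 0;
        for (int i = body.length() - 2; i >= 0 && body.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(translate("\"Hello, world!\\n\""));
        System.out.println(translate("\"Enter name: \""));
    }
}
```

A string that does not end in a newline falls through to plain print(), so the rule only "prettifies" the cases where println is what a Java programmer would have written anyway.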

Jazillian makes reasonable guesses.

How did Jazillian know to call the class "Hello"? It took the name of the file that contained the text, in this case "hello.c", and converted it to the Java class naming convention ("Hello"). Jazillian also knows that each Java file name must match the class that it contains, so the output goes into "Hello.java". "Hello" may not be the perfect name for this class, but it's as good a guess as a machine can make.
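The file-name guess can be sketched in a few lines of Java. This is an illustration under assumptions, not Jazillian's implementation: the helper name is invented, and treating '_' and '-' as word breaks (so "string_utils.c" becomes "StringUtils") is a guess, since the only mapping shown above is "hello.c" to "Hello".

```java
// Hypothetical sketch of the class-naming guess; names are assumptions.
class ClassNameSketch {

    // Derives a Java-style class name from a C source file name.
    static String classNameFor(String cFileName) {
        // Strip any leading directories and the file extension.
        String base = cFileName.substring(cFileName.lastIndexOf('/') + 1);
        int dot = base.lastIndexOf('.');
        if (dot > 0) {
            base = base.substring(0, dot);
        }
        StringBuilder name = new StringBuilder();
        boolean upperNext = true; // capitalize the first letter of each word
        for (char c : base.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                name.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            } else {
                upperNext = true; // separators such as '_' start a new word
            }
        }
        return name.toString();
    }

    public static void main(String[] args) {
        String className = classNameFor("hello.c");
        // The Java file name has to match the class it contains.
        System.out.println(className + ".java");
    }
}
```

The sketch does nothing about two files that map to the same class name, or about names that start with a digit, which are exactly the cases where a guess like this needs a human to step in.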

So even with the simple classic 3-line "Hello world" program, each of the rules that were applied took various "real-world programming" heuristics into account to produce "real-world" Java code. It was not enough to produce correct, working code. In fact, Jazillian will not always produce correct, working code (for example, you may have multiple main() functions, or two "hello.c" files). But the code looks like it was written by a good Java programmer, not by a machine. This philosophy of "use heuristics to produce realistic code" is inherent in the design of Jazillian and most of its rules.

Example 2: strcpy() vs. String Assignment

The "Hello, World" example is pretty trivial, so here are a couple of better examples. The first example has to do with Strings. In C, programmers tend to copy strings a lot with strcpy(), while Java programmers very rarely make a copy of a String. That's because Java String objects are immutable: they can't be changed. So most Java programs tend to have many different String variables all pointing to the same memory location, because none of them can change the characters that are stored there.

So one of the Jazillian rules translates a "strcpy() call":

char *hello = "hello";
char greeting[6];
strcpy(greeting, hello);
...to just a Java assignment, which causes two String variables to point to the same place:
String hello = "hello";
String greeting = hello;
Obviously, having two locations in memory, each with the characters "hello" in them, is different from having two pointers pointing to a single memory location containing the characters "hello". I won't go into the details, but you can surely see how the behavior of the Java code might differ from the C code. But the good news is that the behavior may not differ. In fact, Java programmers are often pleasantly surprised at how rarely they really need to make multiple copies of Strings. I dare say, it's almost never. And there's more good news: surely some other rule must do something when it encounters C code like:
greeting[4] = '\0';
In fact, in this case, some other rule will set 'greeting' to some new String value, indeed without changing the value of the 'hello' variable. But your mileage may vary.
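Here is a small sketch of why sharing is safe. The truncate() helper and the use of substring() are assumptions about what a translation of the C truncation might look like, not a statement of what Jazillian actually emits.

```java
// Hypothetical illustration of String sharing; the helper is an assumption.
class StringSharingSketch {

    // One possible Java equivalent of the C statement s[index] = '\0':
    // build a new String instead of modifying the old one.
    static String truncate(String s, int index) {
        return s.substring(0, index);
    }

    public static void main(String[] args) {
        String hello = "hello";
        String greeting = hello;                // no copy is made
        System.out.println(greeting == hello);  // true: same object

        greeting = truncate(greeting, 4);       // greeting now refers to a new String
        System.out.println(greeting);           // hell
        System.out.println(hello);              // hello (unchanged)
    }
}
```

Because a String can never be modified in place, reassigning 'greeting' cannot disturb 'hello', which is what makes dropping the copy a reasonable default.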

The point here is that Java programmers just don't go around copying Strings all the time. Producing Java code that does this would simply be producing "C code with Java syntax", and would nullify some of the benefits of moving to Java in the first place (such as not having to keep calling strcpy() and not wasting memory and CPU time making lots of copies of Strings).

Example 3: Error Handling vs. Exception Handling

Another example of cases where not-perfectly-working code could be generated has to do with error handling. Many (maybe even most) C projects follow a convention of having every function return a flag to indicate whether the function actually worked or not. Java programmers do not follow this convention, and instead use the Exception feature for handling errors. So this C code:

struct person *p = malloc(sizeof(struct person));
if (!p) {
	printf("malloc failed");
}
...becomes this Java code...
try {
	Person p = new Person();
}
catch (OutOfMemoryError e) {
	System.out.println("malloc failed");
}
Many C library functions signal failure through their return value (often setting "errno" as well), and your C code checks for those errors. One of the transformation rules knows all the standard C functions and the error values they return (for example, fopen returns NULL on error, while open returns -1). This rule will detect that you are checking for these error conditions and place your error-recovery code in a catch block. This is all well and good, no problems so far.
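As a rough illustration of the fopen case, the sketch below shows the shape such a translation could take. Mapping fopen() to java.io.FileReader, and the class and method names, are assumptions made for this example; the text above does not say which Java classes Jazillian chooses.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical sketch of an fopen() check moved into a catch block.
class FopenRuleSketch {

    // C original (for comparison):
    //     FILE *f = fopen(path, "r");
    //     if (!f) { printf("cannot open\n"); }
    static String openStatus(String path) {
        try (FileReader in = new FileReader(path)) {
            return "opened";
        } catch (FileNotFoundException e) {
            // The body of the C "if (!f)" branch ends up here.
            return "cannot open";
        } catch (IOException e) {
            // Closing the reader can also fail; C code rarely checks fclose().
            return "cannot close";
        }
    }

    public static void main(String[] args) {
        System.out.println(openStatus("no-such-dir/no-such-file.txt"));
    }
}
```

The error-recovery code itself is unchanged; only its location moves, from an if-test on the return value to the catch block for the exception the Java call throws.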

If you're not a Java programmer yet, brace yourself: Java programmers almost never bother to catch OutOfMemoryError. Horror of horrors! "We were always taught to always check the return value of every function call!" you're yelling. Well, I'm sorry to break it to you, old timer, but these lazy kids today just don't bother to clutter their code with all those checks. You can like it or not, but that's the reality.

So the Jazillian rule that replaces error-checking codes with exception handling in fact will just discard code that handles a malloc() failure. I'm sorry, but that's what real Java programmers do, and that's one of the many little reasons why Java code is a lot more readable and maintainable.

Copyright 2003-2007 Jazillian, Inc. Java is a registered trademark of Sun Microsystems, Inc. in the U.S. or other countries.