Jissa
=====

Jissa is a tool for disassembling Java classfiles.

Of course, there are thousands of Java disassemblers out there. What makes
jissa different, or special if you like, is its ability to be fully
scriptable. This means you have complete control over the disassembly
output: while other disassemblers like javap have a static output format,
which only varies between class files, jissa can be taught to put out
almost anything depending on the classfile.

The primary use for this was to transform Java classes into
"Isabelle Micro Java" code. That's a special Java format consisting of
Isabelle definitions, used for proving Java code correct.
(For Isabelle, see http://isabelle.in.tum.de)
However, there should be countless other applications.

[ Quickstart                                                            ]
[ ==========                                                            ]
[                                                                       ]
[ make; make install; pick a .class file of your choice (foo.class). Do ]
[                                                                       ]
[     jissa -f readable foo.class     or                                ]
[     jissa -f jasmin foo.class                                         ]

Calling
=======

Call jissa by issuing

    jissa [flags] [-f formatfile] [-o outputfile.txt] input.class

As you may have guessed, input.class is the classfile you'd like to
disassemble, and outputfile.txt is where the disassembled output should go.
Formatfiles are special though. Before we discuss these, let's take a short
glance at the flags which may also be provided:

-i includepath    Appends includepath to the list of search paths.
                  includepath is a ;-separated list of directories.
-v                Gives you some additional information. Especially useful
                  if your formatfiles can't be found, as jissa will then
                  list all its search paths.
-h                Prints some help.

Formatfiles
===========

OK, let's turn to the formatfiles.
A formatfile is basically an ASCII text file describing how classes should
be disassembled.
(Or, to be more precise, what disassembled classes should look like.)
Suppose you have a class called "foo" with the source code

=======cut======= foo.java
class foo
{
    public int bar(int a,int b)
    {
        return a+b+77;
    }
}
=======cut=======

If we disassembled this using javap, the result would be

=======cut======= javap-output.txt
Compiled from foo.java
class foo extends java.lang.Object {
    public int bar(int, int);
    foo();
}

Method int bar(int, int)
   0 iload_1
   1 iload_2
   2 iadd
   3 bipush 77
   5 iadd
   6 ireturn

Method foo()
   0 aload_0
   1 invokespecial #3
   4 return
=======cut=======

Let's now slowly start to disassemble this class with jissa.
(Don't be scared: all this may sound as if, whatever you do with jissa,
you'll have to write a formatfile from scratch. Actually, that's seldom
necessary. There are a number of predefined formatfiles, all of which can
be adapted to specific situations.)
A minimal formatfile would be:

=======cut======= format.fmt
***
method_head     "Method $name\n"
=======cut=======

Now, by calling

    jissa -f format.fmt -o foo.j foo.class

you get

=======cut======= foo.j
Method bar
Method
=======cut=======

What happened here? Jissa disassembled all method names, because you told
it to in the formatfile. It also used the format specified there.
That's not much, but it's a start. Now let's try to disassemble the opcodes.

=======cut======= format.fmt
***
method_head     "Method $name"
op              "\n\t"
method_tail     "\nEnd\n"
aload_0         "aload 0"
iload_1         "iload 1"
iload_2         "iload 2"
iadd            "iadd"
bipush          "bipush <$arg[0]>"
ireturn         "ireturn"
return          "return"
invokespecial   "invokespecial <$arg[0]>"
=======cut=======

    # jissa -f format.fmt -o foo.j foo.class

now results in

=======cut======= foo.j
Method bar
        iload 1
        iload 2
        iadd
        bipush <77>
        iadd
        ireturn
End
Method
        aload 0
        invokespecial <3>
        return
End
=======cut=======

So we did actually manage to disassemble the method names and opcodes.
There's a problem with the formatfile however.
As we only gave a name to a small subset of all opcodes, only these opcodes
will be disassembled. This might produce unexpected results if you happen to
forget an opcode which actually appears in your classfiles: the
disassembled output will then be, strictly speaking, wrong.
You might want to include a special file (whoops, didn't I mention
includefiles yet?) to make all undefined opcodes produce an error message.
Our formatfile would now look as follows:

=======cut======= format.fmt
***
include error.inc
method_head     "Method $name"
op              "\n\t"
method_tail     "\nEnd\n"
aload_0         "aload 0"
iload_1         "iload 1"
iload_2         "iload 2"
iadd            "iadd"
bipush          "bipush <$arg[0]>"
ireturn         "ireturn"
return          "return"
invokespecial   "invokespecial <$arg[0]>"
=======cut=======

The include directive should be fairly self-explanatory. error.inc is
searched for in the same directories as the main formatfile.
You could now look into error.inc to see exactly why and how this works.
If you'd prefer undefined-opcode errors to go to stderr instead of stdout,
you'd have to edit error.inc and change all lines as shown here for the
nop opcode:

(error.inc line 1)
    nop "Opcode [00] not supported! (nop)"
==>
    nop ;print stderr "Opcode [00] not supported! (nop)"

It might not be clear at all why this is so. I'll try to explain.
(The following text assumes that you're familiar with the Perl scripting
language.)

When disassembling, all the classfile "information" is transformed into
a corresponding Perl program, into which the entries of the active
formatfile are inserted. The Perl program is then run to produce the final
output.
The statement we just encountered was a small Perl program which will be
"called" every time the corresponding opcode (nop) is encountered in the
classfile. Actually, an expanded statement will be called. The above would
be blown up to

    $pos=2;$nr=2;$linenumber="5 ";
    $op=0x00;$name="nop";
    print ;print stderr "Opcode [00] not supported! (nop)"

and then executed.
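
The expansion described above can be mimicked in a few lines. The following
is only a sketch, written in Python for illustration; it is not jissa's own
code (jissa generates Perl), and the function name expand_entry and its
parameters are hypothetical. It assumes, as the expanded statement above
suggests, that the entry text is appended directly after "print ".

```python
# Sketch only: mimics the expansion described above; this is not jissa's code.
# expand_entry and its parameter names are hypothetical.

def expand_entry(pos, nr, linenumber, op, name, entry):
    """Build the Perl statement for one opcode from its formatfile entry."""
    definitions = (
        f'$pos={pos};$nr={nr};$linenumber="{linenumber}"; '
        f'$op=0x{op:02x};$name="{name}"; '
    )
    # The entry text is placed right after "print ". A quoted string is
    # therefore printed to stdout, while an entry starting with ";" ends
    # that print early and runs whatever Perl code follows it.
    return definitions + "print " + entry


to_stdout = expand_entry(2, 2, "5 ", 0x00, "nop",
                         '"Opcode [00] not supported! (nop)"')
to_stderr = expand_entry(2, 2, "5 ", 0x00, "nop",
                         ';print stderr "Opcode [00] not supported! (nop)"')
print(to_stdout)
print(to_stderr)
```

The second string is the expanded statement shown above. The leading ";"
in the entry terminates the generated print, so the entry's own
print stderr is what produces the message.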
The definitions (e.g. $pos=2) define variables which can be used in the
print statements for the final output. This is how we got the method names
in the method_head example above.
In this case, $pos is the byte position of the opcode, and $nr is the
number. $linenumber is a formatted text of, yes, the line number. $op and
$name are the opcode number and name of the current opcode, respectively.

Your documentation directory should contain a file called vars.txt, which
defines all the variables which get set for the opcodes and all other
statements. There's also a PostScript file (flowchart.ps) which, in the
style of a finite state automaton, describes in which order the mentioned
statements might be executed (for example, a method_head will always come
before the corresponding method_tail).

One more word about formatfiles. You may have noticed that in the above
format.fmt examples, there were some asterisks ("*"s) at the start. These
asterisks separate global definitions from local definitions. Simply put,
everything above the asterisks gets passed directly to the Perl interpreter,
while everything below only gets executed if the corresponding
opcode/statement is encountered in the classfile. (The possibility to
execute some general Perl code before the disassembly is especially useful
for function definitions.)
Includefiles may also have global statements; here, however, you'll have to
place stars (sorry, asterisks) after and _before_ them. Look at
readable.inc to see what I mean.

References
==========

[Mey97] Jon Meyer & Troy Downing
        Java Virtual Machine. O'Reilly, 1997.

[Wal96] Larry Wall, Tom Christiansen & Randal L. Schwartz
        Programming Perl. O'Reilly, 1996.

[Nip99] Tobias Nipkow, David von Oheimb & Cornelia Pusch
        uJava: Embedding a Programming Language in a Theorem Prover.
        http://isabelle.in.tum.de/Bali/papers/MOD99.html

LICENSE
=======

Jissa is distributed under the GPL; see the file COPYING for details.

Matthias Kramm