If you want to parse a Java class, it is a good idea to fetch a Java grammar and let JavaCC do the work for you. However, I couldn't find a grammar for Java 7, so I decided to write my own parser. This parser can also read Groovy source code and most Java 8 source code (apart from defender methods, now called default methods).
Another reason to write the parser was to show how simple file and text manipulations can be. Groovy is optimized for this kind of task (at the cost of being a little slower). The parser consists of merely 263 lines, including a couple of comments and blank lines. I guess a Java version would be a lot more verbose.
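To give an impression of the kind of file handling meant here, the following is a minimal sketch and not part of the parser itself. The helper names findSourceFiles and countNonBlankLines are made up for this illustration; eachFileRecurse and readLines are standard Groovy extensions of java.io.File.

```groovy
import groovy.io.FileType

// Hypothetical helper: collects every .java and .groovy file below a folder.
List<File> findSourceFiles(File folder) {
    List<File> result = []
    folder.eachFileRecurse(FileType.FILES) { File file ->
        if (file.name.endsWith('.java') || file.name.endsWith('.groovy')) {
            result << file
        }
    }
    return result
}

// Hypothetical helper: counts the lines of a file that are not blank.
int countNonBlankLines(File file) {
    return file.readLines().count { it.trim() } as int
}
```

Walking a folder tree and reading a file each take a single call, which is roughly what keeps a Groovy parser of this kind short.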
License
Feel free to use the source code if you need it. Please note that I provide it on an "as-is" basis: if you use the source code, you do so at your own risk. You can download it here.
The regular expressions
The most important file of the project contains the regular expressions I use to extract classes, variables, methods, assignments and annotations:
    package de.beyondjava.VariableParser

    import java.util.regex.Pattern;

    class RegularExpressions {
        final static Pattern SPACE= ~ /(\s)*/
        final static Pattern IDENTIFIER = ~ /(\w|_|\$)+/
        final static Pattern MODIFIERS = ~ /((\s)*(final|volatile|transient|static|public|private|protected)\s)*/
        final static Pattern PARAMETERS = ~ /\((\w|_|,|\$|\s|=)*\)/
        final static Pattern ANNOTATION = ~ /@($IDENTIFIER)($SPACE($PARAMETERS))?$SPACE/
        final static Pattern ANNOTATIONS = ~ /($ANNOTATION)+/
        final static Pattern ASSIGNMENT = ~ /(?:(\s)*=(?:[^;])*;(\s)*)/
        final static Pattern VARIABLE_OR_METHOD_REGEXP = ~ /($ANNOTATIONS)?($MODIFIERS)$SPACE($IDENTIFIER)$SPACE($IDENTIFIER)($SPACE)?($PARAMETERS|$ASSIGNMENT)?/
        static final Pattern NON_EMPTY_SPACE= ~ /(\s)+/
        static final Pattern PACKAGEIDENTIFIER = ~ /(\w|_|\$|\.)+/
        static final Pattern packageRegExp = ~ /\b(package)$NON_EMPTY_SPACE($PACKAGEIDENTIFIER)(;|\b)/
        static final Pattern CLASS_REGEXP = ~ /($ANNOTATIONS)?($MODIFIERS)(class)$NON_EMPTY_SPACE($IDENTIFIER)/
        static final Pattern STRING_REGEXP = ~ /"(.)*"/
        // see http://ostermiller.org/findcomment.html
        static final String COMMENT_REGEXP = "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)"
        static final String JAVA_BLOCK_REGEXP = "(?:(\\s)*\\{(?:[^\\}])*\\}(\\s)*)"
    }

The main class
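As a rough picture of how a parser can put patterns like these to work, here is a minimal sketch. It is not the SimpleClassParser of this project: the class ParserSketch, its methods, and its three simplified patterns are assumptions made for illustration, standing in for COMMENT_REGEXP, packageRegExp and CLASS_REGEXP. The idea is to strip comments first, so that keywords inside comments cannot produce false matches, and then to search for the package and class names.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical sketch, not the project's parser.
class ParserSketch {
    // Simplified stand-ins for the patterns in RegularExpressions (assumptions).
    static final Pattern COMMENT = Pattern.compile('(?s)/\\*.*?\\*/|//[^\\n]*')
    static final Pattern PACKAGE = ~/\bpackage\s+([\w.]+)/
    static final Pattern CLASS_NAME = ~/\bclass\s+(\w+)/

    // Removes block and line comments. Naive: a "//" inside a string
    // literal is treated as a comment, too.
    static String stripComments(String source) {
        return COMMENT.matcher(source).replaceAll('')
    }

    // Returns the package and class name of the first class found,
    // or null entries if nothing matches.
    static Map<String, String> describe(String source) {
        String code = stripComments(source)
        Matcher pkg = PACKAGE.matcher(code)
        Matcher cls = CLASS_NAME.matcher(code)
        return [packageName: pkg.find() ? pkg.group(1) : null,
                className  : cls.find() ? cls.group(1) : null]
    }
}
```

Calling ParserSketch.describe with the text of a source file yields a map with the keys packageName and className. Unlike the patterns of the project, this sketch ignores annotations and modifiers.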
    package de.beyondjava.VariableParser

    import java.util.regex.Matcher
    import java.util.regex.Pattern
    import static RegularExpressions.*

    /**
     * This is a simple class parser that reads every java or groovy class in a folder
     * and returns a list of class descriptions.
     */
    class SimpleClassParser {
        /** Parse a file or a folder recursively. */
        public List

The class definition
This class describes a single parsed class.
    package de.beyondjava.VariableParser;

    import java.util.List;
    import java.util.regex.Matcher
    import java.util.regex.Pattern;
    import static RegularExpressions.*

    /**
     * Description of a class, including a list of its variables and methods (but ignoring the parameter lists of the methods)
     */
    public class ClassDefinition {
        String packageName;
        String className;
        List

    package de.beyondjava.VariableParser;

    import static RegularExpressions.*
    import java.util.List;
    import java.util.regex.Matcher

    /**
     * This is the definition of a method (without the implementation details and without the parameter list).
     */
    public class MethodDefinition {
        List
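To illustrate what a method description without parameter list and implementation boils down to, here is a minimal sketch. The class MethodSketch, its fields, and its simplified HEADER pattern are assumptions for illustration only and are not the MethodDefinition class of this project.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical sketch of a method description: modifiers, return type and name.
class MethodSketch {
    // Simplified pattern (assumption): modifiers, a type, a name and an opening parenthesis.
    // It handles neither annotations nor generic return types such as List<String>.
    static final Pattern HEADER = ~/((?:(?:public|private|protected|static|final)\s+)*)([\w.]+)\s+(\w+)\s*\(/

    List<String> modifiers
    String type
    String name

    // Returns a description of the first method header found, or null if there is none.
    static MethodSketch parse(String declaration) {
        Matcher m = HEADER.matcher(declaration)
        if (!m.find()) {
            return null
        }
        return new MethodSketch(
                modifiers: m.group(1).tokenize(),
                type: m.group(2),
                name: m.group(3))
    }
}
```

For a declaration such as 'public static String getName() {', the sketch collects the modifiers public and static, the type String and the name getName. A variable declaration has no opening parenthesis, so parse returns null for it.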
The description of a variable
... is almost identical to the description of a method:
    package de.beyondjava.VariableParser;

    import java.util.regex.Matcher
    import java.util.regex.Pattern;
    import static RegularExpressions.*

    /**
     * This is the definition of a variable.
     */
    class VariableDefinition {
        List

The JUnit test and example classes
    package de.beyondjava.VariableParser;

    import java.util.List;
    import de.beyondjava.Beans.Address
    import groovy.util.GroovyTestCase;

    class VariableParserTest extends GroovyTestCase {
        public void testParser() {
            SimpleClassParser parser = new SimpleClassParser()
            List

Alternatives
Of course, there are more professional class parsers out there. For instance, you can use the class parser and AST generator of Eclipse. Two small frameworks making Eclipse JDT easily accessible are
- Spoon (under CeCILL-C license - the French equivalent to LGPL)
- Jexast (also see their GitHub repository).