
Parsing of external DSL

Added: May 09, 2010

Tags: ANTLR DSL book java

I am slowly working my way through Domain Specific Languages by Martin Fowler, and I am enjoying it as much as I expected. Apart from the main theme (building DSLs, parsing them and processing them), it is full of little gems that make it far more useful than I had imagined. But I would not expect anything less from Martin Fowler.

I have built a few internal DSLs before, always thinking that external DSLs could be more useful for developers "stuck" in the Java world, where internal DSLs are rather limited in how they can look and in when they can be processed (at compilation). The big plus of this book is that it shows ANTLR can be useful and is not very complicated, as long as you don't want to build a Turing-complete language. The point of this exercise is to check how simple it is to use.

I was not able to come up with a nice example of my own, but the book helped here too. I really liked the example showing an internal DSL for an access control list. It reminded me of something I worked on years ago, but Martin solved it in a much better way. I don't mean the use of a DSL for configuring the ACL, but his semantic model and the way it processes security checks, using the Specification pattern. He provides several implementations (Java, Ruby), and I decided to base the format of my external DSL on his internal DSL in Ruby and to use it as an example in Java.

I don't want to repeat his work here, so I will show only the text I want to parse, the grammar and how to load it. I will skip the creation of the semantic model, because it would be ripped from the book directly and I don't want to do that, not to mention that it would not be useful for my experiment. I don't want to hone my copy&paste skills here :-) Instead I will only log the information parsed from the file.

What I want to parse

As I said, he uses the following configuration in his example; I have only removed the ':' characters that form symbols in the Ruby original.

The point is to list rules that allow or refuse access to some resource, based on when a person requests it and which department they belong to.

allow {
   department mf
   ends 2008, 10, 18
}
refuse department finance
refuse department audit
allow {
  gradeAtLeast director
  during 1100, 1500
  ends 2008, 5, 1
}
refuse {
  department k9
  gradeAtLeast director
}
allow department k9

Initial grammar that parses the configuration

grammar Acl;

acl	:	(allows | refuses)+ EOF;
allows	:	'allow' what;
refuses	:	'refuse' what;

what	:	condition | '{' condition+ '}';

condition :	department | ends | grade_at_least | during;
	

department : 'department' ID;
	
ends	:	'ends' INT ',' INT ',' INT;

grade_at_least : 'gradeAtLeast' ID;
	
during	:	'during' INT ',' INT;

	
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;

INT :	'0'..'9'+;

WS  :   (' ' | '\t' | '\r' | '\n' ) + {skip();};

ILLEGAL	: .;
The token rules ID and INT were generated by ANTLRWorks. The token rules WS and ILLEGAL come from the book; ILLEGAL is there so that the parser stops when no other token rule matches. The use of EOF in the topmost rule was also described in the book. The rest is my own creativity. I think it is not very complicated or hard to read for now. It will get more complicated shortly, when I add some clutter.
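
To make the token rules a bit more concrete, here is a rough plain-Java sketch of how a line of the configuration gets split into tokens. It is only an illustration and does not use ANTLR at all: the class TokenSketch, its regular expression and the LIT label (standing for the quoted literals in the grammar, such as 'ends' or ',') are my own assumptions, not what the generated lexer looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the token rules; ANTLR generates its own lexer.
class TokenSketch {
    // Alternatives are tried in order: grammar literals, ID, INT, WS,
    // and finally ILLEGAL as a catch-all for any single character.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<LIT>(?:allow|refuse|department|ends|gradeAtLeast|during)(?![a-zA-Z0-9_])|[{},])"
        + "|(?<ID>[a-zA-Z_][a-zA-Z0-9_]*)"
        + "|(?<INT>[0-9]+)"
        + "|(?<WS>[ \\t\\r\\n]+)"
        + "|(?<ILLEGAL>.)",
        Pattern.DOTALL);

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group("WS") != null) {
                continue; // same effect as skip(): whitespace never reaches the parser
            }
            String kind;
            if (m.group("LIT") != null) kind = "LIT";
            else if (m.group("ID") != null) kind = "ID";
            else if (m.group("INT") != null) kind = "INT";
            else kind = "ILLEGAL";
            tokens.add(kind + "(" + m.group() + ")");
        }
        return tokens;
    }
}
```

Calling TokenSketch.tokenize on a line of the configuration returns the labelled tokens in order, and any character the grammar does not know ends up as an ILLEGAL token instead of silently disappearing.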

Initial loader

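Next I would like a piece of code that invokes the generated parser and reads the file. Again, this is taken directly from the book; there is not much point in doing it differently.
import java.io.Reader;

import org.antlr.runtime.ANTLRReaderStream;
import org.antlr.runtime.CommonTokenStream;

public class AclLoader {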
    private Reader input;
    private AclParser parser;

    public AclLoader(Reader input) {
        this.input = input;
    }

    public void load() {
        try {
            AclLexer lexer = new AclLexer(new ANTLRReaderStream(input));
            parser = new AclParser(new CommonTokenStream(lexer));
            parser.acl();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
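AclLexer and AclParser were generated by ANTLR from my grammar. The loader should parse the file, but it does nothing with the result yet. Let's add some tests.
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.junit.Test;

public class AclLoaderTest {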
    @Test
    public void readsValidFile() throws IOException {
        Reader input = new FileReader("resources/acl.txt");
        AclLoader loader = new AclLoader(input);
        loader.load();
    }

    @Test(expected = RuntimeException.class)
    public void failsForInvalidFile() throws IOException {
        Reader input = new FileReader("resources/acl-bad.txt");
        AclLoader loader = new AclLoader(input);
        loader.load();
    }
}
I am a bit confused by the second one: it fails, because the invalid file is not rejected. Hmm, I read something about errors in the book. Let's review it...

Error handling

I forgot to add error reporting to my parser. After seeing the advantages in the book, I am going to add a superclass for the parser, so that I can add custom code without being afraid that ANTLR will delete it the next time I generate the parser. It will also be useful later, when processing the parser's output. This is taken directly from the book, but simplified for my needs.
import java.util.ArrayList;
import java.util.List;

import org.antlr.runtime.Parser;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.RecognizerSharedState;
import org.antlr.runtime.TokenStream;

public abstract class BaseAclParser extends Parser {
    // Errors collected during parsing instead of being printed and forgotten
    private final List<RecognitionException> errors = new ArrayList<RecognitionException>();

    public BaseAclParser(TokenStream input, RecognizerSharedState state) {
        super(input, state);
    }

    @Override
    public void reportError(RecognitionException e) {
        errors.add(e);
    }

    public boolean hasErrors() {
        return !errors.isEmpty();
    }

    public String getErrorReport() {
        StringBuilder result = new StringBuilder();
        for (RecognitionException e : errors) result.append(e).append("\n");
        return result.toString();
    }
}
I have to tell ANTLR to use it, so I am going to modify the beginning of my grammar file:
grammar Acl;
options {superClass = BaseAclParser;}
...
Finally, I check for errors at the end of parsing in AclLoader (again taken from the book).
if (parser.hasErrors()) throw new RuntimeException("Loading failed: "+parser.getErrorReport());
Now both tests pass, but I do not like the error message I am getting. I don't want to spend time on cryptic error messages now, so I will ignore it.

Experiments with processing content

My parser seems to read a correct configuration file, but it ignores everything it learns there. It is time to process that information. Now it will get a bit messy, but I hope I will make it (really, I have not done this myself yet, so let's go back to the book and reread that chapter so I can pretend I am smart again).

ANTLR allows custom code to be invoked from the grammar, and I am going to try a simple thing to see if it works as expected. I modify the grammar as follows:

allows	:	'allow' {allowBlock();} what;
That should invoke the method allowBlock in BaseAclParser when 'allow' is parsed, but before parsing of the conditions starts. Let's add a simple implementation of allowBlock:
    protected void allowBlock() {
        System.out.println("allowBlock");
    }
As I said at the beginning, I am not going to build a semantic model, just use logging instead. If I wanted to do something more serious, I would set a Context Variable here; alternatively I could propagate complete information about the parsed text up from the inner rules, but that would get pretty messy. I do a similar thing for the 'refuse' block:
refuses	:	'refuse' {refuseBlock();} what;
BaseAclParser:
    protected void refuseBlock() {
        System.out.println("refuseBlock");
    }
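
The Context Variable idea mentioned above could look roughly like the following sketch. It is plain Java without ANTLR, and it is not the semantic model from the book: the class RuleRecorder, the generic condition hook and the finishBlock method are hypothetical names I made up to show how a hook for a block can remember "where we are" so that later hooks know which block they belong to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder, not the semantic model from the book.
class RuleRecorder {
    private final List<String> rules = new ArrayList<>();

    // The context variable: the block currently being filled, or null if none.
    private StringBuilder current;

    void allowBlock()  { startBlock("allow"); }
    void refuseBlock() { startBlock("refuse"); }

    // Hypothetical hook that a condition rule would call with a description.
    void condition(String description) {
        if (current == null) {
            throw new IllegalStateException("condition outside of a block");
        }
        current.append(' ').append(description);
    }

    // Starting a new block closes the previous one first.
    private void startBlock(String kind) {
        finishBlock();
        current = new StringBuilder(kind);
    }

    // Has to be called once after parsing, to flush the last block.
    void finishBlock() {
        if (current != null) {
            rules.add(current.toString());
        }
        current = null;
    }

    List<String> rules() {
        return rules;
    }
}
```

The price of this approach is hidden state in the parser superclass: the hooks only work if they are called in the order the grammar guarantees, and somebody has to remember the final finishBlock call.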

Processing conditions

Analogously to the previous code, I modified the grammar to inform the parser about the parsed content. Any time a condition rule is successfully parsed, it invokes a custom method in BaseAclParser that processes it.
grammar Acl;
options {superClass = BaseAclParser;}

acl	:	(allows | refuses)+ EOF;
allows	:	'allow' {allowBlock();} what;
refuses	:	'refuse' {refuseBlock();} what;

what	:	condition | '{' condition+ '}';


condition : department | ends | grade_at_least | during;
	

department : 'department' depName=ID
                {departmentCondition($depName.text);};
	
ends : 'ends' year=INT ',' month=INT ',' day=INT
          {endCondition($year.text, $month.text, $day.text);};

grade_at_least : 'gradeAtLeast' grade=ID 
                    {gradeAtLeastCondition($grade.text);};
	
during	: 'during' after=INT ',' before=INT 
             {duringCondition($after.text, $before.text);};

	
ID  :	('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;

INT :	'0'..'9'+;

WS  :   (' ' | '\t' | '\r' | '\n' ) + {skip();};

ILLEGAL	: .;
Of course, processing in my case means just logging it:
    protected void departmentCondition(String name) {
        System.out.println("\tdepartmentCondition: " + name);
    }

    protected void endCondition(String year, String month, String day) {
        System.out.println("\tendCondition: "+year + "-" + month + "-" + day);
    }

    protected void gradeAtLeastCondition(String grade) {
        System.out.println("\tgradeAtLeastCondition: " + grade);
    }

    protected void duringCondition(String after, String before) {
        System.out.println("\tduringCondition: <" + after + ", " + before + ">");
    }
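
The hook methods above receive everything as text, because $year.text and friends are plain strings. If the values were needed as real dates and times, a conversion along these lines might do. This is only a sketch under assumptions: the class ConditionValues is a made-up name, it needs Java 8 or newer for java.time, and reading a value like 1100 as 11:00 on a 24-hour clock is my guess at what 'during' means.

```java
import java.time.LocalDate;
import java.time.LocalTime;

// Hypothetical helper; names and the HHMM interpretation are assumptions.
class ConditionValues {
    // Turns the three INT texts of an 'ends' condition into a date.
    // Throws java.time.DateTimeException for impossible dates such as month 13.
    static LocalDate toDate(String year, String month, String day) {
        return LocalDate.of(
            Integer.parseInt(year),
            Integer.parseInt(month),
            Integer.parseInt(day));
    }

    // Turns one INT text of a 'during' condition into a time of day,
    // assuming it is written as HHMM on a 24-hour clock.
    static LocalTime toTime(String hhmm) {
        int value = Integer.parseInt(hhmm);
        return LocalTime.of(value / 100, value % 100);
    }
}
```

A hook such as endCondition could then call ConditionValues.toDate(year, month, day) and pass the result on, so that invalid dates are rejected while loading rather than later.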

What I have got

As a result I am getting the following output:
allowBlock
	departmentCondition: mf
	endCondition: 2008-10-18
refuseBlock
	departmentCondition: finance
refuseBlock
	departmentCondition: audit
allowBlock
	gradeAtLeastCondition: director
	duringCondition: <1100, 1500>
	endCondition: 2008-5-1
refuseBlock
	departmentCondition: k9
	gradeAtLeastCondition: director
allowBlock
	departmentCondition: k9
allowBlock
refuseBlock
	departmentCondition: k9
It looks suspiciously similar to the original file :-) but I am happy with the outcome. I am going to play with it more, to add building of the semantic model and more unit tests. It should be easy, because the book contains all the code for the semantic model; I just need to adapt my hook methods. Unfortunately I don't think I can publish it here completely, since it comes from a still unpublished book...