gpollice

ANTLR and Ruby—Perfect Together

In Ruby, Tools on January 30, 2012 at 9:33 am

I’ve been playing around with Ruby quite a bit lately. It’s a good language for building tools for my testing course. It’s also a great metaprogramming language. This weekend I found that it’s a pretty good language for building code analyzers and other language tools as well, especially if you have a great language processing tool like ANTLR (ANother Tool for Language Recognition).

I wanted to write a tool that would read in source code for a function (in a simple language that I created) and produce a list of nodes with information about definitions and uses of variables in a textual form that my students would be able to input into their programs for further analysis of control flow and data flow.

I use ANTLR as the tool for building lexers and parsers in my compiler construction course. But I’ve always used it by writing Java and having it generate Java. Certainly my tool could be written in any language since it just had to produce textual output, so writing the ANTLR-based processor in Java was my first thought. Then I thought that it would be fun to learn something new and have some fun and I thought about producing the tool in Ruby.

I knew that ANTLR has an option to produce Ruby and thought I’d take a crack at that. With a little bit of Web searching I found the ANTLR for Ruby project. All I had to do was load a Ruby gem and I was off and running. Well, at least I was off and crawling along.

I spent most of Saturday looking at documentation and understanding what an ANTLR grammar with Ruby actions would look like. I also wanted Ruby because I had the feeling that the plumbing around the resulting lexer and parser would be much simpler to write than it is in Java—and I was right!

I set up my project, initialized a Git repository and starting writing tests for the lexer and parser. This was slow going at first. I was learning and trying to write good code. Most of all though, it was fun.

The documentation on ANTLR for Ruby is pretty minimal. It seems that Kyle Yetter, the author, decided to do something else and there are a lot of missing details missing from the Web page. However, there is more detail and the complete API for antlr3 on the actual project pages. It’s not the best documentation, but it’s enough to get you going. Once I muddled through the basics, I picked up steam and finished the application comfortably on Sunday.

I found that the Ruby grammar file was simple and easy to read. The connections to the parser and lexer, and setting up my own internal structures for capturing nodes, definitions, and uses of variables was almost trivial.

By the end of the weekend I had written the following number of lines (just using wc, so blank lines and comments are include here):

  • The ANTLR grammar file: 239
  • The script to run the processor and produce the output: 36
  • Utility classes for the nodes, def-use entries, and exceptions: 60
  • Unit tests: 285
  • Rake files: 31

So, the total number of lines I actually wrote came to: 651 lines. ANTLR generated 3059 lines of code. I think this is a pretty good output for a weekends hackfest. I’ll post some of the code and the grammar for the language in future posts.

I encourage you to try your hand at something like this. It’s fun and you get to learn a lot. One thing I will say is that if you do this, make sure that you’ve got some really good documentation on the tools you’re going to use available. During the weekend I had both Programming Ruby 1.9 and The Definitive ANTLR Reference open in my PDF reader. I didn’t need the ANTLR book that much. Once you’ve written an ANTLR grammar, it’s pretty straight forward. But I was constantly searching the Ruby book for the right classes and methods. These two, combined with the on-line Ruby documentation made the programming adventure fun and productive.

Advertisements
  1. Hi,

    I tried this today. I would like to be able to add all action routines to a separate ruby module similar to treetop/citrus. Did you manage to achieve this?

    • I have not tried that yet. I’ll probably do this for my compiler class in the Fall. That means I might get to it this summer. I would think that it’s not that hard if you either define the module in a separate file or files and then make sure to require the files in the appropriate header sections of the lexer and parser.

      –Gary

  2. I am using an already existing grammar. But I do not manage to connect the generated parser with the intended processing of the detected nodes. Could you provide your example?

    • Since I can’t seem to be able to post an archive file on WordPress, here’s my grammar file:

      /**
      * Created on Jan. 28, 2012
      *
      * Copyright (C) 2012, Gary Pollice, Worcester Polytechnic Institute, gpollice@cs.wpi.edu.
      *
      * The program is provided under the terms and conditions of the Eclipse Public License Version 1.0
      * (“EPL”). A copy of the EPL is available at http://www.eclipse.org/org/documents/epl-v10.php.
      *
      * Grammar for a def-use analyzer.
      *
      * This is a basic grammar for a simple imperative language. It does not take into account
      * any precedence of operators, etc. Its sole purpose is to do a structural analysis of the
      * textual input and produce data that can be used to create a graph representation of the
      * program and perform some control flow and data flow analyses.
      */
      grammar DefUse;

      options {
      language = Ruby;
      backtrack=true;
      memoize=true;
      }

      @parser::header {
      require ‘du_exception’
      require ‘du_parser_utils’
      }

      @parser::members {
      attr_reader :du_nodes

      def report_error( e = $! )
      @state.error_recovery and return
      @state.syntax_errors += 1
      @state.error_recovery = true
      display_recognition_error( e )
      @parse_had_errors = true
      end

      # Create a new DefUseNode and push it on the du_nodes array
      def next_def_use_node(node_type = DefUseUtil::STATEMENT)
      n = DefUseNode.new(@du_node_number, node_type)
      @du_node_number += 1
      @du_nodes < 0
      }
      ;

      assignment
      : ID ‘=’ expression {
      @current_node.add_entry DefUseEntry.new($ID.line, $ID.text, DefUseUtil::DEF)
      @current_node = next_def_use_node
      }
      ;

      input
      : ‘input’ variable_def_list { @current_node = next_def_use_node }
      ;

      print
      : ‘print’ expr_list {
      # Only write a node if there were uses
      @current_node = next_def_use_node if @current_node.dus.size > 0
      }
      ;

      alternative
      : ‘if’ expression
      {
      @current_node.type = DefUseUtil::BRANCH
      @current_node = next_def_use_node(DefUseUtil::TRUE_TARGET)
      @current_node = next_def_use_node
      }
      block (‘else’
      {
      @current_node.type = DefUseUtil::FALSE_TARGET
      @current_node = next_def_use_node
      }
      block)?
      {
      @current_node.type = DefUseUtil::JUNCTION
      @current_node = next_def_use_node
      }
      ;

      iteration
      : ‘while’ expression
      {
      @current_node.type = DefUseUtil::BRANCH
      @current_node = next_def_use_node(DefUseUtil::TRUE_TARGET)
      @current_node = next_def_use_node
      }
      block
      {
      @current_node.type = DefUseUtil::JUNCTION
      @current_node = next_def_use_node
      }
      ;

      block
      : statement
      | ‘{‘ statement_list? ‘}’
      ;

      expr_list
      : expression (‘,’ expression)*
      ;

      variable_def_list
      : variable_def (‘,’ variable_def)*
      ;

      variable_def
      : ID { @current_node.add_entry DefUseEntry.new($ID.line, $ID.text, DefUseUtil::DEF) }
      ;

      expression
      : operand
      | operand operator expression
      | ‘(‘ expression ‘)’
      | ‘-‘ expression
      | ‘!’ expression
      | function_call
      ;

      function_call
      : ID ‘(‘ expr_list? ‘)’
      ;

      operand
      : variable_use
      | NUMBER
      | ‘true’
      | ‘false’
      ;

      variable_use
      : ID { @current_node.add_entry DefUseEntry.new($ID.line, $ID.text, DefUseUtil::USE) }
      ;

      operator
      : ‘<' | '’ | ‘>=’ | ‘==’ | ‘!=’ | ‘!’ | ‘&&’ | ‘||’ | ‘^’
      | ‘+’ | ‘-‘ | ‘*’ | ‘/’ | ‘%’ | ‘**’
      ;

      //******************************** Lexer Rules ********************************/
      /**
      * Comments begin with the ‘#’ character and continue to the end of the line.
      * The new line is not considered part of the comment.
      */
      COMMENT
      : ‘#’ (~’\n’)* { skip(); }
      ;

      // White space
      WS
      : (‘ ‘ | ‘\t’ | ‘\n’ | ‘\r’) { skip(); }
      ;

      // Reserved words
      DEF : ‘def’ ;
      RW_END : ‘end’ ; // Cannot use END here–Ruby issue

      // Separators
      COMMA : ‘,’ ;
      SEMICOLON : ‘;’ ;

      ID
      : ID_START (ID_REST)*
      ;

      fragment
      ID_START
      : LETTER
      ;

      NUMBER
      : DIGIT+
      ;

      fragment
      ID_REST
      : LETTER | DIGIT
      ;

      fragment
      LETTER
      : (‘a’..’z’|’A’..’Z’|’_’)
      ;

      fragment
      DIGIT
      : (‘0’..’9′)
      ;

  3. thanks Gary,

    I see that you build an abstract syntax tree by adding ruby code in action blocks thereby utilizing a helper framework (du_exception, du_parser_utils).

    This makes the .g – file highly dependent on the target language. I was hoping that there are hooks in the generated parser which allows to mixin the action blocks.

    Thanks again

    Bernhard

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: