Parsing with crate LALRPOP

09/02/2019 — In Parser, Rust

How to write a parser using rust and lalrpop crate?

In this article, we are going to walk through the implementation of parsing a string like this:

This is a {dog, cat}.

And we want to expand this string into the following two strings:

This is a dog.
This is a cat.

As we can see, these strings in the brackets separated by comma (,) are expected to be expanded into different pieces and concatenated separately with the rest of the template as the final string.

And what's more, we would expect the brackets could be nested, like this:

This is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}.

So this complicated string could be expanded into the following strings according to our rule:

This is a blue apple.
This is a blue banana.
This is a yellow apple.
This is a yellow banana.
This is a colorful apple.
This is a colorful banana.
This is a green dog.
This is a green doggy.
This is a green cat.

How to implement?

These rules work like a little computer programming language. So we can use the common techniques that are used to implement a computer programming language like c, rust, javascript, etc.

We need to parse the string into an Abstract Syntax Tree, then we generate the string results by walking on the Abstract Syntax Tree.

The crate lalrpop is created for this purpose. It can be used to generate a parser by using a rule definition file provided by us.

Here it is:

  • Abstract Syntax Tree definition file: ast.rs
///
/// Expr
/// = ( <Item::Word> | '{' (<Expr>,)* <Expr> '}' )*
///
pub struct Expr(pub Vec<Box<ExprItem>>);
/// ExprItem
pub enum ExprItem {
/// Word
Word(String),
/// Candidates = '{' (<Expr>,)* <Expr> '}'
Candidates(Vec<Box<Expr>>),
}
  • Rule definition file: parser.lalrpop
use std::str::FromStr;
use crate::ast::{Expr, ExprItem};
use lalrpop_util::ErrorRecovery;
grammar<'err>(
errors: &'err mut Vec<
ErrorRecovery<usize, Token<'input>, &'static str>
>
);
pub Expr: Box<Expr> = {
<v: (<ExprItem>)*> => Box::new(Expr(v)),
};
ExprItem: Box<ExprItem> = {
<s:r"[^{},]*"> => Box::new(ExprItem::Word(s.to_string())),
"{" <v:Comma<Expr>> "}" => Box::new(ExprItem::Candidates(v)),
};
Comma<T>: Vec<T> = {
<v:(<T> ",")*> <e:T> => {
let mut v = v;
v.push(e);
v
},
};

Next, we can write a function to expand the Expr structure.

use crate::ast;
use crate::parser;
/// expand a string expression
pub fn expand_string<'input, S: AsRef<str>>(
expression: &'input S,
) -> Result<
Vec<String>,
lalrpop_util::ParseError<usize, parser::Token<'input>, &'static str>
> {
let mut errors = Vec::new();
let expr_parser = parser::ExprParser::new();
let expr_str = expression.as_ref();
let expr = expr_parser.parse(&mut errors, expr_str)?;
let result = expand_expr(&expr);
Ok(result)
}
/// expand an ast::Expr expression
pub fn expand_expr(expr: &ast::Expr) -> Vec<String> {
let mut result_list = Vec::<String>::new();
let mut stack = Vec::new();
expand_expr_impl(expr, 0, &mut stack, &mut |s: &mut Vec<String>| {
result_list.push(s.join(""));
});
result_list
}
fn expand_expr_impl(
expr: &ast::Expr,
idx: usize,
stack: &mut Vec<String>,
callback: &mut dyn FnMut(&mut Vec<String>),
) {
// end of current `expr`
if idx == expr.0.len() {
callback(stack);
return;
}
let item: &ast::ExprItem = &expr.0[idx];
match *item {
ast::ExprItem::Word(ref val) => {
stack.push(val.to_owned());
expand_expr_impl(expr, idx + 1, stack, callback);
stack.pop();
}
ast::ExprItem::Candidates(ref candidates) => {
if candidates.is_empty() {
expand_expr_impl(expr, idx + 1, stack, callback);
} else {
for candidate in candidates {
expand_expr_impl(&candidate, 0, stack, &mut |res| {
// continue to expand next item in `expr`
// if current `candidate` ends
expand_expr_impl(expr, idx + 1, res, callback);
});
}
}
}
}
}

Finally, we test our expand_expr function like this:

fn main() {
let expressions = vec![
"this is a {dog, cat}.",
"this is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}."
];
for expr_str in expressions.iter() {
println!(">> expanding string: {}", expr_str);
match expand::expand_string(expr_str) {
Err(err) => {
println!("expand failure: {}", err);
}
Ok(ref result_list) => {
for candidate in result_list {
println!("{}", candidate);
}
}
}
}
}

And it will output:

>> expanding string: this is a {dog, cat}.
this is a dog.
this is a cat.
>> expanding string: this is a {{blue , yellow , colorful }{apple, banana}, green {dog{, gy}, cat}}.
this is a blue apple.
this is a blue banana.
this is a yellow apple.
this is a yellow banana.
this is a colorful apple.
this is a colorful banana.
this is a green dog.
this is a green doggy.
this is a green cat.

Done.