Metadata-Version: 2.4
Name: grammarinator
Version: 23.7.post201+ga34191ac3
Summary: Grammarinator: Grammar-based Random Test Generator
Home-page: https://github.com/renatahodovan/grammarinator
Author: Renata Hodovan, Akos Kiss
Author-email: hodovan@inf.u-szeged.hu, akiss@inf.u-szeged.hu
License-Expression: BSD-3-Clause
Platform: any
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE.rst
Requires-Dist: antlerinator>=1!3.0.0
Requires-Dist: antlr4-python3-runtime==4.13.2
Requires-Dist: autopep8
Requires-Dist: enlighten
Requires-Dist: flatbuffers==24.12.23
Requires-Dist: inators
Requires-Dist: jinja2
Requires-Dist: regex
Requires-Dist: xxhash
Dynamic: license-file
Dynamic: license-expression

=============
Grammarinator
=============
*ANTLRv4 grammar-based test generator*

.. image:: https://img.shields.io/pypi/v/grammarinator?logo=python&logoColor=white
   :target: https://pypi.org/project/grammarinator/
.. image:: https://img.shields.io/pypi/l/grammarinator?logo=open-source-initiative&logoColor=white
   :target: https://pypi.org/project/grammarinator/
.. image:: https://img.shields.io/github/actions/workflow/status/renatahodovan/grammarinator/main.yml?branch=master&logo=github&logoColor=white
   :target: https://github.com/renatahodovan/grammarinator/actions
.. image:: https://img.shields.io/coveralls/github/renatahodovan/grammarinator/master?logo=coveralls&logoColor=white
   :target: https://coveralls.io/github/renatahodovan/grammarinator
.. image:: https://img.shields.io/readthedocs/grammarinator?logo=read-the-docs&logoColor=white
   :target: http://grammarinator.readthedocs.io/en/latest/

.. start included documentation

*Grammarinator* is a random test generator / fuzzer that creates test cases
according to an input ANTLR_ v4 grammar. The motivation behind this
grammar-based approach is to leverage the large variety of publicly
available `ANTLR v4 grammars`_. It includes both a Python-based and a
high-performance C++ backend for generation.

A `trophy page`_ listing the issues found with *Grammarinator* is available on
the wiki.

.. _ANTLR: http://www.antlr.org
.. _`ANTLR v4 grammars`: https://github.com/antlr/grammars-v4
.. _`trophy page`: https://github.com/renatahodovan/grammarinator/wiki


Requirements
============

* Python_ >= 3.9
* Java_ SE >= 11 JRE or JDK (the latter is optional)

Additionally, for the C++ backend:

* C++20 compiler (e.g., GCC >= 11.0, Clang >= 13.0, MSVC >= 2019)
* CMake_ >= 3.10

.. _Python: https://www.python.org
.. _Java: https://www.oracle.com/java/
.. _CMake: https://cmake.org


Install
=======

To use *Grammarinator* in another project, it can be added to ``setup.cfg`` as
an install requirement (if using setuptools_ with declarative config):

.. code-block:: ini

    [options]
    install_requires =
        grammarinator

To install *Grammarinator* manually, e.g., into a virtual environment, use
pip_::

    pip install grammarinator

The above approaches install the latest release of *Grammarinator* from PyPI_.
Alternatively, for the development version, clone the project and perform a
local install::

    pip install .

.. _setuptools: https://github.com/pypa/setuptools
.. _pip: https://pip.pypa.io
.. _PyPI: https://pypi.org/


Usage
=====

As a first step, *Grammarinator* takes an `ANTLR v4 grammar`_ and creates a
test generator script in Python 3 or in C++. *Grammarinator* supports a subset
of the features of ANTLR grammars; the supported subset is described in the
Grammar overview section of the documentation. The produced generator can be
subclassed later to customize it further if needed.

Basic command-line syntax of test generator creation (Python or C++)::

    grammarinator-process <grammar-file(s)> -o <output-directory> --no-actions [--language hpp]

..

    **Notes**

    *Grammarinator* uses the `ANTLR v4 grammar`_ format as its input, which
    makes existing grammars (lexer and parser rules) easily reusable. However,
    because a fuzzer and a parser have inherently different goals, inlined
    code (actions and conditions, header and members blocks) is most probably
    not reusable and may even prevent proper execution. For first experiments
    with existing grammar files, ``grammarinator-process`` supports the
    command-line option ``--no-actions``, which skips all such code blocks
    during fuzzer generation. Once the inlined code is tuned for fuzzing, that
    option may be omitted.

.. _`ANTLR v4 grammar`: https://github.com/antlr/grammars-v4

Python-based Test Generation
----------------------------

Once a fuzzer has been generated and optionally customized, it can be executed
by the ``grammarinator-generate`` script (or by manually instantiating it in a
custom-written driver).

Basic command-line syntax of ``grammarinator-generate``::

    grammarinator-generate <generator> \
      -r <start-rule> -d <max-depth> \
      -o <output-pattern> -n <number-of-tests> \
      -t <transformer1> -t <transformer2>

C++-based Test Generation
-------------------------

After generating the C++-based fuzzer using ``grammarinator-process`` with the
``--language hpp`` flag, it needs to be built::

    python3 grammarinator-cxx/dev/build.py --clean \
        --generator <generator> \
        --includedir <include-dir> \
        --tools

Once built, the standalone generator can be run as follows::

    grammarinator-cxx/build/Release/bin/grammarinator-generate-<name> \
        -r <start-rule> -d <max-depth> \
        -o <output-pattern> -n <number-of-tests>

Note: The C++ backend can also be used as a custom mutator with libFuzzer.
Details about this are provided in the *LibFuzzer Integration* section of
the documentation.

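The ``grammarinator-generate`` invocation can also be scripted. The following
is a minimal sketch and not part of *Grammarinator*: ``build_generate_command``
is a hypothetical helper that only assembles the argument list from the options
mentioned in this README (``--population`` is assumed to take the directory of
the ``.grt*`` files), and the sample values are those of the HTML example.

.. code-block:: python

    import shlex


    def build_generate_command(generator, start_rule, max_depth, out_pattern,
                               count, transformers=(), serializer=None,
                               sys_path=None, population=None):
        """Assemble the argument list of a grammarinator-generate call.

        Hypothetical helper: it only mirrors the CLI options listed in
        the README and does not import or run Grammarinator itself.
        """
        cmd = ['grammarinator-generate', generator,
               '-r', start_rule, '-d', str(max_depth),
               '-o', out_pattern, '-n', str(count)]
        for transformer in transformers:
            cmd += ['-t', transformer]          # -t may be repeated
        if serializer is not None:
            cmd += ['-s', serializer]
        if sys_path is not None:
            cmd += ['--sys-path', sys_path]
        if population is not None:
            cmd += ['--population', population]  # assumed: a directory path
        return cmd


    cmd = build_generate_command(
        'HTMLCustomGenerator.HTMLCustomGenerator', 'htmlDocument', 20,
        'examples/tests/test_%d.html', 100,
        serializer='HTMLGenerator.html_space_serializer',
        sys_path='examples/fuzzer/')
    print(shlex.join(cmd))  # shell-quoted form of the command

Passing the resulting list to ``subprocess.run(cmd, check=True)`` would execute
the generator, provided that *Grammarinator* is installed and the fuzzer has
already been created with ``grammarinator-process``.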

Evolutionary Generation
=======================

Besides generating test cases from scratch based on the ANTLR grammar,
*Grammarinator* can also recombine existing inputs or mutate only a small
portion of them. To use these additional generation approaches, a population of
selected test cases has to be prepared. The preparation is done with the
``grammarinator-parse`` tool, which processes the input files with an ANTLR
grammar (possibly the same one as the generator grammar) and builds
*Grammarinator* tree representations from them (with ``.grt*`` extension). These
files encode the full derivation tree of the input, and can be reused across
different fuzzing strategies.

Basic command-line syntax of ``grammarinator-parse``::

  grammarinator-parse -g <grammar-file(s)> -r <start-rule> \
    -o <output-directory> <input_file(s)>

Given a population of such ``.grt*`` files, ``grammarinator-generate`` or
``grammarinator-generate-<name>`` can make use of them with the
``--population`` CLI option. If the ``--population`` option is set (for the
Python or C++ generator), then *Grammarinator* will choose a strategy
(generation, mutation, or recombination) randomly for each new test case.
Any unwanted strategy can be disabled with the corresponding
``--no-generate``, ``--no-mutate``, or ``--no-recombine`` option.

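The per-test-case strategy choice described above can be pictured with the
following sketch. It is an illustration only: ``choose_strategy`` is a
hypothetical function, the uniform choice among the enabled strategies is an
assumption (this README does not state how the strategies are weighted), and
mutation and recombination are assumed to need a population.

.. code-block:: python

    import random


    def choose_strategy(rng, has_population,
                        generate=True, mutate=True, recombine=True):
        """Pick one enabled strategy for the next test case.

        Hypothetical sketch: the keyword flags mirror --no-generate,
        --no-mutate and --no-recombine; uniform choice is an assumption.
        """
        strategies = []
        if generate:
            strategies.append('generate')
        if has_population:
            # Mutation and recombination work on existing trees.
            if mutate:
                strategies.append('mutate')
            if recombine:
                strategies.append('recombine')
        if not strategies:
            raise ValueError('no generation strategy is enabled')
        return rng.choice(strategies)


    rng = random.Random(0)
    # Without a population, only generation from scratch is possible.
    print(choose_strategy(rng, has_population=False))
    # With a population and recombination disabled (--no-recombine).
    print(choose_strategy(rng, has_population=True, recombine=False))
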
..

    **Notes**

    Real-life grammars often use recursive rules to express certain patterns.
    However, when using such rules for generation, we can easily end up in an
    unexpectedly deep call stack. With the ``--max-depth`` (or ``-d``) option,
    this depth, and with it the size of the generated test cases, can be
    controlled.

    Another peculiarity of ANTLR grammars is that they support so-called
    hidden tokens. These rules typically describe elements of the target
    language that can be placed basically anywhere without breaking the syntax.
    The most common examples are comments and whitespace. However, such
    grammars do not define explicitly where whitespace may or may not appear
    in rules, so when they are used to generate test cases, the missing
    spaces have to be inserted manually. This can be done by applying a
    serializer (with the ``-s`` option) to the tree representation of the
    output tests. A simple serializer that inserts a space after every
    unparser rule is provided by *Grammarinator*
    (``grammarinator.runtime.simple_space_serializer``).

    In some cases, we may want to postprocess the output tree itself (without
    serializing it), for example, to enforce some logic that cannot be
    expressed by a context-free grammar. For this purpose, the transformer
    mechanism can be used (with the ``-t`` option). Similarly to a serializer,
    a transformer takes a tree as input, but instead of creating a string
    representation, it is expected to return the modified (transformed) tree
    object.

    As a final thought, one must not forget that the original purpose of
    grammars is the syntactic validation of various inputs. As a consequence,
    these grammars encode syntactic expectations only and not semantic rules.
    If we still want to add semantic knowledge to the generated tests, then we
    can inherit custom fuzzers from the generated ones and redefine methods
    corresponding to lexer or parser rules in ways that encode the required
    knowledge (e.g., HTMLCustomGenerator_).

.. _HTMLCustomGenerator: examples/fuzzer/HTMLCustomGenerator.py

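The effect of a depth limit on a recursive rule, as mentioned in the notes
above, can be illustrated with a toy generator. This is a sketch of the idea
only and makes no claim about how *Grammarinator* accounts for depth
internally; the grammar, ``gen_expr`` and ``nesting`` are made up for the
illustration.

.. code-block:: python

    import random


    def gen_expr(rng, max_depth):
        """Toy generator for: expr : NUMBER | '(' expr '+' expr ')' ;"""
        # The recursive alternative is only eligible while the depth
        # budget allows at least one more level below it.
        if max_depth <= 1 or rng.random() < 0.5:
            return str(rng.randint(0, 9))
        left = gen_expr(rng, max_depth - 1)
        right = gen_expr(rng, max_depth - 1)
        return '(' + left + '+' + right + ')'


    def nesting(text):
        """Return the deepest parenthesis nesting level in text."""
        depth = deepest = 0
        for char in text:
            if char == '(':
                depth += 1
                deepest = max(deepest, depth)
            elif char == ')':
                depth -= 1
        return deepest


    sample = gen_expr(random.Random(42), 4)
    print(sample, nesting(sample))

With a budget of ``max_depth``, the generated expression can nest at most
``max_depth - 1`` levels, so the budget bounds both the recursion and the
length of the output.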

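The difference between serializers and transformers described in the notes
above can be sketched as follows. The ``Leaf`` and ``Rule`` classes are
hypothetical stand-ins and not *Grammarinator*'s tree API; the sketch only
shows that a transformer returns a tree while a serializer returns a string.
The transformer enforces matching opening and closing tag names, and it
assumes the child layout noted in its comment.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class Leaf:
        """Hypothetical stand-in for a token node."""
        text: str


    @dataclass
    class Rule:
        """Hypothetical stand-in for a parser rule node."""
        name: str
        children: List[object] = field(default_factory=list)


    def tokens(node):
        """Yield the token texts of the tree in document order."""
        if isinstance(node, Leaf):
            yield node.text
        else:
            for child in node.children:
                yield from tokens(child)


    def space_serializer(root):
        """Serializer: tree in, string out (tokens joined by spaces)."""
        return ' '.join(tokens(root))


    def match_closing_tags(node):
        """Transformer: tree in, modified tree out."""
        if isinstance(node, Rule):
            for child in node.children:
                match_closing_tags(child)
            if node.name == 'element':
                # Assumed layout: '<' name '>' content '</' name '>'
                node.children[-2].text = node.children[1].text
        return node


    tree = Rule('element', [Leaf('<'), Leaf('b'), Leaf('>'), Leaf('hi'),
                            Leaf('</'), Leaf('i'), Leaf('>')])
    print(space_serializer(match_closing_tags(tree)))  # < b > hi </ b >
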
Working Example
===============

The repository contains a minimal example_ to generate HTML files. To give it
a try, run the processor first, then use the generator to produce test cases.

With the Python backend::

    grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
      -o examples/fuzzer/

    grammarinator-generate HTMLCustomGenerator.HTMLCustomGenerator \
      -r htmlDocument -d 20 \
      -o examples/tests/test_%d.html -n 100 \
      -s HTMLGenerator.html_space_serializer \
      --sys-path examples/fuzzer/

With the C++ backend::

    grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
      -o examples/fuzzer/ --no-actions --language hpp

    python3 grammarinator-cxx/dev/build.py --clean \
        --generator HTMLGenerator \
        --serializer HTMLSpaceSerializer \
        --include HTMLConfig.hpp \
        --includedir examples/fuzzer/ \
        --tools

    grammarinator-cxx/build/Release/bin/grammarinator-generate-html \
        -r htmlDocument -d 20 \
        -o examples/tests/test_%d.html -n 100

.. _example: examples/


Compatibility
=============

*Grammarinator* was tested on:

* Linux (Ubuntu 16.04 ... 24.04)
* OS X / macOS (10.12 ... 15.5)
* Windows (Server 2012 R2 / Server version 1809 / Windows 10 / Windows Server 2022)


Citations
=========

Background on *Grammarinator* is published in:

* Renata Hodovan, Akos Kiss, and Tibor Gyimothy. Grammarinator: A Grammar-Based
  Open Source Fuzzer.
  In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating
  Test Case Design, Selection, and Evaluation (A-TEST 2018), pages 45-48, Lake
  Buena Vista, Florida, USA, November 2018. ACM.
  https://doi.org/10.1145/3278186.3278193
* Renata Hodovan, Akos Kiss. Grammarinator Meets LibFuzzer: A Structure-Aware
  In-Process Approach.
  In Proceedings of the 20th International Conference on Software Technologies
  (ICSOFT 2025), pages 178-189, Bilbao, Spain, June 2025. SciTePress.
  Best paper award.
  https://doi.org/10.5220/0013571500003964

.. end included documentation

Copyright and Licensing
=======================

Licensed under the BSD 3-Clause License_.

.. _License: LICENSE.rst
