Enterprise JavaScript
Warren Bickley, Senior Product Development Consultant
12 December 2023
7 minute read
Code accumulates over time, and enterprise products amass a considerable volume of it. As a result, we frequently find ourselves relying on code which may have been written a long time ago, or written in languages that do not align with our strategic goals. As these technologies fall out of favor, the pool of engineers proficient in them diminishes significantly, and bringing in new engineers to work on such code becomes progressively more challenging. Consequently, product teams often find themselves constrained to making only minor fixes, fearing the unintended consequences of altering unseen dependencies.
Eventually, a big rebuild project is undertaken, and it often discards much of the value that could be found in this legacy code.
What if you could parse and walk over this code to extract that value and find the answers you’re looking for, but in your modern choice of language? What if you could provide tools which enable anyone in your team to do this easily?
Parsing code into an AST is one of the ways Griffiths Waite demystify legacy transformation projects, and it's part of the reason we are adept at these kinds of enterprise projects.
AST as represented by Antlr
Abstract Syntax Trees (ASTs) are hierarchical data structures used to represent the structure of a program. They abstract away specific syntax details and focus on the underlying code structure. ASTs are used in various areas of software development such as compilation, code analysis, optimisation, and transformation. They are a fundamental component in compilers, interpreters, and even syntax highlighters.
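To make the idea concrete, here is a minimal sketch of an AST for a tiny arithmetic language. The node shapes and the `tokenize`, `parse` and `evaluate` functions are assumptions made up for this illustration, not part of any library. Notice that whitespace and parentheses never appear in the tree: they only influence its shape.

```typescript
// Node shapes for a tiny arithmetic language (hypothetical, for illustration only).
type Expr =
  | { kind: "Number"; value: number }
  | { kind: "Binary"; operator: "+" | "-" | "*" | "/"; left: Expr; right: Expr };

// Split the source into numbers, operators and parentheses.
// Whitespace is dropped; any other character is silently ignored in this sketch.
function tokenize(source: string): string[] {
  return source.match(/\d+(?:\.\d+)?|[-+*\/()]/g) ?? [];
}

// Recursive-descent parser: one function per precedence level.
function parse(source: string): Expr {
  const tokens = tokenize(source);
  let position = 0;

  function parsePrimary(): Expr {
    const token = tokens[position++];
    if (token === undefined) throw new Error("Unexpected end of input");
    if (token === "(") {
      const inner = parseAdditive();
      if (tokens[position++] !== ")") throw new Error("Expected ')'");
      return inner; // parentheses leave no node behind, only the tree's shape
    }
    const value = Number(token);
    if (Number.isNaN(value)) throw new Error(`Unexpected token: ${token}`);
    return { kind: "Number", value };
  }

  function parseMultiplicative(): Expr {
    let left = parsePrimary();
    while (tokens[position] === "*" || tokens[position] === "/") {
      const operator = tokens[position++] as "*" | "/";
      left = { kind: "Binary", operator, left, right: parsePrimary() };
    }
    return left;
  }

  function parseAdditive(): Expr {
    let left = parseMultiplicative();
    while (tokens[position] === "+" || tokens[position] === "-") {
      const operator = tokens[position++] as "+" | "-";
      left = { kind: "Binary", operator, left, right: parseMultiplicative() };
    }
    return left;
  }

  const root = parseAdditive();
  if (position < tokens.length) throw new Error(`Unexpected token: ${tokens[position]}`);
  return root;
}

// Walk the tree bottom-up to compute a value.
function evaluate(node: Expr): number {
  if (node.kind === "Number") return node.value;
  const left = evaluate(node.left);
  const right = evaluate(node.right);
  switch (node.operator) {
    case "+": return left + right;
    case "-": return left - right;
    case "*": return left * right;
    case "/": return left / right;
  }
}

const ast = parse("(1 + 2) * 3");
console.log(JSON.stringify(ast, null, 2));
console.log(evaluate(ast)); // 9
```

Because the tree only records structure, `parse("(1+2)*3")` and `parse("( 1 + 2 ) * 3")` produce identical trees, which is what makes ASTs a convenient basis for analysis and transformation.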
Producing ASTs can be trivial. For example, TypeScript ASTs are easily produced with the typescript package available on npm, and Python has an ast module which does the same for Python. You will probably find that most languages have their own AST parser, great!
But what if you want to parse Python into an AST from TypeScript?…
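As a rough illustration of how little code this takes, the sketch below uses the TypeScript compiler API to parse a snippet and list its function declarations. It assumes the `typescript` package is installed; the sample source and the `collectFunctionNames` helper are made up for this example.

```typescript
import * as ts from "typescript";

// Sample code to parse (made up for this example).
const source = `
function add(a: number, b: number): number {
  return a + b;
}
const double = (n: number) => n * 2;
`;

// Parse the text into a SourceFile, which is the root node of the AST.
const sourceFile = ts.createSourceFile(
  "example.ts",
  source,
  ts.ScriptTarget.Latest,
  true // set parent pointers on every node
);

// Recursively visit every node and collect the names of function declarations.
function collectFunctionNames(node: ts.Node, names: string[] = []): string[] {
  if (ts.isFunctionDeclaration(node) && node.name) {
    names.push(node.name.text);
  }
  // The callback must not return a truthy value, or forEachChild stops early.
  ts.forEachChild(node, (child) => {
    collectFunctionNames(child, names);
  });
  return names;
}

const functionNames = collectFunctionNames(sourceFile);
console.log(functionNames); // [ 'add' ]
```

Only `add` is reported: `double` is an arrow function assigned to a variable, so in the AST it appears as a variable declaration rather than a function declaration. Distinctions like this are exactly what a tree makes easy to query.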
Enter Antlr4, “ANother Tool for Language Recognition”. Antlr4 is special because it is not quite a parser, but instead a parser generator. You give it a grammar, which is essentially the rules for a language, and you can generate a parser in a language of your choosing (target).
For example, you could parse Fortran with PHP code if you wanted. Java in Go. Swift in TypeScript. Whilst the choice of targets is limited, the number of grammars is practically endless.
Getting started with Antlr4 is not always straightforward: depending on the grammar and target, there can be extra steps or fine-tuning required for the generated code to work correctly. The community on GitHub is helpful and welcoming, however, so it won’t take too long to work through any difficulties!
Digging into old, unfamiliar code might not sound like the most exciting task. But it's worth it! Parsing your legacy codebase can reveal a wealth of insights and opportunities. It helps you understand the nitty-gritty details, spot potential risks, and even pave the way for modernisation. Here are five clear reasons why we think parsing your legacy codebase is a smart move:
Griffiths Waite believe in leveraging cutting-edge tools and techniques to drive value for our clients. Through tools such as parsers and AST we are able to redefine how code is understood, managed, and improved in a manner which is tailored to a particular problem, challenge, or codebase. Some examples of how we leverage these tools are:
Our team's obsession with automation extends to running automations on code, whether that code was written by us or not. This approach makes it easier for us to demystify and improve legacy codebases, which we believe is key to both remaining competitive and long-term success. We continuously work to develop and share any tooling which works toward these goals.
An example of this would be our PL/SQL AST Viewer which parses PL/SQL code within the browser, and returns a visual representation of the AST alongside highlighted and linked code. It is built using an open source PL/SQL parser which is a generated and packaged Antlr4 grammar in TypeScript.
The purpose of the viewer is to make interrogation via the parser incredibly fast. You can see how each token within the code is evaluated and then start to produce listeners based on that, much more quickly than without the viewer. The viewer is also a great way to introduce new members of the team to the structure of an AST and how the underlying parser works.
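For readers new to the term, a listener is an object whose callbacks fire as a walker enters and leaves each node of the tree. The sketch below mimics that pattern in plain TypeScript. It is not Antlr4's generated API: the `TreeNode` shape, the rule names and the `ReferenceCollector` class are all assumptions invented for illustration, and the tree is built by hand rather than by a real parser.

```typescript
// A generic parse-tree node (hypothetical shape, not Antlr4's own types).
interface TreeNode {
  rule: string;
  text?: string;
  children: TreeNode[];
}

// A listener may react when the walker enters or exits a node.
interface Listener {
  enter?(node: TreeNode): void;
  exit?(node: TreeNode): void;
}

// Depth-first walk that notifies the listener on the way down and on the way up.
function walk(node: TreeNode, listener: Listener): void {
  listener.enter?.(node);
  for (const child of node.children) {
    walk(child, listener);
  }
  listener.exit?.(node);
}

// Hand-built tree standing in for: SELECT name, salary FROM employees
// (rule names are made up and will differ from any real grammar).
const tree: TreeNode = {
  rule: "select_statement",
  children: [
    {
      rule: "selected_list",
      children: [
        { rule: "column_name", text: "name", children: [] },
        { rule: "column_name", text: "salary", children: [] },
      ],
    },
    {
      rule: "from_clause",
      children: [{ rule: "table_name", text: "employees", children: [] }],
    },
  ],
};

// A listener that records every table and column it sees.
class ReferenceCollector implements Listener {
  tables: string[] = [];
  columns: string[] = [];

  enter(node: TreeNode): void {
    if (node.text === undefined) return;
    if (node.rule === "table_name") this.tables.push(node.text);
    if (node.rule === "column_name") this.columns.push(node.text);
  }
}

const collector = new ReferenceCollector();
walk(tree, collector);
console.log(collector.tables);  // [ 'employees' ]
console.log(collector.columns); // [ 'name', 'salary' ]
```

The listener only needs to know which rule names it cares about, which is why seeing the tree in a viewer first makes writing one so much quicker.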
Once we have that understanding of the AST which makes up our code, we can write our own automations, such as a TypeScript type generator.
In the ever-evolving landscape of software development, parsing legacy code is going to become increasingly commonplace. Having this capability is a superpower for extracting value out of existing long-standing systems and logic, whilst enabling technology modernisation by effectively reducing risk.
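As a hedged sketch of what such an automation might look like (this is an illustration, not Griffiths Waite's actual generator), the code below turns a simplified table definition into a TypeScript interface. The `TableNode` and `ColumnNode` shapes, the type mapping and the `generateInterface` function are all assumptions; in practice the input would be gathered from the parse tree by a listener rather than written by hand.

```typescript
// Simplified nodes describing a table definition (hypothetical shapes).
interface ColumnNode {
  name: string;
  dataType: string;
  nullable: boolean;
}

interface TableNode {
  name: string;
  columns: ColumnNode[];
}

// Assumed mapping from a few common Oracle data types to TypeScript types.
const typeMap: Record<string, string> = {
  NUMBER: "number",
  VARCHAR2: "string",
  CHAR: "string",
  CLOB: "string",
  DATE: "Date",
};

// EMPLOYEE_ID -> EmployeeId
function toPascalCase(identifier: string): string {
  return identifier
    .toLowerCase()
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// EMPLOYEE_ID -> employeeId
function toCamelCase(identifier: string): string {
  const pascal = toPascalCase(identifier);
  return pascal.charAt(0).toLowerCase() + pascal.slice(1);
}

// Emit the source text of a TypeScript interface for one table.
function generateInterface(table: TableNode): string {
  const fields = table.columns.map((column) => {
    // Unmapped data types fall back to `unknown` rather than guessing.
    const tsType = typeMap[column.dataType.toUpperCase()] ?? "unknown";
    const nullSuffix = column.nullable ? " | null" : "";
    return `  ${toCamelCase(column.name)}: ${tsType}${nullSuffix};`;
  });
  return [`export interface ${toPascalCase(table.name)} {`, ...fields, "}"].join("\n");
}

// Example input, written by hand for this sketch.
const employees: TableNode = {
  name: "EMPLOYEES",
  columns: [
    { name: "EMPLOYEE_ID", dataType: "NUMBER", nullable: false },
    { name: "FIRST_NAME", dataType: "VARCHAR2", nullable: true },
    { name: "HIRE_DATE", dataType: "DATE", nullable: false },
  ],
};

const generated = generateInterface(employees);
console.log(generated);
// export interface Employees {
//   employeeId: number;
//   firstName: string | null;
//   hireDate: Date;
// }
```

Falling back to `unknown` for unmapped types is a deliberate choice in this sketch: it forces a human to look at anything the mapping does not cover instead of silently producing a wrong type.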