Would be a lot better if it came with tests. Please do this justice and don't let it rot as a gist: make a real repo and add some docs and at least smoke tests of some kind. Thanks
This gist is a concatenation of several shell script modules which form a comprehensive parser library for the portable shell.
The main parser and emitter are BNF-generated (that's why they look so mechanical). The BNF parser generator is also written in portable shell (I posted another gist with a preview of it in another thread).
All modules have comprehensive tests, but it is still lacking documentation and not ready for prime time!
Pure shell. Love the minimalism here... especially when every tiny CLI tool these days seems to require a 50MB node_modules folder just to run. There’s a certain Zen in doing things with zero dependencies. Reminds me of why I got into Unix in the first place.
Why not POSIX or some common external tools where it makes sense? Most of those big switch statements could be easily replaced with some standard programs that already exist everywhere.
One main reason is performance. Forking for other tools is very expensive.
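To illustrate the forking point, here is a minimal sketch (not taken from c89cc.sh) of doing small string jobs with shell builtins instead of spawning basename or grep. The names base_name, is_digit and RESULT are hypothetical, and the sketch assumes paths without a trailing slash.

```shell
#!/bin/sh
# Hypothetical helpers: nothing here starts an external process.

# base_name PATH [SUFFIX] -> result in $RESULT (assumed convention,
# chosen to avoid the fork that $(basename ...) would cost).
base_name() {
    RESULT=${1##*/}                      # drop everything up to the last slash
    if [ -n "${2:-}" ]; then
        RESULT=${RESULT%"$2"}            # drop the suffix, matched literally
    fi
}

# is_digit CHAR -> exit status 0 if CHAR is a single decimal digit.
# A case pattern replaces a pipe into grep or tr.
is_digit() {
    case $1 in
        [0-9]) return 0 ;;
        *)     return 1 ;;
    esac
}

base_name /usr/src/hello.c .c
printf '%s\n' "$RESULT"

if is_digit 7; then
    printf 'digit\n'
fi
```

Returning the value through a variable rather than through command substitution matters here, because $(...) usually costs a subshell of its own.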
That said, using larger sed or awk programs instead of ad-hoc calls for small snippets would perhaps be net-positive for performance and readability.
I'm currently working on very strict bootstrap scenarios in which sed and awk might not be available, but a shell might be (if I'm able to write it). It is possible that in such scenarios, the first sed and awk versions will be shell-written polyfills anyway.
It's an incomplete idea from around a year ago. The approach taken here (aliases as macro-like evals, AST generation using shell variables) became the backbone for the BNF parser generator.
This one is much simpler to understand. Simpler grammars tend to produce parser code that looks more like this one.
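For readers unfamiliar with the idea of building an AST out of shell variables, a rough sketch follows. It is an assumption about the general technique, not the code from the gist, and it leaves out the alias-as-macro part entirely. All names (node_new, node_get, eval_node, the node_N_field variables) are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: each AST node is a set of variables named
# node_<id>_<field>, created and read through eval.
node_count=0

# node_new KIND VALUE [LEFT RIGHT] -> id of the new node in $node_id
node_new() {
    node_count=$((node_count + 1))
    node_id=$node_count
    eval "node_${node_id}_kind=\$1 node_${node_id}_value=\$2"
    eval "node_${node_id}_left=\${3:-} node_${node_id}_right=\${4:-}"
}

# node_get ID FIELD -> field value in $node_val
node_get() {
    eval "node_val=\${node_${1}_${2}}"
}

# eval_node ID -> numeric result in $eval_result
# POSIX sh has no 'local', so the positional parameters, which are
# restored when a function returns, hold the per-call state.
eval_node() {
    node_get "$1" kind
    case $node_val in
        num)
            node_get "$1" value
            eval_result=$node_val ;;
        add|mul)
            set -- "$1" "$node_val"              # $2 = operator
            node_get "$1" left
            eval_node "$node_val"
            set -- "$@" "$eval_result"           # $3 = left operand
            node_get "$1" right
            eval_node "$node_val"
            case $2 in
                add) eval_result=$(( $3 + $eval_result )) ;;
                mul) eval_result=$(( $3 * $eval_result )) ;;
            esac ;;
    esac
}

# Build the tree for 1 + 2 * 3 and evaluate it.
node_new num 1; a=$node_id
node_new num 2; b=$node_id
node_new num 3; c=$node_id
node_new mul '' "$b" "$c"; m=$node_id
node_new add '' "$a" "$m"; root=$node_id
eval_node "$root"
printf '%s\n' "$eval_result"    # prints 7
```

Keeping recursion state in the positional parameters is one way to get by without local; other approaches exist, and the gist may well use a different one.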
I mean, today it's possible to generate it in Tcl, Elisp, Windows BAT, Powershell.
The effort is just 1 prompt.
The WHY question is much more important today -- "because I can" no longer makes sense, because we all can do much, much more with minimum effort today than before LLMs.
Yes, c89cc.sh was definitely AI-assisted. However, I do carry extensive knowledge of the portable shell that was essential for the AI to complete it.
You'll find tricks inside c89cc.sh that don't exist anywhere, except in other code from me (like the ksh93 fix for local dynamic scoping or the alias/macro read -n1 polyfill).
The WHY is pretty obvious: I want to show that the portable shell is not a toy.
https://github.com/udem-dlteam/pnut
Usage:
printf 'int main(){puts("hello");return 0;}' | sh c89cc.sh > hello
chmod +x hello
./hello
https://gist.github.com/alganet/4dfd501a3377a60f7825901114d6...
Roughly 70% of c89cc was generated from it (parser, emitter).
It can generate parsers for C, ES6 and XML for example (subsets but not missing a lot).
It's still a mess though, and I have lots of work to do before a proper release.
But the rest seems easy enough to understand.
https://gist.github.com/alganet/23df53c567b8a0bf959ecbc7b689...
Or much easier to backdoor...
https://gist.github.com/alganet/23df53c567b8a0bf959ecbc7b689...
Here is me 10 years ago experimenting on parsing stuff with sed:
https://gist.github.com/alganet/542f46865420529c9bd2