1. Don't use bash, use a scripting language that is more CI friendly. I strongly prefer pwsh.
2. Don't have logic in your workflows. Workflows should be dumb and simple (KISS) and they should call your scripts.
3. Having standalone scripts will allow you to develop/modify and test locally without having to get caught in a loop of hell.
4. Design your entire CI pipeline for easier debugging: put that print statement in, echo out the version of whatever. You don't need it _now_, but your future self will thank you when you do need it.
5. Consider using third party runners that have better debugging capabilities
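To make point 4 concrete, here is a minimal sketch of a "print what we are running with" step. It is an illustration only: the function names and the tool list (`git`, `make`) are assumptions, not something from the list above, and it assumes each tool accepts a `--version` flag.

```python
import shutil
import subprocess
import sys


def describe_tool(command: str) -> str:
    """Return the first line of `<command> --version`, or a note if unavailable."""
    path = shutil.which(command)
    if path is None:
        return f"{command}: not found on PATH"
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    first_line = output.splitlines()[0] if output else "(no output)"
    return f"{command}: {first_line}"


def print_environment(commands: list[str]) -> None:
    """Print one line per tool so the CI log records what actually ran."""
    print(f"python: {sys.version.split()[0]}")
    for command in commands:
        print(describe_tool(command))


if __name__ == "__main__":
    # Hypothetical tool list; replace with whatever your pipeline depends on.
    print_environment(["git", "make"])
```

Running this as the first step of a job costs almost nothing and means the log already answers "which version was that?" when something breaks later.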
I would disagree with 1. if you need anything more than shell that starts to become a smell to me. The build/testing process etc should be simple enough to not need anything more.
I agree with #2. What I meant was more that if you are calling out to something that is not a task runner (Make, Taskfile, Just etc.) or a shell script, that's a bit of a smell to me. E.g. I have seen people call out to Python scripts etc. and it concerns me.
Huh? Who cares if the script is .sh, .bash, Makefile, Justfile, .py, .js or even .php? If it works it works, as long as you can run it locally, it'll be good enough, and sometimes it's an even better idea to keep it in the same language the rest of the project is. It all depends and what language a script is made in shouldn't be considered a "smell".
> Huh? Who cares if the script is .sh, .bash, Makefile, Justfile, .py, .js or even .php?
Me, typically I have found it to be a sign of over-engineering and found no benefits over just using shell script/task runner, as all it should be is plumbing that should be simple enough that a task runner can handle it.
> If it works it works, as long as you can run it locally, it'll be good enough,
Maybe when it is your own personal project "If it works it works" is fine. But in a corporate environment, I have found that issues of readability, maintainability, proprietary tooling, additional dependencies etc. start to appear when people over-engineer and use programming languages (like Python).
E.g.
> never_inline 30 minutes ago | parent | prev | next [–]
> Build a CLI in python or whatever which does the same thing as CI, every CI stage should just call its subcommands.
However,
> and sometimes it's an even better idea to keep it in the same language the rest of the project is
I'll agree. Depending on the project's language etc., other options might make sense. But personally, so far every time I have come across something not using a task runner, it has just been the wrong decision.
> But personally, so far every time I have come across something not using a task runner, it has just been the wrong decision.
Yeah, tends to happen a lot when you hold strong opinions with strong conviction :) Not that it's wrong or anything, but it's highly subjective in the end.
Typically I see larger issues being created from "under-engineering" and just rushing with the first idea people can think of when they implement things, rather than "over-engineering" causing similarly sized future issues. But then I also know everyone's history is vastly different; my views are surely shaped more by the specific issues I've witnessed (and sometimes contributed to :| ) than by anything else.
> If your CI can do things that you can't do locally: that is a problem.
Probably the most common case where this is an actual problem is building across many platforms. I'm running Linux x86_64 locally, but some of my deliverables are for macOS and Windows and ARM, and while I could cross-compile for all of them on Linux (macOS was a bitch to get working though), it always felt better to compile on the hardware I'm targeting.
Sometimes there are Windows/macOS-specific failures, and if I couldn't just ssh in and correct/investigate, and instead had to "change > commit > push" in an endless loop, I'd quite literally lose my mind.
> If your CI can do things that you can't do locally: that is a problem.
Completely agree.
> I'm a huge fan of "train as you fight", whatever build tools you have locally should be what's used in CI.
That is what I am doing, having my GitHub Actions just call the Make targets I am using locally.
> I mean, at some point you are bash calling some other language anyway.
Yes, shell scripts and/or task runners (Make, Just, Task etc.) are really just plumbing around calling other tools. Which is why it feels like a smell to me when you need something more.
I typically use make for this and feel like I’m constantly clawing back scripts written in workflows that are hard to debug if they’re even runnable locally.
This isn’t only a problem with GitHub Actions though. I’ve run into it with every CI runner I’ve come across.
How do you handle persistent state in your actions?
For my actions, the part that takes the longest to run is installing all the dependencies from scratch. I'd like to speed that up but I could never figure it out. All the options I could find for caching deps sounded so complicated.
> How do you handle persistent state in your actions?
You shouldn't. Besides caching that is.
> All the options I could find for caching deps sounded so complicated.
In reality, it's fairly simple, as long as you leverage content-hashing. First, take your lock file, compute the sha256sum. Then check if the cache has an artifact with that hash as the ID. If it's found, download and extract, those are your dependencies. If not, you run the installation of the dependencies, then archive the results, with the ID set to the hash.
There really isn't more to it. I'm sure there are helpers/sub-actions/whatever Microsoft calls them for doing all of this in 1-3 lines or something.
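The "hash > find cache > extract, else install > archive" flow described above can be sketched roughly as follows. This is only an illustration under assumptions: the cache is a plain local directory of tarballs rather than any real CI cache service, and `restore_or_install`, `cache_key`, and the `install` callback are hypothetical names.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import Callable


def cache_key(lock_file: Path) -> str:
    """SHA-256 of the lock file contents; identical lock files share a key."""
    return hashlib.sha256(lock_file.read_bytes()).hexdigest()


def restore_or_install(
    lock_file: Path,
    deps_dir: Path,
    cache_dir: Path,
    install: Callable[[Path], None],
) -> bool:
    """Populate deps_dir from the cache if possible. Returns True on a cache hit."""
    archive = cache_dir / f"{cache_key(lock_file)}.tar.gz"
    if archive.exists():
        # Cache hit: the archive holds a top-level folder named like deps_dir,
        # so unpacking into its parent recreates deps_dir.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(deps_dir.parent)
        return True
    # Cache miss: install from scratch, then archive the result under the hash.
    install(deps_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(deps_dir, arcname=deps_dir.name)
    return False
```

Because the key is derived only from the lock file, changing a dependency changes the hash and naturally forces a fresh install, while unchanged lock files keep hitting the same archive.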
The tricky bit for me was figuring out which cache to use, and how to use and test it locally. Do you use the proprietary github actions stuff? If the installation process inside the actions runner is different from what we use in the developer machines, now we have two sets of scripts and it's harder to test and debug...
> Do you use the proprietary github actions stuff?
If I can avoid it, no. Almost everything I can control is outside of the Microsoft ecosystem. But as a freelancer, I have to deal a bunch with GitHub and Microsoft anyways, so in many of those cases, yes.
Many times, I end up using https://github.com/actions/cache for the clients who already use Actions, and none of that runs in the local machines at all.
Typically I use a single Makefile/Justfile that sometimes has most of the logic inside of it for running tests and whatnot, and sometimes shells out to "proper" scripts.
But that's disconnected from the required "setup", so Make/Just doesn't actually download dependencies, that's outside of the responsibilities of whatever runs the test.
And also, with a lot of languages, it doesn't matter if you run an extra "npm install" over an already existing node_modules/; it'll figure out what's missing and what's there already. So you could in theory still have "make test" do absolutely everything locally, including installing dependencies (if you wish), and still do the whole "hash > find cache > extract > continue" thing before running "make test", and it'll skip the dependencies part if they're there already.
Depends on the build toolchain but usually you'd hash the dependency file and that hash is your cache key for a folder in which you keep your dependencies. You can also make a Docker image containing all your dependencies but usually downloading and spinning that up will take as long as installing the dependencies.
For things like installing deps, you can use GitHub Actions' own caching, or one of several third-party runners whose caching capabilities are more mature than what GHA offers.
Step 0. Stop using CI services that purposefully waste your time, and use CI services that have "Rebuild with SSH" or similar. From previous discussions (https://news.ycombinator.com/item?id=46592643), seems like Semaphore CI still offers that.
The way I deal with all these terrible CI platforms (there is no good one, merely lesser evils) is to do my entire CI process in a container and the CI tool just pulls and runs that. You can trivially run this locally when needed.
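As a rough sketch of that idea: the same `docker run` invocation is built in one place and used both locally and by the CI tool. The image name, the `/work` mount point, and the function names here are assumptions for illustration; only standard `docker run` flags (`--rm`, `-v`, `-w`) are used.

```python
import shlex
import subprocess
from pathlib import Path


def container_command(image: str, repo: Path, script: str) -> list[str]:
    """Build the `docker run` argv that CI and a developer would both execute."""
    return [
        "docker", "run", "--rm",
        "-v", f"{repo.resolve()}:/work",  # mount the checkout into the container
        "-w", "/work",                    # run from the mounted checkout
        image,
        "sh", "-c", script,
    ]


def run_ci(image: str, repo: Path, script: str) -> int:
    """Run the pipeline in the container and return its exit code."""
    command = container_command(image, repo, script)
    print("+", shlex.join(command))  # echo the exact command for debugging
    return subprocess.run(command, check=False).returncode
```

The CI job then reduces to calling something like `run_ci("example/ci:1", Path("."), "make test")`, where the image tag is a placeholder, and a developer can paste the echoed command into a terminal to reproduce the run.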
Of course, the platforms would rather have you not do that since it nullifies their vendor lock-in.
That's what I always did for our GitLab CI pipeline: just deploy dedicated images for different purposes. We had general Terraform images for Terraform code, which made it easy to standardize versions etc. Then we made specific ones for projects with a lot of dependencies so we could run the deployment pipeline in seconds instead of minutes. But now you need to maintain the Docker images too. All about trade-offs.
The best CI platforms let you "Rebuild with SSH" or something similar, and instead of having the cycle of "change > commit > push > wait > see results" (when you're testing CI specific stuff, not iterating on Makefiles or whatever, assuming most of it is scripts you can run both locally and in CI), you get a URL to connect to while the job is running, so you can effectively ensure manually it works, then just copy-paste whatever you did to your local sources.
Of all the valid complaints about GitHub Actions or CI in general, this seems to be an odd one. There are no details about what was tried or not tried, but it's hard to see a `- run: go install cuelang.org/go/cmd/cue@latest` step not working?
Would a tool like act help here? (https://github.com/nektos/act) I suppose orchestration that is hiding things from different processor architectures could also very well run differently online than offline, but still.
> For the love of all that is holy, don’t let GitHub Actions
> manage your logic. Keep your scripts under your own damn
> control and just make the Actions call them!
I mean your problem was not `build.rs` here and Makefiles did not solve it, was your logic not already in `build.rs` which was called by Cargo via GitHub Actions?
The problem was the environment setup? You couldn't get CUE on Linux ARM and I am assuming when you moved to Makefiles you removed the need for CUE or something? So really the solution was something like Nix or Mise to install the tooling, so you have the same tooling/version locally & on CI?
As soon as I need more than two tries to get some workflow working, I set up a tmate session and debug things using a proper remote shell. It doesn't solve all the pain points, but it makes things a lot better.
> For the love of all that is holy, don’t let GitHub Actions
> manage your logic. Keep your scripts under your own damn
> control and just make the Actions call them!
The pain is real. I think everyone that's ever used GitHub actions has come to this conclusion. An ideal action has 2 steps: (1) check out the code, (2) invoke a sane script that you can test locally.
Honestly, I wonder if a better workflow definition would just have a single input: a single command to run. Remove the temptation to actually put logic in the actions workflow.
I assume you're using the currently recommended docker-in-docker method. The legacy Gitlab way is horrible and it makes it basically impossible to run pipelines locally.
This is basically how most other CI systems work. GitLab CI, Jenkins, Buildbot, Cirrus CI, etc. are all other systems I've used and they work this way.
I find GitHub Actions abhorrent in a way that I never found a CI/CD system before...
> I think everyone that's ever used GitHub actions has come to this conclusion.
I agree that it should be a reasonable conclusion, but unfortunately I can tell you that not all developers (including seniors) naturally arrive at it.
GHA’s componentized architecture is appealing, but it’s astonishing to me that there’s still seemingly no way to actually debug workflows, run them locally, or rapidly iterate on them in any way. Alas.
So the article is about the frustrating experience of fixing GitHub Actions when something goes wrong, especially when a workflow only fails on one platform, potentially due to how GitHub runner is set up (inconsistently across platforms).
Took me a while to figure that out. While I appreciate occasional banter in blog articles, this one seems to diverge into a rant a bit too much, and could have made its point much clearer with, for example, meaningful section headers.
Until I read this blog, I was under the impression that everyone wrote Python or other script files and used GitHub Actions just to call the scripts!
This way we can test it on local machine before deployment.
Also, as other commenters have said, bash is not a good option. Use Python or some other language and write reusable scripts. If not for this reason, then for the off chance that it'll be migrated to some other CI/CD platform.
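A minimal sketch of what such a reusable script could look like: one small CLI with a subcommand per CI stage, so the workflow file only ever calls `ci <stage>`. The stage names and the commands behind them are placeholders I made up for illustration, not anything prescribed in this thread.

```python
import argparse
import shlex
import subprocess
import sys

# Hypothetical stage-to-command mapping; substitute your project's real commands.
STAGES = {
    "lint": [sys.executable, "-m", "compileall", "-q", "."],
    "test": [sys.executable, "-m", "unittest", "discover"],
}


def build_parser() -> argparse.ArgumentParser:
    """One subcommand per CI stage, plus a global --dry-run flag."""
    parser = argparse.ArgumentParser(prog="ci", description="Run one CI stage.")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the command instead of running it")
    subparsers = parser.add_subparsers(dest="stage", required=True)
    for name in sorted(STAGES):
        subparsers.add_parser(name, help=f"run the {name} stage")
    return parser


def main(argv: list[str]) -> int:
    """Parse argv, echo the stage's command, and return its exit code."""
    args = build_parser().parse_args(argv)
    command = STAGES[args.stage]
    print("+", shlex.join(command))  # echo the command so logs show what ran
    if args.dry_run:
        return 0
    return subprocess.run(command, check=False).returncode
```

Saved as a hypothetical `ci.py` ending in `sys.exit(main(sys.argv[1:]))`, the same `python ci.py test` works on a laptop and in any CI system, which is what makes a later migration cheap.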
I wouldn't say that, but I would say there's no "should" here; it's often much more hassle than people expect and everyone has to decide for themselves whether the number of users is worth it.
Prefacing this with the fact that act is great, however, it has many shortcomings. Too often I've run into roadblocks, and when looking up the issue for it, it seems they are hard to address. Simpler workflows work fine with it, but more complex workflows will be much harder.
Don't put your logic in proprietary tooling. I have started writing all logic into mise tasks since I already manage the tool dependencies with mise. I tend to write them in a way where it can easily take advantage of GHA features such as concurrency, matrixes, etc. But beyond that, it is all running within mise tasks.
is it your contention that once anybody becomes sufficiently skillful with a technology they will come to love it? And thus stating that one does not love the specific technology demonstrates the lack of skill?
I'm a huge fan of "train as you fight", whatever build tools you have locally should be what's used in CI.
If your CI can do things that you can't do locally: that is a problem.
For caching, you use GitHub's own cache action.
https://docs.github.com/en/actions/how-tos/manage-runners/la...
GitHub Actions is a totally broken piece of s !! I know about those broken loops because I had to deal with them an incredible number of times.
I very often mention OneDev in my comments, and you know what? Robin solved this issue 3 years ago: https://docs.onedev.io/tutorials/cicd/diagnose-with-web-term...
You can pause your action, connect through a web terminal, and debug/fix things live until it works. Then, you just patch your action easily.
And that’s just one of the many features that make OneDev superior to pretty much every other CI/CD product out there.
gg watch action
Finds the most recent or currently running action for the branch you have checked out. Among other things.
https://github.com/frankwiles/gg
edit: Just a quick note, the `gg` and `gg tui` commands for me don't show any repos at all, the current context stuff all works perfectly though.
Maybe that has changed.
As soon as I need more than two tries to get some workflow working, I set up a tmate session and debug things using a proper remote shell. It doesn't solve all the pain points, but it makes things a lot better.
Honestly, this should be built into GitHub Actions.
[0] https://github.com/tmate-io/tmate/issues/322
[1] https://upterm.dev/
[2] https://github.com/marketplace/actions/debug-with-ssh
If you can't run the same scripts locally (minus external hosted service/API) then how do you debug them w/o running the whole pipeline?
I find GitHub Actions abhorrent in a way that I never found a CI/CD system before...
everything is including some crappy proprietary yaml rather than using standard tooling
so instead of being a collection of easily composable and testable bits it's a mess that only works on their platform
That's just the good old Microsoft effect: they have a reverse Midas touch when it comes to actually delivering good user experiences.
No. It's cargo cult science.
Not by GitHub, but isn't act supposed to be that?
https://github.com/nektos/act