Show HN: Deterministic security solution for AI agents – OpenClaw and 2 more

I wanted to share a solution that I initially made for myself for OpenClaw. It helps control what your AI agents can reach when you let them act on their own, without reducing their power. I hope it's useful to you.

Basically the solution lets you experiment freely with your agent within safe boundaries.

It's deterministic on purpose (it doesn't include any AI layer), which means the solution follows clear, predefined rules to maximize safety, security, and predictability.

The rules are heavily tested against prompt injection attempts and other security cases (explained in detail in the docs).

Everything is local and lives on your computer including the docs site.

It gives you a control panel to monitor and control boundaries. When a boundary is about to be crossed, you receive an approval request that shows you what your OpenClaw agent was trying to do.
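
To illustrate the idea, here is a minimal sketch of a rule check that allows, denies, or asks for approval. This is not Agent Ruler's actual code: the rule format, the example paths, and the names `Rule`, `Decision`, and `check` are all assumptions made up for illustration.

```python
import posixpath
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"  # pause and send an approval request to the user


@dataclass(frozen=True)
class Rule:
    prefix: str         # directory the rule covers (hypothetical format)
    decision: Decision


# Hypothetical rule set: the first matching rule wins.
RULES = (
    Rule("/home/user/workspace", Decision.ALLOW),
    Rule("/home/user/.ssh", Decision.DENY),
    Rule("/etc", Decision.DENY),
)


def check(path: str, rules=RULES) -> Decision:
    """Return the decision for a path the agent wants to touch."""
    # Collapse ".." lexically so "workspace/../.ssh" cannot slip through.
    # Symlinks are not resolved here; a real tool would have to handle them.
    target = PurePosixPath(posixpath.normpath(path))
    for rule in rules:
        if target.is_relative_to(rule.prefix):
            return rule.decision
    # Nothing matched: treat it as a boundary crossing and ask the user.
    return Decision.ASK
```

The same path always yields the same decision, and anything outside the known rules falls back to an approval request instead of being silently allowed.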

It also (currently) supports Tailscale, so you can connect your Tailscale IP address and receive everything on your phone, where you can chat normally and approve or deny requests. It lets you access the control panel via your Tailscale IP address (a private one is recommended) from anywhere. Currently only the Telegram channel is supported.

For now it only supports Linux and the Opencode, Claude Code, and OpenClaw runners.

Everything you need to get started is explained in the README, which also includes quick demo/showcase images so you can see how it looks.

I'd be happy to hear your feedback, especially from anyone who tests it against prompt injections to see how it holds up. Don't hesitate to open an issue on GitHub for any problem you find; I'll do my best to fix it.

Link here: https://github.com/steadeepanda/agent-ruler/

Thank you for reading. I'll be happy to discuss it.

4 points | by steadeepanda 7 hours ago

4 comments

  • derrak 4 hours ago
    > It's deterministic on purpose (doesn't include any AI layer)

    I wouldn’t use the word deterministic here. I would use the word symbolic. Determinism, meaning that you always get the same output on the same input, isn’t what you want here. For instance, you can use an LLM without temperature, etc. and its output will be deterministic. Moreover, if you had a symbolic, non-deterministic algorithm you would probably also be happy to use that.

    • steadeepanda 4 hours ago
      LLMs are probabilistic by nature, so even if you use one without temperature that doesn't completely remove this fact; it just narrows the output. Here, however, we're deliberately aiming for a predefined set of rules, with no LLM included in the decision workflow. You can't safely rely on an LLM for security; it's contradictory given the current nature of LLMs, which is one of the issues we have today and the one we're trying to propose a solution for. But yeah, it's possible to include an LLM in the decision workflow, it's just that it comes with cons that I was trying to mitigate with this solution.
      • derrak 4 hours ago
        I think your solution is a good idea. I was just pushing back on why it’s a good idea. Determinism isn’t the crux. The crux is that you’re using a symbolic algorithm with well-defined formal semantics.

        I was trying to show that determinism is not the crux by pointing out that there are ways to get a deterministic output from an LLM. And that thought experiment shows that determinism isn’t what’s essential.

        And I will disagree about merely narrowing the outputs. If I download a local model and set the temperature to zero and give it the same prompt twice, I will get the same output. Not one of several outputs in a narrow set. LLMs are functions.

        • steadeepanda 4 hours ago
          Ah okay, yeah sure, you're right. I didn't mean it that way. I know we can get deterministic output from an LLM, but the issue is that even then, LLMs are trained on large sets of data, which opens a surface for prompt injections and other attacks. No matter how strong your guardrails are, there's still a way to inject a prompt, even if you configure for deterministic output. So what I was going for with "determinism" was that the solution I made sits outside the LLM and has nothing to do with its internal reasoning, and since it's deterministic it ensures a safe and secure action check against the defined rules.

          Maybe I should emphasize the fact that it's external to any LLM? I don't know.

  • jaylew1997 6 hours ago
    nice
  • sbw70 2 hours ago
    [dead]
  • Remi_Etien 4 hours ago
    [dead]