1 comment

  • just_fairly 4 hours ago
    What follows is a framework I developed through the experience of collaborating with AI, getting frustrated with specific failure modes, and working backward to understand why they were happening. I had a gut feeling that there was a fundamental misunderstanding that allowed for the sudden, casual decoupling of the expected from the observed, and by developing and testing behavioral harness .md files I arrived at a simple idea: there are several lossy translation layers involved in reducing a human goal to a prompt, and more in transforming a prompt into an AI goal set. The end result is the illusion of alignment, which has no anchor and no means of self-correction whatsoever. The conceptual architecture is mine. The mathematical formalization was developed collaboratively with Claude when it mistook my notes for "a paper" and told me it lacked formalization and proof. I asked what it would take to formalize the ideas and develop a proof, and that led here. I'm publishing this because the ideas feel important enough to be wrong about in public.

    Calling this publishing is a leap, but as a layperson only recently augmented with Claude I am plainly in the set of people in need of a differential diagnosis; perhaps I will be lucky enough to get one.

    All the work is original, if you can call me in a powered exoskeleton "all natural." What I mean is that I can "show my work" from inception to the latest version of this document.