6 comments

  • verdverm 20 hours ago
    Are you aware of the current efforts by researchers on Bluesky to build a new researcher platform on ATProto? (I forget the project name at the moment.)

    If not, same handle over there, I can get you in touch with them. Or hit up Boris, he knows everyone and is happy to make connections

    There's also a full day at the upcoming conference on ATProto & science-related things. I think they communicate on Discourse more (?)

    • nosuchthing 9 hours ago
      Isn't ATProto just a compromised version of ActivityPub, basically designed around an excuse to force all users into a data-mining firehose structure like Twitter used to have, only with no privacy features or federation for moderation controls?
      • greentea23 5 hours ago
        Yes. Nostr and ActivityPub are so easy too; I don't see much advantage to ATProto and so many disadvantages. It's as decentralized as a meme coin, just waiting for the rug pull.

        To me something git-like with a peer review UI (a la pull requests) seems far more natural for distributed academic publications than a social media protocol though.

        • crimsoneer 5 hours ago
          So I briefly touched on this in the blog post, but to expand a little... ATProto provides significantly more "batteries included" than ActivityPub in my view - if you use ATProto, it can handle both authentication and identity management, and effectively act as your back-end for CRUD operations (e.g., OAuth with your PDS, then write/read records on the network based on your Lexicon).
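
          To make that concrete, here's a rough sketch with the @atproto/api client (the com.example.review Lexicon and its fields are purely illustrative, and I'm using an app password for brevity where a real app would do the OAuth dance):

            import { AtpAgent } from '@atproto/api'

            // Your PDS handles identity + auth; no separate user database needed.
            const agent = new AtpAgent({ service: 'https://bsky.social' })
            await agent.login({ identifier: 'alice.example.com', password: 'app-password' })

            // CRUD is just writing records into your repo under a Lexicon collection.
            await agent.com.atproto.repo.createRecord({
              repo: agent.session!.did,
              collection: 'com.example.review',  // hypothetical custom Lexicon
              record: {
                $type: 'com.example.review',
                subject: 'at://did:plc:someauthor/com.example.paper/3jx7',  // illustrative
                text: 'Methods look sound; sample size is small.',
                createdAt: new Date().toISOString(),
              },
            })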

          ActivityPub, based on my understanding, really doesn't work like that - while you can OAuth with your Mastodon account, the expectation is you'll be handling identity and the back-end bits yourself, and then sharing events across the network (happy to be corrected).

          Part of what kicked this off is seeing ATProto's new devrel person at a meetup and finding their vision pretty compelling.

          But yes, ActivityPub is more "robust" and decentralised (hence also jankier)

          https://andreasthinks.me/posts/octosphere/octosphere.html

      • southerntofu 6 hours ago
        Disclaimer: i'm rather hostile to ATProto for reinventing the wheel without bringing much value over AP/XMPP/Matrix.

        I don't think that's a fair characterization. Most AP implementations famously don't have privacy features: it was by design (and therefore no surprise to us tech folks), but i remember it was quite the scandal when users found out Mastodon instance admins could read users' private messages. A later "scandal" involved participation in the EUNOMIA research project about "provenance tracking" in federated networks [1], which to be fair to conspiracy theorists does sound like an academic front for NSA-style firehose R&D.

        That being said, Bluesky is much harder to self-host and is therefore not decentralized in practice [2]; see also the Blacksky development notes. However, Bluesky does bring a very interesting piece to the puzzle which AP carefully ignored despite years of research in AP-adjacent protocols (such as Hubzilla): account portability.

        All in all, i'm still siding with the ActivityPub ecosystem because i think it's much more ethical and friendly in all regards, and i'm really sad so many so-called journalists, researchers and leftists jumped ship to Bluesky just because the attraction of "Twitter reborn" (with the same startup-nation vibes) was too strong. At least in my circles, i did not meet a single person who said the choice of Bluesky was about UX or features.

        But now, i'm slowly warming up to the ATmosphere having a vibrant development community - much more so than AP. And to be fair to ATProto: yes, it is worse than AP from a centralization standpoint, but at least it's not as bad and complex as the Matrix protocol, which brought zero value over AP/XMPP while making implementations 100x more complex and resource-intensive.

        [1] https://cordis.europa.eu/project/id/825171/reporting

        [2] https://arewedecentralizedyet.online/

    • crimsoneer 20 hours ago
      Ooh, no - please do, would love to hear more!
    • Johnny_Bonk 20 hours ago
      I'd also be curious to follow this if you have any links or resources.
  • Murskautuminen 12 hours ago
    I am afraid that gatekeeping is partially essential and somewhat desired: as an academic you don't have time to read everything, and some sort of quick signal, albeit very flawed, can be useful to stop you wasting time reading crappy science. If you don't gatekeep you will get a lot of crappy papers, or papers that all say the same thing, and it will waste more time for people who want a quick sense of the state of a topic/field from quality work. An open voting system would be easily abused, so it will end up coming down to trusting a select set of peer reviewers or agencies, especially if a paper includes a lot of experiments and figures that can be somewhat complicated or overwhelming. What do you think?
    • fc417fc802 10 hours ago
      I'm inclined to agree, and yet the past decade of ML on arXiv moving at a breakneck pace seems to be a counterexample. In that case I observe citation "bubbles" where I can follow one good paper up and down the citation graph to find others.
      • Murskautuminen 9 hours ago
        I think for smaller software papers or ML papers on arXiv this might work. For larger papers in biomedicine or hard tech, I think it is much less likely. I struggle to keep up with bioRxiv.org as a medical professional; many articles would require 2+ hours to confidently review as a professional, and I would never trust a "public" review algorithm or network to necessarily do a great job. If you allow weekly updates on your topic area, you might get 100 papers a week, of which 90 are likely poor quality - but who is going to review these? Definitely not me; I cannot judge 100 papers a week. Granted, probably only 1 or 2 are directly relevant to your work, but even then the time sink is annoying. It is nice if a publisher has done some initial quality check and made sure the written text is succinct, direct, validated, and backed up by well-presented data/figures/methods. Even if a totally open social network exists for upvoting/describing papers, I am afraid the need for these publishers will still be there and they will just exist regardless, and they will still be preferred by academics.

        Three to five experts specifically asked to review a paper in a controlled environment, versus thousands of random scientists or members of the public (who might be motivated by financial, malicious or other reasons), is probably still the better option. Larger, technically impressive multi-disciplinary papers with 20+ authors are basically impossible to review as individuals; you want a few experts on the main methods to review it together, in harmony, with oversight from a reputable vendor/publisher. Such papers are also increasingly common in any biotech/hard-tech field.

        • fc417fc802 1 hour ago
          > many articles would require 2+ hours to confidently review as a professional

          I think ML (and really every other field) is the same. Skimming a paper never really leaves you certain of how rigorous it is.

          I agree that a naive "just add voting" review mechanism would not suffice to replace journals. However, there's no requirement that the review algorithm be so naive. Looked at differently, what is a journal except a complicated algorithm for performing reviews?

          > I am afraid the need for these publishers will still be there and they will just exist regardless, and they will still be preferred by academics.

          Agreed. I doubt publishers are going away any time soon (if ever) regardless of how technically excellent any proposed replacement might be. I still think it's worthwhile to pursue alternatives though.

    • perfmode 11 hours ago
      This is solved by social trust-graph algorithms, which allow intersubjective ranking without a central authority.
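
      A minimal sketch of the idea, assuming personalized PageRank over a simple endorsement graph (all names here are hypothetical, not any particular system's API):

        // Trust propagates from *your* seed set over who-endorses-whom
        // edges, so every reader gets their own ranking - no central
        // authority decides it.
        type TrustGraph = Map<string, string[]>  // user -> users they endorse

        function trustRank(graph: TrustGraph, seeds: string[], damping = 0.85, iters = 50) {
          const nodes = [...graph.keys()]
          const seedMass = (n: string) => (seeds.includes(n) ? 1 / seeds.length : 0)
          let rank = new Map<string, number>(nodes.map(n => [n, seedMass(n)] as const))
          for (let i = 0; i < iters; i++) {
            // Restart mass always flows back to the seeds *you* trust.
            const next = new Map<string, number>(nodes.map(n => [n, (1 - damping) * seedMass(n)] as const))
            for (const [node, outs] of graph) {
              const share = (rank.get(node) ?? 0) / Math.max(outs.length, 1)
              for (const out of outs) next.set(out, (next.get(out) ?? 0) + damping * share)
            }
            rank = next
          }
          return rank  // higher = more trusted from your own vantage point
        }
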
  • rsolva 15 hours ago
    @crimsoneer: Check out Open Science Network (Bonfire) - they are also doing interesting work in this space! https://openscience.network/
  • j-pb 11 hours ago
    Nothing based on DOIs and ORCIDs will ever be properly decentralised.

    You need content addressing and cryptographic signatures for that.
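
    For instance, a hash plus a signature already gives you an identifier that no registry can reissue or repoint - a sketch using Node's built-in crypto (Ed25519 chosen arbitrarily):

      import { createHash, generateKeyPairSync, sign, verify } from 'node:crypto'

      // Content addressing: the ID *is* the hash of the bytes, so a
      // reference can never silently point at different content later.
      const paper = Buffer.from('...the published bytes...')
      const contentId = createHash('sha256').update(paper).digest('hex')

      // Provenance: the author signs the content ID with their own key.
      const { publicKey, privateKey } = generateKeyPairSync('ed25519')
      const signature = sign(null, Buffer.from(contentId), privateKey)

      // Anyone can verify both, with no registry or publisher in the loop.
      const ok = verify(null, Buffer.from(contentId), publicKey, signature)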

    • tbrownaw 10 hours ago
      Email is pretty decentralized without those things.
      • j-pb 10 hours ago
        And it is infamously insecure, full of spam, and struggles with attachments beyond 10 MB.

        So thank you for bringing it up, it showcases well that a distributed system is not automatically a good distributed system, and why you want encryption, cryptographic fingerprints and cryptographic provenance tracking.

        • lelandbatey 10 hours ago
          And yet, it is a constantly used decentralized system which does not require content addressing, as you mentioned. You should elaborate on why we need content addressing for a decentralized system instead of saying "10MiB limit + spam lol email fell off". Contemporary usage of the technologies you've mentioned doesn't seem to do much to reduce spam (see IPFS, which has hard content addressing). Please, share more.
          • j-pb 3 hours ago
            If you think email is still in widespread use because it’s doing a good job, rather than because of massive network effects and sheer system inertia, then we’re probably talking past each other - but let me spell it out anyway.

            Email “works” in the same sense that fax machines worked for decades: it’s everywhere, it’s hard to dislodge, and everyone has already built workflows around it.

            There is no intrinsic content identity, no native provenance, no cryptographic binding between “this message” and “this author”. All of that has to be bolted on - inconsistently, optionally, and usually not at all.

            And even ignoring the cryptography angle: email predates “content as a first-class addressable object”. Attachments are in-band, so the sender pushes bytes and the receiver (plus intermediaries) must accept/store/scan/forward them up front. That’s why providers enforce tight size limits and aggressive filtering: the receiver is defending itself against other people’s pushes.

            For any kind of information dissemination, like email or scientific publishing, you want the opposite shape: push lightweight metadata (who/what/when/signature + content hashes), and let clients pull heavy blobs (datasets, binaries, notebooks) from storage the publishing author is willing to pay for and serve. Content addressing gives integrity + dedup for free. Paying ~$1 per DOI for what is essentially a UUID is ridiculous by comparison.

            That decoupling (metadata vs blobs) is the missing primitive in email-era designs.
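
            As a purely illustrative shape (field names are mine, not any existing protocol's), the pushed record could be as small as:

              // Light metadata travels through the network; heavy blobs are
              // pulled by hash from whatever storage the author pays for.
              interface PublishRecord {
                author: string      // a public key or DID, not a mutable account name
                title: string
                createdAt: string
                blobs: {
                  pdf: string       // sha256 of the PDF bytes
                  dataset: string   // sha256 of the dataset archive
                }
                signature: string   // author's signature over all of the above
              }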

            All of that makes email a bad template for a substrate of verifiable, long-lived, referenceable knowledge. Let's not forget that the context of this thread isn’t “is decentralized routing possible?”, it’s “decentralized scientific publishing” - which is not about decentralized routing, but decentralized truth.

            Email absolutely is decentralized, but decentralization by itself isn’t enough. Scientific publishing needs decentralized verification.

            What makes systems like content-addressed storage (e.g., IPFS/IPLD) powerful isn’t just that they don’t rely on a central server - it’s that you can uniquely and unambiguously reference the exact content you care about with cryptographic guarantees. That means:

            - You can validate that what you fetched is exactly what was published or referenced, with no ambiguity or need to trust a third party.

            - You can build layered protocols on top (e.g., versioning, merkle trees, audit logs) where history and provenance are verifiable (see the sketch after this list).

            - You don’t have to rely on opaque identifiers that can be reissued, duplicated, or reinterpreted by intermediaries.

            For systems that don’t rely on cryptographic primitives, like email or the current infrastructure using DOIs and ORCIDs as identifiers:

            - There is no strong content identity - messages can be altered in transit.

            - There is no native provenance - you can’t universally prove who authored something without added layers.

            - There’s no simple way to compose these into a tamper-evident graph of scientific artifacts with rigorous references.

            A truly decentralized scholarly publishing stack needs content identity and provenance. DOIs and ORCIDs help with discovery and indexing, but they are institutional namespaces, not cryptographically bound representations of content. Without content addressing and signatures, you’re mostly just trading one central authority for another.

            It’s also worth being explicit about what “institutional namespace” means in practice here.

            A DOI does not identify content. It identifies a record in a registry (ultimately operated under the DOI Foundation via registration agencies). The mapping from a DOI to a URL and ultimately to the actual bytes is mutable, policy-driven, and revocable. If the publisher disappears, changes access rules, or updates what they consider the “version of record”, the DOI doesn’t tell you what an author originally published or referenced - it tells you what the institution currently points to.

            ORCID works similarly: a centrally governed identifier system with a single root of authority. Accounts can be merged, corrected, suspended, or modified according to organisational policy. There is no cryptographic binding between an ORCID, a specific work, and the exact bytes of that work that an independent third party can verify without trusting the ORCID registry.

            None of this is malicious - these systems were designed for coordination and attribution, not for cryptographic verifiability. But it does mean they are gatekeepers in the precise sense that matters for decentralization:

            Even if lookup/resolution is distributed, the authority to decide what an identifier refers to, whether it remains valid, and how conflicts are resolved is concentrated in a small number of organizations. If those organizations change policy, disappear, or disagree with you, the identifier loses its meaning - regardless of how many mirrors or resolvers exist.

            If the system you build can’t answer “Is this byte-for-byte the thing the author actually referenced or published?” without trusting a gatekeeper, then it’s centralized in every meaningful sense that matters to reproducibility and verifiability.

            Decentralised lookup without decentralised authority is just centralisation with better caching.

  • gnarlouse 19 hours ago
    Integrate the peer review process and you've got a disrupter.
    • mlpoknbji 19 hours ago
      Peer review should be disrupted, but doing peer review via social media is not the way to go.
      • perching_aix 18 hours ago
        It has a bit of a leg up in that, if it's only academics commenting, it would probably be way more usable than typical social media - maybe even outright good.
    • crimsoneer 19 hours ago
      Right? This is kind of the dream.
    • naasking 19 hours ago
      Calling it peer review suggests gatekeeping. I suggest no gatekeeping: just let any academic post a review, maybe with upvotes/downvotes, and let crowdsourcing handle the rest.
      • staplers 18 hours ago
        While I appreciate no gatekeeping, the other side of the coin is gatekeeping via bots (vote manipulation).

        Something like Rotten Tomatoes could be useful: have a list of "verified" users (critic score) in a separate voting column from anon users (audience score).

        This often proves useful in highly controversial situations for parsing common narratives.
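
        A toy sketch of that two-column tally (all names hypothetical):

          // Two separate columns, Rotten Tomatoes style: bot-inflated anonymous
          // votes can't launder themselves into the verified column.
          interface Scores {
            critic: number    // % positive among verified academics
            audience: number  // % positive among everyone else
          }

          function tally(votes: { verified: boolean; up: boolean }[]): Scores {
            const pct = (vs: { up: boolean }[]) =>
              vs.length ? Math.round((100 * vs.filter(v => v.up).length) / vs.length) : 0
            return {
              critic: pct(votes.filter(v => v.verified)),
              audience: pct(votes.filter(v => !v.verified)),
            }
          }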

  • 11101010010001 17 hours ago
    Yes, publishing is broken, but academics are the last people to jump onto platforms... they never left email. If you want to change the publishing game, turn publishing into email.
    • fghorow 11 hours ago
      In whose interests would it be for academics to "leave email"?

      Theirs? (Personally, I think not.)