You know, I was actually really curious about this, so I went back to the HTML and URL W3C standards, and surprisingly they don't define any format beyond percent encoding. One might conflate query strings with "form-urlencoded"[0] query strings, which is one potential interoperability format, but in general a query string is just any percent-encoded string following a "?" in a URL[1], and just another property on the "URL" object that can be used in generating a response. There is additionally a URLSearchParams object that is the result of parsing the query string with the form-urlencoded parser, but that is simply an interoperability layer for JavaScript.
I'm going to be honest, I was pretty geared up to have a contrarian opinion until I looked at the standards, but they're actually pretty clear: a 404 could be a proper response to an unexpected query string. The query string is as much part of the URL API as the path is, and I think pretty much everyone can acknowledge that just tacking random stuff onto the path would be ill advised and undefined behavior.
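For what it's worth, the distinction drawn above between the raw query string and its form-urlencoded parse is visible directly in the WHATWG URL API (a sketch; the example URL is made up):

```typescript
// The URL and URLSearchParams globals are available in modern browsers
// and in Node.js without any imports.
const url = new URL("https://example.com/path?p=shop&tags=a%20b");

// The raw query string is just an opaque, percent-encoded string...
console.log(url.search); // "?p=shop&tags=a%20b"

// ...while searchParams is the form-urlencoded view layered on top,
// with percent-decoding already applied.
console.log(url.searchParams.get("tags")); // "a b"
```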
Back in the day it was reasonably common for CMSs and forums to have only an index.php and route entirely by query string (in form-urlencoded form, people were not savages). So you would have index.php?p=home and index.php?p=shop. Or index.php?action=showthread&forum=42&thread=17976. It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters.
In fact, lots of sites still work like that; they just hide it behind a couple of rewrite rules in Apache/nginx for SEO reasons.
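A minimal sketch of that index.php?p=... style of routing (the route names and handler bodies here are invented for illustration):

```typescript
// Map each ?p= value to a page handler, the way an old index.php switch would.
const routes: Record<string, () => string> = {
  home: () => "<h1>Home</h1>",
  shop: () => "<h1>Shop</h1>",
};

function handle(queryString: string): { status: number; body: string } {
  const page = new URLSearchParams(queryString).get("p") ?? "home";
  const route = routes[page];
  // In this scheme an unknown ?p= value is an unknown "path", so 404 fits.
  return route
    ? { status: 200, body: route() }
    : { status: 404, body: "Not Found" };
}
```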
On the other hand, if it's a CRUD app and you're filtering a list of entities by various field values? Returning that no items matched your selection (or an empty list, if it's an API) makes more sense than a 404, which would be more appropriate for an attempt to pull up a nonexistent entity URI.
You can return whatever HTTP response code you want, but if you care about knowing whether your site is working, being able to look at the logs and tell "that user requested a page that doesn't exist" apart from "that user requested a page that exists but had no results" is quite useful. In coding terms, it's the difference between a null and an empty array.
In this case I don't think the status should depend on the number of results. Here are your results; [] is a valid response body when there are none. Returning 404 when there are no results (GET /books?title=a, for instance) is misleading: the caller may think that /books is a nonexistent route and conclude that books are reachable via another URI. To me, the query string has no influence on the response status.
/books/1 could return 200 or 404 depending on the existence of book #1; here it makes sense, because if /books/1 does not exist the API must say so explicitly. However, 404 belongs to the 4XX family, which means "client error". Is it an error to ask for a nonexistent book? If you enter a bookshop and ask for a book they don't have, you did not "make a mistake". It's not as if you asked for a chainsaw. But in an API, especially with hypermedia, you are not supposed to request a resource that does not exist (unless the API provided a link to an existing resource that was deleted before the caller tried to reach it).
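The convention being argued here can be sketched like this (the data and handler names are invented):

```typescript
// A tiny in-memory "database" for illustration.
const books: Record<number, { id: number; title: string }> = {
  1: { id: 1, title: "Dune" },
};

// GET /books?title=... : a filter that matches nothing is still a success,
// and [] is a perfectly valid response body.
function listBooks(titleFilter: string) {
  const results = Object.values(books).filter(b => b.title.includes(titleFilter));
  return { status: 200, body: results };
}

// GET /books/:id : a nonexistent entity is the case 404 was made for.
function getBook(id: number) {
  const book = books[id];
  return book ? { status: 200, body: book } : { status: 404, body: null };
}
```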
If you enter a bookshop and you ask for a book that does not exist then it's definitely your mistake.
If you ask for a book they don't have it's a different matter.
In any case, when you ask for a book in a library you are using their "search" endpoint. The equivalent of opening a /books/1 URL would be asking for a specific copy of a book by serial number or so. Then it's clear that you made a mistake if you do that for a nonexistent serial number...
A response code of 204 seems more appropriate but the problem is you're not allowed to send further information, which would make that descriptive response... not descriptive enough.
Yes, and this is obvious if /users/ exists and returns a 400 when the ID is required. That way you can tell the difference between /users/ being there and expecting an ID, and it not being there at all.
The point was that returning a 404 for unexpected query strings doesn't just happen to be okay per the specs; there is significant historical precedent for doing so, based on application design that was common in the past.
Libraries that automatically throw errors for status codes in the 400 and 500 ranges are pretty obnoxious (looking at you, axios). It adds unnecessary overhead, complexity, and bad ergonomics by hijacking control flow from the app.
Responses with status codes in the 400 range are client errors, so the client shouldn't retry the same request. So a 404 is appropriate despite how annoying a library might be at handling it. Depending on which language/ecosystem you are using, there are likely more sane alternatives.
Completely agree on the axios part. One implication is that you can't statically type the error response shapes (since exceptions can't be typed). Whereas with fetch you can have a discriminated union based on the status code (e.g. https://github.com/mnahkies/openapi-code-generator/blob/main...)
Although I do feel like I've seen too many instances of a 404 being used for an empty collection where it would make more sense to return `[]` and treat it as an expected (successful) state.
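The discriminated-union idea mentioned above looks roughly like this (the response shapes here are invented, not the generator's actual output):

```typescript
// Each status code gets its own variant, so the compiler forces callers
// to narrow on `status` before touching the body.
type ApiResponse =
  | { status: 200; body: { id: number } }
  | { status: 404; body: { message: string } };

function summarize(res: ApiResponse): string {
  switch (res.status) {
    case 200:
      return `found #${res.body.id}`; // body narrowed to { id: number }
    case 404:
      return res.body.message; // body narrowed to { message: string }
  }
}
```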
Generally true although 429 is often used for rate limiting so a back off and retry is appropriate. 409, 412, 428 may also be retriable depending on the specific semantics of the given situation. 421 apparently shows up commonly in HTTP/2 connection reuse and is retriable. 423 and 425 too potentially.
It would have been nice if there were an actual grouping of retriable and non-retriable codes, but in reality it's a complete mess.
But at a minimum, beware of 429. That's not a permanent outage, and it's a frequent one you might get that needs a careful retry.
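A hedged sketch of a retry policy along those lines; which of the conditionally retriable codes actually belong in the set depends entirely on your API's semantics:

```typescript
// 429 plus the handful of 4xx codes that can be retriable in some protocols
// (421, 423, 425 per the discussion above); everything else in 4xx is
// treated as a permanent client error.
const maybeRetriable = new Set([429, 421, 423, 425]);

function shouldRetry(status: number): boolean {
  if (status >= 500) return true; // server errors are usually transient
  return maybeRetriable.has(status);
}
```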
At the risk of naming an Eldritch horror, IIRC it was Cold Fusion that first adopted something like an MVC-in-querystring routing system in the late 90s or early 00s, and that eventually spread when FCGI caught on and users of other languages got used to long-running middleware processes. It seemed hella elegant at the time.
> It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters
That's not obvious at all. If I receive JSON data that contains a property I'm not aware of, I don't reject the entire document for that reason. In the case of query strings, extra query parameters might be used by other parts of the stack besides yours, so rejecting the entire request because someone somewhere else is trying to pass information to itself is the wrong approach.
As a web developer, you’re like the guy standing with a clipboard outside a fancy club, checking whether people requesting entry are allowed in or not. Basically, level 1 security.
If someone is not on the list, your job is to default to declining them access, not granting them access assuming level 2 security will handle them at a deeper layer.
It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.
The first layer of any web security should never be checking someone against a list, unless this can be done in less than a few milliseconds. It should only be sanity checking for basic compliance. In the analogy, this first layer should be denying entry to obviously drunk people, zebras, and a stampede of protesters.
>It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.
This is how the vast majority of websites work. The practical reason is obvious: when we model the behaviour our code depends on, we want to create the simplest possible model that allows our code to work as expected. Placing requirements on it that our code doesn't actually depend on is useless, unneeded, complexity.
> As a web developer, you’re the like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.
there is no security benefit to filtering out unneeded url parameters.
> there is no security benefit to filtering out unneeded url parameters.
There is: defense in depth.
If a url parameter would've been a vulnerability because something lower down the stack misinterprets it (and the param wasn't necessary for your app in the first place), then you've just left a window open for the exploit.
If the set of url params is known ahead of time (which I claim should be true), then you could make adding unknown params an error.
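Making unknown params an error is only a few lines; a sketch (whether you actually want this strictness is the debate above):

```typescript
// Reject any query string containing keys outside the endpoint's allow-list.
function hasOnlyAllowedParams(queryString: string, allowed: Set<string>): boolean {
  for (const key of new URLSearchParams(queryString).keys()) {
    if (!allowed.has(key)) return false;
  }
  return true;
}
```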
> there is no security benefit to filtering out unneeded url parameters.
What about passing extra data to fill the server's memory with either extra known junk or a script/executable, to use with a zero-day in an internal component or something?
To misuse the nightclub analogy: it’s like checking for bags not being larger than A4 and disallow knives and other weapons.
> in form-urlencoded form, people were not savages
Oh yeah? Back in the day I remember a lot of semicolons from Perl and other CGI stuff where we would now use ampersands, both in the path and in the query. (Sometimes the ? itself would be written as ;.)
Correct. In fact, the semicolon is part of the URI scheme standard, and the ampersand is just some ad-hoc thing that got adopted naturally without any standardization effort.
Yeah, URLs really don’t have much in the way of semantics. Path is clearly intended for hierarchical data and query for non-hierarchical data, and there are strong customs, some commonly supported or even enforced by libraries, but no actual rules. Ultimately, it’s just a string that the server can decide what to do with.
The really funny thing about this is that, when I was worrying about possible side effects if I responded 404, I somehow completely forgot for how much of the web's history the path has been useless. Paths have won. No one really starts new things with URLs like /item?id=… any more. Yay!
Wouldn't a generic 400 be better? It's not that the page wasn't found; you've sent something that wasn't an acceptable request. "Fix your request and try again" is how I've read it, and that's how I use it in the APIs I provide. I prefer it over 406, since it's not my end that can't process the request. If your query string is tacking on extra stuff, whether to try to break things or just because your request wasn't crafted per the docs, then it's on you.
406 would be wrong to me, as it's to be used when the client sends an Accept: header and the server cannot fulfil it. HTTP return codes get quite specific when you read the actual description and not just the name.
Interestingly, quite a few places that should treat query strings transparently make a lot of assumptions about their structure. We ran into that when picking a new CDN: some providers didn't handle repeated parameters (?a=1&a=2) correctly.
Incorrect would be processing the query string and deduping keys. Correct would be passing it through as-is, or at least only lightly processing it, like normalizing escaping.
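For reference, the standard form-urlencoded parser keeps repeated keys; a CDN that dedupes them is throwing information away:

```typescript
const params = new URLSearchParams("a=1&a=2");

// get() returns only the first value...
console.log(params.get("a")); // "1"

// ...but both values are preserved and retrievable.
console.log(params.getAll("a")); // ["1", "2"]
```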
Standards are just commonly accepted behaviour that somebody chose to write down somewhere. There are a great number of commonly accepted behaviours that nobody has ever bothered to encode into a formal standard, but where failure to follow the accepted practice will result in widespread breakage. There are also a great many "standards" that you would be a fool to follow to the letter. In the OP's case, the only thing that will break is people trying to visit their site, who will presumably just press the back button on their browser and go about their day. They can decide for themselves whether that is an acceptable casualty. But it isn't definitionally acceptable merely because no standard says it isn't (nor would it suddenly become unacceptable because a standard said it was...)
> I was pretty geared up to have a contrarian opinion until I looked at the standards but they're actually pretty clear, a 404 could be a proper response to unexpected query string; query string is as much part of the URL API as the path is and I think pretty much everyone can acknowledge that just tacking random stuff onto the path would be ill advised and undefined behavior.
This feels like a "technically correct is the best kind of correct" situation. Technically, yeah, web servers may respond 404 if they don't understand a query parameter, but in practice that is not how URLs are normally conceptualized.
Wait until you realize that the difference between path and query string is entirely arbitrary and decided by the server. Query strings should never have existed. They are an implementation detail of CGI webservers that leaked all over everything and now smells really bad.
I dunno, it seems like the fact that we arrived at a fairly standard structure for URL paths that works pretty well is not a bad outcome.
Seems a lot better than the other potential world we could have lived in, where paths were a black box and every web server/framework invented its own structure for them.
In my current project I use URIs to refer to absolutely any entity in a git(-ish) repo. Files, branches, revisions, diffs, anything. URI turns out to be a really good addressing scheme for everything. Surprise. But the most used and abused element is always the path. Query takes a lot of that mess away. Might have been unmanageable otherwise.
Grouping data by user is common and normal in computing: /home laid precedent decades ago.
Project directories are an extremely common grouping within a user’s work sets. Yeah, some of us just dump random files in $HOME, but this is still a sensible tier two path component.
The choice to make ‘view metadata-wrapped content in browser HTML output’ the default rather than ‘view raw file contents’ the default is legitimate for their usage. One could argue that using custom http headers would be preferable to a path element (to the exclusion of JavaScript being able to access them, iirc?) or that the path element blob should be moved into the domain component or should prefix rather than suffix the operands; all valid choices, but none implicitly better or worse here.
Object hash is obviously mandatory for git permalinks, and is perhaps the only mandatory component here. (But notably, that’s not the same as a commit hash.) However, such paths could arguably be interpreted as maximally user-hostile.
File path, interestingly enough, is completely disposable if one refers to a specific result object hash within a commit, but if the prior object hash was required to be a commit, then this is a valid unique identifier for the filesystem-tree contents of that commit. You could use the object hash instead of the full path within the commit hash, but that’s a pretty user-hostile way to go about this.
So, then, which part of the ordering and path selections do you consider indiscriminate, and why?
actually, instead of the object hash, you could also use the commit-hash. then the filename would be mandatory, but the url would be more readable and usable: give me the file VERBS.md as it is at commit <hash>
Which target audience of github needs extra verbosity in the commit hash, though? Once you know it you know it; if you don’t know git you aren’t the target audience; etc. Saying /user=foo is no better than ?user=foo if your audience can work it out without confusion from your unadorned paths. We have a great deal of history with filesystems showing that people are capable of keeping up with paths that lack key names if exposed to and familiar with them, and if the filesystem isn’t being constantly randomized.
what would be a better way of doing that? i am not disagreeing, but i just can't think of any way to improve on this. put everything into the query part? i prefer to use the query only for optional arguments. in this example the blob argument is the only thing that doesn't fit in my opinion.
Every object in git (commit, tree, revision of a single file) has a hash that is guaranteed unique within a repository (otherwise many more things than a web UI would break) and likely also globally. I can understand wanting to isolate repositories to prevent hash collisions from causing problems, but within a repo everything has a universally unique ID.
edit: for instance, that specific VERBS.md is represented by the blob 3b9a46854589abb305ea33360f6f6d8634649108.
> this should be sufficient to represent the file.
Except it's not, because the oid can be a short hash (https://github.com/gritzko/beagle/blob/a7e172/VERBS.md), and that means you're at risk of colliding with every other top-level entry in the repository, so you're restricting the naming of those top-level entries for no reason.
So namespacing git object lookups is perfectly sensible, and doing so with the type you're looking for (rather than e.g. `git` to indicate traversal of the git db) probably simplifies routing, and to the extent that it is any use makes the destination clearer for people reading the link.
They are following the /key/value/key/value pattern, but the first two pairs in a GitHub URL are fixed to user and project, which lets them omit the key names. I could see them not being willing to hardcode the third pair to blob.
Back when GitHub URLs were kind of cool, github.com/user/gritzko/project/beagle would have been much less cool than just github.com/gritzko/beagle.
Of course there's nothing to stop you using URIs like this (I think Angular does, or did at one point?) but I don't think the rules for relative matrix URIs were ever figured out and standardised, so browsers don't do anything useful with them.
Not entirely arbitrary - forms that use the GET method instead of POST will append form values as query params.
For sites without Javascript, it's great for things like search boxes, tables with sorting/filtering, etc. instead of POST, since it preserves your query in the URL.
It has always amazed me how much trouble the SPA folks are willing to go to in order to slowly rebuild just normal boring URLs with querystrings because users demand deep linking and back buttons and the like.
Or you could accept that you're probably going to need a round trip to the server and use a normal URL and it's fine.
For all but the absolute biggest websites in the world, anyhow. At Facebook or Google scale yeah it's needed.
Query strings existed before CGI did, and the way they're defined to be filled in from web forms is quite useful; I wouldn't want to need Javascript to fit that into path format. There's nothing wrong about having things decided by the server; I don't get that part of your argument at all.
Maybe a dumb question: how does the server "decide" anything other than what file to serve? Today we have many choices, but back in the day CGI was the first standard way to do it.
So yes, query parameters existed before CGI, but to use them you had to hack your server to do something with them (IIRC NCSA web servers had some magic hacks for queries). CGI drove standardization.
It's arbitrary to a degree like the difference between using an attribute or child element in XML, but it's not entirely arbitrary. If you want to include data in the URL that's not part of the hierarchy of the path, query strings are good for that.
So my understanding is, he is annoyed that other websites add a query string such as "?ref=origin.com" to links pointing to the author's website.
How does this benefit the other website? How does it hurt the author's website?
I am completely confused about the behavior of both sides here.
I get that when I run an ad campaign I want Google to add a UTM query string, so I can track which campaign users arrived from; but then the origin and the destination are working together. Here the origin just adds stuff for no reason. Why?
It's marketing for the origin site. The line of thought is that the author sees significant traffic from xyz.com in the ref query string, and considers advertising or partnering with the origin site.
Honestly, it is quite useful for niche/startup sites. I have been on both ends of conversations that began from seeing these in web analytics (as someone that saw incoming traffic from a site and reached out, and as someone that received contact from a site I linked to) - and both times it ended in a mutually beneficial partnership.
I can understand the privacy argument to some degree, but it provides no more information than the standard Referer header (and if you use analytics like Simple Analytics/Plausible, it is a lot more visible).
I’m broadly anti-tracking: it’s generally against the interests of the individual.
Query string additions are commonly used to track things. You can see that lots of people don’t want that by the existence of Firefox features like “copy clean link” and Extended Tracking Protection which proactively strips some like UTM parameters.
Some sites happily participate in what I will glibly call the tracking economy. They may benefit because the recipient will see in their logs that lots of people are coming from their site, and do something that helps their site because of that.
My rejecting query strings is a simple form of protest against that system.
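The "copy clean link" behaviour mentioned above is easy to sketch; the parameter list here is a small, illustrative subset of what the real browser features strip:

```typescript
// Strip common tracking parameters from a URL, keeping everything else.
function cleanLink(raw: string): string {
  const url = new URL(raw);
  // Copy the keys first, to avoid mutating the collection while iterating it.
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith("utm_") || key === "ref" || key === "fbclid") {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}
```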
I understand being for privacy, but on the flip side, information about you can result in a better experience. E.g. in the case of tracking where a person comes from, that can help those two websites improve by coordinating with each other in some way. Or the ads might actually show you something you didn't know existed that you end up buying. That's probably better than seeing ads you likely have zero interest in. I'll admit it's creepy when an ad is incredibly tuned to your recent internet activity, though.
If you have a popular website and you add that parameter the target easily sees who sends them traffic, that could be the base of sponsorships / affiliate arrangements for example.
But you see that anyway from access.log, or whatever your server supports, and your dashboard/analytics shows it anyway? What's the benefit of adding the origin to the query string?
Some web pages don't send referrers by making all links rel="noreferrer". Mastodon used to do this by default, though now they've changed their stance.
Links opened from non-browser apps don't have any referrer information either. E.g. if somebody shares your link on iMessage, WhatsApp, or Telegram.
Email clients may also strip out referrers, but I'm not entirely sure about this one.
If people read your work via RSS readers, you'll almost certainly not get any referrers. Unless it's a web-based reader like Feedly.
My website gets a lot of traffic marked as "Direct / None" by Plausible. I suspect this is traffic from RSS readers or Mastodon, but I can't be sure. A few times I've considered adding a "?ref=RSS" to all URLs served to RSS readers and "?ref=Mastodon" to everything I post on Mastodon. But like the author of this post, I feel uncomfortable tracking my readers like this.
Adblockers like umatrix have options like "Spoof Referer header". I have this setting enabled, so adding tracking query strings to URLs would go against my user preference.
> It is a small, decentralised, self-hosted web console that lets visitors to your website explore interesting websites and pages recommended by a community of independent personal website owners.
Back in the Stone Age, we called these “Webrings,” but they weren’t as fancy.
One of the issues that I faced, while developing an open-source application framework, was that hosting that used FastCGI would not honor Auth headers, so I was forced to pass the tokens in the query. It sucked, because that makes copy/paste of the web address a real problem: it would often contain tokens. I guess maybe this has been fixed?
In the backends that I control, and aren’t required to make available to any and all, I use headers.
> an open-source application framework, was that hosting that used FastCGI, would not honor Auth headers
So you were writing your application as a fcgi-app, and (e.g.) Apache was bungling Auth headers? Can you expand on this? Curious about the technical detail of (I guess) PARAM records not actually giving you what you expect?
I don’t remember, exactly. Long time ago (I stepped away from that project many years ago).
I just remember the auth headers never showing up in the $_SERVER global (it was a PHP app). This was what I was told was the issue. They made it sound like it was well-known.
This is because of a deeply annoying default in Apache, where for "security reasons" the underlying script doesn't get to see auth details that might already be handled by Apache. At some point they added the CGIPassAuth directive[1] but all kinds of other workarounds are floating around on the internet.
>So I’ve decided to try a blanket ban for this site: no unauthorised query strings.
His site returns (I think incorrectly) a 414 if a request includes a query string. If this protest is meant to advocate for the user, who presumably wasn't able to manage that string in the first place, why would you penalize them for it being there?
Why not just use it as a cue to tell users how they can make this decision themselves (e.g. through browser tools)?
"You could argue that I’m abusing 414 URI Too Long. I respond that it’s funnier this way. Other options I considered were:
400 Bad Request, the generic client error code, which is correct but boring;
402 Payment Required, and honestly if you want to pay me to make a particular URL with query string work, I’m open to it;
404 Not Found, but it’s too likely to have side effects, and it doesn’t convey the idea that the request was malformed, which is what I’m going for; and
303 See Other with no Location header, which is extremely uncommon these days but legitimate. Or at least it was in RFC 2616 (“The different URI SHOULD be given by the Location field in the response”), but it was reworded in 7231 and 9110 in a way that assumes the presence of a Location header (“… as indicated by a URI in the Location header field”), while 301, 302, 307 and 308 say “the server SHOULD generate a Location header field”. Well, I reckon See Other with no Location header is fair enough. But URI Too Long was funnier."
I don't think it's an abuse, RFC9110 defines 414 as a response for "refusing to service the request because the target URI is longer than the server is willing to interpret". Since adding a query string involves only adding characters, this seems fine; there's no stipulation as far as I can tell that all pages a server hosts must adhere to the same length. I'd be curious if any well-known clients interpret it that way though, and make caching decisions based on it. As far as I know, they shouldn't.
Obviously it's against the spirit of the thing, but I don't think it's wrong per-se.
I reckon it is still an abuse. I am willing to interpret longer target URIs… so long as they don’t contain a question mark. /no-query-strings is longer than /?.
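The blanket ban itself is tiny; a sketch (the handler shape is invented, not the author's actual code):

```typescript
// Reject any request whose target URI contains a query string,
// per the site's "no unauthorised query strings" policy.
function screen(rawUrl: string): { status: number; reason: string } {
  return rawUrl.includes("?")
    ? { status: 414, reason: "URI Too Long (no query strings here)" }
    : { status: 200, reason: "OK" };
}
```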
Opinions vary on whether the robustness principle is a good idea. That is why, for example, the XML standard requires a conforming parser to reject invalid XML.
In our modern world, the robustness principle has become an invitation to security bugs and vendor lock-in. Edge cases sneak through one system thanks to robustness, then trigger unfortunate behavior when they hit a different system. Two systems try to do something reasonable with an ambiguous case, but do it differently, leading to software that works on one failing to work on the other.
I generally agree, but I don't think XML is the best example. Getting HTML out of XML is considered to have been the right move, isn't it? I was pro-XHTML2 at the time, but in retrospect, have we suffered much for not showing webpage validation errors to end users?
Once people have gotten used to not having to conform, forcing them to conform is an uphill battle. Doubly so when, as happened with Microsoft and IE, the vendors would like to encourage vendor lock-in. The only time to reasonably do it is at the start.
That said, we are paying a huge complexity cost due to our efforts to allow nonconforming pages. This complexity is widely abused by malicious actors. See, for instance, https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Ev... for ways in which attackers try to bypass security filters. A lot of it is only possible because of this unnecessary complexity.
But this is robust? I mean, it's pretty clearly stating that you are visiting an unsupported URL. It gives the user direction on what to do about it. It does not crash the browser or the server. In pretty much every dimension this is highly robust.
The robustness principle is itself bad manners, in plenty of contexts. If I deliver packages by throwing them at the customer, I really want a customer to tell me "hey, don't throw packages at me!" before I attempt to lob something fragile and breakable, or something heavy at someone fragile and breakable. Otherwise, how am I supposed to learn that I'm doing anything wrong?
It's been years, but I seem to remember there was a version of PL/SQL Server Pages that would return a 500 if you tried to pass in an unknown query string.
Just straight "400" ("Bad Request") or "403" ("Forbidden") would also probably be defensible. Odd that there aren't any error response codes specific to URI parameters.
Several options which seem like they might be appropriate aren't on close examination:
- "406" ("Not Acceptable") which is based on content-negotiation headers.
- "409" ("Conflict") which is largely for WebDAV requests.
- Others such as 411, 422, and 431 are also for specific conditions which aren't relevant here.
- 3xx or 5xx codes are inappropriate as this isn't a redirection or a server-side failure; it's a client-side request problem.
I think either 400 or 404 would be fine. 400 because the request isn't in the expected format, 404 because a resource with that query string doesn't exist.
Of course they do. For example you can lower a string from the top to query the fill level. Or you can wrap a string around the pot to query the circumference.
The tone of this and Chris's post gives me the impression that it's harmful to include these query parameters, but I don't understand how. Could someone enlighten me? I understand it can mangle some URLs, and that's a good enough reason not to do it, but even then it seems like a minor inconvenience.
The technical purist: you’re modifying a URL in a way that, while in line with accepted custom, is technically incorrect. URLs should (the least effective type of should) basically be treated as opaque.
Social: it’s tracking stuff, sibling comment trees are good, I won’t reiterate.
Clutter: it’s getting in the way of the bit you should care about, and contributing to normal people no longer caring about URLs because they’re too hard, too complex.
There are a lot of reasons I might not want a site to know where I came from to get to their site. It is basically sharing your browsing history with the site you are visiting.
Because of this, there have been a lot of updates to the http referer header, with restrictions on when it is sent, and an ability to opt out of the feature entirely.
Adding a url parameter with the same information bypasses any of these existing rules and ability to opt out. They should just use the standard.
If I release a video and send an email newsletter at the same time, which one caused the traffic increase? Should I invest in making more videos or in sending more emails?
If you insist on knowing, include a different URL in each that goes to the same place, and use your damn server logs. You don't need Google Analytics or whatever.
Presumably you control the URLs you are sending in the email. As a result, if you want to use query strings there, that's fine. The issue only arises when you use query strings to implement tracking on someone else's site instead.
What's interesting is that none of these sites have a "search" feature. Which is an important accessibility feature and a clear and legitimate use case for a query string.
Oh, I have a couple - the users did not agree on being tracked (these query params are tracking information), and the site administrator does not want incoming traffic to be tracked. I know the latter can be hard to understand, but I for example sure as hell do not want to have any info in my logs that can be used to harm my users.
On a more personal note, I hate it when I go to copy a link to send via a message, and the tracking code glued onto it is twice as long as original URL... I either have to fiddle around with it to clean it up or leave the person I sent it to to wonder wtf am I on about with a screenful of random characters...
So it's violating users' privacy, it's shit UX, and on top of that, nobody asked for it...
Query strings are useful for way more than just tracking. Saving and servicing search queries is a way more common use case. So assuming it's only useful for tracking is very misleading.
Query strings are probably the least invasive tracking. They are transparent, obvious, and anonymous. Users are free to strip out and edit query strings if they don't want them.
More to the point, I can essentially do the same thing with HTTP routing - create an infinite number of unique URLs for tracking purposes. In that regard calling out query strings specifically for essentially the same thing but more transparently seems like splitting hairs.
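For anyone who does want to strip them out, a minimal sketch using only Python's standard library (the parameter list is illustrative, not a complete inventory of tracking keys):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Common tracking parameters; illustrative, not exhaustive.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content", "fbclid", "gclid", "igsh"}

def strip_tracking(url: str) -> str:
    """Return the URL with known tracking parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Non-tracking parameters survive untouched, which is exactly the transparency advantage: the user can see and edit what is being passed.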
Yeah I do not see an alternative way to easily copy paste links with things like filter settings saved.
Filters especially make sense as query params, as they are non-sequential but still visually readable as to what they do.
URL slugs make sense for sequential pages that are hierarchical, but make no sense for non-hierarchical data/routes.
Services can force tracking into links by encoding the whole url into a shortlink that makes it impossible to just remove the tracking alone as everything is encoded into a shorter non editable string.
Thank you for explaining to me that query parameters can be used for other purposes apart from tracking. The articles in question though, are railing against query parameters being abused for tracking purposes - passing referers (sic) and UTM by adding them to URLs of sites that neither process them, nor want them.
Referral query strings are not for tracking though. The person putting them on the links gets nothing out of them. There is no PII being shared. They are purely added out of courtesy.
If I am handing out maps to your address, letting people know who is publishing the map is generally a good thing.
This is like saying having a return to sender address on mail is an invasion of privacy.
Maybe an alternative would be to inconvenience people following such links still, but somewhat less.
Instead of responding with an error, give a page that states “The link you followed to get here appears to have had some tracking gubbins added, in case you are a bot following arbitrary links, and/or using random URL additions to look like a more organic visit, please wait while we run a little PoW automaton deterrent before passing you on to the page you are looking for.” then do a little busy work (perhaps a real PoW thingy) before redirecting. Or maybe don't redirect directly, just output the unadorned URL for the user to click (and pass on to others). This won't stop the extra gubbins being added of course, but neither will the error and this inconveniences potential readers less.
This basically boils down to "reject any incoming links from facebook, pinterest, chatgpt, linkedin, twitter, reddit, youtube, etc". I guess sure? There's a once-famous guy who shows goatse to all referring links from HN. I guess if you get enough traffic that you can pick which sources you want to allow, that's a good problem to have.
How much do platforms mangle people’s links? Figured I’d check the ones you mention. (I was actually mildly surprised to be able to find examples in all of them without needing to log in once. I thought LinkedIn and ChatGPT wouldn’t.)
ChatGPT: ?utm_source=chatgpt.com. (Aside: wow it’s confidently and atrociously wrong if you ask it about me. Ask it just vaguely enough, and it hallucinates someone clearly inspired by me, but who has done a whole lot of stuff that I haven’t. Ask it more precisely about me, and it gets all kinds of details wrong still. I feel further vindicated in hating this stuff. You made me use ChatGPT for the very first time.)
LinkedIn: no.
Twitter: no.
Reddit: no.
YouTube: no.
> if you get enough traffic that you can pick which sources you want to allow, that's a good problem to have.
Nah, I just don’t care about them. It’s my place, I’m doing things on my own terms. Should I discover it to be causing me problems, I’ll burn that bridge when I come to it.
But instead of the banned query strings message, it just returns a very sassy not-a-404 page. Once again, this is violating a common convention, but there's nothing in the HTTP spec that requires treating these URLs the same. Similarly the site also 404s when you add extra slashes like https://chrismorgan.info///no-query-strings
digression: I love trying "domain.com//" on various sites. Occasionally it'll trigger weird errors like a 502 or 500.
You’re misreading a couple of things. There’s no violation of common convention where you’re pointing.
Where dealing with static file servers:
For URLs that are supposed to include a trailing slash and the server will find that directory and serve its index.html: it’s customary, though not ubiquitous, to redirect from no-slash to slash. (Some, including popular commercial services, serve the index.html file instead of redirecting to add the slash. This is extremely wrong because it changes the meaning of relative URLs.)
But the other way round is not common.
My URLs don’t include a file extension, and I think that’s influencing your perception into thinking no-query-strings is logically a directory name. But it’s not, it’s logically a file name, just with the .html removed as unnecessary.
Take https://susam.net/no-query-strings.html as an example; Susam is more clearly just serving from a file system than I am, and leaves the “.html” file extension in the URL. Do you expect https://susam.net/no-query-strings.html/ to work? I hope not. It’s a 404, just as I’d expect, because there is no directory with that name.
> not-a-404 page
No, that’s a 404, just a plain old boring 404, same as any other. In fact, it’s the same 404 page I’ve been using since 2019, just with dark mode support added.
> digression: I love trying "domain.com//" on various sites.
Closely related is adding the trailing dot of a fully-qualified domain name: https://example.com./. I didn’t remember to try this on my new site, but it turns out Caddy won’t talk at https://chrismorgan.info./, so that’s probably good.
I also contemplated making it serve the content of the original page, but with every . and ! turned to ? (or maybe ! changed to ‽). But I decided that was too hard at present. I might revisit it later, there’s a stage in my plans where that will be easier to implement.
I was just thinking about something like this. Instagram and TikTok are major offenders, not everyone wants their personal info blasted everywhere because they copied a link. It would be great to have an iOS shortcut that automatically removes them, it's something I'm going to look into.
This is great! I love one-liners that are readable like this. This made me wonder if there are any extensions that run a script on every page load for a web browser. I'm currently experimenting with userscript managers and plan on including your code as an additional security measure against tracking.
Query strings do have uses (such as for searching files and some other kind of dynamic files), but you shouldn't add them to URLs that should not expect them. So, I agree that they would be right to refuse requests with UTM and other stuff added like that.
I think 404 probably makes the most sense as the response if a query string is not expected but is present anyways, although 400 might also be suitable.
"wander console" sounds like they're just web rings re-invented. In the era of forced feeds by giant corporations which consist of the things they want you to see, I've wondered if this old idea would make a comeback. Human curated content from trusted people seems like the only way forward.
I'll start with the clarification that the moderators have changed the URL of the original post from <https://susam.net/no-query-strings.html> to <https://chrismorgan.info/no-query-strings>. Hopefully, this will prevent any confusion about why we are discussing random walks in a post about query strings. Now let me answer your question.
> Is it not a random walk? Might sound pedantic but if there is graph structure I am interested.
The network is a directed graph. Every Wander Console declares a few other consoles as its neighbours. The person setting up the console decides who they want to list as their neighbours. So if we call the network graph X, then the set of vertices is:
V(X) = the set of all URLs that point to Wander Consoles
and the set of directed edges is:
E(X) = {(u, v) in V(X) × V(X) : u declares v as its neighbour}
The traversal between consoles is not strictly a random walk. If I could call it something, I would call it randomised graph exploration with frontier expansion. On each click of the 'Wander' button, the tool picks one console at random from the set of discovered consoles and visits that console. It then fetches the neighbours declared by that console and adds any newly discovered consoles to the set.
The difference from a random walk is that the next console is not chosen from the neighbours of the last visited console. It is chosen from the whole set of consoles discovered so far. In other words, each click expands the known part of the graph, but the console used for that expansion is selected randomly from all discovered consoles, not just from the last console visited.
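The behaviour described above can be sketched roughly like this; `neighbours_of` stands in for fetching a console's declared neighbours, and all of the names here are mine rather than the tool's actual code:

```python
import random

def wander(start, neighbours_of, clicks, seed=0):
    """Randomised graph exploration with frontier expansion.

    Unlike a random walk, each step picks uniformly from *all* consoles
    discovered so far, not just the neighbours of the last one visited.
    """
    rng = random.Random(seed)
    discovered = {start}          # the known part of the graph
    visits = []
    for _ in range(clicks):
        # Choose from the whole discovered set, not current neighbours.
        current = rng.choice(sorted(discovered))
        visits.append(current)
        # Expand the frontier with anything this console declares.
        discovered.update(neighbours_of(current))
    return visits
```

With a strict random walk the `rng.choice` would instead be over `neighbours_of(visits[-1])`, which is the whole difference.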
"I don’t like people adding tracking stuff to URLs" and "You abuse your users by adding that to the link" and "no unauthorised query strings" and "At present I don’t use any query strings" but for some reason ?igsh, which i'm pretty sure is an instagram tracking parameter, is allowed. weird
There’s nothing ruder in hypertext etiquette than giving someone a link to navigate to someone else’s HTTP server, where you have manipulated that URL in some way unsanctioned by the server you are sending them to.
You can’t just send arbitrary query string parameters to a server and assume they will just ignore them. Just like you can’t just remove query string parameters and assume the URL will work.
In fact, you usually can just send arbitrary query string parameters to a server - that's why the behavior is so common, and often useful.
Most sites don't mind or break, some sites get value from the behavior in ways hard to replicate in other ways – and those sites that don't like such additions can easily ignore them. And a few lines of code will work better than ineffectually appealing to manners, when the freedom of the web's form of hypertext, and protocols, gives the outlink authors full freedom to craft URLs (and thus requests) however they like.
Crafting outbound links with your own additions and handing them out to visitors to your site is similar to the practice of writing someone’s phone number on the door of a bathroom cubicle with ‘for a good time call:’ written above it.
You’re handing out someone elses’s contact details, but giving the person you hand them to a completely fabricated expectation for how the interaction will go.
Adding query strings is one of those things that I think a lot of sites could get away with more easily if they were reasonable about it.
A link that is "https:// web.site" is fine.
A link that is "https:// web.site?via=another.site" is fine.
A link that is "https:// web.site?fbm=avddjur5rdcbbdehy63edjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63edaaaddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednzzddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63edn"
is annoying as shit and I need to literally apologize to people after sending it if I forget to manually redact the query string. Don't abuse this.
This is not the first site to do so. A few years back, scarygoround.com started blocking query strings, although it seems to have stopped doing so now. Back then, Facebook had started to add ?fbclid=... to every outgoing link.
>Want to share an amazon product on a chat to discuss about it. I would have liked a nice short url that I can copy, instead I get a monstrosity, it forces me to manually select only the id portion of it if I want to share it.
Most of the sites that still use GET queries around here are the tax collection sites run by local governments, which pass those variables around after you login the way your mom... uh, it's HN, skip the mom joke.
I actually get a lot more annoyed by routing parsers that do the same thing GET query strings do, only pretending to be a real URL.
A referrer policy of strict-origin-when-cross-origin gives a host-level Referer (sic) header in most mainstream browsers unless the user has configured otherwise, right? That's usually enough for web authors to know what audience they're appealing to, and privacy-maximizers can turn off sending that header.
Indeed, that's not a query string! The #, and following text, is a fragment, is client-side only, and isn't the subject of the blogpost. Neither is percent encoding, which is just another way to send the exact same path from your browser to the server.
Note that it has nothing to do with the length of the URL. That's just the error message he's chosen to use, because "4xx stop pissing about with my URLs" doesn't exist in the spec.
YouTube is also quite famous for their source identifiers, especially with the short URLs; the tracking part is longer than the URL I'm trying to share.
I was just wondering if I should do something like this. I use a couple query string values and I validate them and issue a 40x if the value is invalid. So, I was wondering if I should issue a 40x for an unused query string val.
Query strings break unpredictably, and that alone is enough to ban them by third parties, especially for something as minor as referral tracking.
Example: The Browser is a well known link aggregation paid periodical. I subscribe, and every 1 in 10 or 20 links I clicked, it'd just break outright and I'd have to tediously edit the URL to fix it (assuming the website didn't do a silent ninja URL edit and make it impossible for me to remember what URL I opened possibly days or weeks ago in a tab and potentially fix it). This was annoying enough to bother me regularly, but not enough to figure out a workaround.
Why? ...Because TB was injecting a '?referrer=The_Browser' or something, and the receiving website server got confused by an invalid query and errored out. 'Wow, how careless of The Browser! Are they really so incompetent as to not even check their URLs before mailing an issue out to paying subscribers?'
I wondered the same thing, and I eventually complained to them. It turns out, they did check all their URLs carefully before emailing them out... emphasis on 'before', which meant that they were checking the query-string-free versions, which of course worked fine. (This is a good example of a testing failure due to not testing end-to-end or integration testing: they should have been testing draft emails sent to a testing account, to check for all possible issues like MIME mangling, not just query string shenanigans.)
After that they fixed it by making sure they injected the query string before they checked the URLs. (I suggested not injecting it at all, but they said that for business reasons, it was too valuable to show receiving websites exactly how much traffic TB was driving to them on net, because referrers are typically stripped from emails and reshares and just in general - this, BTW, is why the OP suggestion of 'just set a HTTP referrer header!' is naive and limited to very narrow niches where you can be sure that you can, in fact, just set the referrer header.)
But this error was affecting them for god knows how long and how many readers and how many clicks, and they didn't know. Because why would they? The most important thing any programmer or web dev should know about users is that "they may never tell you": https://pointersgonewild.com/2019/11/02/they-might-never-tel... (excerpts & more examples: https://gwern.net/ref/chevalier-boisvert-2019 ). No matter how badly broken a feature or service or URL may be, the odds are good that no user will ever tell you that. Laziness, public goods, learned helplessness / low standards, I don't know what it is, but never assume that you are aware of severe breakage (or vice-versa, as a user, never assume the creator is aware of even the most extreme problem or error).
Even the biggest businesses.... I was watching a friend the other day try to set up a bank account in Central America, and clicking on one of the few banks' websites to download the forms on their main web page. None of the form PDF download links worked. "That's not a good sign", they said. No, but also not as surprising as you might think - the bank might have no idea that some server config tweak broke their form links. After all, at least while I was watching, my friend didn't tell them about their problem either!
I don't see how your example, The Browser (thebrowser.com), supports your argument that ad-hoc query-string additions are so prone-to-breaking that 3rd parties should ban them.
In fact, the example seems to suggest the opposite: a 17+ year successful paid subscription business – to which you appear to be a generally-satisfied customer! – receives enough "business value" from the practice, despite its failure modes, they don't want to stop. Improving their probe of the risk-of-failure was enough.
Seemingly, the practice works often enough, pleasing more destination sites than it angers, that "referral tracking" is not something "so minor".
> Improving their probe of the risk-of-failure was enough.
The point was it was dangerous in a way they didn't even realize was an issue, for a thin business rationale. Unless you are going to do thorough tests and understand the risk you are taking (which they did not, as evidenced by screwing it up systematically at scale for years), you should not be doing it.
And it's not obvious that they are correct in their tightened-up testing, because even if a link is correct at the time they test it, it could break at any time thereafter.
> to which you appear to be a generally-satisfied customer!
No matter what _X_ is, _X_ would have to be a pretty epic screwup to make a customer unsubscribe solely over that! I never claimed it was such a major epic screwup that it could do that. So that is an unreasonable criterion: "well, you didn't outright quit, so I guess it can't be that bad." Indeed, but I never said it was, and somewhat bad is still bad; I was in fact fairly annoyed by the random breakage, and at the margin, everything matters. If TB did a few other things, in sum, they could potentially convince me to let my subscription lapse. An annoyance here, a papercut here, and pretty soon a generally-satisfied customer is no longer so satisfied...
Referrer is sometimes nice to know. If your site gets a traffic spike from an email newsletter that traffic won't correctly identify the source in the http headers.
> curl, for example, seems to illegitimately strip a trailing question mark
(could be only for the command line, didn’t test library usage).
umm what? I don't know what they're actually sending when they think this, but if you think curl is broken you should re-think that; maybe you're the one doing something wrong.
Here are some examples showing curl not stripping question marks (obviously), I am very curious what this person was actually seeing
Query strings are awesome. Especially for one-page applications.
I build a lot of internal applications, and one of my golden UI rules is that a user should be able to share their URL and other users should be able to see exactly what the sender did.
So if you have a dashboard or visualization where the user can add filters or configurations, I have all of their settings saved automatically in the URL. It's visible, it's obvious, it's easy, it's convenient.
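A rough sketch of that round trip with Python's urllib, purely for illustration (the filter names are invented):

```python
from urllib.parse import urlencode, parse_qsl

def filters_to_query(filters: dict) -> str:
    """Serialise dashboard filter settings into a shareable query string."""
    return urlencode(sorted(filters.items()))

def query_to_filters(query: str) -> dict:
    """Recover the settings from a pasted URL's query string."""
    return dict(parse_qsl(query))
```

A user with `{"region": "EU", "status": "open"}` selected gets a URL ending in `?region=EU&status=open`; whoever they paste it to parses the same settings back out and sees the same view.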
>There is also a moral question here about whether it is okay to modify a given URL on behalf of the user in order to insert a referral query string into it. I think it isn't.
These dogmatic technical screeds are all so weird to me. They usually reveal more about the author's lack of experience or imagination than they provide a useful truism.
> If I wanted to know I’d look at the Referer header; and if it isn’t there, it’s probably for a good reason. You abuse your users by adding that to the link.
The reason is that the referrer headers are a usability and privacy nightmare. It's weird for the author to jump to such a conclusion.
This referral information is being done purely as a courtesy to the webhost. If we imagined a world in which ChatGPT or Wikipedia launched massive hugs of death on referral links without attributing themselves, that is a much, much worse outcome.
There's a referrer header, if the client wishes to send it. If they don't, the "courtesy to the web host" is done at the expense of the client. This particular web host takes umbrage at other sites taking advantage of their clients that way, which seems reasonable to me.
A relatively minor concern is that query strings create a new cache entry, both in the browser and typically in server-side caches unless configured otherwise, so you might want to use URL fragment parameters if the parameters are only used by client-side JavaScript and the server response is the same.
It also makes me wonder what other noxious online behaviours might be addressed through ... creative ... client-side responses similar to this.
We've already seen, for years, sites attempting to socially-condition people over the use of ad-blockers and Javascript disablers. No reason why the Other Side can't fight back as well.
> After I implemented that feature, a page from one of my favourite websites refused to load in the console... the third URL returns an HTTP 404 error page. The website uses the query string to determine which one of its several font collections to show.
Yes, let's unilaterally decide that query strings are bad because one website (ab)uses query strings to load different fonts.
It's the query strings that are the problem, not the website!
jfc.
Look, I'm against utm fragments as much as the next guy, but let's not throw away a perfectly good thing because tracking is evil.
Adding your own garbage to someone else's URLs is in fact the problem. Could they handle your garbage better? Sure. Is your garbage still a problem? Yes.
Postel's law worked OK when people operated in good faith. But today the internet is full of abusers. Rejecting requests that aren't exactly what they should be is probably the best policy now.
Postel's law is typically stated as "be conservative in what you do, be liberal in what you accept from others". It's unfortunately common for people to ignore the first half and hallucinate a third clause demanding that the recipient stay silent about the errors they receive.
That website is not abusing query strings, though, its usage of query strings is perfectly cromulent. And tfa is not saying not to use query strings, but not to append random garbage to other people's URLs.
The website uses the feature for its intended purpose. Adding random trash to the query string of another website assuming it'll ignore it is in fact a bad idea, always, even if you can usually get away with it.
Trying to bootstrap some taboo against novel unpermissioned URL munging is silly prudishness.
Ensuring both sides of a hyperlink agree/consent was a design flaw that limited the uptake of pre-web hypertext systems. The web's laissez-faire approach demonstrated a looser coupling was far better for users, despite all the new failure modes.
Of course any site/server has the practical power free to treat inbound requests as rigorously (or harshly) as they want. But by the web's essential nature, it is equally part of the inherent range-of-freedom of outlink authors to craft their URLs (and thus the resulting requests) however they want. URLs are permissionless hyperlanguage, not the intellectual property of entities named therein.
Plenty of sites welcome such extra info, and those that don't want it can ignore it easily enough – including by just not caring enough about the undefined behavior/failures to do anything about them.
Though, when a web publisher has naively deployed a system that's fragile with respect to unexpected query-string values, they should want to upgrade their thinking for robustness, via either conscious strictness or conscious permissiveness. Thereafter, their work will be ready for the real web, not just some idealized sandbox where scolding unwanted behavior makes sense.
Running your own small website is a constant battle against grifters and bad online etiquette. When people hotlink images, I usually make a point of having some personal fun with mod_rewrite.
It's just a string though. A project that I'll never get to is a custom webserver so that QR codes can use the smaller character set: it could link to a URL with parameters without forcing the larger character set.
That does not make a lot of sense. Yes, you can do what you want with your website, but query strings are a way for users to ask for additional information or behaviour. I use them on my own websites for extra flexibility. For instance:

foobar.com/ducks?pdf

That will download the page content as a formatted .pdf file. I can give many more examples. I can't agree with "query strings are horrible" at all. His websites don't allow query strings? That's fine. But in no way does that mean query strings are useless. Besides, what does it even mean to "ban" them? You simply don't respond to query strings you don't want to handle; we do that via general routing in web applications these days.
This isn't relevant when talking about links to his site. This is relevant when talking about links to your site.
> Besides, what does it mean to "ban" it? You simply don't respond to query strings you don't want to handle.
It means that you're going to get some sort of 400 error when you follow a link to his site with a query string attached to it. He simply will not respond to query strings that he doesn't want to handle, which is all of them.
I mean…the site that broke should know what to do with arbitrary query strings. If your site breaks when someone puts in an invalid query string, that’s on you?
[0]: https://url.spec.whatwg.org/#application/x-www-form-urlencod...
[1]: https://url.spec.whatwg.org/#url-class
If it's an API, a 200 with an empty JSON object or array in the body is legitimate as well, but a 204 is explicit.
/books/1 could return 200 or 404 depending on the existence of book #1; here it makes sense because if /books/1 does not exist the API must say so explicitly. However, 404 belongs to the 4xx family, which means "client error". Is it an error to ask for a non-existing book? If you enter a bookshop and ask for a book they don't have, you did not "make a mistake". It's not as if you asked for a chainsaw. But in an API, especially with hypermedia, you are not supposed to request a resource that does not exist (unless the API provided a link to an existing resource that was deleted before the caller tried to reach it).
If you ask for a book they don't have it's a different matter.
In any case, when you ask for a book in a library you are using their "search" endpoint. The equivalent of opening a books/1 URL would be asking for a specific instance of a book by serial number or so. Then it's clear that you made a mistake if you do that for a nonexistent serial number...
/users/ returns a 404 in an API means that this resource does not exist. As in, this is not a part of the API.
/users/123 returns a 404 means this user record does not exist.
Yes this means that a 404 is context dependent but in a way that makes it easier for a human to think of and reason about.
Lots of REST libraries that I’ve used treat any 400-range response as an error, so generating a 404 for an empty list would just create more headaches.
Responses with status codes in the 400 range are client errors, so the client shouldn't retry the same request. So a 404 is appropriate despite how annoying a library might be at handling it. Depending on which language/ecosystem you are using, there are likely more sane alternatives.
Although I do feel like I've seen too many instances of a 404 being used for an empty collection where it would make more sense to return `[]` and treat it as an expected (successful) state.
It would have been nice if there were an actual grouping of retriable and non-retriable codes, but in reality it’s a complete mess.
But at a minimum beware of 429. That’s not a permanent outage and is a frequent one you might get that needs a careful retry.
400 is the general “bad request” client error, indicating something is wrong with the request but not being specific about what.
404 is simply a more specific client error: it means the client asked for a resource that couldn’t be found.
That's not obvious at all. If I receive JSON data that contains a property I'm not aware of, I don't reject the entire document for that reason. In the case of query strings, extra query parameters might be used by other parts of the stack besides yours, so rejecting the entire request because someone somewhere else is trying to pass information to itself is the wrong approach.
As a web developer, you’re like the guy standing with a clipboard outside a fancy club checking whether people requesting entry are allowed in or not. Basically, level 1 security.
If someone is not on the list, your job is to default to declining them access, not granting them access assuming level 2 security will handle them at a deeper layer.
It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.
This is how the vast majority of websites work. The practical reason is obvious: when we model the behaviour our code depends on, we want to create the simplest possible model that allows our code to work as expected. Placing requirements on it that our code doesn't actually depend on is useless, unneeded, complexity.
> As a web developer, you’re the like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.
there is no security benefit to filtering out unneeded url parameters.
there is - security in depth.
If a url parameter would've been a vulnerability because something lower down the stack misinterprets it (and the param wasn't necessary for your app in the first place), then you've just left a window open for the exploit.
If the set of url params are known ahead of time (which i claim should be true), then you could make adding unknown params an error.
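Making unknown params an error is only a few lines if the allow-list is known ahead of time. A minimal Python sketch (the `ALLOWED_PARAMS` set and function name are hypothetical, just to show the shape):

```python
from urllib.parse import urlsplit, parse_qsl

ALLOWED_PARAMS = {"page", "sort", "q"}  # hypothetical allow-list

def validate_query(url):
    """Return (status, unknown): 200 if every param is known, else 404."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    unknown = {k for k, _ in pairs if k not in ALLOWED_PARAMS}
    return (404 if unknown else 200, unknown)
```

Whether the error should be 404 or 400 is exactly the debate elsewhere in this thread; the mechanism is the same either way.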
What about passing extra data to fill server memory with junk, or to smuggle in a script/executable for use with a zero-day in an internal component, or something like that?
To misuse the nightclub analogy: it’s like checking that bags are no larger than A4 and disallowing knives and other weapons.
Oh yeah? I remember a lot of semicolons from Perl and other CGI stuff where we would now use ampersands, back in the day, both in the path and in the query. (Sometimes the ? itself would be written ;.)
The really funny thing about this is that, when I was worrying about possible side effects if I responded 404, I somehow completely forgot that for much of the web’s history the path was useless. Paths have won. No one really starts new things with URLs like /item?id=… any more. Yay!
So en.wikipedia.org/wiki/// is the article about C++ style comments
Though there are “smart” CDNs that will resize images etc.; all bets are off for those.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...
effectively lets you specify what parts of a query are relevant. So for example
url?a=b&c=d matches url?c=d&a=b in terms of caching
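You can approximate that behaviour server-side by normalising the query string before using the URL as a cache key. A hedged Python sketch of the same idea (not the header mechanism itself; `cache_key` and the `relevant` parameter are my own names):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

def cache_key(url, relevant=None):
    """Normalise a URL for cache lookup: sort the params so order is
    irrelevant and, if `relevant` is given, drop everything else."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if relevant is not None:
        params = [(k, v) for k, v in params if k in relevant]
    return urlunsplit(parts._replace(query=urlencode(sorted(params))))
```

With this, `url?a=b&c=d` and `url?c=d&a=b` produce the same cache key, and irrelevant params (tracking junk, say) can be excluded entirely.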
This feels like a “technically correct is the best kind of correct” situation. Like, technically, yeah, web servers may respond 404 if they don’t understand a query parameter, but in practice that is not how URLs are normally conceptualized.
Seems a lot better than the other potential world we could have lived in, where paths were a black box and every web server/framework invented its own structure for them.
It’s your website. Have fun with it! Do dumb things! :-)
MII//epi is converted to MII/epi
https://github.com/gritzko/beagle
Grouping data by user is common and normal in computing: /home laid precedent decades ago.
Project directories are an extremely common grouping within a user’s work sets. Yeah, some of us just dump random files in $HOME, but this is still a sensible tier two path component.
The choice to make ‘view metadata-wrapped content in browser HTML output’ the default rather than ‘view raw file contents’ the default is legitimate for their usage. One could argue that using custom http headers would be preferable to a path element (to the exclusion of JavaScript being able to access them, iirc?) or that the path element blob should be moved into the domain component or should prefix rather than suffix the operands; all valid choices, but none implicitly better or worse here.
Object hash is obviously mandatory for git permalinks, and is perhaps the only mandatory component here. (But notably, that’s not the same as a commit hash.) However, such paths could arguably be interpreted as maximally user-hostile.
File path, interestingly enough, is completely disposable if one refers to a specific result object hash within a commit, but if the prior object hash was required to be a commit, then this is a valid unique identifier for the filesystem-tree contents of that commit. You could use the object hash instead of the full path within the commit hash, but that’s a pretty user-hostile way to go about this.
So, then, which part of the ordering and path selections do you consider indiscriminate, and why?
Query strings are more verbose, as they force you to give each param a name.
edit: for instance, that specific VERBS.md is represented by the blob 3b9a46854589abb305ea33360f6f6d8634649108.
"blob" is like a descriptor of the value that follows. It would be like doing this:
this actually irks me every time i see it in a github url

Except it's not, because the oid can be a short hash (https://github.com/gritzko/beagle/blob/a7e172/VERBS.md), and that means you're at risk of colliding with every other top-level entry in the repository, so you're restricting the naming of those top-level entries, for no reason.
So namespacing git object lookups is perfectly sensible, and doing so with the type you're looking for (rather than e.g. `git` to indicate traversal of the git db) probably simplifies routing, and to the extent that it is any use makes the destination clearer for people reading the link.
Back when GitHub URLs were kind of cool, github.com/user/gritzko/project/beagle would have been much less cool than just github.com/gritzko/beagle.
They are not. There's just a routing layer below the repository.
Of course there's nothing to stop you using URIs like this (I think Angular does, or did at one point?) but I don't think the rules for relative matrix URIs were ever figured out and standardised, so browsers don't do anything useful with them.
For sites without Javascript, it's great for things like search boxes, tables with sorting/filtering, etc. instead of POST, since it preserves your query in the URL.
https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
Or you could accept that you're probably going to need a round trip to the server and use a normal URL and it's fine.
For all but the absolute biggest websites in the world, anyhow. At Facebook or Google scale yeah it's needed.
So yes, query parameters existed before CGI, but to use them you had to hack your server to do something with them (IIRC, NCSA web servers had some magic hacks for queries). CGI drove standardization.
Forgive the phone-posting.
Paths are hierarchical; query strings are name/value.
(Note I speak of common usage.)
You can create a different convention, but that one is pretty dang useful.
How does this benefit the other website? How does this hurt the authors website?
I am completely confused about the behavior of both sides here.
I get that when I run an ad campaign I want Google to add a UTM query string, so I can track which campaign users arrived from; but then the origin and the destination are working together. Here the origin just adds stuff for no reason. Why?
Honestly, it is quite useful for niche/startup sites. I have been on both ends of conversations that began from seeing these in web analytics (as someone that saw incoming traffic from a site and reached out, and as someone that received contact from a site I linked to) - and both times it ended in a mutually beneficial partnership.
I can understand the privacy argument to some degree, but it provides no more information than the standard Referer header (and if you use analytics like Simple Analytics/Plausible, it is a lot more visible).
Why? Already getting traffic for free.
Query string additions are commonly used to track things. You can see that lots of people don’t want that by the existence of Firefox features like “copy clean link” and Extended Tracking Protection which proactively strips some like UTM parameters.
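Stripping those parameters yourself is straightforward; here’s a rough Python version of what “copy clean link” tools do. The tracking-parameter list is illustrative only, not the one Firefox actually ships:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative subset of tracking params; real "clean link" tools
# maintain much longer, curated lists.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid", "ref", "igsh"}

def clean_url(url):
    """Drop known tracking parameters, keep everything else intact."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(TRACKING_PREFIXES) and k not in TRACKING_NAMES]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Functional query params (search terms, filters) survive; only the listed tracking names are removed.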
Some sites happily participate in what I will glibly call the tracking economy. They may benefit because the recipient will see in their logs that lots of people are coming from their site, and do something that helps their site because of that.
My rejecting query strings is a simple form of protest against that system.
Some web pages don't send referrers by making all links rel="noreferrer". Mastodon used to do this by default, though now they've changed their stance.
Links opened from non-browser apps don't have any referrer information either. E.g. if somebody shares your link on iMessage, WhatsApp, or Telegram.
Email clients may also strip out referrers, but I'm not entirely sure about this one.
If people read your work via RSS readers, you'll almost certainly not get any referrers. Unless it's a web-based reader like Feedly.
My website gets a lot of traffic marked as "Direct / None" by Plausible. I suspect this is traffic from RSS readers or Mastodon, but I can't be sure. A few times I've considered adding a "?ref=RSS" to all URLs served to RSS readers and "?ref=Mastodon" to everything I post on Mastodon. But like the author of this post, I feel uncomfortable tracking my readers like this.
Back in the Stone Age, we called these “Webrings,” but they weren’t as fancy.
One of the issues that I faced, while developing an open-source application framework, was that hosting that used FastCGI would not honor Auth headers, so I was forced to pass the tokens in the query. It sucked, because that makes copy/paste of the Web address a real problem. It would often contain tokens. I guess maybe this has been fixed?
In the backends that I control, and aren’t required to make available to any and all, I use headers.
So you were writing your application as a fcgi-app, and (e.g.) Apache was bungling Auth headers? Can you expand on this? Curious about the technical detail of (I guess) PARAM records not actually giving you what you expect?
I just remember the auth headers never showing up in the $_SERVER global (it was a PHP app). This was what I was told was the issue. They made it sound like it was well-known.
[1]: https://httpd.apache.org/docs/2.4/en/mod/core.html#cgipassau...
His site returns (I think incorrectly) a 414 if a request includes a query string. If this protest is meant to advocate for the user, who presumably wasn't able to manage that string in the first place, why would you penalize them for it being there?
Why not just use it as a cue to tell users how they can make this decision themselves (e.g. through browser tools)?
Obviously it's against the spirit of the thing, but I don't think it's wrong per-se.
>Complain to whoever gave you the bad link, and ask them to stop modifying URLs, because it’s bad manners.
It's ironic that an error response so blatantly violating the robustness principle is throwing shade about bad manners.
In our modern world, the robustness principle has become an invitation to security bugs, and vendor lock-in. Edge cases snuck through one system on robustness, then trigger unfortunate behavior when they hit a different system. Two systems tried to do something reasonable on an ambiguous case, but did it differently, leading to software that works on one, failing to work on the other.
That said, we are paying a huge complexity cost due to our efforts to allow nonconforming pages. This complexity is widely abused by malicious actors. See, for instance, https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Ev... for ways in which attackers try to bypass security filters. A lot of it is only possible because of this unnecessary complexity.
Another option to consider is "418 I'm a teapot": teapots usually also don't support query strings
Several options which seem like they might be appropriate aren't on close examination:
- "406" ("Not Acceptable") which is based on content-negotiation headers.
- "409" ("Conflict") which is largely for WebDAV requests.
- Others such as 411, 422, and 431 are also for specific conditions which aren't relevant here.
- 300 or 500 errors are inappropriate as this isn't a relocation or server-side failure, it's a client-side request problem.
Teapot or “too long” seem the best bets.
I’ve always used them in API servers when a client was POSTing to create a duplicate of a unique item.
I'm not making this up, btw. An old NOC I worked at emitted every error as 200 OK, with the real error in the body message. It was a real shitshow.
The technical purist: you’re modifying a URL in a way that, while in line with accepted custom, is technically incorrect. URLs should (the least effective type of should) basically be treated as opaque.
Social: it’s tracking stuff, sibling comment trees are good, I won’t reiterate.
Clutter: it’s getting in the way of the bit you should care about, and contributing to normal people no longer caring about URLs because they’re too hard, too complex.
There are a lot of reasons I might not want a site to know where I came from to get to their site. It is basically sharing your browsing history with the site you are visiting.
Because of this, there have been a lot of updates to the http referer header, with restrictions on when it is sent, and an ability to opt out of the feature entirely.
Adding a url parameter with the same information bypasses any of these existing rules and ability to opt out. They should just use the standard.
This is talking about links to third party sites, not your own.
Isn't this functionally the exact same?
You could simply throw the information away.
It's a ridiculously extreme stance and lacks proper explanation how this will lead to a better web.
They aren't saying the concept of query strings are bad, They're saying unsolicited query strings during referal are the issue.
On a more personal note, I hate it when I go to copy a link to send via a message and the tracking code glued onto it is twice as long as the original URL... I either have to fiddle around with it to clean it up, or leave the person I sent it to wondering wtf I'm on about with a screenful of random characters...
So it's violating users' privacy, it's shit UX, and on top of that, nobody asked for it...
Query strings are useful for way more than just tracking. Saving and servicing search queries is a way more common use case. So assuming it's only useful for tracking is very misleading.
Query strings are probably the least invasive tracking. They are transparent, obvious, and anonymous. Users are free to strip out and edit query strings if they don't want them.
More to the point, I can essentially do the same thing with HTTP routing - create an infinite number of unique URLs for tracking purposes. In that regard calling out query strings specifically for essentially the same thing but more transparently seems like splitting hairs.
Filters especially make sense as query params, as they are non-sequential but still visually readable as to what they do.
URL slugs make sense for sequential pages that are hierarchical, but make no sense for non-hierarchical data/routes.
Services can force tracking into links by encoding the whole url into a shortlink that makes it impossible to just remove the tracking alone as everything is encoded into a shorter non editable string.
If I am handing out maps to your address, letting people know who is publishing the map is generally a good thing.
This is like saying having a return to sender address on mail is an invasion of privacy.
Instead of responding with an error, give a page that states “The link you followed to get here appears to have had some tracking gubbins added, in case you are a bot following arbitrary links, and/or using random URL additions to look like a more organic visit, please wait while we run a little PoW automaton deterrent before passing you on to the page you are looking for.” then do a little busy work (perhaps a real PoW thingy) before redirecting. Or maybe don't redirect directly, just output the unadorned URL for the user to click (and pass on to others). This won't stop the extra gubbins being added of course, but neither will the error and this inconveniences potential readers less.
Both are good but it seems fair to give priority to the original.
Facebook: no.
Pinterest: ?utm_source=Pinterest&utm_medium=organic.
ChatGPT: ?utm_source=chatgpt.com. (Aside: wow it’s confidently and atrociously wrong if you ask it about me. Ask it just vaguely enough, and it hallucinates someone clearly inspired by me, but who has done a whole lot of stuff that I haven’t. Ask it more precisely about me, and it gets all kinds of details wrong still. I feel further vindicated in hating this stuff. You made me use ChatGPT for the very first time.)
LinkedIn: no.
Twitter: no.
Reddit: no.
YouTube: no.
> if you get enough traffic that you can pick which sources you want to allow, that's a good problem to have.
Nah, I just don’t care about them. It’s my place, I’m doing things on my own terms. Should I discover it to be causing me problems, I’ll burn that bridge when I come to it.
Edit: Perhaps it only mangles links for logged-in users? That raises the possibility that some of the others may also only affect logged-in users.
(Trying with other ones I'm logged in on: Reddit doesn't mangle (obviously), Twitter doesn't mangle.)
https://chrismorgan.info/no-query-strings?
Never have I seen such a sassy web server
I noticed that his server also doesn't accept URLs ending in a single `/`: https://chrismorgan.info/no-query-strings/
But instead of the banned query strings message, it just returns a very sassy not-a-404 page. Once again, this is violating a common convention, but there's nothing in the HTTP spec that requires treating these URLs the same. Similarly the site also 404s when you add extra slashes like https://chrismorgan.info///no-query-strings
digression: I love trying "domain.com//" on various sites. Occasionally it'll trigger weird errors like a 502 or 500.
When dealing with static file servers:
For URLs that are supposed to include a trailing slash and the server will find that directory and serve its index.html: it’s customary, though not ubiquitous, to redirect from no-slash to slash. (Some, including popular commercial services, serve the index.html file instead of redirecting to add the slash. This is extremely wrong because it changes the meaning of relative URLs.)
But the other way round is not common.
My URLs don’t include a file extension, and I think that’s influencing your perception into thinking no-query-strings is logically a directory name. But it’s not, it’s logically a file name, just with the .html removed as unnecessary.
Take https://susam.net/no-query-strings.html as an example; Susam is more clearly just serving from a file system than I am, and leaves the “.html” file extension in the URL. Do you expect https://susam.net/no-query-strings.html/ to work? I hope not. It’s a 404, just as I’d expect, because there is no directory with that name.
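As an aside, the earlier point that serving index.html at the no-slash URL is “extremely wrong because it changes the meaning of relative URLs” is easy to demonstrate with Python’s standard `urljoin` (example.com used as a stand-in domain):

```python
from urllib.parse import urljoin

# Relative links resolve against the "directory" of the current URL,
# so the trailing slash changes the result:
print(urljoin("https://example.com/docs/", "style.css"))
# https://example.com/docs/style.css
print(urljoin("https://example.com/docs", "style.css"))
# https://example.com/style.css
```

A page served at the no-slash URL has its relative links resolve against the site root, which is why redirecting to the slash form first is the correct behaviour.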
> not-a-404 page
No, that’s a 404, just a plain old boring 404, same as any other. In fact, it’s the same 404 page I’ve been using since 2019, just with dark mode support added.
> extra slashes
Ah, now for that I had to go out of my way, because Caddy misbehaves out of the box: https://chrismorgan.info/Caddyfile#:~:text=%40has%5Fmultiple...
> digression: I love trying "domain.com//" on various sites.
Closely related is adding the trailing dot of a fully-qualified domain name: https://example.com./. I didn’t remember to try this on my new site, but it turns out Caddy won’t talk at https://chrismorgan.info./, so that’s probably good.
I use this bookmarklet to strip query params before sharing a link:
I think 404 probably makes the most sense as the response if a query string is not expected but is present anyways, although 400 might also be suitable.
> Is it not a random walk? Might sound pedantic but if there is graph structure I am interested.
The network is a directed graph. Every Wander Console declares a few other consoles as its neighbours. The person setting up the console decides who they want to list as their neighbours. So if we call the network graph X, then the set of vertices is:
and the set of directed edges is:

The traversal between consoles is not strictly a random walk. If I could call it something, I would call it randomised graph exploration with frontier expansion. On each click of the 'Wander' button, the tool picks one console at random from the set of discovered consoles and visits that console. It then fetches the neighbours declared by that console and adds any newly discovered consoles to the set.

The difference from a random walk is that the next console is not chosen from the neighbours of the last visited console. It is chosen from the whole set of consoles discovered so far. In other words, each click expands the known part of the graph, but the console used for that expansion is selected randomly from all discovered consoles, not just from the last console visited.
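For the curious, that randomised-exploration-with-frontier-expansion idea can be sketched in a few lines of Python (names are mine, not the actual Wander implementation):

```python
import random

def wander(neighbours, start, clicks, rng=None):
    """Frontier-expanding exploration: each click visits a console picked
    at random from *all* discovered consoles (not just the neighbours of
    the last one), then adds that console's declared neighbours to the
    discovered set."""
    rng = rng or random.Random()
    discovered = {start}
    visits = []
    for _ in range(clicks):
        console = rng.choice(sorted(discovered))  # sorted for determinism
        visits.append(console)
        discovered |= set(neighbours.get(console, ()))
    return visits, discovered
```

A true random walk would instead pick only from `neighbours[visits[-1]]`; that single changed line is the whole difference described above.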
"I don’t like people adding tracking stuff to URLs" and "You abuse your users by adding that to the link" and "no unauthorised query strings" and "At present I don’t use any query strings"... but for some reason ?igsh, which I'm pretty sure is an Instagram tracking parameter, is allowed. Weird.
You can’t just send arbitrary query string parameters to a server and assume they will just ignore them. Just like you can’t just remove query string parameters and assume the URL will work.
Most sites don't mind or break, some sites get value from the behavior in ways hard to replicate in other ways – and those sites that don't like such additions can easily ignore them. And a few lines of code will work better than ineffectually appealing to manners, when the freedom of the web's form of hypertext, and protocols, gives the outlink authors full freedom to craft URLs (and thus requests) however they like.
You’re handing out someone elses’s contact details, but giving the person you hand them to a completely fabricated expectation for how the interaction will go.
Example.com/interesting -> bookmark folder one
Example.com/interesting?dummy=t -> bookmark folder two
A link that is "https:// web.site" is fine.
A link that is "https:// web.site?via=another.site" is fine.
A link that is "https:// web.site?fbm=avddjur5rdcbbdehy63edjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63edaaaddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednzzddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63ednddjur5rdcbbdehy63edn"
is annoying as shit and I need to literally apologize to people after sending it if I forget to manually redact the query string. Don't abuse this.
https://www.google.com/search?q=clearurls+addon
https://www.google.com/search?newwindow=1&sca_esv=8061bd9cb1...
Edit: which luckily and sensibly Hacker News cuts short since it's 463 characters
Since the purpose is to show the full URL with trackers and other cruft, that's sensible here:
And yeah, that's pretty awful. In conclusion, Google must be destroyed.
https://support.mozilla.org/en-US/kb/enhanced-tracking-prote...
Right on! It's so liberating having your own wee corner of the internet.
>Want to share an amazon product on a chat to discuss about it. I would have liked a nice short url that I can copy, instead I get a monstrosity, it forces me to manually select only the id portion of it if I want to share it.
I actually get a lot more annoyed by routing parsers that do the same thing GET requests do, only by pretending to be a real URL.
https://chrismorgan.info/no-query-strings#:~:text=So%20I%E2%...
but this one was too long:
https://chrismorgan.info/no-query-strings?a=1
https://chrismorgan.info/%6e%6f-%71%75%65%72%79-%73%74%72%69...
Note that it has nothing to do with the length of the URL. That's just the error message he's chosen to use, because "4xx stop pissing about with my URLs" doesn't exist in the spec.
It uses 4xx, but not just 400 :)
https://chrismorgan.info/no-query-strings?why=unknown
Example: The Browser is a well known link aggregation paid periodical. I subscribe, and every 1 in 10 or 20 links I clicked, it'd just break outright and I'd have to tediously edit the URL to fix it (assuming the website didn't do a silent ninja URL edit and make it impossible for me to remember what URL I opened possibly days or weeks ago in a tab and potentially fix it). This was annoying enough to bother me regularly, but not enough to figure out a workaround.
Why? ...Because TB was injecting a '?referrer=The_Browser' or something, and the receiving website server got confused by an invalid query and errored out. 'Wow, how careless of The Browser! Are they really so incompetent as to not even check their URLs before mailing an issue out to paying subscribers?'
I wondered the same thing, and I eventually complained to them. It turns out, they did check all their URLs carefully before emailing them out... emphasis on 'before', which meant that they were checking the query-string-free versions, which of course worked fine. (This is a good example of a testing failure due to not testing end-to-end or integration testing: they should have been testing draft emails sent to a testing account, to check for all possible issues like MIME mangling, not just query string shenanigans.)
After that they fixed it by making sure they injected the query string before they checked the URLs. (I suggested not injecting it at all, but they said that for business reasons, it was too valuable to show receiving websites exactly how much traffic TB was driving to them on net, because referrers are typically stripped from emails and reshares and just in general - this, BTW, is why the OP suggestion of 'just set a HTTP referrer header!' is naive and limited to very narrow niches where you can be sure that you can, in fact, just set the referrer header.)
But this error was affecting them for god knows how long and how many readers and how many clicks, and they didn't know. Because why would they? The most important thing any programmer or web dev should know about users is that "they may never tell you": https://pointersgonewild.com/2019/11/02/they-might-never-tel... (excerpts & more examples: https://gwern.net/ref/chevalier-boisvert-2019 ). No matter how badly broken a feature or service or URL may be, the odds are good that no user will ever tell you that. Laziness, public goods, learned helplessness / low standards, I don't know what it is, but never assume that you are aware of severe breakage (or vice-versa, as a user, never assume the creator is aware of even the most extreme problem or error).
Even the biggest businesses.... I was watching a friend the other day try to set up a bank account in Central America, and clicking on one of the few banks' websites to download the forms on their main web page. None of the form PDF download links worked. "That's not a good sign", they said. No, but also not as surprising as you might think - the bank might have no idea that some server config tweak broke their form links. After all, at least while I was watching, my friend didn't tell them about their problem either!
In fact, the example seems to suggest the opposite: a 17+ year successful paid subscription business – to which you appear to be a generally-satisfied customer! – receives enough "business value" from the practice, despite its failure modes, they don't want to stop. Improving their probe of the risk-of-failure was enough.
Seemingly, the practice works often enough, pleasing more destination sites than it angers, that "referral tracking" is not something "so minor".
The point was it was dangerous in a way they didn't even realize was an issue, for a thin business rationale. Unless you are going to do thorough tests and understand the risk you are taking (which they did not, as evidenced by screwing it up systematically at scale for years), you should not be doing it.
And it's not obvious that they are correct in their tightened-up testing, because even if a link is correct at the time they test it, it could break at any time thereafter.
> to which you appear to be a generally-satisfied customer!
No matter what _X_ is, _X_ would have to be a pretty epic screwup to make a customer unsubscribe solely over that! I never claimed it was such a major epic screwup that it could do that. So that is an unreasonable criterion: "well, you didn't outright quit, so I guess it can't be that bad." Indeed, but I never said it was, and somewhat bad is still bad; I was in fact fairly annoyed by the random breakage, and at the margin, everything matters. If TB did a few other things, in sum, they could potentially convince me to let my subscription lapse. An annoyance here, a papercut here, and pretty soon a generally-satisfied customer is no longer so satisfied...
No qualms with OP, your site your rules.
Umm, what? I don't know what they're actually sending to make them think this, but if you think curl is broken, you should consider that maybe you're the one doing something wrong.
Here are some examples showing curl not stripping question marks (obviously); I am very curious what this person was actually seeing.
Though I forget if any shell does stuff like that in quotes. Or printing oddities.
I build a lot of internal applications, and one of my golden UI rules is that a user should be able to share their URL and other users should be able to see exactly what the sender did.
So if you have a dashboard or visualization where the user can add filters or configurations, I have all of their settings saved automatically in the URL. It's visible, it's obvious, it's easy, it's convenient.
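A minimal sketch of that pattern in Python, with hypothetical helper names; the real win is that the round-trip is lossless, so the shared URL reproduces the exact view:

```python
from urllib.parse import urlencode, parse_qsl, urlsplit

def filters_to_url(base, filters):
    """Encode dashboard settings into a shareable URL (sorted for stability)."""
    return base + "?" + urlencode(sorted(filters.items()))

def filters_from_url(url):
    """Recover the settings dict from a shared URL."""
    return dict(parse_qsl(urlsplit(url).query))
```

In a browser app the same thing is usually done with `URLSearchParams` and `history.replaceState`, but the principle is identical: the URL is the single source of truth for view state.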
>There is also a moral question here about whether it is okay to modify a given URL on behalf of the user in order to insert a referral query string into it. I think it isn't.
These dogmatic technical screeds are all so weird to me. They usually reveal more about the author's lack of experience or imagination than they provide a useful truism.
> If I wanted to know I’d look at the Referer header; and if it isn’t there, it’s probably for a good reason. You abuse your users by adding that to the link.
The reason is that the referrer headers are a usability and privacy nightmare. It's weird for the author to jump to such a conclusion.
This referral information is being done purely as a courtesy to the webhost. If we imagined a world in which ChatGPT or Wikipedia launched massive hugs of death on referral links without attributing themselves, that is a much, much worse outcome.
It also makes me wonder what other noxious online behaviours might be addressed through ... creative ... client-side responses similar to this.
We've already seen, for years, sites attempting to socially-condition people over the use of ad-blockers and Javascript disablers. No reason why the Other Side can't fight back as well.
Yes, let's unilaterally decide that query strings are bad because one website (ab)uses query strings to load different fonts.
It's the query strings that are the problem, not the website!
jfc.
Look, I'm against utm fragments as much as the next guy, but let's not throw away a perfectly good thing because tracking is evil.
It's really not abusing query strings; from a standards perspective, a 404 is not an improper response to an unexpected query string.
they added these ugly qses into every click on their site, bonkers: ?ref_=nm_ov_bio_lk
Ensuring both sides of a hyperlink agree/consent was a design flaw that limited the uptake of pre-web hypertext systems. The web's laissez-faire approach demonstrated a looser coupling was far better for users, despite all the new failure modes.
Of course any site/server has the practical power free to treat inbound requests as rigorously (or harshly) as they want. But by the web's essential nature, it is equally part of the inherent range-of-freedom of outlink authors to craft their URLs (and thus the resulting requests) however they want. URLs are permissionless hyperlanguage, not the intellectual property of entities named therein.
Plenty of sites welcome such extra info, and those that don't want it can ignore it easily enough – including by just not caring enough about the undefined behavior/failures to do anything.
Though, when a web publisher has naively deployed a system that's fragile with respect to unexpected query-string values, they should want to upgrade their thinking for robustness, via either conscious strictness or conscious permissiveness. Thereafter, their work will be ready for the real web, not just some idealized sandbox where scolding unwanted behavior makes sense.
> And you can do what you want with yours!
That does not make a lot of sense. Yes, you can do what you want with your website, but query strings are a way for users to ask for additional information or behaviour. I use them on my own websites to have more flexibility. For instance:
That will download the website content as a formatted .pdf file. I can give many more examples here. The “query strings are horrible” sentiment I cannot agree with at all. His websites don't allow query strings? That's fine. But in no way does this mean query strings are useless. Besides, what does it mean to “ban” them? You simply don't respond to query strings you don't want to handle; we do so via general routing in web applications these days.
That's not at all what the article says. You're responding to a weird strawman that doesn't resemble the article's actual point.
This isn't relevant when talking about links to his site. This is relevant when talking about links to your site.
> Besides, what does it mean to "ban" it? You simply don't respond to query strings you don't want to handle.
It means that you're going to get some sort of 400 error when you follow a link to his site with a query string attached to it. He simply will not respond to query strings that he doesn't want to handle, which is all of them.