From: Martin Mares
Date: Sat, 28 Feb 2004 10:49:48 +0000 (+0000)
Subject: Hopefully finally sorted out the "http://www.xyz.cz?param" mess. The true
X-Git-Tag: holmes-import~1106
X-Git-Url: http://mj.ucw.cz/gitweb/?a=commitdiff_plain;h=d63e6f9f94894e6dca41a6a6a510fcc14eae58ac;p=libucw.git

Hopefully finally sorted out the "http://www.xyz.cz?param" mess. The true
semantics turned out to be "http://www.xyz.cz/?param" and most web servers
really require "GET /?param". I've changed the normalization rules to add
the leading slash if needed which also solves the relative URL problem
I mentioned in the comments. However, this means that the SEMANTICS OF
NORMALIZED URL'S HAS CHANGED and gatherer databases with URL's in the
"http://www.xyz.cz?param" form are now INVALID. I'm going to delete all
such URL's from our gatherer now.
---

diff --git a/lib/url.c b/lib/url.c
index 105c7dd0..cb577cb1 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -11,9 +11,8 @@
  *
  * o Escaping of special characters still follows RFC 1738.
  * o Interpretation of path parameters follows RFC 1808.
- * o Parsing a relative URL "x" wrt. base "http://hell.org?y"
- * gives an error, which might be wrong. However, I failed
- * to find any rule applying to this case in the RFC.
+ *
+ * XXX: The buffer handling in this module is really horrible, but it works.
  */
 
 #include "lib/lib.h"
@@ -396,6 +395,18 @@ url_normalize(struct url *u, struct url *b)
         }
     }
 
+  /* Change path "?" to "/?" because it's the true meaning */
+  if (u->rest[0] == '?')
+    {
+      int l = strlen(u->rest);
+      if (u->bufend - u->buf < l+1)
+        return URL_ERR_TOO_LONG;
+      u->buf[0] = '/';
+      memcpy(u->buf+1, u->rest, l+1);
+      u->rest = u->buf;
+      u->buf += l+2;
+    }
+
   /* Fill in missing info */
   if (u->port == ~0U)
     u->port = std_ports[u->protoid];
diff --git a/lib/url.h b/lib/url.h
index c01c1693..a9c44498 100644
--- a/lib/url.h
+++ b/lib/url.h
@@ -1,7 +1,7 @@
 /*
  * Sherlock Library -- URL Functions
  *
- * (c) 1997--2002 Martin Mares
+ * (c) 1997--2004 Martin Mares
  * (c) 2001 Robert Spalek
  *
  * This software may be freely distributed and used according to the terms
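
For illustration only, and not part of the patch: a minimal standalone
sketch of the normalization rule described in the commit message, where a
path that starts with "?" is rewritten to start with "/?". The function
name add_leading_slash, its signature and its 0/-1 return convention are
hypothetical; the real code works on struct url and its shared buffer and
returns URL_ERR_TOO_LONG. The sketch assumes the caller supplies a separate
output buffer, and it reserves room for the slash, the text and the
terminating NUL before writing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from lib/url.c): copy the "rest" part of a URL
 * (path plus query string) into out, prepending '/' when it begins with '?'.
 * Returns 0 on success, -1 if out cannot hold the result.
 */
int add_leading_slash(const char *rest, char *out, size_t outsize)
{
  assert(rest != NULL && out != NULL);

  size_t l = strlen(rest);
  /* Bytes needed: the text, its terminating NUL, and the optional slash. */
  size_t need = l + 1 + (rest[0] == '?' ? 1 : 0);

  if (outsize < need)
    return -1;
  if (rest[0] == '?')
    *out++ = '/';
  memcpy(out, rest, l + 1);     /* l+1 copies the terminating NUL as well */
  return 0;
}
```

With a 16-byte buffer, "?param" comes out as "/?param", while "/x?y" is
copied unchanged because it already starts with a slash.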