Add NTLM support for HTTP(s) servers and proxies#5052
Conversation
d12600e to
aaa7459
Compare
|
Rebased on master. |
|
What would help review this? Should I break out the necessary http client refactoring so that one can review the necessary bits and then follow up with an addition of the NTLM library? |
tiennou
left a comment
There was a problem hiding this comment.
Sorry, I had wanted to review sooner that but got caught up onto things. So, here's a "sparse" review, in that I've read everything, but didn't look hard for bugs, as most of it seems NFC and/or (nice) cleanups.
|
|
||
| static int ntlm_init_context( | ||
| http_auth_ntlm_context *ctx, | ||
| const git_net_url *url) |
There was a problem hiding this comment.
This is out of order, as the git_net_url rename commit is later in the patch series. Just a heads up.
There was a problem hiding this comment.
I think the GitHub UI may be showing commits out of order, 8497027 (the rename to git_net_url) is the first commit in this series.
| git_buf_oom(&host) || | ||
| git_buf_oom(&port) || | ||
| git_buf_oom(&path) || | ||
| git_buf_oom(&query) || |
|
|
||
| void test_online_clone__url_with_no_path_returns_EINVALIDSPEC(void) | ||
| { | ||
| cl_git_fail_with(git_clone(&g_repo, "http://github.com", "./foo", &g_options), |
| return -1; | ||
| } | ||
|
|
||
| git_buf_dispose(&request); |
There was a problem hiding this comment.
Nitpick: I went looking for a realloc call and found none. I'm not sure I got how this makes it "not realloc", though I understood the intent is to remove that dispose call ⬆️ so the request isn't lost. Maybe ?
There was a problem hiding this comment.
Sorry, I could have been more clear. We're not explicitly calling realloc, but we are doing that implicitly, since we were defining the request buffer inside the replay block. So we would create a new buffer for every replay, which is not necessary. Instead we can clear the existing buffer and just put the new URI inside of it.
| @@ -0,0 +1,33 @@ | |||
| /* | |||
There was a problem hiding this comment.
Skipping external dependency code because of sheer laziness. Would you like it reviewed anyway ? (I remember taking a look a few months ago and didn't see anything that stood out).
There was a problem hiding this comment.
I'm pretty comfortable with it. The functional portions are a reasonably direct port of Microsoft's code and from a technical perspective, I've run this through fuzzers and memory checkers galore. So although there may be (and I'm sure there still will be) bugs, I think we'll need to ship this to find them.
| if (parser->status_code == 407 && get_verb == s->verb) | ||
| return on_auth_required(&t->proxy.cred, | ||
| parser, | ||
| &t->proxy, |
There was a problem hiding this comment.
Nitpick: strange indent? Not sure who's wrong though 😅.
| if (parser->status_code == 401 && get_verb == s->verb) | ||
| return on_auth_required(&t->server.cred, | ||
| parser, | ||
| &t->server, |
| } | ||
|
|
||
| return error; | ||
| return git_net_url_parse(&t->proxy.url, t->proxy_opts.url); |
| } | ||
| } | ||
|
|
||
| /* Enforce a reasonable cap on the number of replays */ |
aaa7459 to
4154c2e
Compare
|
/rebuild |
|
Okay, @ethomson, I started to rebuild this pull request as build #2017. |
"Connection data" is an imprecise and largely incorrect name; these structures are actually parsed URLs. Provide a parser that takes a URL string and produces a URL structure (if it is valid). Separate the HTTP redirect handling logic from URL parsing, keeping a `gitno_connection_data_handle_redirect` whose only job is redirect handling logic and does not parse URLs itself.
There's no reason a git repository couldn't be at the root of a server, and URLs should have an implicit path of '/' when one is not specified.
We did not properly support default credentials for proxies, only for destination servers. Refactor the credential handling to support sending either username/password _or_ default credentials to either the proxy or the destination server. This actually shares the authentication logic between proxy servers and destination servers. Due to copy/pasta drift over time, they had diverged. Now they share a common logic which is: first, use credentials specified in the URL (if there were any), treating empty username and password (ie, "http://:@foo.com/") as default credentials, for compatibility with git. Next, call the credential callbacks. Finally, fallback to WinHTTP compatibility layers using built-in authentication like we always have. Allowing default credentials for proxies requires moving the security level downgrade into the credential setting routines themselves. We will update our security level to "high" by default which means that we will never send default credentials without prompting. (A lower setting, like the WinHTTP default of "medium" would allow WinHTTP to handle credentials for us, despite what a user may have requested with their structures.) Now we start with "high" and downgrade to "low" only after a user has explicitly requested default credentials.
Update our CI tests to start a proxy that requires NTLM authentication; ensure that our WIndows HTTP client can speak NTLM.
Increase the permissible replay count; with multiple-step authentication schemes (NTLM, Negotiate), proxy authentication and redirects, we need to be mindful of the number of steps it takes to get connected. 7 seems high but can be exhausted quickly with just a single authentication failure over a redirected multi-state authentication pipeline.
We cannot examine the keep-alive status of the http parser in `http_connect`; it's too late and the critical information about whether keep-alive is supported has been destroyed. Per the documentation for `http_should_keep_alive`: > If http_should_keep_alive() in the on_headers_complete or > on_message_complete callback returns 0, then this should be > the last message on the connection. Query then and set the state.
Some authentication mechanisms (like HTTP Basic and Digest) have a one-step mechanism to create credentials, but there are more complex mechanisms like NTLM and Negotiate that require challenge/response after negotiation, requiring several round-trips. Add an `is_complete` function to know when they have round-tripped enough to be a single authentication and should now either have succeeded or failed to authenticate.
When we get an authentication failure, we must consume the entire body of the response. If we only read half of the body (on the assumption that we can ignore the rest) then we will never complete the parsing of the message. This means that we will never set the complete flag, and our replay must actually tear down the connection and try again. This is particularly problematic for stateful authentication mechanisms (SPNEGO, NTLM) that require that we keep the connection alive. Note that the prior code is only a problem when the 401 that we are parsing is too large to be read in a single chunked read from the http parser. But now we will continue to invoke the http parser until we've got a complete message in the authentication failed scenario. Note that we need not do anything with the message, so when we get an authentication failed, we'll stop adding data to our buffer, we'll simply loop in the parser and let it advance its internal state.
We must always consume the full parser body if we're going to keep-alive. So in the authentication failure case, continue advancing the http message parser until it's complete, then we can retry the connection. Not doing so would mean that we have to tear the connection down and start over. Advancing through fully (even though we don't use the data) will ensure that we can retry a connection with keep-alive.
Ensure that the server supports the particular credential type that we're specifying. Previously we considered credential types as an input to an auth mechanism - since the HTTP transport only supported default credentials (via negotiate) and username/password credentials (via basic), this worked. However, if we are to add another mechanism that uses username/password credentials, we'll need to be careful to identify the types that are accepted.
Include https://github.com/ethomson/ntlmclient as a dependency.
A "connection" to a server is transient, and we may reconnect to a server in the midst of authentication failures (if the remote indicates that we should, via `Connection: close`) or in a redirect.
Hold an individual authentication context instead of trying to maintain all the contexts; we can select the preferred context during the initial negotiation. Subsequent authentication steps will re-use the chosen authentication (until such time as it's rejected) instead of trying to manage multiple contexts when all but one will never be used (since we can only authenticate with a single mechanism at a time.) Also, when we're given a 401 or 407 in the middle of challenge/response handling, short-circuit immediately without incrementing the retry count. The multi-step authentication is expected, and not a "retry" and should not be penalized as such. This means that we don't need to keep the contexts around and ensures that we do not unnecessarily fail for too many retries when we have challenge/response auth on a proxy and a server and potentially redirects in play as well.
For request-based authentication mechanisms (Basic, Digest) we should keep the authentication context alive across socket connections, since the authentication headers must be transmitted with every request. However, we should continue to remove authentication contexts for mechanisms with connection affinity (NTLM, Negotiate) since we need to reauthenticate for every socket connection.
Instead of using `is_complete` to decide whether we have connection or request affinity for authentication mechanisms, set a boolean on the mechanism definition itself.
We stop the read loop when we have read all the data. We should also consider the server's feelings. If the server hangs up on us, we need to stop our read loop. Otherwise, we'll try to read from the server - and fail - ad infinitum.
When we have a keep-alive connection to the server, that server may legally drop the connection for any reason once a successful request and response has occurred. It's common for servers to drop the connection after some amount of time or number of requests have occurred.
When we're issuing a CONNECT to a proxy, we expect to keep-alive to the proxy. However, during authentication negotiations, the proxy may close the connection. Reconnect if the server closes the connection.
When we send HTTP credentials but the server rejects them, tear down the authentication context so that we can start fresh. To maintain this state, additionally move all of the authentication handling into `on_auth_required`.
4154c2e to
7ea8630
Compare
This appears to be the result of Atom's git-utils package vendoring libgit2 during 5.6.0 and 5.6.1 at an incorrect point in libgit2's history, where there was an ABI change where gitno_connection_data was renamed to get_net_url. This landed on June 11th, per [this pull request](libgit2/libgit2#5052) but doesn't account for subsequent commits to src/net.c, realistically they should sort out their submodule or not rely on it in the first place. This new git-utils package was pulled in specifically for 1.41.0 and thus reverting it 5.5.0 from 1.40.0 resolves the issue. Alongside this change I also went ahead and removed some long unnecessary node engine min / max version comparisons.
When we removed libcurl, we inadvertantly removed support for NTLM connections to proxies; it was supported automatically, which I don't think any of the contributors realized.
This adds NTLM support by including https://github.com/ethomson/ntlmclient, which is a pure C NTLM client implementation that was built to live nicely within the libgit2 codebase.
In addition, it does some refactoring to the HTTP codebase to support NTLM connections to both endpoint servers and proxies.