Skip to content

Conversation

@robertodauria
Copy link
Contributor

@robertodauria robertodauria commented Nov 18, 2025

This PR originates from the realization that running Locate locally and attaching heartbeat instances to it became significantly harder after introducing JWTs for authentication.

Here I implemented 3 selectable backends to verify JWTs:

  1. espv1: (default) this is the previous behavior. It trusts the X-Endpoint-API-UserInfo header added/overwritten by Google App Engine's ESPv1 proxy, which contains the claims if and only if the Authorization header contained a valid JWT with a valid signature.
  2. direct: allows Locate to perform its own JWT signature verification. Requires a -jwt-jwks-url flag pointing to a valid keyset to use for verification. Only for development usage.
  3. insecure: does not verify JWTs and trusts them blindly. Only for development usage.

Additionally, since heartbeat needs to obtain a signed JWT to call Locate and we don't want the token-exchange dependency for local testing, I added a -jwt-token flag that allows to tell heartbeat which JWT to use directly.

Putting everything together:

# Start Redis + Locate
docker compose up --build

# Generate an unsigned JWT
export DEV_JWT=$(./scripts/create-dev-jwt.sh)

# Pass the generated JWT to heartbeat
go run ./cmd/heartbeat/main.go \
    -heartbeat-url ws://localhost:8080/v2/platform/heartbeat \
    -hostname=mlab1-lga0t.mlab-sandbox.measurement-lab.org \
    -experiment=ndt \
    -registration-url=https://siteinfo.mlab-oti.measurementlab.net/v2/sites/registration.json \
    -services 'ndt/ndt7=ws:///ndt/v7/download,ws:///ndt/v7/upload' \
    -jwt-token="$DEV_JWT"
    
# Listen on ports 80/443 so heartbeat reports a healthy node
sudo nc -l -k -p 80 &
sudo nc -l -k -p 443 &

# Wait for ~10s, then test
curl 'http://localhost:8080/v2/nearest/ndt/ndt7?region=US-NJ'

This change is Reviewable

Implement three JWT verification modes:
- ESPv1 (production default, with defense-in-depth)
- Direct (JWKS validation for integration testing)
- Insecure (dev/test only, requires ALLOW_INSECURE_JWT=true)

Enables deployment without ESP and simplifies local testing.
mlab2-lga1t has 0.5 probability in siteinfo, annoying for local testing.
@coveralls
Copy link
Collaborator

coveralls commented Nov 18, 2025

Pull Request Test Coverage Report for Build 1638

Details

  • 204 of 248 (82.26%) changed or added relevant lines in 7 files are covered.
  • 6 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-1.1%) to 91.206%

Changes Missing Coverage Covered Lines Changed/Added Lines %
clientgeo/maxmind.go 19 21 90.48%
auth/jwtverifier/espv1.go 34 40 85.0%
auth/jwtverifier/insecure.go 38 44 86.36%
cmd/heartbeat/main.go 18 30 60.0%
auth/jwtverifier/direct.go 68 86 79.07%
Files with Coverage Reduction New Missed Lines %
cmd/heartbeat/main.go 2 68.21%
handler/handler.go 4 95.8%
Totals Coverage Status
Change from base Build 1620: -1.1%
Covered Lines: 2261
Relevant Lines: 2479

💛 - Coveralls

Copy link
Collaborator

@bassosimone bassosimone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall: I think it's very useful to make m-lab/locate work locally for testing. This feels like a very good quality of life improvements for developers.

This code review raises a big concern about the XFF header handling documentation or implementation, plus additional concerns and suggestions that you can read below.

Let's discuss!

Comment on lines +21 to +27
// Extract claims from the ESP header (trusted source after ESP validation)
espClaims, err := v.extractFromESPHeader(req)
if err != nil {
return nil, err
}

return espClaims, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Cannot this code be simplified as follows?

Suggested change
// Extract claims from the ESP header (trusted source after ESP validation)
espClaims, err := v.extractFromESPHeader(req)
if err != nil {
return nil, err
}
return espClaims, nil
return v.extractFromESPHeader(req)

To put it in another way? What is the reason for having an additional middle man function here? Are you planning on adding extra functionality later on? If not, maybe we can avoid calling a private function and just move the actual implementation inside this function?


// JWTVerifier defines the interface for extracting JWT claims from HTTP requests.
// Different implementations support different verification modes.
type JWTVerifier interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I know the conventional wisdom is that of defining the interfaces where they are used. I am still a bit lost with respect to whether we are using this interface in this package. If it is not used here, then I suggest to move it closer to where it is used, instead, which is probably the ./handler package.

A possible reason not to do this is to explicitly verify that each of the three implementations implements the interface. In such a case, though, I would recommend adding code like:

var _ JWTVerified = &InsecureVerifier{}

to ensure you have a build time issue when not implementing the interface.

If you choose to keep the interface in this package, I'd name it Verifier to avoid repeating stuttering (jwtverifier.JWTVerifier). Also, you can potentially drop verifier from the name for each type in this package (jwtverifier.Direct kind of says "this is the direct JWT Verifier" anyway).

// getRemoteAddr extracts the remote address from the request. When running on
// Google App Engine, the X-Forwarded-For is guaranteed to be set. When running
// elsewhere (including on the local machine), the RemoteAddr from the request
// is used instead.
Copy link
Collaborator

@bassosimone bassosimone Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ I spent some time trying to figure out how GAE handles headers and I am not satisfied by either the documentation provided here or by the implementation.

Consider, for example, https://docs.cloud.google.com/load-balancing/docs/https?utm_source=chatgpt.com#x-forwarded-for_header, which reads:

If the incoming request already includes an X-Forwarded-For header, the load balancer appends its values to the existing header:

X-Forwarded-For: ,,

and:

It is possible to remove existing header values by using custom request headers on the backend service. The following example uses the --custom-request-header flag to recreate the X-Forwarded-For header by using the variables client_ip_address and server_ip_address. This configuration replaces the incoming X-Forwarded-For header with only the client and the load balancer IP address.

Based on this reading, I would conclude that:

  1. either we are using --custom-request-header to recreate the XFF header, and we're safe, but, then, this mechanism MUST be explicitly documented here

  2. or, we're not using this functionality and therefore the code MUST change to take into account the reality of clients being able to spoof initial values for XFF

Additionally, the matter is further complicated by the fact that this pull request introduces the possibility of running m-lab/locate in distinct configuration (e.g., the current code was written assuming it's running behind GAE, but we're introducing ways to run as standalone -- and there are explicit comments hinting that this would be a future direction). In light of this, here's what I suggest:

  1. We sort out how exactly we're running in GAE and we either document why our processing of the XFF header is safe, with cross pointers to where this happens, or we adjust the code to account for potentially injected XFF headers

  2. We mark the direct verifier as unsafe and experimental exact like the insecure verifier (i.e., requires environment variable and prints warnings) and we clearly explain in its implementation that we cannot bring the code to production without also thinking carefully about how to process XFF in a configuration in which we do not have the GAE in front of us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For general awareness: we have discussed this offline, tested the behavior and created a small proof-of-concept. This is an actual vulnerability that has been around for quite a while. See also the maxmind geolocator.

It is true that X-Forwarded-For is guaranteed to be set, but it's also true that any X-Forwarded-For header set by the client is retained, and the client IP + the GCP load balancer IP are appended.

e.g. if the client sent X-Forwarded-For: 8.8.8.8, 1.1.1.1, this is what Locate sees:

X-Forwarded-For header: "8.8.8.8, 1.1.1.1, <client-address>, 142.250.185.180

In this case, the IP used for rate limiting purposes would be 8.8.8.8 instead of the actual client address. This makes it trivial to bypass the rate limiting.

This vulnerability does not seem to impact the geolocation, as long as the GAE geolocator is used, since the GAE-provided headers containing lat/lon don't depend on the X-Forwarded-For header, so the data collected is unaffected. However, it impacts the maxmind geolocator since the same bug is present there.

From my understanding, taking the second-to-last IP address is the right thing to do. This only matches ip[0] by coincidence, when the client does not send any X-Forwarded-For.

Nice catch!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that XFF has been used in the past for debugging. We do need to preserve some capability to ask locate diagnostic questions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmathis could you say more? This won't remove or change the way we use the XFF header, we just need to pick the second-to-last IP address in the comma-separated list instead of the first one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants