Skip to content

MIDRC-1285 MIDRC-1291 Task tokens + Access token denylist + Validate aud#1356

Open
paulineribeyre wants to merge 51 commits into
masterfrom
feat/scoped-token
Open

MIDRC-1285 MIDRC-1291 Task tokens + Access token denylist + Validate aud#1356
paulineribeyre wants to merge 51 commits into
masterfrom
feat/scoped-token

Conversation

@paulineribeyre

@paulineribeyre paulineribeyre commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Link to JIRA ticket if there is one: https://ctds-planx.atlassian.net/browse/MIDRC-1285 and https://ctds-planx.atlassian.net/browse/MIDRC-1291

Goes with uc-cdis/authutils#110 and uc-cdis/gen3-helm#616

New Features

Breaking Changes

  • Remove deprecated /credentials/cdis/access_token endpoint

Bug Fixes

Improvements

Dependency updates

Deployment changes

Comment thread fence/oidc/endpoints.py Outdated
Comment on lines +35 to +39
except BlacklistingInvalidTokenError as err:
# Attempting to revoke an invalid token fails and return a 200 (per RFC 7009).
# However, other revocation failures should not return 200. RFC 7009 only says to
# return 200 for tokens that are already invalid and as such cannot be used, effectively
# the same result as revoking them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think RFC 7009 was misinterpreted in #814, this fixes it

@coveralls

coveralls commented Jun 9, 2026

Copy link
Copy Markdown

Coverage Report for CI Build 28607049608

Coverage increased (+0.3%) to 75.387%

Details

  • Coverage increased (+0.3%) from the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • 196 coverage regressions across 12 files.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

196 previously-covered lines in 12 files lost coverage.

Top 10 Files by Coverage Loss Lines Losing Coverage Coverage
blueprints/admin.py 76 67.94%
auth.py 31 86.1%
blueprints/oauth2.py 23 78.03%
oidc/endpoints.py 18 45.45%
blueprints/link.py 14 75.13%
blueprints/storage_creds/api.py 11 76.54%
jwt/token.py 6 95.18%
jwt/blacklist.py 5 91.07%
jwt/validate.py 4 83.33%
resources/storage/cdis_jwt.py 4 74.29%

Coverage Stats

Coverage Status
Relevant Lines: 11421
Covered Lines: 8610
Line Coverage: 75.39%
Coverage Strength: 0.75 hits per line

💛 - Coveralls

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

Integration Tests

filepath passed skipped SUBTOTAL
tests/test_oauth2.py 15 0 15
tests/test_centralized_auth.py 16 0 16
tests/test_data_upload.py 8 1 9
tests/test_audit_service.py 3 3 6
tests/test_dbgap.py 4 1 5
tests/test_user_token.py 5 0 5
tests/test_drs_endpoint.py 4 0 4
tests/test_user_login_activation.py 2 1 3
tests/test_register_user.py 2 0 2
tests/test_presigned_url.py 7 0 7
tests/test_fence_admin.py 2 0 2
tests/test_client_credentials.py 1 0 1
tests/test_oidc_client.py 2 0 2
tests/test_google_data_access.py 1 0 1
tests/test_ras_authn.py 0 3 3
TOTAL 72 9 81

Please find the detailed integration test report here

Please find the Github Action logs here

@github-actions

Copy link
Copy Markdown

Integration Tests

filepath passed skipped SUBTOTAL
tests/test_presigned_url.py 7 0 7
tests/test_drs_endpoint.py 6 0 6
tests/test_centralized_auth.py 16 0 16
tests/test_audit_service.py 3 3 6
tests/test_data_upload.py 8 1 9
tests/test_dbgap.py 4 1 5
tests/test_user_login_activation.py 2 1 3
tests/test_fence_admin.py 2 0 2
tests/test_client_credentials.py 1 0 1
tests/test_user_token.py 5 0 5
tests/test_google_data_access.py 1 0 1
TOTAL 55 6 61

Please find the detailed integration test report here

Please find the Github Action logs here

@nss10 nss10 requested a review from Avantol13 June 26, 2026 17:45
claims, is_blacklisted = is_token_blacklisted(encoded_token)
except (jwt.exceptions.InvalidTokenError, Unauthorized) as e:
logger.info(
f"No provided token, or provided token is invalid: `{e}`. Token not blacklisted."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"No provided token, or provided token is invalid: `{e}`. Token not blacklisted."
f"No provided token, or provided token is invalid: `{e}`. Token not denylisted."

try:
if not encoded_token:
encoded_token = get_jwt_header()
claims, is_blacklisted = is_token_blacklisted(encoded_token)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
claims, is_blacklisted = is_token_blacklisted(encoded_token)
claims, is_denylisted = is_token_blacklisted(encoded_token)

)
return "", 200

if is_blacklisted:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if is_blacklisted:
if is_denylisted:


if is_blacklisted:
logger.warning(
f'Blocking attempt to use a blacklisted token. jti={claims.get("jti")}; azp={claims.get("azp")}; sub={claims.get("sub")}; username={claims.get("context", {}).get("user", {}).get("name")}'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f'Blocking attempt to use a blacklisted token. jti={claims.get("jti")}; azp={claims.get("azp")}; sub={claims.get("sub")}; username={claims.get("context", {}).get("user", {}).get("name")}'
f'Blocking attempt to use a denylisted token. jti={claims.get("jti")}; azp={claims.get("azp")}; sub={claims.get("sub")}; username={claims.get("context", {}).get("user", {}).get("name")}'

logger.warning(
f'Blocking attempt to use a blacklisted token. jti={claims.get("jti")}; azp={claims.get("azp")}; sub={claims.get("sub")}; username={claims.get("context", {}).get("user", {}).get("name")}'
)
return "Token is blacklisted", 403

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "Token is blacklisted", 403
return "Token is denylisted", 403

Comment thread fence/jwt/blacklist.py Outdated
exists = bool(session.query(BlacklistedToken).filter_by(jti=jti).first())

if exists:
_blacklisted_cache.add(jti)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want this grow unbounded, we need a limit. Better yet, use a FIFO. Otherwise a user could purposely generate a bunch of task tokens, revoke them over and over, and exhaust the memory of all our pods

Comment thread fence/jwt/blacklist.py Outdated
encoded_token,
public_key,
algorithms=["RS256"],
options={"verify_signature": False},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing verify_signature false a lot in this PR, we absolutely should never do this unless it is 100% clear that the verification is happening elsewhere. This verification is what ensures that the token was not modified by a malicious actor. I suppose it could be argued that spoofing it to make it denylisted is somewhat silly, but the principle of the matter is the same: we cannot trust the contents of a token unless we verify it. If we want to avoid this entirely, we could just hash the token instead of relying on JTI. you don't even need to verify in that case and it'd be so much faster. Just use a SHA-256 hash, it's fixed length

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to retain the jti's for now, and since the original logic written earlier did not set verify_signature to False, and is changed in this commit. There isn't a lot of explanation on what made pauline switch the validation conditions. For now, I'm going to revert the params to the original ones and highlight this comment for @paulineribeyre for when she is back.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I use the validate_jwt function in oauth2.py and prevent skipping verify_signature: False, I was running into this audience mismatch error here which I was not able to reproduce when running locally.

I was not able to identify the cause of the error as of now, but wanted to check with you to see if I can skip verify_aud for this use case?

When running locally I see --
aud in the token -- ['FOO']
aud in the verify_jwt -- ['gen3', 'http://localhost/user']

But local execution did not throw "Audience mismatch error".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should have to skip it - it should be able to validate successfully. Oh.. but I guess the problem is that it could be a task token which has a different aud than Fence normally accepts 🤦 Okay so in this case, verify_aud false is maybe okay 👍 but we still need to verify the signature

Comment thread fence/jwt/utils.py Outdated
if pur != "access":
return False
aud = [e for e in aud if e != "gen3"]
return aud and all(aud in config["ALLOWED_TASK_TOKEN_TYPES"] for aud in aud)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep in mind that this setup means that there's no way to determine if an access token is a task token without knowing fence configuration. It may be desirable for other services to know in a stateless and non-Fence coupled way if a token it's receiving is a task token. One example I can think of is if a service wants to deny task tokens even if they contain the valid aud, etc. Or, if we need to differentiate b/t an access token and a task token for additional security measures.

I'm wondering if we considered:

A) having the pur be task_token and what would need to change to support that
B) putting some other data somewhere in the token to indicate it is a task token

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't really considered either of these options before. I agree that the need for a stateless way to verify whether a token is a task token will likely arise at some point. If I had to choose, I'd lean toward exploring B, since it's less rigid and likely not a breaking change.

One thing I'm wary of with pur = 'task_token' is that right now, external services still see this token as just an access token, and we're okay with that. Theoretically, it's an extension of an access token, with extended expiration times based on the type of task token it is. Setting pur = task_token doesn't represent that relationship clearly.

For option B, though, we could update the claims to include a task_token_type field, defaulted to None. I can modify generate_signed_access_token 👇

fence/fence/jwt/token.py

Lines 358 to 369 in f8e81c5

def generate_signed_access_token(
kid,
private_key,
expires_in,
scopes,
user=None,
iss=None,
forced_exp_time=None,
client_id=None,
linked_google_email=None,
to accept an optional task_token_type parameter and include it in the claims. Since task tokens aren't widely used yet, we don't need to worry about backwards compatibility, and can simply check if claims.get("task_token_type"): to verify whether a given token is a task token.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. yeah we do already have some custom stuff in a context block in the claims. I agree about the pur=task_token, it does mean that it being an access token would have to be understood - although, pur is a custom field we added, it's not part of the spec iirc - so I think the only service that would need to understand that is Fence itself. To other services the token comes in an authorization header as a bearer token and is just a JWT.

The worry I have about task_token_type is that the meaning of different types will have to be synchronously understood across services. Fence needs to know so it can decide what audience / scope, and other services need to know what types Fence supports and what they mean, and that too would not really be available outside of configuration/code.

I suppose the general case is not to know what type of task token it is though, just to know if it is one at all. So maybe task_token_type in the context/claims is a good future-proofing for now and we can rethink later if need be. You can probably just avoid putting it in non-task tokens at all (e.g. don't even put it as None, just don't include the field at all).

Comment thread fence/resources/storage/cdis_jwt.py Outdated

if (
expires_in > config["MAX_ACCESS_TOKEN_TTL"]
and claims.get("exp", 0) < time.time() + expires_in

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is dangerously flawed.

MAX_ACCESS_TOKEN_TTL = 1000

If I submit an expires_in of 2000 but my API key expires in 2 seconds, the first check fails so it moves on to generate me a token past the max TTL.

You need 2 separate checks to simplify this.

  1. is requested expires_in allowed?
    • No? Error immediately about limits
  2. if it is, does it extend beyond the API key expiration?
    • Yes? Error b/c of API key

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding your example, but in the scenario you described, wouldn't the code still raise an error?

With:

MAX_ACCESS_TOKEN_TTL = 1000
expires_in = 2000

the first condition (expires_in > MAX_ACCESS_TOKEN_TTL) evaluates to True. If the API key expires in 2 seconds, then claims.get("exp", 0) < time.time() + expires_in would also evaluate to True, causing the UserError to be raised and preventing token issuance.

That said, I do think the current logic could be clearer.

My assumption when reading this code was that the expires_in > MAX_ACCESS_TOKEN_TTL check was effectively being used to identify task tokens. Regular access tokens are already capped to MAX_ACCESS_TOKEN_TTL earlier in the flow (here) so this condition would never be true for them. Because of that, I interpreted the logic as enforcing the API key expiration check only for task tokens.

I think there are two possible directions:

  1. If no token should ever outlive the API key, regardless of type, then we can simplify the logic to:

    if claims.get("exp", 0) < time.time() + expires_in:
        raise UserError(
            "Cannot issue a token that would expire after the provided API key does. Please obtain a new API key and try again."
        )
  2. If this restriction is intended only for task tokens, then I'd suggest making that intent explicit—either with a comment explaining why the MAX_ACCESS_TOKEN_TTL check is present, or by introducing an explicit flag (e.g., is_task_token) rather than relying on the TTL comparison to distinguish token types.

Can you clarify which behavior is intended?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying, and I agree that relying on the expiration being capped to determine the token type is not ideal and confusing.

This function is create_user_access_token with an API key, and since technically a task token is an access token, I think we need to handle both normal and task access tokens here (even if we technically cap non-task before this). That limit earlier in the flow is good, but this function should also ensure that cap isn't exceeded for both normal and task tokens to be safe.

And yes, no token should should ever outlive the API key.

Back to the example, you are right - I misstated with the example values. But the flaw is still there in a few scenarios.

MAX_ACCESS_TOKEN_TTL = 1000
expires_in = 2000
API Key expires in 3000

the first condition (expires_in > MAX_ACCESS_TOKEN_TTL) evaluates to True. If the API key expires in 3000 seconds, then claims.get("exp", 0) < time.time() + expires_in would evaluate to False, skipping the UserError and allowing token issuance past the max allowed lifetime.

Another example:

MAX_ACCESS_TOKEN_TTL = 1000
expires_in = 500
API Key expires in 1

the first condition (expires_in > MAX_ACCESS_TOKEN_TTL) evaluates to False. The conditional exits early because of the and so the 2nd condition is never checked. If the API key expires in 1 seconds, we don't issue a UserError and allowing token issuance pas API Key lifetime.

So I still think the best plan is to have 2 separate checks, 1 to error on extension beyond max, 1 to error on extension beyond API key

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, the first stated example is still valid for task tokens. API key expires in 3000 seconds and I need a task token for 2000 seconds. (As long as my expires_in is less than what my task token's allowed max is, I must be granted a task token). As stated above, the expires_in > MAX_ACCESS_TOKEN_TTL is only used to separate access tokens from task tokens.

Since no token must outlive API key, the only check I believe we need is

if claims.get("exp", 0) < time.time() + expires_in:
    raise UserError(
        "Cannot issue a token that would expire after the provided API key does. Please obtain a new API key and try again."
    )

The entire validation of the expires_in field not to exceed the token_type's max_ttl is being handled by the upstream called function and is not the responsibility of this function.

Comment thread openapis/swagger.yaml Outdated
responses:
'200':
description: successful operation
/credentials/token/blacklisted:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

denylisted

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Integration Tests

filepath passed skipped SUBTOTAL
tests/test_presigned_url.py 7 0 7
tests/test_drs_endpoint.py 6 0 6
tests/test_centralized_auth.py 16 0 16
tests/test_data_upload.py 8 1 9
tests/test_audit_service.py 3 3 6
tests/test_dbgap.py 4 1 5
tests/test_user_login_activation.py 2 1 3
tests/test_fence_admin.py 2 0 2
tests/test_user_token.py 5 0 5
tests/test_google_data_access.py 1 0 1
tests/test_client_credentials.py 1 0 1
TOTAL 55 6 61

Please find the detailed integration test report here

Please find the Github Action logs here

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Integration Tests

filepath passed skipped SUBTOTAL
tests/test_oauth2.py 15 0 15
tests/test_audit_service.py 3 3 6
tests/test_centralized_auth.py 16 0 16
tests/test_data_upload.py 8 1 9
tests/test_drs_endpoint.py 6 0 6
tests/test_dbgap.py 4 1 5
tests/test_presigned_url.py 7 0 7
tests/test_user_login_activation.py 2 1 3
tests/test_user_token.py 5 0 5
tests/test_fence_admin.py 2 0 2
tests/test_oidc_client.py 2 0 2
tests/test_client_credentials.py 1 0 1
tests/test_register_user.py 2 0 2
tests/test_google_data_access.py 1 0 1
tests/test_ras_authn.py 0 3 3
tests/test_ras_passport.py 0 3 3
TOTAL 74 12 86

Please find the detailed integration test report here

Please find the Github Action logs here

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Integration Tests

filepath passed skipped SUBTOTAL
tests/test_oauth2.py 15 0 15
tests/test_centralized_auth.py 16 0 16
tests/test_audit_service.py 3 3 6
tests/test_data_upload.py 8 1 9
tests/test_drs_endpoint.py 6 0 6
tests/test_dbgap.py 4 1 5
tests/test_user_token.py 5 0 5
tests/test_presigned_url.py 7 0 7
tests/test_user_login_activation.py 2 1 3
tests/test_register_user.py 2 0 2
tests/test_fence_admin.py 2 0 2
tests/test_oidc_client.py 2 0 2
tests/test_google_data_access.py 1 0 1
tests/test_client_credentials.py 1 0 1
tests/test_ras_authn.py 0 3 3
tests/test_ras_passport.py 0 3 3
TOTAL 74 12 86

Please find the detailed integration test report here

Please find the Github Action logs here

Comment thread fence/authz/auth.py Outdated

if not isinstance(expires_in, int) or isinstance(expires_in, bool):
return False
if expires_in < 0:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe <= ?

@nss10 nss10 requested a review from Avantol13 July 2, 2026 16:52
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Integration Tests

filepath passed skipped SUBTOTAL
tests/test_oauth2.py 15 0 15
tests/test_centralized_auth.py 16 0 16
tests/test_audit_service.py 3 3 6
tests/test_data_upload.py 8 1 9
tests/test_drs_endpoint.py 6 0 6
tests/test_user_token.py 5 0 5
tests/test_dbgap.py 4 1 5
tests/test_presigned_url.py 7 0 7
tests/test_user_login_activation.py 2 1 3
tests/test_register_user.py 2 0 2
tests/test_oidc_client.py 2 0 2
tests/test_client_credentials.py 1 0 1
tests/test_fence_admin.py 2 0 2
tests/test_google_data_access.py 1 0 1
tests/test_ras_authn.py 0 3 3
tests/test_ras_passport.py 0 3 3
TOTAL 74 12 86

Please find the detailed integration test report here

Please find the Github Action logs here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants