Fix: Can't use curl to download a single manifest in one invocation (#5918) by dsotirho-ucsc · Pull Request #6099 · DataBiosphere/azul

dsotirho-ucsc · 2024-03-26T23:51:49Z

Linked issues: #5918

Checklist

Author

PR is assigned to the author
PR is a draft
Target branch is develop
Name of PR branch matches issues/<GitHub handle of author>/<issue#>-<slug>
PR is linked to all issues it (partially) resolves
PR description links to connected issues
PR title matches¹ that of a linked issue _{or comment in PR explains why they're different}
PR title references all linked issues
For each linked issue, there is at least one commit whose title references that issue

¹ when the issue title describes a problem, the corresponding PR
title is Fix: followed by the issue title

Author (partiality)

Added p tag to titles of partial commits
This PR is labeled partial _{or completely resolves all linked issues}
This PR partially resolves each of the linked issues _{or does not have the partial label}

Author (reindex)

Added r tag to commit title _{or the changes introduced by this PR will not require reindexing of any deployment}
This PR is labeled reindex:dev _{or the changes introduced by it will not require reindexing of dev}
This PR is labeled reindex:anvildev _{or the changes introduced by it will not require reindexing of anvildev}
This PR is labeled reindex:anvilprod _{or the changes introduced by it will not require reindexing of anvilprod}
This PR is labeled reindex:prod _{or the changes introduced by it will not require reindexing of prod}
This PR is labeled reindex:partial and its description documents the specific reindexing procedure for dev, anvildev, anvilprod and prod _{or requires a full reindex or carries none of the labels reindex:dev, reindex:anvildev, reindex:anvilprod and reindex:prod}

Author (API changes)

This PR and its linked issues are labeled API _{or this PR does not modify a REST API}
Added a (A) tag to commit title for backwards (in)compatible changes _{or this PR does not modify a REST API}
Updated REST API version number in app.py _{or this PR does not modify a REST API}

Author (upgrading deployments)

Ran make docker_images.json and committed the resulting changes _{or this PR does not modify azul_docker_images, or any other variables referenced in the definition of that variable}
Documented upgrading of deployments in UPGRADING.rst _{or this PR does not require upgrading deployments}
Added u tag to commit title _{or this PR does not require upgrading deployments}
This PR is labeled upgrade _{or does not require upgrading deployments}
This PR is labeled deploy:shared _{or does not modify docker_images.json, and does not require deploying the shared component for any other reason}
This PR is labeled deploy:gitlab _{or does not require deploying the gitlab component}
This PR is labeled deploy:runner _{or does not require deploying the runner image}

Author (hotfixes)

Added F tag to main commit title _{or this PR does not include permanent fix for a temporary hotfix}
Reverted the temporary hotfixes for any linked issues _{or none of the stable branches (anvilprod and prod) have temporary hotfixes for any of the issues linked to this PR}

Author (before every review)

Rebased PR branch on develop, squashed fixups from prior reviews
Ran make requirements_update _{or this PR does not modify requirements*.txt, common.mk, Makefile, Dockerfile or environment.boot}
Added R tag to commit title _{or this PR does not modify requirements*.txt}
This PR is labeled reqs _{or does not modify requirements*.txt}
make integration_test passes in personal deployment _{or this PR does not modify functionality that could affect the IT outcome}
PR is awaiting requested review from a peer
Status of PR is Review requested
PR is assigned to only the peer

Peer reviewer (after approval)

Note that when requesting changes, the PR must be assigned back to the author.

Actually approved the PR
PR is not a draft
PR is awaiting requested review from system administrator
Status of PR is Review requested
PR is assigned to only the system administrator

System administrator (after approval)

Actually approved the PR
Labeled linked issues as demo or no demo
Commented on linked issues about demo expectations _{or all linked issues are labeled no demo}
Decided if PR can be labeled no sandbox
A comment to this PR details the completed security design review
PR title is appropriate as title of merge commit
N reviews label is accurate
Status of PR is Approved
PR is assigned to only the operator

Operator

Checked reindex:… labels and r commit title tag
Checked that demo expectations are clear _{or all linked issues are labeled no demo}
Squashed PR branch and rebased onto develop
Sanity-checked history
Pushed PR branch to GitHub

Operator (deploy `.shared` and `.gitlab` components)

Ran _select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused _{or this PR is not labeled deploy:shared}
Ran _select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply _{or this PR is not labeled deploy:gitlab}
Ran _select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused _{or this PR is not labeled deploy:shared}
Ran _select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply _{or this PR is not labeled deploy:gitlab}
Checked the items in the next section _{or this PR is labeled deploy:gitlab}
PR is assigned to only the system administrator _{or this PR is not labeled deploy:gitlab}

System administrator (post-deploy of `.gitlab` component)

Background migrations for dev.gitlab are complete _{or this PR is not labeled deploy:gitlab}
Background migrations for anvildev.gitlab are complete _{or this PR is not labeled deploy:gitlab}
PR is assigned to only the operator

Operator (deploy runner image)

Ran _select dev.gitlab && make -C terraform/gitlab/runner _{or this PR is not labeled deploy:runner}
Ran _select anvildev.gitlab && make -C terraform/gitlab/runner _{or this PR is not labeled deploy:runner}

Operator (sandbox build)

Operator (merge the branch)

All status checks passed and the PR is mergeable
The title of the merge commit starts with the title of this PR
Added PR # reference to merge commit title
Collected commit title tags in merge commit title _{but only included p if the PR is also labeled partial}
Pushed merge commit to GitHub
Status of PR is Merged lower
Status of blocked issues is Triage _{or no issues are blocked on the linked issues}

Operator (main build)

Operator (reindex)

Operator (mirroring)

Started mirroring in dev _{or this PR does not require mirroring dev}
Started mirroring in anvildev _{or this PR does not require mirroring anvildev}
Checked for, triaged and possibly requeued messages in mirror fail queue in dev _{or this PR does not require mirroring dev}
Checked for, triaged and possibly requeued messages in mirror fail queue in anvildev _{or this PR does not require mirroring anvildev}
Emptied mirror fail queue in dev _{or this PR does not require mirroring dev}
Emptied mirror fail queue in anvildev _{or this PR does not require mirroring anvildev}

Operator

Propagated the deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels to the next promotion PRs _{or this PR carries none of these labels}
Propagated any specific instructions related to the deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels, from the description of this PR to that of the next promotion PRs _{or this PR carries none of these labels}
PR is assigned to no one

Shorthand for review comments

L line is too long
W line wrapping is wrong
Q bad quotes
F other formatting problem

coveralls · 2024-03-27T01:31:24Z

coverage: 85.439% (-0.02%) from 85.46%
when pulling 6d3c890 on issues/dsotirho-ucsc/5918-manifest-post
into 57627b3 on develop.

codecov · 2024-03-27T01:31:25Z

Codecov Report

❌ Patch coverage is 70.00000% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.20%. Comparing base (57627b3) to head (6d3c890).

Files with missing lines	Patch %	Lines
test/integration_test.py	0.00%	13 Missing ⚠️
src/azul/service/manifest_controller.py	71.42%	4 Missing ⚠️
src/azul/service/app_controller.py	83.33%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #6099      +/-   ##
===========================================
- Coverage    85.22%   85.20%   -0.03%     
===========================================
  Files          156      156              
  Lines        22408    22451      +43     
===========================================
+ Hits         19098    19130      +32     
- Misses        3310     3321      +11

dsotirho-ucsc · 2024-03-29T00:25:53Z

Successful use of curl to download a manifest in one invocation:

(Unabridged version: 6099_manifest_invocation_unabridged.txt)

daniel@Crispin ~ $ curl --verbose --data '' --location 'https://service.daniel.dev.singlecell.gi.ucsc.edu/manifest/files?catalog=dcp3&filters=%7B%22cellCount%22%3A%7B%22within%22%3A%5B%5B1%2C2222222222%5D%5D%7D%7D&format=curl'

> POST /manifest/files?catalog=dcp3&filters=%7B%22cellCount%22%3A%7B%22within%22%3A%5B%5B1%2C2222222222%5D%5D%7D%7D&format=curl HTTP/2
> Host: service.daniel.dev.singlecell.gi.ucsc.edu
> User-Agent: curl/8.4.0
> Accept: */*
> Content-Length: 0
> Content-Type: application/x-www-form-urlencoded

< HTTP/2 301
< content-type: application/json
< content-length: 4
< location: https://service.daniel.dev.singlecell.gi.ucsc.edu/manifest/files/k8Qgp9pWcfTS0-CkTvX1MrLsApTFGithYhAvFgORT-E-D0kAAQ==

> GET /manifest/files/k8Qgp9pWcfTS0-CkTvX1MrLsApTFGithYhAvFgORT-E-D0kAAQ== HTTP/2
> Host: service.daniel.dev.singlecell.gi.ucsc.edu
> User-Agent: curl/8.4.0
> Accept: */*

< HTTP/2 301
< content-type: application/json
< content-length: 4
< location: https://service.daniel.dev.singlecell.gi.ucsc.edu/manifest/files/k8Qgp9pWcfTS0-CkTvX1MrLsApTFGithYhAvFgORT-E-D0kBAQ==

> GET /manifest/files/k8Qgp9pWcfTS0-CkTvX1MrLsApTFGithYhAvFgORT-E-D0kBAQ== HTTP/2
> Host: service.daniel.dev.singlecell.gi.ucsc.edu
> User-Agent: curl/8.4.0
> Accept: */*

…

< HTTP/2 301
< content-type: application/json
< content-length: 4
< location: https://service.daniel.dev.singlecell.gi.ucsc.edu/manifest/files/k8Qgp9pWcfTS0-CkTvX1MrLsApTFGithYhAvFgORT-E-D0kgCg==

> GET /manifest/files/k8Qgp9pWcfTS0-CkTvX1MrLsApTFGithYhAvFgORT-E-D0kgCg== HTTP/2
> Host: service.daniel.dev.singlecell.gi.ucsc.edu
> User-Agent: curl/8.4.0
> Accept: */*

< HTTP/2 302
< content-type: text/plain
< content-length: 3215
< location: https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-dev-storage-daniel.us-east-1/manifests/39e58996-f70d-55ec-8250-98d0f6c92456.ad14b32b-dea0-555e-b745-8ebe5bb0958d.curlrc?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-39e58996-f70d-55ec-8250-98d0f6c92456.ad14b32b-dea0-555e-b745-8ebe5bb0958d.curlrc%22&AWSAccessKeyId=ASIARZFZ7W77QFHNZFUL&Signature=2QnTDdkuUII9wZKFrAL7rXZ3L0I%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEMn%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJGMEQCIDYEjKDSa5U5%2Fa6nN2P1%2BSWvzJPwv0J5xXrumvTOvzXdAiBMTBy0vvDg2R%2BsaCSHByX%2BFHsVd7xnNnRsP6fBNUEWciqWAwjh%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAQaDDEyMjc5NjYxOTc3NSIMSx7PsxpRaWQWQZsKKuoCpfE0yyl5h24d%2F%2FxFh8qjwBgJ%2FKIUNPGadCRETM1rZS7%2BvpcUd0a1FtxAiaWePMy6yme4ZquN7rBrup2vmFzUocOT%2FPmEc0wdKAJ45E97cNt4MUzH7aGY9qi49dnWWDtnYL65r%2FLSz%2BtSyV8Ly6Bb%2Btp9shZaEaKzAKuR5Ha7Wz0uEQkyi%2BK8qbqEW9IyHw%2BZZ19utq0fewqpv5mnrffRkvKbegzu%2B2G9iATr7ywaNuYFoC10itubY95rS0ILajMWQdGR9Lpe5bZfmIrpVgIgzinue9UR%2BhLHx%2Fve3ry%2BEF5iiDXsoJeWY%2Fg%2Bnr0TpDTW33RIyp%2BstZVnX40JDEV7F9yehXX3GwAC9Zwu%2BuSYEYPs4avFDIogATVLjVQvRhFmXW8EcknDZ3ngOz5bU32FKUkn1BITDSkEQEe5j46a%2FJTpWOieLSUy0e35DZdtwomu6TGVdfOyTwefiuGinAPyE4t8zdm67q6hcwQwgo2YsAY6ngEQCkNB3UFolbnF%2BqzZCgbImeJmbZWbEr3zMZbzQM7w5KYUSHANYD%2BzDQYDq7UsKu9fvX2uPX0%2BXmICHXCiNX30S8AymOSowXaLsxP7FlY7Sx9EA7we4YjU0%2BIIHm6sd2jnrTk5Jq7kv3yniCGfgxYKH1ucbOFns6GmhnKpVIQMgQ6q8KaSyjRpXI51JwHChMbakT9yTpPA2wwlmIQvTA%3D%3D&Expires=1711674521

> GET /edu-ucsc-gi-platform-hca-dev-storage-daniel.us-east-1/manifests/39e58996-f70d-55ec-8250-98d0f6c92456.ad14b32b-dea0-555e-b745-8ebe5bb0958d.curlrc?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-39e58996-f70d-55ec-8250-98d0f6c92456.ad14b32b-dea0-555e-b745-8ebe5bb0958d.curlrc%22&AWSAccessKeyId=ASIARZFZ7W77QFHNZFUL&Signature=2QnTDdkuUII9wZKFrAL7rXZ3L0I%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEMn%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJGMEQCIDYEjKDSa5U5%2Fa6nN2P1%2BSWvzJPwv0J5xXrumvTOvzXdAiBMTBy0vvDg2R%2BsaCSHByX%2BFHsVd7xnNnRsP6fBNUEWciqWAwjh%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAQaDDEyMjc5NjYxOTc3NSIMSx7PsxpRaWQWQZsKKuoCpfE0yyl5h24d%2F%2FxFh8qjwBgJ%2FKIUNPGadCRETM1rZS7%2BvpcUd0a1FtxAiaWePMy6yme4ZquN7rBrup2vmFzUocOT%2FPmEc0wdKAJ45E97cNt4MUzH7aGY9qi49dnWWDtnYL65r%2FLSz%2BtSyV8Ly6Bb%2Btp9shZaEaKzAKuR5Ha7Wz0uEQkyi%2BK8qbqEW9IyHw%2BZZ19utq0fewqpv5mnrffRkvKbegzu%2B2G9iATr7ywaNuYFoC10itubY95rS0ILajMWQdGR9Lpe5bZfmIrpVgIgzinue9UR%2BhLHx%2Fve3ry%2BEF5iiDXsoJeWY%2Fg%2Bnr0TpDTW33RIyp%2BstZVnX40JDEV7F9yehXX3GwAC9Zwu%2BuSYEYPs4avFDIogATVLjVQvRhFmXW8EcknDZ3ngOz5bU32FKUkn1BITDSkEQEe5j46a%2FJTpWOieLSUy0e35DZdtwomu6TGVdfOyTwefiuGinAPyE4t8zdm67q6hcwQwgo2YsAY6ngEQCkNB3UFolbnF%2BqzZCgbImeJmbZWbEr3zMZbzQM7w5KYUSHANYD%2BzDQYDq7UsKu9fvX2uPX0%2BXmICHXCiNX30S8AymOSowXaLsxP7FlY7Sx9EA7we4YjU0%2BIIHm6sd2jnrTk5Jq7kv3yniCGfgxYKH1ucbOFns6GmhnKpVIQMgQ6q8KaSyjRpXI51JwHChMbakT9yTpPA2wwlmIQvTA%3D%3D&Expires=1711674521 HTTP/1.1
> Host: s3.amazonaws.com
> User-Agent: curl/8.4.0
> Accept: */*

< HTTP/1.1 200 OK
< Content-Disposition: attachment;filename="hca-manifest-39e58996-f70d-55ec-8250-98d0f6c92456.ad14b32b-dea0-555e-b745-8ebe5bb0958d.curlrc"
< Accept-Ranges: bytes
< Content-Type: binary/octet-stream
< Content-Length: 494554
<
--create-dirs

--compressed

--location

--globoff

--fail

--write-out "Downloading to: %{filename_effective}\n\n"

url="https://service.daniel.dev.singlecell.gi.ucsc.edu/repository/files/7113cb1e-7289-4fe2-ba15-bbe0620bd808?catalog=dcp3&version=2021-06-28T14%3A21%3A17.533000Z"
output="39fdb0ae-9a00-4829-a581-e7cb59798f02/mouse_cortex_S2_L003_I1_001.fastq"

url="https://service.daniel.dev.singlecell.gi.ucsc.edu/repository/files/9a6e412e-a4bd-4b34-a8f7-38aef94432a2?catalog=dcp3&version=2021-03-15T23%3A40%3A56.660000Z"
output="0fd2d069-f144-443b-9ece-aec641a959e1/SRR9004342_2.fastq.gz"

url="https://service.daniel.dev.singlecell.gi.ucsc.edu/repository/files/50d634ef-77b0-4339-a9c3-9a83bd9e5ab6?catalog=dcp3&version=2022-02-04T12%3A37%3A12.966000Z"
output="da317018-1b0b-4e4f-b508-2c1c7d049507/experiment2_mouse_pbs_scp_barcodes.tsv"

…

dsotirho-ucsc · 2024-03-29T00:31:31Z

6090_IT_2024-03-28.txt

nadove-ucsc · 2024-04-01T00:51:47Z

lambdas/service/app.py


-def manifest_route(*, fetch: bool, initiate: bool):
+def manifest_route(*, fetch: bool, method: str):
+    initiate = method in ['PUT', 'POST']


Suggested change

initiate = method in ['PUT', 'POST']

initiate = method in {'PUT', 'POST'}

test/integration_test.py

nadove-ucsc · 2024-04-01T00:59:46Z

lambdas/service/app.py


 def _file_manifest(fetch: bool, token_or_key: Optional[str] = None):
    request = app.current_request
+    require(request.method != 'POST' or request.raw_body.decode() == '',


This is the same as

Suggested change

require(request.method != 'POST' or request.raw_body.decode() == '',

require(request.method != 'POST' or request.raw_body == b'',

Right?
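
For what it's worth, the two checks agree on every body that is valid UTF-8, since only the empty byte string decodes to the empty string. They appear to differ only for bodies that are not valid UTF-8, where `decode()` raises instead of returning. A minimal sketch in plain Python; the helper names are hypothetical and not part of app.py:

```python
def is_empty_via_decode(raw_body: bytes) -> bool:
    # Mirrors `request.raw_body.decode() == ''`.
    # Can raise UnicodeDecodeError if the body is not valid UTF-8.
    return raw_body.decode() == ''


def is_empty_via_bytes(raw_body: bytes) -> bool:
    # Mirrors `request.raw_body == b''`. Never raises.
    return raw_body == b''


# Both variants agree on bodies that are valid UTF-8
for body in [b'', b'a=1', '{"x": "é"}'.encode()]:
    assert is_empty_via_decode(body) == is_empty_via_bytes(body)

# They differ on a body that is not valid UTF-8
try:
    is_empty_via_decode(b'\xff')
except UnicodeDecodeError:
    decode_raised = True
else:
    decode_raised = False
```

With the bytes comparison, a malformed body is simply treated as non-empty, so `require()` would reject it instead of the decoding step failing first.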

dsotirho-ucsc · 2024-04-01T17:27:25Z

Added #6003 as a blocker due to overlapping changes to the IT _test_manifest method.

nadove-ucsc

Looks good but needs rebase

hannes-ucsc · 2024-05-24T18:26:16Z

src/azul/service/source_controller.py

+
+    def server_side_sleep(self, max_seconds: int | float) -> float:
+        """
+        Sleep in the Lambda.


See my comment regarding the "Lambda" vs "Lambda Function" terminology on your other PR.

hannes-ucsc · 2024-05-24T18:26:52Z

src/azul/service/source_controller.py

+        Sleep in the Lambda.
+
+        :param max_seconds: The requested number of seconds to sleep. The actual
+                            time slept will be less however if the requested


Suggested change

time slept will be less however if the requested

time slept will be less if the requested

hannes-ucsc · 2024-05-24T18:27:28Z

src/azul/service/source_controller.py

+                            number would cause the Lambda to exceed its
+                            execution timeout.
+
+        :return: The actual number of seconds slept


"number" suggests an integral amount which is not what is being returned.

hannes-ucsc · 2024-05-24T18:29:01Z

test/integration_test.py

+                        first_fetch = bool(self.random.getrandbits(1))
+                        fetch_modes = [first_fetch, not first_fetch]
+                    for fetch in fetch_modes:
+                        with self.subTest('manifest',
+                                          catalog=catalog,
+                                          format=format,
+                                          curl=curl,
+                                          fetch=fetch,
+                                          wait=wait):
+                            args = dict(catalog=catalog,
+                                        filters=json.dumps(filters),
+                                        **({} if wait is None else {'wait': wait}))
+                            if format is None:
+                                format = first(supported_formats)
+                            else:
+                                args['format'] = format.value
+
+                            # Wrap self._get_url to collect all HTTP responses
+                            _get_url = self._get_url
+                            responses = list()
+
+                            def get_url(*args, **kwargs):
+                                response = _get_url(*args, **kwargs)
+                                responses.append(response)
+                                return response
+
+                            with mock.patch.object(self, '_get_url', new=get_url):
+
+                                # Make multiple identical concurrent requests to
+                                # test the idempotence of manifest generation,
+                                # and its resilience against DOS attacks.
+
+                                def worker(_):
+                                    response = self._check_endpoint(POST if curl else PUT,
+                                                                    '/manifest/files',
+                                                                    args=args,
+                                                                    fetch=fetch)
+                                    self._manifest_validators[format](catalog, response)
+
+                                num_workers = 3
+                                with ThreadPoolExecutor(max_workers=num_workers) as tpe:
+                                    results = list(tpe.map(worker, range(num_workers)))
+
+                            self.assertEqual([None] * num_workers, results)
+                            execution_ids = self._manifest_execution_ids(responses, fetch=fetch)
+                            # The second iteration of the inner-most loop re-
+                            # requests the manifest with only `fetch` being
+                            # different. In that case, the manifest will already
+                            # be cached and no step function execution is
+                            # expected to have been started.
+                            expect_execution = fetch == first_fetch
+                            self.assertEqual(1 if expect_execution else 0, len(execution_ids))


Due to the added indentation, this is getting too narrow. Please refactor in an early commit.

hannes-ucsc · 2024-05-24T18:29:32Z

test/service/test_manifest.py

+        def test(*,
+                 format: ManifestFormat,
+                 fetch: bool,
+                 curl: Optional[bool] = False,


What would curl=None signify?

As always, assert that no invalid combinations are passed.

What would curl=None signify?

Good point, this should not be optional.

dsotirho-ucsc · 2024-05-28T21:48:20Z

6099_IT_2024-05-28.txt

hannes-ucsc · 2024-06-06T16:09:31Z

test/integration_test.py

+                        with self.subTest('manifest',
+                                          catalog=catalog,
+                                          format=format,
+                                          fetch=fetch,
+                                          curl=curl,
+                                          wait=wait):
+                            execution_ids = self._test_manifest_execution(catalog,
+                                                                          format,
+                                                                          filters,
+                                                                          fetch,
+                                                                          curl,
+                                                                          wait)
+                            # The second iteration of the inner-most loop re-
+                            # requests the manifest with only `fetch` being
+                            # different. In that case, the manifest will already
+                            # be cached and no step function execution is
+                            # expected to have been started.
+                            expect_execution = fetch == first_fetch
+                            self.assertEqual(1 if expect_execution else 0, len(execution_ids))


This refactoring didn't have the desired effect of widening the horizontal space for this comment. The invocation of the introduced method also looks awkward. We typically switch to keyword arguments for a function/method with this many arguments. I don't think extracting a method is the solution. See if you can come up with some other solution to reduce the nesting.

dsotirho-ucsc · 2024-06-06T18:19:01Z

6099_IT_2024-06-06.txt

hannes-ucsc

As always, the refactoring should be in a separate commit.

dsotirho-ucsc · 2024-06-12T16:45:19Z

As always, the refactoring should be in a separate commit.

The refactoring only applies to my addition, so it doesn't work as a separate prior commit in this case. Instead of nesting the existing for loop body under multiple new for loops for the wait and format values, a list comprehension is used to supply the outer for loop.

hannes-ucsc · 2024-06-13T04:08:30Z

test/integration_test.py

+        for curl, wait, format in [(c, w, f)
+                                   for c in [False, True]
+                                   for w in ([None, 0, 1] if c else [None])
+                                   for f in [None, *supported_formats]]:


This can be done more succinctly with itertools.product
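
One possible shape for this, as a sketch only: because the `wait` values depend on `curl`, a single `product` call does not cover it, but two chained products do. The `supported_formats` list below is a hypothetical stand-in for the value used in the integration test.

```python
from itertools import chain, product

# Hypothetical stand-in for the formats the catalog under test supports
supported_formats = ['compact', 'curl']

# The nested comprehension from the diff above
nested = [(c, w, f)
          for c in [False, True]
          for w in ([None, 0, 1] if c else [None])
          for f in [None, *supported_formats]]

# The same combinations, in the same order, using itertools.product
formats = [None, *supported_formats]
with_product = list(chain(product([False], [None], formats),
                          product([True], [None, 0, 1], formats)))

assert nested == with_product
```

Whether this reads as more succinct than the comprehension is a judgment call. It does keep the dependency of `wait` on `curl` visible in one place per branch.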

hannes-ucsc · 2024-06-13T04:13:52Z

test/integration_test.py

+                with self.subTest('manifest',
+                                  catalog=catalog,
+                                  format=format,
+                                  fetch=fetch,
+                                  curl=curl,
+                                  wait=wait):


Let's include subTest in the list of exceptions to https://github.com/DataBiosphere/azul/blob/develop/CONTRIBUTING.rst#line-wrapping-and-indentation so that this can be formatted as

Suggested change

with self.subTest('manifest',
                  catalog=catalog,
                  format=format,
                  fetch=fetch,
                  curl=curl,
                  wait=wait):

with self.subTest('manifest', catalog=catalog, format=format,
                  fetch=fetch, curl=curl, wait=wait):

The update to CONTRIBUTING.rst should of course be in a separate commit.

hannes-ucsc · 2024-06-13T04:14:34Z

test/service/test_manifest.py

+                 fetch: bool,
+                 curl: bool = False,
+                 url: furl | None = None):
+            assert not fetch or not curl


Suggested change

assert not fetch or not curl

assert not (fetch and curl)

is shorter and much more comprehensible
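
The two assertions are equivalent by De Morgan's law. A quick exhaustive check over both flags, written as a standalone sketch rather than code from the test module:

```python
from itertools import product

# Compare both forms for every combination of the two flags
for fetch, curl in product([False, True], repeat=2):
    original = not fetch or not curl
    suggested = not (fetch and curl)
    assert original == suggested

# The combinations the assertion lets through
allowed = [(fetch, curl)
           for fetch, curl in product([False, True], repeat=2)
           if not (fetch and curl)]
```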

dsotirho-ucsc · 2024-06-21T23:39:30Z

6099_IT_2024-06-21.txt

hannes-ucsc

Please separate the wait changes from the POST changes, most likely as a split commit.

hannes-ucsc · 2024-06-24T16:29:55Z

CONTRIBUTING.rst

+  Only if the second and subsequent arguments won't fit on one line, do we
+  wrap all arguments, one line per argument.


Don't understand why you exclude the first argument.

Furthermore this rule doesn't just apply to calls. Note the bullet uses the term "element", not "argument".

hannes-ucsc · 2024-06-24T16:32:47Z

CONTRIBUTING.rst

+  The exception to this rule are logging method invocations and calls to
+  reject(), require(), or the integration test context manager subTest() ::


Confusing mixing of conjunctions "and" and "or".

Also, subTest is not just used in IT. It would be arbitrary and confusing to restrict the exception to only the IT usages of subTest.

hannes-ucsc · 2024-06-24T16:47:29Z

lambdas/service/app.py

        raise BRE(f'The {name!r} parameter is not valid JSON')


+def validate_wait(wait: str | None) -> Optional[int]:


Why is this returning a value? None of the other validators do.

hannes-ucsc · 2024-06-24T16:55:44Z

lambdas/service/app.py

+def wait_parameter_spec(default: int | None = None) -> JSON:
+    return params.query(
+        'wait',
+        schema.optional(int


This would be better described as an enum.

hannes-ucsc · 2024-06-24T17:03:02Z

lambdas/service/app.py

+                  '`application/x-www-form-urlencoded` to this endpoint')
    query_params = request.query_params or {}
    _hoist_parameters(query_params, request)
+    if is_post and 'wait' not in query_params:


Two lines down a different idiom is used to set the default. Be consistent with precedent.

Are you sure the default is needed at this level? You should avoid the case where a default is injected when the parameter is actually disallowed. I'm a bit confused by the fact that the wait parameter is conditional on is_post in one place and on fetch in another. I don't remember: Did I ask you to allow wait in use cases other than the one we actually need it for, i.e., with curl?

Are you sure the default is needed at this level? You should avoid the case where a default is injected when the parameter is actually disallowed.

Yes, the wait parameter only applies if the initial request was POST, in which case we want to set a default (1) if the param was not specified.

I'm a bit confused by the fact that the wait parameter is conditional on is_post in one place and on fetch in another. I don't remember: Did I ask you to allow wait in use cases other than the one we actually need it for, i.e., with curl?

If an initial POST request to the non-fetch endpoint returns a 301 with a token, the wait parameter is carried over from the initial request, so that each subsequent request to the non-fetch endpoint has the same wait parameter. This is why the parameter is conditional on is_post in one section (when setting a default) and on not fetch in another section (when processing a non-initial request). I've added a brief comment to the latter to help clarify.
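
A rough sketch of the two steps described in this reply, under stated assumptions: both function names are hypothetical, `setdefault` is just one way to express the default, and only `wait` is tracked because the thread does not show how the real redirect URL is built.

```python
def with_default_wait(method: str, query_params: dict[str, str]) -> dict[str, str]:
    # Initial request: only a POST gets the default of '1', and only if the
    # client did not specify `wait` itself
    params = dict(query_params)
    if method == 'POST':
        params.setdefault('wait', '1')
    return params


def redirect_query(params: dict[str, str]) -> dict[str, str]:
    # Subsequent requests: carry `wait` over unchanged, if present, so every
    # hop in the redirect chain sees the same value
    return {k: v for k, v in params.items() if k == 'wait'}


initial = with_default_wait('POST', {'catalog': 'dcp3'})
carried = redirect_query(initial)
```

Under this reading, a PUT request never has a default injected, which is the case the review comment warns about.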

dsotirho-ucsc · 2024-08-14T16:46:27Z

6099_IT_2024-08-14.txt

hannes-ucsc · 2024-08-14T18:03:08Z

CONTRIBUTING.rst

-  The one exception to this rule are logging method invocations and calls to
-  reject() and require()::
+  The exception to this rule are logging method invocations, calls to
+  reject(), require(), or the context manager subTest() ::


Suggested change

reject(), require(), or the context manager subTest() ::

reject(), require(), or uses of TestCase.subTest() ::

hannes-ucsc · 2024-08-14T18:03:53Z

CONTRIBUTING.rst

-  Only if the second and subsequent arguments won't fit on one line, do we
-  wrap all arguments, one line per argument.


This should not be deleted.

hannes-ucsc · 2024-08-14T18:06:59Z

lambdas/service/app.py


-def manifest_route(*, fetch: bool, initiate: bool):
+def wait_parameter_spec(default: str | None = None) -> JSON:
+    possible_values = ('0', '1')


Suggested change

possible_values = ('0', '1')

valid_values = ['0', '1']

hannes-ucsc · 2024-08-14T18:13:59Z

lambdas/service/app.py

+                    Requests to this endpoint are idempotent, so PUT would be
+                    the more standards-compliant method to use. POST is offered
+                    as a convenience for `curl` users, exploiting the fact that
+                    `curl` drops to GET when following a redirect in response to
+                    a POST, but not a PUT request. This is the only reason for
+                    the deprecation of this endpoint and there are currently no
+                    plans to remove it.
+
+                    To use this endpoint with `curl`, pass the `--location` and
+                    `--data` options. This makes `curl` automatically follow the
+                    intermediate redirects to the GET /manifest/files endpoint,
+                    and ultimately to the URL that yields the manifest. Example:
+
+                    ```
+                    curl --data "" --location {post_manifest_example_url}
+                    ```
+
+                    In order to facilitate this, a POST request to this endpoint
+                    may have a `Content-Type` header of
+                    `application/x-www-form-urlencoded`, which is what the
+                    `--data` option sends. The body must be empty in that case
+                    and parameters cannot be hoisted as described above.


This looks like something that was already there but it shows up as an addition without a corresponding deletion. IOW, it appears that this was copied and pasted.

This originated from a patch you provided, and was extended and refined during review feedback.
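
The `curl` behavior the description relies on can be modeled roughly as follows. This is a simplified sketch covering only the 301 and 302 responses seen in the transcript earlier in this PR; the function name is hypothetical, and other status codes and `curl` options are out of scope.

```python
def method_after_redirect(method: str, status: int) -> str:
    # Simplified model: when following a 301 or 302, a POST is re-sent as a
    # GET, while other methods such as PUT are kept unchanged
    if status in (301, 302) and method == 'POST':
        return 'GET'
    return method


# The chain from the transcript: POST, then GET for every later hop
chain = ['POST']
for status in [301, 301, 302]:
    chain.append(method_after_redirect(chain[-1], status))
```

With PUT as the initial method, every hop in this model would stay a PUT, which is why the POST variant is the one that works with `--location`.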

hannes-ucsc · 2024-08-14T18:14:34Z

src/azul/service/source_controller.py

+
+    def server_side_sleep(self, max_seconds: int | float) -> float:
+        """
+        Run a sleep in the current Lambda function.


Sleeps aren't "run".

hannes-ucsc · 2024-08-14T18:15:31Z

src/azul/service/source_controller.py

        return Filters(explicit=self._parse_filters(filters),
                       source_ids=self._list_source_ids(catalog, authentication))
+
+    def server_side_sleep(self, max_seconds: int | float) -> float:


Why is this in SourceController? It has nothing to do with sources.

hannes-ucsc · 2024-08-14T18:16:36Z

src/azul/service/source_controller.py

+        # to the client.
+        actual_seconds = min(float(max_seconds),
+                             remaining_time - config.api_gateway_timeout_padding - 3)
+        time.sleep(actual_seconds)


We should emit a log entry here.
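
A standalone sketch of what that could look like. It is not the method from the PR: the remaining time and the padding are passed in as plain arguments, the clamp at zero is an added assumption, and the `sleep` parameter exists only so the sketch can be exercised without actually sleeping.

```python
import logging
import time

log = logging.getLogger(__name__)


def server_side_sleep_sketch(max_seconds: float,
                             remaining_time: float,
                             timeout_padding: float,
                             sleep=time.sleep) -> float:
    # Same clamping as in the diff: leave room for the padding plus three
    # seconds before the execution timeout
    actual_seconds = min(float(max_seconds),
                         remaining_time - timeout_padding - 3)
    # Assumption: never hand a negative duration to sleep()
    actual_seconds = max(0.0, actual_seconds)
    log.info('Sleeping for %.3fs (%.3fs requested)',
             actual_seconds, max_seconds)
    sleep(actual_seconds)
    return actual_seconds


# Record the durations instead of sleeping
slept = []
results = [server_side_sleep_sketch(1, 30, 2, sleep=slept.append),
           server_side_sleep_sketch(10, 8, 2, sleep=slept.append),
           server_side_sleep_sketch(10, 4, 2, sleep=slept.append)]
```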

hannes-ucsc · 2024-08-14T18:16:58Z

test/integration_test.py

                if retry_after is not None:
                    retry_after = float(retry_after)
+                    if url.args.get('wait') == 1:
+                        # The waiting should have happened server-side and been


Suggested change

# The waiting should have happened server-side and been

# The wait should have happened server-side and been

dsotirho-ucsc · 2024-08-15T15:51:13Z

6099_IT_2024-08-15.txt

hannes-ucsc

Index: lambdas/service/app.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/lambdas/service/app.py b/lambdas/service/app.py
--- a/lambdas/service/app.py	(revision 41d9b4b892f0503d0e5e8d6b7129b8211f9590a6)
+++ b/lambdas/service/app.py	(date 1723741147188)
@@ -1320,15 +1320,15 @@
                                              authentication=request.authentication)
 
 
-def wait_parameter_spec(default: str | None = None) -> JSON:
+def wait_parameter_spec(*, default: str | None = None) -> JSON:
     valid_values = ['0', '1']
     assert default in (None, *valid_values), default
     return params.query(
         'wait',
         schema.optional(
-            schema.with_default(default, type_=schema.enum(*valid_values))
-            if default else
             schema.enum(*valid_values)
+            if default is None else
+            schema.with_default(default, type_=schema.enum(*valid_values))
         ),
         description=fd('''
             If 0, the client is responsible for honoring the waiting period
@@ -1368,8 +1368,8 @@
         + ('/manifest/files' if initiate else '/manifest/files/{token}'),
         # The initial PUT request is idempotent.
         methods=[method],
-        # To support curl requests made with the `--data` option we accept the
-        # `application/x-www-form-urlencoded` content-type.
+        # In order to support requests made with `curl` and its `--data` option,
+        # we accept the `application/x-www-form-urlencoded` content-type.
         content_types=['application/json', 'application/x-www-form-urlencoded'],
         interactive=fetch,
         cors=True,
@@ -1378,7 +1378,7 @@
                 params.path('token', str, description=fd('''
                     An opaque string representing the manifest preparation job
                 ''')),
-                *((wait_parameter_spec(),) if not fetch else ())
+                *([] if fetch else [wait_parameter_spec()])
             ]
         },
         method_spec={
@@ -1496,7 +1496,7 @@
             '''),
             'parameters': [
                 catalog_param_spec,
-                *((wait_parameter_spec('1'),) if curl else ()),
+                *([wait_parameter_spec(default='1')] if curl else []),
                 filters_param_spec,
                 params.query(
                     'format',
@@ -1562,6 +1562,7 @@
                         'Location': {
                             'description': fd('''
                                 The URL of the manifest preparation job at
+                            # REVIEW: Why was the link removed?
                             ''') + fd('''
                                 the `GET /manifest/files/{token}` endpoint.
                                 ''') if initiate else fd('''
@@ -1687,17 +1688,19 @@
 
 def _file_manifest(fetch: bool, token_or_key: Optional[str] = None):
     request = app.current_request
-    is_post = request.method == 'POST'
+    # REVIEW: We don't use `is_` for `fetch` or `curl` so we should be consistent here
+    post = request.method == 'POST'
     if (
-        is_post
+        post
         and request.headers.get('content-type') == 'application/x-www-form-urlencoded'
         and request.raw_body != b''
     ):
-        raise BRE('The body must be empty for a POST request of content-type '
-                  '`application/x-www-form-urlencoded` to this endpoint')
+        raise BRE('POST requests to this endpoint must have an empty body if '
+                  'they specify a `Content-Type` header of '
+                  '`application/x-www-form-urlencoded`')
     query_params = request.query_params or {}
     _hoist_parameters(query_params, request)
-    if is_post:
+    if post:
         query_params.setdefault('wait', '1')
     if token_or_key is None:
         query_params.setdefault('filters', '{}')
@@ -1707,7 +1710,7 @@
                         catalog=validate_catalog,
                         format=validate_manifest_format,
                         filters=validate_filters,
-                        **({'wait': validate_wait} if is_post else {}))
+                        **({'wait': validate_wait} if post else {}))
         # Now that the catalog is valid, we can provide the default format that
         # depends on it
         default_format = app.metadata_plugin.manifest_formats[0].value
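
The body check in the last hunk of the patch can be sketched in isolation as follows. This is an illustration only: `check_post_body` and `FORM` are hypothetical names, and a plain `ValueError` stands in for the `BRE` raised in the patch. The check exists because `curl --data` sends a POST request with a `Content-Type` of `application/x-www-form-urlencoded` by default.

```python
from typing import Optional

FORM = 'application/x-www-form-urlencoded'


def check_post_body(method: str,
                    content_type: Optional[str],
                    raw_body: bytes) -> None:
    # Mirrors the condition in the patch above: only form-encoded POST
    # requests are required to have an empty body.
    if method == 'POST' and content_type == FORM and raw_body != b'':
        raise ValueError('POST requests to this endpoint must have an empty '
                         'body if they specify a `Content-Type` header of '
                         f'`{FORM}`')
```

Requests using other methods or a JSON content type pass through unchanged, so parameter hoisting via a JSON body keeps working.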

hannes-ucsc · 2024-08-15T17:06:38Z

lambdas/service/app.py

        ),
-        params.query(
-            'wait',
-            schema.optional(int),


Why was this changed from int to string? Why was the default logic added? I don't think we have a use case for there not being a default.

Meta comment: I try not to ask rhetorical questions, but you often treat my questions as such and assume that I already have a particular answer or preference, which you then implement. Sometimes you get it right but sometimes you don't. It would be better to just answer my questions. In general, feel free to provide the answer during a PL slot and for the questions in the paragraph above, a PL slot would most certainly be the best venue.

Why was this changed from int to string?

This was a mistake. I was confused by the string form of the param in the request. I've set it back to int in the schema.

Why was the default logic added? I don't think we have a use case for there not being a default.

We do not set a wait default value for the /repository/files/{file_uuid} and /fetch/repository/files/{file_uuid} endpoints.
Also, a default value is added for the initial manifest request (POST /manifest/files), but not for the subsequent requests to GET /manifest/files/{token}

REVIEW: Why was the link removed?

Links are not supported in this section of the Swagger
From: https://service.azul.data.humancellatlas.org/#/Manifests/put_manifest_files

A "default" is a value that takes effect when no explicit value is specified. When an endpoint accepts a wait parameter, that parameter must be optional so as to not complicate "regular" use cases in which the wait can be performed client-side. For every optional parameter a default needs to be specified. Anything else would lead to incorrect or incomplete documentation. If you are unable to implement this directive for any reason, or if you think I am wrong, please request a PL slot so we can clear this up.

Added a default value for the wait parameter in both the /repository/files and /manifest/files endpoints.
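
To illustrate the point about optional parameters and defaults, here is a sketch of an OpenAPI-style parameter object for `wait` once it is an integer again. It uses plain dictionaries instead of Azul's `params` and `schema` helpers, and the function name `wait_parameter` is hypothetical. It also shows why the patch above compares against `None`: with integer values, a truthiness test would drop a default of `0`.

```python
from typing import Any, Optional


def wait_parameter(default: Optional[int] = None) -> dict[str, Any]:
    valid_values = [0, 1]
    assert default in (None, *valid_values), default
    schema: dict[str, Any] = {'type': 'integer', 'enum': valid_values}
    # Compare against None explicitly: `if default:` would silently omit a
    # default of 0 from the documented schema.
    if default is not None:
        schema['default'] = default
    return {
        'name': 'wait',
        'in': 'query',
        # The parameter stays optional so that regular clients can keep
        # performing the wait themselves.
        'required': False,
        'schema': schema,
    }
```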

dsotirho-ucsc · 2024-08-19T18:00:41Z

6099_IT_2024-08-19.txt

hannes-ucsc

#6099 (comment)

dsotirho-ucsc · 2024-08-21T00:05:56Z

6099_IT_2024-08-20.txt

hannes-ucsc

Please push the commits individually.

hannes-ucsc · 2024-08-21T17:37:27Z

src/azul/chalice.py

    def current_request(self) -> AzulRequest:
        return self.app.current_request
+
+    def server_side_sleep(self, max_seconds: int | float) -> float:


Suggested change

-    def server_side_sleep(self, max_seconds: int | float) -> float:
+    def server_side_sleep(self, max_seconds: float) -> float:

Time in Python is always a float. Supporting int may seem like a convenience but just leads to sloppy use of this function.

hannes-ucsc · 2024-08-21T17:37:48Z

src/azul/chalice.py

+
+        :return: The actual amount of time slept in seconds
+        """
+        remaining_time = self.lambda_context.get_remaining_time_in_millis() / 1000


Add validation of type and range of the argument.
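
One possible shape for that validation, as a sketch under assumptions: the helper name `validated_max_seconds` is hypothetical, it raises built-in exceptions rather than using the project's `require()` and `reject()` helpers, and the exact range the reviewer has in mind is not stated, so "finite and non-negative" is a guess.

```python
import math


def validated_max_seconds(max_seconds: float) -> float:
    # Only float is accepted, in line with the suggested signature. Callers
    # holding an int are expected to convert explicitly.
    if not isinstance(max_seconds, float):
        raise TypeError(f'Expected float, got {type(max_seconds).__name__}')
    # Reject NaN, infinities and negative durations.
    if not math.isfinite(max_seconds) or max_seconds < 0:
        raise ValueError('Expected a finite, non-negative duration, '
                         f'got {max_seconds!r}')
    return max_seconds
```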

hannes-ucsc · 2024-08-21T17:40:43Z

src/azul/service/manifest_controller.py

+                retry_after = body.get('Retry-After')
+                if retry_after is not None:
+                    time_slept = self.server_side_sleep(retry_after)
+                    body['Retry-After'] = round(retry_after - time_slept)


Suggested change

-                    body['Retry-After'] = round(retry_after - time_slept)
+                    body['Retry-After'] = ceil(retry_after - time_slept)

hannes-ucsc · 2024-08-21T17:59:49Z

test/service/test_repository_files.py

+                                if wait is None:
+                                    azul_url.args['wait'] = '0'


This indicates that Azul now yields a URL with wait set to the default. I'd prefer not to do that.

dsotirho-ucsc · 2024-08-22T18:10:03Z

6099_IT_2024-08-22.txt

hannes-ucsc · 2024-09-04T00:52:34Z

src/azul/service/repository_controller.py

                elif wait == '1':
                    time_slept = self.server_side_sleep(float(retry_after))
-                    retry_after = round(retry_after - time_slept)
+                    retry_after = ceil(retry_after - time_slept)


The change from round to ceil is currently in a commit labeled "Add a default value for the /repository/files wait parameter". Please explain why going from round to ceil is related to adding a default or isolate that change in its own commit.

Moved change to commit Fix rounding of /repository/file retry-after value
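
The thread does not spell out why `ceil` is preferable. One plausible reading is that `round` can under-report the remaining wait, as this small example with made-up numbers shows:

```python
from math import ceil

retry_after = 5    # seconds the client was originally asked to wait
time_slept = 4.6   # seconds already slept server-side

remaining = retry_after - time_slept

# round() reports that no wait is left although a fraction of a second is
rounded = round(remaining)  # 0
# ceil() never under-reports the remaining wait
ceiled = ceil(remaining)  # 1
```

With `round`, the client would be told to retry immediately even though the job was asked to be given slightly more time.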

hannes-ucsc · 2024-09-04T00:55:03Z

lambdas/service/app.py

-        as a property of a JSON object in the body of the request. This can be
-        useful in case the value of the `filters` query parameter causes the URL
-        to exceed the maximum length of 8192 characters, resulting in a 413
-        Request Entity Too Large response.
+        as a property of a JSON object in the body of the request. This is
+        referred to as *parameter hoisting* and can be useful in case the value
+        of the `filters` query parameter causes the URL to exceed the maximum
+        length of 8192 characters, resulting in a 413 Request Entity Too Large
+        response.

        The request `%s %s?filters={…}`, for example, is equivalent to  `%s %s`
-        with the body `{"filters": "{…}"}` in which any double quotes or
-        backslash characters inside `…` are escaped with another backslash. That
-        escaping is the requisite procedure for embedding one JSON structure
-        inside another.
+        with a `Content-Type` header of `application/json` and the body
+        `{"filters": "{…}"}` in which any double quotes or backslash characters
+        inside `…` are escaped with another backslash. That escaping is the
+        requisite procedure for embedding one JSON structure inside another.


Correct me if I am wrong but it seems like these two hunks are unrelated documentation improvements that I requested at some point, but that have nothing to do with "Add[ing] support for POST requests to the manifest endpoint", as is the title of the commit the hunks are part of.

Moved change to commit Refine parameter hoisting note
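
The escaping described in the hoisting note falls out naturally when the body is built by serializing twice. A minimal sketch, with an illustrative filter value that is not taken from this PR:

```python
import json

# Illustrative filter, assumed for the sake of the example
filters = {'fileFormat': {'is': ['fastq.gz']}}

# The value of `filters` is itself a JSON document, so it is serialized first
filters_json = json.dumps(filters)
# and then embedded as a string inside the outer JSON object. The second
# json.dumps() escapes the inner double quotes with backslashes.
body = json.dumps({'filters': filters_json})
```

The resulting `body` would be sent with a `Content-Type` header of `application/json`, in place of the `filters` query parameter.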

dsotirho-ucsc · 2024-09-04T21:08:19Z

6099_IT_2024-09-04.txt

test/integration_test.py

Fixes regression from fb58b01. The problem manifested as an error with the `app` property in an AppController subclass, and this wasn't noticed until now due to the `app` property not being accessed in a controller that had this problem, namely ManifestController.

dsotirho-ucsc added the API API change affecting callers label Mar 26, 2024

github-actions bot added the orange label Mar 27, 2024

dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/5918-manifest-post branch 2 times, most recently from 8b62dbc to 830a1e4 Compare March 27, 2024 00:33

dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/5918-manifest-post branch 2 times, most recently from da860a5 to b81d052 Compare March 27, 2024 16:07

dsotirho-ucsc requested a review from nadove-ucsc March 27, 2024 17:23

dsotirho-ucsc assigned nadove-ucsc and unassigned nadove-ucsc Mar 27, 2024

dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/5918-manifest-post branch 5 times, most recently from 9fb4dbb to 391cedb Compare March 28, 2024 23:50

dsotirho-ucsc assigned nadove-ucsc Mar 29, 2024

nadove-ucsc requested changes Apr 1, 2024

nadove-ucsc removed their assignment Apr 1, 2024

dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/5918-manifest-post branch from 600eb8b to 81a4905 Compare April 1, 2024 19:22

dsotirho-ucsc requested a review from nadove-ucsc April 1, 2024 20:43

dsotirho-ucsc assigned nadove-ucsc Apr 1, 2024

nadove-ucsc reviewed Apr 4, 2024

nadove-ucsc removed their assignment Apr 4, 2024

dsotirho-ucsc force-pushed the issues/dsotirho-ucsc/5918-manifest-post branch from 81a4905 to ff39020 Compare April 4, 2024 00:57

dsotirho-ucsc requested a review from nadove-ucsc April 4, 2024 01:36

dsotirho-ucsc assigned nadove-ucsc Apr 4, 2024

hannes-ucsc requested changes May 24, 2024

hannes-ucsc requested changes Jun 6, 2024

hannes-ucsc requested changes Jun 7, 2024

hannes-ucsc requested changes Jun 13, 2024

hannes-ucsc requested changes Jun 24, 2024

hannes-ucsc requested changes Aug 14, 2024

hannes-ucsc requested changes Aug 15, 2024

hannes-ucsc reviewed Aug 20, 2024

hannes-ucsc requested changes Aug 21, 2024

hannes-ucsc requested changes Sep 4, 2024

github-advanced-security bot found potential problems May 15, 2025

test/integration_test.py (fixed)

dsotirho-ucsc added 9 commits October 9, 2025 14:05

Add subTest() as an exception to the rule for wrapping style

e2893b8

Fix broken manifest links on Swagger page

8eef441

Refactor server-side sleep into parent class

6627c5f

Add a default value for the /repository/files wait parameter

bcc580b

Fix rounding of /repository/file retry-after value

0d3096d

Refine parameter hoisting note

7ffbd1f

[1/2] [a] Fix: Can't use curl to download a single manifest in one invocation (#5918)

Add support for POST requests to the manifest endpoint

bf12d9e

[2/2] [a] Fix: Can't use curl to download a single manifest in one invocation (#5918)

Add a wait parameter option to the manifest endpoint

6d3c890

	initiate = method in ['PUT', 'POST']
	initiate = method in {'PUT', 'POST'}

	require(request.method != 'POST' or request.raw_body.decode() == '',
	require(request.method != 'POST' or request.raw_body == b'',

	time slept will be less however if the requested
	time slept will be less if the requested

		Only if the second and subsequent arguments won't fit on one line, do we
		wrap all arguments, one line per argument.

		The exception to this rule are logging method invocations and calls to
		reject(), require(), or the integration test context manager subTest() ::

		raise BRE(f'The {name!r} parameter is not valid JSON')


		def validate_wait(wait: str | None) -> Optional[int]:

	reject(), require(), or the context manager subTest() ::
	reject(), require(), or uses of TestCase.subTest() ::
