bug: GraphQL subscription schema mismatch breaks bot startups #307

@flexus-teams

Description

Original Logs

20260410 11:16:25.721 fclnt [INFO] FlexusClient service_name=bob_60008_r_ api_key=None http://backend-v1-service.flexus.svc/v1/jailed-bot
20260410 11:16:25.722 stexe [INFO] Connecting ws://backend-v1-service.flexus.svc/v1/jailed-bot
20260410 11:16:25.781 btexe [ERROR] 🛑 That looks bad, my key doesn't work: {'message': "403: Whoops your key didn't work (2).", 'locations': [{'line': 2, 'column': 3}], 'path': ['bot_confirm_exists']}
20260410 11:16:25.786 stexe [INFO] got TransportQueryError (attempt 1/3), sleep 60...
20260410 11:17:25.787 stexe [INFO] Connecting ws://backend-v1-service.flexus.svc/v1/jailed-bot
20260410 11:17:25.819 btexe [INFO] i_am_still_alive bob:60008 group_id=None
20260410 11:17:25.841 stexe [INFO] got TransportQueryError (attempt 2/3), sleep 60...
20260410 11:18:25.843 stexe [INFO] Connecting ws://backend-v1-service.flexus.svc/v1/jailed-bot
20260410 11:18:25.885 btexe [ERROR] 🛑 3 exceptions in 5 min, exiting
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/flexus_client_kit/ckit_service_exec.py", line 26, in run_typical_single_subscription_with_restart_on_network_errors
    await subscribe_and_do_something(fclient, ws_client, *func_args, **func_kwargs)
  File "/usr/local/lib/python3.11/site-packages/flexus_client_kit/ckit_bot_exec.py", line 412, in subscribe_and_produce_callbacks
    async for r in ws.subscribe(
  File "/usr/local/lib/python3.11/site-packages/gql/client.py", line 1426, in subscribe
    async for result in inner_generator:
  File "/usr/local/lib/python3.11/site-packages/gql/client.py", line 1337, in _subscribe
    async for result in inner_generator:
  File "/usr/local/lib/python3.11/site-packages/gql/transport/common/base.py", line 298, in subscribe
    answer_type, execution_result = await listener.get()
                                    ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gql/transport/common/listener_queue.py", line 35, in get
    raise item
  File "/usr/local/lib/python3.11/site-packages/gql/transport/common/base.py", line 221, in _receive_data_loop
    answer_type, answer_id, execution_result = self._parse_answer(
                                               ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gql/transport/websockets_protocol.py", line 423, in _parse_answer
    return self._parse_answer_apollo(json_answer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gql/transport/websockets_protocol.py", line 384, in _parse_answer_apollo
    raise TransportQueryError(
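
The log shows the restart policy at work: each TransportQueryError is retried after a 60 s sleep, and the bot exits after "3 exceptions in 5 min". A schema validation error is deterministic, so every retry fails the same way. The budget therefore runs out about two minutes after the first failure, the process exits, and Kubernetes reports the pod as CrashLoopBackOff. The sketch below models that policy as a sliding-window exception budget. It is an assumption inferred only from these log lines, not the actual ckit_service_exec code; ExceptionBudget and its parameters are hypothetical names.

```python
import time
from collections import deque
from typing import Callable, Deque


class ExceptionBudget:
    """Hypothetical model of the restart policy seen in the logs.

    Records exception timestamps and reports when `max_errors` of them
    fall within the last `window_s` seconds (3 in 300 s, per the log).
    """

    def __init__(
        self,
        max_errors: int = 3,
        window_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self.clock = clock
        self._times: Deque[float] = deque()

    def record(self) -> bool:
        """Record one exception; return True when the process should exit."""
        now = self.clock()
        self._times.append(now)
        # Forget exceptions older than the sliding window (the newest one always stays).
        while now - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.max_errors


# Replay the failure pattern from the log: one error every 60 s.
fake_now = [0.0]
budget = ExceptionBudget(clock=lambda: fake_now[0])
for t in (0.0, 60.0, 120.0):
    fake_now[0] = t
    print(t, budget.record())
# prints: 0.0 False / 60.0 False / 120.0 True
```

A budget like this suits transient network errors. A GraphQL validation error such as the one in this issue would arguably be better treated as non-retryable, so the bot fails once with a clear message instead of sleeping through two retries.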

Error Summary

Multiple bot pods in the isolated namespace are crash-looping during startup. The common failure is a GraphQL subscription schema mismatch on the FBotThreadsCallsTasks type:
Cannot query field 'news_payload_task' ... Did you mean 'news_payload_task_new' / 'news_payload_task_old'?

Observed affected pods include bob, frog, karen, lawyerrat, vix, boss, strategist, productman, admonster, clerkwing, researcher, botticelli, executor.

Root Cause

  • File: flexus_client_kit/ckit_bot_exec.py:436
  • Function: subscribe_and_produce_callbacks
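  • Why: the subscription's selection set is generated from the FBotThreadsCallsTasks dataclass via gql_utils.gql_fields(). In this client kit build the generated query still requests news_payload_task, but the backend schema now exposes news_payload_task_new and news_payload_task_old instead, so the server rejects the subscription during GraphQL validation. Every bot running this client kit build fails its startup subscription.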
  • Git blame: @oleg Klimov in 87119dc / caff82d4 (schema migration and subscription update)
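
One way to surface this class of bug earlier is to compare the fields the client will request against the fields the backend schema actually exposes, before opening the subscription. The sketch below assumes both lists are already available as plain strings (in practice the server side might come from a GraphQL introspection query). find_stale_fields, assert_fields_exist and SchemaMismatchError are hypothetical names, not part of flexus_client_kit; difflib is used only to produce "did you mean" hints similar to the server's message.

```python
import difflib
from typing import Dict, Iterable, List


class SchemaMismatchError(RuntimeError):
    """Hypothetical error type for a client/server field mismatch (not worth retrying)."""


def find_stale_fields(
    requested: Iterable[str], server_fields: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each requested field the server does not expose to close-match hints."""
    known = list(server_fields)
    known_set = set(known)
    return {
        name: sorted(difflib.get_close_matches(name, known, n=3))
        for name in requested
        if name not in known_set
    }


def assert_fields_exist(
    requested: Iterable[str],
    server_fields: Iterable[str],
    type_name: str = "FBotThreadsCallsTasks",
) -> None:
    """Raise SchemaMismatchError if any requested field is unknown to the server."""
    stale = find_stale_fields(requested, server_fields)
    if stale:
        details = "; ".join(
            f"{name} (did you mean {' / '.join(hints)}?)" if hints else name
            for name, hints in stale.items()
        )
        raise SchemaMismatchError(f"{type_name}: unknown field(s): {details}")


# The situation described in this issue: the client still asks for the old name.
print(find_stale_fields(
    ["news_payload_task"],
    ["news_payload_task_new", "news_payload_task_old"],
))
# {'news_payload_task': ['news_payload_task_new', 'news_payload_task_old']}
```

Raising a dedicated exception type lets a restart loop distinguish this deterministic failure from a network error.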

Code Snippet

bot_threads_calls_tasks(...)
    {
        {gql_utils.gql_fields(ckit_bot_query.FBotThreadsCallsTasks)}
    }
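
The snippet splices the output of gql_utils.gql_fields() into the subscription, so the query text can only be as current as the FBotThreadsCallsTasks dataclass shipped in the installed client kit. The sketch below illustrates that coupling with a simplified, hypothetical gql_fields_sketch() that emits one line per top-level dataclass field; the real helper's behavior (for example, how it handles nested types) is not shown in this issue. The argument list of bot_threads_calls_tasks is left out, just as the original elides it.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical stand-in for FBotThreadsCallsTasks in an older client kit build:
# it still declares the pre-migration field name.
@dataclass
class FBotThreadsCallsTasksOldBuild:
    news_payload_task: Optional[str] = None


def gql_fields_sketch(cls) -> str:
    """Hypothetical simplification of gql_utils.gql_fields: one line per dataclass field."""
    return "\n".join(f.name for f in fields(cls))


def build_subscription(cls) -> str:
    # Doubled braces are literal braces inside an f-string; the single-brace
    # expression is where the generated selection set is spliced in.
    return f"""subscription {{
    bot_threads_calls_tasks {{
        {gql_fields_sketch(cls)}
    }}
}}"""


# The stale dataclass produces a query that asks for news_payload_task,
# which the updated backend schema no longer has.
print(build_subscription(FBotThreadsCallsTasksOldBuild))
```

Under this model, the query only changes once the bots run a client kit build whose dataclass declares the new field names.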

Affected

  • Pods: bob, frog, karen, lawyerrat, vix, boss, strategist, productman, admonster, clerkwing, researcher, botticelli, executor
  • Namespace: isolated
  • Occurrences: repeated CrashLoopBackOff on startup

Metadata

  • Assignees: no one assigned
  • Labels: bug (Something isn't working)
  • Type: none
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests