Skip to content

Additional logging for HTTP /sql endpoint#11700

Merged
StpMax merged 37 commits into
25-8-2-revertfrom
feat/FQE-1654
Oct 21, 2025
Merged

Additional logging for HTTP /sql endpoint#11700
StpMax merged 37 commits into
25-8-2-revertfrom
feat/FQE-1654

Conversation

@StpMax

@StpMax StpMax commented Oct 9, 2025

Copy link
Copy Markdown
Contributor

Description

  • updated pydantic dependencies
  • fixed issue with unnecessary Preparer creation
  • added additional loging:
    • for each http sql query:
      • start/end time and duration
      • status (and error or rows count)
      • used handlers
    • with 60s interval:
      • System and mindsdb's processes RAM usage
      • CPU usage
      • count of active queries that received by HTTP /sql endpoint

Fixes #FQE-1654

Type of change

(Please delete options that are not relevant)

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ⚡ New feature (non-breaking change which adds functionality)
  • 📢 Breaking change (fix or feature that would cause existing functionality not to work as expected)
  • 📄 This change requires a documentation update

Verification Process

To ensure the changes are working as expected:

  • Test Location: Specify the URL or path for testing.
  • Verification Steps: Outline the steps or queries needed to validate the change. Include any data, configurations, or actions required to reproduce or see the new functionality.

Additional Media:

  • I have attached a brief loom video or screenshots showcasing the new functionality or change.

Checklist:

  • My code follows the style guidelines(PEP 8) of MindsDB.
  • I have appropriately commented on my code, especially in complex areas.
  • Necessary documentation updates are either made or tracked in issues.
  • Relevant unit and integration tests are updated or added.

@entelligence-ai-pr-reviews

Copy link
Copy Markdown
Contributor

🔒 Entelligence AI Vulnerability Scanner

No security vulnerabilities found!

Your code passed our comprehensive security analysis.

📊 Files Analyzed: 5 files


@entelligence-ai-pr-reviews

Copy link
Copy Markdown
Contributor

Review Summary

🏷️ Draft Comments (6)

Skipped posting 6 draft comments that were valid but scored below your review threshold (>=13/15). Feel free to update them here.

mindsdb/__main__.py (1)

610-610: log.log_resources_thread is started as a thread but is not guaranteed to terminate when the main process exits, potentially causing resource leaks or hanging processes.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/__main__.py, lines 610-610, the log_resources_thread is started as a non-daemon thread, which may prevent the main process from exiting cleanly and cause resource leaks. Change the thread to be a daemon thread by setting t.daemon = True before t.start().

mindsdb/api/http/namespaces/sql.py (3)

66-93: error_traceback is always None and never set to the actual traceback, so the traceback field in hooks.after_api_query is always empty, breaking error reporting contract.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/http/namespaces/sql.py, lines 66-93: The variable `error_traceback` is always None and never set to the actual traceback, so the `traceback` field in `hooks.after_api_query` is always empty, breaking error reporting contract. Update each exception handler to set `error_traceback = traceback.format_exc()` before logging and calling hooks, and use `error_traceback` in the logger.error call.

147-156: The /list_databases endpoint executes a separate SHOW TABLES query for each database in a loop, causing N+1 query inefficiency and high latency for many databases.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 2/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Optimize the `/list_databases` endpoint in mindsdb/api/http/namespaces/sql.py (lines 147-156) to avoid the N+1 query pattern. Instead of running a separate `SHOW TABLES` query for each database in a loop, batch the retrieval of tables for all databases in a single or minimal number of queries, or use a more efficient metadata API if available. This will significantly reduce latency and resource usage when there are many databases.

115-121: logger.info(log_msg) at line 121 logs the full SQL query and error messages, potentially exposing sensitive user data or internal error details to logs, risking data leakage if logs are accessed by unauthorized parties.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/api/http/namespaces/sql.py, lines 115-121, the code logs the full SQL query and error messages, which can expose sensitive user data or internal error details in logs. Update the logging so that it does NOT include the raw SQL query or error message content. Instead, log only metadata such as execution time, result type, row count, and handler usage. Replace the error message in logs with a generic placeholder like '(error occurred)'.

mindsdb/interfaces/database/integrations.py (1)

505-594: The function get_data_handler (lines 505-594) is too complex (over 10 branches), making it hard to maintain and optimize for performance as the codebase grows.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 2/5
  • Urgency Impact: 2/5
  • Total Score: 6/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor the function `get_data_handler` in `mindsdb/interfaces/database/integrations.py` (lines 505-594). The function is too complex (over 10 branches), making it hard to maintain and optimize. Extract logical blocks (such as error handling, file setup, and handler instantiation) into private helper methods to reduce cyclomatic complexity and improve maintainability, while preserving all existing functionality and performance.

mindsdb/utilities/log.py (1)

471-485: try-except inside a loop in log_resources_thread (lines 447-484) incurs significant performance overhead when iterating over many child processes, especially if exceptions are frequent.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In mindsdb/utilities/log.py, lines 471-485, the `log_resources_thread` function uses a `try`-`except` inside a loop over child processes, which can cause significant performance overhead if exceptions are frequent. Refactor the loop so that all attribute accesses for each child are grouped in a single try block, and only append/process the child if all attributes are successfully fetched. This reduces exception handling overhead and improves performance when iterating over many processes.

🔍 Comments beyond diff scope (1)
mindsdb/utilities/context.py (1)

48-48: __delattr__ deletes the key 'name' instead of the attribute specified by the name parameter, causing incorrect attribute deletion and potential runtime errors.
Category: correctness


@StpMax StpMax changed the base branch from 25-8-2-revert to develop October 9, 2025 14:23
@StpMax StpMax changed the base branch from develop to 25-8-2-revert October 9, 2025 14:23
@StpMax StpMax requested a review from ea-rus October 10, 2025 06:56
@ea-rus ea-rus changed the base branch from 25-8-2-revert to main October 10, 2025 07:21
@ea-rus ea-rus changed the base branch from main to develop October 10, 2025 07:21
@ea-rus ea-rus changed the base branch from develop to 25-8-2-revert October 10, 2025 07:23
ea-rus
ea-rus previously approved these changes Oct 10, 2025
@StpMax StpMax closed this Oct 10, 2025
@github-actions github-actions Bot locked and limited conversation to collaborators Oct 10, 2025
@StpMax StpMax reopened this Oct 10, 2025
lucas-koontz
lucas-koontz previously approved these changes Oct 15, 2025
lucas-koontz and others added 9 commits October 17, 2025 19:24
Updates CI workflows to use a consistent approach for checking out Github Actions.

This change ensures that the correct version of actions is used, improving the reliability and reproducibility of builds and deployments.

Removes concurrency from the build_deploy_dev workflow as it was causing issues and was not needed, replacing it with concurrency control on the individual jobs.

Moves unit tests into test_on_push and runs them on PRs as this is the expected behaviour.

Adds a check to see if changes are only in docs, and skips tests if so.

Updates deploy workflow to use uv pip install with prerelease enabled.

Switches to using github actions from relative paths.

Adds ENV_URL var to environments.

Fixes json comparison in mysql integration tests.

Removes matrix_includes.json file.

Updates python versions in CI.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants