Predictive skills by ifountalis · Pull Request #12 · RelationalAI/rai-agent-skills

ifountalis · 2026-04-06T22:58:20Z

No description provided.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cafzal · 2026-04-06T23:26:45Z

Predictive Skills Review (Pass 2)

Re-reviewed after commits 8c33be2 (address PR feedback), cc21d18 (fix link prediction issue), and 283257d (merge main). Many items from the first pass are resolved. Updated findings below.

Resolved from Pass 1: Discovery routing (#1), case bug in modeling example (#2), graph-analysis disambiguation (#5), placeholder consistency (#7), pitfall consequences (#9), non-pitfall row removed (#10), register_and_load rebuild uncommented (#13), description shortened (#14). Training examples now document required variables in docstrings (partial #8).

Must Fix

1. Discovery predictive.md reference still treats GNN as "future." rai-discovery/references/predictive.md:42 says rai_predictive mode is "Future — when the RAI predictive reasoner is platform-integrated." But the predictive skills now exist and provide a full GNN pipeline. The routing table (SKILL.md:394) correctly points to rai-predictive-modeling, but the reference file contradicts it. An agent reading the reference will think GNN training isn't available yet. Lines 131, 138 repeat this. The predictive routing example (predictive_routing.md) only shows pre_computed mode — needs a GNN/rai_predictive walkthrough.

2. Case mismatch migrated to training example. The modeling fix (link_prediction_snowflake.py) now correctly uses Customer.C_customer_id and Customer.AGE. But train_link_prediction.py:38 still uses Customer.c_customer_id (lowercase) and Customer.age (lowercase, lines 39/45/46). These paired examples reference the same concepts from the same dataset — the casing must match.

3. No guidance on extending an existing ontology for GNN. (Unchanged from Pass 1.) The modeling skill treats concept definition as greenfield. Users coming from rai-build-starter-ontology will have an existing model with Concept("Customer") (no identify_by). The skill needs to address: create a new Model or extend? Redefine concepts? What happens to existing Relationships/Properties?

4. No guidance on creating train/val/test split tables. (Unchanged from Pass 1.) All three skills assume split tables exist in Snowflake but no skill explains how to create them or what schema they require.

Should Fix

5. No cross-skill pattern for graph metrics → GNN features. (Unchanged.) rai-discovery documents the Graph → Predictive chain pattern (SKILL.md:139) but the predictive modeling skill doesn't show how to consume graph-analysis outputs (centrality scores, community labels) as PropertyTransformer features.

6. Link prediction training example includes domain-specific filtering. train_link_prediction.py:45-46 adds Customer.age < 50, Customer.age > 20. This obscures the pure link-prediction pattern and uses the lowercase .age that doesn't match the modeling example's .AGE. Strip it.

7. Unused GNN import in modeling skill and examples. The Quick Reference (SKILL.md:42) and all three modeling examples import GNN but never use it — GNN belongs to the training phase.

Nice to Have

8. No ontology-type → PropertyTransformer annotation mapping. A small table ("Float → usually continuous, String enum → usually category") would help an agent translate existing ontology types to PT annotations.

Works Well

Internal consistency across the three skills is strong — shared terminology, parameter naming, code patterns all align
Boundary separation (modeling / training / management) is clean and logical
Reference files are well-scoped — hyperparameters, task-types, prediction-attributes are exactly the right granularity
Common Pitfalls tables now include consequence descriptions (RuntimeError, TypeError, etc.)
Examples cover all three main task types with consistent DB/SCHEMA placeholders
Relationship arity rules table in modeling is excellent — clear, complete, copy-paste ready
The "What to include vs. omit when loading" table in management prevents a common class of errors
Training examples now document required variables in docstrings
register_and_load.py Session 2 now shows the graph/PT rebuild inline
Discovery skill correctly routes predictive → rai-predictive-modeling
Link prediction guidance added: "do not choose for the user" between link_prediction and repeated_link_prediction (modeling SKILL.md:195)

cafzal

see comment

- Add predictive routing in rai-problem-discovery post-discovery section - Fix case inconsistency (c_customer_id → C_customer_id) in link prediction example - Add Graph constructor disambiguation in rai-graph-analysis and rai-predictive-modeling - Standardize DB/SCHEMA placeholders across all examples and SKILL.md code blocks - Add file pointers to paired modeling examples in training example headers - Add failure consequences (error types) to pitfall tables across all three skills - Remove non-pitfall row (pt=None) from training pitfalls table - Uncomment required graph/PT rebuild in register_and_load.py Session 2 - Shorten rai-predictive-modeling frontmatter description Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ifountalis · 2026-04-07T13:06:51Z

@cafzal thanks for the thorough review. Here's what we addressed and what we deferred, point by point.

Addressed

Update prescriptive skills for PyRel PR #730 API changes #1 — No routing from rai-problem-discovery to predictive skills: Added explicit routing table: predictive →
rai-predictive-modeling alongside prescriptive/graph/rules in post-discovery section
Remove auto_suspend_mins / await_storage_vacuum from skill docs #2 — Case inconsistency c_customer_id vs C_customer_id: Fixed — all references now use C_customer_id matching the identify_by
declaration
Add multi-objective optimization (epsilon constraint) across prescriptive skills #5 — Graph constructor overlap — no disambiguation: Added cross-references in both skills' "When NOT to use" sections
Add decomposition, constraint, and advanced modeling patterns to ontology design skill #7 — Inconsistent DB/schema placeholders: Standardized all examples and SKILL.md code blocks to DB.SCHEMA.* / database="DB",
schema="SCHEMA"
Improve graph analysis skill based on domain expert feedback #8 — Training examples aren't self-contained: Added file pointers to paired modeling examples (e.g., See:
examples/node_classification_snowflake.py) and a Required variables: line
Add print()-based debugging patterns for readable DSL repr #9 — Pitfall "Cause" column lacks consequences: Added failure modes (e.g., TypeError, RuntimeError, data leakage) to all pitfall
rows across all three skills
small consistency fixes #10 — Non-pitfall in pitfalls table (pt=None): Removed the row
Fix pitfalls and limitations from workflow eval runs #13 — Commented-out graph rebuild in register_and_load.py: Uncommented and marked as # REQUIRED
Fix skill gaps from reasoner workflow eval run #14 — Long frontmatter description: Shortened to one line

Not Addressed

Improve workflow interactivity and question discovery #3 — No guidance on extending existing ontology for GNN: Out of scope for this PR — @cafzal can you confirm this is a separate
concern? The interaction between rai-build-starter-ontology concepts and GNN identify_by needs deeper design discussion.
Reduce multi-reasoner workflow friction across 6 skills #4 — No guidance on creating train/val/test split tables: Out of scope — already discussed separately. Split table creation is a
data-prep step outside the skill's scope.
Add pitfalls across pyrel-coding, querying, rules, graph, and prescriptive skills #6 — No cross-skill pattern for graph metrics → GNN features: Out of scope — no validated working pattern exists yet. Adding
speculative guidance could mislead agents.
Fill multi-reasoner gaps + reframe skill examples as functional patterns #11 — Unused GNN import in modeling skill/examples: Deferred — low impact, and having it present for convenience when users
combine phases is arguable.
Predictive skills #12 — Domain-specific filtering in link prediction training example: Deferred — the age filter is noise but doesn't block
correctness.
Update prescriptive skills for SDK 1.0.13 patterns #15 — No ontology-type → PropertyTransformer mapping: Deferred — nice to have but low priority.

Add guidance that agents must not choose between link_prediction and repeated_link_prediction — present both options and ask the user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

merge to resolve conflict

export_csv now always defaults to True internally so users don't need to set it. CDC-related options (skip_cdc) are removed entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cafzal · 2026-04-21T22:35:36Z

Test Report — Hands-On GNN Pipeline Testing (Updated)

Ran the full predictive pipeline (all three task types) on the H&M dataset. Setup questions raised to the GNN team in #team-prod-neuralai-gnns.

What works end-to-end

PyRel gnn3 branch installs in isolated venv (requires make sync to vendor v0 layer)
GNN, PropertyTransformer import successfully
RAI native app connects, Predictive + Logic reasoners provision (~4s)
Data sync and table materialization for 31M+ row H&M dataset (~4 min)
GNN node/edge table creation succeeds
Dataset preparation (Step 1/3) completes (~8 min)
Trainer configuration (Step 2/3) completes (~2s)
Training job submission (Step 3/3) succeeds — training actually starts ("Preparation finished, starting training 1st epoch")

Issues found (in order of discovery)

1. SKILLS BUG: pt=pt is wrong, should be property_transformer=pt

All three skills and the retail_planning template use pt=pt in the GNN constructor. But the GNN class has property_transformer as the named parameter (not pt). Using pt=pt causes it to leak into **train_params → TrainerConfig(**params) → TypeError: TrainerConfig.__init__() got an unexpected keyword argument 'pt'.

Fix: Change all occurrences of pt=pt to property_transformer=pt in skills and examples. This is the single biggest blocker — without it, nothing trains.

2. has_time_column=True triggers validation error

When enabled, task_info.py hardcodes time_column = "timecolumn" (literal string). The relationalai_gnns dataset sanity check requires at least one data table to have a matching time column. But Transaction (which carries the time_col via PropertyTransformer) is an edge table, and preparation.py only sets time_col on TableType.NODE. Non-temporal mode (has_time_column=False) works fine.

Status: Raised with GNN team. Workaround: use has_time_column=False and non-temporal Relationships.

3. Experiment schema permissions needed for RAI native app

The GNN experiment tracking database/schema must be accessible to the RAI native app. Requires:

GRANT USAGE ON DATABASE <db> TO APPLICATION RELATIONALAI;
GRANT ALL ON SCHEMA <db>.<schema> TO APPLICATION RELATIONALAI;

Not documented in the skills.

4. SSL certificate error during log streaming (environment, not code)

After training starts successfully, log streaming hits an SSL error (unable to load trusted certificates). This is a pyOpenSSL/certifi version conflict in the venv, not a code issue. Training may still be running on the server despite the client-side error.

5. make sync prerequisite not documented

The gnn3 branch requires make sync (which clones and vendors the v0 compatibility layer from relationalai-python@v0.15.12) before the predictive module will import. Without it, from lqp import ir fails.

6. rai_app_name config for non-standard app names

If the RAI native app isn't called RELATIONALAI, the config needs rai_app_name: <app_name>. Not mentioned in skills.

Implications for the skills

Must fix in skills:

Change pt=pt → property_transformer=pt in all SKILL.md code blocks and all example files
This is a bug that makes every GNN code sample non-functional

Should add to skills:

Document the GRANT ... TO APPLICATION RELATIONALAI step for experiment schema
Note that has_time_column=True requires the time_col concept to be a node (not edge) — or clarify the workaround
Document rai_app_name config option

Test environment

PyRel: gnn3 branch @ 9ed0bdafd, with make sync (v0.15.12)
relationalai-gnns: 0.1.5
Snowflake: NDSOEBE.RAI_PROD_PM (jqb21724) with H&M data replicated from NDSOEBE.RAI_GNNS_TEST
Dataset: HM_PYREL (1.37M customers, 105K articles, 31.5M transactions, 3 task schemas)
Test script: non-temporal mode (has_time_column=False, standard link_prediction not repeated_link_prediction)

ifountalis and others added 2 commits March 30, 2026 15:51

predictive management skills

0f445fa

Add node regression examples to predictive modeling and training skills

0693155

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cafzal requested review from agarrard-rai and cafzal April 6, 2026 23:07

cafzal requested changes Apr 6, 2026

View reviewed changes

ifountalis and others added 3 commits April 7, 2026 09:14

Fix link prediction issue

cc21d18

Add guidance that agents must not choose between link_prediction and repeated_link_prediction — present both options and ask the user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into predictive_skills

283257d

merge to resolve conflict

Remove export_csv and skip_cdc options from predictive training skill

957b938

export_csv now always defaults to True internally so users don't need to set it. CDC-related options (skip_cdc) are removed entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cafzal mentioned this pull request Apr 22, 2026

Predictive skills: two-skill workflow (modeling + training) #21

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Predictive skills#12

Predictive skills#12
ifountalis wants to merge 6 commits into
mainfrom
predictive_skills

ifountalis commented Apr 6, 2026

Uh oh!

cafzal commented Apr 6, 2026 •

edited

Loading

Uh oh!

cafzal left a comment

Uh oh!

ifountalis commented Apr 7, 2026

Uh oh!

cafzal commented Apr 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ifountalis commented Apr 6, 2026

Uh oh!

cafzal commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Predictive Skills Review (Pass 2)

Must Fix

Should Fix

Nice to Have

Works Well

Uh oh!

cafzal left a comment

Choose a reason for hiding this comment

Uh oh!

ifountalis commented Apr 7, 2026

Uh oh!

cafzal commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Report — Hands-On GNN Pipeline Testing (Updated)

What works end-to-end

Issues found (in order of discovery)

Implications for the skills

Test environment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cafzal commented Apr 6, 2026 •

edited

Loading

cafzal commented Apr 21, 2026 •

edited

Loading