Contributor

@cyclux cyclux commented Dec 5, 2025

This pull request introduces the initial setup for the Jaffle Shop data integration, focusing on enabling CSV-to-Parquet conversion and preparing the environment for efficient data processing and integration with Snowflake and GCP. Key changes include a new conversion script, a comprehensive project configuration, and usage documentation.

Data conversion and workflow setup

  • Added convert_jaffle_csv_to_parquet.py script to automate conversion of Jaffle Shop CSV files into Parquet format, improving data storage and query efficiency for downstream use in Snowflake.
  • Added GENERATE_JAFFLE_SHOP_PARQUET.md documentation to guide users through generating CSV data, converting it to Parquet, and uploading Parquet files to GCP, including prerequisites and step-by-step instructions.
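A minimal sketch of what such a conversion step could look like, assuming pandas with the pyarrow engine. The function names `plan_conversions` and `convert_all` are hypothetical and are not taken from convert_jaffle_csv_to_parquet.py; the actual script may be structured differently.

```python
from pathlib import Path


def plan_conversions(csv_dir: Path, parquet_dir: Path) -> list[tuple[Path, Path]]:
    """Pair every CSV file in csv_dir with its target Parquet path."""
    return [
        (csv_path, parquet_dir / f"{csv_path.stem}.parquet")
        for csv_path in sorted(csv_dir.glob("*.csv"))
    ]


def convert_all(csv_dir: Path, parquet_dir: Path) -> list[Path]:
    """Convert each CSV in csv_dir to Parquet and return the written paths."""
    # Imported here so that plan_conversions stays usable without pandas installed.
    import pandas as pd

    parquet_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for csv_path, parquet_path in plan_conversions(csv_dir, parquet_dir):
        frame = pd.read_csv(csv_path)
        # index=False keeps the pandas row index out of the Parquet file.
        frame.to_parquet(parquet_path, engine="pyarrow", index=False)
        written.append(parquet_path)
    return written
```

Separating the path planning from the I/O makes it possible to check which files would be written before any data is read.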

Project configuration and dependencies

  • Introduced pyproject.toml for project metadata, dependency management (including pandas, pyarrow, fastparquet, Snowflake connectors), development tools, and configuration for code quality and testing.
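One way to confirm that the declared dependencies are actually available in the active environment is a quick import check. This is a sketch only: `missing_modules` is a hypothetical helper, and the import name `snowflake.connector` is an assumption, since the description does not say which Snowflake connector packages are listed.

```python
from importlib.util import find_spec


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the names from module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names assumed from the dependency list in the description.
    wanted = ["pandas", "pyarrow", "fastparquet", "snowflake.connector"]
    print("missing:", missing_modules(wanted))
```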

…rmat and instructions to upload to GCS bucket
@cyclux cyclux self-assigned this Dec 5, 2025
@cyclux cyclux linked an issue Dec 5, 2025 that may be closed by this pull request
@cyclux cyclux requested review from Copilot and srnnkls December 5, 2025 17:55
Copilot AI left a comment

Pull request overview

This pull request establishes the initial infrastructure for Jaffle Shop data integration, introducing a Python script to convert CSV files to Parquet format, comprehensive project configuration, and step-by-step documentation. The changes prepare the environment for efficient data processing and integration with Snowflake and GCP.

Key Changes:

  • Added CSV-to-Parquet conversion script with automated processing for seven Jaffle Shop data tables
  • Configured project dependencies and development tooling via pyproject.toml with Python 3.12+ support
  • Documented complete workflow from CSV generation through GCP upload

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| integration/pyproject.toml | Establishes project metadata, dependencies (pandas, pyarrow, Snowflake connectors), and development tool configurations for the integration package |
| integration/jaffle-shop-data/convert_jaffle_csv_to_parquet.py | Implements automated CSV-to-Parquet conversion for seven Jaffle Shop datasets with basic error handling |
| integration/jaffle-shop-data/GENERATE_JAFFLE_SHOP_PARQUET.md | Provides user documentation covering prerequisites, CSV generation, conversion steps, and GCP upload instructions |

)

JAFFLE_PARQUET_DATA_PATH = JAFFLE_CSV_DATA_PATH / "parquet"
Path.mkdir(JAFFLE_PARQUET_DATA_PATH, exist_ok=True)

Copilot AI Dec 5, 2025

Path.mkdir(JAFFLE_PARQUET_DATA_PATH, exist_ok=True) calls the method on the Path class and passes the instance as the first argument. This runs, but it is unidiomatic and easy to misread. Call the method on the path instance instead: JAFFLE_PARQUET_DATA_PATH.mkdir(exist_ok=True), or JAFFLE_PARQUET_DATA_PATH.mkdir(parents=True, exist_ok=True) to also create any missing parent directories.

Suggested change
Path.mkdir(JAFFLE_PARQUET_DATA_PATH, exist_ok=True)
JAFFLE_PARQUET_DATA_PATH.mkdir(parents=True, exist_ok=True)
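As a minimal sketch of the instance-method form, the helper below wraps the call so it can be reused. `ensure_dir` is a hypothetical name, not part of the script under review.

```python
from pathlib import Path


def ensure_dir(path: Path) -> Path:
    """Create path, including any missing parents, and return it.

    exist_ok=True makes repeated calls safe; parents=True avoids a
    FileNotFoundError when an intermediate directory does not exist yet.
    """
    path.mkdir(parents=True, exist_ok=True)
    return path
```

With this helper, the assignment and the directory creation could be combined, for example `JAFFLE_PARQUET_DATA_PATH = ensure_dir(JAFFLE_CSV_DATA_PATH / "parquet")`.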

Collaborator

@srnnkls srnnkls left a comment

lgtm.

Development

Successfully merging this pull request may close these issues.

Host jaffle-shop parquet files on GCS