
Enable percent-script and .ipynb formats via pluggable AppFileManagers. #5318

Description

@asford


It would be extremely useful for Marimo to add support for .ipynb and percent-formatted .py scripts to its primary edit/run/export commands.
This would let data teams with existing notebook stacks progressively adopt Marimo while
(a) allowing authors to use both Marimo and their preferred editor (VSCode, Jupyter, et al), and
(b) allowing teams to export and deploy apps with either Marimo or their current tools (Papermill, Quarto, NBConvert, et al).

Converting to Marimo from Jupyter/VSCode/et al would be as simple as...

  • ... open your existing notebooks with marimo edit.
  • ... update the notebook content, if needed, to accommodate reactive execution.
  • ... enjoy!

Fortunately, adding support for additional formats is comparatively easy because...

  • ... Marimo already supports .md and .py formats and can be lightly refactored to expose additional file format handlers.
  • ... most well-structured notebooks can be easily rewritten to function as a script, a standard notebook or a reactive notebook.

Allowing percent-formatted .py scripts as a first-class Marimo format is especially useful for generative workflows,
because modern agent-assisted IDEs (Cursor, VSCode et al) can easily process percent-formatted scripts in topological execution order.
Unfortunately, this isn't true for Marimo's existing .py format, which is too specialized to be effectively generated by agentic tools.

Notebook Format Context

The Marimo .py notebook format is analogous to other plain-text notebook formats, which are...

  • ... author-and-reader oriented, human-readable plain-text formats.
  • ... diff-and-source control friendly.
  • ... compatible with existing editors and IDEs.
  • ... generally limited to storing notebook inputs & metadata, deferring outputs to rendered formats.

For clarity, we'll refer to these formats as "text-notebooks", and to Marimo's .py format in particular as "marimo-script".

Text-notebook formats only contain notebook cell inputs and metadata (the "notebook definition") and do not contain notebook output.
A text-notebook is executed and rendered with rich outputs; we'll call these output formats "rich-notebooks".
For example, marimo export is used to execute a marimo-script and render a standalone rich .html output.

The .ipynb notebook format is a hybrid format, which contains cell inputs, metadata and (optionally) rich cell outputs.
This format is semi-opaque: the definition isn't stored in a readable or editable form, and outputs must be viewed with an external tool.
We'll refer to notebook documents as "input-ipynb" when they only contain the notebook definition,
and as "rich-ipynb" when they contain both the notebook definition and rich outputs.
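To make the input-ipynb / rich-ipynb distinction concrete, here is a minimal sketch that derives an input-ipynb from a rich-ipynb by clearing code-cell outputs and execution counts. It assumes the notebook has already been parsed from JSON into a plain dict following the nbformat v4 layout; `to_input_ipynb` is a hypothetical helper, not part of Marimo or Jupyter.

```python
import copy


def to_input_ipynb(rich_nb: dict) -> dict:
    """Return a copy of a rich-ipynb notebook with all outputs removed (hypothetical helper)."""
    nb = copy.deepcopy(rich_nb)  # leave the caller's notebook untouched
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # Outputs and execution counts are run artifacts, not part of the definition.
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


rich = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": "1 + 1",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "data": {"text/plain": "2"},
                    "metadata": {},
                }
            ],
        },
    ],
}

inputs_only = to_input_ipynb(rich)
```

Notebook metadata and markdown cells pass through unchanged, so the result still carries the full notebook definition.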

Jupytext is a standard tool for text-notebooks in the Jupyter ecosystem.
To support a text-notebook directly in JupyterLab,
Jupytext maintains a paired rich-ipynb that matches the text-notebook and stores its outputs.
VSCode, PyCharm and many other IDEs directly support a text-notebook format,
and offer an IDE-like or REPL-like execution of text-notebook formats with rich outputs.
These IDEs typically only render outputs and don't save a rich-notebook.

For text-notebooks, IDEs and Jupytext have broadly converged on the percent format, which we'll refer to as "percent-script".
This is a representation of Jupyter notebooks as scripts,
in which all cells are explicitly delimited with a commented double percent sign # %%.
This format is isomorphic to the .ipynb notebook format with notebook metadata, cell metadata and cell sources.
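As an illustration of the percent-script layout, the sketch below splits a script into cells on `# %%` markers and records each cell's header (cell type and metadata). It is a deliberately simplified reader written for this issue (`split_percent_cells` is hypothetical); Jupytext's real parser also handles the file header, markdown comment prefixes, cell titles and many edge cases.

```python
import re

PERCENT_SCRIPT = '''\
# %% [markdown]
# # Load data

# %%
import math
radius = 2.0

# %% tags=["parameters"]
area = math.pi * radius ** 2
print(area)
'''

# A cell starts at a line beginning with "# %%"; the rest of that line may
# carry a cell type such as "[markdown]" and/or key=value cell metadata.
_CELL_MARKER = re.compile(r"^# %%(?P<header>.*)$")


def split_percent_cells(text: str) -> list[dict]:
    """Split a percent-format script into simplified cell dicts (hypothetical helper)."""
    cells: list[dict] = []
    current = None
    for line in text.splitlines():
        match = _CELL_MARKER.match(line)
        if match:
            header = match.group("header").strip()
            cell_type = "markdown" if header.startswith("[markdown]") else "code"
            current = {"cell_type": cell_type, "header": header, "lines": []}
            cells.append(current)
        elif current is not None:
            current["lines"].append(line)
        # Lines before the first marker (e.g. a Jupytext YAML header) are ignored here.
    for cell in cells:
        # Markdown sources keep their "# " comment prefix in this simplified sketch.
        cell["source"] = "\n".join(cell.pop("lines")).strip("\n")
    return cells


cells = split_percent_cells(PERCENT_SCRIPT)
```

Because every cell boundary is an ordinary comment, the same file still runs unchanged as a plain Python script.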

There are many other text-notebook and rich-notebook formats, which we will not fully enumerate.
There are multiple script-text formats with different comment-based delimiter schemes.
There are multiple markdown-based text formats, which include input cells as code blocks.
NbConvert and other tools produce a variety of rich-notebook output formats,
which render notebook rich outputs as docs, html pages, etc...

The .ipynb notebook format is a de facto minimum standard format.
Any tool stack working with notebook formats offers at least conversion to and from .ipynb,
and tools that support a text-notebook format will often support a handful of additional popular formats.
For instance, Marimo supports a literate markdown format, compatible with the Quarto ecosystem, which we'll refer to as "marimo-md".

Transitions to Marimo

The Marimo tool set (edit/run/export) is analogous to existing jupyter-based tools (jupyterlab/papermill, et al);
but is enhanced with Marimo's reproducible reactive execution model, editor, UI components, et al.

However, right now, adopting Marimo if you already use a text-notebook format is effectively irreversible.
You need to ...

  • ... convert existing text-notebook documents to marimo-script format,
    likely through a (percent-script.py -jupytext-> input.ipynb -marimo import-> marimo-script.py) flow.
  • ... manually update the converted marimo-script to fix up potential conversion errors.
  • ... validate the converted script via marimo run or marimo export.
  • ... fully adopt Marimo tools with the new marimo-script.py sources, which are no longer compatible with your existing tools.
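The first step of that flow can be sketched with the same two commands the MMVP script below runs (`jupytext --to ipynb --output` followed by `marimo convert -o`), here built as argument lists instead of shell strings. The helper names and the `.in.ipynb` / `.marimo.py` intermediate file naming are assumptions borrowed from the MMVP, not an established convention.

```python
import subprocess
from pathlib import Path


def percent_to_marimo_commands(src: Path, workdir: Path) -> list[list[str]]:
    """Build the percent-script -> input.ipynb -> marimo-script commands (hypothetical helper)."""
    ipynb_path = workdir / f"{src.stem}.in.ipynb"
    marimo_path = workdir / f"{src.stem}.marimo.py"
    return [
        ["jupytext", "--to", "ipynb", "--output", str(ipynb_path), str(src)],
        ["marimo", "convert", str(ipynb_path), "-o", str(marimo_path)],
    ]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        # Argument lists avoid shell=True, so paths with spaces need no quoting.
        subprocess.run(cmd, check=True)


cmds = percent_to_marimo_commands(Path("notebooks/analysis.py"), Path("/tmp/work"))
for cmd in cmds:
    print(" ".join(cmd))
```

Calling `run_pipeline(cmds)` would execute the conversion, assuming both `jupytext` and `marimo` are on `PATH`.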

This can be a blocker if your team has an existing notebook edit/run/export tool stack (e.g. jupyter/papermill/nbconvert).
Individual authors can shim in-and-out of the stack via marimo export and marimo import for local edits,
but data science teams may need to maintain and collaborate on notebooks with their existing stack.

While Marimo's execution model and supported features differ semantically from those of standard notebooks,
most code can be written to be executable as a standard script, a standard notebook, or a Marimo reactive notebook.
This is already captured in Marimo's import/export flows,
which handle cell reordering in topological execution order and a limited form of code conversion.
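The topological reordering mentioned above can be illustrated with Python's standard `graphlib` module. This sketch assumes each cell's defined and referenced global names are already known (Marimo derives them by static analysis) and that each name is defined in exactly one cell, as Marimo requires; it shows the idea only and is not Marimo's implementation. `topological_cell_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def topological_cell_order(cells: dict[str, tuple[set[str], set[str]]]) -> list[str]:
    """Order cells so each one runs after the cells defining the names it reads.

    `cells` maps a cell id to (defined names, referenced names).
    """
    definer: dict[str, str] = {}
    for cell_id, (defs, _refs) in cells.items():
        for name in defs:
            definer[name] = cell_id  # assumes a single defining cell per name
    # Each cell depends on the cells that define the names it references;
    # names with no defining cell (e.g. builtins) add no edge.
    graph = {
        cell_id: {definer[n] for n in refs if n in definer and definer[n] != cell_id}
        for cell_id, (_defs, refs) in cells.items()
    }
    # Raises graphlib.CycleError if the cells form a dependency cycle.
    return list(TopologicalSorter(graph).static_order())


# Cells listed out of order, as they might appear in a hand-edited notebook.
notebook = {
    "plot": ({"fig"}, {"df", "plt"}),
    "load": ({"df"}, {"pd"}),
    "imports": ({"pd", "plt"}, set()),
}
order = topological_cell_order(notebook)
```

Writing cells out in this order is what lets the same code also run top to bottom as a script or a linear notebook.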

Ideally a team should be able to adopt Marimo progressively, in parallel to their existing notebook stack.
This would be trivial if Marimo supported additional notebook formats in the edit/run/export commands,
and could be used to read/write different text-notebook and input-notebook formats.

Suggested solution

Fortunately, Marimo is already architected to support different file formats,
and supports both marimo-script and quarto-markdown as primary script formats.

Extending the current implementation to support additional, pluggable, formats would only require ...

  • ... lightly refactoring to extract an abstract save/load file handler interface from the existing file manager.
  • ... lightly refactoring the existing .md and .py file handling into two format-specific file handlers.
  • ... adapting and lightly improving Marimo's .ipynb import/export flow into a .ipynb file handler.
  • ... exposing a registration hook for additional file format handlers.
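A minimal sketch of what the extracted interface and registration hook could look like, assuming handlers convert whole files to and from marimo-script source text. All names here (`NotebookFormatHandler`, `register_format_handler`, `resolve_handler`, `MarimoScriptHandler`) are hypothetical and do not exist in Marimo today; a real design would more likely exchange Marimo's internal app/cell representation than raw source.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class NotebookFormatHandler(ABC):
    """Hypothetical interface: convert between an on-disk format and marimo-script source."""

    @abstractmethod
    def matches(self, path: Path, text: str) -> bool:
        """Return True if this handler understands the given file."""

    @abstractmethod
    def load(self, text: str) -> str:
        """Return marimo-script source for the file contents."""

    @abstractmethod
    def save(self, marimo_source: str) -> str:
        """Return file contents in this handler's format."""


_HANDLERS: list[NotebookFormatHandler] = []


def register_format_handler(handler: NotebookFormatHandler) -> NotebookFormatHandler:
    # Later registrations take priority, so plugins can override built-ins.
    _HANDLERS.insert(0, handler)
    return handler


def resolve_handler(path: Path, text: str) -> NotebookFormatHandler:
    for handler in _HANDLERS:
        if handler.matches(path, text):
            return handler
    raise ValueError(f"No notebook format handler for {path}")


class MarimoScriptHandler(NotebookFormatHandler):
    """Identity handler for the existing marimo-script format."""

    def matches(self, path: Path, text: str) -> bool:
        return path.suffix == ".py" and "marimo.App" in text

    def load(self, text: str) -> str:
        return text

    def save(self, marimo_source: str) -> str:
        return marimo_source


register_format_handler(MarimoScriptHandler())
```

Keeping detection (`matches`) inside each handler means the core file manager never needs to enumerate formats itself.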

Because the vast majority of notebook formats already have some form of bidirectional conversion to/from .ipynb format,
adding a new format handler to Marimo would just require layering a format-specific .ipynb converter on the provided .ipynb format handler.
For instance, the percent format can be trivially supported by running a jupytext-ipynb-marimo conversion on each save and load operation.
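Layering can be expressed as plain function composition. In a real handler the format side would be something like `jupytext.reads(text, fmt="py:percent")` / `jupytext.writes(nb, fmt="py:percent")`, and the marimo side would be Marimo's .ipynb conversion; in this sketch both are replaced by toy delimiter-based stand-ins so the example runs without either package. `Layered` and the stand-in converters are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# A notebook in .ipynb form, represented as a JSON-like dict.
IpynbDict = dict
ToIpynb = Callable[[str], IpynbDict]
FromIpynb = Callable[[IpynbDict], str]


@dataclass
class Layered:
    """Compose a format<->ipynb converter with an ipynb<->marimo converter."""

    format_to_ipynb: ToIpynb
    ipynb_to_format: FromIpynb
    ipynb_to_marimo: FromIpynb
    marimo_to_ipynb: ToIpynb

    def load(self, text: str) -> str:
        return self.ipynb_to_marimo(self.format_to_ipynb(text))

    def save(self, marimo_source: str) -> str:
        return self.ipynb_to_format(self.marimo_to_ipynb(marimo_source))


# Toy stand-ins for illustration only: split or join code cells on a delimiter.
def split_to_ipynb(delimiter: str) -> ToIpynb:
    def convert(text: str) -> IpynbDict:
        chunks = [c.strip("\n") for c in text.split(delimiter)]
        return {"cells": [{"cell_type": "code", "source": c} for c in chunks if c]}
    return convert


def join_from_ipynb(delimiter: str) -> FromIpynb:
    def convert(nb: IpynbDict) -> str:
        return "".join(f"{delimiter}\n{cell['source']}\n" for cell in nb["cells"])
    return convert


percent = Layered(
    format_to_ipynb=split_to_ipynb("# %%"),
    ipynb_to_format=join_from_ipynb("# %%"),
    ipynb_to_marimo=join_from_ipynb("# <marimo cell>"),  # placeholder, not real marimo output
    marimo_to_ipynb=split_to_ipynb("# <marimo cell>"),
)

src = "# %%\nx = 1\n# %%\ny = x + 1\n"
loaded = percent.load(src)
round_tripped = percent.save(loaded)
```

Only the two format-side functions change per format; the ipynb-to-marimo half is shared by every handler built this way.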

MMVP Hack

As an MMVP proof of concept, see the CustomAppFileManager sketched out in mmvp_jupytext_marimo_server.py below.
This implements a minimal shim layer, allowing marimo edit and marimo run to target either .ipynb or Jupytext percent-format .py inputs.
The server's existing file watch mechanism then allows concurrent edits to the file in both marimo and an external IDE.
Nice!

flowchart TD
    rest["... Marimo"] -->|_load_app| A -->|_save_file| rest

    A[CustomAppFileManager] --> B[ResolvedFormat]
    
    B --> C{Format Detection}
    
    subgraph SG1 ["jupytext"]
        percent-marimo-script[marimo-script] <-->|marimo convert| percent-input-ipynb[input-ipynb] <-->|jupytext| percent-script[("📄 percent-script")]

        style percent-marimo-script stroke-dasharray: 5 5
        style percent-input-ipynb stroke-dasharray: 5 5
    end
    
    subgraph SG2 ["ipynb"]
        ipynb-marimo-script[marimo-script] <-->|marimo convert| ipynb-input-ipynb[("📄 ipynb-script")]

        style ipynb-marimo-script stroke-dasharray: 5 5
    end
    
    subgraph SG3 ["marimo"]
        marimo-script[("📄 marimo-script")]
    end
    
    C -->|.py with '# %%'| SG1
    C -->|.ipynb| SG2
    C -->|.py with marimo.App| SG3
    

This MMVP has uncovered a few necessary updates...

  • Marimo's .ipynb import/export flows need to be updated to be bidirectional and metadata preserving.
    For example, right now .ipynb conversion discards notebook-level metadata (e.g. column layout) of the marimo notebook.
  • The FileManager interface needs to be made pluggable, rather than enumerating all file formats.
    It's probably not feasible to burden the Marimo core with supporting multiple, increasingly esoteric notebook formats.
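One way to make the round trip metadata-preserving is to stash Marimo's notebook-level app config in the .ipynb notebook metadata on export and read it back on import. The `"marimo"` metadata key and both helpers are assumptions for illustration, not an existing Marimo format. For percent-scripts, note that Jupytext filters notebook metadata in text formats by default, so a custom key would likely also need to be allowed through its `notebook_metadata_filter` option.

```python
import copy
from typing import Optional

# Hypothetical metadata key used only for this sketch.
MARIMO_METADATA_KEY = "marimo"


def attach_app_config(nb: dict, app_config: dict) -> dict:
    """Return a copy of an ipynb dict carrying marimo's notebook-level config."""
    out = copy.deepcopy(nb)
    out.setdefault("metadata", {})[MARIMO_METADATA_KEY] = {"app_config": dict(app_config)}
    return out


def extract_app_config(nb: dict, default: Optional[dict] = None) -> dict:
    """Recover marimo's notebook-level config from an ipynb dict, falling back to `default`."""
    stored = nb.get("metadata", {}).get(MARIMO_METADATA_KEY, {})
    return dict(stored.get("app_config", default or {}))


plain_nb = {"cells": [], "metadata": {"kernelspec": {"name": "python3"}}}
exported = attach_app_config(plain_nb, {"width": "columns"})
restored = extract_app_config(exported)
```

Existing metadata such as `kernelspec` is left in place, so other Jupyter tools can still open the exported notebook.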
mmvp_jupytext_marimo_server.py
import shutil
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Literal, Optional

import jupytext
import marimo._server.start as marimo_server
import typer
from marimo._ast.app import InternalApp
from marimo._server.api.status import HTTPException, HTTPStatus
from marimo._server.file_router import AppFileManager, AppFileRouter
from marimo._server.model import SessionMode
from marimo._server.models.files import FileInfo
from marimo._server.models.home import MarimoFile
from marimo._server.tokens import AuthToken
from marimo._utils.marimo_path import MarimoPath
from typing_extensions import Self

import abcellera.logging as logging

logger = logging.get_logger(__name__)


def run_cmd(cmd, check=True):
    logger.info("run", cmd=cmd)
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if check and result.returncode != 0:
        logger.error("run failed", cmd=cmd, log=result.stdout + result.stderr)
        sys.stderr.write(result.stdout)
        sys.stderr.write(result.stderr)
        raise RuntimeError(
            f"Command failed: {cmd=}\n{result.stdout=}\n{result.stderr=}"
        )
    return result


_MISSING: Any = object()


@dataclass
class ResolvedFormat:
    format: Literal["marimo", "percent", "ipynb"]
    basename: str
    src_path: Path
    workdir: Path = _MISSING

    def __post_init__(self):
        if self.workdir is _MISSING:
            # Default workdir to <source dir>/__marimo__/jupytext
            self.workdir = self.src_path.parent / "__marimo__" / "jupytext"
            self.workdir.mkdir(parents=True, exist_ok=True)

    def convert_src_to_marimo(self) -> Path:
        # TODO: update commands with without using shell=True
        # pass filepaths et al as strings to run_cmd
        if self.format == "percent":
            ipynb_path = self.workdir / (self.basename + ".in.ipynb")
            marimo_path = self.workdir / (self.basename + ".marimo.py")
            run_cmd(f"jupytext --to ipynb --output {ipynb_path} {self.src_path}")
            run_cmd(f"marimo convert {ipynb_path} -o {marimo_path}")
        elif self.format == "ipynb":
            # conceptually could support a .ipynb file in conversion,
            # but the top-level marimo app path validation doesn't allow it
            # so this path isn't used in practice
            marimo_path = self.workdir / (self.basename + ".marimo.py")
            jpytxt_path = self.workdir / (self.basename + ".jpytxt.py")
            run_cmd(f"marimo convert {self.src_path} -o {marimo_path}")
            run_cmd(f"jupytext --to py:percent --output {jpytxt_path} {self.src_path}")
        elif self.format == "marimo":
            jpytxt_path = self.workdir / (self.basename + ".jpytxt.py")
            ipynb_path = self.workdir / (self.basename + ".in.ipynb")
            # Use marimo export to create .in.ipynb, then jupytext to create .jpytxt.py
            run_cmd(
                f"marimo export ipynb --include-outputs {self.src_path} -o {ipynb_path}"
            )
            run_cmd(f"jupytext --to py:percent --output {jpytxt_path} {ipynb_path}")
            logger.info(f"Reading app from file: {self.src_path}")
            marimo_path = self.src_path
        else:
            raise NotImplementedError("Unsupported format: " + self.format)

        return marimo_path

    @property
    def marimo_path(self) -> Path:
        if self.format == "percent":
            return self.workdir / (self.basename + ".marimo.py")
        elif self.format == "ipynb":
            return self.workdir / (self.basename + ".marimo.py")
        elif self.format == "marimo":
            return self.src_path
        else:
            raise NotImplementedError(self.format)

    def convert_marimo_to_src(self) -> Path:
        if self.format == "percent":
            ipynb_path = self.workdir / (self.basename + ".out.ipynb")
            # Export to .out.ipynb
            run_cmd(f"marimo export ipynb {self.marimo_path} -o {ipynb_path}")
            # Convert to .jpytxt.py (overwrite source)
            run_cmd(f"jupytext --to py:percent --output {self.src_path} {ipynb_path}")
            return self.src_path
        elif self.format == "ipynb":
            ipynb_path = self.workdir / (self.basename + ".out.ipynb")
            jpytxt_path = self.workdir / (self.basename + ".jpytxt.py")
            # Export to .out.ipynb
            run_cmd(f"marimo export ipynb {self.marimo_path} -o {ipynb_path}")
            # Convert to .jpytxt.py
            run_cmd(f"jupytext --to py:percent --output {jpytxt_path} {ipynb_path}")
            # Copy .out.ipynb to requested .ipynb

            shutil.copy2(ipynb_path, self.src_path)
            return self.src_path
        elif self.format == "marimo":
            ipynb_path = self.workdir / (self.basename + ".out.ipynb")
            jpytxt_path = self.workdir / (self.basename + ".jpytxt.py")

            # Export to .out.ipynb
            run_cmd(f"marimo export ipynb {self.marimo_path} -o {ipynb_path}")
            # Convert to .jpytxt.py
            run_cmd(f"jupytext --to py:percent --output {jpytxt_path} {ipynb_path}")
            # Copy .out.ipynb to .ipynb next to source
            ipynb_out = self.src_path.parent / (self.basename + ".ipynb")

            shutil.copy2(ipynb_path, ipynb_out)
            return self.src_path
        else:
            raise NotImplementedError("Unsupported format: " + self.format)

    @classmethod
    def resolve_format(cls, src_path: Path) -> Self:
        """Resolve the format of the source file based on its extension."""
        # TODO expand to support markdown formats?

        # direct extension overrides for .marimo.py / .jpytxt.py / .ipynb files
        if src_path.name.endswith(".marimo.py"):
            return cls("marimo", src_path.name.removesuffix(".marimo.py"), src_path)
        elif src_path.name.endswith(".jpytxt.py"):
            return cls("percent", src_path.name.removesuffix(".jpytxt.py"), src_path)
        elif src_path.suffix == ".ipynb":
            return cls("ipynb", src_path.stem, src_path)

        # otherwise, verify that suffix is .py
        # if jupytext metadata is present, verify that it's in percent format
        # if not, detect if it contains "app = marimo.App" to infer marimo format
        # otherwise, default to jupytext percent format
        if not src_path.suffix == ".py":
            logger.error(
                "Unsupported file extension for marimo app",
                src_path=src_path,
                suffix=src_path.suffix,
            )
            raise ValueError(f"Unsupported file extension: {src_path.suffix}")

        src_content = src_path.read_text() if src_path.exists() else ""
        if "text_representation" in (
            metadata := jupytext.formats.read_metadata(src_content, src_path.suffix)
        ).get("jupytext", {}):
            format = jupytext.formats.format_name_for_ext(metadata, src_path.suffix)
            if format == "percent":
                logger.info(
                    "Resolved percent jupytext metadata",
                    src_path=src_path,
                    format=format,
                )
                return cls("percent", src_path.stem, src_path)
            elif not format:
                pass
            else:
                logger.error(
                    "Resolved unsupported jupytext format",
                    src_path=src_path,
                    format=format,
                )
                raise ValueError(
                    f"Resolved unsupported jupytext format: {src_path=} {format=}"
                )

        if "app = marimo.App" in src_content:
            logger.info("Inferred marimo format", src_path=src_path)
            return cls("marimo", src_path.stem, src_path)

        logger.info(
            "No jupytext metadata found, defaulting percent format.",
            src_path=src_path,
        )

        return cls("percent", src_path.stem, src_path)


class CustomAppFileManager(AppFileManager):
    def __init__(
        self,
        filename: Optional[str] = None,
        readonly: bool = False,
        config_kwargs: dict[str, Any] = dict(),
    ):
        super().__init__(filename, **config_kwargs)
        self._readonly = readonly

    @property
    def readonly(self) -> bool:
        return self._readonly

    @logging.log_span_calls(logger.info, "_load_app", capture_args=True)
    def _load_app(self, path: Optional[str]) -> "InternalApp":
        """Read the app from the file, converting .jpytxt.py or .ipynb to .marimo.py if needed."""
        # if the path doesn't exist, return an empty app
        if not path or not Path(path).exists():
            return super()._load_app(None)

        # TODO expand this clause to check if path is a jupytext compatible source file in the py:percent format using the jupytext api or cli
        # if so, or if the path ends with .jpytxt.py, then use this codepath
        resolved = ResolvedFormat.resolve_format(Path(path))
        marimo_path = resolved.convert_src_to_marimo()

        logger.info(logging.func_name(), resolved=marimo_path)
        app = super()._load_app(str(marimo_path))

        # TODO jupytxt conversion doesn't preserve and store marimo layout metadata
        #
        # the app load ignores default_width and defaults to "compact" if not specified in the nb file
        # (which doesn't match default config)
        # default_width is only used to set the initial width when creating new notebooks
        #
        # override with the config default_width
        # TODO file bug on default width when loading w/o metadata
        app.update_config(dict(width=self._default_width))

        return app

    def _save_file(
        self,
        filename: str,
        codes: list[str],
        names: list[str],
        configs: list[Any],  # CellConfig, but keep Any for compatibility
        app_config: Any,  # _AppConfig, but keep Any for compatibility
        persist: bool,
        previous_filename: Optional[str] = None,
    ) -> str:
        if self.readonly:
            raise ValueError("This file manager is readonly and cannot save files.")
        logger.info(
            logging.func_name(),
            filename=filename,
            previous_filename=previous_filename,
            persist=persist,
        )
        resolved = ResolvedFormat.resolve_format(Path(filename))

        # Save to .marimo.py in workdir
        super()._save_file(
            str(resolved.marimo_path),
            codes,
            names,
            configs,
            app_config,
            persist,
            previous_filename,
        )

        return str(resolved.convert_marimo_to_src())


@dataclass
class CustomSingleRouter(AppFileRouter):
    # TODO investigate call flow in AppFileRouter interfaces
    _file: MarimoPath
    readonly: bool = False

    @property
    def files(self) -> list:
        return [
            # FileInfo expects: id, name, path, last_modified, is_directory, is_marimo_file
            # We'll use the same logic as from_filename
            FileInfo(
                id=self._file.absolute_name,
                name=self._file.relative_name,
                path=self._file.absolute_name,
                last_modified=self._file.last_modified,
                is_directory=False,
                is_marimo_file=True,
            )
        ]

    def get_unique_file_key(self) -> str:
        return self._file.absolute_name

    def maybe_get_single_file(self):
        return MarimoFile(
            name=self._file.relative_name,
            path=self._file.absolute_name,
            last_modified=self._file.last_modified
            if self._file.path.exists()
            else None,
        )

    def get_file_manager(
        self,
        key: str,
        # marimo passes AppFileManager config as kwargs
        **config_kwargs,
    ):
        logger.info("get_file_manager", key=key, config_kwargs=config_kwargs)

        # Use CustomAppFileManager instead of AppFileManager
        if key.startswith(AppFileRouter.NEW_FILE):
            return CustomAppFileManager(
                None, readonly=self.readonly, config_kwargs=config_kwargs
            )

        # assert that the provided key matches this router's file
        if key == self.get_unique_file_key():
            return CustomAppFileManager(
                key, readonly=self.readonly, config_kwargs=config_kwargs
            )

        raise HTTPException(
            status_code=HTTPStatus.NOT_FOUND,
            detail=f"File {key} not found in router for {self._file.absolute_name}",
        )


def start(
    mode: SessionMode,
    name: Path,
    # different from marimo
    readonly: bool = False,
    watch: bool = True,
    include_code: bool = True,
    # marimo
    port: Optional[int] = None,
    host: str = "127.0.0.1",
    proxy: Optional[str] = None,
    headless: bool = False,
    token: bool = False,
    token_password: Optional[str] = None,
    session_ttl: int = 120,
    base_url: str = "",
    allow_origins: list[str] = [],
    redirect_console_to_browser: bool = False,
    args: list[str] = [],
) -> None:
    """Start marimo server with default settings, vals extracted from click parameters of marimo._cli.cli"""
    marimo_server.start(
        file_router=CustomSingleRouter(MarimoPath(name), readonly=readonly),
        mode=mode,
        include_code=include_code,
        watch=watch,
        cli_args=dict(),  # TODO: parse cli_args from sys.argv or click args
        argv=list(args) if isinstance(args, (list, tuple)) else [],
        development_mode=False,
        quiet=False,
        ttl_seconds=session_ttl,
        headless=headless,
        port=port,
        host=host,
        proxy=proxy,
        base_url=base_url,
        allow_origins=tuple(allow_origins),
        auth_token=(
            (AuthToken(token_password) if token_password else AuthToken.random())
            if token
            else AuthToken("")
        ),
        redirect_console_to_browser=redirect_console_to_browser,
    )


main = typer.Typer()
main.command(short_help="Launch marimo server.")(start)

if __name__ == "__main__":
    main()

Will you submit a PR?

  • Yes

Alternative

No response

Additional context

It would be possible, and potentially quite useful, to extend this idea to other non-plain-text input formats.
For example, the same file handler logic could be extended to support directly editing (and updating) a standalone .html marimo export.
However, this would require extending the FileHandler interface to expose the current app state for efficient exports.
We have not investigated this refactor.

Labels: enhancement (New feature or request)