Ilum fork. This is an Ilum-maintained fork of Marquez created while upstream development slowed. We used it to ship critical fixes and additive features without breaking compatibility. From 0.52.x, we’re aligning with upstream and contributing improvements back. Starting with 0.54.x, the API backend has been rewritten in Rust for improved performance and lower resource usage. Learn more in our short write-up: Ilum × Marquez — Project Description & Rationale.
Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem's metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralizes dataset lifecycle management, and much more. Marquez was released and open sourced by WeWork.
Marquez is an LF AI & Data Foundation Graduated project under active development, and we'd love your help!
Marquez provides a simple way to collect and view dataset, job, and run metadata using OpenLineage. The easiest way to get up and running is with Docker. From the base of the Marquez repository, run:
$ ./docker/up.sh

On Windows, before cloning Marquez, configure Git to check out files with Unix-style line endings:

$ git config --global core.autocrlf false

Verify that Bash and PostgreSQL have been installed and added to the PATH variable (Git Bash is recommended).
Start all services:
$ sh ./docker/up.sh

Tip: Use the --build flag to build images from source, and/or --seed to start Marquez with sample lineage metadata. For a more complete example using the sample metadata, please follow our quickstart guide.
Note: Port 5000 is now reserved by macOS. If you are running locally on macOS, run ./docker/up.sh --api-port 9000 to configure the API to listen on port 9000 instead. Keep in mind that you will need to update the URLs below with the appropriate port number.
WEB UI
You can open http://localhost:3000 to begin exploring the Marquez Web UI. The UI enables you to discover dependencies between jobs and the datasets they produce and consume via the lineage graph, view run metadata of current and previous job runs, and much more!
HTTP API
The Marquez HTTP API listens on port 5000 for all calls and port 5001 for the admin interface. The admin interface exposes helpful endpoints like /healthcheck and /metrics. To verify the HTTP API server is running and listening on localhost, browse to http://localhost:5001. To begin collecting lineage metadata as OpenLineage events, use the LineageAPI or an OpenLineage integration.
Note: By default, the HTTP API does not require any form of authentication or authorization.
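To show roughly what the LineageAPI receives, here is a minimal std-only Rust sketch that builds an OpenLineage RunEvent and POSTs it over plain HTTP. It assumes the Rust backend keeps upstream Marquez's POST /api/v1/lineage path and that the API listens on port 5000 as in the Docker setup. The producer URI, namespace, job name, and run ID are placeholders, and no auth header is sent because authentication is off by default. In practice an OpenLineage integration or client library does this for you; the hand-built JSON only illustrates the event shape.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

// Placeholder producer URI (hypothetical); it should identify whatever emits the event.
const PRODUCER: &str = "https://example.com/marquez-readme-sketch";
// Schema URL for OpenLineage spec 2-0-2, the version listed in the compatibility table.
const SCHEMA_URL: &str = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent";

/// Escapes a value for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a minimal RunEvent with no input or output datasets.
fn run_event_json(event_type: &str, event_time: &str, run_id: &str, namespace: &str, job: &str) -> String {
    format!(
        concat!(
            r#"{{"eventType":"{}","eventTime":"{}","#,
            r#""run":{{"runId":"{}"}},"#,
            r#""job":{{"namespace":"{}","name":"{}"}},"#,
            r#""inputs":[],"outputs":[],"#,
            r#""producer":"{}","schemaURL":"{}"}}"#
        ),
        json_escape(event_type),
        json_escape(event_time),
        json_escape(run_id),
        json_escape(namespace),
        json_escape(job),
        PRODUCER,
        SCHEMA_URL,
    )
}

/// Formats a raw HTTP/1.1 POST to the (assumed) lineage endpoint, without any auth header.
fn lineage_request(host: &str, port: u16, body: &str) -> String {
    format!(
        "POST /api/v1/lineage HTTP/1.1\r\nHost: {}:{}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host, port, body.len(), body
    )
}

/// Sends the event and returns the first line of the response (the status line).
fn post_event(host: &str, port: u16, body: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, port))?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    stream.write_all(lineage_request(host, port, body).as_bytes())?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    let text = String::from_utf8_lossy(&response);
    Ok(text.lines().next().unwrap_or("").to_string())
}

fn main() {
    let body = run_event_json(
        "START",
        "2024-01-01T00:00:00Z",
        "d46e465b-d358-4d32-83d4-df660ff614dd",
        "my-namespace",
        "my-job",
    );
    // Port 5000 matches the Docker setup; adjust it if you passed --api-port.
    match post_event("localhost", 5000, &body) {
        Ok(status) => println!("Marquez replied: {status}"),
        Err(e) => eprintln!("could not reach Marquez: {e}"),
    }
}
```

A real run would follow the START event with a COMPLETE (or FAIL) event that reuses the same runId, so Marquez can close out the run.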
GRAPHQL
To explore metadata via GraphQL, browse to http://localhost:5000/graphql-playground. The GraphQL endpoint is currently in beta and is located at http://localhost:5000/api/v1-beta/graphql.
Note: GraphQL is not yet available in the Rust backend. For GraphQL access, use the legacy Java backend.
We invite everyone to help us improve and keep documentation up to date. Documentation is maintained in this repository and can be found under docs/.
Note: To begin collecting metadata with Marquez, follow our quickstart guide. Below you will find the steps to get up and running from source.
Versions of Marquez are compatible with OpenLineage unless noted otherwise. We check backward compatibility by testing newer versions of Marquez against events recorded with an older OpenLineage specification version. We strongly recommend understanding how the OpenLineage specification is versioned and published.
| Marquez | OpenLineage | Status |
|---|---|---|
| UNRELEASED | 2-0-2 | CURRENT |
| 0.54.0 | 2-0-2 | RECOMMENDED |
| 0.53.0 | 2-0-2 | MAINTENANCE |
| 0.50.0 | 2-0-2 | DEPRECATED |
| 0.49.0 | 2-0-2 | DEPRECATED |
Note: The openlineage-python and openlineage-java libraries will have a higher version than the OpenLineage specification, as they have different version requirements.
We currently maintain three categories of compatibility: CURRENT, RECOMMENDED, and MAINTENANCE. When a new version of Marquez is released, it's marked as RECOMMENDED, while the previous version enters MAINTENANCE mode (which receives bug fixes whenever possible). The unreleased version of Marquez is marked CURRENT and does not come with any guarantees; it is assumed to remain compatible with OpenLineage, although surprises happen and there may be rare exceptions.
Marquez uses a multi-project structure and contains the following modules:
- api-rs: core API in Rust (Axum/SQLx/tokio), replaces api/
- api: legacy Java API (Dropwizard), deprecated
- web: web UI used to view metadata
- clients: clients that implement the HTTP API
- chart: Helm chart
Note: The integrations module was removed in 0.21.0, so please use an OpenLineage integration to collect lineage events easily.
To build Marquez from source, you will need:

- Rust stable (1.83+)
- PostgreSQL 16
Note: Docker users don't need a local Rust toolchain. To connect to your running PostgreSQL instance, you will need the standard psql tool.
To build the entire project, run:

cd api-rs
cargo build --workspace            # debug build
cargo build --workspace --release  # release build

The release executable can be found at api-rs/target/release/marquez-api.
To run Marquez, you will have to define marquez.yml. The configuration file is passed to the application and used to specify your database connection. The configuration file creation steps are outlined below.
When creating your database using createdb, we recommend calling it marquez:
$ createdb marquez

With your database created, you can now copy marquez-rs.example.yml:

$ cp marquez-rs.example.yml marquez.yml

Configuration uses Figment with the MARQUEZ_ prefix and __ (double underscore) for nesting. The following environment variables override config values:

- MARQUEZ_DB__HOST, MARQUEZ_DB__PORT, MARQUEZ_DB__NAME, MARQUEZ_DB__USER, MARQUEZ_DB__PASSWORD
- MARQUEZ_SERVER__PORT, MARQUEZ_SERVER__ADMIN_PORT
Note: The Docker entrypoint also supports legacy POSTGRES_* and MARQUEZ_DB_* environment variable conventions.
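To illustrate the naming convention, the std-only Rust sketch below approximates how a MARQUEZ_-prefixed variable maps to a nested config key (MARQUEZ_DB__HOST becomes db.host) and how environment values take precedence over file values. It is not Figment's implementation: config_key and apply_env_overrides are hypothetical helpers, a flat map stands in for the parsed marquez.yml, and lowercasing the key is an assumption about how names are matched.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper approximating the MARQUEZ_ / "__" convention:
/// MARQUEZ_DB__HOST -> "db.host", MARQUEZ_SERVER__ADMIN_PORT -> "server.admin_port".
fn config_key(var: &str) -> Option<String> {
    // Only variables carrying the MARQUEZ_ prefix are considered.
    let rest = var.strip_prefix("MARQUEZ_")?;
    if rest.is_empty() {
        return None;
    }
    // Assumption: keys are matched in lowercase; each "__" marks one level of nesting,
    // while a single "_" stays part of the key name.
    let parts: Vec<String> = rest.split("__").map(|p| p.to_lowercase()).collect();
    Some(parts.join("."))
}

/// Layers environment values over values read from the config file (env wins).
fn apply_env_overrides<I>(config: &mut BTreeMap<String, String>, env: I)
where
    I: IntoIterator<Item = (String, String)>,
{
    for (name, value) in env {
        if let Some(key) = config_key(&name) {
            config.insert(key, value);
        }
    }
}

fn main() {
    // Stand-in for values parsed from marquez.yml (hypothetical).
    let mut config = BTreeMap::from([
        ("db.host".to_string(), "localhost".to_string()),
        ("server.port".to_string(), "8080".to_string()),
    ]);
    apply_env_overrides(&mut config, std::env::vars());
    for (key, value) in &config {
        // Avoid echoing secrets such as MARQUEZ_DB__PASSWORD.
        let shown = if key.ends_with("password") { "****" } else { value.as_str() };
        println!("{key} = {shown}");
    }
}
```

Under this rule a legacy name such as MARQUEZ_DB_HOST (single underscore) would become db_host rather than db.host, so it presumably depends on the Docker entrypoint's legacy support rather than on the Figment convention.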
By default, Marquez uses the following ports:
- TCP port 8080 is available for the HTTP API server.
- TCP port 8081 is available for the admin interface.
With the dev config (marquez-rs.dev.yml), ports are 5000 (API) and 5001 (admin).
Note: All of the configuration settings in marquez.yml can be specified either in the configuration file or in an environment variable.
To start the API from source using the dev config, run:

cd api-rs
cargo run --bin marquez-api -- serve --config ../marquez-rs.dev.yml

CLI subcommands: serve (default), db-migrate, db-retention.
Marquez listens on port 8080 for all API calls and port 8081 for the admin interface (or 5000/5001 with the dev config). To verify the HTTP API server is running and listening on localhost, browse to the admin port (http://localhost:8081, or http://localhost:5001 with the dev config). We encourage you to familiarize yourself with the data model and APIs of Marquez. To run the web UI, please follow the steps outlined here.
Note: By default, the HTTP API does not require any form of authentication or authorization.
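As a scripted alternative to browsing, the std-only Rust sketch below sends GET /healthcheck to the admin port and reports the HTTP status code. It assumes the Rust backend serves /healthcheck on the admin interface, as described for the Docker setup; healthcheck and status_code are hypothetical helper names, and treating any 2xx code as healthy is our own choice.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Parses the numeric status code from an HTTP/1.x status line such as "HTTP/1.1 200 OK".
fn status_code(status_line: &str) -> Option<u16> {
    let mut parts = status_line.split_whitespace();
    if !parts.next()?.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Sends GET /healthcheck to the admin port and returns the parsed status code, if any.
fn healthcheck(host: &str, admin_port: u16) -> std::io::Result<Option<u16>> {
    let mut stream = TcpStream::connect((host, admin_port))?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    let request = format!(
        "GET /healthcheck HTTP/1.1\r\nHost: {}:{}\r\nConnection: close\r\n\r\n",
        host, admin_port
    );
    stream.write_all(request.as_bytes())?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    let text = String::from_utf8_lossy(&response);
    Ok(text.lines().next().and_then(status_code))
}

fn main() {
    // 8081 is the default admin port; pass 5001 as the first argument with marquez-rs.dev.yml.
    let port: u16 = std::env::args()
        .nth(1)
        .and_then(|p| p.parse().ok())
        .unwrap_or(8081);
    match healthcheck("localhost", port) {
        Ok(Some(code)) if (200..300).contains(&code) => println!("admin interface is up (HTTP {code})"),
        Ok(Some(code)) => println!("admin interface answered HTTP {code}"),
        Ok(None) => println!("unexpected response from port {port}"),
        Err(e) => eprintln!("admin port {port} not reachable: {e}"),
    }
}
```

The same request shape works for /metrics if you want to scrape metrics instead of checking health.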
Legacy Java Backend (deprecated)
./gradlew build

The executable can be found under api/build/libs/.

$ cp marquez.example.yml marquez.yml

Environment variables: POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, MARQUEZ_DB_HOST, MARQUEZ_DB_PORT.

$ ./gradlew :api:runShadow

OpenLineage: an open standard for metadata and lineage collection
- Website: https://ilum.cloud/
- Source: https://github.com/ilum-cloud/marquez
- Chat: Ilum Slack
See CONTRIBUTING.md for more details about how to contribute.
If you discover a vulnerability in the project, please open an issue and attach the "security" label.
SPDX-License-Identifier: Apache-2.0 Copyright 2018-2025 contributors to the Marquez project.

