Apache Iceberg

Iceberg

Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.

Background and documentation are available at https://iceberg.apache.org

Status

Iceberg is under active development at the Apache Software Foundation.

The Iceberg format specification is stable and new features are added with each version.

The core Java library is located in this repository and is the reference implementation for other libraries.

Documentation is available for all libraries and integrations.

Collaboration

Iceberg tracks issues in GitHub and prefers to receive contributions as pull requests.

Community discussions happen primarily on the dev mailing list or on specific issues.

Building

Iceberg is built using Gradle with Java 11, 17, or 21.

  • To invoke a build and run tests: ./gradlew build
  • To skip tests: ./gradlew build -x test -x integrationTest
  • To fix code style for default versions: ./gradlew spotlessApply
  • To fix code style for all versions of Spark/Hive/Flink: ./gradlew spotlessApply -DallModules
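
Because the build expects Java 11, 17, or 21, it can help to check the active JDK before invoking Gradle. The following is a minimal sketch, not part of the Iceberg build: the helper names `java_major` and `is_supported_java` are hypothetical, and the script assumes the version string has already been extracted from the output of `java -version`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Iceberg): print the major version
# from a Java version string. Handles both the legacy "1.8.0_392" form
# and the modern "17.0.9" form.
java_major() {
  v=$1
  case "$v" in
    1.*) v=${v#1.} ;;   # legacy scheme: drop the leading "1."
  esac
  # Keep everything before the first '.', '+', or '-'.
  echo "${v%%[.+-]*}"
}

# Hypothetical helper: succeed only for the Java versions the README
# lists as supported for building (11, 17, or 21).
is_supported_java() {
  case "$(java_major "$1")" in
    11|17|21) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use is `is_supported_java "$version" && ./gradlew build`, where `$version` holds the quoted version string that `java -version` prints on standard error.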

Iceberg table support is organized in library modules:

  • iceberg-common contains utility classes used in other modules
  • iceberg-api contains the public Iceberg API
  • iceberg-core contains implementations of the Iceberg API and support for Avro data files; this is the module that processing engines should depend on
  • iceberg-parquet is an optional module for working with tables backed by Parquet files
  • iceberg-arrow is an optional module for reading Parquet into Arrow memory
  • iceberg-orc is an optional module for working with tables backed by ORC files
  • iceberg-hive-metastore is an implementation of Iceberg tables backed by the Hive metastore Thrift client
  • iceberg-data is an optional module for working with tables directly from JVM applications

Iceberg also has modules for adding Iceberg support to processing engines:

  • iceberg-spark is an implementation of Spark's Datasource V2 API for Iceberg, with a submodule for each Spark version (use the runtime jars for a shaded version)
  • iceberg-flink contains classes for integrating with Apache Flink (use iceberg-flink-runtime for a shaded version)
  • iceberg-mr contains an InputFormat and other classes for integrating with Apache Hive

NOTE

The tests require Docker to execute. On macOS (with Docker Desktop), you might need to create a symbolic link to the Docker socket so that the tests can detect it:

sudo ln -s $HOME/.docker/run/docker.sock /var/run/docker.sock
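
If the command is run more than once, `ln -s` fails because the link already exists. A guarded variant can be sketched as below. The function names are hypothetical, and the sketch assumes the Docker Desktop socket lives at the path shown in the command above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the system socket path is
# absent (not even as a dangling symlink) and the Docker Desktop
# socket path exists.
needs_docker_link() {
  system_sock=$1
  desktop_sock=$2
  [ ! -e "$system_sock" ] && [ ! -L "$system_sock" ] && [ -e "$desktop_sock" ]
}

# Hypothetical wrapper around the README's command; it only defines the
# function, so nothing is changed until it is called explicitly.
link_docker_socket_if_needed() {
  if needs_docker_link /var/run/docker.sock "$HOME/.docker/run/docker.sock"; then
    sudo ln -s "$HOME/.docker/run/docker.sock" /var/run/docker.sock
  fi
}
```

Calling `link_docker_socket_if_needed` creates the link only when it is missing and does nothing otherwise.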

Engine Compatibility

See the Multi-Engine Support page to learn about Iceberg compatibility with different Spark, Flink, and Hive versions. For other engines such as Presto or Trino, please visit their websites for Iceberg integration details.

Implementations

This repository contains the Java implementation of Iceberg. Other implementations can be found at: