In this document, you will find some guidelines on contributing to Apache Iceberg. Please keep in mind that none of
these are hard rules; they're meant as a collection of helpful suggestions to make contributing as seamless an
experience as possible.
If you are thinking of contributing but first would like to discuss the change you wish to make, we welcome you to
head over to the [Community](https://iceberg.apache.org/community/) page on the official Iceberg documentation site
to find a number of ways to connect with the community, including Slack and our mailing lists. Of course, always feel
free to just open a [new issue](https://github.com/apache/iceberg/issues/new) in the GitHub repo.
## Pull Request Process
Pull requests are the preferred mechanism for contributing to Iceberg.
* PRs are automatically labeled based on their content by our GitHub Actions labeling action
* It's helpful to include a prefix in the summary that provides context to PR reviewers, such as `Build:`, `Docs:`, `Spark:`, `Flink:`, `Core:`, `API:`
* If a PR is related to an issue, adding `Closes #1234` in the PR description will automatically close the issue when the PR is merged and helps keep the project clean
* If a PR is posted for visibility and isn't necessarily ready for review or merging, be sure to convert the PR to a draft
## Building the Project Locally
Please refer to the [Building](https://github.com/apache/iceberg#building) section of the main README for instructions
on how to build Iceberg locally.
## Website and Documentation Updates
The [Iceberg website](https://iceberg.apache.org/) and documentation are hosted in a separate repository, [iceberg-docs](https://github.com/apache/iceberg-docs).
Read that repository's README for contribution guidelines for the website and documentation.
## Semantic Versioning
Apache Iceberg leverages [semantic versioning](https://semver.org/#semantic-versioning-200) to ensure compatibility
for developers and users of the Iceberg libraries as APIs and implementations evolve. The requirements and
guarantees provided depend on the subproject, as described below:
### Major Version Deprecations Required
__Modules__
`iceberg-api`
The API subproject is the main interface for developers and users of the Iceberg API and therefore has the strongest
guarantees. Evolution of the interfaces in this subproject is enforced by [Revapi](https://revapi.org/) and requires
explicit acknowledgement of API changes.
All public interfaces and classes require a deprecation cycle of one major version. Any element affected by a
backward-incompatible change should be annotated with `@Deprecated` and removed in the next major release.
Backward-compatible changes are allowed within major versions.
### Minor Version Deprecations Required
__Modules__
`iceberg-common`
`iceberg-core`
`iceberg-data`
`iceberg-orc`
`iceberg-parquet`
Changes to public interfaces and classes in the subprojects listed above require a deprecation cycle of one minor
release. These projects contain common and internal code used by other projects and can evolve within a major release.
A minor-release deprecation cycle gives other subprojects and external projects notice and an opportunity to transition
to new implementations.
### Minor Version Deprecations Discretionary
__Modules__ (all modules not referenced above)
Other modules are less likely to be extended directly and modifications should make a good faith effort to follow a
minor version deprecation cycle. If there are significant structural or design changes that result in deprecations
being difficult to orchestrate, it is up to the committers to decide if deprecation is necessary.
## Deprecation Notices
All interfaces, classes, and methods targeted for deprecation must include the following:
1. `@Deprecated` annotation on the appropriate element
2. `@deprecated` javadoc comment including the version for removal and the appropriate alternative for usage
3. Replacement of existing code paths that use the deprecated behavior
Example:
```java
/**
 * Set the sequence number for this manifest entry.
 *
 * @param sequenceNumber a sequence number
 * @deprecated since 1.0.0, will be removed in 1.1.0; use dataSequenceNumber() instead.
 */
@Deprecated
void sequenceNumber(long sequenceNumber);
```
## Adding new functionality without breaking APIs
Ideally, new functionality is added without breaking existing APIs, especially within the scope of the API modules that are checked by [Revapi](https://revapi.org/).
Assume we want to add a `createBranch(String name)` method to the `ManageSnapshots` API.
The most straightforward way would be to add the code below:
```java
public interface ManageSnapshots extends PendingUpdate<Snapshot> {
  // existing code...
  // adding this method introduces an API-breaking change
  ManageSnapshots createBranch(String name);
}
```
And then add the implementation:
```java
public class SnapshotManager implements ManageSnapshots {
  // existing code...
  @Override
  public ManageSnapshots createBranch(String name) {
```
```java
// GOOD: method calls at the same level, arguments indented
SomeObject myNewObject = SomeObject
    .builder(schema, partitionSpec,
        sortOrder)
    .withProperty("x", "1")
    .build();
```
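
Returning to the `createBranch` example above: one way to add such a method without breaking existing implementers is a Java `default` method. The sketch below is an assumption about how this could look, not the project's actual code. `BranchApi`, `LegacyManager`, and `BranchingManager` are hypothetical stand-ins for `ManageSnapshots` and its implementations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an API interface such as ManageSnapshots.
interface BranchApi {
  // Existing abstract method that every implementation already provides.
  List<String> branches();

  // New method added as a default: existing implementations keep compiling,
  // and callers get a clear error until an implementation overrides it.
  default BranchApi createBranch(String name) {
    throw new UnsupportedOperationException(
        getClass().getName() + " doesn't implement createBranch(String)");
  }
}

// Written before createBranch existed; compiles unchanged.
class LegacyManager implements BranchApi {
  @Override
  public List<String> branches() {
    return new ArrayList<>();
  }
}

// Opts in to the new functionality by overriding the default.
class BranchingManager implements BranchApi {
  private final List<String> names = new ArrayList<>();

  @Override
  public List<String> branches() {
    return new ArrayList<>(names);
  }

  @Override
  public BranchApi createBranch(String name) {
    names.add(name);
    return this;
  }
}
```

With this shape, code written against the old interface continues to compile, while implementations that support the feature override the default.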
#### Method naming
1. Make method names as short as possible, while being clear. Omit needless words.
2. Avoid `get` in method names, unless an object must be a Java bean.
    * In most cases, replace `get` with a more specific verb that describes what is happening in the method, like `find` or `fetch`.
    * If there isn't a more specific verb or the method is a getter, omit `get` because it isn't helpful to readers and makes method names longer.
3. Where possible, use words and conjugations that form correct sentences in English when read.
    * For example, `Transform.preservesOrder()` reads correctly in an if statement: `if (transform.preservesOrder()) { ... }`
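
The naming guidelines above can be illustrated with a small sketch. `TableRegistry` and its methods are hypothetical names chosen for this example, not Iceberg APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical class used only to illustrate the naming guidelines.
class TableRegistry {
  private final Map<String, String> locations = new HashMap<>();

  // Short and clear: "register" rather than "addNewTableToTheRegistry".
  void register(String name, String location) {
    locations.put(name, location);
  }

  // A specific verb instead of "getLocation": the method performs a lookup that can miss.
  Optional<String> findLocation(String name) {
    return Optional.ofNullable(locations.get(name));
  }

  // A plain getter drops "get": size() rather than getSize().
  int size() {
    return locations.size();
  }

  // Reads as a sentence in a condition: if (registry.contains("db.events")) { ... }
  boolean contains(String name) {
    return locations.containsKey(name);
  }
}
```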
#### Boolean arguments
Avoid boolean arguments to methods that are not `private`, because they lead to confusing invocations like `sendMessage(false)`. It is better to create two methods with descriptive names that each convey one behavior, even if both are implemented by one internal method.
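
A minimal sketch of this guideline follows. The names `MessageSender`, `sendMessageIgnoreFailure`, and `sendMessageInternal` are assumptions made for illustration and are not part of Iceberg.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; names are illustrative only.
class MessageSender {
  private final List<String> sent = new ArrayList<>();

  // Public entry points say what they do; callers never pass a bare boolean.
  public void sendMessage(String message) {
    sendMessageInternal(message, false);
  }

  public void sendMessageIgnoreFailure(String message) {
    sendMessageInternal(message, true);
  }

  public List<String> sent() {
    return new ArrayList<>(sent);
  }

  // The boolean stays private, where its meaning is obvious from context.
  private void sendMessageInternal(String message, boolean suppressFailure) {
    try {
      if (message == null || message.isEmpty()) {
        throw new IllegalArgumentException("Cannot send an empty message");
      }
      sent.add(message);
    } catch (IllegalArgumentException e) {
      if (!suppressFailure) {
        throw e;
      }
    }
  }
}
```

Call sites then read as `sender.sendMessageIgnoreFailure(msg)` rather than `sender.sendMessage(msg, true)`.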
```java
// prefer exposing suppressFailure in method names
.hasMessage("User 'testUser' has no permission to create namespace");
```
Checks on exceptions should always assert that the expected exception message has occurred.
### Awaitility
Avoid using `Thread.sleep()` in tests as it leads to long test durations and flaky behavior if a condition takes slightly longer than expected.
```java
deleteTablesAsync();
Thread.sleep(3000L);
assertThat(tables()).isEmpty();
```
A better alternative is to use [Awaitility](https://github.com/awaitility/awaitility) to wait until `tables()` is eventually empty. By default, Awaitility re-runs such a check
with a polling interval of **100 millis**.
Please refer to the Awaitility [usage guide](https://github.com/awaitility/awaitility/wiki/Usage) for more usage examples.
### JUnit4 / JUnit5
Iceberg currently uses a mix of JUnit4 (`org.junit` imports) and JUnit5 (`org.junit.jupiter.api` imports) tests. To allow an easier migration to JUnit5 in the future, new test classes
that are being added to the codebase should be written purely in JUnit5 where possible.
Please refer to the [contributing](https://iceberg.apache.org/contribute/) section for instructions
```java
// if any key doesn't exist, it won't show the content of the map
assertThat(map.get("key1")).isEqualTo("value1");
assertThat(map.get("key2")).isNotNull();
assertThat(map.get("key3")).startsWith("3.5");
// better: all checks can be combined and the content of the map will be shown if any check fails
assertThat(map)
    .containsEntry("key1", "value1")
    .containsKey("key2")
    .hasEntrySatisfying("key3", v -> assertThat(v).startsWith("3.5"));
```
```java
// bad
```
no "push a single button to get a performance comparison" solution available, th
post the results on the PR.
See [Benchmarks](benchmarks.md) for a summary of available benchmarks and how to run them.
## Website and Documentation Updates
Currently, there is an [iceberg-docs](https://github.com/apache/iceberg-docs) repository
which contains the HTML/CSS and other files needed for the [Iceberg website](https://iceberg.apache.org/).
The [docs folder](https://github.com/apache/iceberg/tree/master/docs) in the Iceberg repository contains
the markdown content for the documentation site. All markdown changes should still be made
to this repository.
### Submitting Pull Requests
Changes to the markdown contents should be submitted directly to this repository.
Changes to the website appearance (e.g. HTML, CSS changes) should be submitted to the [iceberg-docs repository](https://github.com/apache/iceberg-docs) against the `main` branch.
Changes to the documentation of old Iceberg versions should be submitted to the [iceberg-docs repository](https://github.com/apache/iceberg-docs) against the specific version branch.
### Reporting Issues
All issues related to the doc website should still be submitted to the [Iceberg repository](https://github.com/apache/iceberg).
The GitHub Issues feature of the [iceberg-docs repository](https://github.com/apache/iceberg-docs) is disabled.
### Running Locally
Clone the [iceberg-docs](https://github.com/apache/iceberg-docs) repository to run the website locally:
```shell
git clone git@github.com:apache/iceberg-docs.git
cd iceberg-docs
```
To start the landing page site locally, run:
```shell
cd landing-page && hugo serve
```
To start the documentation site locally, run:
```shell
cd docs && hugo serve
```
If you would like to see how the latest website looks based on the documentation in the Iceberg repository, you can copy docs to the iceberg-docs repository by: