diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e160aaec17..49f803cbf3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -19,337 +19,5 @@ # Contributing -In this document, you will find some guidelines on contributing to Apache Iceberg. Please keep in mind that none of -these are hard rules and they're meant as a collection of helpful suggestions to make contributing as seamless of an -experience as possible. - -If you are thinking of contributing but first would like to discuss the change you wish to make, we welcome you to -head over to the [Community](https://iceberg.apache.org/community/) page on the official Iceberg documentation site -to find a number of ways to connect with the community, including slack and our mailing lists. Of course, always feel -free to just open a [new issue](https://github.com/apache/iceberg/issues/new) in the GitHub repo. - -## Pull Request Process - -Pull requests are the preferred mechanism for contributing to Iceberg -* PRs are automatically labeled based on the content by our github-actions labeling action -* It's helpful to include a prefix in the summary that provides context to PR reviewers, such as `Build:`, `Docs:`, `Spark:`, `Flink:`, `Core:`, `API:` -* If a PR is related to an issue, adding `Closes #1234` in the PR description will automatically close the issue and helps keep the project clean -* If a PR is posted for visibility and isn't necessarily ready for review or merging, be sure to convert the PR to a draft - -## Building the Project Locally - -Please refer to the [Building](https://github.com/apache/iceberg#building) section of the main readme for instructions -on how to build iceberg locally. - -## Website and Documentation Updates - -The [Iceberg website](https://iceberg.apache.org/) and documentations are hosted in a different repository [iceberg-docs](https://github.com/apache/iceberg-docs). -Read the repository README for contribution guidelines for the website and documentation. - -## Semantic Versioning - -Apache Iceberg leverages [semantic versioning](https://semver.org/#semantic-versioning-200) to ensure compatibility -for developers and users of the iceberg libraries as APIs and implementations evolve. The requirements and -guarantees provided depend on the subproject as described below: - -### Major Version Deprecations Required - -__Modules__ -`iceberg-api` - -The API subproject is the main interface for developers and users of the Iceberg API and therefore has the strongest -guarantees. Evolution of the interfaces in this subproject are enforced by [Revapi](https://revapi.org/) and require -explicit acknowledgement of API changes. - -All public interfaces and classes require one major version for deprecation cycle. Any backward incompatible changes -should be annotated as `@Deprecated` and removed for the next major release. Backward compatible changes are allowed -within major versions. - -### Minor Version Deprecations Required - -__Modules__ -`iceberg-common` -`iceberg-core` -`iceberg-data` -`iceberg-orc` -`iceberg-parquet` - -Changes to public interfaces and classes in the subprojects listed above require a deprecation cycle of one minor -release. These projects contain common and internal code used by other projects and can evolve within a major release. -Minor release deprecation will provide other subprojects and external projects notice and opportunity to transition -to new implementations. - -### Minor Version Deprecations Discretionary - -__modules__ (All modules not referenced above) - -Other modules are less likely to be extended directly and modifications should make a good faith effort to follow a -minor version deprecation cycle. If there are significant structural or design changes that result in deprecations -being difficult to orchestrate, it is up to the committers to decide if deprecation is necessary. - -## Deprecation Notices - -All interfaces, classes, and methods targeted for deprecation must include the following: - -1. `@Deprecated` annotation on the appropriate element -2. `@depreceted` javadoc comment including: the version for removal, the appropriate alternative for usage -3. Replacement of existing code paths that use the deprecated behavior - -Example: - -```java - /** - * Set the sequence number for this manifest entry. - * - * @param sequenceNumber a sequence number - * @deprecated since 1.0.0, will be removed in 1.1.0; use dataSequenceNumber() instead. - */ - @Deprecated - void sequenceNumber(long sequenceNumber); -``` - -## Adding new functionality without breaking APIs -Ideally, we'd want to add new functionality without breaking existing APIs, especially within the scope of the API modules that are being checked by [Revapi](https://revapi.org/). - -Let's assume we'd want to add a `createBranch(String name)` method to the `ManageSnapshots` API. - -The most straight-forward way would be to add the below code: - -```java -public interface ManageSnapshots extends PendingUpdate { - // existing code... - - // adding this method introduces an API-breaking change - ManageSnapshots createBranch(String name); -} -``` - -And then add the implementation: -```java -public class SnapshotManager implements ManageSnapshots { - // existing code... - - @Override - public ManageSnapshots createBranch(String name, long snapshotId) { - updateSnapshotReferencesOperation().createBranch(name, snapshotId); - return this; - } -} -``` - -### Checking for API breakages -Running `./gradlew revapi` will flag this as an API-breaking change: -``` -./gradlew revapi -> Task :iceberg-api:revapi FAILED -> Task :iceberg-api:showDeprecationRulesOnRevApiFailure FAILED - -1: Task failed with an exception. ------------ -* What went wrong: -Execution failed for task ':iceberg-api:revapi'. -> There were Java public API/ABI breaks reported by revapi: - - java.method.addedToInterface: Method was added to an interface. - - old: - new: method org.apache.iceberg.ManageSnapshots org.apache.iceberg.ManageSnapshots::createBranch(java.lang.String) - - SOURCE: BREAKING, BINARY: NON_BREAKING, SEMANTIC: POTENTIALLY_BREAKING - - From old archive: - From new archive: iceberg-api-1.4.0-SNAPSHOT.jar - - If this is an acceptable break that will not harm your users, you can ignore it in future runs like so for: - - * Just this break: - ./gradlew :iceberg-api:revapiAcceptBreak --justification "{why this break is ok}" \ - --code "java.method.addedToInterface" \ - --new "method org.apache.iceberg.ManageSnapshots org.apache.iceberg.ManageSnapshots::createBranch(java.lang.String)" - * All breaks in this project: - ./gradlew :iceberg-api:revapiAcceptAllBreaks --justification "{why this break is ok}" - * All breaks in all projects: - ./gradlew revapiAcceptAllBreaks --justification "{why this break is ok}" - ---------------------------------------------------------------------------------------------------- - -``` - -### Adding a default implementation -In order to avoid breaking the API, we can add a default implementation that throws an `UnsupportedOperationException`: -```java -public interface ManageSnapshots extends PendingUpdate { - // existing code... - - // introduces new code without breaking the API - default ManageSnapshots createBranch(String name) { - throw new UnsupportedOperationException(this.getClass().getName() + " doesn't implement createBranch(String)"); - } -} -``` - -## Style - -For Java styling, check out the section -[Setting up IDE and Code Style](https://iceberg.apache.org/contribute/#setting-up-ide-and-code-style) from the -documentation site. - -### Java style guidelines - -#### Line breaks - -Continuation indents are 2 indents (4 spaces) from the start of the previous line. - -Try to break long lines at the same semantic level to make code more readable. -* Don't use the same level of indentation for arguments to different methods -* Don't use the same level of indentation for arguments and chained methods - -```java - // BAD: hard to see arguments passed to the same method - doSomething(new ArgumentClass(1, - 2), - 3); - - // GOOD: break lines at the same semantic level - doSomething( - new ArgumentClass(1, 2), - 3); - - // BAD: arguments and chained methods mixed - SomeObject myNewObject = SomeObject.builder(schema, partitionSpec - sortOrder) - .withProperty("x", "1") - .build() - - // GOOD: method calls at the same level, arguments indented - SomeObject myNewObject = SomeObject - .builder(schema, partitionSpec, - sortOrder) - .withProperty("x", "1") - .build() -``` - -#### Method naming - -1. Make method names as short as possible, while being clear. Omit needless words. -2. Avoid `get` in method names, unless an object must be a Java bean. - * In most cases, replace `get` with a more specific verb that describes what is happening in the method, like `find` or `fetch`. - * If there isn't a more specific verb or the method is a getter, omit `get` because it isn't helpful to readers and makes method names longer. -3. Where possible, use words and conjugations that form correct sentences in English when read - * For example, `Transform.preservesOrder()` reads correctly in an if statement: `if (transform.preservesOrder()) { ... }` - -#### Boolean arguments - -Avoid boolean arguments to methods that are not `private` to avoid confusing invocations like `sendMessage(false)`. It is better to create two methods with names and behavior, even if both are implemented by one internal method. - -```java - // prefer exposing suppressFailure in method names - public void sendMessageIgnoreFailure() { - sendMessageInternal(true); - } - - public void sendMessage() { - sendMessageInternal(false); - } - - private void sendMessageInternal(boolean suppressFailure) { - ... - } -``` - -When passing boolean arguments to existing or external methods, use inline comments to help the reader understand actions without an IDE. - -```java - // BAD: it is not clear what false controls - dropTable(identifier, false); - - // GOOD: these uses of dropTable are clear to the reader - dropTable(identifier, true /* purge data */); - dropTable(identifier, purge); -``` - -#### Config naming - -1. Use `-` to link words in one concept - * For example, preferred convection `access-key-id` rather than `access.key.id` -2. Use `.` to create a hierarchy of config groups - * For example, `s3` in `s3.access-key-id`, `s3.secret-access-key` - -## Testing - -### AssertJ - -Prefer using [AssertJ](https://github.com/assertj/assertj) assertions as those provide a rich and intuitive set of strongly-typed assertions. -Checks can be expressed in a fluent way and [AssertJ](https://github.com/assertj/assertj) provides rich context when assertions fail. -Additionally, [AssertJ](https://github.com/assertj/assertj) has powerful testing capabilities on collections and exceptions. -Please refer to the [usage guide](https://assertj.github.io/doc/#assertj-core-assertions-guide) for additional examples. - -```java -// bad: will only say true != false when check fails -assertTrue(x instanceof Xyz); - -// better: will show type of x when check fails -assertThat(x).isInstanceOf(Xyz.class); - -// bad: will only say true != false when check fails -assertTrue(catalog.listNamespaces().containsAll(expected)); - -// better: will show content of expected and of catalog.listNamespaces() if check fails -assertThat(catalog.listNamespaces()).containsAll(expected); -``` -```java -// ok -assertNotNull(metadataFileLocations); -assertEquals(metadataFileLocations.size(), 4); - -// better: will show the content of metadataFileLocations if check fails -assertThat(metadataFileLocations).isNotNull().hasSize(4); - -// or -assertThat(metadataFileLocations).isNotNull().hasSameSizeAs(expected).hasSize(4); -``` - -```java -// bad -try { - catalog.createNamespace(deniedNamespace); - Assert.fail("this should fail"); -} catch (Exception e) { - assertEquals(AccessDeniedException.class, e.getClass()); - assertEquals("User 'testUser' has no permission to create namespace", e.getMessage()); -} - -// better -assertThatThrownBy(() -> catalog.createNamespace(deniedNamespace)) - .isInstanceOf(AccessDeniedException.class) - .hasMessage("User 'testUser' has no permission to create namespace"); -``` -Checks on exceptions should always make sure to assert that a particular exception message has occurred. - - -### Awaitility - -Avoid using `Thread.sleep()` in tests as it leads to long test durations and flaky behavior if a condition takes slightly longer than expected. - -```java -deleteTablesAsync(); -Thread.sleep(3000L); -assertThat(tables()).isEmpty(); -``` - -A better alternative is using [Awaitility](https://github.com/awaitility/awaitility) to make sure `tables()` are eventually empty. The below example will run the check -with a default polling interval of **100 millis**: - -```java -deleteTablesAsync(); -Awaitility.await("Tables were not deleted") - .atMost(5, TimeUnit.SECONDS) - .untilAsserted(() -> assertThat(tables()).isEmpty()); -``` - -Please refer to the [usage guide](https://github.com/awaitility/awaitility/wiki/Usage) of [Awaitility](https://github.com/awaitility/awaitility) for more usage examples. - - -### JUnit4 / JUnit5 - -Iceberg currently uses a mix of JUnit4 (`org.junit` imports) and JUnit5 (`org.junit.jupiter.api` imports) tests. To allow an easier migration to JUnit5 in the future, new test classes -that are being added to the codebase should be written purely in JUnit5 where possible. +Please refer to the [contributing](https://iceberg.apache.org/contribute/) section for instructions +on how to contribute to Iceberg. diff --git a/site/docs/contribute.md b/site/docs/contribute.md index fdc1ef46d8..e7fe35d14c 100644 --- a/site/docs/contribute.md +++ b/site/docs/contribute.md @@ -96,6 +96,7 @@ The API subproject is the main interface for developers and users of the Iceberg guarantees. Evolution of the interfaces in this subproject are enforced by [Revapi](https://revapi.org/) and require explicit acknowledgement of API changes. + All public interfaces and classes require one major version for deprecation cycle. Any backward incompatible changes should be annotated as `@Deprecated` and removed for the next major release. Backward compatible changes are allowed within major versions. @@ -111,6 +112,7 @@ __Modules__ Changes to public interfaces and classes in the subprojects listed above require a deprecation cycle of one minor release. + These projects contain common and internal code used by other projects and can evolve within a major release. Minor release deprecation will provide other subprojects and external projects notice and opportunity to transition to new implementations. @@ -121,6 +123,7 @@ __modules__ (All modules not referenced above) Other modules are less likely to be extended directly and modifications should make a good faith effort to follow a minor version deprecation cycle. + If there are significant structural or design changes that result in deprecations being difficult to orchestrate, it is up to the committers to decide if deprecation is necessary. @@ -145,6 +148,89 @@ Example: void sequenceNumber(long sequenceNumber); ``` +## Adding new functionality without breaking APIs +When adding new functionality, make sure to avoid breaking existing APIs, especially within the scope of the API modules that are being checked by [Revapi](https://revapi.org/). + +Assume adding a `createBranch(String name)` method to the `ManageSnapshots` API. + +The most straight-forward way would be to add the below code: + +```java +public interface ManageSnapshots extends PendingUpdate { + // existing code... + + // adding this method introduces an API-breaking change + ManageSnapshots createBranch(String name); +} +``` + +And then add the implementation: + +```java +public class SnapshotManager implements ManageSnapshots { + // existing code... + + @Override + public ManageSnapshots createBranch(String name, long snapshotId) { + updateSnapshotReferencesOperation().createBranch(name, snapshotId); + return this; + } +} +``` + +### Checking for API breakages + +Running `./gradlew revapi` will flag this as an API-breaking change: + +``` +./gradlew revapi +> Task :iceberg-api:revapi FAILED +> Task :iceberg-api:showDeprecationRulesOnRevApiFailure FAILED + +1: Task failed with an exception. +----------- +* What went wrong: +Execution failed for task ':iceberg-api:revapi'. +> There were Java public API/ABI breaks reported by revapi: + + java.method.addedToInterface: Method was added to an interface. + + old: + new: method org.apache.iceberg.ManageSnapshots org.apache.iceberg.ManageSnapshots::createBranch(java.lang.String) + + SOURCE: BREAKING, BINARY: NON_BREAKING, SEMANTIC: POTENTIALLY_BREAKING + + From old archive: + From new archive: iceberg-api-1.4.0-SNAPSHOT.jar + + If this is an acceptable break that will not harm your users, you can ignore it in future runs like so for: + + * Just this break: + ./gradlew :iceberg-api:revapiAcceptBreak --justification "{why this break is ok}" \ + --code "java.method.addedToInterface" \ + --new "method org.apache.iceberg.ManageSnapshots org.apache.iceberg.ManageSnapshots::createBranch(java.lang.String)" + * All breaks in this project: + ./gradlew :iceberg-api:revapiAcceptAllBreaks --justification "{why this break is ok}" + * All breaks in all projects: + ./gradlew revapiAcceptAllBreaks --justification "{why this break is ok}" + ---------------------------------------------------------------------------------------------------- + +``` + +### Adding a default implementation + +To avoid breaking the API, add a default implementation that throws an `UnsupportedOperationException`:` + +```java +public interface ManageSnapshots extends PendingUpdate { + // existing code... + + // introduces new code without breaking the API + default ManageSnapshots createBranch(String name) { + throw new UnsupportedOperationException(this.getClass().getName() + " doesn't implement createBranch(String)"); + } +} +``` ## Iceberg Code Contribution Guidelines @@ -217,10 +303,10 @@ adding a Copyright profile: 1. Make method names as short as possible, while being clear. Omit needless words. 2. Avoid `get` in method names, unless an object must be a Java bean. - * In most cases, replace `get` with a more specific verb that describes what is happening in the method, like `find` or `fetch`. - * If there isn't a more specific verb or the method is a getter, omit `get` because it isn't helpful to readers and makes method names longer. + * In most cases, replace `get` with a more specific verb that describes what is happening in the method, like `find` or `fetch`. + * If there isn't a more specific verb or the method is a getter, omit `get` because it isn't helpful to readers and makes method names longer. 3. Where possible, use words and conjugations that form correct sentences in English when read - * For example, `Transform.preservesOrder()` reads correctly in an if statement: `if (transform.preservesOrder()) { ... }` + * For example, `Transform.preservesOrder()` reads correctly in an if statement: `if (transform.preservesOrder()) { ... }` #### Boolean arguments @@ -292,6 +378,18 @@ assertThat(metadataFileLocations).isNotNull().hasSize(4); // or assertThat(metadataFileLocations).isNotNull().hasSameSizeAs(expected).hasSize(4); ``` +```java +// if any key doesn't exist, it won't show the content of the map +assertThat(map.get("key1")).isEqualTo("value1"); +assertThat(map.get("key2")).isNotNull(); +assertThat(map.get("key3")).startsWith("3.5"); + +// better: all checks can be combined and the content of the map will be shown if any check fails +assertThat(map) + .containsEntry("key1", "value1") + .containsKey("key2") + .hasEntrySatisfying("key3", v -> assertThat(v).startsWith("3.5")); +``` ```java // bad @@ -346,50 +444,3 @@ no "push a single button to get a performance comparison" solution available, th post the results on the PR. See [Benchmarks](benchmarks.md) for a summary of available benchmarks and how to run them. - -## Website and Documentation Updates - -Currently, there is an [iceberg-docs](https://github.com/apache/iceberg-docs) repository -which contains the HTML/CSS and other files needed for the [Iceberg website](https://iceberg.apache.org/). -The [docs folder](https://github.com/apache/iceberg/tree/master/docs) in the Iceberg repository contains -the markdown content for the documentation site. All markdown changes should still be made -to this repository. - -### Submitting Pull Requests - -Changes to the markdown contents should be submitted directly to this repository. - -Changes to the website appearance (e.g. HTML, CSS changes) should be submitted to the [iceberg-docs repository](https://github.com/apache/iceberg-docs) against the `main` branch. - -Changes to the documentation of old Iceberg versions should be submitted to the [iceberg-docs repository](https://github.com/apache/iceberg-docs) against the specific version branch. - -### Reporting Issues - -All issues related to the doc website should still be submitted to the [Iceberg repository](https://github.com/apache/iceberg). -The GitHub Issues feature of the [iceberg-docs repository](https://github.com/apache/iceberg-docs) is disabled. - -### Running Locally - -Clone the [iceberg-docs](https://github.com/apache/iceberg-docs) repository to run the website locally: -```shell -git clone git@github.com:apache/iceberg-docs.git -cd iceberg-docs -``` - -To start the landing page site locally, run: -```shell -cd landing-page && hugo serve -``` - -To start the documentation site locally, run: -```shell -cd docs && hugo serve -``` - -If you would like to see how the latest website looks based on the documentation in the Iceberg repository, you can copy docs to the iceberg-docs repository by: -```shell -rm -rf docs/content/docs -rm -rf landing-page/content/common -cp -r /docs/versioned docs/content/docs -cp -r /docs/common landing-page/content/common -```