Docs: Sync contributing page / refer to website for contributing (#9776)

pull/9815/head
Eduard Tudenhoefner 11 months ago committed by GitHub
parent c649923570
commit f3817e1281
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@ -19,337 +19,5 @@
# Contributing
In this document, you will find some guidelines on contributing to Apache Iceberg. Please keep in mind that none of
these are hard rules and they're meant as a collection of helpful suggestions to make contributing as seamless of an
experience as possible.
If you are thinking of contributing but first would like to discuss the change you wish to make, we welcome you to
head over to the [Community](https://iceberg.apache.org/community/) page on the official Iceberg documentation site
to find a number of ways to connect with the community, including slack and our mailing lists. Of course, always feel
free to just open a [new issue](https://github.com/apache/iceberg/issues/new) in the GitHub repo.
## Pull Request Process
Pull requests are the preferred mechanism for contributing to Iceberg
* PRs are automatically labeled based on the content by our github-actions labeling action
* It's helpful to include a prefix in the summary that provides context to PR reviewers, such as `Build:`, `Docs:`, `Spark:`, `Flink:`, `Core:`, `API:`
* If a PR is related to an issue, adding `Closes #1234` in the PR description will automatically close the issue and helps keep the project clean
* If a PR is posted for visibility and isn't necessarily ready for review or merging, be sure to convert the PR to a draft
## Building the Project Locally
Please refer to the [Building](https://github.com/apache/iceberg#building) section of the main readme for instructions
on how to build iceberg locally.
## Website and Documentation Updates
The [Iceberg website](https://iceberg.apache.org/) and documentations are hosted in a different repository [iceberg-docs](https://github.com/apache/iceberg-docs).
Read the repository README for contribution guidelines for the website and documentation.
## Semantic Versioning
Apache Iceberg leverages [semantic versioning](https://semver.org/#semantic-versioning-200) to ensure compatibility
for developers and users of the iceberg libraries as APIs and implementations evolve. The requirements and
guarantees provided depend on the subproject as described below:
### Major Version Deprecations Required
__Modules__
`iceberg-api`
The API subproject is the main interface for developers and users of the Iceberg API and therefore has the strongest
guarantees. Evolution of the interfaces in this subproject are enforced by [Revapi](https://revapi.org/) and require
explicit acknowledgement of API changes.
All public interfaces and classes require one major version for deprecation cycle. Any backward incompatible changes
should be annotated as `@Deprecated` and removed for the next major release. Backward compatible changes are allowed
within major versions.
### Minor Version Deprecations Required
__Modules__
`iceberg-common`
`iceberg-core`
`iceberg-data`
`iceberg-orc`
`iceberg-parquet`
Changes to public interfaces and classes in the subprojects listed above require a deprecation cycle of one minor
release. These projects contain common and internal code used by other projects and can evolve within a major release.
Minor release deprecation will provide other subprojects and external projects notice and opportunity to transition
to new implementations.
### Minor Version Deprecations Discretionary
__modules__ (All modules not referenced above)
Other modules are less likely to be extended directly and modifications should make a good faith effort to follow a
minor version deprecation cycle. If there are significant structural or design changes that result in deprecations
being difficult to orchestrate, it is up to the committers to decide if deprecation is necessary.
## Deprecation Notices
All interfaces, classes, and methods targeted for deprecation must include the following:
1. `@Deprecated` annotation on the appropriate element
2. `@depreceted` javadoc comment including: the version for removal, the appropriate alternative for usage
3. Replacement of existing code paths that use the deprecated behavior
Example:
```java
/**
* Set the sequence number for this manifest entry.
*
* @param sequenceNumber a sequence number
* @deprecated since 1.0.0, will be removed in 1.1.0; use dataSequenceNumber() instead.
*/
@Deprecated
void sequenceNumber(long sequenceNumber);
```
## Adding new functionality without breaking APIs
Ideally, we'd want to add new functionality without breaking existing APIs, especially within the scope of the API modules that are being checked by [Revapi](https://revapi.org/).
Let's assume we'd want to add a `createBranch(String name)` method to the `ManageSnapshots` API.
The most straight-forward way would be to add the below code:
```java
public interface ManageSnapshots extends PendingUpdate<Snapshot> {
// existing code...
// adding this method introduces an API-breaking change
ManageSnapshots createBranch(String name);
}
```
And then add the implementation:
```java
public class SnapshotManager implements ManageSnapshots {
// existing code...
@Override
public ManageSnapshots createBranch(String name, long snapshotId) {
updateSnapshotReferencesOperation().createBranch(name, snapshotId);
return this;
}
}
```
### Checking for API breakages
Running `./gradlew revapi` will flag this as an API-breaking change:
```
./gradlew revapi
> Task :iceberg-api:revapi FAILED
> Task :iceberg-api:showDeprecationRulesOnRevApiFailure FAILED
1: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':iceberg-api:revapi'.
> There were Java public API/ABI breaks reported by revapi:
java.method.addedToInterface: Method was added to an interface.
old: <none>
new: method org.apache.iceberg.ManageSnapshots org.apache.iceberg.ManageSnapshots::createBranch(java.lang.String)
SOURCE: BREAKING, BINARY: NON_BREAKING, SEMANTIC: POTENTIALLY_BREAKING
From old archive: <none>
From new archive: iceberg-api-1.4.0-SNAPSHOT.jar
If this is an acceptable break that will not harm your users, you can ignore it in future runs like so for:
* Just this break:
./gradlew :iceberg-api:revapiAcceptBreak --justification "{why this break is ok}" \
--code "java.method.addedToInterface" \
--new "method org.apache.iceberg.ManageSnapshots org.apache.iceberg.ManageSnapshots::createBranch(java.lang.String)"
* All breaks in this project:
./gradlew :iceberg-api:revapiAcceptAllBreaks --justification "{why this break is ok}"
* All breaks in all projects:
./gradlew revapiAcceptAllBreaks --justification "{why this break is ok}"
----------------------------------------------------------------------------------------------------
```
### Adding a default implementation
In order to avoid breaking the API, we can add a default implementation that throws an `UnsupportedOperationException`:
```java
public interface ManageSnapshots extends PendingUpdate<Snapshot> {
// existing code...
// introduces new code without breaking the API
default ManageSnapshots createBranch(String name) {
throw new UnsupportedOperationException(this.getClass().getName() + " doesn't implement createBranch(String)");
}
}
```
## Style
For Java styling, check out the section
[Setting up IDE and Code Style](https://iceberg.apache.org/contribute/#setting-up-ide-and-code-style) from the
documentation site.
### Java style guidelines
#### Line breaks
Continuation indents are 2 indents (4 spaces) from the start of the previous line.
Try to break long lines at the same semantic level to make code more readable.
* Don't use the same level of indentation for arguments to different methods
* Don't use the same level of indentation for arguments and chained methods
```java
// BAD: hard to see arguments passed to the same method
doSomething(new ArgumentClass(1,
2),
3);
// GOOD: break lines at the same semantic level
doSomething(
new ArgumentClass(1, 2),
3);
// BAD: arguments and chained methods mixed
SomeObject myNewObject = SomeObject.builder(schema, partitionSpec
sortOrder)
.withProperty("x", "1")
.build()
// GOOD: method calls at the same level, arguments indented
SomeObject myNewObject = SomeObject
.builder(schema, partitionSpec,
sortOrder)
.withProperty("x", "1")
.build()
```
#### Method naming
1. Make method names as short as possible, while being clear. Omit needless words.
2. Avoid `get` in method names, unless an object must be a Java bean.
* In most cases, replace `get` with a more specific verb that describes what is happening in the method, like `find` or `fetch`.
* If there isn't a more specific verb or the method is a getter, omit `get` because it isn't helpful to readers and makes method names longer.
3. Where possible, use words and conjugations that form correct sentences in English when read
* For example, `Transform.preservesOrder()` reads correctly in an if statement: `if (transform.preservesOrder()) { ... }`
#### Boolean arguments
Avoid boolean arguments to methods that are not `private` to avoid confusing invocations like `sendMessage(false)`. It is better to create two methods with names and behavior, even if both are implemented by one internal method.
```java
// prefer exposing suppressFailure in method names
public void sendMessageIgnoreFailure() {
sendMessageInternal(true);
}
public void sendMessage() {
sendMessageInternal(false);
}
private void sendMessageInternal(boolean suppressFailure) {
...
}
```
When passing boolean arguments to existing or external methods, use inline comments to help the reader understand actions without an IDE.
```java
// BAD: it is not clear what false controls
dropTable(identifier, false);
// GOOD: these uses of dropTable are clear to the reader
dropTable(identifier, true /* purge data */);
dropTable(identifier, purge);
```
#### Config naming
1. Use `-` to link words in one concept
* For example, preferred convection `access-key-id` rather than `access.key.id`
2. Use `.` to create a hierarchy of config groups
* For example, `s3` in `s3.access-key-id`, `s3.secret-access-key`
## Testing
### AssertJ
Prefer using [AssertJ](https://github.com/assertj/assertj) assertions as those provide a rich and intuitive set of strongly-typed assertions.
Checks can be expressed in a fluent way and [AssertJ](https://github.com/assertj/assertj) provides rich context when assertions fail.
Additionally, [AssertJ](https://github.com/assertj/assertj) has powerful testing capabilities on collections and exceptions.
Please refer to the [usage guide](https://assertj.github.io/doc/#assertj-core-assertions-guide) for additional examples.
```java
// bad: will only say true != false when check fails
assertTrue(x instanceof Xyz);
// better: will show type of x when check fails
assertThat(x).isInstanceOf(Xyz.class);
// bad: will only say true != false when check fails
assertTrue(catalog.listNamespaces().containsAll(expected));
// better: will show content of expected and of catalog.listNamespaces() if check fails
assertThat(catalog.listNamespaces()).containsAll(expected);
```
```java
// ok
assertNotNull(metadataFileLocations);
assertEquals(metadataFileLocations.size(), 4);
// better: will show the content of metadataFileLocations if check fails
assertThat(metadataFileLocations).isNotNull().hasSize(4);
// or
assertThat(metadataFileLocations).isNotNull().hasSameSizeAs(expected).hasSize(4);
```
```java
// bad
try {
catalog.createNamespace(deniedNamespace);
Assert.fail("this should fail");
} catch (Exception e) {
assertEquals(AccessDeniedException.class, e.getClass());
assertEquals("User 'testUser' has no permission to create namespace", e.getMessage());
}
// better
assertThatThrownBy(() -> catalog.createNamespace(deniedNamespace))
.isInstanceOf(AccessDeniedException.class)
.hasMessage("User 'testUser' has no permission to create namespace");
```
Checks on exceptions should always make sure to assert that a particular exception message has occurred.
### Awaitility
Avoid using `Thread.sleep()` in tests as it leads to long test durations and flaky behavior if a condition takes slightly longer than expected.
```java
deleteTablesAsync();
Thread.sleep(3000L);
assertThat(tables()).isEmpty();
```
A better alternative is using [Awaitility](https://github.com/awaitility/awaitility) to make sure `tables()` are eventually empty. The below example will run the check
with a default polling interval of **100 millis**:
```java
deleteTablesAsync();
Awaitility.await("Tables were not deleted")
.atMost(5, TimeUnit.SECONDS)
.untilAsserted(() -> assertThat(tables()).isEmpty());
```
Please refer to the [usage guide](https://github.com/awaitility/awaitility/wiki/Usage) of [Awaitility](https://github.com/awaitility/awaitility) for more usage examples.
### JUnit4 / JUnit5
Iceberg currently uses a mix of JUnit4 (`org.junit` imports) and JUnit5 (`org.junit.jupiter.api` imports) tests. To allow an easier migration to JUnit5 in the future, new test classes
that are being added to the codebase should be written purely in JUnit5 where possible.
Please refer to the [contributing](https://iceberg.apache.org/contribute/) section for instructions
on how to contribute to Iceberg.

@ -96,6 +96,7 @@ The API subproject is the main interface for developers and users of the Iceberg
guarantees.
Evolution of the interfaces in this subproject are enforced by [Revapi](https://revapi.org/) and require
explicit acknowledgement of API changes.
All public interfaces and classes require one major version for deprecation cycle.
Any backward incompatible changes should be annotated as `@Deprecated` and removed for the next major release.
Backward compatible changes are allowed within major versions.
@ -111,6 +112,7 @@ __Modules__
Changes to public interfaces and classes in the subprojects listed above require a deprecation cycle of one minor
release.
These projects contain common and internal code used by other projects and can evolve within a major release.
Minor release deprecation will provide other subprojects and external projects notice and opportunity to transition
to new implementations.
@ -121,6 +123,7 @@ __modules__ (All modules not referenced above)
Other modules are less likely to be extended directly and modifications should make a good faith effort to follow a
minor version deprecation cycle.
If there are significant structural or design changes that result in deprecations
being difficult to orchestrate, it is up to the committers to decide if deprecation is necessary.
@ -145,6 +148,89 @@ Example:
void sequenceNumber(long sequenceNumber);
```
## Adding new functionality without breaking APIs
When adding new functionality, make sure to avoid breaking existing APIs, especially within the scope of the API modules that are being checked by [Revapi](https://revapi.org/).
Assume adding a `createBranch(String name)` method to the `ManageSnapshots` API.
The most straight-forward way would be to add the below code:
```java
public interface ManageSnapshots extends PendingUpdate<Snapshot> {
// existing code...
// adding this method introduces an API-breaking change
ManageSnapshots createBranch(String name);
}
```
And then add the implementation:
```java
public class SnapshotManager implements ManageSnapshots {
// existing code...
@Override
public ManageSnapshots createBranch(String name, long snapshotId) {
updateSnapshotReferencesOperation().createBranch(name, snapshotId);
return this;
}
}
```
### Checking for API breakages
Running `./gradlew revapi` will flag this as an API-breaking change:
```
./gradlew revapi
> Task :iceberg-api:revapi FAILED
> Task :iceberg-api:showDeprecationRulesOnRevApiFailure FAILED
1: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':iceberg-api:revapi'.
> There were Java public API/ABI breaks reported by revapi:
java.method.addedToInterface: Method was added to an interface.
old: <none>
new: method org.apache.iceberg.ManageSnapshots org.apache.iceberg.ManageSnapshots::createBranch(java.lang.String)
SOURCE: BREAKING, BINARY: NON_BREAKING, SEMANTIC: POTENTIALLY_BREAKING
From old archive: <none>
From new archive: iceberg-api-1.4.0-SNAPSHOT.jar
If this is an acceptable break that will not harm your users, you can ignore it in future runs like so for:
* Just this break:
./gradlew :iceberg-api:revapiAcceptBreak --justification "{why this break is ok}" \
--code "java.method.addedToInterface" \
--new "method org.apache.iceberg.ManageSnapshots org.apache.iceberg.ManageSnapshots::createBranch(java.lang.String)"
* All breaks in this project:
./gradlew :iceberg-api:revapiAcceptAllBreaks --justification "{why this break is ok}"
* All breaks in all projects:
./gradlew revapiAcceptAllBreaks --justification "{why this break is ok}"
----------------------------------------------------------------------------------------------------
```
### Adding a default implementation
To avoid breaking the API, add a default implementation that throws an `UnsupportedOperationException`:`
```java
public interface ManageSnapshots extends PendingUpdate<Snapshot> {
// existing code...
// introduces new code without breaking the API
default ManageSnapshots createBranch(String name) {
throw new UnsupportedOperationException(this.getClass().getName() + " doesn't implement createBranch(String)");
}
}
```
## Iceberg Code Contribution Guidelines
@ -217,10 +303,10 @@ adding a Copyright profile:
1. Make method names as short as possible, while being clear. Omit needless words.
2. Avoid `get` in method names, unless an object must be a Java bean.
* In most cases, replace `get` with a more specific verb that describes what is happening in the method, like `find` or `fetch`.
* If there isn't a more specific verb or the method is a getter, omit `get` because it isn't helpful to readers and makes method names longer.
* In most cases, replace `get` with a more specific verb that describes what is happening in the method, like `find` or `fetch`.
* If there isn't a more specific verb or the method is a getter, omit `get` because it isn't helpful to readers and makes method names longer.
3. Where possible, use words and conjugations that form correct sentences in English when read
* For example, `Transform.preservesOrder()` reads correctly in an if statement: `if (transform.preservesOrder()) { ... }`
* For example, `Transform.preservesOrder()` reads correctly in an if statement: `if (transform.preservesOrder()) { ... }`
#### Boolean arguments
@ -292,6 +378,18 @@ assertThat(metadataFileLocations).isNotNull().hasSize(4);
// or
assertThat(metadataFileLocations).isNotNull().hasSameSizeAs(expected).hasSize(4);
```
```java
// if any key doesn't exist, it won't show the content of the map
assertThat(map.get("key1")).isEqualTo("value1");
assertThat(map.get("key2")).isNotNull();
assertThat(map.get("key3")).startsWith("3.5");
// better: all checks can be combined and the content of the map will be shown if any check fails
assertThat(map)
.containsEntry("key1", "value1")
.containsKey("key2")
.hasEntrySatisfying("key3", v -> assertThat(v).startsWith("3.5"));
```
```java
// bad
@ -346,50 +444,3 @@ no "push a single button to get a performance comparison" solution available, th
post the results on the PR.
See [Benchmarks](benchmarks.md) for a summary of available benchmarks and how to run them.
## Website and Documentation Updates
Currently, there is an [iceberg-docs](https://github.com/apache/iceberg-docs) repository
which contains the HTML/CSS and other files needed for the [Iceberg website](https://iceberg.apache.org/).
The [docs folder](https://github.com/apache/iceberg/tree/master/docs) in the Iceberg repository contains
the markdown content for the documentation site. All markdown changes should still be made
to this repository.
### Submitting Pull Requests
Changes to the markdown contents should be submitted directly to this repository.
Changes to the website appearance (e.g. HTML, CSS changes) should be submitted to the [iceberg-docs repository](https://github.com/apache/iceberg-docs) against the `main` branch.
Changes to the documentation of old Iceberg versions should be submitted to the [iceberg-docs repository](https://github.com/apache/iceberg-docs) against the specific version branch.
### Reporting Issues
All issues related to the doc website should still be submitted to the [Iceberg repository](https://github.com/apache/iceberg).
The GitHub Issues feature of the [iceberg-docs repository](https://github.com/apache/iceberg-docs) is disabled.
### Running Locally
Clone the [iceberg-docs](https://github.com/apache/iceberg-docs) repository to run the website locally:
```shell
git clone git@github.com:apache/iceberg-docs.git
cd iceberg-docs
```
To start the landing page site locally, run:
```shell
cd landing-page && hugo serve
```
To start the documentation site locally, run:
```shell
cd docs && hugo serve
```
If you would like to see how the latest website looks based on the documentation in the Iceberg repository, you can copy docs to the iceberg-docs repository by:
```shell
rm -rf docs/content/docs
rm -rf landing-page/content/common
cp -r <path to iceberg repo>/docs/versioned docs/content/docs
cp -r <path to iceberg repo>/docs/common landing-page/content/common
```

Loading…
Cancel
Save