flink-connector-wikiedits

A non-parallel source that parses a live stream of Wikipedia edits.

Metadata about the edits is mirrored to the IRC channel #en.wikipedia. The source connects to this channel and parses each message into a WikipediaEditEvent instance.
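
To give a feel for the parsing step, here is a minimal sketch in plain Java, independent of Flink. It assumes a simplified line of the form `[[Title]] FLAGS URL * User * (+N) summary` with IRC color codes already stripped. The real channel messages and the connector's own parser may differ, and `EditLineParser` and `Edit` are hypothetical names that are not part of the connector.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the connector. */
final class EditLineParser {

    /** Hypothetical value type holding a few fields of one edit. */
    static final class Edit {
        final String title;
        final String flags;
        final String url;
        final String user;
        final int byteDiff;
        final String summary;

        Edit(String title, String flags, String url, String user, int byteDiff, String summary) {
            this.title = title;
            this.flags = flags;
            this.url = url;
            this.user = user;
            this.byteDiff = byteDiff;
            this.summary = summary;
        }
    }

    // Assumed simplified format: [[Title]] FLAGS URL * User * (+N) summary
    private static final Pattern LINE = Pattern.compile(
            "\\[\\[(.*?)\\]\\]\\s(\\S*)\\s(\\S+)\\s\\*\\s(.*?)\\s\\*\\s\\(([+-]?\\d+)\\)\\s?(.*)");

    /** Returns the parsed edit, or an empty Optional if the line does not match. */
    static Optional<Edit> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Edit(
                m.group(1),                   // page title
                m.group(2),                   // flags such as "M" (may be empty)
                m.group(3),                   // diff URL
                m.group(4),                   // user name
                Integer.parseInt(m.group(5)), // signed size change in bytes
                m.group(6)));                 // edit summary
    }
}
```

Calling `EditLineParser.parse(line)` on a matching line yields an `Edit` whose fields can be read directly; any other line yields an empty `Optional`, so non-edit channel traffic can be skipped rather than treated as an error.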

The purpose of this source is to ease the setup of demos of the DataStream API with live data.

The original idea is from the Hello Samza project of Apache Samza. The Samza code for this is located in the samza-hello-samza repository.

Example

Add the following dependency to your project:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-wikiedits</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>

You can use the source like any other source:

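import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;

StreamExecutionEnvironment env = StreamExecutionEnvironment
    .getExecutionEnvironment();

DataStream<WikipediaEditEvent> edits = env
    .addSource(new WikipediaEditsSource());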

Remember that it is a non-parallel source and as such always runs with parallelism 1.