SAMZA-2124: Add Beam API doc to the website #948
dxichen
left a comment
Minor comments, thanks for the docs!
| {% endhighlight %} | ||
| To run this Beam program with Samza, you can simply provides "--runner=SamzaRunner" as a program argument. You can follow our [quick start](/startup/quick-start/{{site.version}}/beam.html) to set up your project and run different examples. For more details on writing the Beam program, please refer the comprehensive [Beam programming guide](https://beam.apache.org/documentation/programming-guide/). |
s/provides/provide
s/refer the comprehensive/refer to the..
fixed.
| ``` | ||
| $ deploy/examples/bin/run-beam-standalone.sh org.apache.beam.examples.WordCount \ | ||
| --configFilePath=$PWD/deploy/examples/config/standalone.properties \ | ||
| --inputFile=/Users/xiliu/opensource/samza-beam-examples/pom.xml --output=word-counts.txt \ |
Remove the username. I have a patch for these docs here: apache/samza-beam-examples#1
I switched to KafkaWordCount, to avoid the batch problems we have.
| ``` | ||
| $ deploy/examples/bin/run-beam-yarn.sh org.apache.beam.examples.WordCount \ | ||
| --configFilePath=$PWD/deploy/examples/config/yarn.properties \ | ||
| --inputFile=/Users/xiliu/opensource/samza-beam-examples/pom.xml \ |
remove username
Fixed by switching to kafka.
| #### Samza SQL API examples | ||
| You can easily create a Samza job declaratively using | ||
| [Samza SQL](https://samza.apache.org/learn/tutorials/0.14/samza-sql.html). |
change version to latest
fixed!
| ### Apache Beam - A Samza’s Perspective | ||
| The goal of Samza is to provide large-scale streaming processing capabilities with first-class state support. This does not contradict with Beam. In fact, while Samza lays out a solid foundation for large-scale stateful stream processing, Beam adds the cutting-edge stream processing API and model on top of it. The Beam API and model allows further optimization in the Samza platform, including multi-stage distributed computation and parallel processing on the per-key basis. The performance enhancements from these optimizations will benefit both Samza and its users. Samza can also further improve Beam model by providing various use cases. Adopting Beam provides a solid understanding of the latest data processing technology, and we believe Samza will benefit from it. |
s/Adopting Beam provides a solid understanding of the latest data processing technology/ Beam provides cutting-edge data processing capabilities.
Fixed.
| ### Introduction | ||
| Apache Beam brings an easy-to-use, but powerful API and model for state-of-art stream and batch data processing with portability across a variety of languages. The Beam API and model has the following characteristics: |
Minor:
s/but powerful API/ powerful API
It seems better to keep the "but". I removed the "," to improve readability.
| - *Simple constructs, powerful semantics*: the whole beam API can be simply described by a `Pipeline` object, which captures all your data processing steps from input to output. Beam SDK supports over [20 data IOs](https://beam.apache.org/documentation/io/built-in/), and data transformations from simple [Map](https://beam.apache.org/releases/javadoc/2.11.0/org/apache/beam/sdk/transforms/MapElements.html) to complex [Combines and Joins](https://beam.apache.org/releases/javadoc/2.11.0/index.html?org/apache/beam/sdk/transforms/Combine.html). | ||
| - *Strong consistency via event-time*: Beam provides advanced [event-time support](https://beam.apache.org/documentation/programming-guide/#watermarks-and-late-data) so you can perform windowing and aggregations based on when the events happen, instead of when they are consumed. The event-time mechanism improves the accuracy of processing results, and has repeatability when reprocessing the same data set. |
Minor:
- s/instead of when they are consumed/instead of arrival time?
- s/and has repeatability/and guarantees repeatability in results/
fixed
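For anyone reading this thread later, the event-time versus arrival-time distinction discussed above can be sketched in plain Java. This is only an illustration and does not use the Beam API: the class `EventTimeWindowSketch`, the `Event` type, the sample timestamps, and the 60-second fixed window are all hypothetical choices made for this example. It buckets the same events once by event time and once by arrival time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EventTimeWindowSketch {

    /** Hypothetical event carrying both timestamps, in milliseconds. */
    public static final class Event {
        public final String word;
        public final long eventTimeMs;   // when the event happened
        public final long arrivalTimeMs; // when the consumer received it

        public Event(String word, long eventTimeMs, long arrivalTimeMs) {
            this.word = word;
            this.eventTimeMs = eventTimeMs;
            this.arrivalTimeMs = arrivalTimeMs;
        }
    }

    /** Start of the fixed window that contains the given timestamp. */
    public static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    /** Counts events per fixed window, keyed by when each event happened. */
    public static Map<Long, Integer> countByEventTime(List<Event> events, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Event e : events) {
            counts.merge(windowStart(e.eventTimeMs, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    /** Counts events per fixed window, keyed by when each event arrived. */
    public static Map<Long, Integer> countByArrivalTime(List<Event> events, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Event e : events) {
            counts.merge(windowStart(e.arrivalTimeMs, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    /** Sample data: "b" happens in the first minute but arrives in the second. */
    public static List<Event> sampleEvents() {
        return Arrays.asList(
            new Event("a", 10_000L, 12_000L),
            new Event("b", 50_000L, 65_000L),
            new Event("c", 70_000L, 71_000L));
    }

    public static void main(String[] args) {
        long windowSizeMs = 60_000L;
        List<Event> events = sampleEvents();
        System.out.println("by event time:   " + countByEventTime(events, windowSizeMs));
        System.out.println("by arrival time: " + countByArrivalTime(events, windowSizeMs));
    }
}
```

With the sample data, the delayed event `b` is counted in the first window under event time but in the second window under arrival time. Replaying the same events with different delays would change only the arrival-time counts, which is the repeatability the doc sentence is describing.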
| 1. Download and install [Apache Maven](http://maven.apache.org/download.cgi) by following Maven’s [installation guide](http://maven.apache.org/install.html) for your specific operating system. | ||
| 1. A script named "grid" is included in this project which allows you to easily download and install Zookeeper, Kafka, and Yarn. |
I think all the individual line items in Setup (install JDK, install Maven, install grid) are numbered with 1. Maybe it would be better to give them the right ordering.
Thanks for the catch. Install grid shouldn't be marked as 1. I fixed it in the update.
LGTM, thanks!
shanthoosh
left a comment
LGTM.
* SAMZA-2124: Add Beam API doc to the website
* Address pr feedback
Add beam quick start, examples and api docs.