Kinesis (Event Stream)
Kinesis is an event stream service that handles real-time data streaming.
Kinesis lets you ingest, process, and analyse real-time streaming data.
The maximum size for a single record in Amazon Kinesis Data Streams is 1 MB. Each shard accepts writes at up to 1 MB/s or 1,000 records per second, whichever limit is reached first.
Kinesis Data Streams guarantees ordered processing within a shard: records that share a partition key go to the same shard and are read in the order they were written.
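As a rough illustration of why ordering holds per shard: Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below assumes a stream whose shards divide the hash space evenly (real streams may have uneven ranges after resharding), and the function names are hypothetical.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # hash keys span 0 .. 2^128 - 1


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash space into contiguous, roughly equal ranges (an assumption)."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash key outside all shard ranges")
```

Because the mapping is deterministic, every record written with the same partition key (for example a user ID) lands in the same shard, which is what makes per-key ordering possible.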
AWS Kinesis effectively provides alternatives to Apache Kafka for real-time data streaming (Kinesis Data Streams) and to Apache Flink for stream processing and analytics (Kinesis Data Analytics).
There are two versions of Kinesis, Kinesis Data Streams and Kinesis Data Firehose:
Kinesis Data Streams
Purpose: ingestion of data
Speed: real time
Difficulty: you are responsible for creating consumers and scaling the streams
Kinesis Data Firehose
Purpose: data transfer tool to get information to S3, Redshift, Elasticsearch, or Splunk
Speed: near real time (within 60s)
Difficulty: plug and play with AWS architecture; scaling and consumers are handled by AWS
Kinesis Data Streams has some specific limitations on the number of consumers that can read from a stream:
For shared consumer mode (classic):
In practice, up to about 5 consumer applications can read from a stream simultaneously, because each shard supports at most 5 read transactions (GetRecords calls) per second
These consumers share the shard's 2 MB/s read throughput
For enhanced fan-out:
You can register up to 20 consumers per stream
Each consumer gets its own dedicated 2 MB/s of throughput per shard
You need to register each consumer explicitly using the RegisterStreamConsumer API
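The difference between the two modes can be sketched with a small calculation. This is an idealized model that assumes shared consumers split a shard's read throughput evenly; the function and constant names are hypothetical, and the limits are the ones listed above.

```python
SHARD_READ_MB_PER_SEC = 2.0    # read throughput per shard
MAX_REGISTERED_CONSUMERS = 20  # enhanced fan-out consumers per stream


def per_consumer_read_mb(consumers: int, enhanced_fan_out: bool) -> float:
    """Approximate read throughput (MB/s per shard) available to each consumer."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        if consumers > MAX_REGISTERED_CONSUMERS:
            raise ValueError("enhanced fan-out allows at most 20 consumers per stream")
        # Each registered consumer gets a dedicated pipe.
        return SHARD_READ_MB_PER_SEC
    # Shared mode: all consumers draw from the same 2 MB/s (even split assumed).
    return SHARD_READ_MB_PER_SEC / consumers
```

With four consumers, shared mode leaves each one roughly 0.5 MB/s per shard, while enhanced fan-out keeps each at 2 MB/s.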
This is quite different from SNS, which allows you to have practically unlimited subscribers (there are soft limits but they're very high).
If you need to broadcast data to many consumers, SNS would be a better choice than Kinesis. Alternatively, you could set up a Lambda function to read from Kinesis and then fan out the data to multiple destinations, but this adds complexity and latency.
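The Lambda fan-out mentioned above could look roughly like the following sketch. It relies on the Kinesis event shape that Lambda delivers (a `Records` list whose `kinesis.data` field is base64-encoded); the `fan_out` function and the `Destination` callable type are hypothetical names, and real destinations would wrap calls to services such as SNS or SQS.

```python
import base64
from typing import Callable, Iterable

# Hypothetical destination signature: (partition_key, payload) -> None
Destination = Callable[[str, bytes], None]


def fan_out(event: dict, destinations: Iterable[Destination]) -> int:
    """Decode each Kinesis record in a Lambda event and pass it to every destination.

    Returns the number of deliveries made (records x destinations).
    """
    targets = list(destinations)
    delivered = 0
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        # Lambda hands over the record payload base64-encoded.
        payload = base64.b64decode(kinesis["data"])
        for send in targets:
            send(kinesis["partitionKey"], payload)
            delivered += 1
    return delivered
```

A Lambda `handler(event, context)` would simply call `fan_out(event, [...])`. Every extra destination adds a call per record, which is where the added latency and complexity come from.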
Kinesis Data Analytics lets you analyze streaming data using standard SQL.
Data Analytics supports both Kinesis Data Streams and Data Firehose.
There are no servers to manage
You only pay for what you use
Conceptual similarities between a Kafka partition and a Kinesis shard:
Both are the base unit of parallelism
Both guarantee ordered processing within a single shard/partition
Both are used to distribute data across consumers
Key Differences:
Capacity Model
Kafka Partition:
No built-in capacity limits
Bounded only by broker hardware
Can handle variable throughput
Kinesis Shard:
Fixed capacity limits:
1 MB/s input
2 MB/s output
1,000 records/s writes
Like a "metered pipe" with strict limits
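Because each shard's limits are fixed, a first estimate of the shard count is simple arithmetic. The sketch below takes the largest of the three per-shard constraints listed above; the function name is hypothetical, and it ignores headroom for traffic spikes and uneven partition keys.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # 1 MB/s input per shard
WRITE_RECORDS_PER_SHARD = 1000  # 1,000 records/s per shard
READ_MB_PER_SHARD = 2.0         # 2 MB/s output per shard


def shards_needed(write_mb_per_sec: float, records_per_sec: float, read_mb_per_sec: float) -> int:
    """Estimate the minimum shard count that satisfies all three per-shard limits."""
    by_write_bytes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD)
    by_write_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_per_sec / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_count, by_read_bytes)
```

For example, `shards_needed(3.5, 2500, 6)` returns 4 because write volume is the binding limit, while `shards_needed(0.2, 4200, 1)` returns 5 because many small records hit the records-per-second limit first.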
Consumer Behavior
Kafka Partition:
Within a consumer group, each partition is read by only one consumer
No throttling on reads
Multiple consumer groups can read at full speed
Kinesis Shard:
Limited to 5 read transactions per second per shard, shared by all consumers in shared mode
Need Enhanced Fan-out for multiple consumers
More restrictive consumer model
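To illustrate the Kafka side of this comparison, the sketch below assigns partitions to the consumers of one group in round-robin order. It is a simplified stand-in, not Kafka's actual assignor logic, and the function name is hypothetical.

```python
def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer in the group, in rotation."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: dict[str, list[int]] = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

With three partitions and two consumers, one consumer owns two partitions and the other owns one. With more consumers than partitions, the extra consumers receive nothing, which is why the partition count caps parallelism within a group. A second consumer group would get its own independent assignment over the same partitions.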
Scaling Characteristics
Kafka Partition:
Partitions can be added but not split
Consumer group rebalancing happens automatically
No throughput guarantees per partition
Kinesis Shard:
Shards can be split/merged
Fixed throughput per shard makes capacity planning simpler
More predictable but less flexible
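Splitting and merging shards comes down to dividing or joining hash key ranges. The sketch below computes an even split point, similar in spirit to the `NewStartingHashKey` value that the SplitShard API expects, and the inverse merge of two adjacent ranges. The function names are hypothetical and no AWS call is made.

```python
HashRange = tuple[int, int]  # inclusive (start, end) hash keys


def split_hash_range(start: int, end: int) -> tuple[HashRange, HashRange]:
    """Split an inclusive hash key range into two adjacent halves."""
    if end <= start:
        raise ValueError("range too small to split")
    # The second child starts at the midpoint of the parent range.
    new_starting_hash_key = start + (end - start + 1) // 2
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)


def merge_hash_ranges(left: HashRange, right: HashRange) -> HashRange:
    """Merge two ranges, which must be adjacent with no gap or overlap."""
    if left[1] + 1 != right[0]:
        raise ValueError("only adjacent ranges can be merged")
    return (left[0], right[1])
```

Splitting the full range `(0, 2**128 - 1)` yields two children that each cover half of the hash space, so each child shard takes roughly half of the traffic if partition keys are well distributed.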
So while they serve similar purposes in distributing data, they operate on different models:
A Kafka partition is more like an unbounded queue
A Kinesis shard is more like a metered pipeline with strict capacity controls
This difference impacts how you design and scale applications on each platform.