Event Delivery Semantics

Currents for Data Storage is a continuous stream of data from our platform to a storage bucket on one of our data storage partner connections. Currents writes Avro files to your storage bucket at regular thresholds, allowing you to process and analyze the event data using your own business intelligence toolset.

At-Least-Once Delivery

As a high-throughput system, Currents guarantees “at-least-once” delivery of events, meaning that duplicate events can occasionally be written to your storage bucket. This can happen when events are reprocessed from our queue for any reason.

If your use case requires exactly-once delivery, you can use the unique identifier field that is sent with every event (id) to deduplicate events. Because a file leaves our control once it is written to your storage bucket, we have no way to guarantee deduplication on our end.
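
As an illustration, here is a minimal Python sketch of ID-based deduplication. The in-memory set stands in for whatever persistent store (for example, a warehouse table keyed on id) your pipeline actually uses; only the id field itself comes from the Currents event payload.

def deduplicate(events, seen_ids=None):
    # Track event IDs we have already processed. An in-memory set is
    # enough for a demo; a real pipeline needs durable storage.
    seen_ids = set() if seen_ids is None else seen_ids
    for event in events:
        if event["id"] in seen_ids:
            continue  # duplicate delivery from at-least-once semantics; skip
        seen_ids.add(event["id"])
        yield event

# Usage: events is any iterable of decoded Avro records.
unique_events = list(deduplicate([{"id": "a", "x": 1}, {"id": "a", "x": 1}]))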

Timestamps

All timestamps exported by Currents are in the UTC timezone. For some events, where available, a timezone field is also included, containing the user’s local timezone at the time of the event in IANA format.
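
For example, here is a small Python sketch that localizes a UTC timestamp using such a timezone value; the sample timestamp and values are illustrative, not taken from a real event:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical event values: an epoch timestamp (UTC) and the IANA
# timezone string delivered in the timezone field.
event_time_utc = datetime.fromtimestamp(1700000000, tz=timezone.utc)
user_timezone = ZoneInfo("America/New_York")

# Convert the UTC event time into the user's local time.
local_time = event_time_utc.astimezone(user_timezone)
print(local_time.isoformat())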

Avro

The Braze Currents data storage integrations output data in the .avro format. We chose Avro because it is a flexible data format that natively supports schema evolution and is supported by a wide variety of data products:

  • Avro is supported by nearly every major data warehouse.
  • If you choose to keep your data in S3, Avro compresses better than CSV or JSON, so you pay less for storage and can potentially use less CPU to parse the data.
  • Avro requires schemas when data is written or read. Schemas can be evolved over time to handle the addition of fields without breaking.
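
As a quick sketch, one way to read these files is with the fastavro Python package; the filename below is a placeholder:

from fastavro import reader

# Iterate over the records in a single Currents .avro file. fastavro
# resolves the writer schema embedded in the file, which is what makes
# schema evolution manageable downstream.
with open("dataexport.example.avro", "rb") as avro_file:
    for record in reader(avro_file):
        print(record)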

Currents will create a file for each event type using the format below:

<your-bucket-prefix>/dataexport.<cluster-identifier>.<connection-type-identifier>.integration.<integration-id>/event_type=<event-type>/date=<date>/<schema-id>/<zone>/dataexport.<cluster-identifier>.<connection-type-identifier>.integration.<integration-id>+<partition>+<offset>.avro

Filename Segment Definitions

<your-bucket-prefix>: The prefix set for this Currents integration.
<cluster-identifier>: For internal use by Braze. A string such as “prod-01”, “prod-02”, “prod-03”, or “prod-04”. All files will have the same cluster identifier.
<connection-type-identifier>: The identifier for the type of connection. Options are “S3”, “AzureBlob”, or “GCS”.
<integration-id>: The unique ID for this Currents integration.
<event-type>: The type of the event in the file (see event list below).
<date>: The hour in which events were queued in our system for processing. Formatted as YYYY-MM-DD-HH.
<schema-id>: An integer used to version .avro schemas for backwards compatibility and schema evolution.
<zone>: For internal use by Braze. A single letter.
<partition>: For internal use by Braze. An integer.
<offset>: For internal use by Braze. An integer.
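
If you need to route or partition files downstream, the documented segments can be recovered from the object key. The following parser is an illustrative sketch based on the layout above, not an official Braze utility; the sample key is hypothetical:

import re

# Extract the documented segments from a Currents object key.
KEY_PATTERN = re.compile(
    r"event_type=(?P<event_type>[^/]+)/"
    r"date=(?P<date>\d{4}-\d{2}-\d{2}-\d{2})/"
    r"(?P<schema_id>\d+)/"
    r"(?P<zone>[A-Za-z])/"
)

def parse_key(key):
    match = KEY_PATTERN.search(key)
    return match.groupdict() if match else None

sample_key = (
    "my-prefix/dataexport.prod-01.S3.integration.abc123/"
    "event_type=users.messages.email.Open/date=2024-01-15-06/"
    "1/a/dataexport.prod-01.S3.integration.abc123+0+100.avro"
)
print(parse_key(sample_key))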

Data files will be written to your storage bucket when any of the following thresholds is reached, whichever comes first:

Partner: Write Threshold
Amazon S3: Every 5 minutes, 15,000 events, or on the hour.
Microsoft Azure Blob Storage: Every 5 minutes, 5,000 events, or on the hour.
Google Cloud Storage: Every 5 minutes, 5,000 events, or on the hour.

Avro Schema Changes

From time to time, Braze may make changes to the Avro schema when fields are added, changed, or removed. For our purposes here, there are two types of changes: breaking and non-breaking. In all cases, the <schema-id> will be advanced to indicate the schema was updated.

Non-breaking Changes

When a field is added to the Avro schema, we consider this a non-breaking change. Added fields will always be “optional” Avro fields (that is, with a default value of null), so they will “match” older schemas according to the Avro schema resolution spec. These additions should have no effect on existing ETL processes: a new field is simply ignored until you add it to your ETL process. We recommend making your ETL setup explicit about the fields it processes so that newly added fields do not break the flow.
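
A minimal sketch of that recommendation, with illustrative field names: project each record onto an explicit list of expected fields, so newly added optional fields are ignored rather than passed through unexpectedly.

# Field names here are illustrative; use the fields your tables expect.
EXPECTED_FIELDS = ["id", "user_id", "time", "app_id"]

def project(record):
    # Unknown new fields are dropped; fields missing from an older
    # schema default to None (matching Avro's null default).
    return {field: record.get(field) for field in EXPECTED_FIELDS}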

While we will strive to give advance warning in the case of all changes, we may include non-breaking changes to the schema at any time.

Breaking Changes

When a field is removed from or changed in the Avro schema, we consider this a breaking change. Breaking changes may require modifications to existing ETL processes as fields that were in use may no longer be recorded as expected.

All breaking changes to the schema will be communicated in advance of the change.
