
Event delivery semantics

This article outlines how Currents manages the flat file event data we send to data warehouse partners.

Currents for Data Storage is a continuous stream of data from our platform to a storage bucket on one of our data warehouse partner connections.

Currents writes Avro files to your storage bucket at regular thresholds, allowing you to process and analyze the event data using your own Business Intelligence toolset.

At-least-once delivery

As a high-throughput system, Currents guarantees “at-least-once” delivery of events, meaning that duplicate events can occasionally be written to your storage bucket. This can happen when events are reprocessed from our queue for any reason.

If your use case requires exactly-once delivery, you can use the unique identifier field sent with every event (id) to deduplicate events on your side. Because a file leaves our control once it is written to your storage bucket, we have no way to guarantee deduplication from our end.
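As a hedged illustration (not an official Braze utility), the following Python sketch deduplicates events read from downloaded Avro files with the open-source fastavro library, keyed on the id field. A production pipeline would typically persist seen IDs in the warehouse itself rather than in memory.

from fastavro import reader

def dedup_events(avro_paths):
    """Yield each event exactly once, keyed on the Braze-provided `id` field."""
    seen = set()
    for path in avro_paths:
        with open(path, "rb") as fo:
            for event in reader(fo):
                event_id = event["id"]
                if event_id in seen:
                    continue  # duplicate from at-least-once redelivery
                seen.add(event_id)
                yield event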

Timestamps

All timestamps exported by Currents are sent in the UTC time zone. For some events, where it is available, a time zone field is also included, containing the user’s local time zone at the time of the event in IANA format.
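For example, the sketch below converts an event’s UTC timestamp into the user’s local time. It assumes a Unix-seconds time field and an IANA time_zone field; actual field names vary by event type, so treat both as placeholders.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

def localize(event):
    # Interpret the event's Unix timestamp as UTC.
    utc_dt = datetime.fromtimestamp(event["time"], tz=timezone.utc)
    # The IANA zone (e.g. "America/New_York") may be absent on some events.
    tz_name = event.get("time_zone")
    return utc_dt.astimezone(ZoneInfo(tz_name)) if tz_name else utc_dt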

Apache Avro

The Braze Currents data storage integrations output data in the .avro format. We chose Apache Avro because it is a flexible data format that natively supports schema evolution and is supported by a wide variety of data products:

  • Avro is supported by nearly every major data warehouse.
  • If you choose to leave your data in S3, Avro compresses better than CSV and JSON, so you pay less for storage and can potentially use less CPU to parse the data.
  • Avro requires schemas when data is written or read. Schemas can be evolved over time to handle the addition of fields without breaking.
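As a minimal sketch, here is how one of these files could be read with the open-source fastavro library; any Avro-capable tool or warehouse loader works equally well, and the filename below is a placeholder.

from fastavro import reader

with open("dataexport.example.avro", "rb") as fo:
    avro_reader = reader(fo)
    print(avro_reader.writer_schema)  # the schema embedded in the file
    for record in avro_reader:
        print(record)  # each record is one event, as a Python dict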

Currents will create a file for each event type using the following format:

<your-bucket-prefix>/dataexport.<cluster-identifier>.<connection-type-identifier>.integration.<integration-id>/event_type=<event-type>/date=<date>/<schema-id>/<zone>/dataexport.<cluster-identifier>.<connection-type-identifier>.integration.<integration-id>+<partition>+<offset>.avro
Filename segment definitions:

  • <your-bucket-prefix>: The prefix set for this Currents integration.
  • <cluster-identifier>: For internal use by Braze. A string such as “prod-01”, “prod-02”, “prod-03”, or “prod-04”. All files will have the same cluster identifier.
  • <connection-type-identifier>: The identifier for the type of connection. Options are “S3”, “AzureBlob”, or “GCS”.
  • <integration-id>: The unique ID for this Currents integration.
  • <event-type>: The type of the event in the file.
  • <date>: The hour that events were queued in our system for processing, in the UTC time zone. Formatted YYYY-MM-DD-HH.
  • <schema-id>: An integer used to version .avro schemas for backward compatibility and schema evolution.
  • <zone>: For internal use by Braze.
  • <partition>: An integer, for internal use by Braze.
  • <offset>: An integer, for internal use by Braze. Note that different files written within the same hour will have different <offset> values.
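As a hypothetical convenience (not an official Braze utility), a Python sketch like the following parses an object key into the segments defined above; the regular expression simply mirrors the documented layout.

import re

# Mirrors the documented key layout; adjust if your prefix contains
# unusual characters.
KEY_RE = re.compile(
    r"(?P<prefix>.+)/dataexport\.(?P<cluster>[^.]+)\.(?P<conn>[^.]+)"
    r"\.integration\.(?P<integration_id>[^/]+)"
    r"/event_type=(?P<event_type>[^/]+)/date=(?P<date>[^/]+)"
    r"/(?P<schema_id>\d+)/(?P<zone>[^/]+)"
    r"/dataexport\..+\+(?P<partition>\d+)\+(?P<offset>\d+)\.avro$"
)

def parse_key(key):
    """Return the named segments of a Currents object key, or {} if no match."""
    match = KEY_RE.match(key)
    return match.groupdict() if match else {}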

Avro write threshold

Under normal circumstances, Braze will write data files to your storage bucket every 5 minutes or every 15,000 events, whichever comes first. Under heavy load, we may write larger data files with as many as 100,000 events within the same 5-minute period.

Avro schema changes

From time to time, Braze may change the Avro schema by adding, changing, or removing fields. For our purposes here, there are two types of changes: breaking and non-breaking. In all cases, the <schema-id> in the file path will be incremented to indicate the schema was updated.

Non-breaking changes

When a field is added to the Avro schema, we consider this a non-breaking change. Added fields will always be “optional” Avro fields (for example, a union with null and a default value of null), so they will “match” older schemas according to the Avro schema resolution spec. These additions should not affect existing ETL processes, as the new field will simply be ignored until it is added to your ETL process, as sketched below.
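As a hedged illustration using the fastavro library (the schemas and the new_field name here are hypothetical, not actual Currents schemas), data written with a newer schema that adds an optional field resolves cleanly against the older schema:

from io import BytesIO
from fastavro import writer, reader

old_schema = {
    "type": "record",
    "name": "Event",
    "fields": [{"name": "id", "type": "string"}],
}
new_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        # The added field is a union with null and defaults to null,
        # matching the non-breaking pattern described above.
        {"name": "new_field", "type": ["null", "string"], "default": None},
    ],
}

buf = BytesIO()
writer(buf, new_schema, [{"id": "abc", "new_field": "hello"}])
buf.seek(0)
# A reader using the older schema simply ignores the added field.
for record in reader(buf, reader_schema=old_schema):
    print(record)  # {'id': 'abc'}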

While we strive to give advance warning of all changes, we may make non-breaking changes to the schema at any time.

Breaking changes

When a field is removed from or changed in the Avro schema, we consider this a breaking change. Breaking changes may require modifications to existing ETL processes as fields that were in use may no longer be recorded as expected.

All breaking changes to the schema will be communicated in advance of the change.
