Best practices
Refer to these best practices when using Cloud Data Ingestion.
Understanding the UPDATED_AT column
UPDATED_AT is relevant for data warehouse integrations only, not for S3 syncs.
In your initial setup, you created the UPDATED_AT column in your table or view. Each time a sync runs, Braze looks for rows that have not previously been synced by using this column. Any rows where UPDATED_AT is later than the last UPDATED_AT timestamp from the previous successful sync run will be selected and pulled into Braze.
Braze CDI syncs rows strictly based on the UPDATED_AT value, regardless of whether the row content matches what's currently in Braze. For that reason, we recommend using UPDATED_AT carefully so that only new or updated data is synced, to avoid unnecessary data point usage.
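Conceptually, each sync run selects rows with a query like the following sketch (the actual query Braze runs is internal, and the bind variable name is illustrative):

SELECT UPDATED_AT, EXTERNAL_ID, PAYLOAD
FROM your_source_table
WHERE UPDATED_AT >= :last_synced_updated_at;  -- inclusive boundary; see the note on duplicate data below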
Example: Recurring sync
To illustrate how UPDATED_AT is used in a CDI sync, consider this example recurring sync for updating user attributes:
Recurring sync, first run on July 19, 2022 at 12 pm
| UPDATED_AT | EXTERNAL_ID | PAYLOAD |
|---|---|---|
| 2022-07-17 08:30:00 | customer_1234 | { "attribute_1":"abcdefg", "attribute_2": { "attribute_a":"example_value_1", "attribute_b":"example_value_1" }, "attribute_3":"2019-07-16T19:20:30+01:00" } |
| 2022-07-18 11:59:23 | customer_3456 | { "attribute_1":"abcdefg", "attribute_2":42, "attribute_3":"2019-07-16T19:20:30+01:00", "attribute_5":"testing" } |
| 2022-07-19 09:07:23 | customer_5678 | { "attribute_1":"abcdefg", "attribute_4":true, "attribute_5":"testing_123" } |
All rows are synced because this is the first run for the sync. The last UPDATED_AT for this sync is now set at 2022-07-19 09:07:23.
Recurring sync, second run on July 20, 2022 at 12 pm
| UPDATED_AT | EXTERNAL_ID | PAYLOAD |
|---|---|---|
| 2022-07-17 08:30:00 | customer_1234 | { "attribute_1":"abcdefg", "attribute_2": { "attribute_a":"example_value_2", "attribute_b":"example_value_2" }, "attribute_3":"2019-07-16T19:20:30+01:00" } |
| 2022-07-18 11:59:23 | customer_3456 | { "attribute_1":"abcdefg", "attribute_2":42, "attribute_3":"2019-07-16T19:20:30+01:00", "attribute_5":"testing" } |
| 2022-07-19 09:07:23 | customer_5678 | { "attribute_1":"abcdefg", "attribute_4":true, "attribute_5":"testing_123" } |
| 2022-07-16 00:25:30 | customer_9012 | { "attribute_1":"abcdefg", "attribute_4":false, "attribute_5":"testing_123" } |
A row was added, but the UPDATED_AT value is earlier than 2022-07-19 09:07:23 (stored from the first run). As a result, none of these rows will be synced in this run. The last UPDATED_AT for the sync is unchanged by this run, and remains as 2022-07-19 09:07:23.
Recurring sync, third run on July 21, 2022 at 12 pm
| UPDATED_AT | EXTERNAL_ID | PAYLOAD |
|---|---|---|
| 2022-07-17 08:30:00 | customer_1234 | { "attribute_1":"abcdefg", "attribute_2": { "attribute_a":"example_value_1", "attribute_b":"example_value_1" }, "attribute_3":"2019-07-16T19:20:30+01:00" } |
| 2022-07-18 11:59:23 | customer_3456 | { "attribute_1":"abcdefg", "attribute_2":42, "attribute_3":"2019-07-16T19:20:30+01:00", "attribute_5":"testing" } |
| 2022-07-19 09:07:23 | customer_5678 | { "attribute_1":"abcdefg", "attribute_4":true, "attribute_5":"testing_123" } |
| 2022-07-16 00:25:30 | customer_9012 | { "attribute_1":"xyz", "attribute_4":false, "attribute_5":"testing_123" } |
| 2022-07-21 08:30:00 | customer_1234 | { "attribute_1":"abcdefg", "attribute_2": { "attribute_a":"example_value_2", "attribute_b":"example_value_2" }, "attribute_3":"2019-07-20T19:20:30+01:00" } |
In this third run, another new row was added. Now, one row has an UPDATED_AT value later than 2022-07-19 09:07:23, which means only one row will sync. The last UPDATED_AT is now set as 2022-07-21 08:30:00.
UPDATED_AT values can even be later than the run start time for a given sync. However, this isn't recommended, because it pushes the last UPDATED_AT timestamp “into the future”, and subsequent syncs will skip rows with earlier UPDATED_AT values.
Use a UTC timestamp for the UPDATED_AT column
The UPDATED_AT column should be in UTC to prevent issues with daylight saving time. Prefer UTC-only functions, such as SYSDATE() instead of CURRENT_DATE(), whenever possible.
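As a minimal Snowflake sketch, SYSDATE() always returns the current timestamp in UTC, while CURRENT_TIMESTAMP() follows the session time zone:

SELECT
  SYSDATE() AS UPDATED_AT_UTC,         -- always UTC; unaffected by daylight saving changes
  CURRENT_TIMESTAMP() AS SESSION_TIME; -- shifts with the session time zone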
Make sure the UPDATED_AT time isn’t the same time as your sync
Your CDI sync might produce duplicate data if any UPDATED_AT values exactly match the last UPDATED_AT timestamp of the previous successful sync job. This is because CDI uses an inclusive boundary: rows timestamped exactly at the previous sync's boundary are selected again, so CDI re-ingests those rows and creates duplicate data.
Here are some suggestions to avoid duplicate data:
- If you're setting up a sync against a VIEW, don't use CURRENT_TIMESTAMP as the default value. This will cause all data to sync every time the sync runs, because the UPDATED_AT field will evaluate to the time our queries are run.
- If you have very long-running pipelines or queries writing data to your source table, avoid running these concurrently with a sync, or avoid using the same timestamp for every row inserted.
- Use a transaction to write all rows that have the same timestamp (see the sketch after this list).
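As a sketch of the last point, assuming a Snowflake source table named BRAZE_CDI_SOURCE (a hypothetical name), a single transactional INSERT ... SELECT stamps every row in one statement, so a sync can never observe a partially written batch:

BEGIN;
INSERT INTO BRAZE_CDI_SOURCE (UPDATED_AT, EXTERNAL_ID, PAYLOAD)
  SELECT
    SYSDATE(),  -- all rows inserted by this statement share one timestamp
    my_user_id,
    TO_JSON(OBJECT_CONSTRUCT('attribute_1', attribute_1))
  FROM EXAMPLE_USER_DATA;
COMMIT;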
Example: Managing subsequent updates
This example shows the general process for syncing data for the first time, then only updating changing data (deltas) in the subsequent updates. Let’s say we have a table EXAMPLE_DATA with some user data. On day 1, it has the following values:
| external_id | attribute_1 | attribute_2 | attribute_3 | attribute_4 |
|---|---|---|---|---|
| 12345 | 823 | blue | 380 | FALSE |
| 23456 | 28 | blue | 823 | TRUE |
| 34567 | 234 | blue | 384 | TRUE |
| 45678 | 245 | red | 349 | TRUE |
| 56789 | 1938 | red | 813 | FALSE |
To get this data into the format that CDI expects, you could run the following query:
SELECT
CURRENT_TIMESTAMP AS UPDATED_AT,
EXTERNAL_ID AS EXTERNAL_ID,
TO_JSON(
OBJECT_CONSTRUCT(
'attribute_1', attribute_1,
'attribute_2', attribute_2,
'attribute_3', attribute_3,
'attribute_4', attribute_4
)
) AS PAYLOAD
FROM EXAMPLE_DATA;
None of this has synced to Braze before, so add all of it to the source table for CDI:
| UPDATED_AT | EXTERNAL_ID | PAYLOAD |
|---|---|---|
| 2023-03-16 15:00:00 | 12345 | { "ATTRIBUTE_1": "823", "ATTRIBUTE_2": "blue", "ATTRIBUTE_3": "380", "ATTRIBUTE_4": "FALSE" } |
| 2023-03-16 15:00:00 | 23456 | { "ATTRIBUTE_1": "28", "ATTRIBUTE_2": "blue", "ATTRIBUTE_3": "823", "ATTRIBUTE_4": "TRUE" } |
| 2023-03-16 15:00:00 | 34567 | { "ATTRIBUTE_1": "234", "ATTRIBUTE_2": "blue", "ATTRIBUTE_3": "384", "ATTRIBUTE_4": "TRUE" } |
| 2023-03-16 15:00:00 | 45678 | { "ATTRIBUTE_1": "245", "ATTRIBUTE_2": "red", "ATTRIBUTE_3": "349", "ATTRIBUTE_4": "TRUE" } |
| 2023-03-16 15:00:00 | 56789 | { "ATTRIBUTE_1": "1938", "ATTRIBUTE_2": "red", "ATTRIBUTE_3": "813", "ATTRIBUTE_4": "FALSE" } |
A sync runs, and Braze records that you synced all available data up until 2023-03-16 15:00:00. Then, on the morning of day 2, an ETL job runs and some fields in your users table are updated:
| external_id | attribute_1 | attribute_2 | attribute_3 | attribute_4 |
|---|---|---|---|---|
| 12345 | 145 | red | 380 | TRUE |
| 23456 | 15 | blue | 823 | TRUE |
| 34567 | 234 | blue | 495 | FALSE |
| 45678 | 245 | green | 349 | TRUE |
| 56789 | 1938 | red | 693 | FALSE |
Now you need to add only the changed values to the CDI source table. Append these rows rather than updating the old ones. The table now looks like this:
| UPDATED_AT | EXTERNAL_ID | PAYLOAD |
|---|---|---|
| 2023-03-16 15:00:00 | 12345 | { "ATTRIBUTE_1": "823", "ATTRIBUTE_2": "blue", "ATTRIBUTE_3": "380", "ATTRIBUTE_4": "FALSE" } |
| 2023-03-16 15:00:00 | 23456 | { "ATTRIBUTE_1": "28", "ATTRIBUTE_2": "blue", "ATTRIBUTE_3": "823", "ATTRIBUTE_4": "TRUE" } |
| 2023-03-16 15:00:00 | 34567 | { "ATTRIBUTE_1": "234", "ATTRIBUTE_2": "blue", "ATTRIBUTE_3": "384", "ATTRIBUTE_4": "TRUE" } |
| 2023-03-16 15:00:00 | 45678 | { "ATTRIBUTE_1": "245", "ATTRIBUTE_2": "red", "ATTRIBUTE_3": "349", "ATTRIBUTE_4": "TRUE" } |
| 2023-03-16 15:00:00 | 56789 | { "ATTRIBUTE_1": "1938", "ATTRIBUTE_2": "red", "ATTRIBUTE_3": "813", "ATTRIBUTE_4": "FALSE" } |
| 2023-03-17 09:30:00 | 12345 | { "ATTRIBUTE_1": "145", "ATTRIBUTE_2": "red", "ATTRIBUTE_4": "TRUE" } |
| 2023-03-17 09:30:00 | 23456 | { "ATTRIBUTE_1": "15" } |
| 2023-03-17 09:30:00 | 34567 | { "ATTRIBUTE_3": "495", "ATTRIBUTE_4": "FALSE" } |
| 2023-03-17 09:30:00 | 45678 | { "ATTRIBUTE_2": "green" } |
| 2023-03-17 09:30:00 | 56789 | { "ATTRIBUTE_3": "693" } |
CDI will only sync the new rows, so the next sync that runs will only sync the last five rows.
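One way to produce only the changed rows is a set difference against a snapshot of the previously synced values. Here is a Snowflake sketch, assuming a hypothetical snapshot table EXAMPLE_DATA_PREVIOUS and a source table BRAZE_CDI_SOURCE (both names are illustrative). Note that this row-level approach re-sends every attribute of a changed user; the tables above go further by including only the changed columns:

INSERT INTO BRAZE_CDI_SOURCE (UPDATED_AT, EXTERNAL_ID, PAYLOAD)
SELECT
  CURRENT_TIMESTAMP,
  external_id,
  TO_JSON(OBJECT_CONSTRUCT(
    'attribute_1', attribute_1,
    'attribute_2', attribute_2,
    'attribute_3', attribute_3,
    'attribute_4', attribute_4))
FROM (
  SELECT * FROM EXAMPLE_DATA
  EXCEPT
  SELECT * FROM EXAMPLE_DATA_PREVIOUS  -- rows that are new or changed since the snapshot
) AS changed_rows;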
Additional tips
Only write new or updated attributes to minimize consumption
Each time a sync runs, Braze looks for rows that have not previously been synced. We check this using the UPDATED_AT column in your table or view. Any rows where UPDATED_AT is equal to or later than the last UPDATED_AT timestamp from the last successful sync job will be selected and pulled into Braze, regardless of whether they are the same as what’s currently on the user profile. Given that, we recommend only syncing attributes you want to add or update.
Data point usage with CDI is identical to that of other ingestion methods, like REST APIs or SDKs, so it's up to you to make sure you're only adding new or updated attributes to your source tables.
Separate EXTERNAL_ID from PAYLOAD column
The PAYLOAD object should not include an external ID or other ID type.
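For example (with illustrative values), the ID belongs in its own column, not in the payload:

Correct: EXTERNAL_ID = customer_1234, PAYLOAD = { "attribute_1": "abcdefg" }
Incorrect: PAYLOAD = { "external_id": "customer_1234", "attribute_1": "abcdefg" }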
Remove an attribute
To omit an attribute from a user's profile, set it to null. If you want an attribute to remain unchanged, don't send it to Braze until it's been updated. To completely remove an attribute, use TO_JSON(OBJECT_CONSTRUCT_KEEP_NULL(...)).
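A minimal Snowflake sketch: OBJECT_CONSTRUCT silently drops null values, so use OBJECT_CONSTRUCT_KEEP_NULL to send an explicit null for the attribute you want removed:

SELECT
  CURRENT_TIMESTAMP AS UPDATED_AT,
  my_user_id AS EXTERNAL_ID,
  -- produces { "attribute_2": null }, which removes attribute_2 from the profile
  TO_JSON(OBJECT_CONSTRUCT_KEEP_NULL('attribute_2', NULL)) AS PAYLOAD
FROM EXAMPLE_USER_DATA;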
Make incremental updates
Make incremental updates to your data to prevent unintentional overwrites when simultaneous updates are made.
In the following example, a user has two attributes:
- Color: “Green”
- Size: “Large”
Then Braze receives the following two updates to that user simultaneously:
- Request 1: Change color to “Red”
- Request 2: Change size to “Medium”
Because Request 1 occurs first, the user’s attributes are updated to the following:
- Color: “Red”
- Size: “Large”
However, when Request 2 occurs, Braze starts with the original attribute values (“Green” and “Large”), then updates the user’s attributes to the following:
- Color: “Green”
- Size: “Medium”
When the requests are finished, Request 2 will have overwritten the update from Request 1, so it's best to stagger your updates to prevent them from overwriting each other.
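In payload terms, an incremental update means each row carries only the attributes that changed, so simultaneous requests can't clobber each other. For the example above, the two updates would be sent as:

Request 1 payload: { "color": "Red" }
Request 2 payload: { "size": "Medium" }

Because neither payload includes the other attribute, both changes survive regardless of ordering.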
Create a JSON string from another table
If you prefer to store each attribute in its own column internally, you need to convert those columns to a JSON string to populate the sync to Braze. To do that, you can use a query like the following, depending on your data warehouse.
In Snowflake:
CREATE TABLE "EXAMPLE_USER_DATA"
(attribute_1 string,
attribute_2 string,
attribute_3 number,
my_user_id string);
SELECT
CURRENT_TIMESTAMP as UPDATED_AT,
my_user_id as EXTERNAL_ID,
TO_JSON(
OBJECT_CONSTRUCT (
'attribute_1',
attribute_1,
'attribute_2',
attribute_2,
'yet_another_attribute',
attribute_3)
) as PAYLOAD FROM "EXAMPLE_USER_DATA";
In Amazon Redshift:
CREATE TABLE "EXAMPLE_USER_DATA"
(attribute_1 string,
attribute_2 string,
attribute_3 number,
my_user_id string);
SELECT
CURRENT_TIMESTAMP as UPDATED_AT,
my_user_id as EXTERNAL_ID,
JSON_SERIALIZE(
OBJECT (
'attribute_1',
attribute_1,
'attribute_2',
attribute_2,
'yet_another_attribute',
attribute_3)
) as PAYLOAD FROM "EXAMPLE_USER_DATA";
In BigQuery or Databricks, both of which support TO_JSON with STRUCT:
CREATE OR REPLACE TABLE BRAZE.EXAMPLE_USER_DATA (attribute_1 string,
attribute_2 STRING,
attribute_3 NUMERIC,
my_user_id STRING);
SELECT
CURRENT_TIMESTAMP as UPDATED_AT,
my_user_id as EXTERNAL_ID,
TO_JSON(
STRUCT(
    attribute_1 AS attribute_1,
    attribute_2 AS attribute_2,
    attribute_3 AS yet_another_attribute
)
) as PAYLOAD
FROM BRAZE.EXAMPLE_USER_DATA;
Or, keeping the original column names as the JSON keys:
CREATE OR REPLACE TABLE BRAZE.EXAMPLE_USER_DATA (
attribute_1 string,
attribute_2 STRING,
attribute_3 NUMERIC,
my_user_id STRING
);
SELECT
CURRENT_TIMESTAMP as UPDATED_AT,
my_user_id as EXTERNAL_ID,
TO_JSON(
STRUCT(
attribute_1,
attribute_2,
attribute_3
)
) as PAYLOAD
FROM BRAZE.EXAMPLE_USER_DATA;
In Microsoft SQL Server (T-SQL):
CREATE TABLE [braze].[users] (
attribute_1 VARCHAR,
attribute_2 VARCHAR,
attribute_3 VARCHAR,
attribute_4 VARCHAR,
user_id VARCHAR
)
GO
CREATE VIEW [braze].[user_update_example]
AS SELECT
user_id as EXTERNAL_ID,
CURRENT_TIMESTAMP as UPDATED_AT,
JSON_OBJECT('attribute_1':attribute_1, 'attribute_2':attribute_2, 'attribute_3':attribute_3, 'attribute_4':attribute_4) as PAYLOAD
FROM [braze].[users];
Use the UPDATED_AT timestamp
We use the UPDATED_AT timestamp to track what data has been synced successfully to Braze. If many rows are written with the same timestamp while a sync is running, this may lead to duplicate data being synced to Braze. Some suggestions to avoid duplicate data:
- If you're setting up a sync against a VIEW, don't use CURRENT_TIMESTAMP as the default value. This will cause all data to sync every time the sync runs, because the UPDATED_AT field will evaluate to the time our queries are run.
- If you have very long-running pipelines or queries writing data to your source table, avoid running these concurrently with a sync, or avoid using the same timestamp for every row inserted.
- Use a transaction to write all rows that have the same timestamp.
Table configuration
We have a public GitHub repository for customers to share best practices or code snippets. To contribute your own snippets, create a pull request!
Data formatting
Any operations that are possible through the Braze /users/track endpoint are supported through Cloud Data Ingestion, including updating nested custom attributes, adding subscription status, and syncing custom events or purchases.
Fields within the payload should follow the same format as the corresponding /users/track endpoint. For detailed formatting requirements, refer to the following:
| Data type | Formatting specifications |
|---|---|
| attributes | See user attributes object |
| events | See events object |
| purchases | See purchases object |
Note the special requirement for capturing dates in nested attributes.
You may include nested custom attributes in the payload column for a custom attributes sync.
{
"most_played_song": {
"song_name": "Solea",
"artist_name": "Miles Davis",
"album_name": "Sketches of Spain",
"genre": "Jazz",
"play_analytics": {
"count": 1000,
"top_10_listeners": true
}
}
}
To sync events, an event name is required. The time field should be formatted as an ISO 8601 string or in yyyy-MM-dd'T'HH:mm:ss:SSSZ format. If the time field is not present, the UPDATED_AT column value is used as the event time. Other fields, including app_id and properties, are optional.
Note that you can only sync one event per row.
{
"app_id" : "your-app-id",
"name" : "rented_movie",
"time" : "2013-07-16T19:20:45+01:00",
"properties": {
"movie": "The Sad Egg",
"director": "Dan Alexander"
}
}
To sync purchase events, product_id, currency, and price are required. The time field, which is optional, should be formatted as an ISO 8601 string or in yyyy-MM-dd'T'HH:mm:ss:SSSZ format. If the time field is not present, the UPDATED_AT column value is used as the event time. Other fields, including app_id, quantity, and properties, are optional.
Note that you can only sync one purchase event per row.
{
"app_id" : "11ae5b4b-2445-4440-a04f-bf537764c9ad",
"product_id" : "Completed Order",
"currency" : "USD",
"price" : 219.98,
"time" : "2013-07-16T19:20:30+01:00",
"properties" : {
"products" : [ { "name": "Monitor", "category": "Gaming", "product_amount": 19.99, },
{ "name": "Gaming Keyboard", "category": "Gaming ", "product_amount": 199.99, }
]
}
}
To sync subscription statuses, include a subscription_groups array in the payload. You can update multiple subscription groups in a single row:
{
"subscription_groups" : [
{
"subscription_group_id": "subscription_group_identifier_1",
"subscription_state": "unsubscribed"
},
{
"subscription_group_id": "subscription_group_identifier_2",
"subscription_state": "subscribed"
},
{
"subscription_group_id": "subscription_group_identifier_3",
"subscription_state": "subscribed"
}
]
}
Avoid timeouts for data warehouse queries
We recommend that queries be completed within one hour for optimal performance and to avoid potential errors. If queries exceed this timeframe, consider reviewing your data warehouse configuration. Optimizing resources allocated to your warehouse can help improve query execution speed.
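If you use Snowflake, one way to spot long-running CDI queries is the ACCOUNT_USAGE.QUERY_HISTORY view (a sketch; adjust the filtering to your environment):

SELECT query_text, total_elapsed_time / 1000 AS elapsed_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE total_elapsed_time > 3600000  -- longer than one hour, in milliseconds
ORDER BY start_time DESC;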