
Supply Chain & Retail Solutions user guide
Data ingestion
Peak's Data Sources feature lets you ingest data from external sources into your data warehouse using connectors and feeds. Once configured, feeds run on a schedule, on demand, or via webhook.
To access Data Sources, open Manage and select Data Sources.
Peak supports four data ingestion paths:
- File storage ingestion — ingest data from files via drag and drop, FTP/SFTP upload, or signed URL.
- Application connectors — pull data from online platforms such as Google Ads, Amazon S3, and Braze.
- Database connectors — connect directly to a database such as Redshift, Snowflake, PostgreSQL, MSSQL, MySQL, or Oracle.
- Ingestion API — push data into Peak programmatically.
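As a rough illustration of the Ingestion API path, the sketch below builds a push request without sending it. The endpoint URL, payload shape, and auth header are placeholders, not Peak's actual API — your tenant settings provide the real values.

```python
import json
import urllib.request

# Hypothetical values: the real endpoint URL and API key come from your
# Peak tenant; this sketch only shows the general shape of a push request.
INGESTION_URL = "https://example.invalid/ingestion/feeds/orders"
API_KEY = "YOUR_TENANT_API_KEY"

records = [{"order_id": 1, "amount": 42.5}]
request = urllib.request.Request(
    INGESTION_URL,
    data=json.dumps(records).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": API_KEY},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# stays runnable without a live endpoint.
```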
Data feeds
A data feed connects to a source, copies its data, and ingests the copy into your Peak data warehouse. Feeds can run automatically using a trigger or manually on demand.
Each feed is configured through four stages:
| Stage | Description |
|---|---|
| Connection | Source location and credentials (for example, hostname, username, password). |
| Import configuration | Specific tables or data to ingest, load type, and key configuration. See Load types. |
| Destination | Target storage in the data warehouse or S3. See Destination options. |
| Trigger | When and how the feed runs. See Triggers and watchers. |
Managing feeds
The Feeds screen shows the status, run history, and next scheduled run for each feed. Hover over a feed to access the following actions:
| Action | Description |
|---|---|
| Run | Runs the feed immediately. |
| Pause | Pauses the feed schedule. |
| Tags | Manages tags for the feed. Only alphanumeric characters are allowed. Use Tab or Enter to separate values. |
| Edit | Opens the feed configuration for editing. |
| Resume | Resumes the feed schedule after a pause. |
Filtering feeds
Use the filter function to find feeds when you have a large number. The following filters are available:
- Feed status: Active, Paused
- Trigger type: Schedule, Webhook, Run Once (Manual), Run Once (Schedule)
- Last run status: Running, Failed, Success, No new data
- Tags: Custom tags applied to your feeds
Monitoring feed activity
Select a feed to open its detail view. Two tabs are available:
Logs tab
Shows how many rows were successfully loaded into the data warehouse, and detailed logs for each feed run. Select the Browse file icon on a log entry to open Files and view the files associated with that feed run.
If a feed run fails, error details are shown in the detailed log. To get details of individual failed records, download the STL load error files using the download icon next to the error details.
Info tab
Shows basic configuration information about the feed.
Load types
Load types control how data is written to your destination table when a feed runs. You select a load type during import configuration.
Load type summary
| Load type | Primary key required | Last run key required | Behavior |
|---|---|---|---|
| Truncate and insert | Optional | Not available | Replaces the destination table with each run. |
| Incremental | Optional | Required | Appends new records based on the last run key. |
| Upsert | Required | Optional | Updates existing records and inserts new records. |
Key configuration
Two key columns can be configured when selecting a load type:
- Primary key: The column or set of columns that uniquely identifies each record. Used to determine which records to update.
- Last run key: A column used to detect new or changed data since the previous run. This can be a timestamp (which detects both new and modified rows) or a strictly increasing value such as an auto-incrementing ID (which detects new rows only). Last run key configuration is only available for database connectors.
Truncate and insert
Best suited for small data tables.
When this load type runs, the entire destination table is replaced with the data retrieved from the source.
- Primary key: Optional. Because the full table is replaced each run, there is no need to identify specific records.
- Last run key: Not available. The full dataset is always fetched.
Incremental
Best suited for event-type datasets where records are added over time but not modified.
With incremental feeds, records are only inserted — existing records are never updated. At each run, only records added since the last run are fetched, based on the column value specified in the last run key.
- Primary key: Optional. Only new records are inserted into the existing table, so record-level identification is not required.
- Last run key: Required. Determines the cutoff point for fetching new records. For example, if the last run key is a date column, only records with a value greater than the date of the last run are fetched.
Upsert
Best suited for transactional data where records can be created or modified over time.
With upsert feeds, new records are inserted and existing records are updated. In some cases, the full dataset is fetched to ensure all updates are captured.
- Primary key: Required. Used to identify which existing records need to be updated when new data arrives.
- Last run key: Optional. If you need to capture updates to existing records across the full dataset, omit the last run key so that all records are fetched on each run.
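To make the contrast between the three load types concrete, here is a minimal in-memory sketch (illustrative Python, not Peak's implementation; table rows are modeled as dicts):

```python
def truncate_and_insert(dest, source):
    """Replace the destination table with the fetched source data."""
    return [dict(row) for row in source]

def incremental(dest, source, last_run_key, last_value):
    """Append only source rows newer than the stored last-run-key value."""
    new_rows = [dict(r) for r in source if r[last_run_key] > last_value]
    return dest + new_rows

def upsert(dest, source, primary_key):
    """Update rows that share a primary key; insert the rest."""
    merged = {row[primary_key]: dict(row) for row in dest}
    for row in source:
        merged[row[primary_key]] = dict(row)
    return list(merged.values())

dest = [{"id": 1, "status": "new"}]
source = [{"id": 1, "status": "shipped"}, {"id": 2, "status": "new"}]
print(upsert(dest, source, "id"))
# id 1 is updated to "shipped"; id 2 is inserted
```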
Destination options
When creating a data feed, you select a destination where Peak ingests your data. The available destinations depend on the data warehouse configured for your Peak organization.
Checking your data warehouse type
To see which warehouse type your organization uses, open Manage and select Data Bridge. In the Data warehouse section, check the storage type shown.
For more information about data warehouse configuration, see Data Bridge overview.
Available destinations by warehouse
| Data warehouse | Available destinations |
|---|---|
| Snowflake | S3 (Spark processing) or Snowflake — not both |
| Redshift | S3 (Spark processing), Redshift, or both |
S3 (Spark processing)
Stores data in Amazon S3. Peak uses Apache Spark to process large, unstructured CSV datasets.
- For Snowflake organizations: data is processed as an external table during ingestion and is available to query in SQL Explorer.
- For Redshift organizations: data is processed as Redshift Spectrum during ingestion and is available to query in SQL Explorer. This destination requires an active Glue Catalog. If the option is unavailable, configure the Glue Catalog first.
Snowflake
Data is ingested into the Snowflake data warehouse. Snowflake is SQL-based and supports querying and analyzing data using standard SQL.
After ingestion, Peak adds these audit columns to the destination table:
| Audit column | Description |
|---|---|
| PEAKAUDITCREATEDAT | Time when the record was created in the table. |
| PEAKAUDITFILENAME | Path of the raw file in the data lake that contains the record. |
| PEAKAUDITREQUESTID | Feed run identifier. |
| PEAKAUDITUPDATECOUNTER | Number of times the record has been updated (upsert load type only). |
| PEAKAUDITUPDATEDAT | Date when the record was last updated (upsert load type only). |
Redshift
Data is ingested into Amazon Redshift. Redshift is a relational database that supports SQL querying and is optimized for aggregations on large datasets. Incoming data must map exactly to the destination table schema, column by column. Any failed rows are flagged and written to a separate table.
After ingestion, Peak adds these audit columns to the destination table:
| Audit column | Description |
|---|---|
| peakauditcreatedat | Time when the record was created in the table. |
| peakauditrequestid | Feed run identifier. |
| peakauditupdatecounter | Number of times the record has been updated (upsert load type only). |
| peakauditupdatedat | Date when the record was last updated (upsert load type only). |
Failed row threshold
The failed row threshold defines how many rows can fail before Peak marks the feed as failed. Set a threshold that reflects your total row volume and acceptable data quality tolerance.
Behavior varies by load type:
- Incremental and truncate and insert: the threshold is the aggregate error count across all files in a feed run.
- Upsert: the threshold is applied per file. If some files exceed the threshold and others do not, the feed is marked as Partially ingested.
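These threshold rules can be sketched as a small decision function (illustrative only; the case where every file exceeds the threshold is assumed to mark the feed as failed):

```python
def run_status(errors_per_file, threshold, load_type):
    """Map per-file failed-row counts to a feed run status."""
    if load_type in ("incremental", "truncate_and_insert"):
        # The threshold is the aggregate error count across all files.
        return "failed" if sum(errors_per_file) > threshold else "success"
    # Upsert: the threshold applies to each file individually.
    over = [count > threshold for count in errors_per_file]
    if all(over):
        return "failed"            # assumption: all files over = failed
    if any(over):
        return "partially_ingested"
    return "success"

print(run_status([5, 1500], threshold=1000, load_type="upsert"))
# one file over the threshold -> "partially_ingested"
```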
Additional rules:
- If all rows in a file have corrupted data, the feed is marked as failed regardless of the threshold.
- Error messages for failed rows can be downloaded from feed logs.
- Only positive integers are accepted as threshold values.
Threshold defaults and limits by destination:
| Property | Snowflake | Redshift |
|---|---|---|
| Default threshold | 0 | 1,000 |
| Maximum threshold | 100,000 | 100,000 |
| Editable in Database Connector | No (fixed at 0) | Yes |
| Available in REST API, Webhook, Braze Currents connectors | No | No |
Schema evolution
Peak handles schema changes automatically when a feed run detects added or removed columns.
For CSV files, all files within a single feed run must use the same schema. Inconsistent schemas across files in the same feed run cause ingestion failure. For NDJSON files, different files can have different schemas, provided each individual file has a consistent unified schema.
| Change | Behavior |
|---|---|
| New column added | Column is added to the destination table with data type string. Previous records are set to NULL for that column. |
| Existing column removed | Column is retained in the destination table. New records are set to NULL for that column. |
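A minimal sketch of these two rules (illustrative, not Peak's code): a new column is added and backfilled with NULL for earlier records, while a column missing from new data is retained and filled with NULL for new records.

```python
def evolve_and_load(table, incoming_rows):
    """Apply the schema-evolution rules to an in-memory 'table'.

    table is {"columns": [...], "rows": [dict, ...]}.
    """
    incoming_cols = list(incoming_rows[0]) if incoming_rows else []
    # New column added: append it and set previous records to NULL (None).
    for col in incoming_cols:
        if col not in table["columns"]:
            table["columns"].append(col)
            for row in table["rows"]:
                row[col] = None
    # Existing column removed: it stays; new records get NULL for it.
    for row in incoming_rows:
        table["rows"].append({c: row.get(c) for c in table["columns"]})
    return table

t = {"columns": ["id", "name"], "rows": [{"id": 1, "name": "a"}]}
evolve_and_load(t, [{"id": 2, "email": "b@x.com"}])
print(t["columns"])   # ['id', 'name', 'email']
```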
Schema data types
When configuring a destination, you can override the inferred data type for each column. This is available for all connectors except Webhook and Braze Currents.
Supported data types: STRING, INTEGER, NUMERIC, TIMESTAMP, DATE, BOOLEAN, JSON.
TIMESTAMPTZ is not supported. Data in this format is ingested as STRING.
Triggers and watchers
Triggers define when a data feed runs. Watchers send notifications when feed events occur.
Trigger types
| Trigger type | Purpose | Notes |
|---|---|---|
| Schedule | Runs feeds on a basic or cron schedule. | Cron expressions use 6 or 7 fields. |
| Webhook | Runs feeds when external systems send events. | Requires a webhook URL and API key. |
| Run once | Runs a feed once on demand or at a set time. | Manual runs are available from the Feeds list. |
Schedule trigger
Basic schedules run a feed at a specified time and day, or at a recurring frequency (for example, every 2 hours or every Monday and Tuesday at 12:00).
Advanced schedules use cron expressions for more precise timing.
Cron format
A cron expression is a string of 6 or 7 space-separated fields:
| Field | Mandatory | Allowed values | Allowed special characters |
|---|---|---|---|
| Seconds | Yes | 0–59 | , - * / |
| Minutes | Yes | 0–59 | , - * / |
| Hours | Yes | 0–23 | , - * / |
| Day of month | Yes | 1–31 | , - * ? / L W |
| Month | Yes | 1–12 or JAN–DEC | , - * / |
| Day of week | Yes | 1–7 or SUN–SAT | , - * ? / L # |
| Year | No | empty, 1970–2099 | , - * / |
Examples:
- 0 0 12 * * ? runs every day at 12:00.
- 0 15 10 * * ? 2021 runs at 10:15 every day during 2021.
- 0 15 10 ? * 6L runs at 10:15 on the last Friday of each month.
To validate your cron expressions before use, consider using a syntax checker such as crontab.guru.
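As a quick sanity check before reaching for an external tool, the field count alone catches many mistakes. This is a minimal sketch; it does not validate value ranges or special characters:

```python
def plausible_cron(expr: str) -> bool:
    """A 6- or 7-field expression has the right shape; anything else doesn't."""
    return len(expr.split()) in (6, 7)

print(plausible_cron("0 0 12 * * ?"))        # True: the daily-at-noon example
print(plausible_cron("0 15 10 * * ? 2021"))  # True: 7 fields including a year
print(plausible_cron("0 12 * * ?"))          # False: only 5 fields
```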
Webhook trigger
Unlike regular APIs that require constant polling for updates, webhooks only send data when a specific event occurs — in this case, when new data is available for the feed.
For webhook triggers, Peak generates a unique webhook URL for the feed. Copy the URL into the external system that sends events. You can regenerate the URL if needed. If the external system is outside Peak, provide your tenant API key so the webhook can be authenticated. See API keys.
Run once trigger
Run once triggers can be manual or scheduled:
- Manual: run the feed from the Feeds list when needed.
- Date and time: run the feed once at a specified time (at least 30 minutes from now).
Watchers
Configure watchers to send notifications for feed events:
| Watcher type | Purpose | Example use case |
|---|---|---|
| User watcher | Notifies Peak users in the platform. | Alert a data team when a feed fails. |
| Webhook watcher | Sends events to external systems. | Trigger a Slack notification or workflow. |
To add a watcher to a feed:
- In the Trigger stage, select Add watcher.
- Choose User watcher or Webhook watcher.
- Select the feed events to monitor and save the watcher.
User watcher
User watchers notify selected users inside Peak when a feed event occurs. Users can view notifications from the bell icon.
When configuring a user watcher, you can choose to watch all events or select a custom set.
User watchers can be configured for these events:
- Create
- Run fail
- Run success
- No new data
- Feed edit or delete
Webhook watcher
Webhook watchers send a notification or trigger an action in an external system or Peak feature when a feed event occurs. Examples include Slack notifications or Peak Workflows.
The webhook URL is provided by the target application. If the target is a Peak Workflow, copy the URL from the Workflow's Trigger step.
The JSON payload is optional and can include these parameters to pass feed context:
- {tenantname}
- {jobtype}
- {jobname}
- {trigger}
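For illustration, a notification message built from these parameters might look like the following. The context values here are hypothetical; in practice, Peak substitutes the real feed context when the event fires.

```python
import json

# Hypothetical context; Peak fills these in from the actual feed event.
context = {
    "tenantname": "acme",
    "jobtype": "feed",
    "jobname": "daily_orders",
    "trigger": "schedule",
}
message = "Feed {jobname} ({jobtype}) in {tenantname} ran via {trigger}".format(**context)
payload = json.dumps({"text": message})
print(payload)
```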
Webhook watchers can be configured for these events:
- Run fail
- Run success
- Running longer than a specified time
- No new data
File naming and timestamps
Peak uses file names to determine how files are grouped into feeds. If a file is updated and fetched regularly as part of the same feed, the file must retain the same base name with a new timestamp appended on each update.
File names must include at least one alphanumeric character.
File naming patterns
Use one of these patterns when naming files for feeds:
- <file_name>_<s|n|part><number>.<extension> (for example, Abcs_s12345.csv or ABC_part123323.csv)
- <file_name>_<valid_date>.<extension> (for example, Abc_20131101.csv or Abc_20131101123432.csv)
How Peak parses file names
Peak applies these parsing rules to extract the feed name from each file:
| File name pattern | Example | Parsed name |
|---|---|---|
| name_timestamp.csv | customer_20181112.csv | customer |
| name_part_timestamp.csv | customer_part123_20181211.csv | customer_part123 |
| name_timestampanytext.csv | customer_20181112anytext.csv | customer_20181112anytext |
| No timestamp | customer_profile.csv | customer_profile |
| No underscore before timestamp | customer20120312.csv | customer |
| Special symbols | customer-company:20130817.csv | customercompany |
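The parsing rules above can be approximated with a short sketch (illustrative only; Peak's actual parser may differ in edge cases):

```python
import re

# Digit runs matching the valid timestamp lengths (8, 10, 12, or 14-17 digits).
TIMESTAMP = r"(?:\d{14}\d{0,3}|\d{12}|\d{10}|\d{8})"

def feed_name(filename: str) -> str:
    stem = filename.rsplit(".", 1)[0]
    stem = re.sub(r"[^A-Za-z0-9_]", "", stem)      # drop special symbols
    return re.sub(rf"_?{TIMESTAMP}$", "", stem)    # strip a trailing timestamp

print(feed_name("customer_20181112.csv"))          # customer
print(feed_name("customer_part123_20181211.csv"))  # customer_part123
print(feed_name("customer_20181112anytext.csv"))   # customer_20181112anytext
print(feed_name("customer-company:20130817.csv"))  # customercompany
```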
Timestamp formats
Peak recognizes these valid timestamp formats in file names:
- YYYYMMDD, YYYYMMDDHH, YYYYMMDDHHmm, YYYYMMDDHHmmss, YYYYMMDDHHmmssS, YYYYMMDDHHmmssSS, YYYYMMDDHHmmssSSS
- DDMMYYYY, DDMMYYYYHH, DDMMYYYYHHmm, DDMMYYYYHHmmss, DDMMYYYYHHmmssS, DDMMYYYYHHmmssSS, DDMMYYYYHHmmssSSS
Required schemas
If you upload data to a managed Snowflake data warehouse, Peak expects schemas for organizing your data. Schemas are sub-areas of a data warehouse used to group tables by their purpose and processing stage.
Tables do not strictly need to be placed in a particular schema, but using the recommended schemas keeps your data well organized.
Recommended schemas
The following schemas organize your data by processing stage:
| Schema | Purpose |
|---|---|
| STAGE | Raw data. Data ingested via Manage > Data Sources lands here by default. |
| TRANSFORM | Aggregated data ready for modeling. |
| PUBLISH | Processed and cleaned data for use in dashboards and web apps. |
| SANDPIT | Experimental or ad hoc data not ready for modeling or use in apps. |