Understanding SAP Commerce Cloud Alerting And Thre...

To keep your environments stable, SAP Commerce Cloud uses specific alert thresholds. These alerts help identify performance issues, infrastructure failures, and shifts in user experience.

Here is a breakdown of how these monitoring rules work based on official documentation.

⚙️Performance Alerts and Baselines

Performance thresholds are dynamic. They are calculated using automated baselining based on a rolling window of the last seven days of historical data.

Note: Every service can have different thresholds, and they change over time. Customers cannot modify these thresholds.
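
To make the rolling-window idea concrete, here is a minimal sketch of a seven-day baseline. The source does not describe the actual baselining algorithm, so the use of a median, the class name `RollingBaseline`, and its methods are all assumptions made for illustration only.

```python
from collections import deque
from statistics import median

SEVEN_DAYS_S = 7 * 24 * 3600  # rolling window length, in seconds


class RollingBaseline:
    """Hypothetical helper: keeps only samples from the last seven days."""

    def __init__(self, window_s=SEVEN_DAYS_S):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, response_ms), oldest first

    def add(self, timestamp_s, response_ms):
        """Record a sample and drop everything older than the window."""
        self.samples.append((timestamp_s, response_ms))
        cutoff = timestamp_s - self.window_s
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def median_ms(self):
        """Assumed baseline statistic: median of the retained samples."""
        return median(ms for _, ms in self.samples)
```

Because old samples fall out of the window as new ones arrive, the baseline (and therefore any threshold derived from it) drifts over time, which matches the note above that thresholds change.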

Service Alerts (Frontend, Backend, Background, and External)

The system monitors response times and failure rates for all service tiers, including:

Frontend, e.g., www.xxx.com:443
Backend, e.g., xxxstorefront, cmswebservices
Background, e.g., cron jobs, TaskEngine
External services, e.g., “Requests to public networks”, “Requests to unmonitored hosts”

Response time degradation is defined as follows:

Property	Setting
Detection mode	Automatic
All requests	Absolute threshold: 500 ms; relative threshold: 100%
Slowest 10%	Absolute threshold: 1000 ms; relative threshold: 200%
Avoid over-alerting	Only alert if there are at least 10 requests per minute and the abnormal state remains for at least 1 minute

Failure rate increase is defined as follows:

Property	Setting
Detection mode	Automatic
Absolute threshold	10%
Relative threshold	100%
Avoid over-alerting	Only alert if there are at least 10 requests per minute and the abnormal state remains for at least 1 minute
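
The millisecond thresholds in the first service table above can be read as a two-part test plus two noise guards. The sketch below assumes that an alert requires the increase over the baseline to exceed both the absolute and the relative threshold at the same time; the source does not state how the two are combined, and the names `DegradationRule` and `is_degraded` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegradationRule:
    """Defaults taken from the 'All requests' row of the service table."""
    absolute_ms: float = 500.0      # minimum increase over baseline, in ms
    relative_pct: float = 100.0     # minimum increase over baseline, in percent
    min_requests_per_min: int = 10  # over-alerting guard: minimum load
    min_abnormal_minutes: int = 1   # over-alerting guard: minimum duration


def is_degraded(observed_ms, baseline_ms, requests_per_min,
                abnormal_minutes, rule=DegradationRule()):
    """Return True if the assumed alert condition holds."""
    if requests_per_min < rule.min_requests_per_min:
        return False
    if abnormal_minutes < rule.min_abnormal_minutes:
        return False
    increase_ms = observed_ms - baseline_ms
    # Assumption: both margins must be exceeded at the same time.
    exceeds_absolute = increase_ms > rule.absolute_ms
    exceeds_relative = increase_ms > baseline_ms * rule.relative_pct / 100.0
    return exceeds_absolute and exceeds_relative
```

Under these assumptions, a service with a 600 ms baseline that slows to 1300 ms (an increase of 700 ms, about 117%) would alert, while one that slows from 200 ms to 600 ms would not, because the 400 ms increase stays under the absolute threshold even though it is a 200% jump.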

Dynamic threshold rules – Database

The scope of these database alerts (e.g., xxx-xxx-p1-db) is as follows:

Response time degradation
Failure rate increase
Failed database connects

The settings for database response time degradation are defined as follows:

Property	Setting
Detection mode	Automatic
All requests	Absolute threshold: 5 ms; relative threshold: 50%
Slowest 10%	Absolute threshold: 20 ms; relative threshold: 100%
Avoid over-alerting	Only alert if there are at least 10 requests per minute and the abnormal state remains for at least 1 minute

The settings for database failure rate increase are defined as follows:

Property	Setting
Detection mode	Automatic
Absolute threshold	5%
Relative threshold	50%
Avoid over-alerting	Only alert if there are at least 10 requests per minute and the abnormal state remains for at least 1 minute
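
To show how much tighter the database failure-rate settings are than the service ones, here is a small comparison sketch. It assumes the absolute threshold is measured in percentage points over the baseline and that both thresholds must be exceeded together; neither detail is confirmed by the source. The function and dictionary names are hypothetical, and the over-alerting guards are left out for brevity.

```python
def failure_rate_increased(observed_pct, baseline_pct,
                           absolute_pts, relative_pct):
    """Return True if the failure rate rose past both assumed margins."""
    increase = observed_pct - baseline_pct
    exceeds_absolute = increase > absolute_pts
    exceeds_relative = increase > baseline_pct * relative_pct / 100.0
    return exceeds_absolute and exceeds_relative


# Threshold values copied from the service and database tables.
SERVICE = {"absolute_pts": 10.0, "relative_pct": 100.0}
DATABASE = {"absolute_pts": 5.0, "relative_pct": 50.0}

# The same jump from a 4% to an 11% failure rate, judged by both rule sets.
service_alert = failure_rate_increased(11.0, 4.0, **SERVICE)    # False
database_alert = failure_rate_increased(11.0, 4.0, **DATABASE)  # True
```

The 7-point increase stays under the 10-point service threshold but clears the 5-point database threshold, so under these assumptions only the database rule would fire.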

External Services Monitoring

By default, third-party requests are grouped by domain/host/IP under “Requests to public networks” or “Requests to unmonitored hosts.”

To improve monitoring of critical third-party services, SAP has preconfigured all mainstream payment service providers as standalone services. If you use an unlisted payment service or need to monitor other business-critical third-party services, you can raise a support ticket to have them marked as standalone.

⚙️Infrastructure Alerts

The scope of these infrastructure alerts is as follows:

Host
Process Group/Instance
Database

The settings for availability alerts are enabled as follows:

Property	Setting
CPU saturation	Alert if CPU usage is higher than 95% in 3 of 5 one-minute intervals.
Memory event usage	Alert if memory usage is higher than 90% on Windows or 80% on Linux; alert if the memory page fault rate is higher than 100 faults/s on Windows or 20 faults/s on Linux in 3 of 5 one-minute intervals
GC activity	Alert if GC time is higher than 40%; alert if GC suspension is higher than 25% in 3 of 5 one-minute intervals
Java out of memory	Alert if the number of Java out-of-memory exceptions is 1 per minute or higher

Property	Setting
Number of dropped packets	Alert if the receive/transmit dropped packet percentage is higher than 10% AND the total packet rate is higher than 10 packets/s in 3 of 5 one-minute intervals
Network utilization	Alert if the sent/received traffic utilization is higher than 90% in 3 of 5 one-minute intervals
TCP connectivity for process	Alert if the percentage of new connection failures is higher than 3% AND the number of failed connections is higher than 10 connections/min in 3 of 5 one-minute intervals
Retransmission rate	Alert if the retransmission rate is higher than 3% AND the number of retransmitted packets is higher than 10 packets/min in 3 of 5 one-minute intervals

Property	Setting
Low disk space	Alert if the free disk space is lower than 3% in 3 of 5 one-minute intervals
Slow running disks	Alert if the disk read time or write time is higher than 200 ms in 3 of 5 one-minute intervals
Inodes number available	Alert if the percent of available inodes is lower than 5% in 3 of 5 one-minute intervals
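
Most of the infrastructure rules above share the same "3 of 5 one-minute intervals" pattern. A minimal sketch of that pattern follows, assuming the five intervals are the most recent ones and that the three violating intervals do not need to be consecutive. The class name `SlidingWindowAlert` is hypothetical.

```python
from collections import deque


class SlidingWindowAlert:
    """Fires when enough of the most recent samples violate a rule."""

    def __init__(self, is_violation, needed=3, window=5):
        self.is_violation = is_violation     # predicate applied to each sample
        self.needed = needed                 # violations required to alert
        self.samples = deque(maxlen=window)  # keeps only the newest samples

    def observe(self, value):
        """Record one one-minute sample and return the alert state."""
        self.samples.append(bool(self.is_violation(value)))
        return sum(self.samples) >= self.needed


# CPU saturation: usage higher than 95%.
cpu = SlidingWindowAlert(lambda pct: pct > 95.0)
cpu_states = [cpu.observe(v) for v in [96, 90, 97, 80, 98]]
# cpu_states == [False, False, False, False, True]

# Low disk space: free space lower than 3%.
disk = SlidingWindowAlert(lambda free_pct: free_pct < 3.0)
```

Passing the comparison in as a predicate lets the same helper cover both "higher than" rules (CPU, network) and "lower than" rules (free disk space, available inodes).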

⚙️Database Infrastructure Alerts

The scope of these database alerts (e.g., xxx-xxx-p1-db) is as follows:

Azure DTU/CPU Usage Critical
Azure ReadOnly DB DTU/CPU Usage Critical
Azure DB Storage Usage

⚙️Digital Experience & Synthetic Monitoring

Application Alerts

The system detects JavaScript errors and performance degradation for user actions. It also monitors Traffic Drops: an alert triggers if observed traffic is less than 50% of the expected value for at least 1 minute.
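
The traffic drop rule can be sketched as a simple comparison repeated per minute. How the expected value is calculated is not described in the source, so the sketch takes it as an input; the function name and parameters are hypothetical.

```python
def traffic_drop_alert(observed, expected, threshold_pct=50.0, min_minutes=1):
    """Check per-minute traffic against expected values, oldest first.

    Returns True once observed traffic has stayed below threshold_pct
    of the expected value for min_minutes consecutive minutes.
    """
    consecutive = 0
    for obs, exp in zip(observed, expected):
        if exp > 0 and obs < exp * threshold_pct / 100.0:
            consecutive += 1
            if consecutive >= min_minutes:
                return True
        else:
            consecutive = 0  # the drop was interrupted, start counting again
    return False
```

For example, `traffic_drop_alert([100, 40], [100, 100])` returns `True` because the second minute is below 50% of the expected 100 requests.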

Synthetic Alerts (Availability)

Synthetic monitors simulate user behavior to check availability:

Global Outage: Alert if all locations fail to access the application once.
Partial Outage: Alert if at least 1/3 of locations fail twice in a row.
Performance: Alert if the total duration of events exceeds 10 seconds.
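
The three synthetic rules above can be expressed as one small evaluation function. This is only a sketch: the data layout (a dictionary of per-location run outcomes) and the function name `synthetic_alerts` are assumptions, not part of any documented API.

```python
def synthetic_alerts(history, total_duration_s):
    """Evaluate the three synthetic rules.

    history maps a location name to its run outcomes (True = success),
    oldest first. Returns the names of the triggered alerts.
    """
    alerts = []

    # Global outage: the latest run failed at every location.
    latest_failed = [not runs[-1] for runs in history.values() if runs]
    if latest_failed and all(latest_failed):
        alerts.append("global_outage")

    # Partial outage: at least 1/3 of locations failed twice in a row.
    failed_twice = sum(
        1 for runs in history.values()
        if len(runs) >= 2 and not runs[-1] and not runs[-2]
    )
    if history and failed_twice * 3 >= len(history):
        alerts.append("partial_outage")

    # Performance: total duration of events exceeds 10 seconds.
    if total_duration_s > 10:
        alerts.append("performance")

    return alerts
```

With three locations, a single location failing two runs in a row is already enough to reach the one-third mark and raise the partial outage alert.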

Unsupported Customization

Please note that the following are not supported:

Environment-specific custom alerts.
Custom alert thresholds.

Critical Anomaly Conditions

Specific conditions that trigger immediate alerts include:

Apache Workers: Pool usage over 99%.
Solr Index: 10+ exceptions (e.g., “Sync failed” or “Connection lost”) over 5 minutes.
Tomcat: Idle threads fall under 2% for four consecutive minutes.
Pods: 10 restarts within 15 minutes.
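
Two of these conditions are time-based and can be sketched in a few lines. The code below assumes that "10 restarts within 15 minutes" means ten or more restarts inside any 15-minute span, and that the Tomcat rule uses one idle-thread reading per minute. Both function names are hypothetical.

```python
def pod_restart_alert(restart_times_s, limit=10, window_s=15 * 60):
    """Return True if `limit` restarts fall inside any window_s span.

    restart_times_s holds restart timestamps in seconds, in any order.
    """
    times = sorted(restart_times_s)
    start = 0
    for end, current in enumerate(times):
        # Shrink the window until it spans at most window_s seconds.
        while current - times[start] > window_s:
            start += 1
        if end - start + 1 >= limit:
            return True
    return False


def tomcat_idle_alert(idle_pct_per_minute, floor_pct=2.0, minutes=4):
    """Return True if idle threads stay under floor_pct for `minutes` in a row."""
    streak = 0
    for idle_pct in idle_pct_per_minute:
        streak = streak + 1 if idle_pct < floor_pct else 0
        if streak >= minutes:
            return True
    return False
```

Ten restarts spaced one minute apart would trigger the pod rule, while ten restarts spaced two minutes apart would not, since no 15-minute span contains all ten.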

To understand this methodology in depth, see the Dynatrace documentation.

I hope this overview helps you better understand how monitoring works in your environment. By using automated baselines and predefined thresholds, the system does the heavy lifting for you—catching performance lags and infrastructure glitches before they impact your business.

Use these insights to keep your store fast, stable, and always ready for your customers!
