Wednesday, March 28, 2012

Replication system disk performance issue after 1 month

Summary: On April 1 we started replicating a publishing system that handles about 4M xacts / day to a subscribing system.

Performance was good. Latency was ~ 5-7 seconds.

On May 10 we noticed that the DB was behind (latency was 12 hours).

All performance counters seem good with the exception of the disk.

- Performance spikes are 8 minutes apart and last 30 to 60 seconds.

- During each spike, Disk % Busy (1 - Disk % Idle) is 100%.

The publisher DB publishes about 50-52 xacts/sec.

Rate of distribution (distribution DB to Subscriber DB) is ~ 47 xacts / second, so latency is increasing (currently at 33 hours). Previously my Subscriber system's "capacity" was 150 xacts / sec.
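To put the gap between the two rates in perspective, here is a rough sketch of how quickly latency grows. It assumes both rates hold steady and that latency is approximately the undelivered backlog divided by the publish rate; the function name is made up for illustration and nothing here comes from SQL Server itself.

```python
SECONDS_PER_DAY = 86_400


def latency_growth_hours_per_day(publish_rate: float, distribute_rate: float) -> float:
    """Hours of extra replication latency added per day of wall-clock time.

    Assumption: latency ~= undelivered backlog / publish rate, with steady rates.
    """
    # Transactions published but not yet distributed after one day.
    backlog_per_day = (publish_rate - distribute_rate) * SECONDS_PER_DAY
    # Convert the backlog into "hours of publisher output".
    return backlog_per_day / publish_rate / 3600


# Rates taken from the post: 50-52 xacts/sec published, ~47 xacts/sec distributed.
for publish_rate in (50, 52):
    growth = latency_growth_hours_per_day(publish_rate, 47)
    print(f"publish {publish_rate}/s, distribute 47/s: +{growth:.2f} h of latency per day")
```

Under these assumptions the subscriber falls further behind every day, so the backlog cannot clear until the distribution rate rises above the publish rate.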

I know this because several weeks ago the network went down and we fell 24 hours behind.

When the network came back up, the replication subscriber system was able to catch up at around 150 xacts / sec, or 3X the production system rate.
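As a sanity check on that earlier recovery, the catch-up time can be estimated from the figures above. This is only a sketch: it assumes the publisher keeps producing about 50 xacts/sec during the catch-up and that the apply rate is constant; the helper name is hypothetical.

```python
from typing import Optional


def catch_up_hours(hours_behind: float, publish_rate: float, apply_rate: float) -> Optional[float]:
    """Estimate hours needed to clear a replication backlog.

    Returns None when the subscriber applies no faster than the publisher
    produces, because the backlog then never shrinks.
    """
    backlog = hours_behind * 3600 * publish_rate  # undelivered xacts
    net_rate = apply_rate - publish_rate          # xacts/sec gained on the publisher
    if net_rate <= 0:
        return None
    return backlog / net_rate / 3600


# Earlier outage: 24 hours behind, applying at ~150 xacts/sec.
print(catch_up_hours(24, 50, 150))
# Current situation: 33 hours behind, applying at only ~47 xacts/sec.
print(catch_up_hours(33, 50, 47))
```

The same arithmetic shows why the present situation is different: at roughly 47 xacts/sec the subscriber is below the publish rate, so no amount of waiting will close the gap.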

What has changed between then and now? Not much. We did install Tivoli Storage Manager (TSM, IBM's backup system) a couple of weeks ago. It seems to run fine on a nightly basis, and I don't see any periodic heavy disk I/O from it. Just to be sure, I've had them shut the TSM services down.

We've also eliminated all extraneous processes other than those I need for performance monitoring (there was an RTVscan virus-scan process).

I've eliminated autogrowth as an issue: I've bumped the growth increment so that growths are very infrequent (several days apart at this point). When we resolve the problem, I'll dial this down to something more reasonable.

I realize my disk configuration is not ideal (a single RAID-5 disk with 3 partitions); however, it has not changed in the last 6 weeks.

Thanks for any help on this!

Jack Griffith

Configuration:

Subscribing System:

SQL Server: 2000, SP4 - 8.0.2039

CPU - 2.8GHZ Xeon, Quad Dual-core

Memory - 3.5GB RAM

Disk: 3 partitions on a single RAID-5 disk with 1118 GB of space:

C: 39GB System and Programs

D: 97GB Log space

E: 982 GB Data space

Replication configuration:

- nosynch, continuous Transactional Replication

- Distribution db is on Subscription system

- distribution - Publication of approx. 50 transactions / second

Subscriber DB configuration:

DB size: 64458 MB

Logging: Simple (at this point)

distribution

DB size: 3111 MB

Logging: Simple (at this point)

Is the whole topology (pub, dist, and sub) SQL 2000 SP2? Do you know whether the Log Reader Agent or the Distribution Agent is the bottleneck? Did you add any new indexes or triggers to the subscriber DB?
