Hi all,
I have a huge replication task to perform. The source table has over 250,000,000 records, and approximately 400,000 records are added to it every day.
Currently, I am running snapshot replication and it takes 10 to 11 hours to complete (the internet connection between the production and report servers is slow). I am using this method because the source table does not have a timestamp column (which would let me run incremental replication), and I cannot modify the table because it belongs to third-party software. It does, however, have a field which links to another table that holds the timestamp.
Here is the source table and the other table's structure for reference:
DataLog
Name            Datatype  Size
DataLogStampID  int       4
Value           float     8
QuantityID      smallint  2

DataLogStamp
Name                                            Datatype  Size
ID (foreign key to DataLogStampID in DataLog)   int       4
SourceID                                        smallint  2
TimestampSoruceLT                               datetime  8
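For reference, the structures above correspond roughly to this DDL (a sketch only: the NOT NULL constraints, the constraint name, and the primary key on DataLogStamp.ID are my assumptions; column names are as posted):

```sql
CREATE TABLE DataLogStamp (
    ID                int      NOT NULL PRIMARY KEY,
    SourceID          smallint NOT NULL,
    TimestampSoruceLT datetime NOT NULL  -- column name as posted
)

CREATE TABLE DataLog (
    DataLogStampID int      NOT NULL
        CONSTRAINT FK_DataLog_Stamp REFERENCES DataLogStamp (ID),
    Value          float    NOT NULL,
    QuantityID     smallint NOT NULL
)
```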
What I would like to know is: what is the best practice for handling such a replication with MS SQL Server 2000?
I am thinking of developing a procedure which follows these steps, using a temp table:
The procedure will run (say) every night.
The temp table will contain a timestamp field.
For one time only, I will run a snapshot replication, so that the existing data is in the destination db.
I will record this timestamp in a repl_status table, which holds the timestamp of the last replication.
The procedure will read the record from repl_status, select all records from the source table that were added since that date, and put them into a new temp table (which will have the same structure as the source and destination tables). A snapshot replication will then "add" only these new records to the destination table every night. By using this method, I transfer only the records that have been added since the last replication (last night), which is far fewer records.
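The nightly step described above could be sketched like this (a sketch only: the repl_status table with a LastReplicated column and the staging table DataLog_stage are my assumed names; DataLog and DataLogStamp are from the post):

```sql
-- Sketch: stage only rows stamped since the last run, joining through
-- DataLogStamp because DataLog itself has no timestamp column.
DECLARE @last datetime, @now datetime
SELECT @last = LastReplicated FROM repl_status
SELECT @now = GETDATE()

BEGIN TRANSACTION

TRUNCATE TABLE DataLog_stage

INSERT INTO DataLog_stage (DataLogStampID, Value, QuantityID)
SELECT d.DataLogStampID, d.Value, d.QuantityID
FROM DataLog d
JOIN DataLogStamp s ON s.ID = d.DataLogStampID
WHERE s.TimestampSoruceLT > @last
  AND s.TimestampSoruceLT <= @now

-- Record the new high-water mark for the next run.
UPDATE repl_status SET LastReplicated = @now

COMMIT TRANSACTION
-- DataLog_stage is then what the nightly snapshot pushes to the
-- report server, instead of the full 250M-row table.
```

Capturing @now once and using it as the upper bound avoids missing rows that arrive while the INSERT is running.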
Any comments would be greatly appreciated.
Thanks for your time in advance,
Sinan Topuz
Hi,
Last year I made a query that replicated a table between two databases (say db1 and db2) on two remote servers (say from SRV1 to SRV2).
I added a column called "_replicated" to the table on SRV1.db1, which took binary values (0 or 1): 0 meant not yet replicated and 1 meant already replicated.
When I ran the query, it inserted into SRV2.db2.table the rows of SRV1.db1.table that had "_replicated = 0".
Then I updated the column's values to 1 on SRV1.db1.table. In case of an error it would roll back the transaction.
Before execution, I also dropped the indexes on SRV2.db2 to speed up the insertion, and then recreated them.
Be sure you have a recent backup of SRV2.db2 before 'playing' in this manner (or maybe create one right before every replication, just in case).
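The flag-based copy described above could look roughly like this in SQL Server 2000 (a sketch under assumptions: SRV2 is a linked server, and mytable, id, col1 are placeholder names):

```sql
-- Sketch: copy unreplicated rows to the remote server, then mark
-- them as replicated; roll everything back if either step fails.
BEGIN TRANSACTION

INSERT INTO SRV2.db2.dbo.mytable (id, col1)
SELECT id, col1
FROM db1.dbo.mytable
WHERE _replicated = 0
IF @@ERROR <> 0 GOTO failed

UPDATE db1.dbo.mytable
SET _replicated = 1
WHERE _replicated = 0
IF @@ERROR <> 0 GOTO failed

COMMIT TRANSACTION
RETURN

failed:
ROLLBACK TRANSACTION
```

One caveat with this pattern: rows inserted between the INSERT and the UPDATE could be flagged as replicated without ever being copied, so it needs either an isolation level that blocks concurrent inserts to the unreplicated set, or marking rows by a captured key list.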
GOOD LUCK!
|||Hi JohDas, thank you very much for your answer. I would be able to use that method if I were able to modify the source table, but since I cannot, I have to find out whether my algorithm is the best approach in a situation of this sort.
I did not mean that your post is not helpful, but just hoping to get some more suggestions from other experts as well.
Thanks for your time again.
Sinan Topuz
|||sinan.topuz wrote:
Currently, I am running snapshot replication and it takes 10 to 11 hours to complete (the internet connection between the production and report servers is slow). I am using this method because the source table does not have a timestamp column (which would let me run incremental replication), and I cannot modify the table because it belongs to third-party software.
Transactional replication, which replicates incremental changes, does not require a timestamp column. It does, however, require a primary key (PK) column. If your tables do have a PK column, then you can easily use transactional replication by just scheduling the replication agents to run nightly.
If by chance your use of "timestamp column" was a typo and you meant to say PK column, then you do have other options, one of which is simply doing a nightly database backup, copying it to your destination, and restoring it there. This may or may not be faster than snapshot replication.
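The backup/copy/restore option could be sketched as follows (the database name, file paths, and logical file names here are all my assumptions, not from the thread):

```sql
-- On the production server: take a full backup to a local file.
BACKUP DATABASE proddb
TO DISK = 'D:\backups\proddb.bak'
WITH INIT

-- (Copy proddb.bak to the report server by whatever file-transfer
-- mechanism the slow link allows.)

-- On the report server: restore over the reporting copy.
RESTORE DATABASE proddb
FROM DISK = 'E:\incoming\proddb.bak'
WITH REPLACE,
     MOVE 'proddb_Data' TO 'E:\data\proddb.mdf',
     MOVE 'proddb_Log'  TO 'E:\data\proddb.ldf'
```

Note that this ships the whole database every night, so over a slow link it only wins if the backup compresses well relative to the snapshot's bcp files.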
A third option is to use Log Shipping, you can read more about it in Books Online topic "Understanding Log Shipping" - http://msdn2.microsoft.com/en-us/library/ms187103.aspx.
|||Greg,
The table does not have a primary key. In fact, I think it should not have one, because it is written to constantly and with a lot of rows; right now it has almost 3 million records. (This may be the way the application's designers thought while designing the system.)
I appreciate your time and your comment.
Thanks,
Sinan Topuz
|||I reread your original post and now understand what you're trying to do: it's your own "change tracking" mechanism. We are developing a "change tracking" solution in the next release of SQL Server which would fit your needs as well. Otherwise, your home-grown solution should work; just make sure to test it out.|||Thanks,
Just a correction, not because it is important, but I wanted to give the correct info: the table now has 300 million records, not 3 million.
|||OK, now your 10-11 hour figure makes more sense :) However, I would also take a look at log shipping if you can.