Design a stoppable long running process

Published on: 10th Jul 2012

Updated on: 17th May 2021

Overview

Some processes could be running at least 30 minutes. The question is how to design the long running process to be stoppable and continue from where it left upon restart? This is an interesting question. Unfortunately, most of the long running processes that I have heard of are not stoppable and require manual patching the processed output before re-run the process.

For example, the process requires you to issue a monthly bill on the web hosting. This web hosting company has more than 10,000 customers and each of the customers are subscribing to different services.

Reason to stop the long running process

The hard disk is running out of space and the process has stopped.
There are some new subscriptions to be queued into the bill.
Server has to be restarted due to OS (Operating System) updates or running out of memory.
The long running process is affecting the online users and the process should continue after the number of online users has reduced.

A table that keeping track the progress

You need two tables to keep track of this process.

The following table keeps track of how many batches have been processed and whether the batch has been completed without any interruption or not.

create table tb_bill_process_batch (
    bill_process_batch_id uniqueidentifier not null primary key
    , batch_no nvarchar(50) not null    -- the batch number.
    , remarks nvarchar(max) null        -- the administrator remarks.
    , bill_period_start datetime        -- the billing date range.
    , bill_period_end datetime          -- the billing date range.
    , process_restart_counter int       -- increment this value everytime the process restarts.
    , start_on datetime                 -- the first process start date/time.
    , complete_on datetime              -- the process completion date/time.
    , created_on datetime               -- the record creation date/time
)

The following table keeps track of which billing status for the customer.

create table tb_bill_process_log (
    bill_process_log uniqueidentifier not null primary key
    , sess_id uniqueidentifier not null         -- this is the session ID of the process.
    , batch_no nvarchar(50) not null            -- this is the batch number.
    , customer_id uniqueidentifier not null     -- this is the customer ID.
    , process_start_on datetime                 -- the start time.
    , process_end_on datetime                   -- the process completion time.
    , remarks nvarchar(255)                     -- the remarks set by the long running process.
    , created_on datetime                       -- the record creation date/time.
)

The flow of the long running process

Upon the long running process starts, you will create a batch record in tb_bill_process_batch and assign a batch number (stored in tb_bill_process_batch.batch_no). To continue running the previous incomplete process, increment the tb_bill_process_batch.process_restart_counter value by one.
Copy all customers who have subscribed to the services for the last month into tb_bill_process_log and these customers should receive a bill.
The billing process has started on calculating the charges. You need a cursor where retrieving the customer ID based on the tb_bill_process_log.batch_no + tb_bill_process_log.process_start_on is null.
After retrieving the first customer ID value from the cursor, set the tb_bill_process_log.process_start_on to current date/time and create the billing details. Upon completion, set the tb_bill_process_log.process_end_on to current date/time.

In this step, you need a database transaction to ensure that the billing details and tb_bill_process_log.process_end_on are committed in the same scope.

Two choices to handle transaction,
- You may use one transaction for one bill. But, this will be slow as compared to the next approach.
- Issue 1,000 bills per transaction. Use this option if the process per bill is able to complete in a very short time. You have to test it out with the live data before deciding how many bills per transaction.
  
  If you have many customers to be processed, you may have to split the customers by group or location. Then, processes one group/location after another and each group/location should have a transaction.
Continue with the next customer ID until all records have been processed.
Finally, update tb_bill_process_batch.complete_on field to indicate that the long running process has completed.

Advanced technique for speeding up the long running process

If you want to further reduce the process time, this is what you can do:

Develop a program in whichever programming language that supports multi-thread. Executes the charges calculation stored procedure in multiple threads. To do this, you need to split the long running process stored procedures into a few sub-processes:

The first stored procedure is responsible for generating a new batch number and collecting all the customer ID that requires to be billed (step 1 and 2). This stored procedure will be run by 1 thread only.
The second stored procedure is responsible for calculating the charges for each customer (step 3 to 5). This stored procedure will be run in multiple threads. In this stored procedure, you will not use any cursor. Instead, the cursor must be replaced with a query that always returns the TOP 1 customer ID by matching the batch_no + process_start_on is null. The transaction will be meant for one bill only.
The third stored procedure is responsible to run the process mentioned in step 6.

If you are running 2 threads of the charges calculation stored procedure, the entire process time will be reduced by <50%. Please take note that it won't be exactly 50% of reduction because of the process required to lock some resources and it has a higher chance to cause deadlock. Besides that, the storage I/O performance might become a bottleneck. But, running two threads will definitely be faster than using one thread.

Jump to #MSSQL blog

Author

Lau Hon Wan, software developer.