Saturday, April 11, 2009

Grid computing & Batch Jobs



“Mike, what is the status of the renewal premium bills to be generated for this month? We are already 2 days late in sending the letters to customers.”

Mike: Boss, the process is running very slowly. 1 million records are being processed and PDF files have to be generated for all customers. We will be sending the files for printing and also emailing them to customers. My estimate is that it will take another 26 hours for the whole process to complete.

Jack: But Mike, we invested in a quad-core server with 16 GB RAM last week for this job. I thought there would be a substantial improvement on this front. Gosh… we spent $16,000 on the machine.

Mike: The process is running in a single thread, so the resources of the machine are not being fully utilized. The memory and CPU are about 70% idle. I guess we can’t do much about it…

Many of us face similar problems: we have voluminous deliverables to give to business users but are constrained by the way our systems are built. The “single-thread processing problem” that Mike was facing is common to most batch job runs in a typical insurance company. It also applies to other organizations with jobs of a similar nature. For example, a credit card company generates monthly statements and emails them to customers, sends a summary of each statement as an SMS, and sends consolidated data for physical printing. A bank may do the same for monthly bank statements. In-house programmers in these companies have typically improved performance by tuning the queries that extract data from the databases or legacy systems, and by parallelizing the creation of the final output using multithreading.

Mike: Boss, I can modify the program that fetches the data so that the PDFs are generated using multithreading.

Jack: That seems to be a good idea, but how many threads do you think can run in parallel on our new server?

Mike: It’s a quad-core machine and there is enough memory; I guess we can run 4 threads simultaneously and reduce the processing time to about 1/3rd ...hmm…approximately.

Jack: Seems to be a good idea… let’s try to implement it.

Mike, with a lot of enthusiasm, created the design for the new idea.
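Mike's design is not shown here, so the following is only a minimal sketch of how such a 4-thread job could look in Python. The functions `fetch_records` and `render_bill` are hypothetical stand-ins for the real database query and PDF generation, and the record fields are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

WORKER_COUNT = 4  # assumption: one worker thread per core of the quad-core server


def fetch_records():
    """Hypothetical stand-in for the query that returns renewal records."""
    return [{"policy_id": i, "premium": 100.0 + i} for i in range(1, 9)]


def render_bill(record):
    """Hypothetical stand-in for PDF generation; returns (file name, content)."""
    name = f"bill_{record['policy_id']}.pdf"
    body = f"Policy {record['policy_id']}: renewal premium {record['premium']:.2f}"
    return name, body.encode("utf-8")


def generate_bills(records, workers=WORKER_COUNT):
    """Render every bill using a fixed pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands records to idle threads and returns results in input order.
        return list(pool.map(render_bill, records))


if __name__ == "__main__":
    bills = generate_bills(fetch_records())
    print(len(bills), "bills generated")
```

One caveat for this sketch: in CPython, threads help most when the work waits on I/O such as database reads or file writes. For CPU-heavy PDF rendering, swapping in `ProcessPoolExecutor` from the same module would likely be needed to keep all four cores busy.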



Jim (the programmer) helped Mike transform the idea into an actual running program. He did many trials and pilot runs before using the program to generate the next month’s premium bills. The results were outstanding; processing time was reduced to almost 1/3rd, just as Mike had guessed.
Bitten by the performance improvement bug, Mike started thinking of further improvements to the process.

Mike: Boss, I’d like to scale up the server by adding more CPUs. We can then create more threads for processing the files.

Jack: Mike, I’m afraid we will not be able to invest any more in hardware. The budgets are not approved. It’s a difficult situation; you may have to think of an alternative.

“Necessity, who is the mother of invention” – Plato

When no silver bullets are left and necessity arises, then, as is rightly said, “constraints create innovation”.

Jim: How about Grid Computing?

Mike: Wow, that seems to be a good idea. But we do not have additional servers to run the grid.

Jim: We have 120 dual-core desktops in our office; we can use them to run our program. The desktops are usually free at night.

Mike: Any grid software that you are aware of?

Jim: I’ve heard of many, but I did a small proof of concept with Gridbus during my university project. We could try writing our program on the Alchemi framework. It’s an open source framework.

Mike: Jim, can you help me draw the architecture of our new idea? I am kind of excited to get this thing running…



Jim: We have 120 dual-core desktops with 2 GB RAM each, so I think we have an ocean of memory and CPU. We’ll place the grid server component on our server and run all the processing (executors) on selected workstations. I suppose it should work like this:
  • The server will fetch the data from the database and keep it in memory.
  • The work of formatting the data and generating the PDF files can be given to the individual executors.
  • The generated PDFs are collated on the server.
  • Since we also have to email the files to clients, we can dedicate a few threads to sending emails to individual customers.
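The four steps Jim lists can be sketched as a single-machine simulation in Python. This is not the Alchemi API and not Jim's actual program: threads stand in for the desktop executors, `send_email` is a hypothetical placeholder for the real mail step, and the record fields, function names and pool sizes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(records, size):
    """Split the in-memory records into work units for the executors."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def executor_task(work_unit):
    """Work done by one executor: format each record and build its (fake) PDF."""
    return [
        (f"bill_{r['policy_id']}.pdf", f"Premium due: {r['premium']:.2f}".encode("utf-8"))
        for r in work_unit
    ]


def send_email(pdf):
    """Hypothetical mail step; a real job would hand the file to a mail server."""
    name, _content = pdf
    return f"sent {name}"


def run_grid(records, executors=3, unit_size=2, mail_threads=2):
    # 1. The server already holds the data in memory; split it into work units.
    units = chunk(records, unit_size)
    # 2. Executors (threads here, desktops in the story) generate the PDFs.
    with ThreadPoolExecutor(max_workers=executors) as grid:
        results = list(grid.map(executor_task, units))
    # 3. Collate the generated PDFs on the server, in the original order.
    pdfs = [pdf for unit in results for pdf in unit]
    # 4. A few dedicated threads send the emails.
    with ThreadPoolExecutor(max_workers=mail_threads) as mailers:
        receipts = list(mailers.map(send_email, pdfs))
    return pdfs, receipts


if __name__ == "__main__":
    sample = [{"policy_id": i, "premium": 50.0 * i} for i in range(1, 6)]
    pdfs, receipts = run_grid(sample)
    print(len(pdfs), "PDFs collated,", len(receipts), "emails sent")
```

Handing out records in work units rather than one at a time keeps the number of round trips between server and executors small, which matters more once the executors sit on separate machines across the office network.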
Jim programmed the complete logic using the grid framework and the results were marvelous. Mike’s perseverance and Jim’s efforts resulted in a masterpiece.

Mike: Jim, let’s cross our fingers and run the program on the final data for this month.
Processing…..

Jack: Guys, I am proud of both of you. Your ideas and efforts have solved our everlasting pain.

Many of us face similar problems in day-to-day operations. Sometimes we are constrained by tight budgets, ideas and resources, but a few of us think ahead and out of the box to make life easier. I wanted to present this article as a theory piece on grid computing, but a short story seemed to me like a good idea (!)