Wednesday, November 9, 2011

Remove duplicate records without changing record order using SORT JCL

When the SORT utility is used to remove duplicates, the record order changes, because the records are sorted on the key being compared. Often that is not the desired result. To remove duplicate records while keeping the original record order, use the trick below. The example assumes a fixed-length file with a record length of 80; change the lengths and positions to match the record length of your own file.


JCL step:

//STEP1    EXEC PGM=SYNCTOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=N12345.S1234.EXAMPLE.DATA,DISP=SHR
//T1       DD DSN=&&T1,DISP=(,PASS),SPACE=(CYL,(200,150))
//OUT      DD DSN=N12345.S1234.EXAMPLE.DATA.REMDUP,
//            DISP=(,CATLG,DELETE),
//            SPACE=(CYL,(200,150)),
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=0)
//TOOLIN   DD *
* Pass 1: add sequence number (CTL1), keep first of each duplicate
SELECT FROM(IN) TO(T1) ON(1,80,CH) FIRST USING(CTL1)
* Pass 2: restore original order, drop sequence number (CTL2)
SORT FROM(T1) TO(OUT) USING(CTL2)
/*
//CTL1CNTL DD *
  INREC OVERLAY=(81:SEQNUM,16,ZD)
/*
//CTL2CNTL DD *
  SORT FIELDS=(81,16,ZD,A)
  OUTREC BUILD=(1,80)
/*

The trick is to append a 16-byte sequence number to the end of every record (8 bytes is enough if you have fewer records), then remove the duplicates on the original 80 bytes, keeping the first record of each set of duplicates. The remaining records are then sorted back into sequence-number order and copied to the target dataset without the sequence number.
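
As an untested sketch of the 8-byte option, only the two control-statement DDs should need to change; the rest of the step stays as it is. This assumes the same 80-byte fixed-length input and no more than 99,999,999 records, which is the largest value an 8-digit ZD sequence number can hold.

//CTL1CNTL DD *
* Assumption: at most 99,999,999 input records
  INREC OVERLAY=(81:SEQNUM,8,ZD)
/*
//CTL2CNTL DD *
  SORT FIELDS=(81,8,ZD,A)
  OUTREC BUILD=(1,80)
/*

For a different record length, the same three numbers move together: the ON length and the OUTREC BUILD length become the record length, and the sequence number starts one byte past it.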
