The GSL often spreads libraries over several lanes or sequencing runs, which helps us ensure data quality. In these cases, fastqs are usually provided per read group. However, a single fastq per library (or a pair of fastqs for paired-end runs) is often easier to work with. The Python script provided here merges fastqs that follow the GSL naming convention by their library IDs. For paired-end runs, the script produces two fastqs per library: one for the forward read and one for the reverse read.
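The merging idea can be sketched in a few lines of Python. This is a minimal illustration, not the provided script: it assumes a hypothetical file-name pattern `<library>_<readgroup>_R<1|2>.fastq.gz` (the real GSL naming convention may differ, so `FASTQ_RE` would need adjusting). The sketch relies on the fact that concatenated gzip files form a valid multi-member gzip stream, so each file's raw bytes can be appended directly without recompressing. For paired-end data, R1 and R2 files are concatenated in the same sorted read-group order so that mates stay in step.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# HYPOTHETICAL naming pattern: <library>_<readgroup>_R<1|2>.fastq.gz
# The actual GSL convention may differ; adjust this regex to match it.
FASTQ_RE = re.compile(
    r"^(?P<library>[^_]+)_(?P<readgroup>[^_]+)_R(?P<read>[12])\.fastq\.gz$"
)


def group_by_library(directory):
    """Map (library, read) to a sorted list of per-read-group fastq paths."""
    groups = defaultdict(list)
    for path in Path(directory).iterdir():
        match = FASTQ_RE.match(path.name)
        if match:  # silently skip files that don't follow the pattern
            groups[(match["library"], match["read"])].append(path)
    # Sorting by name gives R1 and R2 the same read-group order,
    # which keeps mate pairs aligned in the merged files.
    return {key: sorted(paths) for key, paths in groups.items()}


def merge_fastqs(directory, out_dir):
    """Write one merged fastq per library and read; return the output paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for (library, read), paths in sorted(group_by_library(directory).items()):
        out_path = out_dir / f"{library}_R{read}.fastq.gz"
        with open(out_path, "wb") as out:
            for path in paths:
                # Raw byte concatenation: each input becomes one gzip member.
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged.append(out_path)
    return merged
```

For example, with `LIB1_L001_R1.fastq.gz` and `LIB1_L002_R1.fastq.gz` in the current directory, `merge_fastqs(".", "merged")` would write `merged/LIB1_R1.fastq.gz` containing the reads from both lanes in that order.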
The easiest way to run the script is to place it in the same directory as your fastq files and run it from there.
Run the script with the `-h` flag for more options.