I have a Perl script that parses an XML file of approximately 800 MB and inserts records into a database. The script also resizes images into several sizes and uploads them to a CDN. At the moment the processing takes upwards of 4 hours, and the project entails optimizing the script so that all of the processing can be done in an hour.
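For reference, the kind of restructuring I imagine for the parsing/insert stage is something like the streaming parse with batched inserts sketched below. This is only a rough sketch of the idea, not my actual code; the DSN, credentials, element names, table, and columns are placeholders for illustration.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;
    use DBI;

    # Hypothetical DSN, credentials, element and table names -- illustration only.
    my $dbh = DBI->connect('dbi:mysql:database=catalog', 'user', 'pass',
                           { RaiseError => 1, AutoCommit => 0 });
    my $sth = $dbh->prepare('INSERT INTO products (sku, title, price) VALUES (?, ?, ?)');

    my $count = 0;
    my $twig  = XML::Twig->new(
        twig_handlers => {
            # Process one <product> record at a time instead of loading the whole 800 MB file.
            product => sub {
                my ($t, $elem) = @_;
                $sth->execute(
                    $elem->first_child_text('sku'),
                    $elem->first_child_text('title'),
                    $elem->first_child_text('price'),
                );
                # Commit in batches to cut transaction overhead.
                $dbh->commit if ++$count % 1000 == 0;
                # Release the memory used by the parts of the tree already handled.
                $t->purge;
            },
        },
    );

    $twig->parsefile('feed.xml');
    $dbh->commit;
    $dbh->disconnect;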
I have considered looking into Gearman for distributed processing, distributed processing using a Perl module, or perhaps a smart way of filtering the XML file to remove unwanted data before processing it. A rough sketch of the Gearman idea follows below.
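To illustrate the Gearman idea, here is a sketch of a worker that could take the image resize/upload step off the main parsing path; the job server address, function name, and payload format are assumptions for illustration only, not part of my current setup.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Gearman::Worker;

    # Placeholder job server address; the real setup would point at our Gearman server.
    my $worker = Gearman::Worker->new;
    $worker->job_servers('127.0.0.1:4730');

    # Each job payload is assumed to be the path (or URL) of one image.
    $worker->register_function('resize_and_upload' => sub {
        my $job  = shift;
        my $path = $job->arg;
        # ... resize $path into the required sizes and push the results to the CDN here ...
        return 1;
    });

    # Run forever, picking up resize jobs submitted by the parsing script.
    $worker->work while 1;

On the parsing side, the script would then call something like Gearman::Client's dispatch_background('resize_and_upload', $path) for each image instead of resizing inline, so the parse and database inserts are not blocked by image work.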
I am looking for skilled developers who have dealt with processing large files in Perl and have successfully optimized and reduced processing time. Please describe a similar experience in your bid.
I will share the script and the data file upon acceptance of the bid. In the meantime, I will be happy to answer any further questions or share more information as needed for the developer to figure out ways to optimize the processing of the file.