By Michael Marr
Expert Author
Article Date: 2010-09-10
Over my years of programming, I have repeatedly been told that Perl is remarkably fast at string manipulation. However, I realized that I had never come across a study supporting this claim. Despite never having verified it, I recently decided to rewrite a PHP script, which mostly performs regular expression matching and string manipulation, in Perl. Having realized that I had no real evidence that Perl was in fact faster than other languages, in this case PHP, I decided to test it myself.

First, I tried to write equivalent code in both languages to open a file and count the regular expression matches for a keyword. Here is the resulting code:
perl
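use strict;
use warnings;

# Open the file for reading; lines are read one at a time below.
open my $fh, "<", "haystack_1gig.txt" or die $!;
my $i = 0;
my $search = 'thou';
while (<$fh>) {
    # Count every case-insensitive match on the current line.
    while ($_ =~ /$search/gi) {
        $i++;
    }
}
close $fh;
print "found $search $i times\n";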
php
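<?php
// file() reads the whole file into an array of lines.
$file = file("haystack_1gig.txt");
$search = 'thou';
$i = 0;
foreach ($file as $line)
{
    // preg_match_all() returns the number of matches found on this line.
    $i += preg_match_all("/$search/i", $line, $matches);
}
echo "found $search $i times\n";
?>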
** Reflecting on this code after completing the study: PHP's file() function loads the entire contents of the file into memory as the array $file, so it is more resource intensive than the Perl approach, which simply opens a file handle and reads the file line by line.
After running the search on 1 gigabyte, 1 megabyte, 100 kilobyte, and 10 kilobyte files, the results were as follows:
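To illustrate the memory difference described above, here is a minimal Perl sketch that counts matches two ways: once after reading every line into an array (roughly what PHP's file() does) and once while streaming line by line. The temporary file, its contents, and the names count_slurped and count_streamed are assumptions made for illustration; they are not part of the original test scripts.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample file so the sketch is runnable; the contents are
# placeholders, not the article's haystack files.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "Thou art here\nnothing\nthou and THOU\n";
close $out or die $!;

# Slurping: every line is held in @lines at once, much like PHP's file().
sub count_slurped {
    my ($file, $search) = @_;
    open my $fh, "<", $file or die $!;
    my @lines = <$fh>;
    close $fh;
    my $count = 0;
    for my $line (@lines) {
        $count++ while $line =~ /\Q$search\E/gi;
    }
    return $count;
}

# Streaming: only the current line is held in memory.
sub count_streamed {
    my ($file, $search) = @_;
    open my $fh, "<", $file or die $!;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ while $line =~ /\Q$search\E/gi;
    }
    close $fh;
    return $count;
}

print "slurped:  ", count_slurped($path, 'thou'), "\n";
print "streamed: ", count_streamed($path, 'thou'), "\n";
```

Both functions report the same count; the difference is that the slurping version needs memory proportional to the file size, which matters for a 1 gigabyte input.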
Filesize | Perl (seconds) | PHP (seconds) |
10 KB | 0.003 | 0.014 |
100 KB | 0.004 | 0.018 |
1 MB | 0.017 | 0.810 |
1 GB | 14.581 | 824.065 |
These times cover the complete execution of each script, so they may not reflect the pure difference in regular expression performance. However, I wanted to compare the whole process of loading a string and working with it, so I measured total execution time from outside the script itself.
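The article does not say which tool was used to time the scripts from the outside. As one possible approach, the sketch below runs a command as a child process and measures its wall-clock time with Time::HiRes. The helper name time_command and the trivial one-liner being timed are assumptions for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run a command as a child process and return its wall-clock time in seconds,
# including interpreter startup, file loading, and the work itself.
sub time_command {
    my @cmd = @_;
    my $start = time();
    system(@cmd) == 0 or die "command failed: @cmd";
    return time() - $start;
}

# Placeholder command: a trivial Perl one-liner run with the current
# interpreter ($^X). Substitute the script under test here.
my $elapsed = time_command($^X, '-e', '1');
printf "elapsed: %.3f seconds\n", $elapsed;
```

Timing the whole process this way includes startup overhead, which is why the smallest files show a nonzero baseline.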
* Repeated runs over the same data get faster (most likely because of caching, such as the operating system's file cache), so I ran each script multiple times for each file size, discarded the highest and lowest results, and averaged the rest.
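The averaging step described above can be sketched as follows. The function name trimmed_mean and the sample timings are made up for illustration; they are not the measurements behind the tables in this article.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Drop one highest and one lowest timing, then average what is left.
sub trimmed_mean {
    my @sorted = sort { $a <=> $b } @_;
    die "need at least three timings" if @sorted < 3;
    my @kept = @sorted[1 .. $#sorted - 1];
    return sum(@kept) / @kept;
}

# Made-up timings in seconds, for illustration only.
my @runs = (0.020, 0.017, 0.016, 0.018, 0.031);
printf "%.3f\n", trimmed_mean(@runs);
```

With these sample values, the 0.016 and 0.031 runs are discarded and the remaining three are averaged, which keeps a single slow first run from skewing the result.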
The next test involved string search and replacement, and thus I made a few simple changes to my code:
perl
# Replaces the inner while loop: s///g returns the number of substitutions made.
$i += ($_ =~ s/$search/$replace/gi);
php
// The fifth argument receives the number of replacements made on this line.
$line = preg_replace("/$search/i", $replace, $line, -1, $x);
$i += $x;
The results for this test:
Filesize | Perl (seconds) | PHP (seconds) |
10 KB | 0.003 | 0.133 |
100 KB | 0.006 | 0.033 |
1 MB | 0.042 | 0.076 |
1 GB | 14.970 | 745.906 |
So in both cases, Perl came out the dominant winner. Although in the more practical cases (files of 1 megabyte and smaller) the difference in overall runtime is only a fraction of a second, it appears that Perl has the edge over PHP on regular expression operations.