
Putting Perl Regular Expressions To The Test

By Michael Marr
Expert Author
Article Date: 2010-09-10


Over my years of programming, I have repeatedly been told that Perl is amazingly fast at string manipulation. However, I realized that I had never actually come across a study supporting this idea. Without ever checking these claims, I recently decided to port a PHP script, one that mostly performs regular expression matching and string manipulation, to Perl. Once I realized that I had no real evidence that Perl was in fact faster than other languages (in this case PHP), I decided to test it myself.

First, I wrote equivalent code in each language to open a file and count the regular expression matches for a keyword. Here's the resulting code:

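```perl
use strict;
use warnings;

open my $fh, '<', 'haystack_1gig.txt' or die "Cannot open haystack_1gig.txt: $!";

my $i      = 0;
my $search = 'thou';

# Read the file one line at a time; each line lands in $_
while (<$fh>) {
    # m//g in scalar context steps through the line, one match per iteration
    while (/$search/gi) {
        $i++;
    }
}
close $fh;

print "found $search $i times\n";
```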
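```php
<?php
// file() reads the entire file into an array of lines
$file = file("haystack_1gig.txt");
$search = 'thou';
$i = 0;

foreach ($file as $line) {
    // preg_match_all() returns the number of matches found in this line
    $i += preg_match_all("/$search/i", $line, $matches);
}

echo "found $search $i times\n";
?>
```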

** Reflecting on this code now that I have completed the study: PHP's file() function loads the entire contents of the file into memory as the array $file. That makes it more resource-intensive than the Perl version, which simply opens a filehandle and reads the file one line at a time.
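To make the memory difference concrete, here is a minimal Perl sketch of both approaches. count_slurped reads every line into an array before matching, roughly what PHP's file() does, while count_streaming keeps only the current line in memory. Both helper names and the in-memory sample text are illustrative assumptions, not the code used for the benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper: count case-insensitive matches of $search,
# reading the handle one line at a time (only one line in memory).
sub count_streaming {
    my ($fh, $search) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ while $line =~ /$search/gi;
    }
    return $count;
}

# Hypothetical helper mirroring PHP's file(): read every line into an
# array first, so the whole file is held in memory at once.
sub count_slurped {
    my ($fh, $search) = @_;
    my @lines = <$fh>;    # list context reads all remaining lines
    my $count = 0;
    for my $line (@lines) {
        $count++ while $line =~ /$search/gi;
    }
    return $count;
}

# Demo on an in-memory file so the sketch runs without haystack_1gig.txt
my $text = "Thou art here\nand thou, and THOU\nnone\n";
open my $fh1, '<', \$text or die $!;
open my $fh2, '<', \$text or die $!;
my $streamed = count_streaming($fh1, 'thou');
my $slurped  = count_slurped($fh2, 'thou');
print "streaming: $streamed, slurped: $slurped\n";
```

Both return the same count; the difference is only in how much of the file sits in memory at once, which grows with file size.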

After running the search on 10 KB, 100 KB, 1 MB, and 1 GB files, the results are as follows:

| File size | Perl (seconds) | PHP (seconds) |
|-----------|----------------|---------------|
| 10 KB     | 0.003          | 0.014         |
| 100 KB    | 0.004          | 0.018         |
| 1 MB      | 0.017          | 0.810         |
| 1 GB      | 14.581         | 824.065       |

These times cover complete execution, including interpreter startup and file loading, so they may not reflect the pure difference in regular expression performance. However, I wanted to compare the whole process of loading a string and working with it, so I measured total execution time from outside the script itself.
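If you want to isolate the regular expression work from interpreter startup and file loading, one option is to time just the matching loop from inside the script. Below is a minimal sketch using Perl's core Time::HiRes module; the sample data is made up, so the time it prints has no bearing on the figures in the table above.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # overrides time() with sub-second resolution

my $search = 'thou';
# Hypothetical sample data standing in for the file contents
my @lines = ("Thou art here\n", "and thou, and THOU\n") x 50_000;

my $start = time;           # start the clock only after the data is loaded
my $i = 0;
for my $line (@lines) {
    $i++ while $line =~ /$search/gi;
}
my $elapsed = time - $start;

printf "found %s %d times; matching took %.3f seconds\n", $search, $i, $elapsed;
```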

*Both PHP and Perl runs get faster when loading and working with the same data repeatedly (most likely because the operating system caches the file in memory after the first read), so I ran these scripts multiple times for each file size, discarded the highest and lowest results, and averaged the rest.
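The averaging step described above is a simple trimmed mean. Here is a minimal Perl sketch of it; trimmed_mean is a hypothetical helper, and the sample times are invented for illustration, since the article doesn't say how many runs were made per file size.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Average a set of run times after dropping the single highest and
# single lowest value. Needs at least three runs to leave anything.
sub trimmed_mean {
    my @times = sort { $a <=> $b } @_;
    die "need at least 3 runs\n" if @times < 3;
    my @kept = @times[1 .. $#times - 1];   # drop first (lowest) and last (highest)
    return sum(@kept) / @kept;
}

# Hypothetical run times in seconds, not measurements from this article
my @runs = (0.019, 0.017, 0.016, 0.030, 0.018);
printf "%.3f\n", trimmed_mean(@runs);
```

Dropping the extremes keeps one slow outlier (0.030 here) from pulling the average up.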

The next test involved search and replace, so I made a few simple changes to the code:

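```perl
# Replaces the inner while loop; $replace holds the replacement string.
# s///g substitutes every match in $_ and returns how many it made.
$i += s/$search/$replace/gi;
```

```php
// Replaces the preg_match_all() line; $replace holds the replacement string.
// preg_replace() returns the new string and stores the replacement count in $x.
$line = preg_replace("/$search/i", $replace, $line, -1, $x);
$i += $x;
```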
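In Perl, s///g in scalar context returns the number of substitutions it made, so a single statement can both rewrite a line and count the matches, the same count that PHP's preg_replace() reports through its fifth argument ($x here). A minimal sketch with made-up lines and a hypothetical replacement string of 'you', since the article doesn't say which replacement was used:

```perl
use strict;
use warnings;

my $search  = 'thou';
my $replace = 'you';    # hypothetical replacement; the article doesn't give one
my @lines   = ("Thou art here\n", "and thou, and THOU\n", "none\n");

my $i = 0;
for my $line (@lines) {
    # s///g rewrites every match in $line (an alias into @lines) and,
    # in scalar context, returns how many substitutions it made (false if none)
    $i += $line =~ s/$search/$replace/gi;
}

print @lines;
print "replaced $search $i times\n";
```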

The results for this test:

| File size | Perl (seconds) | PHP (seconds) |
|-----------|----------------|---------------|
| 10 KB     | 0.003          | 0.133         |
| 100 KB    | 0.006          | 0.033         |
| 1 MB      | 0.042          | 0.076         |
| 1 GB      | 14.970         | 745.906       |

So in both cases, Perl came out the dominant winner. Although in the more practical examples (files of 1 megabyte and smaller) the overall runtime difference is at most a fraction of a second, it appears that Perl has the edge over PHP on regular expression operations.
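To put "dominant" in numbers, dividing each PHP time by the corresponding Perl time gives a speed-up factor. A small Perl sketch of that arithmetic, using only the figures from the two results tables:

```perl
use strict;
use warnings;

# Execution times in seconds from the two results tables: [Perl, PHP]
my %search  = ('10 KB' => [0.003, 0.014], '100 KB' => [0.004, 0.018],
               '1 MB'  => [0.017, 0.810], '1 GB'   => [14.581, 824.065]);
my %replace = ('10 KB' => [0.003, 0.133], '100 KB' => [0.006, 0.033],
               '1 MB'  => [0.042, 0.076], '1 GB'   => [14.970, 745.906]);

# Speed-up = PHP time / Perl time (how many times faster Perl finished)
sub speedup { my ($perl, $php) = @{ $_[0] }; return $php / $perl }

for my $size ('10 KB', '100 KB', '1 MB', '1 GB') {
    printf "%-6s search %5.1fx  replace %5.1fx\n",
        $size, speedup($search{$size}), speedup($replace{$size});
}
```

By this measure Perl finished about 56 times faster on the 1 GB search and about 50 times faster on the 1 GB replace, but only about 1.8 times faster on the 1 MB replace.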



About the Author:
Michael Marr is an IT staff writer for WebProNews.





PerlProNews is an iEntry, Inc.® publication. All Rights Reserved.