LWP (Library For WWW In Perl)

By A.P. Lawrence
Expert Author
Article Date: 2005-05-27

If you want to automatically process web pages to extract data, you have a number of tools available. You can bring a web page down to your computer using "curl" or "wget":

curl http://aplawrence.com > mysite

If you don't really want the HTML, use "lynx --dump http://whatever.com > /yourstorage/whatever.txt" to get a text representation of the page. Check the man page for options you might want, like "--nolist", and also see lynx alternatives.

You can also easily be selective and pull only the data you want from a page with simple Perl scripts.

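#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $url = 'http://aplawrence.com';
# get() returns undef if the page could not be fetched
my $content = get($url);
die "Could not fetch $url\n" unless defined $content;
print $content;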


And then of course you'd process the $content as desired. It's only a little more complex if you are dealing with forms; see http://aplawrence.com/Words/2005_03_05.html for a small example of that.
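As a rough illustration of what "processing the $content" might look like, here is a minimal sketch that pulls the page title and the link targets out of a string of HTML. The sample HTML and the helper names extract_title and extract_links are made up for this example and are not part of LWP; in a real script $content would come from get(). The regular expressions are deliberately simple and will miss unusual markup, so a real parser such as HTML::Parser is likely a better fit for anything serious.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for what get($url) might return.
my $content = <<'HTML';
<html><head><title>Example Page</title></head>
<body>
<a href="http://example.com/one.html">One</a>
<a href='/two.html'>Two</a>
</body></html>
HTML

# Illustrative helper (not an LWP function): return the text of the
# first <title> element, or undef if there is none.
sub extract_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# Illustrative helper (not an LWP function): return the quoted href
# values of <a> tags. Unquoted href values are ignored.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ m{<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1}gis) {
        push @links, $2;
    }
    return @links;
}

my $title = extract_title($content);
my @links = extract_links($content);

print "Title: ", (defined $title ? $title : '(none)'), "\n";
print "Link: $_\n" for @links;
```

The same two helpers could be pointed at the $content fetched by the LWP::Simple script above instead of the sample string.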

A book that covers LWP is reviewed at http://aplawrence.com/Books/webc.html.

*Originally published at APLawrence.com

About the Author:
A.P. Lawrence provides SCO Unix and Linux consulting services at http://www.pcunix.com.





PerlProNews is an iEntry, Inc. ® publication - 1998-2008 All Rights Reserved Privacy Policy and Legal