Saturday, November 6, 2010

PHP and Perl: Extract Email Addresses

SkyHi @ Saturday, November 06, 2010
If you need to extract email addresses from a plain text database or other plain text file, the software in this article can do it for you.

A plain text file is a file that contains only regular keyboard characters. (Even if the file contains a bunch of non-keyboard characters, the email address extractor software can still extract email addresses from the plain text parts of the file.)

Perhaps you need a list of all those who have purchased Ebook Title One to let them know it has been updated and where they can pick up the latest version. Perhaps something is coming up in your life and you need to send an email to all your regular correspondents.

Whatever the reason, if the email addresses are in a plain text file, they can be extracted, even if the addresses are scattered among other text in the file.

If the data is in a CSV file, simply process the entire file with the email address extractor. The same goes for files with tab-delimited or other character-delimited fields.

Addresses can be extracted from any plain text file that has email addresses within its content.

For a MySQL database with email addresses in columns containing other text, export the data to a CSV file. Then, process the exported data file with the email address extractor.
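
A rough command-line sketch of why the delimiters do not matter: the pattern only matches runs of address characters around an @ sign, so commas, tabs, and quotation marks simply act as boundaries. The function name extract_addresses and the sample rows are made up for illustration, and the pattern is a POSIX translation of the one in the PHP script further down, not the script itself.

```shell
#!/bin/sh
# extract_addresses (hypothetical helper): print every address-like token
# found in the named files, or in standard input when no file is given.
extract_addresses() {
    # -o prints only the matched text, -h suppresses file-name prefixes.
    LC_ALL=C grep -ohE '[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+' "$@"
}

# Made-up rows standing in for an exported table: one comma-delimited,
# one tab-delimited, each with an address buried in other text.
{
    printf '%s\n' '1,"Ann Example","ann@example.com","Bought Ebook Title One"'
    printf '2\tBob Example\tReach me at bob@example.org, thanks\n'
} | extract_addresses
# prints:
#   ann@example.com
#   bob@example.org
```

Because the whole line is scanned, there is no need to know which column holds the address.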

If you use Thunderbird, Apple's Mail, or other email software that stores emails in plain text databases, addresses can be extracted from the emails. Drag a copy of the emails to be scanned into a separate folder. The plain text file on your hard drive that contains the emails for that folder can then be processed with the email address extractor.

The extracted email addresses can be used in PDQ Mailer (a WebSite's Secret exclusive), B-Mailer, or any other emailer that lets you import a list of email addresses.

Here is the source code of the software to extract email addresses. No customization is necessary. Copy it, save it with a plain text editor as EmailExtract.php, and upload it to your server.
<html>
<head>
<style type="text/css">
body { font-family:sans-serif; margin:50px 0 50px 200px; }
input, textarea, #content { width:500px; }
.check { width:20px; }
a { text-decoration:none; }
</style>
</head>
<body>
<div id="content">
<div style="float:right;">
<a href="http://www.willmaster.com/">
<img src="http://www.willmaster.com/images/wmlogo_icon.gif" 
border="0" width="50" height="50" alt="Willmaster.com logo" 
title="Making quality web site software for over a decade.">
</a>
</div>
<h3>Email Address Extraction Software</h3>
<?php
$Emails = array();
$Continue = true;
$Domain = preg_replace('/:\d+$/','',$_SERVER['HTTP_HOST']);
function ExtractAddys($s)
{
   global $Emails;
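   // Character class written with the hyphen last so it is unambiguously literal.
   preg_match_all('/([a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+)/',$s,$matched);
   foreach( $matched[1] as $addy )
   {
      // Count occurrences per lower-cased address. Square brackets replace the
      // curly-brace array syntax, which PHP 8 no longer accepts.
      $key = strtolower($addy);
      $Emails[$key] = isset($Emails[$key]) ? $Emails[$key] + 1 : 1;
   }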
}
if( count($_POST) )
{
   $_POST['file'] = trim($_POST['file']);
   $typeOfFile = 'pasted into the text box';
   if( preg_match('/^http:\/\//i',$_POST['file']) and (!preg_match('/[\r\n]/',$_POST['file'])) )
   {
      $location = preg_replace('/^http:\/\//i','',$_POST['file']);
      $location = preg_replace('/^[^\/]+/','',$location);
      $location = urldecode($location);
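      // Resolve the path and refuse anything outside the document root (for example via "../").
      $root = realpath($_SERVER['DOCUMENT_ROOT']);
      $path = ( $root === false ) ? false : realpath($root.$location);
      $fp = ( $path !== false and strpos($path,$root.DIRECTORY_SEPARATOR) === 0 ) ? fopen($path,'rt') : false;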
      if( $fp )
      {
         while( ! feof($fp) ) { ExtractAddys( fgets($fp,4096) ); }
         fclose($fp);
      }
      else
      {
         $Continue = false;
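         // Escape user-supplied values before echoing them into the page.
         echo '<h4>Unable to open file '.htmlspecialchars($location).'<br>(URL '.htmlspecialchars($_POST['file']).')</h4>';
      }
      $typeOfFile = 'in the file at URL '.htmlspecialchars($_POST['file']);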
   }
   else { ExtractAddys($_POST['file']); }
   if( $Continue )
   {
      if( count($Emails) )
      {
         echo '<p>The email addresses extracted from text '.$typeOfFile.':</p>';
         if( (!empty($_POST['sort'])) and $_POST['sort'] == 'yes' ) { ksort($Emails); }
         $separator = "\n";
         $boxheight = 250;
         if( (!empty($_POST['comma'])) and $_POST['comma'] == 'yes' )
         {
            $separator = ',';
            $boxheight = 45;
         }
         $onlyone = ( (!empty($_POST['purge'])) and $_POST['purge'] == 'yes' ) ? true : false;
         echo '<form><textarea wrap="off" style="height:'.$boxheight.'px;">';
         foreach( $Emails as $addy => $v )
         {
            echo "$addy$separator";
            if( $onlyone ) { continue; }
            for( $i=1; $i<$v; $i++ ) { echo "$addy$separator"; }
         }
         echo '</textarea></form>';
      }
      else { echo '<p>No email addresses were extracted from text '.$typeOfFile.'.</p>'; }
   }
   echo '<div style="margin:25px;"><hr width="50%"></div>';
}
?>

<p>
To extract email addresses from a plain text file, either
</p>
<ul>
<li>
paste the content of the file into the text box, or
</li>
<li>
upload the file to the <?php echo htmlspecialchars($Domain); ?> server and then type the file's URL into the text box.
</li>
</ul>
<form method="post" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
<textarea style="height:250px;" name="file"></textarea>
<p>
<input type="checkbox" class="check" name="comma" value="yes">Comma-separated on one line.<br>
<input type="checkbox" class="check" name="purge" value="yes">Purge duplicates.
<br>
<input type="checkbox" class="check" name="sort" value="yes">Sort.
</p>
<p>
<input type="submit" name="submit" value="Submit">
</p>
</form>
<p>
Copyright 2010 <a href="http://www.willmaster.com/">Bontrager Connection, LLC</a>
</p>
</div>
</body>
</html>


When EmailExtract.php is on your server, type its URL into your browser.

You'll see a text box where you can paste in the content of the plain text file that contains email addresses to extract.

If the plain text file is larger than 32k (which is the maximum form submission size for some browsers), the file can be uploaded to your server. Upload it to the same domain that EmailExtract.php is installed at. Then, instead of pasting the file into the text box, type or paste the uploaded file's http://... URL into it.

Unless modified, the software cannot be used to extract email addresses from any URL except URLs of files located on the server where the software is installed. In other words, if you install the software on your domain and someone else guesses the URL, it cannot be used to extract email addresses from web pages on other domains.
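
The restriction follows from how the script treats the URL: it discards the http:// prefix and the host name, then looks for the remaining path under its own document root, so whatever domain is typed, only a local file is ever opened. A minimal shell sketch of that mapping follows. The function name url_to_local_path and the example paths are hypothetical, and unlike the PHP script this sketch matches http:// case-sensitively and skips URL-decoding.

```shell
#!/bin/sh
# url_to_local_path DOCROOT URL (hypothetical helper)
# Mirrors the idea in the PHP script: ignore the host, keep only the path,
# and resolve that path under the local document root.
url_to_local_path() {
    docroot=$1
    url=$2
    rest=${url#http://}              # drop the scheme prefix, if present
    case $rest in
        */*) path=/${rest#*/} ;;     # drop the host, keep everything after it
        *)   path= ;;                # URL had a host but no path
    esac
    printf '%s%s\n' "$docroot" "$path"
}

# The host name is ignored, so a URL on another domain still maps to a local file.
url_to_local_path /var/www/html http://other.example.com/lists/buyers.txt
# prints: /var/www/html/lists/buyers.txt
```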

Installing the software is copy and paste. Using the software is copy and paste.

The extracted email addresses can be used with emailing software that lets you import your own list.


From a shell, a Perl one-liner can do the same job and write a sorted list with duplicates removed:

perl -wne 'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' emails.txt | sort -u > output.txt
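
One detail of the Perl pattern is worth seeing in action: the trailing \w+ forces each match to end in a letter, digit, or underscore, so a period that merely ends a sentence is not swallowed into the address. The sketch below shows the difference using grep -oE with POSIX bracket expressions standing in for the Perl classes. That translation, the sample sentence, and the function name strict_addresses are assumptions made for illustration.

```shell
#!/bin/sh
# Sample text: one address ends a sentence and appears twice.
sample='Write to ann@example.com. Or bob@example.org, or again ann@example.com.'

# Pattern without the trailing word-character requirement:
# the sentence-ending period is captured along with the address.
printf '%s\n' "$sample" | LC_ALL=C grep -oE '[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+'
# prints:
#   ann@example.com.
#   bob@example.org
#   ann@example.com.

# Pattern with the trailing requirement, followed by sort -u as in the
# one-liner: trailing punctuation is dropped and duplicates collapse.
strict_addresses() {
    LC_ALL=C grep -oE '[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+[A-Za-z0-9_]+' | LC_ALL=C sort -u
}
printf '%s\n' "$sample" | strict_addresses
# prints:
#   ann@example.com
#   bob@example.org
```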


Outlook Attachment Bounces

grep -C 2 "fatal errors" bounces.txt > step1.txt
perl -wne 'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' step1.txt | sort -u > step2.txt
sort step2.txt | uniq > step3.txt
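
To make the steps concrete: grep -C 2 keeps each line containing "fatal errors" plus the two lines before and after it, so only addresses near a failure notice survive. The extraction and sort -u then reduce them to a unique list, which also means the final sort and uniq step has nothing left to remove. The sketch below runs the same idea as one pipeline on a made-up bounce report. The report text and the function name bounced_addresses are assumptions, real bounce messages vary, and grep -oE stands in for the Perl one-liner.

```shell
#!/bin/sh
# bounced_addresses (hypothetical helper): read a bounce report on standard
# input and print the unique addresses found within two lines of a
# "fatal errors" notice.
bounced_addresses() {
    grep -C 2 'fatal errors' |
        LC_ALL=C grep -oE '[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+[A-Za-z0-9_]+' |
        LC_ALL=C sort -u
}

# Invented report: only gone@example.net sits inside the two-line context.
bounced_addresses <<'EOF'
Delivered: ok@example.com
status: sent
status: sent
Recipient: gone@example.net
The following addresses had permanent fatal errors
<gone@example.net>
(reason: 550 user unknown)
status: sent
Delivered: fine@example.org
EOF
# prints:
#   gone@example.net
```

If the addresses in your bounce reports sit further from the "fatal errors" line, the context size passed to -C would need to grow accordingly.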



REFERENCES
http://www.willmaster.com/library/email/extract-email-addresses.php
http://bytes.com/topic/unix/answers/648158-extract-email-addresses-big-file
http://www.getfreefile.com/mailhuntscreenshots.html