How to extract email addresses and Twitter usernames from a text file

Sometimes you need to extract email addresses or something similar from a large piece of text. A couple of weeks ago I had to do this for my wife: I had to extract every website address from a book for her PhD exam.

So I thought it would be useful to have a script that can also extract email addresses and Twitter usernames.

One of the best tools for this kind of task is awk:


The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language used by awk extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions.


AWK is one of the early tools to appear in Version 7 Unix and gained popularity as a way to add computational features to a Unix pipeline. A version of the AWK language is a standard feature of nearly every modern Unix-like operating system available today. AWK is mentioned in the Single UNIX Specification as one of the mandatory utilities of a Unix operating system. Besides the Bourne shell, AWK is the only other scripting language available in a standard Unix environment. It is also present amongst the commands required by the Linux Standard Base specification.
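
To make the description above a little more concrete, here is a small sketch of awk used inside a pipeline, with a regular expression test and an associative array. The input lines and the output file name counts.txt are made up for illustration only.

```shell
# Hypothetical input: three short lines piped into awk.
printf '%s\n' 'Alice met Bob' 'Bob met Carol' 'alice met Bob' |
awk '
{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^[[:upper:]]/) {   # regular expression test on one field
      count[$i]++                # associative array keyed by the word
    }
  }
}
END {
  for (word in count) {
    print word, count[word]
  }
}' | sort > counts.txt
cat counts.txt
```

With this input, counts.txt should contain "Alice 1", "Bob 3" and "Carol 1", one per line. The order in which awk walks an associative array is unspecified, which is why the output goes through sort.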


Here are the scripts:


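The first script extracts every email address from the file doc.txt (a plain text file) and writes them to email.txt, one per line. It prints each whitespace-separated word that contains a letter, an @ and another letter, so any punctuation attached to the address is kept.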

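awk '
{
  # check every whitespace-separated field on the line
  for (i = 1; i <= NF; i++) {
    # a letter, then "@", then a letter: looks like an email address
    if ($i ~ /[[:alpha:]]@[[:alpha:]]/) {
      print $i
    }
  }
}' "doc.txt" > email.txt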
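
If the attached punctuation is a problem, one possible variation is to print only the part of each word that looks like an address, using awk's match() function together with RSTART and RLENGTH. This is a minimal sketch, not a full email validator: the pattern is a rough approximation, and the file names sample.txt and sample_emails.txt, along with the sample text, are invented for the demonstration.

```shell
# Hypothetical sample input, only for demonstration.
cat > sample.txt <<'EOF'
Write to john.doe@example.com, or (mary1@mail.example.org).
Not an address: user@ or @home
EOF

awk '
{
  for (i = 1; i <= NF; i++) {
    s = $i
    # match() sets RSTART and RLENGTH to the leftmost-longest match
    while (match(s, /[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]+/)) {
      print substr(s, RSTART, RLENGTH)   # only the address itself
      s = substr(s, RSTART + RLENGTH)    # keep scanning the rest
    }
  }
}' sample.txt > sample_emails.txt
cat sample_emails.txt
```

For the sample text this should leave john.doe@example.com and mary1@mail.example.org in sample_emails.txt, without the comma, the brackets or the final dot. Unlike the script above, it also accepts digits in front of the @.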



The second one extracts every Twitter username from the file doc.txt (a plain text file) and writes them to usernames.txt, one per line.

 

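awk '
{
  # check every whitespace-separated field on the line
  for (i = 1; i <= NF; i++) {
    # "@" followed by a letter, where the "@" starts the field or follows
    # a character that cannot end the name part of an email address,
    # so that fields like name@example.com are not reported as usernames
    if ($i ~ /(^|[^[:alnum:]._-])@[[:alpha:]]/) {
      print $i
    }
  }
}' "doc.txt" > usernames.txt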
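
A possible refinement, sketched under some assumptions: print only the handle itself, skip anything that is really part of an email address, and list each handle once. The sketch assumes that a handle is an @ followed by letters, digits or underscores. The file names sample_tweets.txt and sample_usernames.txt and the sample text are hypothetical.

```shell
# Hypothetical sample input, only for demonstration.
cat > sample_tweets.txt <<'EOF'
Thanks @alice_dev and (@Bob42), see also @alice_dev!
Mail me at carol@example.com instead.
EOF

awk '
{
  for (i = 1; i <= NF; i++) {
    s = $i
    prev = ""    # character just before the current value of s
    while (match(s, /@[[:alnum:]_]+/)) {
      # character just before the "@" (empty at the start of the field)
      before = (RSTART > 1) ? substr(s, RSTART - 1, 1) : prev
      handle = substr(s, RSTART, RLENGTH)
      # skip email addresses and print each handle only once
      if (before !~ /[[:alnum:]._-]/ && !(handle in seen)) {
        seen[handle] = 1
        print handle
      }
      prev = substr(s, RSTART + RLENGTH - 1, 1)
      s = substr(s, RSTART + RLENGTH)
    }
  }
}' sample_tweets.txt > sample_usernames.txt
cat sample_usernames.txt
```

For the sample text this should write @alice_dev and @Bob42 to sample_usernames.txt. The seen array is what removes the duplicate, and carol@example.com is skipped because its @ comes right after a letter.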



Gg1





