Nov/092
Custom word lists with wyd.pl
For those of you who don't know what a word list is, a word list is... dun dun dun... a giant list of words.
They can be used for many things, but in our case they are used for brute forcing passwords for anything from a user's login to a website to the WPA-PSK passphrase a company may be using to protect its wireless network from intruders. A quick Google search will turn up a bunch of websites that host word lists. For higher quality word lists, though, you may have to fork out some cash.
This is all fine and dandy, but these word lists are usually based on the English language. What if a company has created a password that is tailored to the company itself? For example, it might use the last name of the company's founder as its WPA-PSK password. If a company has done this, there goes the word list you found based on almost every word in the English dictionary. In comes wyd.pl!
wyd.pl was developed by Max Moser & Martin J. Muench and is included on BackTrack. The general idea behind the tool is to gather data about a specific target and generate a word list based on this data. Enough talk, let's see this tool do its magic. In three simple commands we can have a word list based on all the data we can collect from a target website.
wget -r http://www.mstaint.com
wyd.pl -n -o wordlist.tmp mstaint.com/
cat wordlist.tmp | sort | uniq > wordlist.txt
I started off above by doing a recursive wget of this website, www.mstaint.com. This creates a directory named mstaint.com and spiders through the website, downloading each page it can reach.
Next is where wyd does its magic. Take a look at my output below:
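Since wyd can only mine what wget actually saved, it can help to see what ended up in the download directory first. Here is a minimal sketch of that check; count_extensions is a hypothetical helper of my own, not part of wget or wyd, and it assumes file names look like the ones wget saves (possibly with a ?query suffix).

```shell
# Hypothetical helper (not part of wget or wyd): count the downloaded files
# per file extension under a mirror directory.
count_extensions() {
    # $1 = directory created by the recursive wget
    find "$1" -type f |
        sed 's/?.*$//' |                                    # drop ?ver=... style suffixes
        sed -n 's|.*/[^/]*\.\([A-Za-z0-9]\{1,\}\)$|\1|p' |  # keep only the extension
        sort | uniq -c | sort -rn                           # most common extension first
}
```

Running count_extensions mstaint.com/ prints one line per extension with a count in front. Files without an extension are left out of the tally.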
enigma:wyd gerry$ ./wyd.pl -n -o wordlist.tmp mstaint.com/
*
* ./wyd.pl 0.2 by Max Moser and Martin J. Muench
*
* Error initializing some modules:
wlgmod::doc: Cannot find 'catdoc' (http://www.45.free.net/~vitus/software/catdoc/)
wlgmod::odt: Canont find module OpenOffice::OODoc (http://www.cpan.org/modules/index.html)
wlgmod::mp3: Cannot find 'mp3info' (http://www.ibiblio.org/mp3info/)
wlgmod::pdf: Cannot find 'pdftotext' (http://www.foolabs.com/xpdf/)
wlgmod::jpeg: Cannot find 'jhead' (http://www.sentex.net/~mwandel/jhead/)
wlgmod::ppt: Cannot find 'catppt' (http://www.45.free.net/~vitus/software/catdoc/)
Ignoring file 'mstaint.com/me.jpg'
Ignoring file 'mstaint.com/xmlrpc.php?rsd'
Wide character in print at ./wyd.pl line 153.
Wide character in print at ./wyd.pl line 153.
Ignoring file 'mstaint.com/wp-content/plugins/google-analyticator/external-tracking.min.js?ver=5.3.1'
Ignoring file 'mstaint.com/wp-content/themes/lightword/style.css'
Ignoring file 'mstaint.com/wp-content/themes/lightword/js/cufon.js'
Ignoring file 'mstaint.com/wp-content/themes/lightword/js/mp.font.js'
Ignoring file 'mstaint.com/wp-content/themes/lightword/js/tabs.js'
Ignoring file 'mstaint.com/wp-includes/wlwmanifest.xml'
Ignoring file 'mstaint.com/wp-includes/js/comment-reply.js?ver=20090102'
Ignoring file 'mstaint.com/wp-includes/js/jquery/jquery.js?ver=1.3.2'
** Done
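The "Error initializing some modules" lines in the output did not stop the run (it still reached "** Done"), but they do suggest wyd could get more out of documents, PDFs, MP3s and images if those helper programs were installed. A quick way to see which ones are missing before running wyd might look like the sketch below. missing_tools is a hypothetical helper of my own, not something that ships with wyd; the tool names come straight from the error output.

```shell
# Hypothetical pre-flight check (not part of wyd): print the name of every
# helper program passed as an argument that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, missing_tools catdoc catppt mp3info pdftotext jhead prints only the tools you still need to install. OpenOffice::OODoc is a Perl module rather than a program, so it needs a separate check such as perl -MOpenOffice::OODoc -e 1.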
Taking a look at wordlist.tmp, we can see that wyd created a pretty extensive word list out of everything it could read in the data we downloaded earlier. One thing you may notice is that nothing is in order and there may be duplicates. Let's fix this with our final command.
enigma:wyd gerry$ cat wordlist.tmp | sort | uniq > wordlist.txt
Our last little command sorts the list alphabetically and then runs uniq on it, which removes all of the duplicates.
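If the list is headed for a WPA-PSK attack, you can go one step further. sort -u does the sort and the duplicate removal in a single step, and since a WPA-PSK passphrase has to be between 8 and 63 characters long, anything outside that range is wasted effort. The sketch below combines both. clean_wordlist is a hypothetical helper name of my own, not part of wyd.

```shell
# Hypothetical helper (not part of wyd): remove duplicates from a word list
# and keep only entries that are 8 to 63 characters long, the length range
# allowed for a WPA-PSK passphrase.
clean_wordlist() {
    # $1 = raw word list, $2 = output file
    # LC_ALL=C makes sort compare byte by byte, so two different words are
    # never treated as duplicates because of locale collation rules.
    LC_ALL=C sort -u "$1" |
        awk 'length($0) >= 8 && length($0) <= 63' > "$2"
}
```

Called as clean_wordlist wordlist.tmp wordlist.txt, this writes the trimmed list to wordlist.txt. Note that byte order puts uppercase words before lowercase ones, which is a little different from the plain sort above.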
Take a look and enjoy your new word list in wordlist.txt!
Enjoy!