BYU Speeches player with history

Update 10/26/2012: The history functionality needed some work. I’ve tweaked it to only log a play if you listen for at least 30 seconds (to avoid accidentally logging something that you didn’t actually listen to), and it won’t log additional plays after the first play until you’ve switched to a new talk (so you don’t end up with 15 “you started…” records in a row for the same talk). I made the talk title and author name link to the appropriate pages so you can easily find what you want to listen to. I’ve also added some social sharing stuff (per Joey’s request) in case you really like a talk and want to send it to a friend.

Original post:

The BYU Speeches website provides over a thousand devotionals and firesides in MP3 format. The speakers include people you’ve never heard of, and a bunch of big name people, too (like apostles and prophets). I used to download the PDFs of the talks and read them, but I thought I’d give listening a try.

I stumbled on a player written by a friend of mine, Joey Novak, but the pretty-looking version of it wasn’t working on my computer and it was missing a few features I wanted. Rather than bug him to change it, I decided I’d just make my own. I got to work on it this morning and have a pretty functional player ready for use. There are a lot of features I’d like to implement, but I’m not going to go too crazy devoting tons of time to this until I get a chance to use it and see what would actually be useful.

The first step in building this was scraping the data. I wrote a small script that looped through all the content at BYU Speeches and scraped the author(s), speech title, date it was given, and the URL to the MP3. I shoved all of this into a database. Next I searched for a free MP3 player script and found the same one that Joey was using. Turns out everyone uses jPlayer. It didn’t take long to get a basic working player up with a random playlist selected from my database. A nice perk of using jPlayer is that it looks great on a mobile phone. I spent a few more minutes and added the ability to pull up a playlist for a specific speaker.
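
The scraper itself was nothing fancy. Here’s a rough sketch of the approach in Python, assuming the requests and BeautifulSoup libraries; the listing URL and CSS selectors are placeholders (the real markup on the BYU Speeches site is different), and I’m using SQLite purely for illustration:

    # Rough sketch of the scraping step; URL, selectors, and schema are illustrative.
    import sqlite3
    import requests
    from bs4 import BeautifulSoup

    conn = sqlite3.connect("speeches.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS talks
                    (id INTEGER PRIMARY KEY, author TEXT, title TEXT,
                     given_on TEXT, mp3_url TEXT)""")

    listing_url = "https://speeches.byu.edu/talks/?page=%d"  # placeholder
    for page in range(1, 100):
        soup = BeautifulSoup(requests.get(listing_url % page).text, "html.parser")
        for item in soup.select(".talk"):  # selector is a guess
            conn.execute("INSERT INTO talks VALUES (NULL, ?, ?, ?, ?)", (
                item.select_one(".speaker").get_text(strip=True),
                item.select_one(".title").get_text(strip=True),
                item.select_one(".date").get_text(strip=True),
                item.select_one("a[href$='.mp3']")["href"],
            ))
    conn.commit()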

Where I spent most of my time was working on a history system. I’ve added the ability to log in using a Google account, and then I’ll (confidentially) track what you listen to. This needs some work and I’m not satisfied with how it looks, but the basic functionality is there. It will let you know when you started listening to an item, when you finished listening to it, and (bonus!) it won’t select stuff you’ve already listened to when you pull up a random playlist. That last part is what I really wanted, because it drives me nuts to get 15 minutes into a talk only to realize I’ve already heard it. With about 1,400 talks at around 30 minutes each, I doubt I’ll ever run out of stuff to listen to.
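
The “skip what you’ve already heard” part boils down to a single query against the play history. Continuing with the illustrative schema from the scraper sketch above, and assuming a history table keyed by user and talk (not the real schema), it looks something like this:

    # Build a random playlist that leaves out talks the user has already played.
    # The history(user_id, talk_id, ...) table is illustrative, not the real schema.
    def random_playlist(conn, user_id, limit=25):
        return conn.execute("""
            SELECT author, title, mp3_url
            FROM talks
            WHERE id NOT IN (SELECT talk_id FROM history WHERE user_id = ?)
            ORDER BY RANDOM()
            LIMIT ?""", (user_id, limit)).fetchall()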

I think I’ll eventually modify it to only log that you started listening to something once you’ve been listening for 60 seconds or so. That gives you time to decide whether you actually want to listen to it without the player falsely recording it as something you’ve listened to. I’m also planning on adding a “resume” function, so it will remember what you were listening to last and how far in you got and can automatically pick up where you left off when you pull the page up.

I also need to write an RSS parser so I can update my database whenever they announce new content. I don’t expect it will take long; I just haven’t gotten around to it yet.
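
When I do sit down to write it, it probably won’t amount to much more than this (a sketch using the feedparser library, with a placeholder feed URL and the same illustrative database as above):

    # Sketch of the planned RSS updater; the feed URL is a placeholder.
    import feedparser

    feed = feedparser.parse("https://speeches.byu.edu/feed/")  # placeholder
    for entry in feed.entries:
        mp3 = next((link["href"] for link in entry.get("links", [])
                    if link.get("type") == "audio/mpeg"), None)
        if mp3 is None:
            continue
        if conn.execute("SELECT 1 FROM talks WHERE mp3_url = ?", (mp3,)).fetchone():
            continue  # already in the database
        conn.execute("INSERT INTO talks VALUES (NULL, ?, ?, ?, ?)", (
            entry.get("author", ""), entry.get("title", ""),
            entry.get("published", ""), mp3))
    conn.commit()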

Anyways, click here to check out my BYU Speeches player.

Backing up to Amazon Glacier

I built my business on the premise that data is valuable. My most profitable websites are the result of hundreds of hours of gathering and organizing data, so backups are important to me. I have a daily cron job that gathers all my databases, configuration files, keys, backup scripts, and a few other random bits of information into a compressed archive. The compression ratio is pretty good: about 1.8GB of data compresses down to about 112MB.
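
The script itself is nothing special. Roughly, it does something like this (a sketch in Python; my actual script is different, and the paths and directory list here are placeholders):

    # Rough sketch of the nightly backup job; paths and names are placeholders.
    import subprocess
    import tarfile
    import time

    stamp = time.strftime("%Y-%m-%d")
    dump_path = "/root/backups/databases-%s.sql" % stamp
    with open(dump_path, "w") as dump:
        subprocess.check_call(["mysqldump", "--all-databases"], stdout=dump)

    archive_path = "/root/backups/backup-%s.tar.gz" % stamp
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(dump_path)
        for path in ("/etc/nginx", "/root/.ssh", "/root/scripts"):  # configs, keys, scripts
            tar.add(path)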

For the past couple of years I’ve been using Dropbox to store the backups. I have the command-line version of Dropbox installed, so the backup script just has to copy the backup archive into a folder, then Dropbox uploads it and pushes it out to my home computers. This has been working great for the most part, but it has a few downsides. First, it is sort of expensive. I can’t fit all the backups I want plus my own personal files into a free Dropbox account, so I’ve been paying $99/year for Dropbox Pro ($8.25 per month). Second, it isn’t terribly secure. Dropbox employees have access to your data, which isn’t really unexpected, but it still isn’t something I’m thrilled about. Last, it makes all my home computers download a 100MB+ archive every morning, which is particularly annoying on my laptop while traveling.

So I decided to switch to Amazon Glacier. They charge $0.01 per GB per month for long-term storage. The pricing model is very wonky, with lots of random little fees here and there depending on what you do, but as best I can tell I’ll be paying less than $0.40 per month to store a rolling 100 days of backups.
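
The storage part of the bill is at least easy to estimate; it’s the per-request and retrieval fees that make it wonky. The back-of-the-envelope math:

    # Back-of-the-envelope storage cost for a rolling 100 days of daily backups.
    daily_archive_gb = 0.112                 # ~112MB per day
    stored_gb = daily_archive_gb * 100       # ~11.2GB retained at any one time
    monthly_storage = stored_gb * 0.01       # $0.01/GB/month -> ~$0.11/month
    # Upload-request and retrieval fees push the real total a bit higher.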

To do the actual heavy lifting I decided to use glacier-cli, a Python tool that provides a simple interface to Glacier and is easy to use from a cron or bash script. It requires Python 2.7, but my server only had Python 2.6. I was able to follow the instructions at Too Much Data to install Python 2.7 alongside my existing installation. Next, I followed the installation instructions for glacier-cli to clone the required scripts. I also set up environment variables to hold my AWS access key ID and secret access key. Finally, I was ready to go. Or so I thought. I ran a simple command to list my vaults:

    /usr/local/bin/python2.7 /root/glacier-cli/glacier.py vault list

Egh. It didn’t work. I was missing some required Python modules. I spent a fair amount of time installing the missing dependencies using the easy_install-2.7 command. I was also missing the SQLite headers, which I installed using yum.

I then tried again with glacier.py and it worked! Yay! I created a vault, uploaded a file, and finally modified my backup script to use Glacier instead of Dropbox.
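
The change to the backup script really just amounts to swapping the copy-into-Dropbox step for a call to glacier.py. A sketch of that step (the vault name here is made up, and if you’re following along, glacier.py’s help output has the exact subcommands):

    # Sketch: push the nightly archive to Glacier instead of copying it to Dropbox.
    # Vault name is a placeholder; AWS credentials come from the environment
    # variables set up earlier; archive_path is from the backup sketch above.
    import subprocess

    GLACIER = ["/usr/local/bin/python2.7", "/root/glacier-cli/glacier.py"]

    # One-time: create the vault.
    subprocess.check_call(GLACIER + ["vault", "create", "daily-backups"])

    # Every night: upload the compressed archive built by the backup script.
    subprocess.check_call(GLACIER + ["archive", "upload", "daily-backups", archive_path])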

Anyways, I’m pretty excited. I’ve canceled my Dropbox Pro subscription and completely removed Dropbox from my server. It has freed up a pile of space on my computer, saved me some unneeded bandwidth, and let me store backups longer than I ever have before. It is so inexpensive that I’ll likely start a weekly backup of all the files for my websites, too (about 7.4GB uncompressed if I leave out Fake Name Generator order files). This data is almost entirely in git on at least one other server, so I’m not terribly worried about losing it, but it doesn’t hurt to have redundancy in backups.

Switching from Route 53 to DNS Made Easy

I discovered recently that, sometimes and for some people, my Amazon Route 53 DNS is quite slow. I started to dig into this and discovered that, from several different servers I have access to, Route 53 is slower than the free DNS I get from SoftLayer. Since I am paying for Route 53 and I care a lot about how fast my websites are, I decided it was time to switch to something better.

My hunt for better DNS brought me to DNS Made Easy. I did some research and some testing against domains already using DNS Made Easy, and discovered that they are drastically faster than both Route 53 and the free DNS I get from SoftLayer. For $5/month you get up to 25 domains and up to 10 million queries. Additional queries can be purchased at a discount up front, or automatically billed in the event you unexpectedly go over. They also provide some awesome features like vanity name servers and DNS failover.

Anyways, it took me about 3 minutes to get everything switched over, and now I’m just waiting for the changes to propagate.

Ich bin ein Berliner

Today I completed my first website designed entirely for a foreign-language audience. Wegwerf-eMail-Adresse is a clone of my Fake Mail Generator site, but has been tailored for German visitors.

It was a lot of fun to work on. Most of the work was straightforward (replacing English with German), but there were a few interesting bits having to do with date/time formatting, time zones, and making sure the URLs were all in German.
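
For example, German dates put the day first and spell out German month names, which the standard locale machinery handles for you. Here’s a quick illustration in Python of the kind of thing involved (the actual site may do this differently; it assumes a de_DE locale is installed on the server):

    # Illustration: format a timestamp the way a German visitor expects.
    # Assumes the de_DE.UTF-8 locale is installed; zoneinfo needs Python 3.9+.
    import locale
    from datetime import datetime
    from zoneinfo import ZoneInfo

    locale.setlocale(locale.LC_TIME, "de_DE.UTF-8")
    now = datetime.now(ZoneInfo("Europe/Berlin"))
    print(now.strftime("%d. %B %Y, %H:%M Uhr"))  # e.g. 26. Oktober 2012, 14:30 Uhr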

Why you shouldn’t have fake pages on your site

I’m in the market for some rack space in a colocation facility. I’ve been running the numbers and it looks like I could save some substantial cash and add redundancy to my websites by buying a couple of servers instead of renting from SoftLayer. But where to colocate?

Ideally it’d be somewhere I already live, somewhere I’m moving to, or somewhere near someone I visit often. I don’t plan on living in Connecticut any longer than I have to, but I have no idea where I’m going to move, so that leaves me with the option of somewhere near someone I visit often. My parents have a goal of moving overseas, so that leaves Becca’s parents in the Roanoke, VA area.

So I search for “roanoke, va colocation” and lucky me! The first result is a Roanoke colo from a company called Coloco! I check the pricing, spend time crunching numbers, check my bandwidth usage to see what I need, price out servers, etc. Becca then asks where the colo is actually located. I search their site but can’t find an address. Weird. They give the addresses of other locations, but the page definitely says Roanoke. Where in the world is this colo?

And then I realize what is happening. This company has flooded Google with fake pages that say whatever city name you are looking for. To test my theory, I visit: http://www.coloco.com/colo/colocation_in_your%20mom’s%20basement.HTM

Sure enough, I’m greeted with this entirely convincing sales pitch (emphasis added):

Grrr. I’ve just wasted 30-45 minutes evaluating a spammy company that is at least 3 hours away from where I want to host my servers. I’m not sure which misguided individual decided it’d be a good idea to introduce their company to the world using blatant lies, but I’m definitely not going to host with these guys.

I decided to thank them for wasting my time by offering them some free SEO services. I’ve submitted a few of their URLs to Google that were missing before, including your mom’s basement, the ghetto, and the ball pit at your local McDonald’s. I sure hope it gives them some extra traffic.

How iptables earned me an extra $500 per year

A few weeks ago I started taking a more active role in monitoring the traffic going to my server. I discovered that lots of people were scraping my sites; in other words, they were writing programs to extract the data from my sites rather than actually browsing them in something like Chrome or Firefox. Very rude.

So I started using iptables, a Linux program that lets you configure the kernel firewall, to block IP addresses that were obviously abusing my services.

One of these scrapers was very persistent. They were scraping my ABA Number Lookup site instead of using the very inexpensive API that I provided. As soon as I blocked an IP, a new one started up. I probably would have let them get away with it, but their programming was atrocious: within the space of a few minutes they were looking up the same routing numbers dozens of times instead of unique ones. So I kept blocking their IPs until they apparently ran out, and the scraping stopped.
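
Blocking an address with iptables is a one-liner (iptables -A INPUT -s <address> -j DROP), and that repeated-lookup pattern is easy to spot in the access logs. Here’s a sketch of that kind of check in Python (not my actual process; the log path, log format assumption, and threshold are placeholders):

    # Sketch: find IPs hammering the same URL over and over, then block them.
    # Log path, log format assumption, and threshold are placeholders.
    import subprocess
    from collections import Counter

    hits = Counter()
    with open("/var/log/nginx/access.log") as log:
        for line in log:
            fields = line.split()
            if len(fields) > 6:
                hits[(fields[0], fields[6])] += 1  # (client IP, request path)

    abusers = {ip for (ip, path), count in hits.items() if count > 100}
    for ip in abusers:
        subprocess.check_call(["iptables", "-A", "INPUT", "-s", ip, "-j", "DROP"])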

A few days later I was hanging out with my family when my cell phone started ringing on my business line. I answered and was greeted by an individual who needed help signing up for the API. I gave him the information he needed, and then he bashfully asked if I could unblock their IP addresses. Ah hah! This was the man who had been hammering my server! Turns out he works for a finance-related company on Wall Street, and instead of paying the measly $1 per thousand look-ups he was scraping my site.

So now they are using the API like they should have been the whole time, and I’m making an extra $500 per year. Yay!

Moral of the story: Sometimes it pays to check your logs.
