Zend Alternatives

When most people think of PHP they think Zend, but did you know that Zend isn't the only company that makes a PHP engine? There are several competing compilers and interpreters for PHP, each with its own slightly different take on the language.

Facebook’s HipHop

HipHop takes normal PHP code and transforms it into C++, which is then compiled into machine code. Code compiled by HipHop runs significantly faster than normal PHP run with the Zend Engine.

It has its disadvantages though. HipHop is only capable of transforming a subset of the PHP language. Some important (and some may say evil) features, such as eval, have been removed to make it possible to transform from PHP to C++. It also has a few other limitations, such as only supporting PHP 5.2 and 64-bit operating systems.

IBM’s WebSphere sMash

WebSphere sMash is a Java implementation of a PHP runtime environment. The PHP code is compiled into Java bytecode which runs on the Java Virtual Machine. The major benefit of running things in Java is that it is portable to more operating systems and allows you to use things like database connection pooling.

While it does support more PHP functionality than HipHop, it is still missing some features. However, it is capable of running a few major open source packages out of the box, including SugarCRM, MediaWiki, and WordPress.

Caucho’s Quercus

Quercus is another Java implementation of a PHP runtime environment. It too allows you to use Java features like connection pooling, and supports major open source packages like MediaWiki and WordPress.

The paid version allows you to pre-compile your PHP scripts so they run super-fast. Unfortunately, this is cost-prohibitive for users who have more than 1 CPU in their server (like me) or plan on having more than 1 CPU in the future (most anyone with a vaguely popular website).

Phalanger

Phalanger is a strange creature. It is sort of like PHP.NET. It takes your normal PHP code and compiles it into MSIL, which can then be run by .NET or Mono. The major advantage of using Phalanger is that it makes it possible to access .NET classes (written in, say, C# or VB.NET) from PHP.

Thanks to Mono, .NET isn’t limited to Windows. This means you can write Gtk applications in PHP. I hope to have something released in the near future that uses Phalanger.

Fixing Git

I use git as my version control system for all of my sites. It helps to reduce disk usage by occasionally running git gc. Last night when I went to run git gc on one of my sites, I received a message I had never seen before telling me that my git repo was corrupt. Corrupt? Oh noes!

So I searched the internet and discovered I could run git fsck --full to figure out what the problem was:

$ git fsck --full
broken link from    tree e79ab5efffca93a784afdb9cef73801eb1fa9db4
              to    tree fb9d707c76dde3f8cb672d1c07ca046615175587
dangling tree 1003e43ee3725b0482193aea9d82ca527c2903cc
dangling blob 361211b552b7910299e63b6627944ff72b0577eb
dangling blob 8e79df8dbe87431fbec8f25a4c6527fabe4d9e6f
dangling blob 9b3a891e5094920f743c36a49d1647d9aad2b2f0
missing tree fb9d707c76dde3f8cb672d1c07ca046615175587

After digging around the internet, I discovered that my repo was missing a tree, and that I could get it out of a cloned copy of the repo, assuming the cloned copy wasn’t corrupt, too.

I created a new empty repo:

$ mkdir new-repo
$ cd new-repo
$ git init

Next, I copied my pack file from the good repo to the new empty repo (there was only one in my pack folder):

$ cd /www/path/to/good/repo
$ cp .git/objects/pack/pack-c376dc3b9d2ce5f4d9ab9269d29ddc82f6f6a822.pack /path/to/new-repo

I unpacked the pack file to see if I could find the missing tree:

$ cd /path/to/new-repo
$ git unpack-objects < pack-c376dc3b9d2ce5f4d9ab9269d29ddc82f6f6a822.pack

My missing tree was fb9d707c76dde3f8cb672d1c07ca046615175587, so I looked to see if my file was there:

$ cd .git/objects/fb/
$ ls -l

9d707c76dde3f8cb672d1c07ca046615175587 showed up in the list, so yay! My missing tree was there! Now to copy it to the corrupt repo (my corrupt repo was a bare repo, so no .git in the path):

$ mkdir -p /path/to/corrupt/repo/objects/fb
$ cp /path/to/new-repo/.git/objects/fb/9d707c76dde3f8cb672d1c07ca046615175587 /path/to/corrupt/repo/objects/fb/9d707c76dde3f8cb672d1c07ca046615175587

That was it for me! I ran git gc and everything worked without a problem.
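If you end up in the same boat, it's worth re-running the check after copying the object over, just to confirm the repair actually took:

$ git fsck --full
$ git gc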

Hopefully this helps someone, or at least will help me remember what the heck to do next time my repo gets corrupted.

DoS using a single client

Have you ever heard of Slowloris? It has been around for a little over a year, but fortunately I have never had the “pleasure” of dealing with it.

This short Perl script (less than 350 lines of actual code) is capable of turning your lowly desktop computer into a server killing monster. Traditional denial of service attacks use several clients (hundreds, sometimes thousands) to overwhelm the target server. The clients make as many requests as they can from the target server, causing it to use all its resources responding to the requests. Slowloris has a different approach.

Instead of using hundreds of clients, a Slowloris attack can often be successfully run from a single client. And instead of overloading the server and utilizing a pile of bandwidth, Slowloris leaves the load on the target server at near zero and uses almost no bandwidth. It still makes a pile of requests, but it makes ….. them ….. nice ….. and ….. slow. Most web servers are only capable of handling a certain number of requests at a time (say, 150), so if you start 150 requests to the target server and leave those requests open for 10, 20, or even 30 minutes, you've effectively made it impossible for anyone else to get a valid request through to the web server. The target server isn't actually doing any work, so the load doesn't go up. What makes it even more annoying is that the requests don't show up in the log until they fail (if they ever do), which makes it harder to figure out what the heck is going on.
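To give a rough idea of what that looks like on the wire (this is just a sketch of the idea, not output from the actual script, and the hostname is made up), each connection sends a request header that looks legitimate but never finishes:

GET / HTTP/1.1
Host: victim.example.com
X-a: b

Another meaningless header like that last one trickles out every few seconds, and the blank line that would normally end the request never gets sent, so the server just sits there waiting.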

There are ways to mitigate the effectiveness of this type of attack, the most common being the use of a proxy to sit between the client and the web server (like haproxy or CloudFlare) and/or installing a bunch of Apache modules. Fortunately I’m doing both so hopefully I’ll be fine if someone decides they don’t like me much and wants to take down my server.
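As a rough example of the Apache module route, mod_reqtimeout (bundled with recent Apache versions) can drop connections that send their headers too slowly. On a Debian/Ubuntu-style install it looks something like this; the timeout numbers here are just a starting point, not a recommendation:

sudo a2enmod reqtimeout
# then, in the Apache config, drop requests whose headers dribble in too slowly:
#   RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500
sudo service apache2 restart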

As a final note, I wholeheartedly discourage you from trying out the software unless you only do it on your local network. Attacking someone's server is probably illegal and is likely to get your internet service shut off. In other words, don't be stupid.

meld > diff

I was recently assigned a task at work that required finding the differences between one directory full of configuration files and another. Normally I'd use diff to look at the differences between the files and figure out what had changed. With a directory of over 100 files, this wasn't really feasible.

Enter meld, an awesomely awesome GUI-fied tool for Linux that not only lets you compare files, but also lets you compare entire directories full of files, and even three files at the same time. Very cool. It is also in the Ubuntu repos, which is a plus.

In less time than it would take to interpret a diff on a single file, I quickly saw which files only existed in the original directory, which only existed in the new directory, and which were in both but different. The diff for each file was only a double-click away.

Pretty cool app. If you are on an Ubuntu-based system, check it out by running: sudo apt-get install meld
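Once it's installed, comparing two directories is just a matter of handing meld both paths (the paths here are obviously made up):

meld /path/to/original-configs /path/to/new-configs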

All The Domains in the World

After waiting months I’ve finally been given access to the .com TLD zone file! (And the .net, but who cares about .net, right?)

So what is a zone file, you ask? Basically this file keeps track of how to access every .com domain name in the entire world. Well, technically not all of them, just the ones that have name servers associated with them, but practically all of them. So how many .com’s are there in the zone file? Over 88 million! Holy cow that is a lot of domains!

So the first problem is how to use the zone file. It is a 6.5 GB text file so you can’t just open it up and say “hey is jacoballred.com taken?” and expect a quick reply. On top of that, it isn’t even designed to give you a list of taken domain names, that is just a happy side effect of keeping track of how to access all the .com’s in the world.

My solution was to preprocess the data using a few Linux utilities, then load it into MySQL.

The zone file looks a little like this:

 NS E.GTLD-SERVERS.NET.
 NS M.GTLD-SERVERS.NET.
$TTL 172800
ENERCONTECHNOLOGIES NS NS1.BIZ.RR
ENERCONTECHNOLOGIES NS NS2.BIZ.RR
SELF-DRIVE-CAR-RENTAL NS NS3.IZP

None of the domains in the file have .com on the end, and each one lists a nameserver after it. There are also entries that aren't plain .com domains (like the nameserver records at the top) and other random markers in the file ($TTL), none of which I care about. All I want are the domain names, so I use Linux to strip out the stuff I don't want:

sed -e '/^[^A-Z0-9]/d' -e '/^$/d' -e 's/ .*$//' -e '/[^A-Z0-9\-]/d' com.zone \
| sort -u \
| awk -F "" '{close(f);f=$1}{print > "com.zone.split."f}'

Wow, doesn't that look fun? So let's go over it.

That first line uses sed to load in the zone file (com.zone) and remove all lines that don't start with A-Z or 0-9 (the only valid characters for the first character of a .com domain), then it removes blank lines, then it removes all but the first word on each line (gets rid of the nameserver after the domain name), and finally removes any line that has characters that aren't allowed in a domain (anything other than A-Z, 0-9, or a dash). This gets a list of JUST the domain names (without the .com), but it still has duplicates and isn't in any particular order.

The next line sorts the list of domains and removes duplicates.

The last line uses awk to split the list of domains into 36 separate files, one for each starting character (A-Z, 0-9). This isn’t technically needed but makes things more convenient.
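To make that concrete, feeding the sample lines above through the command would leave a couple of tiny split files, something like:

$ cat com.zone.split.E
ENERCONTECHNOLOGIES
$ cat com.zone.split.S
SELF-DRIVE-CAR-RENTAL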

My server is pretty wussy (1 GB of RAM), so I'm preprocessing on my fast desktop at home, which has 8 GB of RAM. I kick off that command and 20 minutes later I have files ready to be loaded into MySQL.

My table structure is pretty basic. I have one table for each letter/number (for performance), each with a numeric primary key and a varchar for the domain name. So I run this for each letter and number:

DROP TABLE IF EXISTS `com_A`;

CREATE TABLE `com_zone`.`com_A` (
    `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `name` VARCHAR( 255 ) NOT NULL
) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_general_ci;

Next I use LOAD DATA INFILE to quickly pull in the data:

LOAD DATA INFILE '/path/to/file/com.zone.split.A' INTO TABLE `com_A` (name);

This step took about 5 minutes total for all the processed files. It is super super fast, but we still have one step left. Without an index on the name field, queries are really slow (about 2 seconds for a single domain). So we add an index to each table:

ALTER TABLE `com_A` ADD UNIQUE `name` ( `name` ( 255 ) );

This step was painfully slow, about 40 minutes, but once it was done I could do pretty much any query in a fraction of a second.
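For example, checking whether a single domain is taken looks something like this from the command line (AARDVARK is just a placeholder; remember the names are stored uppercase and without the .com):

mysql com_zone -e "SELECT COUNT(*) AS taken FROM com_A WHERE name = 'AARDVARK'"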

The final step was to turn off MySQL on my desktop, copy the MyISAM files to my server (a rough sketch of that copy is below), then restart MySQL on my server so it could use them. Woot! I now have nearly every .com in the world on my server, ready to tell any web app I want if a domain is available or not with a high degree of confidence. I have a couple really fun webpages in the works that will use this.
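For what it's worth, the copy step itself is nothing fancy since MyISAM tables are just files on disk; it was roughly this (the paths and hostname are placeholders, and you may need to fix file ownership on the server side):

# stop MySQL on the desktop so the table files are in a consistent state
sudo service mysql stop
# each table is a trio of .frm/.MYD/.MYI files under the data directory
scp /var/lib/mysql/com_zone/com_* user@server:/var/lib/mysql/com_zone/
# restart MySQL on the server so it picks up the new tables
ssh user@server 'sudo service mysql restart'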

Well that was a bit of a ramble but should be enough to get someone else in my position on the road to domain goodness!

Update 11/6/2013: A visitor asked me to create a command to parse the .INFO zone file. Here you go:

sed -e 's/\.INFO\.//g' -e '/^[^A-Z0-9]/d' -e '/^$/d' -e 's/ .*$//' -e '/[^A-Z0-9\-]/d' info.zone \
| sort -u \
| awk -F "" '{close(f);f=$1}{print > "info.zone.split."f}'