I run the Fake Name Generator, a fairly popular website that generates fake names and addresses. The website offers free bulk orders of up to 40,000 names.
For years I’ve been struggling to make this feature run quickly. The best I’ve been able to get it down to is about 40,000 records in 37 minutes. ~1,000 records per minute isn’t that great.
So yesterday I decided to try some profiling with Xdebug. I set up the test to generate a 40,000-record file. Within a few minutes, the profiler generated a 2GB file and promptly crashed. Hmm. Something was definitely not right.
I dropped the test to 100 records. Within just a couple of minutes, a 300MB profile file was generated. That seemed odd. A 300MB profile file for generating 100 records?
I loaded it up in WinCacheGrind (which took 15 minutes) and was shocked to see that a date-generating function was taking up 90% of the script’s execution time. What the heck was going on?
So I opened up my code and found a glaringly obvious, horrific mistake. My code generates a random month number, day number, number of seconds, etc. However, instead of calling mt_rand(1,12) (min month to max month) for the month, it was calling mt_rand(1,86400) (min month to max seconds). It then checked whether the date was valid and tried again if it wasn’t. Only 12 of those 86,400 possible values are valid months, so the function had to loop about 7,200 times on average before it produced a usable date.
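To make the cost of that retry loop concrete, here is a minimal sketch of the pattern. It is not the site’s actual code: the function names, the year range, and the use of checkdate() for validation are my assumptions for illustration. It counts how many attempts the loop needs with the wrong month range versus the right one.

```php
<?php
// Hypothetical reconstruction of the retry loop, for illustration only.
// randomDate(), averageAttempts() and the 1930-1990 year range are assumed.

/**
 * Draw random month/day/year values until checkdate() accepts them.
 * Returns array(year, month, day, attempts).
 */
function randomDate($maxMonth)
{
    $attempts = 0;
    do {
        $attempts++;
        $month = mt_rand(1, $maxMonth); // the bug: $maxMonth was 86400, not 12
        $day   = mt_rand(1, 31);
        $year  = mt_rand(1930, 1990);
    } while (!checkdate($month, $day, $year)); // retry until the date is valid

    return array($year, $month, $day, $attempts);
}

/** Average number of loop iterations needed per generated date. */
function averageAttempts($maxMonth, $samples)
{
    $total = 0;
    for ($i = 0; $i < $samples; $i++) {
        list($year, $month, $day, $attempts) = randomDate($maxMonth);
        $total += $attempts;
    }
    return $total / $samples;
}

mt_srand(42); // fixed seed so repeated runs on the same PHP version match

printf("buggy range (1-86400): %.0f attempts per date\n", averageAttempts(86400, 100));
printf("fixed range (1-12):    %.2f attempts per date\n", averageAttempts(12, 100));
```

Run from the command line, the buggy range should report several thousand attempts per date, while the fixed range should stay just above one (the occasional retry comes from impossible days such as February 30).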
So the results of fixing the bug? 40,000 records in 3.2 minutes, or about 12,500 records per minute, roughly 11.5 times faster than before (even better under PHP 5.3: 2.2 minutes).
Moral of the story? Profile your code!