topher

So, Ed posted a link to an article about PHP scalability, and I thought it was a bit off. I mentioned that, and people wanted to hear what I had to say about it. So, here it is.

First I’d like to clarify a few things. One is that wherever possible, quotes from the original article will be italicized.

Another is that it appears the article was written in 2002. Quite a few things have changed since then.

Another is that I don’t claim to be a PHP expert. I am made aware of this every time I choose to teach it at university level. Anyone who wants to be made aware of their ignorance on a certain topic need only try to teach it. I have, however, been using PHP for 6 years now.

Lastly, the author of the article in question chose in his title to restrict his topic to large websites, so I’d like to define what I think is “large”. I think a university site is on the small end of large. You have multiple departments, with multiple people per department all wanting their own unique thing. I don’t think of high bandwidth as a necessary element in a “large” site. I don’t really think of Slashdot as a large site. Lots of data, lots of users, lots of bandwidth, yes, but at the end of the day its number of functions isn’t that high, nor is its number of different internal designs.

This decision to restrict himself to large websites leads me to restrict my analysis to people who know what they’re doing, at least to the same level as me. When building something like hp.com (which has php, asp, and html extensions all over the place), I would hope they’d have someone a lot more knowledgeable than myself.

I don’t think I personally have ever developed a “large” website (which may make me the wrong person to discourse on it, but you asked).

So on with the review (of the review).

Separation of Presentation from Business Logic

This is far less difficult than a first glance would indicate. Planning is key. In the end, ALL data must have some sort of markup if it is to be adequately interpreted, whether it be HTML, XML, PDF, CSV, etc. This should lead the developer to plan for that. The eternal choice is how basic to make the original data, regardless of language, platform, or intended use. You can go ahead and mark it all up with HTML, but that’ll make it require work to port it to other environments. Or you can keep it really simple and require some work to port it to any environment (which is the path I actually advocate).

The author points out that changing the design of a PHP driven site is daunting, but I beg to differ. About a year ago I changed the design of WaYfm’s site. Aside from the work of actually making a design for a website, it took me about 1 hour to change the design for the entire site. During that process, I streamlined some things, so next time it should take about half that time.

Keep in mind that only involved the design, not strategic decisions about adding new areas, or deleting old unused ones.
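To give a rough idea of what keeping the data simple can look like, here is a minimal sketch. The render_template() function, the {placeholder} syntax, and the sample data are all made up for illustration; this is not the actual WaYfm code, just one plausible way to keep the markup out of the data.

```php
<?php
// Hypothetical helper: fill {name} placeholders in a template string.
// The data stays plain; only the template knows about HTML.
function render_template($template, array $vars)
{
    $replacements = array();
    foreach ($vars as $name => $value) {
        // Escape each value so plain data cannot break the markup.
        $replacements['{' . $name . '}'] = htmlspecialchars($value);
    }
    return strtr($template, $replacements);
}

// Plain data, with no markup in it.
$page = array('title' => 'Schedule', 'body' => 'Shows & times');

// Two designs for the same data; a redesign only touches these strings.
$old_design = '<h1>{title}</h1><p>{body}</p>';
$new_design = '<div class="post"><h2>{title}</h2>{body}</div>';

echo render_template($old_design, $page), "\n";
echo render_template($new_design, $page), "\n";
```

With something along these lines, changing the design means editing the templates, while the code that fetches the data stays untouched.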

Using a Team of Developers

The author points out difficulties building a site with a team of developers:

1. You can redefine a function defined by another member of the team. Developers need to have extremely good lines of communication to avoid this.

Why is good communication in a team of developers a bad thing? Yes, a function could be redefined. The password on the database could be changed too. Don’t do that, or if you do, tell the other developers. There are a plethora of large PHP sites out there developed by groups of people. There were far fewer in 2002, granted, but PHPNuke has been around for a long time.

2. Even worse, you can redefine a built-in function used in code written by another developer.

Redefining a function built into PHP is dumb. I can’t immediately think of any reason this should happen on purpose, and if it happened by accident, it wouldn’t be that hard to fix with a global search and replace.

3. File inclusion (PHP’s main mechanism for code reuse) tends to exacerbate this problem: trying to use code written anywhere else can be a portability nightmare.

Sure it can. Or it can be a dream, depending on how much planning went into the process.

It was pointed out to me by a mysterious PHP wizard that the issue may come up of downloading a script that has a function with the same name as something you wrote. While this is an unfortunate possibility, it could still be avoided by choosing reasonably esoteric function names.

On the other hand, if the PHP developers choose to add namespacing for functions, more power to them.
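The naming approach could look something like the sketch below. The wayfm_ prefix and the function itself are hypothetical, and the function_exists() guard is just one defensive habit, not something the original article or my wizard prescribed.

```php
<?php
// Hypothetical project prefix; any reasonably unique string would do.
// The guard skips the definition if some included script already
// declared a function with this name, instead of dying with
// "Cannot redeclare".
if (!function_exists('wayfm_format_title')) {
    function wayfm_format_title($title)
    {
        // Trim whitespace, then capitalize the first letter of each word.
        return ucwords(strtolower(trim($title)));
    }
}

echo wayfm_format_title("  hello WORLD "), "\n"; // Hello World
```

Note that the guard only avoids the fatal error. If someone else's function got there first, you are now calling theirs, so the unusual prefix is still what does the real work.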

Deployment Problems

I’ll just respond to bits and pieces.

…”it’s best to configure PHP to throw an error when you try to use unassigned variables; this makes PHP more similar to programming languages which require all variables must be declared before use.”

Why is that best? At what level may PHP stop acting like other languages, and do its own thing?

The biggest issue with the php.ini file is that there can be only one per web server.

These days, individual php.ini settings can be reset on a directory or even file level, using .htaccess, or the ini_set() and friends calls. Was that possible 2 years ago? I don’t know.
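Here is a small sketch of a per-script override. The choice of the precision setting is arbitrary, picked only because it can be changed anywhere; some settings can't be changed this way at all. The .htaccess line in the comment assumes Apache with PHP loaded as a module and overrides allowed by the host.

```php
<?php
// Directory-level equivalent, in an .htaccess file (assumes Apache + mod_php):
//   php_value precision 10

// Per-script override. ini_set() returns the old value, or false on failure.
$previous = ini_set('precision', '10');

echo "precision was ", var_export($previous, true),
     ", now ", ini_get('precision'), "\n";

// Report everything for this script only, which includes the use of
// unassigned variables that the original article wants flagged.
error_reporting(E_ALL);
```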

Another problem thrown up in deployment is that you can’t in general rely on a web host having any particular version of PHP. To all intents and purposes, this requires that the developers both know what version of PHP will be on the web host, and in addition use the same version for development.

On the one hand, there are thousands of hosts with PHP out there, with different options. Granted, there were probably far fewer in 2002, but this is 2004, and that argument doesn’t really stand anymore. On the other hand, even in 2002, if you’re building a “large” website, you should probably have your own virtual or real server anyway, with your own install of PHP, configured the way you want it. “Large” websites tend to be significant, and significant websites usually have dollars behind them, and while a private virtual or real server is more expensive than I’d want for my site, it’s relatively dirt cheap for a site with significant dollars behind it.

Oversimplification Leading to Excessive Complexity

This is an area where I may be incorrect out of ignorance. PHP is the only “programming” language I know, which I think is both an advantage and disadvantage.

One thing I’d like to point out, which will affect the author’s entire logic in this area (and which he had NO knowledge of at the time) is that PHP5 will resolve many of his issues. Most notably that “[PHP] was accreted, rather than ever having been designed. Most programming languages that have achieved genuine long-lasting popularity have been the work of at most a small team of gifted language designers.” PHP5 is a heavy rebuild, with a (relatively) small group of developers, specifically to lend it more of the “world view” that the author finds lacking. There was no way for him to know about this at the time.

Some function names have multiple words separated by underscores (str_replace); others have words squashed together (strtoupper). Some functions have aliases, like disk_free_space and diskfreespace.

The disparity in the use of the underscore is unfortunate, but supplies the reason for the alias. The developers realized that there should be some consistency, and so began creating that consistency without removing the older functions, simply making them aliases. This set them up for alerting PHP users that the change had taken place, and that the poorly named functions would be going away, and that they should start using the new (proper) ones.

PHP has both [a numerically-indexed array type, and at least one other type which allows data to be indexed by strings], but conflates them into a single array structure.

Why is this bad? If you want one style, use it. If you want the other style, use that one instead.

Here’s some code and some questions:

$a1 = array(10, 'Anne' => 32, 11, 'Bob' => 28, 12);
$a2 = array(1 => 21, 2 => 22, 3 => 23);
$a2[0] = 20;

Questions to consider:

The author says “While these questions can be answered (by careful reading of the manual and not a little experimentation), the answers aren’t entirely obvious.”

* What index do we have to consult to get the value 11 out of $a1?
print $a1[1];

* What’s the iteration order for $a2? Is it numerically-indexed or hash-style?
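The keys are numeric, but iteration follows the order in which elements were added, so the element stored at index 0 comes out last.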

* What happens if you use PHP’s $a2[] = … construct to add a new element to the ‘end’ of the array?
It’ll get added to the end of the array, under the next unused integer key (4 in this case).

* Can numerically-indexed arrays have elements missing? If so, can you still trivially iterate over the values in index order?
I don’t know this one right off, and I’m running out of time. I would guess that you CAN still trivially iterate over the values.
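For anyone who wants to check these answers, here is a short script built on the article's two arrays. The $sparse array and the value 24 are my own additions for illustration.

```php
<?php
$a1 = array(10, 'Anne' => 32, 11, 'Bob' => 28, 12);
$a2 = array(1 => 21, 2 => 22, 3 => 23);
$a2[0] = 20;

// Unkeyed values get the next integer key, so 11 lands at index 1.
echo implode(', ', array_keys($a1)), "\n"; // 0, Anne, 1, Bob, 2
echo $a1[1], "\n";                         // 11

// Iteration follows insertion order, not numeric order.
echo implode(', ', array_keys($a2)), "\n"; // 1, 2, 3, 0

// [] appends under the largest integer key plus one.
$a2[] = 24;
echo implode(', ', array_keys($a2)), "\n"; // 1, 2, 3, 0, 4

// ksort() reorders by key when index order is what you want.
ksort($a2);
echo implode(', ', $a2), "\n";             // 20, 21, 22, 23, 24

// An array with missing indexes still iterates fine with foreach.
$sparse = array(5 => 'five', 2 => 'two');
ksort($sparse);
foreach ($sparse as $index => $value) {
    echo $index, ' => ', $value, "\n";     // 2 => two, then 5 => five
}
```

So missing elements are not a problem for foreach, but getting the values in index order takes a ksort() first.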

My mysterious PHP wizard pointed out that both his and the author’s problems with this combination of arrays and hashes arise from the fact that a poorly constructed array CAN happen, and for a young developer, could very well happen. But as I’ve said before, we’re talking about large sites here, which should be developed by experienced developers, who theoretically would know not to create such a convoluted array.

While these questions can be answered (by careful reading of the manual and not a little experimentation), the answers aren’t entirely obvious.

I don’t use a lot of arrays. The vast majority of my use comes from mysql usage, with the mysql_fetch_array() function, which obviates my need for the majority of what he uses in his examples. My point is that even with my limited knowledge, I was able to answer the first three without looking at the manual at all, with about 30 seconds of testing. It only took that long because Urban Mill’s wireless is winding down for the day.

Programmers with a reasonable understanding of basic data structures would be well advised to program as if a given PHP array can be indexed by either contiguous integers or by strings, but not both.

This brings up a point I’ll emphasize more at the end. Once you know this, is it a problem? If you’re building a large web site, you should be a competent PHP user.

Value comparison

The author takes issue with PHP’s == comparison operator, pointing out that PHP’s ability to hold either a number or a string in a variable makes it kludgey. He then points out that === resolves that problem, but takes issue with its deeper discrimination, believing that it nullifies the advantage of mixing data types. To this I respond with “planning”. If it’s really a problem, put the time into making sure your data type ducks are in a row. I’ve never ever had a problem with this.

He further takes issue with the fact that PHP doesn’t do it like Perl or C++. To that I respond “so?”. If all languages did things the same way as all other languages, there’d be one language that did everything right.
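For reference, here is a small sketch of the difference between the two operators, plus one way of getting your data type ducks in a row. The $quantity variable and its value are made up.

```php
<?php
// == converts numeric strings before comparing.
var_dump(1 == "1");       // bool(true)
var_dump("10" == "1e1");  // bool(true): both are numeric strings equal to 10

// === also requires the types to match.
var_dump(1 === "1");      // bool(false)
var_dump("10" === "1e1"); // bool(false): different strings

// Planning: settle the type once, where the data comes in
// (form input always arrives as a string), then compare strictly.
$quantity = (int) "42";
var_dump($quantity === 42); // bool(true)
```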

Variable Scope

The author spends quite a bit of time discussing variable scope, essentially coming to the conclusion that because it’s different from other languages, it’s bad. I just don’t think that’s a valid reason.

I mentioned earlier the fact that knowing only PHP is both an advantage and disadvantage. In this case I think it’s an advantage for me, because PHP’s scope practices feel perfectly normal to me, and I don’t get confused by them. That’s the way scope works. Of course, I’ll probably have trouble when I try to learn another language, but that’s not what we’re talking about.
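For anyone coming from another language, this is roughly the behavior under discussion. It is a sketch with made-up function and variable names, run as a plain top-level script.

```php
<?php
$site_name = 'WaYfm'; // defined in the global scope

function title_without_global()
{
    // Inside a function, $site_name is a separate local variable,
    // and it has not been set here.
    return isset($site_name) ? $site_name : 'unknown';
}

function title_with_global()
{
    // The global keyword pulls the outer variable into this scope.
    global $site_name;
    return $site_name;
}

function title_with_argument($name)
{
    // Passing the value in avoids the question entirely.
    return $name;
}

echo title_without_global(), "\n";          // unknown
echo title_with_global(), "\n";             // WaYfm
echo title_with_argument($site_name), "\n"; // WaYfm
```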

Magic Quotes

With magic_quotes_gpc turned on (get_magic_quotes_gpc() tells you whether it is), PHP puts backslashes in front of characters that could break an SQL query or HTML segment. There are easy, short ways around this, which the author points out. His beef is that they shouldn’t have to be done, bolstering his argument that it’s needlessly complex. Later I’ll be discussing the whole issue of relative complexity.
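One of those short ways around it might look like this. The undo_magic_quotes() name is hypothetical, and the function_exists() check is there on the assumption that the script may run on an install where get_magic_quotes_gpc() is not available.

```php
<?php
// Hypothetical helper: return the value as the user actually typed it.
function undo_magic_quotes($value)
{
    // Only strip slashes if the setting exists and is switched on.
    if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
        return stripslashes($value);
    }
    return $value;
}

// What magic quotes does to incoming data, shown with addslashes():
$escaped = addslashes("O'Reilly");
echo $escaped, "\n";               // O\'Reilly
echo stripslashes($escaped), "\n"; // O'Reilly
```

Run every piece of incoming form data through a helper like this once, at the top of the script, and the rest of the code doesn't have to care how the server is configured.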

Superglobals

Most of the author’s issues with register_globals and superglobals dealt with problems existing in older installations. Those installations are now 2 years older, and any “large” project run on them gets what it deserves.

When to use PHP

I’ll respond to a couple of these here, and the rest in the next section.

How much control will you have over the deployment platform? PHP’s one-size-fits-all approach to the php.ini file makes it hard to share servers with sites that were developed with different settings.

This is an excellent point. I would suggest that a large site either shouldn’t share a php.ini file, or find a wonderful host that fits its needs perfectly.

How long will the site be expected to last? The longer it lasts, the more likely it is that significant design changes will be needed. If you use PHP in the obvious manner, major design changes are difficult. If you extend PHP with a templating system, whether ad hoc or carefully enforced, using PHP buys you little if anything.

I guess this depends on what “the obvious manner” is. To me, templating is the obvious manner, and it buys me incredible flexibility. Being able to change the design on an entire site in an hour would have been unthinkable in plain HTML. Would it be easier or harder using something like Perl or Java?

Conclusions

I’ll draw loosely on his conclusions, and make my own.

His primary argument is that PHP is too simple for “large” sites, requiring the coder to build in complexity in order for it to have the same capabilities as other languages. That it works well for the inexperienced (who shouldn’t be building a high end site) but would frustrate an experienced coder by being too simple. He also suggests that having a group of people work on a PHP project is nigh impossible.

My response is thus: Yes, PHP is simple, and to do high end things, you must build some complexity back into it. I think this is a good thing. There’s a certain amount of work involved in writing any script, and I think the more of it that is done for you before starting, the better.

Your script must reach a certain amount of complexity to accomplish your goal. You can either start with a language that has a specific purpose of helping you reach that level, or you can use a more abstracted language and build all the complexity yourself.

PHP was written specifically for web development, and for that reason and in that area it excels, similar to a mail truck with the steering wheel on the opposite side. Perl on the other hand was written to do pretty much anything. This does indeed give it a little more flexibility, but it also requires you to construct for yourself many of the things that PHP has built in.

These developers are the ones who have the skills needed to build large and/or complex websites; using PHP for such sites therefore tends to be a net loss.

Only if those skills are in some language or genre other than PHP. If a developer is weak in his PHP knowledge, then of course it will result in a net loss. Conversely, if a developer is skilled in the use of PHP, he should have no problem doing just about anything that needs to be done in a reasonable amount of time.

How many people will work on the site, now and in the future? PHP as a language lacks the features necessary to promote effective teamwork; the bigger your team, the greater the problems you’ll have.

After talking it through with my mysterious PHP wizard, I can see how various issues with scope can cause problems with really large teams (more than 10). With teams smaller than that, I can see an occasional problem happening, but I can’t imagine anything coming up to be a real show stopper. But then, who knows?

Where I think PHP should not be used

Basically, anything not dealing with displaying content on a web page. Stats processing is an excellent example. There are quite a few stats packages these days that are built with PHP, but I think the action of processing all those log files would be better accomplished by an external program or CGI written in a language that does a better job at raw number crunching. Then the results of that process can be put into a file or database, and PHP can be used to render the data.

Another is Macromedia Flash generation. PHP has functions built in to create Flash elements on the fly. Creating a whole site, or even elements of it with PHP is a poor choice, since it’s time intensive. On the other hand, PHP can pull data from a source and drop it into a text file, and Flash can include it, using ActionScript to reload that section, allowing for really dynamic content like stock quotes.

I could give any number of ways PHP should not be used, so I won’t belabor that issue. My point is that the large size of the project, or the high number of developers shouldn’t necessarily hold back qualified PHP developers.

Many thanks to Joel for reviewing this before post, and preventing me from making too big of an idiot of myself.

7 thoughts on “MY Experience of Using PHP in Websites”

  1. Great post, Topher. Here’s something Guido van Rossum (inventor of Python) said about Python that I think holds true for PHP, too:

    “A 20,000-line Python program would probably be a 100,000-line Java or C++ program. It might be a 200,000-line C program, because C offers you even less structure. Looking for a bug or making a systematic change is much more work in a 100,000-line program than in a 20,000-line program. For smaller scales, it works in the same way. A 500-line program feels much different than a 10,000-line program.”

  2. Redefining a function built into PHP is dumb. I can’t immediately think of any reason this should happen on purpose, and if it happened by accident, it wouldn’t be that hard to fix with a global search and replace.

    Not true! A perfect example is one of my own favorite subjects (and PHP tools): PHP sessions. I work in a multi-server (web-cluster) environment, and the only way to have them work is by re-declaring the default PHP functions (so that the session information is database-stored, rather than stored on a local server’s hard drive).

    However, all in all, I would agree more with Topher than Aaron. If Aaron doesn’t like PHP, then there are plenty of other languages that he can use. PHP was created to fill a niche – and it’s a niche that it has quite successfully taken over. While there may be flaws within current versions, this is true of ANY programming language, and deep comparisons between any of them tend to be asinine. The bottom line is that languages are different because they meet different needs and serve different purposes. None of them are perfect, and you use the tool you need to get the job done.

    PHP is (a) free, and (b) wildly popular. You can say you don’t like it, but you can’t argue with its successes….

  3. Regarding redefining functions…

    When you create your own custom session handling functions, you are not redefining existing PHP functions; you are telling PHP to use your own functions instead of its built-in ones to handle sessions. This wasn’t the author’s beef; in fact, I think it’s a pretty clean way of customizing PHP’s behavior.

    I just tried making my own version of mysql_connect(), and it (appropriately) failed, telling me:

    Fatal error: Cannot redeclare mysql_connect()

    So I’m not quite sure what the original author’s point was there. Perhaps older versions of PHP allowed this, or perhaps I’m misunderstanding what he’s saying.

  4. I love PHP. I’ve been using it for a few years now. However, I’m not a programmer like Topher is by any means, and therefore get hung up in PHP a lot more than I do in CFM. I think the next step for PHP is building a dev environment that allows the Perl dummies (like me) to rapidly deploy internet apps. I know Dreamweaver is well on its way to including some helps in this area, but you can understand why they spend more time on CFM.

    Anyway, I use PHP for almost all the site dev I do. The price is right, and the community support is awesome. (unlike CFM where you have to buy help from even the rookie devs.)

    Good job defending a great app Topher!

  5. Good article. Some points and counterpoints:

    Separation of presentation from business logic (aka, the panacea of the 2000s) is particularly easy in PHP using Smarty. I’ve used Smarty *just* for data formatting (ignoring the general page layout, which uses a proprietary templating system) and it’s a godsend.

    PHP should throw an error when using unassigned variables NOT because other languages do it, but because it’s RIGHT.

    While it’s generally true that php.ini settings can be set through Apache directives or ini_set(), this is not true for all directives (see the list and note the ones that are labelled PHP_INI_SYSTEM). Understandably, many of these “less configurable” directives shouldn’t be exposed to Joe Q. User, but it sometimes makes things a little difficult for those of us who know WTF we’re doing. 😉

    Re: knowing which version is on there, that’s just an idiotic, specious argument. You can never “know” which version of Perl is on a server… unless your provider tells you, or you demand a certain version. (There are a lot of these silly arguments in the rest that I will just be ignoring.)

    Magic quotes is simply evil. Unexpected behavior like that is bad. If anything, PHP should disallow that from being configurable in the php.ini and require that lazy users enable it in their own scripts. I would much rather use an intelligent DBI-like class to execute my queries (and take care of all of those things that Perl’s DBI does, including placeholders in the queries) than to allow PHP’s “magic” to take care of it.

    “Perl on the other hand was written to do pretty much anything.” Yep. But Perl “requires you to construct for yourself many of the things that PHP has built in”? Bzzt. Being a PHP-only user, I’m not surprised that you haven’t considered Perl’s vastly-superior alternative to PHP’s PEAR code library: CPAN. Your statement would be more accurate, 95% of the time, as “it also requires you to download the CPAN module [etc.]”.

    Also, since you aren’t a system administrator, I’m sure you don’t realize just how severely broken some of those convenient PHP built-ins are. Like PHP’s built-in MySQL support was hosed for many, MANY versions, and required you link against the external MySQL library. (In fact, I’m not sure if it isn’t STILL broken.)

    Another built-in aspect of PHP that is just the biggest burr in my saddle is safe_mode. Conceptually, it’s great. The results have been pretty poor. (Again, I have no idea where it’s at now, since we don’t use it.) One step below the results, the implementation is absolutely the worst thing I have ever seen coded by man or alien. This was based on my attempts to extend 4.0.x’s safe_mode to be a little less retarded… unsuccessfully… I get a little physically ill thinking about it. Maybe languages are a special case, but for me there’s more to an application than what’s there, but what COULD be there through extending the code. (Insert comparison of sendmail and qmail here.) But that’s a little off-topic… safe_mode still sucks it.

    I don’t generally spend time convincing people that PHP sucks… at least, not anymore. 🙂 Like one of the comments said above, paraphrased, “To each his own.” But the above is part of why I really don’t like it for myself.
