The Weather

Am I the only one who finds it ironic that the grocery store sells bags of ice right next to bags of ice melter? Also, the half-inch of snow that landed on my car in the half an hour I was shopping reminded me how glad I am that our apartment came with a parking spot in a covered garage.

The Trouble With TrackBacks

WARNING: What follows is a somewhat confused ramble about a topic of a boring, technical nature that I don’t really know that much about and in which I’m quite possibly completely wrong, and where I capitalize “TrackBack” at least three different ways:

I recently converted my weblog to UTF-8. I’ve discovered an interesting problem, though: the other day, I got a trackback in German, from a Movable Type weblog encoded in ISO-8859-1. Movable Type inserted the trackback’s excerpt as-is, and the non-ASCII characters showed up wrong. I’ve fixed this particular problem, via a hack that assumes trackback excerpts are ISO-8859-1 if they aren’t valid UTF-8, but this doesn’t address the issue at hand here, since it works only for text written in Western European languages (luckily for me, this is an English weblog, so that’s mostly what I’ve got).
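The fallback idea can be sketched in a few lines. This is Python for illustration only, not the actual hack (Movable Type itself is Perl), and the name `decode_excerpt` is made up for the example:

```python
def decode_excerpt(raw: bytes) -> str:
    """Decode a trackback excerpt, preferring UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is ISO-8859-1.
        # Every byte sequence is valid ISO-8859-1, so this never fails,
        # but it yields garbage for other encodings (KOI8-R, Shift_JIS...).
        return raw.decode("iso-8859-1")
```

The last comment is the whole limitation: the fallback cannot tell ISO-8859-1 apart from any other single-byte encoding, which is why it only helps with Western European text.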

The TrackBack Technical Specification makes no mention of character encoding. A trackback ping is an HTTP POST request of type application/x-www-form-urlencoded (the response is XML, and XML handles encoding problems very well). The form of an application/x-www-form-urlencoded document is defined in the HTML specification (of all places) to be the same as URL encoding, but URL encoding is (at least, at the time HTML was designed) for ASCII text only, and the encoding of non-ASCII characters is undefined. Most browsers handle this issue by using the encoding of the HTML page that contains the form, but this isn’t possible with trackbacks, which have no associated form.
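To see the ambiguity concretely, here is a small Python sketch (the sample word is just an illustration) that percent-encodes the same text under two different encodings using the standard `urllib.parse` module:

```python
from urllib.parse import quote, unquote_to_bytes

word = "Grüße"

# The same characters, percent-encoded under two different charsets.
as_latin1 = quote(word, encoding="iso-8859-1")  # 'Gr%FC%DFe'
as_utf8 = quote(word, encoding="utf-8")         # 'Gr%C3%BC%C3%9Fe'

# The receiver only ever gets bytes back; nothing in the form data
# itself says which charset those bytes are in.
received = unquote_to_bytes(as_latin1)          # b'Gr\xfc\xdfe'
```

Both strings are perfectly legal form data, and a receiver that sees only one of them has to guess which encoding the sender had in mind.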

It appears that the choice of application/x-www-form-urlencoded as an encoding for submitting Trackback pings is an unfortunate one. Using multipart/form-data or another format (such as XML) that allows for a defined character encoding would have been a better choice. But given that the choice has already been made, what can be done to improve the situation?

One possibility would be to amend the TrackBack specification to allow (and recommend) the use of multipart/form-data instead of application/x-www-form-urlencoded. The Movable Type implementation of TrackBack uses the Perl CGI module, which already knows how to parse these, so existing weblogs would be able to receive pings in this format (although further code would be needed to extract the charset and do the translation). Another possibility would be to define TrackBacks to always use UTF-8 encoding for sending pings. This would cause problems with non-ASCII TrackBacks between new and old implementations, but once everyone was upgraded, things would work smoothly. The third possibility, and the one with the fewest interoperability concerns with existing implementations, would be to have senders of TrackBack pings encode any non-ASCII characters using HTML entities before sending. This has the advantage that it does not require sending any non-ASCII character in a ping, so no changes need to be made to the protocol, and existing Web sites that display excerpts in HTML or XML contexts will work without a hitch.
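The third option is easy to sketch on the sending side. This is a Python illustration, not anything from the TrackBack specification or Movable Type, and `to_ascii_with_entities` is a hypothetical helper name:

```python
def to_ascii_with_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # 'xmlcharrefreplace' turns e.g. U+00FC into '&#252;'. It leaves ASCII
    # alone, so '&', '<' and '>' are NOT escaped here; a real sender would
    # have to decide separately how to treat those.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


excerpt = to_ascii_with_entities("Grüße")  # 'Gr&#252;&#223;e'
```

The result is pure ASCII, so it survives application/x-www-form-urlencoded no matter what encoding either end assumes.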

The downside is that the trackback contents would no longer be treatable as unformatted text for the purpose of putting in email, etc… Any parser of trackback data for non-SGML-like purposes would need to decode the HTML entities. This is not necessarily bad, especially since I imagine there are trackback pingers out there who are probably sending full-fledged HTML in their excerpts anyway. It would require the addition of character set conversion to Movable Type, though (since HTML entities are Unicode-based), which it somehow has managed to avoid so far.
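For the plain-text case, the decoding step might look like this in Python. It is a sketch only: `excerpt_to_plain_text` is a made-up name, and it assumes the excerpt contains entities but no actual markup:

```python
import html


def excerpt_to_plain_text(excerpt: str) -> str:
    """Turn character references back into Unicode text for non-HTML use."""
    # html.unescape handles both numeric (&#252;) and named (&eacute;)
    # references. It does not strip tags, so HTML markup in an excerpt
    # would pass through untouched.
    return html.unescape(excerpt)


plain = excerpt_to_plain_text("Gr&#252;&#223;e")  # 'Grüße'
```

Note that the output is Unicode, which is exactly where the character set conversion comes in: the text still has to be encoded into whatever charset the email or weblog uses.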

Okay, I’m done now.

Houston is the new pink

Laura’s father just got back from a trip to Houston. A bunch of people from her group are going to a conference in Houston next month. And on Ed, Frankie’s fiancee has just moved to Houston.

Has the city of Houston signed some sort of advertising tie-in deal with my life?

MTThreadedComments: You know you want it

Has the bare look and minimal functionality of the comments in your Movable Type weblog been getting you down? Have your readers been clamoring for cool new features like threading? Been considering buying a LiveJournal account so your comments can have subject lines? No longer! Now, you can get all these features and more with MTThreadedComments, new from Alexei Kosut.

How much would you pay for this amazing plugin? $100? Too much! $50? Too much! 27¢? Too much! What would you say if I told you it was free? That’s right. Absolutely free!

MTThreadedComments enhances your weblog’s comments with subject lines and threading, letting your readers reply not only to your entry, but to other comments as well, displaying the comments in an easy-to-read nested format. It even makes julienne fries! And if you act now, we’ll throw in, at no extra cost: absolutely nothing! That’s right folks. You can’t get a better deal than that.

So pick up that phone and download today.

“So I have this computer problem…”

As an (out-of-work) computer professional, I’ve found that revealing my profession in a conversation is a recipe for disaster. I imagine it’s much the same way for doctors and lawyers. What other jobs get this sort of attention? Do people sidle up to civil engineers and ask them for help building bridges?

It’s not that I mind random questions about computers. Not at all; I’m usually perfectly happy to talk about general computing and technology issues (although I sometimes suffer from the inability to reduce my knowledge to a basic enough level to actually have a conversation). But the assumption that not only am I interested in being free technical support, but that I will somehow be able to help fix the problem with no access to the computer, and armed only with a few barely-remembered details (“it doesn’t work”), annoys me. Stop it!

For that matter, I’m a software developer, and the skill set isn’t necessarily applicable. In my case, I happen to have a good deal of experience with personal computer support, but I know plenty of very good programmers who wouldn’t know a hard drive from a power cable. It’s like conversationally asking a criminal lawyer for help with your divorce. He just isn’t going to be all that helpful.

Fun with marketingspeak

I recently purchased (and will have, if Amazon ever gets around to shipping) a USB memory card reader, to make it easier to get pictures from my digital camera into my iMac.

What I found amusing, though, was the designation given to these multi-slot card readers. Fact: these things have slots for four different types of memory cards: CompactFlash, SmartMedia, Memory Stick, and MMC/SD. Some of them have four separate slots, some have two (one for CF, one for the others). But none read more than these four different types of cards.

These devices are, however, referred to as “6-in-1,” “7-in-1” or even “8-in-1” card readers. Why the distinction? Well, there’s CompactFlash Type I and Type II, which are sometimes counted separately. There’s the “Microdrive”, which used to be an IBM product, but now seems to refer generically to a miniature hard drive in a CompactFlash Type II enclosure. But (and this is important) there’s absolutely no difference, from the card reader’s point of view, between a CompactFlash Type II memory card and a “Microdrive.” That doesn’t stop marketeers from counting them twice. MMC and SD are sometimes separated, even though the same slot reads both. And I once even saw Memory Stick MagicGate counted separately. MagicGate is a content protection mechanism, so this is roughly akin to a DVD player advertising that it plays two different types of video discs: DVDs and DVDs with Macrovision!

There are some “7-in-1” readers and a few “8-in-1” devices, but most are “6-in-1.” Even though they are all exactly the same, functionality-wise, there seems little consensus on which six “different” memory cards are counted. Some treat CompactFlash types I and II as different, some the same. Some count the Microdrive separately, some ignore it. Some treat MMC and SD as separate card types, some just list MMC/SD. It would seem, though, that in marketingland, four is never less than six.

Administrivia

I’ve upgraded my weblog to Movable Type 2.6. I’ve got a fairly permissive set of HTML tags allowed in comments, but I may have missed one or two that people might want to use. So preview your comments carefully, and let me know if it strips out something you expected to see. I also switched the character set to UTF-8, mainly because I got tired of Safari not posting …, – and — to ISO-8859-1 forms in a useful way. (But see! I typed them in Safari this time, and they worked! Unicode rocks.)

I also installed Adam Kalsey’s SimpleComments plugin, which merges comments and Trackbacks into a single list. And I fixed the comment posting script so that the back button in your browser won’t cause you to post comments twice anymore. Sorry about that.

Update (4:19 PM): I hacked MT to add subjects to comments. I’ll reimplement all of LiveJournal yet!

Some assembly required

Someone please tell me this was not the easiest way to get “Just Another American Folk Song” (from last week’s American Dreams) into iTunes:

  1. NBC ⇒ KSDK, digital via satellite
  2. KSDK ⇒ Charter, analog NTSC over-the-air broadcast
  3. Charter ⇒ TiVo, encoded using MPEG-2 and stored on disk
  4. TiVo ⇒ digital camcorder, decoding the MPEG-2, transferring via an analog cable, and re-encoding to DV and storing on tape
  5. Digital camcorder ⇒ iMovie
  6. iMovie ⇒ iTunes, converting from DV audio to MP3

For those counting at home, that’s six different audio formats (not counting anything that took place before it left NBC), including four conversions between digital and analog. The result sounds that way, too…

(Yes, I realize I could have compressed steps 4–6 if I’d just recorded to the computer directly from the TiVo, but (a) the iMac is in a different room and (b) I don’t have a cable that long. I tried recording directly to my laptop, but I couldn’t figure out a way to connect a line-level signal to the mic jack without severe distortion.)

Now a major motion picture by Brad Cox

I was re-reading The Design and Evolution of C++, and was struck by a new thought in the C++ vs. Objective-C discussion that rages through my mind from time to time. I like and use both languages, but they’re similar enough in goal and function that it seems a shame that one language can’t suffice[1]. Usually, I compare the two languages in terms of differences like their (very different) object models and their (very different) syntax. But today I had a new idea:

I’ve noticed a commonality in motion pictures that are based on novels. In the novel, you will often find a scene that goes something like this: “The obvious thing to do here is X. But because of A, B and C, we have to do Y instead, which will take an extra 150 pages.” In the movie version, they usually just do X in the first place, with no mention made of A, B, C or Y. It’s a convenient way of cutting the plot down to size.

In sections 3.9 and 10.2 of D&E, Stroustrup explains that he wanted C++ to allow “separate specification of allocation and initialization.” But he didn’t want to actually separate allocation and initialization, because he wanted the new operator to ensure that objects were always properly constructed. He also wanted per-class allocation, but assignment to this was considered too ugly. So C++ has operator new(). Actually, it has several different operators new().

How does this work in Objective-C[2]? Well, allocation (alloc) and initialization (init) are separate, and you do per-class allocation by assigning to self. And The Objective-C Programming Language is 800 pages shorter than The C++ Programming Language. Movie-sized.

[1] Some people would probably say that it does, and that it’s called Java.
[2] By Objective-C, by the way, I mean the NeXT, Apple and GNU implementations. If your name is David Stes, pretend I’m talking about some other language entirely.