Archive for the ‘Computing’ Category

Enterprising software

Thursday, February 14th, 2008

In a talk from RailsConf, Cyndi Mitchell points out how “enterprise” in the phrase “enterprise software” has taken on the opposite of its customary meaning.

If you call a person enterprising, you have in mind someone who takes risks and accomplishes things. And “Enterprise” has been the name of numerous ships, real and fictional, based on the bold, adventurous overtones of the name. But Cyndi Mitchell says when she thinks about enterprise software, the first words that come to mind are bloatware, incompetence, and corruption. I wouldn’t go that far, but words like “bureaucratic” and “rigid” would be on my list. In any case, “enterprise” has a completely different connotation in “enterprise software” than in “USS Enterprise.”

The USS Enterprise circa 1890 at the New York Navy Yard

Introduction to Mac for Windows developers

Wednesday, February 13th, 2008

Here are a couple podcasts introducing Windows developers to software development on the Macintosh.

Scott Hanselman: What’s it like for Mac Developers, an interview with Steven Frank

.NET Rocks: Miguel de Icaza and Geoff Norton on Mono, mostly about .NET development on the Mac

Also, there are a lot of Mac-related talks on the GeekCruise podcast. The talks from January 2007 were directed at a general audience new to the Mac.

Hanselman’s podcast talks about some of the cultural differences between Microsoft and Apple customers. For example, Mac users update their OS more often and complain less about OS changes that break software.

A bigger clipboard

Tuesday, February 5th, 2008

Imagine you find a paragraph on the web that you want to email to a friend. You copy the paragraph. Then you think you should send a link to the full article, so you copy that too. You start composing your email and you type Ctrl-V to paste in the paragraph, but to your disappointment you just paste the link. So you go back and copy the paragraph again.

The problem is that the Windows clipboard only holds the most recent thing you copied. Jeff Atwood posted an article a few days ago called Reinventing the Clipboard where he recommends a utility called ClipX that extends the clipboard. After you install ClipX, typing Ctrl-Shift-V brings up a little menu of recent clippings available for pasting.

I’ve been using ClipX for a few days now. It’s simple and unobtrusive. The only slight challenge at first is remembering that it’s available. After you think to yourself once or twice, “Oh wait, I don’t have to go back and copy that again,” you’re hooked.

Footnote to interruption post

Tuesday, February 5th, 2008

In my post yesterday about interruptions I quoted Mary Czerwinski from Microsoft Research. She told me afterward that two of the applications mentioned in the interview have been released. They are publicly available from the Microsoft Research download site.

I haven’t had a chance to use either of these tools yet. If you try them out, let me know what you think.

Three-hour-a-week language

Thursday, January 31st, 2008

I listened to an interview last night with Perl guru Randal Schwartz. He said that Perl is meant for people who use the language at least two or three hours per week. If you’re not going to use it that often, you’re probably better off using something else. But if you use it full time, you can see huge productivity increases.

That matches my experience, at least the first part. I was attracted to Perl because I could imagine being very productive with it. At first I used the language infrequently. Every time I sat down to write Perl I had to work hard to reload the language into my brain. Then for a while I used it frequently enough to achieve some fluency. Then as I wrote Perl less often I could almost feel the language slipping away.

Of course you have to use any language, human or computer, to achieve and maintain fluency. But my sense is that Perl requires more frequent use than other programming languages for a programmer to remain minimally competent, and that it repays frequent use more than other languages do. I imagine this is a consequence of the natural language principles baked into the language.

One of the things I prefer about Python is that you do not have to use it three hours a week to remain competent.

I no longer write Perl, but I still enjoy listening to thought-provoking Perl speakers like Randal Schwartz.

Paper doesn’t abort

Wednesday, January 30th, 2008

My daughter asked me recently what I thought about a Rube Goldberg machine she sketched for a school project. I immediately thought about how difficult it would be to implement parts of her design. I asked her if she really had to build it or just had to sketch it. When she said she didn’t really have to build it, I told her it was great.

I thought of a friend’s comment about designing code versus actually writing code: “I’ve never seen a piece of paper abort.” The hard part about writing software is that people can tell when you’re wrong.

Programming the last mile

Tuesday, January 29th, 2008

In any programming project there comes a point where the programming ends and manual processes begin. That boundary is where problems occur, particularly for reproducibility.

Before you can build a software project, there are always things you need to know in addition to having all the source code. And usually at least one of those things isn’t documented. Statistical analyses are perhaps worse. Software projects typically yield their secrets after a moderate amount of trial and error; statistical analyses may remain inscrutable forever.

The solution to reproducibility problems is to automate more of the manual steps. It is becoming more common for programmers to realize the need for one-click builds. (See Pragmatic Project Automation for a good discussion of why and how to do this.  Here’s a one-page summary of the book.) Progress is slower on the statistical side, but a few people have discovered the need for reproducible analysis.
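To make “automate the manual steps” concrete, here is a hypothetical sketch of a one-click build script in PowerShell. The repository URL, solution name, and tool choices are placeholders, not a prescription; the point is that every step needed to go from source to verified binaries is written down and executable.

# build.ps1 -- a minimal one-click build for a hypothetical project
$ErrorActionPreference = "Stop"   # stop at the first cmdlet error

# 1. Fetch a clean copy of the source (placeholder repository URL).
svn checkout http://example.com/repo/trunk src
if ($LASTEXITCODE -ne 0) { throw "Checkout failed" }

# 2. Compile (placeholder solution name).
msbuild src\MyProject.sln /p:Configuration=Release
if ($LASTEXITCODE -ne 0) { throw "Compile failed" }

# 3. Run the tests, so that "it builds" also means "it works".
nunit-console src\Tests\bin\Release\Tests.dll
if ($LASTEXITCODE -ne 0) { throw "Tests failed" }

"Build and tests succeeded."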

It’s all a question of how much of a problem should be solved with code. Programming has to stop at some point, but we often stop too soon. We stop when it’s easier to do the remaining steps by hand, but we’re often short-sighted in our idea of “easier”. We mean easier for me to do by hand this time. We don’t think about someone else needing to do the task, or the need for someone (maybe ourselves) to do the task repeatedly. And we don’t think of the possible debugging/reverse-engineering effort in the future.

I’ve tried to come up with a name for the discipline of including more work in the programming portion of problem solving. “Extreme programming” has already been used for something else. Maybe “turnkey programming” would do; it doesn’t have much of a ring to it, but it sorta captures the idea.

Empirical support for TDD

Saturday, January 26th, 2008

Phil Haack gives his summary of a recent study on the benefits of test-driven development (TDD). The study had two groups of students write unit tests for their programming assignments. Students assigned to the test-first group were instructed to write their unit tests before writing their production code, as required by TDD. Students assigned to the test-last group were told to write their tests after writing their production code. Students in the test-first group wrote higher quality code.

The study concluded that code quality was correlated with the number of unit tests, independent of whether the tests were written first or last. However, the test-first students wrote more tests in the same amount of time.

Note that students were assigned to test-first or test-last. Most programming studies are just surveys, and their results are always questionable because professional programmers choose their own tools. So, for example, you cannot conclude from a survey that technique X makes programmers more productive than technique Y. The survey may say more about the programmers who chose each technique than about the techniques themselves.

The excitement of not knowing what you’re doing

Wednesday, January 23rd, 2008

The following is a quote from Edsger Dijkstra.

I think it was in 1970 that I gave my first talk in a foreign country on the design of programs that you could actually control and prove were correct. … The talk fell completely on its face. … The programmers didn’t like the idea at all because it deprived them of the intellectual excitement of not quite understanding what they were doing. They liked the challenge of chasing the bugs.

Taken from Out of their Minds: The Lives and Discoveries of 15 Great Computer Scientists. Emphasis added.

C# verbatim strings vs. PowerShell here-strings

Tuesday, January 22nd, 2008

C# verbatim strings and PowerShell here-strings have just enough in common to be confusing. The differences are summarized in the table below.

C# verbatim strings                       PowerShell here-strings
---------------------------------------   ---------------------------------------------------
May contain line breaks                   Must contain line breaks
Only a double-quote variety               Single- and double-quote varieties
Begins with @"                            Begins with @" (or @') plus a line break
Ends with "                               Ends with a line break followed by "@ (or '@)
Cannot contain unescaped double quotes    May contain quotes
Turns off C# escape sequences             @' turns off PowerShell escape sequences but @" does not
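To make the contrast concrete, here is a small sketch of the two PowerShell varieties. (The C# counterpart sits on one logical line, e.g. string s = @"a ""quoted"" word";, with doubled quotes as its only escape.)

# Double-quoted here-string: variables expand and backtick escapes work.
# The opening @" must be followed by a line break, and the closing "@
# must begin its own line.
$double = @"
Home directory: $HOME
Quotes "inside" need no escaping.
"@

# Single-quoted here-string: everything between the delimiters is literal.
$single = @'
Nothing is interpolated here: $HOME
'@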

Selective use of technology

Monday, January 21st, 2008

In his book The Nine, Jeffrey Toobin gives a few details of Supreme Court Justice David Souter’s decidedly low-tech life. Souter has no cell phone or voice mail. He does not use email. He was given a television once but never turned it on. He moves his chair around his office throughout the day so he can read by natural light. Toobin says Souter lives something like an eighteenth century gentleman.

I find it refreshing to read of someone like Justice Souter with such independence of mind that he chooses not to use much of the technology that our world takes for granted. I also find it encouraging to see examples of people who do not reject electronics entirely but selectively decide how and whether to use them.

I once had the opportunity to see a talk by the father of computer typography, Donald Knuth. Much to my surprise, his slides were handwritten. The author of TeX didn’t see the need to use TeX for his slides. While he cares about the fine details of how math looks in print, he apparently didn’t feel it was worth the effort to typeset his notes for an informal presentation.

I also think of the late computer scientist Edsger Dijkstra, who wrote with pen and ink, even when writing computer programs.

If you’re reading legal briefs by sunlight, your thoughts will not be exactly the same as they would be if you were reading by fluorescent light. If you’re writing computer programs by hand, you’re not going to think the same way you would if you were pecking on a computer keyboard. (And if you do use a computer, your thinking is subtly different depending on what program you use.) Technology affects the way you think. The effect is not uniformly better or worse, but it is certainly real.

To shake up your thinking, try going low-tech for a day.

Quick TeX to graphic utility

Thursday, January 17th, 2008

Here’s a web site where you can type in some TeX code, click a button, and get back a GIF with a transparent background. Handy for pasting equations into HTML.

http://www.artofproblemsolving.com/LaTeX/AoPS_L_TeXer.php

For example:

[Image: the Gaussian integral, rendered as a GIF]
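The TeX behind that image is presumably something like the following (my reconstruction, not the site’s exact markup):

\[ \int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi} \]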

Sun acquires MySQL

Wednesday, January 16th, 2008

Jonathan Schwartz announced on his blog today that Sun is acquiring the company behind MySQL.

Literate programming and statistics

Tuesday, January 15th, 2008

Sweave, mentioned in my previous post, is a tool for literate programming. Donald Knuth invented literate programming and gives this description of the technique in his book by the same name:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Knuth says the quality of his code went up dramatically when he started using literate programming. When he published the source code for TeX as a literate program and a book, he was so confident in the quality of the code that he offered cash rewards for bug reports, doubling the amount of the reward with each edition. In one edition, he goes so far as to say “I believe that the final bug in TeX was discovered and removed on November 27, 1985.” Even though TeX is a large program, this was not an idle boast. A few errors were discovered after 1985, but only after generations of Stanford students studied the source code carefully and multitudes of users around the world put TeX through its paces.

While literate programming is a fantastic idea, it has failed to gain a substantial following. And yet Sweave might catch on even though literate programming in general has not.

In most software development, documentation is an afterthought. When push comes to shove, developers are rewarded for putting buttons on a screen, not for writing documentation. Software documentation can be extremely valuable, but it’s most valuable to someone other than the author. And the benefit of the documentation may only be realized years after it was written.

But statisticians are rewarded for writing documents. In a statistical analysis, the document is the deliverable. The benefits of literate programming for a statistician are more personal and more immediate. Statistical analyses are often re-run, with just enough time between runs for the previous work to be completely flushed from short-term memory. Data is corrected or augmented, papers come back from review with requests for changes, etc. Statisticians have more self-interest in making their work reproducible than do programmers.

Patrick McPhee gives this analysis for why literate programming has not caught on.

Without wanting to be elitist, the thing that will prevent literate programming from becoming a mainstream method is that it requires thought and discipline. The mainstream is established by people who want fast results while using roughly the same methods that everyone else seems to be using, and literate programming is never going to have that kind of appeal. This doesn’t take away from its usefulness as an approach.

But statisticians are more free to make individual technology choices than programmers are. Programmers typically work in large teams and have to use the same tools as their colleagues. Statisticians often work alone. And since they deliver documents rather than code, statisticians are free to use Sweave without their colleagues’ knowledge or consent. I doubt whether a large portion of statisticians will ever be attracted to literate programming, but technological minorities can thrive more easily in statistics than in mainstream software development.

Irreproducible analysis

Tuesday, January 15th, 2008

Journals and granting agencies are prodding scientists to make their data public. Once the data is public, other scientists can verify the conclusions. Or at least that’s how it’s supposed to work. In practice, it can be extremely difficult or impossible to reproduce someone else’s results. I’m not talking here about reproducing experiments, but simply reproducing the statistical analysis of experiments.

It’s understandable that many experiments are not practical to reproduce: the replicator needs the same resources as the original experimenter, and so expensive experiments are seldom reproduced. But in principle the analysis of an experiment’s data should be repeatable by anyone with a computer. And yet this is very often not possible.

Published analyses of complex data sets, such as microarray experiments, are seldom exactly reproducible. Authors inevitably leave out some detail of how they got their numbers. In a complex analysis, it’s difficult to remember everything that was done. And even if authors meticulously documented every step of the analysis, journals would not want to publish such great detail. Often an article provides enough clues that a persistent statistician can approximately reproduce the conclusions. But sometimes the analysis is opaque or just plain wrong.

I attended a talk yesterday where Keith Baggerly explained the extraordinary steps he and his colleagues went through in an attempt to reproduce the results in a medical article published last year by Potti et al. He called this process “forensic bioinformatics”: attempting to reconstruct the process that led to the published conclusions. He showed how he could reproduce parts of the results in the article in question by, among other things, reversing the labels on some of the groups. (For details, see “Microarrays: retracing steps” by Kevin Coombes, Jing Wang, and Keith Baggerly in Nature Medicine, November 2007, pp. 1276–1277.)

While they were able to reverse-engineer many of the mistakes in the paper, some remain a mystery. In any case, they claim that the results of the paper are just wrong. They conclude “The idea … is exciting. Our analysis, however, suggests that it did not work here.”

The authors of the original article replied that there were a few errors but that these had been fixed and didn’t affect the conclusions anyway. Baggerly and his colleagues disagree. So is this just a standoff with two sides pointing fingers at each other saying the other guys are wrong? No. There’s an important asymmetry between the two sides: the original analysis is opaque but the critical analysis is transparent. Baggerly and company have written code to carry out every tiny step of their analysis and made the Sweave code available for anyone to download. In other words, they didn’t just publish their paper, they published code to write their paper.

Sweave is a program that lets authors mix prose (LaTeX) with code (R) in a single file. Users do not directly paste numbers and graphs into a paper. Instead, they embed the code to produce the numbers and graphs, and Sweave replaces the code with the results of running the code. (Sweave embeds R inside LaTeX the way CGI embeds Perl inside HTML.) Sweave doesn’t guarantee reproducibility, but it is a first step.
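To give the flavor, here is a minimal hypothetical Sweave file. The chunk delimited by <<>>= and @ is R code; \Sexpr{} drops a computed value into the prose:

\documentclass{article}
\begin{document}

<<echo=FALSE>>=
# Stand-in for a real analysis: simulate data, compute a summary.
set.seed(42)
x <- rnorm(100)
@

We simulated 100 standard normal values;
their sample mean was \Sexpr{round(mean(x), 3)}.

\end{document}

Running Sweave("example.Rnw") from R turns this into an ordinary .tex file with the computed results woven in. Change the data and rerun, and the numbers in the paper change with it.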