Out of curiosity, I ran David Wheeler's SLOCCount over all of CPAN. The headline news is that it reports that CPAN took 5,012 person-years to develop and cost about $677 million. There's also a more detailed SLOCCount report on CPAN.
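As a rough sanity check on how those two figures relate: SLOCCount's default cost model multiplies the estimated effort by an average salary of $56,286 per year and an overhead factor of 2.4. Assuming this run used those defaults (the detailed report would show otherwise), the arithmetic can be sketched in Python. The function name is only for illustration.

```python
# SLOCCount's default cost parameters; assumed (not confirmed) for this run.
ANNUAL_SALARY_USD = 56_286
OVERHEAD_FACTOR = 2.4


def estimated_cost(person_years, salary=ANNUAL_SALARY_USD, overhead=OVERHEAD_FACTOR):
    """Return the estimated development cost in dollars for a given effort."""
    return person_years * salary * overhead


cost = estimated_cost(5012)
print(f"${cost / 1e6:.0f} million")  # prints "$677 million"
```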
How I did it:
- Create a copy of only the newest versions of everything in CPAN using Randal Schwartz's useful mini-cpan script.
- Run a small Perl script to uncompress everything ending in .zip, .tar.gz or .tar.bz2, with each archive unpacked into a directory named after it.
- Change file permissions so that everything is readable and writable by me (some files were archived without read permissions).
- Run SLOCCount 2.22 over the resulting tree.
- Write this page.
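The unpacking and permission steps can be sketched roughly as follows. This is not the Perl script mentioned above, just a minimal Python stand-in under stated assumptions: the function names `unpack_all` and `make_accessible` are hypothetical, each archive is unpacked into a directory that mirrors its path and name under a separate output tree, and archives that fail to unpack are skipped with a message.

```python
import os
import stat
import tarfile
import zipfile
from pathlib import Path

ARCHIVE_SUFFIXES = (".zip", ".tar.gz", ".tar.bz2")


def unpack_all(mirror_root, dest_root):
    """Unpack every archive under mirror_root into a directory named after it."""
    mirror_root, dest_root = Path(mirror_root), Path(dest_root)
    # Use the safer "data" extraction filter where this Python provides it.
    tar_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    unpacked = []
    for archive in sorted(mirror_root.rglob("*")):
        if not archive.is_file() or not archive.name.endswith(ARCHIVE_SUFFIXES):
            continue
        # Keep the relative path so same-named archives cannot collide.
        target = dest_root / archive.relative_to(mirror_root)
        target.mkdir(parents=True, exist_ok=True)
        try:
            if archive.name.endswith(".zip"):
                with zipfile.ZipFile(archive) as zf:
                    zf.extractall(target)
            else:
                with tarfile.open(archive) as tf:
                    tf.extractall(target, **tar_kwargs)
        except (tarfile.TarError, zipfile.BadZipFile, OSError) as exc:
            print(f"skipping {archive}: {exc}")
            continue
        unpacked.append(target)
    return unpacked


def make_accessible(root):
    """Give the current user read/write on files and full access on directories."""
    os.chmod(root, stat.S_IMODE(os.stat(root).st_mode) | stat.S_IRWXU)
    for dirpath, dirnames, filenames in os.walk(root):
        # Subdirectories are fixed here, before os.walk descends into them.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone; their targets may not exist
            mode = os.stat(path).st_mode
            extra = stat.S_IRWXU if stat.S_ISDIR(mode) else stat.S_IRUSR | stat.S_IWUSR
            os.chmod(path, stat.S_IMODE(mode) | extra)
```

Calling `unpack_all("minicpan", "unpacked")` followed by `make_accessible("unpacked")` (both paths are placeholders) would leave a tree that SLOCCount can then be pointed at.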
Reasons why these results are meaningless:
- Most importantly, I've told SLOCCount all of CPAN is one project, which is probably inflating the numbers significantly. When I get more time, I may run SLOCCount per-distribution, then sum the totals. However, SLOCCount appears to have bugs handling this many sub-projects, so I will need to run them separately and manually sum the results.
- mini-cpan.pl doesn't actually find only the latest versions of everything; some dists are duplicated and some may be ignored.
- There's probably plenty of generated code not being identified correctly.
- There's probably plenty of code downloadable from CPAN that wasn't written for CPAN, and so probably shouldn't be counted.
- All the usual reasons why code metrics based on numbers of lines of source code are meaningless.
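The first point can be made concrete. SLOCCount's default effort estimate is the Basic COCOMO "organic" model, where effort in person-months is 2.4 × KSLOC^1.05. Because the exponent is greater than 1, treating many small distributions as one big project yields a larger estimate than summing per-distribution estimates. The sketch below assumes those default parameters and uses invented distribution sizes, not measurements from CPAN.

```python
def person_months(sloc, factor=2.4, exponent=1.05):
    """Basic COCOMO (organic mode) effort estimate for a given line count."""
    return factor * (sloc / 1000) ** exponent


# Hypothetical input: 1,000 distributions of 2,000 lines each.
dist_sizes = [2_000] * 1_000

as_one_project = person_months(sum(dist_sizes))
per_distribution = sum(person_months(size) for size in dist_sizes)
ratio = as_one_project / per_distribution

print(f"one project:      {as_one_project:,.0f} person-months")
print(f"per distribution: {per_distribution:,.0f} person-months")
print(f"inflation factor: {ratio:.2f}")
```

With these made-up sizes the single-project estimate comes out about 41% higher than the per-distribution sum. The real gap depends on how many distributions there are and how their sizes are spread.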
If you're interested in open-source code metrics, you should probably also read the article written by David Wheeler himself on GNU/Linux in general - More Than a Gigabuck: Estimating GNU/Linux's Size.
Thanks to Adam Kennedy of Phase N for annoying me enough to actually do this.