Anyone know of a large open code repository?

Started by April 15, 2014 10:42 PM
4 comments, last by ddn3 10 years, 5 months ago

I'm looking for lots of code to do some meta-analysis on. Does anyone know of a large open code repo? Worst case, I can just grab a bunch of code off GitHub, but that requires unzipping the projects and isolating the code part.

Thanks!

sourceforge.net hosts the code bases of projects such as FireBirdSQL, for example. Why not go there?

The U4 engine is about 2 million lines, but you need to pay for at least one month at $19. Doom 3 source code, GPL. NASM source code, GPL. A couple of ideas there.

I suspect ddn3 needs a large sample of different projects, not just one huge codebase (which would result in selection bias). If so, you could just sift through GitHub/Bitbucket/SourceForge projects as you suggested, picking out those with "src" folders, files with source code extensions, and so on... a few simple filters should allow you to scrape the code off of almost all projects. You could script that, let it run overnight, and wake up with several dozen million lines of code to crunch.
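
For what it's worth, here is a minimal sketch of that kind of filter, assuming the projects have already been downloaded and unpacked under one local directory. The extension list and the name `collect_sources` are just illustrative choices, not part of any existing tool.

```python
import os

# Assumed set of extensions to treat as source code; adjust to taste.
SOURCE_EXTENSIONS = {".c", ".h", ".cpp", ".hpp", ".py", ".java"}

def collect_sources(root, extensions=SOURCE_EXTENSIONS):
    """Return sorted paths of files under root whose extension is in extensions."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (e.g. .git) so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling `collect_sources("projects")` on a hypothetical `projects` directory would return every matching file below it, whatever each project's folder layout is.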

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

What's wrong with git cloning the GitHub projects, scanning all source files (*.c, *.cpp, etc.) in the directory, and analyzing each individual file?
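
Roughly, that approach could look like the sketch below. It assumes `git` is installed and on the PATH, and uses line counting as a stand-in for whatever per-file analysis is actually wanted. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path

def clone_shallow(url, dest):
    """Clone only the latest snapshot; full history is not needed to analyze code."""
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)

def count_lines(path):
    """Return (total, nonblank) line counts for one file.

    Undecodable bytes are replaced rather than raising an error.
    """
    total = nonblank = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            total += 1
            if line.strip():
                nonblank += 1
    return total, nonblank

def analyze_tree(root, patterns=("*.c", "*.cpp")):
    """Map each file under root matching one of the patterns to its line counts."""
    results = {}
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if path.is_file():
                results[str(path)] = count_lines(path)
    return results
```

You would call `clone_shallow` once per repository URL and then `analyze_tree` on each clone directory. The shallow clone keeps the download small, since only the current files matter here.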

There isn't anything wrong with using SourceForge or GitHub; I was just wondering if there wasn't already a large repo of just code made for this purpose. Bacterius is correct, I want a wide range of sources.

Thanks guys!
