I'm looking for lots of code to do some meta-analysis on. Does anyone know of a large open code repo? Worst case, I can just grab a bunch of code off GitHub, but that requires unzipping the archives and isolating the code part.
Thanks!
sourceforge.net hosts the code bases of projects such as FireBirdSQL, for example. Why not go there?
I suspect ddn3 needs a large sample of different projects, not just one huge codebase (which would introduce selection bias). If so, you could sift through GitHub/Bitbucket/SourceForge projects as you suggested, picking out those with "src" folders, files with source code extensions, and so on. A few simple filters should let you scrape the code from almost all projects. You could script that, let it run overnight, and wake up with tens of millions of lines of code to crunch.
What's wrong with git cloning the GitHub projects, scanning all source files (*.c, *.cpp, etc.) in each directory, and analyzing each file individually?
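For what it's worth, the clone-and-scan approach could look roughly like the Python sketch below. The function names (shallow_clone, iter_source_files, count_lines) and the extension list are my own choices for illustration, not anything from the thread, and the per-file "analysis" is just a line count standing in for whatever you actually want to measure.

```python
import os
import subprocess
from pathlib import Path

# Assumed set of extensions to treat as source code; adjust as needed.
SOURCE_EXTENSIONS = {".c", ".h", ".cpp", ".hpp", ".py", ".java"}


def shallow_clone(url, dest):
    """Clone only the latest commit of a repository (requires git on PATH)."""
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)


def iter_source_files(root, extensions=SOURCE_EXTENSIONS):
    """Yield every file under root whose extension is in `extensions`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune the .git directory in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in extensions:
                yield path


def count_lines(path):
    """Placeholder analysis: count the lines in one file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

You would call shallow_clone once per repository URL from whatever list you gather, then loop over iter_source_files(dest) and feed each path to your own analysis. The shallow clone skips the project history, which keeps the download small when you only care about the current source.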