Recently, on an engagement, we came across a rather interesting misconfiguration on a client's website. Somewhere during their deployment process, their web developer had git-cloned the website's source from their internal git repository onto the webserver itself. This seems like a reasonable thing for a web developer to do, as it allowed them to deploy their latest code immediately, with minimal effort spent shuffling files around. What they were unaware of is that it exposed their site to attack: they forgot to either delete or protect the ".git" directory created during the git-clone process, which contained a full copy of their site's source code, including configuration data (think: database credentials...).

We were able to leverage this small oversight to obtain a copy of the client's site source code in its entirety, effectively turning our "black box" assessment into a "white box" assessment. The API keys, database connection strings/credentials, and the like in the source we acquired helped us breach the client's customer database in a very short space of time.

This issue seems to crop up time and time again, with developers (using all manner of version control systems: Git, svn, Mercurial, etc.) simply cloning or checking out repositories directly onto production servers, so we figured it might be worthwhile to run a small survey of how widespread the issue is.

First, we should point out that this small survey was limited in scope due to our other engagements, and that it serves only as an indicator. We are working on a far larger survey, to be run in the coming weeks.

The methodology we used was rather simple. We wrote a very short Python script using asyncio/aiohttp to perform HTTP GET requests for the path "/.git/config" on a selection of websites, with a trivial regular expression to determine if the file existed or not. If the file existed, the URL was logged. We then ran this script on the Alexa Top 10,000 list, which completed in a couple of hours.
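A minimal sketch of what such a scanner could look like is shown below. It is not the script we ran: the regular expression, the function names (`looks_like_git_config`, `check_site`, `scan`), the concurrency limit, the timeout, and the 4 KB read cap are all assumptions made for illustration. It requires the third-party aiohttp package for the network part.

```python
import asyncio
import re

# Assumption: a genuine .git/config contains an INI-style section header such
# as "[core]" or "[remote ...]". The original regular expression is not given,
# so this pattern is purely illustrative.
GIT_CONFIG_RE = re.compile(r"^\s*\[(core|remote|branch)\b", re.MULTILINE)


def looks_like_git_config(status: int, body: str) -> bool:
    """Return True if an HTTP response resembles a real .git/config file."""
    return status == 200 and GIT_CONFIG_RE.search(body) is not None


async def check_site(session, host: str, sem: asyncio.Semaphore):
    """GET /.git/config on one host; return the URL if the file looks exposed."""
    import aiohttp  # third-party; imported here so the classifier needs only the stdlib

    url = f"http://{host}/.git/config"
    async with sem:  # cap the number of requests in flight
        try:
            async with session.get(url, allow_redirects=False) as resp:
                status = resp.status
                # Read at most 4 KB; a config file is small, an error page may not be.
                raw = await resp.content.read(4096)
        except (aiohttp.ClientError, asyncio.TimeoutError):
            return None
    body = raw.decode("utf-8", errors="replace")
    return url if looks_like_git_config(status, body) else None


async def scan(hosts, concurrency: int = 50):
    """Check every host concurrently and return the URLs that look exposed."""
    import aiohttp

    sem = asyncio.Semaphore(concurrency)
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        results = await asyncio.gather(
            *(check_site(session, host, sem) for host in hosts)
        )
    return [url for url in results if url]
```

With a list of hostnames loaded from a file, the scan would be started with something like `asyncio.run(scan(hosts))`. Matching on the body as well as the status code matters because many sites return a 200 status with a custom error page for any path.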

We decided to use the Alexa Top 10,000 as a starting point because the sites on that list are among the most heavily trafficked in the world, and we therefore figured they would make excellent targets for a threat actor looking to steal data, spread malware, or carry out other nefarious activities.

Furthermore, due to time constraints, we limited our initial scan to testing for the Git directory only. Future surveys will cover more version control systems.

Of the Alexa Top 10,000 websites, 90 were leaking data through an internet-exposed .git directory. That is 0.9%, which may not seem like much; however, given the high profile of these websites, the impact on public online security is quite serious.

Some examples of the types of websites affected (we don't intend to publicize the names of impacted sites):
News Websites
Cryptocurrency-Related Websites
Video Streaming Websites (some of questionable legality)
Advertisement Networks (malvertising, anyone?)
A major Telecoms Provider/Mobile Network/ISP
Pornography/Adult Websites
Tourism/Bookings Websites
A large eCommerce Website

We will attempt to notify the organizations affected by this issue in due course, once we manage to locate security contacts for them. As mentioned before, we intend to carry out a more far-reaching investigation into this issue in the near future.