GitGuardian data reveals 20% rise in ‘secrets’ hidden in public GitHub repos

GitGuardian, a cybersecurity platform that helps companies find sensitive data hidden in code, has revealed that it found more than 2 million “secrets” in public GitHub repositories in 2020, a 20% increase over the previous year.

Founded in Paris in 2017, GitGuardian aims to “prevent hackers from using GitHub as a backdoor to your business,” as the company puts it. It scans public GitHub repositories in real time for private data that bad actors could use to access a company’s systems (e.g., cloud services or databases), such as API keys, cryptographic keys, and login credentials.

GitGuardian’s inaugural State of Secrets Sprawl on GitHub report is based on its continuous monitoring of every commit pushed to a public GitHub repository. Comparing 2020 data against the same period in 2019, the report found that the number of secrets detected had grown by a fifth.
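
For readers curious how commit scanning of this kind works in principle, here is a minimal Python sketch of pattern-based detection over the lines a commit adds, as shown in a unified diff. It is not GitGuardian’s method, whose detectors are proprietary. The detector names, the generic-key regex, and `scan_added_lines` are hypothetical; only the AWS access key ID shape (`AKIA` followed by 16 uppercase letters or digits) reflects a real credential format. Production scanners typically combine many provider-specific patterns with entropy and validity checks to reduce false positives.

```python
import re
from typing import List, Tuple

# Hypothetical detector set, for illustration only. Real scanners ship many
# provider-specific patterns plus entropy and validity checks.
DETECTORS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Generic assignment such as api_key = "..." with a quoted value of 16+ chars.
    "generic_api_key": re.compile(
        r"""(?i)\b(api[_-]?key|secret|token)\s*[:=]\s*["']([A-Za-z0-9_\-]{16,})["']"""
    ),
}


def scan_added_lines(diff: str) -> List[Tuple[int, str]]:
    """Return (line_number, detector_name) for each added diff line that matches a detector."""
    findings = []
    for lineno, line in enumerate(diff.splitlines(), start=1):
        # Lines added by a commit start with "+"; skip the "+++" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in DETECTORS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


if __name__ == "__main__":
    sample_diff = """\
+++ b/config.py
+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
+api_key = "abcdef0123456789abcdef"
-old_line = 1
+DEBUG = True
"""
    for lineno, name in scan_added_lines(sample_diff):
        print(f"line {lineno}: possible {name}")
```

Run against the sample diff, the sketch flags line 2 (the AWS-style key ID) and line 3 (the quoted `api_key` value) while ignoring the file header and the removed line.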

The “secrets sprawl” in the report’s title refers to authentication credentials being scattered across many different places, which makes them hard to track.

“We think the growth is due to two factors — the increase of GitHub usage and the move toward cloud architectures and componentization,” GitGuardian CEO Jeremy Thomas told VentureBeat. “These two trends generate more digital authentication credentials.”

Evidence suggests there is some truth to these assertions. With regard to GitHub usage, GitHub’s own data indicated that people collaborated more last year, as open source project contributions jumped by over 40% in the months following lockdown. And a Red Hat study released last week found that enterprises upped their open source game in 2020.

More and more companies are shifting from monolithic on-premises software to the cloud and to microservices-based architectures. But while applications built from smaller, function-based components that connect via APIs may be easier to develop and maintain, the upshot of all this digital transformation is that developers have a growing amount of sensitive data to manage.

GitGuardian, which raised a $12 million tranche of funding in 2019 from backers such as GitHub cofounder Scott Chacon, is one of a number of players operating in the secrets detection and management space. A few weeks back, Israeli startup Spectral exited stealth with $6.2 million to find costly security mistakes buried in code, while last week Doppler expanded its cloud-hosted secrets manager to the enterprise with $6.5 million in funding.

Human condition

The problem platforms like GitGuardian are looking to fix relates to human error, which is likely to increase as a company hires more developers. Error rates are also compounded by shortened release cycles.

Meanwhile, high-profile data breaches have put companies under increasing pressure to shore up their defenses. A few years back, Uber revealed a major breach that exposed the personal data of millions of users. Several security shortcomings were at play, but the root cause was that hackers found an AWS access key in a private GitHub repository belonging to an Uber developer and then used that key to access files in Uber’s Amazon S3 datastore. The incident illustrates how important it is to safeguard secrets.

GitGuardian’s report found that 85% of the 2 million secrets it found were in developers’ personal repositories, which fall outside of corporate control. “What’s surprising is that a worrying number of these secrets leaked on developers’ personal public repositories are corporate secrets, not personal secrets,” Thomas added.

This means a company’s internal systems could be vulnerable due to sensitive data hidden in current or former developers’ repositories. But it also shows how the problem can impact companies, regardless of whether they work on open source projects or not, as they have little visibility or control over how their developers use GitHub.

“Organizations can’t control what developers do with their personal GitHub projects,” Thomas explained. “GitHub is a fantastic platform for developers to collaborate together, learn new skills, and showcase their work. Developers typically have one GitHub account that they use both for personal and professional purposes, sometimes mixing the repositories. Developers use GitHub as their LinkedIn — that’s why they need one account that is really tied to them and contains their work.”

A closer look at GitGuardian’s report shows that 27.6% of the secrets found were access keys to Google accounts. Other common secrets offered access to development tools (15.9%), data storage (15.4%), messaging tools (11.1%), and cloud providers (8.4%). By file extension, Python files accounted for 27.7% of secrets, followed by JavaScript (18.7%), environment variable files (9.6%), and JSON (7.5%).

GitGuardian’s best-practice recommendations for avoiding such scenarios include restricting API access and permissions, discouraging developers from sharing unencrypted secrets in messaging systems such as Slack, and never storing unencrypted secrets in Git repositories.
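
As an illustration of that last recommendation, one common pattern is to keep credentials out of tracked files entirely and read them from the environment at runtime, with any local .env file listed in .gitignore. The Python sketch below assumes that pattern; `require_secret`, `MissingSecretError`, and the `DATABASE_PASSWORD` variable name are hypothetical, and in production the environment would typically be populated by a deployment system or secrets manager rather than by hand.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment (hypothetical helper)."""


def require_secret(name: str) -> str:
    """Fetch a secret from the process environment instead of hardcoding it in source."""
    value = os.environ.get(name)
    if not value:
        # Fail fast rather than falling back to a default value committed to the repo.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


if __name__ == "__main__":
    # DATABASE_PASSWORD is a hypothetical variable name used for illustration.
    try:
        db_password = require_secret("DATABASE_PASSWORD")
        print("loaded DATABASE_PASSWORD from the environment")
    except MissingSecretError as err:
        print(err)
```

Failing fast on a missing variable, rather than falling back to a default, removes the temptation to commit a working credential as that default.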

VentureBeat

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.
