GitHub Copilot: Public Release and Privacy Concerns

So what is GitHub Copilot?

GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. It draws context from comments and code to suggest individual lines and whole functions instantly. GitHub Copilot is powered by Codex, a generative pretrained language model created by OpenAI. It is available as an extension for Visual Studio Code, Visual Studio, Neovim, and the JetBrains suite of integrated development environments (IDEs).

GitHub, Inc.

Basically, the GOAT of code autocomplete

How do I get access to GitHub Copilot?

It's currently free for verified students who have the GitHub Student Developer Pack. It's also free for maintainers of popular open-source projects.
Lastly, you can get it through the 60-day free trial, which then transitions to either $10/month or $100/year.

But is it good?

This depends on many factors, such as the language you code in and the complexity of your projects. Copilot was trained on public open-source code, in which programming languages are not evenly represented, so it is likely to be much stronger in some languages than others, and on common tasks (reversing an array, writing form validation code, etc.).
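To make "common tasks" a bit more concrete, here are two small hand-written Python functions of the kind that show up all over public repositories. Each starts with a descriptive comment followed by a function signature, which is the sort of context Copilot draws on when suggesting a body. The bodies are my own illustration, not actual Copilot output, and the validation rules (username length and characters, minimum password length) are arbitrary assumptions.

```python
import re


# Reverse an array (a Python list) without modifying the original.
def reverse_list(items):
    # A slice with step -1 walks the list from the last item to the first.
    return items[::-1]


# Validate a sign-up form. The rules here are illustrative assumptions:
# the username must be 3-20 letters, digits or underscores, and the
# password must be at least 8 characters long.
def validate_signup(username, password):
    errors = []
    if not re.fullmatch(r"[A-Za-z0-9_]{3,20}", username):
        errors.append("Username must be 3-20 letters, digits or underscores.")
    if len(password) < 8:
        errors.append("Password must be at least 8 characters.")
    return errors


print(reverse_list([1, 2, 3]))                # [3, 2, 1]
print(validate_signup("bob_k", "hunter2!!"))  # []
print(validate_signup("b", "short"))
```

Boilerplate like this is exactly where a model trained on lots of public code has seen many near-identical examples, which is why it tends to shine here and struggle more with niche, project-specific logic.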

From a language perspective, expect it to perform really well in popular languages like Python, JavaScript, TypeScript, Ruby, Java, and Go. You should definitely give it a go (using the trial) to fully assess whether it's worth it, as this can be very subjective and depends on your use case.

Will Copilot steal my code?

This was a huge concern for me during the technical preview, as the privacy policy was vague and not very clear.

There are two concerns here:

1. Will Copilot be trained on my snippets and surrounding code?

Here, "surrounding code" means the model prompt: the information Copilot needs to produce a suggestion, which might include other files open in your IDE.

From my understanding, no*, though it's not completely clear.
Here is a quote from GitHub regarding that:

Will my private code be shared with other users?
No. We use data, including information about which suggestions users accept or reject, to improve the model. We follow responsible practices in accordance with our Privacy Statement to ensure that your code snippets will not be used as suggested code for other users of GitHub Copilot.
GitHub Copilot FAQ

The language here isn't very clear. They may mean either of the following:

They use your code snippets so that humans can see which suggestions are wrong, and then improve the model by training it further in that specific area.
Example: Copilot's suggestion acceptance rate on the Keras framework is low, so let's train Copilot on more publicly available Keras code, but not on the suggestions or snippets it just failed on.

They feed your snippets back into Copilot to train it, but have "responsible practices" in place to prevent it from reproducing your code for other users. The obvious problem here is that you can never be sure how good those measures are.

2. Will engineers at GitHub/Copilot be able to see the content of my code snippets?

From the above statement, it seems so.

But fear not: there is a setting to disable this.

The setting is called "Allow GitHub to use my code snippets for product improvements". Uncheck it (it should be enabled by default if you are transitioning from the technical preview).

If you have disabled Allow GitHub to use my code snippets for product improvements we will not record the details of the prompt (the source code that was sent to Copilot to complete) or the response(s) from the model.

If you disable this, I believe the only data collected for product improvement should be basic telemetry, such as usage statistics.

June 24, 2022 Bob Kimani

AI Coding Github