A proposal for Comment Tagging AI Generated Source Code

December 6, 2022December 8, 2022 Jeff Luszcz

A proposal for Comment Tagging AI Generated Source Code

Source code generated by “AI” tools like GitHub CoPilot or OpenAI ChatGPT should prepend a language appropriate comment block explaining that the source code was generated by a tool as well as helpful metadata to allow discovery and management of that code by code scanners like SCA or SAST tools.

End users would obviously have the ability to remove this comment block, though I believe we all would be well served by marking all generated code with comments detailing the tool that generated it, the version or date generated, as well as inputs that might be helpful in understanding the conditions that caused this code to be generated.

AI Generated code, while still a new universe, appears to have a series of potential defects that existing and future

code scanners will need to be on the look out for. These include code hallucinations, missing cases, confidently wrong constants/algorithms, concerns around the license of the generated code, and other issues we still have not discovered.

Having a clear machine readable key that indicates that this code was AI generated allows for appropriate scanning, filtering, as well as metrics generation.

Code or data generated by AI based tools may have different standards of trust than code or data created or curated by human authors.

Parallels to SCA Snippet Analysis

There are parallels in Software Composition Analysis (SCA) snippet scanning world where awareness of generated code is very helpful when scanning or clearing scan results.

In the snippet analysis world, generated code is extremely similar to massive amounts of other open source code generated by the same tool. Therefore, performing snippet matching is often slow and resource intensive due to the sheer amount of similar snippets. This causes user pain due to slow scanning as well as a perceived large amount of “false positive” matches. There is also a belief that this generated code is “fine” which means it is often incorrectly ignored when it comes to SCA/SAST scanning due to the above issues.

Code generated by traditional non-AI code generators like the .NET IDE, Antlr, Apache MyBatis, protobuf, etc.. often tag their generated code with special comment strings and tags.

This allows SCA tools or SCA tool users to either ignore snippet matching for these files before scans are performed, automatically bucket or filter results afterward, or allow the end user to manage the results quickly through string matching.

One issue with these code generators is that the tags used are not standardized and require multiple methods to discover. The identifying strings include XML fragments, strings, JavaDoc style tags, custom tags, etc…

Future SCA/SAST tools can be even more nimble as they become more aware of the possible code generators that exist and perform appropriate scanning methods to the generated code depending on what needs to be discovered.

Qualifications of a good “generated by” comment

Easy to parse by machine (oh the irony!)
Easy to read and understand by a human
Not too wordy so it will be left in place by the end user
Not too wordy so that code generators decide to use it
Explains what tool generated the code using a unique name
Provides a version number or generated date so that eras of similar code can be examined with appropriate tools
Does not change too quickly so that code generated by the same tool can easily be found with simple pattern matches or even greps

Future extension

In the future, the user text prompt that caused the code to be generated should be embedded as well

Current AI code outputs are typically single pages and should therefor have a single line comments.

Future code generators will generate entire applications and should have a larger banner with more details explaining the user prompts that generated the application.

A tool URL or project home URL (e.g. @generatorURL ) could be optionally used to prevent naming confusion and/or provide easy branding or publicity for the various tools

Current Proposal:

// @generatedNote This code was generated by a AI code generator tool.
// @generatedBy CoolAIGenerator v1.2.3

Examples of current comments from non-AI Code Generators:

https://github.com/VarathaRamanujam/EEE_LOGIN_VALIDATION/blob/fc3787e3acfd302cce5abfd6bbb5b8abf2bef72c/src/main/java/com/hider/eee_students_login/Filemodel.java

/*
* Created on 2022-11-27 ( 18:26:59 )
* Generated by Telosys ( http://www.telosys.org/ ) version 3.3.0
*/

https://github.com/koo5/alertmanager_api_js/blob/7227cc6e1854fe29f45d011584ba563d385a5fdb/src/model/Alert.js

/**
* Alertmanager API
* API of the Prometheus Alertmanager (https://github.com/prometheus/alertmanager)
*
* The version of the OpenAPI document: 0.0.1
* 
*
* NOTE: This class is auto generated by OpenAPI Generator (https://openapi-generator.tech).
* https://openapi-generator.tech
* Do not edit the class manually.
*
*/

https://github.com/Brayds-Dev/Merchandiser-tool/blob/381d9e83698d35bfc10312ec195788cb06c31e39/MerchandisersTool/MerchandisersTool/obj/Release/netstandard2.0/Views/UpdateClientInfo.xaml.g.cs

//------------------------------------------------------------------------------
// <auto-generated>
// This code was generated by a tool.
// Runtime Version:4.0.30319.42000
//
// Changes to this file may cause incorrect behavior and will be lost if
// the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

https://github.com/wq3426/study/blob/e9830a44a3c9e7314fd1f3974b7d68c0eeafc1a6/spring_boot_project/bwmTools2/src/main/java/com/dhl/tools/domain/CargoLocationData.java

/**
*
* This class was generated by MyBatis Generator.
* This class corresponds to the database table CargoLocation_Data
*
* @mbg.generated do_not_delete_during_merge
*/

https://github.com/cqym/cut_tools2/blob/64a447205ae8a9a5c6089097f966c9c3b786357c/development/src/com/tl/resource/dao/pojo/TQuotationProductDetail.java

/**
* This field was generated by Apache iBATIS ibator. This field corresponds to the database column t_quotation_product_detail.id
* @ibatorgenerated Wed Oct 14 14:13:27 CST 2009
*/

https://github.com/thaovy2902/Web_KTMT/blob/ffcc022506b4b5635e67bc046f01a3ed80bb3605/vendor/google/cloud/CommonProtos/src/Audit/RequestMetadata.php

# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: google/cloud/audit/audit_log.proto

https://github.com/mehmetsen80/xtalk/blob/cf88a0025b44a9e4ee0055457c565dbf30c05e1e/MSOutlookRecurrencePatternType.h

/*******************************************************************************
**NOTE** This code was generated by a tool and will occasionally be
overwritten. We welcome comments and issues regarding this code; they will be
addressed in the generation tool. If you wish to submit pull requests, please
do so for the templates in that tool.
This code was generated by Vipr (https://github.com/microsoft/vipr) using
the T4TemplateWriter (https://github.com/msopentech/vipr-t4templatewriter).
Copyright (c) Microsoft Corporation. All Rights Reserved.
Licensed under the Apache License 2.0; see LICENSE in the source repository
root for authoritative license information.
******************************************************************************/

I’m speaking at Open Source 101 on 3/30 at 2:45pm-4:30pm ET

March 29, 2021March 29, 2021 Jeff Luszcz

Are you thinking about selling your company? Are you building a software product? Do you lead an engineering team? Want to jumpstart your knowledge of Open Source and its impacts on security, compliance and valuation?

Join my workshop “Just Enough Open Source: A Kick start on security, license compliance and business models” Tuesday March 30th, 2021 from 2:45pm – 4:30pm ET.

Talk description:

“Open Source powers the world, but you need to do more than download it

In this talk we will provide background on the most common types of open source licenses, business models, security issues and most importantly the processes required to help you remain secure and in compliance. We will discuss best practices, scanning tools, remediation, customer and partner expectations around OSS compliance and how to manage OSS during events such as a product release or M&A.”

FREE registration link: https://opensource101.com/register-now/

Blogging for Overly Busy Experts

March 3, 2021March 3, 2021 Jeff Luszcz

I was talking with an old colleague recently about an age-old problem. He said, “I should write more but I’m just too busy.” It’s true, it’s really hard to justify sitting down and writing something that isn’t part of a deliverable, may not even be read and worst of all surely isn’t billable! Decades of experience get hidden away, only showing up on phone calls and emails. We often joke, “Our best work is under NDA.”

picture of a typewriter and paper

In the end though, public writing is one of the best things that you can do to show off what you know, teach others and make the case that you are the person that should be trusted by your customer for those billable hours.

Here are some of the things that I’ve found helpful for getting words down on paper.

Start writing!

First off, whether you write one article a year, or an article a week, I can assure you that you will look back on your writing with pride. Get something on paper and publish it.

Get a home for your writing!

Figure out where you want to publish your writing. It could be an internal blog, a Slack channel, your external company blog or even a self-hosted blog with or without a custom domain. Sites like LinkedIn, Medium and WordPress make it pretty easy to make a space for your articles. I like getting this figured out first, since once your article is done, you are going to be antsy to get it out there right away.

Keep track of ideas!

I keep an electronic to-do list to keep track of the tasks I need to do. I’ve made a folder just for blog and article topics. Whenever you get a good idea (I always seem to get them when I’m washing dishes!) get it recorded. You might not get around to writing on that topic for a year or more, but having a bank of ideas ready to go really helps the writing process.

Review that list from time to time. More ideas will almost certainly come to you, write those down as well. I currently have 213 items in my “Blogs to write” folder. (I should write more!) Some of the stored away items are questions, some are comparisons, and some are simply the title of an article. After a while you will find that you have a stable of topics that you could crank out articles for with only a little preparation.

Try the “old favorites” to get started!

What are some good ideas that you could get started on today?

There are some articles that “write themselves”, they are often either topical or document part of your day-to-day activities.

For example:

Top Ten Lists / Listicals
How you got started
Trends or things you find interesting
A technology you like and why
The hot news item du jour
The year in review
Take a slide from a webinar and turn it into an entire article
Why some topic is important to you

You could probably start riffing on any of those topics and get something interesting out right now.

An example to get started

Picking a top 5 or top 10 can be an easy onramp to writing an article. Start first by writing your top 10 list. Then, expand each item to explain what they mean. Write an introduction explaining what top 10 items you are going to go through and finally wrap up your article with a couple sentences. It’s fine to be a little cliché if it helps you wrap up your first few articles. Find some stock phrases that help you close up. Name it something engaging or descriptive and then fix any typos and problems in an editing pass or two. Then Publish!

The writing process

The hardest thing is the first word, that’s why I often write my conclusion first! You often will know what you are trying to tell people, so tell them, and then back fill the details and intro as needed.

Outlines or sentence prompts really help. Get some words and phrases down in your document, as many as you can at first. You don’t have to fit them all into your final piece. Some of them become perfect future articles whether or not they get in. Filling in the meat of the article is a lot easier when you’ve blocked out most of the content in the first phase. I like writing a series of topic headings and then going back and fleshing them out. Prompts work!

Use Speech to Text (if that works for you). Many phones, word processors and computers come with free speech to text. I’ve experimented with riffing on topics while taking walks and then cleaning them up later. This can be a great way to get a couple hundreds of unedited words on the page.

I’ve found it works best if you immediately move over to the editing and writing process since there’ll be enough recognition mistakes that you lose track of what you meant if you let too much time pass.

Figure out how many words you want to target. For me around 1000 words is a sweet spot. I often will write a couple hundred more words than that, and then tighten things up in the editing process. This amount of text takes me an evening to write when the words are flowing. I like to do an editing pass when I’ve hit my word count target, but then also will sleep on a piece and re-read the next day for my final editing pass.

Build on what your write

Use what you wrote as a stepping-stone to further content. A sentence can become a new article, or you can expand the topic to a conference proposal or talk.

I’ve had great luck turning articles into webinars (and vice versa). This is a great way to get your words in front of a new audience. Some articles are evergreen and can be updated for the next year, others can be reworked to expand on one of your paragraphs.

It’s your text; don’t let it stop working for you.

Let people know what you wrote

At first, few people will know what you’ve written. Let people know by announcing your article on social media like Twitter, LinkedIn and similar sites. Send links to your coworkers and customers. I’ve found that customers love this type of content and will often forward it around their companies and peers. It goes beyond the thing you are selling and shows you are there as a partner, not just to sell more product.

Wrapping up

Writing is a muscle and when exercised, becomes stronger. 750 words on a topic feels like a walk around the block when you’ve trained for a marathon.

Everyone ends up with a different voice and style so figure out what works for you.

In the end, the most important thing is to write. Get your knowledge and thoughts out there.

I’d love to see what you write!

My Open Source Talk at All Things Open 2020

November 18, 2020 Jeff Luszcz

Recently I was asked to put together a Workshop on Open Source Licensing and Security for All Things Open 2020. I always love these types of talks since it gives you a chance to both kick start people new to the topic and also give a current overview to people who have been doing this role for a while.

The video below has two sections, the first gives an onramp and introduction to open source licensing, and the second half discusses more of the day to to day operations of open source compliance and security.

I’d love to hear your feedback!

Your code will outlive you! Will the Future be able to use it?

October 26, 2020 Jeff Luszcz

Every decade or so the technology world gets punched in the face by a problem requiring poring through massive amounts of code written well before many of us were born.

In the late 1990s it was the Y2K problem when two digit years were no longer sufficient.

In the 2008 financial crisis COBOL based systems required hand editing in order to change state employee salaries en mass.

In 2020 we had the Pandemic related economic impact requiring Bank and Employment system’s code to be modified, many of which again were written in COBOL!

In 2032 we’ll have the Y2k38 problem when the Unix time will overflow causing software issues akin to Y2K.

While many of these systems were closed source and proprietary, open source systems are starting to dominate the software landscape. Much like your grandparents’ hammer, these examples show us that useful tools will almost always outlive the person who first selected or created them.

Besides the question of maintainability and programmer experience with very old languages like COBOL, questions of intellectual property and software licensing will complicate the usability of open source software over the next 50 years and beyond.

Think about Licensing!

As part of software due diligence (when a company purchases another company and confirms that the source code they are buying is correctly owned, licensed and documented) I have had the experience of trying to track down the ownership and licensing of code decades old. This type of software archeology requires access to archives of source code, books, magazines, blogs and other places that programmers have published software over the last 60 years! In many cases the true origin of some source code is lost to time, or can be only partially known.

Questions such as “Who wrote this?”, “Did they expect others to freely use this source?”, or “Does a commercial company own this?” are common.

It is important to make these types of answers clear for those who come after us.

The most important of these is to specify an open source license for the code you are publishing, even if it’s just a “single page” or block of code. If it’s worth putting on the Internet, it’s worth telling people what its license is. There are many suggestions on how to label code to make the copyright and license clear, but I strongly suggest that each file contains a copyright statement and at least a SPDX license identifier (See https://spdx.org/licenses/)

This allows someone in the distant future to know who wrote the source code and what the obligations are even if only a single file remains.

Who do you depend on?

Document all your third party dependencies, including dependencies of dependences. There is no guarantee that any of our current repository managers will still be working decades in the future, but your code may be. By listing these dependencies, you help the Future build and run your code.

In a similar vein, your build system and running environment should be documented as well. For example, if you depend on a certain make system or database to be installed, call these out in separate build and running environment documentation.

Till Death Do Us Part

In the short term, understand that source code is considered property. What happens after you die should be clearly specified. In most places the ownership of your code will pass on to your heirs, but possibly with complicated and divided ownership. Do you want anyone in particular to be the new code owner (or someone outside of your family)? In this case your will (or related documents) should make this clear.

While everyone should have the permissions available under your open source license for the duration of your copyright, you may wish your heir to have the ability to “own” the code just like you do.

By making the ownership clear, they will then have the ability to change the license for your code, just like you likely do now. This means they may have the permission to also sell commercial licenses to this code, or change the open source license of the project. An open source project with multiple contributors has additional concerns about ownership. You may need to make clear dividing lines between projects you own outright as opposed to projects you contribute to, or have others contribute to.

Do you want to change the license after your death, or after a certain amount of time? Make these changes clear as well. Some may want to open previously closed source, or change to a Creative Commons Zero (CC0) license or Public domain declaration.

Similar questions may come up in the case of divorce. While it’s often clear who owns code you write when “on the clock at work”, the code you write at home may have complex ownership issues.

Go beyond the code!

An additional thing to consider is account access, logins, domain names and payments.

Typically we tell everyone to keep their account information private and secure. This may be at odds with your desire to keep your project going even after your death.

Keeping a list of domain names, third party services and other account information related to your project can help the heirs to your code keep the project going.

Bear in mind that, after your death, certain accounts may be locked, go away or may be controlled by people other than the code owner.

As with most things involving intellectual property and life events, it is best to consult a lawyer to understand your best options.

Rest in Peace!

A little care and effort in the present can save the community a significant amount of time in the future. By specifying a license, documenting project dependencies, and clearly transferring ownership you can make sure your code stands the test of time.

I’m speaking at All Things Open on 10/19 at 2pm ET

October 18, 2020October 18, 2020 Jeff Luszcz

Are you thinking about selling your company? Are you building a software product? Want to jumpstart your knowledge of Open Source and its impacts on security, compliance and valuation? Join my workshop a week from today at all Things Open.

“Open Source Licensing: Types, Strategies and Compliance” Monday October 19th from 2pm -3:45pm ET (11am PT)

Free Registration link: https://2020.allthingsopen.org/register/

JavaScript Minimization, Obfuscation and Open Source Compliance

September 1, 2020 Jeff Luszcz

One of the most important things for a technology company is to have their web site look attractive, be responsive and load quickly. Milliseconds can be the difference between gaining or losing a customer and web designers and programmers will use every trick in the book to make their pages load rapidly.

A commonly used technique to speed up web sites is to send fewer bytes over the Internet when loading a web page. You can do this by using smaller photos, lower quality images and smaller JavaScript and HTML files.

How can you shrink a source file without throwing away functionality?

To reduce the size of JavaScript source files, a technique called Minification is used to remove unneeded text in a file while still preserving the core functionality. This comes at the expense of human readability.

Due to how this technique works, it complicates compliance with Open Source Licenses. In this article I’ll discuss the basics of Minification and some best practices to remain compliant.

What is Minification?

Minification is the process of removing redundant text, whitespace or descriptive variable names that are unneeded by the web browser to interpret the code. For example, your human readable code might use a variable with the name EMPLOYEE_LAST_NAME and use this long name dozens of times in your code. The minifier will replace this name with something short, perhaps simply the letter A. This saves 17 characters each times the variable is used.

Most, if not all, comments are removed, as are extra spaces. These small changes add up, and over the entire file, you may find that you save 70% or more compared to the original file.

For example, the popular JQuery library is available as both the human readable and minimized files. The human readable file weighs in at 288KB while the minimized file is only 89KB.

Why would you want to minify code?

Minification typically is used to reduce the size of files to speed up web page loading.

A developer might also minimize their code in order to put multiple libraries together in a single file for ease of downloading and use by their page. In those cases you may see a comment detailing what the original filename or library name was, and possible a short license blurb.

How is that different than Obfuscation?

In some cases minification is used to make it more difficult (though not impossible) to reverse engineer or easily copy code or business logic. You may sometimes hear people use the term “Obfuscation” used for that case.

What are some common tools used to minimize or obfuscate code?

As with most tools there are many ways to scratch an itch, so there are dozens of minification tools available. Some are online only, others are GUI tools and many are command line tools to be used as part of your development tool chain. Some of the most common command line tools you will encounter are:

UglifyJS https://github.com/mishoo/UglifyJS

JSMin https://www.crockford.com/jsmin.html

Minifier https://www.npmjs.com/package/minifier

How does minification affect Open Source License compliance?

Many open source licenses require the preservation of the original copyright strings and license text when a program incorporating those libraries is distributed (or possibly served via software as a service).

Since comments are typically the first thing removed to make a minified version of a source file, the copyright and license text can be discarded in a way that makes it difficult or impossible to comply with the open source license the code is under.

Additionally, some source files may be checked in to source control already in minified form. These files may have been stripped of the required copyright and license text. It may be required to review minified files that are checked in to source code control to discover their true original and update them with the proper copyright and open source license text.

It is often possible to preserve the copyright and license in the minified files.

Many minification tools provide flags or plugins that attempt to discover license comments and preserve them. For example, the UglifyJS plugin uglify-save-license allows the user to preserve license text found on the first line, or if it is in a comment block containing common license names or copyright statements.

See https://www.npmjs.com/package/uglify-save-license

That said, the minified output should be viewed and compared to the original file in order to confirm that the appropriate copyrights and license text are preserved.

If you end up using a library that does not declare its license in a way discoverable by your minification tools, it would be helpful to log an enhancement request with the original component author to make it easier to comply with the licensing in the future. You may find that you need to manually fix this license comment yourself in the meantime.

SAAS vs. Distribution issues with Open Source licenses and minification

Many open source licenses have obligations that come into effect when the program is distributed to users.

The need to preserve or display copyrights and license text is clear if you are distributing a product to users for them to run on machines that they control (a classic distribution).

Untold hours have been spent discussing whether JavaScript and other web resources downloaded to a web browser count as “distribution” in Software as a Service (SaaS) applications.

In my opinion, I would treat any file or resource downloaded to a web browser as “distribution” and would comply with the license obligations as expected in that use case.

As with many elements of OSS licensing you should request legal advice from your legal counsel about what is required for your use case and venue.

CDN and Minification

Web apps often use a Content Delivery Network (CDN) to speed up access to resources, images and code that the app requires to run. Think of a CDN as a global cache (or fast hard drive) that distributes often used files. CDNs will often minimize JavaScript files automatically. You should take care to confirm that the proper licensing is being preserved by the CDN you have selected to host your files. This can be done by examining the source files delivered to your end user’s browser.

Source Maps

In order to help debug minified source, a technology called Source Maps was created. Source Maps allow one to un-minify source code for debugging purposes, though require special mapping files to be used as well as Source Map aware development tools to view.

While this process is often done in development, some organization ship source maps to production. Care should be taken to confirm that the actual users of the web application can see the required copyright and license text without the use of specialized developer tools.

What are best practices for dealing with Minimized code?

I hope this quick overview of minification has been helpful. As you can see, techniques designed for speed and performance can cause difficulties with open source license compliance. Keep this checklist in mind as you review your program and use of minification:

Understand your license obligations when using JavaScript source files
Review the minification tools or services you use
Confirm the proper copyright notices, license text and other information is preserved by the minification step
Perform an asset review or Software Composition Analyses (SCA) step to discover untracked third party source code
Log request for enhancement / bugs against Open Source Libraries or Tools to make it easier to preserve OSS license information
Store away the original source files, not just the minified version

Getting the Gist of GitHub Gist Licensing

August 20, 2020August 20, 2020 Jeff Luszcz

What is a GitHub Gist?

A GitHub “Gist” is a webpage used to share snippets of code or a single file up to 1MB of source or text. It is commonly used to share examples, small utilities or documentation. Other similar sites are commonly referenced to as a “pastebin”.

Why would you use the code from one?

It’s common for a developer to search for code to implement a small procedure, for example, “reverse the letters in a string” or “check for the existence of a file on disk”. These small pieces of code are not seen as large or complicated enough to be named and hosted as an open source project or have an entire web site devoted to them.

More complicated code is likely to be found as well. Single page utilities or programs can be encountered, often with embedded documentation in comments, or on a web page pointing to the Gist.

You may also see Gists used to host data listings or research. Common examples of this are ID numbers of products or the code used to demonstrate a bug or exploit.

Why does it need a license?

In a nutshell, other people’s source code or resources often require a software license for you to use it in your project. A license (commercial or open source) is a way to explain the conditions that you are able to use that software and without one you likely do not have permission to use it.

The topic of what requires a license and at what size or complexity of source code is sufficient to require a license is beyond the scope of this blog entry. In general, source code cut and pasted from somewhere else may require a license in order to be legally used.

How do Gist authors show off their licenses?

Unfortunately most Gist authors FAIL to clearly show what license is attached to the source code they are sharing.

The most helpful and accurate way for a Gist author to declare their license is to put the license text in the source on the Gist itself. Typically this would be at the top of the file in a comment block and would contain the copyright date and owner if required by the license.

You may also see a short one-line comment such as:

This code is licensed under the terms of the MIT license

This is helpful but not complete, you don’t have a copyright date or Copyright owner, but at least have a general idea of the licensing style the author prefers.

In some cases the Gist author places a note somewhere in their GitHub site or homepage that declares the default license for their Gists. They may use text such as “The default license for all public Gists I publish is the following:” and then put the name or text of the license.

While this is far better than nothing (and does reduce perceived clutter in published Gists) it does break the connection between the source and the license that it was published under. It makes it harder for a consumer of that source to bring along the accurate licensing information when using the code.

What if it doesn’t have a license and you want to use it?

If a file, snippet or other content does not have a declared license, it is a good practice to reach out to the original author and ask what license the content is available under. You may want to suggest a license that you are comfortable with in order to short circuit any back and fourth. For example, you may want to send an email like so:

Dear Jeff,

  I found the code you published on your GitHub Gist site at https://gist.github.com/jeff-luszcz/c470d282599ea42424b976c673d7c115

It currently does not appear to have a license associated with it. I would like to use this code but only if it has an open source license. Would you be able to let me know what license this code is under? I’m a big fan of the MIT license if you are looking for suggestions. (See https://choosealicense.com/licenses/mit/ )




Thanks!

While the author does not owe you a response, don’t be surprised if you get a helpful answer in a couple of days or so.

How should you preserve or document licenses to the code you use from a Gist?

As a developer one of the best things you can do for yourself or others that use your code is to document the licensing and origin of all third party code you use. This helps you legally share the projects you build, and also helps your users comply with open source licenses and stay ahead of security problems in the third party code you selected.

The best time to document things is when you have the data initially right in front of you.

If need be, cut and paste the licensing and origin information and attach it to the code you are using. For example:

# code snippet taken from https://gist.github.com/jeff-luszcz/c470d282599ea42424b976c673d7c115

# licensed under a MIT license as per the information on http://www.example.com/jeff-gists-license-info which says:

#  All my gist code is licensed under the terms of the MIT license

What are some caveats or warnings about using code from Gists?

One of the questions people often have about code snippets is “Where did this code come from originally?” Is the code in the Gist a bug fix to some Apache code? Is it originally from the Linux Kernel and copied out as an example? Is it something from a commercial SDK that wasn’t publically available on the net?

If so, the original author’s licensing may have been stripped (often accidently, but sometimes purposely) by the person who has published the Gist.

It may be difficult to track this information down, and it may not be something the original author can even remember. Through the use of software composition analysis (SCA) tools or source code fingerprinting databases, you may discover earlier origins of code. In that case you may need to update your licensing or remediate (fix, remove, etc…) the snippet.

Wrap up and next steps

If you get in the habit of documenting the licensing you use for all third party content, including snippets, you will find yourself in a much better position in the future when your code is used by other people or projects. It is always easier to get licensing done right at creation time, than if you have to go back in time and become a license or source code detective.

15 Years in Open Source Compliance (talk given for OpenChain)

June 29, 2020August 20, 2020 Jeff Luszcz

Recently I was asked to give a short talk to the Open Chain project about my experiences over the last 15 years of helping people with Open Source compliance.

Here are the slides from that talk, let me know if you have any questions or comments!

What is Zebra Cat Zebra?

June 18, 2020 Jeff Luszcz

As long as I’ve had to tell people my last name, I’ve had to spell it out. Teachers, friends, coworkers, random companies, and so on need a little help after hitting the 4 consonants in a row.

L-u-ess Zebra Cat Zebra has been a helpful hint passed down the generations to remember how to spell my name.

This is also the place I’ll be posting articles and thoughts on open source, security and technology in general.

Please let me know if you have any questions or comments about what I write. My contact info should be available in the menu.

Zebra Cat Zebra

Jeff Luszcz's Tech and IP Blog

Author: Jeff Luszcz