My Open Source Talk at All Things Open 2020

Recently I was asked to put together a Workshop on Open Source Licensing and Security for All Things Open 2020. I always love these types of talks since it gives you a chance to both kick start people new to the topic and also give a current overview to people who have been doing this role for a while. 

 

The video below has two sections, the first gives an onramp and introduction to open source licensing, and the second half discusses more of the day to to day operations of open source compliance and security. 

 

I’d love to hear your feedback!

Your code will outlive you! Will the Future be able to use it?

Every decade or so the technology world gets punched in the face by a problem requiring poring through massive amounts of code written well before many of us were born.

In the late 1990s it was the Y2K problem when two digit years were no longer sufficient.

 

In the 2008 financial crisis COBOL based systems required hand editing in order to change state employee salaries en mass.

 

In 2020 we had the Pandemic related economic impact requiring Bank and Employment system’s code to be modified, many of which again were written in COBOL!

 

In 2032 we’ll have the Y2k38 problem when the Unix time will overflow causing software issues akin to Y2K.

 

While many of these systems were closed source and proprietary, open source systems are starting to dominate the software landscape. Much like your grandparents’ hammer, these examples show us that useful tools will almost always outlive the person who first selected or created them.

 

Besides the question of maintainability and programmer experience with very old languages like COBOL, questions of intellectual property and software licensing will complicate the usability of open source software over the next 50 years and beyond.

Think about Licensing!

As part of software due diligence (when a company purchases another company and confirms that the source code they are buying is correctly owned, licensed and documented) I have had the experience of trying to track down the ownership and licensing of code decades old. This type of software archeology requires access to archives of source code, books, magazines, blogs and other places that programmers have published software over the last 60 years! In many cases the true origin of some source code is lost to time, or can be only partially known.

 

Questions such as “Who wrote this?”, “Did they expect others to freely use this source?”, or “Does a commercial company own this?” are common.

 

It is important to make these types of answers clear for those who come after us.

 

The most important of these is to specify an open source license for the code you are publishing, even if it’s just a “single page” or block of code. If it’s worth putting on the Internet, it’s worth telling people what its license is.  There are many suggestions on how to label code to make the copyright and license clear, but I strongly suggest that each file contains a copyright statement and at least a SPDX license identifier (See https://spdx.org/licenses/)

 

This allows someone in the distant future to know who wrote the source code and what the obligations are even if only a single file remains.

Who do you depend on?

Document all your third party dependencies, including dependencies of dependences. There is no guarantee that any of our current repository managers will still be working decades in the future, but your code may be. By listing these dependencies, you help the Future build and run your code.

 

In a similar vein, your build system and running environment should be documented as well. For example, if you depend on a certain make system or database to be installed, call these out in separate build and running environment documentation. 

Till Death Do Us Part

In the short term, understand that source code is considered property. What happens after you die should be clearly specified. In most places the ownership of your code will pass on to your heirs, but possibly with complicated and divided ownership.  Do you want anyone in particular to be the new code owner (or someone outside of your family)? In this case your will (or related documents) should make this clear.

 

While everyone should have the permissions available under your open source license for the duration of your copyright, you may wish your heir to have the ability to “own” the code just like you do.

 

By making the ownership clear, they will then have the ability to change the license for your code, just like you likely do now. This means they may have the permission to also sell commercial licenses to this code, or change the open source license of the project. An open source project with multiple contributors has additional concerns about ownership. You may need to make clear dividing lines between projects you own outright as opposed to projects you contribute to, or have others contribute to.

 

Do you want to change the license after your death, or after a certain amount of time?  Make these changes clear as well. Some may want to open previously closed source, or change to a Creative Commons Zero (CC0) license or Public domain declaration.

 

Similar questions may come up in the case of divorce. While it’s often clear who owns code you write when “on the clock at work”, the code you write at home may have complex ownership issues.

Go beyond the code!

An additional thing to consider is account access, logins, domain names and payments.

Typically we tell everyone to keep their account information private and secure. This may be at odds with your desire to keep your project going even after your death.

Keeping a list of domain names, third party services and other account information related to your project can help the heirs to your code keep the project going.

Bear in mind that, after your death, certain accounts may be locked, go away or may be controlled by people other than the code owner.

 

As with most things involving intellectual property and life events, it is best to consult a lawyer to understand your best options.

Rest in Peace!

A little care and effort in the present can save the community a significant amount of time in the future. By specifying a license, documenting project dependencies, and clearly transferring ownership you can make sure your code stands the test of time.

 

 

 

 

JavaScript Minimization, Obfuscation and Open Source Compliance

 

One of the most important things for a technology company is to have their web site look attractive, be responsive and load quickly. Milliseconds can be the difference between gaining or losing a customer and web designers and programmers will use every trick in the book to make their pages load rapidly.

 

A commonly used technique to speed up web sites is to send fewer bytes over the Internet when loading a web page. You can do this by using smaller photos, lower quality images and smaller JavaScript and HTML files.

 

How can you shrink a source file without throwing away functionality?

 

To reduce the size of JavaScript source files, a technique called Minification is used to remove unneeded text in a file while still preserving the core functionality. This comes at the expense of human readability.

 

Due to how this technique works, it complicates compliance with Open Source Licenses. In this article I’ll discuss the basics of Minification and some best practices to remain compliant.

 

 

What is Minification?

 

Minification is the process of removing redundant text, whitespace or descriptive variable names that are unneeded by the web browser to interpret the code. For example, your human readable code might use a variable with the name EMPLOYEE_LAST_NAME and use this long name dozens of times in your code. The minifier will replace this name with something short, perhaps simply the letter A. This saves 17 characters each times the variable is used.

 

Most, if not all, comments are removed, as are extra spaces. These small changes add up, and over the entire file, you may find that you save 70% or more compared to the original file.

For example, the popular JQuery library is available as both the human readable and minimized files. The human readable file weighs in at 288KB while the minimized file is only 89KB.

Why would you want to minify code?

 

Minification typically is used to reduce the size of files to speed up web page loading.

 

A developer might also minimize their code in order to put multiple libraries together in a single file for ease of downloading and use by their page. In those cases you may see a comment detailing what the original filename or library name was, and possible a short license blurb.

 

How is that different than Obfuscation?

 

In some cases minification is used to make it more difficult (though not impossible) to reverse engineer or easily copy code or business logic. You may sometimes hear people use the term “Obfuscation” used for that case.

 

 

What are some common tools used to minimize or obfuscate code?

 

As with most tools there are many ways to scratch an itch, so there are dozens of minification tools available. Some are online only, others are GUI tools and many are command line tools to be used as part of your development tool chain. Some of the most common command line tools you will encounter are:

 

UglifyJS https://github.com/mishoo/UglifyJS

JSMin https://www.crockford.com/jsmin.html

Minifier https://www.npmjs.com/package/minifier

 

 

 

How does minification affect Open Source License compliance?

 

Many open source licenses require the preservation of the original copyright strings and license text when a program incorporating those libraries is distributed (or possibly served via software as a service).

 

Since comments are typically the first thing removed to make a minified version of a source file, the copyright and license text can be discarded in a way that makes it difficult or impossible to comply with the open source license the code is under.

 

Additionally, some source files may be checked in to source control already in minified form. These files may have been stripped of the required copyright and license text. It may be required to review minified files that are checked in to source code control to discover their true original and update them with the proper copyright and open source license text.

 

It is often possible to preserve the copyright and license in the minified files.

 

Many minification tools provide flags or plugins that attempt to discover license comments and preserve them. For example, the UglifyJS plugin uglify-save-license allows the user to preserve license text found on the first line, or if it is in a comment block containing common license names or copyright statements.

 

See https://www.npmjs.com/package/uglify-save-license

 

That said, the minified output should be viewed and compared to the original file in order to confirm that the appropriate copyrights and license text are preserved.

 

If you end up using a library that does not declare its license in a way discoverable by your minification tools, it would be helpful to log an enhancement request with the original component author to make it easier to comply with the licensing in the future. You may find that you need to manually fix this license comment yourself in the meantime.

 

 

SAAS vs. Distribution issues with Open Source licenses and minification

 

Many open source licenses have obligations that come into effect when the program is distributed to users.

 

The need to preserve or display copyrights and license text is clear if you are distributing a product to users for them to run on machines that they control (a classic distribution).

 

Untold hours have been spent discussing whether JavaScript and other web resources downloaded to a web browser count as “distribution” in Software as a Service (SaaS) applications.

 

In my opinion, I would treat any file or resource downloaded to a web browser as “distribution” and would comply with the license obligations as expected in that use case.

 

As with many elements of OSS licensing you should request legal advice from your legal counsel about what is required for your use case and venue.

 

 

CDN and Minification

 

Web apps often use a Content Delivery Network (CDN) to speed up access to resources, images and code that the app requires to run. Think of a CDN as a global cache (or fast hard drive) that distributes often used files. CDNs will often minimize JavaScript files automatically. You should take care to confirm that the proper licensing is being preserved by the CDN you have selected to host your files. This can be done by examining the source files delivered to your end user’s browser.

 

 

Source Maps 

In order to help debug minified source, a technology called Source Maps was created. Source Maps allow one to un-minify source code for debugging purposes, though require special mapping files to be used as well as Source Map aware development tools to view.  

 

While this process is often done in development, some organization ship source maps to production. Care should be taken to confirm that the actual users of the web application can see the required copyright and license text without the use of specialized developer tools.

 

 

What are best practices for dealing with Minimized code?

 

I hope this quick overview of minification has been helpful. As you can see, techniques designed for speed and performance can cause difficulties with open source license compliance. Keep this checklist in mind as you review your program and use of minification:

 

  • Understand your license obligations when using JavaScript source files
  • Review the minification tools or services you use
  • Confirm the proper copyright notices, license text and other information is preserved by the minification step
  • Perform an asset review or Software Composition Analyses (SCA) step to discover untracked third party source code
  • Log request for enhancement / bugs against Open Source Libraries or Tools to make it easier to preserve OSS license information
  • Store away the original source files, not just the minified version

 

 

 

 

 

 

 

Getting the Gist of GitHub Gist Licensing

 

 

What is a GitHub Gist?

A GitHub “Gist” is a webpage used to share snippets of code or a single file up to 1MB of source or text. It is commonly used to share examples, small utilities or documentation. Other similar sites are commonly referenced to as a “pastebin”.

 

Why would you use the code from one?

 

It’s common for a developer to search for code to implement a small procedure, for example,  “reverse the letters in a string” or “check for the existence of a file on disk”. These small pieces of code are not seen as large or complicated enough to be named and hosted as an open source project or have an entire web site devoted to them.

 

More complicated code is likely to be found as well. Single page utilities or programs can be encountered, often with embedded documentation in comments, or on a web page pointing to the Gist.

 

You may also see Gists used to host data listings or research. Common examples of this are ID numbers of products or the code used to demonstrate a bug or exploit.

 

 

Why does it need a license?

 

In a nutshell, other people’s source code or resources often require a software license for you to use it in your project. A license (commercial or open source) is a way to explain the conditions that you are able to use that software and without one you likely do not have permission to use it.

 

The topic of what requires a license and at what size or complexity of source code is sufficient to require a license is beyond the scope of this blog entry. In general, source code cut and pasted from somewhere else may require a license in order to be legally used.

 

How do Gist authors show off their licenses?

 

Unfortunately most Gist authors FAIL to clearly show what license is attached to the source code they are sharing.

 

The most helpful and accurate way for a Gist author to declare their license is to put the license text in the source on the Gist itself. Typically this would be at the top of the file in a comment block and would contain the copyright date and owner if required by the license.

 

You may also see a short one-line comment such as:

 

This code is licensed under the terms of the MIT license

 

This is helpful but not complete, you don’t have a copyright date or Copyright owner, but at least have a general idea of the licensing style the author prefers.

 

In some cases the Gist author places a note somewhere in their GitHub site or homepage that declares the default license for their Gists. They may use text such as “The default license for all public Gists I publish is the following:” and then put the name or text of the license.

 

While this is far better than nothing (and does reduce perceived clutter in published Gists) it does break the connection between the source and the license that it was published under. It makes it harder for a consumer of that source to bring along the accurate licensing information when using the code.

 

 

What if it doesn’t have a license and you want to use it?

 

If a file, snippet or other content does not have a declared license, it is a good practice to reach out to the original author and ask what license the content is available under. You may want to suggest a license that you are comfortable with in order to short circuit any back and fourth. For example, you may want to send an email like so:

 

Dear Jeff,

  I found the code you published on your GitHub Gist site at https://gist.github.com/jeff-luszcz/c470d282599ea42424b976c673d7c115

It currently does not appear to have a license associated with it. I would like to use this code but only if it has an open source license. Would you be able to let me know what license this code is under? I’m a big fan of the MIT license if you are looking for suggestions. (See https://choosealicense.com/licenses/mit/ )




Thanks!

 

 

While the author does not owe you a response, don’t be surprised if you get a helpful answer in a couple of days or so.

 

 

How should you preserve or document licenses to the code you use from a Gist?

As a developer one of the best things you can do for yourself or others that use your code is to document the licensing and origin of all third party code you use. This helps you legally share the projects you build, and also helps your users comply with open source licenses and stay ahead of security problems in the third party code you selected.

The best time to document things is when you have the data initially right in front of you.

If need be, cut and paste the licensing and origin information and attach it to the code you are using. For example:

 

# code snippet taken from https://gist.github.com/jeff-luszcz/c470d282599ea42424b976c673d7c115

# licensed under a MIT license as per the information on http://www.example.com/jeff-gists-license-info which says:

#  All my gist code is licensed under the terms of the MIT license

 

What are some caveats or warnings about using code from Gists?

 

One of the questions people often have about code snippets is “Where did this code come from originally?” Is the code in the Gist a bug fix to some Apache code? Is it originally from the Linux Kernel and copied out as an example? Is it something from a commercial SDK that wasn’t publically available on the net?

 

If so, the original author’s licensing may have been stripped (often accidently, but sometimes purposely) by the person who has published the Gist.

 

It may be difficult to track this information down, and it may not be something the original author can even remember. Through the use of software composition analysis (SCA) tools or source code fingerprinting databases, you may discover earlier origins of code. In that case you may need to update your licensing or remediate (fix, remove, etc…) the snippet.

 

Wrap up and next steps

If you get in the habit of documenting the licensing you use for all third party content, including snippets, you will find yourself in a much better position in the future when your code is used by other people or projects. It is always easier to get licensing done right at creation time, than if you have to go back in time and become a license or source code detective.