You are the dog that caught the car: Handling the SBOM you asked for!

 

We all think we have more time.

 

One way or another, you are soon going to receive an email telling you that the Software Bill of Materials (the SBOM!) that you asked for is ready for you. Maybe it’s coming from a vendor, maybe it’s from an internal project, maybe it’s from your own team. You suddenly have a very important document to review, and it’s hard to even know where to begin.

 

A big cause of paralysis in the security world is not knowing what to do next, especially with a seemingly buzzword-heavy issue like this. Surely everyone else knows what they are doing, but where do I even start?

 

I’m going to walk you through the basics of SBOMs, how to view them, some good first questions and where to go next.

 

What did you just receive?

 

A Software Bill of Materials (SBOM) is a listing of the third party software components that a software project uses in order to function. Typically this list of components consists of open source packages and libraries, but it may also contain commercially licensed components and possibly components with licenses that are neither open source nor commercial and may require further review.

 

We use this list of components to understand the software dependencies of the project, identify potential security vulnerabilities, find end-of-life and unsupported software components, discover components that can be supported with money or software contributions, and uncover other architecture or support issues.

 

This listing should include at least the name of the software component, its version and possibly information about the license it is released under. Beyond these basics, you may find additional pieces of information for these components such as Project URL, description, known vulnerabilities, etc. Since exchange formats and OSS scanners are still being defined, you may find yourself with varying levels of disclosure with varying levels of data quality and completeness.

 

The first thing to get a handle on is what type of SBOM you have received. There are a few different file formats and mechanisms for SBOM sharing, and more appearing every day.

 

What type of formats might you have received?

 

There are three main formats for SBOM sharing right now: CycloneDX, SPDX and free text files of varying complexity and origin. Typically, these files are designed to be both human and machine readable, though it seems like the machines often have an easier time of it!

 

CycloneDX (https://cyclonedx.org/) is a file format created by the OWASP Foundation. You will know you have a CycloneDX file if your partner tells you that is the format they will be giving you, or possibly if the files are named bom.json or bom.xml, or end with .cdx.json or .cdx.xml.

 

SPDX (https://spdx.dev/) is a file format created through the Linux Foundation. You will know you have an SPDX file if your partner tells you that is the format they will be giving you, or if the file name ends with something like .spdx, .spdx.json or .spdx.rdf.xml.

 

Free Text, CSV or Excel Files are traditional text or spreadsheet files that contain SBOM information in one-off or less common SBOM formats. They may be created by a tool or a human and are often designed for human review instead of computer processing. 
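
As a rough illustration of the naming hints above, here is a minimal Python sketch that guesses the format of a file you were sent. The function name and the "spreadsheet" and "unknown" labels are made up for this example. The check on the "bomFormat" and "spdxVersion" fields assumes the file is the JSON flavor of CycloneDX or SPDX, and a guess from a filename is only a guess.

```python
import json
from pathlib import Path


def guess_sbom_format(path: str) -> str:
    """Return a best guess: 'CycloneDX', 'SPDX', 'spreadsheet' or 'unknown'."""
    name = Path(path).name.lower()
    if name in ("bom.json", "bom.xml") or name.endswith((".cdx.json", ".cdx.xml")):
        return "CycloneDX"
    if name.endswith((".spdx", ".spdx.json", ".spdx.rdf.xml")):
        return "SPDX"
    if name.endswith((".csv", ".xls", ".xlsx")):
        return "spreadsheet"
    # The filename was not conclusive, so peek inside JSON files.
    if name.endswith(".json"):
        try:
            with open(path, encoding="utf-8") as handle:
                doc = json.load(handle)
        except (OSError, ValueError):
            return "unknown"
        if isinstance(doc, dict):
            if doc.get("bomFormat") == "CycloneDX":
                return "CycloneDX"
            if "spdxVersion" in doc:
                return "SPDX"
    return "unknown"


print(guess_sbom_format("vendor-product.cdx.json"))
```

If the answer comes back as "unknown", the simplest next step is still to ask your partner which format they sent.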

 

All of these file formats will contain information about the software components in use. Many of the files will be “self documenting”, meaning they will have Field Names (like Component Name or Version) near the data you are reading, or in a traditional spreadsheet format will have Column Names for each piece of data.

 

In a JSON, XML or free text file, component data often is spread out over multiple lines of text.

 

In a spreadsheet, each row is often a single component, where each column is the component’s metadata (e.g. name, version, etc…)

 

How to view and process the SBOM

 

The easiest way to get insights from the SBOM you just received is to run it through an SBOM scanner tool like Bomber. Bomber is a free and open source tool that can provide information about known vulnerabilities and license information for the open source components found in the supplied SBOM. Bomber can handle CycloneDX files in either JSON or XML format, SPDX SBOMs in XML format, as well as Syft XML SBOM files. If you have a file in a different format, you can use a free tool to convert it to one of these, or ask your partner to resubmit it in a format you can handle.

 

See https://github.com/devops-kung-fu/bomber for installation and usage information. If using command line tools is new to you, this might be a perfect time to call one of your developers to work together.

 

Examining an SBOM file by hand (if not using a tool like Bomber)

While using a tool is much easier, it is possible to examine the SBOM files using a text editor and pick them apart by hand.

 

When looking at the JSON or XML files themselves in a text editor, you can find the component name, URL, version information and license information. For example, in CycloneDX the following tags are found near the information of interest:

 

“name”   (the component name)

“version”  (the component version)

“bom-ref” (an identifier for the component within the SBOM, often a package URL or similar locator; when present, the package URL itself is in “purl”)

“license”  (The license or license options for the component)

 

The license tag may be after a stretch of “hashes” or IDs used to describe the files that make up the component. 
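
If you have Python available, the same picking apart can be sketched in a few lines. This is a minimal example that assumes a CycloneDX JSON file with a top-level "components" list. The sample document is invented for illustration, the function name is hypothetical, and components nested inside other components are not followed.

```python
import json


def list_components(bom: dict) -> list[tuple[str, str, str]]:
    """Return (name, version, licenses) for each entry under "components"."""
    rows = []
    for component in bom.get("components", []):
        licenses = []
        for entry in component.get("licenses", []):
            lic = entry.get("license", {})
            # A license is given as an SPDX "id", a free-text "name",
            # or an SPDX "expression" on the entry itself.
            label = lic.get("id") or lic.get("name") or entry.get("expression")
            if label:
                licenses.append(label)
        rows.append((
            component.get("name", "?"),
            component.get("version", "?"),
            ", ".join(licenses) or "no license listed",
        ))
    return rows


# Invented sample data; a real file would be read with json.load(open(path)).
sample = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "name": "struts2-core", "version": "2.3.31",
     "bom-ref": "pkg:maven/org.apache.struts/struts2-core@2.3.31",
     "licenses": [{"license": {"id": "Apache-2.0"}}]}
  ]
}
""")

for name, version, licenses in list_components(sample):
    print(f"{name} {version} ({licenses})")
```

Each printed line gives you a name and version you can take to a search engine or a vulnerability database.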

 

With this information, a web search can turn up vulnerability information. For example, if you found Struts 2.3.31, you could search for “Struts 2.3.31 cve” and find out that this version of the component is affected by the vulnerability known as CVE-2017-5638 (see https://nvd.nist.gov/vuln/detail/CVE-2017-5638).

 

CycloneDX

 

For a deeper description of the CycloneDX SBOM format see https://cyclonedx.org/guides/sbom/OWASP_CycloneDX-SBOM-Guide-en.pdf

 

SPDX

 

For a deeper description of the SPDX SBOM format see https://spdx.github.io/spdx-spec/v2.3/

 

Many of these SBOM documents can be read in a standard text file viewer or, in the worst case, a word processor. If the file is jumbled together or is in one long line, you will want a more powerful text viewer that can better handle line breaks or special characters. Free tools like Visual Studio Code ( https://code.visualstudio.com/ ) can view text, JSON and XML files. To reformat a file that is all on one line, go to the Command Palette in Visual Studio Code and select Format Document. The file should now be more readable to a human.
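
If you would rather not install an editor, Python's standard json module can do the same reformatting for JSON files. This is a small sketch: the function name is made up and the one-line sample is invented. From a command line, `python -m json.tool bom.json` does roughly the same thing for a file.

```python
import json


def pretty_json(text: str) -> str:
    """Re-indent a one-line JSON document so each field sits on its own line."""
    return json.dumps(json.loads(text), indent=2, ensure_ascii=False)


# Invented one-line sample standing in for a jumbled SBOM file.
one_line = '{"bomFormat":"CycloneDX","components":[{"name":"log4j-core","version":"2.14.1"}]}'
print(pretty_json(one_line))
```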

 

There exist JSON and XML file viewers which can make these files prettier to see and more useful to search or view.

 

Additional utilities are being released to support the use and viewing of SBOM documents in CycloneDX and SPDX formats.

 

A CSV or .XLS document can be opened in a spreadsheet application like Excel or Open Office.

 

Now That You Can View the SBOM, What Are You Looking For?

One thing you can do is put it in a drawer! The very act of asking for an SBOM does a lot to kick the vendor into managing their third party risk. While this process works best if you ask them questions or give some pushback, asking for an SBOM allows them to say internally “our customer is asking for this, we need to do SBOM generation, SCA scanning, OSS patch management, etc…”

 

That said, you have it, let’s go get some value out of it!

 

This might be the point to bring in a developer if you are not familiar with open source libraries. There are a few things you can do on your own or you might find it is helpful to work together to understand the SBOM you just received.

 

There are a few questions we use SBOMs to help answer when looking at a piece of software:

 

  • Does the SBOM seem legitimate? Can you view it, read through it, see real data?
  • Is it recently created? When was it generated? What version of the project was scanned? (e.g. is the SBOM wildly out of date?)
  • Does the SBOM only contain “Top Level Dependencies” or does it include the dependencies of those dependencies, also known as Transitive Dependencies? This could be a difference of 3-10 times the number of actual dependencies seen!
  • Can you find a few well known open source components and check their versions against the National Vulnerability Database (NVD)? ( https://nvd.nist.gov/vuln/search )
  • Are there well known vulnerabilities in the codebase (old versions of Log4j, OpenSSL, Apache HTTP Server, Apache Tomcat, Apache Struts)?
  • Does the list of component versions seem “too old”? Are all reported vulnerabilities from years ago (e.g. CVE-2017-5638 in Struts)?
  • Are there Open Source Licenses that might cause a problem for you? Do you see licenses like the General Public License or Affero General Public License which might be contrary to your company’s license policy? (This can be complicated since some parts of your company may happily use GPL software in Linux Operating Systems but may forbid it in distributed applications.)
  • Does it seem complete? Is it missing important information like version information? 
  • What software languages are seen? Do you see what you expect? Java libraries? NPM libraries? Is something missing?
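
A few of these questions can be roughed out in code. The sketch below assumes a CycloneDX JSON document, where the generation date lives under "metadata" and "timestamp". The 180 day threshold, the license prefix watch list and the function name are hypothetical placeholders; your own policy decides the real values.

```python
from datetime import datetime, timezone

# Hypothetical watch list; substitute whatever your license policy says.
COPYLEFT_PREFIXES = ("GPL", "AGPL", "LGPL")


def review_bom(bom: dict, now: datetime, max_age_days: int = 180) -> list[str]:
    """Return human-readable findings for a CycloneDX JSON document."""
    findings = []
    components = bom.get("components", [])
    if not components:
        findings.append("no components listed")

    stamp = bom.get("metadata", {}).get("timestamp")
    if stamp is None:
        findings.append("no generation timestamp")
    else:
        try:
            created = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
            if created.tzinfo is None:
                created = created.replace(tzinfo=timezone.utc)
            age = (now - created).days
            if age > max_age_days:
                findings.append(f"SBOM is {age} days old")
        except ValueError:
            findings.append("unreadable generation timestamp")

    for component in components:
        name = component.get("name", "?")
        if not component.get("version"):
            findings.append(f"{name}: missing version")
        for entry in component.get("licenses", []):
            label = entry.get("license", {}).get("id") or entry.get("expression") or ""
            if label.startswith(COPYLEFT_PREFIXES):
                findings.append(f"{name}: license {label} needs policy review")
    return findings


# Invented sample data for illustration.
sample = {
    "metadata": {"timestamp": "2021-01-15T12:00:00Z"},
    "components": [
        {"name": "example-lib", "licenses": [{"license": {"id": "GPL-3.0-only"}}]},
        {"name": "other-lib", "version": "1.0.0"},
    ],
}
for finding in review_bom(sample, datetime(2023, 1, 15, tzinfo=timezone.utc)):
    print(finding)
```

A script like this only flags things to ask about. Whether a finding is a real problem is still a conversation with your developers and your supplier.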

 

 

 

Pushing back or Requesting More Information

After processing or examining the SBOM you may have some questions or feedback for the team that supplied it to you. Typically you might request more information about the most severe security vulnerabilities or license issues found in the report. There’s a lot of discussion about how customers and suppliers can best work together to share and respond to SBOM and vulnerability questions.

In general, especially if this is your first experience with SBOMs, you might find the most value in letting your supplier know you’ve run the SBOM through a vulnerability tool and you have some questions about what you are seeing. The idea is to gently (and perhaps later on, not so gently) work with your partner to reduce exposure to known vulnerabilities, and to help them better explain to their customers or end users why they are or are not affected.

As you get more experience, you may find that providing 3 to 5 clear concerns can help your partner start to get a handle on your expectations, as well as chip away at the worst problems. For example, if you see that the software contains old high severity vulnerabilities in Log4j, curl, OpenSSL, etc., this might be a sign that they have not been using SCA scanning or good vulnerability management practices.

 

Feedback from the Supplier

 

In general, throwing a list of hundreds of problems back at a vendor will not be well received, especially if you are new to SBOM reviews. That said, getting feedback on 5-10 of the worst of the worst can give you a good feeling for whether they are managing their supply chain well or not. Many vulnerabilities may be present in an open source library, but not affect the software as you use it. A company should be able to clearly explain why they think they are not affected. “Trust me Bro!” is usually not a satisfying answer, though. There should be clear explanations. For example, a good answer might be something like “This reported CVE only affects this component when run under the Windows operating system, and in this case we are using Linux”.

 

As time goes on the SBOM you receive from this supplier should contain fewer vulnerabilities, a more complete listing of third party dependencies, as well as explanations on why potential vulnerabilities seen in the codebase are not valid for their current usage.

 

 

Keep Requesting High Quality SBOMS

 

As mentioned before, one of the best side effects of requiring an SBOM to be delivered to you is that the team responsible for creating the SBOM will now put in place Software Composition Analysis (SCA) scanning tools, CVE/vulnerability patch management, and processes to create, fix and deliver up-to-date SBOM information to you. A better understood product is a more secure product. The more SBOMs you see, the more that quality issues will pop out to you. Keep reviewing and keep giving and demanding strong feedback!



Open Source License Location Alignment Chart

 

Text Version:

 

 

Where’s the open source license?

Lawful Good: SPDX Identifier at the top of each file
Lawful Neutral: in a LICENSE file at the top level of the source tree
Lawful Evil: at the bottom of each file

Neutral Good: on the project’s home page
Neutral Neutral: on the project’s Wikipedia page
Neutral Evil: as a reply to a GitHub issue asking for the license text

Chaotic Good: available as output of a python script
Chaotic Neutral: author states no license applies since code was written in a country with no copyright law
Chaotic Evil: in a scanned image in a TIFF file only found on the WayBack Machine

 

 

A proposal for Comment Tagging AI Generated Source Code

Source code generated by “AI” tools like GitHub Copilot or OpenAI ChatGPT should be prefixed with a language-appropriate comment block explaining that the source code was generated by a tool, along with helpful metadata to allow discovery and management of that code by code scanners like SCA or SAST tools.

End users would obviously have the ability to remove this comment block, though I believe we all would be well served by marking all generated code with comments detailing the tool that generated it, the version or date generated, as well as inputs that might be helpful in understanding the conditions that caused this code to be generated.

AI Generated code, while still a new universe, appears to have a series of potential defects that existing and future code scanners will need to be on the lookout for. These include code hallucinations, missing cases, confidently wrong constants/algorithms, concerns around the license of the generated code, and other issues we still have not discovered.

Having a clear machine readable key that indicates that this code was AI generated allows for appropriate scanning, filtering, as well as metrics generation.

Code or data generated by AI based tools may have different standards of trust than code or data created or curated by human authors.

Parallels to SCA Snippet Analysis

There are parallels in the Software Composition Analysis (SCA) snippet scanning world, where awareness of generated code is very helpful when scanning or clearing scan results.

In the snippet analysis world, generated code is extremely similar to massive amounts of other open source code generated by the same tool. Therefore, performing snippet matching is often slow and resource intensive due to the sheer amount of similar snippets. This causes user pain due to slow scanning as well as a perceived large amount of “false positive” matches. There is also a belief that this generated code is “fine” which means it is often incorrectly ignored when it comes to SCA/SAST scanning due to the above issues.

Traditional non-AI code generators like the .NET IDE, ANTLR, Apache MyBatis, protobuf, etc. often tag their generated code with special comment strings and tags.

This allows SCA tools or SCA tool users to either ignore snippet matching for these files before scans are performed, automatically bucket or filter results afterward, or allow the end user to manage the results quickly through string matching.

One issue with these code generators is that the tags used are not standardized and require multiple methods to discover. The identifying strings include XML fragments, strings, JavaDoc style tags, custom tags, etc…

Future SCA/SAST tools can be even more nimble as they become more aware of the possible code generators that exist and perform appropriate scanning methods to the generated code depending on what needs to be discovered.
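
To make the "multiple methods to discover" problem concrete, here is a minimal Python sketch of what a scanner has to do today: keep a grab bag of unrelated patterns and try each one. The patterns are drawn from the kinds of markers that generators such as MyBatis Generator, ibator, protobuf and .NET tooling emit. The labels, the 40 line window and the function name are made up for this sketch, and the list is far from exhaustive.

```python
import re

# Illustrative patterns only; real tools would need a much longer list.
GENERATED_MARKERS = {
    "auto-generated XML tag": re.compile(r"<auto-generated>"),
    "MyBatis tag": re.compile(r"@mbg\.generated"),
    "ibator tag": re.compile(r"@ibatorgenerated"),
    "generated-by phrase": re.compile(r"generated by\b", re.IGNORECASE),
    "do-not-edit phrase": re.compile(r"do not edit", re.IGNORECASE),
}


def find_generated_markers(source: str, max_lines: int = 40) -> list[str]:
    """Return labels of the marker patterns found near the top of a file."""
    head = "\n".join(source.splitlines()[:max_lines])
    return [label for label, pattern in GENERATED_MARKERS.items() if pattern.search(head)]


header = (
    "// <auto-generated>\n"
    "// This code was generated by a tool.\n"
    "// </auto-generated>\n"
)
print(find_generated_markers(header))
```

A single standardized tag would replace this whole dictionary with one pattern.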

 

Qualifications of a good “generated by” comment

  • Easy to parse by machine (oh the irony!)
  • Easy to read and understand by a human
  • Not too wordy so it will be left in place by the end user
  • Not too wordy so that code generators decide to use it
  • Explains what tool generated the code using a unique name
  • Provides a version number or generated date so that eras of similar code can be examined with appropriate tools
  • Does not change too quickly so that code generated by the same tool can easily be found with simple pattern matches or even greps

Future extension

In the future, the user text prompt that caused the code to be generated should be embedded as well.

Current AI code outputs are typically single pages and should therefore have a single-line comment.

Future code generators will generate entire applications and should have a larger banner with more details explaining the user prompts that generated the application.

A tool URL or project home URL (e.g. @generatorURL ) could optionally be used to prevent naming confusion and/or provide easy branding or publicity for the various tools.

Current Proposal:

// @generatedNote This code was generated by an AI code generator tool.
// @generatedBy CoolAIGenerator v1.2.3
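
As a rough check on the "easy to parse by machine" goal, here is a minimal Python sketch that pulls the proposed tags out of a source file with one regular expression. The regular expression, the function name and the assumption that tags sit behind `//`, `#` or `*` comment markers are mine and not part of the proposal.

```python
import re

# One tag per comment line: a comment marker, "@generated...", then the value.
TAG_LINE = re.compile(r"^\s*(?://|#|\*)\s*@(generated\w+)\s+(.*\S)\s*$")


def parse_generated_tags(source: str) -> dict[str, str]:
    """Return a mapping of tag name to value for every @generated... comment line."""
    tags = {}
    for line in source.splitlines():
        match = TAG_LINE.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
    return tags


sample = (
    "// @generatedNote This code was generated by an AI code generator tool.\n"
    "// @generatedBy CoolAIGenerator v1.2.3\n"
    "function add(a, b) { return a + b; }\n"
)
tags = parse_generated_tags(sample)
tool, _, version = tags["generatedBy"].rpartition(" ")
print(tool, version)
```

Because the tag names share the "@generated" prefix, a plain grep for that string is enough to find every tagged file.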

 

Examples of current comments from non-AI Code Generators:

https://github.com/VarathaRamanujam/EEE_LOGIN_VALIDATION/blob/fc3787e3acfd302cce5abfd6bbb5b8abf2bef72c/src/main/java/com/hider/eee_students_login/Filemodel.java

/*
* Created on 2022-11-27 ( 18:26:59 )
* Generated by Telosys ( http://www.telosys.org/ ) version 3.3.0
*/

https://github.com/koo5/alertmanager_api_js/blob/7227cc6e1854fe29f45d011584ba563d385a5fdb/src/model/Alert.js

/**
* Alertmanager API
* API of the Prometheus Alertmanager (https://github.com/prometheus/alertmanager)
*
* The version of the OpenAPI document: 0.0.1
* 
*
* NOTE: This class is auto generated by OpenAPI Generator (https://openapi-generator.tech).
* https://openapi-generator.tech
* Do not edit the class manually.
*
*/

https://github.com/Brayds-Dev/Merchandiser-tool/blob/381d9e83698d35bfc10312ec195788cb06c31e39/MerchandisersTool/MerchandisersTool/obj/Release/netstandard2.0/Views/UpdateClientInfo.xaml.g.cs

//------------------------------------------------------------------------------
// <auto-generated>
// This code was generated by a tool.
// Runtime Version:4.0.30319.42000
//
// Changes to this file may cause incorrect behavior and will be lost if
// the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

https://github.com/wq3426/study/blob/e9830a44a3c9e7314fd1f3974b7d68c0eeafc1a6/spring_boot_project/bwmTools2/src/main/java/com/dhl/tools/domain/CargoLocationData.java

/**
*
* This class was generated by MyBatis Generator.
* This class corresponds to the database table CargoLocation_Data
*
* @mbg.generated do_not_delete_during_merge
*/

https://github.com/cqym/cut_tools2/blob/64a447205ae8a9a5c6089097f966c9c3b786357c/development/src/com/tl/resource/dao/pojo/TQuotationProductDetail.java

/**
* This field was generated by Apache iBATIS ibator. This field corresponds to the database column t_quotation_product_detail.id
* @ibatorgenerated Wed Oct 14 14:13:27 CST 2009
*/

https://github.com/thaovy2902/Web_KTMT/blob/ffcc022506b4b5635e67bc046f01a3ed80bb3605/vendor/google/cloud/CommonProtos/src/Audit/RequestMetadata.php

# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: google/cloud/audit/audit_log.proto

https://github.com/mehmetsen80/xtalk/blob/cf88a0025b44a9e4ee0055457c565dbf30c05e1e/MSOutlookRecurrencePatternType.h

/*******************************************************************************
**NOTE** This code was generated by a tool and will occasionally be
overwritten. We welcome comments and issues regarding this code; they will be
addressed in the generation tool. If you wish to submit pull requests, please
do so for the templates in that tool.
This code was generated by Vipr (https://github.com/microsoft/vipr) using
the T4TemplateWriter (https://github.com/msopentech/vipr-t4templatewriter).
Copyright (c) Microsoft Corporation. All Rights Reserved.
Licensed under the Apache License 2.0; see LICENSE in the source repository
root for authoritative license information.
******************************************************************************/

 

 

I’m speaking at Open Source 101 on 3/30 at 2:45pm-4:30pm ET

Are you thinking about selling your company? Are you building a software product? Do you lead an engineering team? Want to jumpstart your knowledge of Open Source and its impacts on security, compliance and valuation?

 

Join my workshop “Just Enough Open Source: A Kick start on security, license compliance and business models” Tuesday March 30th, 2021 from 2:45pm – 4:30pm ET.

 

 

Talk description:

“Open Source powers the world, but you need to do more than download it

In this talk we will provide background on the most common types of open source licenses, business models, security issues and most importantly the processes required to help you remain secure and in compliance. We will discuss best practices, scanning tools, remediation, customer and partner expectations around OSS compliance and how to manage OSS during events such as a product release or M&A.”

FREE registration link: https://opensource101.com/register-now/

My Open Source Talk at All Things Open 2020

Recently I was asked to put together a Workshop on Open Source Licensing and Security for All Things Open 2020. I always love these types of talks since it gives you a chance to both kick start people new to the topic and also give a current overview to people who have been doing this role for a while. 

 

The video below has two sections: the first gives an onramp and introduction to open source licensing, and the second half discusses more of the day-to-day operations of open source compliance and security.

 

I’d love to hear your feedback!

Your code will outlive you! Will the Future be able to use it?

Every decade or so the technology world gets punched in the face by a problem requiring poring through massive amounts of code written well before many of us were born.

In the late 1990s it was the Y2K problem when two digit years were no longer sufficient.

 

In the 2008 financial crisis, COBOL based systems required hand editing in order to change state employee salaries en masse.

 

In 2020 the pandemic’s economic impact required changes to the code of bank and employment systems, many of which again were written in COBOL!

 

In 2038 we’ll have the Y2K38 problem, when Unix time stored as a signed 32-bit number will overflow, causing software issues akin to Y2K.
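
The overflow date follows from simple arithmetic. Unix time counts seconds since January 1, 1970 (UTC), and a signed 32-bit counter tops out at 2^31 - 1. This short Python sketch, with constant names of my own choosing, shows the last representable moment and where the counter lands if it wraps around.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_INT32 = 2**31 - 1   # largest value a signed 32-bit counter can hold
MIN_INT32 = -2**31      # value the counter wraps to one second later

last_moment = EPOCH + timedelta(seconds=MAX_INT32)
after_wrap = EPOCH + timedelta(seconds=MIN_INT32)

print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00
print(after_wrap.isoformat())   # 1901-12-13T20:45:52+00:00
```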

 

While many of these systems were closed source and proprietary, open source systems are starting to dominate the software landscape. Much like your grandparents’ hammer, these examples show us that useful tools will almost always outlive the person who first selected or created them.

 

Besides the question of maintainability and programmer experience with very old languages like COBOL, questions of intellectual property and software licensing will complicate the usability of open source software over the next 50 years and beyond.

Think about Licensing!

As part of software due diligence (when a company purchases another company and confirms that the source code they are buying is correctly owned, licensed and documented) I have had the experience of trying to track down the ownership and licensing of code decades old. This type of software archeology requires access to archives of source code, books, magazines, blogs and other places that programmers have published software over the last 60 years! In many cases the true origin of some source code is lost to time, or can be only partially known.

 

Questions such as “Who wrote this?”, “Did they expect others to freely use this source?”, or “Does a commercial company own this?” are common.

 

It is important to make these types of answers clear for those who come after us.

 

The most important step is to specify an open source license for the code you are publishing, even if it’s just a “single page” or block of code. If it’s worth putting on the Internet, it’s worth telling people what its license is. There are many suggestions on how to label code to make the copyright and license clear, but I strongly suggest that each file contain a copyright statement and at least an SPDX license identifier (see https://spdx.org/licenses/).

 

This allows someone in the distant future to know who wrote the source code and what the obligations are even if only a single file remains.
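
For illustration, a file header following this advice can be as short as two comment lines, and it is easy to check for by machine. In the Python sketch below the author name is a placeholder, and the function name and the choice to look only at the first 20 lines are assumptions of mine.

```python
import re
from typing import Optional

# Matches the standard "SPDX-License-Identifier:" tag and captures the expression.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([A-Za-z0-9.()+\- ]+)")


def find_spdx_license(source: str, max_lines: int = 20) -> Optional[str]:
    """Return the SPDX license expression declared near the top of a file, if any."""
    head = "\n".join(source.splitlines()[:max_lines])
    match = SPDX_TAG.search(head)
    return match.group(1).strip() if match else None


header = (
    "// Copyright (c) 2023 Example Author\n"
    "// SPDX-License-Identifier: MIT\n"
    "function add(a, b) { return a + b; }\n"
)
print(find_spdx_license(header))
```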

Who do you depend on?

Document all your third party dependencies, including dependencies of dependencies. There is no guarantee that any of our current repository managers will still be working decades in the future, but your code may be. By listing these dependencies, you help the Future build and run your code.

 

In a similar vein, your build system and running environment should be documented as well. For example, if you depend on a certain make system or database to be installed, call these out in separate build and running environment documentation. 

Till Death Do Us Part

In the short term, understand that source code is considered property. What happens after you die should be clearly specified. In most places the ownership of your code will pass on to your heirs, but possibly with complicated and divided ownership.  Do you want anyone in particular to be the new code owner (or someone outside of your family)? In this case your will (or related documents) should make this clear.

 

While everyone should have the permissions available under your open source license for the duration of your copyright, you may wish your heir to have the ability to “own” the code just like you do.

 

By making the ownership clear, they will then have the ability to change the license for your code, just like you likely do now. This means they may have the permission to also sell commercial licenses to this code, or change the open source license of the project. An open source project with multiple contributors has additional concerns about ownership. You may need to make clear dividing lines between projects you own outright as opposed to projects you contribute to, or have others contribute to.

 

Do you want to change the license after your death, or after a certain amount of time?  Make these changes clear as well. Some may want to open previously closed source, or change to a Creative Commons Zero (CC0) license or Public domain declaration.

 

Similar questions may come up in the case of divorce. While it’s often clear who owns code you write when “on the clock at work”, the code you write at home may have complex ownership issues.

Go beyond the code!

An additional thing to consider is account access, logins, domain names and payments.

Typically we tell everyone to keep their account information private and secure. This may be at odds with your desire to keep your project going even after your death.

Keeping a list of domain names, third party services and other account information related to your project can help the heirs to your code keep the project going.

Bear in mind that, after your death, certain accounts may be locked, go away or may be controlled by people other than the code owner.

 

As with most things involving intellectual property and life events, it is best to consult a lawyer to understand your best options.

Rest in Peace!

A little care and effort in the present can save the community a significant amount of time in the future. By specifying a license, documenting project dependencies, and clearly transferring ownership you can make sure your code stands the test of time.

 

 

 

 

JavaScript Minimization, Obfuscation and Open Source Compliance

 

One of the most important things for a technology company is to have their web site look attractive, be responsive and load quickly. Milliseconds can be the difference between gaining or losing a customer and web designers and programmers will use every trick in the book to make their pages load rapidly.

 

A commonly used technique to speed up web sites is to send fewer bytes over the Internet when loading a web page. You can do this by using smaller photos, lower quality images and smaller JavaScript and HTML files.

 

How can you shrink a source file without throwing away functionality?

 

To reduce the size of JavaScript source files, a technique called Minification is used to remove unneeded text in a file while still preserving the core functionality. This comes at the expense of human readability.

 

Due to how this technique works, it complicates compliance with Open Source Licenses. In this article I’ll discuss the basics of Minification and some best practices to remain compliant.

 

 

What is Minification?

 

Minification is the process of removing redundant text, whitespace or descriptive variable names that are unneeded by the web browser to interpret the code. For example, your human readable code might use a variable with the name EMPLOYEE_LAST_NAME and use this long name dozens of times in your code. The minifier will replace this name with something short, perhaps simply the letter A. This saves 17 characters each time the variable is used.

 

Most, if not all, comments are removed, as are extra spaces. These small changes add up, and over the entire file, you may find that you save 70% or more compared to the original file.

For example, the popular jQuery library is available as both human readable and minified files. The human readable file weighs in at 288KB while the minified file is only 89KB.

Why would you want to minify code?

 

Minification typically is used to reduce the size of files to speed up web page loading.

 

A developer might also minify their code in order to put multiple libraries together in a single file for ease of downloading and use by their page. In those cases you may see a comment detailing what the original filename or library name was, and possibly a short license blurb.

 

How is that different than Obfuscation?

 

In some cases minification is used to make it more difficult (though not impossible) to reverse engineer or easily copy code or business logic. You may sometimes hear people use the term “Obfuscation” for that case.

 

 

What are some common tools used to minimize or obfuscate code?

 

As with most tools there are many ways to scratch an itch, so there are dozens of minification tools available. Some are online only, others are GUI tools and many are command line tools to be used as part of your development tool chain. Some of the most common command line tools you will encounter are:

 

UglifyJS https://github.com/mishoo/UglifyJS

JSMin https://www.crockford.com/jsmin.html

Minifier https://www.npmjs.com/package/minifier

 

 

 

How does minification affect Open Source License compliance?

 

Many open source licenses require the preservation of the original copyright strings and license text when a program incorporating those libraries is distributed (or possibly served via software as a service).

 

Since comments are typically the first thing removed to make a minified version of a source file, the copyright and license text can be discarded in a way that makes it difficult or impossible to comply with the open source license the code is under.
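
Here is a toy demonstration of how that happens. The Python function below is a stand-in for a minifier that only strips comments and collapses whitespace. It is not how any real tool works, it would mangle JavaScript that has comment-like text inside strings, and the sample source and author name are invented.

```python
import re


def toy_minify(js: str) -> str:
    """Strip comments and collapse whitespace (illustration only, not a real minifier)."""
    js = re.sub(r"/\*.*?\*/", "", js, flags=re.DOTALL)  # block comments
    js = re.sub(r"//[^\n]*", "", js)                    # line comments
    return re.sub(r"\s+", " ", js).strip()


original = """/*! Copyright (c) Example Author. Released under the MIT license. */
// Return the employee's display name.
function displayName(EMPLOYEE_LAST_NAME) {
    return EMPLOYEE_LAST_NAME.toUpperCase();
}
"""
minified = toy_minify(original)
print(minified)
print("Copyright" in minified)  # False: the notice went out with the comments
```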

 

Additionally, some source files may be checked in to source control already in minified form. These files may have been stripped of the required copyright and license text. You may need to review minified files that are checked in to source control to discover their true origin and update them with the proper copyright and open source license text.

 

It is often possible to preserve the copyright and license in the minified files.

 

Many minification tools provide flags or plugins that attempt to discover license comments and preserve them. For example, the UglifyJS plugin uglify-save-license preserves a comment found on the first line of a file, as well as comment blocks that contain common license names or copyright statements.

 

See https://www.npmjs.com/package/uglify-save-license

 

That said, the minified output should be viewed and compared to the original file in order to confirm that the appropriate copyrights and license text are preserved.
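Part of that comparison can be scripted. The sketch below, in Python, pulls block comments that mention a copyright or license out of the original file and reports any whose text no longer appears in the minified output. The function names are hypothetical, and the regular expression is a rough approximation: it cannot tell a comment from a string literal and it ignores single-line // comments, so treat the result as a prompt for manual review.

```python
import re

# Matches /* ... */ block comments. This is a rough sketch, not a JS parser.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LICENSE_WORDS = re.compile(r"copyright|license|licence", re.IGNORECASE)


def license_comments(source):
    """Return block comments that look like copyright or license notices."""
    return [c for c in BLOCK_COMMENT.findall(source) if LICENSE_WORDS.search(c)]


def normalize(text):
    """Drop comment decoration and collapse whitespace so that a
    reformatted notice still compares equal to the original."""
    return re.sub(r"[\s*/!]+", " ", text).strip().lower()


def missing_notices(original, minified):
    """Return license comments from original whose text is absent from minified."""
    minified_text = normalize(minified)
    return [c for c in license_comments(original)
            if normalize(c) not in minified_text]
```

An empty list from missing_notices means every notice found in the original also appears in the minified file. It does not prove the original file carried the right notice to begin with.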

 

If you end up using a library that does not declare its license in a way discoverable by your minification tools, it would be helpful to log an enhancement request with the original component author to make it easier to comply with the licensing in the future. You may find that you need to manually fix this license comment yourself in the meantime.

 

 

SaaS vs. Distribution issues with Open Source licenses and minification

 

Many open source licenses have obligations that come into effect when the program is distributed to users.

 

The need to preserve or display copyrights and license text is clear if you are distributing a product to users for them to run on machines that they control (a classic distribution).

 

Untold hours have been spent discussing whether JavaScript and other web resources downloaded to a web browser count as “distribution” in Software as a Service (SaaS) applications.

 

In my opinion, any file or resource downloaded to a web browser should be treated as “distribution”, and I would comply with the license obligations as expected in that use case.

 

As with many elements of OSS licensing you should request legal advice from your legal counsel about what is required for your use case and venue.

 

 

CDN and Minification

 

Web apps often use a Content Delivery Network (CDN) to speed up access to resources, images and code that the app requires to run. Think of a CDN as a global cache (or fast hard drive) that distributes often used files. CDNs will often minify JavaScript files automatically. You should take care to confirm that the proper licensing is being preserved by the CDN you have selected to host your files. This can be done by examining the source files delivered to your end user’s browser.

 

 

Source Maps 

In order to help debug minified source, a technology called Source Maps was created. Source Maps allow one to un-minify source code for debugging purposes, though they require special mapping files as well as Source Map aware development tools to view the result.

 

While this process is often done in development, some organizations ship source maps to production. Care should be taken to confirm that the actual users of the web application can see the required copyright and license text without the use of specialized developer tools.
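To find out which of your delivered files rely on a source map, you can look for the trailing sourceMappingURL comment that build tools conventionally append to minified output, and then read the "sources" list from the map file. The Python sketch below assumes a version 3 source map in JSON form. The function names are hypothetical.

```python
import json
import re

# Build tools conventionally end a minified file with a comment such as:
#   //# sourceMappingURL=app.min.js.map
# Older tools used "//@" instead of "//#", so both are accepted here.
SOURCE_MAP_COMMENT = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)")


def source_map_reference(minified):
    """Return the source map URL named in a minified file, or None."""
    matches = SOURCE_MAP_COMMENT.findall(minified)
    return matches[-1] if matches else None


def original_sources(source_map_json):
    """List the original file names recorded in a version 3 source map."""
    data = json.loads(source_map_json)
    return list(data.get("sources", []))
```

If a file references a source map, check that the license text also appears in the minified file itself, since an end user without developer tools will never see what the map points to.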

 

 

What are best practices for dealing with minified code?

 

I hope this quick overview of minification has been helpful. As you can see, techniques designed for speed and performance can cause difficulties with open source license compliance. Keep this checklist in mind as you review your program and use of minification:

 

  • Understand your license obligations when using JavaScript source files
  • Review the minification tools or services you use
  • Confirm that the proper copyright notices, license text and other information are preserved by the minification step
  • Perform an asset review or Software Composition Analysis (SCA) step to discover untracked third party source code
  • Log enhancement requests or bugs against Open Source libraries or tools to make it easier to preserve OSS license information
  • Store away the original source files, not just the minified version

 

 

 

 

 

 

 

Getting the Gist of GitHub Gist Licensing

 

 

What is a GitHub Gist?

A GitHub “Gist” is a webpage used to share snippets of code or a single file up to 1MB of source or text. It is commonly used to share examples, small utilities or documentation. Other similar sites are commonly referred to as a “pastebin”.

 

Why would you use the code from one?

 

It’s common for a developer to search for code to implement a small procedure, for example, “reverse the letters in a string” or “check for the existence of a file on disk”. These small pieces of code are not seen as large or complicated enough to be named and hosted as an open source project or have an entire web site devoted to them.

 

More complicated code is likely to be found as well. Single page utilities or programs can be encountered, often with embedded documentation in comments, or on a web page pointing to the Gist.

 

You may also see Gists used to host data listings or research. Common examples of this are ID numbers of products or the code used to demonstrate a bug or exploit.

 

 

Why does it need a license?

 

In a nutshell, other people’s source code or resources often require a software license for you to use them in your project. A license (commercial or open source) is a way to explain the conditions under which you are able to use that software, and without one you likely do not have permission to use it.

 

The question of what requires a license, and what size or complexity of source code is sufficient to require one, is beyond the scope of this blog entry. In general, source code cut and pasted from somewhere else may require a license in order to be legally used.

 

How do Gist authors show off their licenses?

 

Unfortunately most Gist authors FAIL to clearly show what license is attached to the source code they are sharing.

 

The most helpful and accurate way for a Gist author to declare their license is to put the license text in the source on the Gist itself. Typically this would be at the top of the file in a comment block and would contain the copyright date and owner if required by the license.

 

You may also see a short one-line comment such as:

 

This code is licensed under the terms of the MIT license

 

This is helpful but not complete. You don’t have a copyright date or copyright owner, but you at least have a general idea of the licensing style the author prefers.

 

In some cases the Gist author places a note somewhere in their GitHub site or homepage that declares the default license for their Gists. They may use text such as “The default license for all public Gists I publish is the following:” and then put the name or text of the license.

 

While this is far better than nothing (and does reduce perceived clutter in published Gists) it does break the connection between the source and the license that it was published under. It makes it harder for a consumer of that source to bring along the accurate licensing information when using the code.

 

 

What if it doesn’t have a license and you want to use it?

 

If a file, snippet or other content does not have a declared license, it is a good practice to reach out to the original author and ask what license the content is available under. You may want to suggest a license that you are comfortable with in order to short circuit any back and forth. For example, you may want to send an email like so:

 

Dear Jeff,

  I found the code you published on your GitHub Gist site at https://gist.github.com/jeff-luszcz/c470d282599ea42424b976c673d7c115

It currently does not appear to have a license associated with it. I would like to use this code but only if it has an open source license. Would you be able to let me know what license this code is under? I’m a big fan of the MIT license if you are looking for suggestions. (See https://choosealicense.com/licenses/mit/ )




Thanks!

 

 

While the author does not owe you a response, don’t be surprised if you get a helpful answer in a couple of days or so.

 

 

How should you preserve or document licenses to the code you use from a Gist?

As a developer one of the best things you can do for yourself or others that use your code is to document the licensing and origin of all third party code you use. This helps you legally share the projects you build, and also helps your users comply with open source licenses and stay ahead of security problems in the third party code you selected.

The best time to document things is when you have the data initially right in front of you.

If need be, cut and paste the licensing and origin information and attach it to the code you are using. For example:

 

# code snippet taken from https://gist.github.com/jeff-luszcz/c470d282599ea42424b976c673d7c115

# licensed under an MIT license as per the information on http://www.example.com/jeff-gists-license-info which says:

#  All my gist code is licensed under the terms of the MIT license
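If you record this information often, a small helper keeps the wording consistent. The following Python sketch builds a comment in the same shape as the example above. The function name and its parameters are hypothetical, and the "# " prefix is an assumption that suits languages such as Python or shell. Pass a different prefix for other languages.

```python
def attribution_comment(source_url, license_name, evidence_url=None,
                        evidence_quote=None, prefix="# "):
    """Build a comment block recording where a snippet came from and
    what license it was offered under."""
    lines = ["code snippet taken from " + source_url]
    if evidence_url:
        lines.append("licensed under the " + license_name +
                     " license as per the information on " + evidence_url)
    else:
        lines.append("licensed under the " + license_name +
                     " license as declared in the snippet itself")
    if evidence_quote:
        lines.append("which says: " + evidence_quote)
    return "\n".join(prefix + line for line in lines)
```

The returned text can be pasted directly above the copied code, so the origin and license travel with the snippet wherever it goes.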

 

What are some caveats or warnings about using code from Gists?

 

One of the questions people often have about code snippets is “Where did this code come from originally?” Is the code in the Gist a bug fix to some Apache code? Is it originally from the Linux Kernel and copied out as an example? Is it something from a commercial SDK that wasn’t publicly available on the net?

 

If so, the original author’s licensing may have been stripped (often accidentally, but sometimes purposely) by the person who has published the Gist.

 

It may be difficult to track this information down, and it may not be something the original author can even remember. Through the use of software composition analysis (SCA) tools or source code fingerprinting databases, you may discover earlier origins of code. In that case you may need to update your licensing or remediate (fix, remove, etc…) the snippet.
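To give a feel for what fingerprinting means, here is a deliberately simple Python sketch. It hashes a snippet after collapsing whitespace and looks the hash up in a table of known origins. Real SCA tools use far more robust matching. This version only catches copies that differ in spacing, and the "known" mapping is a hypothetical stand-in for a fingerprint database.

```python
import hashlib
import re


def fingerprint(code):
    """Hash a snippet after collapsing runs of whitespace, so that
    re-indented copies produce the same value."""
    normalized = re.sub(r"\s+", " ", code).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_origin(snippet, known):
    """Look up a snippet in a {fingerprint: origin} mapping.
    Returns None when the snippet is not in the mapping."""
    return known.get(fingerprint(snippet))
```

A renamed variable or a reordered line is enough to defeat this approach, which is one reason dedicated tools exist.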

 

Wrap up and next steps

If you get in the habit of documenting the licensing you use for all third party content, including snippets, you will find yourself in a much better position in the future when your code is used by other people or projects. It is always easier to get licensing right at creation time than to go back later and become a license or source code detective.