ANALYZING THE SECURITY OF C# SOURCE CODE USING A HIERARCHICAL QUALITY MODEL

by Payton Rae Harrison

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

MONTANA STATE UNIVERSITY
Bozeman, Montana

May 2022

©COPYRIGHT by Payton Rae Harrison 2022
All Rights Reserved

ACKNOWLEDGEMENTS

I would like to acknowledge my advisor, Dr. Clemente Izurieta, for the mentorship and support throughout the entirety of this thesis. I would also like to acknowledge my committee members Dr. Derek Reimanis and Dr. Ann Marie Reinhold for their support and guidance throughout this project. Montana State University Software Engineering Lab (MSUSEL) members provided many valuable insights and discussions that helped advance this work. The Construction Engineering Research Laboratory (CERL) provided the financial support for me to complete the research conducted in this thesis. Finally, I would also like to thank my close friends and family for their belief in me to finish this thesis and for constantly supporting me. Thank you all.

TABLE OF CONTENTS

1. INTRODUCTION
2. BACKGROUND
   Security Quality Modeling
   Security Metrics
   Static Analysis Tools
3. SUPPORTING WORK
   Quamoco
   QATCH
   PIQUE
   PIQUE-C#
   PIQUE-Bin
   Other Quality Models
4. RESEARCH GOALS
   Motivation
   Goal Question Metric
5. PIQUE-C#-SEC DEVELOPMENT
   Gather Requirements
   Development
      Model Structure and Design
      PIQUE-C#-Sec Mechanisms
      Tools
6. MODEL VALIDATION
   Analysis of Benchmark Data
   Count-Based Analysis of Chosen Tools
   Sensitivity to Single Diagnostics
7. DISCUSSION
   Implications
   Future Work
8. THREATS TO VALIDITY
   Internal Validity
   External Validity
   Construct Validity
9. CONCLUSION
REFERENCES CITED
APPENDICES
   APPENDIX A: Exploratory Study
      Approach
      Analysis Plan
         Visual Observation
         Non-Parametric Tests
      Threats to Validity
         Internal Validity
         External Validity
         Construct Validity
      Conclusion
   APPENDIX B: Tool Diagnostics
   APPENDIX C: CWE Top 25 Most Dangerous Software Weaknesses for 2021
   APPENDIX D: Benchmark Attribute Table
   APPENDIX E: TSI/Security Aspect Impact Table

LIST OF TABLES

2.1 The Microsoft STRIDE threat modeling chart maps security properties to their respective threats.
5.1 Security Aspect Node Definitions
5.2 Product Factor Node Definitions
5.3 Static analysis tools researched for potential use in the PIQUE-C#-Sec model and what criteria it violated.
5.4 Static analysis tools researched for potential use in the PIQUE-C#-Sec model and what criteria it violated, continued.
6.1 TSI Linear Model Coefficient P-Values from Open Source Projects
6.2 Attributes for Closed Source Projects with Low TSIs
6.3 Security Aspect Linear Model Coefficient P-Values from Open Source Projects
6.4 Security Aspect Linear Model Coefficient P-Values from Closed Source Projects
6.5 Most Impactful Diagnostics on the TSI
6.6 Most Impactful Diagnostics on Authenticity
6.7 Most Impactful Diagnostics on Accountability
A.1 PIQUE-C# calibration versions and the difference in TQI relative to Version 1.
A.2 PIQUE-C# versions and the difference in TQI relative to Version 5 when removing nodes from the model.
A.3 PIQUE-C# versions and the difference in TQI relative to Version 5 when introducing vulnerabilities into the source code.
B.1 Tool Diagnostics
B.2 Tool Diagnostics Continued
C.1 CWE Top 25 Most Dangerous Software Weaknesses for 2021
D.1 Benchmark Attribute Table
D.2 Benchmark Attribute Table Continued
E.1 TSI/Security Aspect Impact Table

LIST OF FIGURES

2.1 An example of a derived PIQUE quality model structure [37].
2.2 ISO/IEC 25010 Standard for Software Quality [20]. These eight attributes are meant to best represent stakeholder needs that can be included in a quality model.
3.1 An example of a Quamoco quality model structure [43].
3.2 An example of a QATCH quality model structure [40].
3.3 An example of a PIQUE-Bin security model structure [22].
5.1 Partial PIQUE-C#-Sec Model
5.2 Tool mapping examples from the Measure to Diagnostic layer for each of our static analysis tools in the PIQUE-C#-Sec model.
6.1 Study Design for our PIQUE-C#-Sec Model Validation
6.2 Observing the TSI of each benchmark project compared with that project's size in lines of code.
6.3 Observing Each Attribute Subset as a Function of TSI and Size
6.4 Histogram of Open Source Project TSIs
6.5 Histogram of Closed Source Project TSIs
6.6 Open Source Linear Model Diagnostic Plots
6.7 TSI for Different Attribute Types in Open Source Projects
6.8 Closed Source Linear Model Diagnostic Plots
6.9 TSI for Different Attribute Types in Closed Source Projects
6.10 Single vulnerability impacts on TSI using the diagnostics in the PIQUE-C#-Sec model to measure the change in TSI.
6.11 Single vulnerability impacts on authenticity using the diagnostics in the PIQUE-C#-Sec model to measure the change in authenticity.
6.12 Single vulnerability impacts on availability using the diagnostics in the PIQUE-C#-Sec model to measure the change in availability.
6.13 Single vulnerability impacts on authorization using the diagnostics in the PIQUE-C#-Sec model to measure the change in authorization.
6.14 Single vulnerability impacts on confidentiality using the diagnostics in the PIQUE-C#-Sec model to measure the change in confidentiality.
6.15 Single vulnerability impacts on accountability using the diagnostics in the PIQUE-C#-Sec model to measure the change in accountability.
6.16 Single vulnerability impacts on non-repudiation using the diagnostics in the PIQUE-C#-Sec model to measure the change in non-repudiation.
6.17 Single vulnerability impacts on integrity using the diagnostics in the PIQUE-C#-Sec model to measure the change in integrity.
6.18 Top Vulnerabilities' Impact on PIQUE-C#-Sec TSI and Security Aspect Values
A.1 PIQUE-C# Study Design
A.2 PIQUE-C# hierarchical model structure based on real-world government source code.
A.3 Scatterplot with the difference in TQI for each PIQUE-C# version number relative to PIQUE-C# Version 1 when removing nodes from the PIQUE-C# model.
A.4 Scatterplot with the difference in TQI for each PIQUE-C# version number relative to PIQUE-C# Version 1 when introducing vulnerabilities into the source code.

NOMENCLATURE

CERL Construction Engineering Research Laboratory
CISA Cybersecurity and Infrastructure Security Agency
CVE Common Vulnerabilities and Exposures
CWE Common Weakness Enumeration
DHS U.S. Department of Homeland Security
INL Idaho National Lab
MSUSEL Montana State University Software Engineering Lab
NIST National Institute of Standards and Technology
NVD National Vulnerability Database
OWASP Open Web Application Security Project
PIQUE PIQUE is a collection of library functions and runner entry points designed to support experimental software quality analysis from a language-agnostic perspective
PIQUE-Bin This project is an operationalized PIQUE model for the assessment of security quality in binary files
PIQUE-C# This project represents a C# actualization of the PIQUE quality analysis platform. This project integrates the C# static analysis framework tool, Roslynator, and provides example extensions of the default weighting, benchmarking, normalizing, and evaluation strategies provided by PIQUE
PIQUE-C#-Sec This project is an operationalized PIQUE model for the assessment of security quality in C# source code

ABSTRACT

In software engineering, both in government and in industry, there are no universal standards or guidelines for security or quality. There is an increased need for evaluating the security of source code projects, which is made apparent by the number of real-world cyber attacks that have taken place recently. Our research goal is to design and develop a security quality model that helps stakeholders assess the security of C# source code projects. While there are many analysis tools that can be used to identify security vulnerabilities, a model is beneficial because it integrates multiple analysis tools to achieve better coverage of the security vulnerabilities detected (compared to the use of a single tool) and aggregates these vulnerabilities upward into a broader security quality context. We accomplished our goal by developing and validating a hierarchical security quality model (PIQUE-C#-Sec) to evaluate the security quality of software written in C#. This model is an operationalized model built on PIQUE, the Platform for Investigative software Quality Understanding and Evaluation. PIQUE-C#-Sec improves upon the security quality models and quality models that precede it by focusing on being specific, flexible, and extensible. This thesis introduces the model design for PIQUE-C#-Sec and examines the results of validating the PIQUE-C#-Sec model. The model was validated using sensitivity analysis, which consisted of collecting data on benchmark repositories and observing if and how the PIQUE-C#-Sec model output varied as a function of these repository attributes. Additionally, the model was analyzed by testing how the PIQUE-C#-Sec node values changed when the tools reported additional vulnerabilities. Based on these results, we conclude that the PIQUE-C#-Sec model is effective for stakeholders to use when evaluating C# source code, and the model can be used as a security quality gate for evaluating these projects.

INTRODUCTION

Government organizations often hire contractors to assist in writing their software. Government organizations are interested in evaluating the security quality of contractor source code to ensure that the code does not have overall poor security quality or major security vulnerabilities. There are currently limited security or quality standards or guidelines for contractors to follow when writing their code. When government organizations examine the code to be released, it is difficult to properly evaluate the security quality of the project.
Internet usage and the development of software have both been on the rise, which has led to an increase in virtual attacks and many new threats to security [18][35][4]. This increase places a growing priority and importance on information security [6], and it has led software companies to concern themselves with security-related threats connected to their products' source code, as customers demand high security in these software products [19]. Successful cyber attacks can cost organizations large amounts of resources because of losses in confidentiality, integrity, and availability caused by the attack [17].

The increased need for a focus on security vulnerabilities in source code projects is made apparent by the number of cyber threats and attacks that have taken place over the last several years. Examples include the Colonial Pipeline cyber incident in April 2021 and the SUNBURST cyber attack, which was discovered in December 2020. The Colonial Pipeline incident consisted of hackers breaching the pipeline using a single compromised password, according to the Office of Cybersecurity, Energy Security, and Emergency Response (https://www.energy.gov/ceser/colonial-pipeline-cyber-incident). The incident took down the largest fuel pipeline in the United States and led to shortages across the entire east coast. In the SUNBURST cyber attack, SolarWinds (https://www.solarwinds.com/) and their customers were the victims of an attack on their systems that inserted a vulnerability within their Orion Platform builds. This vulnerability, when present and activated, could potentially allow an attacker to compromise the server on which the Orion products run. SUNBURST was a supply chain attack: a disruption of a standard process that produces a compromised result, with the goal of attacking subsequent users of the software, according to the SolarWinds Security Advisory (https://www.solarwinds.com/sa-overview/securityadvisory). Both cyber attacks had major impacts on the government and citizens alike.

Using hierarchical models to evaluate security quality presents an opportunity to lower these expected impacts and losses by alerting government agencies to potential security threats. The hierarchical models developed by the Montana State University Software Engineering Lab (MSUSEL) can aid in lowering the impacts of cyber attacks when users and stakeholders apply them to evaluate the security quality and quality of software. The Platform for Investigative software Quality Understanding and Evaluation, or PIQUE (https://github.com/msusel-pique/msusel-pique), is a collection of library functions and runner entry points designed to support language-agnostic software quality analysis. PIQUE allows users to create hierarchical models that define which aspects of security and quality are important to them. By employing static analysis tools to search for these security and quality vulnerabilities and aggregating them into a security or quality score, PIQUE gives users a better idea of the security or quality of their code. PIQUE is designed to build operational quality models from its framework. This brings us to an instance of such an operationalized model, PIQUE-C#-Sec.
PIQUE-C#-Sec (https://github.com/MSUSEL/msusel-pique-csharp-sec) is an actualization of the PIQUE quality analysis platform and can be used to evaluate the security quality of real-world government contractor source code, since much of this code is written in C#. PIQUE-C#-Sec has two C# static analysis tools integrated into the model and provides default weighting, benchmarking, normalizing, and evaluation strategies from PIQUE. One use case for the PIQUE-C#-Sec model is that government organization management can use it as a security quality gate for government contractor source code: management can choose to set a threshold security quality score that requires the source code to be returned to the contractors for modification if the score is not met [1].

The overarching aim of this thesis is to design and develop a hierarchical security quality model for projects with C# source code. A series of objectives were addressed to achieve this aim. To design and develop our own security quality model, we needed to research and become familiar with the background concepts of security quality modeling, security metrics, and static analysis tools (Chapter 2) and previous quality and security quality models (Chapter 3). Once we had this foundational knowledge, we needed to define our own research goals to ensure we knew what questions we wanted to address and answer with our research (Chapter 4). Once we had our problem framed by our research questions, we could begin designing and developing our security quality model (PIQUE-C#-Sec) to address these research questions (Chapter 5). After the model was developed, we needed to validate it to help users and stakeholders build trust in its ability to evaluate C# source code and to help us investigate what the model output means (Chapter 6). Once the model validation is completed, we discuss the implications of these results (Chapter 7) and detail the threats to validity from this thesis (Chapter 8). Finally, we detail the overall conclusions of this thesis (Chapter 9).

BACKGROUND

Security Quality Modeling

The purpose of quality modeling centers around providing a systematic approach for modeling quality requirements, analyzing and monitoring quality, and directing quality improvement measures [42]. The evaluation of quality is subjective, and evaluation needs vary widely between users. The perception of security quality can differ significantly even when the quality models are based on the same hierarchical model structure, and this may create confusion when stakeholders evaluate the security quality of their software [21]. Quality modeling helps with this by providing a hierarchical system that structures and visualizes these needs. A quality model specifies the meaning of software quality in a way that can be used in several scenarios, for example to improve or assess the quality of software [44]. Many models have been proposed to support stakeholders in evaluating their software quality [25]. As software programs and computer systems usually have multiple vulnerabilities, it is desirable to aggregate the scores of these individual vulnerabilities [17]; this is where hierarchical models are useful. Cyber security models can enable users to make more well-informed choices that reduce the risk and impact of security vulnerabilities and incidents [16].
We can use our security quality model (PIQUE-C#-Sec) to provide these benefits of security quality modeling to our users and stakeholders. We will provide a brief overview of the PIQUE-C#-Sec hierarchical model structure to demonstrate how our model helps structure and visualize a user's subjective quality needs and enables users to be more informed about the security of their software. Chapter 5 goes into more detail about the design of the PIQUE-C#-Sec model. The PIQUE-C#-Sec model has a standard taxonomy for tree traversal, from the Total Security Index (TSI) root node to the leaf nodes. The layers in the hierarchical model are TSI, Security Aspects, Product Factors, Measures, and Diagnostics. An example of this hierarchical structure is shown in Figure 2.1 [37].

Figure 2.1: An example of a derived PIQUE quality model structure [37].

The PIQUE-C#-Sec model is created using a top-down approach. The root node is the Total Security Index, or TSI. This node decomposes into Security Aspect nodes, which are taken from both the ISO/IEC 25010 standard [20] and the Microsoft STRIDE model (https://docs.microsoft.com/en-us/azure/security/develop/threat-modeling-tool-threats). The ISO/IEC 25010 standard divides software quality into eight different attributes, as shown in Figure 2.2. These attributes are meant to best represent stakeholder needs that can be included in a quality model [20]. For the PIQUE-C#-Sec model, we are only interested in the Security attribute, which consists of the properties confidentiality, integrity, non-repudiation, authenticity, and accountability.

Figure 2.2: ISO/IEC 25010 Standard for Software Quality [20]. These eight attributes are meant to best represent stakeholder needs that can be included in a quality model.

The ISO/IEC 9126 standard preceded the ISO/IEC 25010 standard and forms a basis from which many quality models derived [2]. The ISO/IEC 9126 standard describes a model for software product quality that dissects the overall notion of quality into six main characteristics, which are further subdivided into 27 sub-characteristics [15]. However, ISO/IEC 25010 has largely replaced ISO/IEC 9126. ISO/IEC 25010 expanded the standard to include eight characteristics and 31 sub-characteristics [20]. Additionally, ISO/IEC 25010 improved upon and extended ISO/IEC 9126 by including computer systems and quality in use from a system perspective [2]. Another large improvement from ISO/IEC 9126 to ISO/IEC 25010 that impacts our decision to use the ISO/IEC 25010 standard for our PIQUE-C#-Sec model is that ISO/IEC 25010 added Security as a characteristic rather than a sub-characteristic, as it was in ISO/IEC 9126 [2].

The Microsoft STRIDE threat modeling chart maps security properties to their respective threats. STRIDE consists of the properties authentication, integrity, non-repudiation, confidentiality, availability, and authorization, as shown in Table 2.1 (https://www.microsoft.com/security/blog/2007/09/11/stride-chart/).

Table 2.1: The Microsoft STRIDE threat modeling chart maps security properties to their respective threats.
Property | Threat | Definition
Authentication | Spoofing | Impersonating something or someone else
Integrity | Tampering | Modifying data or code
Non-repudiation | Repudiation | Claiming to have not performed an action
Confidentiality | Information disclosure | Exposing information to someone not authorized to see it
Availability | Denial of service | Deny or degrade service to users
Authorization | Elevation of privilege | Gain capabilities without proper authorization

These Security Aspects are then further decomposed into Product Factors. The Product Factors in this model are the categories from the rules documented by the PIQUE-C#-Sec static analysis tools. These Product Factor nodes will be defined and detailed in Chapter 5. The Security Aspects continue to be decomposed into lower-level nodes until the concepts are at a measurable level. At this point, measurement tools are attached to populate these nodes with numerical values from the evaluated product [37]. Finally, the model needs to define how these bottom-level node values aggregate upward back to the TSI value.
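To make this upward aggregation concrete, the sketch below shows one simple way such a hierarchy could be represented and evaluated. It is an illustration only: the class and method names are hypothetical rather than the actual PIQUE API, and it assumes a plain weighted average at every internal node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a PIQUE-style hierarchy; not the actual PIQUE API.
class ModelNode {
    final String name;
    // child node -> edge weight into this node
    final Map<ModelNode, Double> children = new LinkedHashMap<>();
    double leafValue; // only used by Diagnostic (leaf) nodes, set from tool findings

    ModelNode(String name) { this.name = name; }

    void addChild(ModelNode child, double weight) { children.put(child, weight); }

    // Leaf nodes return their tool-derived value; every internal node returns the
    // weighted average of its children, so values roll up from Diagnostics through
    // Measures, Product Factors, and Security Aspects to the TSI.
    double evaluate() {
        if (children.isEmpty()) return leafValue;
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<ModelNode, Double> e : children.entrySet()) {
            weightedSum += e.getValue() * e.getKey().evaluate();
            totalWeight += e.getValue();
        }
        return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
    }
}
```

Building a tree with a TSI root, Security Aspect children, and so on down to Diagnostic leaves, then calling evaluate() on the root, would yield a single index value; Chapter 5 describes the mechanisms the real model uses at each step.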
Security Metrics

Security quality modeling can be used to evaluate the security attributes of a software project by using multiple types of static analysis tools and analyses, or security metrics. Security metrics allow users to measure the success of security policies, mechanisms, and implementations [33]. The use of security metrics is important because the security of a software project cannot be improved if it cannot be measured. Metrics assist in the identification of a system's vulnerabilities, which facilitates correcting these vulnerabilities while also raising awareness of the system's security [33]. The usage of security metrics in software projects allows security to be evaluated at the code implementation level. Defining these metrics earlier allows security vulnerabilities to be recognized earlier within a software project. This is valuable because security vulnerabilities that are discovered later in the development cycle are more expensive to fix than ones discovered earlier [11].

The actual severity of a vulnerability is difficult to capture accurately, as there is no standard method available for this purpose [16]. However, the vulnerabilities found in software can be mapped to vulnerability catalogs, which are lists developed by the community. The Common Weakness Enumeration (CWE, https://cwe.mitre.org/) and Common Vulnerabilities and Exposures (CVE, https://cve.mitre.org/) are catalogs that capture information about weaknesses and vulnerabilities [22]. A vulnerability is defined as a weakness in some aspect or feature of a system that makes a threat possible [41]. CWEs represent general software and hardware weaknesses, while CVEs represent specific vulnerabilities in platforms and products. CWE [29] provides a set of software weaknesses with descriptions that support the selection and use of software security services and tools, which permits these weaknesses to be discovered in the software's source code [33]. This also creates a better understanding and management of these weaknesses. To provide a high-level summary of detected software vulnerabilities, users can run software vulnerability scans. These vulnerabilities are enumerated with a CVE identification number [30]. They are then stored in the National Vulnerability Database (NVD, https://nvd.nist.gov), which is maintained by the National Institute of Standards and Technology (NIST, https://www.nist.gov/) [4].

Static Analysis Tools

The most basic first step in assessing the security of software is to run static analysis tools on the source code to identify flaws within the software [26]. Static analysis tools are external resources that audit the product under assessment by parsing source code, byte code, or compiled source code to retrieve metrics or finding data [37]. The number (i.e., count) of vulnerabilities that exist in software is one of the most practical metrics to use for measuring security [17]. The measures and diagnostics in the PIQUE-C#-Sec model are obtained by running static analysis tools on the software. The tools search for security vulnerabilities by examining the code directly; automated static analysis tools evaluate the code without executing it [13][8]. Using static analysis tools is important because, as modern code bases grow, it becomes more difficult to find security vulnerabilities [7]. Static analysis tools aid in this process and make it easier for users to efficiently identify potential weaknesses within the code base.

For the PIQUE-C#-Sec model, security quality is measured strictly by static code quality. There are two tools used in the PIQUE-C#-Sec model: Security Code Scan (https://security-code-scan.github.io/) and Insider (https://github.com/insidersec/insider). Security Code Scan detects security vulnerabilities within source code such as SQL injections, cross-site scripting, and cross-site request forgery. Insider is focused on covering the OWASP Top 10 (https://owasp.org/www-project-top-ten/) vulnerabilities within source code. Both are static analysis tools that run on C# source code, assess security-related findings and vulnerabilities, are free and open source, and are command line tools. As discussed earlier, the leaf nodes of the security quality model come from tool output.

We decided to implement multiple static analysis tools in the PIQUE-C#-Sec model because using a combination of several techniques is an effective way to find additional vulnerabilities [7]. Different tools have different objectives [26], and for this reason it is likely that integrating more than one tool into our model will be beneficial. Since static analysis tools are widely available, easy to obtain, and clearly valuable, there is little reason not to include them in a project's development cycle [26], or in our case, a security quality model. While our PIQUE-C#-Sec model does implement multiple static analysis tools, it currently has only two tools integrated. This is because of the tools available to us based on our search criteria, which will be detailed in Chapter 5. There is no single technique or single set of rules that will detect all security vulnerabilities [32]. Therefore, using a combination of factors is beneficial, which is why we use a combination of static analysis tools in our model. McGraw and Stephen [31] published an article comparing multiple static analysis tools. They found that two tools perform differently on the same code bases because of the internal rules used by the tools and coding style. They also claim that tool configuration and operators can greatly influence the discovery of vulnerabilities [31].
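To illustrate why combining tools is useful in a model, the hedged sketch below merges findings from two tools into a single count per diagnostic. It assumes each tool's report has already been parsed into a list of diagnostic identifiers; the class and method names are illustrative only and do not correspond to actual PIQUE-C#-Sec code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not PIQUE-C#-Sec code: tally findings from two tools
// into one count per diagnostic identifier.
class FindingMerger {
    static Map<String, Integer> countByDiagnostic(List<String> securityCodeScanFindings,
                                                  List<String> insiderFindings) {
        Map<String, Integer> counts = new HashMap<>();
        for (String diagnosticId : securityCodeScanFindings) {
            counts.merge(diagnosticId, 1, Integer::sum);
        }
        for (String diagnosticId : insiderFindings) {
            counts.merge(diagnosticId, 1, Integer::sum);
        }
        // The union of both tools' findings covers more weakness types than either
        // tool alone, which is the motivation for integrating multiple tools.
        return counts;
    }
}
```

In the real model, each identifier would correspond to a Security Code Scan or Insider rule and its count would feed the Diagnostic node of the same name.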
The tools currently used within PIQUE-C#-Sec identify security-related findings and are implemented into the security quality model. The diagnostics found by each tool map to CWEs, which allows us to easily include these CWEs at the Measure layer within the PIQUE-C#-Sec model. We chose to use CWEs at the Measure layer because having our Diagnostics map to CWEs increases the trust that stakeholders and users can place in our model, since this layer maps to a well-known standard for weakness cataloging maintained by MITRE. Tools that scan the source code of a program to find vulnerabilities and weaknesses are the first line of defense in assessing the security of software [26]. This makes it important to choose the best tools available for static analysis. When exploring new tools, it is viable to continue looking for static analysis tools that identify CWEs and CVEs in source code, because any such tool output would have a clear mapping to the upper layers of the PIQUE-C#-Sec model.

SUPPORTING WORK

The supporting work in quality modeling includes many previously existing models with useful contributions. Our security quality model (PIQUE-C#-Sec) uses elements from these models and improves upon them to achieve the research goals we have defined for our model, which are detailed in Chapter 4. PIQUE-C#-Sec was designed and built from the PIQUE framework [37]. Two hierarchical quality models influenced the design of PIQUE: Quamoco [43] and QATCH [40]. Additionally, other hierarchical quality models were developed using the PIQUE framework, namely PIQUE-C# (https://github.com/MSUSEL/msusel-pique-csharp) and PIQUE-Bin (https://github.com/MSUSEL/msusel-pique-bin) [22]. This chapter provides an overview of all these pre-existing models that aided in the design and development of the PIQUE-C#-Sec model.

Quamoco

The Quamoco model aims to bridge the gap between abstract quality aspects and concrete measurements [43]. The base model uses the ISO 25010 quality attributes, which continue to be a good starting point for quality modeling [43]. The Quamoco model evaluates the quality of a system by aggregating the issues and measures that affect the system [21]. These form the lowest level of the hierarchy and provide input at the measure level. The Quamoco project consists of the base model, a GUI quality model editor, and quality adaption tools provided by the framework. The base model hierarchical structure consists of four layers under the root node: Quality Aspects, Product Factors, Measures, and Instruments. Within the Quamoco model, there are several key terms that are important to note. These terms are explained below and can be seen in the model diagram in Figure 3.1.

Figure 3.1: An example of a Quamoco quality model structure [43].

• Factors: express a property of an entity, where an entity is something that is important to quality and the property is the attribute we are interested in. A factor constitutes a property of the software that is related to its quality [44]. The central concept of the model is the factor, which represents a property or attribute of an entity (an entity being an important aspect of quality we want to measure) [21].

• Quality Aspects: express abstract quality goals and can map to the quality attributes from a scientific standard, for example ISO 25010 [20].
Quality Aspects provide information on the quality focused on within the model [25].

• Product Factors: measurable attributes that make up a part of the product.

• Measures: define how a specific entity is measured and provide a means to quantify the factors that characterize this entity [44]. A measure can be associated with more than one Product Factor. Measures are collected by instruments, and multiple measurements can collect values for a single measure.

• Instruments: separated from the measures, instruments describe a concrete implementation of their measure. They are used to determine the value directly using an external tool or a manual assessment [44].

• Aggregation: used to aggregate the values of other measures [44].

Quamoco identified several areas for potential future work. These included (1) taking further technologies and contents for the base model into account and (2) working on further empirical studies to understand the weaknesses of the approach that still exist and improving these [43].

QATCH

QATCH, or the Quality Assessment Tool CHain, is a quality model that investigates an automated way to derive quality models that are responsive to subjectivity [40]. This is a valuable characteristic because the quality model will change with each new stakeholder and domain. The QATCH model consists of three layers under the root node: the Characteristics layer, the Properties layer, and the Measures layer. QATCH also uses the ISO/IEC 25010 standard, as the Quamoco model does.

One of the ways that automated subjectivity is integrated into the QATCH model is through an Analytical Hierarchy Process (AHP) [40]. AHP is an approach to decision making that reduces decision complexity through pairwise comparisons. AHP is used to assist decision makers in choosing the best option among a set of alternatives. AHP provides the ability for users and stakeholders to input their prioritized values into a quality model, and it is typically used to help decision makers in scenarios where there are several objectives [38]. AHP enables the use of pairwise comparisons between criteria to derive an order of importance for decisions [22]. Stakeholders can weigh specific aspects within the model as being more important than others with regard to the overall quality. With AHP, this can be done without requiring extensive knowledge from the user about quality modeling or hierarchical structures. The pairwise comparisons for AHP may be expressed linguistically, allowing the comparisons to be more intuitive for non-technical stakeholders when expressing their values and prioritization [22]. QATCH presents a fuzzy AHP that allows stakeholders to express uncertainty about certain comparisons [22]. This decreases the total time for a stakeholder to tune the model, since AHP helps stakeholders decide which nodes in each layer are most important.
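As a concrete illustration of how pairwise comparisons become importance weights, the sketch below derives a priority vector from a small comparison matrix. It is hedged: the matrix values are hypothetical stakeholder judgments, the class is not part of QATCH or PIQUE, and the row geometric mean used here is only one common approximation of the AHP priority calculation (the classical method uses the principal eigenvector of the comparison matrix).

```java
// Hypothetical sketch of AHP-style weight derivation; names are illustrative only.
class AhpWeights {

    // comparisons[i][j] states how much more important criterion i is than
    // criterion j, with comparisons[j][i] equal to 1 / comparisons[i][j].
    static double[] fromComparisons(double[][] comparisons) {
        int n = comparisons.length;
        double[] weights = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double rowProduct = 1.0;
            for (int j = 0; j < n; j++) {
                rowProduct *= comparisons[i][j];
            }
            weights[i] = Math.pow(rowProduct, 1.0 / n); // geometric mean of row i
            sum += weights[i];
        }
        for (int i = 0; i < n; i++) {
            weights[i] /= sum; // normalize so the weights sum to 1
        }
        return weights;
    }

    public static void main(String[] args) {
        // Hypothetical stakeholder judgments: confidentiality is 3x as important
        // as integrity and 5x as important as availability; integrity is 2x as
        // important as availability.
        double[][] comparisons = {
            {1.0, 3.0, 5.0},
            {1.0 / 3.0, 1.0, 2.0},
            {1.0 / 5.0, 1.0 / 2.0, 1.0}
        };
        double[] w = fromComparisons(comparisons);
        System.out.printf("confidentiality=%.2f integrity=%.2f availability=%.2f%n",
                w[0], w[1], w[2]);
    }
}
```

With weights derived this way, a stakeholder's subjective priorities can be applied to the edges between layers without the stakeholder needing to understand the underlying hierarchy mathematics.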
The current support in QATCH is only for Java software. One area of future work is to expand the QATCH system so that it can assess the quality of software products developed in programming languages other than Java [40]. Future work could also include adding other static analysis tools or adding dynamic analysis metrics.

Within the QATCH model, there are several key terms that are important to note. These terms are explained below and are shown in the model diagram in Figure 3.2.

Figure 3.2: An example of a QATCH quality model structure [40].

• Characteristics: defined by a quality model definition that helps assess the overall software quality through aggregation and normalization.

• Properties: the decomposed components of the software, each of which can be given quality properties. Some examples of properties include programs, modules, objects, and variables [40].

• Measures: a well-chosen set within the model used to evaluate the properties.

In the validity experiment that the QATCH model underwent, QATCH reported perfect correlation with expert opinions on the evaluated projects. This indicates that QATCH is able to perform comparably to, or possibly even better than, Quamoco in its project evaluation relative to expert judges [22]. It also indicates that the loss of complexity in the QATCH model compared to Quamoco did not result in a loss of correlation with the expert opinions during the validity experiment [22].

PIQUE

PIQUE (https://github.com/msusel-pique/msusel-pique), or the Platform for Investigative software Quality Understanding and Evaluation, is a collection of library functions and runner entry points designed to support language-agnostic software quality analysis. PIQUE was built with nine design goals [22]:

1. Improve benchmarking, utility functions, and adaptive edge weighting
2. Improve default model mechanisms
3. Extend or modify the model mechanisms
4. Allow models to be easy to derive
5. Allow these derived models to be easy to operationalize
6. Allow simple addition, removal, or modification of tool support
7. Make input and output easy to interact with
8. Facilitate automation and continuous integration
9. Facilitate trustworthy models

PIQUE supports two processes needed for quality control: quality model derivation and quality assessment. PIQUE, like Quamoco and QATCH, uses the ISO/IEC 25010 standard. The PIQUE model derivation calculates the edge weights and the threshold values in the model, while the quality assessment populates each individual node value and the TQI value. The PIQUE model consists of four layers under the root node: Quality Aspects, Product Factors, Measures, and Diagnostics. This structure is very similar to the Quamoco base model. PIQUE is designed to have actualizations of the PIQUE model plug into it (for example, PIQUE-C# and PIQUE-Bin). PIQUE enables small-scale teams to build quality models without the resources required for models such as the Quamoco model in [43]. This enables models to be created by single developers, which is the case for both the PIQUE-C# and PIQUE-Bin models [22]. PIQUE models are designed to be more flexible, extensible, and specific than Quamoco and QATCH models. PIQUE also realizes the future work described in the QATCH paper by developing a model that can evaluate software in other languages.

Quamoco, QATCH, and PIQUE all use the ISO/IEC 25010 standard as abstract guidelines for their models. Quamoco, the first of the three models, was found to be effective but slow and complex. QATCH improved upon this by being effective and simple. PIQUE improved upon both Quamoco and QATCH by creating a framework that allows quality models to be built easily, focusing on reducing the resources needed to build them. PIQUE focuses more on flexibility and extensibility.

PIQUE Mechanisms

The various mechanisms and utility functions that PIQUE offers are detailed below.

Thresholding Functions: Threshold values are created through a benchmarking process that begins with a group of projects similar to the one to which the model will be applied.
The tools of the model are then applied to all the projects within this benchmark repository; this provides an estimate of the number of findings that can be expected in the average project [22]. The benchmarking creates threshold values. By default, PIQUE uses the maximum and minimum values found in the benchmark repository. Custom utility functions can be written to change the calculation of these threshold values, as was done in PIQUE-Bin, where the threshold values are calculated as the mean plus and minus the standard deviation of the values in the benchmark repository [22]. We will discuss how thresholding functions are used in the PIQUE-C#-Sec model in Chapter 5.

Weighting Strategy: Edges between the TQI/Quality Aspect nodes and the Quality Aspect/Product Factor nodes can be custom weighted, which affects how they are aggregated into the next layer of the model. This process is done using comparison matrices to rank the importance of the various nodes relative to each other [22]. We will discuss which weighting strategy is used in the PIQUE-C#-Sec model in Chapter 5.

Normalizing Process: Normalization in PIQUE divides the value of a Measure node in the model by the lines of code in the project. The DefaultNormalizer is the default normalizer used in PIQUE; it returns the Measure node value divided by the normalizer value (the lines of code of the project). Normalizing allows us to take the different sizes of the projects being evaluated into account. We normalize because the size of the system directly influences the number of findings that occur when evaluating projects in C# source code.

Utility Functions: Utility functions in the PIQUE model are created through the benchmarking process. The utility function for a measure takes the measurement of some diagnostic for a specific software project and outputs a value based on how it compares to other software projects in the benchmark repository [22]. PIQUE uses the Linear Interpolation utility function by default, which uses the minimum and maximum values found within the benchmark repository for each measure. We will discuss which utility functions are used in the PIQUE-C#-Sec model in Chapter 5.

PIQUE-C#

PIQUE-C# is an operationalized PIQUE model for the assessment of quality in projects with C# source code. This project integrates the C# static analysis framework tool Roslynator (https://github.com/JosefPihrt/Roslynator) and provides example extensions of the default weighting, benchmarking, normalizing, and evaluation strategies provided by PIQUE.

PIQUE-Bin

PIQUE-Bin is an operationalized PIQUE model for the assessment of security quality in binary files. A few of the ways in which PIQUE-Bin has paved the way for PIQUE-C#-Sec include implementing weighting between the TQI/Quality Aspect and Quality Aspect/Product Factor layers, linking multiple static analysis tools to the quality model, and developing custom utility functions in the calibration and evaluator classes. An example of part of the PIQUE-Bin hierarchical model structure is shown in Figure 3.3.

Figure 3.3: An example of a PIQUE-Bin security model structure [22].
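Before moving on to other quality models, the hedged sketch below illustrates the benchmarking, normalizing, and utility-function mechanisms just described. The names are hypothetical and do not correspond to the actual PIQUE API; it simply shows how a normalized finding count, benchmark-derived thresholds, and a linear interpolation utility could fit together.

```java
// Hypothetical sketch of PIQUE-style calibration mechanisms; illustrative only.
class BenchmarkUtility {

    // Normalize a raw finding count by project size (what the DefaultNormalizer
    // does conceptually: measure value divided by lines of code).
    static double normalize(int findingCount, int linesOfCode) {
        return linesOfCode == 0 ? 0.0 : (double) findingCount / linesOfCode;
    }

    // Default-style thresholds for one measure: the minimum and maximum normalized
    // values observed across the benchmark repository. A custom strategy, as in
    // PIQUE-Bin, could instead return mean plus and minus one standard deviation.
    static double[] thresholds(double[] normalizedBenchmarkValues) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : normalizedBenchmarkValues) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new double[] {min, max};
    }

    // Linear-interpolation-style utility: map the evaluated project's normalized
    // value onto [0, 1], where fewer findings than the benchmark scores closer to 1.
    static double linearUtility(double value, double min, double max) {
        if (max == min) return 1.0; // degenerate benchmark; nothing to interpolate
        double clamped = Math.max(min, Math.min(max, value));
        return 1.0 - (clamped - min) / (max - min);
    }
}
```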
Other Quality Models

The Quamoco and QATCH quality models directly shaped the development of PIQUE. However, other quality models and operational quality models make up the supporting work in quality modeling as well. We summarize some of those here and tie them back into the PIQUE model where applicable.

SQUID

The SQUID model defines the software requirements of the Telescience project [24]. The SQUID model uses the ISO/IEC 9126 standard for software quality. The key elements in the SQUID model are product behavior, statement of quality requirements, quality characteristics, quality sub-characteristics, internal software properties, and measures [34]. The distinction of the SQUID model is that, in addition to the quality characteristics, it also considers the operational behavior of the product to derive the quality requirements [34]. When applying the SQUID model to the Telescience project, a number of practical problems were identified that were inherent in the then-current version of ISO/IEC 9126 [24]. This, along with problems from other models, is likely what caused the shift in more modern quality models (such as Quamoco and QATCH) to the ISO/IEC 25010 standard.

Factor-Strategy Quality Model

The goal of the Factor-Strategy Quality model is to demonstrate that the gap between qualitative and quantitative statements can be bridged [28]. This model is defined by quality factors being expressed as a set of quantifiable rules that identify violations of design principles, rules, and heuristics [28]. This is similar to the PIQUE model, because PIQUE also decomposes quality factors into principles, rules, and heuristics (Product Factors, Measures, and Diagnostics). The Factor-Strategy model makes the construction of the quality model easier because the quality of the design is naturally and explicitly linked to the principles below it [28]. It also allows the quality factors within the model to be described in a concrete manner with respect to the programming paradigm [28]. This model introduces the concept of detection strategies in the quality model to formulate good design rules and heuristics in a quantifiable manner [34].

DEQUALITE

DEQUALITE, or Design Enhanced QUALITy Evaluation, is a method to build quality models that measure the quality of object-oriented systems by taking into account both their internal attributes and their designs [23]. There are four steps in this method to build a quality model [23]:

1. Identify a set of high-level quality attributes
2. Identify and classify the most significant, tangible, quality-carrying properties of the system
3. Propose a set of axioms for linking product properties to quality attributes
4. Evaluate the model, identify its weaknesses, and either refine it or scrap it and start again

The key elements in the DEQUALITE model are quality attributes, quality metrics, design patterns/design rules, and the product or system [34].

SQALE

SQALE, or Software Quality Assessment Based on Lifecycle Expectations, is a model used to estimate both the quality and the technical debt of an application's source code [27]. The SQALE quality model is used for formulating and organizing the non-functional requirements that relate to code quality [27]. This quality model consists of three hierarchical levels: characteristics, sub-characteristics, and requirements that relate to the source code's internal attributes. The SQALE quality model asks users or stakeholders to establish their own concrete definition of "right code" [27].
This is similar to PIQUE, in which the users or stakeholders must decide which nodes within the hierarchical model structure are of most importance relative to their project under analysis. One area for future work that SQALE has identified is that its users expect a standardized definition of what "right code" is [27]. This could also be something we see in PIQUE as more stakeholders and users start testing our models.

QMOOD

QMOOD, or the Quality Model for Object Oriented Design, is a hierarchical model that defines relations between quality attributes and design properties with the help of equations [14]. QMOOD metrics are subjective in nature, but with the relationship between quality attributes and design properties defined, these quality attributes can be calculated and aggregated [14]. There are four levels in the QMOOD model: Design Quality Attributes, Object Oriented Design Properties, Object Oriented Design Metrics, and Object Oriented Design Components [14]. This is similar to the PIQUE hierarchical model structure, which also has four layers under the root node.

RESEARCH GOALS

Motivation

Modern quality models do not provide adequate support for security-specific characteristics. Many of the current quality models, such as Quamoco, QATCH, and PIQUE, incorporate security as one of the quality aspect nodes, but security is usually evaluated based on only one or a few metrics. Security is a complex concept, so there exists a need for a dedicated security quality model with more advanced evaluation. A security quality model is structured similarly to the PIQUE quality model, with a Total Security Index (TSI), Security Aspects, Product Factors, Measures, and Diagnostics.

Goal Question Metric

Basili [10] established the Goal Question Metric approach to break research goals into the questions that need to be answered to meet each goal and the metrics that need to be measured and gathered to answer those questions. We use this approach to frame our research goal, questions, and metrics for our security quality model. These goals, questions, and metrics are outlined below:

Goal: Design and develop a security quality model that helps stakeholders assess the security of C# source code projects.

• Q1 What attributes in C# source code projects result in a different Total Security Index and different Security Aspect scores? (Chapter 6)
  – M1 Attributes that are tagged in benchmark data (project size in lines of code, project source, and project type)
28 PIQUE-C#-SEC DEVELOPMENT This chapter will detail how we created our own security quality model to address our research goal of designing and developing a security quality model that helps stakeholders assess the security of C# source code projects. This security quality model is the PIQUE- C#-Sec model and was developed by using the PIQUE framework. This chapter explains the development of the PIQUE-C#-Sec model over the course of two sections: gather requirements, and development. Gather Requirements The requirements for the PIQUE-C#-Sec model were gathered by working with our stakeholders directly. Their approval was obtained on design choices such as the static analysis tools used, the Security Aspects of the model, and the linkage between Security Aspect and Product Factor layers. Another component of running the PIQUE-C#-Sec model that had to be achieved was defining a set of benchmark repositories to use for the model derivation. The benchmark repository used for the PIQUE-C#-Sec model contains a combination of open source projects written in C# and closed source projects written in C#. The open source benchmark projects were gathered by searching GitHub1, filtering by the C# language, and sorting by the most stars, or most popular projects [37]. This gathering was conducted in July of 2019, as these were also the benchmark repositories used for PIQUE-C#. This list of open source benchmark projects can be found at the MSUSEL Benchmarks GitHub page2. The closed source benchmark projects were downloaded from our stakeholders’ public 1https://github.com 2https://github.com/MSUSEL/benchmarks/tree/main/csharp-opensource 29 GitLab3 repositories. These closed source projects are similar to what PIQUE-C#-Sec will be used to analyze, so it is important to have closed source projects represented in our benchmark repositories so that they serve as a valid benchmarker for the final model. Development There exists a lack of quality and security standards or guidelines for contractors to follow when writing their code. The development of the PIQUE-C#-Sec model was based on previous work done in the exploratory study PIQUE-C# model and the PIQUE-Bin model [22]. An exploratory study was conducted on developing a hierarchical model that measures the security of a C# project that first focused on a bottom-up model design. This exploratory PIQUE-C# model can be referenced in Appendix A. However, while this hierarchical model allows for great precision with respect to the project repository being analyzed, this specialized PIQUE-C# model is not generalizable. It will not work for any other project, and therefore every project that needs to be analyzed will need to have a model design file manually created. This results in the models being very time-consuming to create and not scalable as the number of projects increases. Therefore, the next and final model design was a top-down approach which allows for more generalizability and maintainability across projects. The PIQUE-C#-Sec model differs from the PIQUE-C# model because the PIQUE- C# model was designed using a bottom-up approach while the PIQUE-C#-Sec model was designed using a top-down approach. Additionally, the PIQUE-C# model was designed to focus on evaluating the quality of source code while the PIQUE-C#-Sec model was designed to focus on evaluating the security quality of source code. 
3https://about.gitlab.com/ 30 This resulted in different static analysis tools being selected that focus on security- related findings, and the nodes in the hierarchical structure of the model changed to focus on security-related Security Aspects, Product Factors, and Measures. The TQI and Quality Aspects from the PIQUE-C# model changed to TSI and Security Aspects in the PIQUE- C#-Sec model. The development of the PIQUE-Bin model resulted in some new technical advancements for the PIQUE-C#-Sec model to utilize. These advancements include implementing weighting between the TQI/Quality Aspect and Quality Aspect/Product Factor layers, linking multiple static analysis tools to the quality model, and developing custom utility functions in the calibration and evaluator classes. However, the PIQUE-Bin model differs from the PIQUE-C#-Sec model because the PIQUE-Bin model was designed to evaluate the security quality of binaries while the PIQUE- C#-Sec model was designed to evaluate the security quality of C# source code. Additionally, PIQUE-C#-Sec differs from PIQUE-Bin in that it utilizes normalization. These differences resulted in different static analysis tools needing to be integrated into each respective model, and the nodes in the lower half of the hierarchical structure of the model (Diagnostic and Measure layers) changed based on the tool output. Additionally, the PIQUE-Bin model mapped its Quality Aspect nodes to the properties from the Microsoft STRIDE model, which can be reviewed in Table 2.1, while the PIQUE- C#-Sec model maps its Security Aspects to both the Microsoft STRIDE model and the ISO/IEC 25010 standard [20] (Figure 2.2). Model Structure and Design The model structure for the PIQUE-C#-Sec model, as mentioned previously, is designed using a top-down approach. A partial view of the PIQUE-C#-Sec model can be seen in Figure 5.1. This is just a snippet from the middle of the model. 31 Figure 5.1: Partial PIQUE-C#-Sec Model At the top level, the root node is Total Security Index (TSI). Below are the Security Aspects which are taken from the Security category in the ISO/IEC 25010 standard [20] and the properties from the Microsoft STRIDE model4. There is some overlap between the two sources, and in total they create seven Security Aspect nodes. All seven nodes are connected to the root node or the TSI node. These Security Aspect nodes consist of confidentiality, integrity, non-repudiation, authenticity, accountability, availability, and authorization. These Security Aspect nodes are defined in Table 5.1. These definitions come from the ISO/IEC 25010 standard [20]. The next layer, or Product Factor nodes, are made up from the rules of the Security 4https://docs.microsoft.com/en-us/azure/security/develop/threat-modeling-tool-threats 32 Code Scan diagnostics5. These Product Factor nodes consist of injection, cryptography, cookies, view state, request validation, password management, and other. One more Product Factor node was added with the addition of Insider6, our second tool, to fit the mapping from Insider’s diagnostics. This additional Product Factor is access control. These Product Factor nodes are defined in Table 5.2. Research was done on all eight of these Product Factor nodes to determine how to link them to the Security Aspect nodes. This work was approved by our stakeholders, which helps validate these design choices. Both of our tools find and report diagnostics, and these tools’ output make up the PIQUE-C#-Sec Diagnostics layer. 
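To illustrate how values flow through this structure, the following is a minimal conceptual R sketch of equal-weight (NaiveWeighter-style) upward aggregation from Product Factor scores to Security Aspect scores and then to the TSI. The node names, values, and links shown are hypothetical; this is an illustration of the layered aggregation idea, not the actual model's edges or the PIQUE implementation.

# Conceptual illustration only: hypothetical Product Factor scores in [0, 1].
product_factors <- c(injection = 0.90, cryptography = 0.85,
                     password_management = 0.80, view_state = 0.95)

# Each Security Aspect averages the Product Factors linked to it (equal incoming weights).
security_aspects <- c(
  confidentiality = mean(product_factors[c("injection", "cryptography")]),
  authenticity    = product_factors[["password_management"]],
  integrity       = mean(product_factors[c("injection", "view_state")])
)

# The root TSI averages its Security Aspects (seven of them in the full model).
tsi <- mean(security_aspects)
tsi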
There are 31 diagnostics found from Security Code Scan and 28 diagnostics found from Insider. While all 59 nodes at the Diagnostic layer are too many to show in a figure, they are listed in Appendix B. The diagnostic IDs and descriptions for the Security Code Scan rules are taken from the Security Code Scan documentation7 and the diagnostic IDs and descriptions for the Insider rules are taken from the Insider documentation8. One layer remains: the Measures layer, which connects the Diagnostics to the Product Factors. The Measures layer is made up of the CWEs9 associated with each Diagnostic. For each Security Code Scan diagnostic, a CWE is linked under the references section of its documentation, and that CWE is the Measure the Diagnostic maps to. Each Insider diagnostic is mapped to its parent CWE; the CWE documentation lists a "ParentOf" relationship for each CWE unless it is a pillar CWE, and none of our CWE Diagnostics are pillar CWEs. An example of each of these tool mappings from the Diagnostic to the Measure layer is shown in Figure 5.2. 5https://security-code-scan.github.io/ 6https://github.com/insidersec/insider 7https://security-code-scan.github.io/#Rules 8https://github.com/insidersec/insider/blob/master/rule/csharp.go 9https://cwe.mitre.org/index.html

Table 5.1: Security Aspect Node Definitions
Confidentiality: Degree to which a product or system ensures that data are accessible only to those authorized to have access
Integrity: Degree to which a system, product or component prevents unauthorized access to, or modification of, computer programs or data
Non-repudiation: Degree to which actions or events can be proven to have taken place so that the events or actions cannot be repudiated later
Authenticity: Degree to which the identity of a subject or resource can be proved to be the one claimed
Accountability: Degree to which the actions of an entity can be traced uniquely to the entity
Availability: Degree to which a system, product or component operates as intended despite the presence of hardware or software faults
Authorization: Individuals, entities, or processes only have access to data and processes they should

Therefore, CWEs make up the PIQUE-C#-Sec model's Measure layer. The CWE list is sponsored by national security organizations such as DHS10, CISA11, and MITRE12. 10https://www.dhs.gov/ 11https://www.cisa.gov/cybersecurity-division 12https://mitre.org/

Table 5.2: Product Factor Node Definitions
Access control: Dictates who is allowed to access and use company information and resources
Injection: Attacker-supplied untrusted input to a program
Cryptography: Prevents unauthorized access to information and keeps data safe
Cookies: Text files with small pieces of data that are used to identify your computer as you use a computer network
View state: Poses a security risk when view state information can be seen directly in the page output source
Request validation: Feature in ASP.NET that examines HTTP requests and determines whether they contain potentially dangerous content
Password management: The most important feature to look for in a password manager is advanced encryption
Other: Encompasses any topics not covered by other Product Factors

PIQUE-C#-Sec Mechanisms In Chapter 3, we defined the mechanisms within PIQUE, including benchmarking, thresholding, weighting, normalizing, and utility functions. Here we will detail which strategies are used for each of these mechanisms.
Thresholding Functions: PIQUE offers a NaiveBenchmarker and a BinaryBenchmarker. The NaiveBenchmarker uses the lowest and highest of each Measure's benchmark values (the minimum and maximum) as the threshold values. The BinaryBenchmarker uses the mean plus or minus the standard deviation of each Measure's benchmark values as the threshold values.

Figure 5.2: Tool mapping examples from the Measure to Diagnostic layer for each of our static analysis tools in the PIQUE-C#-Sec model.

We decided to use the BinaryBenchmarker for PIQUE-C#-Sec because it was the most recently adopted thresholding function at the time that PIQUE-C#-Sec was developed. We expected the function to add value because its calculation is more specific than simply using the minimum and maximum values. However, as we will see with our model validation in Chapter 6, this choice of thresholding function may be limiting the overall range of TSIs our model can produce. By using the NaiveBenchmarker, we would have a wider range of threshold values in our model, which could allow for greater flexibility and a more accurate reflection of TSI values.

Weighting Strategy: PIQUE offers a NaiveWeighter and a BinaryCWEWeighter. The NaiveWeighter sets each node's incoming edge weights equal to the average of all incoming edges. The BinaryCWEWeighter is a strategy that completes the edge weighting with comparison matrices. We are using the NaiveWeighter for PIQUE-C#-Sec based on our stakeholders' request. We asked our stakeholders if they would like to rank the Security Aspects and Product Factors, and we also offered to rank those nodes for them. However, they requested that the nodes be left equally weighted. If this request changes in the future, the BinaryCWEWeighter can be used as the weighting strategy. It is a more effective weighter when the user has clear preferences about which nodes should be ranked as more important than others.

Normalizing Process: PIQUE offers a DefaultNormalizer and a NoNormalizer. The DefaultNormalizer returns the Measure node value divided by the normalizer value (which is the lines of code of the project). The NoNormalizer always returns a value of 1, as no normalization occurs. PIQUE-C#-Sec uses the DefaultNormalizer because we are dealing with C# source code, so every project being evaluated has a measurable size in lines of code. It therefore makes sense to normalize in this model so that the differing sizes of the projects being evaluated are taken into account; the size of the system directly influences the number of findings that occur.

Utility Functions: PIQUE offers three utility functions: DefaultUtility, GAMUtilityFunction, and GaussianUtilityFunction. The DefaultUtility provided by PIQUE uses linear interpolation. For each Diagnostic in the PIQUE-C#-Sec model, we created a histogram of that Diagnostic's number of findings against the frequency of those findings. We then went through each Diagnostic and visually observed the histogram to decide which utility function was the best fit for that Diagnostic's plot. Out of PIQUE-C#-Sec's 59 Diagnostic nodes, 25 had non-zero findings, so plots could be created for these 25 Diagnostics. Of those 25, 9 were mapped to the DefaultUtility based on histogram shape, and the remaining 16 were mapped to the GAMUtilityFunction.
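To make these mechanisms concrete, the following is a minimal R sketch with invented numbers showing how the two thresholding strategies, the lines-of-code normalization, and a linear-interpolation utility function would operate on a single Measure. It illustrates the calculations described above rather than the actual PIQUE classes, and the orientation (fewer normalized findings mapping to a higher score) is our assumption.

# Hypothetical findings counts for one Measure across a small benchmark repository,
# and each project's size in lines of code (both vectors are invented for illustration).
findings <- c(0, 2, 5, 1, 0, 8, 3)
loc      <- c(1200, 5400, 20000, 800, 150, 90000, 4300)

m <- findings / loc                               # DefaultNormalizer-style: findings per line of code

naive_thresholds  <- range(m)                     # NaiveBenchmarker-style: minimum and maximum
binary_thresholds <- mean(m) + c(-1, 1) * sd(m)   # BinaryBenchmarker-style: mean +/- standard deviation

# Linear-interpolation utility (DefaultUtility-style): score 1 at or below the lower
# threshold, 0 at or above the upper threshold, interpolated linearly in between.
utility <- function(x, thresholds) {
  pmin(pmax((thresholds[2] - x) / (thresholds[2] - thresholds[1]), 0), 1)
}

utility(2 / 4300, naive_thresholds)               # a new project's normalized value, scored both ways
utility(2 / 4300, binary_thresholds)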
Since the GAMUtilityFunction was the mapping for most of our observed Diagnostics, we also assigned our zero finding Diagnostics to the GAMUtilityFunction as well. Tools This subsection will detail the process of filtering through static analysis tools and describing this selection process. The effort to obtain and install these tools can be seen as excessive by software developers or support teams [26]. Configuring tools so that the output produced is useful can also be an issue [26]. Some tools have too much documentation, while some tools have little to no documentation, and both are issues when trying to identify tools to use in the model [26]. Whereas the exploratory PIQUE-C# model used Roslynator and the other tools that Roslynator allows to run through its interface (Security Code Scan and VS-Threading), the PIQUE-C#-Sec model only links findings from Security Code Scan13 and Insider14. PIQUE-C#-Sec removed Roslynator as a tool to find diagnostics because Roslynator and VS-Threading diagnostics focused on code quality, while Security Code Scan and Insider 13https://security-code-scan.github.io/ 14https://github.com/insidersec/insider 38 focus on security-related findings. Neither Roslynator nor VS-Threading have findings that are related to security. Roslynator has findings that relate to quality attributes (functional suitability, performance, compatibility, usability, reliability, maintainability, portability) which is why it was chosen for the PIQUE-C# quality model. However, Roslynator does report lines of code, which the PIQUE-C#-Sec model still utilizes to complete its normalization. While Roslynator is used to report the lines of code, it does not report any diagnostics for the PIQUE-C#-Sec model. Measures contain a method for normalizing the diagnostic values, and this process is done through a utility function [22]. Normalization is calculated after the measures are derived from a benchmark repository and before the evaluation. The PIQUE-C#-Sec model uses lines of code to normalize the diagnostic values. This divides the value of the node by the lines of code, which normalizes each node by the lines of code in the system [37]. Roslynator counts the physical lines of code in the specified project or solution15. This differs from logical lines of code because physical lines of code include the number of lines of code read in the source code, while logical lines of code include the total number of instructions. This means that for logical lines of code, there could be several instructions per physical line of code. We used two static analysis tools in the PIQUE-C#-Sec model to cover more security vulnerabilities and increase the validity of the model. Specific criteria were used when searching for static analysis tools. For the PIQUE- C#-Sec model, these criteria were as follows: • The tool is compatible with source code written in the language C# • The tool is free or open source 15https://github.com/JosefPihrt/Roslynator/blob/master/docs/cli/loc-command.md 39 • The tool is a command line tool (not an analyzer/IDE plugin/extension) • The tool has security-related rules or findings • The tool has a way to output results • The tool has good documentation A paper that explored assessing security technical debts included a list of selected static analysis tools that are used to help identify design vulnerabilities in smart contracts [3]. 
The tools the paper explored after filtering out options by a set of criteria were as follows: Slither, SmartCheck, Securify, Mythril, sFuzz, Solhint, Ethlint, and Mythos. However, none of these tools were viable for the PIQUE-C#-Sec model for two main reasons: almost all the tools in the paper were compatible with source code written in Solidity, not C#, and those tools were targeting design vulnerabilities, not security vulnerabilities. Many tools were researched and found from static analysis tool lists. The lists that we researched and investigated were OWASP Source Code Analysis Tools16, NIST Source Code Security Analyzers17, Analysis Tools Static Analysis Tools18, and List of Tools for Static Code Analysis19. Between these four resources and other research (which consisted of two additional tools found from references in the previously found 27 tools), a total of 29 tools were considered after filtering the list by language (C#) and free/open source tools. After investigating these 29 tools, two were chosen for the PIQUE-C#-Sec model. This reduction was due to filtering out the tools based on our other criteria (the tool is a command line tool, has security-related rules or findings, has a way to output results, and has good documentation). 16https://owasp.org/www-community/Source_Code_Analysis_Tools 17https://www.nist.gov/itl/ssd/software-quality-group/source-code-security-analyzers 18https://github.com/analysis-tools-dev/static-analysis 19https://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis 40 The main takeaways after looking at all 29 tools are that there are a lot of analyzers that integrate with IDEs but are not command line tools, and there are a lot of tools advertised as being free or open source when they only are offering a free trial or demo. All 29 tools are listed in Table 5.3, with a legend preceding the table with the detailed reason as to why each tool was ruled out and not used in the PIQUE-C#-Sec model. Table Legend: 1. Analyzer library/IDE extension 2. Tool has not been updated in at least over a year, the repository is marked as archived, or the tool is retiring in the near future 3. Tool does not appear to have security-focused rules/findings 4. Not a command line tool - may be a GUI or application 5. Not fully free/open source; only offers free demo or trial 6. Tool has too little documentation to be of use 7. Tool depends on registering for an organization account 8. Tool’s rules are encompassed by another tool used in the model Additionally, a cell value of “N/A” indicates that the tool is used in the PIQUE-C#-Sec model, and therefore was not ruled out. 41 Table 5.3: Static analysis tools researched for potential use in the PIQUE-C#-Sec model and what criteria it violated. Source Tool Name Reason why it was Ruled Out OWASP Source Code Analysis Tools LGTM 1 Microsoft FxCop 2 Puma Scan Professional 5 PVS-Studio 5 SonarCloud 3 VeraCode 4 VisualCodeGrepper 6 Coverity 5 AppScan 3 Klocwork 5 ShiftLeft 7 Agnitio 6 SonarQube 5 HCL AppScan on Cloud 3 Insider N/A NIST Source Code Security Analyzers Security Code Scan N/A Analysis Tools Static Analysis Tools .NET Analyzers 1 ArchUnitNet 3 code-cracker 1 CSharpEssentials 2 Infer# 1 Roslynator 8 Wintellect.Analyzers 1 42 Table 5.4: Static analysis tools researched for potential use in the PIQUE-C#-Sec model and what criteria it violated continued. 
Source Tool Name Reason why it was Ruled Out List of Tools for Static Code Analysis ConQAT 2 .NET Compiler Platform (Roslyn) 1 Sourcetrail 4 StyleCop 1 Other Research SonarAnalyzer.CSharp 1 Microsoft Security Code Analysis 2 43 MODEL VALIDATION The PIQUE-C#-Sec model has now been fully designed and developed, but we must build trust in the PIQUE-C#-Sec model and its ability to evaluate C# source code and investigate what the model output means. This can be achieved by validating the model. The PIQUE-C#-Sec model has been validated using three approaches. First, we analyzed PIQUE-C#-Sec’s benchmark projects and the attribute data associated with those projects to determine if any attributes in C# source code projects result in a different TSI and different Security Aspect scores. Next, we observed how effective the selected static analysis tools within the PIQUE- C#-Sec model are at measuring and reporting security vulnerabilities by comparing all 59 Diagnostic nodes in our model to the CWE Top 25 Most Dangerous Software Weaknesses for 2021 list. Finally, we analyzed the impact that each single vulnerability within our model had on the TSI and Security Aspect node values. We can see our study design in Figure 6.1. Analysis of Benchmark Data We applied an analysis of the benchmark data to answer Q1 from our Goal Question Metric in the Approach chapter. Q1 asked what attributes in C# source code projects result in a different Total Security Index and different Security Aspect scores? The first step in analyzing the benchmark data involves identifying project characteristics (size in lines of code, type, and source). The type of the benchmark can either be a library or an application. The source of the benchmark can either be open source or closed source. The open source benchmark projects were the same ones used in Chapter 5 from the MSUSEL 44 Figure 6.1: Study Design for our PIQUE-C#-Sec Model Validation Benchmarks GitHub page1. The closed source benchmark projects were downloaded from our stakeholders’ public GitLab2 repositories. To summarize some descriptive statistics from the lines of code attribute of the benchmark projects, a total of 72 projects were used, with sizes ranging from 11 to 286,151 lines of code. Alves [5] presents a methodology for the calibration of mappings from code- level measurements to system-level ratings and in this approach, also measures size using 1https://github.com/MSUSEL/benchmarks/tree/main/csharp-opensource 2https://about.gitlab.com/ 45 lines of code. The total lines of code for all benchmark projects was 1,747,620 lines of code. Of the 72 benchmarks, 26 are open source and 46 are closed source. Additionally, of the 72 benchmarks, 42 were of type library and 30 were of type application. Appendix D shows a summary of all these results by detailing each project in the benchmark repository, their size in lines of code, their source, and their type. The project names of the closed source projects have been renamed so as not to compromise any sensitive information from the stakeholder that provided us access to these projects. The first plot that was created with this benchmark attribute data was a plot to observe the TSI of each benchmark project compared with that project’s size in lines of code. This plot is shown in Figure 6.2. Since we used the DefaultNormalizer in PIQUE, this means that the TSI is normalized with respect to the size in lines of code. 
As we can see in Figure 6.2, for the benchmark projects below a certain size, size appears to have little to no influence on the project's TSI. This suggests that for projects below that size, either the PIQUE-C#-Sec model tools are not detecting many vulnerabilities, or the projects do not contain many vulnerabilities because of their small size. Additionally, from Figure 6.2, we can see that the lowest TSI we are observing from our benchmark data is a value of around 0.6. This could mean that our benchmark projects are generally of average or good security quality when compared against each other. Based on this theory that all the benchmark projects are representative of good security quality, we could seek out projects of known poor security quality and include them within the benchmark repository to see how they compare against the projects we have already selected. Another approach we could implement to try to observe TSI scores closer to 0.0 would be to use the NaiveBenchmarker instead of the BinaryBenchmarker. Taking the minimum and maximum values as the thresholds (NaiveBenchmarker) allows for more range than taking the mean plus and minus the standard deviation (BinaryBenchmarker), so this could result in node values aggregating upward into a lower TSI score.

Figure 6.2: Observing the TSI of each benchmark project compared with that project's size in lines of code.

We wanted to break Figure 6.2 out into smaller subsets of the other measured attributes. We created a plot to show the effect of each combination of the categorical attributes as a function of TSI and size. Figure 6.3 shows this data by separating it into four quadrants: closed source projects categorized as libraries (top left), closed source projects categorized as applications (bottom left), open source projects categorized as libraries (top right), and open source projects categorized as applications (bottom right).

Figure 6.3: Observing Each Attribute Subset as a Function of TSI and Size

After looking at Figure 6.3, what we see is that the open source software appears fundamentally different from the closed source software based on the distribution of the data. The two plots on the right half of Figure 6.3 (the open source projects) have points ranging across the plot, and we can see that it is possible to fit a linear regression line to the data. However, in the two plots on the left half of Figure 6.3 (the closed source projects), all points are clustered in the top left corner of the plot and do not appear to fit a linear regression line. The justification for analyzing open and closed source data separately is that they come from two different populations. We can see from Figure 6.3 that there was not a lot of variation in the closed source project points. The projects primarily were of small size and high TSI. This goes back to our earlier observation that for projects below a certain size, size appears to have little to no influence on the project's TSI. We have created histograms for the frequency of the TSIs in open source projects (Figure 6.4) and in closed source projects (Figure 6.5).

Figure 6.4: Histogram of Open Source Project TSIs
Figure 6.5: Histogram of Closed Source Project TSIs

From these two histograms (Figure 6.4 and Figure 6.5), we can see that the distributions are fundamentally different. The differences between the two histograms show that we clearly do not have normally distributed data.
Therefore, we decided to subset our attribute data by either open or closed source. We have fit a linear model to both the open and closed source data to observe whether any attributes have a significant effect on the TSI. Before we begin fitting these linear models, we need to define our research question and hypotheses. Our research question is: What attributes in C# source code projects result in different security vulnerabilities and a different Total Security Index? To answer this question, our null hypotheses are as follows:

• $H1_0$: There is no correlation between the size (lines of code) of the C# project under analysis and the TSI.
• $H2_0$: There is no correlation between the type (library or application) of the C# project under analysis and the TSI.
• $H3_0$: There is no correlation between the interaction between size and type of the C# project under analysis and the TSI.

Finding a significant correlation (p < 0.05) between an attribute or the interaction of attributes and the TSI will lead us to reject the null hypothesis for that attribute or interaction of attributes [22]. To test our three null hypotheses stated above, we created a linear model for open source projects that models the TSI as a function of size, type, and their interaction. We are subsetting the data to only include projects that are open source. This linear model is as follows:

$\widehat{TSI} = \beta_0 + \beta_1 \cdot size + \beta_2 \cdot type + \beta_3 \cdot (size \times type)$, (6.1)

where $\beta_0$ is the y-intercept and $\beta_1$, $\beta_2$, and $\beta_3$ are regression coefficients. This linear model is represented as follows in R code:

lm(TSI ~ Size * Type, data = attribute[attribute$Source == "Open",])

After fitting the above linear regression model, we must assess to what degree the assumptions are violated, since the assumptions can never be fully met. The diagnostic plots for this model are shown in Figure 6.6.

Figure 6.6: Open Source Linear Model Diagnostic Plots

From the context of our research question, we know that we have not violated independence to a great degree. There is some evidence against normality. From the Q-Q plot in Figure 6.6, we see standardized residual values well below -2. The points do seem to generally follow the dashed line in the Q-Q plot, with a bit of left (negative) skew. Since the distribution has negative skew, it is asymmetric with a longer left tail [12]. When looking at our histogram for the frequency of TSIs in open source projects (Figure 6.4), we can see a longer left tail: most of the data is concentrated on the far right side of the histogram, with values trailing through the left tail. There is also some evidence against linearity and equal variance, as neither the residuals vs. fitted plot nor the scale-location plot follows a strictly horizontal line. However, both lines overall stay close to horizontal. Additionally, our open source models contain only 26 repositories. This relatively small sample size could explain some of the randomness in these plots. Our model should be robust enough to allow for a small amount of evidence against the assumptions [22]. Now that we have determined that the assumptions have not been violated to a great degree, we fit the linear model to observe whether there are any significant variables. We have created a plot that maps the fitted regression lines onto the data, separating the open source project data into their two respective types: library and application. This is shown in Figure 6.7.
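For reference, the coefficient p-values and adjusted R-squared reported below, as well as the diagnostic plots in Figure 6.6, can be produced from the fitted model object with standard R calls. A minimal sketch, assuming the same attribute data frame used in the model call above:

# Fit the open source model and inspect it (standard R; the attribute data frame is assumed
# to contain the TSI, Size, Type, and Source columns used above).
open_fit <- lm(TSI ~ Size * Type, data = attribute[attribute$Source == "Open",])

par(mfrow = c(2, 2))
plot(open_fit)       # residuals vs. fitted, Q-Q, scale-location, residuals vs. leverage (Figure 6.6)

summary(open_fit)    # coefficient estimates, p-values (Table 6.1), and adjusted R-squared
confint(open_fit)    # confidence intervals for the intercept and slope estimates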
Figure 6.7: TSI for Different Attribute Types in Open Source Projects

As we can see in the plot, our y-intercept for open source projects of type application is 0.89 and our y-intercept for open source projects of type library is 0.88. This means that given an open source project of size 0 lines of code, the regression lines predict an average TSI of 0.89 for open source projects of type application and an average TSI of 0.88 for open source projects of type library. In the context of our model, we would not have a source code project with 0 lines of code. Therefore, the y-intercepts do not have practical meaning; they simply indicate where our regression lines cross the y-axis. The slope for type application projects is steeper than the slope for type library projects, with the slope for type application being -7.129e-7 and the slope for type library being -1.864e-7. These slopes indicate that for every 100,000 additional lines of code in an open source project, the predicted TSI decreases by about 0.07 in type application projects and by about 0.02 in type library projects.

After fitting our linear model for open source projects, we can see from Table 6.1 that there is only one variable with a p-value less than 0.05. It is denoted with an asterisk (*) after the value within Table 6.1 for easy identification. Table 6.1 contains our linear model coefficient p-values from open source projects, separated into columns based on the model coefficient: intercept, size, type, and the interaction between size and type. When looking at the linear regression results in open source projects for the size attribute, we found that there is strong evidence against the null hypothesis that there is no correlation between the size (lines of code) of the C# project under analysis and the TSI (p-value = 0.0373).

Table 6.1: TSI Linear Model Coefficient P-Values from Open Source Projects
     Intercept  Size     Type    Size:Type
TSI  2.00E-16   0.0373*  0.9392  0.2435

When looking at the linear regression results in open source projects for the type attribute, we found that there is little to no evidence against the null hypothesis that there is no correlation between the type (library or application) of the C# project under analysis and the TSI (p-value = 0.9392). Finally, when looking at the linear regression results in open source projects for the interaction between the size and type attributes, we found that there is little to no evidence against the null hypothesis that there is no correlation between the interaction between size and type of the C# project under analysis and the TSI (p-value = 0.2435). Based on these results, we may therefore reject the null hypothesis $H1_0$, but we may not reject $H2_0$ and $H3_0$ for our open source linear models. Additionally, we obtained an adjusted R-squared value of 0.1197, which means that our linear model on open source projects accounts for 11.97% of the variance.

To test our three null hypotheses, we created a linear model for closed source projects that models the TSI as a function of size, type, and their interaction. We are subsetting the data to only include projects that are closed source. This linear model is as follows:

$\widehat{TSI} = \beta_0 + \beta_1 \cdot size + \beta_2 \cdot type + \beta_3 \cdot (size \times type)$, (6.2)

where $\beta_0$ is the y-intercept and $\beta_1$, $\beta_2$, and $\beta_3$ are regression coefficients.
This linear model is represented as follows in R code:

lm(TSI ~ Size * Type, data = attribute[attribute$Source == "Closed",])

After fitting the above linear regression model, we assessed to what degree the assumptions are violated, since the assumptions can never be fully met. The diagnostic plots for this model are shown in Figure 6.8.

Figure 6.8: Closed Source Linear Model Diagnostic Plots

From the context of our research question, we know that we have not violated independence to a great degree. And from these plots, we can conclude that neither normality nor linearity appears to have been violated to a great degree. The points seem to generally follow the dashed line in the Q-Q plot, with a bit of left (negative) skew. Since the distribution has negative skew, it is asymmetric with a longer left tail [12]. When looking at our histogram for the frequency of TSIs in closed source projects (Figure 6.5), we can see a longer left tail: most of the data is concentrated on the far right side of the histogram, with values trailing through the left tail. There is some evidence against equal variance, as the scale-location plot does not follow a horizontal line. Our model should be robust enough to allow for a small amount of evidence against the assumptions [22]. Now that we have determined that the assumptions have not been violated to a great degree, we fit the linear model to observe whether there are any significant variables. We have created a plot that maps the fitted regression lines onto the data, separating the closed source project data into their respective types: library and application. This is shown in Figure 6.9.

Figure 6.9: TSI for Different Attribute Types in Closed Source Projects

As we can see in the plot, our y-intercept for closed source projects of type application is 1.0 and our y-intercept for closed source projects of type library is 0.997. This means that given a closed source project of size 0 lines of code, the regression lines predict an average TSI of 1.0 for closed source projects of type application and an average TSI of 0.997 for closed source projects of type library. In the context of our model, we would not have a source code project with 0 lines of code. Therefore, the y-intercepts do not have practical meaning; they simply indicate where our regression lines cross the y-axis. The slope for type library projects is steeper than the slope for type application projects, with the slope for type application being -4.203e-8 and the slope for type library being -5.913e-7. These slopes indicate that for every 10,000 additional lines of code in a closed source project, the predicted TSI decreases by about 0.0004 in type application projects and by about 0.006 in type library projects. After fitting our model and looking at Figure 6.9, we decided that it did not make sense to run statistics on this subset of data. Instead, we found it more interesting to look at the points in Figure 6.9 with low TSIs and find out what potential differences there are in those projects' source code. As we can see in the plot, nearly all the closed source project TSIs are very close to 1.0.
We have three points in our plot below this TSI (the Project names are referenced from Appendix D): Table 6.2: Attributes for Closed Source Projects with Low TSIs Project Name TSI Size (loc) Type 41 0.960 549 Application 42 0.942 1731 Library 43 0.993 2701 Application We can rule out project type being the difference in these three projects compared to the rest of the closed source projects, as these three projects range in type. They also all are about in the middle of the range for size of closed source projects (11 to 13,654 lines of code for closed source projects) so they are not the smallest or largest projects in size. 57 We observed the source code to see if these three projects had more C# files to be analyzed than the other projects, but this did not appear to be the case (with many other closed source projects containing more and less C# files than these three). We also observed the number of external library calls in multiple closed source projects, but again, our three projects in Table 6.2 were not unique in this way either. The three project names are sequential in order, which indicates that they are similar to each other alphabetically. This is an interesting finding because the fact that all three projects are alphabetically sequenced means that they all are projects that relate to similar topics. Without revealing too much about our stakeholder’s repositories, all three repositories are named after either security or reports. This indicates that the closed source repositories that relate to security and reports may be more prone to lower TSIs. This is based on our initial observations and there is the potential for other similarities between these projects besides this to be the reason for their low TSIs. Now that we have created linear models for open and closed source projects to observe potential attribute effects on the TSI, we now want to look at the seven Security Aspect nodes to observe if there are any significant findings. We will only spend time discussing the attributes in each Security Aspect’s linear model that had a significant correlation (p < 0.05) between the attribute or interaction of attributes and the TSI. The coefficient p-values for each variable in the linear model for each Security Aspect for the open source projects is in Table 6.3, and the coefficient p-values for each variable in the linear model for each Security Aspect for the closed source projects is in Table 6.4. Table 6.3 and Table 6.4 contain our linear model coefficient p-values from open and closed source projects, respectively. These are separated into columns based on the coefficient: intercept, size, type, and the interaction between size and type. As we can see from Table 6.3, there are only two variables across the seven Security Aspect nodes that have p-values less than 0.05. These are the variables of size for authenticity 58 Table 6.3: Security Aspect Linear Model Coefficient P-Values from Open Source Projects Intercept Size Type Size:Type Authenticity 2.00E-16 0.00524* 0.88011 0.12034 Availability 5.82E-16 0.801 0.392 0.713 Authorization 2.52E-15 0.00619* 0.51575 0.05339 Confidentiality 2.10E-15 0.11 0.805 0.296 Accountability 2.00E-16 0.0779 0.7944 0.274 Non-repudiation 3.79E-14 0.0606 0.7701 0.506 Integrity 2.00E-16 0.138 0.996 0.386 with a p-value of 0.00524, and size for authorization with a p-value of 0.00619. They are denoted with an asterisk (*) after the value within Table 6.3 for easy identification. 
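The per-aspect p-values in Table 6.3 (and, analogously, Table 6.4 for the closed source subset) can be generated by re-fitting the same model once per Security Aspect. A minimal sketch, assuming the attribute data frame carries one column per Security Aspect score; the column names here are illustrative:

# Re-fit TSI-style models with each Security Aspect score as the response variable.
# The aspect column names below are assumed; adjust them to match the actual data frame.
aspects <- c("Authenticity", "Availability", "Authorization", "Confidentiality",
             "Accountability", "NonRepudiation", "Integrity")
open <- attribute[attribute$Source == "Open",]

pvals <- t(sapply(aspects, function(a) {
  fit <- lm(reformulate("Size * Type", response = a), data = open)
  summary(fit)$coefficients[, "Pr(>|t|)"]   # intercept, Size, Type, Size:Type p-values
}))
round(pvals, 5)                             # cf. Table 6.3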
The linear models for authenticity and authorization also have higher adjusted R- squared values than the TSI linear model, being 0.2715 and 0.2145 respectively. This means that our linear model for authenticity accounts for 27.15% of the variation in that model, and our linear model for authorization accounts for 21.45% of the variation in that model. As we can see from Table 6.4, there are also two variables across the seven Security Aspect nodes that have p-values less than 0.05. These are the variables of type for availability with a p-value of 0.0362, and type for accountability with a p-value of 0.0362. They are denoted with an asterisk (*) after the value within Table 6.4 for easy identification. The linear models for availability and accountability do not have very high adjusted R-squared values, both at 0.0576. This means that our linear models for availability and accountability accounts for 5.76% of the variation in the models. 59 Table 6.4: Security Aspect Linear Model Coefficient P-Values from Closed Source Projects Intercept Size Type Size:Type Authenticity 2.00E-16 0.444 0.302 0.437 Availability 2.00E-16 0.3793 0.0362* 0.6618 Authorization 2.00E-16 0.984 0.46 0.463 Confidentiality 2.00E-16 0.998 0.868 0.601 Accountability 2.00E-16 0.3793 0.0362* 0.6618 Non-repudiation 2.00E-16 0.401 0.284 0.446 Integrity 2.00E-16 0.984 0.46 0.463 Count-Based Analysis of Chosen Tools This section of the PIQUE-C#-Sec model validation is conducted to answer Q2 from our Goal Question Metric in the Approach chapter. Q2 asked how effective are the selected static analysis tools at measuring and reporting security vulnerabilities from the CWE Top 25 Most Dangerous Software Weaknesses for 2021 list? We can use all the PIQUE-C#-Sec model nodes from our model description file and compare those against the CWE Top 25 Most Dangerous Software Weaknesses for 2021 list3 to get an idea for how many of the top 25 CWEs of 2021 our static analysis tools and model structure provide coverage for. The CWE Top 25 Most Dangerous Software Weaknesses for 2021 is a list by MITRE of the most common and impactful issues experienced over the previous two calendar years according to CWE’s documentation4. These weaknesses are dangerous because they are often easy to find and exploit. They can allow adversaries to completely take over a system, steal 3https://cwe.mitre.org/top25/archive/2021/2021_cwe_top25.html 4https://cwe.mitre.org/top25/archive/2021/2021_cwe_top25.html 60 data, or prevent an application from working. This list is a valuable community resource that can help users, testers, developers, researchers, project managers, security researchers, and educators provide insight into current and the most severe security weaknesses according to the CWE documentation5. The CWE Top 25 Most Dangerous Software Weaknesses for 2021 list is shown in Appendix C. There is an additional column indicating whether that CWE is represented as either a Diagnostic or Measure node within the PIQUE-C#-Sec model. If the CWE from the table is directly in the model, we indicate that with a “Yes” response in the column. We also traced each Top 25 CWE’s relationships in the CWE documentation to see if it has a “ParentOf”, “ChildOf”, or “CanPrecede” relationship with any nodes in our PIQUE-C#-Sec model. If this was the case, this is indicated with a “By relation” response in the column. The number of “Yes” responses totaled to 13. 
This means that our PIQUE-C#-Sec model has direct coverage for 52% of the CWE Top 25 Most Dangerous Software Weaknesses for 2021 list. When we add the “By relation” responses to this count, it increases to 19 which covers 76% of the Top 25 list. Having this level of coverage over the CWE Top 25 Most Dangerous Software Weaknesses for 2021 list indicates that our selected static analysis tools are effective at measuring and reporting security vulnerabilities since our tools cover over a majority of the Top 25 list. Sensitivity to Single Diagnostics We applied an analysis of the sensitivity to single diagnostics within the PIQUE-C#-Sec model to answer Q3 from our Goal Question Metric in the Approach chapter. Q3 asked what is the impact on the Total Security Index and Security Aspects for each single vulnerability? 5https://cwe.mitre.org/top25/archive/2021/2021_cwe_top25.html 61 To identify the impact that each vulnerability in the PIQUE-C#-Sec model has on the overall TSI and Security Aspects, we first run PIQUE-C#-Sec on a C# source code project. We observe the TSI and Security Aspect node values for this evaluation. Next, we take each of the Diagnostics in the PIQUE-C#-Sec model, as each represents its own vulnerability, and we add one instance of that diagnostic to the project’s output file. Now that the PIQUE-C#-Sec model has evaluated the project with one extra diagnostic injected manually, we record the change in TSI and change in each of the Security Aspect values. This will allow us to observe any potential impacts of each Diagnostic. This method of measuring single vulnerability impacts does not truly inject that vulnerable source code into the project; rather, we are modifying the output file to appear the way it would if our two static analysis tools would have found that vulnerability. This is because we want to measure the impact that finding those vulnerabilities would have on the PIQUE-C#-Sec node values if the vulnerabilities were found. For this section of the model validation, we are not trying to measure whether our static analysis tools can find the vulnerabilities. We are interested in ensuring that the PIQUE- C#-Sec model can recognize these vulnerabilities from the output files if they were to be found and perform accordingly. PIQUE-C#-Sec evaluated a closed source project which we will refer to as Project 42. For each of the 59 Diagnostics within the PIQUE-C#-Sec model, a PIQUE-C#-Sec model output file was created as if that vulnerability had been injected into Project 42. This allows us to measure the impact that each vulnerability has on the PIQUE-C#-Sec model nodes without conducting a true injection. Figure 6.10 shows the change in TSI for each individual vulnerability within the PIQUE- C#-Sec model when evaluating Project 42. The most impactful Diagnostics when measuring the change in TSI were CWE-554, SCS0017, SCS0021, SCS0022, SCS0023, SCS0024, and SCS0030. These Diagnostic descriptions are in Table 6.5. 62 It is also important to note that the bar heights in Figure 6.10 are not additive and do not sum to 1. If this was the case, it would be possible to receive negative TSIs which would not make sense for our model. 
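The injection-and-re-evaluation procedure described above can be summarized as a small loop. The sketch below is illustrative only: run_pique() and add_one_finding() are hypothetical helpers standing in for invoking the PIQUE-C#-Sec evaluation on a (possibly modified) tool output file and for appending one instance of a given Diagnostic to that file; neither is part of the actual tooling.

# Hypothetical helpers (not part of the real tooling): run_pique() evaluates an output file
# and returns the TSI and Security Aspect node values; add_one_finding() writes a copy of
# the file with one extra instance of the given Diagnostic.
baseline <- run_pique("project42_output.json")

delta_tsi <- sapply(diagnostics, function(d) {         # 'diagnostics' holds all 59 Diagnostic IDs
  injected <- run_pique(add_one_finding("project42_output.json", d))
  baseline[["TSI"]] - injected[["TSI"]]                # drop in TSI caused by one finding of d
})

barplot(sort(delta_tsi, decreasing = TRUE), las = 2)   # cf. Figure 6.10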
Table 6.5: Most Impactful Diagnostics on the TSI Diagnostic Description CWE-554 ASP.NET Misconfiguration: Not Using Input Validation Framework SCS0017 Request Validation Disabled (Attribute) SCS0021 Request Validation Disabled (Configuration File) SCS0022 Event Validation Disabled SCS0023 View State Not Encrypted SCS0024 View State MAC Disabled SCS0030 Request Validation is Enabled Only for Pages (Configuration File) These seven Diagnostics are the only ones in the PIQUE-C-Sec model that eventually aggregate upward into three Product Factors. They all aggregate upward into View State, Request Validation, and Other. Most of the other Diagnostics only aggregate upward into one Product Factor, and several aggregate upward to two Product Factors. These seven Diagnostics aggregate upward to CWE-1173 at the Measure layer, which is Improper Use of Validation Framework6. CWE-1173 is also a listed member of the OWASP Top Ten 2021 list in the category of Insecure Design for 20217. As a member of the OWASP Top Ten 2021 list, it makes sense that this CWE-1173 would aggregate upward to three Product Factors and impact the TSI so greatly. The next seven figures show a similar view, but for each individual Security Aspect 6https://cwe.mitre.org/data/definitions/1173.html 7https://cwe.mitre.org/data/definitions/1348.html 63 Figure 6.10: Single vulnerability impacts on TSI using the diagnostics in the PIQUE-C#-Sec model to measure the change in TSI. (Authenticity, Availability, Authorization, Confidentiality, Accountability, Non-repudiation, and Integrity). Figure 6.11 shows the change in authenticity for each individual vulnerability within the PIQUE-C#-Sec model when evaluating Project 42. The most impactful Diagnostics when measuring the change in authenticity were CWE-259, CWE-521, SCS0015, SCS0032, SCS0033, and SCS0034. These Diagnostic descriptions are in Table 6.6. All these six Diagnostics fall under the Product Factor node Password Management. Therefore, it makes sense that these Diagnostics would impact Authenticity since Password Management links to the Security Aspect node Authenticity. Figure 6.12 shows the change in availability for each individual vulnerability within the PIQUE-C#-Sec model when evaluating Project 42. The most impactful Diagnostics when 64 Table 6.6: Most Impactful Diagnostics on Authenticity Diagnostic Description CWE-259 Use of a Hard-coded Password CWE-521 Weak Password Requirements SCS0015 Hardcoded Password SCS0032 PasswordRequiredLength Too Small SCS0033 Password Complexity SCS0034 Password RequiredLength Not Set Figure 6.11: Single vulnerability impacts on authenticity using the diagnostics in the PIQUE- C#-Sec model to measure the change in authenticity. 65 measuring the change in availability were CWE-554, SCS0017, SCS0021, SCS0022, SCS0023, SCS0024, and SCS0030. Figure 6.12: Single vulnerability impacts on availability using the diagnostics in the PIQUE- C#-Sec model to measure the change in availability. The seven Diagnostic nodes that have the greatest impact on Availability are also the same seven that have the greatest impact on TSI. This makes sense because as mentioned earlier, these seven Diagnostics map to the Product Factors View State, Request Validation, and Other. These three Product Factors aggregate upward to be three of the four total Product Factors mapped to Availability (with the other being Injection). 66 Figure 6.13 shows the change in authorization for each individual vulnerability within the PIQUE-C#-Sec model when evaluating Project 42. 
The most impactful Diagnostics when measuring the change in authorization was CWE-12 (ASP.NET Misconfiguration: Missing Custom Error Page). When taking a closer look at CWE-12’s documentation8, we can see that it is a member of the OWASP Top Ten 2004 for the category of Insecure Configuration Management 9. As a member of the OWASP Top Ten 2004 list, it makes sense that this CWE-12 would have an impact on Authorization. 8https://cwe.mitre.org/data/definitions/12.html 9https://cwe.mitre.org/data/definitions/731.html 67 Figure 6.13: Single vulnerability impacts on authorization using the diagnostics in the PIQUE-C#-Sec model to measure the change in authorization. Figure 6.14 shows the change in confidentiality for each individual vulnerability within the PIQUE-C#-Sec model when evaluating Project 42. The most impactful Diagnostics when measuring the change in confidentiality were CWE-554, SCS0017, SCS0021, SCS0022, SCS0023, SCS0024, and SCS0030. The seven Diagnostic nodes that have the greatest impact on Confidentiality are also the same seven that have the greatest impact on TSI. 68 Figure 6.14: Single vulnerability impacts on confidentiality using the diagnostics in the PIQUE-C#-Sec model to measure the change in confidentiality. Figure 6.15 shows the change in accountability for each individual vulnerability within the PIQUE-C#-Sec model when evaluating Project 42. The most impactful Diagnostics when measuring the change in accountability were CWE-554, SCS0017, SCS0021, SCS0022, SCS0023, SCS0024, SCS0030, CWE-611, SCS0007, SCS0011, CWE-11, CWE-129, SCS0012, and SCS0027. The first seven Diagnostics are the same as in Table 6.5, but the next seven Diagnostics descriptions are in Table 6.7. These Diagnostics are mapped to the Product Factor Other. Accountability is only mapped to one Product Factor node, which is Other. This explains why these Diagnostics impact accountability so greatly. 69 Table 6.7: Most Impactful Diagnostics on Accountability Diagnostic Description CWE-611 Improper Restriction of XML External Entity Reference SCS0007 XML eXternal Entity Injection SCS0011 Unsafe XSLT Setting Used CWE-11 ASP.NET Misconfiguration: Creating Debug Binary CWE-129 Improper Validation of Array Index SCS0012 Controller Method is Potentially Vulnerable to Authorization Bypass SCS0027 Open Redirect Figure 6.15: Single vulnerability impacts on accountability using the diagnostics in the PIQUE-C#-Sec model to measure the change in accountability. 70 Figure 6.16 shows the change in non-repudiation for each individual vulnerability within the PIQUE-C#-Sec model when evaluating Project 42. The most impactful Diagnostics when measuring the change in non-repudiation were SCS0006 (Weak Hashing Function), SCS0010 (Weak Cipher Algorithm), and SCS0013 (Potential Usage of Weak CipherMode Mode). This makes sense because all these Diagnostics fall under the Product Factor Cryptography, which aggregates upward to the Security Aspect Non-Repudiation. Figure 6.16: Single vulnerability impacts on non-repudiation using the diagnostics in the PIQUE-C#-Sec model to measure the change in non-repudiation. Figure 6.17 shows the change in integrity for each individual vulnerability within the PIQUE-C#-Sec model when evaluating Project 42. The most impactful Diagnostics when measuring the change in integrity were CWE-554, SCS0017, SCS0021, SCS0022, SCS0023, SCS0024, and SCS0030. 71 Figure 6.17: Single vulnerability impacts on integrity using the diagnostics in the PIQUE- C#-Sec model to measure the change in integrity. 
The seven Diagnostic nodes that have the greatest impact on Integrity are also the same seven that have the greatest impact on TSI. Now that we have visualized the impacts of single vulnerabilities for the TSI and each Security Aspect, we will isolate the Diagnostics that have the greatest impact on Project 42’s TSI and observe in one plot the impact that those Diagnostics have on all the TSI and Security Aspect node values. This is shown in Figure 6.18. The Diagnostics that had the greatest impact were CWE-554, SCS0017, SCS0021, SCS0022, SCS0023, SCS0024, and SCS0030. Figure 6.18 represents the Change in Values for all seven of these Diagnostics. All these seven Diagnostics have the same impact on all our TSI and Security Aspect node values, so Figure 6.18 is representative of all seven Diagnostics. If we made a separate plot for each Diagnostic, they would all appear the 72 same as Figure 6.18. We can see that the impact on availability is the highest (change in availability = 0.403), followed by confidentiality (change in confidentiality = 0.278), followed by TSI (change in TSI = 0.155), and then integrity (change in integrity = 0.153), just to list several of the properties in order. Figure 6.18: Top Vulnerabilities’ Impact on PIQUE-C#-Sec TSI and Security Aspect Values 73 DISCUSSION The two linear models we have fit for open and closed source projects indicate that for open source projects, the correlation between the size (lines of code) of a C# project under analysis and the TSI is statistically significant (p-value = 0.0373). However, type and the interaction between size and type are not statistically significant. This means that when creating a benchmark repository for open source C# source code projects, the size attribute must be considered because of its statistical significance. For closed source projects, we did not run statistical analysis on the data as after a visual observation of the linear regression plot, we decided that statistical analysis would not yield any interesting or significant findings. However, we did look at three closed-source projects that had lower TSIs than the rest of the closed source project population. These three projects are all named after either security or reports. This is interesting because it indicates that the closed source repositories that relate to security and reports may be more prone to lower TSIs. This is based on our initial observations and there is the potential for other similarities between these projects besides this to be the reason for their low TSIs. Our PIQUE-C#-Sec model nodes cover 13 of the CWE Top 25 Most Dangerous Software Weaknesses for 2021 list directly, and 6 additional CWEs with a “ParentOf”, “ChildOf”, or “CanPrecede” relationship. This indicates that our model has adequate coverage over the current top 25 most common and impactful issues (according to the list by MITRE). This level of coverage of current dangerous security weaknesses indicates that the PIQUE-C#-Sec model is a good hierarchical model for users because it is successfully detecting these most dangerous software weaknesses. Additionally, we found that our PIQUE-C#-Sec model TSI is most impacted by the Diagnostics in Table 6.5, which fall under the View State, and Request Validation, and Other Product Factors. This indicates that for the current hierarchical model structure with the 74 NaiveWeighting, stakeholders especially interested in these Product Factors would likely see high impacts in those areas. 
Our work in developing and validating the PIQUE-C#-Sec model is significant because we have created a security quality model that practitioners and researchers alike can use to evaluate source code projects written in C#, further advancing the field of security quality modeling.

Implications The implication for practitioners is that PIQUE-C#-Sec is an effective security quality model to use to help assess the security of C# source code projects. This is especially true in the case of open source projects. Stakeholders can use PIQUE-C#-Sec as a security quality gate when evaluating C# source code written by government contractors to ensure that a threshold TSI value is being met. The implication for academics and researchers is that we have developed and validated the PIQUE-C#-Sec security quality model, and other operationalized PIQUE security quality models or quality models can be built on the PIQUE-C#-Sec technologies for other specific areas of evaluation (for example, source code written in other languages).

Future Work While the development and validation of the PIQUE-C#-Sec model is at a code complete stage, there are several ways in which further work could be done. This includes adding additional C# source code projects to the PIQUE-C#-Sec benchmark repository. It would be interesting to seek out closed source projects of larger size in lines of code, and open and closed source projects of known poor security quality, to start. Adding additional static analysis tools to the PIQUE-C#-Sec model is also a good direction in which to pursue further research. Additionally, while the process of packaging PIQUE-C#-Sec as a jar file has started, further work needs to be done on continuing to test this jar file on stakeholder systems and on deploying the packaged jar file within stakeholder continuous integration and continuous deployment (CI/CD) environments. Finally, further research needs to be conducted on using PIQUE-C#-Sec to evaluate closed source projects. We can add closed source projects of larger size in lines of code to our benchmarks, as mentioned previously, or change the mechanisms that PIQUE-C#-Sec uses (thresholding, weighting, utility functions) to see if any other strategies are a better fit for closed source projects. This may involve continuing to write new custom functions.

THREATS TO VALIDITY Internal Validity In this project, the threats to internal validity were the potential for unaccounted-for confounding variables and the manual tagging of the "type" attribute in the benchmark data. Threats to the internal validity of this study are threats that are of concern when causal relations are examined. When examining whether one factor affects a second factor, there is a risk that the second factor is also affected by a third factor [45]. When this third factor, or confounding variable, is not identified and taken into account, it creates a threat to the internal validity. One threat to the internal validity of this project is that confounding variables we have not accounted for could cause us to draw conclusions about variables that do not truly have an effect on the tools' outputs [22]. One example of this is the manual tagging of the "type" attribute in the benchmark data. For each benchmark project, the documentation and source code were observed to decide whether that project was of type application or library.
THREATS TO VALIDITY

Internal Validity

In this project, the threats to internal validity were the potential for unaccounted-for confounding variables and the manual tagging of the "type" attribute in the benchmark data. Threats to the internal validity of this study are threats that are of concern when causal relations are examined. When examining whether one factor affects a second factor, there is a risk that the second factor is also affected by a third factor [45]. When this third factor, or confounding variable, is not identified and taken into account, it creates a threat to the internal validity. In this thesis, one threat to internal validity could be confounding variables that we have not accounted for, causing us to draw conclusions about variables that do not truly have an effect on the tools' outputs [22]. One example of this is the manual tagging of the "type" attribute in the benchmark data. For each benchmark project, the documentation and source code were examined to decide whether that project was of type application or library. This manual process could have led to some "type" attributes being incorrectly assigned.

External Validity

In this project, the threats to external validity were that the C# source code projects used in the benchmark repository were not collected randomly, that the PIQUE-C#-Sec model Diagnostics do not provide coverage for every security vulnerability that exists, that our model may not be applicable to source code projects that are different from the ones we benchmarked, and that the static analysis tool documentation will change as more findings and rules are added to what the tools can detect. Threats to the external validity of this study concern the extent to which it is possible to generalize findings and how relevant those findings are to people outside the investigated case [45]. To start, one inherent threat to external validity in this thesis is that the C# source code projects collected for the benchmark repository were not collected randomly. There is no way to collect the projects randomly, and we also wanted to collect benchmarks that would represent the types of projects that PIQUE-C#-Sec is expected to evaluate. However, this means that any conclusions we make are limited to the population of projects we collected and cannot be extended to a broader population, since there was no random sampling. Another threat to external validity in this thesis is that the PIQUE-C#-Sec model Diagnostics do not provide coverage for every security vulnerability that exists; there are security vulnerabilities that will not be detected by our PIQUE-C#-Sec model. This threat can be mitigated as the static analysis tools in the PIQUE-C#-Sec model update their documentation and rules to include more known security vulnerabilities over time, and by integrating additional static analysis tools into the model. The benchmarks chosen for the PIQUE-C#-Sec model derivation reflect the type of source code projects that we expect to apply the PIQUE-C#-Sec model to; however, the model may not be applicable to source code projects that differ from the ones we benchmarked. Finally, a threat to external validity in this thesis is the static analysis tool documentation changing as more findings and rules are added to what the tools can detect. If either Security Code Scan or Insider adds vulnerability rules to its documentation, the current PIQUE-C#-Sec model Diagnostics would be out of date with the full scope of what the tools are able to find. This affects the study's ability to generalize to other settings and be replicated by other users and researchers [22].

Construct Validity

In this project, the threats to construct validity were the static analysis tools used; modifying the PIQUE-C#-Sec model output files to make it appear as though vulnerabilities were injected without altering the lines of code of the project; and assessing the assumptions of independence, normality, linearity, and equal variance for our linear regression models. Threats to the construct validity of this study reflect the extent to which the operational measures that are studied really represent what the researcher has in mind and the research questions [45]. A threat to construct validity can arise when data, measurements, or questions are not clear or do not measure what they are intended to measure.
One threat to construct validity in this thesis is the set of static analysis tools used. The static analysis tools that we selected to integrate into our PIQUE-C#-Sec model may not be representative of other similar tools, and other tools may find different vulnerabilities. Additionally, developers trust the tool and often assume that the tool is correct in its detection [9]. The false positive rate of a tool is an important factor in its usefulness and is a threat to be considered with the two tools used in our model. Another threat to the construct validity of this project is that, in modifying the PIQUE-C#-Sec model output files to make it appear as though vulnerabilities were injected, we did not alter the lines of code of the project even though it technically contains a new vulnerability. This affects the normalization that the PIQUE-C#-Sec model uses. However, since we are not truly injecting the vulnerabilities into the source code, we do not know for sure how many lines of code should be attributed to each vulnerability type. Additionally, assessing the assumptions of independence, normality, linearity, and equal variance for our linear regression models is a threat to construct validity because our assumptions will never be fully met. Instead, we assess to what degree the assumptions are violated.

CONCLUSION

This thesis detailed PIQUE-C#-Sec, a hierarchical security quality model for projects with C# source code. We presented the design, development, and validation of PIQUE-C#-Sec. We began by giving a brief background on topics relevant to the model, including security quality modeling, security metrics, and static analysis tools. Next, we gave an overview of the supporting work in this field, which consists of the prior hierarchical model designs Quamoco, QATCH, PIQUE, PIQUE-C#, and PIQUE-Bin. The research around creating and validating the PIQUE-C#-Sec model was guided by outlining our research goal in the Goal Question Metric (GQM) format from Basili [10]. Our motivation for designing and developing this security quality model is to provide a modern quality model with adequate support for security characteristics. Our research goal is to design and develop a security quality model that helps stakeholders assess the security of C# source code projects.

The PIQUE-C#-Sec model's hierarchical structure has a root TSI node. At the Security Aspect layer, the nodes are defined by the ISO/IEC 25010 standard and the Microsoft STRIDE model. The two static analysis tools in the PIQUE-C#-Sec model are Security Code Scan and Insider, and their documented findings make up the 59 nodes in the Diagnostic layer. Each Diagnostic maps to a CWE, which allows us to aggregate the Diagnostics upward to CWEs at the Measure layer. Next, we provided an overview of the PIQUE-C#-Sec model validation and the results from exploring our three research questions. Our sensitivity analysis had three main components. First, we analyzed PIQUE-C#-Sec's benchmark projects and the attribute data associated with those projects to determine whether any attributes of C# source code projects result in a different TSI and different Security Aspect scores. We found that, according to the two linear models we fit for open and closed source projects, the correlation between the size (lines of code) of a C# project under analysis and the TSI is significant for open source projects. However, type and the interaction between size and type are not.
For closed source projects, we did not find any attributes or interactions of attributes to be significantly correlated with the TSI. Therefore, we conclude that when creating a benchmark repository for open source C# source code projects, the size attribute must be considered because of its statistical significance.

Next, we observed how effective the selected static analysis tools within the PIQUE-C#-Sec model are at measuring and reporting security vulnerabilities from the CWE Top 25 Most Dangerous Software Weaknesses for 2021 list by comparing all 59 Diagnostic nodes in our model to the CWE Top 25 list. Our PIQUE-C#-Sec model has direct coverage of 13 entries, or 52%, of the CWE Top 25 Most Dangerous Software Weaknesses for 2021 list. When we add the related CWEs (those connected by a "ParentOf", "ChildOf", or "CanPrecede" relationship), the count increases to 19, which covers 76% of the Top 25 list. We conclude that our selected static analysis tools are effective at measuring and reporting vulnerabilities from the Top 25 list, since they cover the majority of a list that represents the most dangerous weaknesses commonly experienced. These are impactful weaknesses that we want our PIQUE-C#-Sec model to detect, and our model detects most of them.

Finally, we analyzed the impact that each single vulnerability within our model had on the TSI and Security Aspect node values. We found seven Diagnostics within the PIQUE-C#-Sec model to have the greatest impact on the TSI: CWE-554, SCS0017, SCS0021, SCS0022, SCS0023, SCS0024, and SCS0030. All seven of these Diagnostics aggregate upward to CWE-1173 at the Measure layer, which is a member of the OWASP Top Ten 2021 list, and then into the Product Factors View State, Request Validation, and Other. We then discussed the implications of these results and addressed some potential areas for future work on the PIQUE-C#-Sec model. The implication for practitioners is that PIQUE-C#-Sec is an effective security quality model for helping to assess the security of C# source code projects. Stakeholders can use PIQUE-C#-Sec as a security quality gate when evaluating C# source code written by government contractors. The implication for academics and researchers is that we have developed and validated the PIQUE-C#-Sec security quality model, and other operationalized PIQUE security quality models or quality models can be built on the PIQUE-C#-Sec technologies for other specific areas of evaluation. We have successfully addressed our research goal of designing, developing, and validating a security quality model that helps stakeholders assess the security of C# source code projects. While there will always be future work to be done on the PIQUE-C#-Sec model, the work done so far has resulted in many improvements in hierarchical security quality modeling.

REFERENCES CITED

[1] INL QA white paper 1.1 (case study), 2021. [2] Adewole Adewumi, Sanjay Misra, and Nicholas Omoregbe. Evaluating open source software quality models against ISO 25010. In 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pages 872–877, 2015. [3] Sabreen Ahmadjee, Carlos Joseph Mera-Gómez, and Rami Bahsoon. Assessing smart contracts security technical debts. CoRR, abs/2103.09595, 2021.
[4] Kenneth Alperin, Allan Wollaber, Dennis Ross, Pierre Trepagnier, and Leslie Leonard. Risk prioritization by leveraging latent vulnerability features in a contested environment. Proceedings of the ACM Conference on Computer and Communications Security, pages 49–57, 2019. [5] Tiago L. Alves, José Pedro Correia, and Joost Visser. Benchmark-based aggregation of metrics to ratings. Proceedings - Joint Conference of the 21st International Workshop on Software Measurement, IWSM 2011 and the 6th International Conference on Software Process and Product Measurement, MENSURA 2011, pages 20–29, 2011. [6] Ross Anderson and Tyler Moore. The economics of information security. Science, 314(5799):610–613, 2006. [7] Andrew Austin and Laurie Williams. One technique is not enough: A comparison of vulnerability discovery techniques. International Symposium on Empirical Software Engineering and Measurement, pages 97–106, 2011. [8] Nathaniel Ayewah, William Pugh, David Hovemeyer, J. David Morgenthaler, and John Penix. Using static analysis to find bugs. IEEE Software, 25(5):22–29, 2008. [9] Dejan Baca, Kai Petersen, Bengt Carlsson, and Lars Lundberg. Static code analysis to detect software security vulnerabilities - does experience matter? In 2009 International Conference on Availability, Reliability and Security, pages 804–810, 2009. [10] Victor R Basili, Gianluigi Caldiera, and H Dieter Rombach. The goal question metric approach. Encyclopedia of Software Engineering, 2:528–532, 1994. [11] Barry W. Boehm. Software engineering economics. IEEE Transactions on Software Engineering, SE-10(1):4–21, 1984. [12] G. Brys, M. Hubert, and A. Struyf. A comparison of some new measures of skewness. In Rudolf Dutter, Peter Filzmoser, Ursula Gather, and Peter J. Rousseeuw, editors, Developments in Robust Statistics, pages 98–113, Heidelberg, 2003. Physica-Verlag HD. [13] B. Chess and G. McGraw. Static analysis for security. IEEE Security Privacy, 2(6):76– 79, 2004. 85 [14] Puneet Kumar Goyal and Gamini Joshi. Qmood metric sets to assess quality of java program. In 2014 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), pages 520–533, 2014. [15] Ilja Heitlager, Tobias Kuipers, and Joost Visser. A practical model for measuring maintainability. In 6th International Conference on the Quality of Information and Communications Technology (QUATIC 2007), pages 30–39, 2007. [16] Hannes Holm and Khalid Khan Afridi. An expert-based investigation of the Common Vulnerability Scoring System. Computers and Security, 53:18–30, 2015. [17] Hannes Holm, Mathias Ekstedt, and Dennis Andersson. Empirical analysis of system- level vulnerability metrics through actual attacks. IEEE Transactions on Dependable and Secure Computing, 9(6):825–837, 2012. [18] Chien-Cheng Huang, Feng-Yu Lin, Frank Yeong-Sung Lin, and Yeali S Sun. A novel approach to evaluate software vulnerability prioritization. The Journal of Systems and Software, 86:2822–2840, 2013. [19] Michael T. Hunter, Russell J. Clark, and Frank S. Park. Security issues with the ip multimedia subsystem (ims). In Proceedings of the 2007 Workshop on Middleware for Next-Generation Converged Networks and Applications, MNCNA ’07, New York, NY, USA, 2007. Association for Computing Machinery. [20] ISO/IEC 25010:2011 Systems and software engineering: Systems and software Quality Requirements and Evaluation (SQuaRE)—System and software quality models. Standard, International Organization for Standardization, Geneva. 
[21] Clemente Izurieta, Isaac Griffith, and Chris Huvaere. An Industry Perspective to Comparing the SQALE and Quamoco Software Quality Models. International Symposium on Empirical Software Engineering and Measurement, 2017-Novem:287–296, 2017. [22] Andrew Johnson. The analysis of binary file security using a hierarchical quality model. Master’s thesis, Montana State University, 12 2021. [23] Foutse Khomh and Yann-Gaël Guéhéneuc. Dequalite: Building design-based software quality models. In Proceedings of the 15th Conference on Pattern Languages of Programs, PLoP ’08, New York, NY, USA, 2008. Association for Computing Machinery. [24] Barbara Kitchenham, Steve Linkman, Alberto Pasquini, and Vincenzo Nanni. The squid approach to defining a quality model. Software Quality Journal, 6(3):211–233, 1997. Copyright - Chapman and Hall 1997; Last updated - 2021-09-11. [25] Michael Kläs, Constanza Lampasona, and Jürgen Münch. Adapting software quality models: Practical challenges, approach, and first empirical results. Proceedings - 37th 86 EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA 2011, pages 341–348, 2011. [26] James A. Kupsch, Elisa Heymann, Barton Miller, and Vamshi Basupalli. Bad and good news about using software assurance tools. Software - Practice and Experience, 47(1):143–156, 2017. [27] Jean-Louis Letouzey. The sqale method for evaluating technical debt. 06 2012. [28] R. Marinescu. Measurement and quality in object-oriented design. In 21st IEEE International Conference on Software Maintenance (ICSM’05), pages 701–704, 2005. [29] R Martin. Common weakness enumeration (cwe v1.8). National Cyber Security Division, US Dept. Of Homeland Security, 2010. [30] R.A. Martin. Managing vulnerabilities in networked systems. Computer, 34(11):32–38, 2001. [31] G. McGraw and J. Steven. Software [in]security: Comparing apples, oranges, and aardvarks (or, all static analysis tools are not created equal), January 2011. [32] Gary McGraw. Software security: Building security in. In 2006 17th International Symposium on Software Reliability Engineering, pages 6–6, 2006. [33] Daniel Mellado, Eduardo Fernández-Medina, and Mario Piattini. A comparison of software design security metrics. ACM International Conference Proceeding Series, (c):236–242, 2010. [34] Padmalata Nistala, Kesav Vithal Nori, and Raghu Reddy. Software quality models: A systematic mapping study. In 2019 IEEE/ACM International Conference on Software and System Processes (ICSSP), pages 125–134, 2019. [35] Andreas L. Opdahl and Guttorm Sindre. Experimental comparison of attack trees and misuse cases for security threat identification. Information and Software Technology, 51(5):916–932, 2009. SPECIAL ISSUE: Model-Driven Development for Secure Information Systems. [36] F. Ramsey and D. Schafer. The Statistical Sleuth: A Course in Methods of Data Analysis. Cengage Learning, 2012. [37] David Rice. An extensible, hierarchical architecture for analysis of software quality assurance. Master’s thesis, Montana State University, 12 2020. [38] Thomas Saaty. Decision making with the analytic hierarchy process. Int. J. Services Sciences Int. J. Services Sciences, 1:83–98, 01 2008. [39] Nahm Francis Sahngun. Nonparametric statistical tests for the continuous data: the basic concept and the practical use. Korean J Anesthesiol, 69(1):8–14, 2016. 87 [40] Miltiadis G. Siavvas, Kyriakos C. Chatzidimitriou, and Andreas L. Symeonidis. Qatch - an adaptive framework for software product quality assessment. 
Expert Systems with Applications, 86:350–366, 2017. [41] K. Sivakumar and K. Garg. Constructing a “common cross site scripting vulnerabilities enumeration (cxe)” using cwe and cve. In Patrick McDaniel and Shyam K. Gupta, editors, Information Systems Security, pages 277–291, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. [42] Stefan Wagner, Andreas Goeb, Lars Heinemann, Michael Kläs, Constanza Lampasona, Klaus Lochmann, Alois Mayr, Reinhold Plösch, Andreas Seidl, Jonathan Streit, and Adam Trendowicz. Operationalised product quality models and assessment: The quamoco approach. Information and Software Technology, 62:101–123, jun 2015. [43] Stefan Wagner, Klaus Lochmann, Lars Heinemann, Michael Klas, Adam Trendowicz, Reinhold Plosch, Andreas Seidi, Andreas Goeb, and Jonathan Streit. The quamoco product quality modelling and assessment approach. 2012 34th International Conference on Software Engineering (ICSE), Jun 2012. [44] Stefan Wagner, Klaus Lochmann, Sebastian Winter, Florian Deissenboeck, Elmar Juergens, Markus Herrmannsdoerfer, Lars Heinemann, Michael Kläs, Adam Trendowicz, Jens Heidrich, et al. The quamoco quality meta-model. 2012. [45] C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell, and A. Wesslén. Experimentation in Software Engineering. Computer Science. Springer Berlin Heidelberg, 2012. 88 APPENDICES 89 APPENDIX A EXPLORATORY STUDY 90 Before designing the current PIQUE-C#-Sec model that is now in place, an initial exploratory study was conducted on a bottom-up design for the hierarchical model. This involved running PIQUE-C# on a single project repository and identifying that project’s output findings. Then, the model was manually mapped upward into measures, product factors, and quality aspects that made sense for that specific project. While this allowed for precision in the hierarchical model with respect to the project repository being analyzed, this specialized PIQUE-C# model is not generalizable. It will not work for any other project, and therefore every project that needs to be analyzed will need to have a model design file manually created. Approach The study design for this approach is summarized in Figure A.1. After manually creating and calibrating the PIQUE-C# model, we conduct changes to both the model and the source code and observe if there are changes to the TQI. Our research questions focus on these two types of changes and their results. Figure A.1: PIQUE-C# Study Design 91 The PIQUE-C# model is tuned using two research questions: • RQ1: Does removing nodes from the model affect the TQI? • RQ2: Does introducing vulnerabilities into the source code affect the TQI? Based on these research questions, the following hypotheses have been formulated: For RQ1: • H0: Removing nodes from the model is associated with no observed difference in the TQI score. • Ha: Removing nodes from the model is associated with an observed difference in the TQI score. For RQ2: • H0: Introducing vulnerabilities into the source code is associated with no observed difference in the TQI score. • Ha: Introducing vulnerabilities into the source code is associated with an observed difference in the TQI score. RQ1 is interesting because if a government organization makes a hierarchical model for their source code but then later decides that they no longer want to include a node in the quality assessment, they can remove that node and know the sensitivity that node has on the TQI. RQ2 is interesting because it allows us to conduct sensitivity testing on the PIQUE-C# model. 
Introducing vulnerabilities should impact the TQI, as the model's TQI is aggregated upward from tool output. These hypotheses are tested by manually changing a PIQUE-C# model. After Roslynator executed on the source code, we analyzed the tool output to view the list of diagnostics found. From there, the leaf nodes of the hierarchical model were populated, and we manually mapped the PIQUE model diagnostics upward to Measures, Product Factors, and Quality Aspects. This model structure can be seen in Figure A.2.

Validation of the PIQUE-C# model is not complete without calibration. This includes making modifications to the benchmarking and weighting processes, which also impacts the model's threshold values. PIQUE-C# utilizes the weighting strategies from PIQUE. Table A.1 shows the PIQUE-C# model version numbers and the resulting TQI throughout the calibration process. Version 1 ran the original source code with no modifications and used the project root as the benchmark for model derivation. This resulted in a perfect score of 1.0, which is inaccurate because it indicates that the source code had no tool output findings for warnings or vulnerabilities. A TQI score can range between the values of 0.0 and 1.0; scores of exactly 0.0 or 1.0 are likely due to incorrect model calibration.

Figure A.2: PIQUE-C# hierarchical model structure based on real-world government source code.

Table A.1: PIQUE-C# calibration versions and the difference in TQI relative to Version 1.
Version  TQI    TQI Difference
1        1      N/A
2        0.175  -0.825
3        0.245  +0.07
4        0.279  +0.034
5        0.546  +0.267

Version 2 included several small benchmark repositories that the project was compared against. Now that the model derivation had been benchmarked against projects other than just itself, these benchmarked findings could be compared with the project root's findings for a more accurate score. The score dropped by 0.825, but this non-0.0 and non-1.0 score is an indicator that the score is more accurate. Version 3 incorporated more benchmarking repositories. Version 2's benchmarking had resulted in only 8 findings, which caused three of the project's six diagnostics to have [0.0, 0.0] threshold values and therefore caused nodes at the Measure layer to have a value of 0.0. By adding larger benchmarking projects, the benchmarking process found 9,116 diagnostics in Version 3. This eliminated two of the [0.0, 0.0] threshold values and therefore increased the TQI. Version 4 hard-coded the edge weights for the TQI and Quality Aspect layers. The uncalibrated model used equal weighting, which caused all nodes to weight their edges equally by the number of their children. In a calibrated model, the stakeholder should be able to weight certain Quality Aspects and Product Factors more heavily than others, indicating that those aspects and factors are more important. The edge weights in this model were manually adjusted to test whether the new weights made a difference in the TQI. This was done by giving the Security node 0.5 of the TQI's weight, followed by Performance and Maintainability both at 0.2, leaving Compatibility at 0.1. These weights were assigned based on which Quality Aspects we deemed most important to the stakeholders for this project. Then, for the Quality Aspects Maintainability and Security, since they have more than one child node, we also manually adjusted these child edge weights based on what we think the stakeholder would view as the more important Product Factors.
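The following is a minimal sketch of how edge weights like those assigned in Version 4 combine Quality Aspect values into a TQI under weighted-sum aggregation. Only the weights (Security 0.5, Performance 0.2, Maintainability 0.2, Compatibility 0.1) come from the calibration described above; the Quality Aspect values are hypothetical placeholders.

```python
# A minimal sketch of the weighted-sum aggregation used for the manually
# assigned Version 4 edge weights. Only the weights come from the text; the
# Quality Aspect values below are hypothetical placeholders.

tqi_weights = {
    "Security": 0.5,
    "Performance": 0.2,
    "Maintainability": 0.2,
    "Compatibility": 0.1,
}

quality_aspect_values = {  # hypothetical node values in [0, 1]
    "Security": 0.51,
    "Performance": 0.80,
    "Maintainability": 0.65,
    "Compatibility": 0.70,
}

tqi = sum(quality_aspect_values[a] * tqi_weights[a] for a in tqi_weights)
print(f"TQI = {tqi:.3f}")
```

Because Security carries half of the TQI's weight in this scheme, changes in the Security value move the TQI more than equally sized changes in any other aspect.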
This weighting process would normally be completed by the stakeholders. These adjustments changed the TQI. Finally, Version 5 hard-coded threshold values for the last remaining diagnostic that had [0.0, 0.0] threshold values. This was done by observing the other threshold values within the model and creating a comparable threshold that made sense for the number of findings associated with this diagnostic. The [0.0, 0.0] thresholds mean that the diagnostic was not found among the 9,116 diagnostics found during the benchmarking process. This could be fixed by including more benchmark projects in the hope of finding this particular diagnostic; another approach would be to bootstrap the value of the diagnostic to generate more data to use for the threshold values. Hard-coding the threshold values introduces a threat to construct validity; however, given our knowledge of this model, we believe that the values chosen are a good choice for the thresholds. This model can be seen in Figure A.2.

RQ1: The variable of interest is the difference in TQIs among the versions in Table A.2. These versions all involve removing various nodes from the hierarchical model shown in Figure A.2. An online random number generator is used to choose two nodes from each level (Quality Aspect, Product Factor, and Measure). The random number generator is used to create random assignment so that any differences found in the TQI are attributable to the removal of model nodes and not to a confounding variable. Each version corresponds to one node being removed. The TQI Difference column in Table A.2 compares that version's TQI to Version 5's TQI of 0.546.

Table A.2: PIQUE-C# versions and the difference in TQI relative to Version 5 when removing nodes from the model.
Version  Node Removed                    TQI    TQI Difference
6        Measure Naming                  0.546  0
7        Measure Formatting              0.478  -0.068
8        Product Factor Runtime          0.493  -0.053
9        Product Factor Integrity        0.790  +0.244
10       Quality Aspect Compatibility    0.495  -0.051
11       Quality Aspect Maintainability  0.528  -0.018

The data in Table A.2 has a minimum value of -0.0680, a mean of 0.0090, and a maximum value of 0.2440. The standard deviation of the data is 0.1178. We discuss the individual points in more detail when we conduct a visual observation of a scatterplot of the data.

RQ2: The variable of interest is the difference in TQIs among the versions in Table A.3. These versions all involve intentionally introducing vulnerabilities into the source code. The calibrated model shown in Figure A.2 has six findings. Since these findings were already built into the model, each of the six was introduced in its own version. The source code consists of three C# files, so an online random number generator is used to choose which vulnerability would be introduced into which file. The random number generator is used to create random assignment so that any differences found in the TQI are attributable to the introduction of vulnerabilities into the source code and not to a confounding variable. Each file is assigned two vulnerabilities. The TQI Difference column in Table A.3 compares that version's TQI to Version 5's TQI of 0.546. The data in Table A.3 has a minimum value of -0.091, a mean of -0.022, and a maximum value of 0.044. The standard deviation of the data is 0.0566. We discuss the individual points in more detail when we conduct a visual observation of a scatterplot of the data.

Table A.3: PIQUE-C# versions and the difference in TQI relative to Version 5 when introducing vulnerabilities into the source code.
Version  Vulnerability Introduced  TQI    TQI Difference
12       CS0649                    0.590  +0.044
13       RCS1036                   0.532  -0.014
14       RCS1163                   0.455  -0.091
15       RCS1213                   0.455  -0.091
16       SCS0004                   0.556  +0.01
17       VSTHRD200                 0.556  +0.01

Analysis Plan

The original analysis plan for both research questions was to analyze the data using a one-sample t-test to generate a mean value and a p-value. However, upon reviewing this plan, we decided it was essential to also run non-parametric tests on the data due to the low sample size of six data points. With such a low sample size, the parametric t-test is not very robust. Parametric tests require the assumption of normality, which means that the distribution of the sample means is normally distributed [39]. Nonparametric tests do not require the normality assumption because they are statistical methods based on signs and ranks [39]. Therefore, we added a non-parametric analysis of the data, a one-sample Wilcoxon signed-rank test, which is detailed in the Non-Parametric Tests section. The following section details the descriptive interpretations of the results.

Visual Observation

Figure A.3 and Figure A.4 show the results for our two research questions. Figure A.3 corresponds to the removal of PIQUE-C# model nodes, while Figure A.4 corresponds to the introduction of vulnerabilities into the PIQUE-C# project source code. We created Figure A.3 to visually assess our data and look for any points of interest. This scatterplot is difficult to assess visually and to gather insight from, as there are only six data points. However, we do want to point out two points of visual interest: Version 6 and Version 9. Version 6 removed the Measure Naming. Since Naming does not have any Diagnostics, this removal does not change the overall TQI, which we can see in the plot by the TQI difference relative to Version 5 being 0. Version 9 removed the Product Factor Integrity. Removing Integrity leads to the Quality Aspect it aggregates to (Security) increasing from a value of 0.511 to 1.0. Additionally, Security carries half of the TQI's total weight, so this greatly impacts the TQI difference relative to Version 5, which we can see in Figure A.3 by Version 9 having a high TQI value in comparison to the other versions in the scatterplot. Some descriptive statistics for the data in Figure A.3: it has a minimum value of -0.0680, a mean of 0.0090, and a maximum value of 0.2440. The standard deviation of the data is 0.1178.

Figure A.3: Scatterplot of the difference in TQI for each PIQUE-C# version relative to PIQUE-C# Version 5 when removing nodes from the PIQUE-C# model.

We created Figure A.4 to visually assess our data and look for any points of interest. This scatterplot is also difficult to assess visually and to gather insight from, as there are only six data points. However, we do want to point out two areas of visual interest: Versions 14 and 15 have the same TQI difference, and Versions 16 and 17 have the same TQI difference. Version 14 injected the vulnerability RCS1163, which according to the Roslynator Analyzers documentation (https://github.com/JosefPihrt/Roslynator/blob/master/src/Analyzers/README.md) is an "Unused parameter". Version 15 injected the vulnerability RCS1213, which is "Remove unused member declaration". These two vulnerabilities, when injected into the model, had the same resulting TQI impact after aggregating upward through the model nodes.
Similarly, this is the case for Versions 16 and 17. Version 16 injected the vulnerability SCS0004, which according to the Security Code Scan documentation (https://security-code-scan.github.io/#Rules) is "Certificate Validation Disabled". Version 17 injected the vulnerability VSTHRD200, which according to the VS-Threading Analyzers documentation (https://github.com/microsoft/vs-threading/blob/main/doc/analyzers/VSTHRD200.md) is "Use Async suffix for async". These two vulnerabilities, when injected into the model, had the same resulting TQI impact after aggregating upward through the model nodes. Some descriptive statistics for the data in Figure A.4: it has a minimum value of -0.091, a mean of -0.022, and a maximum value of 0.044. The standard deviation of the data is 0.0566.

Figure A.4: Scatterplot of the difference in TQI for each PIQUE-C# version relative to PIQUE-C# Version 5 when introducing vulnerabilities into the source code.

Non-Parametric Tests

The non-parametric analysis plan for both research questions is to analyze the data using a one-sample Wilcoxon signed-rank test to generate a p-value. This test uses the ranks of the magnitudes of the differences in addition to their signs [36]. Since ranks are used, the test is resistant to outliers. After running a one-sample Wilcoxon signed-rank test on the RQ1 data, we found little to no evidence against the null hypothesis that removing nodes from the model is associated with no difference in the model's TQI (p-value = 0.5896). This leads us to fail to reject the null hypothesis. After running a one-sample Wilcoxon signed-rank test on the RQ2 data, we found little to no evidence against the null hypothesis that introducing vulnerabilities into the source code is associated with no difference in the model's TQI (p-value = 0.5271). This also leads us to fail to reject the null hypothesis. We attribute the lack of statistical significance for both RQs to the study's very low sample size: both research questions had sample sizes of six data points. The power of a test is the probability of rejecting the null hypothesis when the alternative hypothesis is true, that is, the probability of finding strong evidence for the alternative hypothesis when it holds. One way to increase power is to increase the study's sample size. If we were to repeat this study, we would obtain a sample size of at least 30 data points for each research question.
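For reference, the one-sample Wilcoxon signed-rank procedure described above can be sketched with SciPy on the TQI differences from Tables A.2 and A.3. This only illustrates the procedure; the p-values reported above come from our own analysis, and SciPy's handling of the zero difference (Version 6) and of tied values may produce slightly different numbers.

```python
# A minimal sketch of the one-sample Wilcoxon signed-rank test applied to the
# TQI differences from Tables A.2 and A.3. This illustrates the procedure only;
# SciPy's handling of the zero difference (Version 6) and of tied values means
# the exact p-values may differ from those reported in the text.

from scipy.stats import wilcoxon

rq1_diffs = [0.0, -0.068, -0.053, 0.244, -0.051, -0.018]  # Table A.2
rq2_diffs = [0.044, -0.014, -0.091, -0.091, 0.01, 0.01]   # Table A.3

for label, diffs in [("RQ1", rq1_diffs), ("RQ2", rq2_diffs)]:
    stat, p_value = wilcoxon(diffs)  # null hypothesis: zero median difference
    print(f"{label}: W = {stat}, p = {p_value:.4f}")
```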
Threats to Validity

Internal Validity

In this project, the threats to internal validity were the lack of random assignment in one of our research questions and our small sample size. Threats to internal validity are threats that are of concern when causal relations are examined. When examining whether one factor affects a second factor, there is a risk that the second factor is also affected by a third factor [45]. When this third factor, or confounding variable, is not identified and taken into account, it creates a threat to internal validity. In this exploratory project, a threat to internal validity is the lack of random assignment in RQ2. This introduces the potential for confounding variables and means that causation cannot be concluded between introducing vulnerabilities into the source code and the TQI in the second research question. The tuning of the model across the version numbers in Table A.2 and Table A.3 may also not provide enough data for proper sensitivity testing. The sample size is small, and other variables may have caused the change in the TQI. The small sample size for both of our research questions is also a threat to internal validity because the lack of data points makes it difficult to assess the outlier assumption for the analysis. Additionally, not all possible combinations of changes were made.

External Validity

In this project, the threats to external validity were that we did not use random sampling and that our small sample of source code is not representative of all source code. Threats to external validity concern the extent to which it is possible to generalize findings and how relevant those findings are to people outside the investigated case [45]. A threat to external validity in this project is that any cause-and-effect relationship we observe between the changes made to the model or the source code and the resulting TQI difference cannot be extended to a broader population of source code outside of this study, because we did not use random sampling. Due to our method of sampling, statistical generalizations are not possible, and therefore the results will not apply in other contexts. Another threat to external validity in this study is that the small sample of government contractor source code used in the case study is not representative of the real-world population of government contractor source code projects.

Construct Validity

In this project, the threats to construct validity were the decision to hard-code one of the threshold values and the assessment of the assumptions. Threats to construct validity reflect the extent to which the operational measures that are studied really represent what the researcher has in mind and the research questions [45]. A threat to construct validity can occur when data, measurements, or questions are not clear or do not measure what they are intended to measure. The decision to hard-code one of the threshold values is a threat to construct validity in this project. This could be mitigated by incorporating more benchmark repositories into the project or by using bootstrapping to generate data for the benchmarking to use. Additionally, assessing the assumptions is a threat to construct validity because our assumptions will never be fully met. Instead, we assess to what degree the assumptions are violated.

Conclusion

The PIQUE-C# model was calibrated, and then analysis was performed by tuning the model for two research questions: Does removing nodes from the model affect the TQI, and does introducing vulnerabilities into the source code affect the TQI? We fail to reject our null hypothesis for RQ1 that removing nodes from the model causes no difference in the TQI, meaning that it is likely that what we observe is happening by chance alone and is not statistically significant. We also fail to reject our null hypothesis for RQ2 that introducing vulnerabilities into the source code causes no difference in the TQI, meaning that it is likely that what we observe is happening by chance alone and is not statistically significant. This project could be extended by testing the research questions on multiple models. Additionally, more extensive testing could take place by removing more nodes and by introducing more vulnerabilities to obtain a larger sample size. The project could also be extended by continuing to research and develop a security quality model.
100 APPENDIX B TOOL DIAGNOSTICS 101 Table B.1: Tool Diagnostics Tool Name Diagnostic ID Description Security Code Scan SCS0001 Command Injection Security Code Scan SCS0002 SQL Injection Security Code Scan SCS0003 XPath Injection Security Code Scan SCS0007 XML eXternal Entity Injection (XXE) Security Code Scan SCS0018 Path Traversal Security Code Scan SCS0029 Cross-Site Scripting (XSS) Security Code Scan SCS0026 LDAP Distinguished Name Injection Security Code Scan SCS0031 LDAP Filter Injection Security Code Scan SCS0004 Certificate Validation Disabled Security Code Scan SCS0005 Weak Random Number Generator Security Code Scan SCS0006 Weak hashing function Security Code Scan SCS0010 Weak cipher algorithm Security Code Scan SCS0013 Potential usage of weak CipherMode mode Security Code Scan SCS0008 Cookie Without SSL Flag Security Code Scan SCS0009 Cookie Without HttpOnly Flag Security Code Scan SCS0023 View State Not Encrypted Security Code Scan SCS0024 View State MAC Disabled Security Code Scan SCS0017 Request Validation Disabled (Attribute) Security Code Scan SCS0021 Request Validation Disabled (Configuration File) Security Code Scan SCS0030 Request validation is enabled only for pages Security Code Scan SCS0015 Hardcoded Password Security Code Scan SCS0034 Password RequiredLength Not Set Security Code Scan SCS0032 Password RequiredLength Too Small Security Code Scan SCS0033 Password Complexity Security Code Scan SCS0011 Unsafe XSLT setting used Security Code Scan SCS0012 Controller method is potentially vulnerable Security Code Scan SCS0016 Cross-Site Request Forgery (CSRF) Security Code Scan SCS0019 OutputCache Conflict Security Code Scan SCS0022 Event Validation Disabled Security Code Scan SCS0027 Open Redirect Security Code Scan SCS0028 Insecure Deserialization 102 Table B.2: Tool Diagnostics Continued Tool Name Diagnostic ID Description Insider CWE-78 OS Injection Insider CWE-89 SQL Injection Insider CWE-643 XPath Injection Insider CWE-90 LDAP Injection Insider CWE-79 Cross-Site Scripting Insider CWE-611 XML eXternal Entity Reference (XXE) Insider CWE-310 Cryptographic Issues Insider CWE-259 Hardcoded Password Insider CWE-521 Weak Password Requirements Insider CWE-614 Sensitive Cookie in HTTPS Session Insider CWE-330 Use of Insufficiently Random Values Insider CWE-326 Inadequate Encryption Strength Insider CWE-311 Missing Encryption of Sensitive Data Insider CWE-554 ASP.NET Misconfiguration Insider CWE-20 Improper Input Validation Insider CWE-524 Use of Cache Containing Sensitive Information Insider CWE-377 Insecure Temporary Files Insider CWE-23 Relative Path Traversal Insider CWE-200 Exposure of Sensitive Information to Unauthorized Actor Insider CWE-502 Deserialization of Untrusted Data Insider CWE-11 ASP.NET Misconfiguration: Creating Debug Binary Insider CWE-129 Improper Validation of Array Index Insider CWE-316 Cleartext Storage of Sensitive Information in Memory Insider CWE-312 Cleartext Storage of Sensitive Information Insider CWE-12 ASP.NET Misconfiguration: Missing Custom Error Page Insider CWE-787 Out-of-bounds Write Insider CWE-352 Cross-Site Request Forgery (CSRF) Insider CWE-532 Insertion of Sensitive Information into Log Files 103 APPENDIX C CWE TOP 25 MOST DANGEROUS SOFTWARE WEAKNESSES FOR 2021 104 Table C.1: CWE Top 25 Most Dangerous Software Weaknesses for 2021 Rank ID In PIQUE-C#-Sec? 
1 CWE-787 Yes 2 CWE-79 Yes 3 CWE-125 By relation 4 CWE-20 Yes 5 CWE-78 Yes 6 CWE-89 Yes 7 CWE-416 No 8 CWE-22 Yes 9 CWE-352 Yes 10 CWE-434 No 11 CWE-306 By relation 12 CWE-190 By relation 13 CWE-502 Yes 14 CWE-287 Yes 15 CWE-476 No 16 CWE-798 By relation 17 CWE-119 Yes 18 CWE-862 No 19 CWE-276 No 20 CWE-200 Yes 21 CWE-522 By relation 22 CWE-732 By relation 23 CWE-611 Yes 24 CWE-918 No 25 CWE-77 Yes 105 APPENDIX D BENCHMARK ATTRIBUTE TABLE 106 Table D.1: Benchmark Attribute Table Project Name Size (loc) Source Type AutoMapper 91429 Open Library BenchmarkDotNet 64498 Open Library choco 37695 Open Application Electron.NET 7455 Open Application Project 1 319 Closed Library Project 2 873 Closed Library Project 3 3049 Closed Application Project 4 208 Closed Application Project 5 11 Closed Library Project 6 599 Closed Library Project 7 13654 Closed Application Project 8 4873 Closed Library Project 9 1953 Closed Application Project 10 306 Closed Library Project 11 3133 Closed Application Project 12 423 Closed Library Project 13 53 Closed Library Project 14 364 Closed Library Project 15 46 Closed Library Project 16 2643 Closed Application Project 17 3510 Closed Library Project 18 45 Closed Application Project 19 3112 Closed Application Project 20 150 Closed Application Project 21 93 Closed Library Project 22 11 Closed Library Project 23 3779 Closed Application Project 24 89 Closed Application Project 25 3223 Closed Application Project 26 1631 Closed Library Project 27 2171 Closed Library Project 28 2621 Closed Library Project 29 2795 Closed Library Project 30 87 Closed Library Project 31 2174 Closed Library 107 Table D.2: Benchmark Attribute Table Continued Project 32 231 Closed Library Project 33 157 Closed Library Project 34 1984 Closed Library Project 35 48 Closed Library Project 36 65 Closed Library Project 37 1670 Closed Library Project 38 1764 Closed Library Project 39 31 Closed Library Project 40 130 Closed Library Project 41 549 Closed Application Project 42 1731 Closed Library Project 43 2701 Closed Application Project 44 160 Closed Library Project 45 2696 Closed Library Project 46 2455 Closed Library example-voting-app 652 Open Application FASTER 41827 Open Library FluentTerminal 12645 Open Application graphql-dotnet 41998 Open Library Hangfire 148574 Open Library iotedge 141812 Open Application jellyfin 144467 Open Application mRemoteNG 48073 Open Library Ocelot 45482 Open Application Ombi 36168 Open Application OpenRA 127159 Open Application Opserver 33472 Open Application Polly 182463 Open Library PushSharp 3448 Open Library ql 20607 Open Library RestSharp 29629 Open Application ScreenToGif 48303 Open Application server 35928 Open Application ServiceStack 286151 Open Library SparkleShare 10497 Open Application workflow-core 20661 Open Application Wox 12157 Open Application 108 APPENDIX E TSI/SECURITY ASPECT IMPACT TABLE 109 Table E.1: TSI/Security Aspect Impact Table Injected Diagnostic TSI Authenticity Availability Authorization Confidentiality Accountability Non-repudiation Integrity CWE-11 0.0476190 0.0277778 0.0277778 0.0555556 0.0277778 0.1111111 0.0555556 0.0277778 CWE-12 0.0297619 0.0416667 0.0000000 0.0833333 0.0416667 0.0000000 0.0000000 0.0416667 CWE-129 0.0476190 0.0277778 0.0277778 0.0555556 0.0277778 0.1111111 0.0555556 0.0277778 CWE-20 0.0357143 0.0000000 0.1250000 0.0000000 0.0000000 0.0000000 0.0000000 0.1250000 CWE-200 0.0054872 0.0049244 0.0019698 0.0098488 0.0049244 0.0078790 0.0039395 0.0049244 CWE-23 0.0054872 0.0049244 0.0019698 0.0098488 0.0049244 0.0078790 
0.0039395 0.0049244 CWE-259 0.0178571 0.1250000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 CWE-311 0.0104285 0.0104285 0.0000000 0.0208570 0.0312856 0.0000000 0.0000000 0.0104285 CWE-312 0.0002299 0.0003219 0.0000000 0.0006438 0.0003219 0.0000000 0.0000000 0.0003219 CWE-316 0.0002299 0.0003219 0.0000000 0.0006438 0.0003219 0.0000000 0.0000000 0.0003219 CWE-326 0.0104285 0.0104285 0.0000000 0.0208570 0.0312856 0.0000000 0.0000000 0.0104285 CWE-330 0.0104285 0.0104285 0.0000000 0.0208570 0.0312856 0.0000000 0.0000000 0.0104285 CWE-352 0.0054061 0.0048516 0.0019407 0.0097033 0.0048516 0.0077626 0.0038813 0.0048516 CWE-377 0.0054872 0.0049244 0.0019698 0.0098488 0.0049244 0.0078790 0.0039395 0.0049244 CWE-502 0.0032810 0.0019139 0.0019139 0.0038278 0.0019139 0.0076556 0.0038278 0.0019139 CWE-521 0.0178571 0.1250000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 CWE-524 0.0054872 0.0049244 0.0019698 0.0098488 0.0049244 0.0078790 0.0039395 0.0049244 CWE-532 0.0074497 0.0000000 0.0000000 0.0000000 0.0521480 0.0000000 0.0000000 0.0000000 CWE-554 0.1547619 0.0277778 0.4027778 0.0555556 0.2777778 0.1111111 0.0555556 0.1527778 CWE-611 0.0619048 0.0277778 0.0777778 0.0555556 0.0277778 0.1111111 0.0555556 0.0777778 CWE-614 0.0104285 0.0104285 0.0000000 0.0208570 0.0312856 0.0000000 0.0000000 0.0104285 CWE-643 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 CWE-78 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 CWE-787 0.0018780 0.0026293 0.0000000 0.0052585 0.0026293 0.0000000 0.0000000 0.0026293 CWE-79 0.0041950 0.0000000 0.0146826 0.0000000 0.0000000 0.0000000 0.0000000 0.0146826 CWE-89 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 CWE-90 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 SCS0001 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 SCS0002 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 SCS0003 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 SCS0004 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 SCS0005 0.0072780 0.0169820 0.0000000 0.0000000 0.0000000 0.0000000 0.0339641 0.0000000 SCS0006 0.0169198 0.0394796 0.0000000 0.0000000 0.0000000 0.0000000 0.0789591 0.0000000 SCS0007 0.0619048 0.0277778 0.0777778 0.0555556 0.0277778 0.1111111 0.0555556 0.0777778 SCS0008 0.0104285 0.0104285 0.0000000 0.0208570 0.0312856 0.0000000 0.0000000 0.0104285 SCS0009 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 SCS0010 0.0169198 0.0394796 0.0000000 0.0000000 0.0000000 0.0000000 0.0789591 0.0000000 SCS0011 0.0619048 0.0277778 0.0777778 0.0555556 0.0277778 0.1111111 0.0555556 0.0777778 SCS0012 0.0476190 0.0277778 0.0277778 0.0555556 0.0277778 0.1111111 0.0555556 0.0277778 SCS0013 0.0169198 0.0394796 0.0000000 0.0000000 0.0000000 0.0000000 0.0789591 0.0000000 SCS0015 0.0178571 0.1250000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 SCS0016 0.0108933 0.0097760 0.0039104 0.0195521 0.0097760 0.0156416 0.0078208 0.0097760 SCS0017 0.1547619 0.0277778 0.4027778 0.0555556 0.2777778 0.1111111 0.0555556 0.1527778 SCS0018 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 SCS0019 0.0054872 0.0049244 0.0019698 0.0098488 0.0049244 0.0078790 0.0039395 0.0049244 SCS0021 0.1547619 0.0277778 0.4027778 0.0555556 0.2777778 0.1111111 0.0555556 0.1527778 SCS0022 0.1547619 
0.0277778 0.4027778 0.0555556 0.2777778 0.1111111 0.0555556 0.1527778 SCS0023 0.1547619 0.0277778 0.4027778 0.0555556 0.2777778 0.1111111 0.0555556 0.1527778 SCS0024 0.1547619 0.0277778 0.4027778 0.0555556 0.2777778 0.1111111 0.0555556 0.1527778 SCS0026 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 SCS0027 0.0476190 0.0277778 0.0277778 0.0555556 0.0277778 0.1111111 0.0555556 0.0277778 SCS0028 0.0032810 0.0019139 0.0019139 0.0038278 0.0019139 0.0076556 0.0038278 0.0019139 SCS0029 0.0041950 0.0000000 0.0146826 0.0000000 0.0000000 0.0000000 0.0000000 0.0146826 SCS0030 0.1547619 0.0277778 0.4027778 0.0555556 0.2777778 0.1111111 0.0555556 0.1527778 SCS0031 0.0142857 0.0000000 0.0500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0500000 SCS0032 0.0178571 0.1250000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 SCS0033 0.0178571 0.1250000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 SCS0034 0.0178571 0.1250000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000