What Is Software Due Diligence? A Practical Overview for Buyers and Sellers

Software due diligence evaluates the technical health, security, legal compliance, and team risk of a software company before an acquisition. Learn what it covers and how to approach it.

Introduction

Software due diligence is the process of evaluating a software company's technology, team, security posture, and intellectual property before completing an acquisition. It answers a question that financial statements cannot: is the technology actually worth what the seller claims, and will it continue to function and generate value after the deal closes?

The process sits alongside financial due diligence, legal due diligence, and commercial due diligence as one of several workstreams that inform an acquisition decision. But software due diligence carries a unique challenge. Unlike physical assets that can be inspected and appraised with established methods, software is abstract. Its value depends on code quality, the people who maintain it, the security of the systems running it, and the legal right to use every component it contains.

For buyers, software due diligence reduces the risk of expensive surprises after closing. For sellers, a clean due diligence process accelerates the deal timeline and strengthens negotiating position. Both sides benefit when the process is thorough and systematic.

What Software Due Diligence Covers

Software due diligence spans several areas, each examining a different dimension of the technology's viability as an asset. Not every deal requires equal depth across all areas. An acqui-hire (buying a company primarily for its team) may focus heavily on team assessment and lightly on architecture. A platform acquisition may require deep analysis across every dimension. The scope is calibrated to the deal.

Code Quality and Architecture

The most visible part of software due diligence is the code itself. Reviewers examine the codebase for organization, maintainability, and technical debt.

Architecture review assesses whether the system is designed in a way that supports future development. Monolithic applications aren't inherently bad, but they carry different scaling and maintenance characteristics from microservice architectures. The question isn't whether the architecture follows current trends. It's whether the architecture can support the buyer's plans for the product over the hold period.

Technical debt assessment identifies shortcuts and deferred maintenance that will require future investment. Every codebase has some technical debt. The due diligence process quantifies it: how much exists, where is it concentrated, and what will it cost to address? A codebase with manageable, well-understood debt is very different from one where years of shortcuts have created systemic fragility.

Test coverage and quality practices reveal how the team manages risk during development. Automated test suites, code review processes, continuous integration pipelines, and coding standards all indicate engineering maturity. Low test coverage doesn't necessarily mean the code is broken, but it means changes carry more risk because there's no safety net to catch regressions.

Security Assessment

Security due diligence evaluates how well the target protects its systems, data, and users. This is an area where problems discovered post-close can be extraordinarily expensive. A data breach affecting customer information can trigger regulatory penalties, lawsuits, customer churn, and reputational damage that erodes the acquisition's value far beyond the cost of remediation.

Penetration testing involves actively probing the application and infrastructure for exploitable vulnerabilities. This covers web application security (injection attacks, cross-site scripting, broken authentication), API security (access control flaws, data exposure), and infrastructure security (misconfigured servers, exposed services, weak credentials).

Beyond active testing, the security review examines the target's security practices. How are secrets managed? Are credentials hardcoded in source files or properly stored in vaults? Is there an incident response plan, and has it ever been exercised? Does the company hold relevant certifications (SOC 2, ISO 27001), and are they current?
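A first pass at the hardcoded-credentials question can be automated with simple pattern matching. The sketch below is illustrative only: the patterns and the sample snippet are assumptions, and real secret scanners use far richer rules (including entropy analysis) than these two regexes.

```python
import re

# Illustrative patterns for common hardcoded-credential shapes.
SECRET_PATTERNS = [
    re.compile(r"""(?:password|passwd|secret|api[_-]?key)\s*=\s*['"][^'"]+['"]""", re.I),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
]

def find_hardcoded_secrets(source):
    """Return (line number, line) pairs that look like hardcoded credentials."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

# Hypothetical source snippet: one hardcoded password, one vault lookup.
snippet = 'db_password = "hunter2"\ntoken = load_from_vault()\n'
print(find_hardcoded_secrets(snippet))  # → [(1, 'db_password = "hunter2"')]
```

A scan like this surfaces candidates for human review; whether a match is a real secret or a test fixture still requires judgment.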

Compliance assessment checks whether the target meets regulatory requirements relevant to its industry and customer base. Healthcare software needs HIPAA compliance. Companies handling EU customer data need GDPR compliance. Financial services software faces its own regulatory landscape. Gaps in compliance represent both legal risk and remediation cost.

Engineering Team Evaluation

Software is maintained by people, and those people carry institutional knowledge that doesn't exist anywhere else. Engineering team evaluation assesses whether the knowledge required to maintain and extend the software will survive the acquisition.

This dimension asks questions that code audits alone cannot answer. Who understands the critical systems? How many people would need to leave before a system becomes unmaintainable? Are key contributors engaged or showing signs of disengagement? Is knowledge spread across the team or concentrated in a few individuals?

Bus factor analysis measures the minimum number of contributors who would need to leave before a system loses maintainability. A bus factor of 1 on a revenue-critical system means one departure creates an immediate crisis.
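One common way to operationalize bus factor is to count the smallest group of top contributors who together account for a majority of commits. The sketch below uses that definition with an assumed 50% threshold and made-up commit data; it is not any particular tool's exact method.

```python
from collections import Counter

def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of top contributors who together account for
    more than `threshold` of all commits to a repository."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    for n, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total > threshold:
            return n
    return len(counts)

# Hypothetical commit history: one author dominates the repository.
commits = ["alice"] * 68 + ["bob"] * 20 + ["carol"] * 12
print(bus_factor(commits))  # → 1 (alice alone exceeds 50% of commits)
```

With evenly distributed commits the same function returns a higher number, which is exactly the resilience the analysis is trying to measure.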

Contributor lifecycle analysis classifies each team member's engagement level based on their activity patterns. Peak contributors are actively engaged. Ramping Up contributors are new and building context. Winding Down contributors show declining activity, a pattern that often precedes departure. If key knowledge holders are Winding Down and aren't covered by retention agreements, the buyer faces near-term risk.
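A minimal version of this classification can be sketched by comparing a contributor's recent activity against their earlier baseline. The window size and thresholds below are illustrative assumptions, not the actual classification rules of any product.

```python
def classify_lifecycle(monthly_commits, recent=3):
    """Classify engagement from a contributor's monthly commit counts
    (oldest first). Thresholds here are illustrative assumptions."""
    if len(monthly_commits) <= recent:
        return "Ramping Up"          # too little history to judge
    recent_avg = sum(monthly_commits[-recent:]) / recent
    prior = monthly_commits[:-recent]
    prior_avg = sum(prior) / len(prior)
    if prior_avg > 0 and recent_avg < 0.5 * prior_avg:
        return "Winding Down"        # activity dropped by half or more
    if recent_avg > 1.5 * prior_avg:
        return "Ramping Up"          # activity still climbing
    return "Peak"                    # steady, established engagement

# Hypothetical contributor whose activity has fallen off sharply.
print(classify_lifecycle([30, 28, 25, 12, 6, 2]))  # → Winding Down
```

Real classifiers would weigh more signals than commit counts (review activity, issue participation), but the declining-trend logic is the core idea.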

Knowledge distribution analysis measures how evenly work spreads across the team. The Gini coefficient quantifies commit distribution inequality. High concentration (values above 0.7) means most of the codebase is maintained by very few people, making the organization fragile. Single-author files (code modified by only one person) and orphaned files (code where no active contributor has ownership) quantify how much of the codebase is one departure away from becoming unmaintainable.
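The Gini coefficient mentioned above can be computed directly from per-contributor commit counts. This is the standard formula over the sorted distribution; the sample data is hypothetical.

```python
def gini(commit_counts):
    """Gini coefficient of a commit distribution: 0 means perfectly
    even, values near 1 mean nearly all commits from one contributor."""
    xs = sorted(commit_counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula based on the sorted cumulative distribution.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical team of five where one contributor dominates.
print(round(gini([500, 20, 10, 5, 5]), 2))  # → 0.74, above the 0.7 threshold
```

A perfectly even distribution (for example, four contributors with 25% of commits each) yields a Gini of 0.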

Team interviews complement the quantitative analysis. Conversations with engineering leadership and individual contributors validate what the data shows, surface institutional knowledge that isn't documented, and reveal the team dynamics that determine whether key people will stay through the transition.

Intellectual Property and Licensing

Software companies rarely build everything from scratch. Modern applications depend on dozens or hundreds of open-source libraries, each carrying its own license terms. IP due diligence verifies that the target actually owns what they claim to own and has the legal right to use every component in the product.

The IP review covers several areas. First, ownership verification: does the company hold proper assignments for code written by employees and contractors? Are there any disputes or ambiguities about who owns what? Employment agreements and contractor agreements should include IP assignment clauses, and those clauses need to actually cover the work that was performed.

Second, open-source license compliance. Open-source licenses range from permissive (MIT, Apache 2.0) to restrictive (GPL, AGPL). Permissive licenses generally allow commercial use without significant obligations. Restrictive licenses can require the buyer to release their own source code under the same license, a condition that may conflict with the buyer's plans for the product. The due diligence process catalogs every dependency, identifies its license, and flags any that create legal exposure.
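The cataloging-and-flagging step can be sketched as a filter over a dependency inventory. The inventory and the set of licenses treated as restrictive below are illustrative; an actual review would use a complete SPDX license policy agreed with counsel.

```python
# Hypothetical dependency inventory; in practice this comes from
# parsing package manifests (package.json, requirements.txt, etc.).
INVENTORY = [
    {"name": "express", "license": "MIT"},
    {"name": "some-orm", "license": "GPL-3.0"},
    {"name": "http-client", "license": "Apache-2.0"},
]

# Copyleft licenses that may obligate source release; policy-dependent.
RESTRICTIVE = {"GPL-2.0", "GPL-3.0", "AGPL-3.0"}

def flag_restrictive(inventory):
    """Return names of dependencies whose licenses need legal review."""
    return [d["name"] for d in inventory if d["license"] in RESTRICTIVE]

print(flag_restrictive(INVENTORY))  # → ['some-orm']
```

Flagged dependencies aren't automatically deal problems; a GPL library used only in internal tooling may carry no obligation at all. The flag is a prompt for legal analysis, not a verdict.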

Third, third-party commercial licenses. If the product uses commercial SDKs, APIs, or data feeds, those agreements need to be reviewed for transferability. Some commercial licenses include change-of-control provisions that could affect the buyer's ability to continue using them post-close.

Infrastructure and Operations

The final dimension covers how the software runs in production. Infrastructure due diligence examines cloud costs, deployment processes, monitoring and alerting, disaster recovery, and operational maturity.

Cloud cost analysis determines whether the current infrastructure spend is reasonable and optimized. Over-provisioned resources represent immediate savings opportunities. Under-provisioned resources represent reliability risk. The review also examines whether the infrastructure is defined as code (repeatable and version-controlled) or manually configured (fragile and hard to reproduce).

Deployment process review assesses how changes get from a developer's machine to production. Automated, well-tested deployment pipelines reduce the risk of outages caused by human error. Manual deployment processes are slower, riskier, and often dependent on specific individuals who know the steps.

Disaster recovery evaluation checks whether the company can recover from catastrophic failure. Are backups taken regularly? Are they tested? How long would recovery take? Is there a documented runbook, or does recovery depend on someone remembering the steps?

The Due Diligence Timeline

Software due diligence typically runs for two to six weeks, depending on deal size and complexity. The process generally follows a structured sequence.

In the first phase, the buyer's team gains access to the target's repositories, documentation, and infrastructure. Access is typically read-only and governed by non-disclosure agreements. For GitHub-based organizations, this usually involves installing a GitHub App with read-only repository permissions.

In the second phase, automated analysis runs across the codebase while the due diligence team begins manual review. Automated tools can quickly surface metrics like bus factor, test coverage, dependency lists, and known vulnerabilities. Manual review focuses on architecture, code quality, and areas that automated tools can't assess.

The third phase involves interviews with the target's engineering team. These conversations are most productive when the reviewer has already examined the code and can ask specific, evidence-based questions rather than generic ones.

The fourth phase synthesizes findings into a report for the deal team. Findings are categorized by severity: critical issues that could block the deal, moderate issues that should affect price or terms, and minor issues that inform post-close integration planning.

How Software Due Diligence Affects Deal Terms

Due diligence findings don't just produce a pass/fail verdict. They inform deal structure in concrete ways.

Critical findings (security vulnerabilities on production systems, bus factor of 1 on core revenue platforms, regulatory non-compliance) may become pre-close remediation requirements. The seller must fix the issue before the deal can close.

Moderate findings typically influence price and earnout structure. Knowledge concentration risk might translate to retention agreements for named engineers. Security gaps might justify a price reduction that accounts for remediation costs. High technical debt might lower the valuation to reflect the investment required to bring the codebase to an acceptable state.

Minor findings feed into post-close integration planning. They don't affect deal terms, but they help the buyer plan realistic budgets and timelines for the first year of ownership.

Automating Parts of the Process

Software due diligence has historically been manual, time-consuming, and expensive. But several dimensions of the work are well-suited to automation, particularly the quantitative analysis that requires extracting and processing data from source repositories.

Engineering Team Risk with ContributorIQ

The engineering team evaluation described above requires analyzing commit history across every repository to calculate bus factor, measure knowledge concentration, classify contributor engagement levels, and identify orphaned code. Doing this manually means cloning repositories, writing scripts to parse git logs, and assembling findings in spreadsheets.
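The manual approach described above typically starts with something like the sketch below: pipe `git log` through a custom format and tally commits per author. The sample log text is hypothetical.

```python
from collections import Counter

def authors_from_log(log_text):
    """Tally commits per author from `git log --format='%ae'` output,
    which emits one author email per commit."""
    emails = [line.strip() for line in log_text.splitlines() if line.strip()]
    return Counter(emails)

# Output as it might come from: git log --format='%ae'
sample = """\
alice@example.com
alice@example.com
bob@example.com
alice@example.com
"""
print(authors_from_log(sample).most_common(1))  # → [('alice@example.com', 3)]
```

Multiply this by dozens of repositories, handle authors who commit under multiple email addresses, and add time-window slicing, and the appeal of automating the layer becomes clear.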

ContributorIQ automates this entire layer. Once connected to the target's GitHub organization via read-only access, it generates bus factor scores for every repository, classifies each contributor by lifecycle stage, calculates Gini coefficients for knowledge distribution, and produces a composite Organization Health Score from 0 to 100.

The M&A Advisory Report packages these findings into a format built for deal teams. It names the specific contributors who represent key person risk, flags repositories where bus factor is critical, and surfaces the orphaned and single-author file percentages that quantify maintainability risk. Rather than spending days assembling this data manually, the due diligence engineer can generate a quantitative risk assessment and focus their time on the work that requires human judgment: reading code, interviewing the team, and interpreting findings in context.

The data also makes team interviews more productive. Specific questions grounded in actual metrics ("Your payment service has a bus factor of 1 and 68% of commits from a single contributor; what's your succession plan?") produce more honest and useful answers than general inquiries about team structure.

Dependency Disclosure with DependencyDesk

IP and licensing due diligence requires a complete inventory of every third-party dependency across the target's codebase. For organizations with dozens of repositories spanning multiple programming languages, manually cataloging dependencies is tedious and error-prone.

DependencyDesk automates this by scanning an organization's GitHub repositories and extracting dependency data from package manifest files across JavaScript, PHP, Ruby, Python, and other languages. It identifies each dependency's name, version, and license, then provides downloadable CSV reports covering the entire organization. This gives both buyers and sellers a comprehensive inventory for license compliance review without requiring engineers to run command-line tools against each repository individually.

For sellers, generating this disclosure proactively demonstrates that the codebase is free of license conflicts that could complicate IP assignment. Buyers receive the structured data they need to verify that open-source license terms are compatible with their intended use and that no restrictive licenses (GPL, AGPL) create unexpected obligations. Either way, having a complete, organized dependency inventory removes one of the more tedious bottlenecks from the due diligence timeline.

Common Pitfalls

Several patterns cause software due diligence to fail or miss critical risks.

Focusing only on code quality while ignoring people risk is the most common mistake. A beautifully architected system is worthless if no one understands it well enough to maintain it. Code quality and team stability are equally important dimensions.

Accepting the seller's documentation at face value is another pitfall. Documentation may be outdated, incomplete, or aspirational rather than descriptive. The due diligence process should verify claims independently through code inspection and interviews.

Rushing the timeline under deal pressure leads to shallow analysis. Two weeks of serious review will surface more actionable findings than four days of skimming. Push back on timelines that don't allow for thorough assessment.

Treating due diligence as adversarial rather than collaborative harms both sides. Sellers who obstruct access or delay responses create suspicion. Buyers who approach the process looking to kill the deal rather than understand risks miss opportunities. The best outcomes happen when both sides view due diligence as a shared effort to understand what's being transferred and plan for success.

Conclusion

Software due diligence is the process that separates informed acquisitions from expensive gambles. It evaluates the technology, the team, the security posture, the IP position, and the operational maturity of a software company before a buyer commits capital.

No single dimension tells the complete story. Code quality matters, but only if people exist to maintain it. Security matters, but only in the context of the threat model and regulatory environment. IP compliance matters, but only if the underlying technology is worth owning. The value of thorough software due diligence is in connecting these dimensions into a complete picture that informs deal decisions.

The process doesn't need to be entirely manual. Tools that automate the quantitative analysis (contributor risk, dependency inventories, vulnerability scanning) free up due diligence engineers to spend their time on the work that requires experience and judgment. The goal is the same either way: understand what you're buying before you buy it.
