Challenge the convention: Software Confidence Index

Sunday, September 30, 2012

Software Confidence Index

[I have finally materialized the concept I started in 2009 for measuring the confidence index of a piece of software or an application. The initial idea was mentioned in my 2009 post, Project Confidence Level, and I have refined it over the last few years into an indexing scheme. (As I mentioned in that earlier post, I was influenced by the Consumer Confidence Index and partly by credit scoring schemes.)]

At the end of a successful software release, the members of the project team get some breathing space to reflect on the completed project and ask questions about its success or the lessons learned from it (we don't like the word “mistake” for our shortcomings, so we sugarcoat it as “lesson learned”). These questions are essential to understanding how successful a project was, but the irony is that, most of the time, the answers are so vague that it is almost impossible to quantify the project's success.

Imagine the project completion meeting (there are many fancy names for this kind of meeting: lessons learned, retrospective, project closure, etc.) in a large conference room full of hard-core technical people, managers, and potentially the sponsors or business users of the delivered software, all eager to know how well or how poorly the project went. The first question is usually a simple one, most often asked by the Project Manager or Technical Manager: “How have we done in this project?” Simple as the question is, the drama starts when people from different areas begin to answer it.

The range of answers could be:

“We've done a great job. The number of defects has gone down greatly …” - this person is most probably the technical lead of the project
“It was good. The developers wrote comparatively more code in this release” (along with the LOC count, if available) - this person is probably a senior developer or technical lead
“The quality of the business requirements was better; the developers didn't have many complaints about requirement issues in this release” - this answer is certainly from a business systems analyst
Now the business sponsors or users weigh in: “The software greatly simplifies our day-to-day work and improves the productivity of our employees” or “The performance of the software is not what we expected, so it doesn't add much value to our business process”
“The throughput of the team has increased significantly, and we were able to deliver more software features compared to our previous release” - this one is probably from the Technical Manager of the project
And these go on and on ...

There is nothing wrong with any of these answers; each is right from the perspective of its own area. The problem is how to communicate success or failure to a different group of people, say the senior management of the company. Do you think they have enough time or interest to listen to all the answers about the number of lines written, the defect count, application performance, or the throughput of the development team? Moreover, how do you compare this result with your next or previous project, or with a different software project in your company or in the industry? So even though answers like the ones above do tell us about the success or failure of the software, they are not easy to communicate across different interest groups, they are not comparable across software projects, and they certainly do not give a single measure of success or failure that can be plotted unambiguously on a trend graph.

Wouldn't it be simpler if we had a straightforward way of evaluating success or failure, or of comparing one release or project with another, that is unambiguous to everyone and, most importantly, measurable and quantified in a single number or index? My goal is to debunk the myth that software projects cannot be compared with one another, even when the software is developed by the same team of programmers, because every piece of software is a creative and unique work. Let's see whether all the dimensions of success of a software project (e.g. quality, productivity, reliability, performance, business value, end user satisfaction) can be combined into one index; let's call it the Software Confidence Index (SCI). After all, we develop software systems that follow a predictable path of execution and can be measured in numbers, so why not the success of the software we have delivered?

So now let's restate the answers we heard in the conference room in a different, less ambiguous way:

"The project was great! The SCI was 85 in this release", or
"The release went reasonably well. The SCI was 75; we have room to improve in the field ...", or
"The SCI was 55. It didn't meet our expected goal. Let's talk about what went wrong ..." etc.

And if people are interested in specific facts such as defect density or software performance, those can be dug into in that meeting or in a separate meeting with the specific interest groups. The advantage of quantifying the success of the software through an index is that it creates a new vocabulary for communicating across the board; on top of that, it can be used as a goal-setting parameter for the entire team involved in developing and implementing the software.

So now the question is how the SCI can be computed. The SCI covers both sides of the aisle of a software product: the technical side and the business side. I would like to give equal weight to technical and business factors, but this may change depending on whether the project puts more importance on technical or business priorities. For example, a software company may give technical factors a higher weight, whereas a non-software organization may focus more on business value than on technical excellence. This should be determined by the organization's goals and priorities. There is no fixed list of factors that must be used to compute the index, but the list has to be consistent across the organization so that comparing the SCIs of different software makes sense. Let's see how it can be computed.

The index considers two kinds of factors: Technical Factors and Business Factors. The Technical Factors include code quality (duplicate code, unused variables, etc.), defect density, effective code coverage of unit tests, use of design documents and processes, use of standard coding practices (avoiding security vulnerabilities, memory leaks, etc.), load test benchmark results, and so on.

Each factor has threshold values that determine its points. For example, if the defect density is less than 2 per 1,000 lines of code (LOC), the factor gets 5 points, and if it is greater than 20 but less than 25, it gets 1 point. The points between 1 and 5 cover the values in between.
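As an illustration, the threshold idea for defect density can be sketched in a few lines of Python. Only the two outer bands (below 2 gives 5 points, between 20 and 25 gives 1 point) come from the description above; the cut-offs at 8 and 14, the function name, and the treatment of a value of exactly 20 are assumptions made only to complete the example. A density of 25 or more is treated as outside the accepted range and scores 0.

```python
from bisect import bisect_right

# Upper bounds of each band, in defects per 1,000 lines of code.
# Only "< 2 -> 5 points" and "20 to 25 -> 1 point" come from the post;
# the intermediate bounds 8 and 14 are assumed for illustration.
DEFECT_DENSITY_BOUNDS = [2, 8, 14, 20, 25]
BAND_POINTS = [5, 4, 3, 2, 1, 0]  # the final 0 means "outside the accepted range"


def defect_density_points(defects, lines_of_code):
    """Map a raw defect count and code size to a score from 0 to 5."""
    if lines_of_code <= 0 or defects < 0:
        raise ValueError("need a positive code size and a non-negative defect count")
    density = defects / (lines_of_code / 1000)
    # bisect_right finds the first band whose upper bound is above the density.
    return BAND_POINTS[bisect_right(DEFECT_DENSITY_BOUNDS, density)]


print(defect_density_points(3, 2000))   # density 1.5 -> 5
print(defect_density_points(22, 1000))  # density 22  -> 1
print(defect_density_points(30, 1000))  # density 30  -> 0
```

Every other technical factor would need its own table of bounds, agreed once and then kept fixed across the organization so that the resulting points stay comparable.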

Each factor has an associated weight. The sum of the weights should be 10.

The Business Factors take the form of a questionnaire sent to the customers (the end users) as a survey. Sample questions in the survey could be:

Does the application have all the features that were committed to the business?

The application saves valuable time and simplifies the end users' day-to-day job

It is very easy and intuitive to use the features of the software

How satisfied are you with the performance of the software?

How satisfied are you with the response of the IT Team to any problem experienced in the software?

Overall how satisfied are you with the software?

How likely are you to recommend this IT Team to others with similar software need?

Each question has points ranging from 1 to 5, where 5 is Extremely Satisfied or Agree and 1 is Extremely Dissatisfied or Completely Disagree. As with the Technical Factors, each business factor question has a weight, and the sum of all the weights is 10. For both the Technical Factors and the Business Factors, the points are 0 if the value falls outside the accepted range.
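To make the weights and points concrete, here is a minimal Python sketch of scoring one group of factors. It assumes the group score is the weighted sum of the points, which gives a maximum of 50 when the weights add up to 10 and every factor scores 5. The function name, the short labels for the survey questions, and the sample weights and points are all hypothetical.

```python
import math


def group_score(factors):
    """Return the weighted sum of points for one group of factors.

    `factors` maps a factor name to a (weight, points) pair. The weights
    must add up to 10 and each points value must lie between 0 and 5.
    """
    total_weight = sum(weight for weight, _ in factors.values())
    if not math.isclose(total_weight, 10):
        raise ValueError(f"weights add up to {total_weight}, expected 10")
    for name, (_, points) in factors.items():
        if not 0 <= points <= 5:
            raise ValueError(f"points for {name!r} must be between 0 and 5")
    return sum(weight * points for weight, points in factors.values())


# Hypothetical weights and survey results for the sample questions.
business_factors = {
    "committed features delivered": (2, 4),
    "saves time, simplifies work": (2, 5),
    "easy and intuitive": (1, 4),
    "performance": (1, 3),
    "IT team response": (1, 4),
    "overall satisfaction": (2, 4),
    "would recommend": (1, 5),
}

print(group_score(business_factors))  # 42 out of a possible 50
```

The same function would serve for the Technical Factors, since they follow the same weight and point rules.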

Once all the factor values are known, compute the Software Confidence Index using the equation below:

where T is Technical Factors and B is Business Factors
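The equation itself is not reproduced in this text, so the following Python sketch is only one reading that fits the surrounding description, not necessarily the author's formula. It assumes T and B are each a weighted score between 0 and 50, that equal weighting simply adds them into an index out of 100, and that an organization may shift the balance between the two sides. The function name and the `technical_share` parameter are hypothetical.

```python
def software_confidence_index(technical, business, technical_share=0.5):
    """Combine T and B (each between 0 and 50) into an index from 0 to 100.

    technical_share is the fraction of the index given to the technical
    side; the remainder goes to the business side. With the default of
    0.5 the result is simply T + B.
    """
    for score in (technical, business):
        if not 0 <= score <= 50:
            raise ValueError("each group score must be between 0 and 50")
    if not 0 <= technical_share <= 1:
        raise ValueError("technical_share must be between 0 and 1")
    business_share = 1 - technical_share
    return 100 * (technical_share * technical / 50
                  + business_share * business / 50)


# Equal weighting: T = 30 and B = 48 give an index of 78.
print(round(software_confidence_index(30, 48), 1))                       # 78.0
# The same scores with 30% technical and 70% business weighting.
print(round(software_confidence_index(30, 48, technical_share=0.3), 1))  # 85.2
```

Under this reading, two indexes are only directly comparable when they were computed with the same split, which is why the split should be fixed at the organization level.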

Here is the snapshot of factors along with their weights and points that are used to calculate the first SCI:

It's a starting point from which the discussion moves in a more rational and quantitative direction rather than staying vague and subjective. The index is most useful when it is captured over a period of time, which enables comparison against historical data. The SCI can be used in many ways, e.g. to create a benchmark in an organization, to set goals for the software development team, or to compare the heterogeneous set of software delivered by an organization. It isn't a silver bullet for improving business confidence in the software, but it will set the course toward improving it.

4 comments:

Mohammad Masud said...: When I talked about the Software Confidence Index with some of my friends and colleagues, I received a couple of comments on why the SCI may not be very helpful. The primary reasoning is that the SCI considers only a few factors of the software and its development process, and that with so few factors software confidence can't be computed into an index. Another quite appealing argument against its effectiveness is that software development is creative work and can't be measured or quantified. Let's look at both comments one by one.

Creative works can't be computed: I briefly touched on this in my Software Confidence Index post, but I'm revisiting it here because it is a very appealing argument to most programmers. I don't disagree that programming is creative work and that it's very tough to measure efficiency or compare the quality of one piece of software with another. In fact, the idea of quantifying or ranking software came to my mind in 2008 when, at my work, my manager initiated the concept of "pay for performance" and started ranking the developers. But the ranking was done using Effective Lines of Code (ELOC), which almost everyone opposed, and I was at the front of the opposition. It proved ineffective: at the start of each new release, the application's technical lead repeatedly came out on top, because the lead was the one who delivered the merge that synced the new ClearCase project with the last release's changes. Moreover, the desire to see oneself at the top encouraged the bad practice of writing inefficient code with redundant lines that could have been avoided, e.g. increased use of "if-else" over the ternary operator.

Even though I strongly opposed the use of ELOC, I always believed that we can use a few factors consistently to get a sense of ranking and refine the method over time to get close to perfect (there is nothing called "absolutely perfect"; everything known as perfect in this world is actually "relatively perfect"). So if a good number of quantifiable factors are used consistently along with a very few qualitative factors, a relatively perfect index can be achieved. It won't be absolutely perfect by any means, but it would give the software industry a way to start measuring software, and it can be compared against a benchmark index.

Limited number of factors can't be effective to compute the index: If this theory were true, a lot of the famous numbers you see in the modern age would vanish. Take GDP, for example. The Gross Domestic Product is computed using a formula that, in most cases, takes the production of a country into consideration along with a few other factors. Do you think government agencies have the capability to count every single product produced in a country? I know for sure that what I produced in my backyard last year (at least 50 pounds of tomatoes, almost the same amount of cucumbers, and a good amount of other vegetables) wasn't counted in the GDP, and there are millions of people who garden every year and produce millions of pounds of vegetables in their backyards that go uncounted. There are other ways of computing GDP that use expenditure and other factors instead of production, but you'll find that no method can cover each and every thing in this world. You will see a more striking number of factors in the computation of a country's inflation rate. The bottom line is that if a method considers the same factors or group of factors over a period of time, the limitation of using a limited number of factors becomes less and less important, and the comparison of (in this case) GDP and inflation over time shows where the country is heading and whether it is doing better or worse.; October 8, 2012 at 1:09 PM
Shahidul Mahfuz said...: This comment has been removed by the author.; October 9, 2012 at 10:49 AM
Shahidul Mahfuz said...: It's a nice idea and a nice presentation of the idea. It makes sense to use something like this in the real world. Once software is released, as you said, the question “how have we done in this project?” always gets a vague answer. The range of answers can vary a lot depending on who it is being presented to.

Some people may have a different opinion about presenting this type of thing as a number (SCI). Well, a number may not necessarily give all the answers, but yes, it can give an idea of how we did. Think about one simple scenario: when we go to a doctor's office for some “pain”, they always ask about the pain in terms of a pain scale from 1 to 10. I would say it (SCI) is a good approach.
I have a few queries though:
1. Will we be able to relate/compare different software if the SCI is the same between them? Say the SCI is 78, where the Business Factors scored 48 out of 50 and the Technical Factors 30 out of 50. What would the SCI tell us then? And if another organization uses 70 on the business side and 30 on the tech side and gets the same SCI of 78, how would you differentiate/relate the SCI numbers?
2. The business part will be based on the end users' feedback. Feedback questionnaires may vary from company to company.
3. How would you define the quality of the code? Who will judge it? Who would judge smart code writing, and how?; October 9, 2012 at 10:51 AM
Mohammad Masud said...: Thanks for the comment. The doctor's office analogy captures the essence of why we need to quantify an apparently unquantifiable event or state. Let me try to answer your three queries.

Q1. This is a very good question, as the goal of the SCI is to create a benchmark index that can be used to compare different software applications across an organization or industry. It's very much possible to use a standard set of factors to index any kind of software (after all, all software is written in some programming language by programmers). Even though the goal of the SCI is to create a benchmark or reference point that makes software, or software development, easy to compare, the SCI can also have different benchmarks and standards for different industries. There may be one standard set of factors and benchmark for software used by research institutions, and a different one for software used by financial institutions.
Q2. Yes, the questionnaire for the business part of the index would depend on the nature of the organization's business. It's possible to come up with a set of common questions for the business factors, but as I mentioned above, it can be flexible as well. Apart from that, the formula used to compute the index keeps the technical and business factors independent of each other. So not only may the questions and their weights change, but the number of questions may also vary independently. There could be scenarios where you care mostly about technical factors with very few business factors, and vice versa.
Q3. Yes, the quality of code is something no two programmers will agree on. Not only that, most programmers don't even agree with the quality of their own code after a few years. So my approach is to achieve a “relatively perfect” result rather than wait for a perfect world, i.e. I'm going to use what's available right now and accommodate new ways as and when they are identified. To define the boundary of what quality code means, we can start with the list below and extend it as more becomes available: no code duplication, use of standard design patterns, use of constants over string literals, low cyclomatic complexity, well-documented code (e.g. JavaDoc for software written in Java), no memory leaks (e.g. in Java programs, leaks caused by not closing opened database connections and files, or by unnecessarily keeping object references active), and code that's not vulnerable to security attacks (refer to www.owasp.org for details on software vulnerabilities). There are various static analysis tools that can be used to check whether the code is written efficiently. Just to name a couple: Parasoft® Jtest® (www.parasoft.com) to check code quality and IBM Security AppScan Source (www.ibm.com) to detect security vulnerabilities.; October 9, 2012 at 7:50 PM
