Challenge the convention: June 2014

Saturday, June 28, 2014

Fighting estimation fatigue - Part 4: Choose the right estimation model

The sheer number of software estimation models available can make it confusing for a project manager to pick the right one. Moreover, not everyone wants to be an expert in estimation techniques and models. If you have enough budget allocated for the planning phase of your project (I doubt you would often get that in any software project), you can hire an estimation expert who has detailed expertise in the various software cost estimation models and can pick the right model for your project. But for those who don't want to invest their time in going through each of the models (some of which I briefly explained in the post Introduction to standard estimation models and methods), I've created a sample matrix that provides a guideline to help you identify the right estimation model. At the very least, it gives you a starting point.

For example, if the software project uses Object Oriented Programming technology, almost all the estimation models can be used. If the development methodology is then taken into consideration and it happens to be Agile, the options narrow: only a few models remain to pick from the matrix, e.g. Use Case Point, User Story Point, and Delphi Method. Though the framework provides an initial guideline to narrow down the list of models, in the end it’s up to the Project Manager and the software development team to decide on the best-fit models for their project.

If I were you, I would not settle on a single estimation model from the matrix but would try out at least two or three estimation methods for the initial estimate. If all of them land within 10% to 20% of each other, you can probably go with whichever one you like. But if you find a large gap between the initial estimates from those different methods, you should be very careful when making a commitment to your company executives. The variation in the estimates tells you that you're handling a project with more unknowns and a lot of uncertainty. I believe we haven't forgotten the lesson of the cone of uncertainty.
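
The cross-check described above can be sketched in a few lines of Python. This is only an illustration: the method names and person-day figures are made up, and measuring the gap relative to the lowest estimate is my assumption about what "within 10% to 20%" means.

```python
def relative_spread(estimates):
    """Gap between the highest and lowest estimate, relative to the lowest."""
    values = list(estimates.values())
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest


def estimates_agree(estimates, tolerance=0.20):
    """True when all estimates fall within the tolerance (20% by default)."""
    return relative_spread(estimates) <= tolerance


# Hypothetical initial estimates in person-days, one per estimation method.
estimates = {
    "Use Case Points": 400,
    "User Story Points": 440,
    "Delphi": 460,
}

spread = relative_spread(estimates)
print(f"spread: {spread:.0%}, agree: {estimates_agree(estimates)}")
```

A spread well above the tolerance is the signal to be careful before committing to a number.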

In the final post of this software estimation series, I will cover the communication side of software estimation and some industry best practices.

Sunday, June 22, 2014

Fighting estimation fatigue - Part 3: the cone of uncertainty

A software project is all about unknowns. At the beginning of a software project, the project charter takes a very simplistic view of the final product and tries to estimate the dollar amount (because, as we learned, the project sponsor always looks at the bottom line, which is the dollar cost of the project). At this stage, the large number of unknowns creates a large uncertainty in the estimate. But as the project moves deeper into planning and implementation, more unknowns become known, and the uncertainty in the estimate becomes smaller than in the previous stage. This phenomenon is described by the concept of “The Cone of Uncertainty”, originally used in the chemical industry by the founders of the American Association of Cost Engineers (now AACE International), which gained wide popularity after it was published in Steve McConnell’s famous book “Software Estimation: Demystifying the Black Art”.

Figure: The Cone of Uncertainty from www.agilenutshell.com

According to the above graph, it’s evident that the later in the project life cycle, the better the estimate. But the ‘catch-22’ of this reality is that no one gets to move even one stage closer to certainty unless an initial estimate (which is bound to be inaccurate due to the high error margin) is given at project inception.

So, can this cone be beaten? As Steve McConnell puts it: “Consider the effect of the Cone of Uncertainty on the accuracy of your estimate. Your estimate cannot have more accuracy than is possible at your project’s current position within the Cone. Another important – and difficult – concept is that the Cone of Uncertainty represents the best-case accuracy that is possible to have in software estimates at different points in a project. The Cone represents the error in estimates created by skilled estimators. It’s easily possible to do worse. It isn’t possible to be more accurate; it’s only possible to be more lucky”. So the only option left is to live with it. Below are some techniques for dealing with that reality:

Be honest and upfront about the reality. Though it may not be taken as a positive gesture initially, be truthful about the risk of estimating with the expectation of high accuracy. If the project sponsors can be convinced of the reality of software projects (probably by showing some of the history of past software projects within that organization), padding the final numbers may give everyone sufficient wiggle room.
Attach the variability to the estimate when presenting it to the project sponsors. There’s absolutely no benefit to anyone on the project in surprising the stakeholders.
Actively work on resolving the uncertainty by making the unknowns known. It is the responsibility of the estimation team to force the cone of uncertainty to narrow. Without active and conscious action, the cone won't narrow on its own, as the picture might suggest, simply with the progression of the project's life cycle.

Let's take a postmortem look at why we have this cone of uncertainty in our software projects. The single biggest reason for the uncertainty is that an estimate is actually a prediction (or forecast) of the capabilities of software developers. Unfortunately, human behavior is by nature unpredictable. The same person can behave differently depending on the presence or absence of surrounding factors. A programmer may come up with the solution to a complex programming problem in a few minutes, whereas the same person may struggle to resolve a less complex problem at another time. So the entire game of prediction is bound to fall apart when it tries to predict something as unpredictable as human psychology. The strategy, then, shouldn't be to try to hit the bull's-eye with a single estimated value, but rather to maximize the chance of coming close to the actual through techniques like range values, plus-minus factors, confidence levels, etc. That's why it is sometimes said that we don't have failed projects, just failed estimations.
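
To make the "range value" idea concrete, here is a minimal sketch that turns a single-point estimate into a low/high range. The multipliers are the ones commonly quoted for the cone of uncertainty in McConnell's book; I am reproducing them from memory as an illustration, so check them against the book before relying on them. The function and phase names are my own.

```python
# Low/high multipliers per project phase, as commonly quoted for the
# cone of uncertainty (illustrative values; verify against the source).
CONE = {
    "initial concept": (0.25, 4.0),
    "approved product definition": (0.5, 2.0),
    "requirements complete": (0.67, 1.5),
    "user interface design complete": (0.8, 1.25),
    "detailed design complete": (0.9, 1.10),
}


def estimate_range(point_estimate, phase):
    """Return the (low, high) range around a single-point estimate."""
    low, high = CONE[phase]
    return point_estimate * low, point_estimate * high


# Hypothetical 100 person-day estimate given at project inception.
low, high = estimate_range(100, "initial concept")
print(f"Commit to a range of {low:.0f} to {high:.0f} person-days, not to 100")
```

Presenting the range instead of the single number keeps the variability attached to the estimate as the project moves through the phases.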

Though we live with this cone of uncertainty in every software project, somehow we have been oblivious to it and pretended it didn't exist. Anyway, I hope we won't from now on. In my next posts, I will talk about a model selection framework that helps identify a standard estimation model, and then finally I will provide some helpful tips and techniques on how to better communicate your estimates with confidence, along with industry best practices.

Friday, June 13, 2014

Fighting estimation fatigue - Part 2: Introduction to standard estimation models and methods

Let's start with the basics of estimation and then I will touch upon some of the popular and useful estimation models. This will give a solid foundation for the upcoming posts of this Software Estimation series.

A Google search gives the meaning of “estimation” as “a rough calculation of the value, number, quantity, or extent of something”. Even though “estimation” is by definition a rough calculation, most of us, when we hear the word in software development, internally treat it as an “actual” and a commitment. Software estimation is often considered a black art and is mostly neglected during the inception of an Information Technology project; nonetheless, it is almost always used to formulate the two most important attributes of a project, i.e. the cost and the deadline, and often with a rigid expectation of a high level of accuracy.

Software estimation is a widely, and in some cases overly, discussed topic in the field of computer science and software engineering. People were looking for a panacea where one can predict the schedule.... A wide range of models has been developed since the middle of the twentieth century to solve the problem of sizing software before it is built. Some of them were very effective in their time with a certain programming practice, whereas the usefulness of others has transcended the boundaries of technology and programming practices. Here are some of the most popular and widely used software cost estimation models and metrics:

Line of Code (LOC/SLOC)

Though Line of Code, or Source Line of Code (SLOC), is not a software estimation model in itself, it is the metric most widely used by software estimation models to determine size, e.g. COCOMO, SLIM, SEER-SEM, PRICE-S. Moreover, most of the ad-hoc estimation models used in software houses, and they are the majority in the software estimation landscape, use this metric as an input to estimate the software development effort. LOC is popular not because it gives an accurate picture of the size of the software but because of its simple, direct connection to the result of the programming work that goes into a software product. The primary advantage of SLOC is that the parties involved in a project readily agree to use it for software sizing since, in reality, source code is the apparent building block of software. Despite its sheer popularity, the use of LOC in software estimation is the biggest contributor to estimation inaccuracy. The reasons behind this inaccuracy are:

There’s no standard or single agreed-upon method for counting Source Lines of Code
SLOC is language dependent; changing the programming language immediately impacts it
It is inherently impractical to predict SLOC from a scope document that contains only very high-level requirements
SLOC takes a psychological toll, as it may incentivize bad coding practices that inflate the line count
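
The first point is easy to demonstrate. The sketch below counts the same tiny snippet under two conventions that I picked purely for illustration (every physical line versus only non-blank, non-comment lines); neither is an official standard, and real counting tools differ in further ways.

```python
# A four-line sample: one comment, one blank line, two lines of code.
SAMPLE = '''\
# compute total
def total(xs):

    return sum(xs)  # built-in
'''


def physical_lines(source):
    """Count every line, including blanks and comments."""
    return len(source.splitlines())


def code_lines(source):
    """Count only lines that are neither blank nor pure comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


print(physical_lines(SAMPLE), code_lines(SAMPLE))
```

The same source yields two different sizes depending on the convention, so any SLOC-based estimate first needs an agreement on how lines are counted.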

COCOMO

COCOMO, which stands for COnstructive COst MOdel, is one of the early-generation software cost estimation models and enjoyed its early popularity in the nineteen eighties. It uses Line of Code as the basis of estimation and suffers from all the shortcomings of SLOC that I mentioned above. I believe COCOMO is not very effective in this age of software development, where Object Oriented Programming rules the world and LOC doesn’t make a whole lot of sense for effort estimation in OOP. If you’re still interested (after all my wrath on COCOMO), you can visit http://en.wikipedia.org/wiki/COCOMO and http://www.softstarsystems.com/overview.htm to learn more.
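
For a feel of how LOC drives the result, here is a minimal sketch of the Basic COCOMO equations (effort = a × KLOC^b, duration = c × effort^d) with the coefficients usually quoted for the original 1981 model. The function name and the 32 KLOC input are my own, and the later intermediate and COCOMO II variants add cost drivers that this sketch ignores.

```python
# (a, b, c, d) per project mode, as usually quoted for Basic COCOMO.
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc, mode="organic"):
    """Return (effort in person-months, duration in months)."""
    a, b, c, d = BASIC_COCOMO[mode]
    effort = a * kloc ** b
    duration = c * effort ** d
    return effort, duration


# Hypothetical 32 KLOC project; the size itself is the guess that
# everything else depends on.
effort, duration = basic_cocomo(32, "organic")
print(f"{effort:.0f} person-months over {duration:.0f} months")
```

Because the exponent is greater than one, any error in the guessed KLOC is amplified in the effort figure, which is exactly the weakness described above.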

Function Point Analysis

The obvious limitations of guessing the LOC of to-be-built software paved the way for another popular model, somewhat more realistic during the early phase of a software project, which is called Function Point Analysis (FPA). In FPA, the software size is measured through a construct termed ‘Function Points’ (FP). Function points allow the measurement of software size in standard units, independent of the underlying language in which the software is developed. Instead of counting the lines of code that make up a system, you count the number of externals (inputs, outputs, inquiries, and interfaces) that make up the system.

There are five types of externals to count: external inputs, external outputs, external inquiries, external interfaces, and internal data files. The Value Adjustment Multiplier (VAM) formula below is used to obtain the function point count:

VAM = 0.65 + 0.01 × (sum of Vi for i = 1 to 14), and FP = unadjusted function point count × VAM

where Vi is a rating from 0 to 5 for each of the fourteen predefined factors.
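
A worked example may help here. The sketch below assumes the standard form of the adjustment, VAM = 0.65 + 0.01 × (sum of the fourteen ratings), and weights every external at the "average" complexity level usually quoted for function points. The counts and ratings are hypothetical, and a real count would classify each external as low, average or high.

```python
# Average-complexity weights for the five types of externals
# (assumption: every external is of average complexity).
AVERAGE_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_files": 10,
    "external_interfaces": 7,
}


def unadjusted_function_points(counts):
    """Weighted sum of the counted externals."""
    return sum(AVERAGE_WEIGHTS[kind] * n for kind, n in counts.items())


def value_adjustment_multiplier(ratings):
    """VAM = 0.65 + 0.01 * sum(Vi) for fourteen ratings of 0 to 5."""
    if len(ratings) != 14 or any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("expected fourteen ratings between 0 and 5")
    return 0.65 + 0.01 * sum(ratings)


def function_points(counts, ratings):
    return unadjusted_function_points(counts) * value_adjustment_multiplier(ratings)


# Hypothetical system: counts of each external and a rating of 3 for
# every one of the fourteen factors.
counts = {
    "external_inputs": 10,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_files": 3,
    "external_interfaces": 2,
}
ratings = [3] * 14
print(round(function_points(counts, ratings), 2))
```

Since the ratings range from 0 to 5, the multiplier can only move the unadjusted count between 0.65 and 1.35 times its value.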

The primary advantage of the Function Point Analysis model is that it measures the size of the solution rather than the problem, and it is extremely useful for transaction processing systems (e.g. MIS applications). Moreover, function points can be derived directly from the requirements and are easily understood by non-technical users. However, FPA does not provide an accurate estimate when dealing with command and control software, switching software, systems software or embedded systems. Moreover, FPA isn’t very effective in Object-Oriented software development that uses Use Cases, and converting Use Cases into Function Points may be counterintuitive.

Use Case Points Method (UUCPM)

Similar in concept to function points, the Use Case Points Method, first described by Gustav Karner, uses use cases as the basic notation for representing functionality; use case points, like function points, measure the size of the system. Once we know the approximate size of a system, we can derive an expected duration for the project if we also know (or can estimate) the team’s rate of progress. In this approach, the first step is to calculate a measure called the unadjusted use case point (UUCP) count. A technical adjustment factor is then applied to the UUCP count, as in FPA, although the factors themselves have been changed to reflect the different methodology that underpins development with use cases. In addition, the UUCPM defines a set of environmental (project) factors that contribute to a weighting factor that is also used to modify the UUCP measure. The four important formulas used in UCP are for Unadjusted Use Case Points (UUCP), Technical Complexity Factor (TCF), Environment Factor (EF) and Use Case Points (UCP):

UUCP = unadjusted actor weight + unadjusted use case weight
TCF = 0.6 + 0.01 × (weighted sum of the technical factor ratings)
EF = 1.4 - 0.03 × (weighted sum of the environmental factor ratings)
UCP = UUCP × TCF × EF

Once the number of UCP has been determined, an effort estimate can be calculated by multiplying the number of UCP by a fixed number of hours.
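
Here is a minimal sketch of the calculation, assuming Karner's usual formulation: actors weighted 1, 2 or 3 and use cases weighted 5, 10 or 15 by complexity, TCF = 0.6 + 0.01 × TFactor, EF = 1.4 - 0.03 × EFactor, and 20 hours per UCP as the fixed conversion. The function names and the sample project are hypothetical, and the weighted factor sums are passed in directly to keep the example short.

```python
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}


def weighted_count(counts, weights):
    """Sum of count * weight over the complexity classes."""
    return sum(weights[kind] * n for kind, n in counts.items())


def use_case_points(actors, use_cases, technical_factor, environment_factor):
    """UCP = UUCP * TCF * EF, using Karner's commonly quoted constants."""
    uucp = weighted_count(actors, ACTOR_WEIGHTS) + weighted_count(
        use_cases, USE_CASE_WEIGHTS
    )
    tcf = 0.6 + 0.01 * technical_factor
    ef = 1.4 - 0.03 * environment_factor
    return uucp * tcf * ef


def effort_hours(ucp, hours_per_ucp=20):
    """Convert UCP to effort with a fixed number of hours per point."""
    return ucp * hours_per_ucp


# Hypothetical project: 5 actors, 10 use cases, and already-weighted
# technical and environmental factor sums of 40 and 10.
ucp = use_case_points(
    actors={"simple": 2, "average": 2, "complex": 1},
    use_cases={"simple": 3, "average": 5, "complex": 2},
    technical_factor=40,
    environment_factor=10,
)
print(f"{ucp:.1f} UCP, about {effort_hours(ucp):.0f} hours")
```

The hours-per-UCP figure is the part most worth calibrating against your own past projects.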

The primary advantage of the UCP model is that it can be automated, saving the team a great deal of estimating time. Of course, there’s the counterargument that an estimate is only as good as the effort put into it. Additionally, use case points are a very pure measure of size, as they allow estimating size to be separated from deriving duration. Moreover, by establishing an average implementation time per UCP, it becomes possible to forecast future schedules. On the contrary, the fundamental problem with UCP is that the estimate cannot be arrived at until all of the use cases are written. While use case points may work well for creating a rough, initial estimate of overall project size, they are much less useful in driving the iteration-to-iteration work of a team. A related issue is that the rules for determining what constitutes a transaction are imprecise, and since the level of detail in a use case varies tremendously with its author, the approach is flawed. Moreover, a few of the technical factors do not really have an impact across the overall project, yet the way they are multiplied by their weights does affect the overall size. The installation ease factor is one such example. Finally, like most other estimation models, use case points do not fit well into the agile development methodology.

User Story Points (USP)

User stories, which help shift the focus from writing about requirements to talking about them, are the foundational building block of Agile development. Every Agile user story includes a written sentence or two and, more importantly, a series of conversations about the desired functionality from the perspective of the user of the system. In the Agile world, software size is estimated in user story points, a unit of measure for expressing the overall size of a user story, feature, or other piece of work. A story point tells us how big a story is relative to others, in terms of either size or complexity, using a relative sizing technique. Popular sizing scales include the Fibonacci series (1, 2, 3, 5, 8, 13, 21, etc.) and T-shirt sizes (S, M, L, XL, XXL, etc.). The size of each user story is then estimated with the entire team, usually in a planning poker session.

The major advantage of User Story Points is that it is relatively easy and fun to estimate the relative size of a user story. Also, the estimated size of the product is the outcome of a team consensus, so the team owns the estimate, which has a psychological influence toward achieving higher throughput. On the contrary, benchmarking size is challenging, as the story points assigned by one team cannot be compared with another team’s USP. Some people may also find it hard to get from USP to duration, as there’s practically no direct relationship between user story points and person-hours. If the team is not diverse enough to balance out sizing skewed by the bias of a particular group, the sizing could eventually prove useless. There is also a subtle risk of inflated sizing by the development team if management has an unrealistic expectation of seeing higher velocity as proof of productivity.
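
Since story points don't map to person-hours, one common way to get to a duration is through the team's own observed velocity (points completed per sprint). The sketch below shows that idea only; the function name, the backlog size and the velocities are hypothetical, and a plain average is just the simplest possible forecast.

```python
import math


def sprints_needed(backlog_points, past_velocities):
    """Forecast the sprint count from the average observed velocity."""
    velocity = sum(past_velocities) / len(past_velocities)
    return math.ceil(backlog_points / velocity)


# Hypothetical backlog of 120 points and three completed sprints.
print(sprints_needed(120, [18, 22, 20]))
```

The forecast is only valid for the team whose velocities went in, which is the same reason story points can't be benchmarked across teams.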

Delphi Method

The Delphi Method is an information gathering technique that was created in the 1950s by the RAND Corporation. It is based on surveys and makes use of the knowledge of the participants, who are mainly experts. Under this method of software estimation, the project specification is given to a few experts and their opinions are taken. The steps taken to get the estimate are: (i) selecting the experts, (ii) briefing the experts about the project, the objective of the estimation and the overall project scope, with clarifications, (iii) collating the estimates (software size and development effort) received from the experts, and finally (iv) converging the estimates and finalizing them.
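
Steps (iii) and (iv) can be sketched as a simple loop over estimation rounds. Using the median as the collated value and a 15% spread as the stopping rule are my assumptions for illustration; the method itself doesn't prescribe either, and the expert estimates below are made up.

```python
from statistics import median


def summarize_round(estimates):
    """Collate one round: the median and the spread relative to it."""
    mid = median(estimates)
    spread = (max(estimates) - min(estimates)) / mid
    return mid, spread


def has_converged(estimates, tolerance=0.15):
    """True once the experts' estimates sit within the tolerance."""
    _, spread = summarize_round(estimates)
    return spread <= tolerance


# Hypothetical effort estimates (person-days) from four experts over
# three rounds; the summary is shared with the experts between rounds.
rounds = [
    [300, 520, 410, 650],
    [380, 450, 420, 480],
    [410, 430, 420, 440],
]

for number, estimates in enumerate(rounds, start=1):
    mid, spread = summarize_round(estimates)
    print(f"round {number}: median {mid}, spread {spread:.0%}")
    if has_converged(estimates):
        break
```

The median is used rather than the mean so that a single outlying expert doesn't drag the collated value.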

The major advantages of the Delphi technique are that it’s very simple to administer and that an estimate can be derived relatively quickly. It is also useful when the organization does not have any in-house experts with the domain knowledge to come up with a quick estimate. On the contrary, the disadvantages primarily come from selecting the wrong experts and from the difficulty of getting an adequate number of experts willing to participate in the estimation. Moreover, it is not possible to determine the causes of variance between the estimated and the actual values.

Heuristic Method

Heuristic methods of estimation are essentially based on the experience that exists within a particular organization, where past projects are used to estimate the resources required to deliver future projects. A convenient sub-classification is to divide heuristic methods into ‘top-down’ and ‘bottom-up’ approaches. These approaches are the de facto methods by which estimates are produced, and as such they are implicated in the poor reflection of the actual resources required, as evidenced by how often projects fail to be estimated accurately. Top-down approaches to effort estimation may rely on the opinion of an expert, whereas bottom-up estimation is the process by which the time needed to code each identified module is estimated based on the discrete tasks that must be performed, such as analysis, design and project management.

Despite its simplistic approach, the error margin of the heuristic method has not been proven to be worse than that of parametric or algorithmic estimation models (COCOMO, UUCP, etc.). It doesn’t come free of risk either. Lack of access to historical data will cause a high error margin on future projects, and the underlying assumption that the organization's success is repeatable could prove deadly. Moreover, if the estimation is not conducted systematically, tasks such as integration, quality and configuration could be overlooked.

There is another category that people usually don't talk about much: home-grown estimation models. Those models are created by the team members of a software project who have been working on it for quite a long time, and they work pretty well for their projects. As those kinds of estimation models typically aren't standardized for use on software projects outside of that group, they are usually not made public. To get a flavor of that kind of model, you can check out the JEE Software Development Estimation post, where I have posted one such model that I used in 2008 on one of my software projects.

In my next post of this series, I will cover the famous "Cone of Uncertainty" in software project management as well as the influence of human psychology on software estimation. Stay tuned!
