Thursday, November 16, 2023

From Artificial Neural Network (ANN) to Generative Pre-trained Transformer (GPT): Evolution of Large Language Model

!! This article was generated by ChatGPT (GPT-4) !!

Tracing the evolution of AI from Artificial Neural Networks (ANNs) to models like GPT involves covering a vast landscape of developments in machine learning and neural network architectures. Each step in this evolution brought new capabilities and understanding. Here's an overview with a technical summary of each key development.

1. Artificial Neural Networks (ANNs)

  • Technical Summary: ANNs consist of layers of interconnected nodes (neurons) that simulate the way biological brains process information. Each connection has a weight, which is adjusted during training to minimize the difference between the network's predicted output and the desired output.

  • Evolution: Early ANNs such as the perceptron were limited to linearly separable tasks. The advent of multi-layer networks and the backpropagation algorithm in the 1980s allowed ANNs to learn from complex data, laying the foundation for modern deep learning.
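
As a rough illustration of these ideas, here is a minimal sketch of a two-layer network trained with backpropagation on the XOR problem, a task a single-layer perceptron cannot solve. It uses NumPy, and the hidden-layer size, learning rate, and iteration count are arbitrary choices for this toy example rather than anything prescribed by the models discussed above.

    import numpy as np

    # Toy dataset: XOR, which is not linearly separable.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 1, (2, 4))   # input -> hidden weights
    b1 = np.zeros(4)
    W2 = rng.normal(0, 1, (4, 1))   # hidden -> output weights
    b2 = np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for step in range(10000):
        # Forward pass
        h = sigmoid(X @ W1 + b1)             # hidden activations
        p = sigmoid(h @ W2 + b2)             # predicted outputs

        # Backward pass: gradients of mean squared error via the chain rule
        d_out = (p - y) * p * (1 - p)
        d_hid = (d_out @ W2.T) * h * (1 - h)

        # Gradient-descent updates of every weight and bias
        W2 -= lr * h.T @ d_out / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X)
        b1 -= lr * d_hid.mean(axis=0)

    print(p.round(2))   # typically close to [[0], [1], [1], [0]] after training

The backward pass is the essential trick: the output error is pushed back through the chain rule to assign blame to every weight, which is exactly what the 1980s backpropagation work made practical for multi-layer networks.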

2. Deep Learning and Convolutional Neural Networks (CNNs)

  • Technical Summary: Deep learning involves ANNs with multiple layers (deep networks) for feature extraction and transformation. CNNs, a class of deep neural networks, are specifically designed for processing data with a grid-like topology (e.g., images). They use convolutional layers to filter inputs for useful information.

  • Evolution: CNNs, exemplified by models like AlexNet, significantly advanced fields like image and video recognition, enabling systems to identify and classify content within images with high accuracy.
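
To make the idea of a convolutional layer "filtering inputs for useful information" concrete, here is a small sketch of a single 2D convolution (technically cross-correlation, as implemented in most CNN libraries) applied with a hand-crafted vertical-edge kernel. The 6x6 image and the kernel values are made up for illustration; in a real CNN the kernel weights are learned during training.

    import numpy as np

    def conv2d(image, kernel):
        # "Valid" 2D convolution (really cross-correlation, as used in CNNs).
        kh, kw = kernel.shape
        ih, iw = image.shape
        out = np.zeros((ih - kh + 1, iw - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    # A tiny 6x6 "image": dark on the left, bright on the right.
    image = np.zeros((6, 6))
    image[:, 3:] = 1.0
    # Hand-crafted vertical-edge detector; a CNN would learn such kernels.
    edge_kernel = np.array([[1.0, 0.0, -1.0],
                            [1.0, 0.0, -1.0],
                            [1.0, 0.0, -1.0]])
    feature_map = conv2d(image, edge_kernel)
    print(feature_map)   # large-magnitude responses along the vertical edge

A CNN such as AlexNet stacks many such learned filters, interleaved with pooling and nonlinearities, so that early layers respond to edges and textures while later layers respond to increasingly abstract shapes.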

3. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) Networks

  • Technical Summary: RNNs process sequences of data by maintaining a 'memory' of previous inputs in their internal state. LSTMs, an advanced RNN architecture, address the vanishing gradient problem of standard RNNs, allowing them to learn long-term dependencies.

  • Evolution: LSTMs improved the performance of models on sequential data, particularly in language processing tasks like translation and speech recognition.
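
The following sketch shows the core idea of recurrence: the same weights are applied at every time step, and a hidden state carries a summary of everything seen so far. The dimensions and random weights are placeholders; a trained RNN would learn them. In an LSTM, this single tanh update is replaced by input, forget, and output gates that control what enters, stays in, and leaves the cell state, which is what mitigates the vanishing-gradient problem.

    import numpy as np

    # One recurrent step: the hidden state carries a "memory" of earlier inputs.
    def rnn_step(x_t, h_prev, Wx, Wh, b):
        return np.tanh(x_t @ Wx + h_prev @ Wh + b)

    rng = np.random.default_rng(0)
    input_dim, hidden_dim, seq_len = 3, 5, 4
    Wx = rng.normal(0, 0.5, (input_dim, hidden_dim))
    Wh = rng.normal(0, 0.5, (hidden_dim, hidden_dim))
    b = np.zeros(hidden_dim)

    h = np.zeros(hidden_dim)                 # the memory starts out empty
    sequence = rng.normal(0, 1, (seq_len, input_dim))
    for x_t in sequence:
        h = rnn_step(x_t, h, Wx, Wh, b)      # the same weights are reused at every step
    print(h)                                 # final state summarizes the whole sequence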

4. Transformer Models and the Attention Mechanism

  • Technical Summary: Transformers, introduced in the "Attention Is All You Need" paper, use an attention mechanism to weigh the influence of different parts of the input data. Unlike RNNs, they process data in parallel, significantly improving efficiency.

  • Evolution: The Transformer model, through architectures like BERT and GPT, revolutionized NLP, allowing for more sophisticated understanding and generation of human language.
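
Below is a compact sketch of the scaled dot-product attention described in "Attention Is All You Need": each token's query is compared against every token's key, the scores pass through a softmax, and the resulting weights mix the value vectors. The toy sequence length, model width, and random projection matrices are illustrative only; a real Transformer adds multiple heads, feed-forward layers, residual connections, and layer normalization.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
        return weights @ V, weights

    rng = np.random.default_rng(0)
    seq_len, d_model = 4, 8                                  # toy sizes
    x = rng.normal(0, 1, (seq_len, d_model))                 # token embeddings
    Wq, Wk, Wv = (rng.normal(0, 0.3, (d_model, d_model)) for _ in range(3))
    out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
    print(attn.round(2))   # row i shows how much token i attends to every token

Because every token attends to every other token in one matrix multiplication, the whole sequence is processed in parallel instead of step by step as in an RNN.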

5. Generative Pre-trained Transformer (GPT) Series

  • Technical Summary: GPT models are large-scale transformer-based models pre-trained on vast amounts of text data. They use unsupervised learning to generate human-like text and can be fine-tuned for specific tasks. Each successive version of GPT has increased in model size and complexity, enhancing its capability.

  • Evolution: GPT models, especially GPT-3, demonstrated breakthroughs in generating coherent and contextually relevant text, answering questions, and even coding, showcasing the immense potential of transformer-based architectures in AI.
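
For a hands-on feel of autoregressive generation, the snippet below uses the Hugging Face transformers library (not mentioned in the article, so treat it as one possible tool) with the openly available GPT-2 checkpoint; GPT-3 and later models are served through an API rather than distributed as weights, but the predict-next-token-and-append loop is the same idea.

    # Requires: pip install transformers torch
    from transformers import pipeline

    # GPT-2 is used only because its weights are openly downloadable; the
    # predict-next-token-and-append loop is the same idea behind larger GPT models.
    generator = pipeline("text-generation", model="gpt2")
    result = generator("Artificial neural networks evolved into",
                       max_new_tokens=30, num_return_sequences=1)
    print(result[0]["generated_text"])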

6. Beyond GPT: Emerging Trends and Future Directions

  • Technical Summary: Post-GPT developments are focusing on enhancing efficiency, context-awareness, and ethical considerations in AI. This includes exploring smaller, more efficient models, multi-modal learning (combining text, images, and other data types), and addressing issues like bias and fairness in AI.

  • Evolution: The future of AI promises advancements in AI ethics, explainability, and integration into various aspects of human life, marking a shift towards more responsible and versatile AI applications.

Conclusion

From simple ANNs to the sophisticated GPT models, AI has undergone remarkable transformations. Each stage of evolution has built upon the last, progressively enhancing the capabilities and applications of AI. As we move forward, the focus is shifting towards more efficient, ethical, and contextually intelligent AI systems, promising a new era of innovation and integration into everyday life.

Tuesday, October 17, 2017

What is Design Thinking

A hypothetical conversation is taking place in a conference room between a software engineer and a business user.

"I need to have the development team available and ready twenty-four by seven during the filing period. This is Fed mandated SLA and we would have to react within 4 to 24 hours. If any approval is needed to do immediate deployment to the production, secure the necessary management approval upfront.", said the business user.

"Do you really need the development and production support team to seat at their desk and waiting to jump in to reintegrate the financial models into the production environment? What problem you are trying to solve here? Are you looking for a way to have the changed models reintegrated into the production environment within a short period period of time to meet the stringent Fed mandates SLA?" The software engineer replied with an empathic voice. Further adding to it by proposing a potential solution to that problem, "How about we provide you a self-service capability to reintegrate the models into the production system? You can do that anytime you want it and any number of times you need it."

"That sounds interesting but I don't want anyone to change the production system anytime without a proper approval", the business user reacted in a receptive tone.

"I don't want that either", the Software Engineering Manager is now chipping into the conversation, "We can enforce four-eyes check but let's talk more about the detail before we jump into the final solution", and has steered the discussion towards finding the right solution.

Though this may be a hypothetical conversation, you have certainly seen similar conversations in which a business user approaches the software engineering or product development team with a "brilliant" IT solution to a business problem without even mentioning what business problem the user is trying to solve. The goal of the software engineering team should be to steer the conversation towards understanding the user's pain points, finding the fundamental problem, and only then proposing the right solution.

To me, this is the essence of Design Thinking.

Design Thinking is not the new guy in town, even though its reincarnation sounds just like that. I don't want to spend a whole lot of time on its historical aspects, but let's include just enough history to give some context.

Design Thinking as a concept came into existence in the late sixties, when Herbert A. Simon published his book "The Sciences of the Artificial". It moved into the mainstream through the establishment of Stanford University's Design School.

Before delving into the details of Design Thinking, let's first clarify: what is Design?

Design, though it may sound like the surface or outward appearance of a thing, is a concept furthest from that vain, outwardly look and feel. IBM Design Thinking defines Design as "The Intent behind the outcome". But the most intricate definition of Design came from the man who changed the way we perceive computer products, Steve Jobs, who once said in an interview that the reason he doesn't like Microsoft's products is that "it doesn't have the taste", and defined Design as "...the fundamental soul of a man-made creation that ends up expressing itself in successive outer layers". And Design Thinking is the art of creating that Design.

Now, let's take the words of two other highly influential people who have helped Design Thinking reach its current state. Don Norman, the author of "The Design of Everyday Things", has described Design Thinking as "...Designers resist the temptation to jump immediately to a solution for the stated problem. Instead, they first spend time determining what basic, fundamental (root) issue needs to be addressed. They don't try to search for a solution until they have determined the real problem, and even then, instead of solving that problem, they stop to consider a wide range of potential solutions. Only then will they finally converge upon their proposal. This process is called design thinking." And Tim Brown, the CEO of IDEO, has defined Design Thinking as "...a human-centered approach to innovation that draws from the designer's toolkit to integrate the needs of people, the possibilities of technology, and the requirements for business success."

In the second part of this post on Design Thinking, I will cover the method of Design Thinking, shed some light on IBM Design Thinking, and finally discuss how the Agile development methodology can coexist with Design Thinking.

Sunday, September 24, 2017

Micro blog: Designing computers the way our brain is designed


When we learn a new skill, such as playing the violin, driving, or swimming, a set of neurons is used to execute the instructions, and when we repeat the task, those neurons become, in a sense, hardwired to perform that job. That's why when we drive or walk, we don't really think consciously; our subconscious mind executes most of the tasks to get the job done. It's as if the task is hardwired into our brain's neurons.

How about we design our computer memory and processors' transistors to act similarly? That would make a computer much more efficient and faster at processing. It was not practical in the early age of computers due to the cost of memory and processing units. But as memory gets cheaper and cheaper, and microprocessors cram in double the number of transistors every eighteen months, the execution of a piece of software could now be allocated to a dedicated set of memory cells and processing units, and that same set could be reused whenever that particular function is executed. Something similar already happens in memory when a software program is loaded, but not by actually forming physical connectivity among the memory cells and the processor's transistors. This approach would require creating a kind of physical or pseudo-physical connectivity among those memory chips and processors. In this way, the hardware would behave like software in physical form, and there would be plenty of room for optimization to utilize the hardware efficiently.