LLM Showdown: GPT 3.5 Turbo vs. Llama 2 vs. Falcon

Akash

Engineering Lead

13 min read

Tags:

Generative-AI Falconllm MetaAI Llama2 gpt3.5turbo openAI SageMaker

Generative AI has been (and still is) doing a great job keeping its top spot on nearly everybody's tongues for the past few months. No wonder, considering Gen-AI-related news and updates are flowing nearly weekly, keeping the public warm and wondering what comes next.

Source code

OpenAI's GPT Models

As you're probably well aware, GPT-3.5 and GPT-4 belong to the family of large language models (LLMs) developed by OpenAI. They're both paid and available through OpenAI API.

GPT-3.5, released in March 2022, was primarily referred to as an updated version of GPT-3. With time, however, its creators started to call it, as we know it today. The most advanced (and, according to OpenAI, also the most cost-effective) of them, gpt-3.5-turbo, is the one that powers up the free version of ChatGPT.

GPT-4, the successor of GPT-3.5, was released in March 2023. It is the most recent and the most capable of OpenAI’s models but also the most powerful LLM out there (at least for now). Its main distinguisher is its ability to process not only text but also image inputs,

Meta's Llama 2 Models

Meta, in collaboration with Microsoft, recently unveiled Llama 2, a model that drew attention because of its open-source nature, making it free for both research and commercial use Llama 2 is currently available in three sizes — 7B, 13B, and 70B parameters, as well as in pretrained and fine-tuned variations.

To Discover more about Llama 2 model - Exploring Llama 2: From Installation to Interaction

Falcon LLM

Falcon LLM was Founded and built by the Technology Innovation Institute (TII), a company that is part of the Abu Dhabi Government's Advanced Technology Research Council.The architecture of Falcon has been shown to significantly outperform GPT-3 for only 75% of the training compute budget

Falcon is currently available in these sizes — 1.3B, 7.5 B, 40B and 180B parameters, as well as in pretrained and fine-tuned variations.

To Discover more about Falcon - Falcon LLM in Action: A Step-by-Step Tutorial

Let's evaluate these models across domains such as logical reasoning, mathematical problem-solving, ethical contemplation, code scrutiny and enhancement, suggestion validation, and content creation.

Topic	GPT 3.5 Turbo	Llama 2 7B	Falcon 7B
Logical Reasoning Proficiency
Jane's mother has four children. Their names are Spring, Summer, and Autumn. What is the fourth child's name?	GPT 3.5 Turbo - Pass The fourth child's name is Jane	Llama 2 7B - Pass The fourth child's name is Jane.	Falcon 7B - Fail The fourth child's name is Autumn because it is the season that comes after Spring, Summer, and Autumn.
What is the next number in the below series 4, 8, 21, 59, 146, ?	GPT 3.5 Turbo - Pass The next number in the series is 322.	Llama 2 7B - Fail The next number in the series is 282	Falcon 7B - Fail The next number is 145.
Leo, Dolly, and Tommy are related to each other. i. Among the three are Leo's legal spouse, Dolly's sibling, and Tommy's sister-in-law. ii. Leo's legal spouse and Dolly's sibling are of the same sex. Who do you know is a married man?	GPT 3.5 Turbo - Fail Leo is a married man.	Llama 2 7B - Fail we can deduce that Leo is a married man	Falcon 7B - Pass The answer is B. Dolly
Four cups are placed upturned on the counter. Each cup has the same number of sweets and a declaration about the number of sweets in it. The declaration are: Five or Six, Seven or Eight, Six or Seven, Seven or Five. Only one of the declaration is correct. How many sweets are there under each cup?	GPT 3.5 Turbo - Fail The exact distribution of sweets cannot be determined with the given information	Llama 2 7B - Fail the total number of sweets is: 5 + 7 + 6 + 7 = 25 sweets.	Falcon 7B - Fail The number of sweets under each cup is 5, 6, 7, and 8
A man has 53 socks in his drawer: 21 identical blue, 15 identical black and 17 identical red. The lights are out and he is completely in the dark. How many socks must he take out to make 100 percent certain he has at least one pair of black socks?	GPT 3.5 Turbo - Pass The man must take out 40 socks	Llama 2 7B - Pass The man must take out 40 socks	Falcon 7B - Fail He needs to take out 32 more socks to make 100 percent certain
On a train, S, R, and J are the fireman, brakeman, and the engineer, but may NOT be in order. Also aboard the train are three businessmen who have the same names: a Mr. S, a Mr. R, and a Mr. J. Using the clues below, can you determine the identity of the Engineer? 1. Mr. R lives in Dehradun. 2. The brakeman lives exactly halfway between Noida and Dehradun. 3. Mr. J earns exactly Rs. 4,00,000 per year. 4. The brakeman's nearest neighbor, one of the passengers, earns exactly three times as much as the brakeman. 5. S beats the fireman in badminton. 6. The passenger whose name is the same as the brakeman's lives in Noida.	GPT 3.5 Turbo - Pass From these clues, we can deduce that the Engineer is Mr. S.	Llama 2 7B - Pass we can deduce that the Engineer is Mr. S.	Falcon 7B - Fail The engineer is Mr. J.
Ethical Judgment
There is a runaway trolley barreling down the railway tracks. Ahead, on the tracks, there are five people tied up and unable to move. The trolley is headed straight for them. You are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a different set of tracks.However, you notice that there is one person on the side track. This one person is the person you love the most, such as your parents, siblings, spouse, or children. You have two (and only two) options: 1. Do nothing, in which case the trolley will kill the five people on the main track. 2. Pull the lever, diverting the trolley onto the side track where it will kill one person you love the most. Which is the more ethical option? What is the right thing to do?	GPT 3.5 Turbo - Pass The more ethical option in this scenario would be to pull the lever and divert the trolley onto the side track, even though it would result in the death of the person you love the most The reason behind this decision is based on the principle of utilitarianism, which states that the morally right action is the one that maximizes overall happiness or minimizes overall suffering. ...	Llama 2 7B - Pass In my opinion, the more ethical option is to do nothing and allow the trolley to continue on its course, resulting in the death of the five people on the main track I believe that the more ethical option is to allow the trolley to continue on its course and avoid harming any innocent people ...	Falcon 7B - Pass The more ethical option is to pull the lever. The reason behind this is that you are actively choosing to save five people at the cost of one person you love ...
You are a member of a gang and you have been arrested with another member of your gang. Both of you are confined away from each other and you have no way of communicating with the other member. The police does not have enough evidence to convict both of you with major charges and instead offer you and the other member a bargain. You have two choices: (a) You can remain silent (b) Or betray the other member and testify that he has committed the crime. Then there are three outcomes: (a) If you betray the other member and he remains silent, you will be set free and he will serve 3 years in prison. (But this also works vice versa. If you remain silent and he betrays you, you will serve 3 years in prison) (b) If you betray the other member, and he betrays you too, you both serve 2 years in prison (c) If you and the other member both remain silent, you both only serve one year in prison. What would you do?	GPT 3.5 Turbo - Pass In this situation, the best logical answer would be to remain silent.	Llama 2 7B - Pass In this situation, I believe that remaining silent and choosing option (c) is the most ethical and moral choice.	Falcon 7B - Fail I would betray the other member and testify that he has committed the crime
Math Competence
What is the answer to the below equation 10 * 4 - 2 * (4² ÷ 4) ÷ 2 ÷ 1/2 + 9	GPT 3.5 Turbo - Fail The answer to the equation is 35. … Therefore, the answer to the equation is 23.	Llama 2 7B - Fail the final answer is 28.	Falcon 7B - Fail Generated Text: -2
What is the answer to the below equation -10 ÷ (20 ÷ 2² * 5 ÷ 5) * 8 - 2	GPT 3.5 Turbo - Fail The answer to the equation is 6.	Llama 2 7B - Fail the answer to the equation is -6.	Falcon 7B - Fail The answer is -1.
Code Evaluation & Enhancement
You will be provided with a python code snippet - API developed using Django Rest Framework. Analyse and explain the code briefly and also provide a JSON response structure Code Link -	GPT 3.5 Turbo - Acceptable The `add_data_recursively` method is a recursive function that takes a dictionary `data` as input and adds additional data to it. The JSON response structure would be a list of dictionaries, where each dictionary represents a context. Each context dictionary would have the following keys: "uuid", "question", "answer", and "children". The "uuid" key represents the unique identifier of the context, the "question" key represents the question associated with the context, the "answer" key represents the answer associated with the context, and the "children" key represents a list of child contexts	Llama 2 7B - Acceptable 1. The class inherits from `APIView`, which is a built-in Django Rest Framework view class. 2. The `add_data_recursively` method is defined inside the class. This method takes a dictionary `data` as input and recursively adds more data to it. … { "None": [ { "uuid": "context_id_1", "question": "Context 1", "answer": "Answer 1" }, { "uuid": "context_id_2", "question": "Context 2", "answer": "Answer 2" }, ] }	Falcon 7B - Fail The provided code snippet is a Python code snippet that provides a list of contexts in the admin panel
You will be provided with a python code snippet - API developed using Django Rest Framework. Optimize the code and add relevant comments Code Link -	GPT 3.5 Turbo - Acceptable 1. Use list comprehension to simplify the code for adding root nodes to the data dictionary. 2. Use a dictionary comprehension to simplify the code for adding nodes to the data dictionary. 3. Use a recursive function to simplify the code for adding children nodes. …	Llama 2 7B - Acceptable 1. Use consistent naming conventions 2. Use descriptive variable names 3. Use type hints for function parameters: 4. Use a consistent indentation size: 5. Use comments to explain what the code is doing 6. Consider using a list comprehension: 7. Use `dict` instead of `dict` for data structure: 8. Consider using `OrderedDict` instead of `dict`: …	Falcon 7B - Fail The issue with the code is that the `None` value is being passed as a parameter to the `add_data_recursively` function, which is not defined in the code. To fix this, you can define the `None` value as a constant in the `add_data_recursively` function and pass it as a parameter to the `get` function. Here's the updated code: …
Recommendation Evaluation
You will be provided with a list of movies or TV series . Your task is to analyse the genre and suggest similar movies or TV series I like TV series like Tom Clancy's Jack Ryan, True Detective, Homeland. Suggest me similar TV series	GPT 3.5 Turbo - OK 1. 24 2. The Americans 3. The Blacklist 4. Bodyguard 5. Narcos 6. The Wire 7. Strike Back 8. The Bureau 9. The Bridge 10. The Night Manager	Llama 2 7B - BEST 1. Narcos 2. Peaky Blinders 3. The Night Manager 4. Killing Eve 5. The Americans 6. Better Call Saul 7. The Man in the High Castle 8. Ozark 9. The Sinner 10. Mindhunter	Falcon 7B - Fail Breaking Bad Game of Thrones The Crown Stranger Things The Handmaid's Tale The Office The Walking Dead Breaking Bad The Crown Stranger Things The Handmaid's Tale The Office Breaking Bad Stranger Things The Walking Dead Breaking Bad
Your task is to analyse the question and provide suggestions Provide suggestions on how a software engineer become a billionaire	GPT 3.5 Turbo - BEST 1. Develop a unique and innovative software product: 2. Build a strong technical skillset 3. Focus on scalability and market potential 4. Establish a strong network 5. Seek venture capital or angel investment 6. Monetize your software effectively 7. Expand globally 8. Consider strategic partnerships or acquisitions 9. Invest wisely 10. Stay motivated and persistent	Llama 2 7B - Acceptable 1. Develop a valuable skill set 2. Build a strong network 3. Start small and iterate 4. Focus on solving real-world problems 5. Be adaptable and open to change 6. Develop a strong business plan 7. Be willing to take calculated risks 8. Focus on customer satisfaction 9. Stay up to date with industry trends 10. Be persistent and resilient	Falcon 7B - OK - Build a strong brand - Learn about venture capital and angel investors - Build a strong network of contacts - Learn about design and user experience - Build a product that solves a problem - Learn about business and finance - Build a strong brand - Learn about venture capital and angel investors - Build a strong network of contacts - Learn about design and user experience
Content Creation Proficiency
Analyze the below poem, and generate a poem of software development in the same style as the provided poem. Additionally generate a name for the poem The Road Not Taken BY ROBERT FROST Two roads diverged in a yellow wood, And sorry I could not travel both And be one traveler, long I stood And looked down one as far as I could To where it bent in the undergrowth; Then took the other, as just as fair, And having perhaps the better claim, Because it was grassy and wanted wear; Though as for that the passing there Had worn them really about the same, And both that morning equally lay In leaves no step had trodden black. Oh, I kept the first for another day! Yet knowing how way leads on to way, I doubted if I should ever come back. I shall be telling this with a sigh Somewhere ages and ages hence: Two roads diverged in a wood, and I- -I took the one less traveled by, And that has made all the difference.	GPT 3.5 Turbo - BEST Title: The Code Unwritten Two paths emerged in a digital realm, And regretfully, I couldn't traverse both, As a lone developer, I pondered at the helm, Gazing down one, seeking its hidden growth. Then I chose the other, seemingly just, Claiming it had the superior plea, For it was unexplored, untouched by dust, Though in truth, both were equally free. ...	Llama 2 7B - OK The Code Not Written Two lines of code diverged in a virtual wood, And sorry I could not write both And be one programmer, long I stood And looked down one as far as I could To where it branched in the subroutine; Then took the other, as just as fair, And having perhaps the better claim, Because it was logical and wanted clarity; Though as for that the passing there Had debugged them really about the same, ...	Falcon 7B - UNACCEPTABLE A Software Developer's Lament The codebase grows in size, The bugs multiply in time; The days turn to weeks, and weeks to months, The months to years, and years to decades. The developer's eyes grow old, The lines of code grow longer; The headaches grow more frequent, The coffee stains grow thicker. The developer's hands grow tired, The codebase grows more complex; The days turn to weeks, and weeks to months, The months to years, and years to decades. ...
Generate a short, hilarious story that begins and ends in a joke based on the topic provided Topic - I died and got reincarnated as a table	GPT 3.5 Turbo - BEST Once upon a time, there was a man named John who had a rather peculiar sense of humor. He always loved cracking jokes and making people laugh. One day, tragedy struck, and John unexpectedly passed away. …	Llama 2 7B - Acceptable I died and got reincarnated as a table. I know, it sounds crazy, but it's true! I found myself transformed into a beautiful oak dining table, with intricate carvings and a sturdy base. At first, I was a bit confused, but then I realized - this is actually a great gig! …	Falcon 7B - UNACCEPTABLE I died and got reincarnated as a table. I was a bit confused at first, as I had no idea what my purpose was. I soon realized that I was no longer a human, but a table. … I taught myself how to walk, talk, and even eat. I was a bit clumsy at first, but I soon got the hang of it. …

Performance Analysis

To gauge the real-world application of these models, tests were conducted in areas such as logical reasoning, ethics, math, code analysis, and content suggestion. Here are the results:

	GPT 3.5 Turbo	Llama 2 7B	Falcon 7B
Parameters	Not officially released. But estimated to be 154 to 175 billion	7 billions	7 billions
Accessibility	Via API	Open source	Open source
Logical Reasoning Proficiency	67%	50%	17%
Ethical Judgment	90%	90%	70%
Math Competence	25%	25%	10%
Code Evaluation & Enhancement	80%	80%	10%
Recommendation Evaluation	90%	90%	50%
Content Creation Proficiency	90%	90%	20%

Conclusion

GPT 3.5 Turbo, with its formidable parameter count, has proven its prowess across multiple dimensions, particularly in logical reasoning, ethical considerations, and content suggestions. However, the real surprise is the performance of Llama 2 7B. Despite having only a fraction of the parameters compared to GPT 3.5 Turbo, it impressively managed to match its performance in several areas. This showcases the efficiency and optimization Meta brought into its model. Falcon 7B, though offering commendable accessibility, significantly lags in performance metrics.

In the ever-evolving landscape of Generative AI, this comparison underscores an important reality: It's not just about how vast the resources are, but how effectively they are harnessed. As we ponder over these results, it brings us to the reflection that "In the realm of AI, efficiency can sometimes rival sheer power."

***Disclaimer: The views and opinions expressed in this content are solely those of the author and do not necessarily reflect the official policy or position of any affiliated agency, organization, or company. These thoughts and views are personal and are meant for informational purposes only.

Back To Blogs

Akash Kumar Pavadashetti

Engineering Lead

Senior Full Stack Engineer with experience in designing architecture and schema of an application. Well versed with Python, Django, React and AWS services. A good communicator who takes ownership in what he does.

Python Scripting Django WxPython React Redux Javascript MySQL Redis Gunicorn