Akash
Engineering Lead
Generative AI has been (and still is) doing a great job keeping its top spot on nearly everybody's tongues for the past few months. No wonder, considering Gen-AI-related news and updates are flowing nearly weekly, keeping the public warm and wondering what comes next.
OpenAI's GPT Models
As you're probably well aware, GPT-3.5 and GPT-4 belong to the family of large language models (LLMs) developed by OpenAI. They're both paid and available through OpenAI API.
GPT-3.5, released in March 2022, was primarily referred to as an updated version of GPT-3. With time, however, its creators started to call it, as we know it today. The most advanced (and, according to OpenAI, also the most cost-effective) of them, gpt-3.5-turbo, is the one that powers up the free version of ChatGPT.
GPT-4, the successor of GPT-3.5, was released in March 2023. It is the most recent and the most capable of OpenAI’s models but also the most powerful LLM out there (at least for now). Its main distinguisher is its ability to process not only text but also image inputs,
Meta's Llama 2 Models
Meta, in collaboration with Microsoft, recently unveiled Llama 2, a model that drew attention because of its open-source nature, making it free for both research and commercial use Llama 2 is currently available in three sizes — 7B, 13B, and 70B parameters, as well as in pretrained and fine-tuned variations.
To Discover more about Llama 2 model - Exploring Llama 2: From Installation to Interaction
Falcon LLM
Falcon LLM was Founded and built by the Technology Innovation Institute (TII), a company that is part of the Abu Dhabi Government's Advanced Technology Research Council.The architecture of Falcon has been shown to significantly outperform GPT-3 for only 75% of the training compute budget
Falcon is currently available in these sizes — 1.3B, 7.5 B, 40B and 180B parameters, as well as in pretrained and fine-tuned variations.
To Discover more about Falcon - Falcon LLM in Action: A Step-by-Step Tutorial
Let's evaluate these models across domains such as logical reasoning, mathematical problem-solving, ethical contemplation, code scrutiny and enhancement, suggestion validation, and content creation.
Topic | GPT 3.5 Turbo | Llama 2 7B | Falcon 7B |
---|---|---|---|
Logical Reasoning Proficiency | |||
Jane's mother has four children. Their names are Spring, Summer, and Autumn. What is the fourth child's name? |
Pass The fourth child's name is Jane |
Pass The fourth child's name is Jane. |
Fail The fourth child's name is Autumn because it is the season that comes after Spring, Summer, and Autumn. |
What is the next number in the below series 4, 8, 21, 59, 146, ? |
Pass The next number in the series is 322. |
Fail The next number in the series is 282 |
Fail The next number is 145. |
Leo, Dolly, and Tommy are related to each other. i. Among the three are Leo's legal spouse, Dolly's sibling, and Tommy's sister-in-law. ii. Leo's legal spouse and Dolly's sibling are of the same
sex.
Who do you know is a married man? |
Fail Leo is a married man. |
Fail we can deduce that Leo is a married man |
Pass The answer is B. Dolly |
Four cups are placed upturned on the counter. Each cup has the same number of sweets and a
declaration about the number of sweets in it. The declaration are: Five or Six, Seven or Eight, Six or Seven,
Seven or Five. Only one of the declaration is correct.
How many sweets are there under each cup? |
Fail The exact distribution of sweets cannot be determined with the given information |
Fail the total number of sweets is: 5 + 7 + 6 + 7 = 25 sweets. |
Fail The number of sweets under each cup is 5, 6, 7, and 8 |
A man has 53 socks in his drawer: 21 identical blue, 15 identical black and 17 identical red. The lights are out and he is completely in
the dark. How many socks must he take out to make 100 percent certain he has at least one pair
of black socks?
|
Pass The man must take out 40 socks |
Pass The man must take out 40 socks |
Fail He needs to take out 32 more socks to make 100 percent certain |
On a train, S, R, and J are the fireman, brakeman, and the engineer, but may NOT be in order. Also aboard the train are three
businessmen who have the same names: a Mr. S, a Mr. R, and a Mr. J. Using the clues below, can
you determine the identity of the Engineer?
1. Mr. R lives in Dehradun.
2. The brakeman lives exactly halfway between Noida and Dehradun. 3. Mr. J earns exactly Rs. 4,00,000 per year. 4. The brakeman's nearest neighbor, one of the passengers, earns exactly three times as much as the brakeman. 5. S beats the fireman in badminton. 6. The passenger whose name is the same as the brakeman's lives in Noida. |
Pass From these clues, we can deduce that the Engineer is Mr. S. |
Pass we can deduce that the Engineer is Mr. S. |
Fail The engineer is Mr. J. |
Ethical Judgment | |||
There is a runaway trolley barreling down the railway tracks. Ahead, on the tracks, there are five people tied up and unable to
move. The trolley is headed straight for them. You are standing some distance off in the train
yard, next to a lever. If you pull this lever, the trolley will switch to a different set of
tracks.However, you notice that there is one person on the side track. This one person is the
person you love the most, such as your parents, siblings, spouse, or children. You have two (and
only two) options:
1. Do nothing, in which case the trolley will kill the five people on the main track.
2. Pull the lever, diverting the trolley onto the side track where it will kill one person you
love the most.
Which is the more ethical option? What is the right thing to do?
|
Pass The more ethical option in this scenario would be to pull the lever and divert the trolley onto the side track, even though it would result in the death of the person you love the most The reason behind this decision is based on the principle of
utilitarianism, which states that the morally right action is the one that maximizes overall happiness
or minimizes overall suffering.
...
|
Pass In my opinion, the more ethical option is to do nothing and allow the trolley to continue on its course, resulting in the death of the five people on the main track I believe that the more ethical option is to allow the trolley to
continue on its course and avoid harming any innocent people
...
|
Pass The more ethical option is to pull the lever. The reason behind this is that you are actively choosing to save five people at the cost of one person you love ... |
You are a member of a gang and you have been arrested with another member of your gang. Both of you are confined away from each
other and you have no way of communicating with the other member. The police does not have
enough evidence to convict both of you with major charges and instead offer you and the other
member a bargain. You have two choices:
(a) You can remain silent
(b) Or betray the other member and testify that he has committed the crime.
Then there are three outcomes:
(a) If you betray the other member and he remains silent, you will be set free and he will serve
3 years in prison. (But this also works vice versa. If you remain silent and he betrays you, you
will serve 3 years in prison)
(b) If you betray the other member, and he betrays you too, you both serve 2 years in prison
(c) If you and the other member both remain silent, you both only serve one year in prison.
What would you do?
|
Pass In this situation, the best logical answer would be to remain silent. |
Pass In this situation, I believe that remaining silent and choosing option (c) is the most ethical and moral choice. |
Fail I would betray the other member and testify that he has committed the crime |
Math Competence | |||
What is the answer to the below equation 10 * 4 - 2 * (4² ÷ 4) ÷ 2 ÷ 1/2 + 9 |
Fail The answer to the equation is 35. … Therefore, the answer to the equation is 23. |
Fail the final answer is 28. |
Fail Generated Text: -2 |
What is the answer to the below equation -10 ÷ (20 ÷ 2² * 5 ÷ 5) * 8 - 2 |
Fail The answer to the equation is 6. |
Fail the answer to the equation is -6. |
Fail The answer is -1. |
Code Evaluation & Enhancement | |||
You will be provided with a python code snippet - API developed using Django Rest Framework.
Analyse and explain the code briefly and also provide a JSON response structure Code Link - |
Acceptable The `add_data_recursively` method is a recursive function that takes a dictionary `data` as input and adds additional data to it.
The JSON response structure would be a list of dictionaries, where each dictionary represents a context.
Each context dictionary would have the following keys: "uuid", "question", "answer", and "children". The
"uuid" key represents the unique identifier of the context, the "question" key represents the question
associated with the context, the "answer" key represents the answer associated with the context, and the
"children" key represents a list of child contexts
|
Acceptable 1. The class inherits from `APIView`, which is a built-in Django Rest Framework view class.
2. The `add_data_recursively` method is defined inside the class. This method takes a dictionary `data`
as input and recursively adds more data to it.
…
{
"None": [
{
"uuid": "context_id_1",
"question": "Context 1",
"answer": "Answer 1"
},
{
"uuid": "context_id_2",
"question": "Context 2",
"answer": "Answer 2"
},
]
}
|
Fail The provided code snippet is a Python code snippet that provides a list of contexts in the admin panel |
You will be provided with a python code snippet - API developed using Django Rest Framework.
Optimize the code and add relevant comments Code Link - |
Acceptable 1. Use list comprehension to simplify the code for adding root nodes to the data dictionary. 2. Use a dictionary comprehension to simplify the
code for adding nodes to the data dictionary.
3. Use a recursive function to simplify the code for adding children nodes.
…
|
Acceptable 1. Use consistent naming conventions 2. Use
descriptive variable names
3. Use type hints for function parameters:
4. Use a consistent indentation size:
5. Use comments to explain what the code is doing
6. Consider using a list comprehension:
7. Use `dict` instead of `dict` for data structure:
8. Consider using `OrderedDict` instead of `dict`:
…
|
Fail The issue with the code is that the `None` value is being passed as a parameter to the `add_data_recursively` function, which is not
defined in the code. To fix this, you can define the `None` value as a constant in the
`add_data_recursively` function and pass it as a parameter to the `get` function. Here's the updated
code:
…
|
Recommendation Evaluation | |||
You will be provided with a list of movies or TV series . Your task is to analyse the genre and
suggest similar movies or TV series
I like TV series like Tom Clancy's Jack
Ryan, True Detective, Homeland. Suggest me similar TV series
|
OK 1. 24 2. The Americans 3. The Blacklist 4. Bodyguard 5. Narcos 6. The Wire 7. Strike Back 8. The Bureau 9. The Bridge 10. The Night Manager |
BEST 1. Narcos 2. Peaky Blinders 3. The Night Manager 4. Killing Eve 5. The Americans 6. Better Call Saul 7. The Man in the High Castle 8. Ozark 9. The Sinner 10. Mindhunter |
Fail Breaking Bad Game of Thrones The Crown Stranger Things The Handmaid's Tale The Office The Walking Dead Breaking Bad The Crown Stranger Things The Handmaid's Tale The Office Breaking Bad Stranger Things The Walking Dead Breaking Bad |
Your task is to analyse the question and provide suggestions Provide suggestions on how a software engineer become a billionaire |
BEST 1. Develop a unique and innovative software product: 2. Build a strong technical skillset 3. Focus on scalability and market potential 4. Establish a strong network 5. Seek venture capital or angel investment 6. Monetize your software effectively 7. Expand globally 8. Consider strategic partnerships or acquisitions 9. Invest wisely 10. Stay motivated and persistent |
Acceptable 1. Develop a valuable skill set 2. Build a strong network 3. Start small and iterate 4. Focus on solving real-world problems 5. Be adaptable and open to change 6. Develop a strong business plan 7. Be willing to take calculated risks 8. Focus on customer satisfaction 9. Stay up to date with industry trends 10. Be persistent and resilient |
OK - Build a strong brand - Learn about venture capital and angel investors - Build a strong network of contacts - Learn about design and user experience - Build a product that solves a problem - Learn about business and finance - Build a strong brand - Learn about venture capital and angel investors - Build a strong network of contacts - Learn about design and user experience |
Content Creation Proficiency | |||
Analyze the below poem, and generate a poem of software development in the same style as the
provided poem. Additionally generate a name for the poem The Road Not Taken BY ROBERT FROST
Two roads diverged in a yellow wood, And sorry I could not travel both And be one traveler, long I stood And looked down one as far as I could To where it bent in the undergrowth; Then took the other, as just as fair, And having perhaps the better claim, Because it was grassy and wanted wear; Though as for that the passing there Had worn them really about the same, And both that morning equally lay In leaves no step had trodden black. Oh, I kept the first for another day! Yet knowing how way leads on to way, I doubted if I should ever come back. I shall be telling this with a sigh Somewhere ages and ages hence: Two roads diverged in a wood, and I- -I took the one less traveled by, And that has made all the difference. |
BEST Title: The Code Unwritten Two paths emerged in a digital realm, And regretfully, I couldn't traverse both,
As a lone developer, I pondered at the helm,
Gazing down one, seeking its hidden growth.
Then I chose the other, seemingly just,
Claiming it had the superior plea,
For it was unexplored, untouched by dust,
Though in truth, both were equally free.
...
|
OK The Code Not Written Two lines of code diverged in a virtual wood, And sorry I could not write both And be one
programmer, long I stood
And looked down one as far as I could
To where it branched in the subroutine;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was logical and wanted clarity;
Though as for that the passing there
Had debugged them really about the same,
...
|
UNACCEPTABLE A Software Developer's Lament The codebase grows in size, The bugs multiply in time; The days turn to weeks, and weeks to months, The
months to years, and years to decades.
The developer's eyes grow old,
The lines of code grow longer;
The headaches grow more frequent,
The coffee stains grow thicker.
The developer's hands grow tired,
The codebase grows more complex;
The days turn to weeks, and weeks to months,
The months to years, and years to decades.
...
|
Generate a short, hilarious story that begins and ends in a joke based on the topic
provided Topic - I died and got reincarnated as a table |
BEST Once upon a time, there was a man named John who had a rather peculiar sense of humor. He always loved cracking jokes and making people laugh. One day, tragedy struck, and John unexpectedly passed away. … |
Acceptable I died and got reincarnated as a table. I know, it sounds crazy, but it's true! I found myself transformed into a beautiful oak dining table, with intricate carvings and a sturdy base. At first, I was a bit confused, but then I realized - this is actually a great gig! … |
UNACCEPTABLE I died and got reincarnated as a table. I was a bit confused at first, as I had no idea what my purpose was. I soon realized that I was no longer a human, but a table. … I taught myself how to walk, talk, and even eat. I was a bit clumsy at first, but I soon got the hang of it. … |
Performance Analysis
To gauge the real-world application of these models, tests were conducted in areas such as logical reasoning, ethics, math, code analysis, and content suggestion. Here are the results:
GPT 3.5 Turbo | Llama 2 7B | Falcon 7B | |
---|---|---|---|
Parameters | Not officially released. But estimated to be 154 to 175 billion |
7 billions | 7 billions |
Accessibility | Via API | Open source | Open source |
Logical Reasoning Proficiency | 67% | 50% | 17% |
Ethical Judgment | 90% | 90% | 70% |
Math Competence | 25% | 25% | 10% |
Code Evaluation & Enhancement | 80% | 80% | 10% |
Recommendation Evaluation | 90% | 90% | 50% |
Content Creation Proficiency | 90% | 90% | 20% |
Conclusion
GPT 3.5 Turbo, with its formidable parameter count, has proven its prowess across multiple dimensions, particularly in logical reasoning, ethical considerations, and content suggestions. However, the real surprise is the performance of Llama 2 7B. Despite having only a fraction of the parameters compared to GPT 3.5 Turbo, it impressively managed to match its performance in several areas. This showcases the efficiency and optimization Meta brought into its model. Falcon 7B, though offering commendable accessibility, significantly lags in performance metrics.
In the ever-evolving landscape of Generative AI, this comparison underscores an important reality: It's not just about how vast the resources are, but how effectively they are harnessed. As we ponder over these results, it brings us to the reflection that "In the realm of AI, efficiency can sometimes rival sheer power."
***Disclaimer: The views and opinions expressed in this content are solely those of the author and do not necessarily reflect the official policy or position of any affiliated agency, organization, or company. These thoughts and views are personal and are meant for informational purposes only.