Course Title: Building Towards Computer Use with Anthropic
Course Link: https://www.deeplearning.ai/short-courses/building-toward-computer-use-with-anthropic/
Course Instructor: Colt Steele
Lesson 0: Introduction
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/a6k0z/introduction
Welcome to Building Toward Computer Use with Anthropic, built in partnership with Anthropic and taught by Colt Steele, who is Anthropic's Head of Curriculum. Welcome, Colt. Thanks, Andrew. I'm delighted to have the opportunity to share this course with all of you. Anthropic made a recent breakthrough and released a model
that could use a computer.
That is, it can look at the screen, a computer usually running
in a virtual machine, take a screenshot and generate mouse clicks
or keystrokes in sequence to execute some tasks, such as search
the web using a browser and download an image, and so on.
This computer use capability is built by using many features of large language
models in combination, including their ability to process
an image, such as to understand what's happening in a screenshot,
or to use tools that generate mouse clicks and keystrokes.
And these are wrapped in an iterative agent workflow to then carry out
complex tasks by taking many actions on that computer.
In this course, you'll learn about the individual features,
which will be useful for your applications even outside of LLM-based computer use,
as well as see how they all come together for computer use. And Colt
will show you how all this works. Thanks, Andrew.
In this course, you will learn how to use many of the models
and features that all combine to enable computer use.
So here's how the course will progress.
You'll first learn a little bit about Anthropic's background and vision
and what's unique about our family of models.
Then we'll use the API to make some basic requests.
This then leads to multi-modal requests,
where you'll use the model to analyze images.
Then you'll dive into prompting,
which Anthropic has really leaned into, making models much more predictable
with solid prompting.
You'll learn about the prompting tips that actually matter,
things like chain of thought and
n-shot prompting, as well as get a chance to use our prompt improver tools.
Recently, large language models have been supporting large input contexts.
Anthropic's Claude, for example, supports over 200,000
input tokens, which is more than 500 pages of text.
Long inputs can be expensive to process,
and in any long conversation with a chatbot,
if you're processing that conversation history over and over
to keep on generating that next response,
then that too gets more expensive
as the history gets longer as the conversation goes on.
Exactly.
And that brings us right to prompt caching.
Prompt caching retains some of the results of processing prompts between invocations
of the model, which can be a large cost and latency saver.
You also get to use the model to generate calls to external tools
and produce structured output, such as JSON,
and at the very end, we'll walk through a complete
example of computer use that you can run on your own machine.
Note that because of the nature of the tool,
you will have to run that on a Docker image on your computer,
rather than directly in the DeepLearning.AI notebook.
I've tried out computer
use myself using Anthropic's models and found it really cool.
And I think this capability will make possible
a lot of new applications where you can build an AI assistant
to use a computer to carry out tasks for you. Kind of think
RPA or robotic process automation, which has been good at repetitive tasks
but now easier to build and more general with LLM-based tools.
Or, as this technology
gets even better at even more flexible and more open-ended tasks,
it will gradually feel more and more like a personal assistant.
I could not agree more.
Very excited to see where it goes.
Many people have worked to create this course.
I'd like to thank, from Anthropic, Ben Mann, Maggie Vo, Kevin Garcia, and the team
working on computer use, and from DeepLearning.AI, Geoff Ladwig and Esmaeil Gargari.
Anthropic has built a lot of really great models, and I regularly use them myself.
Colt will share details of these models in the next video.
All right, let's get started.
Lesson 1: Overview
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/gi7jq/overview
By the end of this lesson,
you'll be able to explain Anthropic's approach to AI research and development.
Describe the key principles of AI safety, alignment and interpretability,
and differentiate between Anthropic's family of models.
Let's dive in.
All right, let's get started.
So this course is going to cover
everything you need to know about working with the Anthropic API,
working with our models,
building up towards understanding how a computer-using agent works.
So what do I mean by computer-using agent?
Well, here's an example.
On the left you can see I'm typing a prompt.
What roles is Anthropic hiring for in Dublin?
By the way this footage is sped up a little bit
just so you don't have to listen to me talk for too long.
You can see that the model is using the computer on the right.
It is clicking, it's moving the mouse, it's selecting dropdowns.
It's going to expand accordion menus.
Eventually, it makes its way to the Anthropic careers page,
it filters by Dublin, and then it's going to expand the two roles,
a technical program manager and an audit and compliance role in security.
So this is what I mean when I say a computer-using agent.
That agent you just saw builds upon all the fundamentals of the API.
So we're going to go through these topics in order, culminating
in a computer-use agent demonstration at the end.
So that computer-using agent sends basic requests to the API, text prompts.
It uses the messages format.
It uses various model parameters.
That's what we'll cover next.
Then we'll move on to multi-modal requests.
You may have noticed the model was using screenshots
in order to decide where to click, where to drag, where to type,
so you'll learn how to make requests that involve images, including screenshots.
Then we move on to real-world prompting, which focuses on the pretty big difference
between talking to a chatbot like Claude.ai in a conversational manner versus
prompting using the API for scalable, repeatable prompt templates.
Then you'll learn about prompt caching, which is a strategy that the
computer-using agent employs.
And it also is a great cost saving and latency saving measure.
Then you'll learn about tool use, which is what enables the model to do
things like click and scroll and type,
or other tools like connect to an API or issue bash commands or run code.
Various tools
that we can provide the model with that it can tell us it wants to execute.
Finally, at the very end,
you'll see how to run the computer-using agent that you just saw.
It combines all of the topics that we've covered, plus some other things.
It's a bit of a step up, but it's a great capstone that covers
all the core concepts of working with the Anthropic API.
Now, before we dive into actually working with the API,
I want to talk a little bit about Anthropic.
Anthropic is a unique AI lab that has a very heavy
focus on research that puts safety at the very frontier.
So essentially building frontier models, the best models in the world, at times
simultaneously performing cutting-edge research using those models.
This timeline really synthesizes
both of those ideas in the span of a few short years.
On the top you can see Anthropic was founded in 2021.
You can see the timeline of various model releases leading up to Claude 3.5 Sonnet
in 2024.
And on the bottom, you can see
some of the key research papers that have been released simultaneously.
Now, this is not a course on research,
but I do want to call your attention to the research page of Anthropic's website.
It's a great resource to learn more about our research,
both in approachable formats and through full-fledged research papers.
Some of the key areas that we focus on are interpretability,
alignment, and societal impacts.
Now I want to pay special attention to alignment.
Alignment science focuses on ensuring that AI systems
behave in accordance with our human values and human intentions.
How do we create AI systems that reliably pursue the objectives
and tasks that we want them to pursue, even as they become more and more capable?
Another heavy research area at Anthropic is interpretability,
which is a bit of a mouthful,
but is a really fascinating and critical aspect of AI research.
Interpretability is all about understanding
how large language models work internally.
Essentially, reverse engineering them or giving the models MRIs or brain scans
so we can understand
exactly what is happening inside of them at any given point in time.
It's very difficult to improve models
and also to ensure that they are safe without understanding how they work.
One of the things I encourage you to do, if you're interested, is to read
some of our blog posts, watch some of the videos on interpretability,
specifically this relatively approachable paper called Scaling Monosemanticity.
I know the name doesn't sound that approachable,
but it's full of really cool diagrams and visualizations
as it walks through some key interpretability research.
It's also just a pretty fun read with some interesting examples.
Now, as I mentioned at the beginning, Anthropic is not just a research
lab focused on safety, alignment, interpretability.
Anthropic also releases state-of-the-art large language models, listed on our models page.
In our documentation, you'll find an up-to-date list of our current models,
which, like everything in the AI space, changes pretty frequently.
So it may not actually look exactly like this.
But as you can see, Claude 3.5 Sonnet is currently our most intelligent model.
And then there's Claude 3.5 Haiku,
a slightly less capable model, though still very intelligent,
that is faster.
Those are the two main choices presented to you currently
if you're going to use one of our models.
Now, if we zoom in on this
model comparison table, you'll see we have Claude 3.5 Sonnet and Claude
3.5 Haiku, as well as the original Claude 3 family of models.
But the two newest and most capable models are on the left here, 3.5
Sonnet and 3.5 Haiku.
We can see a nice comparison, a breakdown of their capabilities, their strengths,
their vision capabilities.
So in general, Claude
3.5 Sonnet is the most intelligent model we offer.
It is the smartest, the most capable model.
It's multilingual.
It is multimodal,
supporting image inputs.
It supports our batches API.
And one thing that trips some people up is that there are multiple versions of it,
including the most recent upgraded version, which is
claude-3-5-sonnet-20241022.
We'll talk more about the model strings in the next video.
But this is the most recent version of Claude 3.5 Sonnet.
It is
fast, however, not as fast as Claude
3.5 Haiku, which is the fastest model that we offer.
Haiku is very intelligent at very fast speeds,
so it is faster than Claude 3.5 Sonnet,
slightly less capable on
some of the popular benchmarks, and currently does not support vision.
Now let's talk about context window.
We're working with 200,000 tokens for the context window across
both of those models, and a maximum of 8,192 output tokens.
Clearly, Claude 3.5 Haiku is cheaper, it's faster,
but Claude 3.5 Sonnet is the most intelligent model, and that's
what we'll be using throughout this course.
It's also quite affordable, and it is the model that currently
performs best on computer use tasks, largely because it supports image input.
Now, we'll learn how to use these models
in the next video.
We'll start sending requests, but I just want you to see the documentation page
so that you can always find out about the latest model
and see a comparison of how these models stack up across various metrics.
So that's a tiny bit about Anthropic.
We're a frontier research lab creating frontier or cutting-edge models.
It's also a little bit about the course and the rough structure.
We're now going to dive
into working with the API, sending our first simple text requests
building up, of course, to this computer-using agent capstone demo.
Okay, let's get started.
Lesson 2: Working With The API
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/yldsj/working-with-the-api
By the end of this lesson, you'll be able to make your own API requests
to Claude.
You'll format messages effectively for optimal AI responses and control
various API parameters like the system prompt, max tokens, and stop sequences.
All right. Let's dive into the code.
So we'll begin by getting set up with the Anthropic Python SDK.
The first step is to simply ensure that the anthropic SDK is installed,
which is as simple as running pip install anthropic,
and once it's installed, we'll go ahead and import it.
Specifically, we're going to import capital "A" Anthropic.
And we'll use that to instantiate
a client that we can then send API requests through.
Okay.
So on the second line we're creating our client. We can call it whatever we want.
I usually call it client.
And this is where, if we had an API key we wanted to explicitly pass through,
we could pass it in right here with api_key equals,
and then put your key in there.
But if I leave it off, this will automatically look for
an environment variable called ANTHROPIC_API_KEY.
So now we have our client.
The next step is to make our very first request.
I've added two cells of code.
The first one is just a model name variable.
We're going to be repeating this model name over and over throughout the course.
So I'm just going to put it in a variable: claude-3-5-sonnet-20241022.
Just the latest checkpoint, the latest version of Claude 3.5 Sonnet.
And then this larger chunk,
the most important piece here, is how we actually make a simple request.
So we use our client variable dot messages dot create.
And there are a few things in here we'll go over in due time.
First of all we're just passing the model name.
This is required. We do have to pass in max tokens.
We'll discuss that in a little bit and we have to pass in messages.
So messages needs to be a list of messages.
In this case, a single message with a role of user, meaning us, the user.
We are providing a prompt to the model that has content set to some sort of
content, some prompt.
So I asked it to write a haiku about Anthropic.
So let's run these cells and then notice I'm printing
specifically response content zero dot text.
We'll see what we get in just a moment.
We get a haiku about Anthropic "Seeking to guide AI
through wisdom and careful thought toward better futures."
Great.
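As a sketch, the request just described can be written out as plain data. The dict below holds the keyword arguments that client.messages.create() receives; model, max_tokens, and messages are all required, and the commented-out call assumes the anthropic SDK is installed with an API key configured.

```python
MODEL_NAME = "claude-3-5-sonnet-20241022"

# The keyword arguments for client.messages.create(); model,
# max_tokens, and messages are all required.
request = {
    "model": MODEL_NAME,
    "max_tokens": 1000,
    "messages": [
        # A single message: role "user" (us), content is the prompt
        {"role": "user", "content": "Write a haiku about Anthropic"}
    ],
}

# With the SDK installed and ANTHROPIC_API_KEY set, this would be:
# response = client.messages.create(**request)
# print(response.content[0].text)
```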
So let's talk a bit more about this response object that we get back.
Let's take a look at it.
There are quite a few pieces in here.
First of all, we have the content that we just discussed.
Content is a list.
If we look at the zeroth element
we can look at its text and we can see the actual haiku.
We also have the model that was used.
We have the role.
Remember that our original message had a role of user.
So this response back is a message with a role of assistant.
We also have stop reason which tells us why the model stopped generating.
In this case
it says "end turn" which means essentially it reached a natural stopping point.
Stop sequence is none.
We'll talk more about stop sequence in a bit.
And then under usage, we can see the number of tokens
involved in our input, the actual prompt, as well as the output tokens
that were generated.
In this case 30 tokens of output.
So go ahead and try this yourself.
Put any sort of prompt
you'd like in here in place of write a haiku about Anthropic.
Next step we're going to discuss the specific format of the messages list.
So the SDK is set up in such a way that we pass through a list of messages.
It's required along with max tokens and a model name.
And this list of messages
so far, has only included a single message with a role set to user.
The idea of the messages format is that it allows us
to structure our API calls to Claude in the form of a conversation.
We don't have to use it in that way.
We haven't so far, but it's often useful if we are building any sort of
conversational element or need to preserve any prior context.
For now, all you need to know about messages
is that they need to have a role set, either to user or to assistant.
So let's try and provide some previous context.
Let's say perhaps we've been talking to Claude, in Spanish,
and I'd like Claude to continue speaking in Spanish.
So I've updated the messages list to add some previous history
where I have a user message saying "hello, only speak to me in Spanish",
and then I have a response assistant message that says "Hola!"
And then I have my final user message.
The only thing that's changing is this role going from user to assistant.
Back to user.
I'm providing Claude
with some conversation history, and then I'm finally saying, "how are you?"
And if I run this,
the model will take the entire conversation into account,
right? This is the entire prompt now.
And then we get a response in Spanish.
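The alternating-roles history from the Spanish example is just plain data, something like this sketch:

```python
# Conversation history as a list of message dicts. Roles alternate
# between "user" and "assistant"; the final message is the new user
# turn the model will respond to.
messages = [
    {"role": "user", "content": "Hello, only speak to me in Spanish"},
    {"role": "assistant", "content": "¡Hola!"},
    {"role": "user", "content": "How are you?"},
]
```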
So this is useful in a couple of different scenarios.
The first and perhaps most obvious is in building conversational assistants
in building chatbots.
So here we have a very simple implementation of a chatbot
that takes advantage of this messages format.
We're going to alternate messages between a user and an assistant message
growing the messages list as the conversation takes turns.
So we start with an empty list of messages,
and then we have a while loop.
We're going to loop forever unless the user inputs the word quit,
in which case we'll break out.
We need to provide an escape hatch, but if they don't type quit,
we'll ask the user for their input,
and then we'll make a new message dictionary with the role of user.
The content will be whatever the user typed in, like "hello Claude."
We'll send that off to the model using the client
dot messages dot create method we just saw.
Then we'll take the assistant's response and we'll print it out.
And then we'll also append that assistant message
as a new message to our messages list.
And then we'll repeat.
And we'll keep growing this list over and over and over for each turn
in the conversation. We'll add our user message.
We'll get a response.
We'll add our assistant message, and then we'll send the whole thing
back to the model next time when we get a new user message.
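That loop can be sketched as follows. Here ask_claude is a hypothetical stand-in for the client.messages.create call, so the sketch runs offline without an API key:

```python
def chat_turn(messages, user_text, ask_claude):
    """One turn of the chatbot loop: append the user's message, get a
    reply from the model (via ask_claude, a stand-in for the real
    client.messages.create call), and append that reply so the
    history keeps growing turn by turn."""
    messages.append({"role": "user", "content": user_text})
    reply = ask_claude(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

# Stubbed model responses so the sketch is runnable offline
history = []
chat_turn(history, "Hello, I'm Colt", lambda msgs: "Hi Colt! How can I help?")
chat_turn(history, "What's my name?", lambda msgs: "Your name is Colt.")
# history now holds four messages, alternating user/assistant
```

In the real chatbot, ask_claude would send the whole history to the API and return response.content[0].text.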
So let's try it.
Go ahead and run this.
So let's start with something simple. "Hello.
I'm Colt".
I'll send it off.
We get a response "Hi Colt. I'm an AI assistant. Nice to meet you.
How can I help you?"
Let's just test that it actually has the full context.
Let me ask it. What's my name?
Okay, we'll send that off.
"Your name is Colt. As you introduced yourself earlier."
Let's try something a bit more interesting.
I've asked it to help me learn more about how LLMs work.
So generate a response for me here.
This one's likely a little bit longer, and it gives me some information.
And I'll follow up with expand on the third item.
Again, this is just to demonstrate that it gets the full conversational history.
On its own,
this message doesn't mean anything to the model,
but with the full conversation history that I'm sending to it.
Now it expands on that third bullet point.
So that's one use case for sending messages in the messages format.
Another use case is what we call pre-filling, or putting words in the model's
mouth.
Essentially we can use an assistant message to tell the model
"here are some words that you will begin your response with."
We can put words in the model's mouth.
So for example, I'm having it write a short poem about Anthropic.
Let's change that to something else.
How about a short poem about pigs? Sure.
If I go ahead and just run this,
it may tell me something like:
"Okay, here's a short poem about pigs."
There we go.
But for some reason, I really want this poem to start with the word oink.
I insist on it.
Now I could tell the model, you know, write me a poem about pigs.
You must start with the word oink.
Also, don't give me this preamble.
Just go right to the poem.
But another option is to simply add in an assistant message
that begins with the word oink.
So something like this, where I have put a
new message in here with the role of assistant, and the content is oink.
So the model is now going to begin its response from this point.
Oink. And then you can see the completion
we get: "Oink and snuffle, pink and round, rolling happily on muddy ground."
Now it is important to note
it doesn't include the word oink in its response
because the model didn't generate this word.
I did, but the model generated all of this content by beginning with the word oink.
So then I could just combine the word oink with the rest of the poem
if I wanted to.
So that's pre-filling the response.
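A sketch of the pre-filled messages list; the trailing assistant message is the text the model continues from, and since the model's completion won't repeat it, you concatenate it back yourself if you need the full text:

```python
# Pre-filling: end the list with a partial assistant message.
# The model continues generating from "Oink", so its completion
# will not include that word itself.
messages = [
    {"role": "user", "content": "Write me a short poem about pigs."},
    {"role": "assistant", "content": "Oink"},
]

def full_response(prefill, completion):
    # Combine the pre-filled text with the model's completion
    return prefill + completion
```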
Next, we're going to talk about some of the parameters we can pass
to the model via the API to control its behavior.
The first we'll cover is max tokens.
So we've been using max tokens but we haven't discussed what it does.
In short, max tokens controls, well, the maximum
number of tokens that Claude should generate in its response.
Remember that models don't think in full words or in English words,
but instead they use a series of word fragments that we call tokens.
And model usage is also billed according to token usage.
For Claude, a token is roughly 3.5 English characters, though
it can vary from one language to another.
So this max tokens parameter allows us to set an upper bound.
We can basically tell the model don't generate more than 500 tokens,
or let's set this to something high, like 1000 tokens to start.
I'm going to ask the model to write me an essay on large language models,
a prompt that likely will generate
a whole bunch of tokens because I asked for an essay.
Okay, and here's our response. Great.
Pretty long, looks to be a pretty decent essay.
Now, if I tried this again, but I instead set max
tokens to be something much shorter, like 100 tokens.
I'll run this.
What will happen here is the model will get cut off essentially mid-generation.
We just cut it off because we've hit this 100 token generation.
Importantly, if we
look at the response object.
We'll also see nested inside of here
the number of output tokens was exactly 100.
It hit that and it stopped.
But we also see a stop reason this time that says max tokens.
So the model didn't naturally stop. Because stop reason is set to max tokens,
that's how we know the model was cut off by our max tokens parameter.
So this does not influence how the model generates.
Right. We're not telling the model,
"Give me a short response with an entire essay that fits within 100 tokens."
Instead, what we've done is we've told the model, write me
an essay on large language models, and then we just cut it off at 100 tokens.
So why would you use max tokens, or why would you alter it to something low
or something high?
Well, one reason is to try and save on API costs
and set some sort of upper bound, through a combination of a good prompt
but also through setting max tokens.
For example, if you're making a chatbot, you may not want
your end users to have 5000 token turns with the chatbot.
You may prefer that
those conversational turns are short and they fit within a chat window.
Another reason is to improve speed.
The more tokens involved in an output, the longer it takes to generate.
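As a sketch, capping output and detecting truncation looks like this; the request dict mirrors the arguments passed to client.messages.create, and stop_reason comes back on the response object:

```python
# Sketch: cap generation at 100 tokens and detect truncation.
# On the response, stop_reason is "max_tokens" when the cap cut the
# reply off, and "end_turn" when the model finished naturally.
request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 100,
    "messages": [{"role": "user",
                  "content": "Write me an essay on large language models"}],
}

def was_truncated(stop_reason):
    # True when the reply was cut off by the max_tokens limit
    return stop_reason == "max_tokens"
```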
The next parameter we'll look at is called stop sequences.
What this allows us to do is provide a list of strings
that, when the model actually generates them,
will make it stop.
So we can tell the model
once you've generated this word or this character or this phrase, stop.
So it gives us a bit more control instead of just truncating a number of tokens.
We can tell the model we want to truncate its output on this particular word.
So here's an example where I'm not using a stop sequence.
Generate a numbered ordered list of technical topics
I should learn if I want to work on large language models.
I pass that prompt through.
I've just moved it to a variable because it's a bit longer
and I get this nice numbered list, but it's quite long.
12 different topics.
Now, obviously through prompting I could tell the model only
give me the top three or the top five, but I'll just showcase with this example.
I'll copy this and duplicate it, but this time I'll provide stop sequences,
which is a list, and it contains strings.
In my case, let's say I want it to stop after it generates four.
So "4" followed by a period. We'll try running it again and you can see what we get.
So we get 1,2,3.
And then the model went on to generate four.
And it stopped.
Notice that four is not included in the output itself.
And if I look at the response object
we'll also see
that we have a stop reason this time set to stop sequence.
This is the model API telling us it stopped
because it hit a stop sequence, and which stop sequence
it hit: "4" followed by a period.
So stop sequences is a list.
We can provide as many as we want in here.
This is one way to control when the model stops outputs
or when the model stops generating.
And we'll see some use cases for this when we get to some more advanced
prompting techniques.
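To make the behavior concrete, here is a small pure-Python imitation of how a stop sequence truncates output. The real truncation happens on the API side as tokens are generated; this is only an illustration of the semantics:

```python
def apply_stop_sequences(text, stop_sequences):
    """Mimic the API's behavior: cut the output at the earliest stop
    sequence found; the stop sequence itself is not included in the
    returned text, matching what the API does."""
    cut = len(text)
    hit = None
    for s in stop_sequences:
        i = text.find(s)
        if i != -1 and i < cut:
            cut, hit = i, s
    return text[:cut], ("stop_sequence" if hit else "end_turn")

# The numbered-list example: stop once "4." is generated
out, reason = apply_stop_sequences("1. A\n2. B\n3. C\n4. D", ["4."])
# out == "1. A\n2. B\n3. C\n", reason == "stop_sequence"
```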
Now the next parameter we'll talk about is called temperature.
This parameter is used to control what
you can think of as the randomness or the creativity of the generated responses.
Now it ranges from 0 to 1, where a higher value like one is going to result
in more diverse and more unpredictable responses, with variations in phrasing,
and a lower temperature closer to zero will result in
more deterministic outputs that stick to the more probable phrasing.
So this chart here is an output from a little experiment I ran.
I don't recommend you run it because it involved making hundreds of API
requests, but I asked the model via the API to pick an animal.
My prompt was something like pick a single animal, give me one word,
and I did this 100 times with a temperature of zero.
And you can see every single response out of 100 was the word giraffe.
Now, I did this again, but instead set a temperature of one.
And we still get a lot of giraffe responses.
But we get some elephants and platypus, koala, cheetah and so on.
We get more variation.
So again, a temperature of zero is more likely to be deterministic,
but not guaranteed; a temperature of one gives more diverse outputs.
Now here's a function you can run that will demonstrate this.
I'm asking Claude
three different times to generate a planet name for an alien planet.
I'm telling it respond with a single word and I'm doing this three times
for the temperature of zero and three times with a temperature of one.
So let's see what happens.
I'll execute this cell where I'm calling this function.
And when I use a temperature of zero I get the same planet name three times
in a row. Kestrax, Kestrax, Kestrax.
And when I use a temperature of one, I get Keylara,
Kestrax, and Kestryx spelled slightly differently.
So we do get more diversity there.
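As a sketch, temperature is just another keyword argument in the request. This hypothetical helper builds the planet-name request at a given temperature:

```python
def planet_request(temperature):
    # temperature ranges 0.0-1.0: 0 gives near-deterministic output
    # (the same name almost every time), 1 gives more varied names.
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 10,
        "temperature": temperature,
        "messages": [{"role": "user",
                      "content": "Generate an alien planet name. "
                                 "Respond with a single word."}],
    }
```

Calling client.messages.create(**planet_request(0.0)) three times would be expected to return the same name repeatedly, while temperature 1.0 varies more.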
Now that we've seen the basics of making a request,
I want to tie it back to computer use.
Right, everything we're going to learn in this course is in some way
related to building a computer-using agent using Claude.
So this is some code from our computer use quickstart that we'll take a look
at towards the end of this course, but I want to highlight a few things.
We are making a request where we're providing max tokens.
We're providing a list of messages, we're providing a model name,
and then some other stuff we'll learn more about later.
And then we're also using the conversational messages format.
As you can see down here we have a list of messages.
It's defined further up in this repository or in this file.
But we have a list of messages that we are appending the assistant's
response back to.
So very similar to the chatbot we saw earlier, except of course
a lot more complicated.
It's using a computer.
There's screenshots involved and tools and a whole bunch of interactions,
but it's the same basic concept.
We send some message off to the model, and then we get the assistant response back.
We append that to our messages.
If I scroll up high enough, we can see it's all nested inside of a while true loop.
And there's a whole bunch of other logic, of course,
but it boils down to sending a request off to the API
using our client, providing things like max tokens and messages,
and then updating our messages list as new responses come back.
And providing this updated, continuously growing list of messages
every single time.
And we do this over and over and over again
using all the fundamentals we learned so far in this video.
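That loop structure can be reduced to a minimal offline sketch. Everything here is illustrative: `call_model` stands in for the real API call (plus the tool handling and screenshots the quickstart does), and the stop condition is a placeholder, not the real one.

```python
def run_agent_loop(call_model, user_goal, max_turns=10):
    """Minimal shape of the computer-use loop: send the growing messages
    list, append the assistant's reply, and repeat.

    `call_model` is any function that takes the messages list and
    returns an assistant content string.
    """
    messages = [{"role": "user", "content": user_goal}]
    for _ in range(max_turns):  # the real quickstart uses `while True`
        assistant_text = call_model(messages)
        messages.append({"role": "assistant", "content": assistant_text})
        if "DONE" in assistant_text:  # stand-in for the real stop condition
            break
        # In the real loop, tool results and screenshots would be added
        # here as a new user message before the next model call.
        messages.append({"role": "user",
                         "content": "(tool results would go here)"})
    return messages
```

The key point is the same as the chatbot: the messages list only ever grows, and the whole list is sent on every turn.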
Lesson 3: Multimodal Requests
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/zrgb6/multimodal-requests
By the end of this lesson, you'll be able to write multimodal prompts that combine
images and text and work with streaming responses from the API.
All right. Let's go.
So let's get started making our first multimodal request.
We're going to take an image or multiple images along with
some text, send it off to the model and get a response.
So just as in the previous video, we have some basic setup.
We're going to import Anthropic.
We'll set up our client
and then we'll have just a helper variable to store the model name string.
Before we start working with images, we need to talk
a little bit more about the messages structure we've seen so far.
So in the previous lesson
we set up a messages list where each message had a role set to user
and then content set to a string like tell me a joke.
And if I run this, we should see a joke.
And we do in fact get a joke.
Not a good one, but a joke.
Now this is actually a shortcut.
Setting content to a string is a shortcut for this syntax
here, where we set content to a list that contains a bunch of content blocks.
In this case, it's just a single content block with a type
set to text, and then text set to tell me a joke.
So this will give us the exact same sort of input prompt, different syntax.
Up here we have a nice shortcut.
If we're simply doing text prompts, it's easier to do it this way.
But as we'll see in just a moment, we'll want to provide
a list of content blocks if we're going to provide images.
So if I run this we again get a joke.
And just to show you what I mean about a list of content blocks,
here is a single message that has a role of user and content set to a list.
And it contains three text blocks.
Each one has text of a single word.
Who, made, you. And if I run this, we'll see.
We get a response. "I was created by Anthropic."
So all of these messages are combined
and essentially turned into a single input prompt.
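To make that equivalence concrete, here are the two syntaxes side by side, plus the three-block message just shown:

```python
# Content as a plain string is shorthand for a list containing a single
# text content block -- both messages below produce the same prompt.
short_form = {"role": "user", "content": "Tell me a joke"}

long_form = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Tell me a joke"},
    ],
}

# Multiple text blocks in one message are combined into a single prompt:
multi_block = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Who"},
        {"type": "text", "text": "made"},
        {"type": "text", "text": "you?"},
    ],
}
```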
So now we go on to images.
So our Claude models accept images as inputs.
So we need some images to work with.
I've provided you with an images folder that contains a handful of images
that we'll use. This is the first one.
Let's say that we hypothetically run a food delivery startup,
and we're using Claude to verify customer claims.
Customers will send us a screenshot saying, look, only half of my order arrived.
I want a refund.
So we are going to use Claude to analyze images of customer food
like this one here.
We'll start simple and just ask Claude to tell us
you know how many boxes and cartons of food are in this image.
So the first step is to understand
how we structure our messages that contain an image.
This diagram illustrates the structure.
So if you notice we have a messages list.
We have a role set to user just like before.
We have a content list.
And then inside of content we have a new type of content block we have yet to see.
We've only seen text blocks, but this is an image block.
So type is set to image. It's a dictionary.
And then we have a source key set to another dictionary
where we have type set to base64.
We have media type which is set to the images media type like Jpeg or PNG or GIF.
And then we have the raw image data.
So this is the structure of a single message.
So back in our notebook there's a few steps
we need to go through before we can actually create that message.
We need to read in the actual image file itself.
We need to open it which is what we're doing here with the path to food dot PNG.
Then we'll read in the
contents of the image as a bytes object.
Then we'll encode the binary data using base64.
And then finally we'll take the base64 encoded data and turn it into a string.
By the end of this we have our base64 string, which is quite long.
But if we just look at the first
100 characters, here's a preview of what it looks like.
So now what we need to do is take this base64 string
that contains our properly formatted image data, and now put it
in a properly formatted message and then send it off to the model.
So here's some code that takes that base64 string
that contains our food dot png image data as base64 as a string,
and puts it in a properly formatted content block and image content block.
As you can see, type is set to image, source is set to a dictionary
with type base64, it's a PNG, and then data is set to
our massive base64 string variable.
And then we follow it up with a second content block.
This time a text content block that has the text of
how many to-go containers of each type are in this image.
Very very simple prompt.
We're sending it this image of to-go containers filled with food.
We want to know how many of each type are in there.
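The whole sequence, read bytes, base64-encode, decode to a UTF-8 string, wrap in a content block, can be sketched like this. The dummy bytes stand in for a real file read with `open("images/food.png", "rb").read()`, and `make_image_block` is an illustrative name, not the course's code:

```python
import base64


def make_image_block(image_bytes, media_type):
    """Wrap raw image bytes in an image content block:
    bytes -> base64 -> UTF-8 string -> formatted dictionary."""
    b64_string = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,  # e.g. "image/png", "image/jpeg"
            "data": b64_string,
        },
    }


# Dummy bytes stand in for a real PNG read from disk.
fake_png_bytes = b"\x89PNG\r\n\x1a\n...not a real image..."

messages = [{
    "role": "user",
    "content": [
        make_image_block(fake_png_bytes, "image/png"),
        {"type": "text",
         "text": "How many to-go containers of each type are in this image?"},
    ],
}]
```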
Okay, so now we just take this messages list and send it off to the API.
So we use the same syntax we've seen before client dot messages dot create.
We pass in messages. We'll run it.
Then we see a response.
In this image there are three rectangular plastic containers with clear lids
and then three white paper or cardboard folded takeout boxes,
often called Chinese takeout boxes or oyster pails.
That is correct.
If we go back to the original image.
We do in fact see three boxes with plastic lids
and three of the paper oyster pails or Chinese takeout containers.
Now, going through all these steps to read the image and turn it into base64
and then turn it into a string encoded in UTF-8,
and then add it to a properly formatted message can be a little bit annoying
to do over and over.
So it's a great candidate for making a helper function.
So here's a helper function that just combines the functionality
we saw previously.
It's called create image message.
It takes an image path.
And then it's going to run those steps that we saw previously.
So it's going to open it read in the binary data.
It's going to encode it with base64 encoding.
It's going to turn it into a UTF-8 string.
It's going to guess the Mime type.
Remember we need to specify
whether it's a PNG or a Jpeg or a GIF or some other format.
And then finally it creates an image block,
properly formatted and then returns that image block.
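A sketch of such a helper, assuming the standard library's `mimetypes.guess_type` for the MIME-type guess; the course's actual implementation may differ in details:

```python
import base64
import mimetypes


def create_image_message(image_path):
    """Read an image file and return a formatted image content block:
    open -> read bytes -> base64-encode -> UTF-8 string -> guess MIME type."""
    with open(image_path, "rb") as f:
        binary_data = f.read()
    b64_string = base64.b64encode(binary_data).decode("utf-8")
    media_type, _ = mimetypes.guess_type(image_path)  # e.g. "image/png"
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": b64_string,
        },
    }
```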
So let's try it with a different image.
The images directory has a plant dot png image.
It's a pitcher plant.
Technically, I think it's a Nepenthes plant.
I have had limited success growing this myself.
I usually kill them before the pitchers emerge.
But very cool plant.
I'm going to ask the model just to identify the plant.
Very simple use case.
So we're going to use this function we've defined.
And here we are.
I have a new messages list a single message in it with
role of user.
Content is set to a list containing the result of create
image message for the plant png image.
So we get a properly formatted message back.
Or technically, it's a content block.
And then we follow it up with a text content block asking a very simple prompt.
"What species is this?"
We'll send it off to the model.
We'll run it. We'll print out the response.
And here we go.
"This appears to be a Nepenthes pitcher plant,
which is a type of carnivorous plant..."
And on and on and on. Okay.
So just a little helper function to make things a bit easier.
You could take it a step further and make a helper function
just to generate the entire messages list itself, where you provide an image path
and you provide a text prompt like "what species is this?"
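That step further might look like the following; `create_image_prompt_messages` is a hypothetical name, and the body simply inlines the same encode-and-wrap steps:

```python
import base64
import mimetypes


def create_image_prompt_messages(image_path, prompt_text):
    """Build the entire messages list from an image path plus a text
    prompt, e.g. ("images/plant.png", "What species is this?")."""
    with open(image_path, "rb") as f:
        b64_string = base64.b64encode(f.read()).decode("utf-8")
    media_type, _ = mimetypes.guess_type(image_path)
    return [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": media_type,
                        "data": b64_string}},
            {"type": "text", "text": prompt_text},
        ],
    }]
```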
Next, let's take a look at a more realistic use case that a lot of
our customers are using Claude to help with, which is analyzing documents.
So many documents.
Let's take an invoice like this one, which is called invoice dot
PNG. It includes tons of important information.
Maybe it's a PDF, maybe it's a PNG.
We can feed it into Claude.
Give it a good prompt and ask it to give us structured data as a response.
So I might be able to turn thousands of invoices
into Json and store them in a database in a matter of minutes.
So here's what that could look like with a single example.
This invoice dot PNG image. I provide an image message properly formatted.
Then I provide a text prompt, a pretty simple one.
"Generate a Json object representing the content of this invoice.
It should include all dates, dollar amounts and addresses.
Only respond with the Json itself."
I'll send it off to the model
and we get a Json response back.
So it has the company name,
which is my company, Acme Corporation, our fake address.
It has information on the invoice: the invoice number, the date,
the due date, information on who it's billed to and their address.
The items in the invoice.
So: the enterprise software license, implementation
services, and the premium support plan.
And then it has totals
including the total, the tax rate, the tax amount and the actual total.
And if I scroll back you can get a closer look at that image
and see that all this information is, in fact accurate.
So just a slightly more realistic use case
for image prompting compared to, you know, identifying a plant species.
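When doing this for thousands of invoices, you also need to parse the model's text back into data. Here is a hedged sketch of one way to do that, which tolerates an optional markdown code fence around the Json (models sometimes add one even when told not to); this is not code from the course:

```python
import json
import re


def extract_json(response_text):
    """Parse a Json object out of a model response.

    A model told to 'only respond with the Json itself' usually
    complies; this also strips a surrounding code fence if present.
    """
    text = response_text.strip()
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```

From here, each parsed dictionary can be stored in a database, compared, or charted.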
Now, one thing we won't demonstrate here, but that's important
to know is possible, is providing multiple images in a single message.
Recall that all of our content blocks are treated essentially
as one prompt behind the scenes when they're fed into the model.
So I can provide a combination of multiple image blocks plus multiple text
prompt blocks as part of a single user message. Content is a list,
so I simply add my content blocks inside, whether they have type
set to image or type set to text.
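A multi-image message just interleaves more blocks in the content list. The `data` values below are placeholders, not real base64 image data:

```python
# One user message can mix several image blocks and text blocks; behind
# the scenes, all content blocks are combined into a single prompt.
def placeholder_image_block(b64_data):
    """Illustrative helper: wrap placeholder data as an image block."""
    return {"type": "image",
            "source": {"type": "base64",
                       "media_type": "image/png",
                       "data": b64_data}}


messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Image 1:"},
        placeholder_image_block("<base64 for first image>"),
        {"type": "text", "text": "Image 2:"},
        placeholder_image_block("<base64 for second image>"),
        {"type": "text", "text": "How do these two orders differ?"},
    ],
}]
```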
The second topic we'll cover in this lesson is streaming responses.
What we've seen so far using client dot messages dot create works great.
But if I give it a prompt like write me a poem.
What you'll notice is that we're waiting for a response
until the entire response is generated and ready.
So it doesn't take all that long.
That was maybe half a second, maybe a second or less,
and we get the entire generation all at once.
But the longer a model's output generation is, let's say we're writing an essay
with the model, the longer it will take before we get any sort of content back.
We don't get a response back until the entire output has been generated.
With streaming, we can do something a bit different.
We can get content back as the content is generated.
And this is great for user facing scenarios where we can start
to show users responses as they're being generated,
instead of waiting until a full generation is complete.
So streaming doesn't actually speed up the overall process to generate.
It just speeds up what we call the time to first token, the time
that you see the first sign of life, the first piece of a response.
And the syntax is a little bit different, but very similar to this client dot
messages dot create.
So here we now have client dot messages dot stream.
And notice, we pass in max tokens.
We pass in a list of messages.
My prompt is simple just write a poem.
We pass in a model name.
But what's a bit different,
is that now we're going to iterate over this thing that we're calling stream.
So I give it this name as stream, and then I iterate over
every single bit of text in stream dot text stream, and then I print it out.
So what we'll see when I run this,
I'll just go ahead and execute it is we see the content coming back
as it's generated,
instead of having to wait for the entire thing to be generated at once.
Let's try it again.
You can see that we get chunks, little chunks, one by one,
and we're printing them out as they come in.
But again, the overall amount of time that it takes to do this
generation is going to remain unchanged.
Now, it obviously varies from one request to another, but we're not magically
getting the full result any faster than we would without streaming.
We're simply getting results.
We're getting parts of the output as they're being generated.
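The streaming pattern can be sketched as a reusable function. In the Anthropic Python SDK, `client.messages.stream(...)` is used as a context manager and `stream.text_stream` yields text deltas as they are generated; the function name and defaults here are assumptions:

```python
def stream_text(client, model, prompt, max_tokens=1024):
    """Stream a completion, printing each chunk as it arrives, and
    return the full accumulated text."""
    chunks = []
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)  # show partial output immediately
            chunks.append(text)
    print()
    return "".join(chunks)
```

The return value is the same full text a non-streaming call would have produced; only the time to first token improves.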
So we've seen how to make image requests, sending images as part of a prompt
in the content. We've also seen how to stream responses back from the model.
Now what I want to do is once again end by showing you
a real example from our computer use Quickstart implementation.
So this is a function that does a bunch of stuff.
But if you look closely in here in this highlighted text,
we are appending a correctly formatted image using the format that we talked
about earlier in this lesson.
So, type is image. Source is a dictionary.
Type is base64. Now what are these images?
These are the screenshots that we're providing the model with.
As we've seen previously when we covered sort of an introduction to the computer
use aspect of this course, the model works by getting screenshots,
analyzing the screenshots, and then deciding to take actions.
So we need to be able to provide images to the model.
And we use the exact same syntax we've already seen in this lesson.
We create these image content blocks. It's a much more complicated use case
here than identifying a plant, but it's the exact same syntax.
So we're slowly growing our arsenal of tools.
Next, we're going to talk about some more real-world or complex prompting.
Lesson 4: Real World Prompting
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/kmnd5/real-world-prompting
By the end of this lesson, you'll be able to structure effective prompts
that get consistent, high-quality responses from Claude.
You'll utilize proven prompting techniques that actually matter in the real world,
and understand the difference between prompting a chatbot like Claude dot
AI and writing enterprise-grade, repeatable prompts.
Let's get coding.
So the main focus of this lesson is really on the distinction
between the types of prompts that we might write as consumers
or as users of a chatbot like Claude.AI, and the types of prompts
that large customers are writing, or really any API customers are writing
that need to be repeatable and reliable.
This is the anthropic documentation.
We have a section on prompt engineering with a whole bunch of different
tips and strategies,
a lot of which do matter, but some of which matter more than others.
Which is what I want to talk about in this video.
I want to focus on the tips that are worth your time.
There's a lot of stuff out there on the internet around prompting.
Some of it is a little bit dubious.
So we're going to really focus on the prompting tips
that have empirical evidence to back them up.
But first, I want to show you an example of what I mean when I say a consumer
prompt versus a real-world enterprise prompt.
So I'm back in a notebook.
I have the same initial setup we've had from previous videos,
and here's an example of a potential chat bot or consumer
prompt that I might type into the Claude AI website.
Help me brainstorm ideas for a talk on AI and education.
And if I'm happy with the result, great.
If not, I have as many opportunities as needed to follow up and say "Woops!
Actually, you're focusing too much on AI, not enough on education.
Can you change this to a bullet-pointed list?
Can you make this markdown?"
Right. I can follow up over and over and over again.
I have a lot of wiggle room and room for forgiveness.
Now let's take a look at a potential enterprise-grade prompt.
Now, I'll warn you ahead of time, this is quite long and way too much
to read and go over in this video, but that's kind of the point.
I want you to see that these prompts get long.
They get complicated.
They have structure to them.
A lot of effort goes into creating these prompts
beyond just sort of, you know, coming up with a thought
and following it up with another thought in the way that I might talk to Claude.AI.
So this is an example of a prompt that takes customer service calls,
transcripts from a customer service call, and then generates Json summaries.
And we might be doing this thousands of times per hour or maybe even per minute.
If we're running, you know, a
massive call center or we have a huge customer support team.
So we're not going to go over this piece by piece,
but I'm leaving you with this prompt so that you can go over it if you'd like.
There are a few things in here we'll refer back to.
The first thing that I want to
highlight, though, is that for enterprise grade prompts, for repeatable prompts,
we really think about them as prompt templates where we have
some large prompt structure that largely stays the same with a dynamic portion
or multiple dynamic portions that are inserted as variables.
So in this example, it starts by saying: analyze the following customer
service call and generate a Json object. Here's a transcript.
And then on this line we have a placeholder
where we actually would insert the real call transcript.
So we would do this dynamically just using a string
method most likely to replace this with a real transcript.
And then another real transcript and thousands and thousands of them
in a repeatable way.
So we think of this more as a template instead of a one-off use case.
Like you might consider your prompts for Claude.AI.
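A minimal sketch of the template idea, with a shortened, hypothetical version of the call-summary prompt:

```python
# Hypothetical, shortened version of the enterprise prompt template.
PROMPT_TEMPLATE = """Analyze the following customer service call transcript
and generate a Json summary.

<transcript>
{{TRANSCRIPT}}
</transcript>"""


def render_prompt(template, transcript):
    """Fill the {{TRANSCRIPT}} placeholder with a real transcript.

    Double curly braces are just a convention; any placeholder syntax
    works, as long as it is replaced consistently.
    """
    return template.replace("{{TRANSCRIPT}}", transcript)
```

The same template is then rendered once per transcript, thousands of times, in a repeatable way.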
So back to the slides for a moment.
I've listed some of the more important prompting tips here,
and I've bolded the ones that I think are the most important or more important
than anything else.
So one of them is use prompt templates.
We've hinted at that idea.
Other things on here include letting Claude think,
also known as chain of thought.
We'll talk about that in a bit.
Structuring your prompts with XML.
We saw a little bit of that in the prompt I just showed,
but we'll also focus on that in the next few minutes and using examples.
These are all techniques that have real data
behind them that actually back up the claims that they matter.
So now what I want to do is go through these and try
and build a relatively real-world prompt.
It will get a little bit long.
Prompts do get long.
It's a lot of text, a lot of strings.
But we're going to go through this one bit at a time.
We're going to go about this by building up to a larger
real-world or enterprise-grade prompt.
And the idea that we'll be using is a customer
review, classification and sentiment analysis prompt.
Let's say that we run some fictional e-commerce company.
Acme company. And we have hundreds of products
and thousands and thousands of customer reviews.
We're going to use a Claude API to help us understand
the sentiment of those reviews and some common complaints.
So if this is a hypothetical review: I recently purchased the XYZ smartphone
and it's been a mixed experience.
It lists some positive, some negatives.
It says, you know, I expected fewer issues.
I want Claude to be able to tell me in a repeatable fashion,
is this a positive review? A negative? A neutral?
And I want it to highlight some of the key issues and points of feedback.
Specifically, for doing this at scale with thousands and thousands of reviews,
I probably want the output to be in some easily extractable
and easy-to-work-with output format.
Often that will be Json.
Maybe something like this.
Some repeatable object that always has a sentiment score
positive, negative or neutral.
It has some analysis under a key called sentiment analysis.
And then it lists the actual complaints.
So performance like poor value, unreliable facial recognition and so on.
And then I can easily do this at scale for thousands of reviews:
store them in a database,
compare them, build charts, whatever I want to do with this repeatable output.
So we're going to approach this piece by piece with our task now defined.
We want to take customer reviews and turn them into Json
with sentiment analysis information and customer complaint data extracted.
We're going to go
through this one part at a time and then build up the entire prompt.
So the first tip we'll talk about is setting the role for the model.
Now, this is actually one that I don't feel as strongly about,
so we'll go through it pretty quickly.
Something that can be useful
is just giving the model a clear role and set of expectations upfront.
So in this case it might look something like this.
You are an AI assistant specialized in analyzing customer reviews.
Your task is to determine the overall sentiment of a given review
and extract any specific complaints mentioned.
Please follow these instructions carefully.
So obviously this is just one piece of the prompt,
but we're setting the role or setting the stage, giving the model
some context as to what it's supposed to be good at.
So the next step here is to provide the actual instructions to the model.
Right.
If we scroll up, we told the model.
Please follow these instructions carefully.
Now we're going to give it a very clear and direct ordered list of instructions.
So the first instruction is to review the following customer feedback.
We're making this a prompt template where we'll actually insert a customer
review here.
Now you don't have to use these double curly braces.
You can use whatever
sort of variable that you want or placeholder that you want to replace.
We like to use double curly braces, but definitely not a requirement.
Additionally, notice that I'm using XML tags here.
Not a requirement either, but Claude models tend to work very well with XML tags.
You can use any sort of syntax or any sort of separators to tell the model.
Here's where the customer review begins and here's where it ends.
Some of these customer reviews might be short, a couple of sentences,
but for some very disgruntled or very enthusiastic customers,
we might be looking at thousands of characters.
So we want to clearly tell the model.
Here's where the review begins. Here's where it ends.
The next thing that we'll focus on
are the actual steps we want the model to go through.
All right.
We've provided the context and said we want you to review this customer
feedback.
We will then eventually insert the customer feedback in here.
What do we want the model to do?
It may be tempting to simply say: generate Json that includes a sentiment
score, whether positive, neutral or negative, and a list of complaints
that you've extracted. And that may work.
It likely will work in a lot of situations,
but one of the prompting tips I want to highlight
here is what we call letting Claude think or chain of thought.
Essentially, telling the model
that before it comes to a decision or some sort of conclusion,
we want it to think out loud and output some analysis to help it make a decision.
And then eventually make that judgment.
So here's an example of what that could look like. In this variable instruction
part two, another long string.
I'm telling the model here's your second step.
So once you've reviewed the customer feedback I want you to analyze the review
using the following steps.
There's a few things I want to highlight.
First of all, this line here tells the model to show its work in review
breakdown tags.
Again, you don't have to use XML, but the Claude family of models
performs very well with XML, so a common strategy
is to tell Claude to contain certain parts of its output
in certain XML tags. We can tell it to do its thinking out loud
inside review breakdown tags, separate from the actual analysis;
for the final result, we'll tell it to put its results in some separate tag.
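The flip side of asking for tagged output is pulling it back out of the response afterwards. A hedged sketch, with illustrative tag names and a made-up sample response:

```python
import re


def extract_tag(response_text, tag):
    """Pull the contents of a named XML tag out of a model response,
    e.g. the final verdict in <result> separate from the thinking in
    <review_breakdown>. Tag names here are illustrative."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", response_text, re.DOTALL)
    return match.group(1).strip() if match else None


# Made-up sample of what a tagged response might look like.
sample_response = (
    "<review_breakdown>Praises the battery life but complains about "
    "the camera and the price.</review_breakdown>\n"
    "<result>neutral</result>"
)
```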
We tell the model to start
by extracting key phrases that might be related to sentiment.
Then we tell the model to consider arguments for positive,