Course Title: Building Towards Computer Use with Anthropic
Course Link: https://www.deeplearning.ai/short-courses/building-toward-computer-use-with-anthropic/
Course Instructor: Colt Steele
Lesson 0: Introduction
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/a6k0z/introduction
Welcome to Building Toward Computer Use with Anthropic, built in partnership with Anthropic and taught by Colt Steele, who is Anthropic's Head of Curriculum. Welcome, Colt. Thanks, Andrew. I'm delighted to have the opportunity to share this course with all of you. Anthropic made a recent breakthrough and released a model
that could use a computer.
That is, it can look at the screen, a computer usually running
in a virtual machine, take a screenshot and generate mouse clicks
or keystrokes in sequence to execute some tasks, such as search
the web using a browser and download an image, and so on.
This computer use capability is built by using many features of large language
models in combination, including their ability to process
an image, such as to understand what's happening in a screenshot,
or to use tools that generate mouse clicks and keystrokes.
And these are wrapped in an iterative agent workflow to then carry out
complex tasks by taking many actions on that computer.
In this course, you'll learn about the individual features,
which will be useful for your applications even outside of LLM-based computer use,
as well as see how they all come together for computer use. And Colt
will show you how all this works. Thanks, Andrew.
In this course, you will learn how to use many of the models
and features that all combine to enable computer use.
So here's how the course will progress.
You'll first learn a little bit about Anthropic's background and vision
and what's unique about our family of models.
Then we'll use the API to make some basic requests.
This then leads to multi-modal requests,
where you'll use the model to analyze images.
Then you'll dive into prompting,
which Anthropic has really leaned into, making models much more predictable
with solid prompting.
You'll learn about the prompting tips that actually matter,
things like chain of thought and
n-shot prompting, as well as get a chance to use our prompt improver tools.
Recently, large language models have been supporting large input contexts.
Anthropic's Claude, for example, supports over 200,000
input tokens, which is more than 500 pages of text.
Long inputs can be expensive to process,
and in any long conversation with a chatbot,
if you're processing that conversation history over and over
to keep on generating that next response,
then that too gets more expensive
as the history gets longer as the conversation goes on.
Exactly.
And that brings us right to prompt caching.
Prompt caching retains some of the results of processing prompts between invocations
of the model, which can be a large cost and latency saver.
You also get to use the model to generate calls to external tools
and produce structured output, such as JSON,
and at the very end, we'll walk through a complete
example of computer use that you can run on your own machine.
Note that because of the nature of the tool,
you will have to run that on a Docker image on your computer,
rather than directly in the DeepLearning.AI notebook.
I've tried out computer
use myself using Anthropic's models and found it really cool.
And I think this capability will make possible
a lot of new applications where you can build an AI assistant
to use a computer to carry out tasks for you. Kind of think
RPA or robotic process automation, which has been good at repetitive tasks
but now easier to build and more general with LLM-based tools.
Or, as this technology
gets even better at even more flexible and more open-ended tasks,
it will gradually feel more and more like a personal assistant.
I could not agree more.
Very excited to see where it goes.
Many people have worked to create this course.
I'd like to thank, from Anthropic, Ben Mann, Maggie Vo, Kevin Garcia, and the team
working on computer use, and from DeepLearning.AI, Geoff Ladwig and Esmaeil Gargari.
Anthropic has built a lot of really great models, and I regularly use them myself.
Colt will share details of these models in the next video.
All right, let's get started.
Lesson 1: Overview
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/gi7jq/overview
By the end of this lesson,
you'll be able to explain Anthropic's approach to AI research and development.
Describe the key principles of AI safety, alignment and interpretability,
and differentiate between Anthropic's family of models.
Let's dive in.
All right, let's get started.
So this course is going to cover
everything you need to know about working with the Anthropic API,
working with our models,
building up towards understanding how a computer-using agent works.
So what do I mean by computer-using agent?
Well, here's an example.
On the left you can see I'm typing a prompt.
What roles is Anthropic hiring for in Dublin?
By the way this footage is sped up a little bit
just so you don't have to listen to me talk for too long.
You can see that the model is using the computer on the right.
It is clicking, it's moving the mouse, it's selecting dropdowns.
It's going to expand accordion menus.
Eventually, it makes its way to the Anthropic careers page,
it filters by Dublin, and then it's going to expand the two roles,
a technical program manager and an audit and compliance role in security.
So this is what I mean when I say a computer-using agent.
That agent you just saw builds upon all the fundamentals of the API.
So we're going to go through these topics in order, culminating
in a computer-use agent demonstration at the end.
So that computer-using agent sends basic requests to the API, text prompts.
It uses the messages format.
It uses various model parameters.
That's what we'll cover next.
Then we'll move on to multi-modal requests.
You may have noticed the model was using screenshots
in order to decide where to click, where to drag, where to type,
so you'll learn how to make requests that involve images, including screenshots.
Then we move on to real-world prompting, which focuses on the pretty big difference
between talking to a chatbot like Claude.ai in a conversational manner versus
prompting using the API for scalable, repeatable prompt templates.
Then you'll learn about prompt caching, which is a strategy that the
computer-using agent employs.
And it also is a great cost saving and latency saving measure.
Then you'll learn about tool use, which is what enables the model to do
things like click and scroll and type,
or other tools like connect to an API or issue bash commands or run code.
Various tools
that we can provide the model with that it can tell us it wants to execute.
Finally, at the very end,
you'll see how to run the computer-using agent that you just saw.
It combines all of the topics that we've covered, plus some other things.
It's a bit of a step up, but it's a great capstone that covers
all the core concepts of working with the Anthropic API.
Now, before we dive into actually working with the API,
I want to talk a little bit about Anthropic.
Anthropic is a unique AI lab that has a very heavy
focus on research that puts safety at the very frontier.
So essentially building frontier models, the best models in the world, at times
simultaneously performing cutting-edge research using those models.
This timeline really synthesizes
both of those ideas in the span of a few short years.
On the top you can see Anthropic was founded in 2021.
You can see the timeline of various model releases leading up to Claude 3.5 Sonnet
in 2024.
And on the bottom, you can see
some of the key research papers that have been released simultaneously.
Now, this is not a course on research,
but I do want to call your attention to the research page of Anthropic's website.
It's a great resource to learn more about our research,
both in approachable formats and through full-fledged research papers.
Some of the key areas that we focus on are interpretability,
alignment, and societal impacts.
Now I want to pay special attention to alignment.
Alignment science focuses on ensuring that AI systems
behave in accordance with our human values and human intentions.
How do we create AI systems that reliably pursue the objectives
and tasks that we want them to pursue, even as they become more and more capable?
Another heavy research area at Anthropic is interpretability,
which is a bit of a mouthful,
but is a really fascinating and critical aspect of AI research.
Interpretability is all about understanding
how large language models work internally.
Essentially, reverse engineering them or giving the models MRIs or brain scans
so we can understand
exactly what is happening inside of them at any given point in time.
It's very difficult to improve models
and also to ensure that they are safe without understanding how they work.
One of the things I encourage you to do, if you're interested, is to read
some of our blog posts, watch some of the videos on interpretability,
specifically this relatively approachable paper called Scaling Monosemanticity.
I know the name doesn't sound that approachable,
but it's full of really cool diagrams and visualizations
as it walks through some key interpretability research.
It's also just a pretty fun read with some interesting examples.
Now, as I mentioned at the beginning, Anthropic is not just a research
lab focused on safety, alignment, interpretability.
Anthropic also releases state-of-the-art large language models, listed on our models page.
In our documentation, you'll find an up-to-date list of our current models,
which, like everything in the AI space, changes pretty frequently.
So it may not actually look exactly like this.
But as you can see, Claude 3.5 Sonnet is currently our most intelligent model.
And then there's Claude 3.5 Haiku,
a slightly less capable model, though still very intelligent,
that is faster.
Those are the two main choices presented to you currently
if you're going to use one of our models.
Now, if we zoom in on this
model comparison table, you'll see we have Claude 3.5 Sonnet and Claude
3.5 Haiku, as well as the original Claude 3 family of models.
But the two newest and most capable models are on the left here, 3.5
Sonnet and 3.5 Haiku.
We can see a nice comparison, a breakdown of their capabilities, their strengths,
their vision capabilities.
So in general, Claude
3.5 Sonnet is the most intelligent model we offer.
It is the smartest, the most capable model.
It's multilingual.
It is multimodal,
supporting image inputs.
It supports our batches API.
And one thing that trips some people up is that there are multiple versions of it,
including the most recent upgraded version, which is
claude-3-5-sonnet-20241022.
We'll talk more about the model strings in the next video.
But this is the most recent version of Claude 3.5 Sonnet.
It is
fast, however, not as fast as Claude
3.5 Haiku, which is the fastest model that we offer.
Haiku is very intelligent at very fast speeds,
so it is faster than Claude 3.5 Sonnet,
slightly less capable on
some of the popular benchmarks, and currently does not support vision.
Now let's talk about context window.
We're working with 200,000 tokens for the context window across
both of those models, and a maximum of 8,192 output tokens.
Clearly, Claude 3.5 Haiku is cheaper, it's faster,
but Claude 3.5 Sonnet is the most intelligent model, and that's
what we'll be using throughout this course.
It's also quite affordable, and it is the model that currently
performs best on computer use tasks, largely because it supports image input.
Now, we'll learn how to use these models
in the next video.
We'll start sending requests, but I just want you to see the documentation page
so that you can always find out about the latest model
and see a comparison of how these models stack up across various metrics.
So that's a tiny bit about Anthropic.
We're a frontier research lab creating frontier or cutting-edge models.
It's also a little bit about the course and the rough structure.
We're now going to dive
into working with the API, sending our first simple text requests
building up, of course, to this computer-using agent capstone demo.
Okay, let's get started.
Lesson 2: Working With The API
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/yldsj/working-with-the-api
By the end of this lesson, you'll be able to make your own API requests
to Claude.
You'll format messages effectively for optimal AI responses and control
various API parameters like the system prompt, max tokens, and stop sequences.
All right. Let's dive into the code.
So we'll begin by getting set up with the Anthropic Python SDK.
The first step is to simply ensure that the anthropic SDK is installed,
which is as simple as running pip install anthropic,
and once it's installed, we'll go ahead and import it.
Specifically, we're going to import capital "A" Anthropic.
And we'll use that to instantiate
a client that we can then send API requests through.
Okay.
So on the second line we're creating our client. We can call it whatever we want.
I usually call it client.
And this is where, if we had an API key we wanted to explicitly pass through,
we could pass it in right here with api_key equals,
and then put your key in there.
But if I leave it off, this will automatically look for
an environment variable called ANTHROPIC_API_KEY.
So now we have our client.
The next step is to make our very first request.
I've added two cells of code.
The first one is just a model name variable.
We're going to be repeating this model name over and over throughout the course.
So I'm just going to put it in a variable: claude-3-5-sonnet-20241022.
Just the latest checkpoint, the latest version of Claude 3.5 Sonnet.
And then this larger chunk,
the most important piece here, is how we actually make a simple request.
So we use our client variable dot messages dot create.
And there are a few things in here we'll go over in due time.
First of all we're just passing the model name.
This is required. We do have to pass in max tokens.
We'll discuss that in a little bit and we have to pass in messages.
So messages needs to be a list of messages.
In this case, a single message with a role of user, meaning us, the user.
We are providing a prompt to the model that has content set to some sort of
content, some prompt.
So I asked it to write a haiku about Anthropic.
So let's run these cells and then notice I'm printing
specifically response content zero dot text.
We'll see what we get in just a moment.
We get a haiku about Anthropic "Seeking to guide AI
through wisdom and careful thought toward better futures."
Great.
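As a sketch, the request just described can be written out as plain data. The dict below holds the keyword arguments that client.messages.create() receives; model, max_tokens, and messages are all required, and the commented-out call assumes the anthropic SDK is installed with an API key configured.

```python
MODEL_NAME = "claude-3-5-sonnet-20241022"

# The keyword arguments for client.messages.create(); model,
# max_tokens, and messages are all required.
request = {
    "model": MODEL_NAME,
    "max_tokens": 1000,
    "messages": [
        # A single message: role "user" (us), content is the prompt
        {"role": "user", "content": "Write a haiku about Anthropic"}
    ],
}

# With the SDK installed and ANTHROPIC_API_KEY set, this would be:
# response = client.messages.create(**request)
# print(response.content[0].text)
```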
So let's talk a bit more about this response object that we get back.
Let's take a look at it.
There are quite a few pieces in here.
First of all, we have the content that we just discussed.
Content is a list.
If we look at the zeroth element
we can look at its text and we can see the actual haiku.
We also have the model that was used.
We have the role.
Remember that our original message had a role of user.
So this response back is a message with a role of assistant.
We also have stop reason which tells us why the model stopped generating.
In this case
it says "end turn" which means essentially it reached a natural stopping point.
Stop sequence is none.
We'll talk more about stop sequence in a bit.
And then under usage, we can see the number of tokens
involved in our input, the actual prompt, as well as the output tokens
that were generated.
In this case 30 tokens of output.
So go ahead and try this yourself.
Put any sort of prompt
you'd like in here in place of write a haiku about Anthropic.
Next step we're going to discuss the specific format of the messages list.
So the SDK is set up in such a way that we pass through a list of messages.
It's required along with max tokens and a model name.
And this list of messages
so far, has only included a single message with a role set to user.
The idea of the messages format is that it allows us
to structure our API calls to Claude in the form of a conversation.
We don't have to use it in that way.
We haven't so far, but it's often useful if we are building any sort of
conversational element or need to preserve any prior context.
For now, all you need to know about messages
is that they need to have a role set, either to user or to assistant.
So let's try and provide some previous context.
Let's say perhaps we've been talking to Claude, in Spanish,
and I'd like Claude to continue speaking in Spanish.
So I've updated the messages list to add some previous history
where I have a user message saying "hello, only speak to me in Spanish",
and then I have a response assistant message that says "Hola!"
And then I have my final user message.
The only thing that's changing is this role going from user to assistant.
Back to user.
I'm providing Claude
with some conversation history, and then I'm finally saying, "how are you?"
And if I run this,
the model will take the entire conversation into account,
right? This is the entire prompt now.
And then we get a response in Spanish.
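The alternating-roles history from the Spanish example is just plain data, something like this sketch:

```python
# Conversation history as a list of message dicts. Roles alternate
# between "user" and "assistant"; the final message is the new user
# turn the model will respond to.
messages = [
    {"role": "user", "content": "Hello, only speak to me in Spanish"},
    {"role": "assistant", "content": "¡Hola!"},
    {"role": "user", "content": "How are you?"},
]
```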
So this is useful in a couple of different scenarios.
The first and perhaps most obvious is in building conversational assistants
in building chatbots.
So here we have a very simple implementation of a chatbot
that takes advantage of this messages format.
We're going to alternate messages between a user and an assistant message
growing the messages list as the conversation takes turns.
So we start with an empty list of messages,
and then we have a while loop.
We're going to loop forever unless the user inputs the word quit,
in which case we'll break out.
We need to provide an escape hatch, but if they don't type quit,
we'll ask the user for their input,
and then we'll make a new message dictionary with the role of user.
The content will be whatever the user typed in, like "hello Claude."
We'll send that off to the model using the client
dot messages dot create method we just saw.
Then we'll take the assistant's response and we'll print it out.
And then we'll also append that assistant message
as a new message to our messages list.
And then we'll repeat.
And we'll keep growing this list over and over and over for each turn
in the conversation. We'll add our user message.
We'll get a response.
We'll add our assistant message, and then we'll send the whole thing
back to the model next time when we get a new user message.
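That loop can be sketched as follows. Here ask_claude is a hypothetical stand-in for the client.messages.create call, so the sketch runs offline without an API key:

```python
def chat_turn(messages, user_text, ask_claude):
    """One turn of the chatbot loop: append the user's message, get a
    reply from the model (via ask_claude, a stand-in for the real
    client.messages.create call), and append that reply so the
    history keeps growing turn by turn."""
    messages.append({"role": "user", "content": user_text})
    reply = ask_claude(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

# Stubbed model responses so the sketch is runnable offline
history = []
chat_turn(history, "Hello, I'm Colt", lambda msgs: "Hi Colt! How can I help?")
chat_turn(history, "What's my name?", lambda msgs: "Your name is Colt.")
# history now holds four messages, alternating user/assistant
```

In the real chatbot, ask_claude would send the whole history to the API and return response.content[0].text.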
So let's try it.
Go ahead and run this.
So let's start with something simple. "Hello.
I'm Colt".
I'll send it off.
We get a response "Hi Colt. I'm an AI assistant. Nice to meet you.
How can I help you?"
Let's just test that it actually has the full context.
Let me ask it. What's my name?
Okay, we'll send that off.
"Your name is Colt. As you introduced yourself earlier."
Let's try something a bit more interesting.
I've asked it to help me learn more about how LLMs work.
So generate a response for me here.
This one's likely a little bit longer, and it gives me some information.
And I'll follow up with expand on the third item.
Again, this is just to demonstrate that it gets the full conversational history.
On its own,
this message doesn't mean anything to the model,
but with the full conversation history that I'm sending to it.
Now it expands on that third bullet point.
So that's one use case for sending messages in the messages format.
Another use case is what we call pre-filling, or putting words in the model's
mouth.
Essentially we can use an assistant message to tell the model
"here are some words that you will begin your response with."
We can put words in the model's mouth.
So for example, I'm having it write a short poem about Anthropic.
Let's change that to something else.
How about a short poem about pigs? Sure.
If I go ahead and just run this,
it may tell me something like:
"Okay, here's a short poem about pigs."
There we go.
But for some reason, I really want this poem to start with the word oink.
I insist on it.
Now I could tell the model, you know, write me a poem about pigs.
You must start with the word oink.
Also, don't give me this preamble.
Just go right to the poem.
But another option is to simply add in an assistant message
that begins with the word oink.
So something like this, where I have put a
new message in here with the role of assistant, and the content is oink.
So the model is now going to begin its response from this point.
Oink. And then you can see the completion
we get: "Oink and snuffle, pink and round, rolling happily on muddy ground."
Now it is important to note
it doesn't include the word oink in its response
because the model didn't generate this word.
I did, but the model generated all of this content by beginning with the word oink.
So then I could just combine the word oink with the rest of the poem
if I wanted to.
So that's pre-filling the response.
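A sketch of the pre-filled messages list; the trailing assistant message is the text the model continues from, and since the model's completion won't repeat it, you concatenate it back yourself if you need the full text:

```python
# Pre-filling: end the list with a partial assistant message.
# The model continues generating from "Oink", so its completion
# will not include that word itself.
messages = [
    {"role": "user", "content": "Write me a short poem about pigs."},
    {"role": "assistant", "content": "Oink"},
]

def full_response(prefill, completion):
    # Combine the pre-filled text with the model's completion
    return prefill + completion
```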
Next, we're going to talk about some of the parameters we can pass
to the model via the API to control its behavior.
The first we'll cover is max tokens.
So we've been using max tokens but we haven't discussed what it does.
In short, max tokens controls, well, the maximum
number of tokens that Claude should generate in its response.
Remember that models don't think in full words or in English words,
but instead they use a series of word fragments that we call tokens.
And model usage is also billed according to token usage.
For Claude, a token is roughly 3.5 English characters, though
it can vary from one language to another.
So this max tokens parameter allows us to set an upper bound.
We can basically tell the model don't generate more than 500 tokens,
or let's set this to something high, like 1000 tokens to start.
I'm going to ask the model to write me an essay on large language models,
a prompt that likely will generate
a whole bunch of tokens because I asked for an essay.
Okay, and here's our response. Great.
Pretty long, looks to be a pretty decent essay.
Now, if I tried this again, but I instead set max
tokens to be something much shorter, like 100 tokens.
I'll run this.
What will happen here is the model will get cut off essentially mid-generation.
We just cut it off because we've hit this 100 token generation.
Importantly, if we
look at the response object.
We'll also see nested inside of here
the number of output tokens was exactly 100.
It hit that and it stopped.
But we also see a stop reason this time that says max tokens.
So the model didn't naturally stop. Because stop reason is set to max tokens,
that's how we know the model was cut off by our max tokens parameter.
So this does not influence how the model generates.
Right. We're not telling the model,
"Give me a short response with an entire essay that fits within 100 tokens."
Instead, what we've done is we've told the model, write me
an essay on large language models, and then we just cut it off at 100 tokens.
So why would you use max tokens, or why would you alter it to something low
or something high?
Well, one reason is to try and save on API costs
and set some sort of upper bound, through a combination of a good prompt
but also through setting max tokens.
For example, if you're making a chatbot, you may not want
your end users to have 5000 token turns with the chatbot.
You may prefer that
those conversational turns are short and they fit within a chat window.
Another reason is to improve speed.
The more tokens involved in an output, the longer it takes to generate.
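As a sketch, capping output and detecting truncation looks like this; the request dict mirrors the arguments passed to client.messages.create, and stop_reason comes back on the response object:

```python
# Sketch: cap generation at 100 tokens and detect truncation.
# On the response, stop_reason is "max_tokens" when the cap cut the
# reply off, and "end_turn" when the model finished naturally.
request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 100,
    "messages": [{"role": "user",
                  "content": "Write me an essay on large language models"}],
}

def was_truncated(stop_reason):
    # True when the reply was cut off by the max_tokens limit
    return stop_reason == "max_tokens"
```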
The next parameter we'll look at is called stop sequences.
What this allows us to do is provide a list of strings
that, when the model actually generates them,
will make it stop.
So we can tell the model
once you've generated this word or this character or this phrase, stop.
So it gives us a bit more control instead of just truncating a number of tokens.
We can tell the model we want to truncate its output on this particular word.
So here's an example where I'm not using a stop sequence.
Generate a numbered ordered list of technical topics
I should learn if I want to work on large language models.
I pass that prompt through.
I've just moved it to a variable because it's a bit longer
and I get this nice numbered list, but it's quite long.
12 different topics.
Now, obviously through prompting I could tell the model only
give me the top three or the top five, but I'll just showcase with this example.
I'll copy this and duplicate it, but this time I'll provide stop sequences,
which is a list, and it contains strings.
In my case, let's say I want it to stop after it generates four.
So "4" followed by a period. We'll try running it again and you can see what we get.
So we get 1,2,3.
And then the model went on to generate four.
And it stopped.
Notice that four is not included in the output itself.
And if I look at the response object
we'll also see
that we have a stop reason this time set to stop sequence.
This is the model API telling us it stopped
because it hit a stop sequence, and which stop sequence
it hit: "4" followed by a period.
So stop sequences is a list.
We can provide as many as we want in here.
This is one way to control when the model stops outputs
or when the model stops generating.
And we'll see some use cases for this when we get to some more advanced
prompting techniques.
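To make the behavior concrete, here is a small pure-Python imitation of how a stop sequence truncates output. The real truncation happens on the API side as tokens are generated; this is only an illustration of the semantics:

```python
def apply_stop_sequences(text, stop_sequences):
    """Mimic the API's behavior: cut the output at the earliest stop
    sequence found; the stop sequence itself is not included in the
    returned text, matching what the API does."""
    cut = len(text)
    hit = None
    for s in stop_sequences:
        i = text.find(s)
        if i != -1 and i < cut:
            cut, hit = i, s
    return text[:cut], ("stop_sequence" if hit else "end_turn")

# The numbered-list example: stop once "4." is generated
out, reason = apply_stop_sequences("1. A\n2. B\n3. C\n4. D", ["4."])
# out == "1. A\n2. B\n3. C\n", reason == "stop_sequence"
```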
Now the next parameter we'll talk about is called temperature.
This parameter is used to control what
you can think of as the randomness or the creativity of the generated responses.
Now it ranges from 0 to 1, where a higher value like one is going to result
in more diverse and more unpredictable responses, with variations in phrasing,
and a lower temperature closer to zero will result in
more deterministic outputs that stick to the more probable phrasing.
So this chart here is an output from a little experiment I ran.
I don't recommend you run it because it involved making hundreds of API
requests, but I asked the model via the API to pick an animal.
My prompt was something like pick a single animal, give me one word,
and I did this 100 times with a temperature of zero.
And you can see every single response out of 100 was the word giraffe.
Now, I did this again, but instead set a temperature of one.
And we still get a lot of giraffe responses.
But we get some elephants and platypus, koala, cheetah and so on.
We get more variation.
So again, a temperature of zero is more likely to be deterministic,
but not guaranteed; a temperature of one gives more diverse outputs.
Now here's a function you can run that will demonstrate this.
I'm asking Claude
three different times to generate a planet name for an alien planet.
I'm telling it respond with a single word and I'm doing this three times
for the temperature of zero and three times with a temperature of one.
So let's see what happens.
I'll execute this cell where I'm calling this function.
And when I use a temperature of zero I get the same planet name three times
in a row. Kestrax, Kestrax, Kestrax.
And when I use a temperature of one, I get Keylara,
Kestrax, and Kestryx spelled slightly differently.
So we do get more diversity there.
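As a sketch, temperature is just another keyword argument in the request. This hypothetical helper builds the planet-name request at a given temperature:

```python
def planet_request(temperature):
    # temperature ranges 0.0-1.0: 0 gives near-deterministic output
    # (the same name almost every time), 1 gives more varied names.
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 10,
        "temperature": temperature,
        "messages": [{"role": "user",
                      "content": "Generate an alien planet name. "
                                 "Respond with a single word."}],
    }
```

Calling client.messages.create(**planet_request(0.0)) three times would be expected to return the same name repeatedly, while temperature 1.0 varies more.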
Now that we've seen the basics of making a request,
I want to tie it back to computer use.
Right, everything we're going to learn in this course is in some way
related to building a computer-using agent using Claude.
So this is some code from our computer use quickstart that we'll take a look
at towards the end of this course, but I want to highlight a few things.
We are making a request where we're providing max tokens.
We're providing a list of messages, we're providing a model name,
and then some other stuff we'll learn more about later.
And then we're also using the conversational messages format.
As you can see down here we have a list of messages.
It's defined further up in this repository or in this file.
But we have a list of messages that we are appending the assistant's
response back to.
So very similar to the chatbot we saw earlier, except of course
a lot more complicated.
It's using a computer.
There's screenshots involved and tools and a whole bunch of interactions,
but it's the same basic concept.
We send some message off to the model, and then we get the assistant response back.
We append that to our messages.
If I scroll up high enough, we can see it's all nested inside of a while true loop.
And there's a whole bunch of other logic, of course,
but it boils down to sending a request off to the API
using our client, providing things like max tokens and messages,
and then updating our messages list as new responses come back.
And providing this updated, continuously growing list of messages
every single time.
And we do this over and over and over again
using all the fundamentals we learned so far in this video.
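That loop structure can be reduced to a minimal offline sketch. Everything here is illustrative: `call_model` stands in for the real API call (plus the tool handling and screenshots the quickstart does), and the stop condition is a placeholder, not the real one.

```python
def run_agent_loop(call_model, user_goal, max_turns=10):
    """Minimal shape of the computer-use loop: send the growing messages
    list, append the assistant's reply, and repeat.

    `call_model` is any function that takes the messages list and
    returns an assistant content string.
    """
    messages = [{"role": "user", "content": user_goal}]
    for _ in range(max_turns):  # the real quickstart uses `while True`
        assistant_text = call_model(messages)
        messages.append({"role": "assistant", "content": assistant_text})
        if "DONE" in assistant_text:  # stand-in for the real stop condition
            break
        # In the real loop, tool results and screenshots would be added
        # here as a new user message before the next model call.
        messages.append({"role": "user",
                         "content": "(tool results would go here)"})
    return messages
```

The key point is the same as the chatbot: the messages list only ever grows, and the whole list is sent on every turn.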
Lesson 3: Multimodal Requests
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/zrgb6/multimodal-requests
By the end of this lesson, you'll be able to write multimodal prompts that combine
images and text and work with streaming responses from the API.
All right. Let's go.
So let's get started making our first multimodal request.
We're going to take an image or multiple images along with
some text, send it off to the model and get a response.
So just as in the previous video, we have some basic setup.
We're going to import Anthropic.
We'll set up our client
and then we'll have just a helper variable to store the model name string.
Before we start working with images, we need to talk
a little bit more about the messages structure we've seen so far.
So in the previous lesson
we set up a messages list where each message had a role set to user
and then content set to a string like tell me a joke.
And if I run this, we should see a joke.
And we do in fact get a joke.
Not a good one, but a joke.
Now this is actually a shortcut.
Setting content to a string is a shortcut for this syntax
here, where we set content to a list that contains a bunch of content blocks.
In this case, it's just a single content block with a type
set to text, and then text set to tell me a joke.
So this will give us the exact same sort of input prompt, different syntax.
Up here we have a nice shortcut.
If we're simply doing text prompts, it's easier to do it this way.
But as we'll see in just a moment, we'll want to provide
a list of content blocks if we're going to provide images.
So if I run this we again get a joke.
And just to show you what I mean about a list of content blocks,
here is a single message that has a role of user and content set to a list.
And it contains three text blocks.
Each one has text of a single word.
Who, made, you. And if I run this, we'll see.
We get a response. "I was created by Anthropic."
So all of these messages are combined
and essentially turned into a single input prompt.
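To make that equivalence concrete, here are the two syntaxes side by side, plus the three-block message just shown:

```python
# Content as a plain string is shorthand for a list containing a single
# text content block -- both messages below produce the same prompt.
short_form = {"role": "user", "content": "Tell me a joke"}

long_form = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Tell me a joke"},
    ],
}

# Multiple text blocks in one message are combined into a single prompt:
multi_block = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Who"},
        {"type": "text", "text": "made"},
        {"type": "text", "text": "you?"},
    ],
}
```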
So now we go on to images.
So our Claude models accept images as inputs.
So we need some images to work with.
I've provided you with an images folder that contains a handful of images
that we'll use. This is the first one.
Let's say that we hypothetically run a food delivery startup,
and we're using Claude to verify customer claims.
Customers will send us a screenshot saying, look, only half of my order arrived.
I want a refund.
So we are going to use Claude to analyze images of customer food
like this one here.
We'll start simple and just ask Claude to tell us
you know how many boxes and cartons of food are in this image.
So the first step is to understand
how we structure our messages that contain an image.
This diagram illustrates the structure.
So if you notice we have a messages list.
We have a role set to user just like before.
We have a content list.
And then inside of content we have a new type of content block we have yet to see.
We've only seen text blocks, but this is an image block.
So type is set to image. It's a dictionary.
And then we have a source key set to another dictionary
where we have type set to base64.
We have media type which is set to the images media type like Jpeg or PNG or GIF.
And then we have the raw image data.
So this is the structure of a single message.
So back in our notebook there's a few steps
we need to go through before we can actually create that message.
We need to read in the actual image file itself.
We need to open it which is what we're doing here with the path to food dot PNG.
Then we'll read in the
contents of the image as a bytes object.
Then we'll encode the binary data using base64.
And then finally we'll take the base64 encoded data and turn it into a string.
By the end of this we have our base64 string, which is quite long.
But if we just look at the first
100 characters, here's a preview of what it looks like.
So now what we need to do is take this base64 string
that contains our properly formatted image data, and now put it
in a properly formatted message and then send it off to the model.
So here's some code that takes that base64 string
that contains our food dot png image data as base64 as a string,
and puts it in a properly formatted content block and image content block.
As you can see, type is set to image, source is set to a dictionary
with type base64, it's a PNG, and then data is set to
our massive base64 string variable.
And then we follow it up with a second content block.
This time a text content block that has the text of
how many to-go containers of each type are in this image.
Very very simple prompt.
We're sending it this image of to-go containers filled with food.
We want to know how many of each type are in there.
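The whole sequence, read bytes, base64-encode, decode to a UTF-8 string, wrap in a content block, can be sketched like this. The dummy bytes stand in for a real file read with `open("images/food.png", "rb").read()`, and `make_image_block` is an illustrative name, not the course's code:

```python
import base64


def make_image_block(image_bytes, media_type):
    """Wrap raw image bytes in an image content block:
    bytes -> base64 -> UTF-8 string -> formatted dictionary."""
    b64_string = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,  # e.g. "image/png", "image/jpeg"
            "data": b64_string,
        },
    }


# Dummy bytes stand in for a real PNG read from disk.
fake_png_bytes = b"\x89PNG\r\n\x1a\n...not a real image..."

messages = [{
    "role": "user",
    "content": [
        make_image_block(fake_png_bytes, "image/png"),
        {"type": "text",
         "text": "How many to-go containers of each type are in this image?"},
    ],
}]
```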
Okay, so now we just take this messages list and send it off to the API.
So we use the same syntax we've seen before client dot messages dot create.
We pass in messages. We'll run it.
Then we see a response.
In this image there are three rectangular plastic containers with clear lids
and then three white paper or cardboard folded takeout boxes,
often called Chinese takeout boxes or oyster pails.
That is correct.
If we go back to the original image.
We do in fact see three boxes with plastic lids
and three of the paper oyster pails or Chinese takeout containers.
Now, going through all these steps to read the image and turn it into base64
and then turn it into a string encoded in UTF-8,
and then add it to a properly formatted message can be a little bit annoying
to do over and over.
So it's a great candidate for making a helper function.
So here's a helper function that just combines the functionality
we saw previously.
It's called create image message.
It takes an image path.
And then it's going to run those steps that we saw previously.
So it's going to open it read in the binary data.
It's going to encode it with base64 encoding.
It's going to turn it into a UTF-8 string.
It's going to guess the Mime type.
Remember we need to specify
whether it's a PNG or a Jpeg or a GIF or some other format.
And then finally it creates an image block,
properly formatted and then returns that image block.
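A sketch of such a helper, assuming the standard library's `mimetypes.guess_type` for the MIME-type guess; the course's actual implementation may differ in details:

```python
import base64
import mimetypes


def create_image_message(image_path):
    """Read an image file and return a formatted image content block:
    open -> read bytes -> base64-encode -> UTF-8 string -> guess MIME type."""
    with open(image_path, "rb") as f:
        binary_data = f.read()
    b64_string = base64.b64encode(binary_data).decode("utf-8")
    media_type, _ = mimetypes.guess_type(image_path)  # e.g. "image/png"
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": b64_string,
        },
    }
```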
So let's try it with a different image.
The images directory has a plant dot png image.
It's a pitcher plant.
Technically, I think it's a Nepenthes plant.
I have had limited success growing this myself.
I usually kill them before the pitchers emerge.
But very cool plant.
I'm going to ask the model just to identify the plant.
Very simple use case.
So we're going to use this function we've defined.
And here we are.
I have a new messages list a single message in it with
role of user.
Content is set to a list containing the result of create
image message for the plant png image.
So we get a properly formatted message back.
Or technically, it's a content block.
And then we follow it up with a text content block asking a very simple prompt.
"What species is this?"
We'll send it off to the model.
We'll run it. We'll print out the response.
And here we go.
"This appears to be a Nepenthes pitcher plant,
which is a type of carnivorous plant..."
And on and on and on. Okay.
So just a little helper function to make things a bit easier.
You could take it a step further and make a helper function
just to generate the entire messages list itself, where you provide an image path
and you provide a text prompt like "what species is this?"
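That step further might look like the following; `create_image_prompt_messages` is a hypothetical name, and the body simply inlines the same encode-and-wrap steps:

```python
import base64
import mimetypes


def create_image_prompt_messages(image_path, prompt_text):
    """Build the entire messages list from an image path plus a text
    prompt, e.g. ("images/plant.png", "What species is this?")."""
    with open(image_path, "rb") as f:
        b64_string = base64.b64encode(f.read()).decode("utf-8")
    media_type, _ = mimetypes.guess_type(image_path)
    return [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": media_type,
                        "data": b64_string}},
            {"type": "text", "text": prompt_text},
        ],
    }]
```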
Next, let's take a look at a more realistic use case that a lot of
our customers are using Claude to help with, which is analyzing documents.
So many documents.
Let's take an invoice like this one, which is called invoice dot
PNG. It includes tons of important information.
Maybe it's a PDF, maybe it's a PNG.
We can feed it into Claude.
Give it a good prompt and ask it to give us structured data as a response.
So I might be able to turn thousands of invoices
into Json and store them in a database in a matter of minutes.
So here's what that could look like with a single example.
This invoice dot PNG image. I provide an image message properly formatted.
Then I provide a text prompt, a pretty simple one.
"Generate a Json object representing the content of this invoice.
It should include all dates, dollar amounts and addresses.
Only respond with the Json itself."
I'll send it off to the model
and we get a Json response back.
So it has the company name,
which is my company, Acme Corporation, our fake address.
It has information on the invoice: the invoice number, the date,
the due date, information on who it's billed to and their address.
The items in the invoice.
So: the enterprise software license, implementation
services, and the premium support plan.
And then it has totals
including the total, the tax rate, the tax amount and the actual total.
And if I scroll back you can get a closer look at that image
and see that all this information is, in fact accurate.
So just a slightly more realistic use case
for image prompting compared to, you know, identifying a plant species.
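When doing this for thousands of invoices, you also need to parse the model's text back into data. Here is a hedged sketch of one way to do that, which tolerates an optional markdown code fence around the Json (models sometimes add one even when told not to); this is not code from the course:

```python
import json
import re


def extract_json(response_text):
    """Parse a Json object out of a model response.

    A model told to 'only respond with the Json itself' usually
    complies; this also strips a surrounding code fence if present.
    """
    text = response_text.strip()
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```

From here, each parsed dictionary can be stored in a database, compared, or charted.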
Now, one thing we won't demonstrate here, but that's important
to know is possible, is providing multiple images in a single message.
Recall that all of our content blocks are treated essentially
as one prompt behind the scenes when they're fed into the model.
So I can provide a combination of multiple image blocks plus multiple text
prompt blocks as part of a single user message. Content is a list,
so I simply add my content blocks inside, whether they have type
set to image or type set to text.
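A multi-image message just interleaves more blocks in the content list. The `data` values below are placeholders, not real base64 image data:

```python
# One user message can mix several image blocks and text blocks; behind
# the scenes, all content blocks are combined into a single prompt.
def placeholder_image_block(b64_data):
    """Illustrative helper: wrap placeholder data as an image block."""
    return {"type": "image",
            "source": {"type": "base64",
                       "media_type": "image/png",
                       "data": b64_data}}


messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Image 1:"},
        placeholder_image_block("<base64 for first image>"),
        {"type": "text", "text": "Image 2:"},
        placeholder_image_block("<base64 for second image>"),
        {"type": "text", "text": "How do these two orders differ?"},
    ],
}]
```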
The second topic we'll cover in this lesson is streaming responses.
What we've seen so far using client dot messages dot create works great.
But if I give it a prompt like write me a poem.
What you'll notice is that we're waiting for a response
until the entire response is generated and ready.
So it doesn't take all that long.
That was maybe half a second, maybe a second or less,
and we get the entire generation all at once.
But the longer a model's output generation is, let's say we're writing an essay
with the model, the longer it will take before we get any sort of content back.
We don't get a response back until the entire output has been generated.
With streaming, we can do something a bit different.
We can get content back as the content is generated.
And this is great for user facing scenarios where we can start
to show users responses as they're being generated,
instead of waiting until a full generation is complete.
So streaming doesn't actually speed up the overall process to generate.
It just speeds up what we call the time to first token, the time
that you see the first sign of life, the first piece of a response.
And the syntax is a little bit different, but very similar to this client dot
messages dot create.
So here we now have client dot messages dot stream.
And notice, we pass in max tokens.
We pass in a list of messages.
My prompt is simple just write a poem.
We pass in a model name.
But what's a bit different,
is that now we're going to iterate over this thing that we're calling stream.
So I give it this name as stream, and then I iterate over
every single bit of text in stream dot text stream, and then I print it out.
So what we'll see when I run this,
I'll just go ahead and execute it is we see the content coming back
as it's generated,
instead of having to wait for the entire thing to be generated at once.
Let's try it again.
You can see that we get chunks, little chunks, one by one,
and we're printing them out as they come in.
But again, the overall amount of time that it takes to do this
generation is going to remain unchanged.
Now, it obviously varies from one request to another, but we're not magically
getting the full result any faster than we would without streaming.
We're simply getting results.
We're getting parts of the output as they're being generated.
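The streaming pattern can be sketched as a reusable function. In the Anthropic Python SDK, `client.messages.stream(...)` is used as a context manager and `stream.text_stream` yields text deltas as they are generated; the function name and defaults here are assumptions:

```python
def stream_text(client, model, prompt, max_tokens=1024):
    """Stream a completion, printing each chunk as it arrives, and
    return the full accumulated text."""
    chunks = []
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)  # show partial output immediately
            chunks.append(text)
    print()
    return "".join(chunks)
```

The return value is the same full text a non-streaming call would have produced; only the time to first token improves.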
So we've seen how to make image requests, sending images as part of a prompt
in the content. We've also seen how to stream responses back from the model.
Now what I want to do is once again end by showing you
a real example from our computer use Quickstart implementation.
So this is a function that does a bunch of stuff.
But if you look closely in here in this highlighted text,
we are appending a correctly formatted image using the format that we talked
about earlier in this lesson.
So, type is image. Source is a dictionary.
Type is base64. Now what are these images?
These are the screenshots that we're providing the model with.
As we've seen previously when we covered sort of an introduction to the computer
use aspect of this course, the model works by getting screenshots,
analyzing the screenshots, and then deciding to take actions.
So we need to be able to provide images to the model.
And we use the exact same syntax we've already seen in this lesson.
We create these image content blocks. It's a much more complicated use case
here than identifying a plant, but it's the exact same syntax.
So we're slowly growing our arsenal of tools.
Next, we're going to talk about some more real-world or complex prompting.
Lesson 4: Real World Prompting
Lesson Link: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/kmnd5/real-world-prompting
By the end of this lesson, you'll be able to structure effective prompts
that get consistent, high-quality responses from Claude.
You'll utilize proven prompting techniques that actually matter in the real world,
and understand the difference between prompting a chatbot like Claude dot
AI and writing enterprise-grade, repeatable prompts.
Let's get coding.
So the main focus of this lesson is really on the distinction
between the types of prompts that we might write as consumers
or as users of a chatbot like Claude.AI, and the types of prompts
that large customers are writing, or really any API customers are writing
that need to be repeatable and reliable.
This is the anthropic documentation.
We have a section on prompt engineering with a whole bunch of different
tips and strategies,
a lot of which do matter, but some of which matter more than others.
Which is what I want to talk about in this video.
I want to focus on the tips that are worth your time.
There's a lot of stuff out there on the internet around prompting.
Some of it is a little bit dubious.
So we're going to really focus on the prompting tips
that have empirical evidence to back them up.
But first, I want to show you an example of what I mean when I say a consumer
prompt versus a real-world enterprise prompt.
So I'm back in a notebook.
I have the same initial setup we've had from previous videos,
and here's an example of a potential chat bot or consumer
prompt that I might type into the Claude AI website.
Help me brainstorm ideas for a talk on AI and education.
And if I'm happy with the result, great.
If not, I have as many opportunities as needed to follow up and say "Woops!
Actually, you're focusing too much on AI, not enough on education.
Can you change this to a bullet-pointed list?
Can you make this markdown?"
Right. I can follow up over and over and over again.
I have a lot of wiggle room and room for forgiveness.
Now let's take a look at a potential enterprise-grade prompt.
Now, I'll warn you ahead of time, this is quite long and way too much
to read and go over in this video, but that's kind of the point.
I want you to see that these prompts get long.
They get complicated.
They have structure to them.
A lot of effort goes into creating these prompts
beyond just sort of, you know, coming up with a thought
and following it up with another thought in the way that I might talk to Claude.AI.
So this is an example of a prompt that takes customer service calls,
transcripts from a customer service call, and then generates Json summaries.
And we might be doing this thousands of times per hour or maybe even per minute.
If we're running, you know, a
massive call center or we have a huge customer support team.
So we're not going to go over this piece by piece,
but I'm leaving you with this prompt so that you can go over it if you'd like.
There are a few things in here we'll refer back to.
The first thing that I want to
highlight, though, is that for enterprise grade prompts, for repeatable prompts,
we really think about them as prompt templates where we have
some large prompt structure that largely stays the same with a dynamic portion
or multiple dynamic portions that are inserted as variables.
So in this example, it starts by saying: analyze the following customer
service call and generate a Json object. Here's a transcript.
And then on this line we have a placeholder
where we actually would insert the real call transcript.
So we would do this dynamically just using a string
method most likely to replace this with a real transcript.
And then another real transcript and thousands and thousands of them
in a repeatable way.
So we think of this more as a template instead of a one-off use case.
Like you might consider your prompts for Claude.AI.
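A minimal sketch of the template idea, with a shortened, hypothetical version of the call-summary prompt:

```python
# Hypothetical, shortened version of the enterprise prompt template.
PROMPT_TEMPLATE = """Analyze the following customer service call transcript
and generate a Json summary.

<transcript>
{{TRANSCRIPT}}
</transcript>"""


def render_prompt(template, transcript):
    """Fill the {{TRANSCRIPT}} placeholder with a real transcript.

    Double curly braces are just a convention; any placeholder syntax
    works, as long as it is replaced consistently.
    """
    return template.replace("{{TRANSCRIPT}}", transcript)
```

The same template is then rendered once per transcript, thousands of times, in a repeatable way.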
So back to the slides for a moment.
I've listed some of the more important prompting tips here,
and I've bolded the ones that I think are the most important or more important
than anything else.
So one of them is use prompt templates.
We've hinted at that idea.
Other things on here include letting Claude think,
also known as chain of thought.
We'll talk about that in a bit.
Structuring your prompts with XML.
We saw a little bit of that in the prompt I just showed,
but we'll also focus on that in the next few minutes and using examples.
These are all techniques that have real data
behind them that actually back up the claims that they matter.
So now what I want to do is go through these and try
and build a relatively real-world prompt.
It will get a little bit long.
Prompts do get long.
It's a lot of text, a lot of strings.
But we're going to go through this one bit at a time.
We're going to go about this by building up to a larger
real-world or enterprise-grade prompt.
And the idea that we'll be using is a customer
review, classification and sentiment analysis prompt.
Let's say that we run some fictional e-commerce company.
Acme company. And we have hundreds of products
and thousands and thousands of customer reviews.
We're going to use a Claude API to help us understand
the sentiment of those reviews and some common complaints.
So if this is a hypothetical review: I recently purchased the XYZ smartphone
and it's been a mixed experience.
It lists some positive, some negatives.
It says, you know, I expected fewer issues.
I want Claude to be able to tell me in a repeatable fashion,
is this a positive review? A negative? A neutral?
And I want it to highlight some of the key issues and points of feedback.
Specifically, for doing this at scale with thousands and thousands of reviews,
I probably want the output to be in some easily extractable
and easy-to-work-with output format.
Often that will be Json.
Maybe something like this.
Some repeatable object that always has a sentiment score
positive, negative or neutral.
It has some analysis under a key called sentiment analysis.
And then it lists the actual complaints.
So performance like poor value, unreliable facial recognition and so on.
And then I can easily do this at scale for thousands of reviews:
store them in a database,
compare them, build charts, whatever I want to do with this repeatable output.
So we're going to approach this piece by piece with our task now defined.
We want to take customer reviews and turn them into Json
with sentiment analysis information and customer complaint data extracted.
We're going to go
through this one part at a time and then build up the entire prompt.
So the first tip we'll talk about is setting the role for the model.
Now, this is actually one that I don't feel as strongly about,
so we'll go through it pretty quickly.
Something that can be useful
is just giving the model a clear role and set of expectations upfront.
So in this case it might look something like this.
You are an AI assistant specialized in analyzing customer reviews.
Your task is to determine the overall sentiment of a given review
and extract any specific complaints mentioned.
Please follow these instructions carefully.
So obviously this is just one piece of the prompt,
but we're setting the role or setting the stage, giving the model
some context as to what it's supposed to be good at.
So the next step here is to provide the actual instructions to the model.
Right.
If we scroll up, we told the model.
Please follow these instructions carefully.
Now we're going to give it a very clear and direct ordered list of instructions.
So the first instruction is to review the following customer feedback.
We're making this a prompt template where we'll actually insert a customer
review here.
Now you don't have to use these double curly braces.
You can use whatever
sort of variable that you want or placeholder that you want to replace.
We like to use double curly braces, but definitely not a requirement.
Additionally, notice that I'm using XML tags here.
Not a requirement either, but Claude models tend to work very well with XML tags.
You can use any sort of syntax or any sort of separators to tell the model.
Here's where the customer review begins and here's where it ends.
Some of these customer reviews might be short, a couple of sentences,
but for some very disgruntled or very enthusiastic customers,
we might be looking at thousands of characters.
So we want to clearly tell the model.
Here's where the review begins. Here's where it ends.
The next thing that we'll focus on
are the actual steps we want the model to go through.
All right.
We've provided the context and said we want you to review this customer
feedback.
We will then eventually insert the customer feedback in here.
What do we want the model to do?
It may be tempting to simply say: generate Json that includes a sentiment
score, whether positive, neutral or negative, and a list of complaints
that you've extracted. And that may work.
It likely will work in a lot of situations,
but one of the prompting tips I want to highlight
here is what we call letting Claude think or chain of thought.
Essentially, telling the model
that before it comes to a decision or some sort of conclusion,
we want it to think out loud and output some analysis to help it make a decision.
And then eventually make that judgment.
So here's an example of what that could look like. In this variable instruction
part two, another long string.
I'm telling the model here's your second step.
So once you've reviewed the customer feedback I want you to analyze the review
using the following steps.
There's a few things I want to highlight.
First of all, this line here tells the model to show its work in review
breakdown tags.
Again, you don't have to use XML, but the Claude family of models
performs very well with XML, so a common strategy
is to tell Claude to contain certain parts of its output
in certain XML tags. We can tell it to do its thinking out loud
inside review breakdown tags, separate from the actual analysis;
for the final result, we'll tell it to put its results in some separate tag.
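The flip side of asking for tagged output is pulling it back out of the response afterwards. A hedged sketch, with illustrative tag names and a made-up sample response:

```python
import re


def extract_tag(response_text, tag):
    """Pull the contents of a named XML tag out of a model response,
    e.g. the final verdict in <result> separate from the thinking in
    <review_breakdown>. Tag names here are illustrative."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", response_text, re.DOTALL)
    return match.group(1).strip() if match else None


# Made-up sample of what a tagged response might look like.
sample_response = (
    "<review_breakdown>Praises the battery life but complains about "
    "the camera and the price.</review_breakdown>\n"
    "<result>neutral</result>"
)
```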
We tell the model to start
by extracting key phrases that might be related to sentiment.
Then we tell the model to consider arguments for positive,