00:00:00 Does a computer see in color or black and white?
00:00:02 It's time to find out on episode 11 of Talk Python To Me with our guest, Adrian Rosebrock,
00:00:08 recorded Thursday, May 20th, 2015.
00:00:12 Welcome to Talk Python To Me.
00:00:42 A weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
00:00:47 This is your host, Michael Kennedy.
00:00:48 Follow me on Twitter where I'm @mkennedy and keep up with the show and listen to past episodes
00:00:54 at talkpythontome.com.
00:00:55 This episode, we'll be talking with Adrian Rosebrock about computer vision, OpenCV, and PyImageSearch.
00:01:03 Hello, everyone.
00:01:05 I have a bunch of cool news and announcements for you this week.
00:01:08 First, this show on PyImageSearch is a listener-suggested show.
00:01:13 Thank you to J.I. Lorenzetti for reaching out to me and suggesting this topic.
00:01:18 You can find his contact details in the show notes.
00:01:21 As always, I'm also excited to be able to tell you that this episode is brought to you by CodeShip.
00:01:27 CodeShip is a platform for continuous integration and continuous delivery as a service.
00:01:33 Please take a moment and check them out at codeship.com or follow them on Twitter where they're @codeship.
00:01:38 Did you know that most of our shows come with full transcripts and a cool little search filter feature?
00:01:43 If you're looking for something you hear in an episode, just click the full transcript button on the episode page
00:01:49 and search for it.
00:01:49 Also, I want to say thank you to everyone who has been participating in the conversation on Twitter
00:01:55 where we're @talkpython.
00:01:57 It's a great feeling to see all the feedback and thoughts every week when we release a new show.
00:02:01 But if you have something more nuanced to say that doesn't fit in 140 characters
00:02:06 or you want it to be more permanent than Twitter, every episode page has a Disqus comment section at the bottom.
00:02:13 I encourage you to post your thoughts there.
00:02:15 This week, I ran across a really awesome GitHub project called Python-Patterns.
00:02:20 You can find it at github.com/faif/python-patterns.
00:02:26 It's a collection of really crisp design patterns implemented in a Pythonic manner.
00:02:31 For example, you'll find patterns such as the adapter, builder, chain, decorator, facade,
00:02:37 and flyweight patterns, just to name a few.
00:02:39 It's really extensive and pretty cool.
00:02:41 I think you'll learn something if you check it out.
00:02:42 Finally, I put together a cool YouTube playlist.
00:02:45 This is a series of nine lectures from Dr. Philip Guo, a professor at the University of Rochester, New York.
00:02:53 Find him on Twitter where he's @pgbovine.
00:02:58 The video series is entitled CPython Internals, a 10-hour code walk through the Python interpreter source code.
00:03:05 You can find it at bit.ly/cpythonwalk, all lowercase, no spaces.
00:03:12 Also, you'll find all these links in the show notes.
00:03:16 Now, let's get to the interview with Adrian.
00:03:18 Let me introduce Adrian.
00:03:23 Adrian Rosebrock is an author and blogger at pyimagesearch.com.
00:03:28 He has a PhD in computer science with a focus on computer vision and machine learning
00:03:32 and has been studying computer vision his entire adult life.
00:03:36 He has consulted for the National Cancer Institute to develop methods to predict breast cancer risks
00:03:41 using breast histology images and authored a book, Practical Python and OpenCV,
00:03:48 on utilizing Python and OpenCV to build real-world computer vision applications.
00:03:52 Adrian, welcome to the show.
00:03:56 Oh, thank you.
00:03:57 It's great to be here.
00:03:58 I'm very excited about computer vision and sort of merging the real world with computer science,
00:04:04 with robotics.
00:04:05 And I think there's just some really neat stuff going on.
00:04:07 And you're doing a very cool part in that.
00:04:10 Oh, thank you.
00:04:11 So we're going to talk about PyImageSearch.
00:04:13 We're going to talk about OpenCV and some of the challenges and even the future of these types of technologies.
00:04:18 But before we get there, you know, everyone's interested in how people got started in programming and Python.
00:04:22 What's your story?
00:04:23 I started programming when I was in high school.
00:04:27 I started out with the basics of HTML, JavaScript, CSS, did some basic programming.
00:04:33 And, you know, I'm probably going to get a lot of hate mail about this.
00:04:36 But when I first started learning how to program, I did not like the Python programming language that much.
00:04:41 And this was around the early version 2 of Python.
00:04:45 I didn't like the syntax.
00:04:47 I didn't like the white space.
00:04:49 And for a long time, I was really, really put off by Python.
00:04:52 And that was a huge mistake on my part.
00:04:55 I don't know what was wrong with me back then.
00:04:57 I guess it was just high school ignorance or something.
00:04:59 But by the time I got to college, I started working in Python a lot more.
00:05:04 And that's especially true in the scientific area.
00:05:07 You see all these incredible packages in Python like NumPy and SciPy that just integrate with computer vision and machine learning.
00:05:15 And all other types of libraries.
00:05:17 And more and more people were transitioning over from languages like MATLAB to languages like Python.
00:05:25 And that's so cool.
00:05:26 And it really wasn't until college that I got into Python.
00:05:30 And I remember this one girl.
00:05:32 She was in my machine learning class.
00:05:34 And she had a sticker on the back of her laptop that said, Python will save the world.
00:05:38 I don't know how, but it will.
00:05:40 And that resonated with me.
00:05:42 I'm like, that sticker's true.
00:05:44 That is absolutely true.
00:05:46 It's such a great language.
00:05:48 So unfortunately, I did not have the best first experience with Python.
00:05:52 It took me four or five years later to actually come around.
00:05:56 But now that I'm here, I love it.
00:05:58 And I can't imagine programming in any other language.
00:06:01 It's almost a freeing feeling, a relaxing zen when you're coding in Python.
00:06:07 Yeah, that's a funny story.
00:06:09 It really is a wonderful language.
00:06:11 I also took a while to get there.
00:06:13 But looking back, I would have enjoyed being there sooner.
00:06:16 I went from MATLAB to C++ on Silicon Graphics machines.
00:06:21 So I had a bit of a torturous introduction.
00:06:23 But it was all good.
00:06:24 CodeShip is a hosted, continuous delivery service focused on speed, security, and customizability.
00:06:41 You can set up continuous integration in a matter of seconds and automatically deploy when your tests have passed.
00:06:46 CodeShip supports your GitHub and Bitbucket projects.
00:06:50 You can get started with CodeShip's free plan today.
00:06:52 Should you decide to go with a premium plan, Talk Python listeners can save 20% off any plan for the next three months by using the code TALKPYTHON.
00:07:01 All caps, no spaces.
00:07:03 Check them out at CodeShip.com.
00:07:05 And tell them thanks for sponsoring the show on Twitter where they're @codeship.
00:07:10 So you're focused on computer vision and image processing.
00:07:24 Where did that story begin?
00:07:25 That story also started in high school.
00:07:29 Originally, I had this idea that I wanted to go work for Adobe.
00:07:33 And I wanted to work on developing Photoshop and Illustrator.
00:07:37 I love the idea of being able to write code that could analyze an image.
00:07:42 And for whatever reason, that just like really captured my imagination.
00:07:47 I could see these algorithms running in Photoshop.
00:07:49 And I was like, you know, what's really going on behind the scenes?
00:07:52 Like, how are they manipulating these images?
00:07:54 What does this code look like?
00:07:56 So for the longest time, I really wanted to develop these graphic editing applications.
00:08:02 But I didn't have the math experience.
00:08:07 This may surprise some people, given that I have a PhD in computer science.
00:08:11 But up until late high school, I did not do well in mathematics courses.
00:08:17 I got C's in algebra and geometry.
00:08:22 And it really wasn't until I kind of really put my back against the wall.
00:08:27 And I said, you know what?
00:08:28 I got to learn calculus and statistics.
00:08:30 So I did a self-study in AP Calc.
00:08:33 And I took AP statistics.
00:08:34 And I did well with those.
00:08:36 I'm like, man, math is fun now.
00:08:38 Like, I understand this.
00:08:40 So I got to college.
00:08:41 And I only took one computer vision course, because the school I went to didn't really
00:08:47 have a computer vision focus.
00:08:49 They had a wonderful machine learning focus, but not really a computer vision focus.
00:08:53 And what I found out was that, you know, you don't need a mathematical background to get
00:09:00 started in computer vision.
00:09:01 And I think this is true in a lot of areas of computer science, whether or not people want
00:09:05 to admit it.
00:09:06 A lot of people talk themselves out of getting started in stuff, especially challenging things,
00:09:11 because they're just scared of it.
00:09:13 They don't want to fail.
00:09:14 And that's the cool thing about Python.
00:09:17 Like, you almost don't have to worry about the code.
00:09:20 You get to focus on learning a new skill.
00:09:23 And for me, that was computer vision.
00:09:25 And that was the OpenCV library, an open source library that makes working with computer vision
00:09:30 a lot easier.
00:09:31 So again, it really wasn't until college that I really started to get into it,
00:09:36 getting interested.
00:09:37 Or not necessarily getting interested,
00:09:38 more so being able to take action on what I wanted to do.
00:09:42 Right.
00:09:43 Maybe, you know, it felt a little unattainable.
00:09:45 Like, I'm going to go be an engineer, but I don't know math.
00:09:48 And so there's no way I can do this.
00:09:49 But once you kind of got over that hump, then it was no big deal, right?
00:09:53 Right.
00:09:53 Yeah.
00:09:54 That's very freeing.
00:09:55 So you mentioned OpenCV and your project is PyImageSearch.
00:10:00 What's the relationship there?
00:10:01 So OpenCV, again, it's a computer vision library that makes working with images a lot easier.
00:10:08 You know, it abstracts the code that loads an image off of disk or does edge detection or
00:10:14 thresholding or any other simple image processing function like that.
00:10:19 It allows you to actually build complicated computer vision programs.
00:10:23 You can do things like tracking objects in images or video streams, for example.
00:10:28 Detecting faces.
00:10:29 Recognizing whose face it is.
00:10:31 And OpenCV really facilitates this process.
00:10:35 And OpenCV really is like the de facto library for computer vision and image processing.
00:10:41 And you have bindings for it in countless languages.
00:10:44 The library itself is written in C and C++, but you can get bindings and access it in Java
00:10:52 and any of the .NET languages, and in Python.
00:10:57 So again, while you can access it in many programming languages, I have this love for Python now.
00:11:03 And when I took the course, the computer vision course in college, I realized, man, like, people
00:11:10 are spending a lot of time writing their class projects in C and C++.
00:11:14 Why are they doing that?
00:11:15 Like, you're fighting over these weird compile time errors.
00:11:18 And, you know, you're not really learning anything.
00:11:21 And that's kind of the tenet behind PyImageSearch.
00:11:24 It's a blog that I run dedicated to teaching computer vision, image processing, and OpenCV using
00:11:30 the Python programming language.
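As a rough illustration of what a simple image processing function like thresholding does, here is a minimal sketch in plain Python on a tiny grayscale image stored as a list of lists. The function name `threshold` and the default cutoff of 128 are assumptions made for this example. This is not OpenCV's implementation: OpenCV exposes thresholding as `cv2.threshold`, which works on NumPy arrays and is far faster.

```python
def threshold(gray, cutoff=128):
    """Return a binary image: 255 where a pixel is brighter than cutoff, else 0.

    `gray` is a list of rows, each row a list of brightness values (0-255).
    Both the name and the default cutoff are hypothetical, chosen for this sketch.
    """
    return [[255 if value > cutoff else 0 for value in row] for row in gray]


# A 3x3 grayscale "image" with a bright region in the upper right.
gray = [
    [10, 20, 200],
    [30, 220, 210],
    [15, 25, 128],
]

binary = threshold(gray)
for row in binary:
    print(row)
```

Every pixel ends up either fully black or fully white, which is often the first step before looking for shapes or outlines in an image.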
00:11:31 That's great.
00:11:32 And I think that, you know, using Python seems like the perfect choice.
00:11:36 You're sort of orchestrating these high-level functions that are calling down into C++, doing
00:11:41 high-performance stuff, and then giving you the answer.
00:11:43 And that seems like the right way to be using Python.
00:11:46 So what's the actual package I use?
00:11:48 If I were to say pip install something, what do I type to get started?
00:11:51 So unfortunately, OpenCV is not pip installable.
00:11:56 I wish it was, but it is not.
00:11:58 And it is not the easiest package to get installed on your system.
00:12:04 If you're using Ubuntu or any Debian-based operating system, you technically can do an apt-get install.
00:12:14 But that's going to pull down a previous version of OpenCV.
00:12:18 You're going to run into a lot of problems with the Python bindings, and it's not a very
00:12:22 good experience.
00:12:22 So what you actually have to do is compile it from source, download the code from their
00:12:28 GitHub or the SourceForge account, and manually compile it and install it.
00:12:32 And in fact, that's really the only way to do it if you're interested in using virtual environments,
00:12:37 which most Python developers are interested in for sequestering their packages.
00:12:45 Yeah, absolutely.
00:12:45 Okay.
00:12:46 So I go and I download that.
00:12:47 And then what packages are in there that I would work with?
00:12:51 Is that cv2?
00:12:53 Is that the one I would import?
00:12:55 Yep.
00:12:55 So if you were to open up your favorite editor, you would just type in import cv2, and that'll
00:13:01 give you access to all of your OpenCV bindings.
00:13:04 Okay, great.
00:13:04 And now I looked at some samples on your blog about how I might go and grab like an image
00:13:11 from a camera hooked to an Arduino.
00:13:14 Maybe we could talk a little bit about the type of hardware that you need and the spectrum
00:13:20 of devices you can interact with and that kind of stuff before we get into the more theoretical
00:13:24 bits.
00:13:24 Sure.
00:13:25 So that's kind of the cool thing about OpenCV is that it's meant to be run in real time.
00:13:31 So you can easily process video files, raw video streams without too much of a problem,
00:13:38 again, depending on the complexity of your algorithm.
00:13:41 And OpenCV is meant to run on a variety of different devices.
00:13:46 I personally develop applications on my MacBook, but I also own a Raspberry Pi and a camera module
00:13:54 for the Raspberry Pi.
00:13:55 And using OpenCV, I can access the Raspberry Pi video stream and then actually build like
00:14:02 a home surveillance system using nothing but OpenCV and a Raspberry Pi.
00:14:08 I built this one project where I had a Raspberry Pi camera mounted on my kitchen cabinets looking
00:14:14 over the front door of my apartment.
00:14:17 And it would detect motion, such as when you're opening the door and somebody's walking inside.
00:14:22 So once it detected motion, it would snap a photo of whoever was walking inside, try and identify
00:14:27 their face, and then it would take that screenshot or the screen capture and then upload it to my
00:14:33 personal Dropbox.
00:14:34 So I had like this real-time home surveillance system.
00:14:37 That was really, really cool to develop.
00:14:39 And again, like this is using simple hardware.
00:14:43 The Raspberry Pi is not a powerful machine, but you could still build some really cool computer
00:14:48 vision applications with it.
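The motion detection step described above can be sketched with simple frame differencing: compare two consecutive grayscale frames and report motion when enough pixels changed. This is a minimal sketch under stated assumptions, not the code from Adrian's project, which used OpenCV with a Raspberry Pi camera. The function names, the frame size, and both thresholds are hypothetical values picked for illustration.

```python
def motion_fraction(previous, current, pixel_threshold=25):
    """Fraction of pixels whose brightness changed by more than pixel_threshold.

    Frames are lists of rows of brightness values (0-255) with the same shape.
    """
    changed = 0
    total = 0
    for prev_row, cur_row in zip(previous, current):
        for prev_value, cur_value in zip(prev_row, cur_row):
            total += 1
            if abs(prev_value - cur_value) > pixel_threshold:
                changed += 1
    return changed / total


def motion_detected(previous, current, pixel_threshold=25, min_fraction=0.02):
    """True when at least min_fraction of the pixels changed noticeably."""
    return motion_fraction(previous, current, pixel_threshold) >= min_fraction


# Synthetic frames standing in for a camera pointed at a doorway.
WIDTH, HEIGHT = 64, 48
empty_doorway = [[40] * WIDTH for _ in range(HEIGHT)]

# The same scene with a bright 10x20 block, as if someone walked in.
person_enters = [row[:] for row in empty_doorway]
for y in range(10, 30):
    for x in range(20, 30):
        person_enters[y][x] = 200

print(motion_detected(empty_doorway, empty_doorway))
print(motion_detected(empty_doorway, person_enters))
```

In a real system the two frames would come from the video stream, and a detection would trigger the later steps mentioned in the interview, such as saving the frame and uploading it.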
00:14:49 Yeah.
00:14:50 And it's cheap too, right?
00:14:51 Yeah.
00:14:52 The Pi itself is, I think, $35 and probably another $20 for the camera module.
00:14:58 Yeah.
00:14:59 That's really easy to get started.
00:15:01 So very cool.
00:15:02 Very cool.
00:15:03 How does computer vision work?
00:15:05 I mean, I have a little bit of a background in trying to identify things in images.
00:15:11 I worked at this place called eye tracking, E-Y-E, tracking, not I, the letter I, tracking.
00:15:18 And we did a lot of stuff with image recognition and detecting eyes.
00:15:22 And I know enough to know that it seems really hard, but how does it work?
00:15:27 So computer vision as a field is really just encompassing methods on acquiring, processing,
00:15:34 analyzing, and just understanding and interpreting the contents of an image.
00:15:38 For humans, this is really, really easy.
00:15:41 We see a picture of a cat, and we know, like, oh, that's a cat.
00:15:43 And we see a picture of a dog, and we obviously know that's a dog.
00:15:46 But a computer, it doesn't have a clue.
00:15:48 It just sees a bunch of pixels, just a big matrix of pixels.
00:15:53 And the challenging part, as you suggested, is writing code and creating these algorithms
00:15:58 that can understand the contents of an image.
00:16:00 You can't open up your Python source file and then write if statements that say,
00:16:06 if this pixel equals, you know, whatever RGB code, you know, then this is a cat, right?
00:16:12 If it equals this pixel value, then it's a dog.
00:16:15 Like, you can't do that.
00:16:16 So what happens is computer vision really leverages machine learning as well.
00:16:21 So we can take this data-driven approach and say, here's a ton of examples of a cat,
00:16:26 and here's a ton of examples of a dog.
00:16:28 Let's see how we can abstractly quantify and represent this huge image
00:16:35 in just a small, what they call, feature vector.
00:16:37 It's a fancy academic way of saying a list of numbers.
00:16:40 I'm going to quantify this big 3,000 by 3,000 pixel image into a feature vector that's 128 numbers long.
00:16:49 And then I can compare them to each other.
00:16:50 I can rank them for similarity.
00:16:52 I can pass them to machine learning algorithms to actually classify them.
00:16:57 So the field of computer vision is very large.
00:17:00 And again, it spans so many different areas of processing and analyzing images.
00:17:06 But if we're talking strictly about classifying an image and detecting objects in an image,
00:17:10 then we're most likely leveraging some machine learning at some point.
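To make the feature vector idea concrete, the sketch below reduces an image to a short, fixed-length list of numbers and then compares images by the distance between those lists. It uses a coarse color histogram as the descriptor, which is an assumption chosen for simplicity and not necessarily what Adrian uses. The function names, the synthetic images, and the choice of 4 bins per channel (giving 64 numbers rather than the 128 mentioned above) are all hypothetical.

```python
import math
import random


def color_histogram(pixels, bins_per_channel=4):
    """Quantify a list of (r, g, b) pixels as a normalized histogram.

    With 4 bins per channel the result always has 4 * 4 * 4 = 64 numbers,
    no matter how many pixels the image contains.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fake_image(base_color, n_pixels=10_000, jitter=30):
    """Synthetic image: pixels scattered around one base color, clamped to 0-255."""
    return [
        tuple(min(255, max(0, c + random.randint(-jitter, jitter))) for c in base_color)
        for _ in range(n_pixels)
    ]


random.seed(0)
hist_red_a = color_histogram(fake_image((200, 30, 30)))
hist_red_b = color_histogram(fake_image((210, 40, 20)))
hist_blue = color_histogram(fake_image((30, 30, 200)))

print("red vs red :", euclidean(hist_red_a, hist_red_b))
print("red vs blue:", euclidean(hist_red_a, hist_blue))
```

The two reddish images end up closer to each other than either is to the blue one, which is the kind of ranking by similarity described above. The same vectors could also be handed to a machine learning classifier.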
00:17:14 Okay, and cool.
00:17:15 And when you say machine learning, is that like neural networks or what's going on back there?
00:17:21 The machine learning algorithm you would use really depends on your application.
00:17:25 Deep learning has gotten so much attention over the past few years.
00:17:31 And deep learning has its roots in neural networks.
00:17:34 So we see a lot of that.
00:17:36 You also see very simple machine learning methods like support vector machines, logistic regression.
00:17:44 You see that a lot as well.
00:17:46 And these methods, while simple, they're actually – the bulk of the work is actually happening on describing the image itself, you know, quantifying it.
00:17:57 So if you have a really good quantification of an image, it's a lot easier for the machine learning algorithm to take that and perform the classification.
00:18:06 Right, sure.
00:18:07 And so how much of this exists in external libraries like scikit-learn or OpenCV or something like this?
00:18:16 And how much of that is like I've got to create that system for myself when I'm getting started based on my application?
00:18:24 So OpenCV does include some machine learning components, but I really don't recommend that people use them just because they're a little finicky and they're not that fun to use.
00:18:35 And especially in the Python ecosystem, you have scikit-learn.
00:18:38 So you should be defaulting to that.
00:18:42 And to give an example, I wrote my entire dissertation, gathered all the examples using OpenCV and scikit-learn.
00:18:52 I took the results that OpenCV was giving me and I passed them on to the machine learning methods in scikit-learn.
00:18:59 Right.
00:18:59 Oh, that sounds very useful.
00:19:01 I think a lot of the challenging aspects of getting started in something new like this, if you're not already involved in it, is just knowing what exists, what you can reuse, and what you have to write yourself.
00:19:12 So knowing that that's out there is really nice.
00:19:14 Yeah, for sure.
00:19:15 And some of these algorithms you definitely don't want to be implementing yourself.
00:19:20 No, I'm sure you don't.
00:19:22 Unless you're really, really into high-performance matrix multiplication and other types of processing that make your day, right?
00:19:29 Exactly.
00:19:31 I have some sort of mental models of how I might use computer vision.
00:19:34 And then you have the Hollywood models, right?
00:19:37 Like Minority Report and so on.
00:19:39 But what's the current state of the art?
00:19:41 Like where do you see computer vision really prominently being used in the world?
00:19:44 So computer vision is used in your everyday life, whether you realize it or not.
00:19:51 And it's kind of scary, but it's also kind of cool.
00:19:56 Back about a year ago, I was traveling back and forth between Maryland and Connecticut on the East Coast of the United States constantly for work-related activities.
00:20:07 And one day I was exhausted.
00:20:10 It was Friday.
00:20:12 I just really wanted to leave Maryland and get back up to Connecticut and sleep in my own bed and just pass out.
00:20:20 So I left work a little early.
00:20:22 And I started tearing up 95.
00:20:24 It was a beautiful summer day.
00:20:26 Sunlight streaming down.
00:20:28 Had my windows open and the wind blowing in.
00:20:33 And if you've ever driven on 95, specifically on the East Coast of the United States, you know there's always just like a ton of traffic or just lots of construction to ruin your drive.
00:20:42 And for whatever reason this day, there was no traffic.
00:20:46 There was no construction.
00:20:47 And I was just flying down the road.
00:20:49 And I made excellent time getting home.
00:20:51 However, two weeks later, I get my mail.
00:20:55 And I noticed that there is a speeding citation addressed to me.
00:21:00 Apparently, I had passed one of the speeding cameras that was mounted along the side of the road.
00:21:05 It detected that my car was traveling above the posted speed limit.
00:21:09 It snapped a photo of my license plate.
00:21:11 And then it applied what's called automatic license plate recognition where it takes the image, automatically analyzes it, finds my license plate, and then looks it up in a database and mails me a ticket.
00:21:24 I was like, man, I've written code to do this.
00:21:28 I know exactly how this worked.
00:21:30 So, like, it was the only time where I had a smile on my face as I was writing out the $40 check or whatever it was.
00:21:38 I'm like, you guys got me.
00:21:39 And I know how you did it.
00:21:41 Yeah, exactly.
00:21:42 Like, your love for image recognition is slightly turned against you.
00:21:47 Just briefly there.
00:21:48 Just briefly.
00:21:49 You said you worked on this thing called ID My Pill.
00:21:53 What's that?
00:21:54 So, ID My Pill is an iPhone application and an API that allows you to identify your prescription pills in the snap of your phone.
00:22:04 So, the general idea is that we put a little too much faith in our pharmacists and our doctors.
00:22:13 And not to say there's anything against that, but mistakes do happen.
00:22:16 And people do get hurt.
00:22:19 They get sick.
00:22:20 And some of them do die every year due to taking the wrong medication.
00:22:23 So, the idea behind ID My Pill is to validate your prescription pills.
00:22:27 And it's also a way for pharmacies, for healthcare providers to facilitate better care for their patients.
00:22:35 So, you just take your pills, snap a photo of them, and then computer vision algorithms are used to automatically analyze and recognize the pill.
00:22:43 That way, you can validate that, yes, this is the correct pill.
00:22:46 This is what it says on the pill bottle.
00:22:48 And I know what I'm taking is correct.
00:22:52 That sounds really useful.
00:22:53 What do you think the chances are, like, something along those lines could be automated?
00:22:58 So, as the pharmacists are, like, actually filling the prescription, you know, the computer knows what they're filling and it sees what they're putting into bottles.
00:23:05 Could it say, whoa, whoa, whoa, this does not look legit?
00:23:08 Yeah, I think it absolutely can be automated.
00:23:12 The current systems right now that pharmacies use, especially within hospitals, some of them are using RFID chips.
00:23:20 So...
00:23:21 You know, when you take a pill bottle off the shelf, it's able to validate that you are taking the correct medication and filling it.
00:23:30 But again, that's not perfect.
00:23:32 And pills can get mixed up.
00:23:35 So, in a perfect world, what you end up doing is you would have that RFID mechanism in place.
00:23:40 And then, you know, you have a mounted camera looking down at their workstation, at their desk.
00:23:48 And then, you know, it validates the pills in real time.
00:23:51 And they get a nice little thumbs up on the screen or whatever heads up display that they have in front of them.
00:23:56 And they can continue filling the medication.
00:23:58 Yeah, that sounds really helpful.
00:24:00 So, I used a slightly less productive, contribute to society sort of computer vision last night.
00:24:06 I was out with some friends and my wife.
00:24:08 We were having some wine.
00:24:10 And there's this really cool iPhone app called Vivino.
00:24:13 I think I'm saying it right.
00:24:14 Yeah.
00:24:14 And you can just take a picture of a bottle, even if the background is all messy and there's people around and so on.
00:24:21 And it'll tell you what the ratings are, how much it should cost at, you know, sort of standard retail price.
00:24:27 And I was really impressed with that.
00:24:29 You know, there's a lot of interesting sort of consumer style uses as well, I think.
00:24:35 Oh, for sure.
00:24:36 Vivino is one of the major ones.
00:24:40 And another one that you can see computer vision used a lot in, and I don't think people really notice it, but they appreciate it, is within sports.
00:24:50 So, in America, if you're watching a football game, you'll notice that they have, like, a yellow line drawn across the field marking the, you know, the spot of the ball along with the first down marker.
00:25:04 And these lines are drawn using calibrated cameras.
00:25:08 So, computer vision is used to calibrate the cameras and then know where to actually draw that line on the broadcast of the game.
00:25:15 And then, similarly, you can use computer vision and machine learning in the back end and analyze games to determine, you know, what's the optimal strategy.
00:25:25 And this is actually done in Europe a lot in soccer games.
00:25:29 So, they'll detect how players are moving around, how the ball is passed back and forth.
00:25:34 And they can almost run these data mining algorithms to learn how you're going to beat the other team and try and learn their strategy.
00:25:41 It's pretty cool.
00:25:43 Yeah, that really is amazing.
00:25:45 I feel like these days watching sports on television is actually a better experience than going to them live a lot of times because of these types of things, right?
00:25:54 It's very clear.
00:25:55 Oh, look, they've got to go, like, a foot and a half forward and then they get the first down.
00:25:59 Otherwise, they're going to fail.
00:26:00 And, you know, you don't really quite get the same feel live, which is ironic.
00:26:04 Right.
00:26:04 It's almost that it used to be that you didn't get the full story unless you were there.
00:26:09 And now it's kind of flipped around.
00:26:10 Like, you get more than the full story if you're watching the game on TV.
00:26:14 You get all the detail that you could possibly want.
00:26:16 Yeah, that's right.
00:26:17 So, you know, that sort of leads into the whole story of augmented reality and stuff like that with, you know, the Microsoft HoloLens, Google Glass, and, you know, a bunch of iPhone apps and other mobile apps as well.
00:26:29 What kind of interesting stuff do you see out in that world?
00:26:31 I have not had a chance to play around with the HoloLens.
00:26:34 I have used the Google Glass and played around with that.
00:26:37 Most of the applications I see, again, this is just because of my work with ID My Pill, are medical related.
00:26:46 So, you'll see surgeons going into really, really long, you know, 10 plus hour surgeries where they perform these complex operations.
00:26:54 And they may need to look up some sort of reference material while they're doing this surgery.
00:27:00 And instead of having an assistant doing that for them, I mean, they could put on the Google Glass and have this information right there in front of them.
00:27:06 So, you see a lot of that.
00:27:08 And since the Glass has a camera, there's a lot of research focused on, especially within medicine, identifying various body parts as you're performing this surgery.
00:27:20 So, you can have this documented procedure pulled up in front of you as you're working.
00:27:24 There's no need to instruct the Google Glass to do it for you.
00:27:27 Yeah, that's pretty wild.
00:27:28 I can imagine calibrating some kind of augmented reality thing to the body that is there.
00:27:34 And you could almost see, like, the organs and stuff overlaid.
00:27:37 Yeah, absolutely.
00:27:38 Some really interesting new uses there.
00:27:41 Very cool.
00:27:43 What about things like the Google self-driving cars?
00:27:47 What role do image recognition and computer vision play there versus, say, GPS versus laser versus whatever else?
00:27:53 Do you know?
00:27:54 That remains to be seen.
00:27:56 I guess I am a little bit of a pessimist when it comes to computer vision being used for driving cars.
00:28:02 You maybe know too much about the little problems you run into, right?
00:28:06 Well, that's the thing is I don't believe computer vision itself will ever be adequate, again, by itself, for self-driving cars.
00:28:17 I think you need a lot more sensors.
00:28:20 I think you do need things like radar to help you out with that and help detect objects in front of you.
00:28:27 And that's a problem computer vision does face is when you take a photo of something, it is a 2D representation of a 3D world.
00:28:37 So, it makes it very hard to compute depth based off of that.
00:28:41 And now we have things like the Xbox 360 Kinect and stereo cameras where we can compute depth, which is making things like the self-driving cars more feasible.
00:28:51 But, again, for something like a self-driving car, it doesn't make sense to rely strictly on computer vision.
00:28:57 I think you want to incorporate as many sensors as you possibly can.
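The point about depth can be made concrete with the textbook relation for an idealized, rectified stereo pair: depth = focal length x baseline / disparity. The sketch below is only an illustration of that formula; the function name, parameter names, and sample numbers are assumptions, not anything stated in the conversation.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by two parallel, rectified cameras.

    focal_length_px: focal length expressed in pixels (assumed known from calibration)
    baseline_m:      distance between the two camera centres, in metres
    disparity_px:    horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity;
        # a single 2D image gives no disparity at all, hence no depth.
        raise ValueError("disparity must be positive to recover depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical numbers: 700 px focal length, 12 cm baseline, 42 px disparity.
print(depth_from_disparity(700, 0.12, 42))  # roughly 2 metres
```

Small disparities correspond to large depths, so distant objects are measured less precisely, which is one reason to combine cameras with other sensors.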
00:29:01 Yeah, I think that makes a lot of sense.
00:29:03 And it might make sense to actually have the road have things like RFID style stuff in it where the car can be sure it's between the lanes, at least on the major long part of the drives.
00:29:14 Oh, for sure.
00:29:15 And I think that's kind of the point I will say is just because you can use computer vision to solve something doesn't necessarily mean that you should.
00:29:25 There might be better solutions out there.
00:29:28 Yeah, sure.
00:29:29 You don't want to be the solution in search of a problem.
00:29:33 You want to be the solution to a problem.
00:29:37 Yeah, yeah.
00:29:39 There's a really good show I'd like to recommend to people, by the way, about this whole computers, driving cars, and that.
00:29:44 Nova did a show called The Great Robot Race.
00:29:48 And you can just Google The Great Robot Race or whatever.
00:29:51 And it's all about this DARPA competition that kind of preceded the whole Google self-driving cars and so on.
00:29:58 And it's like a two-hour really sort of technical documentary on, like, the problems these teams technically faced and stuff.
00:30:05 It was very cool.
00:30:06 So if you want to learn more, check that out.
00:30:07 Nice.
00:30:08 Yeah.
00:30:09 It's a really good sort of conceptual idea of what's going on.
00:30:11 But, you know, what do I do?
00:30:13 Like, what kind of code do I write as a Python developer to get started?
00:30:17 I mean, it's tough to talk about code on audio only.
00:30:21 So don't do too much.
00:30:22 But just, like, can you give me a sense of what kind of code I would write to maybe grab an image and ID something in it?
00:30:29 Yeah.
00:30:30 So let's take a fun example.
00:30:33 If you've ever used your smartphone to scan a document, you've basically taken, you've, like, set a piece of paper on your desk, held your phone above it, and snapped a photo.
00:30:44 And then the document's already scanned and stored as an image on your phone.
00:30:49 And you could email it to yourself or text it to someone.
00:30:53 What's really cool is that while those programs and those applications almost seem like magic, they're really, really simple to build.
00:31:02 So if I were to build a mobile document scanner, I would basically say, first up, capture the image.
00:31:08 So I'm going to read it from disk.
00:31:10 Or I'm going to read it from a camera sensor.
00:31:11 Nice.
00:31:12 I'm going to convert it to grayscale because color doesn't really matter.
00:31:18 I'm going to assume there's enough contrast between a document and a desk that I'll be able to detect it.
00:31:23 And I'm probably going to blur it, get rid of any type of high-frequency noise, just allowing me to focus more on the structural objects inside the image and less on the detail.
00:31:32 From there, I'll say, yeah, let's perform edge detection.
00:31:36 Let's find all the edges in the image.
00:31:39 And based on the outlines of the objects in the image, I'm going to take these, and I'm going to look for a rectangle.
00:31:47 And if you consider the geometry of a rectangle, it just has four points, four vertices.
00:31:53 So I'm going to loop over the largest regions in this image, and I'm going to find the largest one that has four vertices.
00:32:01 And I'm going to assume that's my document.
00:32:04 And then once I have the region of the image, I'm going to perform a perspective transform to give me this top-down, bird's-eye view of the document.
00:32:14 And from there, you've basically built yourself a document scanner.
00:32:19 You can apply OCR, optical character recognition to try and convert the text in the image to a string in Python.
00:32:29 Or you could just save the scanned document as a raw image.
00:32:33 That works, too.
00:32:34 It's really actually not a complicated system to build.
00:32:38 How do you deal with the slight imperfections of reality, like a crumpled receipt or document or something like that?
00:32:48 So to handle, as you suggested, like these slight imperfections, I would suggest for the document scanner example to do what's called contour approximation.
00:33:00 So, again, a region of an image can be defined as a set of vertices, and due to the crumpling of the piece of paper, maybe I find a region of an image that has eight vertices or 12 vertices.
00:33:14 It's not a perfect rectangle.
00:33:16 It's kind of jagged in some places.
00:33:18 Well, what I can actually do is I can approximate that contour.
00:33:21 And I could basically do line splitting and try and reduce the number of points to form that contour.
00:33:28 And that's actually a very critical step in building a mobile document scanner because you're not going to find perfect rectangles in the real world just due to noise when capturing the photo.
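The "line splitting" idea behind contour approximation can be illustrated in plain Python with the Ramer-Douglas-Peucker algorithm, which is also the method behind OpenCV's `cv2.approxPolyDP`. The code below is a simplified sketch for intuition only; the function names and the sample contour are hypothetical, and a real project would normally call the library function instead.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker on an open polyline (a list of (x, y) tuples)."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord joining the end points.
    distances = [point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[worst - 1] <= epsilon:
        return [points[0], points[-1]]  # every interior point is close enough: drop them
    # Otherwise split at the farthest point and simplify both halves.
    left = simplify(points[:worst + 1], epsilon)
    right = simplify(points[worst:], epsilon)
    return left[:-1] + right


def approximate_contour(contour, epsilon):
    """Simplify a closed contour by splitting it at its first vertex and the vertex farthest from it."""
    start = contour[0]
    far = max(range(len(contour)),
              key=lambda i: math.hypot(contour[i][0] - start[0], contour[i][1] - start[1]))
    first = simplify(contour[:far + 1], epsilon)
    second = simplify(contour[far:] + [start], epsilon)
    return first[:-1] + second[:-1]


# A slightly crumpled page outline with eight vertices (made-up coordinates).
crumpled = [(0, 0), (50, 2), (100, 0), (101, 60),
            (100, 120), (50, 118), (0, 120), (-1, 60)]
print(approximate_contour(crumpled, epsilon=5))
# [(0, 0), (100, 0), (100, 120), (0, 120)]
```

The tolerance `epsilon` controls how much jaggedness is forgiven: too small and the crumpled page keeps its extra vertices, too large and a genuine corner may be dropped.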
00:33:40 Even perspective, right?
00:33:42 Yeah, even perspective.
00:33:44 That can dramatically distort things.
00:33:45 Are there libraries out there that help you with that kind of stuff, or is that where you need to know some math?
00:33:49 You don't really need to know that much math at all for it.
00:33:52 It's all about gluing the right functions together at the right time.
00:33:56 And I think that's one of the harder points of learning computer vision and why I run the PyImageSearch blog.
00:34:01 I want to show people real-world solutions to problems using computer vision.
00:34:06 And when I do that, it's not really the actual code that matters.
00:34:12 It's learning why I'm applying certain functions at different times in the context of the end goal.
00:34:19 It's about taking these functions and gluing them together in a way that gives you a real solution.