Lourdes Agapito on Synthetic Media and How It Will Transform the Film Industry, Online Shopping, and More

LDV Capital invests in people building businesses powered by visual technologies. We thrive on collaborating with deep tech teams leveraging computer vision, machine learning, and artificial intelligence to analyze visual data. We are the only venture capital firm with this thesis. We regularly host Vision events – check out when the next one is scheduled.

Our Women Leading Visual Tech series is here to showcase the leading women whose work in visual tech is reshaping business and society. In this interview, LDV Capital’s Abigail Hunter-Syed had a chance to speak with Dr. Lourdes Agapito, a Professor of 3D Vision in the Department of Computer Science at University College London. Her research has been consistently focused on 3D dynamic scene understanding from video. 

Dr. Agapito was awarded a Marie Curie Postdoctoral Research Fellowship and, throughout her career, has served as Programme Chair for some of the most prominent events in the field. She is an elected member of the Executive Committee of the British Machine Vision Association, and a member of the Vision and Imaging Science group and the Centre for Inverse Problems.

She also co-founded Synthesia, a company that generates localized and personalized videos using AI. It’s one of our LDV portfolio companies. Lourdes spoke to LDV’s Abigail Hunter-Syed. (Note: After five years with LDV Capital, Abby decided to leave LDV to take a corporate role with fewer responsibilities that will allow her to have more time to focus on her young kids during these crazy times.)

The following is a shortened text version of the interview; the unedited video can be found below.

You may have seen her presentation “Capturing vivid 3D models of the world from video” at our sixth annual LDV Vision Summit © Robert Wright

Abby: You've got so many roles that you're balancing as a professor and also a co-founder of your startup. What does your work-life balance look like, especially in these times? 

Lourdes: There are so many things to juggle. Just being a professor means that you're doing more than one job and then on top of that, entrepreneurship with Synthesia. I think the key to all this is to surround yourself with a great team. I try to work very closely with my Ph.D. students, with my postdocs and have a tight-knit group that helps. You can move a lot faster and achieve a lot more if everybody is working together. Likewise, with Synthesia: we have a team where everybody can focus on whatever they do best.

Abby: Divide and conquer. We talk about it all the time.

Lourdes: Having said that, our daily routine is pretty crazy. And especially these days when we are running everything from home. If anything, I have to admit that for some things it's been fantastic. No commuting means that you can spend time on things that matter more, like spending time with your family as well.

Abby: That's the #1 benefit that everybody I know talks about! I'm glad that it's going well, that you've got such a great relationship with your team, and that you make all these different pieces work together so symbiotically. I know you've got kids at home, so I wonder how you described what you do at work to your kids when they were two years old.

Lourdes: With a two-year-old, I would play a game. I would tell them something like, "Go into that room, find your favorite teddy bear and bring it back!" They would probably run and find it very quickly and then bring it back. Then I would say, "Okay, I'm going to blindfold you and now find this other thing and bring it back." Obviously, they would very quickly realize that vision is so important to carry out many of the tasks that we do. 

I'd give them a picture of a few people and say, "Find your mom in that picture! Is she happy or is she a bit sleepy?" I'd try to get them to reason about the information that's in images and to see that it's something that comes so naturally to us. Then you can explain how you can make a computer do the same thing: you put in a camera and get images, but you still have to make sense of those, and these are the kinds of things that we do at work.

If they're a little bit older, then it's easier. I remember going to my son's class when he was about nine or ten to talk to kids about visual effects for the film industry and how you could synthesize all these images. They can connect with that and also with video games. I think we're very fortunate that it's a field that everybody can relate to.

Lourdes started her keynote with a story about the most memorable gift she has ever received © Robert Wright

Abby: Do you remember when you first became fascinated with computer vision?

Lourdes: After I'd finished my Master's in computer science, I went to a summer school and there was a presentation from a research group that had a robot that was moving around the environment, avoiding obstacles based on the information that it was gathering from an ultrasound sensor. I was so fascinated by this and robotics in general that I then started my Ph.D. with that research group. In my thesis, I researched the idea of having two cameras that were looking at the world and trying to infer information about the world to guide a robot.

Abby: When was that? Because now computer vision has practically taken over the robotics industry and is seen as its future, right?

Lourdes: I started my Ph.D. in 1991. What we could do in those days was pretty limited. It's quite interesting that we have moved quite a lot in this field, but it's such a complex perception problem that we're trying to solve, that in some ways we're only scratching the surface just now.

Abby: Fascinating! Your work is focused on the 3D reconstruction process from videos and images. What are some of the applications for this technology?

Lourdes: Many tasks can be solved with 2D information alone, but I’m more interested in the tasks in which a computer or a robot needs to interact with the world. Imagine that you want to build a robot that is going to take all the clean dishes out of the dishwasher and put everything away. You need 3D information for that. And cognitively, those are very complex tasks, because they require not just recognition – being able to identify the objects – but also estimating their shape, figuring out how to grasp each object, and carrying out the actual interaction.

In the future, we'll probably shop online for clothes a lot more, and we will be able to try things on to the extent where we can synthesize an image of ourselves wearing those clothes and see what we look like from one side or the other. We still can't do that! For that, we need to be able to synthesize those images, and we need to be able to model clothes as well as the whole human body. I’m fascinated by the fact that we can go from images to the third dimension. The question is: how can we recover that third dimension that was lost when the world was projected into images?
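To make that "lost dimension" concrete, here is a minimal Python sketch – our illustration, not something from the interview – showing that under a pinhole camera model every 3D point along the same viewing ray lands on the same pixel. That is exactly the ambiguity 3D reconstruction has to resolve:

```python
# A minimal sketch of why depth is lost under pinhole projection:
# every 3D point along the same camera ray maps to the same pixel,
# so a single image cannot tell them apart.
import numpy as np

def project(point_3d, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) to image coordinates (u, v)."""
    X, Y, Z = point_3d
    return np.array([focal_length * X / Z, focal_length * Y / Z])

near = np.array([0.5, 0.2, 2.0])   # a point 2 m from the camera
far = near * 3                     # a point 6 m away, on the same ray

print(project(near))  # [0.25 0.1 ]
print(project(far))   # [0.25 0.1 ]  -- identical pixel: depth is ambiguous
```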

Abby: My husband would agree on the complexity of unloading the dishwasher! Many experts have said that AI is going to come alive when we're able to put vision and perception and everything else into robots, because then they can start to collect the same kinds of information that we do and develop real intelligence around it.

Lourdes: At the moment we're good at solving very small sub-problems and very specific problems that have an input and an output. If we want to be able to generalize more, then the perception, and in particular 3D perception, is going to be crucial.

Abby: I'm still a little way off from being able to buy a robot that can do my dishes for me. I'm excited for that day!

Lourdes: We have advanced surprisingly quickly in certain things. Maybe even just three or four years ago, there were things that we never imagined we'd be able to do today. Hopefully, we are moving exponentially.

Abby: Can you tell me a little bit more about one of your recent publications, FroDO: From Detections to 3D Objects?

Lourdes: This was a collaboration between some of my Ph.D. students and myself, Facebook Reality Labs, and the University of Adelaide. We take an image sequence of a room containing objects of different categories – tables, chairs, etc. – then recognize and localize those objects in the images and build a 3D map where we reconstruct them. The way we do it is unique because we've taught our machine learning algorithm how the shape of objects – for instance, chairs – varies across the category. As a result, it has a very nice representation of how shape changes within a specific category of objects. We then use that representation to work out which chair shape corresponds to the one seen in these particular images.

You feed the algorithm some images and what you get at the end is a map of the room where you have the chairs and the tables in a particular place with their correct orientation and their correct shape.

It goes a bit beyond what more traditional reconstruction algorithms do. Here, we reconstruct individual objects completely independently, a bit like CAD models, which gives a much more realistic rendering of the room that those pictures represent. Designers can manipulate these objects – rotate them around, then render new images and say, "Well, if we moved your furniture around, this is what it would look like."

We're trying to go from images to 3D representations that are easy for a robot to understand, and easy to edit for someone who's going to be synthesizing new images – a content creator, for instance.
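For readers who want a feel for the pipeline Lourdes describes, here is a heavily simplified Python sketch. The toy linear shape space and least-squares fit are illustrative stand-ins we chose, not FroDO's actual learned embedding or optimizer; the point is the flow from per-object 3D evidence to a latent shape code for the category:

```python
# A heavily simplified, hypothetical sketch of a FroDO-style idea:
# detections across frames -> per-object latent shape code -> object map.
# The linear "decoder" below is a toy stand-in for the learned
# category shape space used in the actual paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy shape space for one category (e.g. "chair"):
# shape = mean_shape + basis @ latent_code  (here: 12 surface points in 3D).
N_POINTS, CODE_DIM = 12, 4
mean_shape = rng.normal(size=(N_POINTS, 3))
basis = rng.normal(scale=0.1, size=(N_POINTS, 3, CODE_DIM))

def decode(code):
    """Map a latent shape code to a 3D point set for this category."""
    return mean_shape + basis @ code

def fit_shape_code(observed_points):
    """Least-squares fit of the latent code to observed 3D evidence,
    playing the role of the per-object shape optimization."""
    A = basis.reshape(-1, CODE_DIM)           # (N_POINTS*3, CODE_DIM)
    b = (observed_points - mean_shape).ravel()
    code, *_ = np.linalg.lstsq(A, b, rcond=None)
    return code

# Pretend a detector plus multi-view triangulation gave us noisy 3D
# evidence for one chair instance observed across the image sequence.
true_code = rng.normal(size=CODE_DIM)
observations = decode(true_code) + rng.normal(scale=0.01, size=(N_POINTS, 3))

recovered = fit_shape_code(observations)
print("code error:", np.linalg.norm(recovered - true_code))  # small
```

In the real system the evidence comes from detections across the sequence and the decoder is a learned network, but the structure – fit one shape code per detected object, then place the decoded shapes in the room map – is roughly the same.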

Lourdes de Agapito was one of the judges of our Entrepreneurial Computer Vision Challenge [ECVC] at LDV Vision Summit in 2019 © Robert Wright

Abby: Very cool! Tell us a little bit more about Synthesia. When was it founded? How did you guys get the company off the ground?

Lourdes: We met with Synthesia’s co-founders in 2017 through the Digital Catapult, a government organization that wanted to build a 4D capture studio in London. We were involved in that process, and then we started talking about our shared interest in creating content and in helping content creators synthesize video. Victor Riparbelli, the CEO of the company, and Steffen Tjerrild, the COO and CFO, saw the technology and knew how it could be turned into a business. I came to this from much more of a scientific side, as did Matthias Niessner, the company’s fourth co-founder.

It's good to be able to have people in the founding team who really know how to build a company and a product. They also know how to go out to investors. That's been one of the keys – the fact that we had good science, good technology, and a fantastic business and entrepreneurship team that is passionate and knowledgeable about how to go about building a business around these products.

Our CTO, Jon Starck, also joined the team right from the beginning, and he'd been working in the visual effects industry for many years. We just had the perfect team!

Abby: We think about this a lot because we specialize in looking at deep technical founders. The marrying of the business and commercial side of things with science and technology is critical to success in so many ways. It's nice that you were able to find that right off the bat in a way that is great for everyone involved!

Lourdes: In the end, everyone is driven by the same passion, no matter whether you're a scientist or you're a business person. Our passion is to create tools for content creators to enable them to change their videos and to create something much more easily.

Abby: One of the things that I really like is a new product that you guys are launching, where you can just go in and type in a question or a statement, and it can automatically produce a video for you of somebody saying exactly your script. Are these AI presenters based on real people or are their faces AI-generated?

Lourdes: They are real people. We capture a 3D model of those people, and we can then synthesize a new video where the person is saying whatever you type in. We're also working on body language; we'd like to be able to edit that as well.

Abby: Wow, very interesting! You really could think of it as a stock video type of platform where somebody can sign in as an individual content creator and generate a video that is exactly what they're looking for without ever having to film a single thing.

Lourdes: Exactly! We currently have a certain number of available avatars, but obviously, the idea is to build more of these. We could also build custom avatars as a premium offering.
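As a rough mental model of how such a text-to-video service could be structured – our hypothetical sketch, not Synthesia's actual architecture or API – think of three stages: synthesize speech from the typed script, predict face motion from the audio, and render the chosen avatar's captured 3D model with that motion:

```python
# A hypothetical, heavily simplified sketch of a text-to-video avatar
# pipeline. Every function here is an illustrative stub, not Synthesia's
# actual system; real implementations would use learned models throughout.
from dataclasses import dataclass

@dataclass
class Avatar:
    name: str
    face_model_path: str  # e.g. a captured 3D model of a real, consenting person

def synthesize_speech(script: str, language: str) -> bytes:
    """Stage 1: text-to-speech – turn the typed script into audio."""
    return f"<audio {language}: {script}>".encode()  # stub

def predict_face_motion(audio: bytes) -> list:
    """Stage 2: map the audio to per-frame lip/face motion parameters."""
    return [float(b % 10) for b in audio]  # stub: one parameter per "frame"

def render_video(avatar: Avatar, motion: list, audio: bytes) -> str:
    """Stage 3: drive the avatar's 3D model with the motion, mux in audio."""
    return f"{avatar.name}_{len(motion)}frames.mp4"  # stub: would write frames

def generate(avatar: Avatar, script: str, language: str = "en") -> str:
    audio = synthesize_speech(script, language)
    motion = predict_face_motion(audio)
    return render_video(avatar, motion, audio)

print(generate(Avatar("anna", "anna_face.obj"), "Welcome to our product tour!"))
```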

Synthesia worked with Reuters to create the world's first automated, presenter-led video reports. Using sports match data, the system can generate real-time news bulletins on the latest results.

Abby: Is it the direction that you see the future of this industry going?

Lourdes: There are many directions in which this technology can go. Think about AI assistants like Alexa or Siri. At the moment, they're just based on voice but it would be nice to be able to have an image of a person, a video with some emotions, and even body language.

Now let’s think about the movie industry. Imagine you have much higher levels of control over the way you generate your video. You could just specify where you want the body to move, what motion should happen, or which emotion should be expressed in each situation.

Abby: From my limited understanding of the video industry and videography, a lot of the time there are multiple takes of every single scene until they can get it pieced together just right, the way they want it. If you could instead do it in one take and then alter it in all the ways you wanted, I can imagine that would cut down significantly on production time and cost.

Lourdes: And for that to be as automated as possible, right? A new video synthesized from just text is a breakthrough. Companies are now able to create their videos much more easily. This is something we've now experienced, for instance, with the current pandemic and the lockdowns in different countries. No one can shoot video at the moment; everything is closed. So many companies have come to us and said, "We do want to create some new videos. Can we repurpose footage that we had from before? Can we reuse it and create something new from the existing stock video?"

Abby: I'm sure they love the ability to monetize what used to be just outtakes, or footage that wasn't quite perfect. I think we would be remiss if we didn't mention that a lot of people are afraid of misuse of this kind of technology – deepfakes and whatnot. What is your response to that? What is Synthesia doing to show that, while misuse is possible, there are good ways to go about generating video that is helpful for people?

Lourdes: We're doing a lot. We have a very strong ethics code. One of the first things we decided as a company was that we would always need consent from everyone involved in the process.

We also have consent from the people that we have recorded for our video translations. That's one of the other aspects of our company. If you shoot a campaign in one language and you want to then translate into multiple other languages, this is something that we do as well. And for that, we need to take a video of someone saying the same words in another language, right? So obviously we also have consent from those people because they're also lending their voice for the reenactment.

We are working towards creating awareness around this. We want to be able to have a unique kind of code that tells us about the provenance of that video.
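One simple way to picture such a provenance code – our illustrative mechanism, since Lourdes doesn't specify one – is a cryptographic fingerprint registered when a video is created, so that a circulating file can later be checked against the registered original:

```python
# An illustrative sketch of a provenance "code" for generated video:
# fingerprint the file at creation time, then verify circulating copies.
# This is our example mechanism, not a description of Synthesia's system.
import hashlib
from typing import Optional

registry = {}  # creation-time records: fingerprint -> provenance metadata

def register(video_bytes: bytes, creator: str, consent_id: str) -> str:
    """Record a video's fingerprint together with its provenance metadata."""
    fingerprint = hashlib.sha256(video_bytes).hexdigest()
    registry[fingerprint] = {"creator": creator, "consent": consent_id}
    return fingerprint

def verify(video_bytes: bytes) -> Optional[dict]:
    """Look up a circulating file; None means no registered provenance."""
    return registry.get(hashlib.sha256(video_bytes).hexdigest())

register(b"...rendered video bytes...", creator="studio-a", consent_id="C-042")
print(verify(b"...rendered video bytes..."))  # provenance record found
print(verify(b"tampered bytes"))              # None: unknown provenance
```

Real-world provenance efforts for images and video add cryptographic signatures and tamper-evident metadata rather than a bare lookup table, but the idea of verifying a file against a record made at creation time is the same.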

Abby: That sounds interesting! You're probably going to garner a lot of support for that across the industry, from all the people like yourself who have been working so hard on this for all of the right reasons.

Lourdes: This process has already happened for images, right? It's important to know that there's already a lot of knowledge in the industry about how to go about this – how to protect information so that we know its provenance.

Abby: What kind of advice do you have for other professors who are looking to commercialize their academic work?

Lourdes: Surround yourself with a team that can touch on all the different types of expertise that are needed to build a company.

Abby: That's great advice! And now the last question: what would your career choice be if computers didn't exist?

Lourdes: I've always said that if I hadn't done computer vision, I would have been a dancer. I think it's a wonderful way of expressing what you feel about music and space. I just really love it.

Abby: That's fantastic! My toddler would agree with you. She hasn't changed out of her ballerina outfit in three days. This has been a wonderful conversation. I appreciate you taking the time to talk with us about your research, about your journey through computer vision and what it's like now to be the co-founder of such a fantastic growing company in the synthetic media space.


A video version of this interview is available below:

Stay tuned and get to know other ladies whose work with visual technologies moves the needle!