Erin Shellman, Data Scientist at Nordstrom Innovation Lab Interview


Nordstrom Innovation Lab


We recently caught up with Erin Shellman from Nordstrom Innovation Lab to hear more about her and the Lab. Specifically, we wanted to get a better picture of what it means to be a data scientist within a company as well as the ups and downs that come along with the journey of getting there. Lastly, we wanted to hear more about her upcoming presentation (with David Von Lehman) at the Strata NY Conference titled "How Nordstrom utilizes humans as learning machines to blend brick-and-mortar with online commerce".

Nordstrom Innovation Lab is a team of techies, designers, entrepreneurs, statisticians, researchers, and artists, all trying to discover the future of retail. Nordstrom, which started in 1901 in Seattle, is an American upscale fashion retailer with 247 stores located in 33 U.S. states as well as an online store. Nordstrom's philosophy, started by the founder and carried through four generations of family oversight, is "offer the customer the best possible service, selection, quality and value." To that end, Nordstrom Labs is an internal technology lab focused on innovating on the technology, operations, products, business models and even management of Nordstrom. In addition to D3.js (D3 / data viz standout Jim Vallandingham recently joined the company), technologies such as node.js, Objective-C, AngularJS, R, Go, Hadoop, Mahout and others are being used within the labs.

The format below is based on question / answer.

Hi Erin, thank you for the interview. Let's start with your background.

Q - What is your 30 second bio?
A - I am a data scientist in the Nordstrom Data Lab and a specialist in statistical computing. I am interested in the development of scalable machine learning methods with applications in recommendation systems, market segmentation and customer engagement. I have a master of science degree in biostatistics and a PhD in Bioinformatics, both from the University of Michigan in Ann Arbor. Big data problems are my passion.

Q - When did you realize you wanted to work with data as a career?
A - I think I always sort of knew. I went into econ as an undergrad because I wanted to study human behavior and the only way of describing it formally was through math. I’ve always been kind of obsessed with communication and I think I’ve always been drawn to mathy things because of the clarity of communication. If I say that a behavior can be described as a process with exponential decay, that can be understood in any language. The language of math is universal!

Q - How did you get involved with / interested in machine learning?
A - I did my undergraduate work at Case Western Reserve University in Cleveland, Ohio where I studied economics and evolutionary biology and minored in math. While I was there I did an internship at the National Institutes of Health in the Division of Computational Biosciences and was first introduced to machine learning through my advisor, Jim Malley. He’s a huge open-source and machine learning advocate so that’s when I first started programming in R and using machine learning methods. I loved my work at the NIH so much that I went on to get a master of science degree in biostatistics from the University of Michigan, and then a PhD in bioinformatics also from the University of Michigan. I’m a scientist at heart.

Q - Who took a chance on you?
A - Lots of people have taken chances on me, and I’m grateful to all of them. Most recently the director of the Nordstrom Data Lab, Jason Gowans (@jasongowans) took a chance on me when I was fresh out of grad school, had no retail experience and would become the first (and for a little while the only) person on his team. It’s a big vote of confidence, and a big motivator to do great work.

Erin, very compelling background. Thank you for sharing. Next, let's talk about your work.

Q - Why work at Nordstrom Data Labs?
A - Well, we’re the coolest people I know and we work on some of the coolest problems in the business.

Q - In your own words, what is the goal of Nordstrom Lab?
A - The goal of the Nordstrom Data Lab is to deliver data-driven products to inform business decisions internally, and to enhance customer experience externally.

Q - How would you describe your work to someone who is not familiar with it?
A - I recommend things! I would roughly describe my work as constructing and delivering data-driven products to the web for the purpose of making the customer experience online as enjoyable as it is in the store. Nordstrom has a reputation for best in class customer service in our stores, and as more people shop at nordstrom.com we’re trying to extend that legacy to serve those shoppers as well. Whether that’s by serving up recommendations to make products easier to find, or creating engaging new ways to interact with the website, it’s ultimately about creating a great shopping experience.

Q - Who does your work appeal to and why?
A - Well, I think my work is appealing to anyone who shops online and thinks there’s room for improvement in that experience.

Q - What does a typical day at work look like?
A - I come in, sit down and type symbols into vim all day. I also talk with tons of bright, motivated people all over the company and work on a wide range of data-driven projects. I build recommenders, but I’m a statistician so I also do data analysis. For example, I analyzed the data from the Nordstrom Pinterest Experiment we ran a few weeks ago. The experiment involved tagging products in the physical Nordstrom Store with the Pinterest logo as a kind of social proof for potential buyers.

Q - What tools do you use at work?
A - vim, tmux, Python, R, PostgreSQL, Heroku, Hadoop, Mahout

Q - What are your favorite tools to work with?
A - Historically my favorite tool has been R. I’ve been an R fanatic since I started using it in 2005, and it’s still my go-to for a lot of things. As I’ve started working on bigger problems however, I’ve been butting into R’s computational limitations so I’m adding new languages and technologies to my tool belt.

Q - What other mediums/tools are you working in?
A - I’m really excited to start working more with Mahout and am planning on using it for my next recommender.

Q - Whose work / tools do you admire?
A - I can’t say enough about the work of Hadley Wickham (@hadleywickham) and Mike Bostock (@mbostock). I use ‘ggplot2,’ ‘plyr’ and ‘lubridate’ every day. I’m still learning d3.js so I’ve mostly made a few ugly figures at this point, but we’re all in love with it and its power in data storytelling and engagement can’t be overstated.

Q - What is your process at Nordstrom Lab?
A - Our process in the lab is to get as quickly to a working prototype as possible. That helps us incorporate outside feedback quickly. We throw quick front-ends on all our recommenders so that people internally can see and interact with them, and that helps us get our work on the web faster, and results in a better final product.

Q - How long does it take to create a project?
A - We’re full of ideas so thinking up projects to work on takes as much time as it takes to drink a pint, or just short of two. Executing projects tends to vary a bit. The first project I worked on after joining the lab was a relatively simple recommendation engine and we had a working prototype of that built in about a week or so. Currently I’m on a more complicated, fully personalized recommender engine and it’s taking several weeks of mostly concentrated time.

Q - Where do you get your ideas for things to study / analyze at work?
A - We generate ideas internally through collaborations with people all over the company, from personalization to user experience. We also generate ideas at our weekly ‘retros,’ weekly retrospectives held over a beer. Retros are a tradition of the Nordstrom Innovation Lab that we quickly adopted.

Q - Before we get deeper into who you are as a Data Scientist and what drives you, what else should we know about Nordstrom Data Labs?
A - Well, one thing you should know about the NDL (oh snap, I really like the look of that), is that we’re hiring. If you’re part math/stats nerd, part programmer, part data storyteller, hit us up. Also, we’re all really cool people who work on really cool projects. You can reach out to us through Twitter or our website at nordstrominnovationlab.com. As we mentioned above, Jim Vallandingham (@vlandham) recently joined our group - so it is a very exciting place to be right now.

Nordstrom Data Labs (NDL!) sounds like a great place to work. Readers: Definitely reach out to them. Erin, next, let's talk about you a little more.

Q - Who or what is your greatest inspiration?
A - I’m inspired every day by the communities working to promote women and girls in STEM fields. In particular Black Girls Code, Lady Coders, Girl Develop It, and the Association for Women in Mathematics, where I mentor young women and girls interested in pursuing degrees and careers in mathematical fields. I understand first-hand the challenges of staying motivated in a program or career that is intellectually demanding when mentors who understand your unique perspective are difficult to find. I’ve been in classes where I was the only woman, but that’s changing every day because women working in the industry are educating girls and women about the massive benefits available to them in highly technical industries.

Q - What is your personal process?
A - I like doing things the old-timey way, so I write ideas down a lot...like with a pen. I like to diagrammatically write out how my product would look from start to finish and that helps me organize my product into discrete, completable units. Of course that map gets redrawn a lot as the project evolves, so I go through a lot of paper.

Q - What do you consider to be the most important aspects of your work?
A - Attention to detail and curiosity tempered with the ability to formulate research questions.

Q - What do you see as weaknesses in your work?
A - I think the biggest weakness in my work, and probably in my line of work generally, is how little time is spent really trying to understand the problems. As an industry, we’re always moving to test this against that, and measuring lift and hitting various metrics, but very little time is spent in research trying to understand the mechanisms that drive differences in lift or whatever metric. In a world where you’re always trying to release the latest and greatest, you don’t spend a lot of time reflecting on why the stuff is working (or not).

Q - What in your career are you most proud of so far?
A - Besides finishing my PhD, I’m most proud of the first recommender David, Paul and I built because it’s the first thing I’ve built for the web that so many people have used. It’s an amazing feeling to create something that people actually see and use.

Q - Do you have any regrets?
A - None. I think it’s an exciting time to be in the industry and there is a seemingly infinite supply of new technology to learn about and to develop. The best part about this field is that everything is changing all the time and there are tons of important problems yet to be solved, so there’s lots of potential to make a big contribution.

Q - How about - what mistakes have you made?
A - How much space do you have? I’m primarily a statistician, so all the “big data” stuff is pretty new to me. I’m learning a lot of stuff, and making tons of mistakes along the way.

Q - What conditions do you need in order to work to your full capacity?
A - I need to be around great people. I like to be always learning new things, and it’s most fun to learn from others.

Q - What distinguishes your work from that of your contemporaries?
A - Hmm, well I think perhaps I have a more developed aesthetic than many of my contemporaries. I spend a lot of time with ‘ggplot’ making axes and labeling perfect and making sure that anyone could get the entire story from each figure I make, even without a caption. I don’t always accomplish that, but it’s what I strive for. I write up all my documents in LaTeX and TikZ because it’s just too pretty and I’m of the opinion that your point will come across much more effectively if it’s easy on the eyes.

Q - How would you describe your style as Data Visualizer/Scientist?
A - I prefer a classic look, with neutral colors and clean lines... Data-wise, I’m a statistical pragmatist and an open-source purist. In general, I care about delivering high-quality results and not about the various philosophical arguments between statisticians, and I think embracing open-source technology is the best way to move fast.

Q - How important is Data to you in your personal life?
A - We’re currently running a little experiment in the lab to see what we can learn about ourselves through new technology like Jawbone UP, so I’ve been collecting loads of data on myself.

Very thought-provoking. Really appreciate your honesty. Next, let's talk about Data Science and your thoughts on it.

Q - What work is currently inspiring you?
A - There’s tons of amazing content out there being generated in d3.js and others. Obviously, I love Mike Bostock’s blocks page.

Q - What pisses you off most in the data science world?
A - The never-ending flow of idiotic data articles written by popular news outlets.

Q - What is one problem you think the world of data science needs to fix?
A - Well, I don’t think data science can fix anything really. The primary power of data is illumination of things that are already happening, so by its very nature it is reactive...unless you’re talking about forecasting and most of that is rubbish. That said, there are lots of areas that could use an injection of data.

Q - What do you look for in other people's work?
A - I guess I harp on this a lot, but I’m always looking for the story. What’s the point if you can’t tell a compelling story or engage me to think about or interact with the data? I’m also always looking at presentation. I instantly dismiss work where axes aren’t labeled or legends don’t appear where they should. It’s about craftsmanship as much as anything and why would I believe that you did a careful, valid analysis if you can’t even be bothered to correctly label your results?

Q - How can someone do work like you?
A - There’s a ton of material on the web today about data science, machine learning, Hadoop, whatever data buzzword you want, but I think the most valuable asset to have in this field is deep expertise in mathematics and statistics. I might get in trouble saying this, but I think it’s easier to pick up the programming than it is to learn the math. It can take a long time to develop an intuition for mathematical problem solving and I think that people with those skills are relatively few compared with those who are solid programmers. I’d say if you’re in college and thinking about what to study, I’d focus hard on your stats and math curriculum because those skills are valuable in every industry and highly transferable across industries.

Q - What does it take to do great data science?
A - First it takes a curious mind. You have to care about answering questions and telling stories. Second you have to temper your curiosity with what I loosely call “mathematical thinking.” I don’t mean anything formal by that, really I should say “reasoned thinking.” The ability to prioritize your research questions and see your way to a solution through the constituent parts is the most valuable skill. I’m constantly asking myself “what question am I answering by doing this?” and that mindset is critical when you have essentially infinite questions your data could attempt to answer.

Q - Do you have any words of wisdom for data science students or practitioners starting out?
A - Read a lot. Go to meet-ups. Go to seminars and talks when you can. Work with public data; there’s more than ever, so go build something!

Q - What blogs do you think are hidden gems?
A - http://flowingdata.com/
http://www.datawrangling.com/
http://rfunction.com/
http://learnr.wordpress.com/
http://bost.ocks.org/mike/

Very insightful. Thank you. Last but not least, let's discuss your upcoming presentation at Strata Conf NY.

Q - What is the title of your talk?
A - "How Nordstrom utilizes humans as learning machines to blend brick-and-mortar with online commerce"

Q - Who is the talk aimed at?
A - This talk is for people with large and varied data who are interested in novel applications of data analytics, machine learning, and visualization to influence stakeholders and put the power of data to work for their businesses. We’ll walk through three case studies where we have delivered insights and/or products that blend data and experiences from physical and online commerce: a recommender system powered by the collective fashion expertise of our personal stylists, a social media sentiment and activity analyzer, and a clothing color trend visualizer.

Q - What technologies will be covered?
A - Some of the technologies that will be discussed include: R, Python, Ruby, D3, Node.js and Hadoop.

Q - Why are you and your colleague David Von Lehman excited to give this talk?
A - Yeah! I’m super pumped for this! I think Nordstrom has a really interesting story to tell about how to transform a traditional retail model into a data-driven, e-commerce-loving retail model. We’re going to share a few case studies illustrating how we use behavior from our stores to construct the customer experience on the web, and on the flip side, how we’re using data from the web to improve the customer experience in our stores.

Q - Where can we find out more?
A - The Strata Conference NY Website => Nordstrom Innovation Labs Strata Talk.

Thanks - excited for your talk!

.............

Erin – Thank you so much for your time! Really enjoyed speaking with you, learning more about Nordstrom Innovation Labs, understanding more about how you view Data Science as well as your upcoming presentation at the Strata Conference NY.

Nordstrom Innovation Labs can be found online at http://nordstrominnovationlab.com/ and Erin Shellman can be found online at @erinshellman.


© 2012-2013 DashingD3js.com. All rights reserved.