## Google Data Scientist Algorithmic Coding Interview

[ad_1]

Today I’m joined by Sergey who is a data scientist consultant and former programming competition winner.

👉 Subscribe to my channel: https://bit.ly/2xYkyUM

📧 Sign up for free weekly interview questions: https://www.interviewquery.com

🔑 Get professional help for your next interview: https://www.interviewquery.com/coaching

❓ Try a data science algorithms interview question: https://www.interviewquery.com/questions/equivalent-index

Sergey has worked at a variety of places such as a quant on wall street to a startup that got acquired by Grubhub. He joins me today in solving a very tough algorithms problem asked by Google. If you’re looking to learn programming from him, check out Interview Query coaching!

More from Jay:

Follow me on Twitter: https://twitter.com/datasciencejay

Follow me on LinkedIn: https://www.linkedin.com/in/jay-feng-ab66b049/

Read More:

The Google data scientist interview guide: https://www.interviewquery.com/blog-the-google-data-scientist-interview/

Example Google data science interview questions: https://www.interviewquery.com/blog-google-data-science-interview-questions-and-solutions/

Source

[ad_2]

Wow, Sergey is an intense thinker and this video is quite a pleasure to watch an expert problem solver in action!

The smaller problem would be for just one dimension. Assume 9 points at index 0 and one point at index 50. The centre of mass is at 5, but the real answer is to choose final location at index zero. Because every index move away from that, the 9 people have to move away and one person from the index 50 has to travel less. After playing with few more examples, I realized we need to think of geometric median. And ANY point between the two middle ones would be an optimal place. As example, consider below:

1,2,3,5,19,20

The solution above would be any point between 3 and 5, inclusive. If we choose 3, total distance is: 2+1+0+2+16+17 which would be same as if we chose 5 as host: 4+3+2+0+14+15. Hence the median. Any point between 3 and 5 would also suffice.

Coming to two dimensions, or more – I couldn't think of a rigorous proof if this would work – but had an idea of doing PCA for the three dimensions, to reduce to one dimension and then do this geometric median. Kindly let me know any thoughts on this approach.

When you asked him to make it faster were you actually looking for a numerical method for speed or to make the code run faster algorithmically. The simplest speed up I can see is to store already computed distances for the pairs so you dont have to recompute. Might get a tiny bit tricky to do this in the best way possible but that seems like a solid way forward to me

this code gets it done in 2n:

friends = [{'name':'bob', 'location':(5,2,10)},

{'name':'david', 'location':(2,3,5)},

{'name':'mary', 'location':(19,3,4)},

{'name':'skyler', 'location':(3,5,1)}]

means = [0,0,0]

for host in friends:

means[0] += host.get('location')[0]/len(friends)

means[1] += host.get('location')[1]/len(friends)

means[2] += host.get('location')[2]/len(friends)

best_host_index = None

best_host_sum = 9999999

for idx,host in enumerate(friends):

temp_sum = abs(host.get('location')[0]-means[0]) + abs(host.get('location')[1]-means[1]) + abs(host.get('location')[2]-means[2])

if temp_sum < best_host_sum:

best_host_sum = temp_sum

best_host_index = idx

print(friends[best_host_index])

I'm so happy that expert programmers also need a million goes to get something to run

If you have a million points, you can randomly sample 10000 to approximate a solution. If you do it a couple of times, you may get an idea of how stable solutions are (=how close they are to each other). That validates your answer.

Incredible! Very helpful to see the mindset and walkthrough – would be awesome to see more of these with a "feedback" of solution as well

Shouldn't this output 0 in cases where we are computing costs with the same point? and in such case 0 would be the best cost? We have not taken into consideration to not calculate cost for points with itself.

Can we not find the center of the group of friends by taking the average for x, y and z coordinates, and then find the closest friend to that center? One linear scan for computing average coordinates, one more linear scan to compare all points to average, so O(n) time. O(1) space also.

What resources would you suggest to prepare for coding interview with python?

Good vids. About to check out some more of your videos. By the way, go and search for smzeus . c o m!!! It will help you promote your videos.

what about a solution using the centroid?

Great video

This is great, thanks Jay!