High Performance Data Processing in Python || Donald Whyte





NumPy and Numba are popular Python libraries for processing large quantities of data. This talk explains how NumPy and Numba work under the hood and how they use vectorisation to process large amounts of data extremely quickly. We use these tools to reduce the processing time of a real 600 GB dataset from one month to 40 minutes, even when the code runs on a single MacBook Pro.
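As a rough illustration of the idea (not code from the talk), the sketch below computes a sum of squares three ways: a plain Python loop, a vectorised NumPy call, and the same loop compiled with Numba's `njit` when Numba happens to be installed. The function names and the toy input are assumptions made up for this example.

```python
import numpy as np


def sum_of_squares_loop(values):
    # Plain Python loop: the interpreter handles every element one at a time.
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorised(values):
    # NumPy version: the per-element work runs in compiled code
    # over a contiguous float64 array.
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Numba is optional here (an assumption of this sketch): if it is missing,
# fall back to the uncompiled loop so the example still runs.
try:
    from numba import njit
except ImportError:
    sum_of_squares_jit = sum_of_squares_loop
else:
    # njit compiles the same loop to machine code on its first call.
    sum_of_squares_jit = njit(sum_of_squares_loop)

# Hypothetical toy input; the talk's dataset is far larger.
data = np.arange(1_000, dtype=np.float64)

loop_result = sum_of_squares_loop(data)
vectorised_result = sum_of_squares_vectorised(data)
jit_result = sum_of_squares_jit(data)
```

All three produce the same number; the difference lies in where the loop executes. Timing them on a larger array (for example with `timeit`) is one way to see the gap on a given machine.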

EVENT:

PyData Warsaw 2018

SPEAKER:

Donald Whyte

PERMISSIONS:

PyData provided Coding Tech with the permission to republish this video.

CREDITS:

PyData YouTube channel: https://www.youtube.com/channel/UCOjD18EJYcsBog4IozkF_7w


Comment List

  • Coding Tech
    December 20, 2020

    >high performance
    >python
    pick one, moron

  • Coding Tech
    December 20, 2020

    They go through all this trouble only to use Python to avoid running Python. The result is far slower than what you would have got from a more performant approach like stream cores or the like.

  • Coding Tech
    December 20, 2020

    Not good at all… just go to quantopian

  • Coding Tech
    December 20, 2020

    Use Nim

  • Coding Tech
    December 20, 2020

    No mention of PyPy! It performs JIT compilation at runtime!

  • Coding Tech
    December 20, 2020

    I wonder if, instead of using pure Python, you could leverage a 'big data' ecosystem like Hadoop, MapReduce and Pig, or Spark? This allows you to scale horizontally (something that is normally discouraged but that is very useful in data processing), and you can use Python to tell the MapReduce framework or Spark what to do, but most of the processing itself is actually done in Java (which is not exactly THE fastest-running language ever, but it's pretty good). Maybe it wouldn't be efficient because of the nature of the dataset, which only has 2 columns, but idk, curious to hear other developers' opinions on this!

  • Coding Tech
    December 20, 2020

    Great talk

  • Coding Tech
    December 20, 2020

    Exactly the video I have been looking for.

  • Coding Tech
    December 20, 2020

    >high performance
    >python

    Buahahahaha lol rofl lmao

    Best joke I've heard in a long time. You can't get decent performance in Python, let alone high performance, without using a lot of modules written in C, at which point you may as well write the entire thing in C or, if you need a glue language, use something like Lua or Java or C#.

  • Coding Tech
    December 20, 2020

    Very interesting

  • Coding Tech
    December 20, 2020

    A lot of things he didn’t mention 😓

  • Coding Tech
    December 20, 2020

    oxymoron

  • Coding Tech
    December 20, 2020

    I understand that this is old code, but for fast matrix operations, why not use GPGPU programming instead of the hassle of optimizing Python code?
    Anyway, great presentation, I learned a lot, might come in handy.

  • Coding Tech
    December 20, 2020

    Guy looks like Captain America

  • Coding Tech
    December 20, 2020

    Which one is better, Numba or Cython? What's the best use case for Cython instead of Numba? 🙂

  • Coding Tech
    December 20, 2020

    I thought I would never hear "python" and "high performance" in the same sentence.

    *Waiting for the dislike shower*

  • Coding Tech
    December 20, 2020

    It's like C++/OOP is the Donald Trump of software engineering. Everybody has to throw their hacky jabs at them to let everyone know they're super hip lol.
