Python NumPy: Importing flat files





Learn how to import flat files using NumPy: https://www.datacamp.com/courses/importing-data-in-python-part-1

Okay, so you now know how to use Python’s built-in open function to open text files. What if you now want to import a flat file and assign it to a variable? If all the data are numerical, you can use the package NumPy to import the data as a NumPy array. Why would we want to do this? First off, NumPy arrays are the Python standard for storing numerical data. They are efficient, fast and clean. Secondly, NumPy arrays are often essential for other packages, such as scikit-learn, a popular machine learning package for Python.

NumPy itself has a number of built-in functions that make it far easier and more efficient for us to import data as arrays. Enter the NumPy functions loadtxt and genfromtxt. To use either of these we first need to import NumPy. We then call loadtxt and pass it the filename as the first argument, along with the delimiter via the delimiter keyword argument. Note that the default delimiter is any whitespace, so for comma-separated or similar files we need to specify it explicitly.
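As a minimal sketch of the call described above: the example below reads a small comma-separated table with loadtxt. The data is made up for illustration, and an in-memory io.StringIO object stands in for a file on disk so the snippet runs on its own; with a real file you would pass its filename instead.

```python
import io

import numpy as np

# Made-up, comma-separated numeric data standing in for a flat file on disk.
csv_text = "1.0,2.0,4.0\n3.0,5.0,7.0\n"

# loadtxt takes a filename or file-like object first; the delimiter is
# passed by keyword because the default is to split on whitespace.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(type(data))
print(data.shape)
print(data)
```

Leaving out delimiter="," here would make loadtxt treat each whole line as a single value and fail with a "could not convert string to float" error.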

There are a number of additional arguments you may wish to specify. If, for example, your data consists of numerics and your header has strings in it, such as in the MNIST digits data, you will want to skip the first row by calling loadtxt with the argument skiprows=1; if you want only the 1st and 3rd columns of the data, you’ll want to set usecols=[0, 2], since column indices start at 0. You can also import different datatypes into NumPy arrays: for example, setting the argument dtype=str will ensure that all entries are imported as strings. loadtxt is great for basic cases, but tends to break down when we have mixed datatypes, for example, columns consisting of floats AND columns consisting of strings, such as we saw in the Titanic dataset.
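The arguments just mentioned can be sketched on a tiny table with a string header. The column names and values below are invented for illustration (they are not taken from the MNIST file), and io.StringIO again stands in for a real filename.

```python
import io

import numpy as np

# Invented data: one header row of strings, then numeric rows.
text = "label,pixel0,pixel1\n5,0,255\n3,12,0\n"

# skiprows=1 skips the header so every remaining entry parses as a float.
digits = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

# usecols=[0, 2] keeps only the 1st and 3rd columns (indices start at 0).
first_and_third = np.loadtxt(
    io.StringIO(text), delimiter=",", skiprows=1, usecols=[0, 2]
)

# dtype=str imports every entry as a string, so the header can stay in.
as_strings = np.loadtxt(io.StringIO(text), delimiter=",", dtype=str)

print(digits)
print(first_and_third)
print(as_strings)
```

Without skiprows=1 and with the default float dtype, the header row is what triggers errors of the form "could not convert string to float".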

Now it’s your turn to have fun with loadtxt. You’ll also gain hands-on experience with other functions that can handle mixed datatypes. In the next video we’ll see that, although NumPy arrays can handle data of mixed types, the natural place for such data really is the dataframe.
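One of the functions that can handle mixed datatypes is genfromtxt, named earlier. The sketch below is only an illustration under assumptions: the rows are invented (not taken from the Titanic dataset), and the choice of names=True, dtype=None and encoding="utf-8" is one reasonable way to call it, not necessarily the one used in the course.

```python
import io

import numpy as np

# Invented mixed-type data: one string column and two float columns.
text = "name,age,fare\nA,30.0,10.5\nB,22.0,7.25\n"

# names=True reads the column names from the header row;
# dtype=None asks genfromtxt to infer a type for each column separately.
passengers = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

print(passengers.dtype.names)
print(passengers["name"])
print(passengers["age"])
```

The result is a one-dimensional structured array with one record per row, and columns are accessed by name rather than by position.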


Comment List

  • DataCamp
    December 2, 2020

    The code should be written live in the video and explained at the same time, because most of the time it does not execute for the learners watching the video.

  • DataCamp
    December 2, 2020

    I could load only one column from the file. I couldn't load any other column; it simply throws error messages, even if my entire data is in a single column. If I have multiple columns, it shows the error message "could not convert string to float: '1.0 2.0 4.0'".

  • DataCamp
    December 2, 2020

    If you have your own dataset and want to import it via pandas or NumPy using an import function, do you need to save your data inside the NumPy library, or somewhere else?

    I have a dataset from the internet and I'm trying to practice manipulating the data using a NumPy import function, but it gives me a file error.

    I'm wondering how I need to save the file for it to be recognised by the NumPy import function.

  • DataCamp
    December 2, 2020

    Why am I getting the error below?
    ValueError: could not convert string to float: b'line'
