Data Preparation (Latin NLP with Python 05)





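In this video, I discuss how to prepare Latin text data for natural language processing: normalizing j/v spellings, removing text in parentheses or square brackets, and collapsing extra spaces.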

THE CODE:
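from cltk.stem.latin.j_v import JVReplacer
import re

# Read the raw text file into a single string.
with open("data/pl.txt", "r") as f:
    text = f.read()

def jvtext(text):
    # Normalize j/v spellings with CLTK's JVReplacer.
    j = JVReplacer()
    text = j.replace(text)
    return text

def clean_pl(text, lower=False):
    # Remove anything enclosed in parentheses or square brackets.
    cleaned = re.sub(r"[\(\[].*?[\)\]]", "", text)
    # Collapse the double spaces left behind by the removals.
    cleaned = cleaned.replace("  ", " ").replace("  ", " ")
    if lower:
        # Return both the cased and the lowercased versions as a tuple.
        lower_cleaned = cleaned.lower()
        return (cleaned, lower_cleaned)

    return cleaned

text = jvtext(text)
text = clean_pl(text, lower=True)
# Index 1 of the returned tuple is the lowercased text.
print(text[1])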
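
To try the same steps without CLTK installed or the data/pl.txt file on hand, here is a minimal sketch that runs on an inline sample sentence. The names jv_normalize, clean_sample, and JV_TABLE are hypothetical, and the translation table is only a stand-in that assumes JVReplacer maps j to i and v to u in both cases; it is not CLTK's own implementation. The sketch also swaps the chained replace() calls for a single regex that collapses any run of spaces.

```python
import re

# Hypothetical stand-in for CLTK's JVReplacer (assumed behavior: j -> i, v -> u).
JV_TABLE = str.maketrans({"j": "i", "v": "u", "J": "I", "V": "U"})

def jv_normalize(text):
    # Apply the character-for-character replacements in one pass.
    return text.translate(JV_TABLE)

def clean_sample(text):
    # Remove anything enclosed in parentheses or square brackets.
    no_brackets = re.sub(r"[\(\[].*?[\)\]]", "", text)
    # Collapse runs of two or more spaces, then trim the ends.
    return re.sub(r" {2,}", " ", no_brackets).strip()

# Made-up sample sentence, not taken from the video's data file.
sample = "Juvenis [Col. 12] vidit  (sic) Jovem."
cleaned = clean_sample(jv_normalize(sample))

print(cleaned)          # Iuuenis uidit Iouem.
print(cleaned.lower())  # iuuenis uidit iouem.
```

Normalizing before cleaning mirrors the order used in the script above; with these two steps the order does not change the result, since neither one touches the characters the other looks for.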

Video on Functions:
https://youtu.be/6TS5NUZ0RMo

Videos on Regex:
https://www.youtube.com/watch?v=PpRifN_hQp8
https://www.youtube.com/watch?v=CR459J7TnKo
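
For readers who want to see how the bracket-stripping pattern in clean_pl behaves, here is a small sketch, written with the pattern's backslash escapes in place. The sample strings are made up for illustration. It shows two things worth knowing: the ? after .* makes the match stop at the first closing bracket, and . does not match a newline unless re.DOTALL is passed, so a bracketed span that runs across a line break is left alone by default.

```python
import re

# One opening ( or [, then as few characters as possible, then one closing ) or ].
PATTERN = r"[\(\[].*?[\)\]]"

line = "Gallia (est) omnis [diuisa] in partes tres"

# Non-greedy (.*?): each bracketed span is removed on its own.
lazy = re.sub(PATTERN, "", line)

# Greedy (.*): everything from the first opener to the last closer is removed.
greedy = re.sub(r"[\(\[].*[\)\]]", "", line)

# A bracketed span that crosses a line break.
multiline = "arma [note\ncontinued] uirumque"
default = re.sub(PATTERN, "", multiline)
dotall = re.sub(PATTERN, "", multiline, flags=re.DOTALL)

print(repr(lazy))     # 'Gallia  omnis  in partes tres'
print(repr(greedy))   # 'Gallia  in partes tres'
print(repr(default))  # 'arma [note\ncontinued] uirumque'
print(repr(dotall))   # 'arma  uirumque'
```

The doubled spaces in the output are why the script follows the regex with a step that collapses extra spaces.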

If you enjoy this video, please subscribe. I provide all my content at no cost. If you want to support my channel, please donate via
PayPal: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=AZ73QW52SUX8N&currency_code=USD&source=url
Patreon: https://www.patreon.com/WJBMattingly (it's my www.themedievalworld.com account as well).

If there’s a specific video or tutorial series you would like to see, let me know in the comments and I will try to make it.

If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.

You can follow me at:
https://twitter.com/wjb_mattingly


Comment List

  • Python Tutorials for Digital Humanities
    December 28, 2020

    I added this as a comment to the wrong video! Yikes! Here is the code as it appears in the video:

    from cltk.stem.latin.j_v import JVReplacer
    import re

    with open("data/pl.txt", "r") as f:
        text = f.read()

    def jvtext(text):
        j = JVReplacer()
        text = j.replace(text)
        return (text)

    def clean_pl(text, lower=False):
        cleaned = re.sub(r"[\(\[].*?[\)\]]", "", text)
        cleaned = cleaned.replace("  ", " ").replace("  ", " ")
        if lower==True:
            lower_cleaned = cleaned.lower()
            return (cleaned, lower_cleaned)

    text = jvtext(text)
    text = clean_pl(text, lower=True)
    print (text[1])

  • Python Tutorials for Digital Humanities
    December 28, 2020

    The code under more is seriously different from that in the video. More useful for the general case but people who want to copy-n-paste the code from the video, so they are getting the same results as you, are going to be SOL. I'm breaking for lunch but will try to capture the script as videoed and paste it into a comment.
