Data Preparation (Latin NLP with Python 05)
In this video, I cover how to prepare Latin text data for natural language processing: normalizing j/v spelling, stripping bracketed material, and lowercasing.
THE CODE:
from cltk.stem.latin.j_v import JVReplacer
import re

# Read the raw text file.
with open("data/pl.txt", "r") as f:
    text = f.read()

def jvtext(text):
    # Normalize j to i and v to u.
    j = JVReplacer()
    text = j.replace(text)
    return text

def clean_pl(text, lower=False):
    # Remove anything enclosed in parentheses or square brackets.
    cleaned = re.sub(r"[\(\[].*?[\)\]]", "", text)
    # Collapse the double spaces left behind by the removal.
    cleaned = cleaned.replace("  ", " ").replace("  ", " ")
    if lower:
        lower_cleaned = cleaned.lower()
        return (cleaned, lower_cleaned)
    return cleaned

text = jvtext(text)
text = clean_pl(text, lower=True)
# Index 1 is the lowercased copy; index 0 keeps the original case.
print(text[1])
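For readers without the data file or CLTK installed, the cleaning steps above can be tried on an inline sample. This is only a minimal sketch, not the CLTK implementation: jv_standin is a hypothetical stand-in that assumes JVReplacer maps j to i and v to u (in both cases), clean_sample is a hypothetical variant of clean_pl, and the bracketed markers in the sample sentence are invented for illustration.

```python
import re

# Hypothetical stand-in for cltk's JVReplacer (assumption: j -> i, v -> u).
JV_TABLE = str.maketrans({"j": "i", "v": "u", "J": "I", "V": "U"})

def jv_standin(text):
    # Apply the j/v normalization character by character.
    return text.translate(JV_TABLE)

def clean_sample(text):
    # Drop anything enclosed in (...) or [...], matching non-greedily.
    cleaned = re.sub(r"[\(\[].*?[\)\]]", "", text)
    # Collapse the runs of spaces left behind by the removal.
    cleaned = re.sub(r" {2,}", " ", cleaned)
    return cleaned.strip()

# Invented sample: a Latin sentence with made-up editorial markers.
sample = "Gallia est [0001A] omnis divisa (Caes.) in partes tres"

result = clean_sample(jv_standin(sample))
print(result)          # Gallia est omnis diuisa in partes tres
print(result.lower())  # gallia est omnis diuisa in partes tres
```

Using re.sub for the whitespace step collapses any run of spaces in one pass, whereas chaining two replace calls only handles runs of up to four spaces.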
Video on Functions:
https://youtu.be/6TS5NUZ0RMo
Videos on Regex:
https://www.youtube.com/watch?v=PpRifN_hQp8
https://www.youtube.com/watch?v=CR459J7TnKo
If you enjoy this video, please subscribe. I provide all my content at no cost. If you want to support my channel, please donate via
PayPal: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=AZ73QW52SUX8N&currency_code=USD&source=url
Patreon: https://www.patreon.com/WJBMattingly (it's my www.themedievalworld.com account as well).
If there's a specific video or tutorial series you would like to see, let me know in the comments and I will try to make it.
If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
You can follow me at:
https://twitter.com/wjb_mattingly
I added this as a comment to the wrong video! Yikes! Here is the code as it appears in the video:
from cltk.stem.latin.j_v import JVReplacer
import re

with open ("data/pl.txt", "r") as f:
    text = f.read()

def jvtext(text):
    j = JVReplacer()
    text = j.replace(text)
    return (text)

def clean_pl(text, lower=False):
    cleaned = re.sub(r"[\(\[].*?[\)\]]", "", text)
    cleaned = cleaned.replace("  ", " ").replace("  ", " ")
    if lower==True:
        lower_cleaned = cleaned.lower()
        return (cleaned, lower_cleaned)

text = jvtext(text)
text = clean_pl(text, lower=True)
print (text[1])
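One thing to watch in the video version: clean_pl only returns a value when lower=True, so calling it with the default gives back None. The sketch below illustrates this with clean_demo, a hypothetical function that follows the same shape, run on an invented sample string. It also shows unpacking the tuple by name rather than indexing with text[1].

```python
import re

def clean_demo(text, lower=False):
    # Same shape as the video's clean_pl: a tuple comes back only when lower=True.
    cleaned = re.sub(r"[\(\[].*?[\)\]]", "", text)
    cleaned = cleaned.replace("  ", " ").replace("  ", " ")
    if lower:
        return (cleaned, cleaned.lower())
    # No explicit return on this path, so Python returns None.

# Invented sample with one parenthetical to strip.
sample = "Arma (sic) virumque cano"

# Unpack the two versions by name instead of indexing the tuple.
cleaned, lower_cleaned = clean_demo(sample, lower=True)
print(cleaned)        # Arma virumque cano
print(lower_cleaned)  # arma virumque cano

# With the default lower=False nothing is returned.
print(clean_demo(sample))  # None
```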
The code under "more" is seriously different from the code in the video. It is more useful for the general case, but people who want to copy and paste the code from the video, so that they get the same results as you, are going to be SOL. I'm breaking for lunch but will try to capture the script as shown in the video and paste it into a comment.