Install Jupyter Notebook with pip: first upgrade pip (pip3 install --upgrade pip), then install Jupyter (pip3 install jupyter).
Start the Jupyter server from a directory where you have write permission, such as $HOME/Documents/ (jupyter notebook).
Reference (https://jupyter.readthedocs.io/en/latest/install.html)
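Jupyter saves notebooks in the directory you start it from, so before running jupyter notebook it can help to confirm that directory exists and is writable. This is a small standard-library sketch; the ~/Documents default mirrors the example above and is only an assumption about your setup.

```python
import os
from pathlib import Path


def is_writable_dir(path: Path) -> bool:
    """Return True if `path` is an existing directory the current user can write to."""
    return path.is_dir() and os.access(path, os.W_OK)


# Candidate notebook directory (assumed; change it to wherever you keep notebooks).
notebook_dir = Path.home() / "Documents"

if is_writable_dir(notebook_dir):
    print(f"OK: start Jupyter from {notebook_dir}")
else:
    print(f"{notebook_dir} is missing or not writable; pick another directory")
```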
Set up the Python packages (modules):
sudo pip3 install beautifulsoup4
sudo pip3 install nltk
sudo pip3 install numpy
sudo pip3 install scipy
sudo pip3 install scikit-learn (the PyPI name; the package is imported as sklearn)
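To confirm the installs worked (for example, from a notebook cell), a short standard-library check can look up each module without importing it. This is a sketch; the mapping from pip names to import names (beautifulsoup4 is imported as bs4, scikit-learn as sklearn) is the part most likely to trip people up.

```python
import importlib.util

# Map pip distribution names (what you install) to module names (what you import).
REQUIRED = {
    "beautifulsoup4": "bs4",
    "nltk": "nltk",
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose import module cannot be found."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing, install with: pip3 install " + " ".join(missing))
else:
    print("All packages are installed.")
```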
Running Jupyter:
From $HOME/Documents, run: jupyter notebook
Run the script:
from bs4 import BeautifulSoup
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
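nltk.download('stopwords')  # fetch the stopword list once; stopwords.words() raises LookupError without it
with open('/home/brent/Downloads/cd_catalog.xml') as f:  # adjust the path to your copy of the file
    cds = f.read()
print(cds)
soup = BeautifulSoup(cds, 'html.parser')  # name the parser explicitly; html.parser lowercases tag names
postTxt = soup.find_all('artist')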
postDocs = [x.text for x in postTxt]
print(postDocs)
postDocs.pop(0)  # remove the first entry; pop is a method, so pop[0] fails with "not subscriptable"
postDocs = [x.lower() for x in postDocs]  # change everything to lowercase
stopset = set(stopwords.words('english'))  # must exist before stopset.update() is called
stopset.update(['lt','p','/p','br','amp','quot','field','front','normal','span','0px','rgb','style','51','spacing','text','helvetica','size','family','space','arial','height','indent','letter','line','none','sans','serif','transform','variant','weight','times','new','strong','video','title','white','word','roman','0pt','16','color','12','14','21','neue','apple','class'])
print(postDocs)
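As an alternative to BeautifulSoup for the extraction step, Python's standard-library xml.etree.ElementTree can pull the artist names out of the catalog without any extra parser. This is a minimal sketch: the sample XML is made up, and it assumes the catalog nests ARTIST elements inside CD records (as in the widely used cd_catalog.xml sample). The tag is matched case-insensitively, the way BeautifulSoup's HTML parsers lowercase tag names.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like a CD catalog; the real file's layout is an assumption.
SAMPLE_XML = """<CATALOG>
  <CD><TITLE>First Album</TITLE><ARTIST>Artist One</ARTIST></CD>
  <CD><TITLE>Second Album</TITLE><ARTIST>Artist Two</ARTIST></CD>
</CATALOG>"""


def artist_names(xml_text: str) -> list[str]:
    """Collect the stripped text of every <artist> element, ignoring tag case."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter() if el.tag.lower() == "artist"]


print(artist_names(SAMPLE_XML))  # ['Artist One', 'Artist Two']
```

For the real file, replace ET.fromstring(xml_text) with ET.parse(path).getroot(). Unlike the HTML parsers, ElementTree rejects malformed XML with a ParseError instead of guessing at the structure.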
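The script builds stopset but does not yet apply it to any text. Here is a minimal sketch of how such a set is typically used to filter tokens. The small hand-written set stands in for set(stopwords.words('english')) so the example runs without the NLTK data download, and the regex tokenizer is an assumption rather than anything the post specifies.

```python
import re

# Small stand-in for set(stopwords.words('english')); the real NLTK list is much longer.
stopset = {"the", "and", "of", "a"}

# Create the set first, then extend it: calling update() on a name that does not
# exist yet raises NameError.
stopset.update(["lt", "amp", "quot", "br"])


def remove_stopwords(text: str, stops: set[str]) -> list[str]:
    """Lowercase, split into runs of word characters, and drop tokens found in `stops`."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stops]


print(remove_stopwords("The Band &amp; the Orchestra", stopset))  # ['band', 'orchestra']
```

Because the lookup is a set membership test, adding more HTML and CSS leftovers to the stop set costs nothing per token.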