Tuesday, July 18, 2017

Getting Jupyter Server started

For Jupyter Notebook, use Python 3. Check the version (python3 --version => 3.4) and the interpreter location (which python3 => /usr/local/bin/python3)

Upgrade pip, then install Jupyter Notebook with it (pip3 install --upgrade pip), (pip3 install jupyter)

Start the Jupyter server in a directory where you have permissions, such as $HOME/Documents/ (jupyter notebook)

Reference (https://jupyter.readthedocs.io/en/latest/install.html)


Set up the Python packages (modules):
sudo pip3 install beautifulsoup4
sudo pip3 install nltk
sudo pip3 install numpy
sudo pip3 install scipy
sudo pip3 install scikit-learn
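
Before running anything in a notebook, it can help to confirm that the packages are visible to the same python3 that Jupyter uses. The following is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are made up for this illustration. Note that the import name can differ from the pip name (beautifulsoup4 is imported as bs4, scikit-learn as sklearn).

```python
import importlib.util

# Import names, which can differ from the pip package names:
# beautifulsoup4 is imported as bs4, scikit-learn as sklearn.
REQUIRED_MODULES = ["bs4", "nltk", "numpy", "scipy", "sklearn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If a module shows up as missing here but pip3 reports it as installed, pip3 and python3 probably point at different Python installations.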

Running Jupyter:

In $HOME/Documents: jupyter notebook

Run the script:

from bs4 import BeautifulSoup
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

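# Read the sample XML catalog (the path is specific to this machine)
with open('/home/brent/Downloads/cd_catalog.xml') as f:
    cds = f.read()
print(cds)

# Name the parser explicitly; html.parser ships with Python
soup = BeautifulSoup(cds, 'html.parser')
postTxt = soup.find_all('artist')
postDocs = [x.text for x in postTxt]
print(postDocs)

# pop is a method, so call it with parentheses. Writing postDocs.pop[0] raises
# TypeError: 'builtin_function_or_method' object is not subscriptable
postDocs.pop(0)
postDocs = [x.lower() for x in postDocs]  # change everything to lowercase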

# Build the stop word set first, then extend it.
# stopwords.words needs the NLTK corpus: run nltk.download('stopwords') once.
stopset = set(stopwords.words('english'))
stopset.update(['lt','p','/p','br','amp','quot','field','front','normal','span','Opx','rgb','style','51','spacing','text','helvetica','size','family','space','arial','height','indent','letter','line','none','sans','serif','transform','line','variant','weight','times','new','strong','video','title','white','word','letter','roman','0pt','16','color','12','14','21','neue','apple','class',
               ])

print(postDocs)
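
The script imports TfidfVectorizer and TruncatedSVD but never gets as far as using them. A possible next step, sketched here under assumptions, is to turn the lowercased documents into TF-IDF vectors and reduce them with truncated SVD. The sample_docs and sample_stopset values and the lsa_components helper are hypothetical stand-ins for postDocs and stopset, not part of the original script, and n_components=2 is an arbitrary choice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical stand-ins for the script's postDocs and stopset.
sample_docs = ["bob dylan", "bonnie tyler", "dolly parton", "gary moore"]
sample_stopset = {"the", "and", "of"}


def lsa_components(docs, stop_words, n_components=2):
    """Vectorize docs with TF-IDF, then reduce them with truncated SVD."""
    # Recent scikit-learn versions expect stop_words as a list, not a set.
    vectorizer = TfidfVectorizer(stop_words=sorted(stop_words))
    tfidf = vectorizer.fit_transform(docs)

    # n_components must be smaller than the number of distinct terms.
    svd = TruncatedSVD(n_components=n_components, random_state=0)
    reduced = svd.fit_transform(tfidf)
    return vectorizer, svd, reduced


vectorizer, svd, reduced = lsa_components(sample_docs, sample_stopset)
print(reduced.shape)  # one row per document, one column per component
```

With the real data, the call would presumably look like lsa_components(postDocs, stopset), with n_components tuned to the size of the vocabulary.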
