Tuesday, May 22, 2018

Running Nose Tests in Python

Getting started with running tests in Python with nose.

https://nose.readthedocs.io/en/latest/usage.html

We can use nose to run one of the tests from Ext-Rescal:

nosetests extrescalFunctionsTest.py
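
For reference, a minimal sketch of the kind of module nosetests can run when the file is named on the command line. The function add_one and both test names are hypothetical and are not taken from extrescalFunctionsTest.py; the test case uses the standard unittest module, which nose also understands.

```python
import unittest


def add_one(x):
    # Hypothetical function under test (not part of Ext-RESCAL)
    return x + 1


class AddOneTest(unittest.TestCase):
    # nose collects unittest.TestCase subclasses and runs their test_* methods
    def test_add_one(self):
        self.assertEqual(add_one(1), 2)


def test_add_one_plain():
    # A plain function whose name starts with "test" is also treated as a test
    assert add_one(2) == 3
```

Saved as, say, addOneTest.py (a made-up file name), it would be run the same way as above: nosetests addOneTest.py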



Sunday, May 13, 2018

Getting the D matrix in Ext-Rescal

Theory::

To get a D matrix in Ext-Rescal [1,2], we need to check whether the objects are strings [3].

[1] https://github.com/nzhiltsov/Ext-RESCAL
[2] "Factorizing YAGO"
http://www.dbs.ifi.lmu.de/%7Etresp/papers/p271.pdf

[3] https://stackoverflow.com/questions/25259134/how-can-i-check-whether-a-url-is-valid-using-urlparse
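
A minimal sketch of such a check, along the lines of [3]: treat an object as a URI if it parses with both a scheme and a network location, and as a plain string otherwise. The helper name is_url is hypothetical, and this is only an assumption about how the distinction could be made, not necessarily how Ext-RESCAL does it. The import is the Python 3 location of urlparse.

```python
from urllib.parse import urlparse


def is_url(value):
    # Hypothetical helper: a value counts as a URL only if it has
    # both a scheme (e.g. "http") and a network location (e.g. "dbpedia.org")
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc)


print(is_url("http://dbpedia.org/resource/Tristania"))
print(is_url("heavy metal"))
```

Objects for which the check fails would be the candidates for the D matrix.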

Friday, May 4, 2018

Looking at Ext-RESCAL Xk prediction w.r.t. won tensor-util

In my previous blog post "Notes for Ext-Rescal (may 3rd)"[1] I talked about:

Xk = A*Rk*A.T, where A*Rk*A.T is a prediction for Xk.

Today I am going to use a utility from the Web of Needs to (attempt to) verify this assumption.

The relevant code is lines 240 to 262 of tensor-utils.py in won-matcher-rescal/../python/tools/. Line 240 states
"TESTING METHOD for rescal algorithm output predict hints"

Line 244 states

# - threshold: write out only those predictions that are above the threshold

Line 249 to 250 show how to create predictions.

# compute prediction array with scores
hint_prediction_matrix = np.dot(A,np.dot(R[SparseTensor.CONNECTION_SLICE], A.T))

According to the NumPy documentation, numpy.dot performs matrix multiplication when both arguments are 2-D arrays:
"If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred."


I also guessed that SparseTensor.CONNECTION_SLICE was 0. A short Python program derived from lines 29 and 31 verifies this.

class SparseTensor:
    CONNECTION_SLICE = 0

print(SparseTensor.CONNECTION_SLICE)

>> 0

A short program to implement lines 249 to 250 would be:

import numpy as np

A = np.array([[-0.70710678, 0.70710678], [0.52943053, 0.52943053], [0.52943053, 0.52943053], [0.00206809, 0.00206809]])

R = np.array([np.array([[5.47627165e-01, -1.16883182e-16], [-6.07013365e-17, 1.29500171e-32]]), np.array([[1.06958431e-03, -1.65920612e-19], [-2.28287465e-19, 3.54015545e-35]]), np.array([[1.74139035e-33, 5.09236793e-17], [5.09233343e-17, 1.47866314e+00]])])

# compute prediction array with scores for slice 0 (SparseTensor.CONNECTION_SLICE)
hint_prediction_matrix = np.dot(A, np.dot(R[0], A.T))
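
The nested np.dot calls can also be written with matmul or the @ operator, as the NumPy documentation quoted above suggests. A small sketch, reusing A and slice 0 of R from the program above, that checks the three spellings agree and that the result has one row and one column per entity:

```python
import numpy as np

A = np.array([[-0.70710678, 0.70710678], [0.52943053, 0.52943053],
              [0.52943053, 0.52943053], [0.00206809, 0.00206809]])
# Slice 0 of R from the program above
R0 = np.array([[5.47627165e-01, -1.16883182e-16],
               [-6.07013365e-17, 1.29500171e-32]])

with_dot = np.dot(A, np.dot(R0, A.T))
with_matmul = np.matmul(A, np.matmul(R0, A.T))
with_operator = A @ R0 @ A.T

# One score per (row entity, column entity) pair
print(with_dot.shape)
print(np.allclose(with_dot, with_matmul), np.allclose(with_dot, with_operator))
```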


Lines 252 to 253:
# choose indices above threshold to keep
hint_indices = hint_prediction_matrix > threshold

This corresponds to the threshold theta described in the first paragraph of "4.4 Solving Relational Learning Tasks" in
M. Nickel et al., "A Three-Way Model for Collective Learning on Multi-Relational Data".

A short program to implement lines 252 to 253 would be:

threshold = 5.99602451e-04
hint_indices = hint_prediction_matrix > threshold

print(hint_indices)
>> array([[ True, False, False, False],
       [False,  True,  True, False],
       [False,  True,  True, False],
       [False, False, False, False]], dtype=bool)

Lines 252 to 257:

    # choose indices above threshold to keep
    hint_indices = hint_prediction_matrix > threshold
    if not keepScore:
        hint_prediction_matrix[hint_indices] = 1
    hint_mask_matrix = np.zeros(hint_prediction_matrix.shape)
    hint_mask_matrix[hint_indices] = 1




A short program to implement lines 252 to 257 would be:
#if not keepScore:
#       hint_prediction_matrix[hint_indices] = 1
hint_mask_matrix = np.zeros(hint_prediction_matrix.shape)
hint_mask_matrix[hint_indices] = 1

print(hint_indices)
print(hint_mask_matrix)

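I was not sure what lines 254 and 255 do, so I commented them out and got the same result as with keepScore = True or keepScore = False. That is expected: those two lines only overwrite the above-threshold scores in hint_prediction_matrix with 1 when keepScore is False, while hint_mask_matrix depends only on hint_indices.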
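
A small sketch of the effect of keepScore, under stated assumptions: predict_hints is a hypothetical helper that mirrors the quoted lines (it is not the original function), and A and R_slice are made-up factors chosen so the numbers are easy to read.

```python
import numpy as np


def predict_hints(A, R_slice, threshold, keep_score):
    """Hypothetical helper mirroring the quoted lines; not the original function."""
    prediction = np.dot(A, np.dot(R_slice, A.T))
    hint_indices = prediction > threshold
    if not keep_score:
        # Replace every score above the threshold by 1
        prediction[hint_indices] = 1
    mask = np.zeros(prediction.shape)
    mask[hint_indices] = 1
    return prediction, mask


# Made-up factors, only for illustration
A = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 0.1]])
R_slice = np.array([[1.0, 0.0], [0.0, 1.0]])

scores_kept, mask_kept = predict_hints(A, R_slice, 0.2, keep_score=True)
scores_binary, mask_binary = predict_hints(A, R_slice, 0.2, keep_score=False)

print(np.array_equal(mask_kept, mask_binary))  # the mask does not depend on keep_score
print(scores_kept[0, 1], scores_binary[0, 1])  # the score itself does
```

So the flag only matters if hint_prediction_matrix is used afterwards; printing hint_indices and hint_mask_matrix cannot show a difference.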



















Matching data with SPARQL queries for tiny-mixed-example in Ext-Rescal

(1) Create turtle data:

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix : <http://example.org/> .
dbr:Vibeke :member-of dbr:Tristania .
dbr:Morten :member-of dbr:Tristania .
dbr:Tristania :genre dbr:Metal .
:author1 :cites :author1 .
:author2 :cites :author1 .
:author2 :cites :author2 .

(2) Load data into Blazegraph with a SPARQL update.
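
The exact update is not recorded here. As a sketch, this is what an INSERT DATA request for the triples from step (1) could look like, built as a Python string. Sending it to Blazegraph is left out on purpose, because the endpoint URL depends on the installation.

```python
# Triples from step (1), written with the same prefixes
triples = [
    ("dbr:Vibeke", ":member-of", "dbr:Tristania"),
    ("dbr:Morten", ":member-of", "dbr:Tristania"),
    ("dbr:Tristania", ":genre", "dbr:Metal"),
    (":author1", ":cites", ":author1"),
    (":author2", ":cites", ":author1"),
    (":author2", ":cites", ":author2"),
]

prefixes = (
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
    "PREFIX : <http://example.org/>\n"
)

# One "subject predicate object ." line per triple
body = "\n".join("  {} {} {} .".format(s, p, o) for s, p, o in triples)
update = prefixes + "INSERT DATA {\n" + body + "\n}"

print(update)
```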


(2) Find all distinct predicates

SELECT DISTINCT ?p
WHERE { ?s ?p ?o . }


<http://example.org/cites>
<http://example.org/genre>
<http://example.org/member-of>

(2a.r) Select the subjects as rows  for  :member-of


SELECT ?s
WHERE { ?s <http://example.org/member-of> ?o . }

s
<http://dbpedia.org/resource/Morten>  ==> 2
<http://dbpedia.org/resource/Vibeke>  ==> 1


Check:
Matches  1-rows in tiny-mixed-example:
1 2

(2a.c)  Select the objects as columns  for  :member-of


SELECT ?o
WHERE { ?s <http://example.org/member-of> ?o . }

<http://dbpedia.org/resource/Tristania>  ==> 0
<http://dbpedia.org/resource/Tristania>  ==> 0



Check:
Matches  1-cols in tiny-mixed-example:
0 0

(2b.r) Select the subjects as rows  for  :genre


SELECT  ?s
WHERE { ?s <http://example.org/genre> ?o . }

<http://dbpedia.org/resource/Tristania>  ==> 0

Check:
Matches  2-rows in tiny-mixed-example:
0

(2b.c) Select the objects as columns for :genre

SELECT  ?o
WHERE { ?s <http://example.org/genre> ?o . }

<http://dbpedia.org/resource/Metal>  ==> 3


Check:
Matches  2-cols in tiny-mixed-example:
3

(2c.r) Select the subjects as rows  for  :cites

SELECT  ?s
WHERE { ?s <http://example.org/cites> ?o . }

<http://example.org/author1>  ==> 4
<http://example.org/author2>  ==> 5
<http://example.org/author2>  ==> 5

Check:
Matches  3-rows in tiny-mixed-example:
4 5 5


(2c.c) Select the objects as columns for :cites

SELECT  ?o
WHERE { ?s <http://example.org/cites> ?o . }

<http://example.org/author1>   ==> 4
<http://example.org/author1>   ==> 4
<http://example.org/author2>   ==> 5


Check:
Matches  3-cols in tiny-mixed-example:
4 4 5
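
The checks above can be sketched in a few lines of Python. The index assignment is copied from the "==>" annotations above; the function name rows_and_cols is hypothetical, and the order of the results follows the order of the triples in the Turtle data rather than the order returned by Blazegraph.

```python
# Entity indices as annotated above
entity_ids = {
    "dbr:Tristania": 0, "dbr:Vibeke": 1, "dbr:Morten": 2,
    "dbr:Metal": 3, ":author1": 4, ":author2": 5,
}

triples = [
    ("dbr:Vibeke", ":member-of", "dbr:Tristania"),
    ("dbr:Morten", ":member-of", "dbr:Tristania"),
    ("dbr:Tristania", ":genre", "dbr:Metal"),
    (":author1", ":cites", ":author1"),
    (":author2", ":cites", ":author1"),
    (":author2", ":cites", ":author2"),
]


def rows_and_cols(predicate):
    """Return subject indices (rows) and object indices (cols) for one predicate."""
    matching = [(s, o) for s, p, o in triples if p == predicate]
    rows = [entity_ids[s] for s, _ in matching]
    cols = [entity_ids[o] for _, o in matching]
    return rows, cols


for predicate in (":member-of", ":genre", ":cites"):
    print(predicate, rows_and_cols(predicate))
```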

Thursday, May 3, 2018

Notes for ext. RESCAL cont. --- plots in latent space (May 3rd)

Plot the results from: https://github.com/nzhiltsov/Ext-RESCAL

cat term.embeddings.csv     ( matrix V.transpose() )
4.730825851270915039e-01 -6.977337002813972351e-17
1.157697052140589156e+00 -1.451761439325526721e-16
4.522254109924176389e-03 -5.860147020766050828e-19
4.822701779013331954e-17 1.404572401961088790e+00

Data to Plot : L1*300  L2*300

cat entity.embeddings.csv    ( matrix A )
5.287400282344740798e-01 -2.648288074937905172e-17
6.472985753526332431e-01 -1.883524571199644730e-17
6.472985753526332431e-01 -1.883524571199644730e-17
2.528510059971223606e-03 -8.399779633628449750e-20
9.723663381107547794e-17 7.118437655042088030e-01
9.723718319815804104e-17 7.118437655042088030e-01

Data to Plot : L1*300  L2*300




Assuming that both matrices can be plotted together gives:



Not plotted: latent.factors.csv ( matrices Rk concatenated )

Notes for Ext-Rescal (may 3rd)

I know this should be stunningly obvious, but I lack the background and insight that you have.

I know that we are following the rank-r factorization:
Xk = A*Rk*A.T


I ran a version of RESCAL called Ext-RESCAL and chose the tiny-example dataset. It is the graph on the left under “Let's imagine we have the following semantic graph:” at the URL https://github.com/nzhiltsov/Ext-RESCAL . I guessed that it would give the probability of links.

----------------

For the slice for relation 1, member-of:

X1 = ([[0,0,0,0],[1,0,0,0],[1,0,0,0],[0,0,0,0]])

Link-Representation = ([[AA,AB,AC,AD],[BA,BB,BC,BD],[CA,CB,CC,CD],[DA,DB,DC,DD]])

A*R1*A.T =
([[ 4.30000000e-06, -3.81164826e-21, -3.81164826e-21, -1.48892510e-23],
[ 9.99989146e-01, -8.88178420e-16, -8.88178420e-16, -3.46944695e-18],
[ 9.99989146e-01, -8.88178420e-16, -8.88178420e-16, -3.46944695e-18],
[ 3.90620760e-03, -3.46944695e-18, -3.46944695e-18,-1.35525272e-20]])

Most probable links:

AA = 4.30000000e-06 ( dbr:Tristania member-of dbr:Tristania )
BA = 9.99989146e-01 ( dbr:Vibeke member-of dbr:Tristania )
CA = 9.99989146e-01 ( dbr:Morten member-of dbr:Tristania )
DA = 3.90620760e-03 ( dbr:Metal member-of dbr:Tristania )
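
A small sketch of how such a list can be read off the matrix: label each entry with its row and column letter, as in the Link-Representation above, and sort by score. The numbers are the A*R1*A.T values printed above; nothing else is assumed.

```python
# A*R1*A.T as printed above
scores = [
    [4.30000000e-06, -3.81164826e-21, -3.81164826e-21, -1.48892510e-23],
    [9.99989146e-01, -8.88178420e-16, -8.88178420e-16, -3.46944695e-18],
    [9.99989146e-01, -8.88178420e-16, -8.88178420e-16, -3.46944695e-18],
    [3.90620760e-03, -3.46944695e-18, -3.46944695e-18, -1.35525272e-20],
]
names = "ABCD"

# Entry (i, j) gets the row letter followed by the column letter
labelled = [(names[i] + names[j], scores[i][j])
            for i in range(4) for j in range(4)]

# sorted() is stable, so tied scores keep their row-major order
ranking = sorted(labelled, key=lambda pair: pair[1], reverse=True)

for label, score in ranking[:4]:
    print(label, score)
```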


For the slice for relation 2, genre:

X2 = ([[0,0,0,1],[0,0,0,0],[0,0,0,0],[0,0,0,0]])

Link-Representation = ([[AA,AB,AC,AD],[BA,BB,BC,BD],[CA,CB,CC,CD],[DA,DB,DC,DD]])

A*R2*A.T =
([[ -2.16840434e-19, 1.95311646e-03, 1.95311646e-03, 7.62936119e-06],
[ 1.92592994e-34, -1.72202920e-18, -1.72202920e-18, -6.72667656e-21],
[ 1.92592994e-34, -1.72202920e-18, -1.72202920e-18, -6.72667656e-21],
[ 7.52316385e-37, -6.72667656e-21, -6.72667656e-21, -2.62760803e-23]])

Most probable links:

AB = 1.95311646e-03 ( dbr:Tristania genre dbr:Vibeke )
AC = 1.95311646e-03 ( dbr:Tristania genre dbr:Morten )
AD = 7.62936119e-06 ( dbr:Tristania genre dbr:Metal )

Thanks for your time. Best regards, Brent .

---------------------------------------------------------------------------

Prediction of Unknown Triples (Section 3.3: Factorizing YAGO)::
or Canonical Relational Learning (section 4.4: A Three-Way Model for Collective Learning on Multi-Relational Data)

A*Rk*aj

Each product (A*R1)*A[j,:] is column j of A*R1*A.T, so it scores every entity as subject against entity j as object.

Entity A as object, [AA,BA,CA,DA] :

s1 = np.matmul(A,R1)
np.matmul(s1,A[0,:])
array([ 4.30000000e-06, 9.99989146e-01, 9.99989146e-01,
3.90620760e-03])

Entity B as object, [AB,BB,CB,DB]:

np.matmul(s1,A[1,:])
array([ -3.81164826e-21, -8.88178420e-16, -8.88178420e-16,
-3.46944695e-18])

Entity C as object, [AC,BC,CC,DC]:

np.matmul(s1,A[2,:])
array([ -3.81164826e-21, -8.88178420e-16, -8.88178420e-16,
-3.46944695e-18])

Entity D as object, [AD,BD,CD,DD]:

np.matmul(s1,A[3,:])
array([ -1.48892510e-23, -3.46944695e-18, -3.46944695e-18,
-1.35525272e-20])

Create Ranking:::

BA => 9.99989146e-01 ( dbr:Vibeke member-of dbr:Tristania )
CA => 9.99989146e-01 ( dbr:Morten member-of dbr:Tristania )
DA => 3.90620760e-03 ( dbr:Metal member-of dbr:Tristania )
AA => 4.30000000e-06 ( dbr:Tristania member-of dbr:Tristania )
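
A quick check of which entries a product of this form yields. The factors below are made up, and R is deliberately asymmetric so that rows and columns of A*R*A.T differ. The result is column j of the full matrix, i.e. the scores of every entity as subject with entity j as object.

```python
import numpy as np

# Made-up factors, only for illustration
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R = np.array([[0.0, 2.0], [0.0, 0.0]])

full = np.matmul(np.matmul(A, R), A.T)  # entry (i, j) scores subject i, object j

s = np.matmul(A, R)
j = 1
product = np.matmul(s, A[j, :])         # same call pattern as above

print(product)
print(np.allclose(product, full[:, j]))  # matches column j
print(np.allclose(product, full[j, :]))  # does not match row j here
```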

==================================================================

Entity A as object, [AA,BA,CA,DA] :

s2 = np.matmul(A,R2)
np.matmul(s2,A[0,:])
array([ -2.16840434e-19, 1.92592994e-34, 1.92592994e-34,
7.52316385e-37])

Entity B as object, [AB,BB,CB,DB]:

np.matmul(s2,A[1,:])
array([ 1.95311646e-03, -1.72202920e-18, -1.72202920e-18,
-6.72667656e-21])

Entity C as object, [AC,BC,CC,DC]:

np.matmul(s2,A[2,:])
array([ 1.95311646e-03, -1.72202920e-18, -1.72202920e-18,
-6.72667656e-21])

Entity D as object, [AD,BD,CD,DD]:

np.matmul(s2,A[3,:])
array([ 7.62936119e-06, -6.72667656e-21, -6.72667656e-21,
-2.62760803e-23])

Create Ranking:::

AB => 1.95311646e-03 ( dbr:Tristania genre dbr:Vibeke )
AC => 1.95311646e-03 ( dbr:Tristania genre dbr:Morten )
AD => 7.62936119e-06 ( dbr:Tristania genre dbr:Metal )

================================================
Retrieval of similar entities (Section 3.3.2: Factorizing YAGO)::
or Link-based clustering (section 4.4: A Three-Way Model for Collective Learning on Multi-Relational Data)


A
array([[-0.70710678, 0.70710678],
[ 0.52943053, 0.52943053],
[ 0.52943053, 0.52943053],
[ 0.00206809, 0.00206809]])

Corresponds to entities A, B, C, D in row order

Hypothesis: B and C are alike…

B and C are closer to D than they are to A
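
One way to check this is to compute pairwise distances between the rows of A. The sketch below uses Euclidean distance, which is an assumption on my part; another similarity measure could give a different picture.

```python
import numpy as np

A = np.array([[-0.70710678, 0.70710678],
              [ 0.52943053, 0.52943053],
              [ 0.52943053, 0.52943053],
              [ 0.00206809, 0.00206809]])
names = ["A", "B", "C", "D"]

# Pairwise Euclidean distances between the rows of A
diff = A[:, np.newaxis, :] - A[np.newaxis, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

for i in range(4):
    for j in range(i + 1, 4):
        print(names[i], names[j], round(float(dist[i, j]), 4))
```

B and C have identical rows, so their distance is zero, and the distance from B (or C) to D comes out smaller than the distance to A.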

=====================================
Collective Classification::

Add an additional slice mapping all entities to classes with the classOf relationship, which gives us an entity-term matrix to add as a slice??

But possibly this is not the case, since R only has three slices (member-of, genre, cites)???

Aha, do it separately. “The basic idea is to process attribute values just as described above, but to add the <predicate, value> pairs to a separate entity-attributes matrix D and not to tensor X” (Factorizing YAGO)
D = AV

D =
[[ 1. 0. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 0. 1.]
[ 0. 0. 0. 1.]]

A =

[[ 5.28740028e-01 -2.64828807e-17]
[ 6.47298575e-01 -1.88352457e-17]
[ 6.47298575e-01 -1.88352457e-17]
[ 2.52851006e-03 -8.39977963e-20]
[ 9.72366338e-17 7.11843766e-01]
[ 9.72371832e-17 7.11843766e-01]]


V =
[[ 4.73082585e-01 1.15769705e+00 4.52225411e-03 4.82270178e-17]
[ -6.97733700e-17 -1.45176144e-16 -5.86014702e-19 1.40457240e+00]]


import numpy as np

A = np.array([[ 5.28740028e-01, -2.64828807e-17],
              [ 6.47298575e-01, -1.88352457e-17],
              [ 6.47298575e-01, -1.88352457e-17],
              [ 2.52851006e-03, -8.39977963e-20],
              [ 9.72366338e-17, 7.11843766e-01],
              [ 9.72371832e-17, 7.11843766e-01]])

V = np.array([[ 4.73082585e-01, 1.15769705e+00, 4.52225411e-03, 4.82270178e-17],
              [ -6.97733700e-17, -1.45176144e-16, -5.86014702e-19, 1.40457240e+00]])

# reconstruct the entity-attribute matrix from its factors
D = np.matmul(A, V)


D
array([[ 2.50137699e-01, 6.12120771e-01, 2.39109676e-03,
-1.16975686e-17],
[ 3.06225683e-01, 7.49375651e-01, 2.92724864e-03,
4.76181364e-18],
[ 3.06225683e-01, 7.49375651e-01, 2.92724864e-03,
4.76181364e-18],
[ 1.19619408e-03, 2.92724864e-03, 1.14345650e-05,
3.96151333e-21],
[ -3.66678039e-18, 9.22783102e-18, 2.25778544e-20,
9.99836107e-01],
[ -3.66652048e-18, 9.22846706e-18, 2.25803390e-20,
9.99836107e-01]])


[[ 2.50137699e-01, 6.12120771e-01, 2.39109676e-03, -1.16975686e-17],
[ 3.06225683e-01, 7.49375651e-01, 2.92724864e-03, 4.76181364e-18],
[ 3.06225683e-01, 7.49375651e-01, 2.92724864e-03, 4.76181364e-18],
[ 1.19619408e-03, 2.92724864e-03, 1.14345650e-05, 3.96151333e-21],
[ -3.66678039e-18, 9.22783102e-18, 2.25778544e-20, 9.99836107e-01],
[ -3.66652048e-18, 9.22846706e-18, 2.25803390e-20, 9.99836107e-01]]
=

[[ 1. 0. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 0. 1.]
[ 0. 0. 0. 1.]]


with








Predicted Triples are in Bold (but some of these look incorrect):

dbr:Tristania, band : 2.50137699e-01
dbr:Vibeke, band : 3.06225683e-01
dbr:Morten, band : 3.06225683e-01
dbr:Metal, band : 1.19619408e-03
dbr:Vibeke, member : 7.49375651e-01
dbr:Morten, member : 7.49375651e-01
dbr:Tristania, member : 6.12120771e-01
dbr:Metal, member : 2.92724864e-03
dbr:Metal, genre : 1.14345650e-05
dbr:Tristania, genre : 2.39109676e-03
dbr:Vibeke, genre : 2.92724864e-03
dbr:Morten, genre : 2.92724864e-03
author1, tensor : 9.99836107e-01
author2, tensor : 9.99836107e-01
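
To see which entries look incorrect, the reconstruction A*V can be compared with the original D entry by entry. This is only a sketch: the threshold of 0.5 is an assumed cut-off (not taken from Ext-RESCAL), and the entity and term order is read off the list above.

```python
import numpy as np

entities = ["dbr:Tristania", "dbr:Vibeke", "dbr:Morten",
            "dbr:Metal", "author1", "author2"]
terms = ["band", "member", "genre", "tensor"]

# Original entity-attribute matrix D
D = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0],
              [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]])

A = np.array([[5.28740028e-01, -2.64828807e-17],
              [6.47298575e-01, -1.88352457e-17],
              [6.47298575e-01, -1.88352457e-17],
              [2.52851006e-03, -8.39977963e-20],
              [9.72366338e-17, 7.11843766e-01],
              [9.72371832e-17, 7.11843766e-01]])
V = np.array([[4.73082585e-01, 1.15769705e+00, 4.52225411e-03, 4.82270178e-17],
              [-6.97733700e-17, -1.45176144e-16, -5.86014702e-19, 1.40457240e+00]])

D_hat = np.matmul(A, V)
threshold = 0.5  # assumed cut-off, not taken from Ext-RESCAL

# Collect every (entity, term) pair where the thresholded score disagrees with D
mismatches = []
for i, entity in enumerate(entities):
    for j, term in enumerate(terms):
        predicted = bool(D_hat[i, j] > threshold)
        if predicted != bool(D[i, j]):
            mismatches.append((entity, term))
            print("mismatch:", entity, term, D_hat[i, j])
```

With this cut-off the disagreements are all in the band, member and genre columns for dbr:Tristania and dbr:Metal; the author rows are reproduced.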