Start of running tests with nose in Python.
https://nose.readthedocs.io/en/latest/usage.html
We can use nose to run one of the tests from Ext-Rescal:
nosetests extrescalFunctionsTest.py
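For reference, nose collects files and functions whose names match its test pattern (names containing "test"). A minimal, hypothetical example (a file test_example.py, not part of Ext-Rescal) that nosetests would pick up:

def test_addition():
    # nose collects functions whose names start with "test"
    assert 1 + 1 == 2

Running "nosetests test_example.py" (or just "nosetests" in that directory) reports it as one passing test.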
Tuesday, May 22, 2018
Sunday, May 13, 2018
Getting the D matrix in Ext-Rescal
Theory::
To get a D matrix in Ext-Rescal [1,2], we need to check whether the objects are strings [3].
[1] https://github.com/nzhiltsov/Ext-RESCAL
[2] "Factorizing YAGO"
http://www.dbs.ifi.lmu.de/%7Etresp/papers/p271.pdf
[3] https://stackoverflow.com/questions/25259134/how-can-i-check-whether-a-url-is-valid-using-urlparse
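As a sketch of that check (my addition, using Python 3's urllib.parse; in Python 2 the module is urlparse): treat an object as an attribute string for D when it does not parse as an absolute URI. The helper name is hypothetical, not part of Ext-Rescal.

from urllib.parse import urlparse

def is_attribute_string(obj):
    # An absolute URI has both a scheme (e.g. 'http') and a network location;
    # anything else is treated as a plain string / attribute value for D.
    parsed = urlparse(obj)
    return not (parsed.scheme and parsed.netloc)

print(is_attribute_string("http://dbpedia.org/resource/Tristania"))  # False -> entity, goes into the tensor X
print(is_attribute_string("Tristania (band)"))                       # True  -> attribute value, goes into D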
Friday, May 4, 2018
Looking at Ext-RESCAL Xk prediction w.r.t. the WoN tensor-utils
In my previous blog post "Notes for Ext-Rescal (May 3rd)" [1], I talked about:
Xk = A*Rk*A.T, where A*Rk*A.T is the prediction of slice Xk.
Today I am going to use a utility from the Web of Needs to (attempt to) verify this assumption.
The relevant code in tensor-utils.py (in won-matcher-rescal/../python/tools/) is lines 240 to 262. Line 240 states
"TESTING METHOD for rescal algorithm output predict hints"
Line 244 states
# - threshold: write out only those predictions that are above the threshold
Lines 249 to 250 show how to create the predictions.
# compute prediction array with scores
hint_prediction_matrix = np.dot(A, np.dot(R[SparseTensor.CONNECTION_SLICE], A.T))

Following the numpy documentation, numpy.dot on 2-D arrays is matrix multiplication: "If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred." I also guessed that SparseTensor.CONNECTION_SLICE was 0. A short Python program derived from lines 29 and 31 verifies this:

class SparseTensor:
    CONNECTION_SLICE = 0

print(SparseTensor.CONNECTION_SLICE)
>> 0

A short program to implement lines 249 to 250 would be:

import numpy as np

A = np.array([[-0.70710678, 0.70710678],
              [ 0.52943053, 0.52943053],
              [ 0.52943053, 0.52943053],
              [ 0.00206809, 0.00206809]])
R = np.array([np.array([[ 5.47627165e-01, -1.16883182e-16],
                        [-6.07013365e-17,  1.29500171e-32]]),
              np.array([[ 1.06958431e-03, -1.65920612e-19],
                        [-2.28287465e-19,  3.54015545e-35]]),
              np.array([[ 1.74139035e-33,  5.09236793e-17],
                        [ 5.09233343e-17,  1.47866314e+00]])])
hint_prediction_matrix = np.dot(A, np.dot(R[0], A.T))

Lines 252 to 253:

# choose indices above threshold to keep
hint_indices = hint_prediction_matrix > threshold

This is like the first paragraph describing theta in "4.4 Solving Relational Learning Tasks" in M. Nickel et al., "A Three-Way Model for Collective Learning on Multi-Relational Data". A short program to implement lines 252 to 253 would be:

threshold = 5.99602451e-04
hint_indices = hint_prediction_matrix > threshold
print(hint_indices)
>> array([[ True, False, False, False],
       [False,  True,  True, False],
       [False,  True,  True, False],
       [False, False, False, False]], dtype=bool)

Lines 252 to 257:

#if not keepScore:
#    hint_prediction_matrix[hint_indices] = 1
hint_mask_matrix = np.zeros(hint_prediction_matrix.shape)
hint_mask_matrix[hint_indices] = 1
print(hint_indices)
print(hint_mask_matrix)

I am not sure what lines 253 and 254 do, so I commented them out; with keepScore = True or keepScore = False I got the same result (the mask matrix only depends on hint_indices, which those lines do not change).
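As a small follow-on (my addition, not part of tensor-utils.py), the mask can be turned back into explicit (row, column) hint pairs:

# List the (row, column) index pairs selected by the threshold.
hint_pairs = np.argwhere(hint_mask_matrix == 1)
print(hint_pairs)
>> [[0 0]
 [1 1]
 [1 2]
 [2 1]
 [2 2]]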
Matching data with sparql queries for tiny-mixed-example in Ext-Rescal
(1) Create turtle data:
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix : <http://example.org/> .
dbr:Vibeke :member-of dbr:Tristania .
dbr:Morten :member-of dbr:Tristania .
dbr:Tristania :genre dbr:Metal .
:author1 :cites :author1 .
:author2 :cites :author1 .
:author2 :cites :author2 .
(2) Load the data into Blazegraph with a SPARQL update, as shown below.
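The update is just the same triples wrapped in INSERT DATA (a sketch; note that SPARQL uses PREFIX rather than Turtle's @prefix):

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX : <http://example.org/>
INSERT DATA {
  dbr:Vibeke :member-of dbr:Tristania .
  dbr:Morten :member-of dbr:Tristania .
  dbr:Tristania :genre dbr:Metal .
  :author1 :cites :author1 .
  :author2 :cites :author1 .
  :author2 :cites :author2 .
}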
(3) Find all distinct predicates
SELECT DISTINCT ?p
WHERE { ?s ?p ?o . }
<http://example.org/cites>
<http://example.org/genre>
<http://example.org/member-of>
(3a.r) Select the subjects as rows for :member-of
SELECT ?s
WHERE { ?s <http://example.org/member-of> ?o . }
<http://dbpedia.org/resource/Morten> ==> 2
<http://dbpedia.org/resource/Vibeke> ==> 1
Check:
Matches 1-rows in tiny-mixed-example:
1 2
(3a.c) Select the objects as columns for :member-of
SELECT ?o
WHERE { ?s <http://example.org/member-of> ?o . }
<http://dbpedia.org/resource/Tristania> ==> 0
<http://dbpedia.org/resource/Tristania> ==> 0
Check:
Matches 1-cols in tiny-mixed-example:
0 0
(3b.r) Select the subjects as rows for :genre
SELECT ?s
WHERE { ?s <http://example.org/genre> ?o . }
<http://dbpedia.org/resource/Tristania> ==> 0
Check:
Matches 2-rows in tiny-mixed-example:
0
(3b.c) Select the objects as columns for :genre
SELECT ?o
WHERE { ?s <http://example.org/genre> ?o . }
<http://dbpedia.org/resource/Metal> ==> 3
Check:
Matches 2-cols in tiny-mixed-example:
0
(3c.r) Select the subjects as rows for :cites
SELECT ?s
WHERE { ?s <http://example.org/cites> ?o . }
<http://example.org/author1> ==> 4
<http://example.org/author2> ==> 5
<http://example.org/author2> ==> 5
Check:
Matches 3-rows in tiny-mixed-example:
4 5 5
(3c.c) Select the objects as columns for :cites
SELECT ?o
WHERE { ?s <http://example.org/cites> ?o . }
<http://example.org/author1> ==> 4
<http://example.org/author1> ==> 4
<http://example.org/author2> ==> 5
Check:
Matches 3-cols in tiny-mixed-example:
4 4 5
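A minimal sketch (my addition) of running the per-predicate queries from Python with SPARQLWrapper; the endpoint URL is a guess and depends on your Blazegraph installation:

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://localhost:9999/blazegraph/sparql")  # assumed endpoint URL
endpoint.setReturnFormat(JSON)

for predicate in ["member-of", "genre", "cites"]:
    endpoint.setQuery("""
        SELECT ?s ?o
        WHERE { ?s <http://example.org/%s> ?o . }
    """ % predicate)
    results = endpoint.query().convert()
    print(predicate)
    for binding in results["results"]["bindings"]:
        # The subject gives the row entity, the object gives the column entity.
        print(" ", binding["s"]["value"], "==>", binding["o"]["value"])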
Thursday, May 3, 2018
Notes for Ext-RESCAL cont. --- plots in latent space (May 3rd)
Plot the results from: https://github.com/nzhiltsov/Ext-RESCAL
cat term.embeddings.csv ( matrix A)
4.730825851270915039e-01 -6.977337002813972351e-17
1.157697052140589156e+00 -1.451761439325526721e-16
4.522254109924176389e-03 -5.860147020766050828e-19
4.822701779013331954e-17 1.404572401961088790e+00
cat entity.embeddings.csv ( matrix V.transpose() )
5.287400282344740798e-01 -2.648288074937905172e-17
6.472985753526332431e-01 -1.883524571199644730e-17
6.472985753526332431e-01 -1.883524571199644730e-17
2.528510059971223606e-03 -8.399779633628449750e-20
9.723663381107547794e-17 7.118437655042088030e-01
9.723718319815804104e-17 7.118437655042088030e-01
Data to Plot : L1*300 L2*300
Assuming that both matrices can be plotted together gives:
[scatter plot of the term and entity embeddings in the 2-D latent space]
Not plotted: latent.factors.csv ( matrices Rk concatenated )
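A minimal plotting sketch (my addition), assuming matplotlib is available and that the two files are whitespace-separated as shown above:

import numpy as np
import matplotlib.pyplot as plt

# Load the two latent-factor matrices written by Ext-RESCAL.
terms = np.loadtxt("term.embeddings.csv")       # matrix A above, one row per term
entities = np.loadtxt("entity.embeddings.csv")  # matrix V.transpose() above, one row per entity

# Scatter both sets of rows in the 2-D latent space, scaled like the L1*300 / L2*300 axes.
plt.scatter(300 * terms[:, 0], 300 * terms[:, 1], marker="o", label="term.embeddings")
plt.scatter(300 * entities[:, 0], 300 * entities[:, 1], marker="x", label="entity.embeddings")
plt.xlabel("L1*300")
plt.ylabel("L2*300")
plt.legend()
plt.show()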
Notes for Ext-Rescal (May 3rd)
I know this should be stunningly obvious, but I lack the background and insight you do. I know that we are following the rank-r factorization:
Xk = A*Rk*A.T
I ran a version of RESCAL called Ext-RESCAL, and chose the tiny-example dataset. It is the graph on the left under “Let's imagine we have the following semantic graph:” at the url: https://github.com/nzhiltsov/Ext-RESCAL . I guessed that it would give the probability of links.
----------------
For the slice for relation 1, member-of:
X1 = ([[0,0,0,0],[1,0,0,0],[1,0,0,0],[0,0,0,0]])
Link-Representation = ([[AA,AB,AC,AD],[BA,BB,BC,BD],[CA,CB,CC,CD],[DA,DB,DC,DD]])
A*R1*A.T =
([[ 4.30000000e-06, -3.81164826e-21, -3.81164826e-21, -1.48892510e-23],
  [ 9.99989146e-01, -8.88178420e-16, -8.88178420e-16, -3.46944695e-18],
  [ 9.99989146e-01, -8.88178420e-16, -8.88178420e-16, -3.46944695e-18],
  [ 3.90620760e-03, -3.46944695e-18, -3.46944695e-18, -1.35525272e-20]])
Most probable links:
AA = 4.30000000e-06 ( dbr:Tristania member-of dbr:Tristania )
BA = 9.99989146e-01 ( dbr:Vibeke member-of dbr:Tristania )
CA = 9.99989146e-01 ( dbr:Morten member-of dbr:Tristania )
DA = 3.90620760e-03 ( dbr:Metal member-of dbr:Tristania )
For the slice for relation 2, genre:
X2 = ([[0,0,0,1],[0,0,0,0],[0,0,0,0],[0,0,0,0]])
Link-Representation = ([[AA,AB,AC,AD],[BA,BB,BC,BD],[CA,CB,CC,CD],[DA,DB,DC,DD]])
A*R2*A.T =
([[ -2.16840434e-19,  1.95311646e-03,  1.95311646e-03,  7.62936119e-06],
  [  1.92592994e-34, -1.72202920e-18, -1.72202920e-18, -6.72667656e-21],
  [  1.92592994e-34, -1.72202920e-18, -1.72202920e-18, -6.72667656e-21],
  [  7.52316385e-37, -6.72667656e-21, -6.72667656e-21, -2.62760803e-23]])
Most probable links:
AB = 1.95311646e-03 ( dbr:Tristania genre dbr:Vibeke )
AC = 1.95311646e-03 ( dbr:Tristania genre dbr:Morten )
AD = 7.62936119e-06 ( dbr:Tristania genre dbr:Metal )
Thanks for your time. Best regards, Brent.
-----------------------------------------------------------------------------
Prediction of Unknown Triples (Section 3.3: Factorizing YAGO)::
or Canonical Relational Learning (section 4.4: A Three-Way Model for Collective Learning on Multi-Relational Data)
A*Rk*aj (the i-th entry is the score of the triple (entity i, relation k, entity j), i.e. column j of A*Rk*A.T)
Entity A as object j, [AA,BA,CA,DA]:
s1 = np.matmul(A,R1)
np.matmul(s1,A[0,:])
array([ 4.30000000e-06, 9.99989146e-01, 9.99989146e-01, 3.90620760e-03])
Entity B as object j, [AB,BB,CB,DB]:
np.matmul(s1,A[1,:])
array([ -3.81164826e-21, -8.88178420e-16, -8.88178420e-16, -3.46944695e-18])
Entity C as object j, [AC,BC,CC,DC]:
np.matmul(s1,A[2,:])
array([ -3.81164826e-21, -8.88178420e-16, -8.88178420e-16, -3.46944695e-18])
Entity D as object j, [AD,BD,CD,DD]:
np.matmul(s1,A[3,:])
array([ -1.48892510e-23, -3.46944695e-18, -3.46944695e-18, -1.35525272e-20])
Create Ranking:::
BA => 9.99989146e-01 ( dbr:Vibeke member-of dbr:Tristania )
CA => 9.99989146e-01 ( dbr:Morten member-of dbr:Tristania )
DA => 3.90620760e-03 ( dbr:Metal member-of dbr:Tristania )
AA => 4.30000000e-06 ( dbr:Tristania member-of dbr:Tristania )
==================================================================
Entity A as object j, [AA,BA,CA,DA]:
s2 = np.matmul(A,R2)
np.matmul(s2,A[0,:])
array([ -2.16840434e-19, 1.92592994e-34, 1.92592994e-34, 7.52316385e-37])
Entity B as object j, [AB,BB,CB,DB]:
np.matmul(s2,A[1,:])
array([ 1.95311646e-03, -1.72202920e-18, -1.72202920e-18, -6.72667656e-21])
Entity C as object j, [AC,BC,CC,DC]:
np.matmul(s2,A[2,:])
array([ 1.95311646e-03, -1.72202920e-18, -1.72202920e-18, -6.72667656e-21])
Entity D as object j, [AD,BD,CD,DD]:
np.matmul(s2,A[3,:])
array([ 7.62936119e-06, -6.72667656e-21, -6.72667656e-21, -2.62760803e-23])
Create Ranking:::
AB => 1.95311646e-03 ( dbr:Tristania genre dbr:Vibeke )
AC => 1.95311646e-03 ( dbr:Tristania genre dbr:Morten )
AD => 7.62936119e-06 ( dbr:Tristania genre dbr:Metal )
================================================
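A minimal sketch (my addition) of producing such a ranking programmatically, assuming A and R1 are already available as numpy arrays and that the rows correspond to entities A, B, C, D in that order:

import numpy as np

entities = ["dbr:Tristania", "dbr:Vibeke", "dbr:Morten", "dbr:Metal"]  # assumed row order

# scores[i, j] = a_i . R1 . a_j = score of the triple (entity i, member-of, entity j)
scores = np.matmul(np.matmul(A, R1), A.T)

# Sort all (i, j) pairs by score, highest first, and print the top four.
order = np.argsort(scores, axis=None)[::-1]
for flat_index in order[:4]:
    i, j = np.unravel_index(flat_index, scores.shape)
    print(entities[i], "member-of", entities[j], ":", scores[i, j])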
Retrieval of similar entities (Section 3.3.2: Factorizing YAGO)::
or Link-based clustering (section 4.4: A Three-Way Model for Collective Learning on Multi-Relational Data)
A
array([[-0.70710678, 0.70710678],
       [ 0.52943053, 0.52943053],
       [ 0.52943053, 0.52943053],
       [ 0.00206809, 0.00206809]])
Corresponds to Entities A, B, C, D in row order.
Hypothesis: B and C are alike…
B and C are closer to D than they are to A
=====================================
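A quick check of that claim (my addition), using the latent row vectors above and plain Euclidean distance:

import numpy as np

A = np.array([[-0.70710678, 0.70710678],
              [ 0.52943053, 0.52943053],
              [ 0.52943053, 0.52943053],
              [ 0.00206809, 0.00206809]])
labels = ["A", "B", "C", "D"]

# Pairwise distances between the latent representations.
for i in range(4):
    for j in range(i + 1, 4):
        print(labels[i], labels[j], np.linalg.norm(A[i] - A[j]))

# B and C coincide (distance 0), B-D and C-D are about 0.75,
# while B-A and C-A are about 1.25, so B and C are indeed closer to D than to A.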
Collective Classification::
Add an additional slice mapping all entities to classes with the classOf relationship:
which gives us an entity-term matrix to add as a slice? But possibly this is not the case, since R only has three slices:
member-of
genre
cites
Aha, do it separately. "The basic idea is to process attribute values just as described above, but to add the <predicate, value> pairs to a separate entity-attributes matrix D and not to tensor X" (Factorizing YAGO).
D = AV
D =
[[ 1. 0. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 0. 1.]
[ 0. 0. 0. 1.]]
A =
[[ 5.28740028e-01 -2.64828807e-17]
 [ 6.47298575e-01 -1.88352457e-17]
 [ 6.47298575e-01 -1.88352457e-17]
 [ 2.52851006e-03 -8.39977963e-20]
 [ 9.72366338e-17  7.11843766e-01]
 [ 9.72371832e-17  7.11843766e-01]]
V =
[[ 4.73082585e-01  1.15769705e+00  4.52225411e-03  4.82270178e-17]
 [-6.97733700e-17 -1.45176144e-16 -5.86014702e-19  1.40457240e+00]]
import numpy as np
A = np.array([[ 5.28740028e-01, -2.64828807e-17],
              [ 6.47298575e-01, -1.88352457e-17],
              [ 6.47298575e-01, -1.88352457e-17],
              [ 2.52851006e-03, -8.39977963e-20],
              [ 9.72366338e-17,  7.11843766e-01],
              [ 9.72371832e-17,  7.11843766e-01]])
V = np.array([[ 4.73082585e-01,  1.15769705e+00,  4.52225411e-03,  4.82270178e-17],
              [-6.97733700e-17, -1.45176144e-16, -5.86014702e-19,  1.40457240e+00]])
D = np.matmul(A, V)
D
array([[ 2.50137699e-01,  6.12120771e-01,  2.39109676e-03, -1.16975686e-17],
       [ 3.06225683e-01,  7.49375651e-01,  2.92724864e-03,  4.76181364e-18],
       [ 3.06225683e-01,  7.49375651e-01,  2.92724864e-03,  4.76181364e-18],
       [ 1.19619408e-03,  2.92724864e-03,  1.14345650e-05,  3.96151333e-21],
       [-3.66678039e-18,  9.22783102e-18,  2.25778544e-20,  9.99836107e-01],
       [-3.66652048e-18,  9.22846706e-18,  2.25803390e-20,  9.99836107e-01]])
compared with the original entity-attributes matrix D =
[[ 1. 0. 0. 0.]
 [ 0. 1. 0. 0.]
 [ 0. 1. 0. 0.]
 [ 0. 0. 1. 0.]
 [ 0. 0. 0. 1.]
 [ 0. 0. 0. 1.]]
with the predicted triples in bold (but some of these look incorrect):
dbr:Tristania, band : 2.50137699e-01
dbr:Vibeke, band : 3.06225683e-01
dbr:Morten, band : 3.06225683e-01
dbr:Metal, band : 1.19619408e-03
dbr:Vibeke, member : 7.49375651e-01
dbr:Morten, member : 7.49375651e-01
dbr:Tristania, member : 6.12120771e-01
dbr:Metal, member : 2.92724864e-03
dbr:Metal, genre : 1.14345650e-05
dbr:Tristania, genre : 2.39109676e-03
dbr:Vibeke, genre : 2.92724864e-03
dbr:Morten, genre : 2.92724864e-03
author1, tensor : 9.99836107e-01
author2, tensor : 9.99836107e-01
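A minimal sketch (my addition) that produces this entity-term score list from the computed D; the row and column labels are my assumption about the order used in the dataset:

entities = ["dbr:Tristania", "dbr:Vibeke", "dbr:Morten", "dbr:Metal", "author1", "author2"]
terms = ["band", "member", "genre", "tensor"]

# D is the matrix computed above with D = np.matmul(A, V).
pairs = [(entities[i], terms[j], D[i, j]) for i in range(D.shape[0]) for j in range(D.shape[1])]
for entity, term, score in sorted(pairs, key=lambda p: p[2], reverse=True):
    print("%s, %s : %s" % (entity, term, score))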