Geometric
Geometric technique notes
getting images to work in jekyll + github pages is under construction
Geometric Techniques:
define distance
construct graph
filter graph
use graph
compare graphs
define distance:
Defining distance depends on the problem and data being used.
In this context, it might help to think of distance as being a similarity metric.
Further, these distances are relative in a space.
There are established ways of calculating distance:
Tunable distance metrics
“where dµ is the occurrence of diagnosis µ, and a and c are tunable constants. The first term positively rewards shared diagnoses. Note that the d−1µ term incorporates the idea that two patients sharing a rare diagnosis is more significant than a common one. The second term penalises the total number of diagnoses – this is to prevent patients with many diagnoses becoming ‘hubs’ of high connectivity, attracting imprecise matches with several non-shared diagnoses. We examine M under a k-Nearest Neighbour (k-NN) scheme to establish k edges per node. The parameters a, c and k were treated as hyperparameters (c = 0.001, a = 5 and k = 3 in the final model).
Distance between learned embeddings
The previous examplet takes a vector of patient diagnoses as an embedding.
This embedding could also be learned by using methods such as BERT as referenced in the same work.
For tabular data, you could also use K Nearest Neighbors to get distances.
construct graph
Types of graphs
- Directed
- Weighted
- Multigraph (parallel edges)
- Multiplex (multiple layers, often modeled as single layer with node/edge types)
- Hypergraph (single edge can point to multiple nodes)
Once distances are created, they can be used to create edges between the nodes.
Nodes are the entities that the distances are computed between.
It is possible to create a graph where the edges are the nodes.
It is also feasible to create a line graph. A graph where the edges are treated as nodes.
filter graph
These graphs can be complex and contain many edges. Some types of graphs can be fully connected. These will be hard to utilize as there is much to consider in a fully connected graph. This creates a need for a clever way to filter a graph’s edges while still containing the original graph’s information and hierarchical organization.
“However, the complexity of the system is ingeneral reflected in the associated graph which resultsin an intricate interweaved structure. There is thereforea general need to find methods which are able to singleout the key information by filtering such a complex graph into a simpler relevant subgraph.” Source
Paper on filtering correlation networks
Three types used in this paper:
- Dynamic Graph
- filters edges by ranking that changes per period (dynamic)
- Dynamic Threshold
- filters edges by threshold that changes per period (dynamic)
- Dynamic Minimal Spanning Tree (DMST)
- filters edges by MST per period (dynamic)