Prepare new release (#11)
* prepare release

* fix conflict version and prepare release

* clean up and prepare release

* test branch before merging

* fix lint format

* refactor src

* refactor src

* refactor src

* add testcase

* make hydrogen inference faster

* add a limit for hydrogen inference; tested and working well on 200k data points

* enhance its_extraction

* update changelog

* update readme.md

* fix lint
TieuLongPhan authored Dec 9, 2024
1 parent af0dc37 commit d4935a6
Showing 36 changed files with 1,590 additions and 771 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -15,3 +15,5 @@ Data/Temp/Benchmark/Complete/*
Data/Temp/Benchmark/Hier/*
Data/Temp/Benchmark/Raw/*
*.ipynb
*backup
bug.py
44 changes: 44 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,44 @@
# Changelog

All notable changes to this project will be documented in this file.

## [0.0.6] - 2024-12-09

### Added
- **New feature**: Added `ITSArbitrary` to generate all possible ITS graphs from an AAM (Atom-Atom Mapping).
  - **Note**: This feature is still under development and may be slow or run into memory issues when the number of combinations exceeds 8! (i.e., 40,320).

- **Dependencies split**: Installation is now split into a lightweight option and a full option:
  - `pip install syntemp`: Installs the basic version (without Atom-Atom Mapping tools).
  - `pip install syntemp[all]`: Installs the full set of dependencies, including tools for Atom-Atom Mapping (`rxnmapper`, `localmapper`, and `graphormermapper`).
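
To see which of the two installs is active, one option is to probe for the optional mapper packages. The sketch below is illustrative and not part of `syntemp`. The helper name `available_mappers` is hypothetical, and it assumes each tool's import name equals the name listed above, which may not hold for every mapper.

```python
from importlib.util import find_spec


def available_mappers(candidates=("rxnmapper", "localmapper", "graphormermapper")):
    """Return the candidate module names that can be imported here.

    Assumption: each tool's import name matches the name in the changelog.
    """
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    found = available_mappers()
    print("Atom-Atom Mapping tools found:", found or "none (basic install)")
```

An empty list suggests the basic `pip install syntemp` variant, under the naming assumption stated above.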

### Changed
- **Enhanced rule clustering**: Rule clustering now supports batch processing, which mitigates the combinatorial explosion problem.
  - **Note**: This change is not yet integrated with Hierarchical Clustering.

- **Improved Isomorphic Filter**: Integrated a new graph signature filter, reducing the ITS clustering time from 78 minutes to just 2 minutes.

- **Hydrogen Inference Refactor**:
  - Refactored the hydrogen inference function for better readability and efficiency.
  - Integrated the graph signature to reduce isomorphism checks, improving processing time by 50%.
  - Deprecated the `timeout` option (planned for removal in version 0.0.10 due to redundancy).
  - **Issue**: The process remains slow, and memory usage may explode if the number of combinations exceeds 8! (i.e., 40,320).

- **ITSExtraction refactor**: Removed redundant `deepcopy` calls, resulting in a 50% reduction in processing time.

- **Code Cleanup**:
  - Removed redundant Python functions that have been moved to the `synutility` repository.
  - Removed unused variables in the `syntemp` command-line interface (CLI).
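
The graph signature filter mentioned above follows a common pattern: compute a cheap invariant per graph, then run the expensive isomorphism test only between graphs that share it. The sketch below illustrates that pattern with plain Python structures. It is not the signature used by `syntemp`. The names `graph_signature` and `cluster_by_signature`, and the choice of invariant (element plus degree per node), are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def graph_signature(elements, edges):
    """Cheap isomorphism invariant: sorted (element, degree) pairs.

    `elements` maps node id -> element symbol; `edges` is an iterable of
    node pairs. Isomorphic graphs always share this signature, but two
    graphs with the same signature are not necessarily isomorphic.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return tuple(sorted((elements[n], degree[n]) for n in elements))


def cluster_by_signature(graphs, is_isomorphic):
    """Group graphs into classes, comparing only within a signature bucket.

    `graphs` is a list of (elements, edges) tuples; `is_isomorphic` is the
    expensive pairwise check supplied by the caller.
    """
    buckets = defaultdict(list)  # signature -> list of clusters
    for graph in graphs:
        clusters = buckets[graph_signature(*graph)]
        for cluster in clusters:
            if is_isomorphic(cluster[0], graph):  # expensive check, now rare
                cluster.append(graph)
                break
        else:
            clusters.append([graph])  # no match in this bucket: new cluster
    return [cluster for clusters in buckets.values() for cluster in clusters]
```

With networkx graphs, the same idea applies with `nx.is_isomorphic` (plus node and edge matchers) as the expensive check. Graphs whose signatures differ are never compared, which is where the time saving comes from.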

### Deprecated
- **Deprecation Warning**: The following features will be removed in version 0.0.10:
  - The `SynUtils` and `SynChemistry` modules, which have been moved to the `synutility` repository.
  - `ITSConstruction`, which has also been moved to `synutility`.

### Fixed
- **Memory Usage & Performance**: Improved memory management and processing performance in several functions, especially in hydrogen inference and ITS extraction.
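
The 8! (40,320) threshold quoted in this changelog suggests a simple guard: count the combinations before enumerating them. The sketch below shows one way to do that for permutations. Whether `syntemp` counts or limits combinations this way is an assumption, and `bounded_permutations` is a hypothetical name.

```python
import math
from itertools import permutations

MAX_COMBINATIONS = math.factorial(8)  # 40,320, the threshold from the changelog


def bounded_permutations(items, limit=MAX_COMBINATIONS):
    """Yield permutations of `items`, or nothing if there are more than `limit`."""
    items = list(items)
    if math.factorial(len(items)) > limit:
        return  # too many combinations: skip rather than exhaust memory
    yield from permutations(items)
```

Because the function is a generator, callers that stop early never materialise the full set, which also helps keep memory bounded.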

### Security
- No security updates in this release.

---
Binary file added Data/Testcase/hydrogen_test.pkl.gz
Binary file not shown.
5 changes: 4 additions & 1 deletion README.md
@@ -82,6 +82,10 @@ If you want to run ensemble AAMs
```
pip install syntemp
```
Optionally, to install the full version:
```
pip install syntemp[all]
```

4. **Verify Installation:**
After installation, you can verify that Syn Temp is correctly installed by running a simple test
@@ -116,7 +120,6 @@ If you want to run ensemble AAMs
safe_mode=False,
save_dir=None,
fix_hydrogen=True,
refinement_its=False,
)

(gml_rules, reaction_dicts, templates, hier_templates,
84 changes: 84 additions & 0 deletions Test/SynITS/test_hydrogen_utils.py
@@ -0,0 +1,84 @@
import unittest
import networkx as nx
from synutility.SynIO.data_type import load_from_pickle
from syntemp.SynITS.hydrogen_utils import (
    check_explicit_hydrogen,
    check_hcount_change,
    get_cycle_member_rings,
    get_priority,
)


class TestGraphFunctions(unittest.TestCase):

    def setUp(self):
        # Load the pickled test cases shared by the tests
        self.data = load_from_pickle("./Data/Testcase/hydrogen_test.pkl.gz")

    def test_check_explicit_hydrogen(self):
        # Test the check_explicit_hydrogen function
        # Note: explicit hydrogens usually appear only in reactants (+H2 reactions)
        count_r, hydrogen_nodes_r = check_explicit_hydrogen(
            self.data[20]["ITSGraph"][0]
        )
        self.assertEqual(count_r, 2)
        self.assertEqual(hydrogen_nodes_r, [45, 46])

    def test_check_hcount_change(self):
        # Test the check_hcount_change function
        max_change = check_hcount_change(
            self.data[20]["ITSGraph"][0], self.data[20]["ITSGraph"][0]
        )
        self.assertEqual(max_change, 2)

    def test_get_cycle_member_rings_minimal(self):
        # Test get_cycle_member_rings with 'minimal' cycles
        member_rings = get_cycle_member_rings(self.data[1]["GraphRules"][2], "minimal")
        self.assertEqual(member_rings, [4])  # A single cycle of size 4 is expected

    def test_get_priority(self):
        # Create a test graph for the tests
        self.graph = nx.Graph()
        self.graph.add_nodes_from(
            [
                (1, {"element": "H", "hcount": 2}),
                (2, {"element": "C", "hcount": 1}),
                (3, {"element": "H", "hcount": 1}),
            ]
        )
        self.graph.add_edges_from([(1, 2), (2, 3)])

        # Create a second graph with the hydrogen counts swapped
        self.prod_graph = nx.Graph()
        self.prod_graph.add_nodes_from(
            [
                (1, {"element": "H", "hcount": 1}),
                (2, {"element": "C", "hcount": 1}),
                (3, {"element": "H", "hcount": 2}),
            ]
        )
        self.prod_graph.add_edges_from([(1, 2), (2, 3)])

        # Create a more complex graph for cycle tests
        self.complex_graph = nx.Graph()
        self.complex_graph.add_edges_from(
            [
                (1, 2),
                (2, 3),
                (3, 4),
                (4, 1),  # A simple square cycle
                (3, 5),
                (5, 6),
                (6, 3),  # Another cycle
            ]
        )
        reaction_centers = [self.graph, self.prod_graph, self.complex_graph]

        # Get priority indices
        priority_indices = get_priority(reaction_centers)

        self.assertEqual(priority_indices, [0, 1])


if __name__ == "__main__":
    unittest.main()
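
For readers without the pickled test data, the behaviour that `test_check_explicit_hydrogen` asserts (a count plus a list of hydrogen node ids) can be pictured with the sketch below. It is a hypothetical stand-in, not the implementation of `check_explicit_hydrogen`. It assumes hydrogens are marked by an `element` attribute equal to `"H"`, as in the toy graphs built in `test_get_priority`, and the function name is invented for illustration.

```python
def count_explicit_hydrogen(nodes):
    """Return (count, node_ids) for nodes whose `element` attribute is "H".

    `nodes` is an iterable of (node_id, attribute_dict) pairs, the shape that
    networkx yields from `graph.nodes(data=True)`.
    """
    hydrogen_nodes = [node for node, attrs in nodes if attrs.get("element") == "H"]
    return len(hydrogen_nodes), hydrogen_nodes


if __name__ == "__main__":
    toy_nodes = [
        (44, {"element": "C", "hcount": 3}),
        (45, {"element": "H", "hcount": 0}),
        (46, {"element": "H", "hcount": 0}),
    ]
    print(count_explicit_hydrogen(toy_nodes))
```

With a networkx graph `g`, the call would be `count_explicit_hydrogen(g.nodes(data=True))`.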
30 changes: 8 additions & 22 deletions Test/SynITS/test_its_extraction.py
@@ -2,6 +2,8 @@
from syntemp.SynITS.its_extraction import ITSExtraction
from syntemp.SynITS.its_construction import ITSConstruction

from synutility.SynIO.Format.smi_to_graph import rsmi_to_graph


class TestITSExtraction(unittest.TestCase):

@@ -29,31 +31,15 @@ def setUp(self):
        ]
        self.mapper_names = ["local_mapper", "rxn_mapper", "graphormer"]

    def test_graph_from_smiles(self):
        graph = ITSExtraction.graph_from_smiles(self.smiles1)
        self.assertEqual(len(graph.nodes()), 4)
        self.assertEqual(len(graph.edges()), 3)

    def test_check_equivariant_graph(self):
        react_local_mapper, prod_local_mapper = self.mapped_smiles_list[0][
            "local_mapper"
        ].split(">>")
        G_local = ITSExtraction.graph_from_smiles(react_local_mapper)
        H_local = ITSExtraction.graph_from_smiles(prod_local_mapper)
        G_local, H_local = rsmi_to_graph(self.mapped_smiles_list[0]["local_mapper"])
        ITS_local = ITSConstruction.ITSGraph(G_local, H_local)

        react_rxn_mapper, prod_rxn_mapper = self.mapped_smiles_list[0][
            "rxn_mapper"
        ].split(">>")
        G_rxn = ITSExtraction.graph_from_smiles(react_rxn_mapper)
        H_rxn = ITSExtraction.graph_from_smiles(prod_rxn_mapper)
        G_rxn, H_rxn = rsmi_to_graph(self.mapped_smiles_list[0]["rxn_mapper"])
        ITS_rxn = ITSConstruction.ITSGraph(G_rxn, H_rxn)

        react_graphormer, prod_graphormer = self.mapped_smiles_list[0][
            "graphormer"
        ].split(">>")
        G_graphormer = ITSExtraction.graph_from_smiles(react_graphormer)
        H_graphormer = ITSExtraction.graph_from_smiles(prod_graphormer)
        G_graphormer, H_graphormer = rsmi_to_graph(
            self.mapped_smiles_list[0]["graphormer"]
        )
        ITS_graphormer = ITSConstruction.ITSGraph(G_graphormer, H_graphormer)

        classified, equivariant = ITSExtraction.check_equivariant_graph(
@@ -82,7 +68,7 @@ def test_parallel_process_smiles(self):
        self.assertIsNotNone(results[0]["GraphRules"])

        # Inequivalent AAM
        self.assertEqual(results_wrong[0]["equivariant"], 0)
        self.assertEqual(results_wrong[0]["equivariant"], -1)  # -1 means early exit

    def test_unsanitize_smiles(self):
        test_2 = {
161 changes: 96 additions & 65 deletions Test/SynITS/test_its_hadjuster.py
@@ -1,65 +1,96 @@
# import unittest
# import networkx as nx
# from SynTemp.SynITS.its_hadjuster import ITSHAdjuster


# class TestITSHAdjuster(unittest.TestCase):

# def create_mock_graph(self, hcounts: dict) -> nx.Graph:
# """Utility function to create a mock graph with specified
# hydrogen counts for nodes."""
# graph = nx.Graph()
# for node_id, hcount in hcounts.items():
# graph.add_node(node_id, hcount=hcount)
# return graph

# def test_check_hcount_change(self):
# # Mock reactant and product graphs with specified hydrogen counts
# react_graph = self.create_mock_graph({1: 1, 2: 2})
# prod_graph = self.create_mock_graph({1: 0, 2: 3})

# # Expected: one hydrogen formation (node 1) and one hydrogen break (node 2)
# max_hydrogen_change = ITSHAdjuster.check_hcount_change(react_graph, prod_graph)
# self.assertEqual(max_hydrogen_change, 1)

# def test_add_hydrogen_nodes(self):
# # Mock reactant and product graphs with specified hydrogen counts
# react_graph = self.create_mock_graph({1: 1})
# prod_graph = self.create_mock_graph({1: 0})

# # Add hydrogen nodes to reactant and product graphs
# updated_react_graph, _ = ITSHAdjuster.add_hydrogen_nodes(
# react_graph, prod_graph
# )

# # Verify that hydrogen nodes have been added correctly
# self.assertIn(
# max(updated_react_graph.nodes), updated_react_graph.nodes
# ) # Hydrogen node added to reactant graph
# self.assertEqual(
# updated_react_graph.nodes[max(updated_react_graph.nodes)]["element"], "H"
# ) # Check element of added node

# def test_add_hydrogen_nodes_multiple(self):
# # Mock reactant and product graphs with specified hydrogen counts
# react_graph = self.create_mock_graph({1: 2, 2: 1})
# prod_graph = self.create_mock_graph({1: 0, 2: 2})

# # Generate updated graph pairs with multiple hydrogen nodes added
# updated_graph_pairs = ITSHAdjuster.add_hydrogen_nodes_multiple(
# react_graph, prod_graph
# )

# # Verify that multiple updated graph pairs are generated
# self.assertTrue(len(updated_graph_pairs) > 1) # Multiple permutations generated
# for react_graph, prod_graph in updated_graph_pairs:
# self.assertIn(
# max(react_graph.nodes), react_graph.nodes
# ) # Hydrogen node added to reactant graph
# self.assertIn(
# max(prod_graph.nodes), prod_graph.nodes
# ) # Hydrogen node added to product graph


# if __name__ == "__main__":
# unittest.main()
import unittest
import networkx as nx
from copy import deepcopy
from synutility.SynIO.data_type import load_from_pickle
from syntemp.SynITS.its_hadjuster import ITSHAdjuster


class TestITSHAdjuster(unittest.TestCase):

    def setUp(self):
        """Setup before each test."""
        # Load the sample graphs
        self.data = load_from_pickle("./Data/Testcase/hydrogen_test.pkl.gz")

    def test_process_single_graph_data_success(self):
        """Test process_single_graph_data on a record that can be processed."""
        processed_data = ITSHAdjuster.process_single_graph_data(
            self.data[0], "ITSGraph"
        )
        for value in processed_data["ITSGraph"]:
            self.assertIsInstance(value, nx.Graph)
        for value in processed_data["GraphRules"]:
            self.assertIsInstance(value, nx.Graph)

    def test_process_single_graph_data_fail(self):
        """Test that process_single_graph_data returns None entries for a failing record."""
        processed_data = ITSHAdjuster.process_single_graph_data(
            self.data[16], "ITSGraph"
        )
        self.assertIsNone(processed_data["ITSGraph"])
        self.assertIsNone(processed_data["GraphRules"])

    def test_process_single_graph_data_empty_graph(self):
        """Test that an empty graph results in None for ITSGraph and GraphRules."""
        empty_graph_data = {
            "ITSGraph": [None, None, None],
            "GraphRules": [None, None, None],
        }

        processed_data = ITSHAdjuster.process_single_graph_data(
            empty_graph_data, "ITSGraph"
        )

        # Ensure both entries are None, as expected for an empty graph
        self.assertIsNone(processed_data["ITSGraph"])
        self.assertIsNone(processed_data["GraphRules"])

    def test_process_single_graph_data_safe(self):
        """Test process_single_graph_data_safe with a timeout too short to finish."""
        processed_data = ITSHAdjuster.process_single_graph_data_safe(
            self.data[0], "ITSGraph", job_timeout=0.0001
        )
        self.assertIsNone(processed_data["ITSGraph"])
        self.assertIsNone(processed_data["GraphRules"])

    def test_process_graph_data_parallel(self):
        """Test the process_graph_data_parallel method."""
        result = ITSHAdjuster().process_graph_data_parallel(
            self.data, "ITSGraph", n_jobs=1, verbose=0, get_priority_graph=True
        )
        result = [value for value in result if value["ITSGraph"]]
        # Count the records that still have an ITSGraph after processing
        self.assertEqual(len(result), 48)

    def test_process_graph_data_parallel_safe(self):
        """Test process_graph_data_parallel in safe mode with a tiny timeout."""
        result = ITSHAdjuster().process_graph_data_parallel(
            self.data,
            "ITSGraph",
            n_jobs=1,
            verbose=0,
            get_priority_graph=True,
            safe=True,
            job_timeout=0.0001,  # such a low timeout makes every job fail
        )
        result = [value for value in result if value["ITSGraph"]]
        # Every job should have timed out, leaving no valid ITSGraph
        self.assertEqual(len(result), 0)

    def test_process_multiple_hydrogens(self):
        """Test the process_multiple_hydrogens method."""
        graphs = deepcopy(self.data[0])
        react_graph, prod_graph, _ = graphs["ITSGraph"]

        result = ITSHAdjuster.process_multiple_hydrogens(
            graphs, react_graph, prod_graph, ignore_aromaticity=False, balance_its=True
        )

        for value in result["ITSGraph"]:
            self.assertIsInstance(value, nx.Graph)
        for value in result["GraphRules"]:
            self.assertIsInstance(value, nx.Graph)


if __name__ == "__main__":
    unittest.main()
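
The "safe" tests above rely on a job that exceeds `job_timeout` coming back with `ITSGraph` and `GraphRules` set to `None`. The sketch below shows one generic way to get that behaviour with the standard library. How `ITSHAdjuster` enforces its timeout is not shown in this diff, so the thread-based approach and the name `run_with_timeout` are assumptions for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(func, record, job_timeout, keys=("ITSGraph", "GraphRules")):
    """Run `func(record)`; on timeout return a copy of `record` with `keys` set to None."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, record)
    try:
        return future.result(timeout=job_timeout)
    except FutureTimeout:
        # Mirror the shape the tests expect from a failed job
        return {**record, **{key: None for key in keys}}
    finally:
        executor.shutdown(wait=False)  # do not block on the abandoned worker


if __name__ == "__main__":
    def slow_job(record):
        time.sleep(0.2)  # stands in for an expensive hydrogen-inference step
        return record

    print(run_with_timeout(slow_job, {"ITSGraph": [1], "GraphRules": [2]}, 0.01))
```

A thread cannot be killed once started, so the abandoned job keeps running in the background until it finishes; a process-based pool would be needed to actually stop it.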