Prepare new release (#11)

* prepare release
* fix conflict version and prepare release
* clean up and prepare release
* test branch before merging
* fix lint format
* refactor src
* refactor src
* refactor src
* add testcase
* speed up hydrogen inference
* add limit for hydrogen inference; tested to work well on 200k data points
* enhance its_extraction
* update changelog
* update readme.md
* fix lint
TieuLongPhan · Dec 9, 2024 · d4935a6
1 parent af0dc37
commit d4935a6
Showing 36 changed files with 1,590 additions and 771 deletions.
diff --git a/.gitignore b/.gitignore
@@ -15,3 +15,5 @@ Data/Temp/Benchmark/Complete/*
 Data/Temp/Benchmark/Hier/*
 Data/Temp/Benchmark/Raw/*
 *.ipynb
+*backup
+bug.py
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,44 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.  
+
+## [0.0.6] - 2024-12-09
+
+### Added
+- **New feature**: Added `ITSArbitrary` to generate all possible ITS Graphs from AAM (Atom-Atom Mapping).  
+  - **Note**: This feature is still under development and may experience slow performance and potential memory issues when the number of combinations exceeds 8! (i.e., 40,320).
+
+- **Dependencies split**: Separate lightweight installation options:
+  - `pip install syntemp`: Installs the basic version (without Atom-Atom Mapping tools).
+  - `pip install syntemp[all]`: Installs the full set of dependencies, including tools for Atom-Atom Mapping (`rxnmapper`, `localmapper`, and `graphormermapper`).
+
+### Changed
+- **Enhanced rule clustering**: Improved rule clustering functionality to handle batch processing and mitigate the combinatorial explosion problem.
+  - **Note**: This change does not yet integrate with Hierarchical Clustering.
+
+- **Improved Isomorphic Filter**: Integrated a new graph signature filter, reducing the ITS clustering time from 78 minutes to just 2 minutes.
+
+- **Hydrogen Inference Refactor**: 
+  - Refactored the hydrogen inference function for better readability and efficiency.
+  - Integrated graph signature to reduce isomorphism checks, cutting processing time by 50%.
+  - Deprecated the `timeout` option as redundant; it is planned for removal in version 0.0.10.
+  - **Issue**: The process remains slow, and memory use may explode if the number of combinations exceeds 8! (i.e., 40,320).
+
+- **ITSExtraction refactor**: Removed redundant `deepcopy` calls, resulting in a 50% reduction in processing time.
+
+- **Code Cleanup**: 
+  - Removed redundant Python functions, which have now been moved to the `synutility` repository.
+  - Removed unused variables in the `syntemp` command-line interface (CLI).
+
+### Deprecated
+- **Deprecation Warning**: The following modules have moved to the `synutility` repository and will be removed from this package in version 0.0.10:
+  - `SynUtils` and `SynChemistry`.
+  - `ITSConstruction`.
+
+### Fixed
+- **Memory Usage & Performance**: Improved memory management and processing performance in several functions, especially in hydrogen inference and ITS extraction.
+
+### Security
+- No security updates in this release.
+
+---
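
The graph-signature filter described in the changelog can be pictured with a minimal sketch. The project's actual signature is not shown in this diff, so the example below assumes networkx's Weisfeiler-Lehman hash as a stand-in; `cluster_by_isomorphism` and `path` are hypothetical names introduced only for illustration. The idea is that isomorphic graphs always share a signature, so the expensive isomorphism check only has to run inside each signature bucket.

```python
from collections import defaultdict

import networkx as nx


def cluster_by_isomorphism(graphs, node_attr="element"):
    """Group graph indices into isomorphism classes (hypothetical helper).

    A cheap signature (here a Weisfeiler-Lehman hash, an assumption) buckets
    the graphs first; exact isomorphism is only tested within a bucket.
    """
    buckets = defaultdict(list)
    for idx, graph in enumerate(graphs):
        signature = nx.weisfeiler_lehman_graph_hash(graph, node_attr=node_attr)
        buckets[signature].append(idx)

    node_match = nx.isomorphism.categorical_node_match(node_attr, None)
    clusters = []
    for indices in buckets.values():
        bucket_clusters = []
        for idx in indices:
            for cluster in bucket_clusters:
                # Compare against one representative of each existing cluster.
                if nx.is_isomorphic(
                    graphs[cluster[0]], graphs[idx], node_match=node_match
                ):
                    cluster.append(idx)
                    break
            else:
                bucket_clusters.append([idx])
        clusters.extend(bucket_clusters)
    return clusters


def path(elements):
    """Build a linear chain whose nodes carry an 'element' label."""
    graph = nx.Graph()
    for i, element in enumerate(elements):
        graph.add_node(i, element=element)
    graph.add_edges_from((i, i + 1) for i in range(len(elements) - 1))
    return graph


graphs = [path("CCO"), path("OCC"), path("CNC")]
clusters = cluster_by_isomorphism(graphs)
```

Graphs in different buckets are never compared pairwise, which is where a saving of the kind reported in the changelog would come from; a hash collision only costs an extra exact check and cannot merge non-isomorphic graphs.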
diff --git a/Data/Testcase/hydrogen_test.pkl.gz b/Data/Testcase/hydrogen_test.pkl.gz
diff --git a/README.md b/README.md
@@ -82,6 +82,10 @@ If you want to run ensemble AAMs
   ```
   pip install syntemp
   ```
+  Optionally, install the full version:
+  ```
+  pip install syntemp[all]
+  ```
 
 4. **Verify Installation:**
   After installation, you can verify that Syn Temp is correctly installed by running a simple test
@@ -116,7 +120,6 @@ If you want to run ensemble AAMs
       safe_mode=False,
       save_dir=None,
       fix_hydrogen=True,
-      refinement_its=False,
   )
 
   (gml_rules, reaction_dicts, templates, hier_templates,

diff --git a/Test/SynITS/test_hydrogen_utils.py b/Test/SynITS/test_hydrogen_utils.py
@@ -0,0 +1,84 @@
+import unittest
+import networkx as nx
+from synutility.SynIO.data_type import load_from_pickle
+from syntemp.SynITS.hydrogen_utils import (
+    check_explicit_hydrogen,
+    check_hcount_change,
+    get_cycle_member_rings,
+    get_priority,
+)
+
+
+class TestGraphFunctions(unittest.TestCase):
+
+    def setUp(self):
+        # Load the pickled test fixtures
+        self.data = load_from_pickle("./Data/Testcase/hydrogen_test.pkl.gz")
+
+    def test_check_explicit_hydrogen(self):
+        # Test the check_explicit_hydrogen function
+        # Note: explicit hydrogens usually appear only in reactants (+H2 reactions)
+        count_r, hydrogen_nodes_r = check_explicit_hydrogen(
+            self.data[20]["ITSGraph"][0]
+        )
+        self.assertEqual(count_r, 2)
+        self.assertEqual(hydrogen_nodes_r, [45, 46])
+
+    def test_check_hcount_change(self):
+        # Test the check_hcount_change function
+        max_change = check_hcount_change(
+            self.data[20]["ITSGraph"][0], self.data[20]["ITSGraph"][0]
+        )
+        self.assertEqual(max_change, 2)
+
+    def test_get_cycle_member_rings_minimal(self):
+        # Test get_cycle_member_rings with 'minimal' cycles
+        member_rings = get_cycle_member_rings(self.data[1]["GraphRules"][2], "minimal")
+        self.assertEqual(member_rings, [4])  # Expect a single minimal ring of size 4
+
+    def test_get_priority(self):
+        # Create a test graph for the tests
+        self.graph = nx.Graph()
+        self.graph.add_nodes_from(
+            [
+                (1, {"element": "H", "hcount": 2}),
+                (2, {"element": "C", "hcount": 1}),
+                (3, {"element": "H", "hcount": 1}),
+            ]
+        )
+        self.graph.add_edges_from([(1, 2), (2, 3)])
+
+        # Create a second graph with the terminal hydrogen counts swapped
+        self.prod_graph = nx.Graph()
+        self.prod_graph.add_nodes_from(
+            [
+                (1, {"element": "H", "hcount": 1}),
+                (2, {"element": "C", "hcount": 1}),
+                (3, {"element": "H", "hcount": 2}),
+            ]
+        )
+        self.prod_graph.add_edges_from([(1, 2), (2, 3)])
+
+        # Create a more complex graph for cycle tests
+        self.complex_graph = nx.Graph()
+        self.complex_graph.add_edges_from(
+            [
+                (1, 2),
+                (2, 3),
+                (3, 4),
+                (4, 1),  # A simple square cycle
+                (3, 5),
+                (5, 6),
+                (6, 3),  # Another cycle
+            ]
+        )
+        reaction_centers = [self.graph, self.prod_graph, self.complex_graph]
+
+        # Get priority indices
+        priority_indices = get_priority(reaction_centers)
+
+        self.assertEqual(priority_indices, [0, 1])
+
+
+if __name__ == "__main__":
+    unittest.main()
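
For readers without the pickled fixture, the return shape asserted in `test_check_explicit_hydrogen` (a count plus a list of node ids) can be illustrated with a small sketch. `count_explicit_hydrogen` below is a hypothetical stand-in written for this note, not the `check_explicit_hydrogen` implementation from `syntemp`; it assumes explicit hydrogens are nodes whose `element` attribute is `"H"`, as in the graphs built in the test above.

```python
import networkx as nx


def count_explicit_hydrogen(graph):
    """Return (count, sorted node ids) of nodes labelled as hydrogen.

    Hypothetical stand-in: assumes the 'element' node attribute marks atoms.
    """
    hydrogen_nodes = sorted(
        node
        for node, attrs in graph.nodes(data=True)
        if attrs.get("element") == "H"
    )
    return len(hydrogen_nodes), hydrogen_nodes


# A toy reactant graph: a C-C fragment plus an explicit H2 molecule.
reactant = nx.Graph()
reactant.add_node(1, element="C", hcount=3)
reactant.add_node(2, element="C", hcount=3)
reactant.add_node(45, element="H", hcount=0)
reactant.add_node(46, element="H", hcount=0)
reactant.add_edges_from([(1, 2), (45, 46)])

count, nodes = count_explicit_hydrogen(reactant)
```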
diff --git a/Test/SynITS/test_its_extraction.py b/Test/SynITS/test_its_extraction.py
@@ -2,6 +2,8 @@
 from syntemp.SynITS.its_extraction import ITSExtraction
 from syntemp.SynITS.its_construction import ITSConstruction
 
+from synutility.SynIO.Format.smi_to_graph import rsmi_to_graph
+
 
 class TestITSExtraction(unittest.TestCase):
 
@@ -29,31 +31,15 @@ def setUp(self):
         ]
         self.mapper_names = ["local_mapper", "rxn_mapper", "graphormer"]
 
-    def test_graph_from_smiles(self):
-        graph = ITSExtraction.graph_from_smiles(self.smiles1)
-        self.assertEqual(len(graph.nodes()), 4)
-        self.assertEqual(len(graph.edges()), 3)
-
     def test_check_equivariant_graph(self):
-        react_local_mapper, prod_local_mapper = self.mapped_smiles_list[0][
-            "local_mapper"
-        ].split(">>")
-        G_local = ITSExtraction.graph_from_smiles(react_local_mapper)
-        H_local = ITSExtraction.graph_from_smiles(prod_local_mapper)
+        G_local, H_local = rsmi_to_graph(self.mapped_smiles_list[0]["local_mapper"])
         ITS_local = ITSConstruction.ITSGraph(G_local, H_local)
-
-        react_rxn_mapper, prod_rxn_mapper = self.mapped_smiles_list[0][
-            "rxn_mapper"
-        ].split(">>")
-        G_rxn = ITSExtraction.graph_from_smiles(react_rxn_mapper)
-        H_rxn = ITSExtraction.graph_from_smiles(prod_rxn_mapper)
+        G_rxn, H_rxn = rsmi_to_graph(self.mapped_smiles_list[0]["rxn_mapper"])
         ITS_rxn = ITSConstruction.ITSGraph(G_rxn, H_rxn)
 
-        react_graphormer, prod_graphormer = self.mapped_smiles_list[0][
-            "graphormer"
-        ].split(">>")
-        G_graphormer = ITSExtraction.graph_from_smiles(react_graphormer)
-        H_graphormer = ITSExtraction.graph_from_smiles(prod_graphormer)
+        G_graphormer, H_graphormer = rsmi_to_graph(
+            self.mapped_smiles_list[0]["graphormer"]
+        )
         ITS_graphormer = ITSConstruction.ITSGraph(G_graphormer, H_graphormer)
 
         classified, equivariant = ITSExtraction.check_equivariant_graph(
@@ -82,7 +68,7 @@ def test_parallel_process_smiles(self):
         self.assertIsNotNone(results[0]["GraphRules"])
 
         # Inequivalent AAM
-        self.assertEqual(results_wrong[0]["equivariant"], 0)
+        self.assertEqual(results_wrong[0]["equivariant"], -1)  # -1 means early exit
 
     def test_unsanitize_smiles(self):
         test_2 = {

diff --git a/Test/SynITS/test_its_hadjuster.py b/Test/SynITS/test_its_hadjuster.py
@@ -1,65 +1,96 @@
-# import unittest
-# import networkx as nx
-# from SynTemp.SynITS.its_hadjuster import ITSHAdjuster
-
-
-# class TestITSHAdjuster(unittest.TestCase):
-
-#     def create_mock_graph(self, hcounts: dict) -> nx.Graph:
-#         """Utility function to create a mock graph with specified
-#         hydrogen counts for nodes."""
-#         graph = nx.Graph()
-#         for node_id, hcount in hcounts.items():
-#             graph.add_node(node_id, hcount=hcount)
-#         return graph
-
-#     def test_check_hcount_change(self):
-#         # Mock reactant and product graphs with specified hydrogen counts
-#         react_graph = self.create_mock_graph({1: 1, 2: 2})
-#         prod_graph = self.create_mock_graph({1: 0, 2: 3})
-
-#         # Expected: one hydrogen formation (node 1) and one hydrogen break (node 2)
-#         max_hydrogen_change = ITSHAdjuster.check_hcount_change(react_graph, prod_graph)
-#         self.assertEqual(max_hydrogen_change, 1)
-
-#     def test_add_hydrogen_nodes(self):
-#         # Mock reactant and product graphs with specified hydrogen counts
-#         react_graph = self.create_mock_graph({1: 1})
-#         prod_graph = self.create_mock_graph({1: 0})
-
-#         # Add hydrogen nodes to reactant and product graphs
-#         updated_react_graph, _ = ITSHAdjuster.add_hydrogen_nodes(
-#             react_graph, prod_graph
-#         )
-
-#         # Verify that hydrogen nodes have been added correctly
-#         self.assertIn(
-#             max(updated_react_graph.nodes), updated_react_graph.nodes
-#         )  # Hydrogen node added to reactant graph
-#         self.assertEqual(
-#             updated_react_graph.nodes[max(updated_react_graph.nodes)]["element"], "H"
-# )  # Check element of added node
-
-# def test_add_hydrogen_nodes_multiple(self):
-#     # Mock reactant and product graphs with specified hydrogen counts
-#     react_graph = self.create_mock_graph({1: 2, 2: 1})
-#     prod_graph = self.create_mock_graph({1: 0, 2: 2})
-
-#     # Generate updated graph pairs with multiple hydrogen nodes added
-#     updated_graph_pairs = ITSHAdjuster.add_hydrogen_nodes_multiple(
-#         react_graph, prod_graph
-#     )
-
-#     # Verify that multiple updated graph pairs are generated
-#     self.assertTrue(len(updated_graph_pairs) > 1)  # Multiple permutations generated
-#     for react_graph, prod_graph in updated_graph_pairs:
-#         self.assertIn(
-#             max(react_graph.nodes), react_graph.nodes
-#         )  # Hydrogen node added to reactant graph
-#         self.assertIn(
-#             max(prod_graph.nodes), prod_graph.nodes
-#         )  # Hydrogen node added to product graph
-
-
-# if __name__ == "__main__":
-#     unittest.main()
+import unittest
+import networkx as nx
+from copy import deepcopy
+from synutility.SynIO.data_type import load_from_pickle
+from syntemp.SynITS.its_hadjuster import ITSHAdjuster
+
+
+class TestITSHAdjuster(unittest.TestCase):
+
+    def setUp(self):
+        """Setup before each test."""
+        # Create sample graphs
+        self.data = load_from_pickle("./Data/Testcase/hydrogen_test.pkl.gz")
+
+    def test_process_single_graph_data_success(self):
+        """Test the process_single_graph_data method."""
+        processed_data = ITSHAdjuster.process_single_graph_data(
+            self.data[0], "ITSGraph"
+        )
+        for value in processed_data["ITSGraph"]:
+            self.assertIsInstance(value, nx.Graph)
+        for value in processed_data["GraphRules"]:
+            self.assertIsInstance(value, nx.Graph)
+
+    def test_process_single_graph_data_fail(self):
+        """Test that process_single_graph_data returns None fields on failure."""
+        processed_data = ITSHAdjuster.process_single_graph_data(
+            self.data[16], "ITSGraph"
+        )
+        self.assertIsNone(processed_data["ITSGraph"])
+        self.assertIsNone(processed_data["GraphRules"])
+
+    def test_process_single_graph_data_empty_graph(self):
+        """Test that an input without graphs yields None for ITSGraph and GraphRules."""
+        empty_graph_data = {
+            "ITSGraph": [None, None, None],
+            "GraphRules": [None, None, None],
+        }
+
+        processed_data = ITSHAdjuster.process_single_graph_data(
+            empty_graph_data, "ITSGraph"
+        )
+
+        # Both fields should be None when the input contains no graphs
+        self.assertIsNone(processed_data["ITSGraph"])
+        self.assertIsNone(processed_data["GraphRules"])
+
+    def test_process_single_graph_data_safe(self):
+        """Test that process_single_graph_data_safe returns None fields on timeout."""
+        processed_data = ITSHAdjuster.process_single_graph_data_safe(
+            self.data[0], "ITSGraph", job_timeout=0.0001
+        )
+        self.assertIsNone(processed_data["ITSGraph"])
+        self.assertIsNone(processed_data["GraphRules"])
+
+    def test_process_graph_data_parallel(self):
+        """Test the process_graph_data_parallel method."""
+        result = ITSHAdjuster().process_graph_data_parallel(
+            self.data, "ITSGraph", n_jobs=1, verbose=0, get_priority_graph=True
+        )
+        result = [value for value in result if value["ITSGraph"]]
+        # Count the records that still carry an ITS graph after processing
+        self.assertEqual(len(result), 48)
+
+    def test_process_graph_data_parallel_safe(self):
+        """Test process_graph_data_parallel in safe mode with a very short timeout."""
+        result = ITSHAdjuster().process_graph_data_parallel(
+            self.data,
+            "ITSGraph",
+            n_jobs=1,
+            verbose=0,
+            get_priority_graph=True,
+            safe=True,
+            job_timeout=0.0001,  # such a short timeout makes every job fail
+        )
+        result = [value for value in result if value["ITSGraph"]]
+        # No record should keep an ITS graph when every job times out
+        self.assertEqual(len(result), 0)
+
+    def test_process_multiple_hydrogens(self):
+        """Test the process_multiple_hydrogens method."""
+        graphs = deepcopy(self.data[0])
+        react_graph, prod_graph, _ = graphs["ITSGraph"]
+
+        result = ITSHAdjuster.process_multiple_hydrogens(
+            graphs, react_graph, prod_graph, ignore_aromaticity=False, balance_its=True
+        )
+
+        for value in result["ITSGraph"]:
+            self.assertIsInstance(value, nx.Graph)
+        for value in result["GraphRules"]:
+            self.assertIsInstance(value, nx.Graph)
+
+
+if __name__ == "__main__":
+    unittest.main()
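
The safe-mode tests above rely on a job that exceeds `job_timeout` coming back with `None` fields instead of raising. A generic way to get that behaviour is sketched below using only the standard library; it is not the mechanism `ITSHAdjuster` uses (that code is not part of this diff), and `run_with_timeout`, `slow_job`, and `EMPTY` are hypothetical names for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Fallback record mirroring the None fields asserted in the tests above.
EMPTY = {"ITSGraph": None, "GraphRules": None}


def run_with_timeout(func, args=(), job_timeout=1.0, fallback=None):
    """Run func(*args); return fallback if it exceeds job_timeout seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=job_timeout)
    except FutureTimeout:
        return fallback
    finally:
        # A running thread cannot be killed; stop waiting and let it finish.
        executor.shutdown(wait=False)


def slow_job(record):
    """Pretend to do expensive work on one record."""
    time.sleep(0.2)
    return {"ITSGraph": "done", "GraphRules": "done"}


timed_out = run_with_timeout(slow_job, ({},), job_timeout=0.01, fallback=EMPTY)
finished = run_with_timeout(slow_job, ({},), job_timeout=2.0, fallback=EMPTY)
```

A thread-based timeout only stops waiting; the work itself keeps running in the background, so a process-based approach may be preferable when jobs must actually be terminated.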