Commit 2ef9d8f · Parent(s): 4fdb3ab

Update README.md
README.md CHANGED
````diff
@@ -83,18 +83,19 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
 
 ```python
 # Load pipeline
-model_name = "
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
-Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics,
-The fact that these methods have been widely used in the community
-
-
 """.replace(
     "\n", ""
 )
````
````diff
@@ -106,10 +107,9 @@ print(keyphrases)
 
 ```
 # Output
-['
-'
-'
-'semantics' 'statistics' 'text analysis' 'transformers']
 ```
 
 ## 📚 Training Dataset
````
````diff
@@ -172,7 +172,7 @@ def preprocess_fuction(all_samples_per_split):
 ```
 
 ### Postprocessing
-For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):
````
````diff
@@ -216,4 +216,4 @@ The model achieves the following results on the Inspec test set:
 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
 
 ## 🚨 Issues
-Please feel free to
````
````diff
 
 ```python
 # Load pipeline
+model_name = "ml6team/keyphrase-extraction-distilbert-inspec"
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics,
+are widely used for the extraction process. The fact that these methods have been widely used in the community
+has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+and context of a document, which is quite an improvement.
 """.replace(
     "\n", ""
 )
````
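The inference example flattens the triple-quoted text with `.replace("\n", "")`. In this diff, the lines show no trailing space. If that holds, deleting each newline glues the last word of one line to the first word of the next, so "semantics" and "and" become "semanticsand". Whether the actual README lines end with a space cannot be seen here. The following illustration is not part of the README. It contrasts that call with `" ".join(text.split())`, which splits on any run of whitespace (newlines included) and rejoins the words with single spaces.

```python
# Two lines copied from the inference example, assumed here to have no
# trailing spaces (as they appear in the diff; the real README may differ).
text = """
transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
and context of a document, which is quite an improvement.
"""

# Deleting the newlines joins adjacent lines with no separator.
joined = text.replace("\n", "")

# str.split() with no argument splits on any whitespace run (including
# newlines) and ignores leading/trailing whitespace; rejoining with " "
# keeps the words from adjacent lines apart.
normalized = " ".join(text.split())

print(joined)      # contains "semanticsand"
print(normalized)  # contains "semantics and context"
```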
````diff
 
 ```
 # Output
+['artificial intelligence', 'classical machine learning methods',
+'keyphrase extraction', 'linguistics', 'statistics',
+'text analysis']
 ```
 
 ## 📚 Training Dataset
````
````diff
 ```
 
 ### Postprocessing
+For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive Bs and Is. As last you strip the keyphrase to ensure all spaces are removed.
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):
````
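The postprocessing paragraph describes two steps: grouping consecutive B- and I-labeled tokens into keyphrases, then stripping the result. The body of the README's `concat_tokens_by_tag` is cut off in this diff. What follows is therefore only a minimal sketch of the idea under a plain BIO assumption, not the README's implementation:

- `merge_bio_tags` is a hypothetical helper.
- Its input is assumed to be parallel lists of whole words and tag strings ("B", "I", anything else treated as outside).
- The real pipeline works on the model's token-classification output instead.

Note that `str.strip()` removes only leading and trailing whitespace, not spaces inside a keyphrase.

```python
from typing import List


def merge_bio_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Hypothetical helper: group B/I-tagged words into keyphrases.

    Assumes a plain BIO scheme: "B" begins a keyphrase, "I" continues it,
    and any other tag (e.g. "O") ends it.
    """
    keyphrases: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Close the open keyphrase, stripping leading/trailing whitespace.
        if current:
            keyphrases.append(" ".join(current).strip())
            current.clear()

    for token, tag in zip(tokens, tags):
        if tag == "B":
            flush()                # a new keyphrase starts here
            current.append(token)
        elif tag == "I" and current:
            current.append(token)  # continue the open keyphrase
        else:
            flush()                # "O" ends it; an "I" with nothing open is skipped
    flush()
    return keyphrases


# Usage with made-up tags for the first sentence of the example text.
words = ["Keyphrase", "extraction", "is", "a", "technique", "in", "text", "analysis"]
labels = ["B", "I", "O", "O", "O", "O", "B", "I"]
print(merge_bio_tags(words, labels))  # ['Keyphrase extraction', 'text analysis']
```

In this sketch, a "B" always starts a new keyphrase, even directly after another "B". An "I" with no open keyphrase is dropped rather than promoted to a start. Other BIO decoders handle that case differently.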
````diff
 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
 
 ## 🚨 Issues
+Please feel free to start discussions in the Community Tab.
````