I will definitely fix bugs, but adding new functionality (especially new languages) and improving API / docs will depend on interest by the community.įixing discrepancies between NLPRule and LanguageTool behaviour will have high priority if any are found. NLPRule is currently pretty bare bones in terms of API and documentation. NLPRule is approximately 1.7x - 2.8x faster than LanguageTool. tokenize_sentence ( " She was not been here since Monday. It's not as fancy as spaCy but could be faster and had to be done anyway to apply the rules so I thought I might as well add a public API: tokens = tokenizer. NLPRule does rule + dictionary-based part-of-speech tagging and lemmatization as well as chunking with a model ported from OpenNLP. suggest_sentence also has a multi-sentence counterpart in. suggest_sentence ( " She was not been here since Monday. " ) # returns: 'He wants you to send him an email. correct ( " He wants that you send him an email. If a sentence splitter is set, you can call. load ( " en ", tokenizer, lambda texts : for text in splitter. Pro tip: You can use NNSplit for more robust sentence segmentation: from nnsplit import NNSplit splitter = NNSplit. load ( " en ", tokenizer, SplitOn ( ) ) A splitter that splits on fixed characters is included in NLPRule for convenience: from nlprule import SplitOn rules = Rules. A sentence splitter can be any function that takes a list of texts as input and returns a list of lists of sentences. If you want to correct an arbitrary text, pass a sentence_splitter at initialization. " ) # returns: 'He wants you to send him an email.'Ĭorrect_sentence expects a single sentence as input. correct_sentence ( " He wants that you send him an email. The objects will be downloaded the first time, then cached. Using Intel(R) Core(TM) i5-8600K CPU 3.60GHz Usageįrom nlprule import Tokenizer, Rules tokenizer = Tokenizer. load ( " en ", tokenizer, SplitOn ( ) ) In : % timeit rules. In : from nlprule import Tokenizer, Rules, SplitOn. NLPRule currently supports English and German. My goal with this library was creating a fast, lightweight engine to run natural language rules without having to rely on the JVM (and its speed / memory implications) and without all the extra stuff LanguageTool does such as spellchecking, n-gram based error detection, etc. # 4 16 WAS_BEEN.1 Did you mean was not or has not been? suggest ( " She was not been here since Monday. " ) # returns: 'Thanks for yours and Lucy’s help.' correct ( " Thanks for your’s and Lucy’s help. " ) # returns: 'He wants you to send him an email.' load ( " en ", tokenizer, SplitOn ( ) ) rules. from nlprule import Tokenizer, Rules, SplitOn tokenizer = Tokenizer. NLPRule is a library for rule-based grammatical error correction written in pure Rust with bindings for Python.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |