{"id":260,"date":"2024-04-10T15:45:38","date_gmt":"2024-04-10T15:45:38","guid":{"rendered":"https:\/\/citestu16.savecicadabuzz.org\/?page_id=260"},"modified":"2024-04-23T17:04:10","modified_gmt":"2024-04-23T17:04:10","slug":"call","status":"publish","type":"page","link":"https:\/\/citestu16.savecicadabuzz.org\/index.php\/call\/","title":{"rendered":"CALL"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-page\" data-elementor-id=\"260\" class=\"elementor elementor-260\">\n\t\t\t\t<div class=\"elementor-element elementor-element-8bdc34e e-flex e-con-boxed e-con e-parent\" data-id=\"8bdc34e\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-f977bca elementor-widget elementor-widget-heading\" data-id=\"f977bca\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<style>\/*! elementor - v3.21.0 - 18-04-2024 *\/\n.elementor-heading-title{padding:0;margin:0;line-height:1}.elementor-widget-heading .elementor-heading-title[class*=elementor-size-]>a{color:inherit;font-size:inherit;line-height:inherit}.elementor-widget-heading .elementor-heading-title.elementor-size-small{font-size:15px}.elementor-widget-heading .elementor-heading-title.elementor-size-medium{font-size:19px}.elementor-widget-heading .elementor-heading-title.elementor-size-large{font-size:29px}.elementor-widget-heading .elementor-heading-title.elementor-size-xl{font-size:39px}.elementor-widget-heading .elementor-heading-title.elementor-size-xxl{font-size:59px}<\/style><h2 class=\"elementor-heading-title elementor-size-default\">Is White Space Tokenization enough?<\/h2>\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-d6033d1 e-flex e-con-boxed e-con e-parent\" data-id=\"d6033d1\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div 
class=\"elementor-element elementor-element-9ea70e3 elementor-widget elementor-widget-heading\" data-id=\"9ea70e3\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Example Sentences Used<\/h2>\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6a79a04 elementor-widget elementor-widget-text-editor\" data-id=\"6a79a04\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<style>\/*! elementor - v3.21.0 - 18-04-2024 *\/\n.elementor-widget-text-editor.elementor-drop-cap-view-stacked .elementor-drop-cap{background-color:#69727d;color:#fff}.elementor-widget-text-editor.elementor-drop-cap-view-framed .elementor-drop-cap{color:#69727d;border:3px solid;background-color:transparent}.elementor-widget-text-editor:not(.elementor-drop-cap-view-default) .elementor-drop-cap{margin-top:8px}.elementor-widget-text-editor:not(.elementor-drop-cap-view-default) .elementor-drop-cap-letter{width:1em;height:1em}.elementor-widget-text-editor .elementor-drop-cap{float:left;text-align:center;line-height:1;font-size:50px}.elementor-widget-text-editor .elementor-drop-cap-letter{display:inline-block}<\/style>\t\t\t\t<ol style=\"margin: 4px 0px; padding-inline-start: 1.75rem; font-family: 'Google Sans', 'Helvetica Neue', sans-serif; font-size: 16px; font-style: normal; font-weight: 400;\" data-sourcepos=\"1:1-10:69\"><li data-sourcepos=\"1:1-1:73\">I can&#8217;t believe we finished the long-distance race in under an hour!<\/li><li data-sourcepos=\"9:1-9:85\">Let&#8217;s brainstorm some birthday party ideas for our friend&#8217;s upcoming celebration.<\/li><li data-sourcepos=\"3:1-3:102\">Sarah accidentally left her homework at home, so she&#8217;ll need to ask the teacher for an extension.<\/li><li data-sourcepos=\"6:1-6:73\">Don&#8217;t forget to 
water the houseplants; they need moisture to survive.<\/li><li data-sourcepos=\"6:1-6:73\"><span style=\"white-space-collapse: preserve;\">Despite the malfunctioning GPS sending us miles off course, we&#8217;ll never forget the breathtaking mountain scenery we stumbled upon!<\/span><\/li><li data-sourcepos=\"6:1-6:73\"><span style=\"white-space-collapse: preserve;\">She couldn&#8217;t resist the temptation of grabbing a double-scoop ice cream cone before heading to the beach.<br \/><\/span><\/li><\/ol>\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-de53a05 e-flex e-con-boxed e-con e-parent\" data-id=\"de53a05\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t<div class=\"elementor-element elementor-element-8206919 e-con-full e-flex e-con e-child\" data-id=\"8206919\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-8d9132f elementor-widget elementor-widget-heading\" data-id=\"8d9132f\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">TreeBankWord Tokenizer<\/h2>\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-eb11736 e-con-full e-flex e-con e-child\" data-id=\"eb11736\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-d66c072 elementor-widget elementor-widget-text-editor\" data-id=\"d66c072\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<p>Had trouble deciphering the word can&#8217;t: (ca)(n&#8217;t). 
Treated all punctuation as a separate entity.<\/p>\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-a7a23de e-flex e-con-boxed e-con e-parent\" data-id=\"a7a23de\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t<div class=\"elementor-element elementor-element-126687e e-con-full e-flex e-con e-child\" data-id=\"126687e\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-80bf5a6 elementor-widget elementor-widget-heading\" data-id=\"80bf5a6\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">WordPunct Tokenizer<\/h2>\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-7d7c7bd e-con-full e-flex e-con e-child\" data-id=\"7d7c7bd\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-4365d77 elementor-widget elementor-widget-text-editor\" data-id=\"4365d77\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<p>Separated words that are connected by hyphens. Handled contractions by splitting them: (can)(&#8216;)(t). 
Also handled punctuation as a separate entity.<\/p>\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-9eac495 e-flex e-con-boxed e-con e-parent\" data-id=\"9eac495\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t<div class=\"elementor-element elementor-element-a4cf9e7 e-con-full e-flex e-con e-child\" data-id=\"a4cf9e7\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-bf19fb0 elementor-widget elementor-widget-heading\" data-id=\"bf19fb0\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">PunktWord Tokenizer<\/h2>\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-2c4d166 e-con-full e-flex e-con e-child\" data-id=\"2c4d166\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-f69238c elementor-widget elementor-widget-text-editor\" data-id=\"f69238c\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<p>Separated contractions by splitting them into two parts: (can)(&#8216;t). 
Handled punctuation as a separate entity.<\/p>\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-1b897cf e-flex e-con-boxed e-con e-parent\" data-id=\"1b897cf\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t<div class=\"elementor-element elementor-element-94c9570 e-con-full e-flex e-con e-child\" data-id=\"94c9570\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-3fb2e56 elementor-widget elementor-widget-heading\" data-id=\"3fb2e56\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">White Space Tokenizer<\/h2>\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-b9dfd62 e-con-full e-flex e-con e-child\" data-id=\"b9dfd62\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-33ba656 elementor-widget elementor-widget-text-editor\" data-id=\"33ba656\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<p>Separated all the words at the given spaces in\u00a0<span style=\"font-style: inherit; font-weight: inherit; font-family: var( --e-global-typography-text-font-family ), Sans-serif; text-align: var(--text-align); background-color: var(--ast-global-color-5);\">between the words. 
Each word contained any punctuation attached to it.<\/span><\/p>\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-d92ece2 e-flex e-con-boxed e-con e-parent\" data-id=\"d92ece2\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t<div class=\"elementor-element elementor-element-d21708b e-con-full e-flex e-con e-child\" data-id=\"d21708b\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-8165871 elementor-widget elementor-widget-heading\" data-id=\"8165871\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Pattern Tokenizer<\/h2>\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-b4fcb1c e-con-full e-flex e-con e-child\" data-id=\"b4fcb1c\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-981ea75 elementor-widget elementor-widget-text-editor\" data-id=\"981ea75\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<p>Split select contractions as: (ca)(n&#8217;t) or (do)(n&#8217;t) while others are split as: (we)(&#8216;ll). 
Punctuation was treated as a separate entity.<\/p>\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-96702c7 e-flex e-con-boxed e-con e-parent\" data-id=\"96702c7\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t<div class=\"elementor-element elementor-element-4650e89 e-con-full e-flex e-con e-child\" data-id=\"4650e89\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-c32608d elementor-widget elementor-widget-text-editor\" data-id=\"c32608d\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<p>For the small sample used, white space tokenization may be enough. However, when text is split only at white space, the result depends on whether a compound is written as separate words or hyphenated. The white space tokenizer kept &#8220;long-distance&#8221; as one token, which makes sense, but it split &#8220;ice cream&#8221; into two tokens because of the space. Splitting it may lose the meaning of the phrase, because &#8220;ice&#8221; and &#8220;cream&#8221; mean different things separately.<\/p>\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-e5eada6 e-con-full e-flex e-con e-child\" data-id=\"e5eada6\" data-element_type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-544b24c elementor-widget__width-initial elementor-widget elementor-widget-heading\" data-id=\"544b24c\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Are White Spaces 
Sufficient?<\/h2>\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Is White Space Tokenization enough? Example Sentences Used I can&#8217;t believe we finished the long-distance race in under an hour! [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"site-sidebar-layout":"no-sidebar","site-content-layout":"","ast-site-content-layout":"full-width-container","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"disabled","ast-breadcrumbs-content":"","ast-featured-img":"disabled","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"footnotes":""},"class_list":["post-260","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/citestu16.savecicadabuzz.org\/index.php\/wp-json\/wp\/v2\/pages\/260","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/citestu16.savecicadabuzz.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/citestu16.savecicadabuzz.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/citestu16.savecicadabuzz.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/citestu16.savecicadabuzz.org\/index.php\/wp-json\/wp\/v2\/comments?post=260"}],"version-history":[{"count":23,"href":"https:\/\/citestu16.savecicadabuzz.org\/index.php\/wp-json\/wp\/v2\/pages\/260\/revisions"}],"predecessor-version":[{"id":435,"href":"https:\/\/citestu16.savecicada
buzz.org\/index.php\/wp-json\/wp\/v2\/pages\/260\/revisions\/435"}],"wp:attachment":[{"href":"https:\/\/citestu16.savecicadabuzz.org\/index.php\/wp-json\/wp\/v2\/media?parent=260"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}