Is anyone good with #Rstats and #regex ?

Tags: rstats, regex, geany
11 Posts 6 Posters 0 Views
  • Proto Himbo European wrote:

    Is anyone good with #Rstats and #regex ? I'm having issues.

    strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz", "triangle 110 hertz", "144Hz, Sine waveform", "It is a hysterical idling. More vibraton than sound.", NA)

    I want to replace each string with the digits (well, the first set) found in it, if any. I tried this:

    sub("(^.*)(\\d{2,5})(.*$)", "\\2", strings)

    I get this as a result:

    [1] "50"
    [2] "70"
    [3] NA
    [4] "00"
    [5] "15"
    [6] "10"
    [7] "44"
    [8] "It is a hysterical idling. More vibraton than sound."
    [9] NA

    I expect to get the first run of digits in each string, provided it is 2 to 5 digits long. Instead, I only get 2 digits.

    Using a similar regex in #geany, just to prepare this little example, I got the expected behavior. I've updated and restarted R. I've used sub and gsub. Same result. If I specify \d{3,5} I get three digits. If I say \d{1,3} I get one digit. I always get the number of digits given by the first value in the curly brackets.

    Maybe R is just vomiting or something. But if you know of an issue with R and regex that results in this, please let me know.

    Ralph Straumann (@rastrau) wrote (#2):

    @guyjantic I don’t know if that’s an option, but stringr::str_extract() could be interesting to achieve this (if you don’t mind the dependency)? https://stringr.tidyverse.org/reference/str_extract.html #rstats
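A minimal sketch of the suggestion above, using the `\\d{2,5}` pattern from the original post. The helper `extract_first()` is hypothetical: it is a base-R stand-in built on `regexpr()` and `regmatches()`, for anyone who would rather skip the dependency. The stringr call only runs if the package is installed.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# Hypothetical helper: return the first match of `pattern` in each element,
# or NA where the element is NA or contains no match.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > 0
  out[hit] <- regmatches(x, m)  # regmatches() returns matched elements only
  out
}

base_result <- extract_first(strings, "\\d{2,5}")
print(base_result)

# The stringr call suggested above, run only if stringr is available.
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_extract(strings, "\\d{2,5}"))
}
```

Extracting the match directly avoids the `.*` groups entirely, so nothing can eat into the number.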

      Eli Roberson (he/him) wrote (#3):

      @guyjantic Is it getting confused by .*<number stuff>.*?

      . matches digits too, so the leading .* can take some of them.

      Maybe something like ^[\s ]*(\d{2,5}).*$
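To see where the digits go, here is a small sketch that prints the capture groups of the original pattern for two of the sample strings. It assumes `perl = TRUE` (PCRE) so the backtracking behaviour is well defined; the original post used the default engine and got the same visible result.

```r
x <- c("150 hertz", "between 87 and 100 hz ocillations")
pattern <- "(^.*)(\\d{2,5})(.*$)"

# regexec() reports the full match followed by each capture group.
groups <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Group 1 (the greedy `.*`) keeps every character it can, leaving group 2
# only the two-digit minimum that `{2,5}` requires.
print(groups[[1]][2:3])  # "1" "50"
print(groups[[2]][2:3])  # "between 87 and 1" "00"
```

This also explains why the number of digits returned always equals the lower bound in the curly brackets: the greedy group gives back only as much as the next group strictly needs.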

        Proto Himbo European wrote (#4):

        @rastrau I don't mind tidyverse dependencies at all. I have tried stringr::str_replace() and it gave me exactly the same results as sub() and gsub(), but I haven't tried str_extract() yet. I'll give it a shot. Thanks.

          Joris Meys wrote (#5):

          @guyjantic
          You need to make that non-greedy. Might be as easy as

          sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings)

          This makes the matches before and after "lazy", meaning they match as few characters as possible.

          Edit: I haven't tested it, as I'm on my phone right now.
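A quick check of the lazy pattern above against the sample data from the first post. This is only a sketch: `perl = TRUE` is an assumption added here so that the lazy quantifiers follow PCRE semantics.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# Lazy `.*?` gives up characters as early as possible, so `\\d{2,5}` starts
# at the first digit run and then greedily takes up to five digits.
lazy <- sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings, perl = TRUE)
print(lazy)
```

The `?` on the trailing group makes no difference to the result, because `$` forces that group to run to the end of the string either way. The string without digits does not match, so `sub()` returns it unchanged.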

            Ralph Straumann (@rastrau) wrote (#6):

            @guyjantic Since the regex seems to swallow the first digit, I suspect “.” in the first group is too generous? Maybe \D (non-digit) would be better? But I’m not a regexpert (alas).
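A sketch of the `\D` idea, assuming the leading group is simply replaced by `\\D*` and the pattern is run with `perl = TRUE`. The exact pattern below is an illustration, not something posted in the thread.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# `\\D*` can only consume non-digits, so it cannot swallow part of the number.
non_digit <- sub("^\\D*(\\d{2,5}).*$", "\\1", strings, perl = TRUE)
print(non_digit)

# Caveat: if the first digit run is shorter than two digits, `\\D*` cannot
# skip past it, so nothing matches and sub() returns the input unchanged.
short_run <- sub("^\\D*(\\d{2,5}).*$", "\\1", "5 and 100", perl = TRUE)
print(short_run)
```

On the sample data this gives the same values as the lazy approach. The two differ only in cases like the caveat above, where a lazy `.*?` would move on to "100".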

              Proto Himbo European wrote (#7):

              @rastrau Hey, that works! Thanks a ton!

                Ralph Straumann (@rastrau) wrote (#8):

                @guyjantic 🥳 Yay! You’re most welcome.

                  Proto Himbo European wrote (#9):

                  @rastrau I suspect you're more of a regexpert than I am, and your explanation seems plausible.

                    Ken Butler wrote (#10):

                    @guyjantic If you are happy with just the first number in each string, readr::parse_number() works really well.

                      Julius Mäkinen wrote (#11):

                      @guyjantic Is "87" what you want from the fourth string?

                      If so, making it non-greedy seems to work:
                      sub(".*?(\\d{2,5}).*", "\\1", strings)
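One loose end with any `sub()` approach: the string without digits comes back unchanged rather than as a missing value. A minimal sketch of one way to handle that, assuming NA is the desired result for non-matches and adding `perl = TRUE` (both are assumptions, not part of the post above):

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)
pattern <- ".*?(\\d{2,5}).*"

# sub() leaves non-matching strings untouched, so replace them with NA.
# grepl() returns FALSE for NA input, so missing values stay NA as well.
first_digits <- ifelse(grepl(pattern, strings, perl = TRUE),
                       sub(pattern, "\\1", strings, perl = TRUE),
                       NA_character_)
print(first_digits)
```

From here, `as.integer(first_digits)` would give numeric frequencies if that is the end goal.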
