Is anyone good with #Rstats and #regex?

Tags: rstats, regex, geany
11 posts, 6 posters
    Proto Himbo European, post #1:

    Is anyone good with #Rstats and #regex? I'm having issues.

    strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz", "triangle 110 hertz", "144Hz, Sine waveform", "It is a hysterical idling. More vibraton than sound.", NA)

    I want to replace each string with the digits (well, the first set) found in it, if any. I try this:

    sub("(^.*)(\\d{2,5})(.*$)", "\\2", strings)

    I get this as a result:

    [1] "50"
    [2] "70"
    [3] NA
    [4] "00"
    [5] "15"
    [6] "10"
    [7] "44"
    [8] "It is a hysterical idling. More vibraton than sound."
    [9] NA

    I expect to get all the digits of the first number in each string, as long as it is 2 to 5 digits long. Instead, I only get 2 digits.

    Using a similar regex in #geany to prepare this little example, I got the expected behavior. I've updated and restarted R. I've tried both sub() and gsub(), with the same result. If I specify \d{3,5}, I get three digits; if I use \d{1,3}, I get one digit. I always get the number of digits given by the first value in the curly brackets.

    Maybe R is just vomiting or something. But if you know of an issue with R and regex that results in this, please let me know.
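    The behavior described in this post can be checked by asking R for the capture groups directly. A minimal sketch using base R's regexec() and regmatches(), with the pattern from the post unchanged and R's default (TRE) engine: it shows the greedy (^.*) group keeping every character it can, including the leading digits, so \d{2,5} is left with only the minimum two.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz",
             "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

pattern <- "(^.*)(\\d{2,5})(.*$)"

# regexec() records where the whole match and each capture group start and end;
# regmatches() turns those positions into the matched substrings.
groups1 <- regmatches(strings[1], regexec(pattern, strings[1]))[[1]]
groups4 <- regmatches(strings[4], regexec(pattern, strings[4]))[[1]]

print(groups1)  # "150 hertz" "1" "50" " hertz"
print(groups4)  # whole string, "between 87 and 1", "00", " hz ocillations"
```

    The first group swallows "1" (and, in the fourth string, "between 87 and 1"), which is exactly the "missing" part of the output above.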

      Ralph Straumann (@rastrau), post #2:

      @guyjantic I don’t know if that’s an option, but stringr::str_extract() could do this (if you don’t mind the dependency). https://stringr.tidyverse.org/reference/str_extract.html #rstats
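      The str_extract() route works because it returns only the first match of the pattern, so the pattern no longer needs .* padding on either side. A minimal sketch: the stringr call is guarded because it assumes stringr is installed, and the base-R lines after it are an assumed dependency-free equivalent (regexpr() plus regmatches()), not something from the thread. Strings with no run of 2 to 5 digits, and NA inputs, come back as NA.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz",
             "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# stringr version (only runs if stringr is installed)
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_extract(strings, "\\d{2,5}"))
}

# Base-R equivalent: regexpr() gives the start of the first match,
# -1 when there is no match, and NA for NA input.
m <- regexpr("\\d{2,5}", strings)
first_digits <- rep(NA_character_, length(strings))
hit <- !is.na(m) & m > 0
# regmatches() returns the matched text for the hits only, in order
first_digits[hit] <- regmatches(strings, m)
print(first_digits)
```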

        Eli Roberson (he/him), post #3:

        @guyjantic Is it getting confused by .*<number stuff>.*?

        The . matches digits too.

        Maybe something like ^[\\s ]*(\\d{2,5}).*$

          Proto Himbo European, post #4:

          @rastrau I don't mind tidyverse dependencies at all. I have tried stringr::str_replace() and it gave me exactly the results of sub() and gsub(), but I haven't tried str_extract() yet. I'll give it a shot. Thanks.

            Joris Meys, post #5:

            @guyjantic
            You need to make that non-greedy. It might be as easy as

            sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings)

            This makes the matches before and after the digits "lazy", meaning they match as few characters as possible.

            Edit: I didn't test it because I'm on my phone right now.
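            A sketch of the lazy pattern from post #5, run with perl = TRUE. That argument is an assumption of this sketch: it switches sub() from R's default TRE engine to PCRE, where *? is the familiar lazy quantifier. The lazy first group now gives up the digits, so the whole first number lands in \\2. A string with no digits still fails to match, and sub() then returns it unchanged.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz",
             "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# (^.*?) and (.*?$) are lazy: they match as few characters as possible,
# leaving the greedy \d{2,5} free to take the whole first number.
lazy <- sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings, perl = TRUE)
print(lazy)
```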

              Ralph Straumann (@rastrau), post #6:

              @guyjantic Since the regex seems to swallow the first digit, I suspect the “.” in the first group is too generous. Maybe \D (non-digit) would be better? But I’m not a regexpert (alas).
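              The \D idea can be sketched as well: writing the prefix as ^\D* means it can never consume a digit, so greedy versus lazy stops mattering. As an assumption beyond the thread, this sketch also maps strings without a run of 2 to 5 digits to NA instead of leaving them unchanged, by testing for a match with grepl() first; perl = TRUE is used so both calls share PCRE semantics.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz",
             "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# ^\D* can only match non-digits, so the capture group starts at the first digit
pattern <- "^\\D*(\\d{2,5}).*$"

# grepl() is FALSE for non-matching strings and for NA input,
# so those positions get NA_character_ instead of the original text.
first_digits <- ifelse(grepl(pattern, strings, perl = TRUE),
                       sub(pattern, "\\1", strings, perl = TRUE),
                       NA_character_)
print(first_digits)
```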

                Proto Himbo European, post #7:

                @rastrau Hey, that works! Thanks a ton!

                  Ralph Straumann (@rastrau), post #8:

                  @guyjantic 🥳 Yay! You’re most welcome.

                    Proto Himbo European, post #9:

                    @rastrau I suspect you're more of a regexpert than I am, and your explanation seems plausible.

                      Ken Butler, post #10:

                      @guyjantic If you are happy with just the first number, readr::parse_number() works really well.
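                      readr::parse_number() drops non-numeric characters before and after the first number and returns a numeric vector. The sketch below guards the readr call, since it assumes readr is installed, and adds a rough base-R stand-in (hypothetical, whole numbers only: unlike parse_number() it ignores decimal points and grouping marks) that converts the first run of digits to a number and leaves NA where there is none.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz",
             "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# readr version (only runs if readr is installed); it warns about
# strings that contain no number, so the warning is suppressed here.
if (requireNamespace("readr", quietly = TRUE)) {
  print(suppressWarnings(readr::parse_number(strings)))
}

# Rough base-R stand-in: first run of digits, converted to numeric
m <- regexpr("[0-9]+", strings)
first_number <- rep(NA_real_, length(strings))
hit <- !is.na(m) & m > 0
first_number[hit] <- as.numeric(regmatches(strings, m))
print(first_number)
```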

                        Julius Mäkinen, post #11:

                        @guyjantic Is "87" what you want from the fourth string?

                        If so, making it non-greedy seems to work:
                        sub(".*?(\\d{2,5}).*", "\\1", strings)
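                        The question about "87" matters because the fourth string holds two numbers. If both were wanted rather than just the first, gregexpr() (the global counterpart of regexpr()) is one way to collect every run of 2 to 5 digits; a minimal sketch on that one string, offered as an assumption beyond what the thread asked for:

```r
text <- "between 87 and 100 hz ocillations"

# gregexpr() finds all non-overlapping matches; regmatches() returns them
# as a list with one character vector per input string.
all_numbers <- regmatches(text, gregexpr("\\d{2,5}", text))[[1]]
print(all_numbers)  # "87" "100"
```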
