Cryptocurrency analysis and predictions using AI and big data

Discussion in 'CryptoCurrency' started by healzer, Jan 3, 2018.

  1. Topiano

    Topiano Jr. VIP

    Joined:
    Dec 3, 2015
    Messages:
    1,630
    Likes Received:
    628
    Gender:
    Male
    Home Page:
    I was about to answer this post before @healzer gave a good response. I'm a data science student and I recently worked on a project like this, though I used R rather than the Python OP is using to categorize sentiments.

    Word pairs used in the tweets, posts, etc. would help categorize each piece of scraped/gathered data as positive, negative, or neutral sentiment.
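
    As a rough illustration of the word-pair idea, here is a minimal Python sketch. The `PAIR_SCORES` lexicon and the `classify` helper are hypothetical and made up for this example; they are not taken from either project mentioned in the thread, and a real system would use a much larger lexicon or a trained model.

```python
import re

# Hypothetical toy lexicon of word pairs (bigrams) and their sentiment scores.
# These entries are illustrative assumptions only.
PAIR_SCORES = {
    ("going", "up"): 1,
    ("strong", "buy"): 1,
    ("going", "down"): -1,
    ("panic", "selling"): -1,
}


def bigrams(text):
    """Split text into lowercase words and return consecutive word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def classify(text):
    """Label text as positive, negative, or neutral from its word-pair scores."""
    score = sum(PAIR_SCORES.get(pair, 0) for pair in bigrams(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["BTC is going up, strong buy", "Everyone is panic selling"]:
        print(post, "->", classify(post))
```

    Scoring pairs rather than single words lets "going up" and "going down" land on opposite sides even though they share a word.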

    I'm getting really hooked on your analysis, brother. I believe I could learn a lot more from it.
    Good luck.
     
    • Thanks x 2
  2. iliketurtlez

    iliketurtlez Regular Member

    Joined:
    Jul 2, 2016
    Messages:
    283
    Likes Received:
    100
    Gender:
    Male
    Occupation:
    SEO lurker
    Location:
    where hatz are black
    Awesome, man. I would try to do something like this myself if I had the coding skills for it. All the best, I hope you succeed! Will follow for sure.
     
  3. theRevolt

    theRevolt Jr. VIP

    Joined:
    Jul 29, 2009
    Messages:
    1,753
    Likes Received:
    629
    Occupation:
    Click below to find out
    Location:
    CPA Money
    Home Page:
    Watson does sentiment analysis (as do a few others), so you don't have to waste valuable time on something that already has a pretty solid solution.
     
  4. theRevolt

    theRevolt Jr. VIP

    There are different platforms, one being IBM Watson (or Google NLP): you just feed it the content and you get back a sentiment score that tells you whether the content is leaning towards positive, negative, or neutral.
     
    • Thanks x 1
  5. healzer

    healzer Elite Member

    Joined:
    Jun 26, 2011
    Messages:
    2,896
    Likes Received:
    2,875
    Gender:
    Male
    Location:
    Somewhere in Europe
    Home Page:
    Thanks for the input, I will definitely look into these.
    I did not reinvent the wheel in my current solution; I used existing libraries and material as well, just different ones.
    I can however say that if Google NLP and IBM Watson are API based then they are probably not a good fit for me.
    I analyze my data in real-time, so I don't store it in a database first and then process it; I process it first and then make a selection of interesting mentions to store. Making API calls would be a big bottleneck in my system.
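
    The "process first, store only what is interesting" flow can be sketched roughly like this in Python. Everything here is an assumption for illustration: `score_mention` is a hypothetical stand-in for whatever local scoring the real system does, and the threshold is arbitrary.

```python
def score_mention(text):
    """Hypothetical local scorer: counts keyword hits, with no network call involved."""
    keywords = ("btc", "bitcoin")
    lowered = text.lower()
    return sum(lowered.count(keyword) for keyword in keywords)


def select_interesting(stream, threshold=1):
    """Score each incoming mention as it arrives and yield only those worth storing.

    Mentions below the threshold are dropped without ever being saved.
    """
    for mention in stream:
        score = score_mention(mention)
        if score >= threshold:
            yield mention, score


if __name__ == "__main__":
    incoming = iter([
        "Bitcoin just hit a new high",
        "Nice weather today",
        "BTC BTC to the moon",
    ])
    for mention, score in select_interesting(incoming):
        print(score, mention)  # in a real system this is where the database insert would go
```

    Because the scoring runs in-process, the per-mention cost stays at a function call; swapping in a remote API call at that point would add a network round trip per mention, which is the bottleneck described above.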
     
  6. SamLewis

    SamLewis Power Member

    Joined:
    Oct 25, 2012
    Messages:
    669
    Likes Received:
    203
    I think you need to give more data. You have about 20 points and 3 or so match the trend and you think it can predict something???
     
  7. Ensili

    Ensili Jr. VIP

    Joined:
    Dec 19, 2012
    Messages:
    157
    Likes Received:
    105
    Great read, definitely subbed and looking forward to what you can come up with.

    One of those threads where the road ahead is more important than actually reaching a set goal (whatever that may be for you).

    I'd like to add a few observations from doing simple market analysis, because I'd like to understand your reasoning for choosing bitcoin as a representation of the market (beyond the obvious reasons, of course). In my experience, predicting price increases (or decreases) often also depends on the stage of a coin. I noticed that this is where the volatility of the market often comes from: not so much from daily movements (due to selling and buying) or a gradual increase in interest, and often not even from whales dumping and pumping, but from a coin reaching a new stage.

    For example, Ripple was recently introduced to a wider public via major (internet) media coverage. You can see the outcome by doing a simple Google search: "Ripple" now triggers different search results, with major news websites ranking at the top, compared to, for example, RaiBlocks.
    If a coin has reached that stage you can expect the wider public to invest, and therefore a major price increase occurs (in the case of Ripple, beyond its expected market cap limit). Also, only very few coins have reached that level of societal attention yet.
    Once that happens, the results of your customer group analysis may widen by quite a bit. I am sure quite a few people have invested in bitcoin by now who are not on social media (neither active nor owning a profile), aren't online daily (investing via friends or family), frequently watch TV, etc.

    There may be a few more stages, e.g. "coin getting out of ICO", "being introduced to the 2-3 biggest exchanges", "major partnerships introduced", "tv media coverage outside of internet media". Some of them are bigger, some of them are smaller but they are often followed by larger movements in price.
    Also, depending on a coin's current stage, price movements can be weighted differently. Coins getting out of ICO see 1000% price increases more frequently than coins already at later stages.

    Bitcoin is in the furthest stage out of all cryptos right now and only accounts for about 35% of the total market. Sure, its market share might increase heavily again, but I am talking about the status quo. Hence to me it doesn't represent the overall crypto market (anymore) but rather the furthest "attention stage" the market can currently get to.

    Have you thought about implementing said stages to some degree to reflect the volatility of the whole market? Otherwise I feel like your analysis won't be able to properly predict the current market with all its different (alt-)coins in their different stages.
    Or are said stages already accounted for by your measurement of hype spikes (right Y-axis)? In that case, do you think that using these spikes you can predict the future occurrence of events that lead to such stages?

    Btw, I am not saying I am in any way 100% sure of this. I am not at all, which is why I'd like your opinion on it. To me, cryptos currently behave differently from fiat currencies, which society already knows about, so other factors influence their movements instead.

    I hope I was able to get my point across.


    Kind regards

    Ensili
     
    • Thanks x 2
  8. iliketurtlez

    iliketurtlez Regular Member

    @Ensili, awesome post, man, so much interesting information! I think it's analyzing and building advanced models like this that could really give a super edge in predicting the crypto market.
     
    • Thanks x 1
  9. ttmschine

    ttmschine Power Member

    Joined:
    Mar 27, 2013
    Messages:
    625
    Likes Received:
    348
    "There definitely seems to be a correlation between bitcoin price and hype."

    "It's not a bubble"

    o_O
     
    • Thanks x 1
  10. theRevolt

    theRevolt Jr. VIP

    Yes, they are REST API based. They are pretty quick, and when I compared them to other Python libraries the results were much better.

    Tip: sign up for Google Cloud to get $300 of credit, which you can use to test the NLP API for free for now.
     
  11. makinon

    makinon Newbie

    Joined:
    Jan 2, 2018
    Messages:
    21
    Likes Received:
    0
    Gender:
    Male
    Occupation:
    CellPhone Repair, Computer Technician
    Location:
    Blue House
    I'll keep an eye on this one. Great analysis, you're a great man.
     
  12. healzer

    healzer Elite Member

    Thanks everyone for the amazing comments!!! Love you all :)

    @neu009
    I will definitely check out Google NLP and IBM Watson in the near future.
    I would love to see how their results compare to my algorithm, so stay tuned for that. :)

    @Ensili
    I loved your post and suggestion!! :D
    Since I started this thread quite a lot of people have Skype'd/emailed me and most referred to your post.
    Each coin has its own "stage" and it's pretty important to categorize and detect its stage.
    To clarify this a bit more, and maybe you can confirm my understanding:
    e.g.: NEO coins are not talked about on major news channels such as CNN/BBC/CNBC/... but Bitcoin is, and ETH sometimes. So from these stats we can already categorize which coins are at a high-stage and which are early-stage.

    =====
    ===== Friday Dec 29 - Sunday Dec 31, 2017
    =====

    It took several hours to convert my old system into a new one but it got done.
    Meanwhile, my scrapers were gathering social mentions in the background and by Sunday evening I already had three days' worth of data.
    (Yes, I worked on New Year's Eve.)

    You may remember this graph:
    [​IMG]

    The gray dots are the hype (on this picture the dotted line was hidden).
    If you care to know, the black-rectangle area from my first post was at around "Dec 30 10:00".

    But here is what I noticed: it's not an intuitive way of viewing this data. :(
    There is too much going on and "seeing" any relationship between "social hype" and BTC price isn't easy...
    A solution is to draw a trend line: I basically take 3 sequential data-points, create a trend line for these, and repeat this until the end of the data. Finally I just combine all trend lines into one big line.
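
    A minimal Python sketch of such a piecewise trend line is shown below. It is one possible reading of the description and not the actual code behind the graphs: it assumes the groups of points do not overlap, it fits each group with an ordinary least-squares line, and the names `fit_line` and `piecewise_trend` are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # a single leftover point: use a flat line through it
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def piecewise_trend(values, window=3):
    """Fit one line per consecutive group of `window` points and join the results.

    Assumption: groups do not overlap; the last group may be shorter.
    """
    trend = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        xs = list(range(start, start + len(chunk)))
        slope, intercept = fit_line(xs, chunk)
        trend.extend(slope * x + intercept for x in xs)
    return trend


if __name__ == "__main__":
    hype = [120, 135, 128, 160, 190, 170, 150, 140]
    print(piecewise_trend(hype, window=3))
```

    A larger `window` smooths over more points per segment, so short spikes fade and longer movements stand out.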

    The graph below has dates from Dec 29 to Dec 31, 2017:
    [​IMG]

    As you can see, the blue is the trend-line for social hype.
    The black one is the trend for the avg price.
    I have drawn red shapes (in Paint) to indicate interesting zones.
    *) Remember that the two lines are plotted against different Y-axes, so whether one lies above, below, or crosses the other at a given time means nothing.
    **) Each step on the x-axis is a 30-minute interval.

    The most fascinating part is that there are many zones (especially peaks, i.e. local maxima) which look the same for BTC's price as for the hype trend.
    Let me briefly go over some of them: (remember that PRICE = BLACK ; HYPE = BLUE)

    -------------
    [​IMG]
    Here it looks like the short spike in BTC's price caused a spike in hype (but it came 60 minutes later).
    Looks like the people responded to this short increase in price.

    -------------
    [​IMG]
    The lowest left circle is a very sharp drop in hype and price.
    If I remember correctly the drop in hype was because my scraping script crashed or maybe I took it down on purpose to edit (can't remember).
    But the drop in price was real. And this drop in price caused massive hype over the next few hours as you can see.
    Once again, big hype came after a (massive) drop in price.

    -------------
    [​IMG]
    Here is a situation similar to the first one: first the price goes up, then half an hour later there is a peak in hype.

    -------------
    [​IMG]
    This one on the other hand is different from the three previous ones. Can you see why?
    It's pretty obvious, here the hype rises quicker and reaches a peak prior to price's peak. Could it be that people started buying more Bitcoins due to various mentions on social media (from friends, experts, etc...) and then shortly after the price went up? If this is true then by analyzing the current hype we "could" predict a rise in price after 15-60 minutes.
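
    One simple way to check for such a lead is to shift one series against the other and see which shift gives the highest correlation. The Python sketch below is an illustration under stated assumptions, not the method used for the graphs: `pearson` and `best_lag` are hypothetical helper names, the sample numbers are made up, and one step is taken to be one 30-minute interval.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def best_lag(hype, price, max_lag=3):
    """Compare hype[t] with price[t + lag] for lag = 0..max_lag.

    Returns (lag, correlation) for the lag with the highest correlation.
    A positive lag means hype moved first.
    """
    best = (0, pearson(hype, price))
    # Keep at least two overlapping points so a correlation can be computed.
    for lag in range(1, min(max_lag, len(hype) - 2) + 1):
        r = pearson(hype[:-lag], price[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


if __name__ == "__main__":
    # Made-up data in which price follows hype by exactly one step.
    hype = [1, 5, 2, 1, 6, 2, 1, 1]
    price = [11, 11, 15, 12, 11, 16, 12, 11]
    lag, r = best_lag(hype, price)
    print(f"best lag: {lag} step(s), correlation {r:.2f}")
```

    A high correlation at some lag only says the two curves moved alike after shifting; with a few days of data it does not show that hype caused the price move.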

    We are getting a step closer to confirming that "there is a relationship between social media and BTC's price".
    The next big question is: can we measure the media's impact on the price? If so, then we can also make near-future predictions.

    =====
    ===== Friday Dec 29, 2017 to Monday Jan 1, 2018
    =====

    Early on I also added a new scraper.
    Instead of scraping social media channels, I added 20+ big news sites (including some crypto-related ones) to analyze them in real-time.
    This was a bit harder than analyzing social media since there is no official API of course.

    What I soon realized is that news sites such as CNN, CNBC, etc... post much much less frequently than we get mentions from social media (thousands per minute that is).
    It took some time to get enough data from these, so the graph contains data from the past 24 hours (since Jan 4, 2018, 11am GMT+1):

    [​IMG]
    *) The dates and times on the graph are in the EST timezone.

    You can see that at some time intervals there are similar peaks in the news and price of BTC.
    But there are far fewer similarities compared to hype from social channels.

    That's it for today folks :)

    [​IMG]
     
    • Thanks x 5
  13. Shadexpwn

    Shadexpwn Jr. VIP

    Joined:
    Sep 12, 2010
    Messages:
    1,727
    Likes Received:
    471
    Gender:
    Male
    Occupation:
    Profile Visitor
    Location:
    In the Shade
    Based on my predictions, cryptocurrency will be brought into the real world allowing actions at retail outlets to connect with virtual signals.

    Kiosks and other teller machines made should open interaction across distant and domestic regions together which will form new existence hosted by digital infrastructure booming micro-transactions.

    No matter if something is digital or online, hybrid styles of development & living outperform expectations to broadened life and broken limits.
     
  14. healzer

    healzer Elite Member

    I am not really following. Could you explain what you mean by "connect with virtual signals"? What kind of signals are you referring to?
     
  15. Shadexpwn

    Shadexpwn Jr. VIP

    QR codes, messages and other transaction notes left for people using the currency. Thus referencing "signals" virtually.
     
    • Thanks x 1
  16. Ecodor

    Ecodor Regular Member

    Joined:
    Nov 5, 2017
    Messages:
    273
    Likes Received:
    77
    Gender:
    Male
    Location:
    localhost
    Damn, I was going to ask the same damn thing! OP, please share it with me too if you decide to tell him.

    EDIT: What Python library are you using to scrape?
     
  17. healzer

    healzer Elite Member

    There exist quite a few good open-source libraries for scraping news sites, blogs, etc... so you can do a quick google search and pick one you like.

    The social signals at this point come solely from Reddit and Twitter -- the reason is that these platforms have a streaming-API.
    With streaming I just have to "listen" to all new incoming mentions, as opposed to polling myself every X seconds.

    Hope this helps :)
     
    • Thanks x 1
  18. healzer

    healzer Elite Member

    Short update regarding previous post ( https://www.blackhatworld.com/seo/c...g-ai-and-big-data.998766/page-2#post-10705203 )

    For the past hour I was looking for ways to visualize the data more meaningfully over "longer" time periods.
    I was mainly messing around with my trendline function.
    Until now I created trend lines by creating sub-trendlines from just 3 consecutive data points.
    I added the ability to change "3" into any number (e.g. 10) and this is what I saw next...

    Do you remember this (https://i.imgur.com/34JpOqk.png) graph?
    This is its equivalent but with a 10-point trendline:
    [​IMG]
    There are a few interesting points on this graph.
    As shown on the label above, we "could" make a theoretical conclusion that goes like: "The price of BTC was falling hour by hour, people started to panic more and more, until it went up (just briefly) and the hype went down. But then it started to fall again (after 04:00) and panic went up again".

    [​IMG]
    Above: "As the price kept falling and falling, it reached a low and then went up quickly again at 14:00 -- the panic also dropped sharply during that hour. Then the price started going up slowly and panic slowly decreased as well."

    [​IMG]
    "After several hours, the price went up quickly and so did the hype. Then the price kept going up but much slower, and so the hype decreased".

    I find it very fascinating that some peaks (not all) have a story behind them. :)
     
    • Thanks x 2
  19. elavmunretea

    elavmunretea Jr. VIP

    Joined:
    May 14, 2016
    Messages:
    2,212
    Likes Received:
    2,973
    Occupation:
    bUsY nOt TiPpInG pEoPlE
    Home Page:
    This is probably one of the most interesting BHW topics I have read in a while and, as someone who has always been good at maths, I absolutely love statistical analysis.

    Funnily enough, I was actually having a conversation with someone today about how interesting machine learning is and how you could make an obscene amount of money if you crack the secret to Crypto trends.

    Some things to think about:

    Thought #1 - Are Trends Good/ Bad?

    One of the biggest issues I see with analyzing social media trends is understanding what is positive and what is negative. Solume is interesting, but not very useful at the moment. Firstly, they are only working with Reddit info. Coin-specific subs are practically cults: you rarely get negative posts about a coin in its own sub, but you often get negative posts about other coins ("look at how bad XYZ is. har har everyone buy [/r/COIN NAME]"), so looking at the social sentiment (shame they don't explain on the site what that actually means) is not very useful.

    Anyway, coming back to the larger issue of understanding what is +/- in general: it is very hard because people on social media aren't writing essays, so there's not enough text to accurately determine what they are saying. Usually, it goes [high-follower account posts] > [smaller accounts reply & discuss in comments] > [smaller accounts start to post their own posts outside of the original thread] > [even smaller accounts reply], and as you get deeper into threaded replies, people use fewer and fewer "trigger words" ("bad", "good", etc.) and are more likely arguing with someone of a different opinion. So once you clock that account [A] posted with the words [bad] & [BTC], do you then assume all their replies on that thread are them saying BTC is bad, or do you ignore them? What if they post a picture/"meme"?

    Another issue with this is that Twitter is probably the last place news breaks. It first happens in a small Telegram/whatever group with xx massive whales who then trickle that information through their various groups/ boards/ sites & then the mainstream Twitter audience picks it up. If you want to get ahead of the game, you need to be in every Telegram, Discord etc group, but that's not feasible unless you're Google's ML team who are experts at this stuff.

    An idea for you to test: Gather a list (as large as you can) of SM accounts that are big into Crypto. Plot the same graph with just their trends and see if that is a better representation of the price increase. Refine the list until you have a pool of accounts that can be used to predict the price as best as possible.

    YouTube is also massive. I'm subbed to a few channels that post coin reviews and, when they post certain ones, you almost always see a 10-40% price increase within seconds. Not because they pump/ overly promote the coins, but because people respect their opinion so much and just buy buy buy before doing their own research.

    Thought #2 - Using 1 Exchange?

    One of the first things that I didn't like about your experiment was the fact that you are only using a single exchange. Not only does exchange activity vary massively depending on the time (Korean exchanges are more active during Korean hours, etc.), but the price varies massively too. You often notice a pump coming if it starts on a specific exchange and then ripples across onto the others.

    An easy way to do this would be to do the same thing across more exchanges. Have the option to show a specific exchange or to gather averages (weight the averages by each exchange's volume, because you don't want an exchange with 0 volume messing up the chart).
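
    The volume weighting could look roughly like this in Python. The function name and the sample quotes are made up for illustration; fetching real prices and volumes from each exchange is left out.

```python
def volume_weighted_price(quotes):
    """Average price across exchanges, weighted by each exchange's traded volume.

    `quotes` is a list of (price, volume) pairs, one per exchange.
    Returns None when no exchange reports any volume.
    """
    total_volume = sum(volume for _, volume in quotes)
    if total_volume == 0:
        return None
    return sum(price * volume for price, volume in quotes) / total_volume


if __name__ == "__main__":
    # Hypothetical snapshot: (price in USD, volume in BTC) per exchange.
    quotes = [
        (14000.0, 300.0),  # busy exchange
        (15000.0, 100.0),  # smaller exchange
        (20000.0, 0.0),    # stale price with no trades: contributes nothing
    ]
    print(volume_weighted_price(quotes))
```

    With these numbers a plain average of the three prices would be pulled up by the exchange with no trades, while the weighted average ignores it entirely.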

    Thought #3 - Volume

    You should not only look at how the price increases, but also how the volume of trading increases, as well as looking at what % are sellers and what % are buyers (Not sure if you can see that, actually)

    Thought #4 - Don't Just Monitor BTC

    Regardless of whether you want to predict the price of multiple coins at once, you need to track more than just Bitcoin. Say there is a sudden spike in positive chatter about coin Y. That could be an indication that people are about to sell their BTC in order to buy this coin, subsequently driving the price of BTC down. It works both ways too.

    Thought #5 - When to Sell?

    This is cool and all, but it's no use if it doesn't know when to sell. You not only need to be able to identify when the price is going to go up, but also when it's at the top. I was making decent money with a Python script that bought whatever coin McAfee tweeted about; however, I had to either sell manually or set a % I wanted it to reach. Sure, you make a profit, but you could end up selling at 10% profit only for it to rise another 50%.


    Just some thoughts. Interested to know where this is going to go :)
     
    • Thanks x 6
  20. elavmunretea

    elavmunretea Jr. VIP

    Another graph that would be interesting to see is the price of a coin overlaid on the dates new exchanges added it, as well as the dates when it was announced that they were going to add it.

    Maybe I'll make that one myself manually, as I'm not an expert coder like you :p
     
    • Thanks x 1